
QSAR ANALYSIS DO
Function:
To perform an analysis, which becomes the currently selected analysis.
Format:
QSAR ANALYSIS DO mode row_expr col_expr analysis_type [inputs] [tailor_options] new_analysis_name where [{batch_options}]
Arguments:
- mode = INTERACTIVE or BATCH
- row_expr = rows to analyze
- col_expr = column expression (all columns you plan to use later must be listed here)
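- analysis_type = FACTOR_ANALYSIS, HIERARCHICAL, JARVIS-PATRICK, PLS, RNN, or SIMCA_ANALYSIS (each is described in its own section below)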
- inputs = specific for analysis_type
- tailor_options = specific for analysis_type. These are listed on the terminal and can be modified until you type the end-loop character (|).
- new_analysis_name = unique name for the file into which results are saved. This name is requested after the analysis is complete. Entering the end-loop character (|) deletes the analysis instead of saving it.
- where = name of the machine or batch queue on which the batch job will run
- batch_options = batch-specific options, see the NETBATCH SUBMIT command for details
Remarks:
In INTERACTIVE mode, the new name is requested once the analysis is complete, and the saved analysis becomes the currently selected analysis. In BATCH mode, the currently selected analysis is not affected.
QSAR ANALYSIS DO FACTOR
Function:
To seek the simplest possible linear expression equivalent to the data table, as two matrices, scores and loadings, whose multiplication yields the original data values.
Menubar:
MSS: QSAR >>> Factor Analysis...
Format:
QSAR ANALYSIS DO mode row_expr col_expr FACTOR_ANALYSIS [tailor_options] ...
Arguments:
- mode = INTERACTIVE or BATCH
- row_expr = rows to analyze
- col_expr = column expression (all columns you plan to use later must be listed here)
- tailor_options =
  - BOOTSTRAPPING = number of random samples to draw to estimate the final model parameter stability
  - COMPONENTS = number of factors to extract. If 0, all possible factors are extracted.
  - FACTORING_METHOD =
    - NIPALS = much faster method, especially on larger data sets
    - DIAGONALIZATION = the conventional method
  - ROTATION_OF_COMPONENTS = QUARTIMAX, VARIMAX, NONE. When set to NONE, FACTOR performs a principal components analysis.
  - SCALING_METHOD =
    - AUTOSCALE = every column is placed on an equal footing
    - COMFA_STD = considers a field as a whole (it is not appropriate to scale individual field values)
    - NONE = no scaling method
    - USER = weights for field and columns are selected
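As an illustration of what "equal footing" means for AUTOSCALE, the sketch below assumes the conventional chemometric definition: each column is centered on its mean and divided by its sample standard deviation. The function name is hypothetical, and SYBYL's own implementation (including whether centering is applied at this step) may differ.

```python
from statistics import mean, stdev

def autoscale(table):
    """Scale every column of a list-of-rows table to zero mean and unit
    standard deviation (assumed conventional autoscaling)."""
    columns = list(zip(*table))
    means = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]
    # A constant column has no spread; map it to zeros instead of dividing by 0.
    return [
        [(value - m) / s if s > 0 else 0.0 for value, m, s in zip(row, means, sds)]
        for row in table
    ]

# Two columns with very different magnitudes end up on the same scale.
scaled = autoscale([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
```

After scaling, a column measured in hundreds carries no more weight in the analysis than a column measured in units.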
Remarks:
The scores may be considered as a transformation of the original data so that as much as possible of its variance has moved to the first few columns, or factors.
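The relationship between scores, loadings, and the original table can be sketched with a small NIPALS-style routine. This is a minimal textbook sketch, not SYBYL's code: the function names are hypothetical, columns are mean-centered first (an assumption), and no rotation is applied, so the result is a principal components decomposition.

```python
import math

def nipals_factors(table, n_components=0, max_iter=500, tol=1e-12):
    """Return (column_means, scores, loadings) for a list-of-rows table.
    n_components=0 extracts all possible factors."""
    n_rows, n_cols = len(table), len(table[0])
    if n_components == 0:
        n_components = min(n_rows, n_cols)
    means = [sum(row[j] for row in table) / n_rows for j in range(n_cols)]
    resid = [[row[j] - means[j] for j in range(n_cols)] for row in table]
    scores, loadings = [], []
    for _component in range(n_components):
        col_ss = [sum(r[j] ** 2 for r in resid) for j in range(n_cols)]
        start = col_ss.index(max(col_ss))
        if col_ss[start] < tol:
            break  # nothing left to explain
        t = [r[start] for r in resid]  # initial score vector
        for _iteration in range(max_iter):
            tt = sum(v * v for v in t)
            # Loading vector: project the residual table onto the scores.
            p = [sum(t[i] * resid[i][j] for i in range(n_rows)) / tt
                 for j in range(n_cols)]
            norm = math.sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]
            # New scores: project the residual table onto the loadings.
            t_new = [sum(resid[i][j] * p[j] for j in range(n_cols))
                     for i in range(n_rows)]
            change = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if change <= tol * sum(v * v for v in t):
                break
        scores.append(t)
        loadings.append(p)
        # Deflate: remove the part of the table explained by this factor.
        resid = [[resid[i][j] - t[i] * p[j] for j in range(n_cols)]
                 for i in range(n_rows)]
    return means, scores, loadings

def rebuild(means, scores, loadings):
    """Multiply scores by loadings and add the column means back."""
    n_rows = len(scores[0])
    return [[means[j] + sum(t[i] * p[j] for t, p in zip(scores, loadings))
             for j in range(len(means))] for i in range(n_rows)]
```

When all factors are extracted, rebuild() reproduces the original table to rounding error, and the sum of squares of each successive score vector decreases, which is the "variance moved to the first few factors" described above.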
Tailor Variables:
You may alter the characteristics of the analysis via the TAILOR SET command.
QSAR ANALYSIS DO HIERARCHICAL
Function:
To seek a clustering based on distance between rows, with results available as a dendrogram showing the hierarchy of clusters at increasing level of detail.
Menubar:
MSS: QSAR >>> Hierarchical Clustering...
Format:
QSAR ANALYSIS DO mode row_expr col_expr HIERARCHICAL [tailor_options] ...
Arguments:
- mode = INTERACTIVE or BATCH
- row_expr = rows to analyze
- col_expr = column expression (all columns you plan to use later must be listed here)
- tailor_options =
  - CLUSTERING_METHOD = how to identify clusters: AVERAGE, COMPLETE, MEDIAN or SINGLE
  - INPUT_FORM = whether the table itself represents a NORMAL_TABLE or (if square) is already a DISTANCE_MATRIX
  - LEVEL_FOR_COLUMNS = dendrogram level at which the cluster column is taken when output goes directly to the table <5>
  - MOD_ANGLES = NO/YES, whether columns are angles
  - SCALING_METHOD =
    - AUTOSCALE = every column is placed on an equal footing
    - COMFA_STD = considers a field as a whole (it is not appropriate to scale individual field values)
    - NONE = no scaling
    - USER_SPECIFIED = weights for field and columns will be prompted for
Remarks:
If the command TAILOR SET QSAR ANALYSIS_OUTPUT has been set to COLUMN_ONLY, a cluster column representing a "slice" of the dendrogram at the level of the variable set by TAILOR SET HIER LEVEL_FOR_COLUMNS will be appended to the table, and the analysis file will be deleted. This conserves memory at the cost of discarding potentially useful information. It should be avoided unless a large number of redundant analyses are going to be performed.
Note: In the Hierarchical Clustering Analysis Dialog, this only takes effect if the more powerful interactive Output Clusters to Spreadsheet... check box is off; if this check box is on, direct output to the table is overridden and the analysis file is retained.
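The idea of slicing a dendrogram into a cluster column can be sketched as follows. This is a naive agglomerative procedure under stated assumptions: Euclidean distances, the "level" interpreted as the number of clusters remaining, and only the SINGLE, COMPLETE, and AVERAGE linkages (MEDIAN is omitted). All function names are hypothetical and none of this is SYBYL's implementation.

```python
import math

def linkage_distance(c1, c2, rows, method):
    """Distance between two clusters (lists of row indices)."""
    d = [math.dist(rows[i], rows[j]) for i in c1 for j in c2]
    if method == "SINGLE":
        return min(d)
    if method == "COMPLETE":
        return max(d)
    if method == "AVERAGE":
        return sum(d) / len(d)
    raise ValueError("unsupported method: " + method)

def cluster_column(rows, n_clusters, method="COMPLETE"):
    """Merge the two closest clusters until n_clusters remain, then
    return one cluster ID per row (the column appended to the table)."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > max(1, n_clusters):
        pairs = ((a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters)))
        a, b = min(pairs, key=lambda ab: linkage_distance(
            clusters[ab[0]], clusters[ab[1]], rows, method))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * len(rows)
    for cluster_id, members in enumerate(clusters, start=1):
        for i in members:
            labels[i] = cluster_id
    return labels

rows = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0], [50.0, 50.0]]
labels = cluster_column(rows, 3)
```

Only the final labels are kept here, which mirrors the trade-off noted above: the full merge history (the dendrogram) is discarded, so a different slice requires rerunning the analysis.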
QSAR ANALYSIS DO JARVIS-PATRICK
Function:
To cluster a set of compounds (or other objects represented by rows in a SYBYL table) based on the number of nearest neighbors held in common among the different rows. The descriptor columns you select determine the distances used to evaluate what "nearest" means. The algorithm compares the K nearest neighbors for each row to the K nearest neighbors for every other row. Those pairs of rows which have each other as neighbors and which have more neighbors in common than the given threshold k (k ≤ K) are put into the same cluster.
Note: Jarvis-Patrick clustering requires a Selector license.
Menubar:
MSS: QSAR >>> Jarvis-Patrick Clustering
MSS: Edit >>> Select Rows >>> Selector...
Then set the Form Diversity Clusters by option menu to Reciprocal NN.
Format:
QSAR ANALYSIS DO mode row_expr col_expr [weight_col] JARVIS-PATRICK #nearest_neighbors similarity_threshold [tailor_options] ...
Arguments:
- mode = INTERACTIVE or BATCH
- row_expr = rows to analyze
- col_expr = column expression (all columns you plan to use later must be listed here)
- weight_col = column which contains the weighting values. This argument is used only if the command TAILOR SET QSAR_ANALYSIS WEIGHT_ROWS is set to WEIGHT_BY_COLUMN_VALUE.
- #nearest_neighbors (K) = how far out the nearest neighbor list is to extend.
- similarity_threshold (k) = how many neighbors (or votes, if the command TAILOR SET JARVIS WEIGHTED_VOTING is set to YES) have to be shared for a given row pair to be clustered together. As this approaches K, more singletons are generated and clusters become more compact.
- tailor_options =
  - SCALING_METHOD =
    - AUTOSCALE = every column is placed on an equal footing
    - COMFA_STD = considers a field as a whole (it is not appropriate to scale individual field values)
    - NONE = no scaling method
    - USER = weights for field and columns are selected
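The roles of #nearest_neighbors (K) and similarity_threshold (k) can be illustrated with a short sketch of the classic Jarvis-Patrick procedure. Assumptions: Euclidean distances, unweighted voting, and "at least k shared neighbors" as the joining test (whether SYBYL requires strictly more than k is not verified here). The function name is hypothetical.

```python
import math

def jarvis_patrick(rows, big_k, small_k):
    """Return one cluster ID per row.
    big_k   = length of each row's nearest-neighbor list (K)
    small_k = shared neighbors needed to join two rows (k <= K)"""
    n = len(rows)
    neighbors = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(rows[i], rows[j]))
        neighbors.append(set(others[:big_k]))

    parent = list(range(n))  # union-find forest for merging row pairs

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            mutual = j in neighbors[i] and i in neighbors[j]
            shared = len(neighbors[i] & neighbors[j])
            if mutual and shared >= small_k:
                parent[find(j)] = find(i)

    ids = {}
    return [ids.setdefault(find(i), len(ids) + 1) for i in range(n)]

rows = [[0, 0], [0, 1], [1, 0], [1, 1],
        [10, 10], [10, 11], [11, 10], [11, 11],
        [30, 30]]
labels = jarvis_patrick(rows, big_k=3, small_k=2)
```

In this hypothetical data set the isolated ninth row lists three neighbors, but none of them lists it in return, so it stays a singleton. Raising small_k toward big_k makes the joining test harder to pass and produces more singletons.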
QSAR ANALYSIS DO PLS
Function:
To seek a linear expression relating the column variance in the target properties (Y-block) to the variations in explanatory properties (X-block) so as to minimize the sum of squares of deviations from the model produced. PLS may be regarded as a superset of conventional ("ordinary") least squares regression.
Menubar:
MSS: QSAR >>> Partial Least Squares...
Format:
QSAR ANALYSIS DO mode row_expr col_expr PLS [weight_col] dependent_col [tailor_options] ...
Arguments:
- mode = INTERACTIVE or BATCH
- row_expr = rows to analyze
- col_expr = column expression (all columns you plan to use later must be listed here)
- weight_col = column which contains the weighting values. This argument is used only if the command TAILOR SET QSAR_ANALYSIS WEIGHT_ROWS is set to WEIGHT_BY_COLUMN_VALUE.
- dependent_col = a subset of the original col_expr
- tailor_options =
  - BOOTSTRAPPING = number of random samples to draw to estimate the final model parameter stability
  - CENTERING = NO/YES, whether to force the intercept (constant, offset) of the QSAR to have a value of 0.0
  - COMPONENTS = model complexity. For the final model, redo with COMPONENTS set to the optimal number of components identified through the crossvalidated runs.
  - CROSSVALIDATION = number of subgroups to draw to test model predictive power; use several crossvalidation groups to determine the optimal number of components. Unless the table is very large (such as CoMFA fields), set this value to the number of rows. For the final model, use ANALYSIS REDO with crossvalidation = 0.
  - EPSILON = convergence criterion
  - ITERATION = maximum number of iterations PLS performs
  - SCALING_METHOD =
    - AUTOSCALE = every column is placed on an equal footing
    - COMFA_STD = considers a field as a whole (it is not appropriate to scale individual field values)
    - NONE = no scaling method
    - USER = weights for field and columns are selected
Remarks:
Some settings, as well as the number of selected columns and rows, affect memory requirements and run time. As a guide, the product of COMPONENTS, CROSSVALIDATION, the number of rows, and the number of columns (after any MINIMUM_SIGMA correction) is written to the text window if it is greater than 10000. The amount of time an analysis takes depends strongly on the local system configuration.
Note that the tailor variable MINIMUM_SIGMA is accessible via the command TAILOR SET QSAR_ANALYSIS.
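The guide figure is a plain product of four numbers, as the arithmetic sketch below shows. The function names are hypothetical, and the column count is passed in directly because how the MINIMUM_SIGMA correction reduces the number of columns is not specified here.

```python
REPORT_THRESHOLD = 10000  # the figure is reported only above this value

def pls_workload(components, crossvalidation, n_rows, n_cols):
    """Product used as a rough guide to memory and run time."""
    return components * crossvalidation * n_rows * n_cols

def workload_message(components, crossvalidation, n_rows, n_cols):
    """Return the text to report, or None when the product is small."""
    workload = pls_workload(components, crossvalidation, n_rows, n_cols)
    if workload > REPORT_THRESHOLD:
        return "estimated workload: %d" % workload
    return None

# 5 components, 20 crossvalidation groups, 20 rows, 30 columns.
message = workload_message(5, 20, 20, 30)
```

Because CROSSVALIDATION is normally set to the number of rows, the product grows with the square of the row count, which is why very large tables call for fewer crossvalidation groups.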
The results printed after crossvalidation include the optimal number of components, which corresponds to the maximum value of the crossvalidated R squared (also printed). The residuals and predicted values correspond to the COMPONENTS setting, which is generally not the same as the optimal number of components.
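The selection rule can be sketched with the standard definition of the crossvalidated R squared, 1 - PRESS/SS, where PRESS is the sum of squared crossvalidated prediction errors and SS is the sum of squared deviations of the target from its mean. The function names and the data are hypothetical; the predictions would come from the crossvalidated PLS runs themselves.

```python
def crossvalidated_r2(actual, predicted):
    """1 - PRESS/SS for one set of crossvalidated predictions."""
    mean_y = sum(actual) / len(actual)
    press = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - press / ss

def optimal_components(actual, predictions_by_components):
    """Pick the component count with the highest crossvalidated R squared.
    predictions_by_components maps a component count to the crossvalidated
    predictions obtained with that many components."""
    r2 = {n: crossvalidated_r2(actual, preds)
          for n, preds in predictions_by_components.items()}
    best = max(sorted(r2), key=r2.get)  # ties go to the fewest components
    return best, r2

# Hypothetical target values and crossvalidated predictions.
actual = [1.0, 2.0, 3.0, 4.0]
predictions = {
    1: [2.0, 2.0, 3.0, 3.0],
    2: [1.0, 2.0, 3.0, 4.5],
    3: [1.5, 1.5, 3.5, 4.5],
}
best, r2 = optimal_components(actual, predictions)
```

Here two components give the highest value, so the final (non-crossvalidated) model would be rerun with COMPONENTS set to 2.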
QSAR ANALYSIS DO RNN
Function:
RNN (Reciprocal Nearest Neighbors) is used by Selector to do rapid and memory-efficient clustering analyses. Instead of working from a complete distance matrix, which would require 8×N² bytes of memory (N being the number of rows), it builds up a dendrogram by fusion of subgraphs built from each row's nearest neighbor. Reciprocal nearest neighbors (i.e., rows which are each other's nearest neighbor) are combined to form new clusters. Distances are then recalculated between the newly generated row and the other rows in the table, to see if new pairs of reciprocal nearest neighbors are produced. Clustering halts when the specified level of clustering has been reached. Output is a column of cluster IDs appended to the default table.
Note: Use of this option requires a Selector license.
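The merging of reciprocal nearest neighbors can be sketched as below. Assumptions: Euclidean distances, complete linkage for the recalculated cluster distances (the linkage Selector uses is not stated here), and the stopping level expressed as a number of clusters. The function name is hypothetical. Distances are recomputed on demand rather than stored, so no N×N matrix is held in memory, at the cost of extra computation in this naive form.

```python
import math

def rnn_cluster(rows, n_clusters):
    """Merge reciprocal nearest neighbors until n_clusters remain;
    return one cluster ID per row."""
    clusters = {i: [i] for i in range(len(rows))}

    def cluster_distance(a, b):
        # Complete linkage (assumed): farthest pair of member rows.
        return max(math.dist(rows[i], rows[j])
                   for i in clusters[a] for j in clusters[b])

    def nearest(a):
        # Ties are broken toward the lowest cluster key, which guarantees
        # that at least one reciprocal pair always exists.
        return min((cluster_distance(a, b), b) for b in clusters if b != a)[1]

    while len(clusters) > max(1, n_clusters):
        nn = {a: nearest(a) for a in clusters}
        reciprocal = [(cluster_distance(a, b), a, b)
                      for a, b in nn.items() if a < b and nn[b] == a]
        _distance, a, b = min(reciprocal)  # closest reciprocal pair
        clusters[a] = clusters[a] + clusters.pop(b)

    labels = [0] * len(rows)
    for cluster_id, key in enumerate(sorted(clusters), start=1):
        for i in clusters[key]:
            labels[i] = cluster_id
    return labels

rows = [[0.0], [1.0], [10.0], [11.0], [30.0]]
labels = rnn_cluster(rows, 2)
```

In this hypothetical example the last row's nearest neighbor never names it in return, so it is the last to be absorbed and remains its own cluster when two clusters are requested.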
Menubar:
MSS: Edit >>> Select Rows >>> Selector...
Then set the Form Diversity Clusters by option menu to Reciprocal NN.
Format:
QSAR ANALYSIS DO mode row_expr col_expr RNN
Arguments:
- mode = INTERACTIVE or BATCH
- row_expr = rows to analyze
- col_expr = column expression (all columns you plan to use later must be listed here)
QSAR ANALYSIS DO SIMCA
Function:
To develop a model which relates column data to known categories in a table based on calculating a separate principal component for each category.
Menubar:
MSS: QSAR >>> SIMCA Analysis...
Format:
QSAR ANALYSIS DO mode row_expr col_expr SIMCA_ANALYSIS category_col [tailor_options] ...
Arguments:
- mode = INTERACTIVE or BATCH
- row_expr = rows to analyze
- col_expr = column expression (all columns you plan to use later must be listed here)
- category_col = a single column which contains the integer category designations for each row, must be a subset of the original col_expr
- tailor_options =
  - SCALING_METHOD =
    - AUTOSCALE = every column is placed on an equal footing
    - COMFA_STD = considers a field as a whole (it is not appropriate to scale individual field values)
    - NONE = no scaling applied
Remarks:
The results which appear after a SIMCA run show the number of rows correctly predicted for each category. This may be used to estimate the accuracy of the model.
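Turning per-category counts into an accuracy estimate is simple arithmetic, sketched below. The function names and the category data are hypothetical; the predicted categories would come from the SIMCA run itself.

```python
from collections import Counter

def correct_per_category(actual, predicted):
    """Map each category to (rows correctly predicted, rows in category)."""
    totals = Counter(actual)
    correct = Counter(a for a, p in zip(actual, predicted) if a == p)
    return {cat: (correct[cat], totals[cat]) for cat in sorted(totals)}

def overall_accuracy(per_category):
    """Fraction of all rows assigned to their known category."""
    hits = sum(c for c, _total in per_category.values())
    rows = sum(total for _c, total in per_category.values())
    return hits / rows

# Hypothetical integer categories for six rows.
actual = [1, 1, 1, 2, 2, 3]
predicted = [1, 1, 2, 2, 2, 1]
summary = correct_per_category(actual, predicted)
accuracy = overall_accuracy(summary)
```

Reading the counts per category, rather than only the overall fraction, shows whether the model fails on one particular category, as it does for category 3 in this example.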


Copyright © 1999, Tripos Inc. All rights reserved.