QSAR ANALYSIS DO

Function:

To perform an analysis, which becomes the currently selected analysis.

Format:

QSAR ANALYSIS DO mode row_expr col_expr analysis_type [inputs] [tailor_options] new_analysis_name where [{batch_options}]

Arguments:

mode = INTERACTIVE or BATCH
row_expr = rows to analyze
col_expr = column expression (all columns you plan to use later must be listed here)
analysis_type = one of:
- FACTOR
- HIERARCHICAL
- JARVIS-PATRICK
- PLS
- RNN
- SIMCA
inputs = specific for analysis_type
tailor_options = specific for analysis_type. These are listed on the terminal and can be modified until you type the end-loop character (|).
new_analysis_name = unique name for the file into which results are saved. This name is requested after the analysis is complete. Entering the end-loop character (|) deletes the analysis instead of saving it.
where = name of the machine or batch queue on which the batch job will run
batch_options = batch-specific options, see the NETBATCH SUBMIT command for details

Remarks:

INTERACTIVE

QSAR ANALYSIS DO FACTOR

Function:

To seek the simplest possible linear expression equivalent to the data table, expressed as two matrices (scores and loadings) whose product yields the original data values.

Menubar:

MSS: QSAR >>> Factor Analysis...

Format:

QSAR ANALYSIS DO mode row_expr col_expr FACTOR_ANALYSIS [tailor_options] ...

Arguments:

mode = INTERACTIVE or BATCH
row_expr = rows to analyze
col_expr = column expression (all columns you plan to use later must be listed here)
tailor_options =
- BOOTSTRAPPING = number of random samples to draw to estimate the final model parameter stability
- COMPONENTS = number of factors to extract. If 0, all possible factors are extracted.
- FACTORING_METHOD =
  - NIPALS = much faster method, especially on larger data sets
  - DIAGONALIZATION = the conventional method
- ROTATION_OF_COMPONENTS = QUARTIMAX, VARIMAX, NONE. When set to NONE, FACTOR performs a principal components analysis.
- SCALING_METHOD =
  - AUTOSCALE = every column is placed on an equal footing
  - COMFA_STD = considers a field as a whole (it is not appropriate to scale individual field values)
  - NONE = no scaling method
  - USER = weights for field and columns are selected
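As an illustration of what placing every column "on an equal footing" can mean, here is a minimal Python sketch of autoscaling. It assumes that AUTOSCALE centers each column on its mean and divides by its sample standard deviation, which is the usual meaning of the term; the source does not spell out the formula, and the function name `autoscale` and the sample table are hypothetical.

```python
from statistics import mean, stdev

def autoscale(table):
    """Center each column on its mean and divide by its sample standard deviation.

    Assumption: this is the conventional definition of autoscaling,
    not a formula taken from the QSAR documentation.
    """
    scaled_columns = []
    for col in zip(*table):
        m, s = mean(col), stdev(col)
        # A constant column has no spread; it becomes all zeros after centering.
        scaled_columns.append([(v - m) / s if s > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_columns)]

# Hypothetical table: two columns on very different numeric ranges.
table = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]
scaled = autoscale(table)
```

After scaling, both columns span the same range, so neither dominates the analysis merely because of its units.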

Remarks:

The scores may be considered as a transformation of the original data so that as much as possible of its variance has moved to the first few columns, or factors.
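The relationship between scores, loadings, and the original table can be sketched in plain Python. The following is a textbook NIPALS iteration, not the program's own implementation; the function names, tolerance, iteration cap, and sample data are all assumptions made for illustration, and the input is assumed to be column-centered already.

```python
def nipals(X, n_components, tol=1e-12, max_iter=500):
    """Extract components one at a time from column-centered rows X.

    Returns (scores, loadings): scores[a][i] is the score of row i on
    component a, and loadings[a][c] is the loading of column c on component a.
    """
    E = [list(row) for row in X]           # residual matrix, deflated in place
    n, m = len(E), len(E[0])
    scores, loadings = [], []
    for _ in range(n_components):
        # Start from the residual column with the largest sum of squares.
        col_ss = [sum(E[i][c] ** 2 for i in range(n)) for c in range(m)]
        start = max(range(m), key=col_ss.__getitem__)
        if col_ss[start] < 1e-30:
            break                           # nothing left to explain
        t = [E[i][start] for i in range(n)]
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            p = [sum(E[i][c] * t[i] for i in range(n)) / tt for c in range(m)]
            norm = sum(v * v for v in p) ** 0.5
            p = [v / norm for v in p]       # unit-length loading vector
            t_new = [sum(E[i][c] * p[c] for c in range(m)) for i in range(n)]
            change = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if change <= tol * sum(v * v for v in t):
                break
        scores.append(t)
        loadings.append(p)
        for i in range(n):                  # deflate: remove this component
            for c in range(m):
                E[i][c] -= t[i] * p[c]
    return scores, loadings

def reconstruct(scores, loadings):
    """Multiply scores by loadings to rebuild the (centered) table."""
    n, m = len(scores[0]), len(loadings[0])
    return [[sum(t[i] * p[c] for t, p in zip(scores, loadings))
             for c in range(m)] for i in range(n)]

# Hypothetical column-centered table with four rows and two columns.
X = [[2.0, 1.0], [-2.0, -1.0], [1.0, 1.0], [-1.0, -1.0]]
scores, loadings = nipals(X, 2)
X_rebuilt = reconstruct(scores, loadings)
```

With all components extracted, `X_rebuilt` matches `X` to within rounding error, and the first score vector carries most of the sum of squares, which is the sense in which variance "moves" to the first few factors.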

Tailor Variables:

You may alter the characteristics of the analysis via the commands:

Additional Information:

Factor Analysis Dialog.
TAILOR SET FACTOR command.
TAILOR SET QSAR_ANALYSIS command.
Principal Components Analysis Tutorial.
Theoretical introduction to Factor Analysis.
Suggested Reading list for further background information.

QSAR ANALYSIS DO HIERARCHICAL

Function:

To seek a clustering based on the distance between rows, with results available as a dendrogram showing the hierarchy of clusters at increasing levels of detail.

Menubar:

MSS: QSAR >>> Hierarchical Clustering...

Format:

QSAR ANALYSIS DO mode row_expr col_expr HIERARCHICAL [tailor_options] ...

Arguments:

mode = INTERACTIVE or BATCH
row_expr = rows to analyze
col_expr = column expression (all columns you plan to use later must be listed here)
tailor_options =
- CLUSTERING_METHOD = how to identify clusters: AVERAGE, COMPLETE, MEDIAN or SINGLE
- INPUT_FORM = whether the table itself represents a NORMAL_TABLE or (if square) is already a DISTANCE_MATRIX
- LEVEL_FOR_COLUMNS <5>
- MOD_ANGLES = NO/YES, whether columns are angles
- SCALING_METHOD =
  - AUTOSCALE = every column is placed on an equal footing
  - COMFA_STD = considers a field as a whole (it is not appropriate to scale individual field values)
  - NONE = no scaling
  - USER_SPECIFIED = weights for field and columns will be prompted for
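To make the CLUSTERING_METHOD choices concrete, here is a minimal Python sketch of agglomerative clustering with SINGLE, COMPLETE, and AVERAGE linkage. It is a naive illustration, not the program's algorithm: Euclidean distance between rows is an assumption, MEDIAN linkage is omitted, and the function name and sample rows are hypothetical.

```python
from itertools import combinations
from math import dist

# How the distance between two clusters is derived from row-to-row distances.
LINKAGE = {
    "SINGLE": min,                           # closest pair of rows
    "COMPLETE": max,                         # farthest pair of rows
    "AVERAGE": lambda d: sum(d) / len(d),    # mean over all pairs
}

def agglomerate(rows, method="SINGLE"):
    """Return the merge history as (cluster_a, cluster_b, distance) tuples."""
    link = LINKAGE[method]
    clusters = [(i,) for i in range(len(rows))]   # start with singletons
    merges = []

    def cluster_distance(pair):
        a, b = pair
        return link([dist(rows[i], rows[j]) for i in a for j in b])

    while len(clusters) > 1:
        # Merge the two closest clusters under the chosen linkage.
        a, b = min(combinations(clusters, 2), key=cluster_distance)
        merges.append((a, b, cluster_distance((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Hypothetical rows: two tight pairs and one distant row.
rows = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0), (20.0, 0.0)]
merges = agglomerate(rows, "SINGLE")
merge_distances = [d for _, _, d in merges]
```

The merge history is the information a dendrogram draws: each entry joins two branches at the height given by its distance.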

Remarks:

TAILOR SET QSAR ANALYSIS_OUTPUT

COLUMN_ONLY

TAILOR SET HIER LEVEL_FOR_COLUMNS

Note:

Hierarchical Clustering Analysis Dialog

Output Clusters to Spreadsheet...

Additional Information:

Hierarchical Clustering Analysis Dialog.
TAILOR SET HIER command.
TAILOR SET QSAR_ANALYSIS command.
Hierarchical Clustering Tutorial.
Clustering Methods Tutorial.
Hierarchical Analysis Tutorial.
Theoretical introduction to Hierarchical Cluster Analysis.
Suggested Reading list for further background information.

QSAR ANALYSIS DO JARVIS-PATRICK

Function:

and

Note:

Menubar:

MSS: QSAR >>> Jarvis-Patrick Clustering

MSS: Edit >>> Select Rows >>> Selector...

Form Diversity Clusters

Reciprocal NN

Format:

QSAR ANALYSIS DO mode row_expr col_expr [weight_col] JARVIS-PATRICK #nearest_neighbors similarity_threshold [tailor_options] ...

Arguments:

mode = INTERACTIVE or BATCH
row_expr = rows to analyze
col_expr = column expression (all columns you plan to use later must be listed here)
weight_col = column which contains the weighting values. This argument is used only if the command TAILOR SET QSAR_ANALYSIS WEIGHT_ROWS is set to WEIGHT_BY_COLUMN_VALUE.
#nearest_neighbors (K) = how far out the nearest neighbor list is to extend.
similarity_threshold (k) = how many neighbors (or votes, if the command TAILOR SET JARVIS WEIGHTED_VOTING is set to YES) have to be shared for a given row pair to be clustered together. As this approaches K, more singletons are generated and clusters become more compact.
tailor_options =
- SCALING_METHOD =
  - AUTOSCALE = every column is placed on an equal footing
  - COMFA_STD = considers a field as a whole (it is not appropriate to scale individual field values)
  - NONE = no scaling method
  - USER = weights for field and columns are selected
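The roles of #nearest_neighbors (K) and similarity_threshold can be illustrated with a short Python sketch of the commonly described Jarvis-Patrick rule. Several details are assumptions rather than statements about this program: Euclidean distance, unweighted voting, a row not counting as its own neighbor, and the requirement that two rows appear in each other's neighbor lists. The names `jarvis_patrick` and `k_min` are hypothetical.

```python
from math import dist

def jarvis_patrick(rows, K, k_min):
    """Cluster rows that are mutual neighbors and share at least k_min neighbors."""
    n = len(rows)
    # Neighbor list of each row: its K closest other rows.
    neighbors = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: dist(rows[i], rows[j]))
        neighbors.append(set(others[:K]))

    parent = list(range(n))                 # union-find forest over rows

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            mutual = j in neighbors[i] and i in neighbors[j]
            if mutual and len(neighbors[i] & neighbors[j]) >= k_min:
                parent[find(j)] = find(i)   # put both rows in one cluster

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

# Hypothetical one-column table: two tight groups and one outlier.
rows = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,), (50.0,)]
loose = jarvis_patrick(rows, K=2, k_min=1)
strict = jarvis_patrick(rows, K=2, k_min=2)
```

With `k_min=1` the two groups are recovered and the outlier stays alone; raising `k_min` to equal K leaves every row a singleton, matching the remark that more singletons appear as the threshold approaches K.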

Remarks:

TAILOR SET JARVIS WEIGHTED_VOTING

NO

Additional Information:

Jarvis-Patrick Clustering Dialog.
TAILOR SET JARVIS command.
TAILOR SET QSAR_ANALYSIS command.
Theoretical introduction to Jarvis-Patrick clustering.

QSAR ANALYSIS DO PLS

Function:

To seek a linear expression relating the column variance in the target properties (Y-block) to the variations in explanatory properties (X-block) so as to minimize the sum of squares of deviations from the model produced. PLS may be regarded as a superset of conventional ("ordinary") least squares regression.

Menubar:

MSS: QSAR >>> Partial Least Squares...

Format:

QSAR ANALYSIS DO mode row_expr col_expr PLS [weight_col] dependent_col [tailor_options] ...

Arguments:

mode = INTERACTIVE or BATCH
row_expr = rows to analyze
col_expr = column expression (all columns you plan to use later must be listed here)
weight_col = column which contains the weighting values. This argument is used only if the command TAILOR SET QSAR_ANALYSIS WEIGHT_ROWS is set to WEIGHT_BY_COLUMN_VALUE.
dependent_col = a subset of the original col_expr
tailor_options =
- BOOTSTRAPPING = number of random samples to draw to estimate the final model parameter stability
- CENTERING = NO/YES, whether to force the intercept (constant, offset) of the QSAR to have a value of 0.0
- COMPONENTS = model complexity. For the final model, redo with COMPONENTS set to the optimal number of components identified through the crossvalidated runs.
- CROSSVALIDATION = number of subgroups to draw to test model predictive power; use several crossvalidation groups to determine the optimal number of components. Unless the table is very large (such as CoMFA fields), set this value to the number of rows. For the final model, use ANALYSIS REDO with crossvalidation = 0.
- EPSILON = convergence criterion
- ITERATION = maximum number of iterations PLS performs
- SCALING_METHOD =
  - AUTOSCALE = every column is placed on an equal footing
  - COMFA_STD = considers a field as a whole (it is not appropriate to scale individual field values)
  - NONE = no scaling method
  - USER = weights for field and columns are selected
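The CROSSVALIDATION setting can be pictured as splitting the rows into groups, each held out once while a model is fit to the rest. The Python sketch below shows only that partitioning step. It assigns rows to groups round-robin for reproducibility, which is an assumption (the description above speaks of drawing subgroups), and the function name is hypothetical.

```python
def crossvalidation_groups(n_rows, n_groups):
    """Yield (training_rows, held_out_rows) index lists, one pair per group."""
    for g in range(n_groups):
        held_out = [i for i in range(n_rows) if i % n_groups == g]
        training = [i for i in range(n_rows) if i % n_groups != g]
        yield training, held_out

# Setting the number of groups equal to the number of rows holds out
# exactly one row at a time (leave-one-out).
leave_one_out = list(crossvalidation_groups(5, 5))

# Fewer groups hold out several rows at once.
two_groups = list(crossvalidation_groups(5, 2))
```

Each held-out row is predicted by a model that never saw it, which is why the crossvalidated runs are used to choose the number of components before the final model is rebuilt without crossvalidation.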

Remarks:

COMPONENTS

CROSSVALIDATION

MINIMUM_SIGMA

TAILOR SET QSAR_ANALYSIS

COMPONENTS

Additional Information:

Partial Least Squares Analysis Dialog.
TAILOR SET PLS command.
TAILOR SET QSAR_ANALYSIS command.
SAMPLS Tutorial.
Theoretical introduction to Partial Least Squares.
Suggested Reading list for further background information.

QSAR ANALYSIS DO RNN

Function:

Note:

Menubar:

MSS: Edit >>> Select Rows >>> Selector...

Form Diversity Clusters by

Reciprocal NN

Format:

QSAR ANALYSIS DO mode row_expr col_expr RNN

Arguments:

mode = INTERACTIVE or BATCH
row_expr = rows to analyze
col_expr = column expression (all columns you plan to use later must be listed here)

Additional Information:

TAILOR SET HIER command.
TAILOR SET QSAR_ANALYSIS command.

QSAR ANALYSIS DO SIMCA

Function:

To develop a model that relates column data to known categories in a table by calculating a separate principal component model for each category.

Menubar:

MSS: QSAR >>> SIMCA Analysis...

Format:

QSAR ANALYSIS DO mode row_expr col_expr SIMCA_ANALYSIS category_col [tailor_options] ...

Arguments:

mode = INTERACTIVE or BATCH
row_expr = rows to analyze
col_expr = column expression (all columns you plan to use later must be listed here)
category_col = a single column that contains the integer category designation for each row; it must be a subset of the original col_expr
tailor_options =
- SCALING_METHOD =
  - AUTOSCALE = every column is placed on an equal footing
  - COMFA_STD = considers a field as a whole (it is not appropriate to scale individual field values)
  - NONE = no scaling applied

Remarks:

The results that appear after a SIMCA run show the number of rows correctly predicted for each category. These counts may be used to estimate the accuracy of the model.
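As a small illustration of turning such per-category counts into an accuracy estimate, the Python sketch below tallies correct predictions by category. The category values and predictions are made-up sample data, and the function name is hypothetical; the layout of the program's actual output is not assumed.

```python
from collections import Counter

def per_category_accuracy(actual, predicted):
    """Map each category to (rows correct, rows total, fraction correct)."""
    totals = Counter(actual)
    correct = Counter(a for a, p in zip(actual, predicted) if a == p)
    return {cat: (correct[cat], totals[cat], correct[cat] / totals[cat])
            for cat in sorted(totals)}

# Hypothetical integer category designations and model predictions.
actual = [1, 1, 1, 2, 2]
predicted = [1, 1, 2, 2, 2]
summary = per_category_accuracy(actual, predicted)
overall = sum(c for c, _, _ in summary.values()) / len(actual)
```

Here category 1 has two of three rows predicted correctly and category 2 has both of its rows correct, giving four of five rows correct overall.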

Additional Information:

SIMCA Analysis Dialog.
TAILOR SET SIMCA command.
TAILOR SET QSAR_ANALYSIS command.
Theoretical introduction to SIMCA analysis.
Suggested Reading list for further background information.
