Hierarchical Clustering Analysis Dialog


5.4.2 Hierarchical Clustering Analysis Dialog

Function:

To seek a clustering based on the distances between points in Euclidean space. The results are available as a dendrogram that shows the hierarchy of clusters at increasing levels of detail.

License Requirements:

This application requires a QSAR license ("QSAR").

Menubar:

MSS: QSAR >>> Hierarchical Clustering...

Reanalyze QSAR

Dialog Description:

Figure 30 Hierarchical Clustering Analysis dialog

Columns to use [field and push button]

In the field, specify the column(s) to use for this analysis. If columns are preselected in the spreadsheet, their names are listed here when you open the dialog, provided they fit in the field; if the names are too long, a list of column numbers is shown instead. When entering or changing column selections, you can use column numbers, range expressions (e.g., 3:5), or minimal unambiguous names.
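The three accepted forms of column selection can be illustrated with a small resolver. This is a minimal sketch under stated assumptions, not SYBYL's parser: the function name `resolve_columns`, the comma or space separators, the inclusive reading of `3:5`, and case-insensitive name matching are all hypothetical choices made for illustration.

```python
def resolve_columns(spec, names):
    """Hypothetical resolver: turn a selection such as '1,3:4,LOG' into
    1-based column numbers. `names` lists the spreadsheet column names in order.
    Separators, inclusive ranges, and case-insensitivity are assumptions."""
    result = []
    for token in spec.replace(",", " ").split():
        if ":" in token:
            # Range expression, read here as inclusive at both ends.
            lo, hi = (int(t) for t in token.split(":"))
            result.extend(range(lo, hi + 1))
        elif token.isdigit():
            result.append(int(token))
        else:
            # An exact name wins; otherwise the prefix must match exactly one column.
            wanted = token.upper()
            exact = [i + 1 for i, n in enumerate(names) if n.upper() == wanted]
            hits = exact or [i + 1 for i, n in enumerate(names)
                             if n.upper().startswith(wanted)]
            if len(hits) != 1:
                raise ValueError(f"{token!r} matches {len(hits)} columns")
            result.extend(hits)
    return result
```

For example, with columns `NAME, LOGP, MR, CMR, COMFA1`, the selection `1,3:4,LOG` resolves to columns 1, 3, 4 and 2, while `C` is rejected because it is a prefix of both `CMR` and `COMFA1`.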

[...]

See Column Selection Dialog in the Molecular Spreadsheet Manual.

Input Data

Normal [radio button]

Select this radio button when the input data are ordinary data values.

TAILOR SET HIER INPUT_FORM NORMAL

TAILOR SET HIER MOD_ANGLES NO

Inter-Row Distances [radio button]

Select this radio button when the input data cells contain distances between rows. The data matrix must then be symmetric about its diagonal: the cell in row i, column j must equal the cell in row j, column i.

TAILOR SET HIER INPUT_FORM DISTANCE

TAILOR SET HIER MOD_ANGLES NO
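The symmetry requirement for inter-row distances can be checked before the analysis is run. The sketch below is illustrative only; `check_distance_matrix` and its tolerance parameter are hypothetical and are not part of SYBYL. It tests only what the text states: the matrix is square and symmetric about its diagonal.

```python
def check_distance_matrix(rows, tol=1e-9):
    """Hypothetical pre-check: raise ValueError unless `rows` is a square
    matrix whose cell (i, j) equals cell (j, i) within `tol`."""
    n = len(rows)
    # Check the shape first so that rows[j][i] below cannot go out of range.
    for i, row in enumerate(rows):
        if len(row) != n:
            raise ValueError(f"row {i + 1} has {len(row)} cells; expected {n}")
    # Compare each upper-triangle cell with its mirror image.
    for i in range(n):
        for j in range(i + 1, n):
            if abs(rows[i][j] - rows[j][i]) > tol:
                raise ValueError(
                    f"cell ({i + 1},{j + 1}) = {rows[i][j]} but "
                    f"cell ({j + 1},{i + 1}) = {rows[j][i]}")
    return True
```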

Bond Angles [radio button]

Select this radio button when the input data are torsional angles, such that a value of 359 is as close to 5 as it is to 353.

TAILOR SET HIER INPUT_FORM NORMAL

TAILOR SET HIER MOD_ANGLES YES
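With angular input, the difference between two values is measured around the circle instead of along a line, which is why 359 is 6 degrees from both 5 and 353. The sketch below shows that wrap-around difference. How SYBYL combines per-column differences into one row-to-row distance is not stated here, so the Euclidean combination in `torsion_distance` is an assumption, and both function names are hypothetical.

```python
import math


def angular_difference(a, b):
    """Smallest separation, in degrees, between two angles on a 360-degree circle."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def torsion_distance(row_a, row_b):
    """Assumed Euclidean distance between two rows of torsion angles,
    using the wrap-around difference for each column."""
    return math.sqrt(sum(angular_difference(a, b) ** 2
                         for a, b in zip(row_a, row_b)))
```

`angular_difference(359, 5)` and `angular_difference(359, 353)` both give 6.0, matching the example in the text; an ordinary subtraction would give 354 for the first pair.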

Parameters

Method [option menu]

Single

Average

Complete

Median

Single takes the distance between two clusters to be the smallest distance between their constituent points, producing dendrograms in which singletons sequentially coalesce into the first-formed cluster.
Complete takes the inter-cluster distance to be the greatest separation between their elements, producing dendrograms with multiple, compact root clusters and minimizing the generation of singletons.
Average is an intermediate method that uses the average of all pairwise distances between points in the two clusters.
Median takes the distances between points within clusters into consideration.
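The four methods differ only in how the distance from an existing cluster to a newly merged cluster is computed. A common way to express this is the textbook Lance-Williams update, sketched below as a naive agglomerative loop over a symmetric distance matrix. This is an illustration of the general technique, not SYBYL's implementation; the function names are hypothetical, and median linkage is usually defined on squared Euclidean distances and can produce merge heights that are not monotonic.

```python
from itertools import combinations


def _key(i, j):
    """Order a pair of cluster ids so each distance is stored once."""
    return (i, j) if i < j else (j, i)


def _lance_williams(method, n_i, n_j):
    """Textbook Lance-Williams coefficients (alpha_i, alpha_j, beta, gamma)."""
    if method == "single":
        return 0.5, 0.5, 0.0, -0.5   # reduces to min(d_ki, d_kj)
    if method == "complete":
        return 0.5, 0.5, 0.0, 0.5    # reduces to max(d_ki, d_kj)
    if method == "average":
        return n_i / (n_i + n_j), n_j / (n_i + n_j), 0.0, 0.0
    if method == "median":
        return 0.5, 0.5, -0.25, 0.0  # uses the distance between the merged pair
    raise ValueError(f"unknown method: {method}")


def agglomerate(dist, method="average"):
    """Naive agglomerative clustering of a symmetric distance matrix.

    Returns the merges in order as (members_a, members_b, height) tuples,
    which is the information a dendrogram draws."""
    n = len(dist)
    members = {i: (i,) for i in range(n)}  # active cluster id -> row indices
    d = {(i, j): dist[i][j] for i, j in combinations(range(n), 2)}
    next_id = n
    merges = []
    while len(members) > 1:
        # Merge the closest pair of active clusters.
        (a, b), height = min(d.items(), key=lambda item: item[1])
        merges.append((members[a], members[b], height))
        alpha_a, alpha_b, beta, gamma = _lance_williams(
            method, len(members[a]), len(members[b]))
        # Distance from every other active cluster k to the merged cluster.
        for k in members:
            if k in (a, b):
                continue
            d_ka, d_kb = d[_key(k, a)], d[_key(k, b)]
            d[_key(k, next_id)] = (alpha_a * d_ka + alpha_b * d_kb
                                   + beta * height + gamma * abs(d_ka - d_kb))
        members[next_id] = members[a] + members[b]
        del members[a], members[b]
        # Drop distances that refer to the two clusters just merged.
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        next_id += 1
    return merges
```

For four points at 0, 1, 5 and 7 on a line, all methods first join the points at 0 and 1, then the points at 5 and 7; the final merge height is 4 for single, 7 for complete, and 5.5 for average linkage, showing how the choice of method changes the shape of the dendrogram.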

Algorithmic Description of Hierarchical Cluster Analysis

TAILOR SET HIER CLUSTERING_METHOD

Scaling [option menu]

Select an option to control the relative weights for individual columns in the hierarchical clustering analysis.

Autoscale: Centers each variable and rescales it to unit variance. Recommended if no CoMFA columns are involved in the analysis; this is the default when no CoMFA column has been selected.
CoMFA Standard: Scales each CoMFA field as if it were a single variable, so that scalar variables are not overwhelmed by the field's many lattice-point columns. Recommended if any CoMFA columns have been selected.
Specify: Writes the CoMFA Standard weights for each variable and field to the terminal window so that you can modify them.
None: No weighting; variables with a larger numerical spread dominate the analysis.

TAILOR SET QSAR_ANALYSIS SCALING
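The difference between Autoscale and CoMFA Standard scaling can be illustrated with two small functions. This is a sketch under assumptions: the use of the population standard deviation is a guess, and `scale_field_as_one_variable` is a hypothetical reading of "scales fields as if they were single variables", in which one shared weight makes the variances of all lattice-point columns in a field sum to 1, the same total weight as one autoscaled column. SYBYL's actual weights may be computed differently.

```python
from statistics import mean, pstdev


def autoscale(column):
    """Center a column on its mean and rescale it to unit variance
    (population standard deviation assumed)."""
    m, s = mean(column), pstdev(column)
    if s == 0:
        raise ValueError("a constant column cannot be autoscaled")
    return [(x - m) / s for x in column]


def scale_field_as_one_variable(field_columns):
    """Hypothetical block scaling for one CoMFA field.

    Centers every lattice-point column and applies one shared weight so the
    variances of the field's columns sum to 1."""
    total_variance = sum(pstdev(c) ** 2 for c in field_columns)
    if total_variance == 0:
        raise ValueError("the field has no variance")
    weight = total_variance ** -0.5
    return [[(x - mean(c)) * weight for x in c] for c in field_columns]
```

Under this reading, a field with thousands of lattice columns carries the same total variance as one autoscaled scalar column, which is how scalar variables avoid being overwhelmed.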

Filtering [check box and field]

Note: See Filtering.

Check the box to enable filtering. In the field, enter the minimum energy variation that a CoMFA lattice point must have to be included in the hierarchical clustering analysis.

TAILOR SET QSAR_ANALYSIS MINIMUM_SIGMA

Actual Column Count... [push button]

Press this button to calculate the number of descriptor columns that would be dropped (if any) and the number of columns that would be used if the analysis were performed with the current dialog settings. The values are written to the textport window.
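Filtering and the column count can be sketched together: lattice-point columns whose variation falls below the threshold are dropped, and the counts are what Actual Column Count reports. The sketch assumes that "energy variation" means the column's standard deviation, as the MINIMUM_SIGMA setting name suggests, and that a column exactly at the threshold is kept; both function names are hypothetical.

```python
from statistics import pstdev


def filter_lattice_columns(columns, min_sigma):
    """Split CoMFA lattice columns into kept and dropped by standard deviation.

    `columns` maps a column label to its values across the spreadsheet rows.
    Keeping columns with sigma >= min_sigma is an assumption."""
    kept = {label: v for label, v in columns.items() if pstdev(v) >= min_sigma}
    dropped = [label for label in columns if label not in kept]
    return kept, dropped


def actual_column_count(columns, min_sigma):
    """Return (number dropped, number used), as Actual Column Count reports."""
    kept, dropped = filter_lattice_columns(columns, min_sigma)
    return len(dropped), len(kept)
```

For example, with a threshold of 1.0, a constant column and a column varying only between 0.9 and 1.1 are both dropped, while a column with values 0, 2 and 4 is kept.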

See also: Filtering; Building Spreadsheet for CoMFA.

Batch Settings

Machine [option menu]

Local

Analysis name [field]

Enter the name of the analysis. It will be used for future reference and for any batch job.

Netbatch [push button]

Netbatch Dialog

Run in Batch [check box]

Check this box to perform the analysis in batch mode rather than in interactive mode.

Show Dendrogram [check box and option menu]

Check the box to create a dendrogram for the new analysis before you are prompted whether or not to save it permanently. The menu is (re)set by default to a display area which does not already contain a graph, or to D1 if graphs are posted in all four display areas. It does not check for occupancy of the corresponding molecule areas.

By setting your screen to quartered mode, you can compare your results with existing hierarchical analyses to help decide whether to keep the new analysis. If you choose not to save the analysis, the dendrogram is deleted.

Output Clusters to Spreadsheet [check box]

Generate Cluster Column(s) Dialog

When this check box is off,

TAILOR SET QSAR_ANALYSIS ANALYSIS_OUTPUT

NORMAL_FILE

COLUMN_ONLY

TAILOR SET HIER LEVEL_FOR_COLUMNS
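Writing clusters to the spreadsheet amounts to cutting the dendrogram at some level and giving each row the number of the cluster it falls in. The sketch below cuts at a merge height with a union-find structure. It is an illustration under assumptions: whether LEVEL_FOR_COLUMNS refers to a height or a number of clusters is not stated here, and `cut_tree` and its merge-list format are hypothetical.

```python
def cut_tree(n_rows, merges, height):
    """Hypothetical dendrogram cut: apply only merges at or below `height`
    and return one 1-based cluster number per row.

    `merges` is a list of (members_a, members_b, merge_height) tuples of
    row indices, in merge order."""
    parent = list(range(n_rows))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for members_a, members_b, h in merges:
        if h <= height:
            parent[find(members_a[0])] = find(members_b[0])

    # Number clusters in order of first appearance down the spreadsheet.
    labels, column = {}, []
    for row in range(n_rows):
        root = find(row)
        labels.setdefault(root, len(labels) + 1)
        column.append(labels[root])
    return column
```

Cutting by height assumes merge heights increase monotonically; methods that can produce inversions, such as median linkage, may need a cut by number of clusters instead.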

Note:

Additional Information:

QSAR ANALYSIS DO ... HIER command.
QSAR ANALYSIS REDO command.
TAILOR SET HIER command.
TAILOR SET QSAR_ANALYSIS command.
Hierarchical Clustering Tutorial.
Clustering Methods Tutorial.
Hierarchical Analysis Tutorial.
Theoretical introduction to Hierarchical Cluster Analysis.
Suggested Reading list for further background information.