
2.3.4 Examining the Dendrograms

1. At the start of the hierarchical clustering algorithm, each row is considered a cluster unto itself. Let's call these singleton clusters B, F, H, L, P, and Z to distinguish them from the corresponding points Bob, Flo, Herb, Liz, Pat, and Zeke. At each step thereafter, those clusters which are closest together [1] will be merged.

2. The methods diverge in the next step because they define the distance between clusters differently; a short sketch of the three definitions follows the figures below. Consider Herb, which is at the bottom center of the Methods graph.

Figure 8 Complete Clustering

Figure 9 Single Clustering

Figure 10 Average Clustering
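
To make the distinction in step 2 concrete, here is a minimal Python sketch. It is not part of the original tutorial: the function names are illustrative, and dist is assumed to be a dictionary of dictionaries holding the point-to-point distances (e.g., dist["Pat"]["Zeke"]). Each rule collapses all of the point-to-point distances between two clusters into a single cluster-to-cluster distance.

    def complete_linkage(dist, cluster_a, cluster_b):
        # Complete linkage: the LARGEST point-to-point distance between the clusters.
        return max(dist[a][b] for a in cluster_a for b in cluster_b)

    def single_linkage(dist, cluster_a, cluster_b):
        # Single linkage: the SMALLEST point-to-point distance between the clusters.
        return min(dist[a][b] for a in cluster_a for b in cluster_b)

    def average_linkage(dist, cluster_a, cluster_b):
        # Average linkage: the MEAN of all point-to-point distances between the clusters.
        pairs = [dist[a][b] for a in cluster_a for b in cluster_b]
        return sum(pairs) / len(pairs)

Because the three rules reduce the same set of point-to-point distances in different ways, the same data can produce the three different dendrograms shown above.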

3. The second clustering step for Complete linkage assigned Liz as the fourth point from the left. What is the next clustering level? The distance from Pat to either Flo or Bob is 2.256, whereas the distance between Pat and Zeke is 2.250. Because 2.250 is the smaller of the two, clusters P and Z are consolidated to create C3.

4. Under the Single linkage method, the third step (the third horizontal bar moving up from the bottom of the dendrogram) entails adding another point to S2. Which point is it?

    
    Pick: ROW 4 LIZ

5. The progression for Average clustering is less intuitive. In this case, the third clustering level after A2 was formed entailed pairing Liz with an element not in A2. Which one? The sketch after this step shows how each linkage rule leads to its own merge order.

    
    Pick: ROW 6 ZEKE
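
The merge orders asked about in steps 3 through 5 all come from the same loop: start with singleton clusters and repeatedly merge whichever pair of clusters is closest under the chosen linkage rule. The Python sketch below is not part of the tutorial; the helper names are illustrative, and the small distance table at the end is hypothetical except for the two Pat distances quoted in step 3.

    from itertools import combinations

    def complete(dist, ca, cb):
        # Complete linkage: largest point-to-point distance between the two clusters.
        return max(dist[x][y] for x in ca for y in cb)

    def agglomerate(points, dist, linkage):
        # Every row starts as a singleton cluster; at each step the closest
        # pair of clusters under the chosen linkage rule is merged.
        clusters = [frozenset([p]) for p in points]
        while len(clusters) > 1:
            a, b = min(combinations(clusters, 2),
                       key=lambda pair: linkage(dist, pair[0], pair[1]))
            print("merge %s + %s at %.3f" % (sorted(a), sorted(b), linkage(dist, a, b)))
            clusters = [c for c in clusters if c not in (a, b)] + [a | b]

    # Three-point demonstration: the Pat distances come from step 3 above;
    # the Flo-Zeke value is a made-up placeholder.
    d = {"Pat":  {"Zeke": 2.250, "Flo": 2.256},
         "Zeke": {"Pat": 2.250, "Flo": 2.400},
         "Flo":  {"Pat": 2.256, "Zeke": 2.400}}
    agglomerate(["Pat", "Zeke", "Flo"], d, complete)   # merges Pat and Zeke first, at 2.250

Substituting a single-linkage or average-linkage function for complete in the call above is enough to change the merge order on larger data sets, which is exactly the divergence the three dendrograms illustrate.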


            


[1] This discussion is cast in terms of distance for ease of exposition. In fact, the hierarchical clustering algorithm uses similarity to determine which agglomeration to perform next. Similarity is inversely related to distance where distance is well-defined, but, unlike distance, it can be used where the triangle inequality may not apply, e.g., for fingerprint descriptors.
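
As a concrete case of this note's point, one common similarity measure for fingerprint descriptors is the Tanimoto coefficient, which can be computed directly from two bit sets. The sketch below is illustrative only and is not drawn from the tutorial; the fingerprints are hypothetical sets of ON-bit positions.

    def tanimoto(fp_a, fp_b):
        # Tanimoto coefficient: bits ON in both fingerprints divided by
        # bits ON in either fingerprint.
        common = len(fp_a & fp_b)
        return common / float(len(fp_a) + len(fp_b) - common)

    # Hypothetical fingerprints, each given as the set of its ON-bit positions.
    print(tanimoto({1, 4, 7, 9}, {1, 4, 8}))   # 2 shared of 5 distinct bits -> 0.4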

Copyright © 1999, Tripos Inc. All rights reserved.