
4.5 QSAR Techniques
In this section, we have tried to begin each topic with qualitative information about the nature of a technique. The last section of each topic is a formal algorithmic description of the most important aspects of the calculations being performed.
4.5.1 Bootstrapping
In many SYBYL/QSAR analyses, confidence limits (expressed as a mean and standard deviation) for the parameters being estimated can be calculated by a modern validation method, the bootstrap. The name comes from the old saying about pulling yourself up by your own bootstraps. The idea is to simulate a statistical sampling procedure by treating the original data set as if it were the true population and generating many new data sets from it. These new data sets (called bootstrap samplings) are the same size as the original data set and are obtained by randomly choosing samples (rows) from the original data, with repeated selection of the same row allowed. The statistical calculation is performed on each bootstrap sampling, giving a new value for each parameter being estimated. The difference between the parameters calculated from the original data set and the average of the parameters calculated from the many bootstrap samplings is a measure of the bias of the original calculation. The variance of the parameter estimates across the bootstrap samplings reflects the precision with which each parameter can be estimated from the input data.
A diagram of this process is shown in Figure 21.
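The resampling procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the SYBYL implementation: the function names (fit_slope, bootstrap), the choice of a least-squares slope as the "statistical calculation", the sign convention for the bias, and the example rows are all assumptions made for the sake of the example.

```python
import random
import statistics


def fit_slope(rows):
    """Least-squares slope of y on x for a list of (x, y) rows.

    Hypothetical stand-in for "the statistical calculation". Raises
    ZeroDivisionError if every row in a sampling has the same x value.
    """
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    return sxy / sxx


def bootstrap(rows, estimate, n_samplings, seed=0):
    """Re-estimate a parameter on n_samplings bootstrap samplings of rows."""
    rng = random.Random(seed)
    original = estimate(rows)
    estimates = []
    for _ in range(n_samplings):
        # Same size as the original data; the same row may be drawn repeatedly.
        sampling = rng.choices(rows, k=len(rows))
        estimates.append(estimate(sampling))
    boot_mean = statistics.mean(estimates)
    return {
        "original": original,
        "bootstrap_mean": boot_mean,
        # Assumed sign convention: bootstrap average minus original estimate.
        "bias": boot_mean - original,
        # Spread of the estimates across the bootstrap samplings.
        "std_dev": statistics.stdev(estimates),
    }


# Made-up (x, y) rows, for illustration only.
example_rows = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8),
                (5.0, 10.1), (6.0, 12.2), (7.0, 13.8), (8.0, 16.1)]
print(bootstrap(example_rows, fit_slope, n_samplings=100))
```

The estimate argument is passed in as a function so that the same loop could wrap any analysis that maps a set of rows to a parameter value. Fixing the seed makes a run repeatable, which is convenient when comparing different numbers of samplings.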
Because bootstrapping requires the analysis to be repeated N times, where N is the number of bootstrap samplings, it is computationally intensive. It is performed only when the BOOTSTRAP variable is changed from 0 (no bootstrapping) to N, using either the TAILOR SET QSAR command or the QSAR ANALYSIS DO command.
You must decide for yourself how large N should be. Although values for N of 100 or more have been recommended, we have had good results with an N of 10.
Figure 21 The Bootstrapping Process


Copyright © 1999, Tripos Inc. All rights reserved.