QSAR Techniques


4.5 QSAR Techniques

In this section, we have tried to begin each topic with qualitative information about the nature of a technique. The last section of each topic is a formal algorithmic description of the most important aspects of the calculations being performed.

4.5.1 Bootstrapping

In many SYBYL/QSAR analyses, confidence limits for the parameters being estimated (expressed as a mean and standard deviation) can be calculated by a modern validation method, the bootstrap. The name comes from the old saying about pulling yourself up by your own bootstraps. The idea is to simulate a statistical sampling procedure by treating the original data set as the true population and generating many new data sets from it. These new data sets (called bootstrap samplings) are the same size as the original data set and are obtained by randomly choosing samples (rows) from the original data, with repeated selection of the same row allowed.

The statistical calculation is performed on each bootstrap sampling, giving new values for each of the parameters being estimated. The difference between the parameters calculated from the original data set and the average of the parameters calculated from the many bootstrap samplings is a measure of the bias of the original calculation. The variance of the parameter estimates across the bootstrap samplings reflects how precisely each parameter can be estimated from the input data.
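The resampling procedure described above can be illustrated with a minimal Python sketch. It is not SYBYL code and makes several assumptions that do not come from this manual: the "statistical calculation" is a simple least-squares slope, the function names (fit_slope, bootstrap), the n_samplings and seed parameters, and the data values are all hypothetical, and the bias is taken as the bootstrap mean minus the original estimate (a sign convention chosen here for illustration).

```python
import random
import statistics


def fit_slope(rows):
    """Least-squares slope of y on x for a list of (x, y) rows.

    Stands in for the statistical calculation; a real QSAR analysis
    would fit a far richer model.
    """
    mean_x = statistics.fmean(x for x, _ in rows)
    mean_y = statistics.fmean(y for _, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    return sxy / sxx  # raises ZeroDivisionError if all x values are identical


def bootstrap(rows, estimator, n_samplings=100, seed=0):
    """Estimate the bias and spread of `estimator` by resampling rows."""
    rng = random.Random(seed)
    original = estimator(rows)
    estimates = []
    while len(estimates) < n_samplings:
        # Same size as the original data; the same row may be chosen repeatedly.
        sampling = rng.choices(rows, k=len(rows))
        try:
            estimates.append(estimator(sampling))
        except ZeroDivisionError:
            continue  # degenerate sampling (every x identical); draw again
    bootstrap_mean = statistics.fmean(estimates)
    return {
        "original": original,
        "bootstrap_mean": bootstrap_mean,
        "bias": bootstrap_mean - original,  # sign convention is an assumption
        "std_dev": statistics.stdev(estimates),
    }


# Hypothetical data: eight (descriptor, activity) rows.
rows = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8),
        (5.0, 10.1), (6.0, 12.2), (7.0, 13.8), (8.0, 16.1)]
result = bootstrap(rows, fit_slope, n_samplings=10)
print(result)
```

Because the estimator is passed in as a function, the same loop would apply to any parameter that can be recomputed from a resampled set of rows. The fixed seed only makes the sketch reproducible.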

[Figure 21 labels: BOOTSTRAP; TAILOR SET QSAR; QSAR ANALYSIS DO]

You must decide for yourself how large N, the number of bootstrap samplings, should be. Although values of 100 or more have been recommended for N, we have had good results with an N of 10.

Figure 21 The Bootstrapping Process