
4.3.4.2 Predicting Properties
Prediction of the target property value for an untested molecule is straightforward. After construction and charge calculation, the model is aligned and saved into the database, and MSS: QSAR >>> Predict (QSAR ANALYSIS PREDICT) yields the desired value. A warning is issued if any field value for the new molecule falls outside the range of values used to derive the CoMFA model, that is, above the highest or below the lowest of them. (You can visualize such extreme values by retrieving the field with QSAR COMFA FIELD RETRIEVE RANGE_CHECK.)
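The range check described above can be pictured with a short sketch. This is not the SYBYL implementation; it assumes the field values are available as plain lists, one row per training molecule and one column per lattice point, and the function name and the numbers are hypothetical.

```python
def out_of_range_points(training_fields, new_field):
    """Return (index, value, low, high) for every lattice point at which the
    new molecule's field value lies outside the training-set range."""
    flagged = []
    for i, value in enumerate(new_field):
        # Collect the values seen at this lattice point across the training set.
        column = [row[i] for row in training_fields]
        low, high = min(column), max(column)
        if value < low or value > high:
            flagged.append((i, value, low, high))
    return flagged


# Hypothetical field values: three training molecules, four lattice points.
training = [
    [0.5, -1.0, 2.0, 0.0],
    [1.5, -0.5, 2.5, 0.2],
    [1.0, -2.0, 3.0, 0.1],
]
candidate = [1.2, -2.5, 2.8, 0.4]

for index, value, low, high in out_of_range_points(training, candidate):
    print(f"lattice point {index}: {value} is outside [{low}, {high}]")
```

Any flagged lattice point means the prediction relies on extrapolating beyond the field values the model was derived from.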
The overall predictive performance of a model is often reported as its predictive r² value, defined analogously to q² by comparing the accuracy of a series of predictions with the variation in all experimentally known target property data. If both experimental and predicted values are placed in different columns of the same table, the predictive r² value may be calculated automatically using MSS: QSAR >>> Other >>> Predictive r2 (QSAR ANALYSIS RSQUARED_PREDICT).
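As an illustration of the definition, the sketch below computes one minus the ratio of the sum of squared prediction errors to the sum of squared deviations of the experimental values from a reference mean. The function name and the data are hypothetical, and the choice of reference mean is an assumption: by default the mean of the supplied experimental values is used, and a different mean (for example, that of the molecules used to derive the model) can be passed explicitly. The SYBYL command may differ in detail.

```python
def predictive_r2(observed, predicted, reference_mean=None):
    """Predictive r-squared: 1 - (sum of squared prediction errors) /
    (sum of squared deviations of observed values from a reference mean)."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")
    if reference_mean is None:
        # Assumption: fall back to the mean of the supplied observed values.
        reference_mean = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    variation = sum((o - reference_mean) ** 2 for o in observed)
    if variation == 0:
        raise ValueError("observed values show no variation about the reference mean")
    return 1.0 - press / variation


# Hypothetical experimental and predicted target property values.
experimental = [5.0, 6.0, 7.0, 8.0]
predictions = [5.5, 5.5, 7.5, 7.5]
print(round(predictive_r2(experimental, predictions), 3))
```

A value near 1 indicates predictions that are accurate relative to the spread of the data; a value at or below 0 indicates predictions no better than quoting the reference mean.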
How much confidence should be placed in a prediction? In the cases we know of so far, the predictions are about as accurate as the crossvalidation tests, which would imply that the RMS error of a prediction is similar to the RMS error of crossvalidation, reported as the first line of output in each PLS run. Of course, this inference cannot possibly be true for all possible molecules and models. Like any other empirical model derived from relatively few observations, a CoMFA model must have a limited range of applicability. One would not expect the steroid binding models [Ref. 9] to yield accurate binding predictions for argon, for polyethylene, or even for some steroid-like molecules such as diethylstilbestrol or perhydrophenanthrene. The range of molecules used to derive these models, while diverse in terms of steroid structures, cannot encompass all of chemistry.
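Once experimental values become available for predicted molecules, the comparison suggested above can be made directly. The sketch below computes a plain RMS error of prediction; it applies no degrees-of-freedom correction, so it may not match the figure reported in the PLS output exactly, and the crossvalidation value shown is a hypothetical number typed in by hand.

```python
import math


def rms_error(observed, predicted):
    """Root-mean-square difference between observed and predicted values."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical values for four newly tested molecules.
measured = [5.0, 6.0, 7.0, 8.0]
forecast = [5.5, 5.5, 7.5, 7.5]
crossvalidated_rms = 0.45  # hypothetical, copied from a PLS run

prediction_rms = rms_error(measured, forecast)
print(f"prediction RMS error: {prediction_rms:.2f}")
print(f"crossvalidation RMS error: {crossvalidated_rms:.2f}")
```

A prediction RMS error much larger than the crossvalidation figure suggests that the new molecules lie outside the model's range of applicability.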
How do you change a molecular structure to obtain a better target property value with CoMFA? Although we have no first-hand experience, it appears to us that this should be a straightforward process, since trial structures are so easily modeled and their target property so easily predicted. Assuming that higher values of the target property are desired, in the steric graphs, steric bulk should be moved closer to the regions of positive coefficients and farther from the regions of negative coefficients, and in the electrostatic graphs, positive charge should be moved closer to the regions of positive coefficients and farther from the regions of negative coefficients. To help ensure that one has the right sign, MSS: QSAR >>> Graph QSAR (QSAR COMFA FIELD GRAPH) provides some explanatory information following the histogram that accompanies the USER_SPECIFIED selection mode. Another useful procedure is to examine graphs that clearly distinguish between molecules of high and low target properties, while showing the molecules being selected in a field graph (the CoMFA tutorial exemplifies this procedure). If more effective compounds cannot be identified by extrapolation of the CoMFA-identified trends, bolder exploratory changes may be considered, in the direction of spatial regions not yet greatly affected.
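The sign reasoning can be checked with a small calculation, assuming the model is expressed as a linear equation in the lattice field values, so that a change in the field at a lattice point alters the prediction by the coefficient times that change. The function name, coefficients, and field values below are hypothetical; in practice the prediction itself would come from QSAR ANALYSIS PREDICT.

```python
def predicted_change(coefficients, field_before, field_after):
    """Change in the predicted target property for a linear model when the
    field values move from field_before to field_after."""
    if not (len(coefficients) == len(field_before) == len(field_after)):
        raise ValueError("all three sequences must have the same length")
    return sum(
        c * (after - before)
        for c, before, after in zip(coefficients, field_before, field_after)
    )


# Hypothetical electrostatic coefficients at three lattice points.
coefficients = [0.02, -0.01, 0.0]
# Hypothetical field values before and after a trial modification that moves
# positive charge toward point 0 and away from point 1.
before = [1.0, 4.0, 2.0]
after = [3.0, 1.0, 2.0]

print(f"{predicted_change(coefficients, before, after):+.2f}")
```

Raising the field where the coefficient is positive and lowering it where the coefficient is negative both contribute positively, while a lattice point with a zero coefficient has no effect.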


Copyright © 1999, Tripos Inc. All rights reserved.