Applicability Domain
The Applicability Domain (AD) of a QSAR model is the physico-chemical, structural or biological space, knowledge or information on which the model's training set is based, and within which the model can be applied to make predictions for new compounds.
The purpose of the AD is to indicate whether the model's assumptions are met for a given compound. In general, they are met when the model interpolates within its training data rather than extrapolating beyond it. Although there is as yet no single generally accepted algorithm for determining the AD, a rather systematic approach for defining interpolation regions has been described[1]. It involves removing outliers and then applying a probability density distribution method based on kernel-weighted sampling.
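The outlier-removal and density steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure from the cited review: outliers are dropped with a simple per-descriptor z-score rule (the `z_max=3.0` cut-off is an assumption), the density is a plain Gaussian kernel density estimate with a user-chosen bandwidth standing in for kernel-weighted sampling, and a query compound is treated as inside the AD when its estimated density is at least the 5th percentile of the training compounds' densities. All function names are hypothetical.

```python
import math
from statistics import mean, stdev


def remove_outliers(X, z_max=3.0):
    """Drop training rows with any descriptor more than z_max sample SDs from its column mean.

    Assumption: a simple z-score rule; the cited approach may use a different outlier test.
    """
    cols = list(zip(*X))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    return [row for row in X
            if all(abs(x - m) <= z_max * s for x, m, s in zip(row, mus, sds))]


def gaussian_kde(X, bandwidth):
    """Return a function estimating the probability density of the rows of X.

    Uses an isotropic Gaussian kernel with a fixed, user-chosen bandwidth (an assumption).
    """
    n, d = len(X), len(X[0])
    norm = n * (2.0 * math.pi * bandwidth ** 2) ** (d / 2)

    def density(x):
        return sum(math.exp(-math.dist(x, row) ** 2 / (2.0 * bandwidth ** 2))
                   for row in X) / norm

    return density


def density_threshold(X, density, quantile=0.05):
    """Density value below which roughly `quantile` of the training compounds fall.

    Note: each training point contributes to its own density here (no leave-one-out).
    """
    scores = sorted(density(row) for row in X)
    return scores[int(quantile * (len(scores) - 1))]


if __name__ == "__main__":
    # Toy 2-descriptor training set: a small grid plus one gross outlier.
    train = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)] + [(100.0, 100.0)]
    clean = remove_outliers(train)            # the (100, 100) point is dropped
    density = gaussian_kde(clean, bandwidth=0.5)
    cut = density_threshold(clean, density)
    print(density((0.2, 0.2)) >= cut)         # True: inside the training cloud
    print(density((5.0, 5.0)) >= cut)         # False: far outside it
```

The bandwidth and the percentile cut-off jointly control how tightly the domain hugs the training data; both would need tuning for real descriptor sets.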
To investigate the AD of a training set of chemicals, one can analyse the properties of the training compounds' multivariate descriptor space directly, or indirectly through distance (or similarity) metrics. When using distance metrics, care should be taken to work in an orthogonal vector space spanned by significant descriptors. This can be achieved by feature selection followed by principal component analysis (PCA).
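A distance-based check along these lines might look like the sketch below. It assumes numeric descriptor vectors, autoscales them, projects them onto the leading principal components (computed here by power iteration on the covariance matrix to keep the example dependency-free), and scores a compound by its mean Euclidean distance to the k nearest training compounds. Feature selection is omitted; the cut-off (95th percentile of the training compounds' leave-one-out scores), `k=4`, and the class and function names are illustrative assumptions.

```python
import math
import random
from statistics import mean, pstdev


def fit_scaler(X):
    """Column means and population SDs (constant columns get SD 1 to avoid division by zero)."""
    cols = list(zip(*X))
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]


def scale(X, mus, sds):
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in X]


def principal_axes(Z, n_comp, iters=300, seed=0):
    """Leading eigenvectors of the covariance of Z via power iteration with deflation.

    Assumes Z is already centred (true after scaling with the training means).
    """
    n, d = len(Z), len(Z[0])
    C = [[sum(r[i] * r[j] for r in Z) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    axes = []
    for _ in range(n_comp):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        n0 = math.sqrt(sum(x * x for x in v))
        v = [x / n0 for x in v]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * C[i][j] * v[j] for i in range(d) for j in range(d))
        axes.append(v)
        # Deflate: remove the found component so the next one is orthogonal to it.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return axes


def project(Z, axes):
    return [[sum(z * a for z, a in zip(row, ax)) for ax in axes] for row in Z]


def knn_score(x, ref, k, exclude=None):
    """Mean Euclidean distance from x to its k nearest points in ref, optionally skipping one index."""
    dists = sorted(math.dist(x, r) for i, r in enumerate(ref) if i != exclude)
    return sum(dists[:k]) / k


class DistanceAD:
    """Hypothetical k-NN applicability domain in principal-component space."""

    def __init__(self, X, n_comp, k=4, quantile=0.95):
        self.mus, self.sds = fit_scaler(X)
        Z = scale(X, self.mus, self.sds)
        self.axes = principal_axes(Z, n_comp)
        self.T = project(Z, self.axes)
        self.k = k
        # Leave-one-out scores of the training compounds define the cut-off.
        loo = sorted(knn_score(t, self.T, k, exclude=i) for i, t in enumerate(self.T))
        self.threshold = loo[int(quantile * (len(loo) - 1))]

    def in_domain(self, x):
        t = project(scale([x], self.mus, self.sds), self.axes)[0]
        return knn_score(t, self.T, self.k) <= self.threshold


if __name__ == "__main__":
    grid = [[float(i), float(j)] for i in range(5) for j in range(5)]
    ad = DistanceAD(grid, n_comp=2)
    print(ad.in_domain([2.5, 2.5]))    # True: between training compounds
    print(ad.in_domain([20.0, 20.0]))  # False: far from all of them
```

Projecting onto orthonormal principal axes keeps correlated descriptors from being counted twice in the distance; in practice a library PCA routine would replace the hand-written power iteration.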
Notes
[1] Jaworska J, Nikolova-Jeliazkova N, Aldenberg T: QSAR applicability domain estimation by projection of the training set descriptor space: a review. Altern Lab Anim 2005, 33(5):445-459.