== Evaluation of the quality of QSAR models ==

QSAR modeling produces predictive [[statistical model|model]]s by applying statistical tools that correlate the [[biological activity]] (including desirable therapeutic effects and undesirable side effects) or, in QSPR models, the physico-chemical properties of chemicals (drugs, toxicants, environmental pollutants) with descriptors representative of [[molecular geometry|molecular structure]] or [[molecular property|properties]]. QSARs are applied in many disciplines, for example [[risk assessment]], toxicity prediction and regulatory decisions,<ref name="Tong_2005">{{cite journal | vauthors = Tong W, Hong H, Xie Q, Shi L, Fang H, Perkins R | title = Assessing QSAR Limitations – A Regulatory Perspective | journal = Current Computer-Aided Drug Design | volume = 1 | issue = 2 | pages = 195–205 |date=April 2005 | doi = 10.2174/1573409053585663 | url = https://zenodo.org/record/1235864 }}</ref> as well as [[drug discovery]] and [[drug development|lead optimization]].<ref name="pmid13677480">{{cite journal | vauthors = Dearden JC | title = In silico prediction of drug toxicity | journal = Journal of Computer-Aided Molecular Design | volume = 17 | issue = 2–4 | pages = 119–27 | year = 2003 | pmid = 13677480 | doi = 10.1023/A:1025361621494 | bibcode = 2003JCAMD..17..119D | s2cid = 21518449 }}</ref> Obtaining a good-quality QSAR model depends on many factors, such as the quality of the input data, the choice of descriptors, and the statistical methods used for modeling and validation. Any QSAR modeling should ultimately lead to statistically robust and predictive models capable of making accurate and reliable predictions of the modeled response for new compounds.
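
As a minimal illustration of the statistical step, the Python sketch below fits a multiple linear regression (only one of many methods used for QSAR) relating a descriptor matrix to an activity value. It assumes NumPy is available; the function names `fit_mlr_qsar` and `predict` are hypothetical, and the demonstration uses synthetic numbers, not real chemical data.

```python
import numpy as np


def fit_mlr_qsar(X, y):
    """Fit activity ~ b0 + b1*d1 + ... + bp*dp by ordinary least squares.

    X: (n_compounds, n_descriptors) descriptor matrix.
    y: (n_compounds,) measured response, e.g. a log-scale potency.
    Returns (coefficients [b0, b1, ..., bp], training r^2).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that coef[0] is the intercept.
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    return coef, r2


def predict(coef, X):
    """Predict the response of new compounds from their descriptors."""
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]


# Synthetic example: the response is an exact linear function of two descriptors.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 2))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 0.5 * X_demo[:, 1]
coef, r2 = fit_mlr_qsar(X_demo, y_demo)
print(np.round(coef, 6), round(float(r2), 6))
```

On this synthetic data the fit recovers the generating coefficients with a training ''r''<sup>2</sup> of 1. With real descriptors, a high training ''r''<sup>2</sup> alone says nothing about predictivity for new compounds, which is what validation has to establish.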

Various strategies are usually adopted to validate QSAR models:<ref name="isbn3-527-30044-9">{{cite book | vauthors =  Wold S, Eriksson L | editor = Waterbeemd, Han van de | title = Chemometric methods in molecular design | publisher = VCH | location = Weinheim | year = 1995 | pages = 309–318 | chapter = Statistical validation of QSAR results | isbn = 978-3-527-30044-0 }}</ref>
# internal validation or [[iterative cross-validation (statistics)|cross-validation]]; strictly speaking, when data are withheld, cross-validation measures model robustness: the more robust a model is (the higher its ''q''<sup>2</sup>), the less withholding data perturbs the original model;
# external validation, by splitting the available data set into a training set for model development and a prediction set for checking model predictivity;
# blind external validation, by applying the model to new external data; and
# data randomization, or Y-scrambling, to verify the absence of chance correlation between the response and the modeling descriptors.
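
The first and last strategies in the list can be sketched together. The Python code below assumes NumPy and an ordinary least-squares model with an intercept; the helper names are hypothetical. It computes the leave-one-out ''q''<sup>2</sup> = 1 − PRESS/SS and then repeats the calculation after randomly permuting the response while keeping the descriptors fixed (a simplified Y-scrambling that does not repeat any descriptor selection). If scrambled ''q''<sup>2</sup> values approach the unscrambled one, the original correlation may be due to chance.

```python
import numpy as np


def _ols_predict(X_train, y_train, X_new):
    """Fit OLS with an intercept on the training rows and predict X_new."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return coef[0] + X_new @ coef[1:]


def q2_loo(X, y):
    """Leave-one-out q^2 = 1 - PRESS / sum((y - mean(y))^2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i  # leave compound i out
        y_hat = _ols_predict(X[keep], y[keep], X[i:i + 1])[0]
        press += (y[i] - y_hat) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def y_scrambling(X, y, n_rounds=100, seed=0):
    """q^2 of models rebuilt after randomly permuting the response vector."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    return np.array([q2_loo(X, rng.permutation(y)) for _ in range(n_rounds)])


# Synthetic data with a genuine (noisy) linear structure.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(30, 3))
y_demo = 0.5 + X_demo @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=30)
q2 = q2_loo(X_demo, y_demo)
q2_scrambled = y_scrambling(X_demo, y_demo, n_rounds=20)
print(f"q2 = {q2:.3f}, max scrambled q2 = {q2_scrambled.max():.3f}")
```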

The success of any QSAR model depends on the accuracy of the input data, the selection of appropriate descriptors and statistical tools and, most importantly, the validation of the developed model. Validation is the process by which the reliability and relevance of a procedure are established for a specific purpose; for QSAR models, validation should mainly address robustness, predictive performance and the [[applicability domain]] (AD) of the models.<ref name = "Tropsha_2003"/><ref name = "Gramatica_2007"/><ref name="Chirico_Gramatica_2012"/><ref name=Roy2007>{{cite journal | vauthors = Roy K | title = On some aspects of validation of predictive quantitative structure-activity relationship models | journal = Expert Opinion on Drug Discovery | volume = 2 | issue = 12 | pages = 1567–77 | date = Dec 2007 | pmid =  23488901| doi = 10.1517/17460441.2.12.1567 | s2cid = 21305783 }}</ref><ref name = "Sahigara_2012">{{cite journal |last1=Sahigara |first1=Faizan |last2=Mansouri |first2=Kamel |last3=Ballabio |first3=Davide |last4=Mauri |first4=Andrea |last5=Consonni |first5=Viviana |last6=Todeschini |first6=Roberto |title=Comparison of Different Approaches to Define the Applicability Domain of QSAR Models |journal=Molecules |date=2012 |volume=17 |issue=5 |pages=4791–4810 |doi=10.3390/molecules17054791|pmid=22534664 |pmc=6268288 |doi-access=free }}</ref>
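
One common way to define the applicability domain, among the several approaches compared in the literature cited above, is the leverage approach: a compound whose leverage with respect to the training descriptor matrix exceeds a warning value, often taken as ''h''* = 3''p''′/''n'' (''p''′ model parameters including the intercept, ''n'' training compounds), is treated as an extrapolation. The Python sketch below assumes NumPy, a linear model with an intercept and this particular threshold; the function names are hypothetical, and range-, distance- or density-based AD definitions may classify compounds differently.

```python
import numpy as np


def leverages(X_train, X_query):
    """Leverage h = q^T (A^T A)^+ q of each query row q, where A is the
    training descriptor matrix with an intercept column prepended."""
    A = np.column_stack([np.ones(len(X_train)), np.asarray(X_train, dtype=float)])
    Q = np.column_stack([np.ones(len(X_query)), np.asarray(X_query, dtype=float)])
    core = np.linalg.pinv(A.T @ A)
    # Row-wise quadratic form q_i^T core q_i.
    return np.einsum("ij,jk,ik->i", Q, core, Q)


def in_leverage_domain(X_train, X_query):
    """Return (flags, h_star): flags are True where leverage <= h* = 3 p'/n."""
    n, n_descriptors = np.asarray(X_train).shape
    h_star = 3.0 * (n_descriptors + 1) / n  # p' counts the intercept
    return leverages(X_train, X_query) <= h_star, h_star


rng = np.random.default_rng(2)
X_train = rng.normal(size=(50, 2))
X_new = np.array([[0.0, 0.0], [10.0, 10.0]])  # a typical and an extreme compound
flags, h_star = in_leverage_domain(X_train, X_new)
print(flags, h_star)
```

The training leverages always sum to ''p''′, so the threshold is a fixed multiple of their average value ''p''′/''n''.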

Some validation methodologies can be problematic. For example, ''leave-one-out'' cross-validation generally overestimates predictive capacity. Even with external validation, it is difficult to determine whether the selection of training and test sets was manipulated to maximize the predictive capacity of the model being published.
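
A frequently suggested complement to leave-one-out is leave-many-out cross-validation, in which a larger group of compounds is withheld in each round. The Python sketch below is illustrative only: it assumes NumPy, an ordinary least-squares model, random hold-out groups of size `n_out` (a hypothetical parameter name), and pools the prediction errors over all rounds. Whether it yields a lower estimate than leave-one-out depends on the data set.

```python
import numpy as np


def q2_lmo(X, y, n_out, n_repeats=200, seed=0):
    """Repeated leave-many-out q^2 with random hold-out groups of size n_out.

    PRESS and the reference sum of squares (around each round's training mean)
    are pooled over all held-out predictions before forming 1 - PRESS/SS.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    if not 0 < n_out < n:
        raise ValueError("n_out must be between 1 and n - 1")
    rng = np.random.default_rng(seed)
    press = ss = 0.0
    for _ in range(n_repeats):
        test = rng.choice(n, size=n_out, replace=False)
        train = np.setdiff1d(np.arange(n), test)
        A = np.column_stack([np.ones(len(train)), X[train]])
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        y_hat = coef[0] + X[test] @ coef[1:]
        press += np.sum((y[test] - y_hat) ** 2)
        ss += np.sum((y[test] - y[train].mean()) ** 2)
    return 1.0 - press / ss


rng = np.random.default_rng(3)
X_demo = rng.normal(size=(40, 2))
y_demo = X_demo @ np.array([1.5, -1.0]) + rng.normal(scale=0.1, size=40)
print(q2_lmo(X_demo, y_demo, n_out=10))  # 25 % of compounds withheld per round
```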

Aspects of QSAR model validation that need attention include the method used to select training set compounds,<ref>{{cite journal | vauthors = Leonard JT, Roy K | title = On selection of training and test sets for the development of predictive QSAR models | journal = QSAR & Combinatorial Science | volume = 25 | issue = 3 | pages =  235–251| year = 2006 | doi = 10.1002/qsar.200510161 }}</ref> the training set size<ref>{{cite journal | vauthors = Roy PP, Leonard JT, Roy K | title = Exploring the impact of size of training sets for the development of predictive QSAR models | journal = Chemometrics and Intelligent Laboratory Systems | volume = 90 | issue = 1 | pages = 31–42 | year = 2008 | doi = 10.1016/j.chemolab.2007.07.004 }}</ref> and the impact of variable selection<ref name="pmid17933600">{{cite journal | vauthors = Put R, Vander Heyden Y | title = Review on modelling aspects in reversed-phase liquid chromatographic quantitative structure-retention relationships | journal = Analytica Chimica Acta | volume = 602 | issue = 2 | pages = 164–72 | date = Oct 2007 | pmid = 17933600 | doi = 10.1016/j.aca.2007.09.014 }}</ref> on the predictive quality of models built from the training set. The development of novel validation parameters for judging the quality of QSAR models is also important.<ref name="Chirico_Gramatica_2012"/><ref name="Roy_2009">{{cite journal | vauthors = Pratim Roy P, Paul S, Mitra I, Roy K | title = On two novel parameters for validation of predictive QSAR models | journal = Molecules | volume = 14 | issue = 5 | pages = 1660–701 | year = 2009 | pmid = 19471190 | pmc = 6254296 | doi = 10.3390/molecules14051660 | doi-access = free }}</ref><ref name="pmid21800825">{{cite journal | vauthors = Chirico N, Gramatica P | title = Real external predictivity of QSAR models: how to evaluate it? Comparison of different validation criteria and proposal of using the concordance correlation coefficient | journal = Journal of Chemical Information and Modeling | volume = 51 | issue = 9 | pages = 2320–35 | date = Sep 2011 | pmid = 21800825 | doi = 10.1021/ci200211n }}</ref>
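
Several of the external validation parameters discussed in this literature can be computed from observed and predicted responses alone. The Python sketch below assumes NumPy and uses the commonly cited definitions of ''Q''<sup>2</sup><sub>F1</sub> (reference: training-set mean), ''Q''<sup>2</sup><sub>F2</sub> (reference: external-set mean), ''Q''<sup>2</sup><sub>F3</sub> (mean squared prediction error relative to the training-set variance) and Lin's concordance correlation coefficient (CCC). The function name and dictionary keys are hypothetical, and no published acceptance thresholds are encoded.

```python
import numpy as np


def external_metrics(y_train, y_ext, y_ext_pred):
    """External-set validation parameters from observed and predicted responses."""
    y_train = np.asarray(y_train, dtype=float)
    y_obs = np.asarray(y_ext, dtype=float)
    y_pred = np.asarray(y_ext_pred, dtype=float)
    press = np.sum((y_obs - y_pred) ** 2)
    # Q2_F1: reference is the training-set mean.
    q2_f1 = 1.0 - press / np.sum((y_obs - y_train.mean()) ** 2)
    # Q2_F2: reference is the external-set mean.
    q2_f2 = 1.0 - press / np.sum((y_obs - y_obs.mean()) ** 2)
    # Q2_F3: external mean squared error over training-set variance (1/n form).
    q2_f3 = 1.0 - (press / len(y_obs)) / np.mean((y_train - y_train.mean()) ** 2)
    # Lin's concordance correlation coefficient (population moments, ddof=0).
    mo, mp = y_obs.mean(), y_pred.mean()
    s_op = np.mean((y_obs - mo) * (y_pred - mp))
    ccc = 2.0 * s_op / (y_obs.var() + y_pred.var() + (mo - mp) ** 2)
    return {"Q2_F1": q2_f1, "Q2_F2": q2_f2, "Q2_F3": q2_f3, "CCC": ccc}


# Predictions offset by a constant +1: perfectly correlated, but not concordant.
m = external_metrics([1, 2, 3, 4, 5], [1.5, 2.5, 3.5], [2.5, 3.5, 4.5])
print(m)
```

The example shows why a concordance measure differs from a plain correlation coefficient: the offset predictions have a Pearson correlation of 1 with the observations, yet their CCC is only 4/7.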