== Application ==

=== Chemical ===

One of the earliest QSAR applications was the prediction of [[boiling point]]s.<ref name="isbn0-85626-454-7">{{cite book | vauthors = Rouvray DH, Bonchev D | title = Chemical graph theory: introduction and fundamentals | publisher = Abacus Press | location = Tunbridge Wells, Kent, England | year = 1991 | isbn = 978-0-85626-454-2 }}</ref>

Within a particular [[chemical classification|family]] of [[chemical compound]]s, especially in [[organic chemistry]], there are often strong [[correlation]]s between structure and observed properties. A simple example is the relationship between the number of carbon atoms in [[alkanes]] and their [[boiling point]]s: the boiling point rises steadily as the number of carbons increases, and this trend can be used to predict the boiling points of [[higher alkanes]].
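As a minimal sketch of the alkane trend, the Python snippet below fits a straight line to approximate, rounded literature boiling points of the n-alkanes from pentane to decane and then extrapolates to undecane. The rounded values, the restriction to C5 to C10, and the choice of a purely linear model are all assumptions made for illustration. Because the per-carbon increment shrinks as the chain lengthens, a straight line is only a rough approximation and is likely to overshoot when extrapolated.

```python
# Approximate normal boiling points (deg C) of the n-alkanes C5 to C10.
# Rounded literature values, used here only for illustration.
ALKANE_BP = {5: 36.1, 6: 68.7, 7: 98.4, 8: 125.7, 9: 150.8, 10: 174.1}


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


carbons = list(ALKANE_BP)
bps = [ALKANE_BP[n] for n in carbons]
slope, intercept = fit_line(carbons, bps)

# Extrapolate to undecane (11 carbons), outside the fitted range.
predicted_c11 = slope * 11 + intercept
print(f"slope = {slope:.1f} deg C per carbon; predicted C11 bp = {predicted_c11:.0f} deg C")
```

The single descriptor here (carbon count) is deliberately crude; it only shows how a structural quantity can be regressed against a measured property within one chemical family.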

Other chemical applications include the [[Hammett equation]], the [[Taft equation]], and [[Acid dissociation constant|pKa prediction]] methods.<ref name="RMC_2013">{{cite encyclopedia | last = Fraczkiewicz | first = R | encyclopedia = Reference Module in Chemistry, Molecular Sciences and Chemical Engineering [Online] | editor-last = Reedijk | editor-first = J | volume = 5 | publisher = Elsevier | location = Amsterdam, the Netherlands | year = 2013 | doi = 10.1016/B978-0-12-409547-2.02610-X | title = Reference Module in Chemistry, Molecular Sciences and Chemical Engineering | isbn = 9780124095472 | chapter = In Silico Prediction of Ionization }}</ref>
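The Hammett equation relates an equilibrium constant for a substituted benzene derivative to that of the unsubstituted parent: log10(''K''/''K''<sub>0</sub>) = ''ρσ''. For an acid dissociation this can be rewritten as pKa = pKa<sub>0</sub> − ''ρσ''. The sketch below assumes the reference reaction (ionization of benzoic acid in water at 25 °C, where ''ρ'' = 1 by definition), a parent pKa of about 4.20, and a few commonly tabulated, approximate ''para'' substituent constants; it is an illustration of the linear free-energy relationship, not a validated pKa predictor.

```python
# Hammett relation for an acid dissociation: pKa = pKa0 - rho * sigma.
# Assumed reference: benzoic acid in water at 25 deg C (rho = 1.0, pKa0 ~ 4.20).
PKA_BENZOIC = 4.20
RHO_REFERENCE = 1.0

# Commonly tabulated para-substituent constants (approximate values).
SIGMA_PARA = {"H": 0.00, "CH3": -0.17, "OCH3": -0.27, "Cl": 0.23, "NO2": 0.78}


def hammett_pka(pka_parent, rho, sigma):
    """Predict the pKa of a substituted acid from the Hammett equation."""
    return pka_parent - rho * sigma


for substituent, sigma in SIGMA_PARA.items():
    pka = hammett_pka(PKA_BENZOIC, RHO_REFERENCE, sigma)
    # Electron-withdrawing groups (sigma > 0) lower the pKa, i.e. strengthen the acid.
    print(f"para-{substituent}: predicted pKa = {pka:.2f}")
```

The Taft equation extends the same idea by separating polar and steric substituent effects, which this sketch does not attempt.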

=== Biological ===

The biological activity of molecules is usually measured in [[assay]]s that establish the level of inhibition of particular [[signal transduction]] or [[metabolic pathway]]s. [[Drug discovery]] often uses QSAR to identify chemical structures likely to have strong inhibitory effects on specific [[biological target|targets]] and low [[toxicity]] (non-specific activity). Of special interest is the prediction of the [[partition coefficient]] log ''P'', an important measure for identifying "[[druglikeness]]" according to [[Lipinski's Rule of Five]].{{cn|date=March 2024}}
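A minimal sketch of how a predicted log ''P'' feeds into a druglikeness filter is shown below. It encodes Lipinski's rule of five as it is usually stated (molecular weight of at most 500 Da, log ''P'' of at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors, with compounds breaking more than one criterion flagged as likely to be poorly absorbed). The `Descriptors` class and both example compounds are hypothetical; in practice the descriptor values would come from a cheminformatics toolkit or a log ''P'' prediction model.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Hypothetical container for precomputed molecular descriptors."""
    mol_weight: float  # daltons
    log_p: float       # log10 of the octanol-water partition coefficient
    h_donors: int
    h_acceptors: int


def rule_of_five_violations(d):
    """Count how many of Lipinski's four criteria a compound breaks."""
    checks = [
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ]
    return sum(checks)


def is_druglike(d):
    # The heuristic tolerates at most one violation.
    return rule_of_five_violations(d) <= 1


# Hypothetical compounds for illustration only.
small_polar = Descriptors(mol_weight=180.2, log_p=1.2, h_donors=1, h_acceptors=4)
large_greasy = Descriptors(mol_weight=720.0, log_p=6.8, h_donors=2, h_acceptors=12)
print(is_druglike(small_polar), is_druglike(large_greasy))  # True False
```

The rule is a screening heuristic, so a filter like this is typically used to prioritize candidates rather than to reject them outright.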

While many QSAR analyses involve the interactions of a family of molecules with an [[enzyme]] or [[receptor (biochemistry)|receptor]] binding site, QSAR can also be used to study the interactions between the [[structural domain]]s of proteins. Protein–protein interactions can be analyzed quantitatively for structural variations resulting from [[site-directed mutagenesis]].<ref name="pmid12668435">{{cite journal | vauthors = Freyhult EK, Andersson K, Gustafsson MG | title = Structural modeling extends QSAR analysis of antibody-lysozyme interactions to 3D-QSAR | journal = Biophysical Journal | volume = 84 | issue = 4 | pages = 2264–72 | date = Apr 2003 | pmid = 12668435 | pmc = 1302793 | doi = 10.1016/S0006-3495(03)75032-2 | bibcode = 2003BpJ....84.2264F }}</ref>

Machine learning methods aim in part to reduce the risk of a SAR paradox (structurally similar molecules that nonetheless have dissimilar activities), especially since only a finite amount of data is available (see also [[Minimum-variance unbiased estimator|MVUE]]). In general, every QSAR problem can be divided into [[Coding (social sciences)|coding]]<ref name="isbn3-527-29913-0">{{cite book | vauthors = Timmerman H, Todeschini R, Consonni V, Mannhold R, Kubinyi H | title = Handbook of Molecular Descriptors | publisher = Wiley-VCH | location = Weinheim | year = 2002 | isbn = 978-3-527-29913-3 }}</ref>
and [[learning]].<ref name="isbn0-471-05669-3">{{cite book |vauthors=Duda RO, Hart PE, Stork DG | title = Pattern classification | publisher = John Wiley & Sons | location = Chichester | year = 2001 | isbn = 978-0-471-05669-0 }}</ref>
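The split between coding and learning can be illustrated with a deliberately small sketch. The coding step below is a hypothetical toy encoder that counts carbon, nitrogen and oxygen symbols in a SMILES string (it is not a real molecular descriptor and ignores most of SMILES syntax), and the learning step is k-nearest-neighbour regression, chosen here only as one simple learner among many. The molecules are simple alcohols and their activity values are invented placeholders.

```python
import math


def encode(smiles):
    """Toy coding step: count C, N and O atom symbols in a SMILES string.
    Lowercase aromatic symbols are included; the two-letter halogens Cl and
    Br are skipped so their letters are not miscounted. Real QSAR work would
    use curated molecular descriptors instead."""
    counts = {"C": 0, "N": 0, "O": 0}
    i = 0
    while i < len(smiles):
        if smiles[i:i + 2] in ("Cl", "Br"):
            i += 2
            continue
        symbol = smiles[i].upper()
        if symbol in counts:
            counts[symbol] += 1
        i += 1
    return [counts["C"], counts["N"], counts["O"]]


def knn_predict(train_x, train_y, query, k=2):
    """Learning step: average the activities of the k training molecules
    closest to the query in descriptor space (Euclidean distance)."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical training set: SMILES of simple alcohols with invented activities.
train_smiles = ["CCO", "CCCO", "CCCCO", "CCCCCO"]
train_activity = [1.0, 1.5, 2.0, 2.5]
train_x = [encode(s) for s in train_smiles]

query = encode("CCCCCCO")  # hexan-1-ol, not in the training set
print(knn_predict(train_x, train_activity, query))  # 2.25
```

Keeping the two steps separate means either one can be swapped (for example, a richer descriptor set or a different learning algorithm) without changing the other.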

=== Other applications ===

(Q)SAR models have been used for [[risk management]], and regulatory authorities suggest their use. In the [[European Union]], QSARs are suggested by the [[Registration, Evaluation, Authorisation and Restriction of Chemicals|REACH]] (Registration, Evaluation, Authorisation and Restriction of Chemicals) regulation. Regulatory applications of QSAR methods include ''in silico'' toxicological assessment of genotoxic impurities.<ref>{{Cite journal|last1=Fioravanzo|first1=E.|last2=Bassan|first2=A.|last3=Pavan|first3=M.|last4=Mostrag-Szlichtyng|first4=A.|last5=Worth|first5=A. P.|date=2012-04-01|title=Role of in silico genotoxicity tools in the regulatory assessment of pharmaceutical impurities|journal=SAR and QSAR in Environmental Research|volume=23|issue=3–4|pages=257–277|doi=10.1080/1062936X.2012.657236|issn=1062-936X|pmid=22369620|s2cid=2714861}}</ref> Commonly used QSAR assessment software, such as DEREK and CASE Ultra (MultiCASE), is used to assess the genotoxicity of impurities under the ICH M7 guideline.<ref>{{cite web | title = ICH M7 Assessment and control of DNA reactive (mutagenic) impurities in pharmaceuticals to limit potential carcinogenic risk - Scientific guideline | url = https://www.ema.europa.eu/en/ich-m7-assessment-control-dna-reactive-mutagenic-impurities-pharmaceuticals-limit-potential | publisher = European Medicines Agency }}</ref>

The region of chemical descriptor space enclosed by the [[convex hull]] of a particular training set of chemicals is called the training set's [[applicability domain]]. Predictions for novel chemicals that lie outside the applicability domain are [[extrapolation]]s and are therefore, on average, less reliable than predictions made within it. How best to assess the reliability of QSAR predictions remains an open research topic.{{cn|date=March 2024}}
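For the special case of two descriptors, the convex-hull definition of the applicability domain can be checked directly, as in the sketch below. It assumes a hypothetical training set of 2D descriptor vectors (at least three points, not all collinear), builds the hull with Andrew's monotone chain algorithm, and tests whether a query lies inside or on the hull. With more descriptors, a convex-hull test usually requires linear programming or a geometry library, and many practitioners use simpler proxies such as descriptor ranges or distance to the nearest training compound.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def in_applicability_domain(hull, query):
    """True if query is inside or on a counter-clockwise hull (3+ vertices)."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], query) >= 0 for i in range(n))


# Hypothetical training set described by two descriptors each.
training = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2)]
hull = convex_hull(training)

print(in_applicability_domain(hull, (2, 2)))  # True: interpolation
print(in_applicability_domain(hull, (5, 1)))  # False: extrapolation
```

Points exactly on the hull boundary are treated as inside; whether they should count as within the domain is a modelling choice.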

QSAR equations can be used to predict the biological activities of new molecules before they are synthesized.
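A minimal sketch of this workflow, assuming a Hansch-type equation of the form activity = ''c''<sub>0</sub> + ''c''<sub>1</sub> log ''P'' + ''c''<sub>2</sub> (log ''P'')<sup>2</sup> + ''c''<sub>3</sub>''σ'', is shown below. The model is fitted by multiple linear regression using the normal equations, then used to score a candidate that has not been made. The training activities are not measured data: they are generated from assumed coefficients so that the example checks itself, and the candidate's descriptor values are hypothetical.

```python
def features(log_p, sigma):
    # Hansch-type descriptor vector: intercept, log P, (log P)^2, sigma.
    return [1.0, log_p, log_p ** 2, sigma]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_least_squares(rows, ys):
    """Multiple linear regression via the normal equations (X^T X) b = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical training set: activities generated from assumed coefficients.
ASSUMED = [2.0, 1.0, -0.2, 0.5]
training = [(0.5, -0.17), (1.0, 0.0), (1.5, 0.23), (2.0, 0.78),
            (2.5, -0.27), (3.0, 0.23), (3.5, 0.0)]
rows = [features(lp, s) for lp, s in training]
activities = [sum(c * f for c, f in zip(ASSUMED, row)) for row in rows]

coeffs = fit_least_squares(rows, activities)

# Score a hypothetical candidate before anyone synthesizes it.
candidate = features(2.2, 0.5)
predicted = sum(c * f for c, f in zip(coeffs, candidate))
print([round(c, 3) for c in coeffs], round(predicted, 3))
```

The squared log ''P'' term lets the model capture an optimum lipophilicity; with real assay data the fit would not be exact, and its predictive value would need to be checked by validation rather than by recovering known coefficients.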

Examples of machine learning tools for QSAR modeling include:<ref name="pmid25448759">{{cite journal | vauthors = Lavecchia A | title = Machine-learning approaches in drug discovery: methods and applications | journal = Drug Discovery Today | volume = 20 | issue = 3 | pages = 318–31 | date = Mar 2015 | pmid = 25448759 | doi = 10.1016/j.drudis.2014.10.012 }}</ref>

{| class="wikitable"
|-
! S.No. !! Name !! Algorithms !! External link
|-
| 1. || R || RF, SVM, Naïve Bayesian, and ANN || {{cite web | url = http://www.r-project.org/ | title = R: The R Project for Statistical Computing }}
|-
| 2. || libSVM || SVM || {{cite web | url = https://www.csie.ntu.edu.tw/~cjlin/libsvm/ | title = LIBSVM -- A Library for Support Vector Machines }}
|-
| 3. || Orange || RF, SVM, and Naïve Bayesian || {{cite web | url = http://www.ailab.si/orange/ | title = Orange Data Mining }}
|-
| 4. || RapidMiner || SVM, RF, Naïve Bayes, DT, ANN, and k-NN || {{cite web | url = http://rapid-i.com/ | title = RapidMiner &#124; #1 Open Source Predictive Analytics Platform }}
|-
| 5. || Weka || RF, SVM, and Naïve Bayes || {{cite web | url = http://www.cs.waikato.ac.nz/ml/weka/ | title = Weka 3 - Data Mining with Open Source Machine Learning Software in Java | access-date = 2016-03-24 | archive-date = 2011-10-28 | archive-url = https://web.archive.org/web/20111028090649/http://www.cs.waikato.ac.nz/ml/weka/ | url-status = dead }}
|-
| 6. || Knime || DT, Naïve Bayes, and SVM || {{cite web | url = http://www.knime.org/ | title = KNIME &#124; Open for Innovation }}
|-
| 7. || AZOrange<ref name="pmid21798025">{{cite journal | vauthors = Stålring JC, Carlsson LA, Almeida P, Boyer S | title = AZOrange - High performance open source machine learning for QSAR modeling in a graphical programming environment | journal = Journal of Cheminformatics | volume = 3 | pages = 28 | year = 2011 | pmid = 21798025 | pmc = 3158423 | doi = 10.1186/1758-2946-3-28 | doi-access = free }}</ref> || RT, SVM, ANN, and RF || {{cite web | url = https://github.com/AZcompTox/AZOrange | title = AZCompTox/AZOrange: AstraZeneca add-ons to Orange. | work = GitHub | date = 2018-09-19 }}
|-
| 8. || Tanagra || SVM, RF, Naïve Bayes, and DT || {{cite web | url = http://eric.univ-lyon2.fr/~ricco/tanagra/en/tanagra.html | title = TANAGRA - A free DATA MINING software for teaching and research | access-date = 2016-03-24 | archive-date = 2017-12-19 | archive-url = https://web.archive.org/web/20171219194223/http://eric.univ-lyon2.fr/~ricco/tanagra/en/tanagra.html | url-status = dead }}
|-
| 9. || Elki || k-NN || {{cite web | url = http://elki.dbs.ifi.lmu.de/ | title = ELKI Data Mining Framework | archive-url = https://web.archive.org/web/20161119100656/http://elki.dbs.ifi.lmu.de/ | archive-date = 2016-11-19 | url-status = dead }}
|-
| 10. || MALLET ||   || {{cite web | url = http://mallet.cs.umass.edu/ | title = MALLET homepage }}
|-
| 11. || MOA ||   || {{cite web | url = http://moa.cms.waikato.ac.nz/ | title = MOA Massive Online Analysis &#124; Real Time Analytics for Data Streams | archive-url = https://web.archive.org/web/20170619113241/http://moa.cms.waikato.ac.nz/ | archive-date = 2017-06-19 | url-status = dead }}
|-
| 12. || DeepChem || Logistic Regression, Naïve Bayes, RF, ANN, and others || {{cite web|title=DeepChem|url=https://deepchem.io/|website=deepchem.io|access-date=20 October 2017}}
|-
| 13. || alvaModel<ref name="issn1422-0067">{{cite journal |last1=Mauri |first1=Andrea |last2=Bertola |first2=Matteo| title = Alvascience: A New Software Suite for the QSAR Workflow Applied to the Blood–Brain Barrier Permeability | journal = International Journal of Molecular Sciences | volume = 23 | issue = 21 | year = 2022 |page=12882 | doi = 10.3390/ijms232112882 |pmid=36361669 |pmc=9655980 |doi-access=free }}</ref> || Regression ([[Ordinary least squares|OLS]], [[Partial least squares regression|PLS]], [[K-nearest neighbors algorithm|k-NN]], [[Support-vector machine|SVM]] and Consensus) and Classification ([[Linear discriminant analysis|LDA/QDA]], [[Partial least squares regression|PLS-DA]], [[K-nearest neighbors algorithm|k-NN]], [[Support-vector machine|SVM]] and Consensus) || {{cite web | title=alvaModel: a software tool to create QSAR/QSPR models | url=https://www.alvascience.com/alvamodel/ | website=alvascience.com}}
|-
| 14. ||[[scikit-learn]] ([[Python (programming language)|Python]]) <ref name="sklearn">{{cite journal
|author1=Fabian Pedregosa
|author2=Gaël Varoquaux
|author3=Alexandre Gramfort
|author4=Vincent Michel
|author5=Bertrand Thirion
|author6=Olivier Grisel
|author7=Mathieu Blondel
|author8=Peter Prettenhofer
|author9=Ron Weiss
|author10=Vincent Dubourg
|author11=Jake Vanderplas
|author12=Alexandre Passos
|author13=David Cournapeau
|author14=Matthieu Perrot
|author15=Édouard Duchesnay
|title=scikit-learn: Machine Learning in Python
|journal=Journal of Machine Learning Research
|year=2011
|volume=12
|pages=2825–2830
|url=http://jmlr.org/papers/v12/pedregosa11a.html
}}</ref>|| Logistic Regression, Naïve Bayes, k-NN, RF, SVM, GP, ANN, and others || {{cite web|title=SciKit-Learn|url=https://scikit-learn.org/stable/index.html#|website=scikit-learn.org|access-date=13 August 2023}}
|-
|15.
|Scikit-Mol<ref>{{Citation |last=Bjerrum |first=Esben Jannik |title=Scikit-Mol brings cheminformatics to Scikit-Learn |date=2023-12-06 |url=https://chemrxiv.org/engage/chemrxiv/article-details/60ef0fc58825826143a82cc0 |access-date=2025-01-17 |language=en |doi=10.26434/chemrxiv-2023-fzqwd |last2=Bachorz |first2=Rafał Adam |last3=Bitton |first3=Adrien |last4=Choung |first4=Oh-hyeon |last5=Chen |first5=Ya |last6=Esposito |first6=Carmen |last7=Ha |first7=Son Viet |last8=Poehlmann |first8=Andreas}}</ref>
|Integration of [[Scikit-learn]] models and [[RDKit]] featurization
|[https://pypi.org/project/scikit-mol/ scikit-mol] on pypi.org
|-
| 16. || scikit-fingerprints<ref>{{cite journal | vauthors = Adamczyk J, Ludynia P | title = Scikit-fingerprints: Easy and efficient computation of molecular fingerprints in Python | journal = SoftwareX | volume = 28 | pages = 101944 | year = 2024 | doi = 10.1016/j.softx.2024.101944 }}</ref> || [[Molecular_descriptor|Molecular fingerprints]], API compatible with [[Scikit-learn]] models || {{cite web|title=scikit-fingerprints|url=https://github.com/scikit-fingerprints/scikit-fingerprints|access-date=29 December 2024}}
|-
| 17. || DTC Lab Tools || Multiple Linear Regression, Partial Least Squares, Applicability Domain, Validation, and others || {{cite web|title=DTCLab Tools|url=https://teqip.jdvu.ac.in/QSAR_Tools/|access-date=12 May 2025}}
|-
| 18. || DTC Lab Supplementary Tools || Quantitative Read-across, q-RASAR, ARKA, Regression and Classification-based ML tools, and others || {{cite web|title=DTCLab Supplementary Tools|url=https://sites.google.com/jadavpuruniversity.in/dtc-lab-software/home/|access-date=12 May 2025}}

|}