Quantitative structure–activity relationship
== Application ==
=== Chemical ===
One of the first historical QSAR applications was to predict [[boiling point]]s.<ref name="isbn0-85626-454-7">{{cite book | vauthors = Rouvray DH, Bonchev D | title = Chemical graph theory: introduction and fundamentals | publisher = Abacus Press | location = Tunbridge Wells, Kent, England | year = 1991 | isbn = 978-0-85626-454-2 }}</ref>

It is well known, for instance, that within a particular [[chemical classification|family]] of [[chemical compound]]s, especially in [[organic chemistry]], there are strong [[correlation]]s between structure and observed properties. A simple example is the relationship between the number of carbons in [[alkanes]] and their [[boiling point]]s. The boiling point rises steadily as the number of carbons increases, and this trend serves as a means of predicting the boiling points of [[higher alkanes]].

Applications that remain of considerable interest include the [[Hammett equation]], the [[Taft equation]] and [[Acid dissociation constant|pKa prediction]] methods.<ref name="RMC_2013">{{cite encyclopedia | last = Fraczkiewicz | first = R | encyclopedia = Reference Module in Chemistry, Molecular Sciences and Chemical Engineering [Online] | editor-last = Reedijk | editor-first = J | volume = 5 | publisher = Elsevier | location = Amsterdam, the Netherlands | year = 2013 | doi = 10.1016/B978-0-12-409547-2.02610-X | title = Reference Module in Chemistry, Molecular Sciences and Chemical Engineering | isbn = 9780124095472 | chapter = In Silico Prediction of Ionization }}</ref>

=== Biological ===
The biological activity of molecules is usually measured in [[assay]]s to establish the level of inhibition of particular [[signal transduction]] or [[metabolic pathway]]s. [[Drug discovery]] often involves the use of QSAR to identify chemical structures that could have good inhibitory effects on specific [[biological target|targets]] and have low [[toxicity]] (non-specific activity).
Of special interest is the prediction of the [[partition coefficient]] log ''P'', which is an important measure used in identifying "[[druglikeness]]" according to [[Lipinski's Rule of Five]].{{cn|date=March 2024}}

While many quantitative structure–activity relationship analyses involve the interactions of a family of molecules with an [[enzyme]] or [[receptor (biochemistry)|receptor]] binding site, QSAR can also be used to study the interactions between the [[structural domain]]s of proteins. Protein-protein interactions can be quantitatively analyzed for structural variations resulting from [[site-directed mutagenesis]].<ref name="pmid12668435">{{cite journal | vauthors = Freyhult EK, Andersson K, Gustafsson MG | title = Structural modeling extends QSAR analysis of antibody-lysozyme interactions to 3D-QSAR | journal = Biophysical Journal | volume = 84 | issue = 4 | pages = 2264–72 | date = Apr 2003 | pmid = 12668435 | pmc = 1302793 | doi = 10.1016/S0006-3495(03)75032-2 | bibcode = 2003BpJ....84.2264F }}</ref>

Reducing the risk of a SAR paradox is part of the [[machine learning]] method, especially because only a finite amount of data is available (see also [[Minimum-variance unbiased estimator|MVUE]]). In general, all QSAR problems can be divided into [[Coding (social sciences)|coding]]<ref name="isbn3-527-29913-0">{{cite book | vauthors = Timmerman H, Todeschini R, Consonni V, Mannhold R, Kubinyi H | title = Handbook of Molecular Descriptors | publisher = Wiley-VCH | location = Weinheim | year = 2002 | isbn = 978-3-527-29913-3 }}</ref> and [[learning]].<ref name="isbn0-471-05669-3">{{cite book |vauthors=Duda RO, Hart PW, Stork DG | title = Pattern classification | publisher = John Wiley & Sons | location = Chichester | year = 2001 | isbn = 978-0-471-05669-0 }}</ref>

=== Applications ===
(Q)SAR models have been used for [[risk management]].
QSARs are suggested by regulatory authorities; in the [[European Union]], they are suggested by the [[Registration, Evaluation, Authorisation and Restriction of Chemicals|REACH]] (Registration, Evaluation, Authorisation and Restriction of Chemicals) regulation. Regulatory application of QSAR methods includes ''in silico'' toxicological assessment of genotoxic impurities.<ref>{{Cite journal|last1=Fioravanzo|first1=E.|last2=Bassan|first2=A.|last3=Pavan|first3=M.|last4=Mostrag-Szlichtyng|first4=A.|last5=Worth|first5=A. P.|date=2012-04-01|title=Role of in silico genotoxicity tools in the regulatory assessment of pharmaceutical impurities|journal=SAR and QSAR in Environmental Research|volume=23|issue=3–4|pages=257–277|doi=10.1080/1062936X.2012.657236|issn=1062-936X|pmid=22369620|s2cid=2714861}}</ref> Commonly used QSAR assessment software such as DEREK or CASE Ultra (MultiCASE) is used to assess the genotoxicity of impurities according to ICH M7.<ref>ICH M7 Assessment and control of DNA reactive (mutagenic) impurities in pharmaceuticals to limit potential carcinogenic risk - Scientific guideline [https://www.ema.europa.eu/en/ich-m7-assessment-control-dna-reactive-mutagenic-impurities-pharmaceuticals-limit-potential]</ref>

The region of chemical descriptor space enclosed by the [[convex hull]] of a particular training set of chemicals is called the training set's [[applicability domain]]. Predicting the properties of novel chemicals that lie outside the applicability domain requires [[extrapolation]], and so is less reliable (on average) than prediction within the applicability domain. The assessment of the reliability of QSAR predictions remains a research topic.{{cn|date=March 2024}}

The QSAR equations can be used to predict the biological activities of new molecules before their synthesis.
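
The convex-hull description of the applicability domain can be illustrated with a minimal sketch. It assumes a two-dimensional descriptor space and uses made-up descriptor values; the function names <code>convex_hull</code> and <code>in_domain</code> are hypothetical and do not belong to any QSAR package. Real descriptor spaces usually have many more dimensions, where exact hulls become expensive to compute.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def in_domain(hull: List[Point], query: Point) -> bool:
    """True if the query lies inside or on the boundary of a counter-clockwise hull."""
    n = len(hull)
    if n < 3:
        return False  # degenerate hull: treat everything as outside
    return all(cross(hull[i], hull[(i + 1) % n], query) >= 0 for i in range(n))


# Hypothetical training set: two descriptor values per chemical (illustrative only).
training_descriptors = [(0.5, 1.2), (1.0, 3.0), (2.5, 1.5), (3.0, 3.5), (1.8, 2.2)]
hull = convex_hull(training_descriptors)

for candidate in [(1.8, 2.2), (4.0, 4.0)]:
    status = "inside" if in_domain(hull, candidate) else "outside (extrapolation)"
    print(candidate, status)
```

A prediction for a chemical flagged as outside would be an extrapolation in the sense described above and would deserve less confidence than one for a chemical inside the hull.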
Examples of machine learning tools for QSAR modeling include:<ref name="pmid25448759">{{cite journal | vauthors = Lavecchia A | title = Machine-learning approaches in drug discovery: methods and applications | journal = Drug Discovery Today | volume = 20 | issue = 3 | pages = 318–31 | date = Mar 2015 | pmid = 25448759 | doi = 10.1016/j.drudis.2014.10.012 }}</ref>

{| class="wikitable"
|-
! No. !! Name !! Algorithms !! External link
|-
| 1. || R || RF, SVM, Naïve Bayes, and ANN || {{cite web | url = http://www.r-project.org/ | title = R: The R Project for Statistical Computing }}
|-
| 2. || libSVM || SVM || {{cite web | url = https://www.csie.ntu.edu.tw/~cjlin/libsvm/ | title = LIBSVM -- A Library for Support Vector Machines }}
|-
| 3. || Orange || RF, SVM, and Naïve Bayes || {{cite web | url = http://www.ailab.si/orange/ | title = Orange Data Mining }}
|-
| 4. || RapidMiner || SVM, RF, Naïve Bayes, DT, ANN, and k-NN || {{cite web | url = http://rapid-i.com/ | title = RapidMiner {{!}} #1 Open Source Predictive Analytics Platform }}
|-
| 5. || Weka || RF, SVM, and Naïve Bayes || {{cite web | url = http://www.cs.waikato.ac.nz/ml/weka/ | title = Weka 3 - Data Mining with Open Source Machine Learning Software in Java | access-date = 2016-03-24 | archive-date = 2011-10-28 | archive-url = https://web.archive.org/web/20111028090649/http://www.cs.waikato.ac.nz/ml/weka/ | url-status = dead }}
|-
| 6. || KNIME || DT, Naïve Bayes, and SVM || {{cite web | url = http://www.knime.org/ | title = KNIME {{!}} Open for Innovation }}
|-
| 7. || AZOrange<ref name="pmid21798025">{{cite journal | vauthors = Stålring JC, Carlsson LA, Almeida P, Boyer S | title = AZOrange - High performance open source machine learning for QSAR modeling in a graphical programming environment | journal = Journal of Cheminformatics | volume = 3 | pages = 28 | year = 2011 | pmid = 21798025 | pmc = 3158423 | doi = 10.1186/1758-2946-3-28 | doi-access = free }}</ref> || RT, SVM, ANN, and RF || {{cite web | url = https://github.com/AZcompTox/AZOrange | title = AZCompTox/AZOrange: AstraZeneca add-ons to Orange. | work = GitHub | date = 2018-09-19 }}
|-
| 8. || Tanagra || SVM, RF, Naïve Bayes, and DT || {{cite web | url = http://eric.univ-lyon2.fr/~ricco/tanagra/en/tanagra.html | title = TANAGRA - A free DATA MINING software for teaching and research | access-date = 2016-03-24 | archive-date = 2017-12-19 | archive-url = https://web.archive.org/web/20171219194223/http://eric.univ-lyon2.fr/~ricco/tanagra/en/tanagra.html | url-status = dead }}
|-
| 9. || ELKI || k-NN || {{cite web | url = http://elki.dbs.ifi.lmu.de/++ | title = ELKI Data Mining Framework | archive-url = https://web.archive.org/web/20161119100656/http://elki.dbs.ifi.lmu.de/ | archive-date = 2016-11-19 | url-status = dead }}
|-
| 10. || MALLET || || {{cite web | url = http://mallet.cs.umass.edu/ | title = MALLET homepage }}
|-
| 11. || MOA || || {{cite web | url = http://moa.cms.waikato.ac.nz/+ | title = MOA Massive Online Analysis {{!}} Real Time Analytics for Data Streams | archive-url = https://web.archive.org/web/20170619113241/http://moa.cms.waikato.ac.nz/ | archive-date = 2017-06-19 | url-status = dead }}
|-
| 12. || DeepChem || Logistic Regression, Naïve Bayes, RF, ANN, and others || {{cite web|title=DeepChem|url=https://deepchem.io/|website=deepchem.io|access-date=20 October 2017}}
|-
| 13. || alvaModel<ref name="issn1422-0067">{{cite journal |last1=Mauri |first1=Andrea |last2=Bertola |first2=Matteo| title = Alvascience: A New Software Suite for the QSAR Workflow Applied to the Blood–Brain Barrier Permeability | journal = International Journal of Molecular Sciences | volume = 23 | issue= 12882 | year = 2022 |page=12882 | doi = 10.3390/ijms232112882 |pmid=36361669 |pmc=9655980 |doi-access=free }}</ref> || Regression ([[Ordinary least squares|OLS]], [[Partial least squares regression|PLS]], [[K-nearest neighbors algorithm|k-NN]], [[Support-vector machine|SVM]] and Consensus) and Classification ([[Linear discriminant analysis|LDA/QDA]], [[Partial least squares regression|PLS-DA]], [[K-nearest neighbors algorithm|k-NN]], [[Support-vector machine|SVM]] and Consensus) || {{cite web | title=alvaModel: a software tool to create QSAR/QSPR models | url=https://www.alvascience.com/alvamodel/ | website=alvascience.com}}
|-
| 14. || [[scikit-learn]] ([[Python (programming language)|Python]])<ref name="sklearn">{{cite journal |author1=Fabian Pedregosa |author2=Gaël Varoquaux |author3=Alexandre Gramfort |author4=Vincent Michel |author5=Bertrand Thirion |author6=Olivier Grisel |author7=Mathieu Blondel |author8=Peter Prettenhofer |author9=Ron Weiss |author10=Vincent Dubourg |author11=Jake Vanderplas |author12=Alexandre Passos |author13=David Cournapeau |author14=Matthieu Perrot |author15=Édouard Duchesnay |title=scikit-learn: Machine Learning in Python |journal=Journal of Machine Learning Research |year=2011 |volume=12 |pages=2825–2830 |url=http://jmlr.org/papers/v12/pedregosa11a.html }}</ref> || Logistic Regression, Naïve Bayes, k-NN, RF, SVM, GP, ANN, and others || {{cite web|title=SciKit-Learn|url=https://scikit-learn.org/stable/index.html#|website=scikit-learn.org|access-date=13 August 2023}}
|-
| 15. || Scikit-Mol<ref>{{Citation |last=Bjerrum |first=Esben Jannik |title=Scikit-Mol brings cheminformatics to Scikit-Learn |date=2023-12-06 |url=https://chemrxiv.org/engage/chemrxiv/article-details/60ef0fc58825826143a82cc0 |access-date=2025-01-17 |language=en |doi=10.26434/chemrxiv-2023-fzqwd |last2=Bachorz |first2=Rafał Adam |last3=Bitton |first3=Adrien |last4=Choung |first4=Oh-hyeon |last5=Chen |first5=Ya |last6=Esposito |first6=Carmen |last7=Ha |first7=Son Viet |last8=Poehlmann |first8=Andreas}}</ref> || Integration of [[scikit-learn]] models and [[RDKit]] featurization || [https://pypi.org/project/scikit-mol/ scikit-mol] on pypi.org
|-
| 16. || scikit-fingerprints<ref>Adamczyk, J., & Ludynia, P. (2024). Scikit-fingerprints: Easy and efficient computation of molecular fingerprints in Python. SoftwareX, 28, 101944. https://doi.org/10.1016/j.softx.2024.101944</ref> || [[Molecular_descriptor|Molecular fingerprints]], API compatible with [[scikit-learn]] models || {{cite web|title=scikit-fingerprints|url=https://github.com/scikit-fingerprints/scikit-fingerprints|access-date=29 December 2024}}
|-
| 17. || DTC Lab Tools || Multiple Linear Regression, Partial Least Squares, Applicability Domain, Validation, and others || {{cite web|title=DTCLab Tools|url=https://teqip.jdvu.ac.in/QSAR_Tools/|access-date=12 May 2025}}
|-
| 18. || DTC Lab Supplementary Tools || Quantitative Read-across, q-RASAR, ARKA, Regression and Classification-based ML tools, and others || {{cite web|title=DTCLab Supplementary Tools|url=https://sites.google.com/jadavpuruniversity.in/dtc-lab-software/home/|access-date=12 May 2025}}
|}
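
The alkane example from the Chemical subsection can be made concrete with a toy structure-property fit that needs none of the tools listed in the table. The sketch below fits a straight line to approximate normal boiling points of the C4 to C8 ''n''-alkanes (values rounded from standard tables and given for illustration only) and extrapolates to nonane. The helper names <code>fit_line</code> and <code>predict_boiling_point</code> are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple

# Carbon count -> approximate normal boiling point in degrees Celsius
# (butane to octane; illustrative values, check a reference table before reuse).
boiling_points_c = {4: -0.5, 5: 36.1, 6: 68.7, 7: 98.4, 8: 125.6}


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


carbons = list(boiling_points_c.keys())
temperatures = list(boiling_points_c.values())
slope, intercept = fit_line(carbons, temperatures)


def predict_boiling_point(n_carbons: int) -> float:
    """Boiling point estimate (degrees Celsius) from the fitted line."""
    return slope * n_carbons + intercept


print(f"slope: {slope:.2f} degrees C per carbon")
print(f"estimate for nonane (9 carbons): {predict_boiling_point(9):.1f} degrees C")
```

Because nonane lies outside the range of the fitted data, the estimate is an extrapolation. The straight line overshoots the measured value of roughly 151 °C, since the increase per added carbon shrinks as the chain grows.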