Chemometrics
==Techniques==
===Multivariate calibration===
Many chemical problems and applications of chemometrics involve [[calibration]]. The objective is to develop models which can be used to predict properties of interest based on measured properties of the chemical system, such as pressure, flow, temperature, [[infrared spectroscopy|infrared]], [[Raman spectroscopy|Raman]],<ref>{{Cite journal |last1=Barton |first1=Bastian |last2=Thomson |first2=James |last3=Lozano Diz |first3=Enrique |last4=Portela |first4=Raquel |date=September 2022 |title=Chemometrics for Raman Spectroscopy Harmonization |url=http://journals.sagepub.com/doi/10.1177/00037028221094070 |journal=Applied Spectroscopy |language=en |volume=76 |issue=9 |pages=1021–1041 |doi=10.1177/00037028221094070 |pmid=35622984 |bibcode=2022ApSpe..76.1021B |s2cid=249129065 |issn=0003-7028|url-access=subscription }}</ref> [[NMR|NMR spectra]] and [[mass spectrometry|mass spectra]]. Examples include the development of multivariate models relating 1) multi-wavelength spectral response to analyte concentration, 2) molecular descriptors to biological activity, and 3) multivariate process conditions/states to final product attributes. The process requires a calibration or training data set, which includes reference values for the properties of interest for prediction, and the measured attributes believed to correspond to these properties. For case 1), for example, one can assemble data from a number of samples, including concentrations for an analyte of interest for each sample (the reference) and the corresponding infrared spectrum of that sample. Multivariate calibration techniques such as partial least squares regression or principal component regression (among many other methods) are then used to construct a mathematical model that relates the multivariate response (spectrum) to the concentration of the analyte of interest, and such a model can be used to efficiently predict the concentrations of new samples.
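The calibrate-then-predict workflow for case 1) can be illustrated with a minimal sketch. It is not partial least squares or principal component regression: it uses ordinary least-squares regression of concentration on the absorbances at only two wavelengths, which is the simplest model of the same kind. The sensitivities, concentrations, and function names are made-up assumptions for illustration, and the synthetic data are noise-free.

```python
# Minimal sketch: least-squares calibration on two spectral channels.
# All numbers and names are hypothetical; this is plain multiple linear
# regression, used here as a simple stand-in for PLS or PCR.

def fit_two_channel_calibration(spectra, concentrations):
    """Fit c ~ b1*a1 + b2*a2 (no intercept) via the 2x2 normal equations."""
    s11 = sum(a1 * a1 for a1, _ in spectra)
    s12 = sum(a1 * a2 for a1, a2 in spectra)
    s22 = sum(a2 * a2 for _, a2 in spectra)
    t1 = sum(a1 * c for (a1, _), c in zip(spectra, concentrations))
    t2 = sum(a2 * c for (_, a2), c in zip(spectra, concentrations))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("channels are collinear; the model cannot be solved")
    # Cramer's rule for the symmetric 2x2 system.
    b1 = (t1 * s22 - s12 * t2) / det
    b2 = (s11 * t2 - s12 * t1) / det
    return b1, b2


def predict(coeffs, spectrum):
    """Predict the analyte concentration from a two-channel spectrum."""
    b1, b2 = coeffs
    return b1 * spectrum[0] + b2 * spectrum[1]


# Assumed sensitivities of the analyte and of an interfering species
# at the two wavelengths (arbitrary units).
ANALYTE = (1.0, 0.2)
INTERFERENT = (0.5, 1.0)


def simulate(c_analyte, c_interferent):
    """Noise-free two-channel response of a mixture (linear additivity)."""
    return (
        c_analyte * ANALYTE[0] + c_interferent * INTERFERENT[0],
        c_analyte * ANALYTE[1] + c_interferent * INTERFERENT[1],
    )


# Calibration set: (analyte, interferent) concentrations per sample.
training_mix = [(1.0, 0.5), (2.0, 1.5), (3.0, 0.2), (0.5, 2.0)]
spectra = [simulate(ca, ci) for ca, ci in training_mix]
reference = [ca for ca, _ in training_mix]  # reference values for the analyte

coeffs = fit_two_channel_calibration(spectra, reference)

# Predict the analyte in a new sample that also contains the interferent.
predicted = predict(coeffs, simulate(1.7, 0.9))
print(round(predicted, 6))
```

Only the analyte's reference concentrations are supplied to the fit, yet the model still recovers the analyte in the presence of the interferent, because the interferent varies across the calibration samples.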
Techniques in multivariate calibration are often broadly categorized as classical or inverse methods.<ref name="Martens1989" /><ref name="Franke2002">{{cite book |first=J. |last=Franke |editor1-first=John M |editor1-last=Chalmers |chapter=Inverse Least Squares and Classical Least Squares Methods for Quantitative Vibrational Spectroscopy |title=Handbook of Vibrational Spectroscopy |publisher=Wiley |year=2002 |location=New York |isbn=978-0471988472 |doi=10.1002/0470027320.s4603 }}</ref> The principal difference between these approaches is that in classical calibration the models are solved such that they are optimal in describing the measured analytical responses (e.g., spectra) and can therefore be considered optimal descriptors, whereas in inverse methods the models are solved to be optimal in predicting the properties of interest (e.g., concentrations, optimal predictors).<ref name="Brown2004">{{cite journal |first=C. D. |last=Brown |title=Discordance between Net Analyte Signal Theory and Practical Multivariate Calibration |journal=Analytical Chemistry |volume=76 |year=2004 |issue=15 |pages=4364–4373 |doi=10.1021/ac049953w |pmid=15283574}}</ref> Inverse methods usually require less physical knowledge of the chemical system, and at least in theory provide superior predictions in the mean-squared error sense,<ref name="krutchkoff1969">{{cite journal |first=R. G. |last=Krutchkoff |title=Classical and inverse regression methods of calibration in extrapolation |journal=Technometrics |volume=11 |issue=3 |year=1969 |pages=11–15 |doi=10.1080/00401706.1969.10490714 }}</ref><ref name="Kowalski1984">{{cite book |first=W. G. |last=Hunter |chapter=Statistics and chemistry, and the linear calibration problem |title=Chemometrics: mathematics and statistics in chemistry |editor-first=B. R. |editor-last=Kowalski |publisher=Riedel |location=Boston |year=1984 |isbn=978-9027718464 }}</ref><ref name="Tellinghuisen2000">{{cite journal |first=J. |last=Tellinghuisen |title=Inverse vs. 
classical calibration for small data sets |journal=Fresenius' J. Anal. Chem. |volume=368 |year=2000 |issue=6 |pages=585–588 |doi=10.1007/s002160000556 |pmid=11228707 |s2cid=21166415 }}</ref> and hence inverse approaches tend to be more frequently applied in contemporary multivariate calibration.

A main advantage of multivariate calibration techniques is that fast, cheap, or non-destructive analytical measurements (such as optical spectroscopy) can be used to estimate sample properties which would otherwise require time-consuming, expensive or destructive testing (such as [[Liquid chromatography–mass spectrometry|LC-MS]]). Equally important is that multivariate calibration allows for accurate quantitative analysis in the presence of heavy interference by other analytes. The selectivity of the analytical method is provided as much by the mathematical calibration as by the analytical measurement modalities. For example, near-infrared spectra, which are extremely broad and non-selective compared to spectra from other analytical techniques (such as infrared or Raman spectroscopy), can often be used successfully in conjunction with carefully developed multivariate calibration methods to predict concentrations of analytes in very complex matrices.

===Classification, pattern recognition, clustering===
Supervised multivariate classification techniques are closely related to multivariate calibration techniques in that a calibration or training set is used to develop a mathematical model capable of classifying future samples. The techniques employed in chemometrics are similar to those used in other fields – multivariate discriminant analysis, logistic regression, neural networks, regression/classification trees.
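As a minimal sketch of the supervised workflow, the following example trains a two-class linear discriminant on a small training set and then classifies new samples. It assumes two measured features per sample, equal class priors, and a pooled within-class scatter matrix; the data values, class labels, and function names are hypothetical.

```python
# Minimal sketch: two-class linear discriminant analysis on two features.
# Data and names are hypothetical; equal class priors are assumed.

def class_mean(rows):
    """Mean of each of the two features over the samples of one class."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def pooled_scatter(groups):
    """Sum of within-class scatter matrices (2x2) over all classes."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, m in groups:
        for r in rows:
            d = (r[0] - m[0], r[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    return s


def fit_discriminant(class_a, class_b):
    """Return (w, threshold) with w = S^-1 (mean_a - mean_b)."""
    ma, mb = class_mean(class_a), class_mean(class_b)
    s = pooled_scatter([(class_a, ma), (class_b, mb)])
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    if abs(det) < 1e-12:
        raise ValueError("scatter matrix is singular")
    diff = (ma[0] - mb[0], ma[1] - mb[1])
    # Explicit inverse of the 2x2 scatter matrix applied to the mean difference.
    w = (
        (s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
        (-s[1][0] * diff[0] + s[0][0] * diff[1]) / det,
    )
    # Decision boundary passes through the midpoint of the class means.
    mid = ((ma[0] + mb[0]) / 2, (ma[1] + mb[1]) / 2)
    threshold = w[0] * mid[0] + w[1] * mid[1]
    return w, threshold


def classify(model, sample):
    """Assign a sample to class 'A' or 'B' by its discriminant score."""
    w, threshold = model
    score = w[0] * sample[0] + w[1] * sample[1]
    return "A" if score > threshold else "B"


# Training set: two measured features per sample (made-up values).
class_a = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (1.1, 2.1)]
class_b = [(3.0, 0.9), (3.2, 1.1), (2.8, 1.0), (3.1, 0.8)]

model = fit_discriminant(class_a, class_b)
print(classify(model, (1.0, 1.9)), classify(model, (3.0, 1.0)))
```

With real spectra the number of variables usually exceeds the number of samples, which makes the scatter matrix singular; this sketch sidesteps the problem by using only two features.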
The use of rank reduction techniques in conjunction with these conventional classification methods is routine in chemometrics, for example discriminant analysis on [[principal component analysis|principal components]] or [[partial least squares regression|partial least squares]] scores. A family of techniques, referred to as class-modelling or [[One-class classification|one-class classifiers]], is able to build models for an individual class of interest.<ref>{{Cite journal|last=Oliveri|first=Paolo|date=2017|title=Class-modelling in food analytical chemistry: Development, sampling, optimisation and validation issues – A tutorial|journal=Analytica Chimica Acta|language=en|volume=982|pages=9–19|doi=10.1016/j.aca.2017.05.013|pmid=28734370|bibcode=2017AcAC..982....9O |hdl=11567/881059 |s2cid=10119515 |hdl-access=free}}</ref> Such methods are particularly useful in the case of quality control and authenticity verification of products.

Unsupervised classification (also termed [[cluster analysis]]) is also commonly used to discover patterns in complex data sets, and again many of the core techniques used in chemometrics are common to other fields such as machine learning and statistical learning.

===Multivariate curve resolution===
In chemometric parlance, multivariate curve resolution seeks to deconstruct data sets with limited or absent reference information and system knowledge. Some of the earliest work on these techniques was done by Lawton and Sylvestre in the early 1970s.<ref>{{cite journal |last1=Lawton |first1=W. H. |last2=Sylvestre |first2=E. A. |year=1971 |title=Self Modeling Curve Resolution |journal=Technometrics |volume=13 |issue=3 |pages=617–633 |doi=10.1080/00401706.1971.10488823 }}</ref><ref>{{cite journal |last1=Sylvestre |first1=E. A. |last2=Lawton |first2=W. H. |last3=Maggio |first3=M. S. 
|year=1974 |title=Curve Resolution Using a Postulated Chemical Reaction |journal=Technometrics |volume=16 |issue=3 |pages=353–368 |doi=10.1080/00401706.1974.10489204 }}</ref> These approaches are also called self-modeling mixture analysis, [[blind signal separation|blind source/signal separation]], and spectral unmixing. For example, from a data set comprising fluorescence spectra from a series of samples each containing multiple fluorophores, multivariate curve resolution methods can be used to extract the fluorescence spectra of the individual fluorophores, along with their relative concentrations in each of the samples, essentially unmixing the total fluorescence spectrum into the contributions from the individual components. The problem is usually ill-determined due to rotational ambiguity (many possible solutions can equivalently represent the measured data), so the application of additional constraints is common, such as non-negativity, unimodality, or known interrelationships between the individual components (e.g., kinetic or mass-balance constraints).<ref>{{cite journal |last1=de Juan |first1=A. |last2=Tauler |first2=R. |year=2003 |title=Chemometrics Applied to Unravel Multicomponent Processes and Mixtures. Revisiting Latest Trends in Multivariate Resolution |journal=Analytica Chimica Acta |volume=500 |issue=1–2 |pages=195–210 |doi=10.1016/S0003-2670(03)00724-4 |bibcode=2003AcAC..500..195D }}</ref><ref>{{cite journal |last1=de Juan |first1=A. |last2=Tauler |first2=R. 
|year=2006 |title=Multivariate Curve Resolution (MCR) from 2000: Progress in Concepts and Applications |journal=Critical Reviews in Analytical Chemistry |volume=36 |issue=3–4 |pages=163–176 |doi=10.1080/10408340600970005 |s2cid=95309963 }}</ref>

===Other techniques===
'''[[design of experiments|Experimental design]]''' remains a core area of study in chemometrics and several monographs are specifically devoted to experimental design in chemical applications.<ref name="Deming1987">{{cite book |first1=S. N. |last1=Deming |first2=S. L. |last2=Morgan |title=Experimental design: a chemometric approach |publisher=Elsevier |year=1987 |isbn=978-0444427342 }}</ref><ref name="Bruns2006">{{cite book |first1=R. E. |last1=Bruns |first2=I. S. |last2=Scarminio |first3=B. |last3=de Barros Neto |title=Statistical design – chemometrics |publisher=Elsevier |location=Amsterdam |year=2006 |isbn=978-0444521811 }}</ref> Sound principles of experimental design have been widely adopted within the chemometrics community, although many complex experiments are purely observational, and there can be little control over the properties and interrelationships of the samples and sample properties.

'''[[Signal processing]]''' is also a critical component of almost all chemometric applications, particularly the use of signal pretreatments to condition data prior to calibration or classification. The techniques commonly employed in chemometrics are often closely related to those used in other fields.<ref name="Wentzell2000">{{cite book |first1=P. D. |last1=Wentzell |first2=C. D. |last2=Brown |chapter=Signal Processing in Analytical Chemistry |title=Encyclopedia of Analytical Chemistry |editor-first=R. A. 
|editor-last=Meyers |publisher=Wiley |year=2000 |pages=9764–9800 }}</ref> Signal pre-processing may affect the way in which outcomes of the final data processing can be interpreted.<ref>{{Cite journal|last1=Oliveri|first1=Paolo|last2=Malegori|first2=Cristina|last3=Simonetti|first3=Remo|last4=Casale|first4=Monica|date=2019|title=The impact of signal pre-processing on the final interpretation of analytical outcomes – A tutorial|journal=Analytica Chimica Acta|language=en|volume=1058|pages=9–17|doi=10.1016/j.aca.2018.10.055|pmid=30851858|bibcode=2019AcAC.1058....9O |s2cid=73727614 }}</ref>

'''Performance characterization and figures of merit''': Like most areas of the physical sciences, chemometrics is quantitatively oriented, so considerable emphasis is placed on performance characterization, model selection, verification and validation, and [[figure of merit|figures of merit]]. The performance of quantitative models is usually specified by the [[root mean squared error]] in predicting the attribute of interest, and the performance of classifiers by true-positive rate/false-positive rate pairs (or a full ROC curve). A 2006 report by Olivieri et al. provides a comprehensive overview of figures of merit and uncertainty estimation in multivariate calibration, including multivariate definitions of selectivity, sensitivity, SNR and prediction interval estimation.<ref>{{cite journal |first1=A. C. |last1=Olivieri |first2=N. M. |last2=Faber |first3=J. |last3=Ferre |first4=R. |last4=Boque |first5=J. H. |last5=Kalivas |first6=H. |last6=Mark |title=Guidelines for calibration in analytical chemistry Part 3. 
Uncertainty estimation and figures of merit for multivariate calibration |journal=Pure and Applied Chemistry |volume=78 |year=2006 |issue=3 |pages=633–650 |doi=10.1351/pac200678030633 |s2cid=50546210 |url=https://zenodo.org/record/894416 |doi-access=free }}</ref> Chemometric model selection usually involves the use of tools such as [[resampling (statistics)|resampling]] (including bootstrap, permutation, cross-validation).

'''Multivariate [[statistical process control]] (MSPC)''', modeling and optimization account for a substantial amount of historical chemometric development.<ref>{{cite journal |first1=D. L. |last1=Illman |first2=J. B. |last2=Callis |first3=B. R. |last3=Kowalski |title=Process Analytical Chemistry: a new paradigm for analytical chemists |journal=American Laboratory |volume=18 |year=1986 |pages=8–10 }}</ref><ref>{{cite journal |first1=J. F. |last1=MacGregor |first2=T. |last2=Kourti |title=Statistical control of multivariate processes |journal=Control Engineering Practice |volume=3 |year=1995 |issue=3 |pages=403–414 |doi=10.1016/0967-0661(95)00014-L }}</ref><ref>{{cite journal |first1=E. B. |last1=Martin |first2=A. J. |last2=Morris |title=An overview of multivariate statistical process control in continuous and batch process performance monitoring |journal=Transactions of the Institute of Measurement & Control |volume=18 |year=1996 |issue=1 |pages=51–60 |doi=10.1177/014233129601800107 |bibcode=1996TIMC...18...51M |s2cid=120516715 }}</ref> Spectroscopy has been used successfully for online monitoring of manufacturing processes for 30–40 years, and this process data is highly amenable to chemometric modeling. Specifically in terms of MSPC, multiway modeling of batch and continuous processes is increasingly common in industry and remains an active area of research in chemometrics and chemical engineering. Process analytical chemistry, as it was originally termed,<ref>{{cite journal |first1=T. |last1=Hirschfeld |first2=J. B. |last2=Callis |first3=B. R. 
|last3=Kowalski |title=Chemical sensing in process analysis |journal=[[Science (journal)|Science]] |volume=226 |year=1984 |issue=4672 |pages=312–318 |doi=10.1126/science.226.4672.312 |pmid=17749872 |bibcode=1984Sci...226..312H |s2cid=38093353 }}</ref> or [[process analytical technology]] as it is now known, continues to draw heavily on chemometric methods and MSPC.

'''Multiway methods''' are heavily used in chemometric applications.<ref>{{cite book |first1=A. K. |last1=Smilde |first2=R. |last2=Bro |first3=P. |last3=Geladi |title=Multi-way analysis with applications in the chemical sciences |publisher=Wiley |year=2004 }}</ref><ref>{{cite journal |first1=R. |last1=Bro |first2=J. J. |last2=Workman |first3=P. R. |last3=Mobley |first4=B. R. |last4=Kowalski |title=Overview of chemometrics applied to spectroscopy: 1985–95, Part 3—Multiway analysis |journal=Applied Spectroscopy Reviews |volume=32 |year=1997 |issue=3 |pages=237–261 |doi=10.1080/05704929708003315 |bibcode=1997ApSRv..32..237B }}</ref> These are higher-order extensions of more widely used methods. For example, while the analysis of a table (matrix, or second-order array) of data is routine in several fields, multiway methods are applied to data sets that form third-, fourth-, or higher-order arrays. Data of this type are very common in chemistry; for example, a liquid chromatography–mass spectrometry (LC-MS) system generates a large matrix of data (elution time versus m/z) for each sample analyzed. The data across multiple samples thus comprise a [[data cube]]. Batch process modeling involves data sets that have time vs. process variables vs. batch number. The multiway mathematical methods applied to these sorts of problems include [[PARAFAC]], trilinear decomposition, and multiway PLS and PCA.

=== Chemometrics and Food Science ===
Over the past decade, multivariate statistical techniques originally developed for analytical chemistry have become widely utilized in food science and technology. 
Chemometrics is particularly valuable when dealing with large and complex datasets, encompassing diverse sample types, numbers, and responses. These techniques support the authentication of geographical origin and farming system, and the detection of adulteration in high-value food commodities.<ref name=":0">{{Cite journal |last=Granato |first=Daniel |last2=Putnik |first2=Predrag |last3=Kovačević |first3=Danijela Bursać |last4=Santos |first4=Jânio Sousa |last5=Calado |first5=Verônica |last6=Rocha |first6=Ramon Silva |last7=Cruz |first7=Adriano Gomes Da |last8=Jarvis |first8=Basil |last9=Rodionova |first9=Oxana Ye |last10=Pomerantsev |first10=Alexey |date=2018 |title=Trends in Chemometrics: Food Authentication, Microbiology, and Effects of Processing |url=https://ift.onlinelibrary.wiley.com/doi/10.1111/1541-4337.12341 |journal=Comprehensive Reviews in Food Science and Food Safety |language=en |volume=17 |issue=3 |pages=663–677 |doi=10.1111/1541-4337.12341 |issn=1541-4337|url-access=subscription }}</ref> Overall, chemometrics is a tool for addressing complex, multifactorial challenges in food science through a holistic approach. The review by Granato et al. recommends that governmental bodies and industries responsible for food quality monitoring, raw material assessment, and process optimization incorporate chemometrics, particularly when working with high-dimensional data. To assist users in selecting suitable tools, the review outlines practical applications and evaluates the advantages and limitations of commonly used chemometric methods.<ref name=":0" />