==In classification==
{{main article|Probabilistic classification}}

Calibration in [[Statistical classification|classification]] means transforming classifier scores into [[Probabilistic classification|class membership probabilities]]. An overview of calibration methods for [[binary classification|two-class]] and [[multiclass classification|multi-class]] classification tasks is given by Gebel (2009).<ref name="Gebel2009">{{cite thesis |type=PhD thesis|first=Martin |last=Gebel |title=Multivariate calibration of classifier scores into the probability space |publisher=University of Dortmund |year=2009 |format=PDF |url=https://d-nb.info/99741989X/34}}</ref> A classifier might separate the classes well but be poorly calibrated, meaning that the estimated class probabilities are far from the true class probabilities. In this case, a calibration step may improve the estimated probabilities.

A variety of metrics aim to measure the extent to which a classifier produces well-calibrated probabilities. Foundational work includes the Expected Calibration Error (ECE).<ref>M.P. Naeini, G. Cooper, and M. Hauskrecht, Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2015.</ref> More recent variants include the Adaptive Calibration Error (ACE) and the Test-based Calibration Error (TCE), which address limitations of the ECE metric that may arise when classifier scores concentrate on a narrow subset of the [0,1] range.<ref>J. Nixon, M.W. Dusenberry, L. Zhang, G. Jerfel, & D. Tran. Measuring Calibration in Deep Learning. In: CVPR workshops (Vol. 2, No. 7), 2019.</ref><ref>T. Matsubara, N. Tax, R. Mudd, & I. Guy. TCE: A Test-Based Approach to Measuring Calibration Error. In: Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI), PMLR, 2023.</ref>

A further development in calibration assessment is the Estimated Calibration Index (ECI).<ref name="Famiglini2023">{{Citation |last1=Famiglini |first1=Lorenzo |last2=Campagner |first2=Andrea |last3=Cabitza |first3=Federico |title=Towards a Rigorous Calibration Assessment Framework: Advancements in Metrics, Methods, and Use |date=2023 |work=ECAI 2023 |series=Frontiers in Artificial Intelligence and Applications |pages=645–652 |publisher=IOS Press |doi=10.3233/faia230327 |hdl=10281/456604 |hdl-access=free |isbn=978-1-64368-436-9 |url=https://ebooks.iospress.nl/doi/10.3233/FAIA230327 |access-date=2024-03-25}}</ref> The ECI extends the concepts of the ECE to provide a more nuanced measure of a model's calibration, particularly addressing overconfidence and underconfidence tendencies. Originally formulated for binary settings, it has been adapted for multiclass settings, offering both local and global insights into model calibration. This framework aims to overcome some of the theoretical and interpretative limitations of existing calibration metrics. Through a series of experiments, Famiglini et al. demonstrate the framework's effectiveness in delivering a more accurate understanding of model calibration levels and discuss strategies for mitigating biases in calibration assessment.
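The ECE is typically computed by partitioning predictions into probability bins and taking a bin-weighted average of the gap between mean predicted probability (confidence) and observed positive rate (accuracy) in each bin. The following is a minimal sketch in Python, assuming equal-width bins and a binary task; the function name and example data are illustrative, not from the cited papers:

<syntaxhighlight lang="python">
import numpy as np

def expected_calibration_error(y_true, y_prob, n_bins=10):
    # Equal-width bins over [0, 1]; scores of exactly 1.0 fall in the last bin.
    bin_ids = np.minimum((y_prob * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            confidence = y_prob[mask].mean()  # mean predicted probability in the bin
            accuracy = y_true[mask].mean()    # observed positive rate in the bin
            ece += mask.mean() * abs(confidence - accuracy)  # weight by bin size
    return ece

# Overconfident scores: high predicted probabilities, mixed outcomes.
y_true = np.array([1, 0, 1, 0, 1, 1, 0, 1])
y_prob = np.array([0.9, 0.8, 0.7, 0.9, 0.6, 0.95, 0.85, 0.7])
print(expected_calibration_error(y_true, y_prob))  # a large value signals miscalibration
</syntaxhighlight>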
An online tool has been proposed to compute both ECE and ECI.<ref name="Famiglini2023" />

The following univariate calibration methods exist for transforming classifier scores into [[class membership probabilities]] in the two-class case (two of them are sketched in code after the list):
* Assignment value approach, see Garczarek (2002)<ref>U. M. Garczarek "[http://eldorado.uni-dortmund.de:8080/FB5/ls7/forschung/2002/Garczarek] {{Webarchive|url=https://web.archive.org/web/20041123190402/http://eldorado.uni-dortmund.de:8080/FB5/ls7/forschung/2002/Garczarek#|date=2004-11-23}}," Classification Rules in Standardized Partition Spaces, Dissertation, Universität Dortmund, 2002</ref>
* Bayes approach, see Bennett (2002)<ref>P. N. Bennett, Using asymmetric distributions to improve text classifier probability estimates: A comparison of new and standard parametric methods, Technical Report CMU-CS-02-126, Carnegie Mellon, School of Computer Science, 2002.</ref>
* [[Isotonic regression]], see Zadrozny and Elkan (2002)<ref>B. Zadrozny and C. Elkan, Transforming classifier scores into accurate multiclass probability estimates. In: Proceedings of the Eighth International Conference on Knowledge Discovery and Data Mining, 694–699, Edmonton, ACM Press, 2002.</ref>
* [[Platt scaling]] (a form of [[logistic regression]]), see Lewis and Gale (1994)<ref>D. D. Lewis and W. A. Gale, A Sequential Algorithm for Training Text classifiers. In: W. B. Croft and C. J. van Rijsbergen (eds.), Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '94), 3–12. New York, Springer-Verlag, 1994.</ref> and Platt (1999)<ref>J. C. Platt, Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In: A. J. Smola, P. Bartlett, B. Schölkopf and D. Schuurmans (eds.), Advances in Large Margin Classifiers, 61–74. Cambridge, MIT Press, 1999.</ref>
* Bayesian Binning into Quantiles (BBQ) calibration, see Naeini, Cooper, Hauskrecht (2015)<ref>Naeini MP, Cooper GF, Hauskrecht M. Obtaining Well Calibrated Probabilities Using Bayesian Binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2015: 2901–2907.</ref>
* Beta calibration, see Kull, Filho, [[Peter Flach|Flach]] (2017)<ref>Meelis Kull, Telmo Silva Filho, Peter Flach, Beta calibration: a well-founded and easily implemented improvement on logistic calibration for binary classifiers. In: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR 54:623–631, 2017.</ref>
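Two of these methods, Platt scaling and isotonic regression, are available in scikit-learn through <code>CalibratedClassifierCV</code>. The following is a minimal sketch on synthetic data; the dataset and parameter choices are placeholders, not a recommendation:

<syntaxhighlight lang="python">
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic two-class data, purely for illustration.
X, y = make_classification(n_samples=2000, random_state=0)
X_fit, X_test, y_fit, y_test = train_test_split(X, y, random_state=0)

# LinearSVC produces uncalibrated decision scores. method='sigmoid' fits
# Platt's logistic curve to those scores via cross-validation;
# method='isotonic' would use isotonic regression instead.
calibrated = CalibratedClassifierCV(LinearSVC(), method='sigmoid', cv=5)
calibrated.fit(X_fit, y_fit)
probabilities = calibrated.predict_proba(X_test)[:, 1]  # calibrated class-1 probabilities
</syntaxhighlight>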
===In probability prediction and forecasting===
{{See also|Scoring rule}}

In [[prediction]] and [[forecasting]], a [[Brier score]] is sometimes used to assess the accuracy of a set of predictions, specifically whether the magnitude of the assigned probabilities tracks the relative frequency of the observed outcomes. [[Philip E. Tetlock]] employs the term "calibration" in this sense in his 2015 book ''[[Superforecasting]]''.<ref name="Edge-II">{{cite web |url=https://www.edge.org/conversation/philip_tetlock-edge-master-class-2015-a-short-course-in-superforecasting-class-ii |title=Edge Master Class 2015: A Short Course in Superforecasting, Class II |author=<!--Staff writer(s); no by-line.--> |date=24 August 2015 |website=edge.org |publisher=Edge Foundation |accessdate=13 April 2018 |quote=Calibration is when I say there's a 70 percent likelihood of something happening, things happen 70 percent of time.}}</ref>

This differs from [[accuracy and precision]]. For example, as expressed by [[Daniel Kahneman]], "if you give all events that happen a probability of .6 and all the events that don't happen a probability of .4, your calibration is perfect but your discrimination is miserable".<ref name="Edge-II" /> In [[meteorology]], in particular as concerns [[weather forecasting]], a related mode of assessment is known as [[forecast skill]].
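As a concrete illustration of how the Brier score quantifies this, the following is a minimal sketch in Python; the outcomes and forecast probabilities are made up, and <code>brier_score_loss</code> is scikit-learn's implementation of the score:

<syntaxhighlight lang="python">
import numpy as np
from sklearn.metrics import brier_score_loss

# Toy outcomes (1 = event occurred) and forecast probabilities; values illustrative.
outcomes = np.array([1, 0, 1, 1, 0, 1, 0, 0])
forecasts = np.array([0.9, 0.2, 0.7, 0.8, 0.3, 0.6, 0.4, 0.1])

# The Brier score is the mean squared difference between forecast and outcome;
# 0 is a perfect score, and lower is better.
print(brier_score_loss(outcomes, forecasts))  # library implementation
print(np.mean((forecasts - outcomes) ** 2))   # the same quantity by hand
</syntaxhighlight>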