== Use in machine learning ==

[[Machine learning]] techniques arise largely from statistics and also from information theory. In general, entropy is a measure of uncertainty, and the objective of machine learning is to minimize uncertainty.

[[Decision tree learning]] algorithms use relative entropy to determine the decision rules that govern the data at each node.<ref>{{Cite book|last1=Batra|first1=Mridula|last2=Agrawal|first2=Rashmi|title=Nature Inspired Computing|chapter=Comparative Analysis of Decision Tree Algorithms|date=2018|editor-last=Panigrahi|editor-first=Bijaya Ketan|editor2-last=Hoda|editor2-first=M. N.|editor3-last=Sharma|editor3-first=Vinod|editor4-last=Goel|editor4-first=Shivendra|chapter-url=https://link.springer.com/chapter/10.1007/978-981-10-6747-1_4|series=Advances in Intelligent Systems and Computing|volume=652|language=en|location=Singapore|publisher=Springer|pages=31–36|doi=10.1007/978-981-10-6747-1_4|isbn=978-981-10-6747-1|access-date=16 December 2021|archive-date=19 December 2022|archive-url=https://web.archive.org/web/20221219153239/https://link.springer.com/chapter/10.1007/978-981-10-6747-1_4|url-status=live}}</ref> The [[information gain in decision trees]] <math>IG(Y,X)</math>, which is equal to the difference between the entropy of <math>Y</math> and the conditional entropy of <math>Y</math> given <math>X</math>, quantifies the expected information, or the reduction in entropy, gained from additionally knowing the value of an attribute <math>X</math>. The information gain is used to identify which attributes of the dataset provide the most information and should therefore be used to split the nodes of the tree optimally.

[[Bayesian inference]] models often apply the [[principle of maximum entropy]] to obtain [[prior probability]] distributions.<ref>{{Cite journal|last=Jaynes|first=Edwin T.|date=September 1968|title=Prior Probabilities|url=https://ieeexplore.ieee.org/document/4082152|journal=IEEE Transactions on Systems Science and Cybernetics|volume=4|issue=3|pages=227–241|doi=10.1109/TSSC.1968.300117|issn=2168-2887|access-date=16 December 2021|archive-date=16 December 2021|archive-url=https://web.archive.org/web/20211216164659/https://ieeexplore.ieee.org/document/4082152|url-status=live}}</ref> The idea is that the distribution that best represents the current state of knowledge of a system is the one with the largest entropy, and it is therefore suitable to be the prior.

[[Classification in machine learning]] performed by [[logistic regression]] or [[artificial neural network]]s often employs a standard loss function, called [[cross-entropy]] loss, that minimizes the average cross-entropy between the ground-truth and predicted distributions.<ref>{{Cite book|last1=Rubinstein|first1=Reuven Y.|url=https://books.google.com/books?id=8KgACAAAQBAJ&dq=machine+learning+cross+entropy+loss+introduction&pg=PA1|title=The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation and Machine Learning|last2=Kroese|first2=Dirk P.|date=2013-03-09|publisher=Springer Science & Business Media|isbn=978-1-4757-4321-0|language=en}}</ref> In general, cross-entropy is a measure of the difference between two probability distributions, similar to the KL divergence (also known as relative entropy).
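As an illustration of the quantities above, the following minimal Python sketch computes the Shannon entropy of a set of class labels, the information gain <math>IG(Y,X) = H(Y) - H(Y \mid X)</math> of a candidate split attribute, and the cross-entropy between a ground-truth and a predicted distribution. It uses only the standard library; the helper names (<code>entropy</code>, <code>information_gain</code>, <code>cross_entropy</code>) and the toy "windy"/"play" data are chosen here for exposition and do not come from the cited sources.

<syntaxhighlight lang="python">
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H(Y), in bits, of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(xs, ys):
    """Information gain IG(Y, X) = H(Y) - H(Y | X) of attribute values xs
    with respect to class labels ys (two parallel sequences)."""
    total = len(ys)
    # Partition the labels according to the value of the attribute X.
    partitions = {}
    for x, y in zip(xs, ys):
        partitions.setdefault(x, []).append(y)
    # H(Y | X) is the entropy of each partition, weighted by its size.
    conditional = sum(len(part) / total * entropy(part)
                      for part in partitions.values())
    return entropy(ys) - conditional

def cross_entropy(p, q):
    """Cross-entropy H(p, q), in bits, between a ground-truth distribution p
    and a predicted distribution q (sequences of probabilities)."""
    return -sum(pi * log2(qi) for pi, qi in zip(p, q) if pi > 0)

# Toy data: attribute "windy" versus class label "play" (illustrative only).
windy = [False, False, True, True, False, True]
play  = ["yes", "yes", "no", "no", "yes", "yes"]
print(information_gain(windy, play))        # ~0.459 bits gained by splitting on "windy"
print(cross_entropy([1.0, 0.0], [0.9, 0.1]))  # ~0.152 bits; 0 only if the prediction is exact
</syntaxhighlight>

A decision-tree learner would evaluate <code>information_gain</code> for every candidate attribute at a node and split on the attribute with the largest gain, i.e. the one yielding the greatest reduction in entropy.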