Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Minimum description length
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
==Other systems== Rissanen's was not the first [[information theory|information-theoretic]] approach to learning; as early as 1968 Wallace and Boulton pioneered a related concept called [[minimum message length]] (MML). The difference between MDL and MML is a source of ongoing confusion. Superficially, the methods appear mostly equivalent, but there are some significant differences, especially in interpretation: * MML is a fully subjective Bayesian approach: it starts from the idea that one represents one's beliefs about the data-generating process in the form of a prior distribution. MDL avoids assumptions about the data-generating process. * Both methods make use of ''two-part codes'': the first part always represents the information that one is trying to learn, such as the index of a model class ([[model selection]]) or parameter values ([[parameter estimation]]); the second part is an encoding of the data given the information in the first part. The difference between the methods is that, in the MDL literature, it is advocated that unwanted parameters should be moved to the second part of the code, where they can be represented with the data by using a so-called [[one-part code]], which is often more efficient than a ''two-part code''. In the original description of MML, all parameters are encoded in the first part, so all parameters are learned. * Within the MML framework, each parameter is stated to exactly the precision which results in the optimal overall message length: the preceding example might arise if some parameter was originally considered "possibly useful" to a model but was subsequently found to be unable to help to explain the data (such a parameter will be assigned a code length corresponding to the (Bayesian) prior probability that the parameter would be found to be unhelpful). In the MDL framework, the focus is more on comparing model classes than models, and it is more natural to approach the same question by comparing the class of models that explicitly include such a parameter against some other class that doesn't. The difference lies in the machinery applied to reach the same conclusion.
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)