==Implications==
{{Further|Problem of induction}}
To illustrate one of the counter-intuitive implications of NFL, suppose we fix two supervised learning algorithms, C and D. We then sample a target function ''f'' to produce a set of input-output pairs, ''d''. The question is how we should choose whether to train C or D on ''d'', in order to predict what output would be associated with a point lying outside of ''d''. It is common in almost all of science and statistics to answer this question – to choose between C and D – by running cross-validation on ''d'' with those two algorithms. In other words, to decide whether to generalize from ''d'' with either C or D, we see which of them has better out-of-sample performance when tested within ''d''. Since C and D are fixed, this use of cross-validation to choose between them is itself an algorithm, i.e., a way of generalizing from an arbitrary dataset. Call this algorithm A. (Arguably, A is a simplified model of the scientific method itself.) We could also use ''anti''-cross-validation to make our choice. In other words, we could choose between C and D based on which has ''worse'' out-of-sample performance within ''d''. Again, since C and D are fixed, this use of anti-cross-validation is itself an algorithm. Call that algorithm B. NFL tells us (loosely speaking) that B must beat A on just as many target functions (and associated datasets ''d'') as A beats B. In this very specific sense, the scientific method will lose to the "anti" scientific method just as readily as it wins.<ref>{{Cite journal |last=Wolpert |first=David H. |date=December 2013 |title=Ubiquity symposium: Evolutionary computation and the processes of life: what the no free lunch theorems really mean: how to improve search algorithms |url=https://dl.acm.org/doi/10.1145/2555235.2555237 |journal=Ubiquity |language=en |volume=2013 |issue=December |pages=1–15 |doi=10.1145/2555235.2555237 |issn=1530-2180}}</ref>

NFL only applies if the target function is chosen from a uniform distribution over all possible functions. If this is not the case, and certain target functions are more likely to be chosen than others, then A may perform better than B overall. The contribution of NFL is that it tells us that choosing an appropriate algorithm requires making assumptions about the kinds of target functions the algorithm is being used for. With no assumptions, no "meta-algorithm", such as the scientific method, performs better than random choice.

While some scholars argue that NFL conveys important insight, others argue that NFL is of little relevance to machine learning research.<ref name=whitley/><ref name=carrier/><ref name=goldblum/> If [[Occam's razor]] is correct, for example if sequences of lower [[Kolmogorov complexity]] are more probable than sequences of higher complexity, then (as is observed in real life) some algorithms, such as cross-validation, perform better on average on practical problems (when compared with random choice or with anti-cross-validation).<ref>Lattimore, Tor, and Marcus Hutter. "[https://arxiv.org/abs/1111.3846 No free lunch versus Occam's razor in supervised learning]." In Algorithmic Probability and Friends. Bayesian Prediction and Artificial Intelligence, pp. 223–235. Springer, Berlin, Heidelberg, 2013.</ref> However, there are major formal challenges in using arguments based on Kolmogorov complexity to establish properties of the real world, since it is uncomputable and only defined up to an arbitrary additive constant.
Partly in recognition of these challenges, it has recently been argued that there are ways to circumvent the no free lunch theorems without invoking Turing machines, by using "meta-induction".<ref name=SCHU19>{{cite book |last1=Schurz |first1=G. |year=2019 |title=Hume's Problem Solved: The Optimality of Meta-Induction |publisher=MIT Press}}</ref><ref name=WOLP23>{{cite journal |last1=Wolpert |first1=D. H. |year=2023 |title=The Implications of the No-Free-Lunch Theorems for Meta-induction |doi=10.1007/s10838-022-09609-2 |journal=Journal for General Philosophy of Science |volume=54 |pages=421–432 |url=https://arxiv.org/abs/2103.11956 |arxiv=2103.11956}}</ref> Moreover, the Kolmogorov complexity of machine learning models can be upper bounded through compressions of their data labeling, and it is possible to produce non-vacuous cross-domain generalization bounds via Kolmogorov complexity.<ref name=goldblum/>
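The A-versus-B construction described above can be made concrete. The following is a minimal sketch, not drawn from the cited sources, of cross-validation and anti-cross-validation used as fixed procedures for choosing between two learning algorithms; it assumes scikit-learn-style estimators for C and D, a squared-error loss, and a 5-fold split, all of which are illustrative choices rather than part of the theorem.

<syntaxhighlight lang="python">
# Sketch: cross-validation (algorithm A) and anti-cross-validation (algorithm B)
# as fixed procedures for choosing between two learning algorithms C and D
# on a dataset d = (X, y). Assumes scikit-learn-style estimators.
from sklearn.model_selection import cross_val_score

def algorithm_A(C, D, X, y):
    """Pick whichever of C and D has the BETTER estimated out-of-sample error within d."""
    err_C = -cross_val_score(C, X, y, cv=5, scoring="neg_mean_squared_error").mean()
    err_D = -cross_val_score(D, X, y, cv=5, scoring="neg_mean_squared_error").mean()
    return C if err_C <= err_D else D

def algorithm_B(C, D, X, y):
    """Pick whichever of C and D has the WORSE estimated out-of-sample error within d."""
    err_C = -cross_val_score(C, X, y, cv=5, scoring="neg_mean_squared_error").mean()
    err_D = -cross_val_score(D, X, y, cv=5, scoring="neg_mean_squared_error").mean()
    return C if err_C > err_D else D
</syntaxhighlight>

Both functions map an arbitrary dataset to a choice of predictor, so each is itself a single fixed generalization algorithm; the NFL symmetry claim concerns the performance of these two fixed procedures averaged over all possible target functions ''f''.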