==Implications==
{{Further|Problem of induction}}
To illustrate one of the counter-intuitive implications of NFL, suppose we fix two supervised learning algorithms, C and D. We then sample a target function ''f'' to produce a set of input-output pairs, ''d''. The question is how we should choose whether to train C or D on ''d'', in order to predict what output would be associated with a point lying outside of ''d''. It is common in almost all of science and statistics to answer this question – to choose between C and D – by running cross-validation on ''d'' with those two algorithms. In other words, to decide whether to generalize from ''d'' with either C or D, we see which of them has better out-of-sample performance when tested within ''d''. Since C and D are fixed, this use of cross-validation to choose between them is itself an algorithm, i.e., a way of generalizing from an arbitrary dataset. Call this algorithm A. (Arguably, A is a simplified model of the scientific method itself.) We could also use ''anti''-cross-validation to make our choice. In other words, we could choose between C and D based on which has ''worse'' out-of-sample performance within ''d''. Again, since C and D are fixed, this use of anti-cross-validation is itself an algorithm. Call that algorithm B. NFL tells us (loosely speaking) that B must beat A on just as many target functions (and associated datasets ''d'') as A beats B. In this very specific sense, the scientific method will lose to the "anti" scientific method just as readily as it wins.<ref>{{Cite journal |last=Wolpert |first=David H. |date=December 2013 |title=Ubiquity symposium: Evolutionary computation and the processes of life: what the no free lunch theorems really mean: how to improve search algorithms |url=https://dl.acm.org/doi/10.1145/2555235.2555237 |journal=Ubiquity |language=en |volume=2013 |issue=December |pages=1–15 |doi=10.1145/2555235.2555237 |issn=1530-2180}}</ref>

NFL only applies if the target function is chosen from a uniform distribution over all possible functions. If this is not the case, and certain target functions are more likely to be chosen than others, then A may perform better than B overall. The contribution of NFL is that it tells us that choosing an appropriate algorithm requires making assumptions about the kinds of target functions the algorithm is being used for. With no assumptions, no "meta-algorithm", such as the scientific method, performs better than random choice.

While some scholars argue that NFL conveys important insight, others argue that NFL is of little relevance to machine learning research.<ref name=whitley/><ref name=carrier/><ref name=goldblum/> If [[Occam's razor]] is correct, for example if sequences of lower [[Kolmogorov complexity]] are more probable than sequences of higher complexity, then (as is observed in real life) some algorithms, such as cross-validation, perform better on average on practical problems (when compared with random choice or with anti-cross-validation).<ref>Lattimore, Tor, and Marcus Hutter. "[https://arxiv.org/abs/1111.3846 No free lunch versus Occam's razor in supervised learning]." In Algorithmic Probability and Friends. Bayesian Prediction and Artificial Intelligence, pp. 223–235. Springer, Berlin, Heidelberg, 2013.</ref> However, there are major formal challenges in using arguments based on Kolmogorov complexity to establish properties of the real world, since it is uncomputable and only defined up to an arbitrary additive constant.
Partly in recognition of these challenges, it has recently been argued that there are ways to circumvent the no free lunch theorems without invoking Turing machines, by using "meta-induction".<ref name=SCHU19>{{cite book |last1=Schurz |first1=G. |year=2019 |title=Hume's Problem Solved: The Optimality of Meta-Induction |publisher=MIT Press}}</ref><ref name=WOLP23>{{cite journal |last1=Wolpert |first1=D. H. |year=2023 |title=The Implications of the No-Free-Lunch Theorems for Meta-induction |doi=10.1007/s10838-022-09609-2 |journal=Journal for General Philosophy of Science |volume=54 |pages=421–432 |url=https://arxiv.org/abs/2103.11956 |arxiv=2103.11956}}</ref> Moreover, the Kolmogorov complexity of machine learning models can be upper bounded through compressions of their data labeling, and it is possible to produce non-vacuous cross-domain generalization bounds via Kolmogorov complexity.<ref name=goldblum/>
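The A-versus-B construction described above can be made concrete. The following is a minimal sketch, not drawn from the cited sources, of cross-validation and anti-cross-validation used as fixed procedures for choosing between two learning algorithms; it assumes scikit-learn-style estimators for C and D, a squared-error loss, and a 5-fold split, all of which are illustrative choices rather than part of the theorem.

<syntaxhighlight lang="python">
# Sketch: cross-validation (algorithm A) and anti-cross-validation (algorithm B)
# as fixed procedures for choosing between two learning algorithms C and D
# on a dataset d = (X, y). Assumes scikit-learn-style estimators.
from sklearn.model_selection import cross_val_score

def algorithm_A(C, D, X, y):
    """Pick whichever of C and D has the BETTER estimated out-of-sample error within d."""
    err_C = -cross_val_score(C, X, y, cv=5, scoring="neg_mean_squared_error").mean()
    err_D = -cross_val_score(D, X, y, cv=5, scoring="neg_mean_squared_error").mean()
    return C if err_C <= err_D else D

def algorithm_B(C, D, X, y):
    """Pick whichever of C and D has the WORSE estimated out-of-sample error within d."""
    err_C = -cross_val_score(C, X, y, cv=5, scoring="neg_mean_squared_error").mean()
    err_D = -cross_val_score(D, X, y, cv=5, scoring="neg_mean_squared_error").mean()
    return C if err_C > err_D else D
</syntaxhighlight>

Both functions map an arbitrary dataset to a choice of predictor, so each is itself a single fixed generalization algorithm; the NFL symmetry claim concerns the performance of these two fixed procedures averaged over all possible target functions ''f''.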