Empirical risk minimization
===Impossibility results===
It is also possible to show lower bounds on algorithm performance if no distributional assumptions are made.<ref>{{Cite journal |last1=Devroye |first1=Luc |last2=Györfi |first2=László |last3=Lugosi |first3=Gábor |date=1996 |title=A Probabilistic Theory of Pattern Recognition |url=https://link.springer.com/book/10.1007/978-1-4612-0711-5 |journal=Stochastic Modelling and Applied Probability |volume=31 |language=en |doi=10.1007/978-1-4612-0711-5 |isbn=978-1-4612-6877-2 |issn=0172-4568|url-access=subscription }}</ref> This is sometimes referred to as the ''[[No free lunch theorem]]''. Even though a specific learning algorithm may provide asymptotically optimal performance for any distribution, its finite-sample performance is always poor for at least one data distribution. This means that no classifier can guarantee a given error at a given sample size for all distributions.<ref name='patt'/>

Specifically, for any <math>\epsilon > 0</math>, any sample size <math>n</math>, and any classification rule <math>\phi_n</math>, there exists a distribution of <math>(X, Y)</math> with Bayes risk <math>L^* = 0</math> (meaning that perfect prediction is possible) such that:<ref name='patt'/>
<math display='block'>\mathbb E L_n \geq 1/2 - \epsilon,</math>
where <math>L_n</math> is the error probability of the rule <math>\phi_n</math> trained on <math>n</math> samples.

It is further possible to show that the convergence rate of a learning algorithm is poor for some distributions. Specifically, given a decreasing sequence of positive numbers <math>a_n</math> converging to zero (with <math>a_1 \leq 1/16</math>), it is possible to find a distribution with <math>L^* = 0</math> such that:
<math display='block'>\mathbb E L_n \geq a_n</math>
for all <math>n</math>. This result shows that universally good classification rules do not exist, in the sense that every rule must perform poorly on at least one distribution.<ref name='patt'/>
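
The first lower bound can be illustrated numerically with a simplified construction. This is a sketch in the spirit of the standard argument, not necessarily the exact construction in the cited source. Assume <math>X</math> is uniform on <math>k</math> points and <math>Y = f(X)</math> for a labeling <math>f</math>, so that <math>L^* = 0</math>. If <math>f</math> is drawn uniformly at random, any rule errs with probability 1/2 on points absent from the sample, so its error averaged over labelings is <math>\tfrac{1}{2}(1 - 1/k)^n</math>. At least one fixed labeling must then be at least this bad for the rule. The function names and the memorizing rule below are hypothetical and chosen only for illustration.

```python
import random


def average_error_bound(k: int, n: int) -> float:
    """Error of any rule, averaged over uniformly random labelings of k points.

    A point missing from the sample carries a fair-coin label that is
    independent of the data, so any prediction there is wrong with
    probability 1/2. A point is missing with probability (1 - 1/k)**n.
    """
    return 0.5 * (1.0 - 1.0 / k) ** n


def simulate_memorizing_rule(k: int, n: int, trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[L_n] for an illustrative rule that memorizes
    the sample and predicts 0 on unseen points (an assumption of this sketch)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Random deterministic labeling f, so the Bayes risk is 0.
        labels = [rng.randint(0, 1) for _ in range(k)]
        # Distinct points hit by n i.i.d. uniform draws of X.
        seen = {rng.randrange(k) for _ in range(n)}
        # Correct on seen points; on unseen points the rule errs iff f(x) = 1.
        errors = sum(labels[x] for x in range(k) if x not in seen)
        total += errors / k  # L_n under uniform X
    return total / trials


if __name__ == "__main__":
    n = 50
    for k in (10, 100, 1000, 100000):
        print(k, average_error_bound(k, n))
    print(simulate_memorizing_rule(200, 20), average_error_bound(200, 20))
```

For a fixed sample size <math>n</math>, the averaged error approaches 1/2 as <math>k</math> grows, which is how the bound <math>1/2 - \epsilon</math> arises for any <math>\epsilon > 0</math> in this construction.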