==Empirical origin==
Empirically, this formula correlates fairly well with how baseball teams actually perform. However, statisticians have found since its introduction that it carries a fairly consistent error, typically missing a team's actual win total by about three games.

For example, the [[2002 New York Yankees season|2002 New York Yankees]] scored 897 runs and allowed 697 runs: according to James' original formula, the Yankees should have finished with a win percentage of .624.

:<math>\text{Win} = \frac{897^2}{897^2 + 697^2} = 0.624</math>

Based on a 162-game season, the 2002 Yankees should have finished 101–61; they actually finished 103–58.<ref>{{cite web|url=https://www.baseball-reference.com/teams/NYY/2002.shtml|title=2002 New York Yankees|work=Baseball-Reference.com|access-date=7 May 2016}}</ref>
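
The arithmetic above can be reproduced with a short script. This is only an illustrative sketch: the function name <code>pythagorean_win_pct</code> is a hypothetical helper, and rounding the expected wins to the nearest whole game is an assumption about how the 101–61 figure is obtained.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed.

    Hypothetical helper; exponent=2 corresponds to James' original formula.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# 2002 New York Yankees figures quoted in the text.
win_pct = pythagorean_win_pct(897, 697)

# Assumption: expected wins are rounded to the nearest game over 162 games.
expected_wins = round(win_pct * 162)
expected_losses = 162 - expected_wins

print(f"{win_pct:.3f}")                      # 0.624
print(f"{expected_wins}-{expected_losses}")  # 101-61
```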

In an effort to reduce this error, statisticians have searched extensively for the ideal exponent.

Among fixed (single-number) exponents, 1.83 is the most accurate, and it is the one used by baseball-reference.com.<ref>{{cite web|url=https://www.sports-reference.com/blog/baseball-reference-faqs/|title=Frequently Asked Questions|work=Baseball-Reference.com|access-date=7 May 2016}}</ref> The updated formula therefore reads as follows:

:<math>\text{Win} = \frac{\text{runs scored}^{1.83}}{\text{runs scored}^{1.83} + \text{runs allowed}^{1.83}} = \frac{1}{1+(\text{runs allowed}/\text{runs scored})^{1.83}}</math>
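
As a rough check, the two algebraic forms of the formula can be evaluated side by side. The sketch below uses hypothetical function names and reuses the 2002 Yankees run totals from the earlier example; it is not taken from any particular site's implementation.

```python
import math


def win_pct_power_form(runs_scored, runs_allowed, exponent=1.83):
    """RS^x / (RS^x + RA^x), the first form of the formula."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def win_pct_ratio_form(runs_scored, runs_allowed, exponent=1.83):
    """1 / (1 + (RA/RS)^x), the second form (requires runs_scored > 0)."""
    return 1.0 / (1.0 + (runs_allowed / runs_scored) ** exponent)


# 2002 New York Yankees: 897 runs scored, 697 runs allowed.
a = win_pct_power_form(897, 697)
b = win_pct_ratio_form(897, 697)

# Both forms agree up to floating-point rounding.
assert math.isclose(a, b)

# With exponent 1.83 the estimate is somewhat lower than with exponent 2.
print(round(a, 3), round(win_pct_power_form(897, 697, exponent=2.0), 3))
```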

Other approaches let the exponent vary with the run environment. The most widely known of these is the Pythagenport formula<ref name="baseballprospectus">{{cite web|url=http://www.baseballprospectus.com/article.php?articleid=342|title=Baseball Prospectus – Revisiting the Pythagorean Theorem|work=Baseball Prospectus|date=30 June 1999 |access-date=7 May 2016}}</ref> developed by [[Clay Davenport]] of [[Baseball Prospectus]]:
:<math>\mathrm{Exponent} = 1.50 \log\left(\frac{\text{runs scored} + \text{runs allowed}}{\text{games}}\right) +0.45</math>

He concluded that the exponent should be calculated for each team from its runs scored, runs allowed, and games played. By letting the exponent vary rather than fixing a single value for every team and season, Davenport was able to report a root-mean-square error of 3.991, as opposed to 4.126 for an exponent of 2.<ref name="baseballprospectus" />
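
A minimal sketch of the Pythagenport calculation follows. It assumes the logarithm is base 10, which the formula as written does not state explicitly but which gives exponents near 2 at typical run levels. The function names are hypothetical, and the 161 games are inferred from the 103–58 record quoted earlier.

```python
import math


def pythagenport_exponent(runs_scored, runs_allowed, games):
    """Exponent = 1.50 * log10(runs per game, both teams) + 0.45.

    Assumption: 'log' in the formula is the base-10 logarithm.
    """
    runs_per_game = (runs_scored + runs_allowed) / games
    return 1.50 * math.log10(runs_per_game) + 0.45


def expected_win_pct(runs_scored, runs_allowed, exponent):
    """Pythagorean expectation with an arbitrary exponent."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# 2002 Yankees: 897 scored, 697 allowed, 161 games (103 wins + 58 losses).
x = pythagenport_exponent(897, 697, 161)
print(round(x, 2), round(expected_win_pct(897, 697, x), 3))
```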

Less well known but equally (if not more) effective is the {{visible anchor|Pythagenpat}} formula, developed by David Smyth.<ref>{{cite web|url=http://gosu02.tripod.com/id69.html|title=W% Estimators|access-date=7 May 2016}}</ref>
:<math>\text{Exponent} = \left(\frac{\text{runs scored} + \text{runs allowed}}{\text{games}}\right)^{0.287} </math>

Davenport expressed his support for this formula, saying: <blockquote>
After further review, I (Clay) have come to the conclusion that the so-called Smyth/Patriot method, aka Pythagenpat, is a better fit. In that, ''X''&nbsp;=&nbsp;((''rs''&nbsp;+&nbsp;''ra'')/''g'')<sup>0.287</sup>, although there is some wiggle room for disagreement in the exponent. Anyway, that equation is simpler, more elegant, and gets the better answer over a wider range of runs scored than Pythagenport, including the mandatory value of 1 at 1&nbsp;rpg.<ref>{{cite web|url=http://baseballprospectus.com/glossary/index.php?mode=viewstat&stat=136|title=Baseball Prospectus – Glossary|access-date=7 May 2016}}</ref>
</blockquote>
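
The Pythagenpat exponent can be sketched in the same way. The function name is hypothetical, the 0.287 power is the value quoted above, and the 161 games for the 2002 Yankees are inferred from their 103–58 record. The final check illustrates the property Davenport mentions: the exponent equals 1 when teams combine for one run per game.

```python
def pythagenpat_exponent(runs_scored, runs_allowed, games, power=0.287):
    """Exponent = ((RS + RA) / G) ** 0.287 (hypothetical helper)."""
    runs_per_game = (runs_scored + runs_allowed) / games
    return runs_per_game ** power


# 2002 Yankees: 897 scored, 697 allowed, 161 games played.
x = pythagenpat_exponent(897, 697, 161)
win_pct = 897 ** x / (897 ** x + 697 ** x)
print(round(x, 2), round(win_pct, 3))

# At exactly one run per game, the exponent is 1.
print(pythagenpat_exponent(1, 0, 1))  # 1.0
```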

These formulas are only necessary when dealing with extreme situations in which the average number of runs scored per game is either very high or very low. For most situations, simply squaring each variable yields accurate results.

Actual and expected winning percentages can differ systematically for several reasons, including [[bullpen]] quality and luck. In addition, the formula tends to [[Regression toward the mean|regress toward the mean]]: teams that win many games tend to be underrated by the formula (meaning they "should" have won fewer games), and teams that lose many games tend to be overrated (they "should" have won more).

A notable example is the [[2016 Texas Rangers season|2016 Texas Rangers]], who beat their predicted record by 13 games, finishing 95–67 while having an expected win–loss record of 82–80.