==Theoretical explanation==
Initially, the correlation between the formula and actual winning percentage was simply an empirical observation.  In 2003, Hein Hundal provided an inexact derivation of the formula and showed that the Pythagorean exponent is approximately 2/(''σ''{{radic|{{pi}}}}), where ''σ'' is the standard deviation of runs scored by all teams divided by the average number of runs scored.<ref>{{cite news|first=Hein|last=Hundal|title=Derivation of James Pythagorean Formula (Long)|url=http://groups.google.com/group/rec.puzzles/browse_thread/thread/3be0e6ad49631ddb/bfb52d16b12955ac?q=hein+hundal+pythagorean&fwc=1}}</ref>  In 2006, Professor [[Steven J. Miller]] provided a statistical derivation of the formula<ref name=Miller2007>{{cite journal |author1=Miller |journal=Chance | volume = 20 |year=2007 | pages = 40–48 |title=A Derivation of the Pythagorean Won-Loss Formula in Baseball |arxiv=math/0509698 |bibcode=2005math......9698M |doi=10.1080/09332480.2007.10722831|s2cid=8103486 }}</ref> under some assumptions about baseball games: if the runs for each team follow a [[Weibull distribution]] and the runs scored and allowed per game are [[Independence (probability theory)|statistically independent]], then the formula gives the probability of winning.<ref name=Miller2007/>
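
The Weibull argument can be checked numerically with a short Python sketch. This is only an illustration under simplifying assumptions, not Miller's derivation: the function names, scale values, sample size and seed are invented for the example, and both run distributions are taken to start at zero with the same shape parameter (any shift parameter is omitted).

```python
import random

def pythagorean(scale_for, scale_against, exponent):
    """Pythagorean-style ratio computed from the two Weibull scale parameters.

    With a common shape parameter the mean is proportional to the scale,
    so using scales here is equivalent to using mean runs.
    """
    a = scale_for ** exponent
    b = scale_against ** exponent
    return a / (a + b)

def simulated_win_fraction(scale_for, scale_against, shape,
                           games=200_000, seed=0):
    """Fraction of simulated games in which 'runs scored' exceeds 'runs allowed'.

    Both quantities are drawn independently from Weibull distributions
    (an assumption of this sketch, mirroring the stated hypotheses).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(games):
        scored = rng.weibullvariate(scale_for, shape)
        allowed = rng.weibullvariate(scale_against, shape)
        if scored > allowed:
            wins += 1
    return wins / games

# Hypothetical values: scales 5.0 and 4.0, shape 1.83.
print(simulated_win_fraction(5.0, 4.0, 1.83))
print(pythagorean(5.0, 4.0, 1.83))
```

Under these assumptions the two printed values should agree up to sampling noise, with the Weibull shape parameter playing the role of the Pythagorean exponent.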

More simply, the Pythagorean formula with exponent 2 follows immediately from two assumptions: that baseball teams win in proportion to their "quality", and that their "quality" is measured by the ratio of their runs scored to their runs allowed.  For example, if Team A has scored 50 runs and allowed 40, its quality measure would be 50/40, or 1.25.  The quality measure for its (collective) opponent, Team B, in the games played against A, would be 40/50, or 0.8 (since runs scored by A are runs allowed by B, and vice versa).  If each team wins in proportion to its quality, A's probability of winning would be 1.25&nbsp;/&nbsp;(1.25&nbsp;+&nbsp;0.8), which equals 50<sup>2</sup>&nbsp;/&nbsp;(50<sup>2</sup>&nbsp;+&nbsp;40<sup>2</sup>), the Pythagorean formula.  The same relationship holds for any number of runs scored and allowed, as can be seen by writing the "quality" probability as [50/40]&nbsp;/&nbsp;[50/40&nbsp;+&nbsp;40/50] and [[clearing fractions]] (multiplying numerator and denominator by 50&nbsp;×&nbsp;40).
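
The equivalence described above can be sketched in a few lines of Python. The function names are illustrative choices, not established terminology.

```python
def quality_win_probability(runs_scored, runs_allowed):
    """Win probability if teams win in proportion to 'quality',
    with quality taken as the ratio of runs scored to runs allowed."""
    quality_team = runs_scored / runs_allowed
    quality_opponent = runs_allowed / runs_scored
    return quality_team / (quality_team + quality_opponent)

def pythagorean_expectation(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean formula: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

# Team A from the example: 50 runs scored, 40 allowed.
print(quality_win_probability(50, 40))   # about 0.610
print(pythagorean_expectation(50, 40))   # about 0.610
```

Both functions return the same value for any positive runs totals, which is the content of the "clearing fractions" step.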

The assumption that one measure of the quality of a team is given by the ratio of its runs scored to allowed is both natural and plausible; individual games are themselves decided by comparing runs scored with runs allowed.  (There are other natural and plausible candidates for team quality measures, which, assuming a "quality" model, lead to corresponding winning percentage expectation formulas that are roughly as accurate as the Pythagorean ones.)  The assumption that baseball teams win in proportion to their quality is not natural, but it is plausible.  It is not natural because the degree to which sports contestants win in proportion to their quality depends on the role that chance plays in the sport.  If chance plays a very large role, then even a team with much higher quality than its opponents will win only a little more often than it loses.  If chance plays a very small role, then a team with only slightly higher quality than its opponents will win much more often than it loses.  The latter is more the case in basketball, for various reasons, including that many more points are scored than in baseball; this gives the team with higher quality more opportunities to demonstrate that quality, with correspondingly fewer opportunities for chance or luck to allow the lower-quality team to win.

Baseball has just the right amount of chance in it to enable teams to win roughly in proportion to their quality, i.e., to produce a roughly Pythagorean result with exponent two.  Basketball's higher exponent of around 14 (see below) is due to the smaller role that chance plays in basketball.  The fact that the most accurate constant Pythagorean exponent for baseball is around 1.83, slightly less than 2, can be explained by there being (apparently) slightly more chance in baseball than would allow teams to win in precise proportion to their quality.  Bill James realized this long ago when he noted that his original Pythagorean formula with exponent two could be made more accurate by simply adding some constant number to the numerator and twice that constant to the denominator.  This moves the result slightly closer to .500, which is what a slightly larger role for chance would do, and what using the exponent of 1.83 (or any positive exponent less than two) does as well.  Various candidates for that constant can be tried to see which gives the "best fit" to real-life data.
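
The effect of the two adjustments can be illustrated with a small Python sketch. It assumes one reading of the description above, namely that the constant is added to the squared-runs numerator and twice the constant to the squared-runs denominator; the function names, the runs totals and the constant 50,000 are hypothetical and are not fitted to any data.

```python
def pythagorean_expectation(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean formula: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

def constant_adjusted_expectation(runs_scored, runs_allowed, constant):
    """Exponent-two formula with a constant added to the numerator and
    twice that constant added to the denominator (assumed reading)."""
    rs = runs_scored ** 2
    ra = runs_allowed ** 2
    return (rs + constant) / (rs + ra + 2 * constant)

# Hypothetical season totals and a hypothetical constant.
rs, ra, c = 800, 700, 50_000
print(pythagorean_expectation(rs, ra, 2.0))       # original formula
print(constant_adjusted_expectation(rs, ra, c))   # pulled toward .500
print(pythagorean_expectation(rs, ra, 1.83))      # also pulled toward .500
```

For a team that outscores its opponents, both adjusted values fall between .500 and the exponent-two value; for a team that is outscored, they rise toward .500 instead.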

The fact that the most accurate exponent for baseball Pythagorean formulas is a variable that depends on the total runs per game can also be explained by the role of chance: the more total runs are scored, the less likely it is that the result is due to chance rather than to the higher quality of the winning team having been manifested during the scoring opportunities.  The larger the exponent, the farther from a .500 winning percentage is the result of the corresponding Pythagorean formula, which is the same effect that a decreased role of chance creates.  That accurate variable-exponent formulas yield larger exponents as the total runs per game increases is thus in agreement with an understanding of the role that chance plays in sports.
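
How the exponent pushes the result away from .500 can be seen in a brief Python sketch. The runs totals are hypothetical; the exponents 1.83, 2 and 14 are the ones mentioned in this section, and 1 is included only as a low-exponent reference point.

```python
def pythagorean_expectation(runs_scored, runs_allowed, exponent):
    """Pythagorean formula: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

# A hypothetical team that scores 10% more than it allows.
runs_scored, runs_allowed = 110.0, 100.0

for exponent in (1.0, 1.83, 2.0, 14.0):
    pct = pythagorean_expectation(runs_scored, runs_allowed, exponent)
    print(f"exponent {exponent:>5}: expected winning percentage {pct:.3f}")
```

With the scoring ratio held fixed, the expected winning percentage moves monotonically away from .500 as the exponent grows, which corresponds to a smaller role for chance.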

In his 1981 ''Baseball Abstract'', James explicitly developed another of his formulas, called the log5 formula (which has since proven to be empirically accurate), using the notion of two teams having a head-to-head winning percentage against each other in proportion to a "quality" measure.  His quality measure was half the team's "wins ratio" (or "odds of winning").  The wins ratio or odds of winning is the ratio of the team's wins against the league to its losses against the league.  (James did not seem aware at the time that his quality measure was expressible in terms of the wins ratio.  Since in the quality model any constant factor in a quality measure eventually cancels, the quality measure is today better taken as simply the wins ratio itself, rather than half of it.)  He then stated that the Pythagorean formula, which he had earlier developed empirically for predicting winning percentage from runs, was "the same thing" as the log5 formula, though without a convincing demonstration or proof.  His purported demonstration that they were the same boiled down to showing that the two different formulas simplified to the same expression in a special case, which is itself treated vaguely, with no recognition that the special case is not the general one.  Nor did he subsequently publish any explicit, quality-based model for the Pythagorean formula.  As of 2013, there is still little public awareness in the sabermetric community that a simple "teams win in proportion to quality" model, using the runs ratio as the quality measure, leads directly to James's original Pythagorean formula.
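
The relation between log5 and the quality model can be sketched in Python. The closed form used for <code>log5</code> below is the commonly quoted one and is assumed here rather than taken from the ''Abstract''; the function names and the <code>scale</code> parameter are illustrative.

```python
def log5(pct_a, pct_b):
    """Commonly quoted log5 form: chance that A beats B, given each
    team's overall winning percentage (strictly between 0 and 1)."""
    return (pct_a - pct_a * pct_b) / (pct_a + pct_b - 2 * pct_a * pct_b)

def quality_model(pct_a, pct_b, scale=1.0):
    """Teams win in proportion to quality, with quality taken as
    scale * wins ratio, where wins ratio = pct / (1 - pct)."""
    quality_a = scale * pct_a / (1.0 - pct_a)
    quality_b = scale * pct_b / (1.0 - pct_b)
    return quality_a / (quality_a + quality_b)

# Hypothetical winning percentages for two teams.
print(log5(0.600, 0.400))
print(quality_model(0.600, 0.400, scale=0.5))  # half the wins ratio
print(quality_model(0.600, 0.400, scale=1.0))  # the wins ratio itself
```

All three calls give the same value, since the constant factor cancels between numerator and denominator.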

In the 1981 ''Abstract'', James also says that he had first tried to create a "log5" formula by simply using the winning percentages of the teams in place of the runs in the Pythagorean formula, but that it did not give valid results.  The reason, unknown to James at the time, is that his attempted formulation implies that the relative quality of teams is given by the ratio of their winning percentages.  Yet this cannot be true if teams win in proportion to their quality, since a .900 team wins against its opponents, whose overall winning percentage is roughly .500, in a 9 to 1 ratio, rather than the 9 to 5 ratio of their .900 to .500 winning percentages.  The empirical failure of his attempt led to his eventual, more circuitous (and ingenious) and successful approach to log5, which still used quality considerations, though without a full appreciation of the ultimate simplicity of the model, of its more general applicability, and of its true structural similarity to his Pythagorean formula.
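
The .900 versus .500 example can be worked through in a short Python sketch. The helper names are illustrative, and the last calculation is only one reading of James's first attempt (winning percentages substituted for runs in the exponent-two formula).

```python
def win_probability(quality_a, quality_b):
    """Teams win in proportion to their quality measures."""
    return quality_a / (quality_a + quality_b)

def wins_ratio(pct):
    """Ratio of wins to losses for a given winning percentage."""
    return pct / (1.0 - pct)

strong, average = 0.900, 0.500

# Quality = wins ratio: 9 to 1, which reproduces the .900 record.
print(win_probability(wins_ratio(strong), wins_ratio(average)))

# Quality = winning percentage: 9 to 5, roughly 0.643.
print(win_probability(strong, average))

# Percentages in place of runs in the exponent-two formula (assumed reading).
print(strong ** 2 / (strong ** 2 + average ** 2))
```

Only the wins-ratio version returns the .900 that the strong team actually achieves against roughly .500 opposition.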