{{Short description|Algorithm in numerical analysis}}
In [[numerical analysis]], the '''Kahan summation algorithm''', also known as '''compensated summation''',<ref>Strictly, there exist other variants of compensated summation as well: see {{cite book |first=Nicholas |last=Higham |title=Accuracy and Stability of Numerical Algorithms (2 ed) |publisher=SIAM |year=2002 |pages=110–123 |isbn=978-0-89871-521-7}}</ref> significantly reduces the [[numerical error]] in the total obtained by adding a [[sequence]] of finite-[[decimal precision|precision]] [[floating-point number]]s, compared to the naive approach. This is done by keeping a separate ''running compensation'' (a variable to accumulate small errors), in effect extending the precision of the sum by the precision of the compensation variable.

In particular, simply summing <math>n</math> numbers in sequence has a worst-case error that grows proportional to <math>n</math>, and a [[root mean square]] error that grows as <math>\sqrt{n}</math> for random inputs (the roundoff errors form a [[random walk]]).<ref name=Higham93>{{Citation | title=The accuracy of floating point summation | first1=Nicholas J. | last1=Higham | journal=[[SIAM Journal on Scientific Computing]] | volume=14 | issue=4 | pages=783–799 | doi=10.1137/0914050 | year=1993 | bibcode=1993SJSC...14..783H | citeseerx=10.1.1.43.3535 | s2cid=14071038 }}.</ref> With compensated summation, using a compensation variable with sufficiently high precision, the worst-case error bound is effectively independent of <math>n</math>, so a large number of values can be summed with an error that only depends on the floating-point [[precision (arithmetic)|precision]] of the result.<ref name=Higham93/>

The [[algorithm]] is attributed to [[William Kahan]];<ref name="kahan65">{{Citation |last=Kahan |first=William |title=Further remarks on reducing truncation errors |date=January 1965 |url=http://mgnet.org/~douglas/Classes/na-sc/notes/kahan.pdf |journal=[[Communications of the ACM]] |volume=8 |issue=1 |page=40 |archive-url=https://web.archive.org/web/20180209003010/http://mgnet.org/~douglas/Classes/na-sc/notes/kahan.pdf |doi=10.1145/363707.363723 |s2cid=22584810 |archive-date=9 February 2018}}.</ref> [[Ivo Babuška]] seems to have come up with a similar algorithm independently (hence '''Kahan–Babuška summation''').<ref>Babuška, I.: Numerical stability in mathematical analysis. Inf. Proc. 68, 11–23 (1969).</ref> Similar, earlier techniques include, for example, [[Bresenham's line algorithm]], which keeps track of the accumulated error in integer operations (although it was first documented around the same time<ref>{{cite journal |first=Jack E. |last=Bresenham |url=http://www.research.ibm.com/journal/sj/041/ibmsjIVRIC.pdf |title=Algorithm for computer control of a digital plotter |journal=IBM Systems Journal |volume=4 |issue=1 |date=January 1965 |pages=25–30 |doi=10.1147/sj.41.0025 |s2cid=41898371 }}</ref>), and [[delta-sigma modulation]].<ref>{{cite journal |first1=H. |last1=Inose |first2=Y. |last2=Yasuda |first3=J. |last3=Murakami |title=A Telemetering System by Code Manipulation – ΔΣ Modulation |journal=IRE Transactions on Space Electronics and Telemetry |date=September 1962 |pages=204–209|volume=SET-8| doi=10.1109/IRET-SET.1962.5008839 |s2cid=51647729 }}</ref>

==The algorithm==
In [[pseudocode]], the algorithm is:

 '''function''' KahanSum(input)
     // Prepare the accumulator.
     '''var''' sum = 0.0
     // A running compensation for lost low-order bits.
     '''var''' c = 0.0
     // The array ''input'' has elements indexed input[1] to input[input.length].
     '''for''' i = 1 '''to''' input.length '''do'''
         // ''c'' is zero the first time around.
         '''var''' y = input[i] - c
         // Alas, ''sum'' is big, ''y'' small, so low-order digits of ''y'' are lost.
         '''var''' t = sum + y
         // ''(t - sum)'' cancels the high-order part of ''y'';
         // subtracting ''y'' recovers the negative of the low part of ''y''.
         c = (t - sum) - y
         // Algebraically, ''c'' should always be zero. Beware
         // overly-aggressive optimizing compilers!
         sum = t
         // Next time around, the lost low part will be added to ''y'' in a fresh attempt.
     '''next''' i

     '''return''' sum
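For illustration, a straightforward translation of this pseudocode into Python might look like the following (the function name <code>kahan_sum</code> and the assumption that the input is an ordinary iterable of floats are illustrative choices, not part of the original formulation):

<syntaxhighlight lang="python">
def kahan_sum(values):
    """Sum floats with Kahan's compensated summation."""
    total = 0.0  # running sum ("sum" in the pseudocode)
    c = 0.0      # running compensation for lost low-order bits
    for x in values:
        y = x - c            # apply the compensation from the previous step
        t = total + y        # low-order digits of y may be lost in this addition
        c = (t - total) - y  # recover what was lost (algebraically zero)
        total = t
    return total
</syntaxhighlight>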
This algorithm can also be rewritten to use the [[2Sum|Fast2Sum]] algorithm:<ref>{{cite book |author-last1=Muller |author-first1=Jean-Michel |author-last2=Brunie |author-first2=Nicolas |author-last3=de Dinechin |author-first3=Florent |author-last4=Jeannerod |author-first4=Claude-Pierre |author-first5=Mioara |author-last5=Joldes |author-last6=Lefèvre |author-first6=Vincent |author-last7=Melquiond |author-first7=Guillaume |author-last8=Revol |author-first8=Nathalie|author8-link=Nathalie Revol |author-last9=Torres |author-first9=Serge |title=Handbook of Floating-Point Arithmetic |date=2018 |orig-year=2010 |publisher=[[Birkhäuser]] |edition=2 |isbn=978-3-319-76525-9 |lccn=2018935254 |doi=10.1007/978-3-319-76526-6 |url=https://books.google.com/books?id=h3ZZDwAAQBAJ |page=179}}</ref>

 '''function''' KahanSum2(input)
     // Prepare the accumulator.
     '''var''' sum = 0.0
     // A running compensation for lost low-order bits.
     '''var''' c = 0.0
     // The array ''input'' has elements indexed input[1] to input[input.length].
     '''for''' i = 1 '''to''' input.length '''do'''
         // ''c'' is zero the first time around.
         '''var''' y = input[i] + c
         // ''sum'' + ''c'' is an approximation to the exact sum.
         (sum,c) = Fast2Sum(sum,y)
         // Next time around, the lost low part will be added to ''y'' in a fresh attempt.
     '''next''' i

     '''return''' sum

===Worked example===
The algorithm does not mandate any specific choice of [[radix]], only that the arithmetic "normalize floating-point sums before rounding or truncating".<ref name="kahan65" /> Computers typically use binary arithmetic, but to make the example easier to read, it will be given in decimal. Suppose we are using six-digit decimal [[floating-point arithmetic]], <code>sum</code> has attained the value 10000.0, and the next two values of <code>input[i]</code> are 3.14159 and 2.71828. The exact result is 10005.85987, which rounds to 10005.9.

With plain summation, each incoming value would be aligned with <code>sum</code>, and many low-order digits would be lost (by truncation or rounding). The first result, after rounding, would be 10003.1. The second result would be 10005.81828 before rounding and 10005.8 after rounding, which is not correct. However, with compensated summation, we get the correctly rounded result of 10005.9.

Assume that <code>c</code> has the initial value zero. Trailing zeros are shown where they are significant for the six-digit floating-point numbers.

  y = 3.14159 - 0.00000             ''y = input[i] - c''
  t = 10000.0 + 3.14159             ''t = sum + y''
    = 10003.14159                   Normalization done, next round off to six digits.
    = 10003.1                       Few digits from ''input[i]'' met those of ''sum''. Many digits have been lost!
  c = (10003.1 - 10000.0) - 3.14159 ''c = (t - sum) - y''  (Note: the parentheses '''must''' be evaluated first!)
    = 3.10000 - 3.14159             The assimilated part of ''y'' minus the original full ''y''.
    = -0.0415900                    Because ''c'' is close to zero, normalization retains many digits after the decimal point.
  sum = 10003.1                     ''sum = t''  The sum is so large that only the high-order digits of the input
                                    numbers are being accumulated. But on the next step, ''c'', an approximation
                                    of the running error, counteracts the problem.
  y = 2.71828 - (-0.0415900)        Most digits meet, since ''c'' is of a size similar to ''y''.
    = 2.75987                       The shortfall (low-order digits lost) of the previous iteration is successfully reinstated.
  t = 10003.1 + 2.75987             But still only a few digits meet those of ''sum''.
    = 10005.85987                   Normalization done, next round to six digits.
    = 10005.9                       Again, many digits have been lost, but ''c'' helped nudge the round-off.
  c = (10005.9 - 10003.1) - 2.75987 Estimate the accumulated error, based on the adjusted ''y''.
    = 2.80000 - 2.75987             As expected, the low-order parts can be retained in ''c'' with no or minor round-off effects.
    = 0.0401300                     In this iteration, ''t'' was a bit too high; the excess will be subtracted in the next iteration.
  sum = 10005.9                     The exact result is 10005.85987; ''sum'' is correct, rounded to six digits.

The algorithm performs summation with two accumulators: <code>sum</code> holds the sum, and <code>c</code> accumulates the parts not assimilated into <code>sum</code>, to nudge the low-order part of <code>sum</code> the next time around. Thus the summation proceeds with "guard digits" in <code>c</code>, which is better than not having any, but is not as good as performing the calculations with double the precision of the input. However, simply increasing the precision of the calculations is not practical in general; if <code>input</code> is already in double precision, few systems supply [[quadruple precision]], and if they did, <code>input</code> could then be in quadruple precision.
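This trace can be reproduced mechanically. For instance, the following short sketch uses Python's <code>decimal</code> module with a six-significant-digit context to re-run the two steps above (variable names follow the pseudocode; the trailing underscore in <code>sum_</code> only avoids shadowing the built-in):

<syntaxhighlight lang="python">
from decimal import Decimal, getcontext

getcontext().prec = 6                  # six significant decimal digits, as in the example

sum_ = Decimal("10000.0")              # the running sum from the text
c = Decimal("0")                       # compensation starts at zero
for x in (Decimal("3.14159"), Decimal("2.71828")):
    y = x - c                          # apply the previous compensation
    t = sum_ + y                       # rounded to six digits here
    c = (t - sum_) - y                 # the part lost (or over-counted) by that rounding
    sum_ = t

print(sum_)                            # 10005.9, the correctly rounded total
</syntaxhighlight>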
==Accuracy==
{{Confusing|1=section|reason=this section uses ε, while there exist [[Machine epsilon#Variant definitions|two conflicting definitions]]. I suggest to use '''u''' instead, whose definition is rather standard|date=December 2021}}
A careful analysis of the errors in compensated summation is needed to appreciate its accuracy characteristics. While it is more accurate than naive summation, it can still give large relative errors for ill-conditioned sums.

Suppose that one is summing <math>n</math> values <math>x_i</math>, for <math>i = 1, \, \ldots, \, n</math>. The exact sum is
: <math>S_n = \sum_{i=1}^n x_i</math> (computed with infinite precision).

With compensated summation, one instead obtains <math>S_n + E_n</math>, where the error <math>E_n</math> is bounded by<ref name=Higham93/>
: <math>|E_n| \le \big[2\varepsilon + O(n\varepsilon^2)\big] \sum_{i=1}^n |x_i|,</math>
where <math>\varepsilon</math> is the [[machine precision]] of the arithmetic being employed (e.g. <math>\varepsilon \approx 10^{-16}</math> for IEEE standard [[double-precision]] floating point). Usually, the quantity of interest is the [[relative error]] <math>|E_n|/|S_n|</math>, which is therefore bounded above by
: <math>\frac{|E_n|}{|S_n|} \le \big[2\varepsilon + O(n\varepsilon^2)\big] \frac{\sum\limits_{i=1}^n |x_i|}{\left|\sum\limits_{i=1}^n x_i\right|}.</math>

In the expression for the relative error bound, the fraction <math>\Sigma |x_i| / |\Sigma x_i|</math> is the [[condition number]] of the summation problem. Essentially, the condition number represents the ''intrinsic'' sensitivity of the summation problem to errors, regardless of how it is computed.<ref>{{cite book |first1=Lloyd N. |authorlink1=Lloyd N. Trefethen |last1=Trefethen |first2=David |last2=Bau |title=Numerical Linear Algebra |publisher=SIAM |location=Philadelphia |year=1997 |isbn=978-0-89871-361-9}}</ref> The relative error bound of ''every'' ([[backwards stable]]) summation method by a fixed algorithm in fixed precision (i.e. not those that use [[arbitrary-precision]] arithmetic, nor algorithms whose memory and time requirements change based on the data) is proportional to this condition number.<ref name=Higham93/> An ''ill-conditioned'' summation problem is one in which this ratio is large, and in this case even compensated summation can have a large relative error. For example, if the summands <math>x_i</math> are uncorrelated random numbers with zero mean, the sum is a [[random walk]], and the condition number will grow proportional to <math>\sqrt{n}</math>. On the other hand, for random inputs with nonzero mean the condition number asymptotes to a finite constant as <math>n \to \infty</math>. If the inputs are all [[non-negative]], then the condition number is 1.

Given a condition number, the relative error of compensated summation is effectively independent of <math>n</math>. In principle, there is an <math>O(n \varepsilon^2)</math> term that grows linearly with <math>n</math>, but in practice this term is effectively zero: since the final result is rounded to a precision <math>\varepsilon</math>, the <math>n \varepsilon^2</math> term rounds to zero unless <math>n</math> is roughly <math>1 / \varepsilon</math> or larger.<ref name=Higham93/> In double precision, this corresponds to an <math>n</math> of roughly <math>10^{16}</math>, much larger than most sums. So, for a fixed condition number, the errors of compensated summation are effectively <math>O(\varepsilon)</math>, independent of <math>n</math>.

In comparison, the relative error bound for naive summation (simply adding the numbers in sequence, rounding at each step) grows as <math>O(\varepsilon n)</math> multiplied by the condition number.<ref name=Higham93/> This worst-case error is rarely observed in practice, however, because it only occurs if the rounding errors are all in the same direction. In practice, it is much more likely that the rounding errors have a random sign, with zero mean, so that they form a random walk; in this case, naive summation has a [[root mean square]] relative error that grows as <math>O\left(\varepsilon \sqrt{n}\right)</math> multiplied by the condition number.<ref name=Tasche>Manfred Tasche and Hansmartin Zeuner, ''Handbook of Analytic-Computational Methods in Applied Mathematics'', Boca Raton, FL: CRC Press, 2000.</ref> This is still much worse than compensated summation. If, however, the sum can be performed in twice the precision, then <math>\varepsilon</math> is replaced by <math>\varepsilon^2</math>, and naive summation has a worst-case error comparable to the <math>O(n \varepsilon^2)</math> term in compensated summation at the original precision.
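These growth rates are easy to observe empirically. A small, illustrative experiment along the following lines (using Python's <code>math.fsum</code>, mentioned under "Support by libraries" below, as a correctly rounded reference, and reusing the compensated-summation sketch from above) compares the errors of naive and compensated summation on random inputs:

<syntaxhighlight lang="python">
import math
import random

def kahan_sum(values):
    total, c = 0.0, 0.0
    for x in values:
        y = x - c
        t = total + y
        c = (t - total) - y
        total = t
    return total

random.seed(0)
data = [random.uniform(0.0, 1.0) for _ in range(10**6)]

naive = 0.0
for x in data:              # naive recursive summation, rounding at every step
    naive += x

exact = math.fsum(data)     # correctly rounded reference sum
print("naive error:", abs(naive - exact))
print("Kahan error:", abs(kahan_sum(data) - exact))
</syntaxhighlight>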
By the same token, the <math>\Sigma |x_i|</math> that appears in <math>E_n</math> above is a worst-case bound that occurs only if all the rounding errors have the same sign (and are of maximal possible magnitude).<ref name=Higham93/> In practice, it is more likely that the errors have random sign, in which case the terms in <math>\Sigma |x_i|</math> are replaced by a random walk; then, even for random inputs with zero mean, the error <math>E_n</math> grows only as <math>O\left(\varepsilon \sqrt{n}\right)</math> (ignoring the <math>n \varepsilon^2</math> term), the same rate at which the sum <math>S_n</math> grows, canceling the <math>\sqrt{n}</math> factors when the relative error is computed. So, even for asymptotically ill-conditioned sums, the relative error for compensated summation can often be much smaller than a worst-case analysis might suggest.

==Further enhancements==
Neumaier<ref name=Neumaier74>{{cite journal |first=A. |last=Neumaier |doi=10.1002/zamm.19740540106 |bibcode=1974ZaMM...54...39N |title=Rundungsfehleranalyse einiger Verfahren zur Summation endlicher Summen |trans-title=Rounding Error Analysis of Some Methods for Summing Finite Sums |language=de |journal=[[Zeitschrift für Angewandte Mathematik und Mechanik]] |volume=54 |issue=1 |date=1974 |pages=39–51 |url=https://www.mat.univie.ac.at/~neum/scan/01.pdf}}</ref> introduced an improved version of the Kahan algorithm, which he calls an "improved Kahan–Babuška algorithm", that also covers the case when the next term to be added is larger in absolute value than the running sum, effectively swapping the roles of what is large and what is small. In [[pseudocode]], the algorithm is:

 '''function''' KahanBabushkaNeumaierSum(input)
     '''var''' sum = 0.0
     '''var''' c = 0.0                       // A running compensation for lost low-order bits.

     '''for''' i = 1 '''to''' input.length '''do'''
         '''var''' t = sum + input[i]
         '''if''' |sum| >= |input[i]| '''then'''
             c += (sum - t) + input[i]       // If ''sum'' is bigger, low-order digits of ''input[i]'' are lost.
         '''else'''
             c += (input[i] - t) + sum       // Else low-order digits of ''sum'' are lost.
         '''endif'''
         sum = t
     '''next''' i

     '''return''' sum + c                    // Correction only applied once in the very end.

This enhancement is similar to the Fast2Sum version of Kahan's algorithm with Fast2Sum replaced by [[2Sum]]. For many sequences of numbers, both algorithms agree, but a simple example due to Peters<ref name="python_fsum" /> shows how they can differ: summing <math>[1.0, +10^{100}, 1.0, -10^{100}]</math> in double precision, Kahan's algorithm yields 0.0, whereas Neumaier's algorithm yields the correct value 2.0.
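A direct Python transcription of the Neumaier variant, run on Peters's example, illustrates the difference (the function name is an illustrative choice; the expected outputs restate the values given above):

<syntaxhighlight lang="python">
def neumaier_sum(values):
    """Kahan–Babuška ("improved Kahan") compensated summation."""
    total = 0.0
    c = 0.0                       # running compensation for lost low-order bits
    for x in values:
        t = total + x
        if abs(total) >= abs(x):
            c += (total - t) + x  # low-order digits of x are lost
        else:
            c += (x - t) + total  # low-order digits of total are lost
        total = t
    return total + c              # the correction is applied only once, at the end

print(neumaier_sum([1.0, 1e100, 1.0, -1e100]))  # 2.0 (plain Kahan summation gives 0.0 here)
</syntaxhighlight>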
Higher-order modifications of better accuracy are also possible. For example, Klein<ref>{{Cite journal |last=Klein |first=A. |date=2006 |title=A generalized Kahan–Babuška-Summation-Algorithm |journal=Computing |volume=76 |issue=3–4 |pages=279–293 |publisher=Springer-Verlag |doi=10.1007/s00607-005-0139-x |s2cid=4561254 }}</ref> suggested a variant that he called a second-order "iterative Kahan–Babuška algorithm". In [[pseudocode]], the algorithm is:

 '''function''' KahanBabushkaKleinSum(input)
     '''var''' sum = 0.0
     '''var''' cs = 0.0
     '''var''' ccs = 0.0

     '''for''' i = 1 '''to''' input.length '''do'''
         '''var''' c, cc
         '''var''' t = sum + input[i]
         '''if''' |sum| >= |input[i]| '''then'''
             c = (sum - t) + input[i]
         '''else'''
             c = (input[i] - t) + sum
         '''endif'''
         sum = t
         t = cs + c
         '''if''' |cs| >= |c| '''then'''
             cc = (cs - t) + c
         '''else'''
             cc = (c - t) + cs
         '''endif'''
         cs = t
         ccs = ccs + cc
     '''end loop'''

     '''return''' sum + (cs + ccs)

==Alternatives==
Although Kahan's algorithm achieves <math>O(1)</math> error growth for summing ''n'' numbers, only slightly worse <math>O(\log n)</math> growth can be achieved by [[pairwise summation]]: one [[recursively]] divides the set of numbers into two halves, sums each half, and then adds the two sums.<ref name=Higham93/> This has the advantage of requiring the same number of arithmetic operations as the naive summation (unlike Kahan's algorithm, which requires four times the arithmetic and has a latency of four times a simple summation) and can be calculated in parallel. The base case of the recursion could in principle be the sum of only one (or zero) numbers, but to amortize the overhead of recursion, one would normally use a larger base case. The equivalent of pairwise summation is used in many [[fast Fourier transform]] (FFT) algorithms and is responsible for the logarithmic growth of roundoff errors in those FFTs.<ref>{{cite web |last1=Johnson |first1=S.G. |last2=Frigo |first2=M. |editor=C. Sidney Burns |title=Fast Fourier Transforms: Implementing FFTs in Practice |archive-url=https://web.archive.org/web/20081220074554/http://cnx.org/content/m16336/latest/ |archive-date=Dec 20, 2008 |url=http://cnx.org/content/m16336/latest/ |url-status=dead}}</ref> In practice, with roundoff errors of random signs, the root mean square errors of pairwise summation actually grow as <math>O\left(\sqrt{\log n}\right)</math>.<ref name=Tasche/>
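A recursive sketch of this pairwise scheme in Python (the block size of 128 for the enlarged base case is an arbitrary illustrative choice) could be:

<syntaxhighlight lang="python">
def pairwise_sum(values, block=128):
    """Recursively split the input in half and add the two partial sums."""
    n = len(values)
    if n <= block:             # enlarged base case: plain summation of a short run
        total = 0.0
        for x in values:
            total += x
        return total
    mid = n // 2
    return pairwise_sum(values[:mid], block) + pairwise_sum(values[mid:], block)
</syntaxhighlight>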
Another alternative is to use [[arbitrary-precision arithmetic]], which in principle needs no rounding at all, at a cost of much greater computational effort. A way of performing correctly rounded sums using arbitrary precision is to extend the precision adaptively using multiple floating-point components. This will minimize computational cost in common cases where high precision is not needed.<ref>{{cite journal |last1=Richard Shewchuk |first1=Jonathan |title=Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates |journal=Discrete & Computational Geometry |date=October 1997 |volume=18 |issue=3 |pages=305–363 |doi=10.1007/PL00009321 |s2cid=189937041 |url=https://people.eecs.berkeley.edu/~jrs/papers/robustr.pdf}}</ref><ref name="python_fsum">{{cite web |last1=Hettinger |first1=R. |title=Improve accuracy of builtin sum() for float inputs · Issue #100425 · python/cpython |url=https://github.com/python/cpython/issues/100425 |website=GitHub - CPython v3.12 Added Features |access-date=7 October 2023 |language=en}}</ref> Another method that uses only integer arithmetic, but a large accumulator, was described by Kirchner and [[Ulrich W. Kulisch|Kulisch]];<ref>{{cite journal |last1=Kirchner |first1=R. |last2=Kulisch |first2=U. |title=Accurate arithmetic for vector processors |journal=Journal of Parallel and Distributed Computing |date=June 1988 |volume=5 |issue=3 |pages=250–270 |doi=10.1016/0743-7315(88)90020-2 |url=https://www.sciencedirect.com/science/article/abs/pii/0743731588900202|url-access=subscription }}</ref> a hardware implementation was described by Müller, Rüb and Rülling.<ref>{{cite conference |last1=Muller |first1=M. |last2=Rub |first2=C. |last3=Rulling |first3=W. |title=Exact accumulation of floating-point numbers |date=1991 |pages=64–69 |conference=Proceedings 10th IEEE Symposium on Computer Arithmetic |doi=10.1109/ARITH.1991.145535 |url=https://ieeexplore.ieee.org/document/145535|url-access=subscription }}</ref>

==Possible invalidation by compiler optimization==
In principle, a sufficiently aggressive [[Compiler optimization|optimizing compiler]] could destroy the effectiveness of Kahan summation: for example, if the compiler simplified expressions according to the [[associativity]] rules of real arithmetic, it might "simplify" the second step in the sequence
: <code>t = sum + y;</code>
: <code>c = (t - sum) - y;</code>
to
: <code>c = ((sum + y) - sum) - y;</code>
and then to
: <code>c = 0;</code>
thus eliminating the error compensation.<ref name=Goldberg91>{{Citation | title=What every computer scientist should know about floating-point arithmetic |first1=David | last1=Goldberg |journal=[[ACM Computing Surveys]] | volume=23 | issue=1 | pages=5–48 | date=March 1991 |doi=10.1145/103162.103163 |s2cid=222008826 |url=http://www.validlab.com/goldberg/paper.pdf }}.</ref> In practice, many compilers do not use associativity rules (which are only approximate in floating-point arithmetic) in simplifications unless explicitly directed to do so by compiler options enabling "unsafe" optimizations,<ref>[[GNU Compiler Collection]] manual, version 4.4.3: [https://gcc.gnu.org/onlinedocs/gcc-4.4.3/gcc/Optimize-Options.html 3.10 Options That Control Optimization], ''-fassociative-math'' (Jan. 21, 2010).</ref><ref>''[http://h21007.www2.hp.com/portal/download/files/unprot/Fortran/docs/unix-um/dfumperf.htm Compaq Fortran User Manual for Tru64 UNIX and Linux Alpha Systems] {{Webarchive|url=https://web.archive.org/web/20110607140000/http://h21007.www2.hp.com/portal/download/files/unprot/Fortran/docs/unix-um/dfumperf.htm |date=2011-06-07 }}'', section 5.9.7 Arithmetic Reordering Optimizations (retrieved March 2010).</ref><ref>Börje Lindh, [http://www.sun.com/blueprints/0302/optimize.pdf Application Performance Optimization], ''Sun BluePrints OnLine'' (March 2002).</ref><ref>Eric Fleegal, "[http://msdn.microsoft.com/en-us/library/aa289157%28VS.71%29.aspx Microsoft Visual C++ Floating-Point Optimization]", ''Microsoft Visual Studio Technical Articles'' (June 2004).</ref> although the [[Intel C++ Compiler]] is one example that allows associativity-based transformations by default.<ref>Martyn J. Corden, "[http://software.intel.com/en-us/articles/consistency-of-floating-point-results-using-the-intel-compiler/ Consistency of floating-point results using the Intel compiler]", ''Intel technical report'' (Sep. 18, 2009).</ref>
The original [[K&R C]] version of the [[C programming language]] allowed the compiler to re-order floating-point expressions according to real-arithmetic associativity rules, but the subsequent [[ANSI C]] standard prohibited re-ordering in order to make C better suited for numerical applications (and more similar to [[Fortran]], which also prohibits re-ordering),<ref>{{cite journal |first=Tom |last=MacDonald |title=C for Numerical Computing |journal=Journal of Supercomputing |volume=5 |issue=1 |pages=31–48 |year=1991 |doi=10.1007/BF00155856|s2cid=27876900 }}</ref> although in practice compiler options can re-enable re-ordering, as mentioned above.

A portable way to inhibit such optimizations locally is to break one of the lines in the original formulation into two statements, and make two of the intermediate products [[Volatile variable|volatile]]:

 '''function''' KahanSum(input)
     '''var''' sum = 0.0
     '''var''' c = 0.0

     '''for''' i = 1 '''to''' input.length '''do'''
         '''var''' y = input[i] - c
         '''volatile var''' t = sum + y
         '''volatile var''' z = t - sum
         c = z - y
         sum = t
     '''next''' i

     '''return''' sum

==Support by libraries==
In general, built-in "sum" functions in computer languages typically provide no guarantees that a particular summation algorithm will be employed, much less Kahan summation.{{Citation needed|date=February 2010}} The [[BLAS]] standard for [[linear algebra]] subroutines explicitly avoids mandating any particular computational order of operations for performance reasons,<ref>[http://www.netlib.org/blas/blast-forum/ BLAS Technical Forum], section 2.7 (August 21, 2001), [https://web.archive.org/web/20040410160918/http://www.netlib.org/blas/blast-forum/chapter2.pdf#page=17 Archived on Wayback Machine].</ref> and BLAS implementations typically do not use Kahan summation.

The standard library of the [[Python (programming language)|Python]] language specifies an [https://docs.python.org/library/math.html#math.fsum fsum] function for accurate summation. Starting with Python 3.12, the built-in "sum()" function uses Neumaier summation.<ref>[https://docs.python.org/3.12/whatsnew/3.12.html#other-language-changes What's New in Python 3.12].</ref>

In the [[Julia (programming language)|Julia]] language, the default implementation of the <code>sum</code> function does [[pairwise summation]] for high accuracy with good performance,<ref>[https://github.com/JuliaLang/julia/pull/4039 RFC: use pairwise summation for sum, cumsum, and cumprod], github.com/JuliaLang/julia pull request #4039 (August 2013).</ref> but an external library provides an implementation of Neumaier's variant named <code>sum_kbn</code> for the cases when higher accuracy is needed.<ref>[https://github.com/JuliaMath/KahanSummation.jl KahanSummation library] in Julia.</ref>

In the [[C Sharp (programming language)|C#]] language, the [https://www.nuget.org/packages/HPCsharp HPCsharp] NuGet package implements the Neumaier variant and [[pairwise summation]] in scalar form, in data-parallel form using [[SIMD]] processor instructions, and in parallel multi-core form.<ref>[https://github.com/DragonSpit/HPCsharp HPCsharp nuget package of high performance algorithms].</ref>

==See also==
* [[Algorithms for calculating variance]], which includes stable summation

==References==
{{reflist|30em}}

==External links==
* [http://www.ddj.com/cpp/184403224 Floating-point Summation, Dr. Dobb's Journal September, 1996]

{{DEFAULTSORT:Kahan Summation Algorithm}}
[[Category:Computer arithmetic]]
[[Category:Floating point]]
[[Category:Numerical analysis]]
[[Category:Articles with example pseudocode]]