
=== Addition ===

Machine addition consists of aligning the radix points of the two numbers to be added, adding them, and then storing the result as a floating-point number. The addition itself can be carried out in higher precision, but the result must be rounded back to the specified precision, which may introduce roundoff error.<ref name="Forrester_2018"/>

* For example, adding <math>2^{-53}</math> to <math>1</math> in IEEE double precision proceeds as follows:{{Break}}<math>\begin{align}
1.00\ldots 0 \times 2^{0} + 1.00\ldots 0 \times 2^{-53} &= 1.\underbrace{00\ldots 0}_\text{52 bits} \times 2^{0} + 0.\underbrace{00\ldots 0}_\text{52 bits}1 \times 2^{0}\\
&= 1.\underbrace{00\ldots 0}_\text{52 bits}1\times 2^{0}.
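\end{align}</math>{{Break}}The exact sum lies halfway between two adjacent double-precision numbers, and it is stored as <math>1.\underbrace{00\ldots 0}_\text{52 bits}\times 2^{0}</math> because the default rounding mode of the IEEE standard is round-to-nearest, with ties going to the value whose last significand bit is even. Therefore, <math>1+2^{-53}</math> is equal to <math>1</math> in IEEE double precision, and the roundoff error is <math>2^{-53}</math>.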
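
The calculation above can be checked with a short Python sketch. It assumes that Python's <code>float</code> is an IEEE double, which holds on common platforms; the variable names are illustrative only.

```python
import sys

one = 1.0
tiny = 2.0 ** -53   # exactly representable as a double

# The exact sum 1 + 2**-53 is a tie between 1 and the next double,
# so round-to-nearest (ties to even) returns 1.
absorbed = one + tiny
print(absorbed == one)   # True

# Machine epsilon (2**-52) is the gap between 1.0 and the next double,
# so adding it to 1.0 is exact and the result differs from 1.0.
eps = sys.float_info.epsilon
print(one + eps > one)   # True
```

The second comparison shows that the loss is specific to increments at or below half the spacing of the doubles near <math>1</math>.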

This example shows that roundoff error can be introduced when adding a large number and a small number. Shifting the radix point of the smaller operand's significand so that the exponents match discards some of its less significant digits. This loss of precision may be described as '''absorption'''.<ref>{{cite book |last1=Biran |first1=Adrian B. |last2=Breiner |first2=Moshe |title=What Every Engineer Should Know About MATLAB and Simulink |date=2010 |publisher=[[CRC Press]] |publication-place=[[Boca Raton]], [[Florida]] |isbn=978-1-4398-1023-1 |pages=193–194 |chapter=5}}</ref>
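
As a rough illustration of absorption, the following Python sketch (again assuming IEEE double <code>float</code> values, with illustrative variable names) adds the same small terms to a large number in two different orders.

```python
big = 1e16     # adjacent doubles near 1e16 are 2 apart
small = 1.0

# Add the small terms to the large number one at a time:
# every addition is rounded back to 1e16, so each term is absorbed.
running = big
for _ in range(1000):
    running += small

# Add the small terms together first, then add the large number.
small_total = 0.0
for _ in range(1000):
    small_total += small
reordered = small_total + big

print(running - big)     # 0.0
print(reordered - big)   # 1000.0
```

In the first loop each individual term is too small to change the stored value, whereas their combined total is large enough to be represented in the sum.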

The addition of two floating-point numbers can also produce roundoff error when their sum is an order of magnitude greater than the larger of the two, because the sum then needs more digits than the precision allows.

* For example, consider a normalized floating-point number system with base <math>10</math> and precision <math>2</math>. Then <math>fl(62)=6.2 \times 10^{1}</math> and <math>fl(41) = 4.1 \times 10^{1}</math>. Note that <math>62+41=103</math> but <math>fl(103)=1.0 \times 10^{2}</math>. There is a roundoff error of <math>103-fl(103)=3</math>.
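
The base-10 example can be reproduced with Python's <code>decimal</code> module. In this sketch, a context with two significant digits stands in for the number system described above; the choice of rounding mode is an assumption and does not affect this particular result.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# Two-digit decimal arithmetic standing in for the example's number system.
ctx = Context(prec=2, rounding=ROUND_HALF_EVEN)

a = Decimal(62)
b = Decimal(41)

exact = a + b             # default context (28 digits): 103
rounded = ctx.add(a, b)   # rounded to two significant digits
error = exact - rounded

print(exact)     # 103
print(rounded)   # 1.0E+2
print(error)     # 3
```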

This kind of error can occur alongside an absorption error in a single operation.