=== Addition and subtraction ===

A simple method to add floating-point numbers is to first represent them with the same exponent. In the example below, the second number (with the smaller exponent) is shifted right by three digits, and one then proceeds with the usual addition method:

  123456.7 = 1.234567 × 10^5
  101.7654 = 1.017654 × 10^2 = 0.001017654 × 10^5

Hence:

  123456.7 + 101.7654 = (1.234567 × 10^5) + (1.017654 × 10^2)
                      = (1.234567 × 10^5) + (0.001017654 × 10^5)
                      = (1.234567 + 0.001017654) × 10^5
                      = 1.235584654 × 10^5

In detail:

    e=5;  s=1.234567     (123456.7)
  + e=2;  s=1.017654     (101.7654)

    e=5;  s=1.234567
  + e=5;  s=0.001017654  (after shifting)
  ---------------------
    e=5;  s=1.235584654  (true sum: 123558.4654)

This is the true result, the exact sum of the operands. It will be rounded to seven digits and then normalized if necessary. The final result is

    e=5;  s=1.235585     (final sum: 123558.5)

The lowest three digits of the second operand (654) are essentially lost. This is [[round-off error]]. In extreme cases, the sum of two non-zero numbers may be equal to one of them:

    e=5;  s=1.234567
  + e=−3; s=9.876543

    e=5;  s=1.234567
  + e=5;  s=0.00000009876543  (after shifting)
  ------------------------
    e=5;  s=1.23456709876543  (true sum)
    e=5;  s=1.234567          (after rounding and normalization)

In the above conceptual examples it would appear that a large number of extra digits would need to be provided by the adder to ensure correct rounding; however, for binary addition or subtraction using careful implementation techniques, only a ''guard'' bit, a ''rounding'' bit and one extra ''sticky'' bit need to be carried beyond the precision of the operands.<ref name="Goldberg_1991"/><ref name="Patterson-Hennessy_2014"/>{{rp|218–220}}

Another problem of loss of significance occurs when ''approximations'' to two nearly equal numbers are subtracted. In the following example, ''e'' = 5; ''s'' = 1.234571 and ''e'' = 5; ''s'' = 1.234567 are approximations to the rationals 123457.1467 and 123456.659.

    e=5;  s=1.234571
  − e=5;  s=1.234567
  ------------------
    e=5;  s=0.000004
    e=−1; s=4.000000  (after rounding and normalization)

The floating-point difference is computed exactly because the numbers are close; the [[Sterbenz lemma]] guarantees this, even in case of underflow when [[gradual underflow]] is supported. Despite this, the difference of the original numbers is ''e'' = −1; ''s'' = 4.877000, which differs by more than 20% from the difference ''e'' = −1; ''s'' = 4.000000 of the approximations. In extreme cases, all significant digits of precision can be lost.<ref name="Goldberg_1991"/><ref name="Sierra_1962"/> This ''[[Catastrophic cancellation|cancellation]]'' illustrates the danger in assuming that all of the digits of a computed result are meaningful. Dealing with the consequences of these errors is a topic in [[numerical analysis]]; see also [[#Accuracy problems|Accuracy problems]].
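
The absorption shown in the second decimal example above can be reproduced directly in binary arithmetic. The following C sketch is illustrative only; it assumes IEEE 754 binary64 (<code>double</code>) with the default round-to-nearest-even mode, rather than the 7-digit decimal format used in the worked examples.

<syntaxhighlight lang="c">
#include <stdio.h>

int main(void)
{
    /* Illustrative sketch, assuming IEEE 754 binary64 ("double") with
       round-to-nearest-even.  binary64 carries 53 significand bits
       (roughly 16 decimal digits), so near 1.0e16 adjacent doubles are
       2.0 apart.  The true sum 10^16 + 1 falls halfway between two
       representable values and rounds back to 10^16: the smaller
       operand is absorbed completely. */
    double big   = 1.0e16;
    double small = 1.0;
    double sum   = big + small;

    printf("sum == big : %s\n", (sum == big) ? "true" : "false");  /* true */
    printf("sum - big  : %g\n", sum - big);                        /* 0    */
    return 0;
}
</syntaxhighlight>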
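
The cancellation effect can also be observed in binary arithmetic. The sketch below is illustrative only; it assumes IEEE 754 binary32 (<code>float</code>), whose roughly seven significant decimal digits are comparable to the seven-digit decimal format above. The subtraction itself is exact by the Sterbenz lemma, but the rounding errors introduced when the decimal constants are converted to binary dominate the small result; the size of the discrepancy differs from the decimal example only because the binary roundings differ.

<syntaxhighlight lang="c">
#include <stdio.h>

int main(void)
{
    /* Illustrative sketch, assuming IEEE 754 binary32 ("float").
       The decimal constants below cannot be stored exactly: */
    float a = 123457.1467f;   /* stored as about 123457.148 */
    float b = 123456.659f;    /* stored as about 123456.656 */

    /* a and b are within a factor of two of each other, so by the
       Sterbenz lemma the subtraction adds no new rounding error;
       the result nevertheless inherits the errors already present
       in the rounded inputs, so only its leading digits match the
       true difference of the original decimal numbers. */
    float diff = a - b;

    printf("computed difference: %.7f\n", diff);                        /* about 0.492 */
    printf("true difference    : %.7f\n", 123457.1467 - 123456.659);    /* 0.4877      */
    return 0;
}
</syntaxhighlight>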