=== Addition and subtraction ===

A simple method to add floating-point numbers is to first represent them with the same exponent. In the example below, the second number (with the smaller exponent) is shifted right by three digits, and one then proceeds with the usual addition method:

  123456.7 = 1.234567 × 10^5
  101.7654 = 1.017654 × 10^2 = 0.001017654 × 10^5

Hence:

  123456.7 + 101.7654 = (1.234567 × 10^5) + (1.017654 × 10^2)
                      = (1.234567 × 10^5) + (0.001017654 × 10^5)
                      = (1.234567 + 0.001017654) × 10^5
                      = 1.235584654 × 10^5

In detail:

    e=5;  s=1.234567     (123456.7)
  + e=2;  s=1.017654     (101.7654)

    e=5;  s=1.234567
  + e=5;  s=0.001017654  (after shifting)
  ---------------------
    e=5;  s=1.235584654  (true sum: 123558.4654)

This is the true result, the exact sum of the operands. It will be rounded to seven digits and then normalized if necessary. The final result is

    e=5;  s=1.235585     (final sum: 123558.5)

The lowest three digits of the second operand (654) are essentially lost. This is [[round-off error]]. In extreme cases, the sum of two non-zero numbers may be equal to one of them:

    e=5;  s=1.234567
  + e=−3; s=9.876543

    e=5;  s=1.234567
  + e=5;  s=0.00000009876543  (after shifting)
  ------------------------
    e=5;  s=1.23456709876543  (true sum)
    e=5;  s=1.234567          (after rounding and normalization)

In the above conceptual examples it would appear that a large number of extra digits would need to be provided by the adder to ensure correct rounding; however, for binary addition or subtraction using careful implementation techniques, only a ''guard'' bit, a ''rounding'' bit and one extra ''sticky'' bit need to be carried beyond the precision of the operands.<ref name="Goldberg_1991"/><ref name="Patterson-Hennessy_2014"/>{{rp|218–220}}

Another problem of loss of significance occurs when ''approximations'' to two nearly equal numbers are subtracted. In the following example, ''e'' = 5; ''s'' = 1.234571 and ''e'' = 5; ''s'' = 1.234567 are approximations to the rationals 123457.1467 and 123456.659.

    e=5;  s=1.234571
  − e=5;  s=1.234567
  ------------------
    e=5;  s=0.000004
    e=−1; s=4.000000  (after rounding and normalization)

The floating-point difference is computed exactly because the numbers are close; the [[Sterbenz lemma]] guarantees this, even in case of underflow when [[gradual underflow]] is supported. Despite this, the difference of the original numbers is ''e'' = −1; ''s'' = 4.877000, which differs by more than 20% from the difference ''e'' = −1; ''s'' = 4.000000 of the approximations. In extreme cases, all significant digits of precision can be lost.<ref name="Goldberg_1991"/><ref name="Sierra_1962"/> This ''[[Catastrophic cancellation|cancellation]]'' illustrates the danger in assuming that all of the digits of a computed result are meaningful. Dealing with the consequences of these errors is a topic in [[numerical analysis]]; see also [[#Accuracy problems|Accuracy problems]].
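
The absorption shown in the second decimal example above can be reproduced directly in binary arithmetic. The following C sketch is illustrative only; it assumes IEEE 754 binary64 (<code>double</code>) with the default round-to-nearest-even mode, rather than the 7-digit decimal format used in the worked examples.

<syntaxhighlight lang="c">
#include <stdio.h>

int main(void)
{
    /* Illustrative sketch, assuming IEEE 754 binary64 ("double") with
       round-to-nearest-even.  binary64 carries 53 significand bits
       (roughly 16 decimal digits), so near 1.0e16 adjacent doubles are
       2.0 apart.  The true sum 10^16 + 1 falls halfway between two
       representable values and rounds back to 10^16: the smaller
       operand is absorbed completely. */
    double big   = 1.0e16;
    double small = 1.0;
    double sum   = big + small;

    printf("sum == big : %s\n", (sum == big) ? "true" : "false");  /* true */
    printf("sum - big  : %g\n", sum - big);                        /* 0    */
    return 0;
}
</syntaxhighlight>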
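
The cancellation effect can also be observed in binary arithmetic. The sketch below is illustrative only; it assumes IEEE 754 binary32 (<code>float</code>), whose roughly seven significant decimal digits are comparable to the seven-digit decimal format above. The subtraction itself is exact by the Sterbenz lemma, but the rounding errors introduced when the decimal constants are converted to binary dominate the small result; the size of the discrepancy differs from the decimal example only because the binary roundings differ.

<syntaxhighlight lang="c">
#include <stdio.h>

int main(void)
{
    /* Illustrative sketch, assuming IEEE 754 binary32 ("float").
       The decimal constants below cannot be stored exactly: */
    float a = 123457.1467f;   /* stored as about 123457.148 */
    float b = 123456.659f;    /* stored as about 123456.656 */

    /* a and b are within a factor of two of each other, so by the
       Sterbenz lemma the subtraction adds no new rounding error;
       the result nevertheless inherits the errors already present
       in the rounded inputs, so only its leading digits match the
       true difference of the original decimal numbers. */
    float diff = a - b;

    printf("computed difference: %.7f\n", diff);                        /* about 0.492 */
    printf("true difference    : %.7f\n", 123457.1467 - 123456.659);    /* 0.4877      */
    return 0;
}
</syntaxhighlight>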