Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Floating-point arithmetic
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
=== Rounding modes === Rounding is used when the exact result of a floating-point operation (or a conversion to floating-point format) would need more digits than there are digits in the significand. IEEE 754 requires ''correct rounding'': that is, the rounded result is as if infinitely precise arithmetic was used to compute the value and then rounded (although in implementation only three extra bits are needed to ensure this). There are several different [[rounding]] schemes (or ''rounding modes''). Historically, [[truncation]] was the typical approach. Since the introduction of IEEE 754, the default method (''[[rounding|round to nearest, ties to even]]'', sometimes called Banker's Rounding) is more commonly used. This method rounds the ideal (infinitely precise) result of an arithmetic operation to the nearest representable value, and gives that representation as the result.<ref group="nb" name="NB_1"/> In the case of a tie, the value that would make the significand end in an even digit is chosen. The IEEE 754 standard requires the same rounding to be applied to all fundamental algebraic operations, including square root and conversions, when there is a numeric (non-NaN) result. It means that the results of IEEE 754 operations are completely determined in all bits of the result, except for the representation of NaNs. ("Library" functions such as cosine and log are not mandated.) Alternative rounding options are also available. IEEE 754 specifies the following rounding modes: * round to nearest, where ties round to the nearest even digit in the required position (the default and by far the most common mode) * round to nearest, where ties round away from zero (optional for binary floating-point and commonly used in decimal) * round up (toward +β; negative results thus round toward zero) * round down (toward ββ; negative results thus round away from zero) * round toward zero (truncation; it is similar to the common behavior of float-to-integer conversions, which convert β3.9 to β3 and 3.9 to 3) Alternative modes are useful when the amount of error being introduced must be bounded. Applications that require a bounded error are multi-precision floating-point, and [[interval arithmetic]]. The alternative rounding modes are also useful in diagnosing numerical instability: if the results of a subroutine vary substantially between rounding to + and β infinity then it is likely numerically unstable and affected by round-off error.<ref name="Kahan_2006_Mindless"/>
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)