== Roundoff error under different rounding rules ==

There are two common rounding rules, round-by-chop and round-to-nearest. The IEEE standard uses round-to-nearest.

* '''Round-by-chop''': The base-<math>\beta</math> expansion of <math>x</math> is truncated after the <math>(p-1)</math>-th digit.
** This rounding rule is biased because it always moves the result toward zero.
* '''Round-to-nearest''': <math>fl(x)</math> is set to the floating-point number nearest to <math>x</math>. When there is a tie, the floating-point number whose last stored digit is even is used.
** For the IEEE standard, where the base <math>\beta</math> is <math>2</math>, this means that when there is a tie the result is rounded so that the last digit is equal to <math>0</math>.
** This rounding rule is more accurate but more computationally expensive.
** Rounding so that the last stored digit is even when there is a tie ensures that ties are not systematically rounded up or down. This avoids an unwanted slow drift in long calculations that would otherwise result from a biased rounding.

The following example illustrates the level of roundoff error under the two rounding rules.<ref name="Forrester_2018"/> The round-to-nearest rule generally leads to less roundoff error.

{| class="wikitable" style="margin:1em auto"
! x
! Round-by-chop
! Roundoff Error
! Round-to-nearest
! Roundoff Error
|-
|1.649 || 1.6 || 0.049 || 1.6 || 0.049
|-
|1.650 || 1.6 || 0.050 || 1.6 || 0.050
|-
|1.651 || 1.6 || 0.051 || 1.7 || −0.049
|-
|1.699 || 1.6 || 0.099 || 1.7 || −0.001
|-
|1.749 || 1.7 || 0.049 || 1.7 || 0.049
|-
|1.750 || 1.7 || 0.050 || 1.8 || −0.050
|}

=== Calculating roundoff error in IEEE standard ===

Suppose round-to-nearest and IEEE double precision are used.
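The two rounding rules in the table above can be reproduced with Python's <code>decimal</code> module, whose <code>ROUND_DOWN</code> and <code>ROUND_HALF_EVEN</code> modes correspond to round-by-chop and round-to-nearest, respectively. A minimal sketch (the helper names are illustrative, not from the cited source):

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN

def round_by_chop(x: str) -> Decimal:
    # Truncate the decimal expansion after one fractional digit: biased toward zero.
    return Decimal(x).quantize(Decimal("0.1"), rounding=ROUND_DOWN)

def round_to_nearest(x: str) -> Decimal:
    # Nearest value; a tie goes to the result whose last digit is even.
    return Decimal(x).quantize(Decimal("0.1"), rounding=ROUND_HALF_EVEN)

for x in ["1.649", "1.650", "1.651", "1.699", "1.749", "1.750"]:
    chop, nearest = round_by_chop(x), round_to_nearest(x)
    # Roundoff error is x minus its rounded representation.
    print(x, chop, Decimal(x) - chop, nearest, Decimal(x) - nearest)
```

Running this reproduces the table row by row; for instance, 1.750 ties between 1.7 and 1.8, and round-to-nearest-even picks 1.8 because its last digit is even.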
* Example: the decimal number <math>(9.4)_{10}=(1001.{\overline{0110}})_{2}</math> can be rearranged into <math display="block">+1.\underbrace{0010110011001100110011001100110011001100110011001100}_\text{52 bits}110 \ldots \times 2^{3}</math> Since the 53rd bit to the right of the binary point is a 1 and is followed by other nonzero bits, the round-to-nearest rule requires rounding up, that is, adding 1 to the 52nd bit. Thus, the normalized floating-point representation in the IEEE standard of 9.4 is <math display="block">fl(9.4)=1.0010110011001100110011001100110011001100110011001101 \times 2^{3}.</math>
* Now the roundoff error incurred when representing <math>9.4</math> by <math>fl(9.4)</math> can be calculated. This representation is derived by discarding the infinite tail <math display="block">0.{\overline{1100}} \times 2^{-52}\times 2^{3} = 0.{\overline{0110}} \times 2^{-51} \times 2^{3}=0.4 \times 2^{-48}</math> from the right and then adding <math>1 \times 2^{-52} \times 2^{3}=2^{-49}</math> in the rounding step.
:Then <math>fl(9.4) = 9.4-0.4 \times 2^{-48} + 2^{-49} = 9.4+(0.2)_{10} \times 2^{-49}</math>.
:Thus, the roundoff error is <math>(0.2 \times 2^{-49})_{10}</math>.

=== Measuring roundoff error by using machine epsilon ===

The machine epsilon <math>\epsilon_\text{mach}</math> can be used to measure the level of roundoff error under the two rounding rules above. Below are the formulas and a corresponding proof.<ref name="Forrester_2018"/> The first definition of machine epsilon is used here.

==== Theorem ====

# Round-by-chop: <math>\epsilon_\text{mach} = \beta^{1-p}</math>
# Round-to-nearest: <math>\epsilon_\text{mach} = \frac{1}{2}\beta^{1-p}</math>

==== Proof ====

Let <math>x=d_{0}.d_{1}d_{2} \ldots d_{p-1}d_{p} \ldots \times \beta^{n} \in \mathbb{R}</math>, where <math>n \in [L, U]</math>, and let <math>fl(x)</math> be the floating-point representation of <math>x</math>.
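The roundoff error derived above for <math>fl(9.4)</math> can be checked in Python, since a 9.4 literal is stored as an IEEE round-to-nearest double and <code>fractions.Fraction</code> converts that stored value to an exact rational:

```python
from fractions import Fraction

exact = Fraction(94, 10)   # the real number 9.4
stored = Fraction(9.4)     # the IEEE double fl(9.4), converted exactly
error = stored - exact     # roundoff error fl(9.4) - 9.4

# The derivation gives fl(9.4) - 9.4 = 0.2 * 2**-49 = 1 / (5 * 2**49).
print(error == Fraction(1, 5 * 2**49))  # True
```

The comparison is exact: no floating-point arithmetic occurs after the initial conversion, so the result confirms the error is precisely <math>0.2 \times 2^{-49}</math>.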
Consider round-by-chop first. Then <math display="block"> \begin{align} \frac{|x-fl(x)|}{|x|} &= \frac{|d_{0}.d_{1}d_{2}\ldots d_{p-1}d_{p}d_{p+1}\ldots \times \beta^{n} - d_{0}.d_{1}d_{2}\ldots d_{p-1} \times \beta^{n}|}{|d_{0}.d_{1}d_{2}\ldots \times \beta^{n}|}\\ &= \frac{|d_{p}.d_{p+1} \ldots \times \beta^{n-p}|}{|d_{0}.d_{1}d_{2}\ldots \times \beta^{n}|}\\ &= \frac{|d_{p}.d_{p+1}d_{p+2}\ldots|}{|d_{0}.d_{1}d_{2}\ldots|} \times \beta^{-p}. \end{align}</math>

To determine the maximum of this quantity, find the maximum of the numerator and the minimum of the denominator. Since <math>d_{0}\neq 0</math> (normalized system), the minimum value of the denominator is <math>1</math>. The numerator is bounded above by <math>(\beta-1).(\beta-1){\overline{(\beta-1)}}=\beta</math>. Thus, <math>\frac{|x-fl(x)|}{|x|} \leq \frac{\beta}{1} \times \beta^{-p} = \beta^{1-p}</math>. Therefore, <math>\epsilon_\text{mach}=\beta^{1-p}</math> for round-by-chop. The proof for round-to-nearest is similar: rounding moves <math>x</math> by at most half the distance between adjacent floating-point numbers, which contributes the extra factor of <math>\frac{1}{2}</math>.

* Note that the first definition of machine epsilon is not quite equivalent to the second definition when using the round-to-nearest rule, but it is equivalent for round-by-chop.
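For IEEE doubles (<math>\beta = 2</math>, <math>p = 53</math>), the theorem's round-to-nearest bound <math>\frac{1}{2}\beta^{1-p} = 2^{-53}</math> can be observed directly in Python; note that <code>sys.float_info.epsilon</code> reports the gap from 1.0 to the next double (<math>2^{-52}</math>, the first definition for round-by-chop), which is twice the round-to-nearest unit roundoff:

```python
import sys

# Gap between 1.0 and the next representable double: beta**(1-p) = 2**-52.
print(sys.float_info.epsilon == 2.0**-52)  # True

# A perturbation of 2**-53 ties exactly between 1.0 and 1.0 + 2**-52;
# round-to-nearest-even resolves the tie to 1.0 (last stored bit is 0).
print(1.0 + 2.0**-53 == 1.0)               # True

# Anything strictly larger than the unit roundoff is distinguishable from 1.0.
print(1.0 + 2.0**-52 > 1.0)                # True
```

This illustrates the note above: under round-to-nearest, relative errors are bounded by <math>2^{-53}</math> even though the gap at 1.0 is <math>2^{-52}</math>.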