Round-off error
== Floating-point number system ==
Compared with the [[fixed-point arithmetic|fixed-point number system]], the [[floating-point arithmetic|floating-point number system]] represents real numbers more efficiently, so it is widely used in modern computers. While the real numbers <math>\mathbb{R}</math> are infinite and continuous, a floating-point number system <math>F</math> is finite and discrete. Thus, representation error, which leads to roundoff error, occurs under the floating-point number system.

=== Notation of floating-point number system ===
A floating-point number system <math>F</math> is characterized by four integers:
* <math>\beta</math>: base or radix
* <math>p</math>: precision
* <math>[L, U]</math>: exponent range, where <math>L</math> is the lower bound and <math>U</math> is the upper bound

Any <math>x \in F</math> has the following form:
<math display="block"> x = \pm (\underbrace{d_{0}.d_{1}d_{2}\ldots d_{p-1}}_\text{significand})_{\beta} \times \beta ^{\overbrace{E}^\text{exponent}} = \pm \left( d_{0}\times \beta ^{E}+d_{1}\times \beta ^{E-1}+\ldots+ d_{p-1}\times \beta ^{E-(p-1)} \right)</math>
where <math>d_{i}</math> is an integer such that <math>0 \leq d_{i} \leq \beta-1</math> for <math>i = 0, 1, \ldots, p-1</math>, and <math>E</math> is an integer such that <math>L \leq E \leq U</math>.

=== Normalized floating-point number system ===
* A floating-point number system is normalized if the leading digit <math>d_{0}</math> is always nonzero unless the number is zero.<ref name="Forrester_2018"/> Since the [[significand]] is <math>d_{0}.d_{1}d_{2}\ldots d_{p-1}</math>, the significand of a nonzero number in a normalized system satisfies <math>1 \leq \text{significand} < \beta</math>. In particular, with <math>\beta = 2</math>, the normalized form of a nonzero [[Institute of Electrical and Electronics Engineers|IEEE]] floating-point number is <math>\pm 1.bb \ldots b \times 2^{E}</math> where <math>b \in \{0, 1\}</math>.
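The notation and the normalization condition above can be illustrated by enumerating a very small system exhaustively. The following Python sketch is only an illustration: the function name <code>normalized_floats</code> and the toy parameters <math>\beta = 2</math>, <math>p = 3</math>, <math>L = -1</math>, <math>U = 1</math> are assumptions chosen for this example and do not come from any standard. Exact rational arithmetic is used so that the enumeration itself introduces no rounding.

```python
from fractions import Fraction
from itertools import product


def normalized_floats(beta, p, L, U):
    """Enumerate every value of a toy normalized system F(beta, p, L, U).

    Hypothetical helper for illustration only.
    """
    values = {Fraction(0)}  # zero is the only value whose leading digit is 0
    for E in range(L, U + 1):
        for d0 in range(1, beta):  # normalization: leading digit is nonzero
            for tail in product(range(beta), repeat=p - 1):
                digits = (d0,) + tail
                # significand = d_0 + d_1/beta + ... + d_{p-1}/beta^(p-1)
                significand = sum(
                    Fraction(d, beta ** i) for i, d in enumerate(digits)
                )
                x = significand * Fraction(beta) ** E
                values.add(x)
                values.add(-x)
    return sorted(values)


toy = normalized_floats(beta=2, p=3, L=-1, U=1)
print(len(toy))
print([str(v) for v in toy if v > 0])
```

For these toy parameters the enumeration yields 25 values, 12 of them positive, ranging in magnitude from 1/2 to 7/2. The gaps between neighbouring values grow with the exponent, which is why a fixed real number is generally not a member of <math>F</math>.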
In binary, the leading digit of a normalized number is always <math>1</math>, so it is not stored and is called the implicit bit. This gives an extra bit of precision, which reduces the roundoff error caused by representation error.
* Since the floating-point number system <math>F</math> is finite and discrete, it cannot represent all real numbers: infinitely many real numbers can only be approximated by finitely many floating-point numbers through [[rounding|rounding rule]]s. The floating-point approximation of a given real number <math>x</math> is denoted <math>fl(x)</math>.
** The total number of normalized floating-point numbers is <math display="block">2(\beta -1)\beta^{p-1} (U-L+1)+1,</math> where
*** <math>2</math> counts the choice of sign, positive or negative
*** <math>(\beta -1)</math> counts the choices of the leading digit
*** <math>\beta^{p-1}</math> counts the choices of the remaining significand digits
*** <math>U-L+1</math> counts the choices of exponent
*** <math>1</math> counts the case when the number is <math>0</math>.

=== IEEE standard ===
In the [[Institute of Electrical and Electronics Engineers|IEEE]] standard the base is binary, i.e. <math>\beta = 2</math>, and normalization is used. The IEEE standard stores the sign, exponent, and significand in separate fields of a floating-point word, each of which has a fixed width (number of bits). The two most commonly used levels of precision for floating-point numbers are single precision and double precision.
{| class="wikitable" style="margin:1em auto"
! Precision
! Sign (bits)
! Exponent (bits)
! Trailing significand field (bits)
|-
| Single || 1 || 8 || 23
|-
| Double || 1 || 11 || 52
|}
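The field widths in the table can be checked by splitting the bit pattern of a value into its three fields. The following Python sketch assumes the standard library <code>struct</code> module, which packs <code>f</code> and <code>d</code> values in the IEEE binary32 and binary64 formats; the helper names <code>single_fields</code> and <code>double_fields</code> are hypothetical. The exponent field holds a biased exponent (bias 127 for single and 1023 for double precision), a detail of the IEEE format that is not covered in the text above.

```python
import struct


def single_fields(x):
    """Return (sign, biased exponent, trailing significand) of x as binary32."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                   # 1 bit
    exponent = (bits >> 23) & 0xFF      # 8 bits
    fraction = bits & ((1 << 23) - 1)   # 23 bits
    return sign, exponent, fraction


def double_fields(x):
    """Return (sign, biased exponent, trailing significand) of x as binary64."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                   # 1 bit
    exponent = (bits >> 52) & 0x7FF     # 11 bits
    fraction = bits & ((1 << 52) - 1)   # 52 bits
    return sign, exponent, fraction


# -0.75 = -1.1 (binary) x 2^-1: the leading 1 is implicit and not stored,
# so only the single fraction bit after the radix point appears in the field.
print(single_fields(-0.75))
print(double_fields(-0.75))
```

In both formats the trailing significand field of <code>-0.75</code> has only its top bit set, which reflects the implicit leading bit: the stored field holds the digits after the radix point only.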