Bisection method
== IEEE Standard-754 for Computer Arithmetic ==
If the algorithm is being used in the real number system, it is possible to continue the bisection until the relative error produces the desired approximation. If the algorithm is used with computer arithmetic, a further problem arises. In order to improve reliability and portability, the Institute of Electrical and Electronics Engineers (IEEE) produced a standard for floating-point arithmetic in 1985 and revised it in 2008 and 2019; see [[IEEE 754]].<ref>{{Cite book |title=IEEE Standard for Floating-Point Arithmetic |series=IEEE STD 754-2019 |pages=1–84 |author=IEEE Computer Society |date=22 July 2019 |publisher=IEEE |id=IEEE Std 754-2019 |doi=10.1109/IEEESTD.2019.8766229 |isbn=978-1-5044-5924-2 |ref=CITEREFIEEE_7542019}}</ref> The IEEE Standard 754 representation is the standard used in most micro-computers. It is, for example, the basis of the PC floating-point processor.

[[Double-precision]] numbers occupy 64 bits. As a first step, suppose these are divided into a sign bit (+/-), an exponent of 10 bits, and a fractional part of 53 bits. In order to allow for fractions (negative exponents), the exponent is biased, which makes the effective number of bits for the exponent 9. The effective values of the exponent with {{math|0 < e ≤ 1023}} would be <math>(2^{-511}, 2^{512})</math>, making the double-precision numbers take the form <math>(-1)^{s} 2^{e-511} 0.f</math>. The extreme range for a positive double-precision number would then be <math>(1.492 \times 10^{-154}, 1.341\times 10^{154})</math>.

Because the fraction would normally have a non-zero leading digit (a 1 for binary), that bit does not need to be stored, as the processor will supply it. As a result, the 53-bit fraction can be stored in 52 bits, so the other bit can be used in the exponent, which then has 11 bits and an actual range of {{math|0 < e ≤ 2047}}. The range can be further extended by putting the assumed 1 '''before''' the binary point, so that a normal number takes the form <math>(-1)^{s} 2^{e-1023} 1.f</math>. If both the exponent and fraction are 0, then the number is 0 (with a sign).
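The field layout described above can be inspected directly. The following is a minimal sketch in Python, assuming that the interpreter's <code>float</code> is an IEEE 754 double (true of common implementations, but an assumption here). The function names <code>decode_double</code> and <code>normal_value</code> are illustrative only and do not come from any library.

```python
import struct


def decode_double(x):
    """Split a float into (sign, biased exponent, fraction) as integers.

    Assumes the float is stored as an IEEE 754 64-bit double.
    """
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                  # bit 63
    exponent = (bits >> 52) & 0x7FF    # bits 62..52 (11 bits)
    fraction = bits & ((1 << 52) - 1)  # bits 51..0 (52 bits)
    return sign, exponent, fraction


def normal_value(sign, exponent, fraction):
    """Rebuild (-1)^s * 2^(e-1023) * 1.f; only valid for 0 < e < 2047."""
    return (-1.0) ** sign * 2.0 ** (exponent - 1023) * (1 + fraction / 2**52)


s, e, f = decode_double(3.0)
print(s, e, f)                # 0 1024 2251799813685248
print(normal_value(s, e, f))  # 3.0
```

For 3.0 the stored fraction has only its top bit set (2<sup>51</sup>), because the leading 1 of binary 1.1 is implied rather than stored.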
In order to deal with three other extreme situations, an exponent of 2047 is reserved for NaN (Not a Number, such as the result of 0/0) and the two infinities. A number is thus stored in the following form:

{| class="wikitable"
|-
| style="background-color: lightblue" | .
| style="background-color: pink" | || style="background-color: pink" | || style="background-color: pink" | || style="background-color: pink" | || style="background-color: pink" | || style="background-color: pink" | || style="background-color: pink" | || style="background-color: pink" | || style="background-color: pink" | || style="background-color: pink" | || style="background-color: pink" |
| style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" |
| style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" |
| style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" |
| style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" | || style="background-color: lightgreen" |
|-
| style="text-align: center; background-color: lightblue" | s
| style="text-align: center; background-color: pink" colspan="11" | e
| style="text-align: center; background-color: lightgreen" colspan="52" | f
|-
| style="text-align: center; background-color: lightblue" | 63
| style="text-align: right; background-color: pink" colspan="11" | 52
| style="text-align: right; background-color: lightgreen" colspan="52" | 0
|}

The following are examples of some double-precision numbers:

{| class="wikitable"
|+ Double Precision
|-
! style="text-align: center;" colspan="17" | Decimal 3
|-
| 0 || style="background-color: pink" | 100 || style="background-color: pink" | 0000 || style="background-color: pink" | 0000 || style="background-color: lightgreen" | 1000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000
|-
| style="text-align: center; background-color: pink" colspan="2" | 4 || style="text-align: center; background-color: pink" | 0 || style="text-align: center; background-color: pink" | 0 || style="text-align: center; background-color: lightgreen" | 8 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0
|-
! style="text-align: center;" colspan="17" | Positive infinity <math>(+\infty)</math>
|-
| 0 || style="background-color: pink" | 111 || style="background-color: pink" | 1111 || style="background-color: pink" | 1111 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000
|-
| style="text-align: center; background-color: pink" colspan="2" | 7 || style="text-align: center; background-color: pink" | F || style="text-align: center; background-color: pink" | F || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0
|-
! style="text-align: center;" colspan="17" | Max. double 1.7976931348623157 × <math>10^{308}</math>
|-
| 0 || style="background-color: pink" | 111 || style="background-color: pink" | 1111 || style="background-color: pink" | 1110 || style="background-color: lightgreen" | 1111 || style="background-color: lightgreen" | 1111 || style="background-color: lightgreen" | 1111 || style="background-color: lightgreen" | 1111 || style="background-color: lightgreen" | 1111 || style="background-color: lightgreen" | 1111 || style="background-color: lightgreen" | 1111 || style="background-color: lightgreen" | 1111 || style="background-color: lightgreen" | 1111 || style="background-color: lightgreen" | 1111 || style="background-color: lightgreen" | 1111 || style="background-color: lightgreen" | 1111 || style="background-color: lightgreen" | 1111
|-
| style="text-align: center; background-color: pink" colspan="2" | 7 || style="text-align: center; background-color: pink" | F || style="text-align: center; background-color: pink" | E || style="text-align: center; background-color: lightgreen" | F || style="text-align: center; background-color: lightgreen" | F || style="text-align: center; background-color: lightgreen" | F || style="text-align: center; background-color: lightgreen" | F || style="text-align: center; background-color: lightgreen" | F || style="text-align: center; background-color: lightgreen" | F || style="text-align: center; background-color: lightgreen" | F || style="text-align: center; background-color: lightgreen" | F || style="text-align: center; background-color: lightgreen" | F || style="text-align: center; background-color: lightgreen" | F || style="text-align: center; background-color: lightgreen" | F || style="text-align: center; background-color: lightgreen" | F || style="text-align: center; background-color: lightgreen" | F
|-
! style="text-align: center;" colspan="17" | Min. normal 2.2250738585072014 × <math>10^{-308}</math>
|-
| 0 || style="background-color: pink" | 000 || style="background-color: pink" | 0000 || style="background-color: pink" | 0001 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000
|-
| style="text-align: center; background-color: pink" colspan="2" | 0 || style="text-align: center; background-color: pink" | 0 || style="text-align: center; background-color: pink" | 1 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0
|-
! style="text-align: center;" colspan="17" | Max. subnormal 2.2250738585072009 × <math>10^{-308}</math>
|-
| 0 || style="background-color: pink" | 000 || style="background-color: pink" | 0000 || style="background-color: pink" | 0000 || style="background-color: lightgreen" | 1111 || style="background-color: lightgreen" | 1111 || style="background-color: lightgreen" | 1111 || style="background-color: lightgreen" | 1111 || style="background-color: lightgreen" | 1111 || style="background-color: lightgreen" | 1111 || style="background-color: lightgreen" | 1111 || style="background-color: lightgreen" | 1111 || style="background-color: lightgreen" | 1111 || style="background-color: lightgreen" | 1111 || style="background-color: lightgreen" | 1111 || style="background-color: lightgreen" | 1111 || style="background-color: lightgreen" | 1111
|-
| style="text-align: center; background-color: pink" colspan="2" | 0 || style="text-align: center; background-color: pink" | 0 || style="text-align: center; background-color: pink" | 0 || style="text-align: center; background-color: lightgreen" | F || style="text-align: center; background-color: lightgreen" | F || style="text-align: center; background-color: lightgreen" | F || style="text-align: center; background-color: lightgreen" | F || style="text-align: center; background-color: lightgreen" | F || style="text-align: center; background-color: lightgreen" | F || style="text-align: center; background-color: lightgreen" | F || style="text-align: center; background-color: lightgreen" | F || style="text-align: center; background-color: lightgreen" | F || style="text-align: center; background-color: lightgreen" | F || style="text-align: center; background-color: lightgreen" | F || style="text-align: center; background-color: lightgreen" | F || style="text-align: center; background-color: lightgreen" | F
|-
! style="text-align: center;" colspan="17" | Min. subnormal 4.9406564584124654 × <math>10^{-324}</math>
|-
| 0 || style="background-color: pink" | 000 || style="background-color: pink" | 0000 || style="background-color: pink" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0001
|-
| style="text-align: center; background-color: pink" colspan="2" | 0 || style="text-align: center; background-color: pink" | 0 || style="text-align: center; background-color: pink" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 1
|-
! style="text-align: center;" colspan="17" | NaN
|-
| 0 || style="background-color: pink" | 111 || style="background-color: pink" | 1111 || style="background-color: pink" | 1111 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0000 || style="background-color: lightgreen" | 0001
|-
| style="text-align: center; background-color: pink" colspan="2" | 7 || style="text-align: center; background-color: pink" | F || style="text-align: center; background-color: pink" | F || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 0 || style="text-align: center; background-color: lightgreen" | 1
|}

* The first one (decimal 3) illustrates that 3 (binary 11) has a single one in the fraction part; the other 1 is ''assumed''.
* The second one is an example for which the exponent is 2047 <math>(+ \infty)</math>.
* The third one gives the largest number which can be represented in double-precision arithmetic. Note that 1.7976931348623157e+308 + 0.0000000000000001e+308 = inf.
* The next one, the minimum normal, represents the smallest number that can be used with full double precision.
* The maximum subnormal and the minimum subnormal bound a range of numbers that have less than full double precision.

It is the minimum subnormal that is crucial for the bisection algorithm. If <math>b - a < 9.8813129168249309\times 10^{-324}</math> (twice the minimum subnormal), the interval cannot be divided and the process must stop.
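The stopping condition can be illustrated with a short sketch in Python, again assuming that <code>float</code> is an IEEE 754 double. The function <code>bisect</code> and the constant <code>MIN_SUBNORMAL</code> are hypothetical names chosen for this example. Instead of comparing <math>b - a</math> with a fixed constant, the loop stops as soon as the computed midpoint is no longer strictly between the endpoints. This covers the subnormal case above and also adjacent doubles of larger magnitude, which is a slightly more general test than the one stated in the text.

```python
import math
import sys

MIN_SUBNORMAL = 5e-324  # 2**-1074, the smallest positive double


def bisect(f, a, b, max_iter=5000):
    """Bisection on [a, b] that stops when the interval cannot be divided.

    Illustrative sketch: assumes a < b, f(a) and f(b) have opposite signs,
    and b - a does not overflow.
    """
    fa, fb = f(a), f(b)
    if fa == 0.0:
        return a
    if fb == 0.0:
        return b
    if (fa < 0.0) == (fb < 0.0):
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        m = a + (b - a) / 2.0
        # If a and b are adjacent doubles, m rounds onto an endpoint.
        if m <= a or m >= b:
            break
        fm = f(m)
        if fm == 0.0:
            return m
        if (fm < 0.0) == (fa < 0.0):
            a, fa = m, fm  # root lies in [m, b]
        else:
            b = m          # root lies in [a, m]
    return a


root = bisect(lambda x: x * x - 2.0, 1.0, 2.0)
print(root, math.sqrt(2.0))

# Between 0 and the minimum subnormal there is no representable midpoint.
print(0.0 + (MIN_SUBNORMAL - 0.0) / 2.0 == 0.0)  # True

# The overflow noted for the largest double.
print(sys.float_info.max + 1e292 == math.inf)    # True
```

The midpoint is computed as <code>a + (b - a) / 2</code> rather than <code>(a + b) / 2</code> so that the sum of two large endpoints of the same sign cannot overflow.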