==Representation==
{| class="wikitable" style="margin:0 0 0 1em; text-align:right; float:right;"
|+ Fixed-point representation<br/>with scaling 1/100
|-
! Value<br/>represented !! Internal<br/>representation
|-
| 0.00 || 0
|-
| 0.5 || 50
|-
| 0.99 || 99
|-
| 2 || 200
|-
| −14.1 || −1410
|-
| 314.160 || 31416
|}
A fixed-point representation of a fractional number is essentially an [[integer]] that is to be implicitly multiplied by a fixed scaling factor. For example, the value 1.23 can be stored in a variable as the integer value 1230 with an implicit scaling factor of 1/1000 (meaning that the last 3 decimal digits are implicitly assumed to be a decimal fraction), and the value {{thinspace|1|230|000}} can be represented as 1230 with an implicit scaling factor of 1000 (with "minus 3" implied decimal fraction digits, that is, with 3 implicit zero digits at right). This representation allows standard integer [[arithmetic logic unit]]s to perform [[rational number]] calculations.

Negative values are usually represented in binary fixed-point format as a signed integer in [[two's complement]] representation with an implicit scaling factor as above. The sign of the value is always indicated by the [[Bit numbering|first stored bit]] (1 = negative, 0 = non-negative), even if the number of fraction bits is greater than or equal to the total number of bits. For example, the 8-bit signed binary integer (11110101)<sub>2</sub> = −11, taken with −3, +5, and +12 implied fraction bits, would represent the values −11/2<sup>−3</sup> = −88, −11/2<sup>5</sup> = −0.{{thinspace|343|75}}, and −11/2<sup>12</sup> = −0.{{thinspace|002|685|546|875}}, respectively.

Alternatively, negative values can be represented by an integer in the [[sign-magnitude]] format, in which case the sign is never included in the number of implied fraction bits. This variant is more commonly used in decimal fixed-point arithmetic. Thus the signed 5-digit decimal integer (−00025)<sub>10</sub>, taken with −3, +5, and +12 implied decimal fraction digits, would represent the values −25/10<sup>−3</sup> = −25000, −25/10<sup>5</sup> = −0.00025, and −25/10<sup>12</sup> = −0.{{thinspace|000|000|000|025}}, respectively.

A program will usually assume that all fixed-point values that will be stored into a given variable, or will be produced by a given [[instruction (computing)|instruction]], will have the same scaling factor. This parameter can usually be chosen by the programmer depending on the [[Accuracy and precision|precision]] needed and the range of values to be stored.

The scaling factor of a variable or formula may not appear explicitly in the program. [[Software engineering|Good programming practice]] then requires that it be provided in the [[software documentation|documentation]], at least as a [[comment (computing)|comment]] in the [[source code]].

===Choice of scaling factors===
For greater efficiency, scaling factors are often chosen to be [[exponentiation|powers]] (positive or negative) of the base ''b'' used to represent the integers internally. However, the best scaling factor is often dictated by the application. Thus one often uses scaling factors that are powers of 10 (e.g. 1/100 for dollar values), for human convenience, even when the integers are represented internally in binary.
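For illustration, the following C sketch stores dollar amounts with an implicit scaling factor of 1/100, as in the table above. The type and function names (<code>cents_t</code>, <code>make_amount</code>, <code>mul</code>) are invented for this example and are not part of any standard library. Addition needs no rescaling, while the product of two scaled values must have one factor of 100 divided back out:

<syntaxhighlight lang="c">
#include <stdio.h>

/* Dollar amounts stored as an integer number of cents,
   i.e. with an implicit scaling factor of 1/100.       */
typedef long cents_t;

/* Convert a dollars/cents pair to the internal representation. */
static cents_t make_amount(long dollars, long cents) {
    return dollars * 100 + cents;
}

/* Addition needs no rescaling: both operands share the
   same scaling factor, and so does their sum.           */
static cents_t add(cents_t a, cents_t b) { return a + b; }

/* The product of two scaled values is scaled by 1/10000,
   so one factor of 100 is divided back out (with rounding). */
static cents_t mul(cents_t a, cents_t b) {
    return (a * b + 50) / 100;
}

int main(void) {
    cents_t price = make_amount(3, 75);   /* $3.75 stored as 375 */
    cents_t tax   = make_amount(0, 30);   /* $0.30 stored as  30 */
    cents_t total = add(price, tax);      /* 405, i.e. $4.05     */
    printf("total = %ld.%02ld\n", total / 100, total % 100);

    /* 8% of the total: 0.08 is stored as 8 with the same 1/100 scaling. */
    cents_t part = mul(total, 8);         /* (405*8 + 50)/100 = 32, i.e. $0.32 */
    printf("8%% of total = %ld.%02ld\n", part / 100, part % 100);
    return 0;
}
</syntaxhighlight>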
Decimal scaling factors also mesh well with the [[International System of Units|metric (SI) system]], since the choice of the fixed-point scaling factor is often equivalent to the choice of a unit of measure (like [[centimeter]]s or [[micron]]s instead of [[meter]]s). However, other scaling factors may be used occasionally, e.g. a fractional amount of hours may be represented as an integer number of seconds; that is, as a fixed-point number with a scale factor of 1/3600.

Even with the most careful rounding, fixed-point values represented with a scaling factor ''S'' may have an error of up to ±0.5 in the stored integer, that is, ±0.5 ''S'' in the value. Therefore, smaller scaling factors generally produce more accurate results.

On the other hand, a smaller scaling factor means a smaller range of values that can be stored in a given program variable. The maximum fixed-point value that can be stored into a variable is the largest integer value that can be stored into it, multiplied by the scaling factor; and similarly for the minimum value. For example, the table below gives the implied scaling factor ''S'', the minimum and maximum representable values ''V''<sub>min</sub> and ''V''<sub>max</sub>, and the accuracy ''δ'' = ''S''/2 of values that could be represented in 16-bit signed binary fixed-point format, depending on the number ''f'' of implied fraction bits.

{| class="wikitable" style="margin:0 0 0 1em;"
|+ Parameters of some 16-bit signed binary fixed-point formats
|-
! ''f'' !! ''S'' !! ''δ'' !! ''V''<sub>min</sub> !! ''V''<sub>max</sub>
|-
| −3 || 1/2<sup>−3</sup> = 8 || 4 || −{{thinspace|262|144}} || +{{thinspace|262|136}}
|-
| 0 || 1/2<sup>0</sup> = 1 || 0.5 || −{{thinspace|32|768}} || +{{thinspace|32|767}}
|-
| 5 || 1/2<sup>5</sup> = 1/32 || < 0.016 || −1024.{{thinspace|000|00}} || +1023.{{thinspace|968|75}}
|-
| 14 || 1/2<sup>14</sup> = 1/{{thinspace|16|384}} || < 0.{{thinspace|000|031}} || −2.{{thinspace|000|000|000|000|00}} || +1.{{thinspace|999|938|964|843|75}}
|-
| 15 || 1/2<sup>15</sup> = 1/{{thinspace|32|768}} || < 0.{{thinspace|000|016}} || −1.{{thinspace|000|000|000|000|000}} || +0.{{thinspace|999|969|482|421|875}}
|-
| 16 || 1/2<sup>16</sup> = 1/{{thinspace|65|536}} || < 0.{{thinspace|000|008}} || −0.{{thinspace|500|000|000|000|000|0}} || +0.{{thinspace|499|984|741|210|937|5}}
|-
| 20 || 1/2<sup>20</sup> = 1/{{thinspace|1|048|576}} || < 0.{{thinspace|000|000|5}} || −0.{{thinspace|031|250|000|000|000|000|00}} || +0.{{thinspace|031|249|046|325|683|593|75}}
|}

Fixed-point formats with scaling factors of the form 2<sup>''n''</sup> − 1 (namely 1, 3, 7, 15, 31, etc.) have been said to be appropriate for image processing and other digital signal processing tasks. They are supposed to provide more consistent conversions between fixed- and floating-point values than the usual 2<sup>''n''</sup> scaling. The [[Julia (programming language)|Julia]] programming language implements both versions.<ref name="julia"/>

===Exact values===
Any binary fraction ''a''/2<sup>''m''</sup>, such as 1/16 or 17/32, can be exactly represented in fixed point, with a power-of-two scaling factor 1/2<sup>''n''</sup> with any ''n'' ≥ ''m''. However, most decimal fractions like 0.1 or 0.123 are infinite [[repeating fraction]]s in base 2, and hence cannot be represented that way.

Similarly, any decimal fraction ''a''/10<sup>''m''</sup>, such as 1/100 or 37/1000, can be exactly represented in fixed point with a power-of-ten scaling factor 1/10<sup>''n''</sup> with any ''n'' ≥ ''m''.
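A small C sketch (with example values only; the variable names and the chosen fraction counts are illustrative) shows the difference: the decimal fraction 0.01 can only be approximated with a power-of-two scaling factor such as 1/65536, but is stored exactly with the power-of-ten scaling factor 1/100:

<syntaxhighlight lang="c">
#include <stdio.h>

int main(void) {
    /* Binary fixed point with 16 implied fraction bits (scaling 1/65536):
       0.01 is a repeating fraction in base 2, so only the nearest
       representable value can be stored.                                */
    long bin = (long)(0.01 * 65536.0 + 0.5);   /* 655, the closest integer */
    printf("binary  fixed point: %ld/65536 = %.10f\n", bin, bin / 65536.0);

    /* Decimal fixed point with scaling factor 1/100:
       0.01 is stored exactly as the integer 1.
       (The division by 100.0 below is only for display.)                */
    long dec = 1;
    printf("decimal fixed point: %ld/100   = %.10f\n", dec, dec / 100.0);
    return 0;
}
</syntaxhighlight>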
Such a decimal format can also represent any binary fraction ''a''/2<sup>''m''</sup>, such as 1/8 (0.125) or 17/32 (0.53125).

More generally, a [[rational number]] ''a''/''b'', with ''a'' and ''b'' [[relatively prime]] and ''b'' positive, can be exactly represented in binary fixed point only if ''b'' is a power of 2; and in decimal fixed point only if ''b'' has no [[prime]] factors other than 2 and/or 5.

===Comparison with floating-point===
Fixed-point computations can be faster and/or use less hardware than floating-point ones. If the range of the values to be represented is known in advance and is sufficiently limited, fixed point can make better use of the available bits. For example, if 32 bits are available to represent a number between 0 and 1, a fixed-point representation can have error less than 1.2 × 10<sup>−10</sup>, whereas the standard floating-point representation may have error up to 596 × 10<sup>−10</sup>, because 9 of the bits are wasted with the sign and exponent of the dynamic scaling factor. Specifically, comparing 32-bit fixed-point to [[IEEE 754|floating-point]] audio, a recording requiring less than 40 [[Decibel|dB]] of [[Headroom (audio signal processing)|headroom]] has a higher [[signal-to-noise ratio]] using 32-bit fixed point.

Programs using fixed-point computations are usually more portable than those using floating-point, since they do not depend on the availability of an FPU. This advantage was particularly strong before the [[IEEE Floating Point Standard]] was widely adopted, when floating-point computations with the same data would yield different results depending on the manufacturer, and often on the computer model.

Many embedded processors lack an FPU, because integer arithmetic units require substantially fewer [[logic gate]]s and consume much smaller [[integrated circuit|chip]] area than an FPU; and software [[emulation (computing)|emulation]] of floating point on low-speed devices would be too slow for most applications. CPU chips for earlier [[personal computer]]s and [[game console]]s, like the [[Intel 386]] and [[Intel 486|486SX]], also lacked an FPU.

The ''absolute'' resolution (difference between successive values) of any fixed-point format is constant over the whole range, namely the scaling factor ''S''. In contrast, the ''relative'' resolution of a floating-point format is approximately constant over its whole range, varying within a factor of the base ''b''; whereas its ''absolute'' resolution varies by many orders of magnitude, like the values themselves.

In many cases, the [[Quantization (signal processing)|rounding and truncation]] errors of fixed-point computations are easier to analyze than those of the equivalent floating-point computations. Applying linearization techniques to truncation, such as [[dither]]ing and/or [[noise shaping]], is more straightforward within fixed-point arithmetic.

On the other hand, the use of fixed point requires greater care by the programmer. Avoidance of overflow requires much tighter estimates for the ranges of variables and all intermediate values in the computation, and often also extra code to adjust their scaling factors. Fixed-point programming normally requires the use of [[C data types#Main types|integer types of different widths]].

Fixed-point applications can make use of [[block floating point]], which is a fixed-point environment in which each array (block) of fixed-point data is scaled with a common exponent stored in a single word.
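As an illustration of the extra care mentioned above, a common idiom for multiplying two 32-bit binary fixed-point values with 16 fraction bits widens the product to a 64-bit intermediate and then shifts the surplus fraction bits back out; the wider type and the explicit rescaling are the programmer's responsibility. The <code>q16_16</code> type and the helper function below are illustrative names, not a standard API:

<syntaxhighlight lang="c">
#include <stdint.h>
#include <stdio.h>

/* 32-bit signed binary fixed point with 16 implied fraction bits
   (scaling factor 1/2^16 = 1/65536), often called a Q16.16 format. */
typedef int32_t q16_16;

#define Q16_ONE ((q16_16)1 << 16)   /* the value 1.0 in this format */

/* The raw product of two Q16.16 values is scaled by 1/2^32, so it is
   computed in a 64-bit intermediate and the extra 16 fraction bits
   are shifted back out.  (Right-shifting a negative signed value is
   assumed to be an arithmetic shift, as on current compilers.)      */
static q16_16 q16_mul(q16_16 a, q16_16 b) {
    return (q16_16)(((int64_t)a * (int64_t)b) >> 16);
}

int main(void) {
    q16_16 x = (q16_16)( 1.50 * Q16_ONE);   /*   98304 represents  1.5   */
    q16_16 y = (q16_16)(-2.25 * Q16_ONE);   /* -147456 represents -2.25  */
    q16_16 p = q16_mul(x, y);               /* -221184 represents -3.375 */
    printf("%f\n", p / (double)Q16_ONE);    /* prints -3.375000          */
    return 0;
}
</syntaxhighlight>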