==IEEE&nbsp;754 floating-point support==
A major feature of C99 is its numerics support, in particular its access to the features of [[IEEE&nbsp;754-1985]] (also known as IEC&nbsp;60559) [[floating-point]] hardware, which is present in the vast majority of modern processors. This support is specified in "Annex F IEC 60559 floating-point arithmetic". Platforms without IEEE&nbsp;754 hardware can also implement it in software.<ref name="grouper.ieee.org"/>

On platforms with IEEE&nbsp;754 floating point:
{{unordered list
|1= <code>float</code> is defined as IEEE&nbsp;754 [[Single-precision floating-point format|single precision]], <code>double</code> is defined as [[Double-precision floating-point format|double precision]], and <code>[[long double]]</code> is defined as IEEE&nbsp;754 [[extended precision]] (e.g., Intel 80-bit [[extended precision|double extended]] precision on [[x86]] or [[x86-64]] platforms), or some form of [[Quadruple-precision floating-point format|quad precision]] where available; otherwise, it is double precision.  
|2= The four arithmetic operations and square root are correctly rounded as defined by IEEE&nbsp;754.
{{(!}} class="wikitable floatright" style="margin-left: 1.5em; font-family:monospace;"
{{!-}}
!  FLT_EVAL_METHOD !! float !! double !! long double
{{!-}}
{{!}}0{{!!}}float{{!!}}double{{!!}}long double
{{!-}}
{{!}}1{{!!}}double{{!!}}double{{!!}}long double
{{!-}} 
{{!}}2{{!!}}long double{{!!}}long double{{!!}}long double
{{!)}}
|3= Expression evaluation is defined to be performed in one of three well-defined methods, which indicate whether floating-point variables are first promoted to a more precise format in expressions: <code>FLT_EVAL_METHOD == 2</code> indicates that all internal intermediate computations are performed by default at high precision (long double) where available (e.g., [[extended precision|80-bit double extended]]), <code>FLT_EVAL_METHOD == 1</code> performs all internal intermediate computations in double precision (unless an operand is long double), while <code>FLT_EVAL_METHOD == 0</code> specifies that each operation is evaluated only at the precision of the widest operand of each operator. The intermediate result types for operands of a given precision are summarized in the adjacent table.}}
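
The relationship between <code>FLT_EVAL_METHOD</code> and the evaluation types <code>float_t</code> and <code>double_t</code> from <code>&lt;math.h&gt;</code> can be illustrated with a minimal sketch. The helper names <code>eval_method_name</code> and <code>eval_types_consistent</code> are hypothetical and are not part of C99; comparing sizes is only a rough stand-in for comparing types.

<syntaxhighlight lang="c">
#include <float.h>  /* FLT_EVAL_METHOD */
#include <math.h>   /* float_t, double_t */

/* Hypothetical helper: describes a value of FLT_EVAL_METHOD in words. */
const char *eval_method_name(int method)
{
        switch (method) {
        case 0:  return "each operation evaluated in the type of its operands";
        case 1:  return "float and double operations evaluated as double";
        case 2:  return "all operations evaluated as long double";
        default: return "indeterminate or implementation-defined";
        }
}

/* Hypothetical helper: returns 1 if the sizes of float_t and double_t
   match what the reported evaluation method implies, 0 otherwise. */
int eval_types_consistent(void)
{
#if FLT_EVAL_METHOD == 0
        return sizeof(float_t) == sizeof(float)
            && sizeof(double_t) == sizeof(double);
#elif FLT_EVAL_METHOD == 1
        return sizeof(float_t) == sizeof(double)
            && sizeof(double_t) == sizeof(double);
#elif FLT_EVAL_METHOD == 2
        return sizeof(float_t) == sizeof(long double)
            && sizeof(double_t) == sizeof(long double);
#else
        return 1;  /* no portable expectation for other values */
#endif
}
</syntaxhighlight>

A program could call <code>eval_method_name(FLT_EVAL_METHOD)</code> at start-up to log the evaluation method it was compiled with.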

<code>FLT_EVAL_METHOD == 2</code> tends to limit the risk of [[Round-off error|rounding errors]] affecting numerically unstable expressions (see [[floating point#IEEE 754 design rationale|IEEE&nbsp;754 design rationale]]) and is the designed default method for [[x87]] hardware, but it yields unintuitive behaviour for the unwary user.<ref>{{cite web|url=https://www.validlab.com/goldberg/addendum.html|title=Differences Among IEEE 754 Implementations|author=Doug Priest|year=1997}}</ref> <code>FLT_EVAL_METHOD == 1</code> was the default evaluation method originally used in [[K&R C|K&R&nbsp;C]], which promoted all floats to double in expressions. <code>FLT_EVAL_METHOD == 0</code> is also commonly used and specifies a strict "evaluate to type" of the operands. (For [[GNU Compiler Collection|gcc]], <code>FLT_EVAL_METHOD&nbsp;==&nbsp;2</code> is the default on 32-bit x86 and <code>FLT_EVAL_METHOD&nbsp;==&nbsp;0</code> is the default on 64-bit x86-64, but <code>FLT_EVAL_METHOD&nbsp;==&nbsp;2</code> can be selected on x86-64 with the option <code>-mfpmath=387</code>.) Before C99, compilers could round intermediate results inconsistently, especially when using [[x87]] floating-point hardware, leading to compiler-specific behaviour;<ref name=stackinterview>{{cite web|url=https://drdobbs.com/architecture-and-design/184410314 | title=A conversation with William Kahan. | author=Jack Woehr |date=1 November 1997}}</ref> such inconsistencies are not permitted in compilers conforming to C99 (Annex F).
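
In C99, any extra range and precision carried by an intermediate result is removed by assignment and by casts, which gives code a well-defined point at which values are narrowed. The following sketch accumulates in <code>double_t</code> so that intermediate sums stay in the evaluation format, and narrows only once on return. The function name <code>sum_in_eval_format</code> is hypothetical.

<syntaxhighlight lang="c">
#include <math.h>    /* double_t */
#include <stddef.h>  /* size_t */

/* Hypothetical helper: sums n doubles, keeping the running total in the
   evaluation format (double_t) instead of narrowing it at every step. */
double sum_in_eval_format(const double *x, size_t n)
{
        double_t acc = 0.0;

        for (size_t i = 0; i < n; i++)
                acc += x[i];

        return (double)acc;  /* the cast removes any extra range and precision */
}
</syntaxhighlight>

With <code>FLT_EVAL_METHOD == 0</code> this behaves like an ordinary <code>double</code> accumulator; with <code>FLT_EVAL_METHOD == 2</code> the running total is held as <code>long double</code>.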

=== Example ===
The following annotated example C99 code for computing a continued fraction function demonstrates the main features:

<syntaxhighlight lang=C line highlight="9,11,13,15,21,23,25,36,42">
#include <stdio.h>
#include <math.h>
#include <float.h>
#include <fenv.h>
#include <tgmath.h>
#include <stdbool.h>
#include <assert.h>

double compute_fn(double z)  // [1]
{
        #pragma STDC FENV_ACCESS ON  // [2]

        assert(FLT_EVAL_METHOD == 2);  // [3]

        if (isnan(z))  // [4]
                puts("z is not a number");

        if (isinf(z))
                puts("z is infinite");

        long double r = 7.0 - 3.0/(z - 2.0 - 1.0/(z - 7.0 + 10.0/(z - 2.0 - 2.0/(z - 3.0)))); // [5, 6]

        feclearexcept(FE_DIVBYZERO);  // [7]

        bool raised = fetestexcept(FE_OVERFLOW);  // [8]

        if (raised)
                puts("Unanticipated overflow.");

        return r;
}

int main(void)
{
        #ifndef __STDC_IEC_559__
        puts("Warning: __STDC_IEC_559__ not defined. IEEE 754 floating point not fully supported."); // [9]
        #endif

        #pragma STDC FENV_ACCESS ON

        #ifdef TEST_NUMERIC_STABILITY_UP
        fesetround(FE_UPWARD);                   // [10]
        #elif defined(TEST_NUMERIC_STABILITY_DOWN)
        fesetround(FE_DOWNWARD);
        #endif

        printf("%.7g\n", compute_fn(3.0));
        printf("%.7g\n", compute_fn(NAN));

        return 0;
}
</syntaxhighlight>

Footnotes:
# Compile with: {{code|lang=bash|1=gcc -std=c99 -mfpmath=387 -o test_c99_fp test_c99_fp.c -lm}}
# As the IEEE&nbsp;754 status flags are manipulated in this function, this #pragma is needed to avoid the compiler incorrectly rearranging such tests when optimising. (Pragmas are usually implementation-defined, but those prefixed with <code>STDC</code> are defined in the C standard.)
# C99 defines a limited number of expression evaluation methods: the current compilation mode can be checked to ensure it meets the assumptions the code was written under.
# The special values such as [[NaN]] and positive or negative infinity can be tested and set.
# <code>long double</code> is defined as IEEE 754 double extended or quad precision if available. Using higher precision than required for intermediate computations can minimize [[round-off error]]<ref name=Baleful>{{cite web |url=https://www.cs.berkeley.edu/~wkahan/ieee754status/baleful.pdf |title=The Baleful Effect of Computer Benchmarks upon Applied Mathematics, Physics and Chemistry| author=William Kahan |date=11 June 1996}}</ref> (the [[typedef]] <code>double_t</code> can be used for code that is portable under all <code>FLT_EVAL_METHOD</code>s).
# The main function to be evaluated. Although it appears that some arguments to this continued fraction, e.g., 3.0, would lead to a divide-by-zero error, the function is in fact well-defined at 3.0: division by 0 simply returns +infinity, which then correctly leads to a finite result. IEEE 754 is defined not to trap on such exceptions by default and is designed so that they can very often be ignored, as in this case. (If <code>FLT_EVAL_METHOD</code> is defined as 2, then all internal computations, including constants, will be performed in long double precision; if <code>FLT_EVAL_METHOD</code> is defined as 0, then additional care is needed to ensure this, possibly including additional casts and explicit specification of constants as long double.)
# As the raised divide-by-zero flag is not an error in this case, it can simply be dismissed to clear the flag for use by later code.
# In some cases, other exceptions may be regarded as an error, such as overflow (although it can in fact be shown that this cannot occur in this case).
# <code>__STDC_IEC_559__</code> is to be defined only if "Annex F IEC 60559 floating-point arithmetic" is fully implemented by the compiler and the C library (users should be aware that this macro is sometimes defined when it should not be).
# The default rounding mode for IEEE 754 is round to nearest (with the even rounding rule in the halfway cases), but explicitly setting the rounding mode toward + and - infinity (by defining <code>TEST_NUMERIC_STABILITY_UP</code> etc. in this example, when debugging) can be used to diagnose numerical instability.<ref>{{cite web|url=https://www.cs.berkeley.edu/~wkahan/Mindless.pdf | title=How Futile are Mindless Assessments of Roundoff in Floating-Point Computation? | author=William Kahan |date=11 January 2006}}</ref> This method can be used even if <code>compute_fn()</code> is part of a separately compiled binary library. However, depending on the function, numerical instabilities cannot always be detected.
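
The rounding-mode diagnostic described in the last footnote can also be applied at run time rather than through compile-time macros. The following is a minimal sketch, assuming the platform defines <code>FE_UPWARD</code> and <code>FE_DOWNWARD</code>; the names <code>eval_with_rounding</code>, <code>rounding_spread</code> and <code>sample_fn</code> are hypothetical. A compiler that ignores the <code>FENV_ACCESS</code> pragma may optimise across the rounding-mode changes, in which case the reported spread can be misleadingly small.

<syntaxhighlight lang="c">
#include <fenv.h>

#pragma STDC FENV_ACCESS ON

/* Hypothetical helper: evaluates f(x) under the given rounding mode and
   then restores the rounding mode that was in effect on entry. */
double eval_with_rounding(double (*f)(double), double x, int mode)
{
        int saved = fegetround();
        double result;

        if (fesetround(mode) != 0)
                return f(x);  /* mode not available: use the current mode */

        result = f(x);
        fesetround(saved);
        return result;
}

/* Hypothetical helper: difference between the results obtained when
   rounding upward and when rounding downward. A spread that is large
   relative to the result suggests sensitivity to rounding. */
double rounding_spread(double (*f)(double), double x)
{
        double lo = eval_with_rounding(f, x, FE_DOWNWARD);
        double hi = eval_with_rounding(f, x, FE_UPWARD);

        return hi - lo;
}

/* Hypothetical function under test; its result is inexact for most x. */
double sample_fn(double x)
{
        return x / 3.0;
}
</syntaxhighlight>

For example, <code>rounding_spread(sample_fn, 1.0)</code> reports how far apart the two directed-rounding results of <code>1.0 / 3.0</code> are. As with the macro-based approach, a small spread does not prove that a function is numerically stable.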