Undefined behavior
{{verify|date=February 2025}}
{{Short description|Unpredictable result when running a program}}
{{Distinguish|Undefined value|Unspecified behavior}}

In [[computer programming]], a program exhibits '''undefined behavior''' ('''UB''') when it contains, or is executing, code for which its [[programming language specification]] does not mandate any specific requirements.<ref>{{cite web|title=What Every C Programmer Should Know About Undefined Behavior #1/3|date=13 May 2011 |url=https://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html|access-date=23 February 2025}}</ref> This is different from [[unspecified behavior]], for which the language specification does not prescribe a result, and from implementation-defined behavior, which defers to the documentation of another component of the [[computing platform|platform]] (such as the [[application binary interface|ABI]] or the [[translator (computing)|translator]] documentation).

In the [[C (programming language)|C programming community]], undefined behavior may be humorously referred to as "'''nasal demons'''", after a [[comp.* hierarchy|comp.std.c]] post that explained undefined behavior as allowing the compiler to do anything it chooses, even "to make demons fly out of your nose".<ref>{{cite web|title=nasal demons|url=http://catb.org/jargon/html/N/nasal-demons.html|website=[[Jargon File]]|access-date=12 June 2014}}</ref>

== Overview ==
Some programming languages allow a program to operate differently from the source code, or even to have a different control flow, as long as it exhibits the same user-visible [[side effect (computer science)|side effects]], ''if undefined behavior never happens during program execution''. Undefined behavior is the name given to the list of conditions that the program must not meet.

In the early versions of [[C (programming language)|C]], undefined behavior's primary advantage was the production of performant [[compiler]]s for a wide variety of machines: a specific construct could be mapped to a machine-specific feature, and the compiler did not have to generate additional code for the runtime to adapt the side effects to match semantics imposed by the language. The program source code was written with prior knowledge of the specific compiler and of the [[computing platform|platforms]] that it would support.

However, progressive standardization of the platforms has made this less of an advantage, especially in newer versions of C. Now, the cases for undefined behavior typically represent unambiguous [[software bug|bugs]] in the code, for example [[array index|indexing an array]] outside of its bounds. By definition, the [[runtime system|runtime]] can assume that undefined behavior never happens; therefore, some invalid conditions do not need to be checked against. For a [[compiler]], this also means that various [[program transformation]]s become valid, or their proofs of correctness are simplified; this allows for various kinds of optimizations whose correctness depends on the assumption that the program state never meets any such condition. The compiler can also remove explicit checks that may have been in the source code, without notifying the programmer; for example, detecting undefined behavior by testing whether it happened is not guaranteed to work, by definition. This makes it hard or impossible to program a portable fail-safe option (non-portable solutions are possible for some constructs).

Current compiler development usually evaluates and compares compiler performance with benchmarks designed around micro-optimizations, even on platforms that are mostly used on the general-purpose desktop and laptop market (such as amd64).
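
The point about removed checks can be made concrete with a small sketch. A test such as <code>a + b < a</code> (with <code>b</code> known to be positive) runs only after the signed addition has already overflowed, so a compiler may treat it as always false. One portable alternative, shown here with a hypothetical helper named <code>checked_add</code> that is not part of any standard library, compares the operands against the limits from <code><limits.h></code> before adding, so the overflowing addition is never executed.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper (name chosen for illustration only).
   Returns true and stores a + b in *out when the sum fits in an int;
   returns false, leaving *out untouched, when it would overflow. */
bool checked_add(int a, int b, int *out)
{
    /* The comparisons use INT_MAX - b and INT_MIN - b, which cannot
       overflow under the guarding sign tests on b. */
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;   /* a + b is never evaluated on this path */

    *out = a + b;       /* safe: the sum is known to be representable */
    return true;
}
```

Because the check happens before the arithmetic, it does not depend on what the implementation does on overflow, and an optimizer has no undefined behavior to reason from.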

Because compilers are compared on benchmark performance, undefined behavior provides ample room for compiler performance improvement, as a specific source code statement is allowed to be mapped to anything at runtime.

For C and C++, the compiler is allowed to give a compile-time diagnostic in these cases, but is not required to: the implementation will be considered correct whatever it does in such cases, analogous to [[don't-care term]]s in digital logic. It is the responsibility of the programmer to write code that never invokes undefined behavior. Compilers now have flags that enable such diagnostics; for example, <code>-fsanitize=undefined</code> enables the "undefined behavior sanitizer" ([[UBSan]]) in [[GNU Compiler Collection|gcc]] 4.9<ref>[https://developers.redhat.com/blog/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/ ''GCC Undefined Behavior Sanitizer – ubsan'']</ref> and in [[clang]]. However, this flag is not the default, and enabling it is a choice of the person who builds the code.

Under some circumstances there can be specific restrictions on undefined behavior. For example, the [[instruction set]] specifications of a [[central processing unit|CPU]] might leave the behavior of some forms of an instruction undefined, but if the CPU supports [[memory protection]] then the specification will probably include a blanket rule stating that no user-accessible instruction may cause a hole in the [[operating system]]'s security; so an actual CPU would be permitted to corrupt user registers in response to such an instruction, but would not be allowed to, for example, switch into [[supervisor mode]].
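
Returning to the sanitizer flag mentioned above, the following is a minimal sketch of code on which it could be tried. The function name <code>next_id</code> and the file name used in the note below are made up for illustration.

```c
/* Hypothetical example function (not from any library).
   The addition is well defined for every x except INT_MAX,
   where the signed result would overflow: undefined behavior. */
int next_id(int x)
{
    return x + 1;
}
```

Assuming the function is saved in a file called <code>ubsan_demo.c</code> together with a caller, building it with <code>gcc -fsanitize=undefined ubsan_demo.c</code> (or the same flag with clang) instruments the addition, so that a call such as <code>next_id(INT_MAX)</code> is reported when the program runs. Calls with any other argument behave as in an uninstrumented build.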

The runtime [[computing platform|platform]] can also provide some restrictions or guarantees on undefined behavior, if the [[toolchain]] or the [[runtime system|runtime]] explicitly documents that specific constructs found in the [[source code]] are mapped to specific well-defined mechanisms available at runtime. For example, an [[interpreter (computing)|interpreter]] may document a particular behavior for some operations that are undefined in the language specification, while other interpreters or compilers for the same language may not. A [[compiler]] produces [[executable code]] for a specific [[application binary interface|ABI]], filling the [[semantic gap]] in ways that depend on the compiler version: the documentation for that compiler version and the ABI specification can provide restrictions on undefined behavior. Relying on these implementation details makes the software non-[[portable application|portable]], but portability may not be a concern if the software is not supposed to be used outside of a specific runtime.

Undefined behavior can result in a program crash or even in failures that are harder to detect and make the program look like it is working normally, such as silent loss of data and production of incorrect results.

== Benefits ==
Documenting an operation as undefined behavior allows compilers to assume that this operation will never happen in a conforming program. This gives the compiler more information about the code, and this information can lead to more optimization opportunities.

An example for the C language:
<syntaxhighlight lang="c">
int foo(unsigned char x)
{
    int value = 2147483600; /* assuming 32-bit int and 8-bit char */
    value += x;
    if (value < 2147483600)
        bar();
    return value;
}
</syntaxhighlight>

The value of <code>x</code> cannot be negative and, given that signed [[integer overflow]] is undefined behavior in C, the compiler can assume that <code>value < 2147483600</code> will always be false.
Thus the <code>if</code> statement, including the call to the function <code>bar</code>, can be ignored by the compiler since the test expression in the <code>if</code> has no [[side effect (computer science)|side effects]] and its condition will never be satisfied. The code is therefore semantically equivalent to:
<syntaxhighlight lang="c">
int foo(unsigned char x)
{
    int value = 2147483600;
    value += x;
    return value;
}
</syntaxhighlight>

Had the compiler been forced to assume that signed integer overflow has ''[[Integer overflow|wraparound]]'' behavior, then the transformation above would not have been legal.

Such optimizations become hard to spot by humans when the code is more complex and other optimizations, like [[inlining]], take place. For example, another function may call the above function:
<syntaxhighlight lang="c">
void run_tasks(unsigned char *ptrx)
{
    int z;
    z = foo(*ptrx);
    while (*ptrx > 60) {
        run_one_task(ptrx, z);
    }
}
</syntaxhighlight>

The compiler is free to optimize away the <code>while</code>-loop here by applying [[value range analysis]]: by inspecting <code>foo()</code>, it knows that the initial value pointed to by <code>ptrx</code> cannot possibly exceed 47 (as any more would trigger undefined behavior in <code>foo()</code>); therefore, the initial check of <code>*ptrx > 60</code> will always be false in a conforming program. Going further, since the result <code>z</code> is now never used and <code>foo()</code> has no side effects, the compiler can optimize <code>run_tasks()</code> to be an empty function that returns immediately. The disappearance of the <code>while</code>-loop may be especially surprising if <code>foo()</code> is defined in a [[interprocedural optimization|separately compiled object file]].

Another benefit from allowing signed integer overflow to be undefined is that it makes it possible to store and manipulate a variable's value in a [[processor register]] that is larger than the size of the variable in the source code.
For example, if the type of a variable as specified in the source code is narrower than the native register width (such as <code>[[C data types#Basic types|int]]</code> on a [[64-bit]] machine, a common scenario), then the compiler can safely use a signed 64-bit integer for the variable in the [[machine code]] it produces, without changing the defined behavior of the code. If a program depended on the behavior of a 32-bit integer overflow, then a compiler would have to insert additional logic when compiling for a 64-bit machine, because the overflow behavior of most machine instructions depends on the register width.<ref>{{Cite web|url=https://gist.github.com/rygorous/e0f055bfb74e3d5f0af20690759de5a7#file-gistfile1-txt-L166|title = A bit of background on compilers exploiting signed overflow}}</ref>

Undefined behavior also allows more compile-time checks by both compilers and [[static program analysis]] tools.{{citation needed|date=December 2019}}

== Risks ==
The C and C++ standards contain many forms of undefined behavior, which give compiler implementations more liberty and enable compile-time checks, at the expense of undefined run-time behavior when such code is present. In particular, the [[International Organization for Standardization|ISO]] standard for C has an appendix listing common sources of undefined behavior.<ref>ISO/IEC 9899:2011 §J.2.</ref> Moreover, compilers are not required to diagnose code that relies on undefined behavior. Hence, it is common for programmers, even experienced ones, to rely on undefined behavior either by mistake, or simply because they are not well-versed in the rules of the language, which can span hundreds of pages. This can result in bugs that are exposed when a different compiler, or different settings, are used.
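
One common instance of such accidental reliance is a hash or checksum loop that lets a signed <code>int</code> overflow and expects the value to wrap. The sketch below, whose function name <code>hash_bytes</code> and multiplier 31 are illustrative choices only, performs the arithmetic in <code>unsigned int</code> instead. Unsigned arithmetic in C is defined to wrap modulo 2<sup>N</sup>, so no step of the loop is undefined, and on a given platform the result should not change with the compiler or its optimization settings.

```c
#include <stddef.h>

/* Illustrative helper (hypothetical name): multiply-and-add hash.
   All arithmetic is done in unsigned int, which wraps modulo 2^N
   by definition, so overflow here is well defined. */
unsigned int hash_bytes(const unsigned char *data, size_t len)
{
    unsigned int h = 0u;

    for (size_t i = 0; i < len; i++)
        h = h * 31u + data[i];   /* data[i] is converted to unsigned int */

    return h;
}
```

Writing the same loop with a plain <code>int</code> accumulator would compile and often appear to work, which is exactly the kind of latent bug described above.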
Testing or [[fuzzing]] with dynamic undefined behavior checks enabled, e.g., the [[Clang]] sanitizers, can help to catch undefined behavior not diagnosed by the compiler or static analyzers.<ref>{{cite web|title=Undefined behavior in 2017, cppcon 2017|author=John Regehr|website=[[YouTube]]|date=19 October 2017 |url=https://www.youtube.com/watch?v=v1COuU2vU_w}}</ref>

Undefined behavior can lead to [[computer security|security]] vulnerabilities in software. For example, buffer overflows and other security vulnerabilities in the major [[web browser]]s have been caused by undefined behavior. When [[GNU C Compiler|GCC]]'s developers changed their compiler in 2008 such that it omitted certain overflow checks that relied on undefined behavior, [[CERT Coordination Center|CERT]] issued a warning against the newer versions of the compiler.<ref>{{cite web |archive-url=https://web.archive.org/web/20080409224149/http://www.kb.cert.org/vuls/id/162289 |url=http://www.kb.cert.org/vuls/id/162289 |archive-date=9 April 2008 |title=Vulnerability Note VU#162289 – gcc silently discards some wraparound checks |date=4 April 2008 |website=Vulnerability Notes Database |publisher=CERT}}</ref> [[Linux Weekly News]] pointed out that the same behavior was observed in [[PathScale|PathScale C]], [[Visual C++|Microsoft Visual C++ 2005]] and several other compilers;<ref>{{cite web |url=http://lwn.net/Articles/278137/ |date=16 April 2008 |author=Jonathan Corbet |title=GCC and pointer overflows |website=[[Linux Weekly News]]}}</ref> the warning was later amended to warn about various compilers.<ref>{{cite web |title=Vulnerability Note VU#162289 – C compilers may silently discard some wraparound checks |url=http://www.kb.cert.org/vuls/id/162289 |orig-year=4 April 2008|date=8 October 2008 |website=Vulnerability Notes Database |publisher=CERT}}</ref>

== Examples in C and C++ ==
The major forms of undefined behavior in C can be broadly classified as:<ref>{{cite web|url=https://blog.regehr.org/archives/1520|title=Undefined Behavior in 2017, Embedded in Academia Blog|date=4 July 2017|author= Pascal Cuoq and John Regehr}}</ref> spatial memory safety violations, temporal memory safety violations, [[integer overflow]], strict aliasing violations, alignment violations, unsequenced modifications, data races, and loops that neither perform I/O nor terminate.

In C, the use of any [[automatic variable]] before it has been initialized yields undefined behavior, as do integer [[division by zero]], signed integer overflow, indexing an array outside of its defined bounds (see [[buffer overflow]]), and [[null pointer]] [[dereference operator|dereferencing]]. In general, any instance of undefined behavior leaves the abstract execution machine in an unknown state, and causes the behavior of the entire program to be undefined.

Attempting to modify a [[string literal]] causes undefined behavior:<ref name="C++03 2.13.4/2">[[International Organization for Standardization|ISO]]/[[International Electrotechnical Commission|IEC]] (2003). ''[[ISO/IEC 14882|ISO/IEC 14882:2003(E): Programming Languages – C++]] §2.13.4 String literals [lex.string]'' para. 2</ref>
<syntaxhighlight lang="cpp">
char *p = "wikipedia"; // valid C, deprecated in C++98/C++03, ill-formed as of C++11
p[0] = 'W'; // undefined behavior
</syntaxhighlight>

Integer [[division by zero]] results in undefined behavior:<ref name="C++03 5.6/4">[[International Organization for Standardization|ISO]]/[[International Electrotechnical Commission|IEC]] (2003). ''[[ISO/IEC 14882|ISO/IEC 14882:2003(E): Programming Languages – C++]] §5.6 Multiplicative operators [expr.mul]'' para. 4</ref>
<syntaxhighlight lang="cpp">
int x = 1;
return x / 0; // undefined behavior
</syntaxhighlight>

Certain pointer operations may result in undefined behavior:<ref name="C++03 5.6/5">[[International Organization for Standardization|ISO]]/[[International Electrotechnical Commission|IEC]] (2003).
''[[ISO/IEC 14882|ISO/IEC 14882:2003(E): Programming Languages – C++]] §5.7 Additive operators [expr.add]'' para. 5</ref>
<syntaxhighlight lang="cpp">
int arr[4] = {0, 1, 2, 3};
int *p = arr + 5;  // undefined behavior for indexing out of bounds
p = NULL;
int a = *p;        // undefined behavior for dereferencing a null pointer
</syntaxhighlight>

In C and C++, the relational comparison of [[pointer (computer programming)|pointer]]s to objects (for less-than or greater-than comparison) is only strictly defined if the pointers point to members of the same object, or elements of the same [[array (data structure)|array]].<ref name="C++03 5.9/2">[[International Organization for Standardization|ISO]]/[[International Electrotechnical Commission|IEC]] (2003). ''[[ISO/IEC 14882|ISO/IEC 14882:2003(E): Programming Languages – C++]] §5.9 Relational operators [expr.rel]'' para. 2</ref> Example:
<syntaxhighlight lang="cpp">
int main(void)
{
    int a = 0;
    int b = 0;
    return &a < &b; /* undefined behavior */
}
</syntaxhighlight>

Reaching the end of a value-returning function (other than <code>main()</code>) without a return statement results in undefined behavior if the value of the function call is used by the caller:<ref name="C99 6.9.1/12">[[International Organization for Standardization|ISO]]/[[International Electrotechnical Commission|IEC]] (2007). ''[[ISO/IEC 9899|ISO/IEC 9899:2007(E): Programming Languages – C]] §6.9 External definitions'' para.
1</ref>
<syntaxhighlight lang="c">
int f()
{
}  /* undefined behavior if the value of the function call is used */
</syntaxhighlight>

Modifying an object between two [[sequence point]]s more than once produces undefined behavior.<ref>ANSI X3.159-1989 ''Programming Language C'', footnote 26</ref> There are considerable changes in what causes undefined behavior in relation to sequence points as of C++11.<ref name=":0">{{cite web|url=http://en.cppreference.com/w/cpp/language/eval_order|title=Order of evaluation - cppreference.com|work=en.cppreference.com|access-date=9 August 2016}}</ref> Modern compilers can emit warnings when they encounter multiple unsequenced modifications to the same object.<ref>{{cite web | title=Warning Options (Using the GNU Compiler Collection (GCC)) | website=GCC, the GNU Compiler Collection - GNU Project - Free Software Foundation (FSF) | url=https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html | access-date=2021-07-09}}</ref><ref>{{cite web | title=Diagnostic flags in Clang | website=Clang 13 documentation | url=https://clang.llvm.org/docs/DiagnosticsReference.html#wunsequenced | access-date=2021-07-09}}</ref> The following example will cause undefined behavior in both C and C++.
<syntaxhighlight lang="c">
int f(int i) {
    return i++ + i++; /* undefined behavior: two unsequenced modifications to i */
}
</syntaxhighlight>

When modifying an object between two sequence points, reading the value of the object for any other purpose than determining the value to be stored is also undefined behavior.<ref name="C99 6.5/2">[[International Organization for Standardization|ISO]]/[[International Electrotechnical Commission|IEC]] (1999). ''[[ISO/IEC 9899|ISO/IEC 9899:1999(E): Programming Languages – C]] §6.5 Expressions'' para.
2</ref>
<syntaxhighlight lang="c">
a[i] = i++; // undefined behavior
printf("%d %d\n", ++n, power(2, n)); // also undefined behavior
</syntaxhighlight>

In C/C++, [[logical shift|bitwise shifting]] a value by a number of bits which is either a negative number or is greater than or equal to the total number of bits in this value results in undefined behavior. The safest way (regardless of compiler vendor) is to always keep the number of bits to shift (the right operand of the <code><<</code> and <code>>></code> [[bitwise operation|bitwise operators]]) within the range: [<code>0, [[sizeof]] value * CHAR_BIT - 1</code>] (where <code>value</code> is the left operand).
<syntaxhighlight lang="c">
int num = -1;
unsigned int val = 1 << num; // shifting by a negative number - undefined behavior

num = 32; // or whatever number greater than 31
val = 1 << num; // the literal '1' is typed as a 32-bit integer - in this case shifting by more than 31 bits is undefined behavior

num = 64; // or whatever number greater than 63
unsigned long long val2 = 1ULL << num; // the literal '1ULL' is typed as a 64-bit integer - in this case shifting by more than 63 bits is undefined behavior
</syntaxhighlight>

== Examples in Rust ==
While undefined behavior is never present in safe [[Rust (programming language)|Rust]], it is possible to invoke undefined behavior in unsafe Rust in many ways.<ref>{{cite web | title=Behavior considered undefined | website=The Rust Reference | url=https://doc.rust-lang.org/reference/behavior-considered-undefined.html | access-date=2022-11-28}}</ref> For example, creating an invalid reference (a reference which does not refer to a valid value) invokes immediate undefined behavior:
<syntaxhighlight lang="rust">
fn main() {
    // The following line invokes immediate undefined behaviour.
    let _null_reference: &i32 = unsafe { std::mem::zeroed() };
}
</syntaxhighlight>

It is not necessary to use the reference; undefined behavior is invoked merely from the creation of such a reference.

== See also ==
* [[Compiler]]
* [[Halt and Catch Fire (computing)|Halt and Catch Fire]]
* [[Unspecified behavior]]

== References ==
{{Reflist|30em}}

== Further reading ==
* [[Peter van der Linden]], ''Expert C Programming''. {{ISBN|0-13-177429-8}}
* [https://blog.regehr.org/archives/1234 UB Canaries] (April 2015), John Regehr (University of Utah, USA)
* [https://blog.regehr.org/archives/1520 Undefined Behavior in 2017] (July 2017), Pascal Cuoq (TrustInSoft, France) and John Regehr (University of Utah, USA)

== External links ==
* [https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf Corrected version of the C99 standard]. See section 6.10.6 for #pragma

{{DEFAULTSORT:Undefined behavior}}
[[Category:Programming language implementation]]
[[Category:C (programming language)]]
[[Category:C++]]
[[Category:Articles with example C++ code]]