==Data structures==
{{main|C data types}}

===Primitive data types===
The C programming language represents numbers in three forms: ''[[Integer (computer science)|integral]]'', ''[[Real data type|real]]'' and ''[[Complex data type|complex]]''. This distinction reflects similar distinctions in the [[instruction set]] architecture of most [[central processing unit]]s. ''Integral'' data types store numbers in the set of [[integer]]s, while ''real'' and ''complex'' numbers represent numbers (or pairs of numbers) in the set of [[real number]]s in [[floating-point arithmetic|floating-point]] form.

Most C integer types have {{code|signed}} and {{code|unsigned}} variants. If {{code|signed}} or {{code|unsigned}} is not specified explicitly, in most circumstances, {{code|signed}} is assumed. However, for historic reasons, plain {{code|char}} is a type distinct from both {{code|signed char}} and {{code|unsigned char}}. It may be a signed type or an unsigned type, depending on the compiler and the character set (C guarantees that members of the C basic character set have nonnegative values). Also, [[bit field]] types specified as plain {{code|int}} may be signed or unsigned, depending on the compiler.

====Integer types====
C's integer types come in different fixed sizes, capable of representing various ranges of numbers. The type {{code|char}} occupies exactly one [[byte]] (the smallest addressable storage unit), which is typically 8 bits wide. (Although {{code|char}} can represent any of C's "basic" characters, a wider type may be required for international character sets.) Most integer types have both [[signedness|signed and unsigned]] varieties, designated by the {{code|signed}} and {{code|unsigned}} keywords.
Signed integer types always use the [[two's complement]] [[Signed number representations|representation]], since [[C23 (C standard revision)|C23]]<ref name="N2412">{{cite web |title=WG14-N2412: Two's complement sign representation |url=https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2412.pdf |website=open-std.org |archive-url=https://web.archive.org/web/20221227174224/https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2412.pdf |archive-date=December 27, 2022 |date=August 11, 2019 |url-status=live}}</ref> (and in practice before; in older C versions before C23 the representation might alternatively have been [[ones' complement]], or [[sign-and-magnitude]], but in practice that has not been the case for decades on modern hardware). In many cases, there are multiple equivalent ways to designate the type; for example, {{code|signed short int}} and {{code|short}} are synonymous. The representation of some types may include unused "padding" bits, which occupy storage but are not included in the width. The following table provides a complete list of the standard integer types and their ''minimum'' allowed widths (including any sign bit). {| class="wikitable" |+ Specifications for standard integer types |- ! Shortest form of specifier !! 
Minimum width (bits) |- | {{code|bool}} | style="text-align: center" | 1 |- | {{code|char}} | style="text-align: center" | 8 |- | {{code|signed char}} | style="text-align: center" | 8 |- | {{code|unsigned char}} | style="text-align: center" | 8 |- | {{code|short}} | style="text-align: center" | 16 |- | {{code|unsigned short}} | style="text-align: center" | 16 |- | {{code|int}} | style="text-align: center" | 16 |- | {{code|unsigned int}} | style="text-align: center" | 16 |- | {{code|long}} | style="text-align: center" | 32 |- | {{code|unsigned long}} | style="text-align: center" | 32 |- | {{code|long long}}<ref group="note" name="long long">The {{code|long long}} modifier was introduced in the [[C99]] standard.</ref> | style="text-align: center" | 64 |- | {{code|unsigned long long}}<ref group="note" name="long long"/> | style="text-align: center" | 64 |} The {{code|char}} type is distinct from both {{code|signed char}} and {{code|unsigned char}}, but is guaranteed to have the same representation as one of them. The {{code|_Bool}} and {{code|long long}} types are standardized since 1999, and may not be supported by older C compilers. Type {{code|_Bool}} is usually accessed via the <code>[[typedef]]</code> name {{code|bool}} defined by the standard header <code><[[stdbool.h]]></code>, however since C23 the {{code|_Bool}} type has been renamed {{code|bool}}, and <code><stdbool.h></code> has been deprecated. In general, the widths and representation scheme implemented for any given platform are chosen based on the machine architecture, with some consideration given to the ease of importing source code developed for other platforms. The width of the {{code|int}} type varies especially widely among C implementations; it often corresponds to the most "natural" word size for the specific platform. The standard header [[limits.h]] defines macros for the minimum and maximum representable values of the standard integer types as implemented on any specific platform. 
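
As a minimal sketch of how the <code>limits.h</code> macros can be used, the following functions query the implementation's integer properties instead of assuming them. The function names <code>int_storage_bits</code> and <code>fits_in_short</code> are illustrative only and are not part of any standard header.

```c
#include <limits.h>   /* CHAR_BIT, SHRT_MIN, SHRT_MAX */
#include <stddef.h>   /* size_t */

/* Storage size of int in bits on this implementation.
   This counts any padding bits, so it may exceed the width. */
size_t int_storage_bits(void)
{
    return sizeof(int) * CHAR_BIT;
}

/* Returns 1 if value can be stored in a short without loss
   on this implementation, and 0 otherwise. */
int fits_in_short(long value)
{
    return value >= SHRT_MIN && value <= SHRT_MAX;
}
```

Because the standard only fixes minimum widths, <code>int_storage_bits()</code> is at least 16 on every conforming implementation but commonly returns 32.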
In addition to the standard integer types, there may be other "extended" integer types, which can be used for {{code|typedef}}s in standard headers. For more precise specification of width, programmers can and should use {{code|typedef}}s from the standard header [[stdint.h]]. Integer constants may be specified in source code in several ways. Numeric values can be specified as [[decimal]] (example: {{code|1022}}), [[octal]] with zero ({{code|0}}) as a prefix ({{code|01776}}), or [[hexadecimal]] with {{code|0x}} (zero x) as a prefix ({{code|0x3FE}}). A character in single quotes (example: {{code|'R'}}), called a "character constant," represents the value of that character in the execution character set, with type {{code|int}}. Except for character constants, the type of an integer constant is determined by the width required to represent the specified value, but is always at least as wide as {{code|int}}. This can be overridden by appending an explicit length and/or signedness modifier; for example, {{code|12lu}} has type {{code|unsigned long}}. There are no negative integer constants, but the same effect can often be obtained by using a unary negation operator "{{code|-}}". ====Enumerated type==== The [[enumerated type]] in C, specified with the {{code|enum}} keyword, and often just called an "enum" (usually pronounced {{IPAc-en|'|i:|n|V|m}} {{respell|EE|num}} or {{IPAc-en|'|i:|n|u:|m}} {{respell|EE|noom}}), is a type designed to represent values across a series of named constants. Each of the enumerated constants has type {{code|int}}. Each {{code|enum}} type itself is compatible with {{code|char}} or a signed or unsigned integer type, but each implementation defines its own rules for choosing a type. Some compilers warn if an object with enumerated type is assigned a value that is not one of its constants. 
However, such an object can be assigned any values in the range of their compatible type, and {{code|enum}} constants can be used anywhere an integer is expected. For this reason, {{code|enum}} values are often used in place of preprocessor {{code|#define}} directives to create named constants. Such constants are generally safer to use than macros, since they reside within a specific identifier namespace. An enumerated type is declared with the {{code|enum}} specifier and an optional name (or ''tag'') for the enum, followed by a list of one or more constants contained within curly braces and separated by commas, and an optional list of variable names. Subsequent references to a specific enumerated type use the {{code|enum}} keyword and the name of the enum. By default, the first constant in an enumeration is assigned the value zero, and each subsequent value is incremented by one over the previous constant. Specific values may also be assigned to constants in the declaration, and any subsequent constants without specific values will be given incremented values from that point onward. For example, consider the following declaration: <syntaxhighlight lang=C>enum colors { RED, GREEN, BLUE = 5, YELLOW } paint_color;</syntaxhighlight> This declares the {{code|enum colors}} type; the {{code|int}} constants {{code|RED}} (whose value is 0), {{code|GREEN}} (whose value is one greater than {{code|RED}}, 1), {{code|BLUE}} (whose value is the given value, 5), and {{code|YELLOW}} (whose value is one greater than {{code|BLUE}}, 6); and the {{code|enum colors}} variable {{code|paint_color}}. The constants may be used outside of the context of the {{code|enum}} (where any integer value is allowed), and values other than the constants may be assigned to {{code|paint_color}}, or any other variable of type {{code|enum colors}}. ====Floating-point types==== A floating-point form is used to represent numbers with a fractional component. 
They do not, however, represent most rational numbers exactly; they are instead a close approximation. There are three standard types of real values, denoted by their specifiers (and since [[C23 (C standard revision)|C23]] three more decimal types): single precision ({{code|float}}), double precision ({{code|double}}), and double extended precision ({{code|long double}}). Each of these may represent values in a different form, often one of the [[IEEE floating-point]] formats. {| class="wikitable" width="80%" |+ Floating-point types |- ! rowspan="2" | Type specifiers ! colspan="2" | Precision (decimal digits) ! colspan="2" | Exponent range |- ! Minimum ! IEEE 754 ! Minimum ! IEEE 754 |- | {{code|float}} | align="center" | 6 | align="center" | 7.2 (24 bits) | align="center" | ±37 | align="center" | ±38 (8 bits) |- | {{code|double}} | align="center" | 10 | align="center" | 15.9 (53 bits) | align="center" | ±37 | align="center" | ±307 (11 bits) |- | {{code|long double}} | align="center" | 10 | align="center" | 34.0 (113 bits) | align="center" | ±37 | align="center" | ±4931 (15 bits) |} Floating-point constants may be written in [[decimal notation]], e.g. {{code|1.23}}. [[Decimal scientific notation]] may be used by adding {{code|e}} or {{code|E}} followed by a decimal exponent, also known as [[E notation]], e.g. {{code|1.23e2}} (which has the value 1.23 × 10<sup>2</sup> = 123.0). Either a decimal point or an exponent is required (otherwise, the number is parsed as an integer constant). [[Hexadecimal floating-point constant]]s follow similar rules, except that they must be prefixed by {{code|0x}} and use {{code|p}} or {{code|P}} to specify a binary exponent, e.g. {{code|0xAp-2}} (which has the value 2.5, since A<sub>h</sub> × 2<sup>−2</sup> = 10 × 2<sup>−2</sup> = 10 ÷ 4). 
Both decimal and hexadecimal floating-point constants may be suffixed by {{code|f}} or {{code|F}} to indicate a constant of type {{code|float}}, by {{code|l}} (letter {{code|l}}) or {{code|L}} to indicate type {{code|long double}}, or left unsuffixed for a {{code|double}} constant. The standard header file [[float.h|{{code|float.h}}]] defines the minimum and maximum values of the implementation's floating-point types {{code|float}}, {{code|double}}, and {{code|long double}}. It also defines other limits that are relevant to the processing of floating-point numbers. [[C23 (C standard revision)|C23]] introduces three additional ''decimal'' (as opposed to binary) real floating-point types: _Decimal32, _Decimal64, and _Decimal128. <!-- It's unclear if decimal is optional as keywords, or just in library: Support for the ISO/IEC 60559:2020, the current version of the [[IEEE 754|IEEE 754 standard]] for floating-point arithmetic, with extended binary floating-point arithmetic and (optional) decimal floating-point arithmetic.<ref name="N2341">{{cite web |title=WG14-N2341: ISO/IEC TS 18661-2 - Floating-point extensions for C - Part 2: Decimal floating-point arithmetic |url=https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2341.pdf |website=open-std.org |archive-url=https://web.archive.org/web/20221121122559/https://open-std.org/JTC1/SC22/WG14/www/docs/n2341.pdf |archive-date=November 21, 2022 |date=February 26, 2019 |url-status=live}}</ref><ref name="N2601">{{cite web |title=WG14-N2601: Annex X - IEC 60559 interchange and extended types |url=https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2601.pdf |website=open-std.org |archive-url=https://web.archive.org/web/20221014221322/https://open-std.org/JTC1/SC22/WG14/www/docs/n2601.pdf |archive-date=October 14, 2022 |date=October 15, 2020 |url-status=live}}</ref> --> : NOTE C does not specify a radix for '''float''', '''double''', and '''long double'''. 
An implementation can choose the representation of '''float''', '''double''', and '''long double''' to be the same as the decimal floating types.<ref name="N2341">{{cite web |title=WG14-N2341: ISO/IEC TS 18661-2 - Floating-point extensions for C - Part 2: Decimal floating-point arithmetic |url=https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2341.pdf |website=open-std.org |archive-url=https://web.archive.org/web/20221121122559/https://open-std.org/JTC1/SC22/WG14/www/docs/n2341.pdf |archive-date=November 21, 2022 |date=February 26, 2019 |url-status=live}}</ref> Despite that, the radix has historically been binary (base 2), meaning that numbers like 1/2 or 1/4 are exact, but not 1/10, 1/100 or 1/3. With decimal floating point, all of those exactly representable numbers remain exact, as do numbers like 1/10 and 1/100, but 1/3, for example, is still not exact. No known implementation opts into a decimal radix for the types that have traditionally been binary: most computers lack hardware support for decimal floating point, and the few that have it (e.g. IBM mainframes since [[IBM System z10]]) can use the explicitly decimal types instead. <!-- Lots more keywords added in C23 (also for the preprocessor), see at https://en.cppreference.com/w/c/keyword such as _BitInt, typeof and thread_local as opposed to older _Thread_local. NOT mentioned there, only at https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2601.pdf (and note C23 is not yet finalized): X.5.1 Keywords _Float32x _Float64x _Float128x .. X.5.2 Constants [1] This subclause specifies constants of interchange and extended floating types. [2] This subclause expands floating-suffix (6.4.4.2) to also include: fN FN fNx FNx dN DN dNx DNx .. The type specifiers _FloatN (where N is 16, 32, 64, or ≥ 128 and a multiple of 32), _Float32x, _Float64x, _Float128x, _DecimalN (where N is 96 or > 128 and a multiple of 32), _Decimal64x, and _Decimal128x shall not be used if the implementation does not support the corresponding types (see 6.10.8.3 and X.2). .. 
— _DecimalN, where N is 96 or > 128 and a multiple of 32 — _Decimal64x — _Decimal128x — _FloatN _Complex, where N is 16, 32, 64, or ≥ 128 and a multiple of 32 --> ====Storage class specifiers==== Every object has a storage class.{{cn|date=April 2025}} This specifies most basically the storage ''duration,'' which may be static (default for global), automatic (default for local), or dynamic (allocated), together with other features (linkage and register hint).{{cn|date=April 2025}} {| class="wikitable" |+ Storage classes |- ! Specifiers ! Lifetime ! Scope ! Default initializer |- | {{code|auto}} | Block (stack) | Block | Uninitialized |- | {{code|register}} | Block (stack or CPU register) | Block | Uninitialized |- | {{code|static}} | Program | Block or compilation unit | Zero |- | {{code|extern}} | Program | Global (entire program) | Zero |- | {{code|_Thread_local}} | Thread | | |- | ''(none)''<sup>1</sup> | Dynamic (heap) | | Uninitialized (initialized to {{code|0}} if using {{code|calloc()}}) |} :<sup>1</sup> Allocated and deallocated using the {{code|malloc()}} and {{code|free()}} library functions. Variables declared within a [[block (programming)|block]] by default have automatic storage, as do those explicitly declared with the [[Automatic variable|{{code|auto}}]]<ref group="note">The meaning of auto is a type specifier rather than a storage class specifier in C++0x</ref> or [[Register (keyword)|{{code|register}}]] storage class specifiers. The {{code|auto}} and {{code|register}} specifiers may only be used within functions and function argument declarations;{{cn|date=April 2025}} as such, the {{code|auto}} specifier is always redundant. Objects declared outside of all blocks and those explicitly declared with the [[Static variable|{{code|static}}]] storage class specifier have static storage duration. 
Static variables are initialized to zero by default by the [[compiler]].{{cn|date=April 2025}} Objects with automatic storage are local to the block in which they were declared and are discarded when the block is exited. Additionally, objects declared with the {{code|register}} storage class may be given higher priority by the compiler for access to [[Register (computing)|registers]]; although the compiler may choose not to actually store any of them in a register. Objects with this storage class may not be used with the address-of ({{code|&}}) unary operator. Objects with static storage persist for the program's entire duration. In this way, the same object can be accessed by a function across multiple calls. Objects with allocated storage duration are created and destroyed explicitly with [[malloc|{{code|malloc}}]], {{code|free}}, and related functions. The [[External variable|{{code|extern}}]] storage class specifier indicates that the storage for an object has been defined elsewhere. When used inside a block, it indicates that the storage has been defined by a declaration outside of that block. When used outside of all blocks, it indicates that the storage has been defined outside of the compilation unit. The {{code|extern}} storage class specifier is redundant when used on a function declaration. It indicates that the declared function has been defined outside of the compilation unit. The [[Thread-local storage|{{code|_Thread_local}}]] (<code>thread_local</code> in [[C++]], and in C since [[C23 (C standard revision)|C23]],{{cn|date=April 2025}} and in earlier versions of C if the header <code><threads.h></code> is included) storage class specifier, introduced in [[C11 (C standard revision)|C11]], is used to declare a thread-local variable. 
It can be combined with {{code|static}} or {{code|extern}} to determine linkage.{{explain|reason=probably should link to, or otherwise explain, threads before discussing thread local storage|date=April 2025}} Note that storage specifiers apply only to functions and objects; other things such as type and enum declarations are private to the compilation unit in which they appear.{{cn|date=April 2025}} Types, on the other hand, have qualifiers (see below). ====Type qualifiers==== {{main|Type qualifier}} Types can be qualified to indicate special properties of their data. The type qualifier <code>[[const (computer programming)|const]]</code> indicates that a value does not change once it has been initialized. Attempting to modify a <code>const</code> qualified value yields undefined behavior, so some C compilers store them in [[rodata]] or (for embedded systems) in [[read-only memory]] (ROM). The type qualifier <code>[[volatile (computer programming)|volatile]]</code> indicates to an [[optimizing compiler]] that it may not remove apparently redundant reads or writes, as the value may change even if it was not modified by any expression or statement, or multiple writes may be necessary, such as for [[memory-mapped I/O]]. ===Incomplete types=== An incomplete type is a [[#Structures_and_unions|structure or union]] type whose members have not yet been specified, an [[#Arrays|array type]] whose dimension has not yet been specified, or the {{code|void}} type (the {{code|void}} type cannot be completed). Such a type may not be instantiated (its size is not known), nor may its members be accessed (they, too, are unknown); however, the derived pointer type may be used (but not dereferenced). They are often used with pointers, either as forward or external declarations. 
For instance, code could declare an incomplete type like this:
<syntaxhighlight lang=C>
struct thing *pt;
</syntaxhighlight>
This declares {{code|pt}} as a pointer to {{code|struct thing}} ''and'' the incomplete type {{code|struct thing}}. All pointers to structure types have the same representation and alignment requirements, regardless of which structure they point to, so this statement is valid by itself (as long as {{code|pt}} is not dereferenced). The incomplete type can be completed later in the same scope by redeclaring it:
<syntaxhighlight lang=C>
struct thing {
    int num;
}; /* thing struct type is now completed */
</syntaxhighlight>
Incomplete types are used to implement [[Recursion (computer science)|recursive]] structures; the body of the type declaration may be deferred to later in the translation unit:
<syntaxhighlight lang=C>
typedef struct Bert Bert;
typedef struct Wilma Wilma;

struct Bert {
    Wilma *wilma;
};

struct Wilma {
    Bert *bert;
};
</syntaxhighlight>
Incomplete types are also used for [[data hiding]]; the incomplete type is declared in a [[header file]], and the body only within the relevant source file.

===Pointers===
In declarations the asterisk modifier ({{code|*}}) specifies a pointer type. For example, where the specifier {{code|int}} would refer to the integer type, the specifier {{code|int*}} refers to the type "pointer to integer". Pointer values associate two pieces of information: a memory address and a data type. The following line of code declares a pointer-to-integer variable called ''ptr'':
<syntaxhighlight lang=C>int *ptr;</syntaxhighlight>

====Referencing====
When a non-static pointer is declared without an initializer, its value is indeterminate. Such a pointer must be given a valid address by assignment before it is used.
In the following example, ''ptr'' is set so that it points to the data associated with the variable ''a'':
<syntaxhighlight lang=C>
int a = 0;
int *ptr = &a;
</syntaxhighlight>
In order to accomplish this, the "address-of" operator (unary {{code|&}}) is used. It produces the memory location of the data object that follows.

====Dereferencing====
The pointed-to data can be accessed through a pointer value. In the following example, the integer variable ''b'' is set to the value of integer variable ''a'', which is 10:
<syntaxhighlight lang=C>
int a = 10;
int *p;
p = &a;
int b = *p;
</syntaxhighlight>
In order to accomplish that task, the unary [[dereference operator]], denoted by an asterisk (*), is used. It returns the data to which its operand, which must be of pointer type, points. Thus, the expression *''p'' denotes the same value as ''a''. Dereferencing a [[null pointer]] results in undefined behavior.

===Arrays===
====Array definition====
Arrays are used in C to represent structures of consecutive elements of the same type. The definition of a (fixed-size) array has the following syntax:
<syntaxhighlight lang=C>int array[100];</syntaxhighlight>
which defines an array named ''array'' to hold 100 values of the primitive type {{code|int}}. If declared within a function, the array dimension may also be a non-constant expression, in which case memory for the specified number of elements will be allocated. In most contexts in later use, a mention of the variable ''array'' is converted to a pointer to the first item in the array. The [[sizeof|{{code|sizeof}}]] operator is an exception: {{code|sizeof array}} yields the size of the entire array (that is, 100 times the size of an {{code|int}}, so {{code|sizeof(array) / sizeof(int)}} evaluates to 100).
Another exception is the & (address-of) operator, which yields a pointer to the entire array, for example
<syntaxhighlight lang=C>int (*ptr_to_array)[100] = &array;</syntaxhighlight>

====Accessing elements====
The primary facility for accessing the values of the elements of an array is the array subscript operator. To access the ''i''-indexed element of ''array'', the syntax would be {{code|array[i]}}, which refers to the value stored in that array element.

Array subscript numbering begins at 0 (see [[Zero-based indexing]]). The largest allowed array subscript is therefore equal to the number of elements in the array minus 1. To illustrate this, consider an array ''a'' declared as having 10 elements; the first element would be {{code|a[0]}} and the last element would be {{code|a[9]}}.

C provides no facility for automatic [[bounds checking]] for array usage. Though logically the last subscript in an array of 10 elements would be 9, subscripts 10, 11, and so forth could accidentally be specified, with undefined results.

Because an array expression is converted to a pointer to its first element in most contexts, the addresses of each of the array elements can be expressed in equivalent [[pointer arithmetic]]. The following table illustrates both methods for the existing array:
{| class="wikitable" style="margin-left: auto; margin-right: auto; text-align: center"
|+ Array subscripts vs. pointer arithmetic
! style="text-align: left" | Element
! First
! Second
! Third
! ''n''th
|-
! style="text-align: left" | Array subscript
| {{C-lang|array[0]}}
| {{C-lang|array[1]}}
| {{C-lang|array[2]}}
| {{C-lang|array[n - 1]}}
|-
! style="text-align: left" | Dereferenced pointer
| {{C-lang|*array}}
| {{C-lang|*(array + 1)}}
| {{C-lang|*(array + 2)}}
| {{C-lang|*(array + n - 1)}}
|}
Since the expression {{code|a[i]}} is semantically equivalent to {{code|*(a+i)}}, which in turn is equivalent to {{code|*(i+a)}}, the expression can also be written as {{code|i[a]}}, although this form is rarely used.
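
The three spellings of element access discussed above can be compared side by side. The following is a sketch only; the function names are hypothetical and were chosen for this illustration.

```c
#include <stddef.h>   /* size_t */

/* Sums n elements using the array subscript operator. */
long sum_by_subscript(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* The same sum written with explicit pointer arithmetic. */
long sum_by_pointer(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += *(a + i);
    return total;
}

/* The same sum using the commuted form i[a]; valid C,
   but shown only to illustrate the equivalence. */
long sum_by_commuted(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += i[a];
    return total;
}
```

All three functions compute the same result for the same arguments, because each access expression denotes the same element.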

====Variable-length arrays====
[[C99]] standardised [[variable-length array]]s (VLAs) within block scope. Such array variables are allocated with a size given by an integer value computed at run time, and are deallocated at the end of the block.<ref name="bk21st" /> As of [[C11 (C standard revision)|C11]] this feature is no longer required to be implemented by the compiler.
<syntaxhighlight lang=C>
int n = ...;
int a[n];
a[3] = 10;
</syntaxhighlight>
This syntax produces an array whose size is fixed until the end of the block.

====Dynamic arrays====
{{main|C dynamic memory allocation}}
Arrays that can be resized dynamically can be produced with the help of the [[C standard library]]. The <code>[[malloc]]</code> function provides a simple method for allocating memory. It takes one parameter: the amount of memory to allocate in bytes. Upon successful allocation, {{code|malloc}} returns a generic ({{code|void}}) pointer value, pointing to the beginning of the allocated space. The pointer value returned is converted to an appropriate type implicitly by assignment. If the allocation could not be completed, {{code|malloc}} returns a [[null pointer]]. The following segment is therefore similar in function to the declaration above:
<syntaxhighlight lang=C>
#include <stdlib.h> /* declares malloc */
...
int *a = malloc(n * sizeof *a);
a[3] = 10;
</syntaxhighlight>
The result is a "pointer to {{code|int}}" variable (''a'') that points to the first of ''n'' contiguous {{code|int}} objects; because subscripting is defined in terms of pointer arithmetic, this pointer can be used in place of an actual array name, as shown in the last line. The advantage in using this [[dynamic allocation]] is that the amount of memory that is allocated to it can be limited to what is actually needed at run time, and this can be changed as needed (using the standard library function [[realloc|{{code|realloc}}]]).
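
Resizing with {{code|realloc}} can be sketched as follows. The helper <code>resize_ints</code> is hypothetical (it is not a library function); it assumes the caller owns the pointer and keeps the old block valid if resizing fails.

```c
#include <stdint.h>   /* SIZE_MAX */
#include <stdlib.h>   /* realloc, free */

/* Hypothetical helper: resizes *arr to hold new_count ints.
   Returns 0 on success and -1 on failure. On failure *arr is
   left unchanged, so the old block is neither lost nor freed. */
int resize_ints(int **arr, size_t new_count)
{
    /* Reject zero-sized requests and sizes whose byte count would overflow. */
    if (new_count == 0 || new_count > SIZE_MAX / sizeof **arr)
        return -1;

    /* Assign to a temporary first: writing the result straight into
       *arr would leak the old block if realloc returned a null pointer. */
    int *tmp = realloc(*arr, new_count * sizeof **arr);
    if (tmp == NULL)
        return -1;

    *arr = tmp;
    return 0;
}
```

Passing a pointer that holds {{code|NULL}} makes {{code|realloc}} behave like {{code|malloc}}, so the same helper can perform the initial allocation. Elements within the smaller of the old and new sizes keep their values after a successful call.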

When the dynamically allocated memory is no longer needed, it should be released back to the run-time system. This is done with a call to the {{code|free}} function. It takes a single parameter: a pointer to previously allocated memory. This is the value that was returned by a previous call to {{code|malloc}}.

As a security measure, some programmers {{who|date=August 2020}} then set the pointer variable to {{code|NULL}}:
<syntaxhighlight lang=C>
free(a);
a = NULL;
</syntaxhighlight>
This ensures that further attempts to dereference the pointer, on most systems, will crash the program. If this is not done, the variable becomes a [[dangling pointer]] which can lead to a use-after-free bug. However, if the pointer is a local variable, setting it to {{code|NULL}} does not prevent the program from using other copies of the pointer. Local use-after-free bugs are usually easy for [[static analyzer]]s to recognize. Therefore, this approach is less useful for local pointers and it is more often used with pointers stored in long-living structs. In general though, setting pointers to {{code|NULL}} is good practice {{according to whom|date=August 2020}} as it allows a programmer to {{code|NULL}}-check pointers prior to dereferencing, thus helping prevent crashes.

Recalling the array example, one could also create a fixed-size array through dynamic allocation:
<syntaxhighlight lang=C>
int (*a)[100] = malloc(sizeof *a);
</syntaxhighlight>
which yields a pointer-to-array.

Accessing the pointer-to-array can be done in two ways:
<syntaxhighlight lang=C>
(*a)[index];

index[*a];
</syntaxhighlight>
Iterating can also be done in two ways:
<syntaxhighlight lang=C>
for (int i = 0; i < 100; i++)
    (*a)[i];

for (int *i = a[0]; i < a[1]; i++)
    *i;
</syntaxhighlight>
The benefit of the second form is that it does not repeat the numeric limit used in the first, so the same loop works without modification for a pointer-to-array of any size.
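
Another way to avoid repeating the element count is to derive it from the pointer-to-array type with {{code|sizeof}}. The following is a minimal sketch; the constant <code>COUNT</code> and the function <code>sum_of_squares</code> are names invented for this illustration.

```c
#include <stdlib.h>   /* malloc, free */

enum { COUNT = 100 };   /* illustrative array length */

/* Allocates an int[COUNT] through a pointer-to-array, stores i*i in
   element i, and returns the sum of all elements.
   Returns -1 if the allocation fails. */
long sum_of_squares(void)
{
    int (*a)[COUNT] = malloc(sizeof *a);
    if (a == NULL)
        return -1;

    /* The element count is recovered from the pointed-to array type,
       so the loops below never mention COUNT directly. */
    size_t n = sizeof *a / sizeof (*a)[0];

    for (size_t i = 0; i < n; i++)
        (*a)[i] = (int)(i * i);

    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += (*a)[i];

    free(a);
    return total;
}
```

With 100 elements the function returns 328350, the sum of the squares of 0 through 99.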
====Multidimensional arrays==== In addition, C supports arrays of multiple dimensions, which are stored in [[row-major order]]. Technically, C multidimensional arrays are just one-dimensional arrays whose elements are arrays. The syntax for declaring multidimensional arrays is as follows: <syntaxhighlight lang=C>int array2d[ROWS][COLUMNS];</syntaxhighlight> where ''ROWS'' and ''COLUMNS'' are constants. This defines a two-dimensional array. Reading the subscripts from left to right, ''array2d'' is an array of length ''ROWS'', each element of which is an array of ''COLUMNS'' integers. To access an integer element in this multidimensional array, one would use <syntaxhighlight lang=C>array2d[4][3]</syntaxhighlight> Again, reading from left to right, this accesses the 5th row, and the 4th element in that row. The expression {{code|array2d[4]}} is an array, which we are then subscripting with [3] to access the fourth integer. {| class="wikitable" style="margin-left: auto; margin-right: auto; text-align: center" |+ Array subscripts vs. pointer arithmetic<ref>{{cite book|last=Balagurusamy|first=E|title=Programming in ANSI C|publisher=Tata McGraw Hill|pages=366}}</ref> ! style="text-align: left" | Element ! First ! Second row, second column ! ''i''th row, ''j''th column |- ! style="text-align: left" | Array subscript | {{C-lang|array[0][0]}} | {{C-lang|array[1][1]}} | {{C-lang|array[i - 1][j - 1]}} |- ! style="text-align: left" | Dereferenced pointer | {{C-lang|*(*(array + 0) + 0)}} | {{C-lang|*(*(array + 1) + 1)}} | {{C-lang|*(*(array + i - 1) + j - 1)}} |} Higher-dimensional arrays can be declared in a similar manner. A multidimensional array should not be confused with an array of pointers to arrays (also known as an [[Iliffe vector]] or sometimes an ''array of arrays''). The former is always rectangular (all subarrays must be the same size), and occupies a contiguous region of memory. 
The latter is a one-dimensional array of pointers, each of which may point to the first element of a subarray in a different place in memory, and the sub-arrays do not have to be the same size. The latter can be created by multiple uses of {{code|malloc}}. ===Strings=== {{main | C string handling}} In C, string literals are surrounded by double quotes ({{code|"}}) (e.g., {{code|"Hello world!"}}) and are compiled to an array of the specified {{code|char}} values with an additional [[null terminating character]] (0-valued) code to mark the end of the string. [[String literal]]s may not contain embedded newlines; this proscription somewhat simplifies parsing of the language. To include a newline in a string, the [[#Backslash escapes|backslash escape]] {{code|\n}} may be used, as below. There are several standard library functions for operating with string data (not necessarily constant) organized as array of {{code|char}} using this null-terminated format; see [[#Library functions|below]]. C's string-literal syntax has been very influential, and has made its way into many other languages, such as C++, Objective-C, Perl, Python, PHP, Java, JavaScript, C#, and Ruby. Nowadays, almost all new languages adopt or build upon C-style string syntax. Languages that lack this syntax tend to precede C. ====Backslash escapes==== {{main|Escape sequences in C}} Because certain characters cannot be part of a literal string expression directly, they are instead identified by an escape sequence starting with a backslash ({{code|\}}). For example, the backslashes in {{code|"This string contains \"double quotes\"."}} indicate (to the compiler) that the inner pair of quotes are intended as an actual part of the string, rather than the default reading as a delimiter (endpoint) of the string itself. Backslashes may be used to enter various control characters, etc., into a string: {| class="wikitable" ! align="left" |Escape ! 
align="left" |Meaning |- | {{code|\\}} || Literal backslash |- | {{code|\"}} || Double quote |- | {{code|\'}} || Single quote |- | {{code|\n}} || Newline (line feed) |- | {{code|\r}} || Carriage return |- | {{code|\b}} || Backspace |- | {{code|\t}} || Horizontal tab |- | {{code|\f}} || Form feed |- | {{code|\a}} || Alert (bell) |- | {{code|\v}} || Vertical tab |- | {{code|\?}} || Question mark (used to escape [[C trigraph|trigraphs]], obsolete feature dropped in C23) |- | <code>\''OOO''</code> || Character with octal value ''OOO'' (where ''OOO'' is 1-3 octal digits, '0'-'7') |- | <code>\x''hh''</code> || Character with hexadecimal value ''hh'' (where ''hh'' is 1 or more hex digits, '0'-'9','A'-'F','a'-'f') |- | <code>\u''hhhh''</code> || [[Unicode]] [[code point]] below 10000 hexadecimal (added in C99) |- | <code>\U''hhhhhhhh''</code> || Unicode code point where ''hhhhhhhh'' is eight hexadecimal digits (added in C99) |} The use of other backslash escapes is not defined by the C standard, although compiler vendors often provide additional escape codes as language extensions. One of these is the escape sequence <code>\e</code> for the [[escape character]] with ASCII hex value 1B which was not added to the C standard due to lacking representation in other [[character set]]s (such as [[EBCDIC]]). It is available in [[GNU Compiler Collection|GCC]], [[clang]] and [[Tiny C Compiler|tcc]]. Note that [[printf format string]]s use {{code|%%}} to represent literal {{code|%}} character; there is no {{code|\%}} escape sequence in standard C. 
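The effect of the escapes above on a string's stored size can be checked with a minimal sketch such as the following. The function names are hypothetical and exist only for illustration; the sketch assumes nothing beyond the standard headers shown.

```c
#include <stddef.h>
#include <string.h>

/* Bytes occupied by a literal, counting the terminating 0 the compiler appends. */
size_t quoted_size(void)
{
    return sizeof "say \"hi\"";   /* 8 characters + 1 terminator */
}

/* Characters before the terminator, as the library functions see them. */
size_t quoted_length(void)
{
    return strlen("say \"hi\"");
}

/* Octal and hexadecimal escapes naming the same value yield the same string. */
int octal_matches_hex(void)
{
    return strcmp("\101", "\x41") == 0;   /* both escapes denote the value 65 */
}

/* "\n" is stored as a single character, not a backslash followed by 'n'. */
size_t newline_length(void)
{
    return strlen("one\ntwo");   /* 3 + 1 + 3 characters */
}
```

Each escaped quote counts as one stored character, so the size reported by {{code|sizeof}} exceeds the length reported by {{code|strlen}} by exactly one (the terminator).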
====String literal concatenation==== C has [[string literal concatenation]], meaning that adjacent string literals are concatenated at compile time; this allows long strings to be split over multiple lines, and also allows string literals resulting from [[C preprocessor]] defines and macros to be appended to strings at compile time: <syntaxhighlight lang=C> printf(__FILE__ ": %d: Hello " "world\n", __LINE__); </syntaxhighlight> will expand to <syntaxhighlight lang=C> printf("helloworld.c" ": %d: Hello " "world\n", 10); </syntaxhighlight> which is syntactically equivalent to <syntaxhighlight lang=C> printf("helloworld.c: %d: Hello world\n", 10); </syntaxhighlight> ====Character constants==== Individual character constants are single-quoted, e.g. {{code|'A'}}, and have type {{code|int}} (in C++, {{code|char}}). The difference is that {{code|"A"}} represents a null-terminated array of two characters, 'A' and '\0', whereas {{code|'A'}} directly represents the character value (65 if ASCII is used). The same backslash-escapes are supported as for strings, except that (of course) {{code|"}} can validly be used as a character without being escaped, whereas {{code|'}} must now be escaped. A character constant cannot be empty (i.e. {{code|''}} is invalid syntax), although a string may be (it still has the null terminating character). Multi-character constants (e.g. {{code|'xy'}}) are valid, although rarely useful — they let one store several characters in an integer (e.g. 4 ASCII characters can fit in a 32-bit integer, 8 in a 64-bit one). Since the order in which the characters are packed into an {{code|int}} is not specified (left to the implementation to define), portable use of multi-character constants is difficult. Nevertheless, in situations limited to a specific platform and the compiler implementation, multicharacter constants do find their use in specifying signatures. 
One common use case is the [[OSType]], where the combination of Classic Mac OS compilers and the platform's inherent big-endianness means that bytes in the integer appear in the exact order of characters defined in the literal. The definitions used by popular implementations are in fact consistent: in GCC, Clang, and [[Visual C++]], {{code|'1234'}} yields <code>0x3'''1'''3'''2'''3'''3'''3'''4'''</code> under ASCII.<ref>{{cite web |title=The C Preprocessor: Implementation-defined behavior |url=https://gcc.gnu.org/onlinedocs/cpp/Implementation-defined-behavior.html |website=gcc.gnu.org}}</ref><ref>{{cite web |title=String and character literals (C++) |url=https://docs.microsoft.com/en-us/cpp/cpp/string-and-character-literals-cpp?view=vs-2019#code-try-2 |website=Visual C++ 19 Documentation |access-date=20 November 2019 |language=en-us}}</ref>

Like string literals, character constants can also be modified by prefixes; for example, {{code|L'A'}} has type {{code|wchar_t}} and represents the character value of "A" in the wide character encoding.

====Wide character strings====
Since type {{code|char}} is 1 byte wide, a single {{code|char}} value typically can represent at most 256 distinct values (255 usable character codes within a string, because 0 is reserved as the terminator), not nearly enough for all the characters in use worldwide. To provide better support for international characters, the first C standard (C89) introduced [[wide character]]s (encoded in type {{code|wchar_t}}) and wide character strings, which are written as {{code|L"Hello world!"}}.

Wide characters are most commonly either 2 bytes (using a 2-byte encoding such as [[UTF-16]]) or 4 bytes (usually [[UTF-32]]), but Standard C does not specify the width for {{code|wchar_t}}, leaving the choice to the implementor. [[Microsoft Windows]] generally uses UTF-16, thus the above string would be 26 bytes long for a Microsoft compiler; the [[Unix]] world prefers UTF-32<!-- dubious?!
See also new in C23: char8_t type for storing UTF-8 encoded data -->, thus compilers such as GCC would generate a 52-byte string. A 2-byte wide {{code|wchar_t}} suffers the same limitation as {{code|char}}, in that certain characters (those outside the [[Basic Multilingual Plane|BMP]]) cannot be represented in a single {{code|wchar_t}}; but must be represented using [[surrogate pair]]s. The original C standard specified only minimal functions for operating with wide character strings; in 1995 the standard was modified to include much more extensive support, comparable to that for {{code|char}} strings. The relevant functions are mostly named after their {{code|char}} equivalents, with the addition of a "w" or the replacement of "str" with "wcs"; they are specified in {{code|<wchar.h>}}, with {{code|<wctype.h>}} containing wide-character classification and mapping functions. The now generally recommended method<ref group="note">see [[UTF-8]] first section for references</ref> of supporting international characters is through [[UTF-8]], which is stored in {{code|char}} arrays, and can be written directly in the source code if using a UTF-8 editor, because UTF-8 is a direct [[Extended ASCII|ASCII extension]]. ====Variable width strings==== A common alternative to {{code|wchar_t}} is to use a [[variable-width encoding]], whereby a logical character may extend over multiple positions of the string. Variable-width strings may be encoded into literals verbatim, at the risk of confusing the compiler, or using numerical backslash escapes (e.g. {{code|"\xc3\xa9"}} for "é" in UTF-8). The [[UTF-8]] encoding was specifically designed (under [[Plan 9 from Bell Labs|Plan 9]]) for compatibility with the standard library string functions; supporting features of the encoding include a lack of embedded nulls, no valid interpretations for subsequences, and trivial resynchronisation. 
Encodings lacking these features are likely to prove incompatible with the standard library functions; encoding-aware string functions are often used in such cases. ====Library functions==== [[String (computer science)|Strings]], both constant and variable, can be manipulated without using the [[standard library]]. However, the library contains many [[C string handling|useful functions]] for working with null-terminated strings. ===Structures and unions=== ====Structures==== Structures and [[Union type|unions]] in C are defined as data containers consisting of a sequence of named members of various types. They are similar to records in other programming languages. The members of a structure are stored in consecutive locations in memory, although the compiler is allowed to insert padding between or after members (but not before the first member) for efficiency or as padding required for proper [[data structure alignment|alignment]] by the target architecture. The size of a structure is equal to the sum of the sizes of its members, plus the size of the padding. ====Unions==== Unions in C are related to structures and are defined as objects that may hold (at different times) objects of different types and sizes. They are analogous to variant records in other programming languages. Unlike structures, the components of a union all refer to the same location in memory. In this way, a union can be used at various times to hold different types of objects, without the need to create a separate object for each new type. The size of a union is equal to the size of its largest component type. ====Declaration==== Structures are declared with the [[struct (C programming language)|{{code|struct}}]] keyword and unions are declared with the {{code|union}} keyword. The specifier keyword is followed by an optional identifier name, which is used to identify the form of the structure or union. 
The identifier is followed by the declaration of the structure or union's body: a list of member declarations, contained within curly braces, with each declaration terminated by a semicolon. Finally, the declaration concludes with an optional list of identifier names, which are declared as instances of the structure or union. For example, the following statement declares a structure named {{code|s}} that contains three members; it will also declare an instance of the structure known as {{code|tee}}: <syntaxhighlight lang=C> struct s { int x; float y; char *z; } tee; </syntaxhighlight> And the following statement will declare a similar union named {{code|u}} and an instance of it named {{code|n}}: <syntaxhighlight lang=C> union u { int x; float y; char *z; } n; </syntaxhighlight> Members of structures and unions cannot have an incomplete or function type. Thus members cannot be an instance of the structure or union being declared (because it is incomplete at that point) but can be pointers to the type being declared. Once a structure or union body has been declared and given a name, it can be considered a new data type using the specifier {{code|struct}} or {{code|union}}, as appropriate, and the name. For example, the following statement, given the above structure declaration, declares a new instance of the structure {{code|s}} named {{code|r}}: <syntaxhighlight lang=C>struct s r;</syntaxhighlight> It is also common to use the <code>[[typedef]]</code> specifier to eliminate the need for the {{code|struct}} or {{code|union}} keyword in later references to the structure. The first identifier after the body of the structure is taken as the new name for the structure type (structure instances may not be declared in this context). 
For example, the following statement will declare a new type known as ''s_type'' that will contain some structure: <syntaxhighlight lang=C>typedef struct {...} s_type;</syntaxhighlight> Future statements can then use the specifier ''s_type'' (instead of the expanded {{code|struct}} ... specifier) to refer to the structure. ====Accessing members==== Members are accessed using the name of the instance of a structure or union, a period ({{code|.}}), and the name of the member. For example, given the declaration of ''tee'' from above, the member known as ''y'' (of type {{code|float}}) can be accessed using the following syntax: <syntaxhighlight lang=C>tee.y</syntaxhighlight> Structures are commonly accessed through pointers. Consider the following example that defines a pointer to ''tee'', known as ''ptr_to_tee'': <syntaxhighlight lang=C>struct s *ptr_to_tee = &tee;</syntaxhighlight> Member ''y'' of ''tee'' can then be accessed by dereferencing ''ptr_to_tee'' and using the result as the left operand: <syntaxhighlight lang=C>(*ptr_to_tee).y</syntaxhighlight> Which is identical to the simpler {{code|tee.y}} above as long as ''ptr_to_tee'' points to ''tee''. Due to [[Operators in C and C++#Operator precedence|operator precedence]] ("." being higher than "*"), the shorter <code>*ptr_to_tee.y</code> is incorrect for this purpose, instead being parsed as <code>*(ptr_to_tee.y)</code> and thus the parentheses are necessary. Because this operation is common, C provides an [[syntactic sugar|abbreviated syntax]] for accessing a member directly from a pointer. With this syntax, the name of the instance is replaced with the name of the pointer and the period is replaced with the character sequence {{code|->}}. Thus, the following method of accessing ''y'' is identical to the previous two: <syntaxhighlight lang=C>ptr_to_tee->y</syntaxhighlight> Members of unions are accessed in the same way. 
This can be chained; for example, in a linked list, one may refer to <code>n->next->next</code> for the second following node (assuming that <code>n->next</code> is not null).

====Assignment====
Assigning values to individual members of structures and unions is syntactically identical to assigning values to any other object. The only difference is that the ''lvalue'' of the assignment is the name of the member, as accessed by the syntax mentioned above. A structure can also be assigned as a unit to another structure of the same type. Structures (and pointers to structures) may also be used as function parameter and return types.

For example, the following statement assigns the value of 74 (the ASCII code point for the letter 'J') to the member named ''x'' in the structure ''tee'', from above:
<syntaxhighlight lang=C>tee.x = 74;</syntaxhighlight>
And the same assignment, using ''ptr_to_tee'' in place of ''tee'', would look like:
<syntaxhighlight lang=C>ptr_to_tee->x = 74;</syntaxhighlight>
Assignment with members of unions is identical.

====Other operations====
According to the C standard, the only legal operations that can be performed on a structure are copying it, assigning to it as a unit (or initializing it), taking its address with the address-of ({{code|&}}) unary operator, and accessing its members. Unions have the same restrictions. One of the operations implicitly forbidden is comparison: structures and unions cannot be compared using C's standard comparison facilities ({{code|1===}}, {{code|>}}, {{code|<}}, etc.).

====Bit fields====
C also provides a special type of member known as a [[bit field]], which is an integer with an explicitly specified number of bits. A bit field is declared as a structure (or union) member of type {{code|int}}, {{code|signed int}}, {{code|unsigned int}}, or {{code|_Bool}}<!-- Add bool and _BitInt(N) when updating the page for C23 -->,<ref group="note">Other implementation-defined types are also allowed.
C++ allows using all integral and enumerated types and a lot of C compilers do the same.</ref> with the member name followed by a colon ({{code|:}}) and the number of bits it should occupy. The total number of bits in a single bit field must not exceed the total number of bits in its declared type (this is allowed in C++ however, where the extra bits are used for padding).

As a special exception to the usual C syntax rules, it is implementation-defined whether a bit field declared as type {{code|int}}, without specifying {{code|signed}} or {{code|unsigned}}, is signed or unsigned. Thus, it is recommended to explicitly specify {{code|signed}} or {{code|unsigned}} on all structure members for portability.

Unnamed fields consisting of just a colon followed by a number of bits are also allowed; these indicate [[data padding|padding]]. Specifying a width of zero for an unnamed field is used to force [[data structure alignment|alignment]] to a new word.<ref>Kernighan & Ritchie</ref> Since all members of a union occupy the same memory, unnamed bit fields of width zero do nothing in unions. Unnamed bit fields of nonzero width, however, can change the size of the union, since they have to fit in it.

The members of bit fields do not have addresses, and as such cannot be used with the address-of ({{code|&}}) unary operator. The {{code|sizeof}} operator may not be applied to bit fields.

The following declaration declares a new structure type known as {{code|f}} and an instance of it known as {{code|g}}. Comments provide a description of each of the members:
<syntaxhighlight lang=C>
struct f {
    unsigned int  flag : 1;  /* a bit flag: can either be on (1) or off (0) */
    signed int    num  : 4;  /* a signed 4-bit field; range -7...7 or -8...7 */
    signed int         : 3;  /* 3 bits of padding to round out to 8 bits */
} g;
</syntaxhighlight>

===Initialization===
Default initialization depends on the [[#Storage class specifiers|storage class specifier]], described above.
Because of the language's grammar, a scalar initializer may be enclosed in any number of curly brace pairs. Most compilers issue a warning if there is more than one such pair, though.
<syntaxhighlight lang=C>
int x = 12;
int y = { 23 };     // Legal, no warning
int z = { { 34 } }; // Legal, expect a warning
</syntaxhighlight>

Structures, unions and arrays can be initialized in their declarations using an initializer list. Unless designators are used, the components of an initializer correspond with the elements in the order they are defined and stored, thus all preceding values must be provided before any particular element's value. Any unspecified elements are set to zero (except for unions). Mentioning too many initialization values yields an error.

The following statement will initialize a new instance of the structure ''s'' known as ''pi'':
<syntaxhighlight lang=C>
struct s {
    int   x;
    float y;
    char  *z;
};

struct s pi = { 3, 3.1415, "Pi" };
</syntaxhighlight>

====Designated initializers====
Designated initializers allow members to be initialized by name, in any order, and without explicitly providing the preceding values. The following initialization is equivalent to the previous one:
<syntaxhighlight lang=C>struct s pi = { .z = "Pi", .x = 3, .y = 3.1415 };</syntaxhighlight>

Using a designator in an initializer moves the initialization "cursor". In the example below, if <code>MAX</code> is greater than 10, there will be some zero-valued elements in the middle of <code>a</code>; if it is less than 10, some of the values provided by the first five initializers will be overridden by the second five (if <code>MAX</code> is less than 5, there will be a compilation error):
<syntaxhighlight lang=C>int a[MAX] = { 1, 3, 5, 7, 9, [MAX-5] = 8, 6, 4, 2, 0 };</syntaxhighlight>

In [[C89 (C version)|C89]], a union was initialized with a single value applied to its first member.
That is, the union ''u'' defined above could only have its ''int x'' member initialized: <syntaxhighlight lang=C>union u value = { 3 };</syntaxhighlight> Using a designated initializer, the member to be initialized does not have to be the first member: <syntaxhighlight lang=C>union u value = { .y = 3.1415 }; </syntaxhighlight> If an array has unknown size (i.e. the array was an [[#Incomplete_types|incomplete type]]), the number of initializers determines the size of the array and its type becomes complete: <syntaxhighlight lang=C> int x[] = { 0, 1, 2 } ;</syntaxhighlight> Compound designators can be used to provide explicit initialization when unadorned initializer lists might be misunderstood. In the example below, <code>w</code> is declared as an array of structures, each structure consisting of a member <code>a</code> (an array of 3 <code>int</code>) and a member <code>b</code> (an <code>int</code>). The initializer sets the size of <code>w</code> to 2 and sets the values of the first element of each <code>a</code>: <syntaxhighlight lang=C>struct { int a[3], b; } w[] = { [0].a = {1}, [1].a[0] = 2 };</syntaxhighlight><!-- Note: The [http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf C99 specification]'s grammar for designations reads in part: designation: designator-list = designator-list: designator designator-list designator designator: [ constant-expression ] . identifier ...which can be understood as making these code fragments legal: int x[] = { [1] [9] = 2 } ; struct t { char *name; char *nickname; } who = { .name .nickname = "Unknown" }; But that is incorrect. The designators that make up the designator-list are *not* space-separated: the grammar means to describe the kind of structure-path given in the example. --> This is equivalent to:<syntaxhighlight lang=C>struct { int a[3], b; } w[] = { { { 1, 0, 0 }, 0 }, { { 2, 0, 0 }, 0 } };</syntaxhighlight> There is no way to specify repetition of an initializer in standard C. 
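The rules above (zero-filling of unnamed members, the moving "cursor", and size inference for arrays of unknown size) can be observed with a minimal sketch like the one below. The type <code>struct point</code> and the function names are hypothetical and are not taken from the examples above; a C99 or later compiler is assumed.

```c
#include <stddef.h>

struct point { int x; int y; int z; };   /* hypothetical type for illustration */

/* Members not named in the initializer are set to zero. */
int unnamed_member_value(void)
{
    struct point p = { .z = 7 };
    return p.x + p.y;   /* both members were zero-initialized */
}

/* A designator moves the position; later plain values continue from it. */
int element_after_designator(void)
{
    int a[6] = { 1, [3] = 40, 50 };   /* a is {1, 0, 0, 40, 50, 0} */
    return a[4];
}

/* With unknown size, the highest initialized index fixes the length. */
size_t inferred_length(void)
{
    int b[] = { [4] = 9 };   /* five elements: indices 0 through 4 */
    return sizeof b / sizeof b[0];
}
```

Here the value 50 lands at index 4 because it follows the designator <code>[3]</code>, not because it is the third value written.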
====Compound literals====
It is possible to borrow the initialization methodology to generate compound structure and array literals:
<syntaxhighlight lang=C>
// pointer created from array literal.
int *ptr = (int[]){ 10, 20, 30, 40 };

// pointer to array.
float (*foo)[3] = &(float[]){ 0.5f, 1.f, -0.5f };

struct s pi = (struct s){ 3, 3.1415, "Pi" };
</syntaxhighlight>

Compound literals are often combined with designated initializers to make the declaration more readable:<ref name="bk21st" />
<syntaxhighlight lang=C>pi = (struct s){ .z = "Pi", .x = 3, .y = 3.1415 };</syntaxhighlight>
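One use of compound literals that may be worth illustrating is passing an unnamed array or structure directly as a function argument. The following is a minimal sketch under that assumption; <code>struct pair</code> and all function names are hypothetical, and a C99 or later compiler is assumed.

```c
struct pair { int a; int b; };   /* hypothetical type for illustration */

/* Adds up the first n elements of v. */
static int sum(const int *v, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Adds the two members of a structure passed by value. */
static int pair_total(struct pair p)
{
    return p.a + p.b;
}

/* An array compound literal decays to a pointer to its first element. */
int sum_of_literal(void)
{
    return sum((int[]){ 10, 20, 30, 40 }, 4);
}

/* A structure compound literal may use designators, in any order. */
int total_of_literal(void)
{
    return pair_total((struct pair){ .b = 5, .a = 2 });
}
```

A compound literal written inside a function body has automatic storage duration, so a pointer to it should not be used after the enclosing block ends.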