===Strings===
{{main | C string handling}}

In C, string literals are surrounded by double quotes ({{code|"}}) (e.g., {{code|"Hello world!"}}) and are compiled to an array of the specified {{code|char}} values with an additional [[null terminating character]] (0-valued) code to mark the end of the string.

[[String literal]]s may not contain embedded newlines; this proscription somewhat simplifies parsing of the language. To include a newline in a string, the [[#Backslash escapes|backslash escape]] {{code|\n}} may be used, as below.

There are several standard library functions for operating with string data (not necessarily constant) organized as an array of {{code|char}} using this null-terminated format; see [[#Library functions|below]].

C's string-literal syntax has been very influential, and has made its way into many other languages, such as C++, Objective-C, Perl, Python, PHP, Java, JavaScript, C#, and Ruby. Nowadays, almost all new languages adopt or build upon C-style string syntax. Languages that lack this syntax tend to precede C.

====Backslash escapes====
{{main|Escape sequences in C}}

Because certain characters cannot be part of a literal string expression directly, they are instead identified by an escape sequence starting with a backslash ({{code|\}}). For example, the backslashes in {{code|"This string contains \"double quotes\"."}} indicate (to the compiler) that the inner pair of quotes is intended as an actual part of the string, rather than the default reading as a delimiter (endpoint) of the string itself.

Backslashes may be used to enter various control characters, etc., into a string:

{| class="wikitable"
! align="left" |Escape
! align="left" |Meaning
|-
| {{code|\\}} || Literal backslash
|-
| {{code|\"}} || Double quote
|-
| {{code|\'}} || Single quote
|-
| {{code|\n}} || Newline (line feed)
|-
| {{code|\r}} || Carriage return
|-
| {{code|\b}} || Backspace
|-
| {{code|\t}} || Horizontal tab
|-
| {{code|\f}} || Form feed
|-
| {{code|\a}} || Alert (bell)
|-
| {{code|\v}} || Vertical tab
|-
| {{code|\?}} || Question mark (used to escape [[C trigraph|trigraphs]], an obsolete feature dropped in C23)
|-
| <code>\''OOO''</code> || Character with octal value ''OOO'' (where ''OOO'' is 1-3 octal digits, '0'-'7')
|-
| <code>\x''hh''</code> || Character with hexadecimal value ''hh'' (where ''hh'' is 1 or more hex digits, '0'-'9','A'-'F','a'-'f')
|-
| <code>\u''hhhh''</code> || [[Unicode]] [[code point]] below 10000 hexadecimal (added in C99)
|-
| <code>\U''hhhhhhhh''</code> || Unicode code point where ''hhhhhhhh'' is eight hexadecimal digits (added in C99)
|}

The use of other backslash escapes is not defined by the C standard, although compiler vendors often provide additional escape codes as language extensions. One of these is the escape sequence <code>\e</code> for the [[escape character]] with ASCII hex value 1B, which was not added to the C standard because it lacks a representation in other [[character set]]s (such as [[EBCDIC]]). It is available in [[GNU Compiler Collection|GCC]], [[clang]] and [[Tiny C Compiler|tcc]].

Note that [[printf format string]]s use {{code|%%}} to represent a literal {{code|%}} character; there is no {{code|\%}} escape sequence in standard C.
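
As an illustration of the escapes listed above, the following minimal sketch shows that each escape sequence compiles to a single {{code|char}}, and that {{code|%%}} is handled by {{code|printf}} rather than by the compiler. The helper name <code>quoted_text</code> is hypothetical, and the comparison of <code>\x41\101</code> with <code>AA</code> assumes an ASCII execution character set.

<syntaxhighlight lang="c">
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns a literal that uses several escapes. */
const char *quoted_text(void) {
    return "Say \"hi\"\tnow\\later\n";
}

int main(void) {
    const char *s = quoted_text();

    /* Each of \" \t \\ and \n occupies exactly one char in the array,
       so strlen counts each escape sequence as a single character. */
    printf("%s", s);
    printf("length = %zu\n", strlen(s));

    /* \x41 (hexadecimal) and \101 (octal) both denote 'A' under ASCII.
       The hex escape ends at the following backslash. */
    printf("%s\n", "\x41\101");

    /* %% is a printf conversion specification, not a backslash escape. */
    printf("100%%\n");
    return 0;
}
</syntaxhighlight>
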
====String literal concatenation====
C has [[string literal concatenation]], meaning that adjacent string literals are concatenated at compile time; this allows long strings to be split over multiple lines, and also allows string literals resulting from [[C preprocessor]] defines and macros to be appended to strings at compile time:

<syntaxhighlight lang=C>
printf(__FILE__ ": %d: Hello " "world\n", __LINE__);
</syntaxhighlight>

will expand to

<syntaxhighlight lang=C>
printf("helloworld.c" ": %d: Hello " "world\n", 10);
</syntaxhighlight>

which is syntactically equivalent to

<syntaxhighlight lang=C>
printf("helloworld.c: %d: Hello world\n", 10);
</syntaxhighlight>

====Character constants====
Individual character constants are single-quoted, e.g. {{code|'A'}}, and have type {{code|int}} (in C++, {{code|char}}). The difference is that {{code|"A"}} represents a null-terminated array of two characters, 'A' and '\0', whereas {{code|'A'}} directly represents the character value (65 if ASCII is used). The same backslash-escapes are supported as for strings, except that (of course) {{code|"}} can validly be used as a character without being escaped, whereas {{code|'}} must now be escaped.

A character constant cannot be empty (i.e. {{code|''}} is invalid syntax), although a string may be (it still has the null terminating character). Multi-character constants (e.g. {{code|'xy'}}) are valid, although rarely useful: they let one store several characters in an integer (e.g. 4 ASCII characters can fit in a 32-bit integer, 8 in a 64-bit one). Since the order in which the characters are packed into an {{code|int}} is not specified (left to the implementation to define), portable use of multi-character constants is difficult.

Nevertheless, in situations limited to a specific platform and compiler implementation, multi-character constants do find their use in specifying signatures.
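
The difference between a character constant and a one-character string literal can be observed with {{code|sizeof}}. The following is a minimal sketch, assuming the file is compiled as C (in C++ the first value would be 1, since the constant has type {{code|char}} there):

<syntaxhighlight lang="c">
#include <stdio.h>

int main(void) {
    /* In C, 'A' has type int, so its size equals sizeof(int). */
    printf("sizeof 'A'  = %zu\n", sizeof 'A');

    /* "A" is an array of two chars: 'A' and the terminating '\0'. */
    printf("sizeof \"A\" = %zu\n", sizeof "A");

    /* An empty string literal still contains the null terminator. */
    printf("sizeof \"\"  = %zu\n", sizeof "");

    /* Inside single quotes, " needs no escape but ' does. */
    printf("%c %c\n", '"', '\'');
    return 0;
}
</syntaxhighlight>
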
One common use case is the [[OSType]], where the combination of Classic Mac OS compilers and the platform's inherent big-endianness means that bytes in the integer appear in the exact order of characters defined in the literal. The definitions used by popular implementations are in fact consistent: in GCC, Clang, and [[Visual C++]], {{code|'1234'}} yields <code>0x3'''1'''3'''2'''3'''3'''3'''4'''</code> under ASCII.<ref>{{cite web |title=The C Preprocessor: Implementation-defined behavior |url=https://gcc.gnu.org/onlinedocs/cpp/Implementation-defined-behavior.html |website=gcc.gnu.org}}</ref><ref>{{cite web |title=String and character literals (C++) |url=https://docs.microsoft.com/en-us/cpp/cpp/string-and-character-literals-cpp?view=vs-2019#code-try-2 |website=Visual C++ 19 Documentation |access-date=20 November 2019 |language=en-us}}</ref>

Like string literals, character constants can also be modified by prefixes; for example, {{code|L'A'}} has type {{code|wchar_t}} and represents the character value of "A" in the wide character encoding.

====Wide character strings====
Since type {{code|char}} is 1 byte wide (typically 8 bits), a single {{code|char}} value can typically represent at most 256 distinct character codes, not nearly enough for all the characters in use worldwide. To provide better support for international characters, the first C standard (C89) introduced [[wide character]]s (encoded in type {{code|wchar_t}}) and wide character strings, which are written as {{code|L"Hello world!"}}.

Wide characters are most commonly either 2 bytes (using a 2-byte encoding such as [[UTF-16]]) or 4 bytes (usually [[UTF-32]]), but Standard C does not specify the width for {{code|wchar_t}}, leaving the choice to the implementor. [[Microsoft Windows]] generally uses UTF-16, thus the above string would be 26 bytes long for a Microsoft compiler; the [[Unix]] world prefers UTF-32<!-- dubious?! See also new in C23: char8_t type for storing UTF-8 encoded data -->, thus compilers such as GCC would generate a 52-byte string. A 2-byte wide {{code|wchar_t}} suffers the same limitation as {{code|char}}, in that certain characters (those outside the [[Basic Multilingual Plane|BMP]]) cannot be represented in a single {{code|wchar_t}} and must instead be represented using [[surrogate pair]]s.

The original C standard specified only minimal functions for operating with wide character strings; in 1995 the standard was modified to include much more extensive support, comparable to that for {{code|char}} strings. The relevant functions are mostly named after their {{code|char}} equivalents, with the addition of a "w" or the replacement of "str" with "wcs"; they are specified in {{code|<wchar.h>}}, with {{code|<wctype.h>}} containing wide-character classification and mapping functions.

The now generally recommended method<ref group="note">see [[UTF-8]] first section for references</ref> of supporting international characters is through [[UTF-8]], which is stored in {{code|char}} arrays, and can be written directly in the source code if using a UTF-8 editor, because UTF-8 is a direct [[Extended ASCII|ASCII extension]].

====Variable width strings====
A common alternative to {{code|wchar_t}} is to use a [[variable-width encoding]], whereby a logical character may extend over multiple positions of the string. Variable-width strings may be encoded into literals verbatim, at the risk of confusing the compiler, or using numerical backslash escapes (e.g. {{code|"\xc3\xa9"}} for "é" in UTF-8). The [[UTF-8]] encoding was specifically designed (under [[Plan 9 from Bell Labs|Plan 9]]) for compatibility with the standard library string functions; supporting features of the encoding include a lack of embedded nulls, no valid interpretations for subsequences, and trivial resynchronisation.
Encodings lacking these features are likely to prove incompatible with the standard library functions; encoding-aware string functions are often used in such cases.

====Library functions====
[[String (computer science)|Strings]], both constant and variable, can be manipulated without using the [[standard library]]. However, the library contains many [[C string handling|useful functions]] for working with null-terminated strings.
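
As a minimal sketch of both approaches, the example below computes a string length by hand (the helper <code>manual_length</code> is hypothetical and written only for illustration) and then uses the standard {{code|<string.h>}} functions {{code|strlen}}, {{code|strncat}} and {{code|strcmp}} on the same null-terminated data:

<syntaxhighlight lang="c">
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: length of a null-terminated string computed
   without the standard library, by scanning for the terminator. */
size_t manual_length(const char *s) {
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}

int main(void) {
    char buffer[32] = "Hello";        /* variable (modifiable) string */
    const char *suffix = " world!";   /* points at a constant literal */

    /* strncat appends at most the given number of chars and then a '\0';
       the bound leaves room for that terminator inside buffer. */
    strncat(buffer, suffix, sizeof buffer - strlen(buffer) - 1);

    printf("%s\n", buffer);
    printf("strlen: %zu, manual: %zu\n", strlen(buffer), manual_length(buffer));

    /* strcmp returns 0 when the two strings compare equal. */
    printf("equal: %d\n", strcmp(buffer, "Hello world!") == 0);
    return 0;
}
</syntaxhighlight>
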