====Wide character strings====
Since type {{code|char}} is 1 byte wide, a single {{code|char}} value can represent at most 256 distinct character codes on the usual 8-bit-byte platforms, not nearly enough for all the characters in use worldwide. To provide better support for international characters, the first C standard (C89) introduced [[wide character]]s (encoded in type {{code|wchar_t}}) and wide character strings, which are written as {{code|L"Hello world!"}}.

Wide characters are most commonly either 2 bytes (usually holding [[UTF-16]] code units) or 4 bytes (usually [[UTF-32]]), but Standard C does not specify the width of {{code|wchar_t}}, leaving the choice to the implementor. [[Microsoft Windows]] generally uses UTF-16, so the above string (12 characters plus the terminating null character, 13 {{code|wchar_t}} values in all) would be 26 bytes long for a Microsoft compiler; the [[Unix]] world prefers UTF-32<!-- dubious?! See also new in C23:  char8_t type for storing UTF-8 encoded data -->, so compilers such as GCC would generate a 52-byte string. A 2-byte {{code|wchar_t}} suffers a limitation similar to that of {{code|char}}: certain characters (those outside the [[Basic Multilingual Plane|BMP]]) cannot be represented in a single {{code|wchar_t}} and must instead be represented using [[surrogate pair]]s.

The original C standard specified only minimal functions for operating with wide character strings; in 1995 the standard was amended (Amendment 1, commonly called C95) to include much more extensive support, comparable to that for {{code|char}} strings. The relevant functions are mostly named after their {{code|char}} equivalents, with the addition of a "w" or the replacement of "str" with "wcs" (for example, {{code|wprintf}} and {{code|wcslen}} correspond to {{code|printf}} and {{code|strlen}}); they are specified in {{code|<wchar.h>}}, with {{code|<wctype.h>}} containing wide-character classification and mapping functions.

The now generally recommended method<ref group="note">see [[UTF-8]] first section for references</ref> of supporting international characters is to use [[UTF-8]], which is stored in {{code|char}} arrays and can be written directly in source code when using a UTF-8 editor, because UTF-8 is a direct [[Extended ASCII|ASCII extension]].