====Wide character strings====
Since type {{code|char}} is 1 byte wide, a single {{code|char}} value typically can represent at most 256 distinct character codes, not nearly enough for all the characters in use worldwide. To provide better support for international characters, the first C standard (C89) introduced [[wide character]]s (encoded in type {{code|wchar_t}}) and wide character strings, which are written as {{code|L"Hello world!"}}.

Wide characters are most commonly either 2 bytes (using a 2-byte encoding such as [[UTF-16]]) or 4 bytes (usually [[UTF-32]]), but Standard C does not specify the width of {{code|wchar_t}}, leaving the choice to the implementor. [[Microsoft Windows]] generally uses UTF-16, thus the above string would be 26 bytes long (12 characters plus the terminating null, at 2 bytes each) for a Microsoft compiler; the [[Unix]] world prefers UTF-32<!-- dubious?! See also new in C23: char8_t type for storing UTF-8 encoded data -->, thus compilers such as GCC targeting those systems would generate a 52-byte string. A 2-byte wide {{code|wchar_t}} suffers the same limitation as {{code|char}}, in that certain characters (those outside the [[Basic Multilingual Plane|BMP]]) cannot be represented in a single {{code|wchar_t}}, but must be represented using [[surrogate pair]]s.

The original C standard specified only minimal functions for operating with wide character strings; in 1995 the standard was modified to include much more extensive support, comparable to that for {{code|char}} strings. The relevant functions are mostly named after their {{code|char}} equivalents, with the addition of a "w" or the replacement of "str" with "wcs"; they are specified in {{code|<wchar.h>}}, with {{code|<wctype.h>}} containing wide-character classification and mapping functions.
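The size arithmetic above can be checked on any implementation with a minimal sketch such as the following. The function names <code>wide_hello_bytes</code>, <code>wide_hello_length</code> and <code>check_wide_hello</code> are hypothetical helpers chosen for illustration and are not part of the standard library; only {{code|wcslen}} (the "wcs" counterpart of {{code|strlen}}) comes from {{code|<wchar.h>}}.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Size in bytes of the wide literal, including its terminating null. */
size_t wide_hello_bytes(void)
{
    return sizeof L"Hello world!";
}

/* Number of wchar_t elements before the terminating null;
   wcslen is the wide-character counterpart of strlen. */
size_t wide_hello_length(void)
{
    return wcslen(L"Hello world!");
}

/* Hypothetical self-check: the literal always has 13 elements
   (12 characters plus the null), so its byte size is 13 times
   whatever width the implementation chose for wchar_t. */
void check_wide_hello(void)
{
    assert(wide_hello_length() == 12);
    assert(wide_hello_bytes() == 13 * sizeof(wchar_t));
}
```

Where {{code|wchar_t}} is 2 bytes wide, <code>wide_hello_bytes()</code> should yield 26; where it is 4 bytes wide, 52. The element count reported by {{code|wcslen}} is 12 in both cases.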
The now generally recommended method<ref group="note">see [[UTF-8]] first section for references</ref> of supporting international characters is through [[UTF-8]], which is stored in {{code|char}} arrays, and can be written directly in the source code if using a UTF-8 editor, because UTF-8 is a direct [[Extended ASCII|ASCII extension]].
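As a rough illustration of UTF-8 text held in an ordinary {{code|char}} array, the sketch below spells the non-ASCII character with explicit byte escapes so that it does not depend on the encoding of the source file. The names <code>utf8_hello</code> and <code>utf8_code_points</code> are hypothetical, and the counting function assumes its input is already valid UTF-8; it is not a validator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "héllo" in UTF-8: the character é is encoded as the two bytes
   0xC3 0xA9. The literal is split so that the hex escapes cannot
   absorb any following characters. */
static const char utf8_hello[] = "h" "\xC3\xA9" "llo";

/* Counts code points by skipping UTF-8 continuation bytes, which
   always have the bit pattern 10xxxxxx. Assumes valid UTF-8 input. */
size_t utf8_code_points(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

Here {{code|strlen}} reports 6 for <code>utf8_hello</code> because it counts bytes, while <code>utf8_code_points</code> reports 5. For pure ASCII text the two counts are equal, which reflects UTF-8 being an ASCII extension.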