==Size of a wide character==
Early adoption of [[UCS-2]] ("Unicode 1.0") led to common use of [[UTF-16]] on a number of platforms, most notably [[Microsoft Windows]], [[.NET]] and [[Java (software platform)|Java]]. In these systems, it is common to have a "wide character" type ({{code|wchar_t}} in C/C++; {{code|char}} in Java) of 16 bits. These types do not always map directly to one "character", as [[surrogate pairs]] are required to store the full range of Unicode (1996, Unicode 2.0).<ref>{{cite web |url=http://msdn.microsoft.com/en-us/goglobal/bb688113.aspx |title=Globalization Step-by-Step: Unicode Enabled |website=msdn.microsoft.com |url-status=dead |archive-url=https://web.archive.org/web/20090101025155/http://msdn.microsoft.com/en-us/goglobal/bb688113.aspx |archive-date=2009-01-01}}</ref><ref>{{cite web |title=String Class (System) |url=https://learn.microsoft.com/en-us/dotnet/api/system.string?view=net-7.0 |website=learn.microsoft.com |language=en-us}}</ref><ref>{{cite web |title=Primitive Data Types (The Java™ Tutorials > Learning the Java Language > Language Basics) |url=https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html |website=docs.oracle.com}}</ref>

[[Unix-like]] systems generally use a 32-bit {{code|wchar_t}} to fit the 21-bit Unicode code point, as C90 prescribed.<ref>{{cite web |title=Null-terminated wide strings <wctype.h> - cppreference.com |url=https://en.cppreference.com/w/c/string/wide |website=en.cppreference.com}}</ref>

The size of a wide character type does not dictate what kinds of text encodings a system can process, as conversions are available. (Old conversion code commonly overlooks surrogates, however.) The historical circumstances of a system's adoption of Unicode do, however, influence which encodings it ''prefers''. A system influenced by Unicode 1.0, such as Windows, tends to mainly use "wide strings" made out of wide character units.
Other systems such as the Unix-likes, however, tend to retain the 8-bit "narrow string" convention, using a multibyte encoding (almost universally UTF-8) to handle "wide" characters.<ref>{{cite web |title=UTF-8 Everywhere |url=http://utf8everywhere.org/ |quote=In the following years many systems have added support for Unicode and switched to the UCS-2 encoding. It was especially attractive for new technologies, such as the Qt framework (1992), Windows NT 3.1 (1993) and Java (1995).}}</ref>
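
The effect of the wide character size on a single code point can be illustrated with a short sketch. The Python example below is only an illustration and not part of any platform API: the helper names <code>to_surrogate_pair</code> and <code>utf16_units</code> are hypothetical, and the sample character U+1D11E (musical symbol G clef) was chosen simply because it lies outside the Basic Multilingual Plane. It shows that such a character needs two 16-bit units in UTF-16, four bytes in UTF-8, and a single 32-bit unit in UTF-32.

```python
import struct


def to_surrogate_pair(code_point: int) -> tuple:
    """Split a supplementary code point (U+10000..U+10FFFF) into a
    (high, low) pair of UTF-16 surrogate code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the supplementary planes")
    offset = code_point - 0x10000        # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


def utf16_units(text: str) -> list:
    """Return the 16-bit code units of text when encoded as UTF-16."""
    data = text.encode("utf-16-be")
    return list(struct.unpack(f">{len(data) // 2}H", data))


if __name__ == "__main__":
    clef = "\U0001D11E"  # sample character, chosen for illustration
    high, low = to_surrogate_pair(ord(clef))
    print(f"surrogate pair: {high:04X} {low:04X}")
    print("UTF-16 code units:", len(utf16_units(clef)))
    print("UTF-8 bytes:", len(clef.encode("utf-8")))
    print("UTF-32 code units:", len(clef.encode("utf-32-be")) // 4)
```

Code that treats each 16-bit unit as a whole character would count this string as two characters, which is the kind of mistake attributed above to old conversion code that overlooks surrogates.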