== Criticism ==
Many older character encodings (unlike Unicode) suffer from several problems. Some vendors do not fully document the meaning of every code point value in their code pages, which makes it harder to handle textual data consistently across different computer systems. Some vendors add proprietary extensions to established code pages, adding or changing certain code point values: for example, byte 0x5C in [[Shift JIS]] can represent either a [[back slash]] or a [[yen sign]] depending on the platform. Finally, to support several languages in a program that does not use Unicode, the code page used for each string or document needs to be stored.

Applications may also mislabel text in [[Windows-1252]] as [[ISO-8859-1]]. The only difference between these code pages is that the code point values in the range 0x80{{ndash}}0x9F, used by ISO-8859-1 for control characters, are instead used as additional printable characters in Windows-1252{{snd}} notably for [[quotation marks]], the [[euro sign]] and the [[trademark symbol]], among others. Browsers on non-Windows platforms tended to show empty boxes or question marks for these characters, making the text hard to read. Most browsers worked around this by ignoring the declared character set and interpreting such text as Windows-1252 so that it displayed acceptably. In HTML5, treating text labelled ISO-8859-1 as Windows-1252 is even codified as a [[W3C]] standard.<ref>{{cite web |url=https://encoding.spec.whatwg.org/#names-and-labels |title=Encoding |at=sec. 4.2 Names and labels |publisher=[[WHATWG]] |date=27 January 2015 |access-date=4 February 2015 |archive-url=https://web.archive.org/web/20150204174315/https://encoding.spec.whatwg.org/#names-and-labels |archive-date=4 February 2015 |url-status=live}}</ref> Although browsers were typically programmed to deal with this behaviour, this was not always true of other software.
Consequently, when receiving a file transferred from a Windows system, non-Windows platforms would either ignore these characters or treat them as standard control characters and attempt to perform the corresponding control actions.

Because of Unicode's extensive documentation, vast repertoire of characters and character stability policy, the problems listed above are rarely a concern for Unicode. [[UTF-8]] (which can encode over one million code points) has replaced the code-page method as the most popular approach on the Internet.<ref name="Statistics"/><ref name="Statistics_UTF-8"/>
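The Windows-1252/ISO-8859-1 mix-up described above can be reproduced with Python's standard codecs. The following is a minimal sketch, assuming Python 3.9 or later and treating Python's built-in <code>latin-1</code> and <code>cp1252</code> codecs as stand-ins for ISO-8859-1 and Windows-1252; the helper names <code>decode_both</code> and <code>bytes_that_differ</code> are hypothetical and chosen for illustration only. It decodes the same bytes both ways, then lists every byte value on which the two decodings disagree.

```python
def decode_both(data: bytes) -> tuple[str, str]:
    """Decode the same bytes as ISO-8859-1 and as Windows-1252 (hypothetical helper)."""
    # Python's "latin-1" codec maps every byte 0x00-0xFF straight to U+0000-U+00FF,
    # so bytes 0x80-0x9F become C1 control characters.
    as_latin1 = data.decode("latin-1")
    # Python's "cp1252" codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined;
    # errors="replace" turns those into U+FFFD instead of raising an error.
    as_cp1252 = data.decode("cp1252", errors="replace")
    return as_latin1, as_cp1252


def bytes_that_differ() -> list[int]:
    """Return every single-byte value whose two decodings differ (hypothetical helper)."""
    differ = []
    for b in range(256):
        as_latin1, as_cp1252 = decode_both(bytes([b]))
        if as_latin1 != as_cp1252:
            differ.append(b)
    return differ


if __name__ == "__main__":
    # Text typed on Windows: curly quotes, a trademark sign and a euro sign.
    original = "\u201cACME\u2122 costs \u20ac5\u201d"
    data = original.encode("cp1252")  # b'\x93ACME\x99 costs \x805\x94'
    as_latin1, as_cp1252 = decode_both(data)
    print(repr(as_latin1))  # the symbols have turned into C1 control characters
    print(as_cp1252 == original)  # True
    differ = bytes_that_differ()
    print(f"{len(differ)} bytes differ, from {differ[0]:#04x} to {differ[-1]:#04x}")
```

Decoding the Windows text as ISO-8859-1 turns the curly quotes, trademark sign and euro sign into invisible control characters, and every byte on which the two decodings disagree falls within 0x80{{ndash}}0x9F. Because Python's <code>cp1252</code> codec leaves five bytes in that range undefined, the sketch substitutes U+FFFD for them; this reflects Python's behaviour and is not necessarily how a browser decodes those bytes.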