=== Mapping to legacy character sets ===
Unicode was designed to provide code-point-by-code-point [[round-trip format conversion]] to and from any preexisting character encoding, so that a text file in an older character set can be converted to Unicode and back again to yield the same file, without context-dependent interpretation. As a result, inconsistent legacy architectures, such as [[combining character|combining diacritics]] and [[precomposed character]]s, both exist in Unicode, giving more than one way of representing some text. This is most pronounced in the three different encoding forms for Korean [[Hangul]]. Since version 3.0, precomposed characters that can be represented by a combining sequence of already existing characters are no longer added to the standard, in order to preserve interoperability between software using different versions of Unicode. [[Injective]] mappings must be provided between characters in existing legacy character sets and characters in Unicode to facilitate conversion to Unicode and allow interoperability with legacy software.
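The coexistence of precomposed and combining forms can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the helper names <code>decomposed</code>, <code>composed</code> and <code>round_trips</code> are hypothetical, and the example relies on the standard-library <code>unicodedata</code> module and the built-in <code>latin-1</code> codec.

```python
import unicodedata


def decomposed(text: str) -> str:
    """Return the canonical decomposition (NFD) of text."""
    return unicodedata.normalize("NFD", text)


def composed(text: str) -> str:
    """Return the canonical composition (NFC) of text."""
    return unicodedata.normalize("NFC", text)


def round_trips(data: bytes, codec: str) -> bool:
    """Check that legacy bytes survive conversion to Unicode and back."""
    return data.decode(codec).encode(codec) == data


# A precomposed Hangul syllable and its sequence of conjoining jamo
# are two different code point sequences for the same text.
syllable = "\ud55c"             # U+D55C, one precomposed syllable
jamo = "\u1112\u1161\u11ab"     # leading consonant, vowel, trailing consonant

print(syllable == jamo)               # the raw strings differ
print(decomposed(syllable) == jamo)   # but they are canonically equivalent
print(composed(jamo) == syllable)

# The same holds for a Latin letter with a diacritic.
print(decomposed("\u00e9") == "e\u0301")

# Every ISO 8859-1 byte maps to exactly one code point, so the
# conversion to Unicode and back reproduces the original bytes.
print(round_trips(bytes(range(256)), "latin-1"))
```

Software that compares such strings usually normalizes both sides to the same form first, since the two representations are not equal code point by code point.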
Inconsistent mappings between earlier Japanese encodings such as [[Shift-JIS]] or [[EUC-JP]] and Unicode led to [[round-trip format conversion]] mismatches, particularly the mapping of the JIS X 0208 character '～' (1-33, WAVE DASH), heavily used in legacy database data, to either {{unichar|FF5E|FULLWIDTH TILDE}} (in [[Microsoft Windows]]) or {{unichar|301C|WAVE DASH}} (other vendors).<ref>[http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2166.doc AFII contribution about WAVE DASH], {{Cite web |date=22 April 2011 |title=An Unicode vendor-specific character table for japanese |url=http://www.ingrid.org/java/i18n/unicode.html |archive-url=https://web.archive.org/web/20110422181018/http://www.ingrid.org/java/i18n/unicode.html |archive-date=22 April 2011 |access-date=2019-05-20 }}</ref> Some Japanese computer programmers objected to Unicode because it requires them to distinguish between {{unichar|005C|REVERSE SOLIDUS|note=backslash}} and {{unichar|00A5|YEN SIGN}}, the latter of which was mapped to 0x5C in JIS X 0201, and a large body of legacy code relies on this usage.<ref>[https://www.debian.org/doc/manuals/intro-i18n/ch-codes.en.html#s-646problem ''ISO 646-* Problem''], Section 4.4.3.5 of ''Introduction to I18n'', Tomohiro Kubota, 2001</ref> (This encoding also replaces tilde '~' (0x7E) with macron '¯', now at 0xAF.) The separation of these characters already exists in [[ISO 8859-1]], which long predates Unicode.
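The WAVE DASH mismatch can be reproduced with a minimal Python sketch. It assumes that CPython's <code>shift_jis</code> codec follows the JIS-based mapping and that its <code>cp932</code> codec follows the Microsoft Windows mapping; the names <code>WAVE_DASH_SJIS</code>, <code>decode_wave_dash</code> and <code>survives</code> are hypothetical and used only for illustration.

```python
# JIS X 0208 row 1, cell 33 is the byte pair 0x81 0x60 in Shift-JIS.
WAVE_DASH_SJIS = b"\x81\x60"


def decode_wave_dash(codec: str) -> str:
    """Decode the same legacy bytes with the named codec."""
    return WAVE_DASH_SJIS.decode(codec)


def survives(text: str, codec: str) -> bool:
    """Return True if text can be encoded with codec and decoded back unchanged."""
    try:
        return text.encode(codec).decode(codec) == text
    except UnicodeEncodeError:
        return False


# The same two bytes become different code points depending on the table used.
for codec in ("shift_jis", "cp932"):
    ch = decode_wave_dash(codec)
    print(f"{codec}: U+{ord(ch):04X}")

# Text decoded with one vendor's table cannot be re-encoded with the other's.
windows_text = decode_wave_dash("cp932")
print(survives(windows_text, "cp932"))      # same table both ways
print(survives(windows_text, "shift_jis"))  # mixed tables
```

Under these assumptions, data written by one system and read back through the other vendor's table either fails to encode or yields a different character, which is the mismatch described above.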