=== Mapping to legacy character sets ===
Unicode was designed to provide code-point-by-code-point [[round-trip format conversion]] to and from any preexisting character encoding, so that text files in older character sets can be converted to Unicode and back again to yield the original file, without employing context-dependent interpretation. As a result, inconsistent legacy architectures, such as [[combining character|combining diacritics]] and [[precomposed character]]s, both exist in Unicode, giving more than one method of representing some text. This is most pronounced in the three different encoding forms for Korean [[Hangul]]. Since version 3.0, no precomposed character that can be represented by a combining sequence of already existing characters can be added to the standard, in order to preserve interoperability between software using different versions of Unicode.
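The following Python sketch illustrates how a single visible character can have more than one code point representation, using the standard-library <code>unicodedata</code> module. The example characters are chosen purely for illustration: "é" as a precomposed character versus a base letter plus a combining accent, and the Hangul syllable "가" written as a precomposed syllable, as conjoining jamo, and as compatibility jamo. [[Unicode equivalence|Normalization]] is the mechanism Unicode uses to relate such equivalent sequences.

```python
import unicodedata

# "é" written two ways: one precomposed code point, or a base letter
# followed by U+0301 COMBINING ACUTE ACCENT.
precomposed_e = "\u00e9"
combining_e = "e\u0301"

# The Hangul syllable "가" written three ways.
hangul_syllable = "\uac00"          # precomposed syllable
conjoining_jamo = "\u1100\u1161"    # CHOSEONG KIYEOK + JUNGSEONG A
compat_jamo = "\u3131\u314f"        # HANGUL LETTER KIYEOK + HANGUL LETTER A

for label, text in [("precomposed", precomposed_e), ("combining", combining_e),
                    ("syllable", hangul_syllable), ("conjoining", conjoining_jamo),
                    ("compatibility", compat_jamo)]:
    # Print each string as its sequence of code points.
    print(f"{label:13}", " ".join(f"U+{ord(c):04X}" for c in text))

# Different code point sequences compare unequal...
print(precomposed_e == combining_e)                                      # False
# ...but canonical normalization (NFC) maps them to the same form.
print(unicodedata.normalize("NFC", combining_e) == precomposed_e)        # True
print(unicodedata.normalize("NFC", conjoining_jamo) == hangul_syllable)  # True
# Compatibility jamo have no canonical decomposition, so NFC leaves them
# unchanged; only the compatibility form NFKC unifies them with the syllable.
print(unicodedata.normalize("NFC", compat_jamo) == hangul_syllable)      # False
print(unicodedata.normalize("NFKC", compat_jamo) == hangul_syllable)     # True
```

Software that compares or searches text therefore usually normalizes both sides first; otherwise visually identical strings may fail to match.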

[[Injective]] mappings must be provided between characters in existing legacy character sets and characters in Unicode to facilitate conversion to Unicode and allow interoperability with legacy software. Lack of consistency in various mappings between earlier Japanese encodings such as [[Shift-JIS]] or [[EUC-JP]] and Unicode led to [[round-trip format conversion]] mismatches, particularly the mapping of the character JIS X 0208 '～' (1-33, WAVE DASH), heavily used in legacy database data, to either {{unichar|FF5E|FULLWIDTH TILDE}} (in [[Microsoft Windows]]) or {{unichar|301C|WAVE DASH}} (other vendors).<ref>[http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2166.doc AFII contribution about WAVE DASH], {{Cite web |date=22 April 2011 |title=An Unicode vendor-specific character table for japanese |url=http://www.ingrid.org/java/i18n/unicode.html |archive-url=https://web.archive.org/web/20110422181018/http://www.ingrid.org/java/i18n/unicode.html |archive-date=22 April 2011 |access-date=2019-05-20 }}</ref>
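The WAVE DASH discrepancy can be reproduced with Python's built-in codecs. This sketch assumes CPython's <code>shift_jis</code> codec, which follows the JIS-based mapping, and its <code>cp932</code> codec, which follows Microsoft's code page 932 table. JIS X 0208 row 1, cell 33 corresponds to the Shift JIS byte pair 0x81 0x60.

```python
import unicodedata

# JIS X 0208 row 1, cell 33 (WAVE DASH) is the byte pair 0x81 0x60 in Shift JIS.
wave_dash_bytes = b"\x81\x60"

# The same bytes decoded with two vendors' mapping tables.
jis_text = wave_dash_bytes.decode("shift_jis")
ms_text = wave_dash_bytes.decode("cp932")

print(f"shift_jis -> U+{ord(jis_text):04X} {unicodedata.name(jis_text)}")  # U+301C WAVE DASH
print(f"cp932     -> U+{ord(ms_text):04X} {unicodedata.name(ms_text)}")    # U+FF5E FULLWIDTH TILDE

# Each codec maps its own interpretation back to the original bytes...
print(jis_text.encode("shift_jis") == wave_dash_bytes)  # True
print(ms_text.encode("cp932") == wave_dash_bytes)       # True
# ...but the two decodings are different strings, so text converted with one
# table does not compare equal to text converted with the other.
print(jis_text == ms_text)                              # False
```

Each mapping is reversible on its own; the mismatch appears only when data converted by one vendor's table is processed by software that assumes the other.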

Some Japanese computer programmers objected to Unicode because it requires them to distinguish {{unichar|005C|REVERSE SOLIDUS|note=backslash}} from {{unichar|00A5|YEN SIGN}}; JIS X 0201 maps the yen sign to 0x5C, and a great deal of legacy code relies on this usage.<ref>[https://www.debian.org/doc/manuals/intro-i18n/ch-codes.en.html#s-646problem ''ISO 646-* Problem''], Section 4.4.3.5 of ''Introduction to I18n'', Tomohiro Kubota, 2001</ref> (This encoding also replaces the tilde '~' at 0x7E with a macron '¯', which is at 0xAF in ISO 8859-1.) These characters were already kept separate in [[ISO 8859-1]], long before Unicode.
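A minimal Python sketch of why byte 0x5C is ambiguous. The decoding table below is hypothetical and hand-written, covering only the 7-bit Roman half of JIS X 0201; it is not a complete codec. It uses U+203E OVERLINE for 0x7E, as common JIS X 0201 mapping tables do. ISO 8859-1 is decoded with Python's built-in <code>latin-1</code> codec.

```python
# Hypothetical, hand-written table for the 7-bit Roman half of JIS X 0201.
# It is identical to ASCII except at two byte values.
JIS_X_0201_ROMAN_DIFFS = {
    0x5C: "\u00a5",  # YEN SIGN where ASCII has REVERSE SOLIDUS
    0x7E: "\u203e",  # OVERLINE where ASCII has TILDE
}

def decode_jis_x_0201_roman(data: bytes) -> str:
    """Decode 7-bit bytes as JIS X 0201 Roman (illustration only)."""
    if any(b >= 0x80 for b in data):
        raise ValueError("this sketch only handles bytes below 0x80")
    return "".join(JIS_X_0201_ROMAN_DIFFS.get(b, chr(b)) for b in data)

path = b"C:\\DOS\\TOOLS"
print(path.decode("latin-1"))         # C:\DOS\TOOLS
print(decode_jis_x_0201_roman(path))  # C:¥DOS¥TOOLS

# ISO 8859-1 already kept the two apart: 0x5C is the backslash and the
# yen sign has its own byte, 0xA5.
print(b"\xa5".decode("latin-1"))      # ¥
```

The same bytes yield either backslashes or yen signs depending on the assumed character set, whereas Unicode assigns the two characters distinct code points, so a converter must commit to one interpretation.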