=== Ready-made versus composite characters ===
Unicode includes a mechanism for modifying characters that greatly extends the supported repertoire of glyphs. This covers the use of [[combining diacritical mark]]s that may be added after the base character by the user. Multiple combining diacritics may be simultaneously applied to the same character. Unicode also contains [[precomposed character|precomposed]] versions of most letter/diacritic combinations in normal use. These make the conversion to and from legacy encodings simpler, and allow applications to use Unicode as an internal text format without having to implement combining characters. For example, <code>é</code> can be represented in Unicode as {{unichar|0065|LATIN SMALL LETTER E}} followed by {{unichar|0301|COMBINING ACUTE ACCENT|cwith=◌}}, and equivalently as the precomposed character {{unichar|00E9|LATIN SMALL LETTER E WITH ACUTE}}. Thus, users often have multiple equivalent ways of encoding the same character. The mechanism of [[canonical equivalence]] within ''The Unicode Standard'' ensures the practical interchangeability of these equivalent encodings.

An example of this arises with the Korean alphabet [[Hangul]]: Unicode provides a mechanism for composing Hangul syllables from their individual [[Hangul Jamo]] subcomponents. However, it also provides {{val|11172}} precomposed syllables made from the most common jamo.

[[CJK characters]] presently only have codes for uncomposable radicals and precomposed forms. Most Han characters have either been intentionally composed from, or reconstructed as compositions of, simpler orthographic elements called [[Radical (Chinese characters)|radicals]], so in principle Unicode could have enabled their composition as it did with Hangul.
While this could have greatly reduced the number of required code points, as well as allowing the algorithmic synthesis of many arbitrary new characters, the intricacies of character etymologies and the post-hoc nature of radical systems would make such a scheme immensely complex. Indeed, attempts to design CJK encodings on the basis of composing radicals have met with difficulties, because Chinese characters do not decompose as simply or as regularly as Hangul does.

The [[CJK Radicals Supplement]] block is assigned to the range {{tt|U+2E80}}–{{tt|U+2EFF}}, and the [[Kangxi radicals]] are assigned to {{tt|U+2F00}}–{{tt|U+2FDF}}. The [[Ideographic Description Sequences]] block covers the range {{tt|U+2FF0}}–{{tt|U+2FFB}}, but ''The Unicode Standard'' warns against using its characters as an alternate representation for characters encoded elsewhere: {{blockquote|This process is different from a formal ''encoding'' of an ideograph. There is no canonical description of unencoded ideographs; there is no semantic assigned to described ideographs; there is no equivalence defined for described ideographs. Conceptually, ideographic descriptions are more akin to the English phrase "an 'e' with an acute accent on it" than to the character sequence <U+0065, U+0301>.}}
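
The canonical equivalence of the two encodings of <code>é</code>, and the composition of Hangul syllables from jamo, can be illustrated with a short sketch using Python's standard <code>unicodedata</code> module. The helper <code>compose_hangul</code> and the constant names are hypothetical and introduced only for illustration; the values 19, 21 and 28 (counts of leading consonants, vowels, and trailing consonants including "none") and the base code point U+AC00 are taken from the Hangul syllable composition arithmetic in ''The Unicode Standard'', not from the text above.

```python
import unicodedata

# Two encodings of the same character "é".
decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE

# Compared code point by code point, the strings differ...
strings_differ = decomposed != precomposed
# ...but normalization maps each form onto the other.
nfc_of_decomposed = unicodedata.normalize("NFC", decomposed)
nfd_of_precomposed = unicodedata.normalize("NFD", precomposed)

# Hangul: precomposed syllables are laid out arithmetically from U+AC00.
# Constant names are illustrative; the values come from the standard.
S_BASE = 0xAC00   # first precomposed syllable
L_COUNT = 19      # leading consonants
V_COUNT = 21      # vowels
T_COUNT = 28      # trailing consonants, including "no trailing consonant"

def compose_hangul(l_index: int, v_index: int, t_index: int = 0) -> str:
    """Return the precomposed syllable for zero-based jamo indices."""
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)

# Total number of precomposed syllables.
syllable_count = L_COUNT * V_COUNT * T_COUNT

# The jamo sequence U+1112, U+1161, U+11AB normalizes (NFC) to one syllable.
jamo_sequence = "\u1112\u1161\u11ab"
nfc_syllable = unicodedata.normalize("NFC", jamo_sequence)
```

Comparing strings only after normalizing both to the same form (NFC or NFD) is the usual way to treat canonically equivalent sequences as equal. No comparable arithmetic exists for Han characters, which is consistent with the irregular decomposition described above.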