Letter case
===Unicode case folding and script identification===
[[Unicode]] defines case folding through the three case-mapping properties of each [[Character (computing)|character]]: upper case, lower case, and title case (in this context, "title case" relates to [[Typographic ligature|ligature]]s and [[Digraph (orthography)|digraph]]s encoded as mixed-case [[Digraph (orthography)#In Unicode|single characters]], in which the first component is in upper case and the second component in lower case).<ref>{{cite web | url = http://unicode.org/faq/casemap_charprop.html#4 | title = Character Properties, Case Mappings & Names FAQ | publisher = Unicode | access-date = 19 February 2017}}</ref> These properties relate all characters in scripts with differing cases to the other case variants of the character. As briefly discussed in [[Unicode]] Technical Note #26,<ref name="Unicode" /> "In terms of implementation issues, any attempt at a unification of Latin, Greek, and Cyrillic would wreak havoc [and] make casing operations an unholy mess, in effect making all casing operations context sensitive […]". In other words, while the shapes of letters like '''A''', '''B''', '''E''', '''H''', '''K''', '''M''', '''O''', '''P''', '''T''', '''X''', '''Y''' and so on are shared between the Latin, Greek, and Cyrillic alphabets (and small differences in their canonical forms may be considered to be of a merely [[Typography|typographical]] nature), it would still be problematic for a multilingual [[character set]] or a [[font]] to provide only a ''single'' [[code point]] for, say, uppercase letter '''B''', as this would make it difficult for a word processor to change that single uppercase letter to the correct one of three different lower-case letters: the Latin '''b''' (U+0062), Greek '''β''' (U+03B2) or Cyrillic '''в''' (U+0432).
Therefore, the corresponding Latin, Greek and Cyrillic upper-case letters (U+0042, U+0392 and U+0412, respectively) are also encoded as separate characters, despite their appearance being identical. Without letter case, a "unified European alphabet"{{spaced ndash}}such as '''ABБCГDΔΕЄЗFΦGHIИJ'''...'''Z''', with an appropriate subset for each language{{spaced ndash}}is feasible; but considering letter case, it becomes very clear that these alphabets are rather distinct sets of symbols.
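
The effect of the separate encodings can be observed in any environment that applies the Unicode case mappings. The following is a minimal illustrative sketch in Python, not part of the cited sources; the names <code>LOOKALIKE_CAPITALS</code> and <code>describe_lowercase</code> are hypothetical and chosen only for this example. It lower-cases the three look-alike capital letters and also shows, for the digraph U+01C6, that the title-case mapping can differ from the upper-case mapping.

```python
import unicodedata

# The three look-alike capital letters discussed above, one per script.
# (The dictionary and function names are illustrative assumptions.)
LOOKALIKE_CAPITALS = {
    "Latin": "\u0042",
    "Greek": "\u0392",
    "Cyrillic": "\u0412",
}


def describe_lowercase(ch):
    """Return (code point, character name, lower-case code point) for ch."""
    lower = ch.lower()
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), f"U+{ord(lower):04X}")


for script, capital in LOOKALIKE_CAPITALS.items():
    # Each capital maps to a different lower-case letter, which is only
    # possible because each capital has its own code point.
    print(script, *describe_lowercase(capital))

# Title case versus upper case for a digraph encoded as a single character:
# U+01C6 (small dz with caron) has distinct upper-case and title-case forms.
DZ_SMALL = "\u01C6"
dz_upper = DZ_SMALL.upper()   # both components capitalised
dz_title = DZ_SMALL.title()   # only the first component capitalised
print(f"U+{ord(dz_upper):04X}", f"U+{ord(dz_title):04X}")
```

Because the lower-case mapping is looked up per code point, no knowledge of the surrounding language is needed to pick the correct small letter, which is the context-free behaviour that a unified encoding would lose.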