==== Localised case pairs ====

For use in the [[Turkish alphabet]] and [[Azeri alphabet]], Unicode includes a separate [[dotless I|dotless lowercase {{serif|I}}]] (ı) and a [[İ|dotted uppercase {{serif|I}}]] ({{serif|İ}}). However, the usual ASCII letters are used for the lowercase dotted {{serif|I}} and the uppercase dotless {{serif|I}}, matching how they are handled in the earlier [[ISO 8859-9]]. As a result, case-insensitive comparisons for those languages have to use different rules from case-insensitive comparisons for other languages using the Latin script.<ref>{{cite web |url=https://unicode.org/Public/UNIDATA/CaseFolding.txt |work=Unicode Character Database |title=Case Folding Properties |institution=[[Unicode Consortium]] |date=2023-05-12}}</ref><ref name="microsoft-case-insensitive-locale">{{cite web |url=https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-options#compare-using-the-invariant-culture |title=Regular expression options § Compare using the invariant culture |work=[[.NET]] fundamentals documentation |publisher=[[Microsoft]] |date=2023-05-12}}</ref> This can have security implications if, for example, [[Code injection#Preventing Code Injection|sanitization]] code or [[access control]] relies on case-insensitive comparison.<ref name="microsoft-case-insensitive-locale"/>

By contrast, the [[ð|Icelandic eth (ð)]], the [[đ|barred D (đ)]] and the [[ɖ|retroflex D (ɖ)]], which usually{{efn|Rarely, the uppercase Icelandic eth may instead be written in an [[insular script|insular]] style (Ꝺ) with the crossbar positioned on the stem, particularly if it needs to be distinguished from the uppercase retroflex D (see [[African Reference Alphabet]]).|group=note}} look the same in uppercase (Đ), are given the opposite treatment: they are encoded separately in both letter-cases (unlike in the earlier [[ISO 6937]], which unifies the uppercase forms). Although this allows case-insensitive comparison without knowledge of the language of the text, the approach has its own drawback: the visually identical uppercase forms require security measures against [[homoglyph]] attacks.<ref>{{cite web |url=https://unicode.org/Public/security/latest/confusablesSummary.txt |title=confusablesSummary.txt |work=Unicode Security Mechanisms for UTS #39 |date=2023-08-11 |institution=[[Unicode Consortium]]}}</ref>
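
The two treatments can be illustrated with a minimal Python sketch. It relies only on the built-in <code>str.casefold</code>, <code>str.upper</code> and <code>str.translate</code> methods, which apply Unicode's default, language-neutral mappings. The helper <code>turkic_fold</code> and the table <code>_TURKIC_MAP</code> are hypothetical names introduced here for illustration; they are not part of any standard library, and the helper handles only the two special pairs described above (it does not, for example, treat an ASCII <code>i</code> followed by a combining dot above).

```python
# Assumption: _TURKIC_MAP and turkic_fold are hypothetical, illustrative names.
# They cover only the two Turkish/Azeri pairs: I <-> ı and İ <-> i.
_TURKIC_MAP = {
    0x0049: "\u0131",  # LATIN CAPITAL LETTER I -> dotless ı
    0x0130: "\u0069",  # LATIN CAPITAL LETTER I WITH DOT ABOVE -> i
}


def turkic_fold(text: str) -> str:
    """Fold case using the Turkish/Azeri pairing instead of the default one."""
    return text.translate(_TURKIC_MAP).casefold()


# Default folding pairs ASCII I with ASCII i ...
assert "TITLE".casefold() == "title"
# ... so the dotless ı (U+0131) never matches I under the default rules,
assert "\u0131".casefold() != "I".casefold()
# and İ (U+0130) folds to i followed by a combining dot above (U+0307).
assert "\u0130".casefold() == "i\u0307"

# With the language-specific pairing, the Turkish pairs match each other,
assert turkic_fold("I\u015eIK") == turkic_fold("\u0131\u015f\u0131k")  # IŞIK / ışık
assert turkic_fold("\u0130ZM\u0130R") == turkic_fold("izmir")          # İZMİR / izmir
# while I and i are now different letters.
assert turkic_fold("I") != turkic_fold("i")

# Opposite treatment: eth, barred D and retroflex D each have their own
# uppercase code point, so no language information is needed to fold them.
lower_d = ["\u00f0", "\u0111", "\u0256"]           # ð, đ, ɖ
upper_d = [ch.upper() for ch in lower_d]           # Ð, Đ, Ɖ
upper_codepoints = [f"U+{ord(ch):04X}" for ch in upper_d]
assert upper_codepoints == ["U+00D0", "U+0110", "U+0189"]
assert [ch.casefold() for ch in upper_d] == lower_d
```

Because the three uppercase D forms are distinct code points that usually render alike, a string comparison tells them apart even when a reader cannot, which is the homoglyph concern mentioned above.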