Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Unicode collation algorithm
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
{{Short description|String collation algorithm}} __NOTOC__ The '''Unicode collation algorithm''' ('''UCA''') is an algorithm defined in Unicode Technical Report #10, which is a customizable method to produce binary keys from [[String (computer science)|strings]] representing text in any [[writing system]] and [[language]] that can be represented with [[Unicode]]. These keys can then be efficiently compared byte by byte in order to [[collate]] or sort them according to the rules of the language, with options for ignoring case, accents, etc.<ref name=":0">{{Cite web |last1=Whistler |first1=Ken |last2=Scherer |first2=Markus |last3=Davis |first3=Mark |author-link3=Mark Davis (Unicode) |date=2022-08-26 |title=UTS #10: Unicode Collation Algorithm |url=https://www.unicode.org/reports/tr10/ |access-date=2023-08-16 |website=[[Unicode]]}}</ref> Unicode Technical Report #10 also specifies the ''Default Unicode Collation Element Table'' (DUCET). This data file specifies a default collation ordering. The DUCET is customizable for different languages,<ref name=":0" /><ref name=":1">{{Cite book |last=Hosken |first=Martin |url=https://scriptsource.org/cms/scripts/render_download.php?format=file&media_id=..%2Fsites%2Fs%2Fmedia%2Fdatabase%2Fssproto%2Fentries%2Fpn%2Frn%2Fpnrnlhkrq9_sort_tutorial.pdf&filename=sort_tutorial.pdf |title=Unicode Sort Tailoring: Tutorial |date=2021-09-23 |publisher=[[SIL International|SIL Writing Systems Technology]] |edition=1.3 |pages=2β3 |access-date=2023-08-16}}</ref> and some such customizations can be found in the Unicode [[Common Locale Data Repository]] (CLDR).<ref>{{Cite web |title=CLDR Releases/Downloads |url=https://cldr.unicode.org/index/downloads |access-date=2023-08-16 |website=[[Common Locale Data Repository|Unicode CLDR]] |language=}}</ref> An open source implementation of UCA is included with the [[International Components for Unicode]], ICU.<ref>{{Cite web |title=ICU - International Components for Unicode |url=https://icu.unicode.org/home |access-date=2023-08-16 |website=[[Unicode]]}}</ref><ref>{{Cite web |title=Collations |url=https://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.sqlanywhere.12.0.1/dbadmin/natlang-s-7003956.html |access-date=2023-08-16 |website=SyBooks Online}}</ref> ICU supports tailoring, and the collation tailorings from CLDR are included in ICU.<ref>{{Cite web |title=Customization |url=https://unicode-org.github.io/icu/userguide/collation/customization/ |access-date=2023-08-16 |website=ICU Documentation |language=}}</ref><ref name=":1" />
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)