Internationalized domain name
===ToASCII and ToUnicode===
The conversions between ASCII and non-ASCII forms of a domain name are accomplished by a pair of algorithms called ToASCII and ToUnicode. These algorithms are not applied to the domain name as a whole, but rather to individual labels. For example, if the domain name is www.example.com, then the labels are ''www'', ''example'', and ''com''. ToASCII or ToUnicode is applied to each of these three separately.

The details of these two algorithms are complex and are specified in RFC 3490. The following is an overview of how they work.

ToASCII leaves ASCII labels unchanged. It fails if the label is unsuitable for the Domain Name System. For labels containing at least one non-ASCII character, ToASCII applies the [[Nameprep]] algorithm, which converts the label to lowercase and performs other normalization. ToASCII then translates the result to ASCII, using [[Punycode]].<ref name="rfc3492">RFC 3492, ''Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA)'', A. Costello, The Internet Society (March 2003)</ref> Finally, it prepends the four-character string "<code>xn--</code>".<ref>{{cite web |url=http://www.atm.tut.fi/list-archive/ietf-announce/msg13572.html |title=Completion of IANA Selection of IDNA Prefix |author=Internet Assigned Numbers Authority |author-link=Internet Assigned Numbers Authority |website=www.atm.tut.fi |date=2003-02-14 |access-date=2017-09-22 |archive-url=https://web.archive.org/web/20100427154004/http://www.atm.tut.fi/list-archive/ietf-announce/msg13572.html |archive-date=2010-04-27 |url-status=dead }}</ref> This four-character string is called the ASCII Compatible Encoding (''ACE'') prefix. It is used to distinguish labels encoded in Punycode from ordinary ASCII labels.

The ToASCII algorithm can fail in several ways. For example, the final string could exceed the 63-character limit of a DNS label. A label for which ToASCII fails cannot be used in an internationalized domain name.
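The ToASCII steps described above can be illustrated with a minimal Python sketch. It is a simplification, not the full RFC 3490 procedure: it omits the optional hostname character checks and the check for labels that already begin with the ACE prefix. It also assumes a domain name without a trailing dot. The function names <code>to_ascii_label</code> and <code>domain_to_ascii</code> are hypothetical and chosen for illustration. <code>nameprep</code> and the <code>punycode</code> codec come from the Python standard library.

```python
from encodings.idna import nameprep  # standard-library Nameprep implementation

ACE_PREFIX = "xn--"
MAX_LABEL_LENGTH = 63  # DNS limit on the length of a single label


def to_ascii_label(label: str) -> str:
    """Simplified ToASCII for one label; raises ValueError on failure."""
    if label.isascii():
        # ASCII labels are left unchanged.
        result = label
    else:
        # Normalize (including case folding) before encoding.
        prepared = nameprep(label)
        if prepared.isascii():
            # Normalization may already have produced a pure ASCII label.
            result = prepared
        else:
            # Encode with Punycode, then prepend the ACE prefix.
            result = ACE_PREFIX + prepared.encode("punycode").decode("ascii")
    if not 1 <= len(result) <= MAX_LABEL_LENGTH:
        raise ValueError(f"label length {len(result)} is outside 1..63")
    return result


def domain_to_ascii(domain: str) -> str:
    """Apply the simplified ToASCII to each label separately."""
    return ".".join(to_ascii_label(label) for label in domain.split("."))


print(domain_to_ascii("www.bücher.example"))
```

In this sketch a failure surfaces as a <code>ValueError</code> (the standard library's <code>UnicodeError</code> is a subclass of it), which corresponds to a label that cannot be used in an internationalized domain name.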
The function ToUnicode reverses the action of ToASCII, stripping off the ACE prefix and applying the Punycode decode algorithm. It does not reverse the Nameprep processing, since that is merely a normalization and is by nature irreversible. Unlike ToASCII, ToUnicode always succeeds, because it simply returns the original string if decoding fails. In particular, this means that ToUnicode does not affect a string that does not begin with the ACE prefix.
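The "always succeeds" behavior of ToUnicode can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the function name <code>to_unicode_label</code> is hypothetical, and the sketch omits the step in RFC 3490 that re-applies ToASCII to the decoded result to verify that it round-trips. The <code>punycode</code> codec is part of the Python standard library.

```python
ACE_PREFIX = "xn--"


def to_unicode_label(label: str) -> str:
    """Simplified ToUnicode for one label; never raises."""
    # Labels without the ACE prefix are returned untouched.
    if not label.lower().startswith(ACE_PREFIX):
        return label
    try:
        # Strip the ACE prefix and apply Punycode decoding.
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except UnicodeError:
        # On any decoding failure, fall back to the original string.
        return label


print(to_unicode_label("xn--bcher-kva"))
print(to_unicode_label("example"))
```

Returning the input on failure, rather than raising an exception, mirrors the description above: a caller can apply the function to every label of a domain name without first checking whether the label is encoded.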