Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Punycode
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
===Variable-length number encoding=== The number is encoded using the letters {{char|a}} through {{char|z}} and the digits {{char|0}} through {{char|9}}. It is not base-36 but a more complex scheme, [[Numeral system#Generalized variable-length integers|generalized variable-length integers]], which allows the numbers to be concatenated with nothing separating them. This is how "kva" is used to represent the code number 745: <blockquote> A number system with [[Endianness#Detailed description|little-endian ordering]] is used which allows variable-length codes without separate delimiters: a digit lower than a threshold value marks that it is the most-significant digit, hence the end of the number. The threshold value depends on the position in the number and also on previous insertions, to increase efficiency. Correspondingly the weights of the digits vary. In this case a number system with 36 symbols is used, with the [[case sensitivity|case-insensitive]] 'a' through 'z' equal to the decimal numbers 0 through 25, and '0' through '9' equal to the decimal numbers 26 through 35. Thus "kva", corresponds to the decimal number string "10 21 0". </blockquote> To decode this string of symbols, a sequence of thresholds will be needed, in this case it's (1, 1, 26, 26, ...).<ref>This is true for the first encoded character (or, in terms of RFC 3492, the first "delta"): see RFC 3492, Sec. 6.</ref> The weight (or [[place value]]) of the least-significant digit is always 1: 'k' (=10) with a weight of 1 equals 10. After this, the weight of the next digit depends on the first threshold: generally, for any ''n'', the weight of the (''n''+1)-th digit is ''w'' × (36 − ''t''), where ''w'' is the previous weight and ''t'' is the threshold of the ''n''-th digit. So in this case, the second symbol has a place value of 36 minus the previous threshold value of 1, which equals 35. Therefore, the sum of the first two symbols 'k' (=10) and 'v' (=21) is 10 × 1 + 21 × 35. Since the second symbol is not less than its threshold value of 1, there is more to come. However, since the third symbol in this example is 'a' (=0), we may ignore calculating its weight. Therefore, "kva" represents the decimal number (10 × 1) + (21 × 35) = 745. Number 745 will be encoded as 10 + 21 × 35 + 0 (base 35 used for second digit, the most significant digit 0 needed as terminator), 10 → 'k', 21 → 'v', 0 → 'a', so "bücher" → "bcher-kva". The thresholds themselves are determined for each successive encoded character by an algorithm keeping them between 1 and 26 inclusive.<ref>RFC 3492, Secs. 3.4, 5.</ref> The case can then be used to provide information about the original case of the string.<ref>RFC 3492, App. A.</ref> Because special characters are sorted by their code points by encoding algorithm, for the insertion of a second special character in "bücher", the first possibility is "büücher" with code "bcher-kvaa", the second "bücüher" with code "bcher-kvab", etc. After "bücherü" with code "bcher-kvae" comes codes representing insertion of ý, the Unicode character following ü, starting with "ýbücher" with code "bcher-kvaf" (different from "übücher" coded "bcher-jvab"), etc.
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)