Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Hash function
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
=== Character folding === The paradigmatic example of folding by characters is to add up the integer values of all the characters in the string. A better idea is to multiply the hash total by a constant, typically a sizable prime number, before adding in the next character, ignoring overflow. Using exclusive-or instead of addition is also a plausible alternative. The final operation would be a modulo, mask, or other function to reduce the word value to an index the size of the table. The weakness of this procedure is that information may cluster in the upper or lower bits of the bytes; this clustering will remain in the hashed result and cause more collisions than a proper randomizing hash. ASCII byte codes, for example, have an upper bit of 0, and printable strings do not use the last byte code or most of the first 32 byte codes, so the information, which uses the remaining byte codes, is clustered in the remaining bits in an unobvious manner. The classic approach, dubbed the [[PJW hash function|PJW hash]] based on the work of [[Peter J. Weinberger]] at [[Bell Labs]] in the 1970s, was originally designed for hashing identifiers into compiler symbol tables as given in the [[Compilers: Principles, Techniques, and Tools|"Dragon Book"]].<ref>{{cite book |first1=A. |last1=Aho |author-link1=Alfred Aho |first2=R. |last2=Sethi |author-link2=Ravi Sethi |first3=J. D. |last3=Ullman |author-link3=Jeffrey Ullman |date=1986 |title=Compilers: Principles, Techniques and Tools |page=435 |publisher=[[Addison-Wesley]] |location=Reading, MA |isbn=0-201-10088-6}}</ref> This hash function offsets the bytes 4 bits before adding them together. When the quantity wraps, the high 4 bits are shifted out and if non-zero, [[Exclusive or|xored]] back into the low byte of the cumulative quantity. The result is a word-size hash code to which a modulo or other reducing operation can be applied to produce the final hash index. Today, especially with the advent of 64-bit word sizes, much more efficient variable-length string hashing by word chunks is available.
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)