Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Hash table
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
==Hash function== A [[hash function]] <math>h : U \rightarrow \{0, ..., m-1\}</math> maps the universe <math>U</math> of keys to indices or slots within the table, that is, <math>h(x) \in \{0, ..., m-1\}</math> for <math>x \in U</math>. The conventional implementations of hash functions are based on the ''integer universe assumption'' that all elements of the table stem from the universe <math>U = \{0, ..., u - 1\}</math>, where the [[bit length]] of <math>u</math> is confined within the [[Word (computer architecture)|word size]] of a [[computer architecture]].{{r|hashhist|p=2}} A hash function <math>h</math> is said to be [[perfect hash function|perfect]] for a given set <math>S</math> if it is [[injective function|injective]] on <math>S</math>, that is, if each element <math>x \in S</math> maps to a different value in <math>{0, ..., m-1}</math>.<ref name="Yi06">{{cite conference | last1 = Lu | first1 = Yi | last2 = Prabhakar | first2 = Balaji | last3 = Bonomi | first3 = Flavio | doi = 10.1109/ISIT.2006.261567 | conference = 2006 IEEE International Symposium on Information Theory | pages = 2774โ2778 | title = Perfect Hashing for Network Applications | year = 2006| isbn = 1-4244-0505-X | s2cid = 1494710 }}</ref><ref name="CHD">{{cite conference | last1 = Belazzougui | first1 = Djamal | last2 = Botelho | first2 = Fabiano C. | last3 = Dietzfelbinger | first3 = Martin | title = Hash, displace, and compress | url = http://cmph.sourceforge.net/papers/esa09.pdf | doi = 10.1007/978-3-642-04128-0_61 | location = Berlin | mr = 2557794 | pages = 682โ693 | publisher = Springer | series = [[Lecture Notes in Computer Science]] | book-title = AlgorithmsโESA 2009: 17th Annual European Symposium, Copenhagen, Denmark, September 7โ9, 2009, Proceedings | volume = 5757 | year = 2009| citeseerx = 10.1.1.568.130}}</ref> A perfect hash function can be created if all the keys are known ahead of time.<ref name="Yi06" /> === Integer universe assumption === The schemes of hashing used in ''integer universe assumption'' include hashing by division, hashing by multiplication, [[universal hashing]], [[dynamic perfect hashing]], and [[Static Hashing|static perfect hashing]].{{r|hashhist|p=2}} However, hashing by division is the commonly used scheme.{{r|cormenalgo01|p=264}}<ref name="owo03">{{cite journal |last1=Owolabi |first1=Olumide |title=Empirical studies of some hashing functions |journal=Information and Software Technology |date=February 2003 |volume=45 |issue=2 |pages=109โ112 |doi=10.1016/S0950-5849(02)00174-X }}</ref>{{rp|p=110}} ==== Hashing by division ==== The scheme in hashing by division is as follows:{{r|hashhist|p=2}} <math display="block">h(x)\ =\ x\, \bmod\, m,</math> where <math>h(x)</math> is the hash value of <math>x \in S</math> and <math>m</math> is the size of the table. ==== Hashing by multiplication ==== The scheme in hashing by multiplication is as follows:{{r|hashhist|pp=2-3}} <math display="block">h(x) = \lfloor m \bigl((xA) \bmod 1\bigr) \rfloor</math> Where <math>A</math> is a non-integer [[Real number|real-valued constant]] and <math>m</math> is the size of the table. An advantage of the hashing by multiplication is that the <math>m</math> is not critical.{{r|hashhist|pp=2-3}} Although any value <math>A</math> produces a hash function, [[Donald Knuth]] suggests using the [[golden ratio]].{{r|hashhist|p=3}} ===Choosing a hash function=== [[Uniform distribution (discrete)|Uniform distribution]] of the hash values is a fundamental requirement of a hash function. A non-uniform distribution increases the number of collisions and the cost of resolving them. Uniformity is sometimes difficult to ensure by design, but may be evaluated empirically using statistical tests, e.g., a [[Pearson's chi-squared test#Discrete uniform distribution|Pearson's chi-squared test]] for discrete uniform distributions.<ref name="chernoff">{{Cite journal | first=Karl |last=Pearson |author1-link=Karl Pearson | year = 1900 | title = On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling | journal = Philosophical Magazine |series=Series 5 | volume = 50 | number = 302 | pages = 157โ175 | doi=10.1080/14786440009463897 |url=https://zenodo.org/record/1430618 }}</ref><ref name="plackett">{{Cite journal |first=Robin |last=Plackett |author1-link=Robin Plackett | year = 1983 | title = Karl Pearson and the Chi-Squared Test | journal = International Statistical Review | volume = 51 | number = 1 | pages = 59โ72 | doi=10.2307/1402731 |jstor=1402731 }}</ref> The distribution needs to be uniform only for table sizes that occur in the application. In particular, if one uses dynamic resizing with exact doubling and halving of the table size, then the hash function needs to be uniform only when the size is a [[power of two]]. Here the index can be computed as some range of bits of the hash function. On the other hand, some hashing algorithms prefer to have the size be a [[prime number]].<ref name=":0">{{Cite web|title = Prime Double Hash Table|url = https://www.concentric.net/~Ttwang/tech/primehash.htm|date = March 1997|access-date = 2015-05-10|last = Wang|first = Thomas|archive-url = https://web.archive.org/web/19990903133921/http://www.concentric.net/~Ttwang/tech/primehash.htm|archive-date = 1999-09-03|url-status=dead}}</ref> For [[open addressing]] schemes, the hash function should also avoid ''[[Primary clustering|clustering]]'', the mapping of two or more keys to consecutive slots. Such clustering may cause the lookup cost to skyrocket, even if the load factor is low and collisions are infrequent. The popular multiplicative hash is claimed to have particularly poor clustering behavior.<ref name=":0" /><ref name="knuth"/> [[K-independent hashing]] offers a way to prove a certain hash function does not have bad keysets for a given type of hashtable. A number of K-independence results are known for collision resolution schemes such as linear probing and cuckoo hashing. Since K-independence can prove a hash function works, one can then focus on finding the fastest possible such hash function.<ref>{{cite journal |last1=Wegman |first1=Mark N. |last2=Carter |first2=J.Lawrence |title=New hash functions and their use in authentication and set equality |journal=Journal of Computer and System Sciences |date=June 1981 |volume=22 |issue=3 |pages=265โ279 |doi=10.1016/0022-0000(81)90033-7 |doi-access=free }}</ref>
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)