Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Perfect hash function
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
==Construction== A perfect hash function for a specific set {{mvar|S}} that can be evaluated in constant time, and with values in a small range, can be found by a [[randomized algorithm]] in a number of operations that is proportional to the size of S. The original construction of {{harvtxt|Fredman|Komlós|Szemerédi|1984}} uses a two-level scheme to map a set {{mvar|S}} of {{mvar|n}} elements to a range of {{math|''O''(''n'')}} indices, and then map each index to a range of hash values. The first level of their construction chooses a large prime {{mvar|p}} (larger than the size of the [[Universe (mathematics)|universe]] from which {{mvar|S}} is drawn), and a parameter {{mvar|k}}, and maps each element {{mvar|x}} of {{mvar|S}} to the index :<math>g(x)=(kx\bmod p)\bmod n.</math> If {{mvar|k}} is chosen randomly, this step is likely to have collisions, but the number of elements {{mvar|n<sub>i</sub>}} that are simultaneously mapped to the same index {{mvar|i}} is likely to be small. The second level of their construction assigns disjoint ranges of {{math|''O''(''n<sub>i</sub>''<sup>2</sup>)}} integers to each index {{mvar|i}}. It uses a second set of linear modular functions, one for each index {{mvar|i}}, to map each member {{mvar|x}} of {{mvar|S}} into the range associated with {{math|''g''(''x'')}}.<ref name="inventor">{{citation | last1 = Fredman | first1 = Michael L. | author1-link = Michael Fredman | last2 = Komlós | first2 = János | author2-link = János Komlós (mathematician) | last3 = Szemerédi | first3 = Endre | author3-link = Endre Szemerédi | doi = 10.1145/828.1884 | issue = 3 | journal = [[Journal of the ACM]] | mr = 0819156 | page = 538 | title = Storing a Sparse Table with {{math|''O''(1)}} Worst Case Access Time | volume = 31 | year = 1984| s2cid = 5399743 | doi-access = free }}</ref> As {{harvtxt|Fredman|Komlós|Szemerédi|1984}} show, there exists a choice of the parameter {{mvar|k}} such that the sum of the lengths of the ranges for the {{mvar|n}} different values of {{math|''g''(''x'')}} is {{math|''O''(''n'')}}. Additionally, for each value of {{math|''g''(''x'')}}, there exists a linear modular function that maps the corresponding subset of {{mvar|S}} into the range associated with that value. Both {{mvar|k}}, and the second-level functions for each value of {{math|''g''(''x'')}}, can be found in [[polynomial time]] by choosing values randomly until finding one that works.<ref name="inventor"/> The hash function itself requires storage space {{math|''O''(''n'')}} to store {{mvar|k}}, {{mvar|p}}, and all of the second-level linear modular functions. Computing the hash value of a given key {{mvar|x}} may be performed in constant time by computing {{math|''g''(''x'')}}, looking up the second-level function associated with {{math|''g''(''x'')}}, and applying this function to {{mvar|x}}. A modified version of this two-level scheme with a larger number of values at the top level can be used to construct a perfect hash function that maps {{mvar|S}} into a smaller range of length {{math|''n'' + ''o''(''n'')}}.<ref name="inventor"/> A more recent method for constructing a perfect hash function is described by {{harvtxt|Belazzougui|Botelho|Dietzfelbinger|2009}} as "hash, displace, and compress". Here a first-level hash function {{mvar|g}} is also used to map elements onto a range of {{mvar|r}} integers. An element {{math|''x'' ∈ ''S''}} is stored in the Bucket {{mvar|B<sub>g(x)</sub>}}.<ref name="CHD" /> Then, in descending order of size, each bucket's elements are hashed by a hash function of a sequence of independent fully random hash functions {{math|(Φ<sub>1</sub>, Φ<sub>2</sub>, Φ<sub>3</sub>, ...)}}, starting with {{math|Φ<sub>1</sub>}}. If the hash function does not produce any collisions for the bucket, and the resulting values are not yet occupied by other elements from other buckets, the function is chosen for that bucket. If not, the next hash function in the sequence is tested.<ref name="CHD" /> To evaluate the perfect hash function {{math|''h''(''x'')}} one only has to save the mapping σ of the bucket index {{math|''g''(''x'')}} onto the correct hash function in the sequence, resulting in {{math|h(x) {{=}} Φ<sub>σ(g(x))</sub>}}.<ref name="CHD" /> Finally, to reduce the representation size, the ({{math|σ(i))<sub>0 ≤ i < r</sub>}} are compressed into a form that still allows the evaluation in {{math|''O''(''1'')}}.<ref name="CHD" /> This approach needs linear time in {{mvar|n}} for construction, and constant evaluation time. The representation size is in {{math|''O''(''n'')}}, and depends on the achieved range. For example, with {{math|''m'' {{=}} 1.23''n''}} {{harvtxt|Belazzougui|Botelho|Dietzfelbinger|2009}} achieved a representation size between 3.03 bits/key and 1.40 bits/key for their given example set of 10 million entries, with lower values needing a higher computation time. The space lower bound in this scenario is 0.88 bits/key.<ref name="CHD" /> {{missing information|section|RecSplit & "fingerprinting" [https://epubs.siam.org/doi/pdf/10.1137/1.9781611976007.14 recsplit paper]|date=March 2023}} ===Pseudocode=== '''algorithm''' ''hash, displace, and compress'' '''is''' (1) Split S into buckets {{math|B<sub>i</sub> :{{=}} g<sup>−1</sup>({i})∩S,0 ≤ i < r}} (2) Sort buckets B<sub>i</sub> in falling order according to size |B<sub>i</sub>| (3) Initialize array T[0...m-1] with 0's (4) '''for all''' i{{thin space}}∈[r], in the order from (2), '''do''' (5) '''for''' l{{thin space}}←{{thin space}}1,2,... (6) '''repeat''' forming K<sub>i</sub>{{thin space}}←{{thin space}}{{{math|Φ}}<sub>l</sub>(x)|x{{thin space}}∈{{thin space}}B<sub>i</sub>} (6) '''until''' |K<sub>i</sub>|=|B<sub>i</sub>| '''and''' K<sub>i</sub>∩{j|T[j]=1}={{thin space}}∅ (7) '''let''' σ(i):= the successful l (8) '''for all''' j{{thin space}}∈{{thin space}}K<sub>i</sub> '''let''' T[j]:={{thin space}}1 (9) Transform (σ<sub>i</sub>)<sub>0≤i<r</sub> into compressed form, retaining {{math|''O''(''1'')}} access.
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)