== Variants ==
Peter Dillinger's PhD thesis<ref name=Dillinger10>{{cite thesis |title=Adaptive Approximate State Storage |first=Peter C. |last=Dillinger |date=December 2010 |publisher=Northeastern University |type=PhD thesis |pages=93–112 |url=http://peterd.org/pcd-diss.pdf#page=93 }}</ref> points out that double hashing produces unwanted equivalent hash functions when the hash functions are treated as a set, as in [[Bloom filter]]s: If <math>h_2(y) = -h_2(x)</math> and <math>h_1(y) = h_1(x) + k\cdot h_2(x)</math>, then <math>h(i, y) = h(k - i, x)</math> and the sets of hashes <math>\left\{h(0, x), ..., h(k, x)\right\} = \left\{h(0, y), ..., h(k, y)\right\}</math> are identical. This makes a collision twice as likely as the hoped-for <math>1/|T|^2</math>.

There are additionally a significant number of mostly-overlapping hash sets; if <math>h_2(y) = h_2(x)</math> and <math>h_1(y) = h_1(x) \pm h_2(x)</math>, then <math>h(i, y) = h(i\pm 1, x)</math>, and comparing additional hash values (expanding the range of <math>i</math>) is of no help.
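
The mirrored-set collision described above can be checked numerically. The following is a minimal sketch, not taken from the cited sources: the names <code>dh_pair</code>, <code>dh_probe</code>, <code>dh_mirror</code>, <code>dh_sets_mirror</code> and <code>dh_demo</code> are hypothetical, the probe function is assumed to be <math>h(i, x) = (h_1(x) + i \cdot h_2(x)) \bmod m</math> for a table size <math>m</math>, and both underlying hash values are assumed to be already reduced modulo <math>m</math>.

<syntaxhighlight lang="c">
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for the two underlying hash values of one key.
   Assumption: both values are already reduced modulo the table size m. */
struct dh_pair { unsigned h1, h2; };

/* Plain double hashing: h(i) = (h1 + i*h2) mod m. */
static unsigned dh_probe(struct dh_pair p, unsigned i, unsigned m)
{
    return (unsigned)((p.h1 + (unsigned long long)i * p.h2) % m);
}

/* Build the "mirror" of x: h2(y) = -h2(x) and h1(y) = h1(x) + k*h2(x),
   both taken modulo m. */
struct dh_pair dh_mirror(struct dh_pair x, unsigned k, unsigned m)
{
    struct dh_pair y;
    y.h1 = (unsigned)((x.h1 + (unsigned long long)k * x.h2) % m);
    y.h2 = (m - x.h2) % m;
    return y;
}

/* True if h(i, y) == h(k - i, x) for every i in 0..k, which means the two
   keys produce the same set of k+1 hash values (in reverse order). */
bool dh_sets_mirror(struct dh_pair x, struct dh_pair y, unsigned k, unsigned m)
{
    for (unsigned i = 0; i <= k; i++)
        if (dh_probe(y, i, m) != dh_probe(x, k - i, m))
            return false;
    return true;
}

/* Example values chosen for illustration: x = (17, 5), k = 4, m = 101. */
void dh_demo(void)
{
    struct dh_pair x = {17, 5};
    struct dh_pair y = dh_mirror(x, 4, 101);
    assert(dh_sets_mirror(x, y, 4, 101));
}
</syntaxhighlight>

With these illustrative values, <code>dh_mirror</code> yields <math>h_1(y) = 37</math> and <math>h_2(y) = 96</math>, and both keys map to the set <math>\{17, 22, 27, 32, 37\}</math>, visited in opposite orders.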
=== Triple hashing ===
Adding a quadratic term <math>i^2,</math><ref name=Kirsch08>{{cite journal |title=Less Hashing, Same Performance: Building a Better Bloom Filter |first1=Adam |last1=Kirsch |first2=Michael |last2=Mitzenmacher |authorlink2=Michael Mitzenmacher |journal=Random Structures and Algorithms |volume=33 |issue=2 |pages=187–218 |date=September 2008 |doi=10.1002/rsa.20208 |citeseerx=10.1.1.152.579 |url=https://www.eecs.harvard.edu/~michaelm/postscripts/rsa2008.pdf }}</ref> <math>i(i+1)/2</math> (a [[triangular number]]) or even <math>i^2 \cdot h_3(x)</math> ('''triple hashing''')<ref>Alternatively defined with the triangular number, as in Dillinger 2004.</ref> to the hash function improves the hash function somewhat<ref name=Kirsch08/> but does not fix this problem; if
: <math>h_1(y) = h_1(x) + k \cdot h_2(x) + k^2 \cdot h_3(x),</math>
: <math>h_2(y) = -h_2(x) - 2k \cdot h_3(x),</math> and
: <math>h_3(y) = h_3(x),</math>
then
: <math>\begin{align}
h(k-i, y) &= h_1(y) + (k - i) \cdot h_2(y) + (k-i)^2 \cdot h_3(y) \\
&= h_1(y) + (k - i) (-h_2(x) - 2k h_3(x)) + (k-i)^2 h_3(x) \\
&= \ldots \\
&= h_1(x) + k h_2(x) + k^2 h_3(x) + (i - k) h_2(x) + (i^2 - k^2) h_3(x) \\
&= h_1(x) + i h_2(x) + i^2 h_3(x) \\
&= h(i, x).
\end{align}</math>

=== Enhanced double hashing ===
Adding a [[cubic function|cubic term]] <math>i^3</math><ref name=Kirsch08/> or <math>(i^3-i)/6</math> (a [[tetrahedral number]]),<ref name=Dillinger04>{{cite conference |title=Bloom Filters in Probabilistic Verification |first1=Peter C. |last1=Dillinger |first2=Panagiotis |last2=Manolios |conference=5th International Conference on Formal Methods in Computer Aided Design (FMCAD 2004) |location=Austin, Texas |date=November 15–17, 2004 |doi=10.1007/978-3-540-30494-4_26 |citeseerx=10.1.1.119.628 |url=https://www.khoury.northeastern.edu/~pete/pub/bloom-filters-verification.pdf }}</ref> does solve the problem; this technique is known as '''enhanced double hashing'''.
This can be computed efficiently by [[Forward difference|forward differencing]]:
<syntaxhighlight lang="c">
struct key; /// Opaque

/// Use other data types when needed. (Must be unsigned for guaranteed wrapping.)
extern unsigned int h1(struct key const *), h2(struct key const *);

/// Calculate n hash values from two underlying hash functions
/// h1() and h2() using enhanced double hashing.  On return,
///     hashes[i] = h1(x) + i*h2(x) + (i*i*i - i)/6.
/// Takes advantage of automatic wrapping (modular reduction)
/// of unsigned types in C.
void ext_dbl_hash(struct key const *x, unsigned int hashes[], unsigned int n)
{
    unsigned int a = h1(x), b = h2(x), i;

    if (n == 0)
        return;  // Nothing requested; do not write to hashes[0]

    hashes[0] = a;
    for (i = 1; i < n; i++) {
        a += b;  // Add quadratic difference to get cubic
        b += i;  // Add linear difference to get quadratic
                 // i++ adds constant difference to get linear
        hashes[i] = a;
    }
}
</syntaxhighlight>

In addition to rectifying the collision problem, enhanced double hashing also removes double hashing's numerical restrictions on the properties of <math>h_2(x)</math>, allowing a hash function similar in property to (but still independent of) <math>h_1</math> to be used.<ref name=Dillinger04/>
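
The forward-differencing recurrence can be compared against the closed form <math>h_1(x) + i \cdot h_2(x) + (i^3 - i)/6</math>. The following is a minimal sketch under stated assumptions: the names <code>edh_fill</code>, <code>edh_closed_form</code>, <code>edh_check</code> and <code>EDH_MAX</code> are hypothetical, the two underlying hash values are passed in directly as <code>a</code> and <code>b</code> instead of being computed from a key, and <code>unsigned long long</code> is assumed to be wide enough to hold <math>i^3</math> exactly (true for <math>i < 2^{21}</math> with a 64-bit type).

<syntaxhighlight lang="c">
#include <assert.h>
#include <stdbool.h>

#define EDH_MAX 64u  /* Number of hash values compared; arbitrary choice */

/* Same recurrence as enhanced double hashing by forward differencing,
   seeded directly with a = h1(x) and b = h2(x). */
void edh_fill(unsigned a, unsigned b, unsigned hashes[], unsigned n)
{
    if (n == 0)
        return;
    hashes[0] = a;
    for (unsigned i = 1; i < n; i++) {
        a += b;  /* running value of the cubic */
        b += i;  /* running first difference */
        hashes[i] = a;
    }
}

/* Closed form a + i*b + (i^3 - i)/6, reduced by unsigned wrapping.
   The tetrahedral term is computed exactly in a wider type first. */
unsigned edh_closed_form(unsigned a, unsigned b, unsigned i)
{
    unsigned long long t = (unsigned long long)i * i * i - i;
    return a + i * b + (unsigned)(t / 6);
}

/* True if the recurrence and the closed form agree on the first
   EDH_MAX hash values for the given seeds. */
bool edh_check(unsigned a, unsigned b)
{
    unsigned hashes[EDH_MAX];
    edh_fill(a, b, hashes, EDH_MAX);
    for (unsigned i = 0; i < EDH_MAX; i++)
        if (hashes[i] != edh_closed_form(a, b, i))
            return false;
    return true;
}
</syntaxhighlight>

For example, seeding with <code>a = 10</code> and <code>b = 3</code> gives the first five values 10, 13, 17, 23 and 32, where the offsets 0, 0, 1, 4 and 10 beyond <math>a + i \cdot b</math> are the tetrahedral numbers.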