== Variants ==

Peter Dillinger's PhD thesis<ref name=Dillinger10>{{cite thesis
 |title=Adaptive Approximate State Storage
 |first=Peter C. |last=Dillinger
 |date=December 2010
 |publisher=Northeastern University
 |type=PhD thesis
 |pages=93–112
 |url=http://peterd.org/pcd-diss.pdf#page=93
}}</ref> points out that double hashing produces unwanted equivalent hash functions when the hash functions are treated as a set, as in [[Bloom filter]]s: if <math>h_2(y) = -h_2(x)</math> and <math>h_1(y) = h_1(x) + k\cdot h_2(x)</math>, then <math>h(i, y) = h(k - i, x)</math>, and the sets of hashes <math>\left\{h(0, x), \ldots, h(k, x)\right\}</math> and <math>\left\{h(0, y), \ldots, h(k, y)\right\}</math> are identical. This makes a collision twice as likely as the hoped-for <math>1/|T|^2</math>.
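
The mirrored case can be checked numerically. The following is a minimal sketch, not taken from the cited thesis: the names <code>probe</code> and <code>mirrored_sets_collide</code> are hypothetical, the arguments stand for already-computed values of <math>h_1(x)</math> and <math>h_2(x)</math>, and the probe sequence is assumed to be reduced modulo a nonzero table size <code>m</code>, with <code>k</code> small.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Plain double hashing: h(i) = (h1 + i*h2) mod m.
 * h1 and h2 are the precomputed values h1(x) and h2(x); m is an assumed
 * table size. 64-bit intermediates keep i*h2 from wrapping. */
static uint32_t probe(uint32_t h1, uint32_t h2, uint32_t i, uint32_t m)
{
    assert(m != 0);
    return (uint32_t)(((uint64_t)h1 + (uint64_t)i * h2) % m);
}

/* Builds the "mirrored" pair described in the text,
 *   h1(y) = h1(x) + k*h2(x),   h2(y) = -h2(x)   (mod m),
 * and reports whether h(i, y) == h(k - i, x) for every i in 0..k. */
bool mirrored_sets_collide(uint32_t h1x, uint32_t h2x, uint32_t k, uint32_t m)
{
    uint32_t h1y = probe(h1x, h2x, k, m);
    uint32_t h2y = (m - h2x % m) % m;

    for (uint32_t i = 0; i <= k; i++) {
        if (probe(h1y, h2y, i, m) != probe(h1x, h2x, k - i, m))
            return false;
    }
    return true;
}
```

Because the relation is an algebraic identity modulo <code>m</code>, a call such as <code>mirrored_sets_collide(17, 5, 6, 101)</code> returns true, and so does any other choice of arguments within the stated assumptions.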

There are additionally a significant number of mostly-overlapping hash sets; if <math>h_2(y) = h_2(x)</math> and <math>h_1(y) = h_1(x) \pm h_2(x)</math>, then <math>h(i, y) = h(i\pm 1, x)</math>, and comparing additional hash values (expanding the range of <math>i</math>) is of no help.
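
The shifted case follows by direct substitution, assuming the double-hashing form <math>h(i, x) = h_1(x) + i \cdot h_2(x)</math> (taken modulo the table size) that the relations above rely on:
: <math>\begin{align}
h(i, y) &= h_1(y) + i \cdot h_2(y) \\
        &= h_1(x) \pm h_2(x) + i \cdot h_2(x) \\
        &= h_1(x) + (i \pm 1) \cdot h_2(x) \\
        &= h(i \pm 1, x).
\end{align}</math>
The probe sequence of <math>y</math> is thus the sequence of <math>x</math> shifted by one position, so the sets <math>\left\{h(0, x), \ldots, h(k, x)\right\}</math> and <math>\left\{h(0, y), \ldots, h(k, y)\right\}</math> have <math>k</math> of their <math>k + 1</math> positions in common, and increasing <math>k</math> only lengthens the shared run.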

=== Triple hashing ===
Adding a quadratic term <math>i^2,</math><ref name=Kirsch08>{{cite journal
 |title=Less Hashing, Same Performance: Building a Better Bloom Filter
 |first1=Adam |last1=Kirsch  |first2=Michael |last2=Mitzenmacher |authorlink2=Michael Mitzenmacher
 |journal=Random Structures and Algorithms |volume=33 |issue=2 |pages=187–218
 |date=September 2008 |doi=10.1002/rsa.20208 |citeseerx=10.1.1.152.579
 |url=https://www.eecs.harvard.edu/~michaelm/postscripts/rsa2008.pdf
}}</ref> <math>i(i+1)/2</math> (a [[triangular number]]), or even <math>i^2 \cdot h_3(x)</math> ('''triple hashing''')<ref>Alternatively defined with the triangular number, as in Dillinger 2004.</ref> to the hash function improves it somewhat<ref name=Kirsch08/> but does not fix this problem; if:
: <math>h_1(y) = h_1(x) + k \cdot h_2(x) + k^2 \cdot h_3(x),</math>
: <math>h_2(y) = -h_2(x) - 2k \cdot h_3(x),</math> and
: <math>h_3(y) = h_3(x),</math>
then
: <math>\begin{align}
h(k-i, y) &= h_1(y) + (k - i) \cdot h_2(y) + (k-i)^2 \cdot h_3(y) \\
          &= h_1(y) + (k - i) (-h_2(x) - 2k h_3(x)) + (k-i)^2 h_3(x) \\
          &= \ldots \\
          &= h_1(x) + k h_2(x) + k^2 h_3(x) + (i - k) h_2(x) + (i^2 - k^2) h_3(x) \\
          &= h_1(x) + i h_2(x) + i^2 h_3(x) \\
          &= h(i, x). \\
\end{align}</math>
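
The identity can also be checked numerically. The sketch below is illustrative only: <code>triple_probe</code> and <code>triple_sets_collide</code> are hypothetical names, the arguments stand for already-computed values of <math>h_1(x)</math>, <math>h_2(x)</math> and <math>h_3(x)</math>, and all arithmetic is assumed to be reduced modulo a nonzero table size <code>m</code>, with <code>k</code> small.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Triple hashing: h(i) = (h1 + i*h2 + i*i*h3) mod m.
 * Each product is reduced modulo m in 64 bits so nothing wraps. */
static uint32_t triple_probe(uint32_t h1, uint32_t h2, uint32_t h3,
                             uint32_t i, uint32_t m)
{
    assert(m != 0);
    uint64_t linear    = (uint64_t)i * h2 % m;
    uint64_t quadratic = (uint64_t)i * i % m * h3 % m;
    return (uint32_t)((h1 % m + linear + quadratic) % m);
}

/* Builds y from x using the three relations in the text,
 *   h1(y) = h1(x) + k*h2(x) + k*k*h3(x),
 *   h2(y) = -h2(x) - 2k*h3(x),
 *   h3(y) = h3(x)                          (all mod m),
 * and reports whether h(k - i, y) == h(i, x) for every i in 0..k. */
bool triple_sets_collide(uint32_t h1x, uint32_t h2x, uint32_t h3x,
                         uint32_t k, uint32_t m)
{
    uint32_t h1y = triple_probe(h1x, h2x, h3x, k, m);
    uint64_t sum = (h2x % m + 2 * (uint64_t)k % m * h3x) % m;
    uint32_t h2y = (uint32_t)((m - sum) % m);   /* negate modulo m */
    uint32_t h3y = h3x;

    for (uint32_t i = 0; i <= k; i++) {
        if (triple_probe(h1y, h2y, h3y, k - i, m) !=
            triple_probe(h1x, h2x, h3x, i, m))
            return false;
    }
    return true;
}
```

For instance, with <code>m = 101</code>, <code>k = 6</code> and the values 17, 5 and 9, the constructed <math>y</math> has <math>h_1(y) = 68</math> and <math>h_2(y) = 89</math>, and <code>triple_sets_collide(17, 5, 9, 6, 101)</code> returns true.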

=== Enhanced double hashing ===

Adding a [[cubic function|cubic term]] <math>i^3</math><ref name=Kirsch08/> or <math>(i^3-i)/6</math> (a [[tetrahedral number]]),<ref name=Dillinger04>{{cite conference
 |title=Bloom Filters in Probabilistic Verification
 |first1=Peter C. |last1=Dillinger  |first2=Panagiotis |last2=Manolios
 |conference=5th International Conference on Formal Methods in Computer Aided Design (FMCAD 2004)
 |location=Austin, Texas |date=November 15–17, 2004
 |doi=10.1007/978-3-540-30494-4_26 |citeseerx=10.1.1.119.628
 |url=https://www.khoury.northeastern.edu/~pete/pub/bloom-filters-verification.pdf
}}</ref> does solve the problem; this technique is known as '''enhanced double hashing'''.  It can be computed efficiently by [[Forward difference|forward differencing]]:
<syntaxhighlight lang="c">
struct key;	/// Opaque
/// Use other data types when needed. (Must be unsigned for guaranteed wrapping.)
extern unsigned int h1(struct key const *), h2(struct key const *);

/// Calculate n hash values from two underlying hash functions
/// h1() and h2() using enhanced double hashing.  On return,
///     hashes[i] = h1(x) + i*h2(x) + (i*i*i - i)/6.
/// Takes advantage of automatic wrapping (modular reduction)
/// of unsigned types in C.
void ext_dbl_hash(struct key const *x, unsigned int hashes[], unsigned int n)
{
	unsigned int a = h1(x), b = h2(x), i;

	if (n == 0)
		return;	// Nothing requested; do not write hashes[0]
	hashes[0] = a;
	for (i = 1; i < n; i++) {
		a += b;	// Add quadratic difference to get cubic
		b += i;	// Add linear difference to get quadratic
		       	// i++ adds constant difference to get linear
		hashes[i] = a;
	}
}
</syntaxhighlight>
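
The agreement between the forward-differencing loop and the closed form <math>h_1(x) + i \cdot h_2(x) + (i^3 - i)/6</math> can be spot-checked. This is a minimal sketch under stated assumptions: <code>closed_form</code> and <code>forward_difference_matches</code> are hypothetical helpers, <code>a</code> and <code>b</code> stand for the values returned by <code>h1()</code> and <code>h2()</code>, and <code>n</code> is kept small so that <code>i*i*i</code> itself never wraps.

```c
#include <assert.h>
#include <stdbool.h>

/* Closed form from the text: h1 + i*h2 + (i^3 - i)/6, using wrapping
 * unsigned arithmetic. a and b stand for h1(x) and h2(x). */
static unsigned int closed_form(unsigned int a, unsigned int b, unsigned int i)
{
    return a + i * b + (i * i * i - i) / 6;
}

/* Replays the forward-differencing recurrence and compares every value
 * with the closed form. */
bool forward_difference_matches(unsigned int a, unsigned int b, unsigned int n)
{
    /* Keep i*i*i below 2^16 so the division by 6 stays exact even where
     * unsigned int has only its minimum 16-bit range. */
    assert(n <= 32);

    unsigned int value = a, step = b;
    for (unsigned int i = 0; i < n; i++) {
        if (value != closed_form(a, b, i))
            return false;
        value += step;   /* next value: add the current difference */
        step += i + 1;   /* next difference: add the linear term   */
    }
    return true;
}
```

The check also passes when <code>a</code> and <code>b</code> are large enough for the sums to wrap, because both computations are carried out modulo the same power of two.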

In addition to rectifying the collision problem, enhanced double hashing also removes double hashing's numerical restrictions on the properties of <math>h_2(x)</math>, allowing a hash function with properties similar to (but still independent of) <math>h_1</math> to be used.<ref name=Dillinger04/>