=== Uniformity ===
A good hash function should map the expected inputs as evenly as possible over its output range. That is, every hash value in the output range should be generated with roughly the same [[probability]]. The reason for this requirement is that the cost of hashing-based methods goes up sharply as the number of ''collisions'' (pairs of inputs that are mapped to the same hash value) increases. If some hash values are more likely to occur than others, then a larger fraction of the lookup operations will have to search through a larger set of colliding table entries.
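
The effect of a skewed distribution can be illustrated with a minimal sketch. The key set, the slot count, and the helper names <code>bucket_loads</code>, <code>first_byte_hash</code> and <code>crc32_hash</code> are assumptions chosen for illustration; CRC-32 is used only as a convenient deterministic mixing function, not as a recommended hash.

```python
import zlib
from collections import Counter


def bucket_loads(keys, hash_fn, n_slots):
    """Return a list holding the number of keys that land in each slot."""
    counts = Counter(hash_fn(k) % n_slots for k in keys)
    return [counts.get(slot, 0) for slot in range(n_slots)]


def first_byte_hash(key):
    # Deliberately poor: ignores everything except the first byte of the key.
    return key.encode()[0]


def crc32_hash(key):
    # Illustrative mixer only; every byte of the key influences the result.
    return zlib.crc32(key.encode())


# Hypothetical key set: 1000 keys that all share the same prefix.
keys = [f"member{i:04d}" for i in range(1000)]
n_slots = 64

poor = bucket_loads(keys, first_byte_hash, n_slots)
better = bucket_loads(keys, crc32_hash, n_slots)

print("longest chain, first-byte hash:", max(poor))
print("longest chain, CRC-32 hash:", max(better))
```

Because every key in this sketch begins with the same character, the first-byte function sends all 1000 keys to a single slot, so a lookup there degenerates into a linear search. A function that mixes all bytes of the key would be expected to spread the same keys over many slots.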

This criterion only requires the value to be ''uniformly distributed'', not ''random'' in any sense. A good randomizing function is (barring computational efficiency concerns) generally a good choice as a hash function, but the converse need not be true.

Hash tables often contain only a small subset of the valid inputs. For instance, a club membership list may contain only a hundred or so member names, out of the very large set of all possible names. In these cases, the uniformity criterion should hold for almost all typical subsets of entries that may be found in the table, not just for the global set of all possible entries.

In other words, if a typical set of {{math|''m''}} records is hashed to {{math|''n''}} table slots, then the probability of a bucket receiving many more than {{math|''m''/''n''}} records should be vanishingly small. In particular, if {{math|''m'' < ''n''}}, then very few buckets should have more than one or two records. A small number of collisions is virtually inevitable, even if {{math|''n''}} is much larger than {{math|''m''}}; see the [[birthday problem]].
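
The birthday-problem effect can be checked numerically. The following sketch assumes an idealized hash function that places each of {{math|''m''}} keys uniformly and independently into one of {{math|''n''}} slots; the function name <code>prob_no_collision</code> and the sample values are illustrative.

```python
def prob_no_collision(m, n):
    """Probability that m keys, each placed uniformly and independently
    into one of n slots, all land in distinct slots."""
    if m > n:
        return 0.0  # pigeonhole principle: a collision is certain
    p = 1.0
    for i in range(m):
        # The (i+1)-th key must avoid the i slots that are already occupied.
        p *= (n - i) / n
    return p


for m, n in [(23, 365), (100, 10_000), (1_000, 1_000_000)]:
    p_collision = 1 - prob_no_collision(m, n)
    print(f"m={m}, n={n}: P(at least one collision) = {p_collision:.3f}")
```

For the classic case of 23 keys and 365 slots, the probability of at least one collision is already slightly above one half, even though the table is far from full.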

In special cases when the keys are known in advance and the key set is static, a hash function can be found that achieves absolute (or collisionless) uniformity. Such a hash function is said to be ''[[Perfect hash function|perfect]]''. There is no simple general formula for constructing such a function: a brute-force search has a cost that grows as a [[factorial]] function of the number of keys to be mapped and the number of table slots that they are mapped into. For more than a small set of keys, finding a perfect hash function this way quickly becomes impractical. The resulting function is also likely to be more computationally complex than a standard hash function, and it provides only a marginal advantage over a function with good statistical properties that yields few collisions. See [[Universal hashing|universal hash function]].
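
For a very small static key set, a collision-free function can be found by trial and error. The following is a minimal sketch under stated assumptions, not a production construction method: it tries successive seeds for a seeded SHA-256 hash until the keys occupy distinct slots. The helper names <code>seeded_slot</code> and <code>find_perfect_seed</code>, the seed encoding, and the weekday key set are hypothetical.

```python
import hashlib


def seeded_slot(key, seed, n_slots):
    """Map key to a slot by hashing a 4-byte seed followed by the key."""
    data = seed.to_bytes(4, "little") + key.encode()
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:8], "little") % n_slots


def find_perfect_seed(keys, n_slots, max_seed=100_000):
    """Return the first seed that maps all keys to distinct slots,
    or None if no such seed is found below max_seed."""
    for seed in range(max_seed):
        slots = {seeded_slot(k, seed, n_slots) for k in keys}
        if len(slots) == len(keys):  # no two keys share a slot
            return seed
    return None


# Hypothetical static key set: 7 keys mapped into exactly 7 slots.
weekdays = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
seed = find_perfect_seed(weekdays, len(weekdays))

if seed is not None:
    table = {seeded_slot(d, seed, len(weekdays)): d for d in weekdays}
    print("seed:", seed)
    print(sorted(table.items()))
```

If the seeded hash behaves like a random function, only about 7!/7<sup>7</sup> (roughly 0.6%) of seeds are collision-free for seven keys in seven slots, and that fraction shrinks rapidly as the key set grows, which is why this naive search is only practical for small sets.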