Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Hash function
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
=== Uniformity === A good hash function should map the expected inputs as evenly as possible over its output range. That is, every hash value in the output range should be generated with roughly the same [[probability]]. The reason for this last requirement is that the cost of hashing-based methods goes up sharply as the number of ''collisions''—pairs of inputs that are mapped to the same hash value—increases. If some hash values are more likely to occur than others, then a larger fraction of the lookup operations will have to search through a larger set of colliding table entries. This criterion only requires the value to be ''uniformly distributed'', not ''random'' in any sense. A good randomizing function is (barring computational efficiency concerns) generally a good choice as a hash function, but the converse need not be true. Hash tables often contain only a small subset of the valid inputs. For instance, a club membership list may contain only a hundred or so member names, out of the very large set of all possible names. In these cases, the uniformity criterion should hold for almost all typical subsets of entries that may be found in the table, not just for the global set of all possible entries. In other words, if a typical set of {{math|''m''}} records is hashed to {{math|''n''}} table slots, then the probability of a bucket receiving many more than {{math|''m''/''n''}} records should be vanishingly small. In particular, if {{Math|''m'' < ''n''}}, then very few buckets should have more than one or two records. A small number of collisions is virtually inevitable, even if {{math|''n''}} is much larger than {{math|''m''}}—see the [[birthday problem]]. In special cases when the keys are known in advance and the key set is static, a hash function can be found that achieves absolute (or collisionless) uniformity. Such a hash function is said to be ''[[Perfect hash function|perfect]]''. There is no algorithmic way of constructing such a function—searching for one is a [[factorial]] function of the number of keys to be mapped versus the number of table slots that they are mapped into. Finding a perfect hash function over more than a very small set of keys is usually computationally infeasible; the resulting function is likely to be more computationally complex than a standard hash function and provides only a marginal advantage over a function with good statistical properties that yields a minimum number of collisions. See [[Universal hashing|universal hash function]].
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)