Hash collision
{{Short description|Hash function phenomenon}}
[[File:Hash table 4 1 1 0 0 1 0 LL.svg|thumb|254x254px|John Smith and Sandra Dee share the same hash value of 02, causing a hash collision.]]
In [[computer science]], a '''hash collision''' or '''hash clash'''<ref>{{Citation|last=Cormen|first=Thomas|title=Introduction to Algorithms |date=2009|pages=253|publisher=MIT Press|isbn=978-0-262-03384-8}}</ref> occurs when two distinct pieces of data in a [[hash table]] share the same hash value. The hash value is produced by a [[hash function]], which takes a data input and returns a fixed-length string of bits.<ref>{{Citation|last=Stapko|first=Timothy|title=Embedded Security|date=2008|url=http://dx.doi.org/10.1016/b978-075068215-2.50006-9|work=Practical Embedded Security|pages=83–114|publisher=Elsevier|doi=10.1016/b978-075068215-2.50006-9|isbn=9780750682152|access-date=2021-12-08}}</ref> Although hash algorithms, especially cryptographic hash algorithms, are designed to be [[Collision resistance|collision resistant]], they can still map different data to the same hash (by virtue of the [[pigeonhole principle]]).
Malicious users can take advantage of this to mimic, access, or alter data.<ref>{{cite web|last1=Schneier|first1=Bruce|author-link1=Bruce Schneier|title=Cryptanalysis of MD5 and SHA: Time for a New Standard|url=https://www.schneier.com/essays/archives/2004/08/cryptanalysis_of_md5.html|url-status=dead|archive-url=https://web.archive.org/web/20160316114109/https://www.schneier.com/essays/archives/2004/08/cryptanalysis_of_md5.html|archive-date=2016-03-16|access-date=2016-04-20|website=Computerworld|quote=Much more than encryption algorithms, one-way hash functions are the workhorses of modern cryptography.}}</ref> Because hash collisions can be put to harmful use in [[data management]] and [[computer security]] (in particular, against [[cryptographic hash function]]s), collision avoidance has become an important topic in computer security.

== Background ==
Hash collisions can be unavoidable, depending on the number of objects in a set and on whether the bit string they are mapped to is long enough. For a set of ''n'' objects, if ''n'' is greater than |''R''|, where ''R'' is the range of the hash value, the probability of a hash collision is 1, meaning that a collision is guaranteed to occur.<ref name=":0">{{Cite book|date=2016|title=Cybersecurity and Applied Mathematics|url=http://dx.doi.org/10.1016/c2015-0-01807-x|doi=10.1016/c2015-0-01807-x|isbn=9780128044520}}</ref>

Another reason hash collisions are likely at some point stems from the [[birthday problem|birthday paradox]] in mathematics.
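
As a rough numerical illustration of both effects, the following sketch computes the chance that at least two of ''n'' values collide when each is drawn uniformly at random from a range of size |''R''|. Uniformly distributed hash values are an idealizing assumption, and the function name <code>collision_probability</code> is made up for this example rather than taken from the cited sources.

```python
def collision_probability(n: int, r: int) -> float:
    """Probability that at least two of n uniform draws from r values coincide."""
    if n > r:
        return 1.0  # pigeonhole principle: more objects than possible hash values
    p_all_distinct = 1.0
    for i in range(n):
        # the (i+1)-th draw must avoid the i values already taken
        p_all_distinct *= (r - i) / r
    return 1.0 - p_all_distinct


print(collision_probability(23, 365))   # classic birthday setting, roughly 0.507
print(collision_probability(366, 365))  # 1.0, a collision is guaranteed
```

With 365 possible values, 23 draws already give a collision chance above one half, while any number of draws above 365 makes a collision certain.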
The birthday problem considers the probability that at least two people in a group of ''n'' randomly chosen people share a birthday.<ref>{{Cite book|last=Soltanian |first=Mohammad Reza Khalifeh |url=http://worldcat.org/oclc/1162249290|title=Theoretical and Experimental Methods for Defending Against DDoS Attacks|date=10 November 2015|isbn=978-0-12-805399-7|oclc=1162249290}}</ref> This idea has led to what has been called the [[birthday attack]]. The premise of this attack is that it is difficult to find a birthday that matches one specific birthday, such as your own, but the probability of finding ''any'' two people with matching birthdays is much higher. Bad actors can use this approach to find any pair of colliding hash values, which is considerably easier than searching for a collision with one specific value.<ref>{{Citation|last1=Conrad|first1=Eric|title=Domain 3: Security Engineering (Engineering and Management of Security)|date=2016|url=http://dx.doi.org/10.1016/b978-0-12-802437-9.00004-7|work=CISSP Study Guide|pages=103–217|publisher=Elsevier|access-date=2021-12-08|last2=Misenar|first2=Seth|last3=Feldman|first3=Joshua|doi=10.1016/b978-0-12-802437-9.00004-7|isbn=9780128024379}}</ref>

The impact of collisions depends on the application. When hash functions and fingerprints are used to identify similar data, such as [[homology (biology)|homologous]] [[DNA]] sequences or similar audio files, the functions are designed to ''maximize'' the probability of collision between distinct but similar data, using techniques like [[locality-sensitive hashing]].<ref name="MOMD">{{cite web|last1=Rajaraman|first1=A.|last2=Ullman|first2=J.|author2-link=Jeffrey Ullman|year=2010|title=Mining of Massive Datasets, Ch. 
3.|url=http://infolab.stanford.edu/~ullman/mmds.html}}</ref> [[Checksum]]s, on the other hand, are designed to minimize the probability of collisions between similar inputs, without regard for collisions between very different inputs.<ref name="crypto">{{Cite conference|last1=Al-Kuwari|first1=Saif|last2=Davenport|first2=James H.|last3=Bradford|first3=Russell J.|date=2011|title=Cryptographic Hash Functions: Recent Design Trends and Security Notions|url=https://eprint.iacr.org/2011/565|conference=Inscrypt '10}}</ref> Instances where bad actors attempt to create or find hash collisions are known as [[Collision attack|collision attacks]].<ref>{{Cite book|last=Schema|first=Mike|title=Hacking Web Apps|year=2012}}</ref>

In practice, security-related applications use cryptographic hash algorithms, which are designed to produce hashes long enough that random matches are unlikely, to be fast enough to be used anywhere, and to be safe enough that finding collisions is extremely hard.<ref name="crypto" />

== Collision resolution ==
{{Main Article|Hash table#Collision resolution}}
Since hash collisions are inevitable, hash tables have mechanisms for dealing with them, known as collision resolution strategies. Two of the most common strategies are [[open addressing]] and [[separate chaining]]. Cache-conscious collision resolution is another strategy that has been discussed for string hash tables.
[[File:HASHTB12.svg|thumb|275x275px|John Smith and Sandra Dee are both being directed to the same cell. Open addressing will cause the hash table to redirect Sandra Dee to another cell.]]

=== Open addressing ===
{{main|Open addressing}}
In this method, each cell in the hash table is assigned one of three states: occupied, empty, or deleted. If a hash collision occurs, the table is probed to find an alternate cell that is marked as empty, and the record is placed there. Several types of probing can be used when a collision occurs under this method.
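
As a rough illustration of the three cell states and of probing, the sketch below uses linear probing, which simply tries the next cell in turn. The class name <code>LinearProbingTable</code>, the fixed capacity of 8, and the use of <code>len</code> as a deliberately poor toy hash function (chosen only so that the two example names collide) are assumptions made for this example, not part of any cited source.

```python
EMPTY = object()    # cell has never held a record
DELETED = object()  # cell held a record that was later removed (tombstone)


class LinearProbingTable:
    """Toy open-addressing table with fixed capacity and no resizing."""

    def __init__(self, capacity=8, hash_fn=hash):
        self.slots = [EMPTY] * capacity
        self.hash_fn = hash_fn

    def _probe(self, key):
        """Yield cell indices from the home cell onward, wrapping around once."""
        start = self.hash_fn(key) % len(self.slots)
        for step in range(len(self.slots)):
            yield (start + step) % len(self.slots)

    def insert(self, key, value):
        """Store the record and return the index of the cell that was used."""
        first_free = None
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                if first_free is None:
                    first_free = i
                break  # key cannot be stored beyond an empty cell
            if slot is DELETED:
                if first_free is None:
                    first_free = i  # reusable, but keep looking for the key
            elif slot[0] == key:
                self.slots[i] = (key, value)  # overwrite existing record
                return i
        if first_free is None:
            raise RuntimeError("table is full")
        self.slots[first_free] = (key, value)
        return first_free

    def get(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                break
            if slot is not DELETED and slot[0] == key:
                return slot[1]
        raise KeyError(key)

    def delete(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                break
            if slot is not DELETED and slot[0] == key:
                self.slots[i] = DELETED  # keeps later probe sequences intact
                return
        raise KeyError(key)


table = LinearProbingTable(capacity=8, hash_fn=len)
print(table.insert("John Smith", "record A"))  # home cell: 10 % 8 = 2
print(table.insert("Sandra Dee", "record B"))  # same home cell, moved to 3
```

The deleted state matters for lookups: if the cell for John Smith were reset to empty on deletion, a later search for Sandra Dee would stop at that cell and wrongly report the record as missing.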
Common types of probing include [[linear probing]], [[double hashing]], and [[quadratic probing]].<ref name=":2">{{Cite journal|last1=Nimbe|first1=Peter|last2=Ofori Frimpong|first2=Samuel |last3=Opoku |first3=Michael |date=2014-08-20 |title=An Efficient Strategy for Collision Resolution in Hash Tables|url=http://dx.doi.org/10.5120/17411-7990 |journal=International Journal of Computer Applications |volume=99|issue=10|pages=35–41|doi=10.5120/17411-7990 |bibcode=2014IJCA...99j..35N|issn=0975-8887|doi-access=free}}</ref> Open addressing is also known as closed hashing.<ref>{{cite web |title=Closed Hashing |work=CSC241 Data Structures and Algorithms |url=https://www.cs.wcupa.edu/rkline/ds/closed-hashing.html |access-date=2022-04-06 |first=Robert |last=Kline |publisher=West Chester University}}</ref>

=== Separate chaining ===
{{further|Hash table#Separate chaining}}
This strategy allows more than one record to be "chained" to a cell of a hash table. If two records are directed to the same cell, both go into that cell as a linked list. This handles hash collisions efficiently, since records with the same hash value can share a cell, but it has its disadvantages.
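
A minimal sketch of separate chaining follows. For brevity it stores each chain as a Python list of key-value pairs standing in for a linked list; the class name <code>ChainedTable</code>, the bucket count of 8, and the toy hash function <code>len</code> (used only to force the two example names into the same cell) are assumptions made for this example.

```python
class ChainedTable:
    """Toy separate-chaining table; each cell holds a chain of records."""

    def __init__(self, num_buckets=8, hash_fn=hash):
        self.buckets = [[] for _ in range(num_buckets)]
        self.hash_fn = hash_fn

    def _bucket(self, key):
        """Return the chain stored in the cell that the key hashes to."""
        return self.buckets[self.hash_fn(key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite existing record
                return
        bucket.append((key, value))  # colliding records share the cell

    def get(self, key):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        raise KeyError(key)


table = ChainedTable(num_buckets=8, hash_fn=len)
table.insert("John Smith", "record A")
table.insert("Sandra Dee", "record B")
print(len(table.buckets[2]))  # both records are chained in cell 2
```

Every lookup has to walk the chain in its cell, so a table whose records pile up in a few cells degrades toward a linear search.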
Keeping track of so many lists is difficult and can make whatever tool is using the table very slow.<ref name=":2" /> Separate chaining is also known as open hashing.<ref>{{cite web |url=https://www.log2base2.com/algorithms/searching/open-hashing.html |title=Open hashing or separate chaining |work=Log{{sub|2}}2}}</ref>

=== Cache-conscious collision resolution ===
The [[cache (computing)|cache]]-conscious collision resolution method, proposed by {{harvp|Askitis|Zobel|2005}}, is much less widely used than the previous two.<ref>{{cite conference |conference=International Symposium on String Processing and Information Retrieval |last1=Askitis|first1=Nikolas |last2=Zobel |first2=Justin |title=Cache-Conscious Collision Resolution in String Hash Tables |date=2005 |work=String Processing and Information Retrieval SPIRE 2005 |pages=91–102 |place=Berlin, Heidelberg |publisher=Springer Berlin Heidelberg |series=Lecture Notes in Computer Science |volume=3772 |doi=10.1007/11575832_11| isbn=978-3-540-29740-6 |editor1-last=Consens |editor1-first=M. |editor2-last=Navarro |editor2-first=G.}}</ref> It is similar in idea to separate chaining, although it does not technically involve chained lists. Instead of chained lists, the hash values are represented in a contiguous list of items. This is better suited to string hash tables, and its usefulness for numeric values is still unknown.<ref name=":2" />

== See also ==
* [[List of hash functions]]
* {{Annotated link |Universal one-way hash function}}
* {{Annotated link |Cryptography}}
* {{Annotated link |Universal hashing}}
* {{Annotated link |Perfect hash function}}
* {{Annotated link |Injective map}}

==References==
{{Reflist}}

== External links ==
{{Cryptography navbox | hash}}

[[Category:Hashing]]
{{DEFAULTSORT:Hash_Collision}}