Editing Collision attack

{{Short description|Cryptographic attack}}
{{context|date=February 2020}}
In [[cryptography]], a '''collision attack''' on a [[cryptographic hash]] tries to find two inputs producing the same hash value, i.e. a [[hash collision]]. This is in contrast to a [[preimage attack]] where a specific target hash value is specified.

There are roughly two types of collision attacks:
;Classical collision attack: Find two different messages ''m''<sub>1</sub> and ''m''<sub>2</sub> such that ''hash''(''m''<sub>1</sub>) = ''hash''(''m''<sub>2</sub>).
More generally:
;Chosen-prefix collision attack: Given two different prefixes ''p''<sub>1</sub> and ''p''<sub>2</sub>, find two suffixes ''s''<sub>1</sub> and ''s''<sub>2</sub> such that ''hash''(''p''<sub>1</sub> ∥ ''s''<sub>1</sub>) = ''hash''(''p''<sub>2</sub> ∥ ''s''<sub>2</sub>), where ∥ denotes the [[concatenation]] operation.

==Classical collision attack==
Much like [[symmetric-key cipher]]s are vulnerable to [[brute force attack]]s, every [[cryptographic hash function]] is inherently vulnerable to collisions using a [[birthday attack]]. Due to the [[birthday problem]], these attacks are much faster than a brute force would be. A hash of ''n'' bits can be broken in 2<sup>''n''/2</sup> time steps (evaluations of the hash function).

Mathematically stated, a collision attack finds two different messages {{tmath|m_1}} and {{tmath|m_2}}, such that <math>hash(m_1) = hash(m_2)</math>. In a classical collision attack, the attacker has no control over the content of either message, but they are arbitrarily chosen by the algorithm.

More efficient attacks are possible by employing [[cryptanalysis]] to specific hash functions. When a collision attack is discovered and is found to be faster than a birthday attack, a hash function is often denounced as "broken". The [[NIST hash function competition]] was largely induced by published collision attacks against two very commonly used hash functions, [[MD5]]<ref name=md5-2004>Xiaoyun Wang, Dengguo Feng, Xuejia Lai, Hongbo Yu: [http://eprint.iacr.org/2004/199 Collisions for Hash Functions MD4, MD5, HAVAL-128 and RIPEMD], Cryptology ePrint Archive Report 2004/199, 16 Aug 2004, revised 17 Aug 2004. Retrieved July 27, 2008.</ref> and [[SHA-1]]. The collision attacks against MD5 have improved so much that, as of 2007, it takes just a few seconds on a regular computer.<ref>{{cite journal |author=M.M.J. Stevens |date=June 2007 |title=On Collisions for MD5 |url=http://www.win.tue.nl/hashclash/On%20Collisions%20for%20MD5%20-%20M.M.J.%20Stevens.pdf |quote=[...] we are able to find collisions for MD5 in about 2<sup>24.1</sup> compressions for recommended IHVs which takes approx. 6 seconds on a 2.6GHz Pentium 4. }}</ref> Hash collisions created this way are usually constant length and largely unstructured, so cannot directly be applied to attack widespread document formats or protocols.

However, workarounds are possible by abusing dynamic constructs present in many formats. In this way, two documents would be created which are as similar as possible in order to have the same hash value. One document would be shown to an authority to be signed, and then the signature could be copied to the other file. Such a malicious document would contain two different messages in the same document, but conditionally display one or the other through subtle changes to the file:

* Some document formats like [[PostScript]], or [[macro (computer science)|macro]]s in [[Microsoft Word]], have conditional constructs.<ref>{{cite web |author=Magnus Daum |author2=Stefan Lucks |author2-link=Stefan Lucks |title=Hash Collisions (The Poisoned Message Attack) |work=[[Eurocrypt]] 2005 rump session |url=http://th.informatik.uni-mannheim.de/People/lucks/HashCollisions/ |url-status=dead |archive-url=https://web.archive.org/web/20100327141611/http://th.informatik.uni-mannheim.de/people/lucks/HashCollisions/ |archive-date=2010-03-27 }}</ref><ref name=special-file-formats>{{cite journal |author1=Max Gebhardt |author2=Georg Illies |author3=Werner Schindler |title=A Note on the Practical Value of Single Hash Collisions for Special File Formats |date=4 January 2017 |url=http://csrc.nist.gov/groups/ST/hash/documents/Illies_NIST_05.pdf }}</ref> (if-then-else) that allow testing whether a location in the file has one value or another in order to control what is displayed.
*[[TIFF]] files can contain cropped images, with a different part of an image being displayed without affecting the hash value.<ref name=special-file-formats />
*[[PDF]] files are vulnerable to collision attacks by using color value (such that text of one message is displayed with a white color that blends into the background, and text of the other message is displayed with a dark color) which can then be altered to change the signed document's content.<ref name=special-file-formats />

==Chosen-prefix collision attack==
An extension of the collision attack is the chosen-prefix collision attack, which is specific to [[Merkle–Damgård construction|Merkle–Damgård hash functions]]. In this case, the attacker can choose two arbitrarily different documents, and then append different calculated values that result in the whole documents having an equal hash value. This attack is normally harder, a hash of n bits can be broken in 2<sup>(n/2)+1</sup> time steps, but is much more powerful than a classical collision attack.

Mathematically stated, given two different prefixes ''p''<sub>1</sub>, ''p''<sub>2</sub>, the attack finds two suffixes ''s''<sub>1</sub> and ''s''<sub>2</sub> such that ''hash''(''p''<sub>1</sub> ∥ ''s''<sub>1</sub>) = ''hash''(''p''<sub>2</sub> ∥ ''s''<sub>2</sub>) (where ∥ is the [[concatenation]] operation).

More efficient attacks are also possible by employing [[cryptanalysis]] to specific hash functions. In 2007, a chosen-prefix collision attack was found against MD5, requiring roughly 2<sup>50</sup> evaluations of the MD5 function. The paper also demonstrates two [[X.509]] certificates for different domain names, with colliding hash values. This means that a [[certificate authority]] could be asked to sign a certificate for one domain, and then that certificate (specially its signature) could be used to create a new rogue certificate to impersonate another domain.<ref name=md5-chosen-2007>{{cite book |author1=Marc Stevens |author2=Arjen Lenstra |author3=Benne de Weger |title=Advances in Cryptology - EUROCRYPT 2007 |chapter=Chosen-Prefix Collisions for MD5 and Colliding X.509 Certificates for Different Identities |series=Lecture Notes in Computer Science |date=2007-11-30 |volume=4515 |page=1 |doi=10.1007/978-3-540-72540-4_1 |bibcode=2007LNCS.4515....1S |isbn=978-3-540-72539-8 |chapter-url=http://www.win.tue.nl/hashclash/ChosenPrefixCollisions/ |doi-access=free }}</ref>

A real-world collision attack was published in December 2008 when a group of security researchers published a forged [[X.509]] signing certificate that could be used to impersonate a [[certificate authority]], taking advantage of a prefix collision attack against the MD5 hash function. This meant that an attacker could impersonate any [[Transport Layer Security|SSL]]-secured website as a [[man-in-the-middle]], thereby subverting the certificate validation built in every [[web browser]] to protect [[electronic commerce]].  The rogue certificate may not be revokable by real authorities, and could also have an arbitrary forged expiry time. Even though MD5 was known to be very weak in 2004,<ref name=md5-2004 /> certificate authorities were still willing to sign MD5-verified certificates in December 2008,<ref>{{cite web |date=2008-12-30 |author=Alexander Sotirov |title=Creating a rogue CA certificate |url=http://www.phreedom.org/research/rogue-ca/ |display-authors=etal |access-date=2009-10-07 |archive-url=https://web.archive.org/web/20120418172628/http://www.phreedom.org/research/rogue-ca/ |archive-date=2012-04-18 |url-status=dead }}</ref> and at least one Microsoft code-signing certificate was still using MD5 in May 2012.

The [[Flame (malware)|Flame]] malware successfully used a new variation of a chosen-prefix collision attack to spoof [[code signing]] of its components by a Microsoft root certificate that still used the compromised MD5 algorithm.<ref>{{cite web| url=http://blogs.technet.com/b/msrc/archive/2012/06/03/microsoft-releases-security-advisory-2718704.aspx?Redirected=true| title=Microsoft releases Security Advisory 2718704|publisher=[[Microsoft]]|date=3 June 2012|access-date=4 June 2012|archive-url=https://web.archive.org/web/20120607215605/http://blogs.technet.com/b/msrc/archive/2012/06/03/microsoft-releases-security-advisory-2718704.aspx?Redirected=true|archive-date=7 June 2012|url-status=dead}}</ref><ref>{{cite web|url=http://www.cwi.nl/news/2012/cwi-cryptanalist-discovers-new-cryptographic-attack-variant-in-flame-spy-malware|title=CWI Cryptanalist Discovers New Cryptographic Attack Variant in Flame Spy Malware|author=Marc Stevens|publisher=Centrum Wiskunde & Informatica|date=7 June 2012|access-date=9 June 2012}}</ref>

In 2019, researchers found a chosen-prefix collision attack against [[SHA-1]] with computing complexity between 2<sup>66.9</sup> and 2<sup>69.4</sup> and cost less than 100,000 US dollars. <ref name = "SHA-1 collision attacks are now actually practical and a looming danger_2019"> {{cite news | title = SHA-1 collision attacks are now actually practical and a looming danger | url = https://www.zdnet.com/article/sha-1-collision-attacks-are-now-actually-practical-and-a-looming-danger/ | work = ZDNet | author = Catalin Cimpanu | language = en | date = 2019-05-13 }} </ref><ref name = 'Collisions of Chosen-Prefix Collisions SHA-1_2019'> {{ cite web | title = From Collisions to Chosen-Prefix Collisions Application to Full SHA-1 | url = https://eprint.iacr.org/2019/459.pdf | author1 = Gaëtan Leurent |author2=Thomas Peyrin | date = 2019-05-06 }} </ref> In 2020, researchers reduced the complexity of a chosen-prefix collision attack against SHA-1 to 2<sup>63.4</sup>. <ref name = "SHA-1 is a Shambles - First Chosen-Prefix Collision on SHA-1 and Application to the PGP Web of Trust">{{ cite web | title = SHA-1 is a Shambles - First Chosen-Prefix Collision on SHA-1 and Application to the PGP Web of Trust | url = https://eprint.iacr.org/2020/014.pdf | author1 = Gaëtan Leurent |author2=Thomas Peyrin | date = 2020-01-05 }}</ref>

==Attack scenarios==
Many applications of cryptographic hash functions do not rely on [[collision resistance]], thus collision attacks do not affect their security. For example, [[HMAC]]s are not vulnerable.<ref name=collision-qna>{{cite web |date=2005-02-15 |title=Hash Collision Q&A |publisher=Cryptography Research Inc. |url=http://www.cryptography.com/cnews/hash.html  |quote=Because of the way hash functions are used in the HMAC construction, the techniques used in these recent attacks do not apply |archive-url = https://web.archive.org/web/20080717103333/http://www.cryptography.com/cnews/hash.html |archive-date = 2008-07-17}}</ref> For the attack to be useful, the attacker must be in control of the input to the hash function.

===Digital signatures===
Because [[digital signature]] algorithms cannot sign a large amount of data efficiently, most implementations use a hash function to reduce ("compress") the amount of data that needs to be signed down to a constant size. Digital signature schemes often become vulnerable to hash collisions as soon as the underlying hash function is practically broken; techniques like randomized (salted) hashing will buy extra time by requiring the harder [[preimage attack]].<ref>Shai Halevi and Hugo Krawczyk, [http://www.ee.technion.ac.il/~hugo/rhash/ Randomized Hashing and Digital Signatures] {{Webarchive|url=https://web.archive.org/web/20090620072808/http://www.ee.technion.ac.il/~hugo/rhash/ |date=2009-06-20 }}</ref>

The usual attack scenario goes like this:
# Mallory creates two different documents A and B that have an identical hash value, i.e., a collision. Mallory seeks to deceive Bob into accepting document B, ostensibly from Alice.
# Mallory '''sends document A to Alice''', who agrees to what the document says, signs its hash, and sends the signature to Mallory.
# Mallory attaches the signature from document A to document B.
# Mallory then '''sends the signature and document B to Bob''', claiming that Alice signed B. Because the digital signature matches document B's hash, Bob's software is unable to detect the substitution.{{Citation needed|date=September 2023}}

In 2008, researchers used a chosen-prefix collision attack against [[MD5]] using this scenario, to produce a rogue [[certificate authority]] certificate. They created two versions of a [[Transport Layer Security|TLS]] [[public key certificate]], one of which appeared legitimate and was submitted for signing by the RapidSSL certificate authority. The second version, which had the same MD5 hash, contained flags which signal web browsers to accept it as a legitimate authority for issuing arbitrary other certificates.<ref>{{cite conference |url=http://www.win.tue.nl/hashclash/rogue-ca/ |author=Alexander Sotirov |author2=Marc Stevens |author3=Jacob Appelbaum |author4=Arjen Lenstra |author5=David Molnar |author6=Dag Arne Osvik |author7=Benne de Weger |date=30 December 2008 |title=MD5 considered harmful today |conference=[[Chaos Communication Congress]] 2008 }}</ref>

===Hash flooding===
'''Hash flooding''' (also known as '''HashDoS'''<ref>{{cite book |last1=Falkenberg |first1=Andreas |last2=Mainka |first2=Christian |last3=Somorovsky |first3=Juraj |last4=Schwenk |first4=Jörg |title=2013 IEEE 20th International Conference on Web Services |chapter=A New Approach towards DoS Penetration Testing on Web Services |year=2013 |pages=491–498 |doi=10.1109/ICWS.2013.72|isbn=978-0-7695-5025-1 |s2cid=17805370 }}</ref>) is a [[denial of service]] attack that uses hash collisions to exploit the worst-case (linear probe) runtime of [[hash table]] lookups.<ref name=v8>{{cite web |title=About that hash flooding vulnerability in Node.js... · V8 |url=https://v8.dev/blog/hash-flooding |website=v8.dev}}</ref> It was originally described in 2003 as an example of an algorithmic complexity attack.<ref name=crosby>Scott A. Crosby and Dan S. Wallach. 2003. Denial of service via algorithmic complexity attacks. In Proceedings of the 12th conference on USENIX Security Symposium - Volume 12 (SSYM'03), Vol. 12. USENIX Association, Berkeley, CA, USA, 3-3.</ref> To execute such an attack, the attacker sends the server multiple pieces of data that hash to the same value and then tries to get the server to perform slow lookups. As the main focus of hash functions used in hash tables was speed instead of security, most major programming languages were affected,<ref name=crosby /> with new vulnerabilities of this class still showing up a decade after the original presentation.<ref name=v8/>

To prevent hash flooding without making the hash function overly complex, newer [[keyed hash function]]s are introduced, with the security objective that collisions are hard to find as long as the key is unknown. They may be slower than previous hashes, but are still much easier to compute than cryptographic hashes. As of 2021, Jean-Philippe Aumasson and [[Daniel J. Bernstein]]'s [[SipHash]] (2012) is the most widely-used hash function in this class.<ref name="SipHash">{{cite web
 |url=https://131002.net/siphash/siphash.pdf
 |title=SipHash: a fast short-input PRF
 |author1=Jean-Philippe Aumasson
 |author2=Daniel J. Bernstein |author-link2=Daniel J. Bernstein
 |name-list-style=amp
 |date=2012-09-18
}}</ref> (Non-keyed "simple" hashes remain safe to use as long as the application's hash table is not controllable from the outside.)

It is possible to perform an analogous attack to fill up [[Bloom filter]]s using a (partial) preimage attack.<ref>{{cite thesis |last1=Gerbet |first1=Thomas |last2=Kumar |first2=Amrit |last3=Lauradoux |first3=Cédric |title=The Power of Evil Choices in Bloom Filters |url=https://hal.inria.fr/hal-01082158v2 |publisher=INRIA Grenoble |language=en |date=12 November 2014|type=report }}</ref>

==See also==

* [[Puzzle friendliness]]

==References==
{{Reflist}}

==External links==
* [https://web.archive.org/web/20151222161723/http://www.iaik.tugraz.at/content/research/krypto/sha1/MeaningfulCollisions.php "Meaningful Collisions", attack scenarios for exploiting cryptographic hash collisions]
* [http://www.bishopfox.com/resources/tools/other-free-tools/md4md5-collision-code/ Fast MD5 and MD4 Collision Generators] - Bishop Fox (formerly Stach & Liu).  Create MD4 and MD5 hash collisions using groundbreaking new code that improves upon the techniques originally developed by Xiaoyun Wang.  Using a 1.6&nbsp;GHz Pentium 4, MD5 collisions can be generated in an average of 45 minutes, and MD4 collisions can be generated in an average of 5 seconds. Originally released on 22Jun2006.

{{Cryptography navbox|hash}}

[[Category:Cryptographic attacks]]
[[Category:Cryptographic hash functions]]