Editing Rabin–Karp algorithm (section)

==Overview==
A naive string matching algorithm compares the given pattern against all positions in the given text. Each comparison takes time proportional to the length of the pattern,
and the number of positions is proportional to the length of the text. Therefore, the worst-case time for such a method is proportional to the product of the two lengths.
In many practical cases, this time can be significantly reduced by cutting short the comparison at each position as soon as a mismatch is found, but this idea cannot guarantee any speedup.

Several string-matching algorithms, including the [[Knuth–Morris–Pratt algorithm]] and the [[Boyer–Moore string-search algorithm]], reduce the worst-case time for string matching by extracting more information from each mismatch, allowing them to skip over positions of the text that are guaranteed not to match the pattern. The Rabin–Karp algorithm instead achieves its speedup by using a [[hash function]] to quickly perform an approximate check for each position, and then only performing an exact comparison at the positions that pass this approximate check.

A hash function is a function which converts every string into a numeric value, called its ''hash value''; for example, we might have <code>hash("hello")=5</code>. If two strings are equal, their hash values are also equal. For a well-designed hash function, the inverse is true, in an approximate sense: strings that are unequal are very unlikely to have equal hash values. The Rabin–Karp algorithm proceeds by computing, at each position of the text, the hash value of a string starting at that position with the same length as the pattern. If this hash value equals the hash value of the pattern, it performs a full comparison at that position.

In order for this to work well, the hash function should be selected randomly from a family of hash functions that are unlikely to produce many [[false positive]]s, that is, positions of the text which have the same hash value as the pattern but do not actually match the pattern. These positions contribute to the running time of the algorithm unnecessarily, without producing a match. Additionally, the hash function used should be a [[rolling hash]], a hash function whose value can be quickly updated from each position of the text to the next. Recomputing the hash function from scratch at each position would be too slow.