== Hash function used ==
{{main | Rabin fingerprint}}
The key to the Rabin–Karp algorithm's performance is the efficient computation of [[hash value]]s of the successive substrings of the text. The [[Rabin fingerprint]] is a popular and effective rolling hash function. The hash function described here is not a Rabin fingerprint, but it works equally well. It treats every substring as a number in some base, the base usually being the size of the character set.

For example, if the substring is "hi", the base is 256, and the prime modulus is 101, then the hash value is
  [(104 &times; 256 ) %{{efn|name=mod}} 101  + 105] % 101  =  65
  ([[ASCII]] of 'h' is 104 and of 'i' is 105)
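
The calculation above can be sketched in Python as follows. This is an illustrative sketch only: the function name <code>poly_hash</code> and its default parameters are hypothetical, chosen to match the base 256 and modulus 101 of the example, and characters are assumed to map to their code points via <code>ord</code>.

```python
def poly_hash(s: str, base: int = 256, mod: int = 101) -> int:
    """Treat s as a number written in `base` and reduce it modulo `mod`.

    Hypothetical helper for illustration; not a Rabin fingerprint.
    """
    h = 0
    for ch in s:
        # Shift the running value one "digit" to the left, add the next
        # character code, and reduce at once so the value stays small.
        h = (h * base + ord(ch)) % mod
    return h


print(poly_hash("hi"))  # 65, as in the worked example
```

Reducing after every step gives the same result as reducing once at the end, but keeps every intermediate value below <code>base &times; mod</code>.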

Technically, this is only analogous to writing a number in a non-decimal positional system, since, for example, the "base" could be smaller than one of the "digits". See [[hash function]] for a much more detailed discussion. The essential benefit of using a [[rolling hash]] such as the Rabin fingerprint is that the hash value of the next substring can be computed from the previous one with only a constant number of operations, independent of the substrings' lengths.

For example, if we have the text "abracadabra" and we are searching for a pattern of length 3, the hash of the first substring, "abr", using 256 as the base and 101 as the prime modulus, is:
 // ASCII a = 97, b = 98, r = 114.
 hash("abr") = [ ( [ ( [ (97 &times; 256) % 101 + 98 ] % 101 ) &times; 256 ] % 101 ) + 114 ] % 101 = 4

We can then compute the hash of the next substring, "bra", from the hash of "abr" by subtracting the term contributed by the first 'a' of "abr", i.e. 97 &times; 256<sup>2</sup>, multiplying the result by the base, and adding the term for the last 'a' of "bra", i.e. 97 &times; 256<sup>0</sup>. Like so:
{{pre|style=font-size:95%|1=
//           ''old hash   (-ve avoider){{efn|name=ua}}   old 'a'   left base offset      base shift    new 'a'''    prime modulus
hash("bra") =     [ ( 4   + 101         -  97 * [(256%101)*256] % 101{{efn|name=mod101}} ) * 256{{efn|name=times256}}       +    97 ] % 101      =  30
}}
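
The update step above can be sketched in Python as follows. This is a minimal sketch under the assumptions of the example (base 256, modulus 101); the names <code>roll</code>, <code>BASE</code> and <code>MOD</code> are hypothetical. Python's <code>%</code> operator returns a non-negative result for a positive modulus, so the explicit "+ 101" used above to avoid a negative intermediate value is not needed here; in languages where <code>%</code> can return a negative result it would be.

```python
BASE = 256  # assumed base, as in the worked example
MOD = 101   # assumed prime modulus, as in the worked example


def roll(old_hash: int, old_char: str, new_char: str, length: int) -> int:
    """Derive the hash of the next window from the hash of the current one."""
    # Weight of the leading character: BASE**(length - 1) reduced modulo MOD.
    high = pow(BASE, length - 1, MOD)
    # Remove the contribution of the character that leaves the window.
    h = (old_hash - ord(old_char) * high) % MOD
    # Shift the remaining characters one position and append the new one.
    return (h * BASE + ord(new_char)) % MOD


# "abr" -> "bra": the leading 'a' leaves the window, the trailing 'a' enters.
print(roll(4, "a", "a", 3))  # 30
```

In a real scan the value of <code>high</code> would be computed once rather than on every call, so that each update costs a constant number of arithmetic operations regardless of the window length.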
If we are matching the search string "bra", its hash is computed directly, in the same way as hash("abr"):

 hash'("bra") = [ ( [ ( [ (98 &times; 256) % 101 + 114 ] % 101 ) &times; 256 ] % 101 ) + 97 ] % 101 = 30
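
Since the rolled hash of the text window and the directly computed hash of the search string agree, the window is a candidate match. The following Python sketch shows one way the comparison might be carried out over the whole text; the helper names <code>poly_hash</code> and <code>find_all</code> are hypothetical, and the base 256 and modulus 101 are taken from the example.

```python
BASE, MOD = 256, 101  # assumed values, as in the worked example


def poly_hash(s: str) -> int:
    """Directly computed hash of s (hypothetical helper)."""
    h = 0
    for ch in s:
        h = (h * BASE + ord(ch)) % MOD
    return h


def find_all(text: str, pattern: str) -> list[int]:
    """Return every index at which pattern occurs in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    target = poly_hash(pattern)
    high = pow(BASE, m - 1, MOD)  # weight of the leading character
    h = poly_hash(text[:m])       # hash of the first window
    hits = []
    for i in range(n - m + 1):
        # Equal hashes only suggest a match; comparing the characters
        # rules out a hash collision.
        if h == target and text[i:i + m] == pattern:
            hits.append(i)
        if i + m < n:
            # Roll the window one character to the right.
            h = ((h - ord(text[i]) * high) * BASE + ord(text[i + m])) % MOD
    return hits


print(find_all("abracadabra", "bra"))  # [1, 8]
```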

If the substrings in question are long, this rolling update achieves great savings compared with hashing schemes that must reread every character of each substring.

Theoretically, there exist other algorithms that could provide convenient recomputation, e.g. multiplying together the ASCII values of all characters, so that shifting the substring would only entail dividing the previous hash by the first character's value and then multiplying by the new last character's value. The limitation, however, is the limited size of the integer [[data type]] and the necessity of using [[modular arithmetic]] to scale down the hash results.{{efn|See [[hash function]] article.}} Meanwhile, naive hash functions such as adding the ASCII values do not produce large numbers quickly, but are likely to cause many [[hash collision]]s and hence slow down the algorithm. Hence the described hash function is typically the preferred one in the Rabin–Karp algorithm.
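
The collision problem of an additive hash can be illustrated with a short Python sketch. Both function names are hypothetical, and the modulus 101 and base 256 are simply reused from the earlier examples. Because addition ignores character order, every rearrangement of the same characters receives the same additive hash, whereas the positional hash described above tells these three strings apart.

```python
def additive_hash(s: str, mod: int = 101) -> int:
    """Naive hash: the sum of the character codes, reduced modulo `mod`."""
    return sum(ord(ch) for ch in s) % mod


def poly_hash(s: str, base: int = 256, mod: int = 101) -> int:
    """Positional hash as described in this section (hypothetical helper)."""
    h = 0
    for ch in s:
        h = (h * base + ord(ch)) % mod
    return h


for word in ("abr", "bra", "rab"):
    # The additive hash is 6 for all three words; the positional hashes differ.
    print(word, additive_hash(word), poly_hash(word))
```

With a modulus as small as 101 the positional hash still collides for some pairs of strings, so this sketch shows only that it avoids the systematic collisions between rearrangements.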