Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Rabin–Karp algorithm
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
== Hash function used == {{main | Rabin fingerprint}} The key to the Rabin–Karp algorithm's performance is the efficient computation of [[hash value]]s of the successive substrings of the text. The [[Rabin fingerprint]] is a popular and effective rolling hash function. The hash function described here is not a Rabin fingerprint, but it works equally well. It treats every substring as a number in some base, the base being usually the size of the character set. For example, if the substring is "hi", the base is 256, and prime modulus is 101, then the hash value would be [(104 × 256 ) %{{efn|name=mod}} 101 + 105] % 101 = 65 ([[ASCII]] of 'h' is 104 and of 'i' is 105) Technically, this algorithm is only similar to the true number in a non-decimal system representation, since for example we could have the "base" less than one of the "digits". See [[hash function]] for a much more detailed discussion. The essential benefit achieved by using a [[rolling hash]] such as the Rabin fingerprint is that it is possible to compute the hash value of the next substring from the previous one by doing only a constant number of operations, independent of the substrings' lengths. For example, if we have text "abracadabra" and we are searching for a pattern of length 3, the hash of the first substring, "abr", using 256 as the base, and 101 as the prime modulus is: // ASCII a = 97, b = 98, r = 114. hash("abr") = [ ( [ ( [ (97 × 256) % 101 + 98 ] % 101 ) × 256 ] % 101 ) + 114 ] % 101 = 4 We can then compute the hash of the next substring, "bra", from the hash of "abr" by subtracting the number added for the first 'a' of "abr", i.e. 97 × 256<sup>2</sup>, multiplying by the base and adding for the last a of "bra", i.e. 97 × 256<sup>0</sup>. Like so: {{pre|style=font-size:95%|1= // ''old hash (-ve avoider){{efn|name=ua}} old 'a' left base offset base shift new 'a''' prime modulus hash("bra") = [ ( 4 + 101 - 97 * [(256%101)*256] % 101{{efn|name=mod101}} ) * 256{{efn|name=times256}} + 97 ] % 101 = 30 }} If we are matching the search string "bra", using similar calculation of hash("abr"), hash'("bra") = [ ( [ ( [ ( 98 × 256) %101 + 114] % 101 ) × 256 ] % 101) + 97 ] % 101 = 30 If the substrings in question are long, this algorithm achieves great savings compared with many other hashing schemes. Theoretically, there exist other algorithms that could provide convenient recomputation, e.g. multiplying together ASCII values of all characters so that shifting substring would only entail dividing the previous hash by the first character value, then multiplying by the new last character's value. The limitation, however, is the limited size of the integer [[data type]] and the necessity of using [[modular arithmetic]] to scale down the hash results.{{efn|See [[hash function]] article.}} Meanwhile, naive hash functions do not produce large numbers quickly, but, just like adding ASCII values, are likely to cause many [[hash collision]]s and hence slow down the algorithm. Hence the described hash function is typically the preferred one in the Rabin–Karp algorithm.
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)