
{{Short description|String searching algorithm}}
{{no footnotes|date=September 2018}}
{{Infobox algorithm
|name           = Rabin–Karp algorithm
|class          = [[string-searching algorithm|String searching]]
|image          = <!-- filename only, no "File:" or "Image:" prefix, and no enclosing [[brackets]] -->
|caption        =
|data           = 
|time           = <math>O(mn)</math> plus <math>O(m)</math> preprocessing time
|best-time      =
|average-time   = <math>O(n)</math>
|space          = <math>O(1)</math>
}}
In [[computer science]], the '''Rabin–Karp algorithm''' or '''Karp–Rabin algorithm''' is a [[string-searching algorithm]] created by {{harvs|first1=Richard M.|last1=Karp|author1-link=Richard M. Karp|first2=Michael O.|last2=Rabin|author2-link=Michael O. Rabin|year=1987|txt}} that uses [[Hash function|hashing]] to find an exact match of a pattern string in a text. It uses a [[rolling hash]] to quickly filter out positions of the text that cannot match the pattern, and then checks for a match at the remaining positions. Generalizations of the same idea can be used to find more than one match of a single pattern, or to find matches for more than one pattern.
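
The filter-then-verify idea can be sketched in Python as follows. This is a minimal illustration, not the formulation from the original paper: the function name <code>rabin_karp_search</code>, the parameters <code>base</code> and <code>modulus</code>, and their default values are assumptions chosen for the example.

```python
def rabin_karp_search(text: str, pattern: str,
                      base: int = 256, modulus: int = 1_000_000_007) -> int:
    """Return the index of the first occurrence of pattern in text, or -1.

    base and modulus are illustrative choices, not prescribed values.
    """
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1

    # Weight of the leading character of a window: base^(m-1) mod modulus.
    high = pow(base, m - 1, modulus)

    # Hash the pattern and the first window of the text (O(m) preprocessing).
    pattern_hash = 0
    window_hash = 0
    for i in range(m):
        pattern_hash = (pattern_hash * base + ord(pattern[i])) % modulus
        window_hash = (window_hash * base + ord(text[i])) % modulus

    for start in range(n - m + 1):
        # Equal hashes only make a match possible; confirm it directly.
        if window_hash == pattern_hash and text[start:start + m] == pattern:
            return start
        if start < n - m:
            # Roll the hash: drop text[start], append text[start + m].
            window_hash = ((window_hash - ord(text[start]) * high) * base
                           + ord(text[start + m])) % modulus
    return -1
```

Each roll takes constant time, so positions whose hash differs from the pattern's are rejected without comparing characters; the direct comparison runs only when the hashes agree.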

To find a single match of a single pattern, the [[expected time]] of the algorithm is [[linear time|linear]] in the combined length of the pattern and text, although its [[Worst-case complexity|worst-case time complexity]] is the product of the two lengths. To find multiple matches, the expected time is linear in the input lengths plus the combined length of all the matches, which can exceed linear. In contrast, the [[Aho–Corasick algorithm]] can find all matches of multiple patterns in worst-case time and space linear in the input length and the number of matches (instead of the total length of the matches).
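
As a rough summary of these bounds, write <math>n</math> for the text length and <math>m</math> for the pattern length, as in the infobox, and let <math>k</math> (a symbol introduced here only for illustration) be the number of matches reported for a single pattern:
:<math>T_\text{expected, one match} = O(n + m), \qquad T_\text{worst} = O(nm), \qquad T_\text{expected, all matches} = O(n + m + km).</math>
The <math>km</math> term is the combined length of the matches. For example, for the pattern <math>\texttt{a}^m</math> in the text <math>\texttt{a}^n</math>, every one of the <math>n - m + 1</math> positions matches, so the match term alone is <math>(n - m + 1)m</math>, which is of order <math>nm</math> when <math>m \le n/2</math>.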

A practical application of the algorithm is [[plagiarism detection|detecting plagiarism]]. Given source material, the algorithm can rapidly search through a paper for instances of sentences from the source material, ignoring details such as case and punctuation. Because there are so many strings to search for, running a single-string searching algorithm once per string is impractical.
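
A multi-pattern search of this kind might be sketched as follows. Everything here is an assumption made for illustration: the names <code>normalize</code>, <code>poly_hash</code> and <code>find_any</code>, the constants, the choice to strip ASCII punctuation and whitespace, and the choice to hash only the first <code>k</code> characters of each pattern (where <code>k</code> is the shortest pattern length) so that a single rolling window serves patterns of different lengths.

```python
import string
from collections import defaultdict

BASE = 256                 # illustrative constants, not prescribed values
MODULUS = 1_000_000_007


def normalize(s):
    """Lowercase and drop ASCII punctuation and whitespace (one possible choice)."""
    drop = set(string.punctuation + string.whitespace)
    return "".join(ch for ch in s.lower() if ch not in drop)


def poly_hash(s):
    """Polynomial hash of s modulo MODULUS."""
    h = 0
    for ch in s:
        h = (h * BASE + ord(ch)) % MODULUS
    return h


def find_any(paper, sources):
    """Return (position, source_index) pairs for every source found in paper.

    Positions index the normalized paper, not the original string.
    """
    text = normalize(paper)
    indexed = [(i, p) for i, p in enumerate(normalize(s) for s in sources) if p]
    if not indexed:
        return []
    k = min(len(p) for _, p in indexed)   # window length: shortest pattern
    if k > len(text):
        return []

    # Group patterns by the hash of their first k characters.
    by_hash = defaultdict(list)
    for i, p in indexed:
        by_hash[poly_hash(p[:k])].append((i, p))

    high = pow(BASE, k - 1, MODULUS)
    window = poly_hash(text[:k])
    hits = []
    for start in range(len(text) - k + 1):
        # Only patterns whose prefix hash equals the window hash are verified.
        for i, p in by_hash.get(window, ()):
            if text.startswith(p, start):
                hits.append((start, i))
        if start + k < len(text):
            window = ((window - ord(text[start]) * high) * BASE
                      + ord(text[start + k])) % MODULUS
    return hits
```

With the paper <code>"He wrote: It was the best of times! Then, the WORST of times."</code> and the sources <code>["it was the best of times", "the worst of times.", "call me ishmael"]</code>, the first two sources are found at positions 7 and 30 of the normalized text, and the third is not found. One pass over the text checks all patterns at once, whereas a single-string search would need one pass per pattern.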