Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Knuth–Morris–Pratt algorithm
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
===Example of the search algorithm=== To illustrate the algorithm's details, consider a (relatively artificial) run of the algorithm, where <code>W</code> = "ABCDABD" and <code>S</code> = "ABC ABCDAB ABCDABCDABDE". At any given time, the algorithm is in a state determined by two integers: * <code>m</code>, denoting the position within <code>S</code> where the prospective match for <code>W</code> begins, * <code>i</code>, denoting the index of the currently considered character in <code>W</code>. In each step the algorithm compares <code>S[m+i]</code> with <code>W[i]</code> and increments <code>i</code> if they are equal. This is depicted, at the start of the run, like <div style="font-family:monospace;"> 1 2 m: 01234567890123456789012 S: {{color|blue|ABC}}{{color|red| }}{{color|gray|ABCDAB ABCDABCDABDE}} W: {{color|blue|ABC}}{{color|red|D}}{{color|gray|ABD}} i: {{color|blue|012}}{{color|red|3}}{{color|gray|456}} </div> The algorithm compares successive characters of <code>W</code> to "parallel" characters of <code>S</code>, moving from one to the next by incrementing <code>i</code> if they match. However, in the fourth step <code>S[3] = ' '</code> does not match <code>W[3] = 'D'</code>. Rather than beginning to search again at <code>S[1]</code>, we note that no <code>'A'</code> occurs between positions 1 and 2 in <code>S</code>; hence, having checked all those characters previously (and knowing they matched the corresponding characters in <code>W</code>), there is no chance of finding the beginning of a match. Therefore, the algorithm sets <code>m = 3</code> and <code>i = 0</code>. <div style="font-family:monospace;"> 1 2 m: 01234567890123456789012 S: ABC{{color|red| }}{{color|gray|ABCDAB ABCDABCDABDE}} W: {{color|red|A}}{{color|gray|BCDABD}} i: {{color|red|0}}{{color|gray|123456}} </div> This match fails at the initial character, so the algorithm sets <code>m = 4</code> and <code>i = 0</code> <div style="font-family:monospace;"> 1 2 m: 01234567890123456789012 S: ABC {{color|blue|ABCDAB}}{{color|red| }}{{color|gray|ABCDABCDABDE}} W: {{color|blue|ABCDAB}}{{color|red|D}} i: {{color|blue|012345}}{{color|red|6}} </div> Here, <code>i</code> increments through a nearly complete match <code>"ABCDAB"</code> until <code>i = 6</code> giving a mismatch at <code>W[6]</code> and <code>S[10]</code>. However, just prior to the end of the current partial match, there was that substring <code>"AB"</code> that could be the beginning of a new match, so the algorithm must take this into consideration. As these characters match the two characters prior to the current position, those characters need not be checked again; the algorithm sets <code>m = 8</code> (the start of the initial prefix) and <code>i = 2</code> (signaling the first two characters match) and continues matching. Thus the algorithm not only omits previously matched characters of <code>S</code> (the <code>"AB"</code>), but also previously matched characters of <code>W</code> (the prefix <code>"AB"</code>). <div style="font-family:monospace;"> 1 2 m: 01234567890123456789012 S: ABC ABCD{{color|gray|AB}}{{color|red| }}{{color|gray|ABCDABCDABDE}} W: {{color|gray|AB}}{{color|red|C}}{{color|gray|DABD}} i: {{color|gray|01}}{{color|red|2}}{{color|gray|3456}} </div> This search at the new position fails immediately because <code>W[2]</code> (a <code>'C'</code>) does not match <code>S[10]</code> (a <code>' '</code>). As in the first trial, the mismatch causes the algorithm to return to the beginning of <code>W</code> and begins searching at the mismatched character position of <code>S</code>: <code>m = 10</code>, reset <code>i = 0</code>. <div style="font-family:monospace;"> 1 2 m: 01234567890123456789012 S: ABC ABCDAB{{color|red| }}{{color|gray|ABCDABCDABDE}} W: {{color|red|A}}{{color|gray|BCDABD}} i: {{color|red|0}}{{color|gray|123456}} </div> The match at <code>m=10</code> fails immediately, so the algorithm next tries <code>m = 11</code> and <code>i = 0</code>. <div style="font-family:monospace;"> 1 2 m: 01234567890123456789012 S: ABC ABCDAB {{color|blue|ABCDAB}}{{color|red|C}}{{color|gray|DABDE}} W: {{color|blue|ABCDAB}}{{color|red|D}} i: {{color|blue|012345}}{{color|red|6}} </div> Once again, the algorithm matches <code>"ABCDAB"</code>, but the next character, <code>'C'</code>, does not match the final character <code>'D'</code> of the word <code>W</code>. Reasoning as before, the algorithm sets <code>m = 15</code>, to start at the two-character string <code>"AB"</code> leading up to the current position, set <code>i = 2</code>, and continue matching from the current position. <div style="font-family:monospace;"> 1 2 m: 01234567890123456789012 S: ABC ABCDAB ABCD{{color|blue|ABCDABD}}{{color|gray|E}} W: {{color|blue|ABCDABD}} i: {{color|blue|0123456}} </div> This time the match is complete, and the first character of the match is <code>S[15]</code>.
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)