Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Knuth–Morris–Pratt algorithm
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
===Working example of the table-building algorithm=== We consider the example of <code>W = "ABCDABD"</code> first. We will see that it follows much the same pattern as the main search, and is efficient for similar reasons. We set <code>T[0] = -1</code>. To find <code>T[1]</code>, we must discover a [[Substring#Suffix|proper suffix]] of <code>"A"</code> which is also a prefix of pattern <code>W</code>. But there are no proper suffixes of <code>"A"</code>, so we set <code>T[1] = 0</code>. To find <code>T[2]</code>, we see that the substring <code>W[0]</code> - <code>W[1]</code> (<code>"AB"</code>) has a proper suffix <code>"B"</code>. However "B" is not a prefix of the pattern <code>W</code>. Therefore, we set <code>T[2] = 0</code>. Continuing to <code>T[3]</code>, we first check the proper suffix of length 1, and as in the previous case it fails. Should we also check longer suffixes? No, we now note that there is a shortcut to checking ''all'' suffixes: let us say that we discovered a [[Substring#Suffix|proper suffix]] which is a [[Substring#Prefix|proper prefix]] (a proper prefix of a string is not equal to the string itself) and ending at <code>W[2]</code> with length 2 (the maximum possible); then its first character is also a proper prefix of <code>W</code>, hence a proper prefix itself, and it ends at <code>W[1]</code>, which we already determined did not occur as <code>T[2] = 0</code> and not <code>T[2] = 1</code>. Hence at each stage, the shortcut rule is that one needs to consider checking suffixes of a given size m+1 only if a valid suffix of size m was found at the previous stage (i.e. <code>T[x] = m</code>) and should not bother to check m+2, m+3, etc. Therefore, we need not even concern ourselves with substrings having length 2, and as in the previous case the sole one with length 1 fails, so <code>T[3] = 0</code>. We pass to the subsequent <code>W[4]</code>, <code>'A'</code>. The same logic shows that the longest substring we need to consider has length 1, and as in the previous case it fails since "D" is not a prefix of <code>W</code>. But instead of setting <code>T[4] = 0</code>, we can do better by noting that <code>W[4] = W[0]</code>, and also that a look-up of <code>T[4]</code> implies the corresponding <code>S</code> character, <code>S[m+4]</code>, was a mismatch and therefore <code>S[m+4] ≠ 'A'</code>. Thus there is no point in restarting the search at <code>S[m+4]</code>; we should begin at 1 position ahead. This means that we may shift pattern <code>W</code> by match length plus one character, so <code>T[4] = -1</code>. Considering now the next character, <code>W[5]</code>, which is <code>'B'</code>: though by inspection the longest substring would appear to be <code>'A'</code>, we still set <code>T[5] = 0</code>. The reasoning is similar to why <code>T[4] = -1</code>. <code>W[5]</code> itself extends the prefix match begun with <code>W[4]</code>, and we can assume that the corresponding character in <code>S</code>, <code>S[m+5] ≠ 'B'</code>. So backtracking before <code>W[5]</code> is pointless, but <code>S[m+5]</code> may be <code>'A'</code>, hence <code>T[5] = 0</code>. Finally, we see that the next character in the ongoing segment starting at <code>W[4] = 'A'</code> would be <code>'B'</code>, and indeed this is also <code>W[5]</code>. Furthermore, the same argument as above shows that we need not look before <code>W[4]</code> to find a segment for <code>W[6]</code>, so that this is it, and we take <code>T[6] = 2</code>. Therefore, we compile the following table: {| class="wikitable" style="background-color:white; font-family:monospace; text-align:right" !<code>i</code> | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |- !<code>W[i]</code> | A | B | C | D | A | B | D | |- !<code>T[i]</code> | -1 | 0 | 0 | 0 | -1 | 0 | 2 | 0 |} Another example: {| class="wikitable" style="background-color:white; font-family:monospace; text-align:right" !<code>i</code> | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |- !<code>W[i]</code> | A | B | A | C | A | B | A | B | C | |- !<code>T[i]</code> | -1 | 0 | -1 | 1 | -1 | 0 | -1 | 3 | 2 | 0 |} Another example (slightly changed from the previous example): {| class="wikitable" style="background-color:white; font-family:monospace; text-align:right" !<code>i</code> | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |- !<code>W[i]</code> | A | B | A | C | A | B | A | B | A | |- !<code>T[i]</code> | -1 | 0 | -1 | 1 | -1 | 0 | -1 | 3 | -1 | 3 |} Another more complicated example: {| class="wikitable" style="background-color:white; font-family:monospace; text-align:right" !<code>i</code> | 00 | 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 |- !<code>W[i]</code> | P | A | R | T | I | C | I | P | A | T | E | | I | N | | P | A | R | A | C | H | U | T | E | |- !<code>T[i]</code> | -1 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | -1 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 |}
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)