Knuth–Morris–Pratt algorithm
===Efficiency of the search algorithm===
Assuming the prior existence of the table <code>T</code>, the search portion of the Knuth–Morris–Pratt algorithm has [[Computational complexity theory|complexity]] [[Linear time#Linear time|''O''(''n'')]], where ''n'' is the length of <code>S</code> and ''O'' denotes [[big-O notation]]. Except for the fixed overhead incurred in entering and exiting the function, all the computations are performed in the <code>'''while'''</code> loop. To bound the number of iterations of this loop, observe that <code>T</code> is constructed so that if a match which had begun at <code>S[m]</code> fails while comparing <code>S[m + i]</code> to <code>W[i]</code>, then the next possible match must begin at <code>S[m + (i - T[i])]</code>. In particular, the next possible match must occur at a higher index than <code>m</code>, so that <code>T[i] < i</code>. This fact implies that the loop can execute at most 2''n'' times, since at each iteration it executes one of the two branches in the loop. The first branch invariably increases <code>i</code> and does not change <code>m</code>, so that the index <code>m + i</code> of the currently scrutinized character of <code>S</code> is increased. The second branch adds <code>i - T[i]</code> to <code>m</code>, and as we have seen, this is always a positive number. Thus the location <code>m</code> of the beginning of the current potential match is increased. At the same time, the second branch leaves <code>m + i</code> unchanged, because <code>m</code> gets <code>i - T[i]</code> added to it and immediately afterwards <code>T[i]</code> is assigned as the new value of <code>i</code>; hence <code>new_m + new_i = old_m + old_i - T[old_i] + T[old_i] = old_m + old_i</code>.
Now, the loop ends if <code>m + i</code> = ''n''; therefore, each branch of the loop can be reached at most ''n'' times, since the branches respectively increase either <code>m + i</code> or <code>m</code>, and <code>m ≤ m + i</code>: if <code>m</code> = ''n'', then certainly <code>m + i</code> ≥ ''n'', and since <code>m + i</code> increases by at most one per iteration, we must have had <code>m + i</code> = ''n'' at some earlier point, at which the loop would already have ended. Thus the loop executes at most 2''n'' times, showing that the time complexity of the search algorithm is ''O''(''n'').

Here is another way to think about the runtime: suppose we begin to match <code>W</code> against <code>S</code> at word position <code>i</code> and text position <code>p</code>. If <code>W</code> occurs as a substring of <code>S</code> at <code>p</code>, then <code>W[0..m] = S[p..p+m]</code>, where <code>m</code> here denotes the last index of <code>W</code>. Upon success, that is, when the word and the text match at the current positions (<code>W[i] = S[p+i]</code>), we increase <code>i</code> by 1. Upon failure, that is, when the word and the text do not match at the current positions (<code>W[i] ≠ S[p+i]</code>), the text pointer is kept still while the word pointer is rolled back by a certain amount (<code>i = T[i]</code>, where <code>T</code> is the jump table), and we attempt to match <code>W[T[i]]</code> against the same text character. The total roll-back of <code>i</code> is bounded by <code>i</code>; that is to say, for any failure, we can only roll back as much as we have progressed up to the failure. Since <code>i</code> is increased at most ''n'' times in total, there are at most ''n'' roll-backs as well, so the search performs at most 2''n'' comparisons.
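
The 2''n'' bound can be checked empirically with a short Python sketch. It is an illustration and not the article's reference pseudocode. It assumes the convention <code>T[0] = -1</code> and uses the hypothetical helper names <code>build_table</code> and <code>kmp_search_counted</code>. The search loop keeps the variables <code>m</code> and <code>i</code> from the argument above and counts how many times the loop body runs.

```python
def build_table(W):
    """Partial-match table for a non-empty W, assuming the T[0] = -1 convention."""
    T = [0] * len(W)
    T[0] = -1
    pos, cnd = 2, 0  # pos: entry being computed; cnd: length of current candidate prefix
    while pos < len(W):
        if W[pos - 1] == W[cnd]:
            cnd += 1
            T[pos] = cnd
            pos += 1
        elif cnd > 0:
            cnd = T[cnd]  # fall back to a shorter candidate prefix
        else:
            T[pos] = 0
            pos += 1
    return T


def kmp_search_counted(S, W):
    """Return (index of first match or -1, number of loop iterations)."""
    if not W:
        raise ValueError("W must be non-empty")
    T = build_table(W)
    m = i = 0        # m: start of the current candidate match; i: position in W
    iterations = 0
    while m + i < len(S):
        iterations += 1
        if W[i] == S[m + i]:
            # First branch: m + i increases, m is unchanged.
            if i == len(W) - 1:
                return m, iterations
            i += 1
        elif T[i] > -1:
            # Second branch: m increases by i - T[i] > 0, m + i is unchanged.
            m, i = m + i - T[i], T[i]
        else:
            # Mismatch at i = 0 (T[0] = -1): move the candidate start by one.
            m, i = m + 1, 0
    return -1, iterations


S = "ABC ABCDAB ABCDABCDABDE"
W = "ABCDABD"
index, iterations = kmp_search_counted(S, W)
print(index, iterations, 2 * len(S))
```

For any input, the printed iteration count should not exceed twice the length of <code>S</code>, because every pass through the loop increases <code>m</code> or <code>m + i</code>, and both stay below the length of <code>S</code> while the loop runs.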