Boyer–Moore string-search algorithm
==Shift rules==
A shift is calculated by applying two rules: the bad-character rule and the good-suffix rule. The actual shifting offset is the maximum of the shifts calculated by these rules.

===The bad-character rule===
====Description====
{{float begin|side=right|width=340px}}
| -||-||-||-||X||-||-||K||-||-||-
|-
| A||N||P||A||<span style="color:#FF0000">N</span>||M||A||N||A||M||-
|-
| -||N||<span style="color:#0000FF">N</span>||A||A||M||A||N||-||-||-
|-
| -||-||-||N||<span style="color:#0000FF">N</span>||A||A||M||A||N||-
{{float end|Demonstration of bad-character rule with pattern '''P''' {{=}} '''NNAAMAN'''. There is a mismatch between '''N''' (in the input text) and '''A''' (in the pattern) in the column marked with an '''X'''. The pattern is shifted right (in this case by 2) so that the next occurrence of the character '''N''' in the pattern '''P''' to the left of the current character (which is the middle '''A''') lines up with the mismatched '''N''' in the text.}}
The bad-character rule considers the character in {{mvar|T}} at which the comparison process failed (assuming such a failure occurred). The next occurrence of that character to the left in {{mvar|P}} is found, and a shift which brings that occurrence in line with the mismatched occurrence in {{mvar|T}} is proposed. If the mismatched character does not occur to the left in {{mvar|P}}, a shift is proposed that moves the entirety of {{mvar|P}} past the point of mismatch.

====Preprocessing====
Methods vary on the exact form the table for the bad-character rule should take, but a simple constant-time lookup solution is as follows: create a 2D table which is indexed first by the index of the character {{mvar|c}} in the alphabet and second by the index {{mvar|i}} in the pattern. This lookup will return the occurrence of {{mvar|c}} in {{mvar|P}} with the highest index {{tmath|j < i}}, or -1 if there is no such occurrence.
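A minimal sketch of such a table in Python, assuming 0-based indices and an explicitly supplied alphabet. The name <code>build_bad_char_table</code> is illustrative and is not taken from the implementations elsewhere in this article.

```python
def build_bad_char_table(pattern, alphabet):
    """Return table[c][i]: the largest j < i with pattern[j] == c, or -1.

    Uses O(k*m) time and space for an alphabet of size k and a pattern
    of length m. Indices are 0-based (an assumption of this sketch).
    """
    table = {}
    for c in alphabet:
        row = [-1] * len(pattern)
        last = -1  # rightmost occurrence of c seen so far
        for i, ch in enumerate(pattern):
            row[i] = last  # occurrence strictly to the left of i
            if ch == c:
                last = i
        table[c] = row
    return table


# Figure example: pattern NNAAMAN, mismatch at the middle A (index 3)
# against the text character N.
table = build_bad_char_table("NNAAMAN", "AMNP")
j = table["N"][3]  # nearest N to the left of index 3 is at index 1
```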
The proposed shift will then be {{tmath|i - j}}, with {{tmath|O(1)}} lookup time and {{tmath|O(km)}} space, assuming a finite alphabet of size {{mvar|k}}.

The C and Java implementations below have an {{tmath|O(k)}} space complexity (make_delta1, makeCharTable). This is the same as the original delta1 and the [[Boyer–Moore–Horspool algorithm#Description|BMH bad-character table]]. This table maps a character at position {{tmath|i}} to a shift of {{tmath|\operatorname{len}(p) - 1 - i}}, with the last instance (the least shift amount) taking precedence. Characters that do not occur in the pattern are set to {{tmath|\operatorname{len}(p)}} as a sentinel value.

===The good-suffix rule===
====Description====
{{float begin|side=right|width=380px}}
| -||-||-||-||X||-||-||K||-||-||-||-||-
|-
| M||A||N||P||A||<span style="color:#0000FF">N</span>||<span style="color:#0000FF">A</span>||<span style="color:#0000FF">M</span>||A||N||A||P||-
|-
| A||<span style="color:#FF0000">N</span>||<span style="color:#FF0000">A</span>||<span style="color:#FF0000">M</span>||P||<span style="color:#0000FF">N</span>||<span style="color:#0000FF">A</span>||<span style="color:#0000FF">M</span>||-||-||-||-||-
|-
| -||-||-||-||A||<span style="color:#FF0000">N</span>||<span style="color:#FF0000">A</span>||<span style="color:#FF0000">M</span>||P||N||A||M||-
{{float end|Demonstration of good-suffix rule with pattern '''P''' {{=}} '''ANAMPNAM'''. Here, '''''t''''' is '''T'''[6..8] and '''''{{prime|t}}''''' is '''P'''[2..4].}}
The good-suffix rule is markedly more complex in both concept and implementation than the bad-character rule. Like the bad-character rule, it exploits the fact that the algorithm's comparisons begin at the end of the pattern and proceed towards its start.
It can be described as follows:<ref name = "ASTS"> {{Citation | last = Gusfield | first = Dan | title = Algorithms on Strings, Trees, and Sequences | publisher = Cambridge University Press | orig-date =1997 | year = 1999 | edition = 1 | chapter = Chapter 2 - Exact Matching: Classical Comparison-Based Methods | pages = 19–21 | isbn = 0-521-58519-8 }} </ref>
<blockquote>
Suppose for a given alignment of '''''P''''' and '''''T''''', a substring '''''t''''' of '''''T''''' matches a suffix of '''''P''''' and suppose '''''t''''' is the largest such substring for the given alignment.
# Then find, if it exists, the right-most copy '''''{{prime|t}}''''' of '''''t''''' in '''''P''''' such that '''''{{prime|t}}''''' is not a suffix of '''''P''''' and the character to the left of '''''{{prime|t}}''''' in '''''P''''' differs from the character to the left of '''''t''''' in '''''P'''''. Shift '''''P''''' to the right so that substring '''''{{prime|t}}''''' in '''''P''''' aligns with substring '''''t''''' in '''''T'''''.
# If '''''{{prime|t}}''''' does not exist, then shift the left end of '''''P''''' to the right by the least amount (past the left end of '''''t''''' in '''''T''''') so that a prefix of the shifted pattern matches a suffix of '''''t''''' in '''''T'''''. This includes cases where '''''t''''' is an exact match of '''''P'''''.
# If no such shift is possible, then shift '''''P''''' by '''m''' (length of P) places to the right.
</blockquote>

====Preprocessing====
The good-suffix rule requires two tables: one for use in the general case (where a copy '''''{{prime|t}}''''' is found), and another for use when the general case returns no meaningful result. These tables will be designated {{mvar|L}} and {{mvar|H}} respectively.
Their definitions are as follows:<ref name = "ASTS" />
<blockquote>
For each {{mvar|i}}, {{tmath|L[i]}} is the largest position less than {{mvar|m}} such that string {{tmath|P[i..m]}} matches a suffix of {{tmath|P[1..L[i]]}} and such that the character preceding that suffix is not equal to {{tmath|P[i-1]}}. {{tmath|L[i]}} is defined to be zero if there is no position satisfying the condition.
</blockquote>
<blockquote>
Let {{tmath|H[i]}} denote the length of the largest suffix of {{tmath|P[i..m]}} that is also a prefix of {{mvar|P}}, if one exists. If none exists, let {{tmath|H[i]}} be zero.
</blockquote>
Both of these tables are constructible in {{tmath|O(m)}} time and use {{tmath|O(m)}} space. The alignment shift for index {{mvar|i}} in {{mvar|P}} is given by {{tmath|m - L[i]}} or {{tmath|m - H[i]}}. {{mvar|H}} should only be used if {{tmath|L[i]}} is zero or a match has been found.

----
===Shift Example using pattern ANPANMAN===
{| class="wikitable"
! Index !! Mismatch !! Shift
|-
| 0 || N || 1
|-
| 1 || AN || 8
|-
| 2 || MAN || 3
|-
| 3 || NMAN || 6
|-
| 4 || ANMAN || 6
|-
| 5 || PANMAN || 6
|-
| 6 || NPANMAN || 6
|-
| 7 || ANPANMAN || 6
|}
Explanation: Index 0: no characters matched, and the character read was not an N. The good-suffix length is zero. Since plenty of letters in the pattern are also not N, there is minimal information here, and shifting by 1 is the least interesting result. Index 1: the N matched, and it was preceded by something other than A. Looking at the pattern from the end, where is there an N preceded by something other than A? There are two other N's, but both are preceded by A. That means no part of the good suffix can be useful, so the shift is the full pattern length, 8. Index 2: the AN matched, and it was preceded by something other than M. In the middle of the pattern there is an AN preceded by P, so it becomes the shift candidate. Shifting that AN to the right to line up with the match is a shift of 3.
Index 3 and up: the matched suffixes do not occur anywhere else in the pattern, but the trailing suffix AN matches the start of the pattern, so the shifts here are all 6.<ref>{{cite web |title=Constructing a Good Suffix Table - Understanding an example |url=https://stackoverflow.com/questions/27428605/constructing-a-good-suffix-table-understanding-an-example |website=Stack Overflow |access-date=30 July 2024 |language=en |date=11 December 2014}}{{Creative Commons text attribution notice|cc=bysa3|from this source=yes}}</ref>
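
The shift values in the table above can be cross-checked with a brute-force sketch written directly from the rule. It is not the {{tmath|O(m)}} preprocessing with {{mvar|L}} and {{mvar|H}}, and it assumes 0-based indices. The function name <code>good_suffix_shifts</code> is illustrative. For each number of matched characters, it tries every shift from 1 to {{mvar|m}} and keeps the smallest one for which the shifted pattern agrees with the matched suffix wherever they overlap and places a different character (or no character) on the mismatch position.

```python
def good_suffix_shifts(pattern):
    """shifts[k] = shift after k characters matched from the right, then a mismatch.

    Brute force (cubic in the worst case); for illustration only.
    """
    m = len(pattern)
    shifts = []
    for k in range(m):
        mismatch = m - 1 - k  # pattern index where the comparison failed
        for s in range(1, m + 1):
            # The matched suffix must agree with the pattern shifted
            # right by s wherever the two overlap.
            suffix_ok = all(
                pattern[p - s] == pattern[p]
                for p in range(mismatch + 1, m)
                if p - s >= 0
            )
            # The character landing on the mismatch position must differ,
            # or the shifted pattern must start to the right of it.
            differs = mismatch - s < 0 or pattern[mismatch - s] != pattern[mismatch]
            if suffix_ok and differs:
                shifts.append(s)  # s == m always qualifies, so this is reached
                break
    return shifts


print(good_suffix_shifts("ANPANMAN"))  # [1, 8, 3, 6, 6, 6, 6, 6]
```

The same function gives a shift of 4 for the pattern ANAMPNAM after three matched characters, which agrees with the good-suffix figure.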