Boyer–Moore string-search algorithm
==Description==
The Boyer–Moore algorithm searches for occurrences of {{mvar|P}} in {{mvar|T}} by performing explicit character comparisons at different alignments. Instead of a [[brute-force search]] of all alignments (of which there are {{tmath|n - m + 1}}), Boyer–Moore uses information gained by preprocessing {{mvar|P}} to skip as many alignments as possible.

Before the introduction of this algorithm, the usual way to search within text was to examine each character of the text for the first character of the pattern. Once that was found, the subsequent characters of the text would be compared to the characters of the pattern. If no match occurred, the text would again be checked character by character in an effort to find a match. Thus almost every character in the text needs to be examined.

The key insight in this algorithm is that if the end of the pattern is compared to the text, then jumps along the text can be made rather than checking every character of the text. This works because, when the pattern is lined up against the text, the last character of the pattern is compared to the corresponding character in the text. If the characters do not match, there is no need to continue searching backwards along the text. If the character in the text does not match any of the characters in the pattern, then the next character in the text to check is located {{mvar|m}} characters farther along the text, where {{mvar|m}} is the length of the pattern. If the character in the text ''is'' in the pattern, then a partial shift of the pattern along the text is done to line up along the matching character, and the process is repeated. Jumping along the text to make comparisons, rather than checking every character in the text, decreases the number of comparisons that have to be made, which is the key to the efficiency of the algorithm.

More formally, the algorithm begins at alignment {{tmath|1= k = m}}, so the start of {{mvar|P}} is aligned with the start of {{mvar|T}}. Characters in {{mvar|P}} and {{mvar|T}} are then compared starting at index {{mvar|m}} in {{mvar|P}} and {{mvar|k}} in {{mvar|T}}, moving backward, so that the strings are matched from the end of {{mvar|P}} to the start of {{mvar|P}}. The comparisons continue until either the beginning of {{mvar|P}} is reached (which means there is a match) or a mismatch occurs, upon which the alignment is shifted forward (to the right) by the maximum value permitted by a number of rules. The comparisons are performed again at the new alignment, and the process repeats until the alignment is shifted past the end of {{mvar|T}}, which means no further matches will be found. The shift rules are implemented as constant-time table lookups, using tables generated during the preprocessing of {{mvar|P}}.
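
The loop described above can be illustrated with a minimal Python sketch. It is a simplification, not the complete algorithm: it applies only one shift rule (a basic form of the bad-character rule), whereas the full algorithm takes the maximum shift over several rules. The names <code>bad_character_table</code> and <code>search</code> are illustrative and are not taken from this article. The variable <code>k</code> plays the role of the alignment {{mvar|k}}, with the pattern occupying <code>text[k - m:k]</code>.

```python
def bad_character_table(pattern):
    """Preprocess the pattern: map each character to the 0-based index
    of its rightmost occurrence (later occurrences overwrite earlier ones)."""
    return {ch: i for i, ch in enumerate(pattern)}


def search(pattern, text):
    """Return the 0-based start index of every occurrence of pattern in text.

    Simplified sketch: only a basic bad-character shift is used, so this
    skips fewer alignments than the complete algorithm would.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    last = bad_character_table(pattern)
    matches = []
    k = m  # alignment: the pattern currently occupies text[k - m:k]
    while k <= n:
        i = m - 1  # position in the pattern, starting at its end
        j = k - 1  # matching position in the text
        # Compare backward, from the end of the pattern toward its start.
        while i >= 0 and pattern[i] == text[j]:
            i -= 1
            j -= 1
        if i < 0:
            # The start of the pattern was reached, so this is a match.
            matches.append(k - m)
            k += 1  # conservative shift after a match (an assumption of this sketch)
        else:
            # Mismatch on text[j]: line up its rightmost occurrence in the
            # pattern with text[j], or move past it if it never occurs.
            shift = i - last.get(text[j], -1)
            k += max(1, shift)  # never shift backward
    return matches
```

For example, <code>search("ana", "banana")</code> returns <code>[1, 3]</code>. When the last character of the pattern is compared against a text character that does not occur in the pattern at all, the computed shift is {{mvar|m}}, which matches the jump of {{mvar|m}} characters described above.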