Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Aho–Corasick algorithm
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
{{Short description|String-searching algorithm}} {{Infobox algorithm |name = Aho–Corasick Algorithm |class = String Searching, String Matching |image = A diagram of the Aho-Corasick string search algorithm.svg |caption = A diagram of an Aho—Corasick automaton |data = [[Finite-state machine]] of strings |time = <!-- Worst time big-O notation --> |best-time = |average-time = |space = <!-- Worst-case space complexity; auxiliary space (excluding input) if not specified --> }} In [[computer science]], the '''Aho–Corasick algorithm''' is a [[string-searching algorithm]] invented by [[Alfred V. Aho]] and Margaret J. Corasick in 1975.<ref>{{cite journal |first1=Alfred V. |last1=Aho |authorlink=Alfred Aho |first2=Margaret J. |last2=Corasick |date=June 1975 |title=Efficient string matching: An aid to bibliographic search |journal=Communications of the ACM |volume=18 |issue=6 |pages=333–340 |doi=10.1145/360825.360855 |mr=0371172|s2cid=207735784 |doi-access=free }}</ref> It is a kind of dictionary-matching [[algorithm]] that locates elements of a finite set of strings (the "dictionary") within an input text. It matches all strings simultaneously. The [[time complexity|complexity]] of the algorithm is linear in the length of the strings plus the length of the searched text plus the number of output matches. Because all matches are found, multiple matches will be returned for one string location if multiple strings from the dictionary match at that location (e.g. dictionary = {{mono|a}}, {{mono|aa}}, {{mono|aaa}}, {{mono|aaaa}} and input string is {{mono|aaaa}}). Informally, the algorithm constructs a [[finite-state machine]] that resembles a [[trie]] with additional links between the various internal nodes. These extra internal links allow fast transitions between failed string matches (e.g. a search for {{mono|cart}} in a trie that does not contain {{mono|cart}}, but contains {{mono|art}}, and thus would fail at the node prefixed by {{mono|car}}), to other branches of the trie that share a common suffix (e.g., in the previous case, a branch for {{mono|attribute}} might be the best lateral transition). This allows the automaton to transition between string matches without the need for backtracking. When the string dictionary is known in advance (e.g. a [[computer virus]] database), the construction of the automaton can be performed once off-line and the compiled automaton stored for later use. In this case, its run time is linear in the length of the input plus the number of matched entries. The Aho—Corasick string-matching algorithm formed the basis of the original [[List of Unix commands|Unix command]] [[grep#Implementations|fgrep]].
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)