Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Aho–Corasick algorithm
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
==Example== In this example, we will consider a dictionary consisting of the following words: {a, ab, bab, bc, bca, c, caa}. The graph below is the Aho–Corasick data structure constructed from the specified dictionary, with each row in the table representing a node in the trie, with the column path indicating the (unique) sequence of characters from the root to the node. The data structure has one node for every prefix of every string in the dictionary. So if (bca) is in the dictionary, then there will be nodes for (bca), (bc), (b), and (). If a node is in the dictionary then it is a blue node. Otherwise it is a grey node. There is a black directed "child" arc from each node to a node whose name is found by appending one character. So there is a black arc from (bc) to (bca). There is a blue directed "suffix" arc from each node to the node that is the longest possible strict suffix of it in the graph. For example, for node (caa), its strict suffixes are (aa) and (a) and (). The longest of these that exists in the graph is (a). So there is a blue arc from (caa) to (a). The blue arcs can be computed in linear time by performing a [[breadth-first search]] [potential suffix node will always be at lower level] starting from the root. The target for the blue arc of a visited node can be found by following its parent's blue arc to its longest suffix node and searching for a child of the suffix node whose character matches that of the visited node. If the character does not exist as a child, we can find the next longest suffix (following the blue arc again) and then search for the character. We can do this until we either find the character (as child of a node) or we reach the root (which will always be a suffix of every string). There is a green "dictionary suffix" arc from each node to the next node in the dictionary that can be reached by following blue arcs. For example, there is a green arc from (bca) to (a) because (a) is the first node in the dictionary (i.e. a blue node) that is reached when following the blue arcs to (ca) and then on to (a). The green arcs can be computed in linear time by repeatedly traversing blue arcs until a blue node is found, and [[memoization|memoizing]] this information. [[File:Ahocorasick.svg|thumb|left|A visualization of the trie for the dictionary on the right. Suffix links are in blue; dictionary suffix links in green. Nodes corresponding to dictionary entries are highlighted in blue.]] {| class="wikitable" |+ Dictionary {a, ab, bab, bc, bca, c, caa} |- ! Path ! In dictionary ! Suffix link ! Dict suffix link |- | () || – || || |- | (a) || + || () || |- | (ab) || + || (b) || |- | (b) || – || () || |- | (ba) || – || (a) || (a) |- | (bab) || + || (ab) || (ab) |- | (bc) || + || (c) || (c) |- | (bca) || + || (ca) || (a) |- | (c) || + || () || |- | (ca) || – || (a) || (a) |- | (caa) || + || (a) || (a) |} {{clear}} At each step, the current node is extended by finding its child, and if that doesn't exist, finding its suffix's child, and if that doesn't work, finding its suffix's suffix's child, and so on, finally ending in the root node if nothing's seen before. When the algorithm reaches a node, it outputs all the dictionary entries that end at the current character position in the input text. This is done by printing every node reached by following the dictionary suffix links, starting from that node, and continuing until it reaches a node with no dictionary suffix link. In addition, the node itself is printed, if it is a dictionary entry. Execution on input string {{mono|abccab}} yields the following steps: {| class="wikitable" |+ Analysis of input string {{mono|abccab}} |- ! Node ! Remaining string ! Output:end position ! Transition ! Output |- | () || abccab || || start at root || |- | (a) || bccab || a:1 || () to child (a) || Current node |- | (ab) || ccab || ab:2 | (a) to child (ab) || Current node |- | (bc) || cab || bc:3, c:3 | (ab) to suffix (b) to child (bc) || Current Node, Dict suffix node |- | (c) || ab || c:4 | (bc) to suffix (c) to suffix () to child (c) | Current node |- | (ca) || b || a:5 | (c) to child (ca) || Dict suffix node |- | (ab) || || ab:6 | (ca) to suffix (a) to child (ab) || Current node |}
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)