Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Suffix tree
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
==Functionality== A suffix tree for a string <math>S</math> of length <math>n</math> can be built in <math>\Theta(n)</math> time, if the letters come from an alphabet of integers in a polynomial range (in particular, this is true for constant-sized alphabets).{{sfnp|Farach|1997}} For larger alphabets, the running time is dominated by first [[Sorting algorithm|sorting]] the letters to bring them into a range of size <math>O(n)</math>; in general, this takes <math>O(n\log n)</math> time. The costs below are given under the assumption that the alphabet is constant. Assume that a suffix tree has been built for the string <math>S</math> of length <math>n</math>, or that a [[generalised suffix tree]] has been built for the set of strings <math>D=\{S_1,S_2,\dots,S_K\}</math> of total length <math>n=n_1+n_2+\cdots+n_K</math>. You can: * Search for strings: ** Check if a string <math>P</math> of length <math>m</math> is a substring in <math>O(m)</math> time.<ref>{{harvtxt|Gusfield|1999}}, p.92.</ref> ** Find the first occurrence of the patterns <math>P_1,\dots,P_q</math> of total length <math>m</math> as substrings in <math>O(m)</math> time. ** Find all <math>z</math> occurrences of the patterns <math>P_1,\dots,P_q</math> of total length <math>m</math> as substrings in <math>O(m + z)</math> time.<ref>{{harvtxt|Gusfield|1999}}, p.123.</ref> <!-- ** Search for a pattern <math>P</math> of length <math>m</math> with at most <math>k</math> mismatches in <math>O(m \sum_{r=0}^k * {m \choose r} (|\Sigma| - 1)^k + z)</math> time.--> ** Search for a [[regular expression]] ''P'' in time expected [[Sublinear time algorithm|sublinear]] in <math>n</math>.{{sfnp|Baeza-Yates|Gonnet|1996}} ** Find for each suffix of a pattern <math>P</math>, the length of the longest match between a prefix of <math>P[i\dots m]</math> and a substring in <math>D</math> in <math>\Theta(m)</math> time.<ref>{{harvtxt|Gusfield|1999}}, p.132.</ref> This is termed the '''matching statistics''' for <math>P</math>. * Find properties of the strings: ** Find the [[longest common substring problem|longest common substrings]] of the string <math>S_i</math> and <math>S_j</math> in <math>\Theta(n_i + n_j)</math> time.<ref>{{harvtxt|Gusfield|1999}}, p.125.</ref> ** Find all [[maximal pair]]s, maximal repeats or supermaximal repeats in <math>\Theta(n + z)</math> time.<ref>{{harvtxt|Gusfield|1999}}, p.144.</ref> ** Find the [[Lempel–Ziv]] decomposition in <math>\Theta(n)</math> time.<ref>{{harvtxt|Gusfield|1999}}, p.166.</ref> ** Find the [[longest repeated substring problem|longest repeated substrings]] in <math>\Theta(n)</math> time. ** Find the most frequently occurring substrings of a minimum length in <math>\Theta(n)</math> time. ** Find the shortest strings from <math>\Sigma</math> that do not occur in <math>D</math>, in <math>O(n + z)</math> time, if there are <math>z</math> such strings. ** Find the shortest substrings occurring only once in <math>\Theta(n)</math> time. ** Find, for each <math>i</math>, the shortest substrings of <math>S_i</math> not occurring elsewhere in <math>D</math> in <math>\Theta(n)</math> time. The suffix tree can be prepared for constant time [[lowest common ancestor]] retrieval between nodes in <math>\Theta(n)</math> time.<ref>{{harvtxt|Gusfield|1999}}, Chapter 8.</ref> One can then also: * Find the longest common prefix between the suffixes <math>S_i[p..n_i]</math> and <math>S_j[q..n_j]</math> in <math>\Theta(1)</math>.<ref>{{harvtxt|Gusfield|1999}}, p.196.</ref> * Search for a pattern ''P'' of length ''m'' with at most ''k'' mismatches in <math>O(k n + z)</math> time, where ''z'' is the number of hits.<ref>{{harvtxt|Gusfield|1999}}, p.200.</ref> * Find all <math>z</math> maximal [[palindrome]]s in <math>\Theta(n)</math>,<ref>{{harvtxt|Gusfield|1999}}, p.198.</ref> or <math>\Theta(g n)</math> time if gaps of length <math>g</math> are allowed, or <math>\Theta(k n)</math> if <math>k</math> mismatches are allowed.<ref>{{harvtxt|Gusfield|1999}}, p.201.</ref> * Find all <math>z</math> [[tandem repeats]] in <math>O(n \log n + z)</math>, and ''k''-mismatch tandem repeats in <math>O(k n \log (n/k) + z)</math>.<ref>{{harvtxt|Gusfield|1999}}, p.204.</ref> * Find the [[longest common substring problem|longest common substrings]] to at least <math>k</math> strings in <math>D</math> for <math>k=2,\dots,K</math> in <math>\Theta(n)</math> time.<ref>{{harvtxt|Gusfield|1999}}, p.205.</ref> * Find the [[longest palindromic substring]] of a given string (using the generalized suffix tree of the string and its reverse) in linear time.<ref>{{harvtxt|Gusfield|1999}}, pp.197–199.</ref>
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)