Editing Suffix tree (section)

==Functionality==

A suffix tree for a string <math>S</math> of length <math>n</math> can be built in <math>\Theta(n)</math> time, if the letters come from an alphabet of integers in a polynomial range (in particular, this is true for constant-sized alphabets).{{sfnp|Farach|1997}}
For larger alphabets, the running time is dominated by first [[Sorting algorithm|sorting]] the letters to bring them into a range of size <math>O(n)</math>; in general, this takes <math>O(n\log n)</math> time.
The costs below are given under the assumption that the alphabet is constant.

Assume that a suffix tree has been built for the string <math>S</math> of length <math>n</math>, or that a [[generalised suffix tree]] has been built for the set of strings <math>D=\{S_1,S_2,\dots,S_K\}</math> of total length <math>n=n_1+n_2+\cdots+n_K</math>.
You can:

* Search for strings:
** Check if a string <math>P</math> of length <math>m</math> is a substring in <math>O(m)</math> time.<ref>{{harvtxt|Gusfield|1999}}, p.92.</ref>
** Find the first occurrence of the patterns <math>P_1,\dots,P_q</math> of total length <math>m</math> as substrings in <math>O(m)</math> time.
** Find all <math>z</math> occurrences of the patterns <math>P_1,\dots,P_q</math> of total length <math>m</math> as substrings in <math>O(m + z)</math> time.<ref>{{harvtxt|Gusfield|1999}}, p.123.</ref>
<!-- ** Search for a pattern <math>P</math> of length <math>m</math> with at most <math>k</math> mismatches in <math>O(m \sum_{r=0}^k * {m \choose r} (|\Sigma| - 1)^k + z)</math> time.-->
** Search for a [[regular expression]] ''P'' in time expected [[Sublinear time algorithm|sublinear]] in <math>n</math>.{{sfnp|Baeza-Yates|Gonnet|1996}}
** Find for each suffix of a pattern <math>P</math>, the length of the longest match between a prefix of <math>P[i\dots m]</math> and a substring in <math>D</math> in <math>\Theta(m)</math> time.<ref>{{harvtxt|Gusfield|1999}}, p.132.</ref> This is termed the '''matching statistics''' for <math>P</math>.
* Find properties of the strings:
** Find the [[longest common substring problem|longest common substrings]] of the string <math>S_i</math> and <math>S_j</math> in <math>\Theta(n_i + n_j)</math> time.<ref>{{harvtxt|Gusfield|1999}}, p.125.</ref>
** Find all [[maximal pair]]s, maximal repeats or supermaximal repeats in <math>\Theta(n + z)</math> time.<ref>{{harvtxt|Gusfield|1999}}, p.144.</ref>
** Find the [[Lempel–Ziv]] decomposition in <math>\Theta(n)</math> time.<ref>{{harvtxt|Gusfield|1999}}, p.166.</ref>
** Find the [[longest repeated substring problem|longest repeated substrings]] in <math>\Theta(n)</math> time.
** Find the most frequently occurring substrings of a minimum length in <math>\Theta(n)</math> time.
** Find the shortest strings from <math>\Sigma</math> that do not occur in <math>D</math>, in <math>O(n + z)</math> time, if there are <math>z</math> such strings.
** Find the shortest substrings occurring only once in <math>\Theta(n)</math> time.
** Find, for each <math>i</math>, the shortest substrings of <math>S_i</math> not occurring elsewhere in <math>D</math> in <math>\Theta(n)</math> time.

The suffix tree can be prepared for constant time [[lowest common ancestor]] retrieval between nodes in <math>\Theta(n)</math> time.<ref>{{harvtxt|Gusfield|1999}}, Chapter 8.</ref> One can then also:
* Find the longest common prefix between the suffixes <math>S_i[p..n_i]</math> and <math>S_j[q..n_j]</math> in <math>\Theta(1)</math>.<ref>{{harvtxt|Gusfield|1999}}, p.196.</ref>
* Search for a pattern ''P'' of length ''m'' with at most ''k'' mismatches in <math>O(k n + z)</math> time, where ''z'' is the number of hits.<ref>{{harvtxt|Gusfield|1999}}, p.200.</ref>
* Find all <math>z</math> maximal [[palindrome]]s in <math>\Theta(n)</math>,<ref>{{harvtxt|Gusfield|1999}}, p.198.</ref> or <math>\Theta(g n)</math> time if gaps of length <math>g</math> are allowed, or <math>\Theta(k n)</math> if <math>k</math> mismatches are allowed.<ref>{{harvtxt|Gusfield|1999}}, p.201.</ref>
* Find all <math>z</math> [[tandem repeats]] in <math>O(n \log n + z)</math>, and ''k''-mismatch tandem repeats in <math>O(k n \log (n/k) + z)</math>.<ref>{{harvtxt|Gusfield|1999}}, p.204.</ref>
* Find the [[longest common substring problem|longest common substrings]] to at least <math>k</math> strings in <math>D</math> for <math>k=2,\dots,K</math> in <math>\Theta(n)</math> time.<ref>{{harvtxt|Gusfield|1999}}, p.205.</ref>
* Find the [[longest palindromic substring]] of a given string (using the generalized suffix tree of the string and its reverse) in linear time.<ref>{{harvtxt|Gusfield|1999}}, pp.197–199.</ref>