Suffix tree
==External construction==
Though linear, the memory usage of a suffix tree is significantly higher than the actual size of the sequence collection. For a large text, construction may require external memory approaches.

There are theoretical results for constructing suffix trees in external memory. The algorithm by {{harvtxt|Farach-Colton|Ferragina|Muthukrishnan|2000}} is theoretically optimal, with an I/O complexity equal to that of sorting. However, the overall intricacy of this algorithm has so far prevented its practical implementation.{{sfnp|Smyth|2003}}

On the other hand, there have been practical works for constructing disk-based suffix trees which scale to inputs of a few gigabytes, with construction times of hours. The state-of-the-art methods are TDD,<ref name="tdd">{{harvtxt|Tata|Hankins|Patel|2003}}.</ref> TRELLIS,<ref name="trellis">{{harvtxt|Phoophakdee|Zaki|2007}}.</ref> DiGeST,<ref name="digest">{{harvtxt|Barsky|Stege|Thomo|Upton|2008}}.</ref> and B<sup>2</sup>ST.<ref name="b2st">{{harvtxt|Barsky|Stege|Thomo|Upton|2009}}.</ref> TDD and TRELLIS scale up to the entire human genome, resulting in a disk-based suffix tree with a size in the tens of gigabytes.<ref name="tdd" /><ref name="trellis" /> However, these methods cannot efficiently handle collections of sequences exceeding 3 GB.<ref name="digest" /> DiGeST performs significantly better and is able to handle collections of sequences on the order of 6 GB in about 6 hours.<ref name="digest" />

All these methods can efficiently build suffix trees for the case when the tree does not fit in main memory, but the input does. The most recent method, B<sup>2</sup>ST,<ref name="b2st" /> scales to handle inputs that do not fit in main memory. ERA is a recent parallel suffix tree construction method that is significantly faster. ERA can index the entire human genome in 19 minutes on an 8-core desktop computer with 16 GB RAM. On a simple Linux cluster with 16 nodes (4 GB RAM per node), ERA can index the entire human genome in less than 9 minutes.{{sfnp|Mansour|Allam|Skiadopoulos|Kalnis|2011}}
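
The case where the input fits in main memory but the tree does not can be illustrated with a small sketch. It does not reproduce any of the methods cited above. It only shows one generic idea, under simplifying assumptions: suffixes are grouped by their first <code>k</code> characters, and one sub-structure is built per group, so that only one group has to be held in memory at a time. The function names, the parameter <code>k</code>, and the use of an uncompressed trie in place of a real suffix tree are all assumptions made for illustration.

```python
from collections import defaultdict

LEAF = "$leaves"  # multi-character key, so it cannot clash with a one-character edge


def partition_suffixes(text, k):
    """Group suffix start positions by the first k characters of each suffix."""
    groups = defaultdict(list)
    for i in range(len(text)):
        groups[text[i:i + k]].append(i)
    return groups


def build_subtrie(text, positions):
    """Naive uncompressed trie over the suffixes starting at `positions`.

    A real suffix tree compresses unary paths; this stand-in is quadratic
    in the worst case and is only meant to keep the sketch short.
    """
    root = {}
    for start in positions:
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
        node.setdefault(LEAF, []).append(start)
    return root


def build_partitioned(text, k):
    """Yield (prefix, sub-trie) pairs one at a time.

    A disk-based variant could write each sub-trie out before building the
    next one; here they are simply yielded to the caller.
    """
    for prefix, positions in sorted(partition_suffixes(text, k).items()):
        yield prefix, build_subtrie(text, positions)


def collect(node):
    """Return every suffix start position stored at or below `node`."""
    found = list(node.get(LEAF, []))
    for key, child in node.items():
        if key != LEAF:
            found.extend(collect(child))
    return found


def find(subtries, pattern):
    """Return sorted start positions of `pattern`, visiting only the
    partitions whose prefix is compatible with the pattern."""
    hits = []
    for prefix, trie in subtries.items():
        n = min(len(prefix), len(pattern))
        if prefix[:n] != pattern[:n]:
            continue
        node = trie
        for ch in pattern:
            node = node.get(ch)
            if node is None:
                break
        else:
            hits.extend(collect(node))
    return sorted(hits)


# Example usage: keep all partitions in a dict for querying.
subtries = dict(build_partitioned("banana$", 2))
print(find(subtries, "ana"))
```

A larger <code>k</code> gives more, smaller partitions. This reduces the peak memory needed for any single partition, at the cost of more separate structures to store and to consult for patterns shorter than <code>k</code>.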