Suffix tree
==Implementation==
If each node and edge can be represented in <math>\Theta(1)</math> space, the entire tree can be represented in <math>\Theta(n)</math> space. The total length of all the strings on all of the edges in the tree is <math>O(n^2)</math>, but each edge can be stored as the position and length of a substring of {{mvar|S}}, giving a total space usage of <math>\Theta(n)</math> computer words. The worst-case space usage of a suffix tree is seen with a [[Fibonacci word]], giving the full <math>2n</math> nodes.

An important choice when making a suffix tree implementation is how to represent the parent-child relationships between nodes. The most common is using [[linked list]]s called '''sibling lists'''. Each node has a pointer to its first child, and to the next node in the child list it is a part of. Other implementations with efficient running time properties use [[hash map]]s, sorted or unsorted [[array data structure|array]]s (with [[dynamic array|array doubling]]), or [[Self-balancing binary search tree|balanced search tree]]s. We are interested in:
* The cost of finding the child on a given character.
* The cost of inserting a child.
* The cost of enumerating all children of a node (divided by the number of children in the table below).

Let {{mvar|σ}} be the size of the alphabet. Then the costs are as follows:{{citation needed|date=October 2024}}

{| class="wikitable"
!
!Lookup
!Insertion
!Traversal
|-
| style="text-align: right;" |Sibling lists / unsorted arrays
|{{math|''O''(''σ'')}}
|{{math|Θ(1)}}
|{{math|Θ(1)}}
|-
| style="text-align: right;" |Bitwise sibling trees
|{{math|''O''(log ''σ'')}}
|{{math|Θ(1)}}
|{{math|Θ(1)}}
|-
| style="text-align: right;" |Hash maps
|{{math|Θ(1)}}
|{{math|Θ(1)}}
|{{math|''O''(''σ'')}}
|-
| style="text-align: right;" |Balanced search tree
|{{math|''O''(log ''σ'')}}
|{{math|''O''(log ''σ'')}}
|{{math|''O''(1)}}
|-
| style="text-align: right;" |Sorted arrays
|{{math|''O''(log ''σ'')}}
|{{math|''O''(''σ'')}}
|{{math|''O''(1)}}
|-
| style="text-align: right;" |Hash maps + sibling lists
|{{math|''O''(1)}}
|{{math|''O''(1)}}
|{{math|''O''(1)}}
|}

The insertion costs are amortised, and the costs for hashing are given for perfect hashing.

The large amount of information in each edge and node makes the suffix tree very expensive, consuming about 10 to 20 times the memory size of the source text in good implementations. The [[suffix array]] reduces this requirement to a factor of 8 (for an array including [[LCP array|LCP]] values, built within a 32-bit address space with 8-bit characters). This factor depends on the character width and the index size: with 4-byte indices, each character costs 4 bytes for its suffix array entry and 4 bytes for its LCP value, so the factor may reach 2 when 4-byte wide characters are used (needed to contain any symbol in some [[UNIX-like]] systems, see [[wchar_t]]<!--- Don't replace by "wchar t": the type name in the C programming language uses the "_". --->) on 32-bit systems.{{citation needed|date=October 2024}} Researchers have continued to find smaller indexing structures.
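
The sibling-list representation and the position-and-length edge encoding described above can be illustrated with a minimal Python sketch. It is not taken from any particular implementation: the names <code>Node</code>, <code>find_child</code>, <code>add_child</code>, <code>children</code>, <code>build_naive</code>, <code>contains</code> and <code>count_nodes</code> are hypothetical, the terminator <code>$</code> is assumed not to occur in the input, and the construction is the naive quadratic-time insertion of every suffix rather than a linear-time algorithm.

```python
class Node:
    """Tree node; the edge into it is text[start:start + length] (Theta(1) words)."""
    __slots__ = ("start", "length", "first_child", "next_sibling")

    def __init__(self, start=0, length=0):
        self.start = start
        self.length = length
        self.first_child = None   # head of this node's sibling list
        self.next_sibling = None  # next node in the parent's sibling list


def find_child(text, node, ch):
    """Lookup: walk the sibling list, O(sigma) in the worst case."""
    child = node.first_child
    while child is not None:
        if text[child.start] == ch:
            return child
        child = child.next_sibling
    return None


def add_child(node, child):
    """Insertion: prepend to the sibling list, Theta(1)."""
    child.next_sibling = node.first_child
    node.first_child = child


def children(node):
    """Traversal: Theta(1) per child visited."""
    child = node.first_child
    while child is not None:
        yield child
        child = child.next_sibling


def build_naive(s):
    """Insert each suffix of s + '$' in turn (quadratic time, illustration only)."""
    assert "$" not in s, "assumption: '$' is reserved as the unique terminator"
    text = s + "$"
    n = len(text)
    root = Node()
    for i in range(n):
        node, pos = root, i
        while True:
            child = find_child(text, node, text[pos])
            if child is None:
                add_child(node, Node(pos, n - pos))  # new leaf edge
                break
            k = 0  # number of characters matched along the edge
            while k < child.length and text[child.start + k] == text[pos + k]:
                k += 1
            if k == child.length:  # whole edge matched, descend
                node, pos = child, pos + k
                continue
            # Split the edge after k characters; `child` keeps the upper part,
            # so its place in the parent's sibling list is unchanged.
            lower = Node(child.start + k, child.length - k)
            lower.first_child = child.first_child
            child.length = k
            child.first_child = lower
            add_child(child, Node(pos + k, n - pos - k))
            break
    return text, root


def contains(text, root, pattern):
    """Return True if pattern occurs as a substring of the indexed text."""
    node, i = root, 0
    while i < len(pattern):
        child = find_child(text, node, pattern[i])
        if child is None:
            return False
        for k in range(child.length):
            if i == len(pattern):
                return True
            if text[child.start + k] != pattern[i]:
                return False
            i += 1
        node = child
    return True


def count_nodes(node):
    """Count the nodes in the subtree rooted at node (including node)."""
    return 1 + sum(count_nodes(c) for c in children(node))
```

For example, <code>text, root = build_naive("banana")</code> followed by <code>contains(text, root, "nan")</code> searches the tree by repeated sibling-list lookups. Because every node stores only two integers and two pointers, the whole structure uses a number of words linear in the length of the text, in line with the bound stated above.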