Van Emde Boas tree

Template:Short description {{ safesubst:#invoke:Unsubst||date=__DATE__ |$B= Template:Ambox Template:DMCA }} Template:Infobox data structure

A van Emde Boas tree ({{#invoke:IPA|main}}), also known as a vEB tree or van Emde Boas priority queue, is a tree data structure which implements an associative array with Template:Math-bit integer keys. It was invented by a team led by Dutch computer scientist Peter van Emde Boas in 1975.<ref>Peter van Emde Boas: Preserving order in a forest in less than logarithmic time (Proceedings of the 16th Annual Symposium on Foundations of Computer Science 10: 75-84, 1975)</ref> It performs all operations in Template:Math time (assuming that an <math>m</math> bit operation can be performed in constant time), or equivalently in <math>O(\log \log M)</math> time, where <math>M = 2^m</math> is the largest element that can be stored in the tree. The parameter <math>M</math> is not to be confused with the actual number of elements stored in the tree, by which the performance of other tree data-structures is often measured.

The standard vEB tree has inadequate space efficiency. For example, for storing 32-bit integers (i.e., when <math>m = 32</math>), it requires <math>M = 2^{32}</math> bits of storage. However, similar data structures with equally good time efficiency and space efficiency of <math>O(n)</math> (where <math>n</math> is the number of stored elements) exist, and vEB trees can be modified to require only <math>O(n \log M)</math> space.

Supported operationsEdit

A vEB supports the operations of an ordered associative array, which includes the usual associative array operations along with two more order operations, FindNext and FindPrevious:<ref>Gudmund Skovbjerg Frandsen: Dynamic algorithms: Course notes on van Emde Boas trees (PDF) (University of Aarhus, Department of Computer Science) Template:Dead link</ref>

Insert: insert a key/value pair with an Template:Math-bit key
Delete: remove the key/value pair with a given key
Lookup: find the value associated with a given key
FindNext: find the key/value pair with the smallest key which is greater than a given Template:Math
FindPrevious: find the key/value pair with the largest key which is smaller than a given Template:Math

A vEB tree also supports the operations Minimum and Maximum, which return the minimum and maximum element stored in the tree respectively.<ref> Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Third Edition. MIT Press, 2009. Template:ISBN. Chapter 20: The van Emde Boas tree, pp. 531–560. </ref> These both run in <math>O(1)</math> time, since the minimum and maximum element are stored as attributes in each tree.

FunctionEdit

File:VebDiagram.svg

An example van Emde Boas tree with dimension 5 and the root's aux structure after 1, 2, 3, 5, 8 and 10 have been inserted.

Let <math>\log_2 m = k</math> for some integer <math>k</math>. Define <math>M = 2^m</math>. A vEB tree Template:Mono over the universe <math>\{0, \ldots, M - 1\}</math> has a root node that stores an array Template:Mono of length <math>\sqrt{M}</math>. Template:Mono is a pointer to a vEB tree that is responsible for the values <math>\{i \sqrt{M}, \ldots, (i + 1) \sqrt{M} - 1\}</math>. Additionally, T stores two values Template:Mono and Template:Mono as well as an auxiliary vEB tree Template:Mono.

Data is stored in a vEB tree as follows: The smallest value currently in the tree is stored in Template:Mono and largest value is stored in Template:Mono. Note that Template:Mono is not stored anywhere else in the vEB tree, while Template:Mono is. If T is empty then we use the convention that Template:Mono and Template:Mono. Any other value <math>x</math> is stored in the subtree Template:Mono where <math>i = \lfloor x / \sqrt{M} \rfloor</math>. The auxiliary tree Template:Mono keeps track of which children are non-empty, so Template:Mono contains the value <math>j</math> if and only if Template:Mono is non-empty.

FindNextEdit

The operation Template:Mono that searches for the successor of an element x in a vEB tree proceeds as follows: If Template:Mono then the search is complete, and the answer is Template:Mono. If Template:Mono then the next element does not exist, return M. Otherwise, let <math>i = \lfloor x / \sqrt{M} \rfloor</math>. If Template:Mono, then the value being searched for is contained in Template:Mono, so the search proceeds recursively in Template:Mono. Otherwise, we search for the successor of the value i in Template:Mono. This gives us the index j of the first subtree that contains an element larger than x. The algorithm then returns Template:Mono. The element found on the children level needs to be composed with the high bits to form a complete next element.

function FindNext(T, x)
    if x < T.min then
        return T.min
    if x ≥ T.max then // no next element
        return M
    i = floor(x/<math>\sqrt{M}</math>)
    lo = x mod <math>\sqrt{M}</math>
    
    if lo < T.children[i].max then
        return (<math>\sqrt{M}</math> i) + FindNext(T.children[i], lo)
    j = FindNext(T.aux, i)
    return (<math>\sqrt{M}</math> j) + T.children[j].min
end

Note that, in any case, the algorithm performs <math>O(1)</math> work and then possibly recurses on a subtree over a universe of size <math>M^{1/2}</math> (an <math>m/2</math> bit universe). This gives a recurrence for the running time of <math>T(m) = T(m/2) + O(1)</math>, which resolves to <math>O(\log m) = O(\log \log M)</math>.

InsertEdit

The call Template:Mono that inserts a value Template:Mono into a vEB tree Template:Mono operates as follows:

If T is empty then we set Template:Mono and we are done.
Otherwise, if Template:Mono then we insert Template:Mono into the subtree Template:Mono responsible for Template:Mono and then set Template:Mono. If Template:Mono was previously empty, then we also insert Template:Mono into Template:Mono
Otherwise, if Template:Mono then we insert Template:Mono into the subtree Template:Mono responsible for Template:Mono and then set Template:Mono. If Template:Mono was previously empty, then we also insert Template:Mono into Template:Mono
Otherwise, Template:Mono so we insert Template:Mono into the subtree Template:Mono responsible for Template:Mono. If Template:Mono was previously empty, then we also insert Template:Mono into Template:Mono.

In code:

function Insert(T, x)
    if T.min == x || T.max == x then // x is already inserted
        return
    if T.min > T.max then // T is empty
        T.min = T.max = x;
        return
    if x < T.min then
        swap(x, T.min)
    if x > T.max then
        T.max = x
    i = floor(x / <math>\sqrt{M}</math>)
    lo = x mod <math>\sqrt{M}</math>
    Insert(T.children[i], lo)
    if T.children[i].min == T.children[i].max then
        Insert(T.aux, i)
end

The key to the efficiency of this procedure is that inserting an element into an empty vEB tree takes Template:Math time. So, even though the algorithm sometimes makes two recursive calls, this only occurs when the first recursive call was into an empty subtree. This gives the same running time recurrence of Template:Tmath as before.

DeleteEdit

Deletion from vEB trees is the trickiest of the operations. The call Template:Mono that deletes a value x from a vEB tree T operates as follows:

If Template:Mono then x is the only element stored in the tree and we set Template:Mono and Template:Mono to indicate that the tree is empty.
Otherwise, if Template:Mono then we need to find the second-smallest value y in the vEB tree, delete it from its current location, and set Template:Mono. The second-smallest value y is Template:Mono, so it can be found in Template:Math time. We delete y from the subtree that contains it.
If Template:Mono and Template:Mono then we delete x from the subtree Template:Mono that contains x.
If Template:Mono then we will need to find the second-largest value y in the vEB tree and set Template:Mono. We start by deleting x as in previous case. Then value y is either Template:Mono or Template:Mono, so it can be found in Template:Math time.
In any of the above cases, if we delete the last element x or y from any subtree Template:Mono then we also delete i from Template:Mono.

In code:

function Delete(T, x)
    if T.min == T.max == x then
        T.min = M
        T.max = −1
        return
    if x == T.min then
        hi = T.aux.min * <math>\sqrt{M}</math>
        j = T.aux.min
        T.min = x = hi + T.children[j].min
    i = floor(x / <math>\sqrt{M}</math>)
    lo = x mod <math>\sqrt{M}</math>
    Delete(T.children[i], lo)
    if T.children[i] is empty then
        Delete(T.aux, i)
    if x == T.max then
        if T.aux is empty then
            T.max = T.min
        else
            hi = T.aux.max * <math>\sqrt{M}</math>
            j = T.aux.max
            T.max = hi + T.children[j].max
end

Again, the efficiency of this procedure hinges on the fact that deleting from a vEB tree that contains only one element takes only constant time. In particular, the second Delete call only executes if x was the only element in Template:Mono prior to the deletion.

In practiceEdit

The assumption that Template:Math is an integer is unnecessary. The operations <math>x\sqrt{M}</math> and <math>x\bmod\sqrt{M}</math> can be replaced by taking only higher-order Template:Math and the lower-order Template:Math bits of Template:Mvar, respectively. On any existing machine, this is more efficient than division or remainder computations.

In practical implementations, especially on machines with shift-by-k and find first zero instructions, performance can further be improved by switching to a bit array once Template:Mvar equal to the word size (or a small multiple thereof) is reached. Since all operations on a single word are constant time, this does not affect the asymptotic performance, but it does avoid the majority of the pointer storage and several pointer dereferences, achieving a significant practical savings in time and space with this trick.

An optimization of vEB trees is to discard empty subtrees. This makes vEB trees quite compact when they contain many elements, because no subtrees are created until something needs to be added to them. Initially, each element added creates about Template:Math new trees containing about Template:Math pointers all together. As the tree grows, more and more subtrees are reused, especially the larger ones.

The implementation described above uses pointers and occupies a total space of Template:Math, proportional to the size of the key universe. This can be seen as follows. The recurrence is <math> S(M) = O( \sqrt{M}) + (\sqrt{M}+1) \cdot S(\sqrt{M}) </math>. Resolving that would lead to <math> S(M) \in (1 + \sqrt{M})^{\log \log M} + \log \log M \cdot O( \sqrt{M} )</math>. One can, fortunately, also show that Template:Math by induction.<ref>{{#invoke:citation/CS1|citation |CitationClass=web }}</ref>

Similar structuresEdit

The Template:Math space usage of vEB trees is an enormous overhead unless a large fraction of the universe of keys is being stored. This is one reason why vEB trees are not popular in practice. This limitation can be addressed by changing the array used to store children to another data structure. One possibility is to use only a fixed number of bits per level, which results in a trie. Alternatively, each array may be replaced by a hash table, reducing the space to Template:Math (where Template:Mvar is the number of elements stored in the data structure) at the expense of making the data structure randomized.

x-fast tries and the more complicated y-fast tries have comparable update and query times to vEB trees and use randomized hash tables to reduce the space used. x-fast tries use Template:Math space while y-fast tries use Template:Math space.

Fusion trees are another type of tree data structure that implements an associative array on w-bit integers on a finite universe. They use word-level parallelism and bit manipulation techniques to achieve Template:Math time for predecessor/successor queries and updates, where Template:Math is the word size.<ref>{{#invoke:citation/CS1|citation |CitationClass=web }}</ref> Fusion trees use Template:Math space and can be made dynamic with hashing or exponential trees.

ImplementationsEdit

There is a verified implementation in Isabelle (proof assistant).<ref>{{#invoke:citation/CS1|citation |CitationClass=web }}</ref> Both functional correctness and time bounds are proved. Efficient imperative Standard ML code can be generated.

ReferencesEdit

Template:Reflist