Editing B-tree (section)

==Algorithms==
{{mi|
{{confusing|reason=the discussion below uses "element", "value", "key", "separator", and "separation value" to mean essentially the same thing. The terms are not clearly defined. There are some subtle issues at the root and leaves|date=February 2012}}
{{copyedit|date=December 2024}}
}}

===Search===
Searching is similar to searching a binary search tree. Starting at the root, the tree is recursively traversed from top to bottom. At each level, the search reduces its field of view to the child pointer (subtree) whose range includes the search value. A subtree's range is defined by the values, or keys, contained in its parent node. These limiting values are also known as separation values.

[[Binary search algorithm|Binary search]] is typically (but not necessarily) used within nodes to find the separation values and child tree of interest.

===Insertion===
[[Image:B tree insertion example.png|thumb|A B-tree insertion example with each iteration. The nodes of this B-tree have at most 3 children (Knuth order 3).]]

All insertions start at a leaf node. To insert a new element, search the tree to find the leaf node where the new element should be added. Insert the new element into that node with the following steps:

# If the node contains fewer than the maximum allowed number of elements, then there is room for the new element. Insert the new element in the node, keeping the node's elements ordered.
# Otherwise the node is full, evenly split it into two nodes so:
## A single median is chosen from among the leaf's elements and the new element that is being inserted.
## Values less than the median are put in the new left node, and values greater than the median are put in the new right node, with the median acting as a separation value.
## The separation value is inserted in the node's parent, which may cause it to be split, and so on. If the node has no parent (i.e., the node was the root), create a new root above this node (increasing the height of the tree).

If the splitting goes all the way up to the root, it creates a new root with a single separator value and two children, which is why the lower bound on the size of internal nodes does not apply to the root. The maximum number of elements per node is ''U''−1. When a node is split, one element moves to the parent, but one element is added. So, it must be possible to divide the maximum number ''U''−1 of elements into two legal nodes. If this number is odd, then ''U''=2''L'' and one of the new nodes contains (''U''−2)/2 = ''L''−1 elements, and hence is a legal node, and the other contains one more element, and hence it is legal too. If ''U''−1 is even, then ''U''=2''L''−1, so there are 2''L''−2 elements in the node. Half of this number is ''L''−1, which is the minimum number of elements allowed per node.

An alternative algorithm supports a single pass down the tree from the root to the node where the insertion will take place, splitting any full nodes encountered on the way pre-emptively. This prevents the need to recall the parent nodes into memory, which may be expensive if the nodes are on secondary storage. However, to use this algorithm, we must be able to send one element to the parent and split the remaining ''U''−2 elements into two legal nodes, without adding a new element. This requires ''U'' = 2''L'' rather than ''U'' = 2''L''−1, which accounts for why some{{which|date=September 2019}} textbooks impose this requirement in defining B-trees.

===Deletion===
There are two popular strategies for deletion from a B-tree.

# Locate and delete the item, then restructure the tree to retain its invariants, '''OR'''
# Do a single pass down the tree, but before entering (visiting) a node, restructure the tree so that once the key to be deleted is encountered, it can be deleted without triggering the need for any further restructuring

The algorithm below uses the former strategy.

There are two special cases to consider when deleting an element:

# The element in an internal node is a separator for its child nodes
# Deleting an element may put its node under the minimum number of elements and children

The procedures for these cases are in order below.

====Deletion from a leaf node====
# Search for the value to delete.
# If the value is in a leaf node, simply delete it from the node.
# If underflow happens, rebalance the tree as described in section "Rebalancing after deletion" below.

====Deletion from an internal node====
Each element in an internal node acts as a separation value for two subtrees, therefore we need to find a replacement for separation. Note that the largest element in the left subtree is still less than the separator. Likewise, the smallest element in the right subtree is still greater than the separator. Both of those elements are in leaf nodes, and either one can be the new separator for the two subtrees. Algorithmically described below:

# Choose a new separator (either the largest element in the left subtree or the smallest element in the right subtree), remove it from the leaf node it is in, and replace the element to be deleted with the new separator.
# The previous step deleted an element (the new separator) from a leaf node. If that leaf node is now deficient (has fewer than the required number of nodes), then rebalance the tree starting from the leaf node.

====Rebalancing after deletion====
Rebalancing starts from a leaf and proceeds toward the root until the tree is balanced. If deleting an element from a node has brought it under the minimum size, then some elements must be redistributed to bring all nodes up to the minimum. Usually, the redistribution involves moving an element from a sibling node that has more than the minimum number of nodes. That redistribution operation is called a '''rotation'''. If no sibling can spare an element, then the deficient node must be '''merged''' with a sibling. The merge causes the parent to lose a separator element, so the parent may become deficient and need rebalancing. The merging and rebalancing may continue all the way to the root. Since the minimum element count doesn't apply to the root, making the root be the only deficient node is not a problem. The algorithm to rebalance the tree is as follows:
* If the deficient node's right sibling exists and has more than the minimum number of elements, then rotate left
*# Copy the separator from the parent to the end of the deficient node (the separator moves down; the deficient node now has the minimum number of elements)
*# Replace the separator in the parent with the first element of the right sibling (right sibling loses one node but still has at least the minimum number of elements)
*# The tree is now balanced
* Otherwise, if the deficient node's left sibling exists and has more than the minimum number of elements, then rotate right
*# Copy the separator from the parent to the start of the deficient node (the separator moves down; deficient node now has the minimum number of elements)
*# Replace the separator in the parent with the last element of the left sibling (left sibling loses one node but still has at least the minimum number of elements)
*# The tree is now balanced
* Otherwise, if both immediate siblings have only the minimum number of elements, then merge with a sibling sandwiching their separator taken off from their parent
*# Copy the separator to the end of the left node (the left node may be the deficient node or it may be the sibling with the minimum number of elements)
*# Move all elements from the right node to the left node (the left node now has the maximum number of elements, and the right node – empty)
*# Remove the separator from the parent along with its empty right child (the parent loses an element)
*#* If the parent is the root and now has no elements, then free it and make the merged node the new root (tree becomes shallower)
*#* Otherwise, if the parent has fewer than the required number of elements, then rebalance the parent<ref>{{cite web |url=https://www.cs.rhodes.edu/~kirlinp/courses/db/f16/handouts/btrees-deletion.pdf |archive-url=https://ghostarchive.org/archive/20221009/https://www.cs.rhodes.edu/~kirlinp/courses/db/f16/handouts/btrees-deletion.pdf |archive-date=2022-10-09 |url-status=live |title=Deletion in a B-tree |website=cs.rhodes.edu |access-date=24 May 2022}}</ref>

:<small>'''Note''': The rebalancing operations are different for B+ trees (e.g., rotation is different because parent has copy of the key) and B<sup>*</sup>-tree (e.g., three siblings are merged into two siblings).</small>

===Sequential access===
While freshly loaded databases tend to have good sequential behaviour, this behaviour becomes increasingly difficult to maintain as a database grows, resulting in more random I/O and performance challenges.<ref>{{cite web |url=http://www.cs.sunysb.edu/~bender/pub/cache-oblivious-btree.ps |title=Cache Oblivious B-trees |publisher=State University of New York (SUNY) at Stony Brook |access-date=2011-01-17}}</ref>

===Initial construction===
{{Unreferenced section|date=April 2018}}
A common special case is adding a large amount of ''pre-sorted'' data into an initially empty B-tree. While it is quite possible to simply perform a series of successive inserts, inserting sorted data results in a tree composed almost entirely of half-full nodes. Instead, a special "bulk loading" [[algorithm]] can be used to produce a more efficient tree with a higher branching factor.

When the input is sorted, all insertions are at the rightmost edge of the tree, and in particular any time a node is split, we are guaranteed that no more insertions will take place in the left half. When bulk loading, we take advantage of this, and instead of splitting overfull nodes evenly, split them as ''unevenly'' as possible: leave the left node completely full and create a right node with zero keys and one child (in violation of the usual B-tree rules).

At the end of bulk loading, the tree is composed almost entirely of completely full nodes; only the rightmost node on each level may be less than full. Because those nodes may also be less than ''half'' full, to re-establish the normal B-tree rules, combine such nodes with their (guaranteed full) left siblings and divide the keys to produce two nodes at least half full. The only node which lacks a full left sibling is the root, which is permitted to be less than half full.