Editing B-tree (section)

==Informal description==
[[File:B-tree.svg|thumb|400px|right|A B-tree {{Harv|Bayer|McCreight|1972}} of order 5 {{Harv|Knuth|1998}}.]]

===Node structure===
As with other trees, B-trees can be represented as a collection of three types of nodes: ''root'', ''internal'' (a.k.a. interior), and ''leaf''. 

Note the following variable definitions:

* {{mvar|K}}&thinsp;: Maximum number of potential search keys for each node in a B-tree. (this value is constant over the entire tree).
* {{mvar|pt{{sub|i}}}}&thinsp;: The pointer to a child node which starts a sub-tree.
* {{mvar|pr{{sub|i}}}}&thinsp;: The pointer to a record which stores the data.
* {{mvar|k{{sub|i}}}}&thinsp;: The search key at the zero-based node index {{mvar|i}}.

In B-trees, the following properties are maintained for these nodes:

* If {{mvar|k{{sub|i}}}} exists in any node in a B-tree, then {{math|''k''{{sub|''i''-1}}}} exists in that node where <math>i \ge 1</math>.
* All leaf nodes have the same number of ancestors (i.e., they are all at the same depth).

Each internal node in a B-tree has the following format:
{| class="wikitable" style="margin-left: auto; margin-right: auto; border: none;"
|+Internal node structure
!''pt''<sub>0</sub>
!''k''<sub>0</sub>
!''pt''<sub>1</sub>
!''pr''<sub>0</sub>
!''k''<sub>1</sub>
!''pt''<sub>2</sub>
!''pr''<sub>1</sub>
!...
!''k''<sub>''K''-1</sub>
!''pt''<sub>''K''</sub>
!''pr''<sub>''K''-1</sub>
|}
{| class="wikitable" style="margin-left: auto; margin-right: auto; border: none;"
|+Internal node pointer structure
|
! {{math|''pt''{{sub|0}}}}
! colspan="3" | {{mvar|pt{{sub|i}}}} 
|
! colspan="2" | {{mvar|pr{{sub|i}}}} 
|-
! {{rh|align=right}} | when 
! {{math|''k''{{sub|0}}}} exists
! {{math|''k''{{sub|''i''-1}}}} and {{mvar|k{{sub|i}}}}<br/>exist
! {{math|''k''{{sub|''i''-1}}}} exists,<br/>and {{mvar|k{{sub|i}}}} does not exist
! {{math|''k''{{sub|''i''-1}}}} and {{mvar|k{{sub|i}}}}<br/>do not exist
|
! {{mvar|k{{sub|i}}}} exists
! {{mvar|k{{sub|i}}}} does not exist
|-
! {{rh|align=right}} | {{mvar|P{{sub|t}}}}&thinsp;:<br/>Points to subtree<br/>in which all search keys
| {{math|''P{{sub|t}}'' < ''k''{{sub|0}}}}
| {{math|''k''{{sub|''i''-1}} < {{var|P{{sub|t}}}} < {{var|k{{sub|i}}}}}}
| {{math|''P{{sub|t}}'' > ''k''{{sub|''i''-1}}}}
| {{mvar|pt{{sub|i}}}} is empty.
|
| Points to a record with a value<br/>{{math|''P{{sub|r}}'' {{=}} ''k{{sub|i}}''}}
| {{mvar|pr{{sub|i}}}} is empty.
|}
Each leaf node in a B-tree has the following format:
{| class="wikitable" style="margin-left: auto; margin-right: auto; border: none;"
|+Leaf node structure
!''pr''<sub>0</sub>
!''k''<sub>0</sub>
!''pr''<sub>1</sub>
!''k''<sub>1</sub>
!...
!''pr''<sub>''K''-1</sub>
!''k''<sub>''K''-1</sub>
|}
{| class="wikitable" style="margin-left: auto; margin-right: auto; border: none;"
|+Leaf node pointer structure
!{{mvar|pr{{sub|i}}}} when {{mvar|k{{sub|i}}}} exists
!{{mvar|pr{{sub|i}}}} when {{mvar|k{{sub|i}}}} does not exist
|-
|Points to a record with a value equal to {{mvar|k{{sub|i}}}}.
|Here, {{mvar|pr{{sub|i}}}} is empty.
|}
The node bounds are summarized in the table below:
{| class="wikitable" style="margin-left: auto; margin-right: auto; border: none;"
! rowspan="2" | Node type
! colspan="2" | number<br/> of keys
! colspan="2" | number<br/> of child nodes
|-
! Min !! Max
! Min !! Max
|-
|Root node when it is a leaf node
|0
|{{mvar|K}}
|0
|0
|-
|Root node when it is an internal node
|1
|{{mvar|K}}
|2<ref name="Navathe2">{{cite book |last=Navathe |first=Ramez Elmasri, Shamkant B. |title=Fundamentals of database systems |publisher=Pearson Education |year=2010 |isbn=9780136086208 |edition=6th |location=Upper Saddle River, N.J. |pages=652–660}}</ref>
|<math> K + 1 </math>
|-
|Internal node
|<math> \lfloor K/2 \rfloor </math>
|{{mvar|K}}
|<math> \lceil (K + 1)/2 \rceil \equiv \lfloor K / 2 \rfloor + 1 </math>
|<math> K + 1 </math>
|-
|Leaf node
|<math>  \lceil K / 2 \rceil </math>
|{{mvar|K}}
|0
|0
|}

===Insertion and deletion===
To maintain the predefined range of child nodes, internal nodes may be joined or split.

Usually, the number of keys is chosen to vary between {{mvar|d}} and <math>2d</math>, where {{mvar|d}} is the minimum number of keys, and <math>d+1</math> is the minimum [[Outdegree#Indegree and outdegree|degree]] or [[branching factor]] of the tree. The factor of 2 will guarantee that nodes can be split or combined.

If an internal node has <math>2d</math> keys, then adding a key to that node can be accomplished by splitting the hypothetical <math>2d+1</math> key node into two {{mvar|d}} key nodes and moving the key that would have been in the middle to the parent node. Each split node has the required minimum number of keys. Similarly, if an internal node and its neighbor each have {{mvar|d}} keys, then a key may be deleted from the internal node by combining it with its neighbor. Deleting the key would make the internal node have <math>d-1</math> keys; joining the neighbor would add {{mvar|d}} keys plus one more key brought down from the neighbor's parent. The result is an entirely full node of <math>2d</math> keys.

A B-tree is kept balanced after insertion by splitting a would-be overfilled node, of <math>2d+1</math> keys, into two {{mvar|d}}-key siblings and inserting the mid-value key into the parent. Depth only increases when the root is split, maintaining balance. Similarly, a B-tree is kept balanced after deletion by merging or redistributing keys among siblings to maintain the {{mvar|d}}-key minimum for non-root nodes. A merger reduces the number of keys in the parent potentially forcing it to merge or redistribute keys with its siblings, and so on. The only change in depth occurs when the root has two children, of {{mvar|d}} and (transitionally) <math>d-1</math> keys, in which case the two siblings and parent are merged, reducing the depth by one.

This depth will increase slowly as elements are added to the tree, but an increase in the overall depth is infrequent, and results in all leaf nodes being one more node farther away from the root.

===Comparison to other trees===
Because a range of child nodes is permitted, B-trees do not need re-balancing as frequently as other self-balancing search trees, but may waste some space, since nodes are not entirely full.

B-trees have substantial advantages over alternative implementations when the time to access the data of a node greatly exceeds the time spent processing that data, because then the cost of accessing the node may be amortized over multiple operations within the node. This usually occurs when the node data are in [[secondary storage]] such as [[hard drive|disk drives]]. By maximizing the number of keys within each [[internal node]], the height of the tree decreases and the number of expensive node accesses is reduced. In addition, rebalancing of the tree occurs less often. The maximum number of child nodes depends on the information that must be stored for each child node and the size of a full [[Block (data storage)|disk block]] or an analogous size in secondary storage. While 2–3 B-trees are easier to explain, practical B-trees using secondary storage need a large number of child nodes to improve performance.

===Variants===
The term '''B-tree''' may refer to a specific design or a general class of designs. In the narrow sense, a B-tree stores keys in its internal nodes but need not store those keys in the records at the leaves. The general class includes variations such as the [[B+ tree]], the B<sup>*</sup> tree and the B<sup>*+</sup> tree.
* In the [[B+ tree]], the internal nodes do not store any pointers to records, thus all pointers to records are stored in the leaf nodes. In addition, a leaf node may include a pointer to the next leaf node to speed up sequential access.{{sfn|Comer|1979}} Because B+ tree internal nodes have fewer pointers, each node can hold more keys, causing the tree to be shallower and thus faster to search.
* The B<sup>*</sup> tree balances more neighboring internal nodes to keep the internal nodes more densely packed.{{sfn|Comer|1979}} This variant ensures non-root nodes are at least 2/3 full instead of 1/2.{{sfn|Knuth|1998|p=488}} As the most costly part of operation of inserting the node in B-tree is splitting the node, B<sup>*</sup>-trees are created to postpone splitting operation as long as they can.<ref name="Milo">{{cite book |last=Tomašević |first=Milo |title=Algorithms and Data Structures |year=2008 |publisher=Akademska misao |location=Belgrade, Serbia |isbn=978-86-7466-328-8 |pages=274–275}}</ref> To maintain this, instead of immediately splitting up a node when it gets full, its keys are shared with a node next to it. This spill operation is less costly to do than split, because it requires only shifting the keys between existing nodes, not allocating memory for a new one.{{r|Milo}} For inserting, first it is checked whether the node has some free space in it, and if so, the new key is just inserted in the node. However, if the node is full (it has {{math|''m'' − 1}} keys, where {{mvar| m}} is the order of the tree as maximum number of pointers to subtrees from one node), it needs to be checked whether the right sibling exists and has some free space. If the right sibling has {{math|''j'' < ''m'' − 1}} keys, then keys are redistributed between the two sibling nodes as evenly as possible. For this purpose, {{mvar|''m'' &minus; 1}} keys from the current node, the new key inserted, one key from the parent node and {{mvar|j}} keys from the sibling node are seen as an ordered array of {{math|''m'' + ''j'' + 1}} keys. The array becomes split by half, so that {{math|{{floor|(''m'' + ''j'' + 1)/2}}}} lowest keys stay in the current node, the next (middle) key is inserted in the parent and the rest go to the right sibling.{{r|Milo}} (The newly inserted key might end up in any of the three places.) The situation when right sibling is full, and left isn't is analogous.{{r|Milo}} When both the sibling nodes are full, then the two nodes (current node and a sibling) are split into three and one more key is shifted up the tree, to the parent node.{{r|Milo}} If the parent is full, then spill/split operation propagates towards the root node.{{r|Milo}} Deleting nodes is somewhat more complex than inserting however.
* The B<sup>*+</sup> tree combines the main [[B+ tree]] and B<sup>*</sup> tree features together.<ref>{{Cite journal|language=en|url=https://ispranproceedings.elpub.ru/jour/article/view/1188|title=SQLite RDBMS Extension for Data Indexing Using B-tree Modifications|author=Rigin A. M., Shershakov S. A.|journal=Proceedings of the Institute for System Programming of the RAS|date=2019-09-10|volume=31 |issue=3 |pages=203–216 |publisher=Institute for System Programming of the RAS (ISP RAS)|doi=10.15514/ispras-2019-31(3)-16|s2cid=203144646 |access-date=2021-08-29|doi-access=free}}</ref>
* B-trees can be turned into [[order statistic tree]]s to allow rapid searches for the Nth record in key order, or counting the number of records between any two records, and various other related operations.<ref>{{Cite web |title=Counted B-Trees |url=https://www.chiark.greenend.org.uk/~sgtatham/algorithms/cbtree.html |access-date=2024-12-27 |website=www.chiark.greenend.org.uk}}</ref>