== Parallel algorithms ==

Parallel algorithms for constructing red–black trees from sorted lists of items can run in constant time or <math>O(\log \log n)</math> time, depending on the computer model, if the number of processors available is asymptotically proportional to the number <math>n</math> of items where <math>n\to\infty</math>. Fast parallel algorithms for search, insertion, and deletion are also known.<ref>{{cite journal | journal=Theoretical Computer Science | title=Parallel algorithms for red–black trees | first1=Heejin |last1=Park |first2=Kunsoo |last2=Park | volume=262 | issue=1–2 | pages=415–435 | year=2001 | doi=10.1016/S0304-3975(00)00287-5 |quote=Our parallel algorithm for constructing a red–black tree from a sorted list of <math>n</math> items runs in <math>O(1)</math> time with <math>n</math> processors on the CRCW PRAM and runs in <math>O(\log \log n)</math> time with <math>n / \log \log n</math> processors on the EREW PRAM. | doi-access=free }}</ref>

The [[Join-based tree algorithms|join-based algorithms]] for red–black trees are parallel for bulk operations, including union, intersection, construction, filter, map-reduce, and so on.

=== Parallel bulk operations ===

Basic operations like insertion, removal or update can be parallelised by defining operations that process bulks of multiple elements. It is also possible to process bulks with several basic operations; for example, a bulk may contain elements to insert as well as elements to remove from the tree. The algorithms for bulk operations are not only applicable to the red–black tree, but can also be adapted to other sorted sequence data structures, such as the [[2–3 tree]], [[2–3–4 tree]] and [[(a,b)-tree]]. In the following, different algorithms for bulk insert are explained; the same algorithms can also be applied to removal and update. Bulk insert is an operation that inserts each element of a sequence <math>I</math> into a tree <math>T</math>.

==== Join-based ====

This approach can be applied to every sorted sequence data structure that supports efficient join- and split-operations.<ref>{{cite book | last=Sanders | first=Peter | editor1-last=Mehlhorn | editor1-first=Kurt | editor2-last=Dietzfelbinger | editor2-first=Martin | editor3-last=Dementiev | editor3-first=Roman | year=2019 | title=Sequential and Parallel Algorithms and Data Structures: The Basic Toolbox | series=Springer eBooks | location=Cham | publisher=Springer | pages=252–253 | isbn=9783030252090 | doi=10.1007/978-3-030-25209-0 | s2cid=201692657 }}</ref> The general idea is to split {{mvar|I}} and {{mvar|T}} into multiple parts and perform the insertions on these parts in parallel.
# First the bulk {{mvar|I}} of elements to insert must be sorted.
# After that, the algorithm splits {{mvar|I}} into <math>k \in \mathbb{N}^+</math> parts <math>\langle I_1, \cdots, I_k \rangle</math> of about equal sizes.
# Next the tree {{mvar|T}} must be split into {{mvar|k}} parts <math>\langle T_1, \cdots, T_k \rangle</math> in such a way that for every <math>j \in \mathbb{N}^+ \mid 1 \leq j < k</math> the following constraints hold (see the example below):
## <math>\text{last}(I_j) < \text{first}(T_{j + 1})</math>
## <math>\text{last}(T_j) < \text{first}(I_{j + 1})</math>
# Now the algorithm inserts each element of <math>I_j</math> into <math>T_j</math> sequentially. This step must be performed for every {{mvar|j}}, which can be done by up to {{mvar|k}} processors in parallel.
# Finally, the resulting trees are joined to form the final result of the entire operation.
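For instance, if <math>I = \langle 1, 3, 5, 7 \rangle</math> and <math>k = 2</math>, the bulk is split into <math>I_1 = \langle 1, 3 \rangle</math> and <math>I_2 = \langle 5, 7 \rangle</math>. The constraints then require that <math>T_1</math> contains only keys smaller than 5 and <math>T_2</math> contains only keys greater than 3, which is achieved, for example, by splitting {{mvar|T}} at the key 4.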
Note that in Step 3 the constraints for splitting {{mvar|I}} assure that in Step 5 the trees can be joined again and the resulting sequence is sorted.

<gallery widths="400px" heights="250px" perrow="2">
BulkInsert JoinBased InitialTree.svg|initial tree
BulkInsert JoinBased SplitTree.svg|split I and T
BulkInsert JoinBased SplitTreeInserted.svg|insert into the split T
BulkInsert JoinBased JoinedTree.svg|join T
</gallery>

The pseudo code shows a simple divide-and-conquer implementation of the join-based algorithm for bulk insert. Both recursive calls can be executed in parallel. The join operation used here differs from the version explained in this article: instead, [[Join-based tree algorithms#Join2|join2]] is used, which omits the second parameter {{mvar|k}}.

 '''bulkInsert'''(T, I, k):
     I.sort()
     bulkInsertRec(T, I, k)
 
 '''bulkInsertRec'''(T, I, k):
     '''if''' k = 1:
         '''forall''' e '''in''' I: T.insert(e)
     '''else'''
         m := ⌊size(I) / 2⌋
         (T<sub>1</sub>, _, T<sub>2</sub>) := split(T, I[m])
         bulkInsertRec(T<sub>1</sub>, I[0 .. m], ⌈k / 2⌉) || bulkInsertRec(T<sub>2</sub>, I[m + 1 .. size(I) - 1], ⌊k / 2⌋)
         T ← join2(T<sub>1</sub>, T<sub>2</sub>)

===== Execution time =====

Sorting {{mvar|I}} is not considered in this analysis.
:{|
|-
| #recursion levels || <math>\in O(\log k)</math>
|-
| T(split) + T(join) || <math>\in O(\log |T|)</math>
|-
| insertions per thread || <math>\in O\left(\frac{|I|}{k}\right)</math>
|-
| T(insert) || <math>\in O(\log |T|)</math>
|-
| {{nowrap|1='''T(bulkInsert) with {{mvar|k}} = #processors'''}} || <math>\in O\left(\log k \log |T| + \frac{|I|}{k} \log |T|\right)</math>
|}
This can be improved by using parallel algorithms for splitting and joining. In this case the execution time is <math>\in O\left(\log |T| + \frac{|I|}{k} \log |T|\right)</math>.<ref>{{cite book | last1=Akhremtsev | first1=Yaroslav | last2=Sanders | first2=Peter | title=2016 IEEE 23rd International Conference on High Performance Computing (HiPC) | chapter=Fast Parallel Operations on Search Trees | pages=291–300 | year=2016 | isbn=978-1-5090-5411-4 | arxiv=1510.05433 | doi=10.1109/HiPC.2016.042 }}</ref>

===== Work =====

:{|
|-
| #splits, #joins || <math>\in O(k)</math>
|-
| W(split) + W(join) || <math>\in O(\log |T|)</math>
|-
| #insertions || <math>\in O(|I|)</math>
|-
| W(insert) || <math>\in O(\log |T|)</math>
|-
| '''W(bulkInsert)''' || <math>\in O(k \log |T| + |I| \log |T|)</math>
|}
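The divide-and-conquer pseudo code above can also be sketched in an actual programming language. The following C++ fragment is only an illustration, not a reference implementation: the type <code>Tree</code> and the operations <code>insert</code>, <code>split</code> and <code>join2</code> are assumed to be supplied by a red–black tree (or any other balanced search tree supporting split and join) and are not standard library facilities.

<syntaxhighlight lang="cpp">
// Illustrative sketch of the divide-and-conquer bulk insert above.
// Tree, insert, split and join2 are assumed primitives of a red–black tree
// (or another balanced tree with split/join); they are not standard APIs.
#include <algorithm>
#include <cstddef>
#include <future>
#include <utility>
#include <vector>

struct Tree;                                     // assumed tree type
Tree* insert(Tree* t, int key);                  // assumed sequential insert
std::pair<Tree*, Tree*> split(Tree* t, int key); // assumed: keys < key, keys > key
Tree* join2(Tree* left, Tree* right);            // assumed join without middle key

// Inserts the sorted range I[lo, hi) into t using up to k parallel tasks.
Tree* bulkInsertRec(Tree* t, const std::vector<int>& I,
                    std::size_t lo, std::size_t hi, unsigned k) {
    if (k <= 1 || hi - lo <= 1) {                // base case: insert sequentially
        for (std::size_t i = lo; i < hi; ++i)
            t = insert(t, I[i]);
        return t;
    }
    std::size_t m = lo + (hi - lo) / 2;          // middle element of the bulk
    auto [t1, t2] = split(t, I[m]);              // split the tree at that key
    // Both recursive calls can run in parallel; the left half becomes a task.
    // Unlike the pseudo code, I[m] itself is handled by the right half here.
    auto left = std::async(std::launch::async, bulkInsertRec,
                           t1, std::cref(I), lo, m, (k + 1) / 2);
    Tree* right = bulkInsertRec(t2, I, m, hi, k / 2);
    return join2(left.get(), right);
}

Tree* bulkInsert(Tree* t, std::vector<int> I, unsigned k) {
    std::sort(I.begin(), I.end());               // step 1: sort the bulk
    return bulkInsertRec(t, I, 0, I.size(), k);
}
</syntaxhighlight>

Choosing {{mvar|k}} close to the number of available processors corresponds to the execution-time analysis above.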
==== Pipelining ====

Another method of parallelizing bulk operations is to use a [[Pipeline (computing)|pipelining]] approach.<ref>{{cite book | title=An introduction to parallel algorithms | last=Jájá | first=Joseph | location=Reading, Mass. [u.a.] | publisher=Addison-Wesley | year=1992 | isbn=0201548569 | url=http://zbmath.org/?q=an:0781.68009 | pages=65–70 | zbl=0781.68009 }}</ref> This can be done by breaking the task of processing a basic operation up into a sequence of subtasks. For multiple basic operations the subtasks can be processed in parallel by assigning each subtask to a separate processor.
# First the bulk {{mvar|I}} of elements to insert must be sorted.
# For each element in {{mvar|I}} the algorithm locates the corresponding insertion position in {{mvar|T}}. This can be done in parallel for each element <math>\in I</math> since {{mvar|T}} is not mutated in this process (see the sketch after the figures below). Now {{mvar|I}} must be divided into subsequences {{mvar|S}} according to the insertion position of each element. For example, <math>s_{n, \mathit{left}}</math> is the subsequence of {{mvar|I}} that contains the elements whose insertion position would be to the left of node {{mvar|n}}.
# The middle element <math>m_{n, \mathit{dir}}</math> of every subsequence <math>s_{n, \mathit{dir}}</math> is inserted into {{mvar|T}} as a new node <math>n'</math>. This can be done in parallel for each <math>m_{n, \mathit{dir}}</math> since by definition the insertion position of each <math>m_{n, \mathit{dir}}</math> is unique. If <math>s_{n, \mathit{dir}}</math> contains elements to the left or to the right of <math>m_{n, \mathit{dir}}</math>, those will be contained in a new set of subsequences {{mvar|S}} as <math>s_{n', \mathit{left}}</math> or <math>s_{n', \mathit{right}}</math>.
# Now {{mvar|T}} possibly contains up to two consecutive red nodes at the end of the paths from the root to the leaves, which needs to be repaired. Note that, while repairing, the insertion positions of elements <math>\in S</math> have to be updated if the corresponding nodes are affected by rotations.
#* If two nodes have different nearest black ancestors, they can be repaired in parallel. Since at most four nodes can have the same nearest black ancestor, the nodes at the lowest level can be repaired in a constant number of parallel steps.
#* This step is applied successively to the black levels above until {{mvar|T}} is fully repaired.
# Steps 3 to 5 are repeated on the new subsequences until {{mvar|S}} is empty. At this point every element <math>\in I</math> has been inserted. Each application of these steps is called a ''stage''. Since the length of the subsequences in {{mvar|S}} is <math>\in O(|I|)</math> and in every stage the subsequences are being cut in half, the number of stages is <math>\in O(\log |I|)</math>.
#* Since all stages move up the black levels of the tree, they can be parallelised in a pipeline. Once a stage has finished processing one black level, the next stage is able to move up and continue at that level.

<gallery perrow="3" widths="250px" heights="180px">
BulkInsert Pipelining InitialTree.svg|Initial tree
BulkInsert Pipelining InsertPositions.svg|Find insert positions
BulkInsert Pipelining Stage1Insert.svg|Stage 1 inserts elements
BulkInsert Pipelining Stage1Repair.svg|Stage 1 begins to repair nodes
BulkInsert Pipelining Stage2Insert.svg|Stage 2 inserts elements
BulkInsert Pipelining Stage2Repair.svg|Stage 2 begins to repair nodes
BulkInsert Pipelining Stage3Insert.svg|Stage 3 inserts elements
BulkInsert Pipelining Stage3Repair1.svg|Stage 3 begins to repair nodes
BulkInsert Pipelining Stage3Repair2.svg|Stage 3 continues to repair nodes
</gallery>
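Step 2 is the part of the pipelined bulk insert that parallelises most directly, because the tree is only read while the insertion positions are searched. The following C++ fragment sketches this step only; the node type, the integer keys and the function names are assumptions made for the illustration, and the insertion of the middle elements and the pipelined repair (steps 3 to 5) are not shown.

<syntaxhighlight lang="cpp">
// Illustrative sketch of step 2: all elements of the sorted bulk I search the
// unmodified tree T for their insertion positions in parallel; the tree is
// only read, so the searches do not interfere with each other.
// Node, the integer keys and the helper names are assumptions, not library APIs.
#include <algorithm>
#include <array>
#include <cstddef>
#include <execution>
#include <map>
#include <utility>
#include <vector>

struct Node {                                   // assumed red–black tree node
    int key;
    Node *left = nullptr, *right = nullptr;
};

enum Dir { LEFT = 0, RIGHT = 1 };

// Returns the node under which 'key' would be attached, and on which side.
// Assumes a non-empty tree with distinct keys.
std::pair<Node*, Dir> findInsertPosition(Node* root, int key) {
    Node* n = root;
    for (;;) {
        if (key < n->key) {
            if (n->left == nullptr) return {n, LEFT};
            n = n->left;
        } else {
            if (n->right == nullptr) return {n, RIGHT};
            n = n->right;
        }
    }
}

// Groups the sorted bulk I into the subsequences s_{n,dir}. The position
// lookups use a parallel algorithm; the grouping itself stays sequential.
std::map<Node*, std::array<std::vector<int>, 2>>
groupByInsertPosition(Node* root, const std::vector<int>& I) {
    std::vector<std::pair<Node*, Dir>> pos(I.size());
    std::transform(std::execution::par, I.begin(), I.end(), pos.begin(),
                   [root](int key) { return findInsertPosition(root, key); });

    std::map<Node*, std::array<std::vector<int>, 2>> S;
    for (std::size_t i = 0; i < I.size(); ++i)
        S[pos[i].first][pos[i].second].push_back(I[i]); // s_{n,dir} stays sorted
    return S;
}
</syntaxhighlight>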
===== Execution time =====

Sorting {{mvar|I}} is not considered in this analysis. Also, <math>|I|</math> is assumed to be smaller than <math>|T|</math>, otherwise it would be more efficient to construct the resulting tree from scratch.
:{|
|-
| T(find insert position) || <math>\in O(\log |T|)</math>
|-
| #stages || <math>\in O(\log |I|)</math>
|-
| T(insert) + T(repair) || <math>\in O(\log |T|)</math>
|- style="vertical-align:top"
| '''T(bulkInsert) with <math>|I|</math> ~ #processors''' || <math>\in O(\log |I| + 2 \cdot \log |T|)</math><br/><math> = O(\log |T|)</math>
|}

===== Work =====

:{|
|-
| W(find insert positions) || <math>\in O(|I| \log |T|)</math>
|-
| #insertions, #repairs || <math>\in O(|I|)</math>
|-
| W(insert) + W(repair) || <math>\in O(\log |T|)</math>
|- style="vertical-align:top"
| '''W(bulkInsert)''' || <math>\in O(2 \cdot |I| \log |T|)</math><br/><math>= O(|I| \log |T|)</math>
|}