== Parallel priority queue ==

Parallelization can be used to speed up priority queues, but it requires some changes to the priority queue interface. Such changes are needed because a sequential update usually has only <math display="inline">O(1)</math> or <math display="inline">O(\log n)</math> cost, so there is no practical gain in parallelizing such an operation. One possible change is to allow concurrent access by multiple processors to the same priority queue. A second possible change is to allow batch operations that work on <math display="inline">k</math> elements instead of just one. For example, ''extractMin'' will remove the <math display="inline">k</math> elements with the highest priority.

=== Concurrent parallel access ===

If the priority queue allows concurrent access, multiple processes can perform operations concurrently on it. However, this raises two issues. First, the semantics of the individual operations are no longer obvious. For example, if two processes want to extract the element with the highest priority, should they get the same element or different ones? This restricts parallelism at the level of the program using the priority queue. In addition, because multiple processes have access to the same element, contention arises.

[[File:concurrent_prio_queue_conflict.svg|thumb|Node 3 is inserted and sets the pointer of node 2 to node 3. Immediately after that, node 2 is deleted and the pointer of node 1 is set to node 4. Now node 3 is no longer reachable.]]

Concurrent access to a priority queue can be implemented on a Concurrent Read, Concurrent Write (CRCW) PRAM model.
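As a minimal illustration of one possible semantics for concurrent access, the following Python sketch serializes ''extract-min'' with a lock so that concurrent callers receive distinct elements. This is only a lock-based toy, not the lock-free structure discussed below (a lock-free version would replace the lock with CAS retry loops); the class and variable names are illustrative, not taken from the referenced papers.

```python
import heapq
import threading

class ConcurrentPQ:
    """Toy thread-safe priority queue: concurrent extract-min calls
    are serialized by a lock, so each caller gets a distinct element."""
    def __init__(self, items):
        self._heap = list(items)
        heapq.heapify(self._heap)
        self._lock = threading.Lock()

    def extract_min(self):
        with self._lock:  # a lock-free variant would use CAS instead
            return heapq.heappop(self._heap)

pq = ConcurrentPQ([5, 1, 3, 2, 4])
results = []
res_lock = threading.Lock()

def worker():
    x = pq.extract_min()
    with res_lock:
        results.append(x)

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # the two concurrent calls obtained the two distinct minima: [1, 2]
```

Under this semantics the two workers never observe the same element; the alternative semantics (both callers seeing the same minimum) would push the coordination burden onto the calling program instead.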
In the following the priority queue is implemented as a [[skip list]].<ref name = skiplist>{{cite journal |last1= Sundell |first1=Håkan |last2= Tsigas |first2= Philippas |date=2005 |title=Fast and lock-free concurrent priority queues for multi-thread systems |url= https://doi.org/10.1016/j.jpdc.2004.12.005 |journal= Journal of Parallel and Distributed Computing |volume= 65 |issue= 5 |pages= 609–627 |doi= 10.1109/IPDPS.2003.1213189|s2cid=20995116 |citeseerx= 10.1.1.67.1310 }}</ref><ref>{{citation|surname1=Lindén, Jonsson|periodical=Technical Report 2018-003|title=A Skiplist-Based Concurrent Priority Queue with Minimal Memory Contention|date=2013|url=http://www.it.uu.se/research/publications/reports/2018-003/ }}</ref> In addition, an atomic synchronization primitive, [[Compare-and-swap|CAS]], is used to make the skip list [[Lock (computer science)|lock]]-free. Each node of the skip list consists of a unique key, a priority, an [[array data structure|array]] of [[Pointer (computer programming)|pointers]] (one per level) to the next nodes, and a ''delete'' mark. The ''delete'' mark indicates whether the node is about to be deleted by a process, ensuring that other processes can react to the deletion appropriately.

*''insert(e)'': First, a new node with a key and a priority is created. In addition, the node is assigned a number of levels, which dictates the size of the array of pointers. Then a search is performed to find the correct position to insert the new node. The search starts from the first node and from the highest level, and the skip list is traversed down to the lowest level until the correct position is found. During the search, at every level the last traversed node is saved as the parent node of the new node at that level, and the node to which that level's pointer of the parent node points is saved as the successor node of the new node at that level.
Afterwards, for every level of the new node, the pointers of the parent node are set to the new node. Finally, the pointers of the new node are set, for every level, to the corresponding successor nodes.

*''extract-min'': First, the skip list is traversed until a node is reached whose ''delete'' mark is not set. This ''delete'' mark is then set to true for that node. Finally, the pointers of the parent nodes of the deleted node are updated.

If concurrent access to a priority queue is allowed, conflicts may arise between two processes. For example, a conflict arises if one process is trying to insert a new node while another process is about to delete the predecessor of that node.<ref name = skiplist/> There is a risk that the new node is added to the skip list but is no longer reachable. ([[:File:concurrent_prio_queue_conflict.svg|See image]])

=== ''K''-element operations ===

In this setting, operations on a priority queue are generalized to a batch of <math display="inline">k</math> elements. For instance, ''k_extract-min'' deletes the <math display="inline">k</math> smallest elements of the priority queue and returns them.

In a [[Parallel programming model|shared-memory setting]], the parallel priority queue can easily be implemented using parallel [[binary search trees]] and [[join-based tree algorithms]]. In particular, ''k_extract-min'' corresponds to a ''split'' on the binary search tree that has <math display="inline">O(\log n)</math> cost and yields a tree containing the <math display="inline">k</math> smallest elements. ''k_insert'' can be applied as a ''union'' of the original priority queue and the batch of insertions. If the batch is already sorted by key, ''k_insert'' has <math display="inline">O(k\log (1+\frac{n}{k}))</math> cost. Otherwise, the batch must first be sorted, so the cost is <math display="inline">O(k\log (1+\frac{n}{k})+k\log k)=O(k\log n)</math>.
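The split/union interface of the k-element operations can be sketched with a plain sorted list standing in for the parallel binary search tree. This toy version shows only the interface and the sequential cost structure (sorting an unsorted batch, then merging), not the parallel join-based implementation; all names are illustrative.

```python
def k_extract_min(pq, k):
    """Split off and return the k smallest keys of a sorted list
    (corresponds to a split at the k-th key of the search tree)."""
    return pq[:k], pq[k:]

def k_insert(pq, batch):
    """Union the sorted queue with a batch of insertions.
    The batch need not be sorted; sorting it costs O(k log k)."""
    batch = sorted(batch)
    merged = []
    i = j = 0  # linear merge; a tree union instead costs O(k log(1 + n/k))
    while i < len(pq) and j < len(batch):
        if pq[i] <= batch[j]:
            merged.append(pq[i]); i += 1
        else:
            merged.append(batch[j]); j += 1
    return merged + pq[i:] + batch[j:]

pq = [1, 4, 6, 9]
pq = k_insert(pq, [5, 2])          # queue is now [1, 2, 4, 5, 6, 9]
smallest, pq = k_extract_min(pq, 3)
print(smallest, pq)                # [1, 2, 4] [5, 6, 9]
```

In the real data structure both operations run as parallel tree algorithms; the sorted list only makes the split/union view of the interface concrete.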
Other operations on priority queues can be applied similarly. For instance, ''k_decrease-key'' can be done by first applying ''difference'' and then ''union'', which first deletes the elements and then inserts them back with the updated keys. All these operations are highly parallel, and their theoretical and practical efficiency can be found in related research papers.<ref name="join-based">{{citation | last1 = Blelloch | first1 = Guy E. | last2 = Ferizovic | first2 = Daniel | last3 = Sun | first3 = Yihan | contribution = Just Join for Parallel Ordered Sets | doi = 10.1145/2935764.2935768 | isbn = 978-1-4503-4210-0 | pages = 253–264 | publisher = ACM | title = Symposium on Parallel Algorithms and Architectures, Proc. of 28th ACM Symp. Parallel Algorithms and Architectures (SPAA 2016) | year = 2016 | arxiv = 1602.02120 | s2cid = 2897793 }}</ref><ref name="pam">{{citation | last1 = Blelloch | first1 = Guy E. | last2 = Ferizovic | first2 = Daniel | last3 = Sun | first3 = Yihan | contribution = PAM: parallel augmented maps | pages = 290–304 | title = Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | publisher = ACM | year = 2018}}</ref>

The rest of this section discusses a queue-based algorithm on distributed memory. We assume each processor has its own local memory and a local (sequential) priority queue. The elements of the global (parallel) priority queue are distributed across all processors.

[[File:BulkDeletionPQ.svg|thumb|''k_extract-min'' is executed on a priority queue with three processors. The green elements are returned and removed from the priority queue.]]

A ''k_insert'' operation assigns the elements uniformly at random to the processors, which insert them into their local queues. Note that single elements can still be inserted into the queue. Using this strategy, the global smallest elements are, with high probability, in the union of the local smallest elements of every processor.
Thus each processor holds a representative part of the global priority queue. This property is used when ''k_extract-min'' is executed: the smallest <math display="inline">m</math> elements of each local queue are removed and collected in a result set, where the elements remain associated with their original processor. The number of elements <math display="inline">m</math> removed from each local queue depends on <math display="inline">k</math> and the number of processors <math display="inline">p</math>.<ref name=AlgToolbox>{{cite book | first1=Peter | last1=Sanders | first2=Kurt | last2=Mehlhorn | first3=Martin | last3=Dietzfelbinger | first4=Roman | last4= Dementiev | title=Sequential and Parallel Algorithms and Data Structures - The Basic Toolbox | publisher=Springer International Publishing | date=2019 | pages=226–229 | doi=10.1007/978-3-030-25209-0| isbn=978-3-030-25208-3 | s2cid=201692657 }}</ref> By parallel selection, the <math display="inline">k</math> smallest elements of the result set are determined. With high probability these are the global <math display="inline">k</math> smallest elements; if not, <math display="inline">m</math> elements are again removed from each local queue and put into the result set. This is repeated until the global <math display="inline">k</math> smallest elements are in the result set. Now these <math display="inline">k</math> elements can be returned, and all other elements of the result set are inserted back into their local queues. The expected running time of ''k_extract-min'' is <math display="inline"> O(\frac{k}{p} \log(n)) </math>, where <math display="inline">k = \Omega(p \cdot \log (p))</math> and <math display="inline">n</math> is the size of the priority queue.<ref name="AlgToolbox"/>

The priority queue can be further improved by not moving the remaining elements of the result set directly back into the local queues after a ''k_extract-min'' operation.
This saves moving elements back and forth between the result set and the local queues.

Removing several elements at once can yield a considerable speedup, but not all algorithms can use this kind of priority queue. Dijkstra's algorithm, for example, cannot work on several nodes at once: the algorithm takes the node with the smallest distance from the priority queue and calculates new distances for all its neighbor nodes. If <math display="inline">k</math> nodes were extracted at once, working at one node could change the distance of another one of the <math display="inline">k</math> nodes. Using k-element operations therefore destroys the label-setting property of Dijkstra's algorithm.
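The distributed ''k_extract-min'' loop described above can be simulated sequentially. In this sketch the local queues are Python heaps, a plain sort stands in for the parallel selection step, and the values of <math display="inline">p</math>, <math display="inline">m</math>, <math display="inline">k</math> and the data are illustrative assumptions, not values from the text.

```python
import heapq

def k_extract_min(local_queues, k, m):
    """Remove m elements per local queue into a result set, select the
    k smallest, and repeat until those are provably the global minima."""
    result = []                          # (key, owner) pairs from all processors
    while True:
        for owner, q in enumerate(local_queues):
            for _ in range(min(m, len(q))):
                result.append((heapq.heappop(q), owner))
        result.sort()                    # stands in for the parallel selection
        # the k smallest of the result set are global iff no local queue
        # still holds a key smaller than the k-th smallest collected so far
        done = len(result) >= k and all(
            not q or q[0] >= result[k - 1][0] for q in local_queues)
        if done or all(not q for q in local_queues):
            break
    answer, rest = result[:k], result[k:]
    for key, owner in rest:              # unused elements return to their processors
        heapq.heappush(local_queues[owner], key)
    return [key for key, _ in answer]

queues = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]   # three local (sequential) queues
for q in queues:
    heapq.heapify(q)
out = k_extract_min(queues, k=4, m=1)
print(out)  # → [1, 2, 3, 4]
```

With m=1 the example needs two rounds: the first round collects {1, 2, 3}, which is fewer than k elements, and the second round's selection confirms {1, 2, 3, 4} against the remaining local minima; 5 and 6 are pushed back to their owners.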