== Variations ==

=== Williams' heap construction ===
{{see also|Binary heap#Building a heap}}
The description above uses Floyd's improved heap-construction algorithm, which operates in {{math|''O''(''n'')}} time and uses the same {{code|siftDown}} primitive as the heap-extraction phase. Although this algorithm, being both faster and simpler to program, is used by all practical heapsort implementations, Williams' original algorithm may be easier to understand, and is needed to implement a more general binary heap [[priority queue]].

Rather than merging many small heaps, Williams' algorithm maintains one single heap at the front of the array and repeatedly appends an additional element using a {{code|siftUp}} primitive. Being at the end of the array, the new element is a leaf and has no children to worry about, but may violate the heap property by being greater than its parent. In this case, exchange it with its parent and repeat the test until the parent is at least as great or there is no parent (we have reached the root). In pseudocode, this is:

 '''procedure''' siftUp(a, end) '''is'''
     '''input:''' ''a is the array, which is heap-ordered up to end−1.''
                ''end is the node to sift up.''
 
     '''while''' end > 0
         parent := iParent(end)
         '''if''' a[parent] < a[end] '''then''' ''(out of max-heap order)''
             swap(a[parent], a[end])
             end := parent ''(continue sifting up)''
         '''else'''
             '''return'''
 
 '''procedure''' heapify(a, count) '''is'''
     ''(start with a trivial single-element heap)''
     end := 1
 
     '''while''' end < count
         ''(sift up the node at index end to the proper place such that''
         ''all nodes above the end index are in heap order)''
         siftUp(a, end)
         end := end + 1
     ''(after sifting up the last node all nodes are in heap order)''

[[File:Binary heap bottomup vs topdown.svg|thumb|right|Difference in time complexity between the "siftDown" version and the "siftUp" version.]]
To understand why this algorithm can take asymptotically more time to build a heap ({{math|''O''(''n'' log ''n'')}} vs. {{math|''O''(''n'')}} worst case), note that in Floyd's algorithm, almost all of the calls to {{code|siftDown}} apply to very ''small'' heaps. Half the heaps are height-1 trivial heaps and can be skipped entirely, half of the remainder are height-2, and so on. Only two calls are on heaps of size {{math|''n''/2}}, and only one {{code|siftDown}} operation is done on the full {{mvar|n}}-element heap. The overall average {{code|siftDown}} operation takes {{math|''O''(1)}} time.

In contrast, in Williams' algorithm most of the calls to {{code|siftUp}} are made on ''large'' heaps of height {{math|''O''(log ''n'')}}. Half of the calls are made with a heap size of {{math|''n''/2}} or more, three-quarters are made with a heap size of {{math|''n''/4}} or more, and so on. Although the ''average'' number of steps is similar to Floyd's technique,{{r|Bojesen00|p=3}} pre-sorted input will cause the worst case: each added node is sifted up to the root, so the average call to {{code|siftUp}} will require approximately {{math|1=(log{{sub|2}}''n'' − 1)/2 + (log{{sub|2}}''n'' − 2)/4 + (log{{sub|2}}''n'' − 3)/8 + ⋯}} {{math|1== log{{sub|2}}''n'' − (1 + 1/2 + 1/4 + ⋯)}} {{math|1== log{{sub|2}}''n'' − 2}} iterations.

Because it is dominated by the second heap-extraction phase, the heapsort algorithm itself has {{math|''O''(''n'' log ''n'')}} time complexity using either version of heapify.
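The construction can also be expressed in executable form. The following Python transcription is an illustrative sketch only (the function names are ad hoc, and it assumes the standard array layout in which the children of node {{mvar|i}} sit at {{math|2''i'' + 1}} and {{math|2''i'' + 2}}, so that {{math|1=iParent(''i'') = ⌊(''i'' − 1)/2⌋}}):

<syntaxhighlight lang="python">
def sift_up(a, end):
    """Sift the new leaf a[end] up until its parent is no smaller."""
    while end > 0:
        parent = (end - 1) // 2      # iParent(end)
        if a[parent] < a[end]:       # out of max-heap order
            a[parent], a[end] = a[end], a[parent]
            end = parent             # continue sifting up
        else:
            return

def heapify_williams(a):
    """Williams' construction: grow a single heap at the front of the
    array, appending one element at a time with sift_up."""
    for end in range(1, len(a)):
        sift_up(a, end)
</syntaxhighlight>

Running {{code|heapify_williams}} on already-sorted input exhibits the worst case just described: every appended element is sifted all the way up to the root.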
=== Bottom-up heapsort ===
Bottom-up heapsort is a variant that reduces the number of comparisons required by a significant factor. While ordinary "top-down" heapsort requires {{math|2''n'' log<sub>2</sub> ''n'' + ''O''(''n'')}} comparisons worst-case and on average,{{r|Wegener}} the bottom-up variant requires {{math|''n'' log<sub>2</sub> ''n'' + ''O''(1)}} comparisons on average,{{r|Wegener}} and {{math|1.5''n'' log<sub>2</sub> ''n'' + ''O''(''n'')}} in the worst case.{{r|fleischer}} If comparisons are cheap (e.g. integer keys) then the difference is unimportant,{{r|Melhorn}} as top-down heapsort compares values that have already been loaded from memory. If, however, comparisons require a [[function call]] or other complex logic, then bottom-up heapsort is advantageous.

This is accomplished by using a more elaborate {{code|siftDown}} procedure. The change improves the linear-time heap-building phase slightly,{{r|McDiarmid}} but is more significant in the second phase. Like top-down heapsort, each iteration of the second phase extracts the top of the heap, {{code|a[0]}}, and fills the gap it leaves with {{code|a[end]}}, then sifts this latter element down the heap. But this element came from the lowest level of the heap, meaning it is one of the smallest elements in the heap, so the sift-down will likely take many steps to move it back down.<ref name=MacKay05>{{cite web |title=Heapsort, Quicksort, and Entropy |first=David J. C. |last=MacKay |author-link=David J. C. MacKay |date=December 2005 |url=https://inference.org.uk/mackay/sorting/sorting.html |access-date=2021-02-12}}</ref> In top-down heapsort, each step of {{code|siftDown}} requires two comparisons, to find the minimum of three elements: the new node and its two children.

Bottom-up heapsort conceptually replaces the root with a value of −∞ and sifts it down using only one comparison per level (since no child can possibly be less than −∞) until the leaves are reached, then replaces the −∞ with the correct value and sifts it ''up'' (again, using one comparison per level) until the correct position is found. This places the root in the same location as top-down {{code|siftDown}}, but fewer comparisons are required to find that location. For any single {{code|siftDown}} operation, the bottom-up technique is advantageous if the number of downward movements is at least {{frac|2|3}} of the height of the tree (when the number of comparisons is {{frac|4|3}} times the height for both techniques), and it turns out that this is more than true on average, even for worst-case inputs.<ref name="fleischer">{{cite journal |last=Fleischer |first=Rudolf |title=A tight lower bound for the worst case of Bottom-Up-Heapsort |journal=Algorithmica |volume=11 |issue=2 |date=February 1994 |pages=104–115 |doi=10.1007/bf01182770 |url=http://staff.gutech.edu.om/~rudolf/Paper/buh_algorithmica94.pdf |hdl=11858/00-001M-0000-0014-7B02-C |s2cid=21075180 |hdl-access=free}} Also available as {{cite tech report |last=Fleischer |first=Rudolf |title=A tight lower bound for the worst case of Bottom-Up-Heapsort |date=April 1991 |institution=[[Max Planck Institute for Informatics|MPI-INF]] |number=MPI-I-91-104 |url=http://pubman.mpdl.mpg.de/pubman/item/escidoc:1834997:3/component/escidoc:2463941/MPI-I-94-104.pdf}}</ref>
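To see where the {{frac|2|3}} threshold comes from (a back-of-the-envelope check, stated here for convenience rather than taken from the cited analysis): if the sifted element finally comes to rest at depth {{mvar|d}} in a heap of height {{mvar|h}}, top-down {{code|siftDown}} spends about two comparisons per level, or {{math|2''d''}} in total, while the bottom-up version spends about {{mvar|h}} comparisons descending to a leaf plus {{math|''h'' − ''d''}} climbing back up, or {{math|2''h'' − ''d''}} in total. The bottom-up count is no larger exactly when {{math|2''h'' − ''d'' ≤ 2''d''}}, i.e. when {{math|''d'' ≥ 2''h''/3}}, and at the break-even depth both counts equal {{math|4''h''/3}}.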
A naïve implementation of this conceptual algorithm would cause some redundant data copying, as the sift-up portion undoes part of the sifting down. A practical implementation searches downward for a leaf where −∞ ''would'' be placed, then upward for where the root ''should'' be placed. Finally, the upward traversal continues to the root's starting position, performing no more comparisons but exchanging nodes to complete the necessary rearrangement. This optimized form performs the same number of exchanges as top-down {{code|siftDown}}.

Because it goes all the way to the bottom and then comes back up, it is called '''heapsort with bounce''' by some authors.<ref>{{cite book |first1=Bernard |last1=Moret |author-link1=Bernard Moret |first2=Henry D. |last2=Shapiro |title=Algorithms from P to NP Volume 1: Design and Efficiency |chapter=8.6 Heapsort |page=528 |publisher=Benjamin/Cummings |year=1991 |isbn=0-8053-8008-6 |quote=For lack of a better name we call this enhanced program 'heapsort with bounce.{{'-}}}}</ref>

 '''function''' leafSearch(a, i, end) '''is'''
     j ← i
     '''while''' iRightChild(j) < end '''do'''
         ''(Determine which of j's two children is the greater)''
         '''if''' a[iRightChild(j)] > a[iLeftChild(j)] '''then'''
             j ← iRightChild(j)
         '''else'''
             j ← iLeftChild(j)
     ''(At the last level, there might be only one child)''
     '''if''' iLeftChild(j) < end '''then'''
         j ← iLeftChild(j)
     '''return''' j

The return value of <code>leafSearch</code> is used in the modified <code>siftDown</code> routine:<ref name="fleischer"/>

 '''procedure''' siftDown(a, i, end) '''is'''
     j ← leafSearch(a, i, end)
     '''while''' a[i] > a[j] '''do'''
         j ← iParent(j)
     '''while''' j > i '''do'''
         swap(a[i], a[j])
         j ← iParent(j)
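In executable form, the two routines above may be transcribed into Python as follows (an illustrative sketch under the same array-layout conventions as the earlier sketch; names are ad hoc):

<syntaxhighlight lang="python">
def leaf_search(a, i, end):
    """Descend from node i along the path of greater children,
    one comparison per level, and return the leaf reached."""
    j = i
    while 2 * j + 2 < end:               # both children present
        j = 2 * j + 2 if a[2 * j + 2] > a[2 * j + 1] else 2 * j + 1
    if 2 * j + 1 < end:                  # the last level may have only one child
        j = 2 * j + 1
    return j

def sift_down_bottom_up(a, i, end):
    """Bottom-up siftDown: descend to a leaf first, climb back up to
    where a[i] belongs, then rotate that path segment into place."""
    j = leaf_search(a, i, end)
    while a[i] > a[j]:                   # one comparison per level climbed
        j = (j - 1) // 2                 # iParent(j)
    while j > i:                         # exchanges only, no comparisons
        a[i], a[j] = a[j], a[i]
        j = (j - 1) // 2
</syntaxhighlight>

The rotation in the final loop moves each displaced value exactly once, avoiding the redundant copying of the naïve "sift down, then sift up" formulation.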
Bottom-up heapsort was announced as beating quicksort (with median-of-three pivot selection) on arrays of size ≥16000.{{refn|name=Wegener|{{cite journal |last=Wegener |first=Ingo |author-link=Ingo Wegener |title={{sc|Bottom-Up Heapsort}}, a new variant of {{sc|Heapsort}} beating, on an average, {{sc|Quicksort}} (if {{mvar|n}} is not very small) |journal=Theoretical Computer Science |volume=118 |issue=1 |date=13 September 1993 |pages=81–98 |doi=10.1016/0304-3975(93)90364-y |doi-access=free |url=https://core.ac.uk/download/pdf/82350265.pdf}} Although this is a reprint of work first published in 1990 (at the Mathematical Foundations of Computer Science conference), the technique was published by Carlsson in 1987.<ref name=Carlsson/>}} A 2008 re-evaluation of this algorithm showed it to be no faster than top-down heapsort for integer keys, presumably because modern [[branch prediction]] nullifies the cost of the predictable comparisons that bottom-up heapsort manages to avoid.<ref name=Melhorn>{{cite book |last1=Mehlhorn |first1=Kurt |author1-link=Kurt Mehlhorn |first2=Peter |last2=Sanders |author2-link=Peter Sanders (computer scientist) |title=Algorithms and Data Structures: The Basic Toolbox |chapter=Priority Queues |publisher=Springer |year=2008 |page=142 |isbn=978-3-540-77977-3 |url=http://people.mpi-inf.mpg.de/~mehlhorn/Toolbox.html |chapter-url=http://people.mpi-inf.mpg.de/~mehlhorn/ftp/Toolbox/PriorityQueues.pdf#page=16}}</ref>

A further refinement does a [[binary search]] in the upward search, and sorts in a worst case of {{math|(''n''+1)(log<sub>2</sub>(''n''+1) + log<sub>2</sub> log<sub>2</sub>(''n''+1) + 1.82) + ''O''(log<sub>2</sub> ''n'')}} comparisons, approaching [[Comparison sort#Number of comparisons required to sort a list|the information-theoretic lower bound]] of {{math|''n'' log<sub>2</sub> ''n'' − 1.4427''n''}} comparisons.<ref name=Carlsson>{{cite journal |first=Svante |last=Carlsson |title=A variant of heapsort with almost optimal number of comparisons |journal=Information Processing Letters |volume=24 |issue=4 |pages=247–250 |date=March 1987 |doi=10.1016/0020-0190(87)90142-6 |s2cid=28135103 |url=https://pdfs.semanticscholar.org/caec/6682ffd13c6367a8c51b566e2420246faca2.pdf |archive-url=https://web.archive.org/web/20161227055904/https://pdfs.semanticscholar.org/caec/6682ffd13c6367a8c51b566e2420246faca2.pdf |url-status=dead |archive-date=2016-12-27}}</ref>
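To make the refinement concrete, the following hypothetical Python sketch (illustrative, not Carlsson's actual formulation) exploits the fact that the values along a path of greater children are non-increasing, so the insertion point of the sifted value can be located by binary search among the {{math|''O''(log ''n'')}} nodes of the path:

<syntaxhighlight lang="python">
def sift_down_binary(a, i, end):
    """Bottom-up siftDown that binary-searches the descent path
    for the insertion point of a[i]."""
    # Record the path of greater children from i down to a leaf.
    path = [i]
    j = i
    while 2 * j + 1 < end:
        j = 2 * j + 1                    # left child
        if j + 1 < end and a[j + 1] > a[j]:
            j += 1                       # right child is greater
        path.append(j)
    # Values along path[1:] are non-increasing, so binary-search for the
    # deepest node whose value is >= a[i]; path[0] trivially qualifies.
    x = a[i]
    lo, hi = 0, len(path) - 1
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if a[path[mid]] >= x:
            lo = mid
        else:
            hi = mid - 1
    # Shift the larger values one level up and drop x into place.
    for k in range(1, lo + 1):
        a[path[k - 1]] = a[path[k]]
    a[path[lo]] = x
</syntaxhighlight>

The descent still costs one comparison per level, but the upward phase now costs only {{math|''O''(log log ''n'')}} comparisons, which is where the {{math|log<sub>2</sub> log<sub>2</sub>(''n''+1)}} term in the bound above comes from.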
A variant that uses two extra bits per internal node ({{math|''n''−1}} bits total for an {{mvar|n}}-element heap) to cache information about which child is greater (two bits are required to store three cases: left, right, and unknown)<ref name=McDiarmid>{{Cite journal |title=Building heaps fast |last1=McDiarmid |first1=C. J. H. |last2=Reed |first2=B. A. |date=September 1989 |journal=Journal of Algorithms |volume=10 |issue=3 |pages=352–365 |doi=10.1016/0196-6774(89)90033-3 |url=http://cgm.cs.mcgill.ca/~breed/2016COMP610/BUILDINGHEAPSFAST.pdf}}</ref> uses less than {{math|''n'' log<sub>2</sub> ''n'' + 1.1''n''}} comparisons.<ref>{{cite journal |title=The worst-case complexity of McDiarmid and Reed's variant of {{sc|Bottom-Up Heapsort}} is less than ''n'' log ''n'' + 1.1''n'' |first=Ingo |last=Wegener |author-link=Ingo Wegener |journal=Information and Computation |volume=97 |issue=1 |pages=86–96 |date=March 1992 |doi=10.1016/0890-5401(92)90005-Z |doi-access=free}}</ref>

=== Other variations ===
*Ternary heapsort uses a [[ternary heap]] instead of a binary heap; that is, each element in the heap has three children. It is more complicated to program, but performs a constant factor fewer swap and comparison operations. This is because each sift-down step in a ternary heap requires three comparisons and one swap, whereas in a binary heap two comparisons and one swap are required. Two levels in a ternary heap cover 3<sup>2</sup> = 9 elements, doing more work with the same number of comparisons as three levels in the binary heap, which cover only 2<sup>3</sup> = 8.{{Citation needed|date=September 2014}} This is primarily of academic interest, or as a student exercise,<ref>{{cite book |title=Data Structures Using Pascal |chapter=Chapter 8: Sorting |first1=Aaron M. |last1=Tenenbaum |first2=Moshe J. |last2=Augenstein |year=1981 |page=405 |publisher=Prentice-Hall |isbn=0-13-196501-8 |quote=<!--Exercise 8. Define an almost complete ternary tree as a tree in which every node has at most three sons and such that the nodes can be numbered from 1 to n so that the sons of node[i] are node[3*i−1], node[3*i] and node[3*i+1]. Define a ternary heap as an almost complete ternary tree in which the content of each node is greater than or equal to the contents of all its descendants. -->Write a sorting routine similar to the heapsort except that it uses a ternary heap.}}</ref> as the additional complexity is not worth the minor savings, and bottom-up heapsort beats both.
*Memory-optimized heapsort{{r|LaMarca99|p=87}} improves heapsort's [[locality of reference]] by increasing the number of children even more. This increases the number of comparisons, but because all children are stored consecutively in memory, it reduces the number of [[cache line]]s accessed during heap traversal, a net performance improvement.
*The standard implementation of Floyd's heap-construction algorithm causes a large number of [[cache miss]]es once the size of the data exceeds that of the [[CPU cache]].{{r|LaMarca99|p=87}} Better performance on large data sets can be obtained by merging in [[depth-first]] order, combining subheaps as soon as possible, rather than combining all subheaps on one level before proceeding to the one above.<ref name=Bojesen00>{{cite journal |title=Performance Engineering Case Study: Heap Construction |first1=Jesper |last1=Bojesen |first2=Jyrki |last2=Katajainen |first3=Maz |last3=Spork |journal=ACM Journal of Experimental Algorithmics |date=2000 |volume=5 |pages=15–es |number=15 |doi=10.1145/351827.384257 |citeseerx=10.1.1.35.3248 |s2cid=30995934 |url=http://hjemmesider.diku.dk/~jyrki/Paper/katajain.ps |format=PostScript}} [https://www.semanticscholar.org/paper/Performance-Engineering-Case-Study-Heap-Bojesen-Katajainen/6f4ada5912c1da64e16453d67ec99c970173fb5b Alternate PDF source].</ref><ref>{{cite conference |chapter=In-place Heap Construction with Optimized Comparisons, Moves, and Cache Misses |first1=Jingsen |last1=Chen |first2=Stefan |last2=Edelkamp |first3=Amr |last3=Elmasry |first4=Jyrki |last4=Katajainen |title=Mathematical Foundations of Computer Science 2012 |series=Lecture Notes in Computer Science |doi=10.1007/978-3-642-32589-2_25 |conference=37th international conference on Mathematical Foundations of Computer Science |pages=259–270 |location=Bratislava, Slovakia |date=27–31 August 2012 |volume=7464 |isbn=978-3-642-32588-5 |s2cid=1462216 |chapter-url=https://pdfs.semanticscholar.org/9cc6/36d7998d58b3937ba0098e971710ff039612.pdf |archive-url=https://web.archive.org/web/20161229031307/https://pdfs.semanticscholar.org/9cc6/36d7998d58b3937ba0098e971710ff039612.pdf |url-status=dead |archive-date=29 December 2016}} See particularly Fig. 3.</ref>
*Out-of-place heapsort<ref>{{cite conference |title=QuickHeapsort, an efficient mix of classical sorting algorithms |first1=Domenico |last1=Cantone |first2=Gianluca |last2=Cincotti |conference=4th Italian Conference on Algorithms and Complexity |series=Lecture Notes in Computer Science |volume=1767 |pages=150–162 |isbn=3-540-67159-5 |date=1–3 March 2000 |location=Rome |url=https://archive.org/details/springer_10.1007-3-540-46521-9}}</ref><ref>{{cite journal |title=QuickHeapsort, an efficient mix of classical sorting algorithms |first1=Domenico |last1=Cantone |first2=Gianluca |last2=Cincotti |journal=Theoretical Computer Science |volume=285 |issue=1 |pages=25–42 |date=August 2002 |doi=10.1016/S0304-3975(01)00288-2 |doi-access=free |zbl=1016.68042 |url=https://core.ac.uk/download/pdf/81957449.pdf}}</ref>{{r|MacKay05}} improves on bottom-up heapsort by eliminating the worst case, guaranteeing {{math|''n'' log<sub>2</sub> ''n'' + ''O''(''n'')}} comparisons. When the maximum is taken, rather than fill the vacated space with an unsorted data value, fill it with a {{math|−∞}} sentinel value, which never "bounces" back up (see the sketch after this list). It turns out that this can be used as a primitive in an in-place (and non-recursive) "QuickHeapsort" algorithm.<ref>{{cite journal |title=QuickHeapsort: Modifications and improved analysis |first1=Volker |last1=Diekert |first2=Armin |last2=Weiß |journal=Theory of Computing Systems |volume=59 |issue=2 |pages=209–230 |date=August 2016 |doi=10.1007/s00224-015-9656-y <!-- |doi=10.1007/978-3-642-38536-0_3 Conference version, published September 2012--> |arxiv=1209.4214 |s2cid=792585}}</ref> First, a quicksort-like partitioning pass is performed, but with the order of the partitioned data reversed in the array. Suppose ([[without loss of generality]]) that the smaller partition is the one greater than the pivot, which should go at the end of the array, but the reversed partitioning step places it at the beginning. Form a heap out of the smaller partition and do out-of-place heapsort on it, exchanging the extracted maxima with values from the end of the array. These are less than the pivot, and hence less than any value in the heap, so they can serve as {{math|−∞}} sentinel values. Once the heapsort is complete (and the pivot moved to just before the now-sorted end of the array), the order of the partitions has been reversed, and the larger partition at the beginning of the array may be sorted in the same way. (Because there is no non-[[tail recursion]], this also eliminates quicksort's {{math|''O''(log ''n'')}} stack usage.)
*The [[smoothsort]] algorithm<ref>{{Cite EWD|796a|Smoothsort – an alternative to sorting in situ}}</ref> is a variation of heapsort developed by [[Edsger W. Dijkstra]] in 1981. Like heapsort, smoothsort's upper bound is {{math|''O''(''n'' log ''n'')}}. The advantage of smoothsort is that it comes closer to {{math|''O''(''n'')}} time if the [[Adaptive sort|input is already sorted to some degree]], whereas heapsort averages {{math|''O''(''n'' log ''n'')}} regardless of the initial sorted state. Due to its complexity, smoothsort is rarely used.{{citation needed|date=November 2016}}
*Levcopoulos and Petersson<ref>{{cite book |last1=Levcopoulos |first1=Christos |last2=Petersson |first2=Ola |chapter=Heapsort—Adapted for Presorted Files |location=London, UK |pages=499–509 |publisher=Springer-Verlag |series=Lecture Notes in Computer Science |title=WADS '89: Proceedings of the Workshop on Algorithms and Data Structures |volume=382 |doi=10.1007/3-540-51542-9_41 |year=1989 |isbn=978-3-540-51542-5}} {{Q|56049336}}.</ref> describe a variation of heapsort based on a heap of [[Cartesian tree]]s. First, a Cartesian tree is built from the input in {{math|''O''(''n'')}} time, and its root is placed in a 1-element binary heap. Then we repeatedly extract the minimum from the binary heap, output the tree's root element, and add its left and right children (if any), which are themselves Cartesian trees, to the binary heap.<ref>{{cite web |title=CartesianTreeSort.hh |website=Archive of Interesting Code |url=http://www.keithschwarz.com/interesting/code/?dir=cartesian-tree-sort |first=Keith |last=Schwarz |date=27 December 2010 |access-date=2019-03-05}}</ref> As they show, if the input is already nearly sorted, the Cartesian trees will be very unbalanced, with few nodes having both left and right children; the binary heap therefore remains small, allowing the algorithm to sort more quickly than {{math|''O''(''n'' log ''n'')}} for inputs that are already nearly sorted.
*Several variants such as [[Weak heap#Weak-heap sort|weak heapsort]] require {{math|''n'' log<sub>2</sub> ''n'' + ''O''(''n'')}} comparisons in the worst case, close to the theoretical minimum, using one extra bit of state per node. While this extra bit makes the algorithms not truly in-place, if space for it can be found inside the element, these algorithms are simple and efficient,{{r|Bojesen00|p=40}} but still slower than binary heaps if key comparisons are cheap enough (e.g. integer keys) that a constant factor does not matter.<ref name=Kat2014-11-14P>{{cite conference |title=Seeking for the best priority queue: Lessons learnt |first=Jyrki |last=Katajainen |date=23 September 2013 |conference=Algorithm Engineering (Seminar 13391) |location=Dagstuhl |pages=19–20, 24 |url=http://hjemmesider.diku.dk/~jyrki/Myris/Kat2013-09-23P.html}}</ref>
*Katajainen's "ultimate heapsort" requires no extra storage, performs {{math|''n'' log<sub>2</sub> ''n'' + ''O''(''n'')}} comparisons, and a similar number of element moves.<ref>{{cite conference |title=The Ultimate Heapsort |date=2–3 February 1998 |first=Jyrki |last=Katajainen |conference=Computing: the 4th Australasian Theory Symposium |url=http://hjemmesider.diku.dk/~jyrki/Myris/Kat1998C.html |journal=Australian Computer Science Communications |volume=20 |issue=3 |pages=87–96 |location=Perth}}</ref> It is, however, even more complex and not justified unless comparisons are very expensive.
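As a concrete illustration of the {{math|−∞}} sentinel used by out-of-place heapsort (see the sketch reference in the list above), the following hypothetical Python sketch writes the sorted output to a separate array so that each vacated root can be refilled with {{math|−∞}}; names and layout conventions follow the earlier sketches:

<syntaxhighlight lang="python">
import math

def sift_down(a, start, end):
    """Ordinary top-down siftDown, used here only to build the heap."""
    root = start
    while 2 * root + 1 < end:
        child = 2 * root + 1
        if child + 1 < end and a[child + 1] > a[child]:
            child += 1
        if a[root] < a[child]:
            a[root], a[child] = a[child], a[root]
            root = child
        else:
            return

def heapsort_out_of_place(a):
    """Each extracted maximum leaves a hole that is refilled with -inf,
    which sinks back down and never 'bounces' up."""
    n = len(a)
    heap = list(a)
    for start in range(n // 2 - 1, -1, -1):   # Floyd's O(n) construction
        sift_down(heap, start, n)
    out = [None] * n
    for k in range(n - 1, -1, -1):
        out[k] = heap[0]                      # take the maximum
        heap[0] = -math.inf                   # sentinel fills the hole
        i = 0
        while 2 * i + 1 < n:                  # promote the greater child,
            child = 2 * i + 1                 # one comparison per level
            if child + 1 < n and heap[child + 1] > heap[child]:
                child += 1
            heap[i], heap[child] = heap[child], heap[i]
            i = child
    return out
</syntaxhighlight>

Because the sentinel is smaller than every key, the hole-sinking loop needs only the single comparison between the two children at each level, which is the source of the guaranteed {{math|''n'' log<sub>2</sub> ''n'' + ''O''(''n'')}} total.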