Editing Merge sort (section)

==Use with tape drives==
[[File:IBM 729 Tape Drives.nasa.jpg|thumb|Merge sort type algorithms allowed large data sets to be sorted on early computers that had small random access memories by modern standards. Records were stored on [[magnetic tape]] and processed on banks of magnetic tape drives, such as these [[IBM 729]]s.]]

An [[External sorting|external]] merge sort is practical to run using [[disk storage|disk]] or [[tape drive|tape]] drives when the data to be sorted is too large to fit into [[primary storage|memory]]. [[External sorting]] explains how merge sort is implemented with disk drives. A typical tape drive sort uses four tape drives. All I/O is sequential (except for rewinds at the end of each pass). A minimal implementation can get by with just two record buffers and a few program variables.

Naming the four tape drives as A, B, C, D, with the original data on A, and using only two record buffers, the algorithm is similar to [[#Bottom-up_implementation|the bottom-up implementation]], using pairs of tape drives instead of arrays in memory. The basic algorithm can be described as follows:

# Merge pairs of records from A; writing two-record sublists alternately to C and D.
# Merge two-record sublists from C and D into four-record sublists; writing these alternately to A and B.
# Merge four-record sublists from A and B into eight-record sublists; writing these alternately to C and D
# Repeat until you have one list containing all the data, sorted—in log<sub>2</sub>(''n'') passes.

Instead of starting with very short runs, usually a [[hybrid algorithm]] is used, where the initial pass will read many records into memory, do an internal sort to create a long run, and then distribute those long runs onto the output set. The step avoids many early passes. For example, an internal sort of 1024 records will save nine passes. The internal sort is often large because it has such a benefit. In fact, there are techniques that can make the initial runs longer than the available internal memory. One of them, the Knuth's 'snowplow' (based on a [[binary heap|binary min-heap]]), generates runs twice as long (on average) as a size of memory used.<ref>{{citation| last=Ferragina| first=Paolo| title=The magic of Algorithms!| chapter=5. Sorting Atomic Items| page=5-4| chapter-url=http://didawiki.cli.di.unipi.it/lib/exe/fetch.php/magistraleinformaticanetworking/ae/ae2018/ferragina_notes_algoeng_2019_vers_5_.pdf| date=2009–2019| archive-url=https://web.archive.org/web/20210512230253/http://didawiki.cli.di.unipi.it/lib/exe/fetch.php/magistraleinformaticanetworking/ae/ae2018/ferragina_notes_algoeng_2019_vers_5_.pdf| archive-date=2021-05-12| url-status=live}}</ref>

With some overhead, the above algorithm can be modified to use three tapes. ''O''(''n'' log ''n'') running time can also be achieved using two [[queue (abstract data type)|queues]], or a [[stack (abstract data type)|stack]] and a queue, or three stacks. In the other direction, using ''k'' > two tapes (and ''O''(''k'') items in memory), we can reduce the number of tape operations in ''O''(log ''k'') times by using a [[k-way merge algorithm|k/2-way merge]].

A more sophisticated merge sort that optimizes tape (and disk) drive usage is the [[polyphase merge sort]].