Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
External sorting
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
== Performance == The Sort Benchmark, created by computer scientist [[Jim Gray (computer scientist)|Jim Gray]], compares external sorting algorithms implemented using finely tuned hardware and software. Winning implementations use several techniques: * '''Using parallelism''' ** Multiple disk drives can be used in parallel in order to improve sequential read and write speed. This can be a very cost-efficient improvement: a Sort Benchmark winner in the cost-centric Penny Sort category uses six hard drives in an otherwise midrange machine.<ref>Nikolas Askitis, [http://sortbenchmark.org/ozsort-2010.pdf OzSort 2.0: Sorting up to 252GB for a Penny]</ref> ** Sorting software can use [[Thread (computer science)|multiple threads]], to speed up the process on modern multicore computers. ** Software can use [[asynchronous I/O]] so that one run of data can be sorted or merged while other runs are being read from or written to disk. ** Multiple machines connected by fast network links can each sort part of a huge dataset in parallel.<ref>Rasmussen et al., [http://sortbenchmark.org/tritonsort_2010_May_15.pdf TritonSort]</ref> * '''Increasing hardware speed''' ** Using more RAM for sorting can reduce the number of disk seeks and avoid the need for more passes. ** Fast external memory like [[solid-state drives]] can speed sorts, either if the data is small enough to fit entirely on SSDs or, more rarely, to accelerate sorting SSD-sized chunks in a three-pass sort. ** ''Many'' other factors can affect hardware's maximum sorting speed: CPU speed and number of cores, RAM access latency, input/output bandwidth, disk read/write speed, disk seek time, and others. "Balancing" the hardware to minimize bottlenecks is an important part of designing an efficient sorting system. ** Cost-efficiency as well as absolute speed can be critical, especially in cluster environments where lower node costs allow purchasing more nodes. * '''Increasing software speed''' ** Some Sort Benchmark entrants use a variation on [[radix sort]] for the first phase of sorting: they separate data into one of many "bins" based on the beginning of its value. Sort Benchmark data is random and especially well-suited to this optimization. ** Compacting the input, intermediate files, and output can reduce time spent on I/O, but is not allowed in the Sort Benchmark. ** Because the Sort Benchmark sorts long (100-byte) records using short (10-byte) keys, sorting software sometimes rearranges the keys separately from the values to reduce memory I/O volume.
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)