===Pivoting===
Many methods for selection are based on choosing a special "pivot" element from the input, and using comparisons with this element to divide the remaining <math>n-1</math> input values into two subsets: the set <math>L</math> of elements less than the pivot, and the set <math>R</math> of elements greater than the pivot. The algorithm can then determine where the {{nowrap|<math>k</math>th}} smallest value is to be found, based on a comparison of <math>k</math> with the sizes of these sets. In particular, {{nowrap|if <math>k\le|L|</math>,}} the {{nowrap|<math>k</math>th}} smallest value is {{nowrap|in <math>L</math>,}} and can be found recursively by applying the same selection algorithm {{nowrap|to <math>L</math>.}} {{nowrap|If <math>k=|L|+1</math>,}} then the {{nowrap|<math>k</math>th}} smallest value is the pivot, and it can be returned immediately. In the remaining case, the {{nowrap|<math>k</math>th}} smallest value is {{nowrap|in <math>R</math>,}} and more specifically it is the element in position <math>k-|L|-1</math> {{nowrap|of <math>R</math>.}} It can be found by applying a selection algorithm recursively, seeking the value in this position {{nowrap|in <math>R</math>.{{r|kletar}}}}

As with the related pivoting-based [[quicksort]] algorithm, the partition of the input into <math>L</math> and <math>R</math> may be done by making new collections for these sets, or by a method that partitions a given list or array data type in-place. Details vary depending on how the input collection is {{nowrap|represented.<ref>For instance, Cormen et al. use an in-place array partition, while Kleinberg and Tardos describe the input as a set and use a method that partitions it into two new sets.</ref>}} The time to compare the pivot against all the other values {{nowrap|is <math>O(n)</math>.{{r|kletar}}}} However, pivoting methods differ in how they choose the pivot, which affects how big the subproblems in each recursive call will be.
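The three cases above can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference implementation: the names <code>select</code> and <code>choose_pivot</code> are hypothetical, the input values are assumed to be distinct, and the partition builds two new lists rather than working in place.

```python
def select(values, k, choose_pivot=lambda vals: vals[0]):
    """Return the k-th smallest (1-indexed) element of a list of distinct values.

    choose_pivot is a hypothetical hook for the pivot rule; the default
    (first element) is only a placeholder and can lead to quadratic time.
    """
    if not 1 <= k <= len(values):
        raise ValueError("k must satisfy 1 <= k <= len(values)")
    pivot = choose_pivot(values)
    # Comparing every value against the pivot takes O(n) time.
    L = [v for v in values if v < pivot]   # elements less than the pivot
    R = [v for v in values if v > pivot]   # elements greater than the pivot
    if k <= len(L):
        return select(L, k, choose_pivot)              # answer lies in L
    if k == len(L) + 1:
        return pivot                                   # the pivot is the answer
    return select(R, k - len(L) - 1, choose_pivot)     # answer lies in R


print(select([7, 2, 9, 4, 1], 3))
```

With the placeholder pivot rule, an already sorted input shrinks by only one element per call, which is the quadratic behaviour that better pivot rules are designed to avoid.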
The efficiency of these methods depends greatly on the choice of the pivot. If the pivot is chosen badly, the running time of this method can be as slow {{nowrap|as <math>O(n^2)</math>.{{r|erickson}}}}
*If the pivot were exactly at the median of the input, then each recursive call would have at most half as many values as the previous call, and the total times would add in a [[geometric series]] {{nowrap|to <math>O(n)</math>.}} However, finding the median is itself a selection problem, on the entire original input. Trying to find it by a recursive call to a selection algorithm would lead to an infinite recursion, because the problem size would not decrease in each {{nowrap|call.{{r|kletar}}}}
*[[Quickselect]] chooses the pivot uniformly at random from the input values. It can be described as a [[prune and search]] algorithm,{{r|gootam}} a variant of [[quicksort]] with the same pivoting strategy: where quicksort makes two recursive calls to sort the two subcollections <math>L</math> {{nowrap|and <math>R</math>,}} quickselect makes only one of these two calls. Its [[expected time]] {{nowrap|is <math>O(n)</math>.{{r|clrs|kletar|gootam}}}} For any constant <math>C</math>, the probability that its number of comparisons exceeds <math>Cn</math> is superexponentially small {{nowrap|in <math>C</math>.{{r|devroye}}}}
*The [[Floyd–Rivest algorithm]], a variation of quickselect, chooses a pivot by randomly sampling a subset of <math>r</math> data values, for some sample {{nowrap|size <math>r</math>,}} and then recursively selecting two elements somewhat above and below position <math>rk/n</math> of the sample to use as pivots. With this choice, it is likely that <math>k</math> is sandwiched between the two pivots, so that after pivoting only a small number of data values between the pivots are left for a recursive call. This method can achieve an expected number of comparisons that is {{nowrap|<math>n+\min(k,n-k)+o(n)</math>.{{r|floriv}}}} In their original work, Floyd and Rivest claimed that the <math>o(n)</math> term could be made as small as <math>O(\sqrt n)</math> by a recursive sampling scheme, but the correctness of their analysis has been {{nowrap|questioned.{{r|brown|prt}}}} Instead, more rigorous analysis has shown that a version of their algorithm achieves <math>O(\sqrt{n\log n})</math> for this {{nowrap|term.{{r|knuth}}}} Although the usual analysis of both quickselect and the Floyd–Rivest algorithm assumes the use of a [[true random number generator]], a version of the Floyd–Rivest algorithm using a [[pseudorandom number generator]] seeded with only logarithmically many true random bits has been proven to run in linear time with high probability.{{r|karrag}}
[[File:Mid-of-mid.png|thumb|upright=1.35|Visualization of pivot selection for the [[median of medians]] method. Each set of five elements is shown as a column of dots in the figure, sorted in increasing order from top to bottom. If their medians (the green and purple dots in the middle row) are sorted in increasing order from left to right, and the median of medians is chosen as the pivot, then the <math>3n/10</math> elements in the upper left quadrant will be less than the pivot, and the <math>3n/10</math> elements in the lower right quadrant will be greater than the pivot, showing that many elements will be eliminated by pivoting.]]
*The [[median of medians]] method partitions the input into sets of five elements, and uses some other non-recursive method to find the median of each of these sets in constant time per set. It then recursively calls itself to find the median of these <math>n/5</math> medians. Using the resulting median of medians as the pivot produces a partition with {{nowrap|<math>\max(|L|,|R|)\le 7n/10</math>.}} Thus, a problem on <math>n</math> elements is reduced to two recursive problems on <math>n/5</math> elements (to find the pivot) and at most <math>7n/10</math> elements (after the pivot is used). The total size of these two recursive subproblems is at {{nowrap|most <math>9n/10</math>,}} allowing the total time to be analyzed as a geometric series adding {{nowrap|to <math>O(n)</math>.}} Unlike quickselect, this algorithm is deterministic, not {{nowrap|randomized.{{r|clrs|erickson|bfprt}}}} It was the first linear-time deterministic selection algorithm {{nowrap|known,{{r|bfprt}}}} and is commonly taught in undergraduate algorithms classes as an example of a [[Divide-and-conquer algorithm|divide and conquer]] algorithm that does not divide into two equal {{nowrap|subproblems.{{r|clrs|erickson|gootam|gurwitz}}}} However, the high constant factors in its <math>O(n)</math> time bound make it slower than quickselect in {{nowrap|practice,{{r|skiena|gootam}}}} and slower even than sorting for inputs of moderate {{nowrap|size.{{r|erickson}}}}
*Hybrid algorithms such as [[introselect]] can be used to achieve the practical performance of quickselect with a fallback to median of medians guaranteeing worst-case <math>O(n)</math> {{nowrap|time.{{r|musser}}}}
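Two of the pivot rules listed above, the uniformly random pivot of quickselect and the median of medians, can be sketched in Python as interchangeable functions. This is an illustrative sketch under stated assumptions, not a tuned implementation: the names <code>partition_select</code>, <code>random_pivot</code> and <code>median_of_medians_pivot</code> are hypothetical, the input values are assumed to be distinct, each group of five is handled by sorting it (one possible choice of non-recursive method), and a short final group is allowed.

```python
import random


def partition_select(values, k, choose_pivot):
    """k-th smallest (1-indexed) of a list of distinct values, for a given pivot rule."""
    while True:
        pivot = choose_pivot(values)
        L = [v for v in values if v < pivot]
        R = [v for v in values if v > pivot]
        if k <= len(L):
            values = L                       # continue in L
        elif k == len(L) + 1:
            return pivot                     # pivot is the answer
        else:
            values, k = R, k - len(L) - 1    # continue in R


def random_pivot(values):
    # Quickselect rule: pivot chosen uniformly at random from the input.
    return random.choice(values)


def median_of_medians_pivot(values):
    # Small inputs: take the median directly by sorting.
    if len(values) <= 5:
        return sorted(values)[(len(values) - 1) // 2]
    # Median of each group of (at most) five, found by sorting the group.
    groups = (values[i:i + 5] for i in range(0, len(values), 5))
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    # Recursively select the median of the group medians.
    return partition_select(medians, (len(medians) + 1) // 2,
                            median_of_medians_pivot)


data = random.sample(range(1000), 200)
k = 50
print(partition_select(data, k, random_pivot) == sorted(data)[k - 1])
print(partition_select(data, k, median_of_medians_pivot) == sorted(data)[k - 1])
```

Both rules return the same value for the same input; they differ only in running time guarantees. The random rule is fast in expectation, while the median-of-medians rule is deterministic at the cost of extra work per call. A hybrid in the style of introselect would start with the cheap rule and switch to the deterministic one when progress is poor; that switching logic is not shown here.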