===Sorted file search time===
Sorting and searching algorithms can be characterized, using [[Big O notation|order notation]], by the number of comparison operations that must be performed. A [[binary search]] of a sorted table with {{mvar|N}} records, for example, can be done in roughly {{math|⌈ log<sub>2</sub> ''N'' ⌉}} comparisons. If the table had 1,000,000 records, then a specific record could be located with at most 20 comparisons: {{math|1=⌈ log<sub>2</sub> (1,000,000) ⌉ = 20}}.
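
The comparison count can be checked with a minimal sketch. The function name <code>binary_search</code> and the use of <code>range(N)</code> as a stand-in for a sorted table of keys are illustrative assumptions, not part of any particular database implementation.

```python
import math

def binary_search(sorted_keys, target):
    """Return (index, comparisons); index is -1 if target is absent."""
    lo, hi = 0, len(sorted_keys) - 1
    comparisons = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1  # one three-way key comparison per probe
        if sorted_keys[mid] == target:
            return mid, comparisons
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, comparisons

N = 1_000_000
keys = range(N)  # assumed stand-in for a sorted table: key i is stored at index i

bound = math.ceil(math.log2(N))  # the bound quoted in the text
index, used = binary_search(keys, N - 1)
print(bound, index, used)
```

Searching for the last key is one of the worst cases for this variant, so the number of comparisons it reports reaches the {{math|⌈ log<sub>2</sub> ''N'' ⌉}} bound.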

Large databases have historically been kept on disk drives. Because of [[seek time]] and rotational delay, reading a record from a disk drive takes far longer than comparing keys once the record is available. The seek time may be 0 to 20 or more milliseconds, and the rotational delay averages about half the rotation period. For a 7200 RPM drive, the rotation period is 8.33 milliseconds. For a drive such as the Seagate ST3500320NS, the track-to-track seek time is 0.8 milliseconds and the average reading seek time is 8.5 milliseconds.<ref>{{cite book |publisher=Seagate Technology LLC |title=Product Manual: Barracuda ES.2 Serial ATA, Rev. F., publication 100468393 |date=2008 |url=http://www.seagate.com/staticfiles/support/disc/manuals/NL35%20Series%20&%20BC%20ES%20Series/Barracuda%20ES.2%20Series/100468393f.pdf |page=6}}</ref> For simplicity, assume reading from disk takes about 10 milliseconds.
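
The figures above can be reproduced with a short calculation. This is a rough sketch: the helper names are hypothetical, transfer time is ignored, and it assumes the 7200 RPM rotation rate and the 8.5 millisecond average seek time describe the same drive.

```python
def rotation_period_ms(rpm):
    """Time for one full platter rotation, in milliseconds."""
    return 60_000 / rpm

def average_access_ms(rpm, avg_seek_ms):
    """Average seek plus average rotational delay (half a rotation).

    Simplifying assumption: data transfer time is ignored.
    """
    return avg_seek_ms + rotation_period_ms(rpm) / 2

period = rotation_period_ms(7200)
access = average_access_ms(7200, avg_seek_ms=8.5)
print(f"{period:.2f} ms per rotation, {access:.2f} ms average access")
# prints "8.33 ms per rotation, 12.67 ms average access"
```

Under these assumptions the average access time is on the order of 10 milliseconds, which is consistent with the rounded value used in the text.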

Locating one record out of a million in the example above would therefore take 20 disk reads at 10 milliseconds per read, or 0.2 seconds.
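
The same estimate can be written as a small function, which makes the dependence on the table size explicit. The function name and the default of 10 milliseconds per read are illustrative assumptions taken from the simplification above.

```python
import math

def unblocked_search_time_ms(num_records, read_ms=10.0):
    """Estimated search time if every comparison costs one disk read."""
    disk_reads = math.ceil(math.log2(num_records))
    return disk_reads * read_ms

estimate = unblocked_search_time_ms(1_000_000)
print(estimate / 1000, "seconds")  # prints "0.2 seconds"
```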

In practice the search time is lower, because individual records are grouped together in a disk '''block'''. A disk block might be 16 kilobytes. If each record is 160 bytes, then 100 records could be stored in each block. The disk read time above was actually for an entire block. Once the disk head is in position, one or more disk blocks can be read with little delay. With 100 records per block, the last 6 or so comparisons ({{math|log<sub>2</sub> 100 ≈ 6.6}}) do not need any disk reads, because they all fall within the last disk block read.
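
The effect of blocking can be illustrated by counting block reads separately from comparisons. The sketch below is a simplified model rather than a description of any real storage engine: it assumes that record {{mvar|i}} has key {{mvar|i}}, that a block holds 16,000 / 160 = 100 records, and that only the most recently read block is kept in memory. The function name is hypothetical.

```python
def search_with_block_reads(num_records, target, records_per_block):
    """Binary search over keys 0..num_records-1.

    Returns (comparisons, block_reads). A block read is counted only
    when a probe lands outside the single buffered block.
    """
    lo, hi = 0, num_records - 1
    comparisons = 0
    block_reads = 0
    buffered_block = None  # assumed one-block buffer
    while lo <= hi:
        mid = (lo + hi) // 2
        block = mid // records_per_block
        if block != buffered_block:
            block_reads += 1  # this probe needs a disk access
            buffered_block = block
        comparisons += 1
        if mid == target:  # record i is assumed to hold key i
            break
        if mid < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return comparisons, block_reads

records_per_block = 16_000 // 160  # 100 records, as in the text
comparisons, block_reads = search_with_block_reads(
    1_000_000, target=999_999, records_per_block=records_per_block
)
print(comparisons, block_reads)
```

For the last record in the table, this model performs 20 comparisons but only 14 block reads: the final probes all fall inside the block that is already buffered. Other targets can give slightly different counts.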

To speed up the search further, the time spent on the first 13 to 14 comparisons (each of which requires a disk access) must be reduced.