==== Splitting an overflowing node ====
Since redistributing all objects of a node into two nodes has an exponential number of options, a heuristic needs to be employed to find a good split. In the classic R-tree, Guttman proposed two such heuristics, called QuadraticSplit and LinearSplit. In quadratic split, the algorithm searches for the pair of rectangles that is the worst combination to have in the same node, and puts them as initial objects into the two new groups. It then repeatedly searches for the entry that has the strongest preference for one of the groups (in terms of area increase) and assigns it to that group, until all objects are assigned (while satisfying the minimum fill).
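
The quadratic split described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the rectangle layout <code>(xmin, ymin, xmax, ymax)</code>, the function names, and the tie-breaking order (smaller area increase, then smaller area, then fewer entries) are assumptions made for this example.

```python
from itertools import combinations

# Rectangles are (xmin, ymin, xmax, ymax) tuples; this layout is an assumption.

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def union(a, b):
    """Smallest rectangle enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def enlargement(mbr, r):
    """Area increase of mbr if r were added to it."""
    return area(union(mbr, r)) - area(mbr)

def quadratic_split(rects, min_fill):
    """Split a list of at least two rectangles into two groups."""
    # Seed selection: the pair that would waste the most area in one node.
    def waste(pair):
        a, b = rects[pair[0]], rects[pair[1]]
        return area(union(a, b)) - area(a) - area(b)

    i, j = max(combinations(range(len(rects)), 2), key=waste)
    groups = [[rects[i]], [rects[j]]]
    mbrs = [rects[i], rects[j]]
    rest = [r for k, r in enumerate(rects) if k not in (i, j)]

    while rest:
        # Minimum fill: if a group needs every remaining entry, hand them over.
        short = [g for g in (0, 1) if len(groups[g]) + len(rest) <= min_fill]
        if short:
            groups[short[0]].extend(rest)
            break
        # Next entry: the one with the strongest preference for one group,
        # measured as the difference between the two area increases.
        nxt = max(rest, key=lambda r: abs(enlargement(mbrs[0], r)
                                          - enlargement(mbrs[1], r)))
        rest.remove(nxt)
        # Assign it to the group that grows least (ties: smaller, then emptier).
        g = min((0, 1), key=lambda g: (enlargement(mbrs[g], nxt),
                                       area(mbrs[g]), len(groups[g])))
        groups[g].append(nxt)
        mbrs[g] = union(mbrs[g], nxt)
    return groups[0], groups[1]
```

Seed selection examines every pair of entries, which is where the quadratic cost in the number of entries per node comes from.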

There are other splitting strategies such as Greene's Split,<ref name="greene">{{Cite book | last1 = Greene | first1 = D. | chapter = An implementation and performance analysis of spatial data access methods | doi = 10.1109/ICDE.1989.47268 | title = [1989] Proceedings. Fifth International Conference on Data Engineering | pages = 606–615 | year = 1989 | isbn = 978-0-8186-1915-1 | s2cid = 7957624 }}</ref> the [[R*-tree]] splitting heuristic<ref name="rstar">{{Cite book | last1 = Beckmann | first1 = N. | last2 = Kriegel | first2 = H. P. | author-link2 = Hans-Peter Kriegel| last3 = Schneider | first3 = R. | last4 = Seeger | first4 = B. | chapter = The R*-tree: an efficient and robust access method for points and rectangles | doi = 10.1145/93597.98741 | title = Proceedings of the 1990 ACM SIGMOD international conference on Management of data – SIGMOD '90 | pages = 322 | year = 1990 | isbn = 978-0897913652 | chapter-url = http://dbs.mathematik.uni-marburg.de/publications/myPapers/1990/BKSS90.pdf| citeseerx = 10.1.1.129.3731 | s2cid = 11567855 }}</ref> (which also tries to minimize overlap, but additionally prefers square-like pages) or the linear split algorithm proposed by Ang and Tan<ref name="ang-tan">{{cite conference | last1= Ang | first1 = C. H. | last2 = Tan | first2 = T. C. | editor1-first = Michel | editor1-last = Scholl | editor2-first = Agnès | editor2-last = Voisard | year = 1997 | title = New linear node splitting algorithm for R-trees | book-title = Proceedings of the 5th International Symposium on Advances in Spatial Databases (SSD '97), Berlin, Germany, July 15–18, 1997 | volume = 1262 |series=Lecture Notes in Computer Science | publisher=Springer | pages = 337–349 | doi = 10.1007/3-540-63238-7_38}}</ref> (which, however, can produce very irregular rectangles that perform worse for many real-world range and window queries).
In addition to having a more advanced splitting heuristic, the [[R*-tree]] also tries to avoid splitting a node by reinserting some of the node members, which is similar to the way a [[B-tree]] balances overflowing nodes. This was shown to also reduce overlap and thus increase tree performance.
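
How an overflowing node might choose which members to reinsert can be sketched as follows. The selection rule used here (reinsert the entries whose centers lie farthest from the center of the node's bounding rectangle) and the default fraction of 30% are assumptions made for this illustration, as are the function names and the <code>(xmin, ymin, xmax, ymax)</code> rectangle layout.

```python
import math

def center(r):
    """Center point of a rectangle given as (xmin, ymin, xmax, ymax)."""
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)

def bounding_rect(rects):
    """Smallest rectangle enclosing all rectangles in the list."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def pick_reinsert(entries, fraction=0.3):
    """Return (kept, to_reinsert) for an overflowing node.

    The `fraction` of entries farthest from the node center is removed;
    the caller would then insert them into the tree again from the root.
    """
    c = center(bounding_rect(entries))
    by_distance = sorted(entries,
                         key=lambda r: math.dist(center(r), c),
                         reverse=True)
    p = max(1, int(len(entries) * fraction))  # always remove at least one
    return by_distance[p:], by_distance[:p]
```

Removing the outlying entries shrinks the node's bounding rectangle, and reinserting them gives the tree a chance to place them in a better-fitting node; a split is only needed if the overflow persists.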

Finally, the [[X-tree]]<ref name="xtree2">{{Cite journal| first1 = Stefan | last1 = Berchtold| first2 = Daniel A. | last2 = Keim| first3 = Hans-Peter | last3 = Kriegel| author3-link = Hans-Peter Kriegel| title = The X-Tree: An Index Structure for High-Dimensional Data| journal = Proceedings of the 22nd VLDB Conference| place = Mumbai, India| year = 1996| pages = 28–39| url = http://www.dbs.ifi.lmu.de/Publikationen/Papers/x-tree.ps}}</ref> can be seen as an R*-tree variant that can also decide not to split a node, but to construct a so-called super-node containing all the extra entries, when it does not find a good split (in particular for high-dimensional data).

{{Gallery
|title=Effect of different splitting heuristics on a database with US postal districts
|width=300 | height=300 | align=center |File:R-tree_with_Guttman's_quadratic_split.png|Guttman's quadratic split.<ref name="guttman" /><br />Pages in this tree overlap a lot.
|File:R-tree built with Guttman's linear split.png|Guttman's linear split.<ref name="guttman" /><br />Even worse structure, but also faster to construct.
|File:R-tree built with Greenes Split.png|Greene's split.<ref name="greene" /> Pages overlap much less than with Guttman's strategy.
|File:R-tree built with Ang-Tan linear split.png|Ang-Tan linear split.<ref name="ang-tan" /><br />This strategy produces sliced pages, which often yield bad query performance.
|File:R*-tree built using topological split.png |[[R* tree]] topological split.<ref name="rstar" /><br /> The pages overlap much less since the R*-tree tries to minimize page overlap, and the reinsertions further optimized the tree. The split strategy prefers square-like pages, which yields better performance for common map applications.
|File:R*-tree bulk loaded with sort-tile-recursive.png|Bulk loaded [[R* tree]] using Sort-Tile-Recursive (STR).<br />The leaf pages do not overlap at all, and the directory pages overlap only slightly. This is a very efficient tree, but it requires the data to be completely known beforehand.
|File:M-tree built with MMRad split.png|[[M-tree]]s are similar to the R-tree, but use nested spherical pages.<br />Splitting these pages is, however, much more complicated and pages usually overlap much more.
}}