Editing
Join (SQL)
(section)
==Implementation==
{{multiple image | width = 140 | image1 = Triangle-query-join-query-plan-r-st.svg | alt1 = A query plan for the triangle query R(A, B) ⋈ S(B, C) ⋈ T(A, C) that uses binary joins. It joins S and T first, then joins the result with R. | image2 = Triangle-query-join-query-plan-rs-t.svg | alt2 = A query plan for the triangle query R(A, B) ⋈ S(B, C) ⋈ T(A, C) that uses binary joins. It joins R and S first, then joins the result with T. | footer = Two possible [[query plan]]s for the {{dfni|triangle query}} {{math|R(A, B) ⋈ S(B, C) ⋈ T(A, C)}}; the first joins {{mvar|S}} and {{mvar|T}} first and joins the result with {{mvar|R}}, the second joins {{mvar|R}} and {{mvar|S}} first and joins the result with {{mvar|T}} }}
Much work in database systems has aimed at the efficient implementation of joins, because relational systems commonly call for joins yet face difficulties in optimising their execution. The problem arises because inner joins operate both [[commutative]]ly and [[associative]]ly. In practice, this means that the user merely supplies the list of tables to join and the join conditions to use, and the database system has the task of determining the most efficient way to perform the operation. The choices become more complex as the number of tables involved in a query increases, with each table having different characteristics in record count, average record length (considering NULL fields) and available indexes. WHERE clause filters can also significantly affect query volume and cost.

A [[query optimizer]] determines how to execute a query containing joins. A query optimizer has two basic freedoms:
# '''Join order''': Because joins function commutatively and associatively, the order in which the system joins tables does not change the final result set of the query. However, join order can have an enormous impact on the cost of the join operation, so choosing the best join order becomes very important.
# '''Join method''': Given two tables and a join condition, multiple [[algorithm]]s can produce the result set of the join. Which algorithm runs most efficiently depends on the sizes of the input tables, the number of rows from each table that match the join condition, and the operations required by the rest of the query.

Many join algorithms treat their inputs differently. One can refer to the inputs to a join as the "outer" and "inner" join operands, or "left" and "right", respectively. In the case of nested loops, for example, the database system will scan the entire inner relation for each row of the outer relation.

One can classify query plans involving joins as follows:<ref name="Yu1998">{{Harvnb|Yu|Meng|1998|p=213}}</ref>
; left-deep : using a base table (rather than another join) as the inner operand of each join in the plan
; right-deep : using a base table as the outer operand of each join in the plan
; bushy : neither left-deep nor right-deep; both inputs to a join may themselves result from joins

These names derive from the appearance of the [[query plan]] if drawn as a [[Tree data structure|tree]], with the outer join relation on the left and the inner relation on the right (as convention dictates).

===Join algorithms===
[[Image:Comparison of join algorithms.png|thumb|An illustration of properties of join algorithms.
When performing a join between more than two relations on more than two attributes, binary join algorithms such as [[hash join]] operate over two relations at a time, and join them on all attributes in the join condition; [[Worst-case optimal join algorithm|worst-case optimal algorithms]] such as generic join operate on a single attribute at a time but join all the relations on this attribute.<ref>{{Cite arXiv |last1=Wang |first1=Yisu Remy |last2=Willsey |first2=Max |last3=Suciu |first3=Dan |date=2023-01-27 |title=Free Join: Unifying Worst-Case Optimal and Traditional Joins |class=cs.DB |eprint=2301.10841 }}</ref>]]
Three fundamental algorithms for performing a binary join operation exist: [[nested loop join]], [[sort-merge join]] and [[hash join]]. [[Worst-case optimal join algorithm]]s are asymptotically faster than binary join algorithms for joins between more than two relations in the [[worst case]].

===Join indexes===
Join indexes are [[database index]]es that facilitate the processing of join queries in [[data warehouse]]s. As of 2012, they were available in implementations by [[Oracle database|Oracle]]<ref>Oracle Bitmap Join Indexes. {{cite web |url=https://docs.oracle.com/en/database/oracle/oracle-database/23/cncpt/indexes-and-index-organized-tables.html#GUID-3286EBA4-0D5B-423D-815B-997A3E4B4B6C |title=Database Concepts - 5 Indexes and Index-Organized Tables - Bitmap Join Indexes |access-date=2024-06-23 <!-- |url-status=dead |archive-url=https://web.archive.org/web/20240623094250/https://docs.oracle.com/en/database/oracle/oracle-database/23/cncpt/indexes-and-index-organized-tables.html#GUID-3286EBA4-0D5B-423D-815B-997A3E4B4B6C |archive-date=2024-06-23 --> }}</ref> and [[Teradata]].<ref>Teradata Join Indexes.
{{cite web |url=https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/SQL-Data-Definition-Language-Syntax-and-Examples/Index-Statements/CREATE-JOIN-INDEX |title=SQL Data Definition Language Syntax and Examples - CREATE JOIN INDEX |access-date=2024-06-23 <!-- |url-status=dead |archive-url=At the moment the archiving sites are unable to archive the page. |archive-date=2024-06-23 --> }}</ref> In the Teradata implementation, columns, aggregate functions on columns, or components of date columns from one or more tables are specified using a syntax similar to the definition of a [[database view]]; up to 64 columns or column expressions can be specified in a single join index. Optionally, a column that defines the [[primary key]] of the composite data may also be specified; on parallel hardware, the column values are used to partition the index's contents across multiple disks. When the source tables are updated interactively by users, the contents of the join index are automatically updated. Any query whose WHERE clause specifies only columns or column expressions that form a subset of those defined in a join index (a so-called "covering query") will cause the join index, rather than the original tables and their indexes, to be consulted during query execution.

The Oracle implementation limits itself to using [[bitmap index]]es. A ''bitmap join index'' is used for low-cardinality columns (i.e., columns containing fewer than 300 distinct values, according to the Oracle documentation); it combines low-cardinality columns from multiple related tables. The example Oracle uses is that of an inventory system, where different suppliers provide different parts. The [[database schema|schema]] has three linked tables: two "master tables", Part and Supplier, and a "detail table", Inventory. The last is a many-to-many table linking Supplier to Part, and contains the most rows.
Every part has a Part Type, and every supplier is based in the US and has a State column. There are no more than 60 states and territories in the US, and no more than 300 Part Types. The bitmap join index is defined using a standard three-table join on the three tables above, specifying the Part_Type and Supplier_State columns for the index. However, it is defined on the Inventory table, even though the columns Part_Type and Supplier_State are "borrowed" from Part and Supplier respectively. As with Teradata, an Oracle bitmap join index is only used to answer a query when the query's WHERE clause specifies columns limited to those that are included in the join index.

=== Straight join ===
Some database systems allow the user to force the system to read the tables in a join in a particular order. This is used when the join optimizer chooses to read the tables in an inefficient order. For example, in [[MySQL]] the <code>STRAIGHT_JOIN</code> keyword forces the tables to be read in exactly the order listed in the query.<ref>{{Cite web|title = 13.2.9.2 JOIN Syntax|url = https://dev.mysql.com/doc/refman/5.7/en/join.html|website = MySQL 5.7 Reference Manual|access-date = 2015-12-03|publisher = [[Oracle Corporation]]}}</ref>
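
Why the read order can matter so much may be easier to see in a small experiment. The following Python sketch is illustrative only and does not reflect how MySQL or any other system implements joins: the relations <code>R</code>, <code>S</code> and <code>T</code>, their contents, and the helper names <code>hash_join</code> and <code>run_plan</code> are all assumptions made for the example. It evaluates the same three-way join in two different orders and compares the size of the intermediate result.

```python
from collections import defaultdict


def hash_join(left, right, keys):
    """Natural join of two lists of dict rows on the shared columns `keys`."""
    index = defaultdict(list)
    for row in right:  # build phase: index the right input by join key
        index[tuple(row[k] for k in keys)].append(row)
    out = []
    for row in left:  # probe phase: look up each left row in the index
        for match in index.get(tuple(row[k] for k in keys), []):
            out.append({**row, **match})
    return out


def run_plan(first, second, third, keys1, keys2):
    """Join `first` with `second`, then join that result with `third`.

    Returns the final rows and the number of intermediate rows produced.
    """
    intermediate = hash_join(first, second, keys1)
    return hash_join(intermediate, third, keys2), len(intermediate)


def canonical(rows):
    """Order-independent representation of a result set, for comparison."""
    return sorted(tuple(sorted(r.items())) for r in rows)


# Hypothetical data: R(a, b) and S(b, c) are large and match on every row,
# while T(a, c) holds a single row.
R = [{"a": i, "b": 0} for i in range(100)]
S = [{"b": 0, "c": j} for j in range(100)]
T = [{"a": 1, "c": 1}]

rows_rs_t, size_rs = run_plan(R, S, T, ["b"], ["a", "c"])  # (R join S) join T
rows_rt_s, size_rt = run_plan(R, T, S, ["a"], ["b", "c"])  # (R join T) join S

print(size_rs, size_rt)                               # 10000 1
print(canonical(rows_rs_t) == canonical(rows_rt_s))   # True
```

Both orders return the same single row, but joining <code>R</code> with <code>S</code> first materialises 10,000 intermediate rows, whereas joining <code>R</code> with <code>T</code> first materialises one. A user who knows the data has this shape, and sees the optimizer pick the first order, is in the situation where forcing the read order becomes attractive.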