==Inner join==

An '''inner join''' (or '''join''') requires each row in the two joined tables to have matching column values, and is a commonly used join operation in [[Application software|applications]] but should not be assumed to be the best choice in all situations. Inner join creates a new result table by combining column values of two tables (A and B) based upon the join-predicate. The query compares each row of A with each row of B to find all pairs of rows that satisfy the join-predicate. When the join-predicate is satisfied by matching non-[[Null (SQL)|NULL]] values, column values for each matched pair of rows of A and B are combined into a result row.

The result of the join can be defined as the outcome of first taking the [[cartesian product]] (or [[#Cross join|cross join]]) of all rows in the tables (combining every row in table A with every row in table B) and then returning all rows that satisfy the join predicate. Actual SQL implementations normally use other approaches, such as [[hash join]]s or [[sort-merge join]]s, since computing the Cartesian product is slower and would often require a prohibitively large amount of memory to store.

SQL specifies two different syntactical ways to express joins: the "explicit join notation" and the "implicit join notation". The "implicit join notation" is no longer considered a best practice{{By whom|date=September 2022}}, although database systems still support it.

The "explicit join notation" uses the <code>JOIN</code> keyword, optionally preceded by the <code>INNER</code> keyword, to specify the table to join, and the <code>ON</code> keyword to specify the predicates for the join, as in the following example:

<syntaxhighlight lang="sql">
SELECT employee.LastName, employee.DepartmentID, department.DepartmentName 
FROM employee 
INNER JOIN department ON
employee.DepartmentID = department.DepartmentID;
</syntaxhighlight>

The "implicit join notation" simply lists the tables for joining, in the <code>FROM</code> clause of the <code>SELECT</code> statement, using commas to separate them. Thus it specifies a [[#Cross join|cross join]], and the <code>WHERE</code> clause may apply additional filter-predicates (which function comparably to the join-predicates in the explicit notation).

The following example is equivalent to the previous one, but this time using implicit join notation:

<syntaxhighlight lang=sql>
SELECT employee.LastName, employee.DepartmentID, department.DepartmentName 
FROM employee, department
WHERE employee.DepartmentID = department.DepartmentID;
</syntaxhighlight>

The queries given in the examples above join the employee and department tables using the DepartmentID column of both tables. Where the DepartmentID values of these tables match (i.e. the join-predicate is satisfied), the query combines the ''LastName'', ''DepartmentID'' and ''DepartmentName'' columns from the two tables into a result row. Where the DepartmentID values do not match, no result row is generated.

Thus the result of the [[Query plan|execution]] of the query above will be:

{| class="wikitable"
! Employee.LastName !! Employee.DepartmentID !! Department.DepartmentName
|-
| Robinson || 34 || Clerical
|-
| Jones || 33 || Engineering
|-
| Smith || 34 || Clerical
|-
| Heisenberg || 33 || Engineering
|-
| Rafferty || 31 || Sales
|}

The employee "Williams" and the department "Marketing" do not appear in the query execution results. Neither of these has any matching rows in the other respective table: "Williams" has no associated department, and no employee has the department ID 35 ("Marketing"). Depending on the desired results, this behavior may be a subtle bug, which can be avoided by replacing the inner join with an [[#Outer join|outer join]].
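
The difference can be sketched with sample data consistent with the results above. The inline tables below are an assumption made for illustration (this section does not reproduce the underlying tables), and the common table expression with <code>VALUES</code> is written in the form accepted by systems such as PostgreSQL and SQLite:

<syntaxhighlight lang=sql>
-- Assumed sample data, reconstructed from the results shown above.
WITH employee (LastName, DepartmentID) AS (
    VALUES ('Rafferty', 31), ('Jones', 33), ('Heisenberg', 33),
           ('Robinson', 34), ('Smith', 34), ('Williams', NULL)
),
department (DepartmentID, DepartmentName) AS (
    VALUES (31, 'Sales'), (33, 'Engineering'),
           (34, 'Clerical'), (35, 'Marketing')
)
-- The left outer join keeps every employee row, matched or not.
SELECT employee.LastName, employee.DepartmentID, department.DepartmentName
FROM employee
LEFT OUTER JOIN department
    ON employee.DepartmentID = department.DepartmentID;
</syntaxhighlight>

Under these assumptions the result has six rows: the five shown above plus "Williams", with NULL in both department columns. "Marketing" is still absent, because a left outer join retains unmatched rows only from the left-hand table.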

=== Inner join and NULL values ===
Programmers should take special care when joining tables on columns that can contain [[Null (SQL)|NULL]] values, since NULL will never match any other value (not even NULL itself), unless the join condition explicitly tests the join columns for NULL (for example with <code>IS NULL</code>) in addition to applying the remaining predicate condition(s). An inner join can only be safely used in a database that enforces [[referential integrity]] or where the join columns are guaranteed not to be NULL. Many [[transaction processing]] relational databases rely on [[ACID|atomicity, consistency, isolation, durability]] (ACID) data update standards to ensure data integrity, making inner joins an appropriate choice. However, transaction databases usually also have desirable join columns that are allowed to be NULL.

Many reporting relational databases and [[data warehouse]]s use high-volume [[extract, transform, load]] (ETL) batch updates which make referential integrity difficult or impossible to enforce, resulting in potentially NULL join columns that an SQL query author cannot modify and which cause inner joins to omit data with no indication of an error. The choice to use an inner join depends on the database design and data characteristics. A left outer join can usually be substituted for an inner join when the join columns in one table may contain NULL values.
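
The matching behaviour of NULL can be illustrated with a minimal sketch. The tables <code>a</code> and <code>b</code> and their columns are hypothetical, and the <code>VALUES</code> form is the one accepted by systems such as PostgreSQL and SQLite:

<syntaxhighlight lang=sql>
-- Hypothetical tables, each with one NULL in the join column k.
WITH a (id, k) AS (VALUES (1, 10), (2, NULL)),
     b (id, k) AS (VALUES (7, 10), (8, NULL))
-- Equality alone: NULL = NULL is unknown, so the NULL rows are dropped.
SELECT 'equality only' AS predicate, a.id AS a_id, b.id AS b_id
FROM a INNER JOIN b ON a.k = b.k
UNION ALL
-- Explicit NULL test added: the two NULL rows are paired as well.
SELECT 'NULL-aware', a.id, b.id
FROM a INNER JOIN b
    ON a.k = b.k OR (a.k IS NULL AND b.k IS NULL);
</syntaxhighlight>

The first branch returns only the pair (1, 7). The second returns (1, 7) and (2, 8). Whether pairing NULL with NULL is meaningful depends on what NULL represents in the data.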

Any data column that may be NULL (empty) should never be used as a link in an inner join, unless the intended result is to eliminate the rows with the NULL value. If rows with NULL join columns are to be deliberately removed from the [[result set]], an inner join can be faster than an outer join because the table join and the filtering are done in a single step. Conversely, an inner join can result in extremely slow performance or even a server crash when used in a large-volume query in combination with database functions in an SQL <code>WHERE</code> clause.<ref>Greg Robidoux, "Avoid SQL Server functions in the WHERE clause for Performance", MSSQL Tips, 3 May 2007</ref><ref>Patrick Wolf, "Caution when using PL/SQL functions in a SQL statement", Inside Oracle APEX, 30 November 2006</ref><ref>Gregory A. Larsen, "T-SQL Best Practices - Don't Use Scalar Value Functions in Column List or WHERE Clauses", 29 October 2009</ref> A function in an SQL <code>WHERE</code> clause can result in the database ignoring relatively compact table indexes. The database may read and inner join the selected columns from both tables before reducing the number of rows using the filter that depends on a calculated value, resulting in a large amount of inefficient processing.
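
The effect of a function in the filter can be sketched as follows. The table <code>employee_hire</code>, its <code>HireDate</code> column and the index name are hypothetical, the syntax is standard SQL as accepted by PostgreSQL, and whether an index is actually used depends on the particular optimizer. The same reasoning applies when the filtered table takes part in a join.

<syntaxhighlight lang=sql>
-- Hypothetical table with an indexed date column.
CREATE TABLE employee_hire (
    LastName     VARCHAR(50),
    DepartmentID INTEGER,
    HireDate     DATE
);
CREATE INDEX employee_hire_date_idx ON employee_hire (HireDate);

-- Function applied to the column: an ordinary index on HireDate
-- generally cannot be used to locate the matching rows.
SELECT LastName
FROM employee_hire
WHERE EXTRACT(YEAR FROM HireDate) = 2020;

-- Equivalent range predicate on the bare column, which an optimizer
-- can typically satisfy with the index.
SELECT LastName
FROM employee_hire
WHERE HireDate >= DATE '2020-01-01'
  AND HireDate <  DATE '2021-01-01';
</syntaxhighlight>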

When a result set is produced by joining several tables, including master tables used to look up full-text descriptions of numeric identifier codes (a [[Lookup table]]), a NULL value in any one of the foreign keys can result in the entire row being eliminated from the result set, with no indication of error.  A complex SQL query that includes one or more inner joins and several outer joins has the same risk for NULL values in the inner join link columns.

A commitment to SQL code containing inner joins assumes NULL join columns will not be introduced by future changes, including vendor updates, design changes and bulk processing outside of the application's data validation rules such as data conversions, migrations, bulk imports and merges.

One can further classify inner joins as equi-joins, as natural joins, or as cross-joins.

===Equi-join===
An '''equi-join''' is a specific type of comparator-based join that uses only [[equality (mathematics)|equality]] comparisons in the join-predicate. Using other comparison operators (such as <code>&lt;</code>) disqualifies a join as an equi-join. The query shown above has already provided an example of an equi-join:
<syntaxhighlight lang=sql>
SELECT *
FROM employee JOIN department
  ON employee.DepartmentID = department.DepartmentID;
</syntaxhighlight>

The same equi-join can be written in implicit join notation:

<syntaxhighlight lang=sql>
SELECT *
FROM employee, department
WHERE employee.DepartmentID = department.DepartmentID;
</syntaxhighlight>
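
For contrast, a join whose predicate uses inequality comparisons is not an equi-join. The following sketch assigns each person to a pay band. The tables <code>staff</code> and <code>pay_band</code> and all of their values are invented for illustration, and the <code>VALUES</code> form is the one accepted by systems such as PostgreSQL and SQLite:

<syntaxhighlight lang=sql>
-- Hypothetical data: salaries and non-overlapping salary ranges.
WITH staff (LastName, Salary) AS (
    VALUES ('Rafferty', 42000), ('Jones', 58000), ('Smith', 75000)
),
pay_band (Band, MinSalary, MaxSalary) AS (
    VALUES ('A', 0, 49999), ('B', 50000, 69999), ('C', 70000, 999999)
)
-- The join-predicate uses >= and <=, so this is not an equi-join.
SELECT staff.LastName, pay_band.Band
FROM staff
INNER JOIN pay_band
    ON staff.Salary >= pay_band.MinSalary
   AND staff.Salary <= pay_band.MaxSalary;
</syntaxhighlight>

With this data each row of <code>staff</code> falls into exactly one band: Rafferty in A, Jones in B and Smith in C.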

If columns in an equi-join have the same name, [[SQL-92]] provides an optional shorthand notation for expressing equi-joins, by way of the <code>USING</code> construct:<ref>[http://www.java2s.com/Tutorial/Oracle/0140__Table-Joins/SimplifyingJoinswiththeUSINGKeyword.htm Simplifying Joins with the USING Keyword]</ref>
<syntaxhighlight lang=sql>
SELECT *
FROM employee INNER JOIN department USING (DepartmentID);
</syntaxhighlight>

The <code>USING</code> construct is more than mere [[syntactic sugar]], however, since the result set differs from the result set of the version with the explicit predicate. Specifically, any columns mentioned in the <code>USING</code> list will appear only once, with an unqualified name, rather than once for each table in the join. In the case above, there will be a single <code>DepartmentID</code> column and no <code>employee.DepartmentID</code> or <code>department.DepartmentID</code>.
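
A minimal sketch of this difference, using a two-row subset of the example data written as inline tables (an assumption for illustration, in the <code>VALUES</code> form accepted by systems such as PostgreSQL and SQLite):

<syntaxhighlight lang=sql>
WITH employee (LastName, DepartmentID) AS (
    VALUES ('Rafferty', 31), ('Jones', 33)
),
department (DepartmentID, DepartmentName) AS (
    VALUES (31, 'Sales'), (33, 'Engineering')
)
-- DepartmentID can be referenced without a table qualifier because
-- USING merges the two join columns into a single output column.
SELECT DepartmentID, LastName, DepartmentName
FROM employee
INNER JOIN department USING (DepartmentID);
</syntaxhighlight>

If the join were written with <code>ON employee.DepartmentID = department.DepartmentID</code> instead, the unqualified reference to <code>DepartmentID</code> in the select list would be rejected as ambiguous by these systems.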

The <code>USING</code> clause is not supported by MS SQL Server and Sybase.

===={{anchor|Natural join in inner join}}Natural join====
The natural join is a special case of equi-join. Natural join (⋈) is a [[Binary relation|binary operator]] that is written as (''R'' ⋈ ''S'') where ''R'' and ''S'' are [[relation (database)|relations]].<ref>In [[Unicode]], the bowtie symbol is ⋈ (U+22C8).</ref> The result of the natural join is the set of all combinations of [[tuples]] in ''R'' and ''S'' that are equal on their common attribute names. For example, consider the tables ''Employee'' and ''Dept'' and their natural join:

{| style="margin: 0 auto;" cellpadding="20"
|- valign="top"
|
{| class="wikitable"
 |+           ''Employee''
 |-
 ! Name   !! EmpId !! DeptName
 |-
 | Harry  || 3415  || Finance
 |-
 | Sally  || 2241  || Sales
 |-
 | George || 3401  || Finance
 |-
 | Harriet || 2202 || Sales
|}
||
{| class="wikitable"
 |+           ''Dept''
 |-
 ! DeptName   !! Manager
 |-
 | Finance  || George
 |-
 | Sales  || Harriet
 |-
 | Production || Charles
|}
||
{| class="wikitable"
 |+  ''Employee''&nbsp;<math> \bowtie </math>&nbsp;''Dept''
 |-
 ! Name   !! EmpId !!  DeptName   !! Manager
 |-
 | Harry  || 3415   || Finance  || George
 |-
 | Sally  || 2241   || Sales  || Harriet
 |-
 | George || 3401  || Finance || George
 |-
 | Harriet || 2202 || Sales || Harriet
|}
|}

This can also be used to define [[composition of relations]].  For example, the composition of ''Employee'' and ''Dept'' is their join as shown above, projected on all but the common attribute ''DeptName''.  In [[category theory]], the join is precisely the [[fiber product]].

The natural join is arguably one of the most important operators since it is the relational counterpart of logical AND.  Note that if the same  variable appears in each of two predicates that are connected by AND, then that variable stands for the same thing and both appearances must always be substituted by the same value.  In particular, the natural join allows the combination of relations that are associated by a [[foreign key]]. For example, in the above example a foreign key probably holds from ''Employee''.''DeptName'' to ''Dept''.''DeptName'' and then the natural join of ''Employee'' and ''Dept'' combines all employees with their departments. This works because the foreign key holds between attributes with the same name. If this is not the case such as in the foreign key from ''Dept''.''manager'' to ''Employee''.''Name'' then these columns have to be renamed before the natural join is taken. Such a join is sometimes also referred to as an '''equi-join'''.

More formally the semantics of the natural join are defined as follows:

:<math>R \bowtie S = \left\{ t \cup s \mid t \in R \ \land \ s \in S \ \land \ \mathit{Fun}(t \cup s) \right\}</math>,

where ''Fun'' is a [[Predicate (mathematics)|predicate]] that is true for a [[Relation (mathematics)|relation]] ''r'' [[if and only if]] ''r'' is a function. It is usually required that ''R'' and ''S'' must have at least one common attribute, but if this constraint is omitted, and ''R'' and ''S'' have no common attributes, then the natural join becomes exactly the Cartesian product.

The natural join can be simulated with Codd's primitives as follows. Let ''c''<sub>1</sub>, ..., ''c''<sub>''m''</sub> be the attribute names common to ''R'' and ''S'', ''r''<sub>1</sub>, ..., ''r''<sub>''n''</sub> be the attribute names unique to ''R'' and let ''s''<sub>1</sub>, ..., ''s''<sub>''k''</sub> be the attributes unique to ''S''. Furthermore, assume that the attribute names  ''x''<sub>1</sub>, ..., ''x''<sub>''m''</sub> are neither in ''R'' nor in ''S''. In a first step the common attribute names in ''S'' can now be renamed:

:<math>T = \rho_{x_1/c_1,\ldots,x_m/c_m}(S) = \rho_{x_1/c_1}(\rho_{x_2/c_2}(\ldots\rho_{x_m/c_m}(S)\ldots))</math>

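Then we take the Cartesian product and select the tuples that are to be joined:

:<math>P = \sigma_{c_1=x_1,\ldots,c_m=x_m}(R \times T) = \sigma_{c_1=x_1}(\sigma_{c_2=x_2}(\ldots\sigma_{c_m=x_m}(R \times T)\ldots))</math>

Finally we take a projection to get rid of the renamed attributes:

:<math>U = \pi_{r_1,\ldots,r_n,c_1,\ldots,c_m,s_1,\ldots,s_k}(P)</math>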
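
The rename, product, selection and projection steps can be sketched in SQL for the ''Employee'' and ''Dept'' relations shown earlier. The name <code>DeptNameX</code> for the renamed attribute is an arbitrary choice, the <code>VALUES</code> form is the one accepted by systems such as PostgreSQL and SQLite, and SQL tables are multisets rather than sets, so this is an illustration of the construction, not an exact counterpart of the algebra:

<syntaxhighlight lang=sql>
WITH Employee (Name, EmpId, DeptName) AS (
    VALUES ('Harry', 3415, 'Finance'), ('Sally', 2241, 'Sales'),
           ('George', 3401, 'Finance'), ('Harriet', 2202, 'Sales')
),
Dept (DeptName, Manager) AS (
    VALUES ('Finance', 'George'), ('Sales', 'Harriet'),
           ('Production', 'Charles')
),
-- Rename: T is Dept with the common attribute DeptName renamed.
T (DeptNameX, Manager) AS (
    SELECT DeptName, Manager FROM Dept
),
-- Product and selection: P keeps the pairs that agree on the
-- common attribute.
P AS (
    SELECT Employee.Name, Employee.EmpId, Employee.DeptName,
           T.DeptNameX, T.Manager
    FROM Employee CROSS JOIN T
    WHERE Employee.DeptName = T.DeptNameX
)
-- Projection: U drops the renamed attribute DeptNameX.
SELECT Name, EmpId, DeptName, Manager
FROM P;
</syntaxhighlight>

The query returns the same four rows as the ''Employee''&nbsp;⋈&nbsp;''Dept'' table above.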

A [[natural join]] is a type of equi-join where the '''join''' predicate arises implicitly by comparing all columns in both tables that have the same column-names in the joined tables. The resulting joined table contains only one column for each pair of equally named columns. In the case that no columns with the same names are found, the result is a [[cross join]].

Some practitioners consider <code>NATURAL JOIN</code>s dangerous and discourage their use.<ref>[http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:13430766143199 Ask Tom "Oracle support of ANSI joins."] [http://awads.net/wp/2006/03/20/back-to-basics-inner-joins/#comment-2837 Back to basics: inner joins » Eddie Awad's Blog] {{Webarchive|url=https://web.archive.org/web/20101119182541/http://awads.net/wp/2006/03/20/back-to-basics-inner-joins/#comment-2837 |date=2010-11-19 }}</ref> The danger comes from inadvertently adding a new column that has the same name as a column in the other table. An existing natural join might then "naturally" use the new column for comparisons, matching rows on different criteria (from different columns) than before. Thus an existing query could produce different results even though the existing data in the tables have not been changed, only augmented. Relying on column names to determine table links automatically is impractical in large databases with hundreds or thousands of tables, where it would place an unrealistic constraint on naming conventions. Real-world databases are commonly designed with [[foreign key]] data that is not consistently populated (NULL values are allowed), due to business rules and context. It is also common practice to give similar data different column names in different tables, and this lack of rigid consistency limits the practical usefulness of natural joins.
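
The risk can be sketched by supposing that a <code>City</code> column is later added to both tables. The column and its values are hypothetical, and the <code>VALUES</code> form is the one accepted by systems such as PostgreSQL and SQLite:

<syntaxhighlight lang=sql>
-- Hypothetical: both tables have gained a City column.
WITH employee (LastName, DepartmentID, City) AS (
    VALUES ('Rafferty', 31, 'Austin'), ('Jones', 33, 'Boston')
),
department (DepartmentID, DepartmentName, City) AS (
    VALUES (31, 'Sales', 'Austin'), (33, 'Engineering', 'Chicago')
)
-- NATURAL JOIN now compares both DepartmentID and City,
-- so Jones (Boston) no longer matches Engineering (Chicago).
SELECT *
FROM employee NATURAL JOIN department;
</syntaxhighlight>

With this data only the "Rafferty" row is returned. Before the <code>City</code> columns existed, the same query text would have matched on <code>DepartmentID</code> alone and returned both employees.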

The above sample query for inner joins can be expressed as a natural join in the following way:
<syntaxhighlight lang=sql>
SELECT *
FROM employee NATURAL JOIN department;
</syntaxhighlight>

As with the explicit <code>USING</code> clause, only one DepartmentID column occurs in the joined table, with no qualifier:

{| class="wikitable" style="text-align:center"
|-
! DepartmentID !! Employee.LastName !! Department.DepartmentName
|-
| 34 || Smith || Clerical
|-
| 33 || Jones || Engineering
|-
| 34 || Robinson || Clerical
|-
| 33 || Heisenberg || Engineering
|-
| 31 || Rafferty || Sales
|}

PostgreSQL, MySQL and Oracle support natural joins; Microsoft T-SQL and IBM DB2 do not. The columns used in the join are implicit so the join code does not show which columns are expected, and a change in column names may change the results. In the [[SQL:2011]] standard, natural joins are part of the optional F401, "Extended joined table", package.

In many database environments the column names are controlled by an outside vendor, not the query developer. A natural join assumes stability and consistency in column names, which can change during vendor-mandated version upgrades.