{{Short description|Relational database theory concept}}
{{about|a concept in relational database theory|functional dependencies in the Haskell programming language|type class}}
{{refimprove|date=October 2012}}
In [[relational database]] theory, a '''functional dependency''' is the following [[Relational database#Constraints|constraint]] between two attribute sets in a [[Relation (database)|relation]]: Given a relation ''R'' and attribute sets ''X'',''Y'' <math>\subseteq</math> ''R'', ''X'' is said to '''functionally determine''' ''Y'' (written ''X'' → ''Y'') if each ''X'' value is associated with precisely one ''Y'' value. ''R'' is then said to satisfy the functional dependency ''X'' → ''Y''. Equivalently, the [[projection (relational algebra)|projection]] <math>\Pi_{X,Y}R</math> is a [[Function (mathematics)|function]], that is, ''Y'' is a function of ''X''.<ref name="HalpinMorgan2008">{{cite book |author1=Terry Halpin |title=Information Modeling and Relational Databases |url=https://books.google.com/books?id=puO_VlbR_x4C&pg=PA140 |year=2008 |publisher=Morgan Kaufmann |isbn=978-0-12-373568-3 |page=140 |edition=2nd}}</ref><ref name="Date2012">{{cite book |author=Chris Date |title=Database Design and Relational Theory: Normal Forms and All That Jazz |url=https://books.google.com/books?id=8jAGhpMSjAcC&pg=PA21 |year=2012 |publisher=O'Reilly Media, Inc. |isbn=978-1-4493-2801-6 |page=21}}</ref> In simple words, if the values for the ''X'' attributes are known (say they are ''x''), then the values for the ''Y'' attributes corresponding to ''x'' can be determined by looking them up in ''any'' [[Tuple#Relational model|tuple]] of ''R'' containing ''x''. Customarily ''X'' is called the ''determinant'' set and ''Y'' the ''dependent'' set. A functional dependency FD: ''X'' → ''Y'' is called ''trivial'' if ''Y'' is a [[subset]] of ''X''.

In other words, a dependency FD: ''X'' → ''Y'' means that the values of ''Y'' are determined by the values of ''X''. Two tuples sharing the same values of ''X'' will necessarily have the same values of ''Y''.
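
The same condition can be stated as a formula. The notation ''t''[''X''], for the restriction of a tuple ''t'' to the attributes in ''X'', is an assumption introduced here for illustration:

<math display="block">R \text{ satisfies } X \to Y \iff \forall t, u \in R:\; t[X] = u[X] \implies t[Y] = u[Y]</math>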

The determination of functional dependencies is an important part of designing databases in the [[relational model]], and in [[database normalization]] and [[denormalization]]. A simple application of functional dependencies is [[Heath's theorem]]; it says that a relation ''R'' over an attribute set ''U'' and satisfying a functional dependency ''X'' → ''Y'' can be safely split in two relations having the [[Lossless-Join Decomposition|lossless-join decomposition]] property, namely into <math>\Pi_{XY}(R)\bowtie\Pi_{XZ}(R) = R</math> where ''Z'' = ''U'' − ''XY'' are the rest of the attributes. ([[set union|Union]]s of attribute sets are customarily denoted by their juxtapositions in database theory.) An important notion in this context is a [[candidate key]], defined as a minimal set of attributes that functionally determine all of the attributes in a relation. The functional dependencies, along with the [[attribute domain]]s, are selected so as to generate constraints that would exclude as much data inappropriate to the [[user domain]] from the system as possible.

A notion of [[logical implication]] is defined for functional dependencies in the following way: a set of functional dependencies <math>\Sigma</math> logically implies another set of dependencies <math>\Gamma</math>, if any relation ''R'' satisfying all dependencies from  <math>\Sigma</math> also satisfies all dependencies from <math>\Gamma</math>; this is usually written <math>\Sigma \models \Gamma</math>. The notion of logical implication for functional dependencies admits a [[soundness|sound]] and [[completeness (logic)|complete]] finite [[axiomatization]], known as [[Armstrong's axioms]].

== Examples ==

=== Cars ===
Suppose one is designing a system to track vehicles and the capacity of their engines. Each vehicle has a unique [[vehicle identification number]] (VIN). One would write ''VIN'' → ''EngineCapacity'' because a vehicle's engine cannot have more than one capacity (assuming, in this case, that each vehicle has only one engine). On the other hand, ''EngineCapacity'' → ''VIN'' is incorrect because there could be many vehicles with the same engine capacity.

This functional dependency may suggest that the attribute EngineCapacity be placed in a relation with [[candidate key]] VIN. However, that may not always be appropriate. For example, if that functional dependency occurs as a result of the [[transitive relation|transitive]] functional dependencies VIN → VehicleModel and VehicleModel → EngineCapacity then that would not result in a normalized relation.

=== Lectures ===
This example illustrates the concept of functional dependency. The situation modelled is that of college students attending one or more lectures, in each of which they are assigned a teaching assistant (TA). Assume further that every student is in some semester and is identified by a unique integer ID.

{| class="wikitable"
|-
! Student ID !! Semester !! Lecture !! TA
|-
| 1234 || 6 || Numerical Methods || John
|-
| 1221 || 4 || Numerical Methods || Smith
|-
| 1234 || 6 || Visual Computing || Bob
|-
| 1201 || 2 || Numerical Methods || Peter
|-
| 1201 || 2 || Physics II || Simon
|}

We notice that whenever two rows in this table feature the same StudentID, they also necessarily have the same Semester values. This basic fact can be expressed by a functional dependency:
* StudentID → Semester.

If a row were added in which the same student had a different Semester value, the functional dependency would no longer hold. The FD is therefore only implied by the data currently in the table; nothing in the table itself rules out values that would invalidate it.

Other nontrivial functional dependencies can be identified, for example:
* {StudentID, Lecture} → TA
* {StudentID, Lecture} → {TA, Semester}

The latter expresses the fact that the set {StudentID, Lecture} is a [[superkey]] of the relation.
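
Whether a given table satisfies a functional dependency can be checked mechanically. The following is a minimal sketch in Python; the function name <code>satisfies</code> and the dictionary representation of rows are assumptions made for illustration, and the data is the table above.

```python
def satisfies(rows, lhs, rhs):
    """Return True if the relation `rows` satisfies the FD lhs -> rhs."""
    seen = {}  # maps each combination of lhs values to the rhs values first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # two rows agree on lhs but differ on rhs
    return True

# The table from this section, one dict per tuple.
lectures = [
    {"StudentID": 1234, "Semester": 6, "Lecture": "Numerical Methods", "TA": "John"},
    {"StudentID": 1221, "Semester": 4, "Lecture": "Numerical Methods", "TA": "Smith"},
    {"StudentID": 1234, "Semester": 6, "Lecture": "Visual Computing", "TA": "Bob"},
    {"StudentID": 1201, "Semester": 2, "Lecture": "Numerical Methods", "TA": "Peter"},
    {"StudentID": 1201, "Semester": 2, "Lecture": "Physics II", "TA": "Simon"},
]

print(satisfies(lectures, ["StudentID"], ["Semester"]))       # True
print(satisfies(lectures, ["StudentID", "Lecture"], ["TA"]))  # True
print(satisfies(lectures, ["Lecture"], ["TA"]))               # False
```

A check of this kind only shows that the current rows do not violate the dependency; it cannot show that the dependency is meant to hold for all future data.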

=== Employee department ===

A classic example of functional dependency is the employee department model. 

{| class="wikitable"
|-
! Employee ID !! Employee name !! Department ID !! Department name
|-
| 0001 || John Doe || 1 || Human Resources
|-
| 0002 || Jane Doe || 2 || Marketing
|-
| 0003 || John Smith || 1 || Human Resources
|-
| 0004 || Jane Goodall || 3 || Sales
|}

This case represents an example where multiple functional dependencies are embedded in a single representation of data. Note that because an employee can only be a member of one department, the unique ID of that employee determines the department.

* Employee ID → Employee Name
* Employee ID → Department ID

In addition to this relationship, the table also has a functional dependency through a non-key attribute:

* Department ID → Department Name

This example demonstrates that although the FD Employee ID → Department ID exists, Employee ID is not a logical key for determining Department name, which depends on Employee ID only transitively, through Department ID. Normalization of the data would recognize all FDs and allow the designer to construct tables and relationships that follow more logically from the data.

== Properties and axiomatization of functional dependencies ==
{{Main article|Armstrong's axioms}}
Given that ''X'', ''Y'', and ''Z'' are sets of attributes in a relation ''R'', one can derive several properties of functional dependencies.  Among the most important are the following, usually called [[Armstrong's axioms]]:<ref name="SilberschatzKorth2010a">{{cite book|author1-link=Abraham Silberschatz|author2-link=Henry F. Korth|author1=Abraham Silberschatz|author2=Henry Korth|author3=S. Sudarshan|title=Database System Concepts|year=2010|publisher=McGraw-Hill|isbn=978-0-07-352332-3|edition=6th|page=339}}</ref>
* '''Reflexivity''': If ''Y'' is a subset of ''X'', then ''X'' → ''Y'' 
* '''Augmentation''': If ''X'' → ''Y'', then ''XZ'' → ''YZ''
* '''Transitivity''': If ''X'' → ''Y'' and ''Y'' → ''Z'', then ''X'' → ''Z''

"Reflexivity" can be weakened to just <math>X \rightarrow \varnothing</math>, i.e. it is an actual [[axiom]], where the other two are proper [[inference rules]], more precisely giving rise to the following rules of syntactic consequence:<ref name="Vardi">M. Y. Vardi. [http://www.cs.rice.edu/~vardi/papers/ttcs87.pdf Fundamentals of dependency theory]. In E. Borger, editor, Trends in Theoretical Computer Science, pages 171–224. Computer Science Press, Rockville, MD, 1987. {{ISBN|0881750840}}</ref>

<math>\vdash X \rightarrow \varnothing</math><br/>
<math>X \rightarrow Y \vdash XZ \rightarrow YZ</math><br/>
<math>X \rightarrow Y, Y \rightarrow Z \vdash X \rightarrow Z</math>.

These three rules are a [[Soundness|sound]] and [[Completeness (logic)|complete]] axiomatization of functional dependencies. This axiomatization is sometimes described as finite because the number of inference rules is finite,<ref name="alice">{{Citation
|last1=Abiteboul
|first1=Serge
|author-link=Serge Abiteboul
|last2=Hull
|first2=Richard B.
|author2-link=Richard B. Hull
|last3=Vianu
|first3=Victor
|author3-link=Victor Vianu
|title=Foundations of Databases
|publisher=Addison-Wesley
|year=1995
|isbn=0-201-53771-0
|url=https://archive.org/details/foundationsofdat0000abit/page/164
|pages=[https://archive.org/details/foundationsofdat0000abit/page/164 164–168]
}}</ref> with the caveat that the axiom and rules of inference are all [[Schema (logic)|schemata]], meaning that the ''X'', ''Y'' and ''Z'' range over all ground terms (attribute sets).<ref name="Vardi"/>

By applying augmentation and transitivity, one can derive two additional rules:
* '''Pseudotransitivity''': If ''X'' → ''Y'' and ''YW'' → ''Z'', then ''XW'' → ''Z''<ref name="SilberschatzKorth2010a"/>
* '''Composition''': If ''X'' → ''Y'' and ''Z'' → ''W'', then ''XZ'' → ''YW''<ref name="Singh2009">{{cite book|author=S. K. Singh|title=Database Systems: Concepts, Design & Applications|url=https://books.google.com/books?id=8PNCKe2SpRwC&pg=PA323|year=2009|orig-year=2006|publisher=Pearson Education India|isbn=978-81-7758-567-4|page=323}}</ref>
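
One possible derivation of these two rules, shown here as a sketch rather than as the derivation given in the cited sources, uses one or two augmentation steps followed by transitivity:

<math display="block">\begin{align}
\textbf{Pseudotransitivity:}\quad & X \to Y \vdash XW \to YW && \text{(augmentation with } W\text{)} \\
& XW \to YW,\; YW \to Z \vdash XW \to Z && \text{(transitivity)} \\
\textbf{Composition:}\quad & X \to Y \vdash XZ \to YZ && \text{(augmentation with } Z\text{)} \\
& Z \to W \vdash YZ \to YW && \text{(augmentation with } Y\text{)} \\
& XZ \to YZ,\; YZ \to YW \vdash XZ \to YW && \text{(transitivity)}
\end{align}</math>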

One can also derive the [[Armstrong's axioms#Additional rules (Secondary Rules)|'''union''' and '''decomposition''']] rules from Armstrong's axioms:<ref name="SilberschatzKorth2010a"/><ref name="Garcia-MolinaUllman2009">{{cite book|author1=Hector Garcia-Molina|author2=Jeffrey D. Ullman|author3=Jennifer Widom|title=Database systems: the complete book|year=2009|publisher=Pearson Prentice Hall|isbn=978-0-13-187325-4|edition=2nd|page=73}} This is sometimes called the splitting/combining rule.</ref>
:''X'' → ''Y'' and ''X'' → ''Z'' [[if and only if]] ''X'' → ''YZ''
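
Both directions can be derived in a few steps. The following is one possible derivation, given as a sketch; it uses the fact that attribute sets are sets, so ''XX'' = ''X'':

<math display="block">\begin{align}
\textbf{Union:}\quad & X \to Y \vdash X \to XY && \text{(augmentation with } X\text{)} \\
& X \to Z \vdash XY \to YZ && \text{(augmentation with } Y\text{)} \\
& X \to XY,\; XY \to YZ \vdash X \to YZ && \text{(transitivity)} \\
\textbf{Decomposition:}\quad & \vdash YZ \to Y && \text{(reflexivity)} \\
& X \to YZ,\; YZ \to Y \vdash X \to Y && \text{(transitivity)}
\end{align}</math>

The dependency ''X'' → ''Z'' follows from ''X'' → ''YZ'' in the same way, using ''YZ'' → ''Z''.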

== Closure ==

=== Closure of functional dependency ===
The closure of a set of attributes is the set of attributes that can be determined from it using the functional dependencies of a given relation. That an attribute belongs to a closure is proved using [[Armstrong's axioms]], i.e. reflexivity, augmentation and transitivity.

Given a relation <math>R</math> and a set <math>F</math> of FDs that hold in <math>R</math>, the closure of <math>F</math> in <math>R</math> (denoted <math>F</math><sup>+</sup>) is the set of all FDs that are logically implied by <math>F</math>.<ref>{{Cite journal|last=Saiedian|first=H.|date=1996-02-01|title=An Efficient Algorithm to Compute the Candidate Keys of a Relational Database Schema|url=https://academic.oup.com/comjnl/article-lookup/doi/10.1093/comjnl/39.2.124|journal=The Computer Journal|language=en|volume=39|issue=2|pages=124–132|doi=10.1093/comjnl/39.2.124|issn=0010-4620}}</ref>

=== Closure of a set of attributes ===
Closure of a set of attributes X with respect to <math>F</math> is the set X<sup>+</sup> of all attributes that are functionally determined by X using <math>F</math><sup>+</sup>.

==== Example ====
Consider the following list of FDs. The closure of A (written A<sup>+</sup>) is computed from them.

# ''A'' → ''B''
# ''B'' → ''C''
# ''AB'' → ''D''

The closure would be as follows:
{{ordered list | list-style-type = lower-alpha
| A → A (by Armstrong's reflexivity)
| A → AB (by 1. and (a))
| A → ABD (by (b), 3, and Armstrong's transitivity)
| A → ABCD (by (c), and 2)
}}
Therefore, A<sup>+</sup> = ABCD. Because A<sup>+</sup> includes every attribute in the relation, A is a [[superkey]].
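
The derivation above can be automated by repeatedly applying every FD whose left side is already in the closure until nothing new is added. The following is a minimal sketch in Python; the function name <code>attribute_closure</code> and the encoding of attribute sets as strings of single-letter attribute names are assumptions made for illustration.

```python
def attribute_closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs.

    Assumption for this sketch: attributes are single letters, so the
    string "AB" stands for the attribute set {A, B}.
    """
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already determined, so is the right side.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

fds = [("A", "B"), ("B", "C"), ("AB", "D")]
print(sorted(attribute_closure("A", fds)))  # ['A', 'B', 'C', 'D']
print(sorted(attribute_closure("B", fds)))  # ['B', 'C']
```

This simple loop may scan the list of FDs several times; faster variants exist, but the result is the same.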

== Covers and equivalence ==

=== Covers ===
'''Definition''': <math>F</math> covers <math>G</math> if every FD in <math>G</math> can be inferred from <math>F</math>. Equivalently, <math>F</math> covers <math>G</math> if <math>G</math><sup>+</sup> &sube; <math>F</math><sup>+</sup>. <br/>
Every set of functional dependencies has a [[canonical cover]].

=== Equivalence of two sets of FDs ===
Two sets of FDs <math>F</math> and <math>G</math> over schema <math>R</math> are equivalent, written <math>F</math> &equiv; <math>G</math>, if <math>F</math><sup>+</sup> = <math>G</math><sup>+</sup>. If <math>F</math> &equiv; <math>G</math>, then <math>F</math> is a cover for <math>G</math> and vice versa. In other words, equivalent sets of functional dependencies are called ''covers'' of each other.
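
Covering and equivalence can be tested without enumerating <math>F</math><sup>+</sup>, because <math>F</math> implies ''X'' → ''Y'' exactly when ''Y'' is contained in the closure of ''X'' under <math>F</math>. The following is a minimal sketch in Python; the function names and the single-letter attribute encoding are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Attribute closure; attribute sets are strings of single-letter names."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """True if `fds` logically implies lhs -> rhs."""
    return set(rhs) <= closure(lhs, fds)

def covers(f, g):
    """True if every FD in g can be inferred from f."""
    return all(implies(f, lhs, rhs) for lhs, rhs in g)

def equivalent(f, g):
    """True if f and g cover each other."""
    return covers(f, g) and covers(g, f)

F = [("A", "B"), ("B", "C")]
G = [("A", "BC"), ("B", "C")]
H = [("A", "C")]
print(equivalent(F, G))            # True
print(covers(F, H), covers(H, F))  # True False
```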

=== Non-redundant covers ===
A set <math>F</math> of FDs is nonredundant if there is no proper subset <math>F'</math> of <math>F</math> with <math>F'</math> &equiv; <math>F</math>. If such an <math>F'</math> exists, <math>F</math> is redundant. <math>F</math> is a nonredundant cover for <math>G</math> if <math>F</math> is a cover for <math>G</math> and <math>F</math> is nonredundant.
<br/>
An alternative characterization of nonredundancy is that <math>F</math> is nonredundant if there is no FD ''X'' → ''Y'' in <math>F</math> such that <math>F </math> - {''X'' → ''Y''} <math>\models</math> ''X'' → ''Y''. Call an FD ''X'' → ''Y'' in <math>F</math> redundant in <math>F</math> if <math>F </math> - {''X'' → ''Y''} <math>\models</math> ''X'' → ''Y''.
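
The alternative characterization suggests a direct procedure: test each FD against the remaining ones and drop it if they already imply it. The following is a minimal sketch in Python; the function names and the single-letter attribute encoding are assumptions made for illustration, and the result can depend on the order in which FDs are examined.

```python
def closure(attrs, fds):
    """Attribute closure; attribute sets are strings of single-letter names."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def nonredundant_cover(fds):
    """Drop FDs one at a time while the remaining ones still imply them."""
    result = list(fds)
    i = 0
    while i < len(result):
        lhs, rhs = result[i]
        rest = result[:i] + result[i + 1:]
        if set(rhs) <= closure(lhs, rest):
            result = rest  # redundant: implied by the other FDs
        else:
            i += 1
    return result

F = [("A", "B"), ("B", "C"), ("A", "C")]
print(nonredundant_cover(F))  # [('A', 'B'), ('B', 'C')]
```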

== Applications to normalization ==

=== Heath's theorem ===
An important property (yielding an immediate application) of functional dependencies is that if ''R'' is a relation with columns named from some set of attributes ''U'' and ''R'' satisfies some functional dependency ''X'' → ''Y'' then <math>R=\Pi_{XY}(R)\bowtie\Pi_{XZ}(R)</math> where ''Z'' = ''U'' − ''XY''. Intuitively, if a functional dependency ''X'' → ''Y'' holds in ''R'', then the relation can be safely split into two relations along the column ''X'' (which is a key for <math>\Pi_{XY}(R)</math>), ensuring that when the two parts are joined back no data is lost, i.e. a functional dependency provides a simple way to construct a [[lossless join decomposition]] of ''R'' in two smaller relations. This fact is sometimes called ''Heath's theorem''; it is one of the early results in database theory.<ref>{{Cite book | last1 = Heath | first1 = I. J. | chapter = Unacceptable file operations in a relational data base | doi = 10.1145/1734714.1734717 | title = Proceedings of the 1971 ACM SIGFIDET (now SIGMOD) Workshop on Data Description, Access and Control - SIGFIDET '71 | pages =  19–33 | year = 1971 | s2cid = 22069259 }} cited in:
* {{cite book|editor=Michael Anshel and William Gewirtz|title=Mathematics of Information Processing: [short Course Held in Louisville, Kentucky, January 23-24, 1984]|chapter-url=https://archive.org/details/mathematicsofinf0034unse/page/23|year=1986|publisher=American Mathematical Soc.|isbn=978-0-8218-0086-7|author=Ronald Fagin and Moshe Y. Vardi|chapter=The Theory of Data Dependencies - A Survey|page=[https://archive.org/details/mathematicsofinf0034unse/page/23 23]}}
*{{cite book|author=C. Date|title=Database in Depth: Relational Theory for Practitioners|url=https://books.google.com/books?id=TR8f5dtnC9IC&pg=PT162|year=2005|publisher=O'Reilly Media, Inc.|isbn=978-0-596-10012-4|page=142}}
</ref>

Heath's theorem effectively says we can pull out the values of ''Y'' from the big relation ''R'' and store them in a separate relation, <math>\Pi_{XY}(R)</math>, which has no repeated values in the columns for ''X'' and is effectively a [[lookup table]] for ''Y'' keyed by ''X''. Consequently, there is only one place to update the ''Y'' corresponding to each ''X'', unlike in the "big" relation ''R'', where there are potentially many copies of each ''X'', each with its own copy of ''Y'', all of which need to be kept synchronized on updates. (This elimination of redundancy is an advantage in [[OLTP]] contexts, where many changes are expected, but not so much in [[OLAP]] contexts, which involve mostly queries.) Heath's decomposition leaves only ''X'' to act as a [[foreign key]] in the remainder of the big table <math>\Pi_{XZ}(R)</math>.
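
The decomposition can be illustrated on the employee table from the examples, taking ''X'' = Department ID and ''Y'' = Department name. The following is a minimal sketch in Python; the helper names <code>relation</code>, <code>project</code> and <code>natural_join</code>, and the encoding of tuples as frozensets of (attribute, value) pairs, are assumptions made for illustration.

```python
def relation(dicts):
    """A relation as a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(d.items()) for d in dicts}

def project(rel, attrs):
    """Projection onto `attrs`; duplicates vanish because the result is a set."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}

def natural_join(r, s):
    """Combine every pair of tuples that agree on all shared attributes."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return out

employees = relation([
    {"EmployeeID": "0001", "EmployeeName": "John Doe",
     "DepartmentID": 1, "DepartmentName": "Human Resources"},
    {"EmployeeID": "0002", "EmployeeName": "Jane Doe",
     "DepartmentID": 2, "DepartmentName": "Marketing"},
    {"EmployeeID": "0003", "EmployeeName": "John Smith",
     "DepartmentID": 1, "DepartmentName": "Human Resources"},
    {"EmployeeID": "0004", "EmployeeName": "Jane Goodall",
     "DepartmentID": 3, "DepartmentName": "Sales"},
])

X, Y = {"DepartmentID"}, {"DepartmentName"}
Z = {"EmployeeID", "EmployeeName"}  # the remaining attributes, U - XY

r_xy = project(employees, X | Y)  # lookup table: one row per department
r_xz = project(employees, X | Z)  # remainder, with DepartmentID kept as the link

print(len(r_xy))                               # 3
print(natural_join(r_xy, r_xz) == employees)   # True
```

The department name "Human Resources" is stored twice in the original table but only once in the projection onto ''XY''.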

Functional dependencies however should not be confused with [[inclusion dependency|inclusion dependencies]], which are the formalism for foreign keys; even though they are used for normalization, functional dependencies express constraints over one relation (schema), whereas inclusion dependencies express constraints between relation schemas in a [[database schema]]. Furthermore, the two notions do not even intersect in the [[classification of dependencies]]: functional dependencies are [[equality-generating dependency|equality-generating dependencies]] whereas inclusion dependencies are [[tuple-generating dependency|tuple-generating dependencies]]. Enforcing referential constraints after relation schema decomposition (normalization) requires a new formalism, i.e. inclusion dependencies. In the decomposition resulting from Heath's theorem, there is nothing preventing the insertion of tuples in <math>\Pi_{XZ}(R)</math> having some value of ''X'' not found in <math>\Pi_{XY}(R)</math>.

=== Normal forms ===
Normal forms are [[database normalization]] levels which determine the "goodness" of a table. Generally, the [[third normal form]] is considered to be a "good" standard for a relational database.{{citation needed|date=December 2012}}

Normalization aims to free the database from update, insertion and deletion anomalies. It also ensures that when a new value is introduced into the relation, it has minimal effect on the database, and thus minimal effect on the applications using the database.{{citation needed|date=December 2012}}

== Irreducible set of functional dependencies ==
A set S of functional dependencies is irreducible if the set has the following three properties:

# The right-hand side of each functional dependency in S contains only one attribute.
# The left-hand side of each functional dependency in S is irreducible: removing any attribute from it would change the content of S (S would lose some information).
# Removing any functional dependency from S would change the content of S.

Sets of functional dependencies with these properties are also called ''canonical'' or ''minimal''. Finding such a set S of functional dependencies that is equivalent to some input set S' is called finding a ''minimal cover'' of S'; this problem can be solved in polynomial time.<ref>{{Cite journal|last1=Maier|first1=David|title=Minimum covers in the relational database model|year=1980|journal=[[Journal of the ACM]]|volume=27 |issue=4 |pages=664–674 |doi=10.1145/322217.322223|s2cid=15789293 |doi-access=free}}{{Closed access}}</ref>
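
The three properties suggest a step-by-step construction: split right-hand sides, shrink left-hand sides, then drop redundant dependencies. The following is a minimal sketch in Python of that textbook-style procedure; it is not claimed to be the algorithm of the cited paper, the function names and single-letter attribute encoding are assumptions made for illustration, and the result can depend on the order in which attributes and FDs are examined.

```python
def closure(attrs, fds):
    """Attribute closure; each FD is (lhs, rhs) with single-letter attributes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_cover(fds):
    # 1. Split right-hand sides so each FD has a single dependent attribute.
    g = [(frozenset(lhs), a) for lhs, rhs in fds for a in rhs]

    # 2. Remove left-hand attributes that are not needed to derive the right side.
    for i in range(len(g)):
        lhs, a = g[i]
        for b in sorted(lhs):
            smaller = lhs - {b}
            if a in closure(smaller, g):
                lhs = smaller
                g[i] = (lhs, a)

    # 3. Remove FDs that are implied by the remaining ones.
    i = 0
    while i < len(g):
        lhs, a = g[i]
        rest = g[:i] + g[i + 1:]
        if a in closure(lhs, rest):
            g = rest
        else:
            i += 1
    return [("".join(sorted(lhs)), a) for lhs, a in g]

F = [("A", "BC"), ("B", "C"), ("AB", "D")]
print(minimal_cover(F))  # [('A', 'B'), ('B', 'C'), ('A', 'D')]
```

In this example, step 2 shortens ''AB'' → ''D'' to ''A'' → ''D'' because ''A'' already determines ''B'', and step 3 drops ''A'' → ''C'' because it follows from ''A'' → ''B'' and ''B'' → ''C''.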

== See also ==
* [[Chase (algorithm)]]
* [[Inclusion dependency]]
* [[Join dependency]]
* [[Multivalued dependency]] (MVD)
* [[Database normalization]]
* [[First normal form]]

== References ==
{{reflist}}

== Further reading ==
* {{cite journal|url=https://forum.thethirdmanifesto.com/wp-content/uploads/asgarosforum/987737/00-efc-further-normalization.pdf|title=Further Normalization of the Data Base Relational Model|first=E. F.|last=Codd|author-link=Edgar F. Codd|place=San Jose, California|journal=ACM Transactions on Database Systems|publisher=[[Association for Computing Machinery]]|date=1972}}

== External links ==
* {{cite web
 | url=http://www.cs.umbc.edu/courses/461/current/burt/lectures/lec14/
 | publisher=[[University of Maryland Baltimore County]] Department of Computer Science and Electrical Engineering
 | author=Gary Burt
 | title=CS 461 (Database Management Systems) lecture notes
 | date=Summer 1999
}}
* {{cite web
 | url=http://www-db.stanford.edu/~ullman/cs345notes/slides01-1.ps
 | title=CS345 Lecture Notes
 | publisher=Stanford University
 | author=Jeffrey D. Ullman
 | format=[[PostScript]]
}}
* {{cite web
 | url=http://www.cs.sfu.ca/CC/354/zaiane/material/notes/Chapter6/node10.html
 | author=Osmar Zaiane 
 | date=June 9, 1998
 | work=CMPT 354 (Database Systems I) lecture notes
 | title=Chapter 6: Integrity constraints
 | publisher=[[Simon Fraser University]] Department of Computing Science
}}

[[Category:Data modeling]]