Multiset
Template:Short description Template:About
In mathematics, a multiset (or bag, or mset) is a modification of the concept of a set that, unlike a set,<ref name="Cantor">Template:Cite journal</ref> allows for multiple instances for each of its elements. The number of instances given for each element is called the multiplicity of that element in the multiset. As a consequence, an infinite number of multisets exist that contain only elements Template:Mvar and Template:Mvar, but vary in the multiplicities of their elements:
- The set Template:Math contains only elements Template:Mvar and Template:Mvar, each having multiplicity 1 when Template:Math is seen as a multiset.
- In the multiset Template:Math, the element Template:Mvar has multiplicity 2, and Template:Mvar has multiplicity 1.
- In the multiset Template:Math, Template:Mvar and Template:Mvar both have multiplicity 3.
These objects are all different when viewed as multisets, although they are the same set, since they all consist of the same elements. As with sets, and in contrast to tuples, the order in which elements are listed does not matter in discriminating multisets, so Template:Math and Template:Math denote the same multiset. To distinguish between sets and multisets, a notation that incorporates square brackets is sometimes used: the multiset Template:Math can be denoted by Template:Math.<ref>Template:Cite book</ref>
The cardinality or "size" of a multiset is the sum of the multiplicities of all its elements. For example, in the multiset Template:Math the multiplicities of the members Template:Mvar, Template:Mvar, and Template:Mvar are respectively 2, 3, and 1, and therefore the cardinality of this multiset is 6.
Nicolaas Govert de Bruijn coined the word multiset in the 1970s, according to Donald Knuth.<ref name="Knuth1998">Template:Cite book</ref>Template:Rp However, the concept of multisets predates the coinage of the word multiset by many centuries. Knuth himself attributes the first study of multisets to the Indian mathematician Bhāskarāchārya, who described permutations of multisets around 1150. Other names have been proposed or used for this concept, including list, bunch, bag, heap, sample, weighted set, collection, and suite.<ref name="Knuth1998"/>Template:Rp
HistoryEdit
Wayne Blizard traced multisets back to the very origin of numbers, arguing that "in ancient times, the number n was often represented by a collection of n strokes, tally marks, or units."<ref name="Blizard1989">Template:Cite journal</ref> These and similar collections of objects can be regarded as multisets, because strokes, tally marks, or units are considered indistinguishable. This shows that people implicitly used multisets even before mathematics emerged.
Practical needs for this structure have caused multisets to be rediscovered several times, appearing in literature under different names.<ref name="Blizard1991">Template:Cite journal</ref>Template:Rp For instance, they were important in early AI languages, such as QA4, where they were referred to as bags, a term attributed to Peter Deutsch.<ref name="Rulifson1972">Template:Cite tech report</ref> A multiset has been also called an aggregate, heap, bunch, sample, weighted set, occurrence set, and fireset (finitely repeated element set).<ref name="Blizard1991" />Template:Rp<ref name="Singh2007">Template:Cite journal</ref>
Although multisets were used implicitly from ancient times, their explicit exploration happened much later. The first known study of multisets is attributed to the Indian mathematician Bhāskarāchārya circa 1150, who described permutations of multisets.<ref name="Knuth1998"/>Template:Rp The work of Marius Nizolius (1498–1576) contains another early reference to the concept of multisets.<ref name="Angelelli1965">Template:Cite journal</ref> Athanasius Kircher found the number of multiset permutations when one element can be repeated.<ref>Template:Cite book</ref> Jean Prestet published a general rule for multiset permutations in 1675.<ref>Template:Cite book</ref> John Wallis explained this rule in more detail in 1685.<ref>Template:Cite book</ref>
Multisets appeared explicitly in the work of Richard Dedekind.<ref>Template:Cite book</ref><ref name=syropoulos>Template:Cite conference</ref>
Other mathematicians formalized multisets and began to study them as precise mathematical structures in the 20th century. For example, Hassler Whitney (1933) described generalized sets ("sets" whose characteristic functions may take any integer value: positive, negative or zero).<ref name="Blizard1991" />Template:Rp<ref name="Whitney">Template:Cite journal</ref>Template:Rp Monro (1987) investigated the category Mul of multisets and their morphisms, defining a multiset as a set with an equivalence relation between elements "of the same sort", and a morphism between multisets as a function that respects sorts. He also introduced a multinumber: a function fTemplate:Hairsp(x) from a multiset to the natural numbers, giving the multiplicity of element x in the multiset. Monro argued that the concepts of multiset and multinumber are often mixed indiscriminately, though both are useful.<ref name="Blizard1991" />Template:Rp<ref>Template:Cite journal</ref>
ExamplesEdit
One of the simplest and most natural examples is the multiset of prime factors of a natural number Template:Mvar. Here the underlying set of elements is the set of prime factors of Template:Mvar. For example, the number 120 has the prime factorization <math display="block">120 = 2^3 3^1 5^1,</math> which gives the multiset Template:Math.
A related example is the multiset of solutions of an algebraic equation. A quadratic equation, for example, has two solutions. However, in some cases they are both the same number. Thus the multiset of solutions of the equation could be Template:Math, or it could be Template:Math. In the latter case it has a solution of multiplicity 2. More generally, the fundamental theorem of algebra asserts that the complex solutions of a polynomial equation of degree Template:Mvar always form a multiset of cardinality Template:Mvar.
A special case of the above are the eigenvalues of a matrix, whose multiplicity is usually defined as their multiplicity as roots of the characteristic polynomial. However two other multiplicities are naturally defined for eigenvalues, their multiplicities as roots of the minimal polynomial, and the geometric multiplicity, which is defined as the dimension of the kernel of Template:Math (where Template:Mvar is an eigenvalue of the matrix Template:Mvar). These three multiplicities define three multisets of eigenvalues, which may be all different: Let Template:Mvar be a Template:Math matrix in Jordan normal form that has a single eigenvalue. Its multiplicity is Template:Mvar, its multiplicity as a root of the minimal polynomial is the size of the largest Jordan block, and its geometric multiplicity is the number of Jordan blocks.
DefinitionEdit
A multiset may be formally defined as an ordered pair Template:Math where Template:Mvar is a set called a universe or the underlying set, and <math>m\colon U \to \mathbb{Z}_{\ge 0}</math> is a function from Template:Mvar to the nonnegative integers. The value Template:Tmath for an element Template:Tmath is called the multiplicity of Template:Tmath in the multiset and intepreted as the number of occurences of Template:Tmath in the multiset.
The support of a multiset is the subset of Template:Tmath formed by the elements Template:Tmath such that Template:Tmath. A finite multiset is a multiset with a finite support. Most authors define multisets as finite multisets. This is the case in this article, where, unless otherwise stated, all multisets are finite multisets.
Some authorsTemplate:Who define multisets with the additional constraint that Template:Tmath for every Template:Tmath, or, equivalently, the support equals the underlying set. Multisets with infinite multiplicities have also been studied;<ref>Cf., for instance, Richard Brualdi, Introductory Combinatorics, Pearson.</ref> they are not considered in this article. Some authorsTemplate:Who define a multiset in terms of a finite index set Template:Tmath and a function Template:Tmath where the multiplicity of an element Template:Tmath is Template:Tmath, the number of elements of Template:Tmath that get mapped to Template:Tmath by Template:Tmath.
Multisets may be represented as sets, with some elements repeated. For example, the multiset with support Template:Tmath and multiplicity function such that Template:Tmath can be represented as Template:Math. A more compact notation, in case of high multiplicities is Template:Tmath for the same multiset.
If <math>A=\{a_1, \ldots, a_n\},</math> a multiset with support included in Template:Tmath is often represented as <math display=block>a_1^{m(a_1)} \cdots a_n^{m(a_n)},</math> to which the computation rules of indeterminates can be applied; that is, exponents 1 and factors with exponent 0 can be removed, and the multiset does not depend on the order of the factors. This allows extending the notation to infinite underlying sets as <math display=block>\prod_{a\in U} a^{m(a)}.</math> An advantage of notation is that it allows using the notation without knowing the exact support. For example, the prime factors of a natural number Template:Tmath form a multiset such that <math display=block>n=\prod_{p \;\text {prime}} p^{m(p)}= 2^{m(2)} 3^{m(3)} 5^{m(5)}\cdots.</math>
The finite subsets of a set Template:Tmath are exactly the multisets with underlying set Template:Tmath, such that Template:Tmath for every Template:Tmath.
Basic properties and operationsEdit
Elements of a multiset are generally taken in a fixed set Template:Mvar, sometimes called a universe, which is often the set of natural numbers. An element of Template:Mvar that does not belong to a given multiset is said to have a multiplicity 0 in this multiset. This extends the multiplicity function of the multiset to a function from Template:Mvar to the set <math>\N</math> of non-negative integers. This defines a one-to-one correspondence between these functions and the multisets that have their elements in Template:Mvar.
This extended multiplicity function is commonly called simply the multiplicity function, and suffices for defining multisets when the universe containing the elements has been fixed. This multiplicity function is a generalization of the indicator function of a subset, and shares some properties with it.
The support of a multiset <math>A</math> in a universe Template:Mvar is the underlying set of the multiset. Using the multiplicity function <math>m</math>, it is characterized as <math display="block">\operatorname{Supp}(A) := \{x \in U \mid m_{A}(x) > 0 \}.</math>
A multiset is finite if its support is finite, or, equivalently, if its cardinality <math display="block">|A| = \sum_{x\in\operatorname{Supp}(A)} m_A(x) = \sum_{x\in U} m_A(x)</math> is finite. The empty multiset is the unique multiset with an empty support (underlying set), and thus a cardinality 0.
The usual operations of sets may be extended to multisets by using the multiplicity function, in a similar way to using the indicator function for subsets. In the following, Template:Mvar and Template:Mvar are multisets in a given universe Template:Mvar, with multiplicity functions <math>m_A</math> and <math>m_B.</math>
- Inclusion: Template:Mvar is included in Template:Mvar, denoted Template:Math, if <math display="block">m_A(x) \le m_B(x)\quad\forall x\in U.</math>
- Union: the union (called, in some contexts, the maximum or lowest common multiple) of Template:Mvar and Template:Mvar is the multiset Template:Mvar with multiplicity function<ref name=syropoulos/> <math display="block">m_C(x) = \max(m_A(x),m_B(x))\quad\forall x\in U.</math>
- Intersection: the intersection (called, in some contexts, the infimum or greatest common divisor) of Template:Mvar and Template:Mvar is the multiset Template:Mvar with multiplicity function <math display="block">m_C(x) = \min(m_A(x),m_B(x))\quad\forall x\in U.</math>
- Sum: the sum of Template:Mvar and Template:Mvar is the multiset Template:Mvar with multiplicity function <math display="block">m_C(x) = m_A(x) + m_B(x)\quad\forall x\in U.</math> It may be viewed as a generalization of the disjoint union of sets. It defines a commutative monoid structure on the finite multisets in a given universe. This monoid is a free commutative monoid, with the universe as a basis.
- Difference: the difference of Template:Mvar and Template:Mvar is the multiset Template:Mvar with multiplicity function <math display="block">m_C(x) = \max(m_A(x) - m_B(x), 0)\quad\forall x\in U.</math>
Two multisets are disjoint if their supports are disjoint sets. This is equivalent to saying that their intersection is the empty multiset or that their sum equals their union.
There is an inclusion–exclusion principle for finite multisets (similar to the one for sets), stating that a finite union of finite multisets is the difference of two sums of multisets: in the first sum we consider all possible intersections of an odd number of the given multisets, while in the second sum we consider all possible intersections of an even number of the given multisets.Template:Citation needed
Counting multisetsEdit
and 3-multisets with elements from a 5-set (right)
So this illustrates that <math display="inline"> {7 \choose 3} = \left(\!\!{5 \choose 3}\!\!\right).</math>
Template:See also The number of multisets of cardinality Template:Mvar, with elements taken from a finite set of cardinality Template:Mvar, is sometimes called the multiset coefficient or multiset number. This number is written by some authors as <math>\textstyle\left(\!\!{n\choose k}\!\!\right)</math>, a notation that is meant to resemble that of binomial coefficients; it is used for instance in (Stanley, 1997), and could be pronounced "Template:Mvar multichoose Template:Mvar" to resemble "Template:Mvar choose Template:Mvar" for <math>\tbinom n k.</math> Like the binomial distribution that involves binomial coefficients, there is a negative binomial distribution in which the multiset coefficients occur. Multiset coefficients should not be confused with the multinomial coefficients that occur in the multinomial theorem.
The value of multiset coefficients can be given explicitly as <math display="block">\left(\!\!{n\choose k}\!\!\right) = {n+k-1 \choose k} = \frac{(n+k-1)!}{k!\,(n-1)!} = {n(n+1)(n+2)\cdots(n+k-1)\over k!},</math> where the second expression is as a binomial coefficient;Template:Efn many authors in fact avoid separate notation and just write binomial coefficients. So, the number of such multisets is the same as the number of subsets of cardinality Template:Mvar of a set of cardinality Template:Math. The analogy with binomial coefficients can be stressed by writing the numerator in the above expression as a rising factorial power <math display="block">\left(\!\!{n \choose k}\!\!\right) = {n^{\overline{k}}\over k!},</math> to match the expression of binomial coefficients using a falling factorial power: <math display="block">{n \choose k} = {n^{\underline{k}}\over k!}.</math>
For example, there are 4 multisets of cardinality 3 with elements taken from the set Template:Math of cardinality 2 (Template:Math, Template:Math), namely Template:Math, Template:Math, Template:Math, Template:Math. There are also 4 subsets of cardinality 3 in the set Template:Math of cardinality 4 (Template:Math), namely Template:Math, Template:Math, Template:Math, Template:Math.
One simple way to prove the equality of multiset coefficients and binomial coefficients given above involves representing multisets in the following way. First, consider the notation for multisets that would represent Template:Math (6 Template:Mvars, 2 Template:Mvars, 3 Template:Mvars, 7 Template:Mvars) in this form:
This is a multiset of cardinality Template:Math made of elements of a set of cardinality Template:Math. The number of characters including both dots and vertical lines used in this notation is Template:Math. The number of vertical lines is 4 − 1. The number of multisets of cardinality 18 is then the number of ways to arrange the Template:Math vertical lines among the 18 + 4 − 1 characters, and is thus the number of subsets of cardinality 4 − 1 of a set of cardinality Template:Math. Equivalently, it is the number of ways to arrange the 18 dots among the Template:Math characters, which is the number of subsets of cardinality 18 of a set of cardinality Template:Math. This is <math display="block">{4+18-1 \choose 4-1}={4+18-1 \choose 18} = 1330,</math> thus is the value of the multiset coefficient and its equivalencies: <math display="block">\begin{align} \left(\!\!{4\choose18}\!\!\right) &={21\choose18}=\frac{21!}{18!\,3!}={21\choose3}, \\[1ex] &=\frac{{\color{red}{\mathfrak{ 4\cdot5\cdot6\cdot7\cdot8\cdot9\cdot10\cdot11\cdot12\cdot13\cdot14\cdot15\cdot16\cdot17\cdot18}}} \cdot \mathbf{19\cdot20\cdot21}}{\mathbf{1\cdot2\cdot3} \cdot {\color{red}{\mathfrak{ 4\cdot5\cdot6\cdot7\cdot8\cdot9\cdot10\cdot11\cdot12\cdot13\cdot14\cdot15\cdot16\cdot17\cdot18}}}}, \\[1ex] &=\frac{ 1\cdot2\cdot3\cdot4\cdot5\cdots16\cdot17\cdot18\;\mathbf{\cdot\;19\cdot20\cdot21}} {\,1\cdot2\cdot3\cdot4\cdot5\cdots 16\cdot17\cdot18\;\mathbf{\cdot\;1\cdot2\cdot3\quad}}, \\[1ex] &=\frac{19\cdot20\cdot21}{1\cdot2\cdot3}. \end{align}</math>
From the relation between binomial coefficients and multiset coefficients, it follows that the number of multisets of cardinality Template:Mvar in a set of cardinality Template:Mvar can be written <math display="block">\left(\!\!{n\choose k}\!\!\right)=(-1)^k{-n \choose k}.</math> Additionally, <math display="block">\left(\!\!{n\choose k}\!\!\right)=\left(\!\!{k+1\choose n-1}\!\!\right).</math>
Recurrence relationEdit
A recurrence relation for multiset coefficients may be given as <math display="block">\left(\!\!{n\choose k}\!\!\right) = \left(\!\!{n\choose k - 1}\!\!\right) + \left(\!\!{n-1\choose k}\!\!\right) \quad \mbox{for } n,k>0</math> with <math display="block">\left(\!\!{n \choose 0}\!\!\right) = 1,\quad n\in\N, \quad\mbox{and}\quad \left(\!\!{0 \choose k}\!\!\right) = 0,\quad k>0.</math>
The above recurrence may be interpreted as follows. Let <math>[n] := \{1,\dots,n\}</math> be the source set. There is always exactly one (empty) multiset of size 0, and if Template:Math there are no larger multisets, which gives the initial conditions.
Now, consider the case in which Template:Math. A multiset of cardinality Template:Mvar with elements from Template:Math might or might not contain any instance of the final element Template:Mvar. If it does appear, then by removing Template:Mvar once, one is left with a multiset of cardinality Template:Math of elements from Template:Math, and every such multiset can arise, which gives a total of <math display="block">\left(\!\!{n\choose k - 1}\!\!\right)</math> possibilities.
If Template:Mvar does not appear, then our original multiset is equal to a multiset of cardinality Template:Mvar with elements from Template:Math, of which there are <math display="block">\left(\!\!{n-1\choose k}\!\!\right).</math>
Thus, <math display="block">\left(\!\!{n\choose k}\!\!\right) = \left(\!\!{n\choose k - 1}\!\!\right) + \left(\!\!{n-1\choose k}\!\!\right).</math>
Generating seriesEdit
The generating function of the multiset coefficients is very simple, being <math display="block">\sum_{d=0}^\infty \left(\!\!{n\choose d}\!\!\right)t^d = \frac{1}{(1-t)^n}.</math> As multisets are in one-to-one correspondence with monomials, <math>\left(\!\!{n\choose d}\!\!\right)</math> is also the number of monomials of degree Template:Mvar in Template:Mvar indeterminates. Thus, the above series is also the Hilbert series of the polynomial ring <math>k[x_1, \ldots, x_n].</math>
As <math>\left(\!\!{n\choose d}\!\!\right)</math> is a polynomial in Template:Mvar, it and the generating function are well defined for any complex value of Template:Mvar.
Generalization and connection to the negative binomial seriesEdit
The multiplicative formula allows the definition of multiset coefficients to be extended by replacing Template:Mvar by an arbitrary number Template:Mvar (negative, real, or complex): <math display="block">\left(\!\!{\alpha\choose k}\!\!\right) = \frac{\alpha^{\overline k}}{k!} = \frac{\alpha(\alpha+1)(\alpha+2)\cdots(\alpha+k-1)}{k(k-1)(k-2)\cdots 1}
\quad\text{for } k\in\N \text{ and arbitrary } \alpha.
</math>
With this definition one has a generalization of the negative binomial formula (with one of the variables set to 1), which justifies calling the <math>\left(\!\!{\alpha\choose k}\!\!\right)</math> negative binomial coefficients: <math display="block">(1-X)^{-\alpha} = \sum_{k=0}^\infty \left(\!\!{\alpha\choose k}\!\!\right) X^k.</math>
This Taylor series formula is valid for all complex numbers α and X with Template:Math. It can also be interpreted as an identity of formal power series in X, where it actually can serve as definition of arbitrary powers of series with constant coefficient equal to 1; the point is that with this definition all identities hold that one expects for exponentiation, notably
<math display="block">(1-X)^{-\alpha}(1-X)^{-\beta}=(1-X)^{-(\alpha+\beta)} \quad\text{and}\quad ((1-X)^{-\alpha})^{-\beta}=(1-X)^{-(-\alpha\beta)},</math> and formulas such as these can be used to prove identities for the multiset coefficients.
If Template:Mvar is a nonpositive integer Template:Mvar, then all terms with Template:Math are zero, and the infinite series becomes a finite sum. However, for other values of Template:Mvar, including positive integers and rational numbers, the series is infinite.
ApplicationsEdit
Multisets have various applications.<ref name="Singh2007" /> They are becoming fundamental in combinatorics.<ref name="Aigner1979">Template:Cite book</ref><ref name="Anderson1987">Template:Cite book</ref><ref name="Stanley1997">Template:Cite book</ref><ref name="Stanley1999">Template:Cite book</ref> Multisets have become an important tool in the theory of relational databases, which often uses the synonym bag.<ref name="GrumbachMilo1996">Template:Cite journal</ref><ref name="LibkinWong1994">Template:Cite book</ref><ref name="LibkingWong1995">Template:Cite journal</ref> For instance, multisets are often used to implement relations in database systems. In particular, a table (without a primary key) works as a multiset, because it can have multiple identical records. Similarly, SQL operates on multisets and returns identical records. For instance, consider "SELECT name from Student". In the case that there are multiple records with name "Sara" in the student table, all of them are shown. That means the result of an SQL query is a multiset; if the result were instead a set, the repetitive records in the result set would have been eliminated. Another application of multisets is in modeling multigraphs. In multigraphs there can be multiple edges between any two given vertices. As such, the entity that specifies the edges is a multiset, and not a set.
There are also other applications. For instance, Richard Rado used multisets as a device to investigate the properties of families of sets. He wrote, "The notion of a set takes no account of multiple occurrence of any one of its members, and yet it is just this kind of information that is frequently of importance. We need only think of the set of roots of a polynomial fTemplate:Hairsp(x) or the spectrum of a linear operator."<ref name="Blizard1991" />Template:Rp
GeneralizationsEdit
Different generalizations of multisets have been introduced, studied and applied to solving problems.
- Signed multisets (in which multiplicity of an element can be any integer)<ref name="Blizard1990">Template:Cite journal</ref>
- Real-valued multisets (in which multiplicity of an element can be any real number)<ref>Template:Cite journal</ref>
- Fuzzy multisets<ref name="Yager1986">Template:Cite journal</ref>
- Rough multisets<ref name="Grzymala-Busse1987">Template:Cite book</ref>
- Hybrid sets<ref name="Loeb1992">Template:Cite journal</ref>
- Multisets whose multiplicity is any real-valued step function<ref name="Miyamoto2001">Template:Cite book</ref>
- Soft multisets<ref name="Alkhazaleh2011">Template:Cite journal</ref>
- Soft fuzzy multisets<ref name="Alkhazaleh2012">Template:Cite journal</ref>
- Named sets (unification of all generalizations of sets)<ref>Template:Cite book</ref><ref>Template:Cite journal</ref><ref>Template:Cite arXiv</ref><ref>Template:Cite book</ref>
See alsoEdit
- Frequency (statistics) as multiplicity analog
- Quasi-sets
- Set theory
- Bag-of-words model
- Template:Wikiversity-inline
NotesEdit
ReferencesEdit
<references/>