article

In mathematics, a multiset (sometimes also called a bag) differs from a set in that each member has a multiplicity, which is a natural number indicating (loosely speaking) how many times it is a member, or perhaps how many memberships it has in the multiset. For example, in the multiset { a, a, b, b, b, c }, the multiplicities of the members a, b, and c are respectively 2, 3, and 1.

Formal definition


Within set theory, a multiset can be formally defined as a pair (A, m) where A is some set and m : AN is a function from A to the set N of (positive) natural numbers. The set A is called the underlying set of elements. For each a in A the multiplicity (that is, number of occurrences) of a is the number m(a).

It is common to write the function m as a set of ordered pairs { (a, m(a)) : aA } — indeed, this is the set-theoretic definition of the function m. For example,

  • the multiset written as { a, b, b } is defined as { (a, 1), (b, 2) },
  • likewise { a, a, b } is defined as { (a, 2), (b, 1) }, and
  • the multiset { a, b } is defined as { (a, 1), (b, 1) }.
  • an indexed family, ( ai ), where i is in some index-set, defines the multiset { ai } , (provided no element occurs more than a finite number of times in the family. Even in an infinite multiset the multiplicities must be finite numbers).

If the set A is finite, the size or length of the multiset (A, m) is the sum of all multiplicities for each element of A:

|(A,m)|=\sum_{a\in A}m(a).
That is the size of multiset (A, m) is counted with multiplicity, while the size of the undelying set A is counted without multiplicity:
|A|=\sum_{a\in A}1.

A submultiset (B, n) of a multiset (A, m) is a subset BA and a function n : BN such that n(a) ≤ m(a).

Examples


One of the most natural and simple examples is the multiset of prime factors of a number n. Here the underlying set of elements is the set of prime divisors of n. For example the number 120 has the prime factorization

120 = 2^3 3^1 5^1
which gives the multiset {2, 2, 2, 3, 5}.

Another is the multiset of solutions of an algebraic equation. Everyone learns in secondary school that a quadratic equation has two solutions, but in some cases they are both the same number. Thus the multiset of solutions of the equation could be { 3, 5 }, or it could be { 4, 4 }. In the latter case it has a solution of multiplicity 2.

Operations


The usual set operations such as union, intersection and Cartesian product can be easily generalized for multisets.

Suppose (A, m) and (B, n) are multisets

  • The union can be defined as (AB, f) where f(x) = max{m(x), n(x)}.
  • The intersection can be defined as (AB, f) where f(x) = min{m(x), n(x)}.
  • The cartesian product can be defined as (A × B, f) where f((x,y)) = m(x)n(y).
  • The join (sometimes called the sum) can be defined as (A,m) \biguplus (B,n) = (A \cup B, f) where f(x) = m(x)+n(x).

Multiset coefficients


The number of submultisets of size k in a set of size n is the multiset coefficient

\left\langle \begin{matrix}n \\ k \end{matrix}\right\rangle = {n + k - 1 \choose n-1}={n+k-1 \choose k},

where the expressions to the right of "=" are binomial coefficients, i.e., the number of such multisets is the same as the number of subsets of size k in a set of size n + k − 1. Unlike the situation with sets, this cardinality will not be 0 when k > n. One simple way to prove this involves representing multisets in the following way. First, consider the notation for multisets that would represent { a, a, a, a, a, a, b, b, c, c, c, d, d, d, d, d, d, d } (6 as, 2 bs, 3 cs, 7 ds) in this form:

\bullet \bullet \bullet \bullet \bullet \bullet \mid \bullet \bullet \mid \bullet \bullet \bullet \mid \bullet \bullet \bullet \bullet \bullet \bullet \bullet

This is a multiset of size 18 made of elements of a set of size 4. The number of characters including both dots and vertical lines used in this notation is 18 + 4 − 1. The number of vertical lines is 4 − 1. The number of multisets of size 18 is then the number of ways to arrange the 4 − 1 vertical lines among the 18 + 4 − 1 characters, and is thus the number of subsets of size 4 − 1 in a set of size 18 + 4 − 1. Equivalently, it is the number of ways to arrange the 18 dots among the 18 + 4 − 1 characters, which is the number of subsets of size 18 of a set of size 18 + 4 − 1. This is

{18+4-1 \choose 4-1}={18+4-1 \choose 18},

so that is the value of the multiset coefficient

\left\langle\begin{matrix} 4 \\ 18 \end{matrix}\right\rangle.

One may define a generalized binomial coefficient

{n \choose k}={n(n-1)(n-2)\cdots(n-k+1) \over k!}

in which n is not required to be a nonnegative integer, but may be negative or a non-integer, or a non-real complex number. (If k = 0, then the value of this coefficient is 1 because it is the product of no numbers.) Then the number of multisets of size k in a set of size n is

\left\langle\begin{matrix} n \\ k \end{matrix}\right\rangle=(-1)^k{-n \choose k}.

This fact led Gian-Carlo Rota to ask "Why are negative sets multisets?". He considered that question worthy of the attention of philosophers of mathematics.

Polynomial notation


The set {x} may be represented by the monomial x. The set of subsets, { {}, {x} } , is represented by the binomial 1 + x.

The set {x,y} may be represented by the monomial xy. The set of subsets, { {}, {x}, {y}, {x,y} } , is represented by the polynomial

(1 + x)(1 + y) = 1 + (x + y) + xy.

The multiset {x,x} may be represented by the monomial xx = x2. The multiset of submultisets, { {}, {x}, {x}, {x,x} } , is represented by the polynomial

(1 + x)2 = 1 + 2x + x2.

The multiset of submultisets of the multiset xn is

(1+x)^n=\sum_{k=0}^n{n \choose k} x^k.
That is why the binomial coefficient count the number of k-combinations of an n-set.

The multiset xKyN−K , containing N elements, K of which are hits, is called a statistical population, and a submultiset is called a statistical sample. The set of samples is

(1 + x)K(1 + y)N−K

which by the binomial theorem equals

\sum_{n=0}^N\sum_{k=0}^K{K \choose k}{N-K \choose n-k}x^ky^{n-k}.

So the number of n-samples with k hits is

{K \choose k}{N-K \choose n-k}

See hypergeometric distribution and inferential statistics for further on the distribution of hits.

The infinite set of finite multisets of elements from the set {x}:

{ {}, {x}, {x,x}, {x,x,x}, ... }
may be represented by the formal power series
S = 1 + x + x2 + x3 + ... = 1 + xS .
The formal solution,
S = (1 − x)−1,
makes sense as a set of multisets, but the intermediate result, 1−x, does not make sense as a set of multisets.

The infinite set of finite multisets of elements from the set xy is

(1 − x)−1(1 − y)−1 = 1 + (x + y) + (x2 + xy + y2) + ...
The special case y=x : The infinite multiset of finite multisets of elements from the multiset x2 is
(1 − x)−2 =  1 + 2x + 3x2 + ...
The general case: The infinite multiset of finite multisets of elements from the multiset xn is
(1-x)^{-n}=\sum_{k=0}^\infty{-n \choose k} (-x)^k .
This explains why "multisets are negative sets". The negative binomial coefficients count the number of k-multisets of elements from an n-set.

Cumulant generating function


A non-negative integer, n, can be represented by the monomial xn . So a finite multiset of non-negative integers, say {2, 2, 2, 3, 5}, can be represented by a polynomial f(x), say 3x2+x3+x5 .

It is convenient to consider the function g(t) = log(f(et)), say g(t) = log(3e2t+e3t+e5t) . The number of elements in the multiset is eg(0) . The mean value of the elements in the multiset is the derivative μ = g '(0) . The variance, i.e. the square of the standard deviation, of the elements in the multiset is σ2 = g ' '(0). The numbers ( μ, σ2, ... ) = ( g '(0), g ' '(0), ... ) are called cumulants, and so g is called the cumulant generating function .

The infinite set of non-negative integers {0,1,2,...} is represented by the power series 1+x+x2+... = (1−x)−1. The mean value and standard deviation are undefined. Nevertheless it has a cumulant-generating function, g(t) = −log(1−et). The derivative of this cumulant-generating function is g '(t) = (et−1)−1.

A multiset of real numbers is conveniently represented by a cumulant-generating function, even if it cannot be represented by a polynomial. See partition function (statistical mechanics) for the case where the numbers in question are the energy levels of a physical system.

The cumulant-generating function of a multiset of n real numbers having mean μ and standard deviation σ is: g(t) = log(n) + μt + 2−1t)2 + ... , and the derivative is simply: g '(t) = μ + σ2t + ...

The cumulant-generating function of a single real number k is g(t) = kt , and the derivative is the number itself: g '(t) = k .

The cumulant-generating function of a constant multiset, {k,k,k,k,...,k} of n elements all equal to the same real number k, is g(t) = log(n)+kt , and the derivative is the number itself: g '(t) = k , irrespective of n.

The cumulant-generating function of the multiset of sums of elements of two multisets of numbers is the sum of the two cumulant-generating functions:

g_{A+B}(t) = \log \sum_{i,j} e^{t(A_i+B_j)} = \log \sum_{i,j} e^{tA_i}e^{tB_j} \,
= \log \sum_i e^{tA_i}\sum_je^{tB_j} = \log \sum_i e^{tA_i}+ \log\sum_je^{tB_j} = g_{A}(t)+g_{B}(t).

The cumulant-generating function of a constant times a multiset of numbers is:

g_{kA}(t) = \log \sum_{i} e^{t(kA_i)} = \log \sum_{i} e^{(tk)A_i} = g_{A}(kt).

The multiset 2A = {2Ai} is not the same multiset as 2×A =A+A = {Ai+Aj}. For example, 2{+1,−1}={+2,−2} while 2×{+1,−1}={+1,−1}+{+1,−1}={+1+1,+1−1,−1+1,−1−1}={+2,0,0,−2}.

g_{k\times A}(t) = k g_{A}(t).

The standard normal distribution is like a limit of big multisets of numbers.

\lim_{k \rarr \infty}k^{-1/2}(k\times \{+1,-1\}).
This limit does not make sense as a multiset of numbers, but the derivative of the cumulant generating functions of the multisets in question makes sense, and the limit is well defined.

\lim_{k \rarr \infty} g'_{k^{-1/2}(k\times \{+1,-1\})}(t)=\lim_{k \rarr \infty} \frac{d(k\log(e^{+tk^{-1/2}}+e^{-tk^{-1/2}}))}{dt}=\lim_{k \rarr \infty} \frac{d(k\log(2)+2^{-1}t^2+...)}{dt}=t.

The constant term vanishes by differentiation. The terms ... vanishes in the limit.

So the derivative of the cumulant generating function of the standard normal distribution is simply g '(t) = t .

Free commutative monoids


There is a connection with the free object concept: the free commutative monoid on a set X can be taken to be the set of finite multisets with elements drawn from X, with the obvious addition operation. Such monoids are also known as (finite) formal sums of elements of X with natural coefficents. Compare free abelian group.

Set theory | Factorial and binomial topics

Multimenge | Multiconjunto | Multiinsieme | Multiset | Multizbiór | Večkratna množica | Мультимножина

 

This article is licensed under the GNU Free Documentation License. It uses material from the "Multiset".

Home Pageartsbusinesscomputersgameshealthhospitalshomekids & teensnewsphysiciansrecreationreferenceregionalscienceshoppingsocietysportsworld