In mathematics, a multiset (sometimes also called a bag) differs from a set in that each member has a multiplicity, which is a natural number indicating (loosely speaking) how many times it is a member, or perhaps how many memberships it has in the multiset. For example, in the multiset { a, a, b, b, b, c }, the multiplicities of the members a, b, and c are respectively 2, 3, and 1.
Within set theory, a multiset can be formally defined as a pair (A, m) where A is some set and m : A → N is a function from A to the set N of (positive) natural numbers. The set A is called the underlying set of elements. For each a in A the multiplicity (that is, number of occurrences) of a is the number m(a).
It is common to write the function m as a set of ordered pairs { (a, m(a)) : a ∈ A } — indeed, this is the set-theoretic definition of the function m. For example,
If the set A is finite, the size or length of the multiset (A, m) is the sum of all multiplicities for each element of A:
A submultiset (B, n) of a multiset (A, m) is a subset B ⊆ A and a function n : B → N such that n(a) ≤ m(a).
One of the most natural and simple examples is the multiset of prime factors of a number n. Here the underlying set of elements is the set of prime divisors of n. For example the number 120 has the prime factorization
Another is the multiset of solutions of an algebraic equation. Everyone learns in secondary school that a quadratic equation has two solutions, but in some cases they are both the same number. Thus the multiset of solutions of the equation could be { 3, 5 }, or it could be { 4, 4 }. In the latter case it has a solution of multiplicity 2.
Suppose (A, m) and (B, n) are multisets
where the expressions to the right of "=" are binomial coefficients, i.e., the number of such multisets is the same as the number of subsets of size k in a set of size n + k − 1. Unlike the situation with sets, this cardinality will not be 0 when k > n. One simple way to prove this involves representing multisets in the following way. First, consider the notation for multisets that would represent { a, a, a, a, a, a, b, b, c, c, c, d, d, d, d, d, d, d } (6 as, 2 bs, 3 cs, 7 ds) in this form:
This is a multiset of size 18 made of elements of a set of size 4. The number of characters including both dots and vertical lines used in this notation is 18 + 4 − 1. The number of vertical lines is 4 − 1. The number of multisets of size 18 is then the number of ways to arrange the 4 − 1 vertical lines among the 18 + 4 − 1 characters, and is thus the number of subsets of size 4 − 1 in a set of size 18 + 4 − 1. Equivalently, it is the number of ways to arrange the 18 dots among the 18 + 4 − 1 characters, which is the number of subsets of size 18 of a set of size 18 + 4 − 1. This is
so that is the value of the multiset coefficient
One may define a generalized binomial coefficient
in which n is not required to be a nonnegative integer, but may be negative or a non-integer, or a non-real complex number. (If k = 0, then the value of this coefficient is 1 because it is the product of no numbers.) Then the number of multisets of size k in a set of size n is
This fact led Gian-Carlo Rota to ask "Why are negative sets multisets?". He considered that question worthy of the attention of philosophers of mathematics.
The set {x,y} may be represented by the monomial xy. The set of subsets, { {}, {x}, {y}, {x,y} } , is represented by the polynomial
The multiset {x,x} may be represented by the monomial xx = x2. The multiset of submultisets, { {}, {x}, {x}, {x,x} } , is represented by the polynomial
The multiset of submultisets of the multiset xn is
The multiset xKyN−K , containing N elements, K of which are hits, is called a statistical population, and a submultiset is called a statistical sample. The set of samples is
which by the binomial theorem equals
So the number of n-samples with k hits is
See hypergeometric distribution and inferential statistics for further on the distribution of hits.
The infinite set of finite multisets of elements from the set {x}:
The infinite set of finite multisets of elements from the set xy is
It is convenient to consider the function g(t) = log(f(et)), say g(t) = log(3e2t+e3t+e5t) . The number of elements in the multiset is eg(0) . The mean value of the elements in the multiset is the derivative μ = g '(0) . The variance, i.e. the square of the standard deviation, of the elements in the multiset is σ2 = g ' '(0). The numbers ( μ, σ2, ... ) = ( g '(0), g ' '(0), ... ) are called cumulants, and so g is called the cumulant generating function .
The infinite set of non-negative integers {0,1,2,...} is represented by the power series 1+x+x2+... = (1−x)−1. The mean value and standard deviation are undefined. Nevertheless it has a cumulant-generating function, g(t) = −log(1−et). The derivative of this cumulant-generating function is g '(t) = (e−t−1)−1.
A multiset of real numbers is conveniently represented by a cumulant-generating function, even if it cannot be represented by a polynomial. See partition function (statistical mechanics) for the case where the numbers in question are the energy levels of a physical system.
The cumulant-generating function of a multiset of n real numbers having mean μ and standard deviation σ is: g(t) = log(n) + μt + 2−1(σt)2 + ... , and the derivative is simply: g '(t) = μ + σ2t + ...
The cumulant-generating function of a single real number k is g(t) = kt , and the derivative is the number itself: g '(t) = k .
The cumulant-generating function of a constant multiset, {k,k,k,k,...,k} of n elements all equal to the same real number k, is g(t) = log(n)+kt , and the derivative is the number itself: g '(t) = k , irrespective of n.
The cumulant-generating function of the multiset of sums of elements of two multisets of numbers is the sum of the two cumulant-generating functions:
The cumulant-generating function of a constant times a multiset of numbers is:
The multiset 2A = {2Ai} is not the same multiset as 2×A =A+A = {Ai+Aj}. For example, 2{+1,−1}={+2,−2} while 2×{+1,−1}={+1,−1}+{+1,−1}={+1+1,+1−1,−1+1,−1−1}={+2,0,0,−2}.
The standard normal distribution is like a limit of big multisets of numbers.
The constant term vanishes by differentiation. The terms ... vanishes in the limit.
So the derivative of the cumulant generating function of the standard normal distribution is simply g '(t) = t .
Set theory | Factorial and binomial topics
Multimenge | Multiconjunto | Multiinsieme | Multiset | Multizbiór | Večkratna množica | Мультимножина
This article is licensed under the GNU Free Documentation License.
It uses material from the
"Multiset".
Home Page • arts • business • computers • games • health • hospitals • home • kids & teens • news • physicians • recreation• reference • regional • science • shopping • society • sports • world