In probability theory and, in particular, information theory, the mutual information, or transinformation, of two random variables is a quantity that measures the mutual dependence of the two variables. The most common unit of measurement of mutual information is the bit, in which case the logarithms below should be taken to the base 2.
Intuitively, mutual information measures the information that X and Y share. If X and Y are independent, then X contains no information about Y and vice versa, so their mutual information is zero. If X and Y are identical, then all information conveyed by X is shared with Y: knowing X determines Y completely and vice versa, so the mutual information is the same as the information conveyed by X (or Y) alone, namely the entropy of X. In a specific sense (see below), mutual information quantifies the distance between the joint distribution of X and Y and the product of their marginal distributions.
Formally, the mutual information of two discrete random variables X and Y can be defined as:
$$I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$$
where p(x,y) is the joint probability distribution function of X and Y, and p(x) and p(y) are the marginal probability distribution functions of X and Y respectively.
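As an illustration of the discrete definition, the following minimal sketch computes the mutual information from a joint probability table by summing p(x,y) log[p(x,y) / (p(x) p(y))] over all pairs. The function name `mutual_information`, the nested-list layout, and the example table are assumptions made for this illustration; terms with p(x,y) = 0 are skipped, following the usual convention that 0 log 0 = 0.

```python
import math

def mutual_information(joint, base=2.0):
    """Mutual information I(X;Y) of a joint pmf given as a nested list.

    joint[i][j] is p(x_i, y_j); entries must be nonnegative and sum to 1.
    With base=2 the result is in bits.
    """
    px = [sum(row) for row in joint]          # marginal p(x)
    py = [sum(col) for col in zip(*joint)]    # marginal p(y)
    total = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0.0:                     # 0 * log 0 is taken as 0
                total += pxy * math.log(pxy / (px[i] * py[j]), base)
    return total

# Hypothetical example: two fair binary variables that agree 80% of the time.
joint = [[0.4, 0.1],
         [0.1, 0.4]]
mi_bits = mutual_information(joint)
print(f"I(X;Y) = {mi_bits:.4f} bits")
```

For this table the result is about 0.278 bits, well below the 1 bit that two identical fair binary variables would share.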
In the continuous case, the summation is replaced by a definite double integral:
$$I(X;Y) = \int_Y \int_X p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)} \, dx \, dy$$
where p(x,y) is now the joint probability density function of X and Y, and p(x) and p(y) are the marginal probability density functions of X and Y respectively.
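The continuous definition can be checked numerically on a case with a known answer. The sketch below is an illustration that is not part of the original text: it assumes a bivariate normal distribution with zero means, unit variances, and correlation coefficient rho, for which the mutual information in nats has the closed form −½ ln(1 − rho²). It approximates the double integral with a midpoint rule on a square grid; the function name, grid size, and integration bounds are arbitrary choices for this example.

```python
import math

def gaussian_mi_numeric(rho, half_width=8.0, n=400):
    """Midpoint-rule estimate of I(X;Y) in nats for a standard bivariate normal."""
    h = 2.0 * half_width / n                      # grid spacing
    one_minus_r2 = 1.0 - rho * rho
    norm_joint = 1.0 / (2.0 * math.pi * math.sqrt(one_minus_r2))
    total = 0.0
    for i in range(n):
        x = -half_width + (i + 0.5) * h
        for j in range(n):
            y = -half_width + (j + 0.5) * h
            q = (x * x - 2.0 * rho * x * y + y * y) / one_minus_r2
            pxy = norm_joint * math.exp(-0.5 * q)  # joint density p(x,y)
            # ln[p(x,y) / (p(x) p(y))], simplified analytically
            log_ratio = -0.5 * math.log(one_minus_r2) - 0.5 * q + 0.5 * (x * x + y * y)
            total += pxy * log_ratio * h * h
    return total

rho = 0.6
numeric = gaussian_mi_numeric(rho)
closed_form = -0.5 * math.log(1.0 - rho * rho)
print(f"numeric = {numeric:.6f} nats, closed form = {closed_form:.6f} nats")
```

The two values should agree closely, since the integrand is smooth and the density is negligible outside the chosen grid.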
Mutual information is a measure of independence in the following sense: I(X;Y) = 0 if and only if X and Y are independent random variables. This is easy to see in one direction: if X and Y are independent, then p(x,y) = p(x) × p(y), and therefore:
$$\log \frac{p(x,y)}{p(x)\,p(y)} = \log 1 = 0$$
Moreover, mutual information is nonnegative (i.e. I(X;Y) ≥ 0; see below) and symmetric (i.e. I(X;Y) = I(Y;X)).
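The independence and symmetry properties can be checked on small tables. In the sketch below, which uses made-up distributions chosen only for illustration, an independent joint distribution is built as the outer product of two marginals, and symmetry is tested by transposing a dependent joint table, which swaps the roles of X and Y.

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits for a joint pmf given as joint[i][j] = p(x_i, y_j)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(
        pxy * math.log2(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0.0
    )

# Hypothetical marginals; the independent joint is their outer product.
px = [0.3, 0.7]
py = [0.2, 0.5, 0.3]
independent = [[a * b for b in py] for a in px]
mi_independent = mutual_information(independent)

# A dependent joint distribution and its transpose (roles of X and Y swapped).
dependent = [[0.10, 0.25, 0.05],
             [0.10, 0.25, 0.25]]
transposed = [list(col) for col in zip(*dependent)]
mi_xy = mutual_information(dependent)
mi_yx = mutual_information(transposed)

print(f"independent: {mi_independent:.2e} bits")
print(f"I(X;Y) = {mi_xy:.6f} bits, I(Y;X) = {mi_yx:.6f} bits")
```

Up to floating-point rounding, the independent case gives zero and the two orderings give the same positive value.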
Several generalizations of mutual information to more than two random variables have been proposed, but no widely agreed-upon definition has yet emerged.
Mutual information can be equivalently expressed as
$$I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y)$$
where H(X) and H(Y) are entropies, H(X|Y) and H(Y|X) are conditional entropies, and H(X,Y) is the joint entropy of X and Y. Since H(X) ≥ H(X|Y), this characterization is consistent with the nonnegativity property stated above.
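The entropy-based expressions H(X) − H(X|Y), H(Y) − H(Y|X), and H(X) + H(Y) − H(X,Y) can be compared numerically. The following sketch uses a hypothetical 2×2 joint table and computes the conditional entropies directly from the conditional probabilities, rather than from the other entropies, so that the agreement between the three forms is not built in by construction.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

# Hypothetical joint pmf: joint[i][j] = p(x_i, y_j)
joint = [[0.4, 0.1],
         [0.1, 0.4]]
px = [sum(row) for row in joint]
py = [sum(col) for col in zip(*joint)]

h_x = entropy(px)
h_y = entropy(py)
h_xy = entropy(p for row in joint for p in row)   # joint entropy H(X,Y)

# H(X|Y) = -sum p(x,y) log p(x|y), with p(x|y) = p(x,y) / p(y)
h_x_given_y = -sum(
    pxy * math.log2(pxy / py[j])
    for row in joint
    for j, pxy in enumerate(row)
    if pxy > 0.0
)
# H(Y|X) = -sum p(x,y) log p(y|x), with p(y|x) = p(x,y) / p(x)
h_y_given_x = -sum(
    pxy * math.log2(pxy / px[i])
    for i, row in enumerate(joint)
    for pxy in row
    if pxy > 0.0
)

mi_a = h_x - h_x_given_y
mi_b = h_y - h_y_given_x
mi_c = h_x + h_y - h_xy
print(mi_a, mi_b, mi_c)
```

All three values agree up to rounding, at about 0.278 bits for this table.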
Note that H(X|X) = 0 and therefore H(X) = I(X;X). This is the reason why entropy is often called self-information. Thus I(X;X) ≥ I(X;Y), and one can formulate the basic principle that a variable contains more information about itself than any other variable can provide.
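The step from H(X) = I(X;X) to the inequality can be written out explicitly. This short derivation assumes the identity I(X;Y) = H(X) − H(X|Y) and the fact that conditional entropy is nonnegative, H(X|Y) ≥ 0, which holds for discrete random variables:

```latex
\begin{align*}
I(X;X) &= H(X) - H(X|X) = H(X), \\
I(X;Y) &= H(X) - H(X|Y) \le H(X) = I(X;X).
\end{align*}
```

Equality holds in the second line exactly when H(X|Y) = 0, that is, when X is completely determined by Y.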
Mutual information can also be expressed as the Kullback-Leibler divergence of the product p(x) × p(y) of the marginal distributions of X and Y from their joint distribution p(x,y):
$$I(X;Y) = D_{\mathrm{KL}}\big(p(x,y) \,\|\, p(x)\,p(y)\big)$$
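Furthermore, let p(x|y) = p(x, y) / p(y). Then
$$I(X;Y) = \sum_{y} p(y) \sum_{x} p(x|y) \log \frac{p(x|y)}{p(x)} = \sum_{y} p(y) \, D_{\mathrm{KL}}\big(p(x|y) \,\|\, p(x)\big) = \mathbb{E}_{Y}\big[ D_{\mathrm{KL}}\big(p(x|y) \,\|\, p(x)\big) \big]$$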
Mutual information can thus also be understood as the expectation, over Y, of the Kullback-Leibler divergence of the univariate distribution p(x) of X from the conditional distribution p(x|y) of X given Y: the more the distributions p(x|y) and p(x) differ, the greater the information gain.
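Both Kullback-Leibler formulations can be evaluated side by side. The sketch below is an illustration on a hypothetical 2×2 joint table with strictly positive marginals: it computes the divergence of the product of marginals from the joint distribution, and separately the p(y)-weighted average of the divergences of p(x) from each conditional distribution p(x|y). The helper name `kl_divergence` is an assumption for this example.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in bits for two pmfs listed over the same outcomes."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

# Hypothetical joint pmf: joint[i][j] = p(x_i, y_j)
joint = [[0.4, 0.1],
         [0.1, 0.4]]
px = [sum(row) for row in joint]
py = [sum(col) for col in zip(*joint)]

# Form 1: divergence of the product of marginals from the joint distribution.
flat_joint = [p for row in joint for p in row]
flat_product = [a * b for a in px for b in py]   # same (i, j) ordering as flat_joint
mi_joint_form = kl_divergence(flat_joint, flat_product)

# Form 2: expectation over y of D_KL(p(x|y) || p(x)).
mi_expected_form = 0.0
for j, pyj in enumerate(py):
    conditional = [joint[i][j] / pyj for i in range(len(px))]  # p(x | y_j)
    mi_expected_form += pyj * kl_divergence(conditional, px)

print(mi_joint_form, mi_expected_form)
```

The two forms give the same value up to rounding, which matches the algebraic identity obtained by substituting p(x,y) = p(x|y) p(y).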
In many applications, one wants to maximize mutual information (thus increasing dependencies), which is often equivalent to minimizing conditional entropy. Examples include:
This article is licensed under the GNU Free Documentation License.
It uses material from the article "Mutual information".