Maximum likelihood estimation (MLE) is a widely used statistical method for estimating the parameters of the probability distribution underlying a given data set.
The method was pioneered by the geneticist and statistician Sir R. A. Fisher between 1912 and 1922.
The following discussion assumes that the reader is familiar with basic notions in probability theory, such as probability distributions, probability density functions, random variables and expectation. It also assumes the reader is familiar with standard techniques for maximising continuous real-valued functions, such as using differentiation to find a function's maxima.
Given a parameterized family $D_\theta$ of probability distributions associated with either a known probability density function (continuous distribution) or a known probability mass function (discrete distribution), denoted as $f_\theta$, we may draw a sample $x_1, x_2, \ldots, x_n$ of $n$ values from this distribution and then, using $f_\theta$, compute the probability density associated with our observed data:
$$f_\theta(x_1, \ldots, x_n \mid \theta)$$
As a function of $\theta$ with $x_1, \ldots, x_n$ fixed, this is the likelihood function
$$L(\theta) = f_\theta(x_1, \ldots, x_n \mid \theta).$$
When $\theta$ is not observable, the method of maximum likelihood uses the value of $\theta$ that maximizes $L(\theta)$ as an estimate of $\theta$. This is the maximum likelihood estimator (MLE) of $\theta$:
$$\hat{\theta} = \arg\max_{\theta} L(\theta).$$
This contrasts with seeking an unbiased estimator of θ, which need not yield the most likely value of θ but whose expected value equals the true value of θ, so that on average it neither over-estimates nor under-estimates it.
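The definition of the estimator can be illustrated with a brute-force search. The following is a minimal sketch, assuming an independent and identically distributed (i.i.d.) sample, so that the joint density factorises into a product, and assuming θ is restricted to a finite list of candidate values. The helper names `likelihood`, `mle_over_candidates` and `bernoulli_pmf` are hypothetical, chosen only for this example.

```python
import math
from typing import Callable, Sequence

def likelihood(theta: float, xs: Sequence[float],
               f: Callable[[float, float], float]) -> float:
    # Assumes an i.i.d. sample, so the joint density factorises:
    # L(theta) = f_theta(x_1) * ... * f_theta(x_n)
    return math.prod(f(x, theta) for x in xs)

def mle_over_candidates(xs: Sequence[float],
                        f: Callable[[float, float], float],
                        candidates: Sequence[float]) -> float:
    # Return the candidate theta with the largest likelihood
    return max(candidates, key=lambda theta: likelihood(theta, xs, f))

def bernoulli_pmf(x: float, p: float) -> float:
    # Hypothetical model: one Bernoulli trial (x = 1 success, x = 0 failure)
    return p if x == 1 else 1 - p

sample = [1, 0, 1, 1]
print(mle_over_candidates(sample, bernoulli_pmf, [0.25, 0.5, 0.75]))  # 0.75
```

With a continuous parameter the search over candidates is replaced by calculus or a numerical optimiser, as in the examples that follow.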
The maximum likelihood estimator may not be unique, or indeed may not even exist.
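A small illustration of non-uniqueness, using a hypothetical model not discussed elsewhere in this article: if each observation is uniform on the interval $[\theta, \theta + 1]$, the likelihood equals 1 for every $\theta$ with $\max_i x_i - 1 \le \theta \le \min_i x_i$ and 0 otherwise, so a whole interval of values maximises it. The sketch below, with the hypothetical helper `uniform_likelihood`, simply evaluates that likelihood at a few points.

```python
def uniform_likelihood(theta, xs):
    # Hypothetical model: each x_i ~ Uniform(theta, theta + 1),
    # whose density is 1 on [theta, theta + 1] and 0 elsewhere
    return 1.0 if all(theta <= x <= theta + 1 for x in xs) else 0.0

xs = [0.2, 0.5, 0.9]
# Every theta in [max(xs) - 1, min(xs)] attains the maximum likelihood of 1
for theta in (-0.05, 0.0, 0.1, 0.2):
    print(theta, uniform_likelihood(theta, xs))  # likelihood 1.0 for each theta
# Outside that interval the likelihood drops to zero
print(uniform_likelihood(-0.2, xs), uniform_likelihood(0.3, xs))  # 0.0 0.0
```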
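Consider tossing an unfair coin 80 times (i.e., we sample something like $x_1 = \mathrm{H}, x_2 = \mathrm{T}, \ldots, x_{80} = \mathrm{T}$ and count the number of HEADS observed). Call the probability of tossing a HEAD $p$, and the probability of tossing TAILS $1 - p$ (so here $p$ is the parameter which we referred to as $\theta$ above). Suppose we toss 49 HEADS and 31 TAILS, and suppose the coin was taken from a box containing three coins: one which gives HEADS with probability $p = 1/3$, one which gives HEADS with probability $p = 1/2$ and another which gives HEADS with probability $p = 2/3$. The coins have lost their labels, so we don't know which one it was. Using maximum likelihood estimation we can calculate which coin it was most likely to have been, given the data that we observed. The likelihood function (defined above), here the binomial probability of observing 49 HEADS in 80 tosses, takes one of three values:
$$\Pr(\mathrm{H} = 49 \mid p = 1/3) = \binom{80}{49} (1/3)^{49} (1 - 1/3)^{31} \approx 0.000$$
$$\Pr(\mathrm{H} = 49 \mid p = 1/2) = \binom{80}{49} (1/2)^{49} (1 - 1/2)^{31} \approx 0.012$$
$$\Pr(\mathrm{H} = 49 \mid p = 2/3) = \binom{80}{49} (2/3)^{49} (1 - 2/3)^{31} \approx 0.054$$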
We see that the likelihood is maximised by the parameter $p = 2/3$, and so this is our maximum likelihood estimate for $p$.
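The three likelihood values can be checked directly. This is a short sketch using the standard library's `math.comb` for the binomial coefficient; `binomial_likelihood` is a hypothetical helper name, with the observed counts (49 HEADS in 80 tosses) as defaults.

```python
from math import comb

def binomial_likelihood(p, heads=49, n=80):
    # Pr(H = heads | p) = C(n, heads) * p^heads * (1 - p)^(n - heads)
    return comb(n, heads) * p**heads * (1 - p) ** (n - heads)

for p in (1/3, 1/2, 2/3):
    print(f"p = {p:.3f}: L = {binomial_likelihood(p):.3f}")
# p = 0.333: L = 0.000
# p = 0.500: L = 0.012
# p = 0.667: L = 0.054
```

The value for $p = 1/3$ is not exactly zero (it is of order $10^{-7}$); it only rounds to 0.000 at three decimal places.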
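Now suppose our special box of coins from the previous example contains an infinite number of coins: one for every possible value $0 \le p \le 1$. We must maximise the likelihood function
$$L(p) = \Pr(\mathrm{H} = 49 \mid p) = \binom{80}{49} p^{49} (1 - p)^{31}$$
over all possible values $0 \le p \le 1$.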
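One may maximize this function by differentiating with respect to $p$ and setting the derivative to zero:
$$0 = \frac{d}{dp}\left[\binom{80}{49} p^{49} (1 - p)^{31}\right] = \binom{80}{49}\left[49 p^{48} (1 - p)^{31} - 31 p^{49} (1 - p)^{30}\right] = \binom{80}{49} p^{48} (1 - p)^{30}\left[49 (1 - p) - 31 p\right] = \binom{80}{49} p^{48} (1 - p)^{30}\left[49 - 80 p\right]$$
which has solutions $p = 0$, $p = 1$ and $p = 49/80$. The solution which maximises the likelihood is clearly $p = 49/80$ (since $p = 0$ and $p = 1$ result in a likelihood of zero). Thus we say the maximum likelihood estimate for $p$ is $\hat{p} = 49/80$.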
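As a numerical cross-check of the calculus, the sketch below maximises the log of the likelihood over a fine grid of interior values of $p$. The constant $\binom{80}{49}$ is dropped because it does not depend on $p$; the endpoints are excluded because the logarithm of a zero likelihood is undefined. The grid spacing of $10^{-4}$ is an arbitrary choice for illustration.

```python
import math

def log_likelihood(p, heads=49, n=80):
    # log of p^heads * (1 - p)^(n - heads); the binomial coefficient is omitted
    # because it is constant in p and does not move the maximiser
    return heads * math.log(p) + (n - heads) * math.log(1 - p)

# Interior grid 0.0001, 0.0002, ..., 0.9999 (p = 0 and p = 1 give likelihood zero)
grid = [i / 10000 for i in range(1, 10000)]
p_best = max(grid, key=log_likelihood)
print(p_best, 49 / 80)  # 0.6125 0.6125
```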
This result is easily generalised by substituting a letter such as $t$ in place of 49 to represent the observed number of 'successes' of our Bernoulli trials, and a letter such as $n$ in place of 80 to represent the number of Bernoulli trials. Exactly the same calculation yields the maximum likelihood estimator
$$\hat{p} = \frac{t}{n}$$
for any sequence of $n$ Bernoulli trials resulting in $t$ 'successes'.
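In code the general estimator is just the proportion of successes. This sketch assumes the trials are encoded as 0/1 values (1 for a success) and uses `fractions.Fraction` to keep the result exact; `bernoulli_mle` is a hypothetical name.

```python
from fractions import Fraction

def bernoulli_mle(trials):
    # trials: sequence of 0/1 outcomes, 1 = success; returns t/n as an exact fraction
    n = len(trials)
    if n == 0:
        raise ValueError("need at least one trial")
    t = sum(trials)
    return Fraction(t, n)

tosses = [1] * 49 + [0] * 31  # 49 HEADS and 31 TAILS, as in the coin example
print(bernoulli_mle(tosses))  # 49/80
```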
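One of the most common continuous probability distributions is the normal distribution, which has probability density function:
$$f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
The corresponding probability density function for a sample of $n$ independent identically distributed normal random variables is
$$f(x_1, \ldots, x_n \mid \mu, \sigma^2) = \prod_{i=1}^{n} f(x_i \mid \mu, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left(-\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{2\sigma^2}\right)$$
or more conveniently:
$$f(x_1, \ldots, x_n \mid \mu, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left(-\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2}{2\sigma^2}\right)$$
where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ is the sample mean.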
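The "more convenient" form rests on the identity $\sum_{i}(x_i - \mu)^2 = \sum_{i}(x_i - \bar{x})^2 + n(\bar{x} - \mu)^2$, with $\bar{x}$ the sample mean. The sketch below checks numerically that the product of individual normal densities and the rearranged expression agree; the sample values and the parameters $\mu = 5$, $\sigma^2 = 1.5$ are arbitrary illustrative choices.

```python
import math

def joint_density_product(xs, mu, sigma2):
    # Product of the individual normal densities f(x_i | mu, sigma^2)
    return math.prod(
        math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
        for x in xs
    )

def joint_density_convenient(xs, mu, sigma2):
    # Same density written with the sample mean xbar
    n = len(xs)
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    return (1 / (2 * math.pi * sigma2)) ** (n / 2) * math.exp(
        -(ss + n * (xbar - mu) ** 2) / (2 * sigma2)
    )

xs = [4.1, 5.3, 3.8, 6.0, 5.1]
a = joint_density_product(xs, 5.0, 1.5)
b = joint_density_convenient(xs, 5.0, 1.5)
print(math.isclose(a, b))  # True
```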
This distribution has two parameters: $\mu$ and $\sigma^2$. This may seem a complication, given that the discussion above only maximised over a single parameter. However, the principle is the same: we maximise the likelihood $L(\mu, \sigma^2) = f(x_1, \ldots, x_n \mid \mu, \sigma^2)$ over both parameters simultaneously, setting the partial derivative with respect to each parameter to zero. This is more work but no more complicated. In the above notation we would write $\theta = (\mu, \sigma^2)$.
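When maximising the likelihood, we may equivalently maximise the log of the likelihood, since the logarithm is a continuous, strictly increasing function wherever the likelihood is positive, so both attain their maximum at the same parameter value. The log-likelihood is closely related to information entropy and Fisher information. Working with it often simplifies the algebra somewhat, and indeed does so in this case:
$$0 = \frac{\partial}{\partial \mu} \log\left[\left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left(-\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{2\sigma^2}\right)\right] = \frac{\partial}{\partial \mu}\left[\frac{n}{2}\log\left(\frac{1}{2\pi\sigma^2}\right) - \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{2\sigma^2}\right] = \frac{1}{\sigma^2}\sum_{i=1}^{n} (x_i - \mu)$$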
which is solved by $\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, the sample mean. This is indeed the maximum of the function, since it is the only turning point in $\mu$ and the second derivative, $-n/\sigma^2$, is strictly less than zero.
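The closed-form derivative with respect to $\mu$, which equals $n(\bar{x} - \mu)/\sigma^2$, can be checked against a central finite difference of the log-likelihood. This is a sketch only: the sample, the fixed value $\sigma^2 = 1.5$ and the step size `h` are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math

def normal_log_likelihood(mu, sigma2, xs):
    # log L(mu, sigma^2) = -(n/2) log(2 pi sigma^2) - sum (x_i - mu)^2 / (2 sigma^2)
    n = len(xs)
    return (-n / 2 * math.log(2 * math.pi * sigma2)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma2))

def dloglik_dmu(mu, sigma2, xs):
    # Closed form: (1/sigma^2) * sum (x_i - mu) = n (xbar - mu) / sigma^2
    n = len(xs)
    xbar = sum(xs) / n
    return n * (xbar - mu) / sigma2

xs = [4.1, 5.3, 3.8, 6.0, 5.1]
sigma2, h = 1.5, 1e-6
for mu in (3.0, sum(xs) / len(xs), 7.0):
    numeric = (normal_log_likelihood(mu + h, sigma2, xs)
               - normal_log_likelihood(mu - h, sigma2, xs)) / (2 * h)
    # The two columns agree; the derivative vanishes at mu = xbar
    print(mu, round(numeric, 4), round(dloglik_dmu(mu, sigma2, xs), 4))
```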
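Similarly we differentiate with respect to $\sigma$ and equate to zero:
$$0 = \frac{\partial}{\partial \sigma}\left[\frac{n}{2}\log\left(\frac{1}{2\pi\sigma^2}\right) - \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{2\sigma^2}\right] = -\frac{n}{\sigma} + \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{\sigma^3}$$
which, with $\mu$ set to its estimate $\hat{\mu}$, is solved by $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \hat{\mu})^2$.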
Formally we say that the maximum likelihood estimator for $\theta = (\mu, \sigma^2)$ is:
$$\hat{\theta} = (\hat{\mu}, \hat{\sigma}^2).$$
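A minimal sketch of the normal-distribution estimator, computing $\hat{\mu} = \frac{1}{n}\sum_i x_i$ and $\hat{\sigma}^2 = \frac{1}{n}\sum_i (x_i - \hat{\mu})^2$. The helper name `normal_mle` and the sample values are hypothetical; the standard library's `statistics` module is used only for comparison.

```python
import statistics

def normal_mle(xs):
    # Maximum likelihood estimates (mu_hat, sigma2_hat) for an i.i.d. normal sample
    n = len(xs)
    mu_hat = sum(xs) / n
    # Divisor is n, not n - 1: this is the maximum likelihood estimate
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n
    return mu_hat, sigma2_hat

xs = [4.1, 5.3, 3.8, 6.0, 5.1]
mu_hat, sigma2_hat = normal_mle(xs)
print(round(mu_hat, 2), round(sigma2_hat, 4))  # 4.86 0.6504
print(statistics.pvariance(xs), statistics.variance(xs))  # divisors n and n - 1
```

The divisor $n$ ties back to the contrast with unbiased estimation drawn earlier: the expected value of $\hat{\sigma}^2$ is $\frac{n-1}{n}\sigma^2$, so the maximum likelihood estimate of the variance is slightly biased downwards, whereas `statistics.variance` (divisor $n - 1$) gives the usual unbiased estimate.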
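For independent observations, and under standard regularity conditions, the maximum likelihood estimator is asymptotically normal: as the sample size grows, its distribution approaches a normal distribution centred on the true parameter value, with variance given by the inverse of the Fisher information.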
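Asymptotic normality can be seen in a small simulation. The sketch below, a hypothetical illustration with arbitrarily chosen $p = 0.6$, $n = 400$ trials and 2000 repetitions, repeatedly computes the Bernoulli estimate $\hat{p} = t/n$ and compares its spread with the asymptotic standard deviation $\sqrt{p(1-p)/n}$, which is $1/\sqrt{n I(p)}$ for the per-trial Fisher information $I(p) = 1/(p(1-p))$.

```python
import math
import random
import statistics

def simulate_bernoulli_mle(p, n, reps, seed=0):
    # Draw `reps` independent samples of n Bernoulli(p) trials; return t/n for each
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        t = sum(rng.random() < p for _ in range(n))
        estimates.append(t / n)
    return estimates

p, n = 0.6, 400
est = simulate_bernoulli_mle(p, n, reps=2000)
# Asymptotic standard deviation 1 / sqrt(n * I(p)) with I(p) = 1 / (p (1 - p))
theoretical_sd = math.sqrt(p * (1 - p) / n)
empirical_sd = statistics.stdev(est)
# Under the normal approximation roughly 95% of estimates lie within 1.96 sd of p
coverage = sum(abs(e - p) <= 1.96 * theoretical_sd for e in est) / len(est)
print(round(statistics.mean(est), 3), round(empirical_sd, 4),
      round(theoretical_sd, 4), coverage)
```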
This article is licensed under the GNU Free Documentation License.
It uses material from the "Maximum likelihood" article.