
The normal distribution, also called the Gaussian distribution, is an extremely important probability distribution in many fields. It is a family of distributions of the same general form, differing in their location and scale parameters: the mean ("average") and standard deviation ("variability"), respectively. The standard normal distribution is the normal distribution with a mean of zero and a standard deviation of one (the green curves in the plots to the right). It is often called the bell curve because the graph of its probability density resembles a bell.

Summary of the normal distribution with mean \mu and variance \sigma^2:

  • pdf: \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{\left(x-\mu\right)^2}{2\sigma^2} \right)
  • cdf: \frac12 \left(1 + \mathrm{erf}\,\frac{x-\mu}{\sigma\sqrt2}\right)
  • mean: \mu
  • median: \mu
  • mode: \mu
  • variance: \sigma^2
  • skewness: 0
  • kurtosis: 3
  • entropy: \ln\left(\sigma\sqrt{2\,\pi\,e}\right)
  • moment generating function: M_X(t) = \exp\left(\mu\,t+\frac{\sigma^2 t^2}{2}\right)
  • characteristic function: \phi_X(t) = \exp\left(\mu\,i\,t-\frac{\sigma^2 t^2}{2}\right)

Overview


The fundamental importance of the normal distribution as a model of quantitative phenomena in the natural and behavioral sciences is due to the central limit theorem (the proof of which requires rather advanced undergraduate mathematics). A variety of psychological test scores and physical phenomena like photon counts can be well approximated by a normal distribution. While the mechanisms underlying these phenomena are often unknown, the use of the normal model can be theoretically justified if one assumes many small (independent) effects contribute to each observation in an additive fashion. The normal distribution also arises in many areas of statistics: for example, the sampling distribution of the mean is approximately normal, even if the distribution of the population the sample is taken from is not normal. In addition, the normal distribution maximizes information entropy among all distributions with known mean and variance, which makes it the natural choice of underlying distribution for data summarized in terms of sample mean and variance. The normal distribution is the most widely used family of distributions in statistics and many statistical tests are based on the assumption of normality. In probability theory, normal distributions arise as the limiting distributions of several continuous and discrete families of distributions.

History


The normal distribution was first introduced by Abraham de Moivre in an article in 1733 (reprinted in the second edition of his The Doctrine of Chances, 1738) in the context of approximating certain binomial distributions for large n. His result was extended by Laplace in his book Analytical Theory of Probabilities (1812), and is now called the theorem of de Moivre-Laplace.

Laplace used the normal distribution in the analysis of errors of experiments. The important method of least squares was introduced by Legendre in 1805. Gauss, who claimed to have used the method since 1794, justified it rigorously in 1809 by assuming a normal distribution of the errors.

The name "bell curve" goes back to Jouffret who first used the term "bell surface" in 1872 for a bivariate normal with independent components. The name "normal distribution" was coined independently by Charles S. Peirce, Francis Galton and Wilhelm Lexis around 1875. This terminology is unfortunate, since it reflects and encourages the fallacy that many or all probability distributions are "normal". (See the discussion of "occurrence" below.)

That the distribution is called the normal or Gaussian distribution is an instance of Stigler's law of eponymy: "No scientific discovery is named after its original discoverer."

Specification of the normal distribution


There are various ways to specify a random variable. The most visual is the probability density function (plot at the top), which represents how likely each value of the random variable is. The cumulative distribution function is a conceptually cleaner way to specify the same information, but to the untrained eye its plot is much less informative (see below). Equivalent ways to specify the normal distribution are: the moments, the cumulants, the characteristic function, the moment-generating function, and the cumulant-generating function. Some of these are very useful for theoretical work, but not intuitive. See probability distribution for a discussion.

All of the cumulants of the normal distribution are zero, except the first two.

Probability density function

The probability density function of the normal distribution with mean \mu and variance \sigma^2 (equivalently, standard deviation \sigma) is an example of a Gaussian function,

f(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \, \exp \left( -\frac{(x- \mu)^2}{2\sigma^2} \right).

As a Gaussian function with the denominator of the exponent equal to two, the standard normal distribution is an eigenfunction of the Fourier transform.

(See also exponential function and π.)

If a random variable X has this distribution, we write X ~ N(\mu, \sigma^2). If \mu = 0 and \sigma = 1, the distribution is called the standard normal distribution and the probability density function reduces to

f(x) = \frac{1}{\sqrt{2\pi}} \, \exp\left(-\frac{x^2}{2} \right).

The image to the right gives the graph of the probability density function of the normal distribution for various parameter values.

Some notable qualities of the normal distribution:

  • The density function is symmetric about its mean value.
  • The mean is also its mode and median.
  • 68.26894921371% of the area under the curve is within one standard deviation of the mean.
  • 95.44997361036% of the area is within two standard deviations.
  • 99.73002039367% of the area is within three standard deviations.
  • 99.99366575163% of the area is within four standard deviations.
  • 99.99994266969% of the area is within five standard deviations.
  • 99.99999980268% of the area is within six standard deviations.
  • 99.99999999974% of the area is within seven standard deviations.

The inflection points of the curve occur at one standard deviation away from the mean.
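The percentages listed above follow from the error function: the probability of falling within k standard deviations of the mean is erf(k/\sqrt{2}). A minimal Python sketch that recomputes them (the helper name `area_within` is introduced here purely for illustration):

```python
import math


def area_within(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))


# Recompute the table for one to seven standard deviations.
for k in range(1, 8):
    print(f"within {k} standard deviation(s): {100.0 * area_within(k):.11f}%")
```

The result does not depend on \mu or \sigma, because standardizing the variable removes both parameters.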

Cumulative distribution function

The cumulative distribution function (cdf) is defined as the probability that a variable X has a value less than or equal to x, and it is expressed in terms of the density function as

F(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^x \exp \left( -\frac{(u - \mu)^2}{2\sigma^2} \right)\, du.

The standard normal cdf, conventionally denoted \Phi, is just the general cdf evaluated with \mu=0 and \sigma=1,

\Phi(x) =F(x;0,1)= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^x \exp\left(-\frac{u^2}{2}\right) \, du.

The standard normal cdf can be expressed in terms of a special function called the error function, as

\Phi(z) = \frac{1}{2} \left[ 1 + \operatorname{erf} \left( \frac{z}{\sqrt{2}} \right) \right] .

The inverse cumulative distribution function, or quantile function, can be expressed in terms of the inverse error function:

\Phi^{-1}(p) = \sqrt2 \; \operatorname{erf}^{-1} \left(2p - 1 \right) .

This quantile function is sometimes called the probit function. There is no elementary primitive for the probit function. This is not to say merely that none is known, but rather that the non-existence of such a function has been proved.

Values of Φ(x) may be approximated very accurately by a variety of methods, such as numerical integration, Taylor series, or asymptotic series.
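As an illustration of these formulas, the following Python sketch evaluates \Phi through the error function and inverts it by bisection. The function names `std_normal_cdf` and `probit` are chosen here for the example, and bisection is only one simple inversion method, not necessarily the one used by numerical libraries. The standard library class `statistics.NormalDist` (Python 3.8 or later) is used as a cross-check.

```python
import math
from statistics import NormalDist


def std_normal_cdf(z: float) -> float:
    """Phi(z) = (1/2) * [1 + erf(z / sqrt(2))]."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit(p: float, lo: float = -40.0, hi: float = 40.0) -> float:
    """Quantile function Phi^{-1}(p), found by bisection on the increasing function Phi."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    for _ in range(200):  # far more halvings than double precision needs
        mid = 0.5 * (lo + hi)
        if std_normal_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(std_normal_cdf(1.0))               # Phi(1)
print(probit(0.975))                     # upper 2.5% point
print(NormalDist().inv_cdf(0.975))       # library value for comparison
```

For probabilities extremely close to 0 this direct formula loses relative accuracy, since 1 + erf(...) suffers from cancellation; the sketch is meant for moderate arguments.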

Generating functions

Moment generating function

The moment generating function is defined as the expected value of \exp(tX). For a normal distribution, it can be shown that the moment generating function is

M_X(t) = \mathrm{E} \left[ \exp(tX) \right] = \int_{-\infty}^{\infty} \frac {1} {\sigma \sqrt{2\pi} } \exp \left( -\frac{(x - \mu)^2}{2 \sigma^2} \right) \exp (tx) \, dx = \exp \left( \mu t + \frac{\sigma^2 t^2}{2} \right),

as can be seen by completing the square in the exponent.

Cumulant generating function
The cumulant generating function is the logarithm of the moment generating function: g(t) = \mu t + \frac{1}{2}\sigma^2 t^2. The derivative of the cumulant generating function is simply g'(t) = \mu + \sigma^2 t.

Characteristic function

The characteristic function is defined as the expected value of \exp (i t X), where i is the imaginary unit. For a normal distribution, the characteristic function is

\phi_X(t;\mu,\sigma) = \mathrm{E} \left[ \exp(i t X) \right] = \int_{-\infty}^{\infty} \frac{1}{\sigma \sqrt{2\pi}} \exp \left(- \frac{(x - \mu)^2}{2\sigma^2} \right) \exp(i t x) \, dx = \exp \left( i \mu t - \frac{\sigma^2 t^2}{2} \right) .

The characteristic function is obtained by replacing t with i t in the moment-generating function.
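The two closed forms can be checked numerically. The sketch below approximates E[exp(sX)] with a midpoint rule, taking s = t for the moment generating function and s = it for the characteristic function. The parameter values, the integration range of 12 standard deviations, and the helper names are assumptions made for this example only.

```python
import cmath
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def expect_exp(s: complex, mu: float, sigma: float,
               half_width: float = 12.0, steps: int = 20_000) -> complex:
    """Midpoint-rule estimate of E[exp(s X)] for X ~ N(mu, sigma^2)."""
    a = mu - half_width * sigma
    h = 2.0 * half_width * sigma / steps
    total = 0j
    for k in range(steps):
        x = a + (k + 0.5) * h
        total += cmath.exp(s * x) * normal_pdf(x, mu, sigma)
    return total * h


mu, sigma, t = 1.0, 2.0, 0.3   # hypothetical values

mgf_numeric = expect_exp(t, mu, sigma).real
mgf_closed = math.exp(mu * t + 0.5 * sigma ** 2 * t ** 2)

cf_numeric = expect_exp(1j * t, mu, sigma)
cf_closed = cmath.exp(1j * mu * t - 0.5 * sigma ** 2 * t ** 2)

print(mgf_numeric, mgf_closed)
print(cf_numeric, cf_closed)
```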

Properties


Some of the properties of the normal distribution:

  1. If X \sim N(\mu, \sigma^2) and a and b are real numbers, then a X + b \sim N(a \mu + b, (a \sigma)^2) (see expected value and variance).
  2. If X \sim N(\mu_X, \sigma^2_X) and Y \sim N(\mu_Y, \sigma^2_Y) are independent normal random variables, then:
    • Their sum is normally distributed with U = X + Y \sim N(\mu_X + \mu_Y, \sigma^2_X + \sigma^2_Y) (proof).
    • Their difference is normally distributed with V = X - Y \sim N(\mu_X - \mu_Y, \sigma^2_X + \sigma^2_Y).
    • If the variances of X and Y are equal, then U and V are independent of each other.
  3. If X \sim N(0, \sigma^2_X) and Y \sim N(0, \sigma^2_Y) are independent normal random variables, then:
    • Their product X Y follows a distribution with density p given by
      p(z) = \frac{1}{\pi\,\sigma_X\,\sigma_Y} \; K_0\left(\frac{|z|}{\sigma_X\,\sigma_Y}\right), where K_0 is a modified Bessel function of the second kind.
    • Their ratio follows a Cauchy distribution with X/Y \sim \mathrm{Cauchy}(0, \sigma_X/\sigma_Y).
  4. If X_1, \cdots, X_n are independent standard normal variables, then X_1^2 + \cdots + X_n^2 has a chi-square distribution with n degrees of freedom.
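A small simulation can make properties 1, 2 and 4 concrete. The sketch below is only a plausibility check under assumed parameter values, with a fixed seed so that the run is repeatable; it compares sample means and variances with the values the properties predict.

```python
import random


def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


rng = random.Random(0)   # fixed seed (arbitrary choice)
n = 100_000

xs = [rng.gauss(1.0, 2.0) for _ in range(n)]    # X ~ N(1, 2^2)
ys = [rng.gauss(-3.0, 3.0) for _ in range(n)]   # Y ~ N(-3, 3^2)

w = [3.0 * x + 5.0 for x in xs]          # property 1: expect N(8, 36)
u = [x + y for x, y in zip(xs, ys)]      # property 2: expect N(-2, 13)
v = [x - y for x, y in zip(xs, ys)]      # property 2: expect N(4, 13)

# Property 4: sum of squares of 3 standard normals, chi-square with 3 degrees
# of freedom, which has mean 3 and variance 6.
q = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3)) for _ in range(n)]

for name, data in (("aX+b", w), ("X+Y", u), ("X-Y", v), ("chi2(3)", q)):
    print(f"{name}: mean={mean(data):.3f} variance={variance(data):.3f}")
```

Matching the first two moments does not prove normality, of course; it only shows that the simulated values behave as the stated parameters predict.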

Standardizing normal random variables

As a consequence of Property 1, it is possible to relate all normal random variables to the standard normal.

If X ~ N(\mu, \sigma^2), then

Z = \frac{X - \mu}{\sigma} \!

is a standard normal random variable: Z ~ N(0,1). An important consequence is that the cdf of a general normal distribution is therefore

\Pr(X \le x)
= \Phi \left( \frac{x-\mu}{\sigma} \right) = \frac{1}{2} \left( 1 + \operatorname{erf} \left( \frac{x-\mu}{\sigma\sqrt{2}} \right) \right) .

Conversely, if Z ~ N(0,1), then

X = \sigma Z + \mu

is a normal random variable with mean \mu and variance \sigma^2.

The standard normal distribution has been tabulated, and the other normal distributions are simple transformations of the standard one. Therefore, one can use tabulated values of the cdf of the standard normal distribution to find values of the cdf of a general normal distribution.
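In code, the table lookup becomes a call to the standard normal cdf after standardizing. The sketch below assumes hypothetical parameters \mu = 100 and \sigma = 15 and uses `statistics.NormalDist` from the Python standard library only as a cross-check.

```python
import math
from statistics import NormalDist


def std_normal_cdf(z: float) -> float:
    """Standard normal cdf expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Pr(X <= x) for X ~ N(mu, sigma^2), via Z = (X - mu) / sigma."""
    return std_normal_cdf((x - mu) / sigma)


mu, sigma = 100.0, 15.0          # hypothetical parameters
x = 115.0                        # one standard deviation above the mean

p = normal_cdf(x, mu, sigma)     # equals Phi(1)
z = NormalDist().inv_cdf(p)      # back to the standard normal scale
x_back = sigma * z + mu          # converse transformation X = sigma Z + mu

print(p, NormalDist(mu, sigma).cdf(x), x_back)
```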

Moments

The first few moments of the normal distribution are:

Number   Raw moment                               Central moment   Cumulant
0        1                                        1                0
1        \mu                                      0                \mu
2        \mu^2 + \sigma^2                         \sigma^2         \sigma^2
3        \mu^3 + 3\mu\sigma^2                     0                0
4        \mu^4 + 6 \mu^2 \sigma^2 + 3 \sigma^4    3 \sigma^4       0

All cumulants of the normal distribution beyond the second cumulant are zero.
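The raw-moment formulas can be verified by integrating x^k against the density. The following sketch uses a plain midpoint rule; the parameter values and the helper names are assumptions for the example.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def raw_moment(k: int, mu: float, sigma: float,
               half_width: float = 12.0, steps: int = 20_000) -> float:
    """Midpoint-rule estimate of E[X^k] for X ~ N(mu, sigma^2)."""
    a = mu - half_width * sigma
    h = 2.0 * half_width * sigma / steps
    total = 0.0
    for j in range(steps):
        x = a + (j + 0.5) * h
        total += x ** k * normal_pdf(x, mu, sigma)
    return total * h


mu, sigma = 1.5, 0.5   # hypothetical values

closed_form = [
    1.0,
    mu,
    mu ** 2 + sigma ** 2,
    mu ** 3 + 3 * mu * sigma ** 2,
    mu ** 4 + 6 * mu ** 2 * sigma ** 2 + 3 * sigma ** 4,
]
numeric = [raw_moment(k, mu, sigma) for k in range(5)]

for k in range(5):
    print(k, numeric[k], closed_form[k])
```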

Generating normal random variables

For computer simulations, it is often useful to generate values that have a normal distribution. There are several methods and the most basic is to invert the standard normal cdf. More efficient methods are also known, one such method being the Box-Muller transform.

The Box-Muller transform takes two uniformly distributed values as input and maps them to two normally distributed values. This requires generating values from a uniform distribution, for which many methods are known. See also random number generators.

The Box-Muller transform is a consequence of the fact that the chi-square distribution with two degrees of freedom (see property 4 above) is an easily-generated exponential random variable.
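A minimal sketch of the basic (trigonometric) form of the Box-Muller transform follows. The squared radius -2 ln U_1 is exponentially distributed with mean 2, which is the chi-square distribution with two degrees of freedom mentioned above, and the angle 2\pi U_2 is uniform. The function name `box_muller` and the seed are choices made for this example.

```python
import math
import random


def box_muller(rng: random.Random) -> tuple:
    """Map two independent uniforms to two independent standard normal values."""
    u1 = 1.0 - rng.random()   # lies in (0, 1], so log(u1) is always defined
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))   # radius^2 ~ chi-square, 2 d.o.f.
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


rng = random.Random(42)   # fixed seed (arbitrary choice)
samples = []
for _ in range(50_000):
    z0, z1 = box_muller(rng)
    samples.extend((z0, z1))

sample_mean = sum(samples) / len(samples)
sample_var = sum((z - sample_mean) ** 2 for z in samples) / len(samples)
print(sample_mean, sample_var)   # should be close to 0 and 1
```

A value from N(\mu, \sigma^2) is then obtained as \sigma z + \mu for each generated z.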

The central limit theorem

The normal distribution has the very important property that under certain conditions, the distribution of a sum of a large number of independent variables is approximately normal. This is the central limit theorem.

The practical importance of the central limit theorem is that the normal distribution can be used as an approximation to some other distributions.

  • A binomial distribution with parameters n and p is approximately normal for large n and p not too close to 1 or 0 (some books recommend using this approximation only if n p and n(1 - p) are both at least 5; in this case, a continuity correction should be applied). The approximating normal distribution has mean \mu = n p and variance \sigma^2 = n p (1 - p).
  • A Poisson distribution with parameter \lambda is approximately normal for large \lambda. The approximating normal distribution has mean \mu = \lambda and variance \sigma^2 = \lambda.

Whether these approximations are sufficiently accurate depends on the purpose for which they are needed, and the rate of convergence to the normal distribution. It is typically the case that such approximations are less accurate in the tails of the distribution.
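The binomial case can be examined directly. The sketch below compares the exact binomial cdf with the normal approximation using a continuity correction (evaluating the normal cdf at k + 1/2); the values n = 100 and p = 0.5 are assumptions chosen for illustration.

```python
import math


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact Pr(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k + 1))


def normal_approx_cdf(k: int, n: int, p: float) -> float:
    """Normal approximation to Pr(X <= k) with a continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1.0 - p))
    return 0.5 * (1.0 + math.erf((k + 0.5 - mu) / (sigma * math.sqrt(2.0))))


n, p = 100, 0.5   # hypothetical parameters; n p = n (1 - p) = 50

for k in (55, 35):   # one value near the centre, one in the lower tail
    exact = binomial_cdf(k, n, p)
    approx = normal_approx_cdf(k, n, p)
    rel_err = abs(approx - exact) / exact
    print(f"k={k}: exact={exact:.6f} approx={approx:.6f} relative error={rel_err:.4f}")
```

In this example the relative error is noticeably larger for the tail value than for the central one, in line with the remark about accuracy in the tails.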

Infinite divisibility

The normal distributions are infinitely divisible probability distributions.

Stability

The normal distributions are strictly stable probability distributions.

Standard deviation

In practice, one often assumes that data are from an approximately normally distributed population. If that assumption is justified, then about 68% of the values lie within one standard deviation of the mean, about 95% lie within two standard deviations, and about 99.7% lie within three standard deviations. This is known as the "68-95-99.7 rule" or the "Empirical Rule".

Normality tests


Normality tests check a given set of data for similarity to the normal distribution. The null hypothesis is that the data set is similar to the normal distribution; a sufficiently small P-value therefore indicates non-normal data.

Related distributions


  • R \sim \mathrm{Rayleigh}(\sigma^2) is a Rayleigh distribution if R = \sqrt{X^2 + Y^2} where X \sim N(0, \sigma^2) and Y \sim N(0, \sigma^2) are two independent normal random variables.
  • Y \sim \chi_{\nu}^2 is a chi-square distribution with \nu degrees of freedom if Y = \sum_{k=1}^{\nu} X_k^2 where the X_k \sim N(0,1) for k=1,\cdots,\nu are independent.
  • Y \sim \mathrm{Cauchy}(\mu = 0, \theta = 1) is a Cauchy distribution if Y = X_1/X_2 where X_1 \sim N(0,1) and X_2 \sim N(0,1) are two independent normal random variables.
  • Y \sim \mbox{Log-N}(\mu, \sigma^2) is a log-normal distribution if Y = \exp(X) and X \sim N(\mu, \sigma^2).
  • Relation to Lévy skew alpha-stable distribution: if X\sim \textrm{Levy-S}\alpha\textrm{S}(2,\beta,\sigma/\sqrt{2},\mu) then X \sim N(\mu,\sigma^2).
  • Truncated normal distribution: if X \sim N(\mu, \sigma^2), then truncating X below at A and above at B leads to a random variable with mean E(X)=\mu + \frac{\sigma(\phi_1-\phi_2)}{T}, where T=\Phi\left(\frac{B-\mu}{\sigma}\right)-\Phi\left(\frac{A-\mu}{\sigma}\right), \phi_1=f\left(\frac{A-\mu}{\sigma}\right) and \phi_2=f\left(\frac{B-\mu}{\sigma}\right), and f(\cdot) is the probability density function of a standard normal random variable.
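The truncated-normal mean in the last item can be sketched in a few lines and compared with a direct numerical average over the interval [A, B]. The function names and the parameter values are assumptions for this example.

```python
import math


def std_normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_mean(mu: float, sigma: float, a: float, b: float) -> float:
    """Mean of N(mu, sigma^2) truncated to [a, b]: mu + sigma (phi_1 - phi_2) / T."""
    alpha = (a - mu) / sigma
    beta = (b - mu) / sigma
    t = std_normal_cdf(beta) - std_normal_cdf(alpha)
    return mu + sigma * (std_normal_pdf(alpha) - std_normal_pdf(beta)) / t


def truncated_mean_numeric(mu: float, sigma: float, a: float, b: float,
                           steps: int = 20_000) -> float:
    """Midpoint-rule check: density-weighted average of x over [a, b]."""
    h = (b - a) / steps
    num = den = 0.0
    for j in range(steps):
        x = a + (j + 0.5) * h
        w = std_normal_pdf((x - mu) / sigma)
        num += x * w
        den += w
    return num / den


mu, sigma, a, b = 1.0, 2.0, 0.0, 5.0   # hypothetical values
print(truncated_mean(mu, sigma, a, b), truncated_mean_numeric(mu, sigma, a, b))
```

Truncating symmetrically about \mu leaves the mean unchanged, since then \phi_1 = \phi_2.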

Estimation of parameters


Maximum likelihood estimation of parameters

Suppose

X_1,\dots,X_n

are independent and each is normally distributed with expectation μ and variance σ². In the language of statisticians, the observed values of these random variables make up a "sample from a normally distributed population." It is desired to estimate the "population mean" μ and the "population standard deviation" σ, based on observed values of this sample. The joint probability density function of these random variables is

f(x_1,\dots,x_n;\mu,\sigma) \propto \sigma^{-n} \prod_{i=1}^n \exp\left({-1 \over 2} \left({x_i-\mu \over \sigma}\right)^2\right).

(Nota bene: Here the proportionality symbol \propto means proportional as a function of \mu and \sigma, not proportional as a function of x_1,\dots,x_n. That may be considered one of the differences between the statistician's point of view and the probabilist's point of view. The reason this is important will appear below.)

As a function of μ and σ this is the likelihood function

L(\mu,\sigma) \propto \sigma^{-n} \exp\left({-\sum_{i=1}^n (x_i-\mu)^2 \over 2\sigma^2}\right).

In the method of maximum likelihood, the values of μ and σ that maximize the likelihood function are taken to be estimates of the population parameters μ and σ.

Usually in maximizing a function of two variables one might consider partial derivatives. But here we will exploit the fact that the value of μ that maximizes the likelihood function with σ fixed does not depend on σ. Therefore, we can find that value of μ, then substitute it for μ in the likelihood function, and finally find the value of σ that maximizes the resulting expression.

It is evident that the likelihood function is a decreasing function of the sum

\sum_{i=1}^n (x_i-\mu)^2. \,\!

So we want the value of μ that minimizes this sum. Let

\overline{x}=(x_1+\cdots+x_n)/n

be the "sample mean". Observe that

\sum_{i=1}^n (x_i-\mu)^2=\sum_{i=1}^n((x_i-\overline{x})+(\overline{x}-\mu))^2

=\sum_{i=1}^n(x_i-\overline{x})^2 + 2\sum_{i=1}^n (x_i-\overline{x})(\overline{x}-\mu) + \sum_{i=1}^n (\overline{x}-\mu)^2

=\sum_{i=1}^n(x_i-\overline{x})^2 + 0 + n(\overline{x}-\mu)^2.

Only the last term depends on μ and it is minimized by

\widehat{\mu}=\overline{x}.

That is the maximum-likelihood estimate of μ. When we substitute that estimate for μ in the likelihood function, we get

L(\overline{x},\sigma) \propto \sigma^{-n} \exp\left({-\sum_{i=1}^n (x_i-\overline{x})^2 \over 2\sigma^2}\right).

It is conventional to denote the "loglikelihood function", i.e., the logarithm of the likelihood function, by a lower-case \ell, and we have

\ell(\widehat{\mu},\sigma) = \mathrm{constant} - n\log(\sigma) - {\sum_{i=1}^n(x_i-\overline{x})^2 \over 2\sigma^2}

and then

{\partial \over \partial\sigma}\ell(\widehat{\mu},\sigma)
={-n \over \sigma} +{\sum_{i=1}^n (x_i-\overline{x})^2 \over \sigma^3} ={-n \over \sigma^3}\left(\sigma^2-{1 \over n}\sum_{i=1}^n (x_i-\overline{x})^2 \right).

This derivative is positive, zero, or negative according as σ² is between 0 and

{1 \over n}\sum_{i=1}^n(x_i-\overline{x})^2,

or equal to that quantity, or greater than that quantity.

Consequently this average of squares of residuals is the maximum-likelihood estimate of σ², and its square root is the maximum-likelihood estimate of σ. This estimator of σ² is biased, but has a smaller mean squared error than the usual unbiased estimator, which is n/(n − 1) times this estimator.
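The estimates derived above are straightforward to compute. The sketch below uses a small made-up data set (the five observations are hypothetical) and also checks that moving away from the estimates lowers the log-likelihood, here written up to its additive constant.

```python
import math


def normal_mle(sample):
    """Maximum-likelihood estimates (mean, variance) for a normal sample."""
    n = len(sample)
    mu_hat = sum(sample) / n
    var_hat = sum((x - mu_hat) ** 2 for x in sample) / n   # divides by n, not n - 1
    return mu_hat, var_hat


def log_likelihood(sample, mu: float, sigma: float) -> float:
    """Log-likelihood up to an additive constant: -n log(sigma) - sum (x_i - mu)^2 / (2 sigma^2)."""
    n = len(sample)
    return -n * math.log(sigma) - sum((x - mu) ** 2 for x in sample) / (2.0 * sigma ** 2)


data = [2.1, 1.9, 2.4, 2.0, 1.6]   # hypothetical observations

mu_hat, var_hat = normal_mle(data)
sigma_hat = math.sqrt(var_hat)
unbiased_var = var_hat * len(data) / (len(data) - 1)   # n / (n - 1) times the MLE

print(mu_hat, var_hat, sigma_hat, unbiased_var)
print(log_likelihood(data, mu_hat, sigma_hat))
```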

Surprising generalization

The derivation of the maximum-likelihood estimator of the covariance matrix of a multivariate normal distribution is subtle. It involves the spectral theorem and the reason it can be better to view a scalar as the trace of a 1×1 matrix than as a mere scalar. See estimation of covariance matrices.

Unbiased estimation of parameters

The maximum likelihood estimator of the population mean \mu from a sample is an unbiased estimator of the mean, as is the maximum likelihood estimator of the variance when the mean of the population is known a priori. However, if we are faced with a sample and have no knowledge of the mean or the variance of the population from which it is drawn, the unbiased estimator of the variance \sigma^2 is:

S^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \overline{X})^2.

This "sample variance" follows a Gamma distribution if the X_i are independent and identically distributed (iid) normal random variables:

S^2 \sim \operatorname{Gamma}\left(\frac{n-1}{2},\frac{2 \sigma^2}{n-1}\right).
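The trade-off between bias and mean squared error can be illustrated by simulation. The sketch below draws many small samples from an assumed N(0, 4) population (sample size, number of trials and seed are arbitrary choices) and compares the estimator that divides by n with the one that divides by n - 1.

```python
import random


def mean(values):
    return sum(values) / len(values)


def mse(values, target: float) -> float:
    """Mean squared error of the estimates around the true value."""
    return mean([(v - target) ** 2 for v in values])


rng = random.Random(1)              # fixed seed (arbitrary choice)
n, sigma2, trials = 5, 4.0, 20_000  # hypothetical settings
sigma = sigma2 ** 0.5

mle_vals, unbiased_vals = [], []
for _ in range(trials):
    sample = [rng.gauss(0.0, sigma) for _ in range(n)]
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)   # sum of squared residuals
    mle_vals.append(ss / n)                  # maximum-likelihood estimate
    unbiased_vals.append(ss / (n - 1))       # S^2

print("mean of S^2 :", mean(unbiased_vals), " (true variance 4.0)")
print("mean of MLE :", mean(mle_vals), " (expected (n-1)/n * 4.0 = 3.2)")
print("MSE of S^2  :", mse(unbiased_vals, sigma2))
print("MSE of MLE  :", mse(mle_vals, sigma2))
```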

Occurrence


Approximately normal distributions occur in many situations, as a result of the central limit theorem. When there is reason to suspect the presence of a large number of small effects acting additively and independently, it is reasonable to assume that observations will be normal. There are statistical methods to empirically test that assumption, for example the Kolmogorov-Smirnov test.

Effects can also act as multiplicative (rather than additive) modifications. In that case, the assumption of normality is not justified, and it is the logarithm of the variable of interest that is normally distributed. The distribution of the directly observed variable is then called log-normal.
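The multiplicative case can be illustrated with a simulation. In the sketch below each observation is the product of 50 independent factors drawn uniformly from [0.9, 1.1]; these choices, like the seed and the use of sample skewness as a rough symmetry measure, are assumptions made only for the example. The products come out clearly right-skewed, while their logarithms are nearly symmetric, as expected for a sum of many small independent terms.

```python
import math
import random


def skewness(values) -> float:
    """Sample skewness: third central moment divided by variance^(3/2)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(7)        # fixed seed (arbitrary choice)
samples, factors = 10_000, 50

products = []
for _ in range(samples):
    value = 1.0
    for _ in range(factors):
        value *= rng.uniform(0.9, 1.1)   # one small multiplicative effect
    products.append(value)

logs = [math.log(v) for v in products]

print("skewness of products  :", skewness(products))
print("skewness of logarithms:", skewness(logs))
```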

Finally, if there is a single external influence which has a large effect on the variable under consideration, the assumption of normality is not justified either. This is true even if, when the external variable is held constant, the resulting conditional distributions are indeed normal. The full distribution will be a mixture of normal distributions, which is not in general normal. This is related to the theory of errors (see below).

To summarize, here is a list of situations where approximate normality is sometimes assumed. For a fuller discussion, see below.

  • In counting problems (so the central limit theorem includes a discrete-to-continuum approximation) where reproductive random variables are involved;
  • In physiological measurements of biological specimens:
    • The logarithm of measures of size of living tissue (length, height, skin area, weight);
    • The length of inert appendages (hair, claws, nails, teeth) of biological specimens, in the direction of growth; presumably the thickness of tree bark also falls under this category;
    • Other physiological measures may be normally distributed, but there is no reason to expect that a priori;
  • Measurement errors are often assumed to be normally distributed, and any deviation from normality is considered something which should be explained;
  • Financial variables
    • Changes in the logarithm of exchange rates, price indices, and stock market indices; these variables behave like compound interest, not like simple interest, and so are multiplicative;

    • Other financial variables may be normally distributed, but there is no reason to expect that a priori;
  • Light intensity
    • The intensity of laser light is normally distributed;
    • Thermal light has a Bose-Einstein distribution on very short time scales, and a normal distribution on longer timescales due to the central limit theorem.

Of relevance to biology and economics is the fact that complex systems tend to display power laws rather than normality.

Photon counting

Light intensity from a single source varies with time, as thermal fluctuations can be observed if the light is analyzed at sufficiently high time resolution. The intensity is usually assumed to be normally distributed.

Quantum mechanics interprets measurements of light intensity as photon counting. The natural assumption in this setting is the Poisson distribution. When light intensity is integrated over times longer than the coherence time and is large, the Poisson-to-normal limit is appropriate.

Measurement errors

Normality is the central assumption of the mathematical theory of errors. Similarly, in statistical model-fitting, an indicator of goodness of fit is that the residuals (as the errors are called in that setting) be independent and normally distributed. The assumption is that any deviation from normality needs to be explained. In that sense, both in model-fitting and in the theory of errors, normality is the only observation that need not be explained, being expected. However, if the original data are not normally distributed (for instance if they follow a Cauchy distribution), then the residuals will also not be normally distributed. This fact is usually ignored in practice.

Repeated measurements of the same quantity are expected to yield results which are clustered around a particular value. If all major sources of errors have been taken into account, it is assumed that the remaining error must be the result of a large number of very small additive effects, and hence normal. Deviations from normality are interpreted as indications of systematic errors which have not been taken into account. Whether this assumption is valid is debatable.

Physical characteristics of biological specimens

The sizes of full-grown animals are approximately lognormal. The evidence and an explanation based on models of growth were first published in the 1932 book Problems of Relative Growth by Julian Huxley.

However, in the case of human height for example, there are people several standard deviations away from the average who would almost certainly not exist at all among the whole population of the world if height followed a true lognormal distribution.

Differences in size due to sexual dimorphism, or other polymorphisms like the worker/soldier/queen division in social insects, further make the distribution of sizes deviate from lognormality.

The assumption that linear size of biological specimens is normal (rather than lognormal) leads to a non-normal distribution of weight (since weight or volume is roughly proportional to the 2nd or 3rd power of length, and Gaussian distributions are only preserved by linear transformations), and conversely assuming that weight is normal leads to non-normal lengths. This is a problem, because there is no a priori reason why one of length, or body mass, and not the other, should be normally distributed. Lognormal distributions, on the other hand, are preserved by powers so the "problem" goes away if lognormality is assumed.

On the other hand, there are some biological measures where normality is assumed, such as blood pressure of adult humans. This is supposed to be normally distributed, but only after separating males and females into different populations (each of which is normally distributed).

Financial variables

Because of the exponential nature of inflation, financial indicators such as stock values or commodity prices make good examples of multiplicative behavior. As such, periodic changes in them (for example, yearly changes) should not be expected to be normal, but perhaps lognormal. This was the theory proposed in 1900 by Louis Bachelier. However, Benoît Mandelbrot, the popularizer of fractals, showed that even the assumption of lognormality is flawed: the changes in logarithm over short periods (such as a day) are approximated well by distributions that do not have a finite variance, and therefore the central limit theorem does not apply. Rather, the sum of many such changes gives log-Lévy distributions.

Distribution in testing and intelligence

A great deal of confusion exists over whether or not IQ test scores and intelligence are normally distributed.

As a deliberate result of test construction, IQ scores are normally distributed for the majority of the population. But intelligence cannot be said to be normally distributed, simply because it is not a number.

The difficulty and number of questions on an IQ test are decided based on which combinations will yield a normal distribution. This does not mean, however, that the information is in any way being misrepresented, or that there is any kind of "true" distribution that is being artificially forced into the shape of a normal curve. Intelligence tests can be constructed to yield any kind of score distribution desired.

The Bell Curve is a controversial book on the topic of the heritability of intelligence. However, despite its title, the book does not primarily address whether IQ is normally distributed.


This article is licensed under the GNU Free Documentation License. It uses material from the "Normal distribution".
