
In statistics, one often considers a family of probability distributions for a random variable X (X is often a vector whose components are scalar-valued random variables, frequently independent), parameterized by a scalar- or vector-valued parameter, which we call θ. A quantity T(X) that depends on the (observable) random variable X but not on the (unobservable) parameter θ is called a statistic. Sir Ronald Fisher sought to make precise the intuitive idea that a statistic may capture all of the information in X that is relevant to the estimation of θ. A statistic that does so is called a sufficient statistic for θ.

Mathematical definition


The precise definition is this:

A statistic T(X) is sufficient for θ precisely if the conditional probability distribution of the data X given the statistic T(X) does not depend on the parameter θ,

i.e.

\Pr(X=x|T(X)=t,\theta) = \Pr(X=x|T(X)=t), \,
or in shorthand
\Pr(x|t,\theta) = \Pr(x|t), \,

so that, because t = T(x) is determined by x,

\begin{matrix} \Pr(x|\theta) = \Pr(x,t|\theta) & = & \Pr(x|t,\theta) \cdot \Pr(t|\theta) \\ \\ & = & \Pr(x|t) \cdot \Pr(t|\theta) \end{matrix}
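The defining property can be checked by brute force in a small case. The sketch below assumes the Bernoulli model from the examples that follow, with n = 5 and a hypothetical sample containing three successes (the sample and the helper names are illustrative only). It computes Pr(X = x | T(X) = t, p) exactly with rational arithmetic for several values of p; each time the result is 1/C(5, 3) = 1/10, so the conditional distribution does not depend on p.

```python
from fractions import Fraction
from itertools import product
from math import comb


def bernoulli_pmf(x, p):
    """Joint pmf of independent Bernoulli(p) observations x = (x_1, ..., x_n)."""
    prob = Fraction(1)
    for xi in x:
        prob *= p if xi == 1 else 1 - p
    return prob


def conditional_given_sum(x, p):
    """Pr(X = x | T(X) = sum(x), p), computed by enumerating {0, 1}^n."""
    n, t = len(x), sum(x)
    # Pr(T = t | p): add up the joint pmf of every outcome with the same total.
    pr_t = sum(bernoulli_pmf(y, p) for y in product((0, 1), repeat=n) if sum(y) == t)
    return bernoulli_pmf(x, p) / pr_t


x = (1, 0, 1, 1, 0)  # hypothetical sample with T(x) = 3
expected = Fraction(1, comb(len(x), sum(x)))  # 1 / C(5, 3)
for p in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
    print(p, conditional_given_sum(x, p), expected)
```

Exact fractions are used so that "does not depend on p" is an equality check rather than a floating-point approximation.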

If the probability density function (in the discrete case, the probability mass function) of X is f(x; θ), then T is sufficient for θ if and only if functions g and h can be found such that

f(x;\theta)=h(x) \, g(T(x),\theta),

That is, the density f factorizes into a product in which one factor, h, does not depend on θ, and the other factor, g, which does depend on θ, depends on x only through T(x). This equivalent condition is called Fisher's factorization criterion.

To see why, consider varying x in such a way that T(x) stays constant, and ask whether such a variation has any effect on inferences one might make about θ. If the factorization criterion holds, the answer is "none": viewed as a function of θ, the likelihood f(x; θ) changes only by the factor h(x), which does not involve θ, so its dependence on θ is unchanged.
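This argument can be illustrated numerically. Assuming the Poisson model treated in the examples below, the sketch takes two hypothetical samples with the same total T = 6 and prints the ratio of their likelihoods for several values of λ (the samples and function name are illustrative). Because h is the only factor that differs between the two samples, the ratio stays at h(x)/h(y) = 5!/(2! 2! 2!) = 15 and carries no information about λ.

```python
import math


def poisson_likelihood(x, lam):
    """Joint Poisson(lam) pmf of the sample x, viewed as a function of lam."""
    return math.prod(math.exp(-lam) * lam**xi / math.factorial(xi) for xi in x)


# Two hypothetical samples with the same total T = 6 but different entries.
x = [2, 2, 2]
y = [0, 1, 5]

for lam in (0.5, 1.0, 2.0, 4.0):
    # The ratio equals h(x) / h(y) = 15 up to floating-point rounding,
    # whatever the value of lam.
    print(lam, poisson_likelihood(x, lam) / poisson_likelihood(y, lam))
```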

Examples


  • If X1, ..., Xn are independent Bernoulli-distributed random variables with expected value p, then the sum T(X) = X1 + ... + Xn is a sufficient statistic for p. (Here 'success' corresponds to X_i = 1 and 'failure' to X_i = 0, so T is the total number of successes.)

This is seen by considering the joint probability distribution:

\Pr(X=x)=\Pr(X_1=x_1,X_2=x_2,\ldots,X_n=x_n).

Because the observations are independent, this can be written as

p^{x_1}(1-p)^{1-x_1} p^{x_2}(1-p)^{1-x_2}\cdots p^{x_n}(1-p)^{1-x_n}

and collecting powers of p and 1 − p gives

p^{\sum x_i}(1-p)^{n-\sum x_i}=p^{T(x)}(1-p)^{n-T(x)}

which satisfies the factorization criterion with g(T(x), p) = p^{T(x)}(1 − p)^{n − T(x)} and h(x) = 1, a constant function. Note the crucial feature: the unknown parameter (here p) interacts with the observation x only via the statistic T(x) (here the sum Σ xi).
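The factorization can also be checked exhaustively for small n. The sketch below, with an arbitrarily chosen p = 3/10 and n = 4 and helper names that are purely illustrative, compares the product of the individual Bernoulli pmfs with p^{T(x)}(1 − p)^{n − T(x)} over all 2^n outcomes, using exact rational arithmetic so that the comparison is an equality rather than an approximation.

```python
from fractions import Fraction
from itertools import product


def joint_pmf(x, p):
    """Product of the individual Bernoulli pmfs p^{x_i} (1 - p)^{1 - x_i}."""
    result = Fraction(1)
    for xi in x:
        result *= p**xi * (1 - p) ** (1 - xi)
    return result


def factorized(x, p):
    """g(T(x), p) * h(x) with g = p^T (1 - p)^(n - T) and h(x) = 1."""
    t, n = sum(x), len(x)
    return p**t * (1 - p) ** (n - t)


p = Fraction(3, 10)
n = 4
# Every outcome in {0, 1}^n satisfies the factorization exactly.
assert all(joint_pmf(x, p) == factorized(x, p) for x in product((0, 1), repeat=n))
```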

  • If X1, ..., Xn are independent and uniformly distributed on the interval [0, θ], then max(X1, ..., Xn) is sufficient for θ.

To see this, consider the joint probability distribution:

\Pr(X=x)=\Pr(X_1=x_1,X_2=x_2,\ldots,X_n=x_n).

Because the observations are independent, this can be written as

\frac{H(\theta-x_1)}{\theta}\cdot \frac{H(\theta-x_2)}{\theta}\cdot\cdots\cdot \frac{H(\theta-x_n)}{\theta}

where H(x) is the Heaviside step function. This may be written as

\frac{H\left(\theta-\max_i \{\,x_i\,\}\right)}{\theta^n}

which shows that the factorization criterion is satisfied with T(x) = max_i x_i, g(t, θ) = H(θ − t)/θ^n and h(x) = 1 (the product above is the joint density for nonnegative x_i).
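A quick numerical check of this factorization is sketched below, under the assumptions that every observation is nonnegative and that H(0) = 1, so the interval is closed at θ. The two samples are hypothetical and share the same maximum, 0.9, so their likelihoods agree for every θ even though the samples themselves differ.

```python
def heaviside(z):
    """Step function H(z): 1.0 for z >= 0, else 0.0 (assumes H(0) = 1)."""
    return 1.0 if z >= 0 else 0.0


def uniform_likelihood(x, theta):
    """Product of H(theta - x_i) / theta: the joint density for x_i >= 0."""
    result = 1.0
    for xi in x:
        result *= heaviside(theta - xi) / theta
    return result


def via_max(x, theta):
    """H(theta - max x_i) / theta^n, which depends on x only through max(x)."""
    return heaviside(theta - max(x)) / theta ** len(x)


# Two hypothetical samples sharing the same maximum, 0.9.
a = [0.1, 0.5, 0.9]
b = [0.9, 0.9, 0.2]
for theta in (0.5, 0.9, 1.0, 2.0):
    print(theta, uniform_likelihood(a, theta), uniform_likelihood(b, theta), via_max(a, theta))
```

For θ below 0.9 the likelihood is zero for both samples; from 0.9 upward it equals 1/θ^3, decreasing in θ.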

  • If X1, ..., Xn are independent and have a Poisson distribution with parameter λ, then the sum T(X) = X1 + ... + Xn is a sufficient statistic for λ.

To see this, consider the joint probability distribution:

\Pr(X=x)=\Pr(X_1=x_1,X_2=x_2,\ldots,X_n=x_n).

Because the observations are independent, this can be written as

{e^{-\lambda} \lambda^{x_1} \over x_1 !} \cdot {e^{-\lambda} \lambda^{x_2} \over x_2 !} \cdot\cdots\cdot {e^{-\lambda} \lambda^{x_n} \over x_n !}

which may be written as

e^{-n\lambda} \lambda^{(x_1+x_2+\cdots+x_n)} \cdot {1 \over x_1 ! x_2 !\cdots x_n ! }

which shows that the factorization criterion is satisfied, where h(x) is the reciprocal of the product of the factorials.
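As with the Bernoulli case, the identity can be spot-checked numerically. In the sketch below the functions g and h follow the factorization just derived (g(t, λ) = e^{−nλ} λ^t, h(x) = 1/(x_1! ⋯ x_n!)), while the sample and the values of λ are arbitrary choices for illustration; math.isclose is used because both sides are computed in floating point.

```python
import math


def poisson_joint(x, lam):
    """Product of the individual pmfs e^{-lam} lam^{x_i} / x_i!."""
    return math.prod(math.exp(-lam) * lam**xi / math.factorial(xi) for xi in x)


def g(t, lam, n):
    """Factor that depends on lam, and on the data only through t = sum(x)."""
    return math.exp(-n * lam) * lam**t


def h(x):
    """Factor free of lam: the reciprocal of the product of the factorials."""
    return 1 / math.prod(math.factorial(xi) for xi in x)


x = [3, 0, 2, 5]  # hypothetical sample
for lam in (0.7, 1.5, 3.0):
    assert math.isclose(poisson_joint(x, lam), g(sum(x), lam, len(x)) * h(x))
```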

The Rao-Blackwell theorem


Since the conditional distribution of X given T(X) does not depend on θ, neither does the conditional expected value of δ(X) given T(X), where δ is any function well-behaved enough for the conditional expectation to exist. Consequently that conditional expected value is itself a statistic, and so is available for use in estimation. If δ(X) is any estimator of θ, then the conditional expectation of δ(X) given T(X) is typically a better estimator of θ; the Rao-Blackwell theorem makes this precise, showing that its mean squared error is never larger than that of δ(X). Sometimes one can easily construct a very crude estimator δ(X), and then evaluate that conditional expected value to obtain an estimator that is optimal in various senses.
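The classic Bernoulli illustration can be sketched with exact enumeration. Here the crude estimator uses only the first observation, X_1; conditioning it on T(X) gives T/n whatever value of p is used in the calculation, and the exact mean squared error drops from p(1 − p) to p(1 − p)/n. The values n = 5 and p = 1/3 and all function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def joint_pmf(x, p):
    """Joint pmf of independent Bernoulli(p) observations."""
    result = Fraction(1)
    for xi in x:
        result *= p if xi == 1 else 1 - p
    return result


def conditional_mean(estimator, n, t, p):
    """E[estimator(X) | T(X) = t], enumerating all x in {0,1}^n with sum t."""
    outcomes = [x for x in product((0, 1), repeat=n) if sum(x) == t]
    weights = [joint_pmf(x, p) for x in outcomes]
    return sum(w * estimator(x) for w, x in zip(weights, outcomes)) / sum(weights)


def mse(estimator, n, p):
    """Exact mean squared error E[(estimator(X) - p)^2]."""
    return sum(joint_pmf(x, p) * (estimator(x) - p) ** 2
               for x in product((0, 1), repeat=n))


def crude(x):
    """A deliberately poor unbiased estimator: the first observation alone."""
    return x[0]


n, p = 5, Fraction(1, 3)

# E[X_1 | T = t] is the same under two different values of p, namely t / n.
for t in range(n + 1):
    print(t, conditional_mean(crude, n, t, p), conditional_mean(crude, n, t, Fraction(4, 5)))


def improved(x):
    """The Rao-Blackwellized estimator T(x) / n."""
    return Fraction(sum(x), n)


print(mse(crude, n, p), mse(improved, n, p))  # 2/9 2/45
```

The improved estimator is the sample proportion, and its mean squared error is smaller by a factor of n.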


This article is licensed under the GNU Free Documentation License. It uses material from the article "Sufficiency (statistics)".
