In statistics, one often considers a family of probability distributions for a random variable X (and X is often a vector whose components are scalar-valued random variables, frequently independent) parameterized by a scalar- or vector-valued parameter, which let us call θ. A quantity T(X) that depends on the (observable) random variable X but not on the (unobservable) parameter θ is called a statistic. Sir Ronald Fisher tried to make precise the intuitive idea that a statistic may capture all of the information in X that is relevant to the estimation of θ. A statistic that does that is called a sufficient statistic for θ.
The precise definition is this:
i.e.
so
If the probability density function (in the discrete case, the probability mass function) of X is f(x ;θ), then T is sufficient for θ if and only if functions g and h can be found such that
i.e. the density f can be factorised into a product such that one factor, h, does not depend on θ and the other factor, which does depend on θ, depends on x only through T(x). This equivalent test is called Fisher's factorization criterion.
The way to think about this is to consider varying x in such a way as to maintain a constant value of T(X) and ask whether such a variation has any effect on inferences one might make about θ. If the factorization criterion above holds, the answer is "none" because the dependence of the likelihood function f on θ is unchanged.
Since the conditional distribution of X given T(X) does not depend on θ, neither does the conditional expected value of g(X) given T(X), where g is any function well-behaved enough for the conditional expectation to exist. Consequently that conditional expected value is actually a statistic, and so is available for use in estimation. If g(X) is any kind of estimator of θ, then typically the conditional expectation of g(X) given T(X) is a better estimator of θ ; one way of making that statement precise is called the Rao-Blackwell theorem. Sometimes one can very easily construct a very crude estimator g(X), and then evaluate that conditional expected value to get an estimator that is in various senses optimal.
This article is licensed under the GNU Free Documentation License.
It uses material from the
"Sufficiency (statistics)".
Home Page • arts • business • computers • games • health • hospitals • home • kids & teens • news • physicians • recreation• reference • regional • science • shopping • society • sports • world