
Resampling is a term used in statistics to describe a variety of methods for computing summary statistics using subsets of available data (jackknife), drawing randomly with replacement from a set of data points (bootstrapping), or switching labels on data points when performing significance tests (permutation test, also called exact test, randomization test, or re-randomization test).

Bootstrap


Bootstrapping is a statistical method for estimating the sampling distribution of an estimator by sampling with replacement from the original sample, most often with the purpose of deriving robust estimates of standard errors and confidence intervals of a population parameter like a mean, median, proportion, odds ratio, correlation coefficient or regression coefficient. It may also be used for constructing hypothesis tests. It is often used as a robust alternative to inference based on parametric assumptions when those assumptions are in doubt, or where parametric inference is impossible or requires very complicated formulas for the calculation of standard errors.
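The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `bootstrap_se_and_ci`, the sample values, the number of resamples and the use of a simple percentile interval are all assumptions chosen for the example.

```python
import random
import statistics

def bootstrap_se_and_ci(data, stat=statistics.mean, n_resamples=2000,
                        alpha=0.05, seed=0):
    """Bootstrap standard error and percentile interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Each replicate: draw n points with replacement, recompute the statistic.
    replicates = sorted(
        stat(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # The spread of the replicates estimates the standard error.
    se = statistics.stdev(replicates)
    # Percentile interval: cut off alpha/2 of the replicates in each tail.
    lower = replicates[int((alpha / 2) * n_resamples)]
    upper = replicates[int((1 - alpha / 2) * n_resamples) - 1]
    return se, (lower, upper)

# Hypothetical sample, used only for illustration.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 4.4]
se, (lower, upper) = bootstrap_se_and_ci(sample)
print(f"mean={statistics.mean(sample):.3f}  se={se:.3f}  "
      f"95% interval=({lower:.3f}, {upper:.3f})")
```

Passing a different function as `stat` (for example `statistics.median`) reuses the same resampling loop for another estimator, which is the main practical appeal of the method.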


See also particle filter for the general theory of Sequential Monte Carlo methods, as well as details on some common implementations.

Jackknife


Like bootstrapping, the jackknife is a statistical method for estimating and compensating for bias and for deriving robust estimates of standard errors and confidence intervals. Jackknifed statistics are created by systematically dropping out subsets of data one at a time and assessing the resulting variation in the studied parameter (Mooney & Duval, 1993).

Both methods estimate the variability of a statistic from the variability of that statistic between subsamples, rather than from parametric assumptions. The jackknife is a less general technique than the bootstrap, and explores the sample variation differently. However, the jackknife is easier than the bootstrap to apply to complex sampling schemes, such as multi-stage sampling with varying sampling weights.

The jackknife and the bootstrap often yield similar results. However, when used to estimate the standard error of a statistic, the bootstrap gives slightly different results each time it is repeated on the same data, whereas the jackknife gives exactly the same result each time (assuming the subsets to be removed are the same).
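A delete-one jackknife can be sketched as follows. The function name `jackknife` and the sample values are assumptions made for this illustration; the bias and standard-error formulas are the standard delete-one versions. Because no random numbers are involved, repeated calls return identical results.

```python
import math
import statistics

def jackknife(data, stat=statistics.mean):
    """Delete-one jackknife: bias-corrected estimate and standard error."""
    n = len(data)
    theta_full = stat(data)
    # Recompute the statistic n times, leaving out one observation each time.
    leave_one_out = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    loo_mean = sum(leave_one_out) / n
    # Jackknife bias estimate and bias-corrected statistic.
    bias = (n - 1) * (loo_mean - theta_full)
    corrected = theta_full - bias
    # Jackknife standard error from the spread of the leave-one-out values.
    se = math.sqrt((n - 1) / n * sum((t - loo_mean) ** 2 for t in leave_one_out))
    return corrected, se

# Hypothetical sample, used only for illustration.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 4.4]
corrected, se = jackknife(sample)
print(f"bias-corrected mean={corrected:.3f}  jackknife se={se:.3f}")
```

For the sample mean, the jackknife standard error coincides with the familiar s/sqrt(n), which gives a convenient check of the code.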

The jackknife was introduced by Maurice Quenouille in 1949 and was extended and given its name by John Tukey in 1958. It has some similarity to k-fold and leave-one-out cross-validation techniques.

Permutation tests


Statistical tests use observed data to calculate a test statistic, which (in well-constructed tests) assesses a hypothesis of interest. The value of the test statistic is compared to a reference distribution, the distribution of the test statistic assuming the null hypothesis is true. The p-value is the proportion of the reference distribution that is at least as extreme as the observed statistic. If the p-value is too small, the null hypothesis is rejected and an alternative hypothesis is rendered more plausible. Contrary to intuition, the alternative is not said to be accepted when the null is rejected, except in trivial examples.

A permutation test (also called a randomization test, re-randomization test, or an exact test) is a type of statistical significance test in which the reference distribution is obtained by calculating every possible value of the test statistic. This is done by permuting the observed data points across all possible outcomes, given a set of conditions consistent with the null hypothesis. The theory has evolved from the works of R.A. Fisher and E.J.G. Pitman in the 1930s.
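For two small groups, the full reference distribution can be enumerated directly. The sketch below is illustrative only: the function name `exact_permutation_test`, the choice of the absolute difference in means as test statistic, and the data values are assumptions made for the example.

```python
from itertools import combinations

def exact_permutation_test(x, y):
    """Two-sided exact permutation test for a difference in group means."""
    pooled = list(x) + list(y)
    n_x, n_y = len(x), len(y)
    total = sum(pooled)

    def abs_mean_diff(sum_first):
        # Difference in means, given the sum of the values labeled "first group".
        return abs(sum_first / n_x - (total - sum_first) / n_y)

    observed = abs_mean_diff(sum(x))
    n_extreme = 0
    n_splits = 0
    # Every way of relabeling n_x of the pooled points as the first group.
    for indices in combinations(range(len(pooled)), n_x):
        n_splits += 1
        # Small tolerance so ties are not lost to floating-point rounding.
        if abs_mean_diff(sum(pooled[i] for i in indices)) >= observed - 1e-12:
            n_extreme += 1
    return n_extreme / n_splits

# Hypothetical measurements for two groups of four.
x = [12.1, 14.3, 13.8, 15.2]
y = [9.8, 10.4, 11.1, 10.9]
p_value = exact_permutation_test(x, y)
print(f"p = {p_value:.4f}")
```

In this example every value in the first group exceeds every value in the second, so only the observed labeling and its mirror image are as extreme as the data, out of 70 possible splits.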

Relation to parametric tests

Permutation tests form a branch of non-parametric statistics. In contrast to permutation tests, the reference distributions for many popular "classical" statistical tests, such as the t-test, z-test and chi-squared test, are obtained from theoretical probability distributions. Many researchers believe this invalidates or at least critically weakens their use, because the assumptions relating the theoretical distributions to the empirically obtained test statistics may not be valid. The extent to which this is true in various real-world settings is an area of active investigation. Researchers may be forced to make these assumptions in some situations because there is no alternative, and a non-optimal statistical test is usually considered better than none at all. On the other hand, for the most commonly used tests, including those mentioned above, parametric assumptions have been shown to be relatively unimportant.

Fisher's exact test is a commonly used permutation test for evaluating the association between two dichotomous variables, in contrast to Pearson's chi-squared test, which can be used for the same purpose. When sample sizes are small, the chi-squared test statistic can no longer be accurately compared against the chi-squared reference distribution, and Fisher's exact test becomes the more appropriate choice. A commonly used rule of thumb is that the expected count in each cell of the table should be greater than 5 before Pearson's chi-squared test is used.
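The following sketch shows both ideas for a 2 x 2 table: the expected-count check behind the rule of thumb, and an exact p-value computed from the hypergeometric distribution. The function names, the table layout [[a, b], [c, d]] and the example counts are assumptions for illustration. The two-sided p-value here sums the probabilities of all tables no more probable than the observed one, which is one common convention among several.

```python
from math import comb

def min_expected_count(a, b, c, d):
    """Smallest expected cell count of the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    return min(r * k / n for r in rows for k in cols)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_observed = prob(a)
    # Sum over all tables that are no more probable than the observed table.
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_observed * (1 + 1e-9))

# Hypothetical counts: 3 and 1 in the first row, 1 and 3 in the second.
table = (3, 1, 1, 3)
print("smallest expected count:", min_expected_count(*table))
print(f"two-sided exact p = {fisher_exact_two_sided(*table):.4f}")
```

With these counts every expected cell count is 2, well below the threshold of 5, so the rule of thumb points to the exact test rather than the chi-squared approximation.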

All simple and many relatively complex parametric tests have a corresponding permutation test version that is defined by using the same test statistic as the parametric test, but obtains the p-value from the sample-specific permutation distribution of that statistic, rather than from the theoretical distribution derived from the parametric assumption. For example, it is possible in this manner to construct a permutation t-test, a permutation chi-squared test of association, a permutation two-sample Kolmogorov-Smirnov test and so on.

Examples

The most commonly used non-parametric tests are in their original form defined as permutation tests on ranks; these include for example the Mann-Whitney U test and Spearman’s rank correlation test.

Pitman’s original formulation (in 1937) of the general permutation test of association between two variables describes a general test procedure that when applied to two numeric variables in linear scales gives a permutation test of Pearson's correlation coefficient, when applied to ranked data points gives Spearman's rank correlation test, when applied to one numeric and one binary variable gives a permutation t-test, when applied to one ranked and one binary variable gives Mann-Whitney’s U-test (also known as the Wilcoxon rank sum test) and when applied to two binary variables gives Fisher's exact test.

Advantages

Wald tests are likelihood-based parametric tests that define the test statistic as a ratio t/s, where t is the deviation of an observed parameter from its expected value if the null hypothesis were true, and s is an estimate of the standard error of t. A permutation test need not in general take into account the value of s, as it is a constant for all permutations of a sample. This is an advantage, as finding the standard error (or variance) is often the trickiest part of developing new significance tests, requiring deep mathematical knowledge. So the construction of a permutation test rather than a parametric test to solve a certain problem may be regarded as a way of replacing mathematical skill with raw computing power.

In general, the greatest advantage of permutation tests is that the results remain reliable for small samples and when the data strongly violate the distributional assumptions of the corresponding parametric test. For larger sample sizes, the central limit theorem usually ensures that the parametric test results are very similar to those of the related permutation test. It may therefore be concluded that even when the parametric assumptions are violated, parametric tests are often good approximations to the corresponding "exact" permutation test, provided the sample is large enough.

Before the 1980s, the burden of creating the reference distribution was overwhelming except for data sets with small sample sizes. Since the 1980s, the confluence of cheap, fast computers and the development of new, sophisticated path algorithms applicable in special situations has made permutation test methods practical for a wide range of problems. This led to the addition of exact-test options in the main statistical software packages and to the appearance of specialized software for performing a wide range of uni- and multi-variable exact tests and for computing test-based "exact" confidence intervals.

Limitations

There are two important assumptions behind a permutation test: that the observations are independent and that they are exchangeable under the null hypothesis. An important consequence of the exchangeability assumption is that tests of difference in location (like a permutation t-test) require equal variance. In this respect, the permutation t-test shares the same weakness as the classical Student’s t-test, and in fact the permutation approach is more limited than the parametric approach, because the parametric approach has been generalized to allow different variances. A third alternative in this situation is to use a bootstrap-based test. Good (2000) explains the difference between permutation tests and bootstrap tests the following way: "Permutations test hypotheses concerning distributions; bootstraps test hypotheses concerning parameters. As a result, the bootstrap entails less-stringent assumptions."

Another weakness of permutation tests is that they return a p-value as the only result, and cannot directly generate a confidence interval for the parameter of interest. However, there are methods for calculating "exact" confidence intervals by inverting a permutation test.

Monte Carlo testing

An asymptotically equivalent permutation test can be created when there are too many possible orderings of the data to conveniently allow complete enumeration. This is done by generating the reference distribution by Monte Carlo sampling, which takes a small (relative to the total number of permutations) random sample of the possible replicates.

The realization that this could be applied to any permutation test on any dataset was an important breakthrough in the area of applied statistics. The earliest known reference to this approach is Dwass (1957).

The necessary size of the Monte Carlo sample depends on the accuracy required of the test, and for scientific purposes it is usually at least 10,000. (For observed p=0.05, the accuracy from 10,000 random permutations is around 0.005, and for observed p=0.10, the accuracy is around 0.008. Accuracy is defined from the binomial 99% confidence interval: p ± accuracy.)

For continuous data with N observations, the number of permutations is N!: 3628800 for N=10, about 2.4E18 for N=20 and about 3.0E64 for N=50. For discrete data the number of unique levels of the test statistic is less than N!, but in practice it will still often be impossible to derive the exact permutation distribution even for samples of moderate size.
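A Monte Carlo version of a two-sample permutation test, together with the accuracy calculation described above, might look like the sketch below. The function names, the test statistic (absolute difference in means), the data and the seed are assumptions for illustration. The accuracy helper uses the normal approximation to the binomial 99% interval (z = 2.576), which gives roughly 0.0056 and 0.0077 for the two cases quoted above.

```python
import math
import random

def monte_carlo_permutation_test(x, y, n_perm=10_000, seed=0):
    """Estimate a two-sided permutation p-value from random relabelings."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n_x = len(x)

    def abs_mean_diff(values):
        first, second = values[:n_x], values[n_x:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    observed = abs_mean_diff(pooled)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # one random relabeling of the pooled data
        # Small tolerance so ties are not lost to floating-point rounding.
        if abs_mean_diff(pooled) >= observed - 1e-12:
            hits += 1
    return hits / n_perm

def monte_carlo_accuracy(p, n_perm, z=2.576):
    """Half-width of the approximate 99% binomial interval around p."""
    return z * math.sqrt(p * (1 - p) / n_perm)

# Hypothetical measurements for two groups of four.
x = [12.1, 14.3, 13.8, 15.2]
y = [9.8, 10.4, 11.1, 10.9]
p_hat = monte_carlo_permutation_test(x, y)
print(f"estimated p = {p_hat:.4f}")
print(f"accuracy at p=0.05: {monte_carlo_accuracy(0.05, 10_000):.4f}")
print(f"accuracy at p=0.10: {monte_carlo_accuracy(0.10, 10_000):.4f}")
print("orderings of 10 points:", math.factorial(10))
```

Some authors add one to both the count of extreme replicates and the number of replicates so that the estimated p-value can never be exactly zero; the plain proportion is used here to stay close to the description above.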

References


Statistical bootstrapping

  • Efron, B. (1979). Bootstrap methods: Another look at the jackknife. The Annals of Statistics, 7, 1-26.
  • Efron, B. (1981). Nonparametric estimates of standard error: The jackknife, the bootstrap and other methods. Biometrika, 68, 589-599.
  • Efron, B. (1982). The jackknife, the bootstrap, and other resampling plans. Society for Industrial and Applied Mathematics CBMS-NSF Monographs, 38.
  • Diaconis, P. & Efron, B. (1983). Computer-intensive methods in statistics. Scientific American, May, 116-130.
  • Efron, B. & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York: Chapman & Hall.
  • Mooney, C. Z. & Duval, R. D. (1993). Bootstrapping: A Nonparametric Approach to Statistical Inference. Sage University Paper series on Quantitative Applications in the Social Sciences, 07-095. Newbury Park, CA: Sage.
  • Edgington, E. S. (1995). Randomization tests. New York: M. Dekker.
  • Davison, A. C. & Hinkley, D. V. (1997). Bootstrap Methods and their Application.
  • Simon, J. L. (1997). Resampling: The New Statistics.
  • Moore, D. S., McCabe, G., Duckworth, W. & Sclove, S. (2003). Bootstrap Methods and Permutation Tests.
  • Hesterberg, T. C., Moore, D. S., Monaghan, S., Clipson, A. & Epstein, R. (2005). Bootstrap Methods and Permutation Tests.

Permutation test

Original references:
  • Fisher, R.A., The Design of Experiments. New York: Hafner; 1935
  • Pitman, E. J. G., Significance tests which may be applied to samples from any population. Royal Statistical Society Supplement, 1937; 4: 119-130 and 225-32 (parts I and II).
  • Pitman, E. J. G., Significance tests which may be applied to samples from any population. Part III. The analysis of variance test. Biometrika, 1938; 29: 322-35.
Modern references:
  • Edgington, E. S. Randomization tests, 3rd ed. New York: Marcel-Dekker. 1995.
  • Good, Phillip I. Permutation Tests 2nd ed. Springer. 2000. ISBN 038798898X
  • Lunneborg, Cliff. Data Analysis by Resampling. Duxbury Press. 1999. ISBN 0534221106
  • Welch, W. J., Construction of permutation tests, Journal of the American Statistical Association, 85, 693-698, 1990
Computational Methods:
  • Mehta, C. R. and Patel, N. R. (1983). A network algorithm for performing Fisher’s exact test in r x c contingency tables, J. Amer. Statist. Assoc. 78(382), 427–434.
  • Mehta, C. R., Patel, N. R. and Senchaudhuri, P. (1988). Importance sampling for estimating exact probabilities in permutational inference, J. Amer. Statist. Assoc. 83(404), 999–1005.


This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article "Resampling (statistics)".
