In statistics, a result is called significant if it would be unlikely to occur by chance alone when the independent variable (the test condition being examined) in reality has no effect or, formally stated, when a presumed null hypothesis is true.
Technically, in traditional frequentist statistical hypothesis testing, the significance level of a test is a threshold chosen in advance: the null hypothesis is rejected when the probability, computed under the null hypothesis, of obtaining a statistic at least as extreme as the one observed is no larger than this threshold. Hence, if the null hypothesis is true, the significance level is the probability that it will be rejected in error (a decision known as a Type I error). The probability computed from the observed data is called the p-value; the smaller the p-value, the more significant the result is said to be.
Popular levels of significance are 10%, 5%, and 1%. The significance level is denoted by the Greek letter α (alpha).
For example, if a test is performed at a significance level of 5% and the p-value is lower than 5%, the null hypothesis is rejected. Informally, the result is then said to be "statistically significant".
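The decision rule in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the test statistic is taken to be standard normal under the null hypothesis (a two-sided z-test), and the function names and the observed value `z = 2.1` are hypothetical.

```python
from statistics import NormalDist


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value for a z statistic, assuming a standard normal null."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def is_significant(p_value: float, alpha: float = 0.05) -> bool:
    """Reject the null hypothesis when the p-value is below alpha."""
    return p_value < alpha


z = 2.1  # hypothetical observed test statistic
p = two_sided_p_value(z)
print(f"p-value: {p:.4f}")
print(f"significant at 5%: {is_significant(p, 0.05)}")
print(f"significant at 1%: {is_significant(p, 0.01)}")
```

With this made-up statistic the result is significant at the 5% level but not at the 1% level, which shows that the verdict depends on the level chosen before the test.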
If the significance level is smaller, the critical value is more extreme, so a test statistic is less likely to exceed it by chance. A result which is "significant at the 1% level" is therefore more significant than a result which is only "significant at the 5% level". However, for the same data, a test at the 1% level is more likely to fail to reject a false null hypothesis (a Type II error) than a test at the 5% level, and so has less statistical power.
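The trade-off between significance level and power can be made concrete with a small calculation. The sketch below assumes a one-sided z-test of a zero mean with known standard deviation; the function name and the numbers (true effect 0.5, standard deviation 1, sample size 25) are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def power_one_sided_z(effect: float, sigma: float, n: int, alpha: float) -> float:
    """Power of a one-sided z-test of H0: mean = 0 when the true mean is `effect`."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1.0 - alpha)  # critical value on the z scale
    shift = effect / (sigma / n ** 0.5)         # mean of the z statistic under the alternative
    return 1.0 - std_normal.cdf(critical - shift)


power_5 = power_one_sided_z(effect=0.5, sigma=1.0, n=25, alpha=0.05)
power_1 = power_one_sided_z(effect=0.5, sigma=1.0, n=25, alpha=0.01)
print(f"alpha = 5%: power = {power_5:.3f}, Type II error rate = {1 - power_5:.3f}")
print(f"alpha = 1%: power = {power_1:.3f}, Type II error rate = {1 - power_1:.3f}")
```

Under these assumptions the stricter 1% test has visibly lower power than the 5% test on the same data, so lowering the Type I error rate raises the Type II error rate.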
In devising a hypothesis test, the tester aims to maximize power for a given significance level, but ultimately has to recognize that the best that can be achieved is likely to be a balance between significance and power, in other words between the risks of Type I and Type II errors. A Type I error is not necessarily any worse or any better than a Type II error: the severity of each type of error depends on the individual case.
Even statistical significance can be illusory when data from numerous subgroups of the sample population are evaluated. A very large study, part of the Women's Health Initiative, that tested the effects of nutritional supplements reported in 2006 that there were no significant effects on certain variables for the entire population, but that there were significant effects among subgroups. Given that the number of subgroups (age categories, obesity levels, marital status, and combinations thereof) could be extensive, simple probability would predict that some of these groups would show a spurious significant difference even if the null hypothesis were true (the multiple comparisons problem). Therefore, it is prudent in such cases to adjust p-values in order to control either the false discovery rate or the familywise error rate.
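A short sketch shows how quickly spurious findings accumulate and how the two kinds of adjustment work. It assumes the subgroup tests are independent, and it uses the Bonferroni correction (familywise error rate) and the Benjamini-Hochberg procedure (false discovery rate) as examples of such adjustments; the function names and the p-values are hypothetical and are not data from the study mentioned above.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one false positive among m independent tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni(p_values: list[float]) -> list[float]:
    """Bonferroni-adjusted p-values (controls the familywise error rate)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(f"FWER for 20 tests at alpha = 5%: {familywise_error_rate(0.05, 20):.3f}")

raw = [0.003, 0.04, 0.02, 0.30]  # hypothetical subgroup p-values
print("Bonferroni:        ", [round(p, 4) for p in bonferroni(raw)])
print("Benjamini-Hochberg:", [round(p, 4) for p in benjamini_hochberg(raw)])
```

An adjusted p-value is compared with the chosen significance level in the usual way. Bonferroni is the more conservative of the two, which is the price of controlling the chance of any false positive at all.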
Expressed mathematically, the confidence that a result is not due to random chance is given by the following formula by Sackett (Sackett DL. Why randomized controlled trials fail but needn't: 2. Failure to employ physiological statistics, or the only formula a clinician-trialist is ever likely to need (or understand!). CMAJ. 2001 Oct 30;165(9):1226-37. PMID 11706914):

$$\text{confidence} = \frac{\text{signal}}{\text{noise}} \times \sqrt{\text{sample size}}$$
For clarity, the above formula is presented in tabular form below.
Dependence of confidence on noise, signal and sample size (tabular form)
| Parameter | Parameter increases | Parameter decreases |
|---|---|---|
| Noise | Confidence decreases | Confidence increases |
| Signal | Confidence increases | Confidence decreases |
| Sample size | Confidence increases | Confidence decreases |
In words, confidence is high if the noise is low and/or the sample size is large and/or the effect size (signal) is large. The confidence of a result (and its associated confidence interval) does not depend on effect size alone. If the sample size is large and the noise is low, a small effect size can be measured with great confidence. Whether a small effect size is considered important depends on the context of the events compared.
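The dependencies described above can be checked numerically. The sketch below assumes the relation takes the commonly quoted form confidence = (signal / noise) × √(sample size), which has the same shape as a z statistic; the function name and all input values are hypothetical.

```python
import math


def confidence(signal: float, noise: float, sample_size: int) -> float:
    """Assumed form of the relation: (signal / noise) * sqrt(sample size)."""
    if noise <= 0 or sample_size <= 0:
        raise ValueError("noise and sample_size must be positive")
    return (signal / noise) * math.sqrt(sample_size)


baseline = confidence(signal=2.0, noise=4.0, sample_size=100)
more_noise = confidence(signal=2.0, noise=8.0, sample_size=100)
more_signal = confidence(signal=4.0, noise=4.0, sample_size=100)
more_data = confidence(signal=2.0, noise=4.0, sample_size=400)
# A tenfold smaller effect, offset by a hundredfold larger sample.
small_effect_big_sample = confidence(signal=0.2, noise=4.0, sample_size=10_000)

print(f"baseline:                  {baseline:.2f}")
print(f"noise doubled:             {more_noise:.2f}")
print(f"signal doubled:            {more_signal:.2f}")
print(f"sample size quadrupled:    {more_data:.2f}")
print(f"small effect, big sample:  {small_effect_big_sample:.2f}")
```

Because the sample size enters through a square root, quadrupling the sample only doubles the value, and a tenfold smaller effect needs a hundredfold larger sample to be measured with the same confidence.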
In medicine, small effect sizes (reflected by small increases of risk) are often considered clinically relevant and are frequently used to guide treatment decisions (if there is great confidence in them). Whether a given treatment is considered a worthy endeavour is dependent on the risks, benefits and costs.
This article is licensed under the GNU Free Documentation License. It uses material from the article "Statistical significance".