A Small p-value Is Not Enough; We Need More
Every year, we produce a large number of postgraduate dissertations that rely on quantitative methods of data collection, analysis, and interpretation. In almost all of these dissertations, p-values are used to gauge the plausibility of hypotheses. Unfortunately, misinterpretations of p-values are common in research publications. Here are some of the most frequent ones:
1. Misunderstanding statistical significance: Many researchers equate statistical significance with practical significance. However, statistical significance only reflects how unlikely the observed data (or more extreme data) would be if the null hypothesis were true; it says nothing about the magnitude or importance of the effect.
2. Treating p < 0.05 as a definitive cutoff: Using the conventional significance level of p < 0.05 as an absolute threshold for acceptance or rejection of hypotheses can lead to errors. It is important to consider the context, effect size, study design, and other factors when interpreting p-values.
3. Confusing p-values with effect size: The p-value measures the strength of evidence against the null hypothesis, while effect size quantifies the magnitude of the observed effect. A small p-value does not necessarily mean a large or meaningful effect, and conversely, a large effect can go undetected (non-significant p-value) when the sample size is small; see the effect-size sketch after this list.
4. Interpreting non-significant p-values as evidence of no effect: Failing to reject the null hypothesis (i.e., obtaining a non-significant p-value) does not prove that there is no effect. It may simply mean that the study lacked sufficient statistical power to detect the effect, or that the effect is smaller than anticipated; the small-sample case in the sketch after this list illustrates this.
5. Multiple comparisons problem: Conducting multiple statistical tests without adjusting for multiple comparisons increases the chance of obtaining false positive results. Interpreting a significant p-value in the presence of multiple tests, without appropriate corrections, can lead to spurious findings; see the multiple-comparisons sketch after this list.
6. Cherry-picking significant results: Selectively reporting significant p-values while ignoring non-significant ones can create a biased and misleading interpretation of the overall evidence. It is essential to consider the entire body of evidence and report all relevant findings.
7. Neglecting the pre-specified hypothesis: Sometimes researchers test multiple hypotheses and selectively report only the significant ones, without acknowledging that some may have been generated post hoc. This can inflate the false positive rate and undermine the validity of the conclusions.
8. Overinterpreting small p-values: A p-value below .05 does not mean the alternative hypothesis is true. It only tells us that results as extreme as those observed would be unlikely if the null hypothesis were true. Rejecting the null hypothesis is therefore not the same as proving, or "accepting", the alternative hypothesis; treating it that way is a fallacy. Moreover, assigning excessive importance to, or drawing causal conclusions from, small p-values alone can lead to unwarranted claims. P-values should be interpreted in conjunction with other statistical measures and corroborating evidence.
9. Ignoring effect modification or interaction effects: Focusing solely on main effects and neglecting potential interactions or effect modifiers can oversimplify the interpretation. Assessing whether an interaction effect is present requires testing the interaction term itself, not only the main effects.
10. Equating p-values with research quality: A low p-value does not guarantee a well-designed study or high research quality. It merely indicates the strength of evidence against the null hypothesis. Other factors, such as sample size, study design, and robust methodology, also contribute to the reliability of the findings.
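To make points 3 and 4 concrete, here is a minimal simulation sketch in Python (assuming NumPy and SciPy are available); the group means, sample sizes, and random seed are illustrative choices, not data from any particular study.

```python
# Illustrative sketch: with a huge sample, a trivial difference yields a tiny
# p-value, while a substantial difference in a small sample may not reach p < 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled_sd

# Tiny true effect (0.05 SD) with 100,000 observations per group:
# "statistically significant" yet practically negligible.
a = rng.normal(0.00, 1, 100_000)
b = rng.normal(0.05, 1, 100_000)
t, p = stats.ttest_ind(b, a)
print(f"n = 100,000 per group: p = {p:.2e}, Cohen's d = {cohens_d(b, a):.3f}")

# Large true effect (0.8 SD) with only 10 observations per group:
# the test may well fail to reach p < 0.05 despite the sizeable effect.
a = rng.normal(0.0, 1, 10)
b = rng.normal(0.8, 1, 10)
t, p = stats.ttest_ind(b, a)
print(f"n = 10 per group: p = {p:.3f}, Cohen's d = {cohens_d(b, a):.3f}")
```

The exact numbers will vary with the seed, but the pattern is the point: the p-value mixes effect size and sample size together, so it cannot stand in for either.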
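Point 5 can also be demonstrated directly. The sketch below (assuming SciPy and statsmodels are installed) runs 20 tests on pure noise, so every null hypothesis is true; on average, about one uncorrected p-value will still fall below 0.05 by chance alone.

```python
# Illustrative sketch: false positives under multiple testing, and a standard correction.
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(7)

# 20 independent two-sample tests in which the null hypothesis is true every time.
p_values = [
    stats.ttest_ind(rng.normal(0, 1, 30), rng.normal(0, 1, 30)).pvalue
    for _ in range(20)
]

print("Uncorrected p-values below 0.05:", sum(p < 0.05 for p in p_values))

# A multiple-testing correction (here the Benjamini-Hochberg false discovery rate)
# keeps the rate of spurious "discoveries" under control.
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")
print("Significant after Benjamini-Hochberg correction:", int(reject.sum()))
```

Bonferroni ("bonferroni") or Holm ("holm") corrections can be passed as the method instead, depending on how strictly the family-wise error rate must be controlled.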
It is worth stressing that researchers have not only exposed the weaknesses of p-values but have also called for their total abandonment. I enjoin researchers and readers to be aware of these common misinterpretations and to critically evaluate the use of p-values within the broader context of the study. This can involve reporting effect sizes and confidence intervals in addition to p-values, as in the sketch below.
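As a final illustration, here is a minimal sketch of what such reporting might look like (the two groups are simulated purely for illustration, and the confidence_interval() method on the t-test result assumes SciPy 1.10 or newer).

```python
# Illustrative sketch: report the mean difference, its 95% confidence interval,
# and a standardized effect size alongside the p-value, not the p-value alone.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
control = rng.normal(50, 10, 40)
treatment = rng.normal(55, 10, 40)

result = stats.ttest_ind(treatment, control)
ci = result.confidence_interval(confidence_level=0.95)  # CI for the difference in means

pooled_sd = np.sqrt((treatment.var(ddof=1) + control.var(ddof=1)) / 2)
cohens_d = (treatment.mean() - control.mean()) / pooled_sd

print(f"Mean difference = {treatment.mean() - control.mean():.2f}")
print(f"95% CI = [{ci.low:.2f}, {ci.high:.2f}], Cohen's d = {cohens_d:.2f}, p = {result.pvalue:.4f}")
```

Read together, these quantities tell the reader how large the effect is and how precisely it has been estimated, which a p-value alone cannot do.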