Examples of real-world statistical analysis
Your results from a hypothesis test are statistically significant. Bravo! But are these results important? Not necessarily. Statistical significance does not necessarily mean that the results are practically significant in the real world.
In this blog post, I will list a few real-world examples in which p-values and significance testing lead to impractical conclusions. Before that, let us understand what exactly the p-value means.
Many data scientists have faced this very question when talking about the p-value, haven’t they?
Well, even the simplest definition of the p-value tends to sound complicated. Technically, a p-value is the probability of obtaining an effect at least as extreme as the one in the sample data, assuming that the null hypothesis is true. A hypothesis test uses this probability to assess whether the sample data are consistent with the null hypothesis for the population. You reject the null hypothesis only if the observed results would be sufficiently unlikely under that assumption. Strictly speaking, results are statistically significant when the p-value falls below the predefined significance level (alpha).
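To make the definition concrete, here is a minimal sketch of a permutation test in plain Python. It estimates a p-value by shuffling group labels (which is what the null hypothesis of "no difference" implies) and counting how often the shuffled difference in means is at least as extreme as the observed one. The function name `permutation_p_value`, the scores, and the choice of a permutation test are assumptions made for illustration, not something taken from a particular study.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in means.

    Under the null hypothesis the group labels are exchangeable, so we
    shuffle the pooled data and count how often the shuffled difference
    is at least as extreme as the observed difference.
    """
    rng = random.Random(seed)
    observed = abs(_mean(group_a) - _mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled = abs(_mean(pooled[:n_a]) - _mean(pooled[n_a:]))
        if shuffled >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Made-up test scores, purely for illustration.
trained = [78, 85, 81, 90, 74, 88, 83, 79]
self_study = [75, 80, 77, 84, 72, 82, 79, 76]

alpha = 0.05
p = permutation_p_value(trained, self_study)
print(f"p = {p:.3f}; reject the null at alpha = {alpha}: {p < alpha}")
```

The comparison `p < alpha` is the whole decision rule: it says nothing about how large or how useful the difference is.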
The p-value is often used to determine statistical significance in hypothesis tests such as chi-square tests, t-tests, ANOVA, and tests of regression coefficients, among many others. It might also seem logical that p-values and statistical significance relate to importance. However, there are situations where they are not useful in practice. Here is a list of situations where significance testing and the p-value fail and lead to impractical results.
- Suppose Mr. X is evaluating a training program by comparing the test scores of participants to those of people who study on their own. Further, he decides that the difference between these groups must be at least 5 points to represent a practically meaningful effect size. The results of the study show a statistically significant difference, with participants scoring 3 points higher on average on a 100-point test. While these results are statistically significant, the 3-point difference is less than his 5-point threshold. Thus, the study provides evidence that the effect exists, but it is too small to be meaningful in the real world. The conclusion is that the time and money that participants spend on this training program are not worth an average improvement of only 3 points.
- Let us consider the mean pizza delivery time example. Once the data have been collected, our calculation finds that the mean delivery time is longer by 10 minutes, with a p-value of 0.03. That is, if the null hypothesis is true, there is a 3% chance of observing a mean delivery time that is longer by at least 10 minutes. Since the p-value of 0.03 is less than the threshold of 0.05, we conclude that the result is statistically significant. But this result is hard to act on, because we believe that the mean delivery time of the pizza is always 30 minutes or less. In this situation, we have to weigh the result of the analysis against our prior belief and ask whether a delivery time of 30 minutes or less is a valid null hypothesis. In addition, the reviews may show that late deliveries do sometimes take place, and from this alone one may decide not to buy any pizza from that particular shop. Therefore, in my opinion, a conclusion based only on the p-value is impractical in a situation like this.
- If the sample mean varies from sample to sample, then the p-value will vary as well, and a conclusion based on a single p-value can therefore be wrong. See Dance of the p-value by Geoff Cumming to understand how much the p-value changes across repeated samples.
- The literature offers some memorable illustrations of how the p-value fails in real-world situations. Cohen (1994) critiqued the use of significance tests under the title “The earth is round (p < .05)”, whereas Amrhein et al. (2017) argued against significance thresholds under the title “The earth is flat (p > 0.05)”.
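Two of the situations above, the training program and the varying p-value, can be sketched with simulated data. The snippet below is only an illustration under stated assumptions: the scores are randomly generated (true gain of 3 points, standard deviation of 10, 1000 people per group), the helper names `compare_groups` and `dance_of_p_values` are made up for this post, the test is a large-sample normal approximation rather than an exact t-test, and "the whole confidence interval lies above the threshold" is just one possible way to operationalize practical significance.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def compare_groups(a, b, confidence=0.95):
    """Return (difference in means, confidence interval, two-sided p-value).

    Uses a normal (z) approximation, which is reasonable for large samples.
    """
    diff = mean(a) - mean(b)
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se)))
    half_width = NormalDist().inv_cdf(0.5 + confidence / 2) * se
    return diff, (diff - half_width, diff + half_width), p_value


def dance_of_p_values(n_per_group, true_gain, sd, replications, seed=0):
    """Repeat the same experiment many times and collect the p-values."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(replications):
        treated = [rng.gauss(true_gain, sd) for _ in range(n_per_group)]
        control = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        p_values.append(compare_groups(treated, control)[2])
    return p_values


# Hypothetical training-program data (assumed, not real measurements).
rng = random.Random(42)
trained = [rng.gauss(73, 10) for _ in range(1000)]
self_study = [rng.gauss(70, 10) for _ in range(1000)]
MIN_MEANINGFUL_GAIN = 5  # the 5-point threshold chosen before the study

diff, (low, high), p = compare_groups(trained, self_study)
print(f"gain = {diff:.1f} points, 95% CI = ({low:.1f}, {high:.1f}), p = {p:.4f}")
print("statistically significant:", p < 0.05)
print("practically meaningful:", low >= MIN_MEANINGFUL_GAIN)

# Same true effect, same design, different random samples each time.
p_values = dance_of_p_values(n_per_group=30, true_gain=5, sd=10, replications=20)
print("p-values across replications:", [round(x, 3) for x in p_values])
```

Reporting the estimated gain with its confidence interval next to the pre-specified threshold keeps the size of the effect in view, while the list of replicated p-values shows how much a single p-value can depend on which sample happened to be drawn.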
In closing, a statistically significant p-value indicates that the sample provides sufficient evidence to conclude that the effect exists in the population. However, the question always arises: is the p-value a valid measure in practice? Thus, the choice of test statistic, the sample size, and the framing of the null hypothesis really matter in arriving at any statistical conclusion.
References