Question
Tyler Vigen published a collection of spurious correlations, a popular and playful showcase of the potential hazards of using (and/or consuming) statistical reports in a careless fashion. The core takeaway from the project is that correlation does not imply causation, but you already knew that! However, his work also serves as a fantastic illustration of another pitfall that threatens anyone analyzing large amounts of data with many features. A coworker comes up to you, beaming, and says: "Finally! I found the evidence we need to go forward with the project. Most of my hypotheses did not come out like I thought they would, but finally, one correlation came out really strong. R = 0.98! Want to come tell the boss the good news?" In light of what Vigen illustrates, what do you tell your colleague?
(a) You bet! Since we had a prior hypothesis that anticipated this association, we can trust that the correlation reflects a meaningful relationship.
(b) Not so fast. Based on what you just told me, I would not be so excited.
(c) Neither of the above.
[Hint: Vigen's front page mentions how many correlations are viewable on his site. Under a null hypothesis of no association between each pair of variables, how many of these correlations would appear significantly correlated due to chance?]
Explanation / Answer
Correct Answer: (b) Not so fast. Based on what you just told me, I would not be so excited.
Explanation: The coworker says that "most of my hypotheses did not come out like I thought they would," which means many pairs of variables were tested, each under the null hypothesis that no association exists between them. Only one correlation came out really strong. When many correlations are tested and the null hypothesis is true for all of them, the probability that at least one of them will appear really strong by chance alone is quite high: at significance level α, roughly α × N of N tests are expected to look significant even when no real association exists. The observed high correlation could therefore easily be a chance result rather than evidence of a real relationship, and even a genuine correlation would not imply causation. If the coworker had found clearly more strongly correlated pairs than chance alone would produce, my reply might change. Therefore option (b) is correct.
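To make the hint concrete, here is a minimal sketch (standard-library Python only) that computes the expected number of chance "hits" among N tests and the probability of at least one, then checks this by correlating pairs of pure-noise series. The sample sizes, seed, and function names are hypothetical choices for illustration, not values from Vigen's site, and the p-value uses the Fisher z approximation rather than an exact t-test.

```python
from __future__ import annotations

import math
import random


def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected count of 'significant' results when every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least_one(n_tests: int, alpha: float) -> float:
    """Chance that at least one of n independent true-null tests is 'significant'."""
    return 1.0 - (1.0 - alpha) ** n_tests


def pearson_r(x: list[float], y: list[float]) -> float:
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r: float, n_obs: int) -> float:
    """Two-sided p-value for H0: rho = 0, via the Fisher z approximation (an assumption)."""
    z = math.atanh(r) * math.sqrt(n_obs - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate_null(n_tests: int, n_obs: int, alpha: float, seed: int = 0) -> tuple[int, float]:
    """Correlate n_tests pairs of independent noise series; count chance 'hits'."""
    rng = random.Random(seed)
    hits, max_abs_r = 0, 0.0
    for _ in range(n_tests):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]  # independent of x by construction
        r = pearson_r(x, y)
        max_abs_r = max(max_abs_r, abs(r))
        if approx_p_value(r, n_obs) < alpha:
            hits += 1
    return hits, max_abs_r


if __name__ == "__main__":
    n_tests, n_obs, alpha = 1000, 30, 0.05  # hypothetical values, not Vigen's
    print("expected by chance:", expected_false_positives(n_tests, alpha))
    print("P(at least one):", prob_at_least_one(n_tests, alpha))
    hits, max_abs_r = simulate_null(n_tests, n_obs, alpha)
    print("observed 'significant':", hits, "largest |r|:", round(max_abs_r, 3))
```

With 1,000 pairs of unrelated noise and α = 0.05, about 50 pairs should be flagged as "significant," and the chance of at least one is essentially 1. Reporting only the strongest of many tested correlations is exactly the situation the coworker describes.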