Many have heard the phrase:
Correlation is not causation!
It’s mostly used in arguments about highly emotional topics, such as politics and religion, where neither side is willing to change its mind.
People now use it as a pseudo-scientific tool to dismiss data that opposes their viewpoint.
What is correlation?
Merriam-Webster defines correlation as:
- The relationship between things that happen or change together.
What’s crucial to understand about correlation is that it’s epistemic, meaning it is limited by our knowledge of the world.
Put differently, if we knew every variable, we wouldn’t have to rely on mere correlations. Correlations are what people notice when they cannot see under the covers.
The less information you have, the more you are forced to observe correlations and come up with theories. And the more knowledge you have, and the more transparent things are, the more you can see actual causal relationships.
If we see that crime rises as incomes go down in a neighborhood, but the only two things we know are the crime rate and the income numbers, we’re basically blind. We have nothing but a correlation because we have no information about what lies behind it, and that opens things up to wild theories.
But if we could track every crime, document the perpetrator, capture their reasons for committing the crime, determine every aspect of that person’s life, and regress it back to previous causes, we’d find a lattice of causal streams that create outcomes.
If you can see the causal streams, you don’t need correlations. Correlations are what you fall back on when you lack data about the underlying mechanisms, and the less of that data you have, the more you’re left with noticing relationships between variables.
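To make the income and crime example concrete, here is a minimal sketch in Python using purely synthetic data. Everything in it is an assumption made for illustration: the `hidden_factor` variable, the numbers, and the function names are hypothetical, and nothing here describes how real neighborhoods work. It shows two variables that never influence each other yet correlate strongly when they are the only two things we can see, and how that correlation fades once the hidden variable is observed.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=5000, seed=0):
    """Generate synthetic (hidden_factor, income, crime) rows.

    Assumption for illustration only: a made-up hidden factor drives both
    income and crime. Income never feeds into crime, or the reverse.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        hidden_factor = rng.choice([0, 1])
        income = 50 - 20 * hidden_factor + rng.gauss(0, 5)
        crime = 10 + 15 * hidden_factor + rng.gauss(0, 3)
        rows.append((hidden_factor, income, crime))
    return rows


rows = simulate()

# With only two columns visible, all we can measure is a correlation.
overall = pearson([r[1] for r in rows], [r[2] for r in rows])

# With the hidden factor visible, we can compare like with like.
within = {}
for level in (0, 1):
    group = [r for r in rows if r[0] == level]
    within[level] = pearson([r[1] for r in group], [r[2] for r in group])

print("income vs crime, hidden factor unseen:", round(overall, 2))
for level, value in within.items():
    print(f"income vs crime, hidden factor = {level}:", round(value, 2))
```

In this toy setup the overall correlation is strongly negative, while within each level of the hidden factor it sits near zero. The extra column of data is what lets us stop theorizing about income and crime and start looking at what actually drives both.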
Everything is caused
What’s important to realize about this is that everything is actually caused. Everything is part of one of those causal streams. The question is only how much data we can gather about those streams, and whether it’s enough to move from correlations to statements about causation.
- The observation of correlations is a practical response to not having enough data, knowledge, and visibility to talk about causal relationships.
- There are many situations where we can show cause, and not just correlation: those where we have enough data to do so.
- The precise amount of data required to transition from a correlation observation to a statement of cause is nuanced and differs based on the situation.
- Be cautious of arguments that claim correlation isn’t valuable (it sometimes is), and of arguments that assume it indicates causation (it often does not).
- For a great book on this, I recommend Naked Statistics, by Charles Wheelan.