Many have heard the phrase:
Correlation is not causation!
It’s mostly used in arguments about highly emotional topics where neither side is willing to change its mind, e.g., politics, religion, etc.
People are now using it as a pseudo-scientific tool to contradict data that opposes their viewpoint.
What is correlation?
Merriam-Webster defines correlation as:
- The relationship between things that happen or change together.
What’s crucial to understand about correlation is that it’s epistemic, meaning it is limited by our knowledge of the world.
Said differently, if we knew all variables there wouldn’t be any such thing as correlation. Correlations are basically what people notice when they cannot see under the covers.
The less information you have, the more you are forced to observe correlations and come up with theories. And the more knowledge you have—the more transparent things are—the more you can see actual causal relationships.
If we see that crime rises as incomes go down in a neighborhood, but the only two things we know are the crime rate and the income numbers, we’re basically blind. We have nothing but a correlation because we have no other information, and that opens things up to wild theories.
But if we could track every crime, document the perpetrator, capture their reasons for committing the crime, determine every aspect of that person’s life, and regress it back to previous causes, we’d find a lattice of causal streams that create outcomes.
If you have causal streams, you don’t need correlations. Correlations are what you fall back on when you lack data about causes, and the less of that data you have, the more you have to rely on correlation, because it’s the only thing you have.
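To make the idea concrete, here is a small simulation sketch. Everything in it is made up for illustration: the variable named `hidden` stands in for some unobserved cause, and the coefficients and noise levels are arbitrary assumptions, not real crime or income figures. It shows one possible scenario in which income and crime look strongly related when they are the only two things we can see, and the relationship largely disappears once the hidden cause is accounted for, using the standard partial correlation formula.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_corr(xs, ys, zs):
    """Correlation between xs and ys after controlling for zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical unobserved cause that drives both measurements.
hidden = [rng.gauss(0, 1) for _ in range(5000)]

# Assumed toy model: both values depend on the hidden cause plus
# independent noise, and do not depend on each other directly.
income = [50 + 10 * h + rng.gauss(0, 5) for h in hidden]
crime = [30 - 5 * h + rng.gauss(0, 5) for h in hidden]

raw = pearson(income, crime)                    # all we see with two variables
controlled = partial_corr(income, crime, hidden)  # with the hidden cause known

print(f"income vs crime, nothing else known: {raw:.2f}")
print(f"income vs crime, hidden cause known: {controlled:.2f}")
```

In this toy setup the first number is clearly negative and the second is close to zero. That is only one way the real situation could be structured, but it illustrates the point: with just two columns of numbers, the correlation is all there is to go on, and more visibility changes what can be said.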
Everything is caused
What’s important to realize about this is that everything is actually caused. Everything is part of one of those causal streams.
The question is only how much data we can gather about those streams, and whether it’s enough to move from correlations to statements about causation.
- The observation of correlations is a practical response to not having enough data, knowledge, and visibility into causal chains
- There are many situations where we can absolutely show cause, and not just correlation: the ones where we have enough data to do so
- The precise amount of data required to transition from a correlation observation to a statement of cause is nuanced, and is probably best left to experts in the particular field in question
- I am not one of the people who can tell when there is enough information to move from correlation to causation, but I’d love to know what the criteria are.