The Simple Reason Polls Failed So Hard in 2020
Regardless of who wins the presidency in 2020, there will be an incandescent conversation around polling. In short, how did the pollsters get it so wrong?
The graph above shows some top polls for Florida on November 2nd. Now compare that to how the state actually went.
Trump’s results in Florida from the New York Times
That’s an extraordinary miss of up to 12 percentage points in some polls, swinging from Biden +9 to Trump +3.
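For anyone who wants to check the arithmetic, here is a minimal sketch of how that 12-point figure falls out. The signed-margin convention and the `polling_miss` helper are my own illustrative assumptions, and the two numbers are just the Biden +9 and Trump +3 figures quoted above.

```python
# Margins are signed (an assumed convention for this sketch):
# positive means Biden leads, negative means Trump leads.
poll_margin = 9.0     # "Biden +9", the most pro-Biden poll figure quoted above
actual_margin = -3.0  # "Trump +3", the approximate result quoted above


def polling_miss(poll: float, actual: float) -> float:
    """Return how many points the poll overstated Biden's margin."""
    return poll - actual


miss = polling_miss(poll_margin, actual_margin)
print(f"Miss: {miss:.0f} points")  # prints "Miss: 12 points"
```

The point of the sign convention is that a poll can be off by more than its own headline number when it also calls the wrong winner.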
Nate Silver seems to have turned into Nate Copper.
But there’s a remarkably simple lesson in this that at least one pollster had already locked onto.
Don’t ask people their opinions directly.
Robert Cahaly is basically the new Nate Silver. He runs the Trafalgar Group poll, and he’s been using indirect techniques to run polls for a long time. Here’s what he had for Florida in October:
Trafalgar polling for Florida in October of 2020
He had Trump up a bit over 2 points in Florida, which is very close to where it’ll likely land after everything is settled. And it wasn’t just Florida: he outperformed the other polls across the spread. He still didn’t get every state right, though.
The technique he uses is something I just read about in a book called Everybody Lies, written by an ex-Google data scientist. It’s about how asking people their opinion is one of the worst ways to find out what they’re thinking. I highly recommend this book.
One technique the author talks about is looking at Google Search data to see what people really think about things. Why? Because Google searches are private, and that’s what makes them honest.
Cahaly no longer shares his questions.
Cahaly uses a similar approach to polling in that he doesn’t trust anything asked or answered directly. One of his early techniques used something like the Google Search data trick. Instead of asking:
(which is likely to trigger all sorts of self-analysis and face-saving)
…they instead asked something like:
In other words, it’s not just lying to the pollster we have to worry about here; it’s also people lying to themselves. Cahaly talks in interviews about people not wanting to appear a certain way to the pollster, and that type of self-awareness seems likely to produce noise in the poll data.
The second version of this question allows people to speak freely about opinions they’re likely to share, but under the protection of, ‘People I know are likely to feel this way…’
My expectation is that we’re about to see a revolution in polling, starting with a polling industry blood bath. It will move the industry away from Nate Silver’s approach (it turns out the aggregation of bullshit just results in a larger pile of bullshit) and towards Trafalgar and the concepts in Everybody Lies.
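Here is one way to picture the aggregation problem, as a toy sketch rather than a description of how any real aggregator works. Every number below is made up for illustration: I assume a true margin, a lean that all the polls share, and some poll-specific noise that happens to cancel out.

```python
from statistics import mean

# All values are hypothetical, chosen only to illustrate the idea.
true_margin = -3.0   # assumed true result (Trump +3 under a Biden-positive sign)
shared_bias = 6.0    # assumed lean toward Biden that every poll has in common
noise = [-2.0, -1.0, 0.0, 1.0, 2.0]  # assumed poll-specific errors, summing to zero

# Each poll reports the truth plus the shared lean plus its own noise.
polls = [true_margin + shared_bias + n for n in noise]

average = mean(polls)
remaining_error = average - true_margin

print(f"Polling average: {average:+.1f}")          # prints "Polling average: +3.0"
print(f"Error left over: {remaining_error:+.1f}")  # prints "Error left over: +6.0"
```

Averaging wipes out the noise that differs from poll to poll, but the part they all get wrong in the same direction survives untouched. If people are shading their answers to every pollster, that shared part is exactly what an average can’t fix.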
In a word, proxies.
Pollsters are about to start searching for ways to measure people’s opinions without asking them directly. Because yeah…that clearly doesn’t work.
I think the “Shy Trump Voter” is an element of this, but it’s a subclass of the Everybody Lies phenomenon. People might be proud of supporting Trump and just not want to share it out of self-preservation, but some subset might not even know how much they support him until they get ready to vote. In both cases, direct polling will fail.