Why Nate Silver is Wrong

Famed statistician and sabermetrician Nate Silver is calling the US Presidential race for Obama, in a big way:

Silver’s mathematical model gives Obama an 85% chance of winning. The Presidential election is based on an electoral college system, so Silver’s model rightly looks at state-level polls. And in swing state polls, Obama is mostly winning:

This is slightly jarring, because in national polls, the two candidates are locked together:

So who’s right? Is the election on a knife-edge like the national polls suggest, or is Obama strongly likely to win as Silver’s model suggests?

While the election could easily go either way depending on turnout, I think Silver’s model is predicting the wrong result. For that to be the case, the state polling data feeding the model has to be wrong.

There are a number of factors that lead me to believe that this is the case.

First, Republicans tend to outperform their poll numbers. In 2008, the national polling average got the national race just about right:

In the end, Obama won the election with 52.9% of the vote to McCain’s 45.7%.

However, polls have historically underestimated Republican support. Except in 2000 (when a November Surprise revelation of a George W. Bush drunk-driving charge pushed Gore 3.2% higher than the final round of polling), Republican Presidential candidates since 1992 have outperformed their final polls by a mean of 1.8 points. A similar outcome for Romney would put him roughly 1.5% ahead of Obama in the national vote, and imperil Obama’s grip on the swing states.
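
The arithmetic behind that 1.5% figure is simple enough to sketch. A minimal Python illustration, assuming Obama’s national poll lead is about 0.3 points (an assumption inferred from the 1.8 and 1.5 figures above, consistent with the “locked together” national polls):

```python
# Back-of-the-envelope version of the adjustment described above.
# The 0.3-point Obama lead is an assumption inferred from the text's own
# figures (1.8 - 1.5 = 0.3); it is not real polling data.
obama_national_poll_lead = 0.3   # assumed national poll margin (points)
gop_outperformance = 1.8         # mean Republican gain vs. final polls since 1992 (per the text)

adjusted_margin = obama_national_poll_lead - gop_outperformance
print(f"Adjusted national margin: Obama {adjusted_margin:+.1f} points")  # about -1.5, i.e. Romney ahead
```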

Second, the Bradley Effect. The interesting thing about the swing states is that many of them are disproportionately white. The United States is 72% white, but Iowa is 89% white, Indiana is 81% white, Ohio is 81% white, Minnesota is 83% white, Pennsylvania is 79% white, New Hampshire is 92% white, Maine is 94% white and Wisconsin is 83% white. This means that they are particularly susceptible to the Bradley Effect — where white voters tell a pollster they will vote for a black candidate, but in reality vote for a white alternative. In a state in which Obama holds a small lead in state-level polling, only a small Bradley Effect would be necessary to turn it red.

This effect may already have hurt Barack Obama in the past: in the 2008 New Hampshire primary, polls showed Obama leading, but in reality Hillary Clinton emerged the winner. And many national polls in October 2008 showed Obama with much bigger leads than he achieved at the ballot box: Gallup showed Obama 11% ahead, Pew showed him 16% ahead.

A small Bradley Effect will not hurt Obama where he is 7% or 11% or 16% ahead in the polls. But when polls are closer, as they mostly are in the swing states, it becomes more plausible that such an effect could change the course of the race.

And the Bradley Effect in 2012 may be bigger than in 2008. A recent poll by the Associated Press concluded:

A majority of Americans (51 percent) now hold “explicit anti-black attitudes” — up from 49 percent in 2008 — and 56 percent showed prejudice on an implicit racism test.

Finally, polls have tended to overestimate the popularity of incumbent Presidents, especially Democrats. In 1980, polls put Jimmy Carter 3% ahead of his final tally, and in 1996 polls put Bill Clinton 2.8% ahead of his final tally:

Taken together, these difficult-to-quantify factors pose a serious challenge to Silver’s model. While it is fine to build a predictive model on polling data, if the polling data fed into the model is skewed, then any predictions will be skewed. Garbage in, garbage out.
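
To make that garbage-in, garbage-out point concrete, here is a toy Monte Carlo sketch in Python. It is emphatically not Silver’s model: the swing-state poll leads, the per-state error size and the 237 “safe” electoral votes assumed for Obama are illustrative assumptions. What it shows is how a headline win probability built on close state polls erodes once every one of those polls shares a modest bias toward Obama.

```python
# Toy Monte Carlo: how a systematic error shared by all swing-state polls
# changes the implied win probability. NOT Nate Silver's model -- the poll
# leads, noise size and safe-state baseline below are illustrative assumptions.
import random

# Hypothetical swing states: (electoral votes, Obama's assumed polled lead in points)
SWING_STATES = {
    "Ohio": (18, 1.5),
    "Virginia": (13, 0.5),
    "Florida": (29, -1.0),
    "Colorado": (9, 1.0),
    "Iowa": (6, 2.0),
    "Wisconsin": (10, 2.5),
    "New Hampshire": (4, 1.0),
    "Nevada": (6, 2.0),
}
SAFE_OBAMA_EV = 237    # assumed electoral votes Obama wins regardless
NEEDED = 270
POLL_NOISE_SD = 3.0    # assumed independent per-state polling error (points)

def win_probability(systematic_bias, trials=20_000):
    """Share of simulated elections Obama wins, where `systematic_bias` is a
    uniform overstatement of Obama's margin shared by every swing-state poll."""
    wins = 0
    for _ in range(trials):
        ev = SAFE_OBAMA_EV
        for votes, polled_lead in SWING_STATES.values():
            true_margin = polled_lead - systematic_bias + random.gauss(0, POLL_NOISE_SD)
            if true_margin > 0:
                ev += votes
        if ev >= NEEDED:
            wins += 1
    return wins / trials

for bias in (0.0, 1.0, 2.0):
    print(f"Systematic poll bias toward Obama: {bias:.1f} pts -> "
          f"P(Obama win) ~ {win_probability(bias):.0%}")
```

Under these toy numbers, the win probability falls sharply as the shared bias grows, which is exactly the sensitivity the argument above relies on.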

I rate Obama’s chance of being re-elected as no better than 50:50. If Silver really rates his chances as 85:15, perhaps he should consider taking bets at those odds.

UPDATE:

Obviously, Silver’s predictive model (and, far more importantly, the state-level polling data) proved even more accurate than in 2008. However, the 2010 British General Election (in which the polls, and therefore Silver, vastly overestimated Liberal Democrat support, producing an electoral projection that was way off the mark) illustrates that there remain enough questions about the reliability of polling data for Silver’s model (and others like it) to suffer from the problem of fat tails.

With solid, transparent and plentiful data (as Taleb puts it, in “Mediocristan”) such models work very, very well. But there remains plenty of scope (as Britain in 2010 illustrates) for polls to be systematically wrong (“Extremistan”). Given the likelihood that every news network will have its own state-level poll aggregator and Nate Silver soundalike on hand come 2016, that might well be a poetic date for the chaotic effects of unreliable polling data to reappear. In the meantime, I congratulate the pollsters for providing Silver with the data necessary to make accurate projections.
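
The Mediocristan/Extremistan point can also be sketched with a toy simulation (again, not Silver’s model; every margin, error size and the safe-state baseline below is an illustrative assumption). When each state poll errs independently, the errors largely wash out; when every poll shares a common error, as in Britain in 2010, extreme electoral outcomes become far more likely.

```python
# Toy illustration of fat tails from correlated polling errors: compare the
# spread of simulated electoral-vote totals when state errors are independent
# versus when they share a common national-level error. All margins and error
# sizes are illustrative assumptions, not real polling data.
import random
import statistics

# Hypothetical swing states as (electoral votes, polled margin for the incumbent).
SWING = [(18, 1.5), (13, 0.5), (29, -1.0), (9, 1.0), (6, 2.0), (10, 2.5), (4, 1.0), (6, 2.0)]
SAFE_EV = 237            # assumed electoral votes won regardless
STATE_NOISE_SD = 3.0     # independent per-state error (points)
COMMON_NOISE_SD = 2.5    # shared national error (points); zero in the independent case

def simulate(common_sd, trials=20_000):
    """Return simulated electoral-vote totals for the incumbent."""
    totals = []
    for _ in range(trials):
        shared = random.gauss(0, common_sd)   # error that hits every state at once
        ev = SAFE_EV
        for votes, margin in SWING:
            if margin + shared + random.gauss(0, STATE_NOISE_SD) > 0:
                ev += votes
        totals.append(ev)
    return totals

for label, common_sd in (("independent errors", 0.0), ("correlated errors", COMMON_NOISE_SD)):
    totals = simulate(common_sd)
    losses = sum(t < 270 for t in totals) / len(totals)
    print(f"{label}: sd of EV total = {statistics.pstdev(totals):.1f}, "
          f"P(incumbent below 270) = {losses:.0%}")
```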