Humanities Scholars Baffled By Math

Via the Wall Street Journal:

In the latest study, Kimmo Eriksson, a mathematician and researcher of social psychology at Sweden’s Mälardalen University, chose two abstracts from papers published in research journals, one in evolutionary anthropology and one in sociology. He gave them to 200 people to rate for quality—with one twist. At random, one of the two abstracts received an additional sentence, the one above with the math equation, which he pulled from an unrelated paper in psychology. The study’s 200 participants all had master’s or doctoral degrees. Those with degrees in math, science or technology rated the abstract with the tacked-on sentence as slightly lower-quality than the other. But participants with degrees in humanities, social science or other fields preferred the one with the bogus math, with some rating it much more highly on a scale of 0 to 100.

Specifically, 62% of humanities and social science scholars preferred the paper with the irrelevant equation, compared with 46% from a background of mathematics, science and technology.

This is a significant result, and I hope the experiment is repeated and replicated. It is all well and good for humanities and social science scholars to mostly eschew the use of mathematics in their work. But if humanities scholars begin to take work more seriously simply for the inclusion of (faux-) mathematics without themselves understanding the mathematics, then maybe it’s time for humanities and social science scholars to increase their mathematical and statistical literacy so as not to be so easily tricked by faux-mathematical rigour.

And this isn’t just a case of not understanding the equation — it seems like a nontrivial chunk of humanities and social science scholars have quite an inferiority complex. That should be a great embarrassment; there is nothing inherently inferior about the study of the human condition, or its (mostly non-mathematical) tools.

Last year, I wrote:

Well-written work — whether in plain language or mathematics — requires comprehensible explanations and definitions, so that a non-specialist with a moderate interest in the subject can quickly and easily grasp the gist of the concepts, the theory, the reasoning, and the predictions. Researchers can use as complex methods as they like — but if they cannot explain them clearly in plain language then there is a transparency problem. Without transparency, academia — whether cultural studies, or mathematics, or economics — has sometimes produced self-serving ambiguous sludge. Bad models and theories produce bad predictions that can inform bad policy and bad investment decisions.  It is so crucial that ideas are expressed in a comprehensible way, and that theories and the thought-process behind them are not hidden behind opaque or poorly-defined words or mathematics.

But in this case, I think the only real solution is mathematical and scientific literacy.

On the other hand, prestigious mathematics journals have also recently been conned into publishing papers of (literally) incomprehensible gibberish, so it is not like only humanities and social science scholars have the capacity to be baffled by bullshit.

Why Nate Silver is Wrong

Famed pollster and sabermetrician Nate Silver is calling the US Presidential race for Obama, in a big way:

Silver’s mathematical model gives Obama an 85% chance of winning. The Presidential election is based on an electoral college system, so Silver’s model rightly looks at state-level polls. And in swing state polls, Obama is mostly winning:

This is slightly jarring, because in national polls, the two candidates are locked together:

So who’s right? Is the election on a knife-edge like the national polls suggest, or is Obama strongly likely to win as Silver’s model suggests?

While the election could easily go either way depending on turnout, I think Silver’s model is predicting the wrong result. In order for that to be the case, the state polling data has to be wrong.

There are a number of factors that lead me to believe that this is the case.

First, Republicans tend to outperform their poll numbers. In 2008, the national average got the national race just about right:

In the end, Obama won the election with 52.9% of the vote, against McCain who came out with 45.7%.

However, polls have historically underestimated Republican support. Excluding 2000 (when a November Surprise revelation of a George W. Bush drunk-driving charge pushed Gore 3.2% higher than the final round of polling), Republican Presidential candidates since 1992 have outperformed their final polls by a mean of 1.8 points. A similar outperformance by Romney would leave him 1.5 points ahead nationally, and imperil Obama’s grip on the swing states.
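To make the arithmetic behind that figure explicit, here is a minimal sketch. It assumes the 1.8-point outperformance is applied to the Republican-minus-Democrat margin, and it assumes a starting poll margin of Romney trailing by 0.3 points. That starting margin is my back-calculation from the 1.5 and 1.8 figures above, not a number quoted from any poll, and the function name is purely illustrative.

```python
# Illustrative arithmetic only. The starting poll margin is an assumption
# back-calculated from the figures in the text, not a quoted poll result.

def adjusted_margin(poll_margin: float, outperformance: float) -> float:
    """Shift a Republican-minus-Democrat poll margin by a historical outperformance.

    Both arguments are in percentage points. A positive result means the
    Republican candidate leads.
    """
    return poll_margin + outperformance


# Assumed national poll margin: Romney trailing by 0.3 points (hypothetical).
ASSUMED_POLL_MARGIN = -0.3
# Mean Republican outperformance since 1992, excluding 2000 (from the text).
MEAN_OUTPERFORMANCE = 1.8

if __name__ == "__main__":
    result = adjusted_margin(ASSUMED_POLL_MARGIN, MEAN_OUTPERFORMANCE)
    print(f"Adjusted margin: Romney {result:+.1f} points")
```

The point of the sketch is how sensitive the conclusion is to the starting margin: any poll lead for Obama larger than 1.8 points survives the same adjustment.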

Second, the Bradley Effect. The interesting thing about the swing states is that many of them are disproportionately white. The United States is 72% white, but Iowa is 89% white, Indiana is 81% white, Ohio is 81% white, Minnesota is 83% white, Pennsylvania is 79% white, New Hampshire is 92% white, Maine is 94% white and Wisconsin is 83% white. This means that they are particularly susceptible to the Bradley Effect — where white voters tell a pollster they will vote for a black candidate, but in reality vote for a white alternative. In a state in which Obama holds a small lead in state-level polling, only a small Bradley Effect would be necessary to turn it red.

This effect may already have hurt Barack Obama in the past. In the 2008 primaries, the polls showed Obama leading in New Hampshire, but in reality Hillary Clinton ran out the winner. And many national polls in October 2008 showed Obama with much bigger leads than he actually achieved on election day: Gallup showed him 11% ahead, and Pew showed him 16% ahead.

A small Bradley Effect will not hurt Obama where he is 7% or 11% or 16% ahead in the polls. But when polls are closer, as they mostly are in the swing states, it becomes more plausible that such an effect could change the course of the race.

And the Bradley Effect in 2012 may be bigger than in 2008. A recent poll by the Associated Press concluded:

A majority of Americans (51 percent) now hold “explicit anti-black attitudes” — up from 49 percent in 2008 — and 56 percent showed prejudice on an implicit racism test.

Finally, polls have tended to overestimate the popularity of incumbent Presidents, especially Democrats. In 1980, polls put Jimmy Carter 3% ahead of his final tally, and in 1996 polls put Bill Clinton 2.8% ahead of his final tally.

Taken together, these difficult-to-quantify factors pose a serious challenge to Silver’s model. While it is fine to build a predictive model on polling data, if the polling data fed into the model is skewed, then any predictions will be skewed. Garbage in, garbage out.
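The garbage-in, garbage-out point can be sketched with a toy calculation. This is not Silver's model, which is far more elaborate. It is a one-line normal error model, and the function name, the 2-point poll lead, and the 2-point error spread are all hypothetical values chosen for illustration.

```python
from statistics import NormalDist


def win_probability(poll_lead: float, poll_bias: float, error_sd: float) -> float:
    """Probability that the poll leader wins under a simple normal error model.

    poll_lead: the leader's margin in the polls (percentage points).
    poll_bias: assumed systematic overstatement of that lead (points).
    error_sd:  standard deviation of the remaining random polling error (points).
    """
    true_lead = poll_lead - poll_bias
    # Final margin ~ Normal(true_lead, error_sd); the leader wins if it is above 0.
    return 1.0 - NormalDist(mu=true_lead, sigma=error_sd).cdf(0.0)


if __name__ == "__main__":
    # Hypothetical inputs: a 2-point poll lead with 2 points of random error.
    for bias in (0.0, 1.0, 2.0, 3.0):
        p = win_probability(poll_lead=2.0, poll_bias=bias, error_sd=2.0)
        print(f"assumed bias {bias:.1f} -> win probability {p:.2f}")
```

With these made-up inputs, unbiased polls give the leader roughly an 84% chance, while a systematic bias equal to the whole poll lead drags that down to exactly 50%. The model's arithmetic is identical in both cases. Only the quality of the input changes.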

I rate Obama’s chance of being re-elected as no better than 50:50. If Silver really rates his chances as 85:15, perhaps he should consider taking bets at those odds.
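For anyone wondering what such a bet would look like, here is a rough sketch of the expected value. It assumes a bookmaker who prices a Romney win at fair odds for a 15% chance, so that a winning 1-unit stake returns 85/15 units of profit. The function and the numbers are illustrative, not a description of any real wager.

```python
def expected_profit(stake: float, win_prob: float, payout_ratio: float) -> float:
    """Expected profit of a bet that pays stake * payout_ratio on a win
    and loses the stake otherwise."""
    return win_prob * stake * payout_ratio - (1.0 - win_prob) * stake


# Fair payout ratio for an outcome priced at 15% (i.e. 85:15 against).
FAIR_RATIO_AT_15_PERCENT = 85 / 15

if __name__ == "__main__":
    # Expected profit per unit staked on Romney, under each belief.
    for belief in (0.15, 0.50):
        ev = expected_profit(1.0, belief, FAIR_RATIO_AT_15_PERCENT)
        print(f"belief in Romney win {belief:.2f} -> expected profit {ev:.2f}")
```

If the 85:15 estimate is right, the bet is worth nothing on average to either side. If the true chance is 50:50, the person backing Romney expects to make about 2.33 units per unit staked, which is why offering such odds would be a costly mistake for anyone whose 85% figure is wrong.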

UPDATE:

Obviously, Silver’s predictive model (and far, far more importantly the state-level polling data) proved even more accurate than in 2008. However, the 2010 British General Election (in which the polls, and therefore Silver, vastly overestimated Liberal Democrat support, leading to an electoral projection that was way off the mark) illustrates that there remain enough issues with the reliability of polling data to ensure that Silver’s model (and similar models) continue to suffer from the problem of fat tails. With solid, transparent and plentiful data (as Taleb puts it, in “Mediocristan”) such models work very, very well. But there remains plenty of scope (as Britain in 2010 illustrates) for polls to be systematically wrong (“Extremistan”). Given the likelihood that every news network will have its own state-level poll aggregator and Nate Silver soundalike on hand come 2016, that might well be a poetic date for the chaotic effects of unreliable polling data to reappear. In the meantime, I congratulate the pollsters for providing Silver with the data necessary to make accurate projections.