Survey Bias and Jesse Ventura
On election night in 1998, no one was more surprised than the Governor-elect of Minnesota, Jesse Ventura. What made the results of that night so surprising was not so much that a professional wrestler, sometime actor and minor-city mayor was elected, but rather that the pre-election polls never so much as hinted that he was in the running. The problem was that the pollsters were not paying attention to their own data and were relying on the wrong algorithms to remove the sampling bias from it.
In 1998 the Center for Evaluation Research was commissioned by the Jewish Community Relations Council of Minnesota and the Dakotas to conduct its first-ever public opinion poll. A population census had been done a year or two earlier, and CER randomly sampled from this list. The survey was conducted by phone. It moved rapidly toward completion, but an early review of the returns indicated that our sample was woefully short of respondents under 30 years of age. Extra time was added to the schedule and procedures were implemented to try to maximize the likelihood of capturing the younger end of our age distribution, but nothing worked and we were left with far too few young people in the sample. Our results were tainted, and we cautioned our audiences accordingly when we presented our data. In every subsequent phone survey that we conducted, including a second poll for the JCRC, we obtained the same result.
It is of no small interest that the people who voted for Jesse Ventura were primarily under 30, the very people underrepresented in our samples. So what happened in 1998 regarding Jesse's unheralded rise to power begins to have a certain fragrance. Let's briefly examine what happened, how it reveals the problems with using bad data, and why even random digit dialing polls will continue to produce biased results.
Typically, pollsters have very little time to complete each weekly poll: less than a week. What happens when the pollsters find that they haven't filled out all their demographics? They do what we did: they try harder, but at some point they have to quit and report the results. So, if a particular demographic is under-sampled, they simply give each person in that demographic a greater weight so that the group's weighted share of the sample matches its share of the population. Translating this to numbers: if 10% of your sample should be under 30, for example, but in your sample of 910 people only 10 of them are under 30, then you multiply each one by 10, in effect creating 10 people where there was only one before. Now your sample has 10% people under 30. Right?
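The arithmetic above can be checked with a short sketch. This is only an illustration of the general weighting idea, not the procedure any particular pollster used; the function name `poststratification_weights` and the group labels are our own hypothetical choices, and the numbers are the ones from the example above.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight per respondent in each group so that the weighted
    group shares in the sample match the population shares."""
    total = sum(sample_counts.values())
    return {
        group: population_shares[group] * total / sample_counts[group]
        for group in sample_counts
    }

# Numbers from the example: 910 respondents, only 10 of them under 30,
# while 10% of the population is under 30.
sample_counts = {"under_30": 10, "30_and_over": 900}
population_shares = {"under_30": 0.10, "30_and_over": 0.90}

weights = poststratification_weights(sample_counts, population_shares)

# How much more each young respondent counts than each older respondent.
relative_weight = weights["under_30"] / weights["30_and_over"]

# Share of the weighted sample that is under 30 after weighting.
weighted_counts = {g: sample_counts[g] * weights[g] for g in sample_counts}
weighted_share_under_30 = weighted_counts["under_30"] / sum(weighted_counts.values())

print(f"relative weight of an under-30 respondent: {relative_weight:.1f}")
print(f"weighted share under 30: {weighted_share_under_30:.2%}")
```

Each under-30 respondent ends up counting ten times as much as anyone else, and the weighted sample shows the intended 10% under 30.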
Well, yes, but you see immediately what the problem is. If the people in the sample are not a random (representative) sample of people, in other words if your sample is biased, you are multiplying that bias by ten! Bad idea? You bet it is. While the weighting procedure is fine for some situations, it is terrible when any particular group is grossly under-sampled.
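To make the point concrete, here is a small sketch with made-up numbers. The support levels below are purely hypothetical and are not drawn from any 1998 poll; we simply assume that the few young people a phone survey reaches answer more like older respondents than like their peers.

```python
def weighted_estimate(group_support, population_shares):
    """Population-weighted support for a candidate across groups."""
    return sum(population_shares[g] * group_support[g] for g in population_shares)

population_shares = {"under_30": 0.10, "30_and_over": 0.90}
sample_counts = {"under_30": 10, "30_and_over": 900}

# HYPOTHETICAL: true support for the candidate in each group.
true_support = {"under_30": 0.50, "30_and_over": 0.25}

# HYPOTHETICAL: what the sample shows. The older group is measured well,
# but only 3 of the 10 reachable young respondents back the candidate.
sampled_support = {"under_30": 3 / 10, "30_and_over": 225 / 900}

true_value = weighted_estimate(true_support, population_shares)
estimate = weighted_estimate(sampled_support, population_shares)
error = estimate - true_value

# How far the overall estimate moves if ONE respondent changes an answer.
influence = {g: population_shares[g] / sample_counts[g] for g in sample_counts}
influence_ratio = influence["under_30"] / influence["30_and_over"]

print(f"true support: {true_value:.3f}, weighted estimate: {estimate:.3f}")
print(f"error carried through the weights: {error:+.3f}")
print(f"one young respondent moves the estimate {influence_ratio:.0f}x as much")
```

Under these assumed numbers the weighting reproduces the right age mix but still understates the candidate's support, and each of the ten young respondents moves the result ten times as much as any older respondent. Weighting fixes the proportions; it does nothing about who the ten people are.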
We submit two ideas here. The first is that the election predictions went wrong precisely because the sample was biased, probably because the 30-and-under crowd was seriously under-sampled. The second is that, for most survey applications, that bias will continue to plague those using telephone surveys. Young people simply aren't home, and even if you can reach them through random digit dialing, they are not responding to surveys while they are out and about or when it would use up precious minutes. We suggest that the best way to obtain a random sample is to send out invitations via US mail, offering respondents their choice of response modality among paper, phone and web-based surveys. The devil is of course in the details, but if you are interested, please contact us for more information.
Ed Siegel
President, Center for Evaluation Research