## Calling statistics geeks!

January 10, 2004 by Tom Negrino

Attention statistics geeks: In this political season, we’re all seeing lots of polls, and they all have a reported margin of error (MOE). Now, I think that I understand what that means. But let’s look at some numbers reported today from the LA Times Iowa poll:

Dean 30

Gephardt 23

Kerry 18

Edwards 11

The reported MOE is 4%. Now, I understand that means that Dean could be as high as 34 and as low as 26, and Gep could be as high as 27 and as low as 19. But in polls like this, is there any rule of thumb people can use for the likely, “real” result? In other words, I know that the MOE analysis says that Gep could possibly be beating Dean. But how likely is it that he is really doing so? My gut assumption for years when I see polls with a MOE number has been to take the number, cut it in half, assign the result negatively to the guy on top, and positively to the candidate on the bottom. Then I use that as a conservative estimate for where the race is. So far that has seemed to work. But is that just voodoo on my part?

on January 10, 2004 at 12:34 pm Paul Hoffman

It all depends on how they measure the MOE. Different people assign different MOEs for different testing methodologies. For example, the MOEs on polls are often higher than MOEs on measurements of samples of items bought at stores because polling has been proven to be less accurate.

For most polling results, the MOE represents a bell curve, so your rule of thumb (to cut it in half) isn’t all that bad for holding things in your head, but it has no scientific validity. In the case of Dean’s lowest overlapping Gep’s highest by 1%, that’s way less than half as likely as it would be if they overlapped by 2%.
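To put a rough number on the bell-curve reasoning, here is a minimal sketch of the usual normal approximation for the gap between two candidates in the same poll. It assumes a simple random sample, and the sample size is an assumption too: the post does not report it, and 600 is only what a 4% MOE would imply. The function name is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def prob_trailing_candidate_ahead(p_lead, p_trail, n):
    """Approximate chance that the trailing candidate is really ahead.

    Treats the poll as a simple random sample of n people and applies the
    normal approximation to the difference of two shares from one poll.
    """
    diff = p_lead - p_trail
    # Variance of (p_lead - p_trail) when both shares come from the same
    # multinomial sample: (p1 + p2 - (p1 - p2)^2) / n.
    variance = (p_lead + p_trail - diff ** 2) / n
    z = diff / sqrt(variance)
    return NormalDist().cdf(-z)

# ASSUMPTION: the sample size is not given in the post; a 4% MOE implies ~600.
n_assumed = 600
dean_vs_gephardt = prob_trailing_candidate_ahead(0.30, 0.23, n_assumed)
print(f"Chance Gephardt is really ahead: {dean_vs_gephardt:.1%}")
```

Under those assumptions the answer comes out at roughly 1 percent, far smaller than the overlapping ranges suggest. It says nothing about the sampling and honesty problems, which no formula captures.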

All of the above is from memory that has not been refreshed in 25 years, so I could be completely wrong.

The big warning here is that the pollsters probably used a standard calculation for the MOE, which assumes that they used standard polling methods and had a standard sample. I would be very wary of the sample. Respondents to polls that are known to be “important” tend to say more what they think the pollster wants them to say or to say more what they want the poll to look like when published, not what they actually intend to do. If this were a poll for the 25th state to vote, the respondents would likely be more honest about their intentions.

on January 14, 2004 at 1:56 pm Roger Karraker

I’m going off into territory here where I’m only semi-competent. I had hoped that someone else who really knows statistics would explain it. I don’t think Paul’s explanation really does that. I do know that there’s a very thorough explanation of MOE over on the great DailyKos site. Alas, http://www.dailykos.com has come up empty for me for the last dozen hours, so I can’t provide a link.

In the meantime, here’s my understanding of what we’re talking about. I don’t know anything about the retail surveys Paul mentions; my only experience is with polls. Here’s what I think I know.

A poll is of value only if it can be replicated with similar results. In general, the larger the number of people polled, the more likely that another poll, of another group with similar characteristics, will return similar numbers. Note that all this assumes that the question is worded identically, that the survey sample is representative of the group and so on.

The margin of error is a statistical creation that measures how far another identical poll’s results are likely to stray from this one’s. Statisticians like to use a 95 percent confidence level. This means that if you were to conduct 100 polls of 100 groups of people, theoretically 95 of those polls would come back with numbers within the margin of error.
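The “95 out of 100 polls” idea can be checked with a small simulation. This is only a sketch: it assumes every poll is a perfect simple random sample, it picks a true share of 30 percent and a sample of 300 purely for illustration, and it uses the conventional worst-case 95% margin, 1.96 × sqrt(0.25 / n), which is the textbook formula and not something taken from this thread.

```python
import random
from math import sqrt

def simulate_coverage(true_share, sample_size, num_polls, seed=0):
    """Fraction of simulated polls that land within the MOE of the truth."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Conventional worst-case 95% margin for a simple random sample.
    moe = 1.96 * sqrt(0.25 / sample_size)
    within = 0
    for _ in range(num_polls):
        # Each respondent backs the candidate with probability true_share.
        supporters = sum(rng.random() < true_share for _ in range(sample_size))
        if abs(supporters / sample_size - true_share) <= moe:
            within += 1
    return within / num_polls

# Hypothetical setup: the candidate truly has 30%, each poll asks 300 people.
coverage = simulate_coverage(true_share=0.30, sample_size=300, num_polls=2000)
print(f"{coverage:.1%} of simulated polls landed within the MOE")
```

Because the quoted margin is sized for the worst case of a 50 percent share, a candidate at 30 percent should land inside it slightly more often than 95 times in 100.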

The margin of error is a function of the number of people polled. The minimum for most any survey is roughly 300. The maximum worth surveying is roughly 1200.

I seem to recall that the formula for finding the MOE is something like “the reciprocal of the square root of N-1, where N is the number surveyed, expressed as a percentage.” A survey sample of 300 has an MOE of approximately 6. That means if candidate XX shows in the survey as having 30 percent of the vote, well, if you conduct another 100 identical surveys 95 of those surveys should show the candidate somewhere between 36 percent (MOE+) and 24 percent (MOE-).

A survey of 1200 persons yields a MOE of 3 percent, indicating that 95 percent of the time our candidate XX will come up between 33 percent and 27 percent.
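The arithmetic behind those two figures can be sketched in a few lines. The first function is the recalled “reciprocal of the square root of N-1” rule; the second is the conventional 95% formula for a simple random sample, added here only for comparison and not taken from this thread. Both function names are made up for illustration.

```python
from math import sqrt

def moe_rule_of_thumb(n):
    """Reciprocal of the square root of (n - 1), as a percentage."""
    return 100 / sqrt(n - 1)

def moe_standard_95(n, share=0.5):
    """Conventional 95% margin for a simple random sample, as a percentage.

    share=0.5 is the worst case, which is what pollsters usually quote.
    """
    return 100 * 1.96 * sqrt(share * (1 - share) / n)

for n in (300, 600, 1200):
    print(f"n={n:5d}  rule of thumb: {moe_rule_of_thumb(n):.1f}%  "
          f"standard: {moe_standard_95(n):.1f}%")
```

The two formulas agree to within a fraction of a point, so the recalled rule is a fair approximation. Both also show why samples much beyond 1200 are rarely worth the cost: it takes four times as many respondents to cut the margin in half.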

So when you have a survey of 300 persons with Dean having 30 percent and Clark with 24 and an MOE of 6 percent, theoretically Clark might really have as much as 30 percent and Dean as low as 24. Or Dean might really have as much as 36 and Clark as low as 18 if another poll of another 300 persons was conducted at the same time.

In short, any single poll, especially one with a small sample size, is really terrible at “predicting” actual voter results. The things to watch for are trendlines across several polls. The polls are also relatively meaningless for primaries, where there’s often little turnout. In those cases note whether the poll measures “registered voters” or “likely voters.” Professional campaign people value the latter much more highly.

When we get closer to the November election the polls generally get “better” because the sample sizes get larger, lowering the MOE.

I hope this helps and that someone who really knows polling stats will step in to correct me.