It seems as if every conversation I had today (Tuesday), and about half of the e-mails, focused on poll methodology and accuracy. This spike of interest, of course, is due to the two polls released yesterday that showed Governor Perry falling below 35%. The San Antonio Express-News ran a story headlined, TWO POLLS SHOW PERRY DEFEAT IS POSSIBLE.

As I have written repeatedly, the poll that I think is the least credible is Zogby/Wall Street Journal/Battleground States. I can’t believe the Journal allows its name to be attached to this so-called poll. Most of the respondents for this poll are self-selected. The poll’s Web site describes the process as “interactive”–that is, it’s an Internet poll, based on a database of individuals who have signed up to participate. It is not a random sample; the polling organization solicits responses by e-mail. In addition, the poll takers make about 20 to 50 phone calls in the state where a race is taking place. The poll does not mention a screen for likely voters. Without referring to Zogby by name, a rival polling organization, Survey USA, says, “It’s important to note the distinction between polls that are scientific and statistically sound versus those that are ‘just for fun.’ Some Internet polls, for instance, aren’t scientific or reliable because people who participate are not representative of the population as a whole.”

The surest way to spot a poll with a dubious methodology is to look for huge swings that cannot be accounted for by events. Take the Hutchison-Radnofsky Senate race. The only news to come out of this race has been Radnofsky’s criticism of Hutchison for not debating more frequently. Yet Zogby shows Hutchison with less than 50% of the vote, 45.2% to 36.8%, with the Libertarian candidate polling at more than 7%. Just last month, Rasmussen was reporting Hutchison 61%, Radnofsky 31%. The result is so obviously out of line with the previous poll–not to mention with reality–that Zogby had to hedge on its own poll: “…Republican Sen. Kay Bailey Hutchison is a safe bet to win a third term, although Democratic challenger Barbara Ann Radnofsky, boosted by strength among independent voters, narrowed her deficit to around 9 percentage points, from about 18 points in mid-August.”

The Rasmussen organization does not publish its methodology. The poll reveals only that the results are based on a survey of 500 likely voters. I made a couple of calls to consultants today, and I was told that Rasmussen is an automated poll, which means that respondents reply to recorded questions by pressing a button on their telephone. This is somewhat controversial in the polling world, as there is speculation that automated questions cause people to hang up. Some polls may target a specific person at an address (say, a regular Republican primary voter), but if another person answers (say, a regular Democratic primary voter), there is no way for the polling organization to know who actually responded. Still, automated polling has come to be regarded, grudgingly, as pretty accurate. It is also cheaper than using a call center and paying live people to make the calls and ask the questions.

Rasmussen and Zogby were far apart on everybody except Perry (Zogby results are in parentheses):

Perry 33% (30.7)
Strayhorn 22% (11.1)
Bell 18% (25.3)
Friedman 16% (22.9)

The third polling organization that tracks state races is Survey USA. This is another automated poll. What sets it apart is that its questions are read by professional announcers. Survey USA contends that this eliminates errors made by amateur callers that could skew poll results. The last Survey USA poll (576 likely voters) in the governor’s race occurred in June, before the birth of this blog (ah, those were the days):

Perry 35%
Friedman 21%
Bell 20%
Strayhorn 19%

These results are not far from the 40/20/20/20 model that seemed to define the race for most of the summer.

Rasmussen claims to have been “the nation’s most accurate polling firm during the [2004] Presidential election and the only one to project Bush and Kerry’s vote total within half a percentage point of the actual outcome.”

Survey USA doesn’t brag. It does publish an election poll scorecard that compares the work of all major pollsters. Everything you could possibly want to know–no, make that more than you want to know–about Survey USA’s polling history can be found on the Web site. The firm did quite well in 2004 (presidential, Senate, and governor’s races):

Average error on the margin of victory:
1. Mason-Dixon 1.3%
2. Survey USA 1.5%
3. Strategic Vision 1.5%
4. Rasmussen 1.6%
9. Gallup 2.2%

Percent of polls outside the margin of error:
1. Survey USA 0.0%
2. Rasmussen 5.1%
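A note on scale, since “margin of error” is doing a lot of work in that scorecard: below is a quick back-of-the-envelope sketch, in Python, of the 95 percent sampling margin for samples the size of the Rasmussen (500 likely voters) and Survey USA (576 likely voters) polls mentioned above. It assumes a simple random sample and the textbook formula; real polls weight their responses, so their effective margins run somewhat wider.

    from math import sqrt

    # 95% sampling margin of error for a simple random sample,
    # worst case (a 50/50 split). Sample sizes are the ones cited earlier.
    for n in (500, 576):
        moe = 1.96 * sqrt(0.5 * 0.5 / n) * 100
        print(f"n = {n}: about +/- {moe:.1f} points")

    # n = 500: about +/- 4.4 points
    # n = 576: about +/- 4.1 points

In other words, a 500-person sample can be about four points off on any single candidate through sampling error alone, which is worth keeping in mind before reading too much into small week-to-week movements.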

On its home page, Rasmussen cites an article from Slate, the online political journal, proclaiming it the most accurate pollster in the 2004 election. To make a long story short–and you can see for yourself it’s very long because I’m going to include a large segment of it below–the automated pollsters whipped the pants off the live-voice pollsters. I have received some e-mails from Perry supporters disparaging automated polling; in particular, they questioned the reliability of Survey USA’s report on August 22 that the governor’s approval rating had dropped nine points (from 52% to 43%) since the July poll. I think the Slate piece affirms Survey USA’s credentials. Warning: This isn’t exactly light reading. It’s definitely for junkies only.

“Automation. Before the election, we publicly doubted and privately derided Rasmussen and SurveyUSA, which used recorded voices to read their poll questions. We rolled our eyes when they touted the virtues of uniformity and when they complained that live interviewers “may not know how to read or speak the English language,” could “chew gum,” or might “just make up the answers to questions.” It sounded to us like a rationalization for cutting costs.

“Look who’s laughing now. Rasmussen and SurveyUSA beat most of their human competitors in the battleground states, often by large margins.

“Let’s compare the automated surveys to the three biggest pollsters who used live interviewers in multiple battleground states. We’ll grade each pollster on two measures: 1) how far its final numbers for Bush and Kerry varied from the official returns, and 2) how far the gap between its final numbers for Bush and Kerry varied from the gap shown in the official returns. For example, suppose a pollster had Bush winning a state 48 to 46 percent, but Bush actually won the state 50 to 47. By the first measure—let’s call it the sum—the poll missed Bush’s number by 2 and Kerry’s by 1, for a total error of 3. By the second measure—let’s call it the spread—the poll’s 2-point lead for Bush missed the actual 3-point lead for Bush by a total error of 1.

“Start with the sum method. Rasmussen and Gallup overlapped in four battleground states: the big three (Florida, Ohio, and Pennsylvania) plus Minnesota. In all four, Rasmussen beat Gallup. Rasmussen’s average error in these states was 3.3 points compared to Gallup’s 6.2. SurveyUSA overlapped with Gallup in the big three states plus Iowa. Again, the automated pollster whipped Gallup. SurveyUSA’s average error was 3.5 points. Gallup’s was 6.4.

“Mason-Dixon fared better, but not by much. It conducted surveys in five states that Rasmussen also polled: the big three plus Michigan and Minnesota. Mason-Dixon’s average error in these states was 5.5 points. Rasmussen’s was 3.2. Mason-Dixon overlapped SurveyUSA in 10 states: the big three, Arkansas, Colorado, Iowa, Michigan, Missouri, Nevada, and Oregon. Mason-Dixon was off in these states by an average of 5.6 points. SurveyUSA was off by 3.3.

“Zogby [NOT his Internet poll] came closer but still couldn’t beat the robo-pollsters. Rasmussen went head-to-head with Zogby in the big three, Michigan, and Minnesota. Zogby erred in these states by an average of 4.3 points. Rasmussen erred by just 3.2. SurveyUSA squared off against Zogby in the big three, Colorado, Iowa, Michigan, and Nevada. Zogby was off in these states by an average of 4.5 points. SurveyUSA was off by just 3.4.

“Human pollsters argue that the sum method favors automated polls, because when respondents are asked to choose a candidate, they’re more likely to punch “1” or “2” on their phones than to punch “3” for other or undecided. This drives down the number of other/undecided responses, lifting both major candidates closer to their final numbers. If one poll has Kerry winning a state 46-45 with 9 percent undecided, and Kerry actually wins 50-49, the sum method punishes that pollster for every other/undecided respondent (calculating an 8-point error) and fails to reward the pollster for nailing the spread. Instead, the sum method rewards a second pollster who recorded fewer other/undecided responses and called the state for Bush, 51-48. The second pollster outscores the first by the sum method (missing Bush’s number by 2 and Kerry’s by 2), despite blowing the spread by 4 points (calling a 3-point win for Bush when Kerry actually won by a point).

“What happens to the pollster comparisons if we switch to the spread method? Both of the automated pollsters still beat Gallup. Head to head, SurveyUSA missed the spreads by an average of 2.3 points; Gallup missed by an average of 5.4. Rasmussen cleaned Gallup’s clock, missing the spreads by an average of 1.6 points compared to Gallup’s 6.2. Rasmussen also whipped Zogby, erring by 1.0 points compared to Zogby’s 3.2. But the contest between SurveyUSA and Zogby was tighter: The human pollster was off by an average of 3.6 points, compared to the robo-pollster’s 2.5.

“Throw in Mason-Dixon, and the comparison gets even tighter. In the five states where Rasmussen overlapped with Mason-Dixon, the two pollsters essentially tied. If you compare election returns (measured to a tenth of a percent) to the most precise published poll results (measured in whole integers), each pollster missed by the exact same average: 1.42 points.

“Mason-Dixon says it would be more scientific to compare whole-integer poll results to whole-integer (rounded) election returns. This method would lower Mason-Dixon’s average error. We understand that error rates averaged to a tenth of a percent are tenuous when the poll numbers from which they’re computed are whole integers. But we can’t agree that rounding off election returns improves the situation. Alternatively, Mason-Dixon argues that if we’re using election returns calculated to a tenth of a percent, the best scientific comparison would be to poll results measured to a tenth of a percent, which again would lower Mason-Dixon’s average error. We agree that this would be more scientific. But Rasmussen didn’t release its results to a tenth of a percent, so we can’t compare the two pollsters at that level of precision. Anyway, the performances are so close, and the variation in averages depending on decimal place is so tiny when compared to the much bigger margin of error on each poll, that it’s impossible to call the race between Rasmussen and Mason-Dixon one way or the other. It’s a tie.

“The match-up between Mason-Dixon and SurveyUSA is a different story. In the 10 states where they went head to head, the human pollster prevailed. Mason-Dixon erred by an average of 1.8 points, beating SurveyUSA’s 2.6. For this lonely victory over the machines, Mason-Dixon deserves the polling industry’s Garry Kasparov award.”

“How did the robots largely beat the humans? For starters, they aren’t robots. They’re recordings of human voices. Pollsters who use this technology argue that the uniformity achieved by automation—every respondent hears the questions read exactly the same way—outweighs any distortions caused by people hanging up or lying to the recordings. They also argue that the interviewers who read questions and record answers in “human” polls are all too human. A human poll may bear the name of a major newspaper or television network, but the interviews are usually “outsourced” to a company you’ve never heard of and conducted by whoever is willing to make the phone calls—which sound a lot like telemarketing—for modest wages.

“We won’t settle the relative merits of the two approaches in this article or this election. But when the two major automated pollsters score either second and first–or third and tied for first, depending on how you count it–in round-robin match-ups with the three major human pollsters, it’s time to broaden the experiment in automated polling and compare results to see what’s working and why. Clearly, the automated pollsters are onto something, and the human pollsters who have fallen behind will have to figure out how to beat it—or join it.”

“Correction, Dec. 11, 2004: This article originally said that the measure by which Rasmussen and SurveyUSA beat all three human pollsters was the spread method. This was incorrect. The error calculations supplied were for the sum method. We recalculated the average error for each pollster using the spread method and determined that Mason-Dixon beat SurveyUSA. We apologize to Mason-Dixon and to indignant humans everywhere.

“Correction, Dec. 20, 2004: On Dec. 11, after we had calculated and published pollsters’ error averages using the spread method, Ohio certified a revised vote count that lowered Bush’s vote share in that state from 51.0 to 50.9 and raised Kerry’s vote share from 48.5 to 48.8. Accordingly, we have recalculated all the numbers using both methods. The recalculation eliminated Rasmussen’s advantage over Mason-Dixon using the spread method, producing a tie.”
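For the junkies still reading, here is a minimal sketch, in Python, of the two scoring methods Slate describes, the “sum” and the “spread.” It does nothing more than reproduce the arithmetic of the article’s own hypothetical examples; it is not anyone’s actual scoring code.

    def sum_error(poll_bush, poll_kerry, actual_bush, actual_kerry):
        # "Sum" method: how far the poll missed each candidate's share, added together.
        return abs(poll_bush - actual_bush) + abs(poll_kerry - actual_kerry)

    def spread_error(poll_bush, poll_kerry, actual_bush, actual_kerry):
        # "Spread" method: how far the poll's margin missed the actual margin.
        return abs((poll_bush - poll_kerry) - (actual_bush - actual_kerry))

    # First hypothetical: poll has Bush up 48-46; Bush actually wins 50-47.
    print(sum_error(48, 46, 50, 47))     # 3 (missed Bush by 2, Kerry by 1)
    print(spread_error(48, 46, 50, 47))  # 1 (a 2-point lead vs. an actual 3-point lead)

    # Undecided-voter hypothetical: Kerry actually wins 50-49.
    # Poll A has Kerry up 46-45 with 9 percent undecided; Poll B calls it for Bush, 51-48.
    print(sum_error(45, 46, 49, 50), spread_error(45, 46, 49, 50))  # 8 0
    print(sum_error(51, 48, 49, 50), spread_error(51, 48, 49, 50))  # 4 4

As the article argues, the sum method grades Poll B as the better poll even though Poll A called the winner and nailed the margin exactly.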