Below, see GfK’s response to iMediaEthics’ research, and iMediaEthics’ response to GfK.
And, check out iMediaEthics’ full report on a 20 percent difference in responses from the public on trans fats.
Randall K. Thomas, Vice President at GfK for Online Research Methods, Responds
The initial evidence I was presented with was two surveys, one phone, one online, separated by three weeks during which a critical event occurred: the FDA's proposal to ban trans fats, as described here:
http://www.fda.gov/ForConsumers/ConsumerUpdates/ucm372915.htm
The first survey indicated much less support for banning trans fats than the second survey. Based on the spike in news stories and searches concerning trans fats, it appeared to me at the time that this might have reflected a response to the news coverage, with the baseline survey capturing true public opinion before the stories and the follow-up survey capturing true public opinion after them.
In the latest results, you fielded the online and telephone surveys using probability samples at the same time, and you found that the results were generally replicated within each mode, indicating strong reliability within mode but still large differences between modes. With both samples selected using probability-based techniques, and assuming the data were weighted to similar targets with similar weighting approaches, this appears most likely to be a mode effect, where how an item and its response options are presented can affect responses. Online surveys tend to present all response options visually and are self-administered, while phone surveys have a human interviewer who asks the questions and presents the response options orally. These differences in mode have been found to produce differences in responses in a number of studies.
In the past 15 years that I've studied mode effects between phone and online, most effects are relatively small, often ranging from 3 to 5% in proportions or scale range. But I have seen a number of larger mode effects, even from similar sample sources, and they occur most often with dichotomous items, as this item was (rather than graded scales), and with items subject to satisficing, acquiescence pressures, and social desirability pressures.
So though the reliability of the results was high, we do not know the validity of the results for either mode, since we have no benchmark or gold standard by which to judge their accuracy. It is equally plausible that the phone survey misrepresents public opinion on this issue. Without a gold standard that has no non-response and is not influenced by mode of administration or a human interviewer (assuming there is a truth that exists independently of context and is measurable), we cannot definitively judge the accuracy of the results.
As you know, a number of authors have pointed out the non-response challenges facing surveys today, and telephone surveys in particular: the people who answer landlines and cell phones and complete a survey may be significantly different from those who do not. So even though telephone may produce results that are consistent across time, we cannot simply assume that those results are more valid than those from any other mode. Reliability is a necessary, but not sufficient, condition for validity.
In addition to a possible mode effect, which can single-handedly introduce various hard-to-quantify confounders, and the possibility of significant non-response bias in phone surveys, there are also a number of serious methodological issues that have helped unseat telephone-based protocols as the gold standard in survey research. Because of the growing number of households that are abandoning their landline service and becoming cell-phone-only (CPO) households, all general-population telephone surveys now employ some variant of dual-frame random-digit-dial (RDD) methodology. As such, these surveys rely on arbitrary mixtures of landline and cellular telephone numbers to reach households. Moreover, because there are no current estimates of the number of CPO households at different levels of geography, each researcher relies on a different method of weighting the data to compensate for the disproportional allocation of the sample to the two phone strata. Further, CPO estimates from the CDC are survey-based, which means they are also subject to large sampling errors and sometimes lag by more than two years.
As a result of the lack of clear, detailed, and commonly agreed-upon targets, no two survey research organizations use the same sample allocation method or weighting strategy for dual-frame RDD surveys (Fahimi et al. 2009). These inconsistencies are part of the reason why significantly different estimates can result from dual-frame RDD surveys that measure the same population parameters using otherwise identical protocols. Add to this the sobering reality that most commercial RDD surveys fielded for less than a week rarely obtain true response rates greater than 10%, and it becomes evident why this trusted workhorse of survey research is no longer considered the gold standard.
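To make the weighting point concrete, here is a minimal sketch of how two researchers post-stratifying the same dual-frame sample to different cell-phone-only (CPO) targets can arrive at noticeably different estimates. All of the sample shares, targets, and support rates below are hypothetical illustrations; they are not drawn from the GfK or PSRAI surveys.

```python
# Illustrative sketch (hypothetical numbers): how the choice of CPO population
# target changes a weighted estimate in a dual-frame RDD survey.

def weighted_estimate(cpo_target, sample):
    """Post-stratify a two-stratum sample (CPO vs. landline-reachable) to a
    chosen CPO population target and return the weighted proportion in favor."""
    weights = {
        "cpo": cpo_target / sample["cpo"]["share_of_sample"],
        "landline": (1 - cpo_target) / sample["landline"]["share_of_sample"],
    }
    num = sum(weights[s] * sample[s]["share_of_sample"] * sample[s]["pct_favor"]
              for s in sample)
    den = sum(weights[s] * sample[s]["share_of_sample"] for s in sample)
    return num / den

# Hypothetical sample: 25% of completes came from the cell frame, 75% landline,
# with different levels of support for the (made-up) policy in each stratum.
sample = {
    "cpo":      {"share_of_sample": 0.25, "pct_favor": 0.40},
    "landline": {"share_of_sample": 0.75, "pct_favor": 0.55},
}

# Two researchers using different (lagged, survey-based) CPO targets get
# different estimates from exactly the same interviews.
for target in (0.35, 0.45):
    print(f"CPO target {target:.0%}: {weighted_estimate(target, sample):.1%} in favor")
```

With these invented numbers, the estimate of support shifts by roughly three points depending solely on which CPO target is assumed, which is the kind of inconsistency the paragraph above describes.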
The journal articles below have some nice discussions of mode differences, especially between online and phone, and the last one in particular (Yeager et al.) focuses on similarities and differences between phone and our online probability panel. Let me know if you have any further questions, thanks!
de Leeuw, E. (2005). "To Mix or Not to Mix Data Collection Modes in Surveys." Journal of Official Statistics 21(2): 233–255.
Fahimi, M., D. Kulp, and M. Brick (2009). "A Reassessment of List-Assisted RDD Methodology." Public Opinion Quarterly 73(4): 751–760.
Kreuter, F., S. Presser, and R. Tourangeau (2008). "Social Desirability Bias in CATI, IVR, and Web Surveys: The Effects of Mode and Question Sensitivity." Public Opinion Quarterly 72(5): 847–865.
Holbrook, A. L., M. C. Green, and J. A. Krosnick (2003). "Telephone versus Face-to-Face Interviewing of National Probability Samples with Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Response Bias." Public Opinion Quarterly 67(1): 79–125.
Yeager, D. S., J. A. Krosnick, L. Chang, H. S. Javitz, M. S. Levendusky, A. Simpser, and R. Wang (2011). "Comparing the Accuracy of RDD Telephone Surveys and Internet Surveys Conducted with Probability and Non-Probability Samples." Public Opinion Quarterly 75(4): 709–747.
Response to Randall Thomas’ Comments from David W. Moore and Andrew E. Smith
In his response, Thomas acknowledges that the different results between the GfK polls and the PSRAI polls were not caused by opinion changing over time. Instead, he suggests a “mode” effect – that one poll was conducted online, the other by phone.
But we suggest that the poll differences are not caused just by whether a person is interviewed live on the phone versus completing an online questionnaire. We think the differences have more to do with the fact that the panel of respondents who constitute the GfK samples is apparently quite different from the people who are willing to respond by phone.
The very study that Thomas highlights in his response, the 2011 Public Opinion Quarterly article by Yeager et al., reinforces our view. That article reports a systematic comparison of polls conducted by 1) telephone, 2) a "probability sample Internet survey" (i.e., GfK's KnowledgePanel™), and 3) non-probability Internet surveys.
What are the results?
Among other findings, the authors (p. 728) note that even after weighting the data (to ensure the samples are representative of the American public at large, as reflected in U.S. Census data), the telephone surveys were about 50% more accurate than the "probability sample Internet survey." This finding is consistent with our earlier argument that people who are willing to be part of an ongoing panel are more likely to be interested, engaged, and informed about current events than the general public.
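For context, the Yeager et al. accuracy comparison rests on measuring each survey's average absolute error against external benchmarks. The sketch below illustrates that metric with entirely hypothetical benchmark items and figures invented for demonstration; they are not the paper's actual data.

```python
# Illustrative sketch of the average-absolute-error metric used to compare
# survey accuracy against external benchmarks. All numbers are hypothetical.

benchmarks = {"owns_home": 0.66, "has_passport": 0.37, "smokes": 0.20}

phone_survey  = {"owns_home": 0.64, "has_passport": 0.39, "smokes": 0.22}
online_survey = {"owns_home": 0.69, "has_passport": 0.34, "smokes": 0.23}

def average_absolute_error(estimates, benchmarks):
    """Mean absolute difference (in percentage points) from the benchmark values."""
    errors = [abs(estimates[k] - benchmarks[k]) for k in benchmarks]
    return 100 * sum(errors) / len(errors)

for name, est in [("phone", phone_survey), ("online", online_survey)]:
    print(f"{name}: {average_absolute_error(est, benchmarks):.1f} points average error")
```

In this made-up example the phone estimates miss the benchmarks by 2.0 points on average and the online estimates by 3.0 points, which is the sort of gap a "50% more accurate" comparison describes.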
Moreover, it’s important to recognize that in this specific case, when answering the online survey, GfK’s respondents can very quickly do a search on trans fats and almost immediately find information that shows how unhealthy that substance is. When we did a Google search on trans fats, we immediately found headlines that read “Avoid this cholesterol double whammy,” “Why trans fats are no longer considered safe,” and “Ban trans fats.”
This ability to immediately look up a subject might alone help to explain (in this particular case) why GfK’s respondents were so different in their views from PSRAI’s respondents. And the important point is that GfK’s respondents, once they go online to look up trans fats, no longer represent the general public as a whole.
Whatever the causes of the large differences between GfK's results and those obtained by PSRAI, the sheer size of those differences raises a red flag about the validity of the GfK polls.
More research is needed to help specify when the probabilistic online samples provide as valid a measure as the probabilistic telephone samples. And we expect to be part of that research effort.
In the meantime, we cannot help but be skeptical about poll results obtained using GfK's KnowledgePanel™ respondents. When the results differ significantly from good phone surveys, we'd definitely be inclined to accept the phone survey results as the more representative of the American public.
David W. Moore is a Senior Fellow with the Carsey Institute at the University of New Hampshire. He is a former Vice President of the Gallup Organization and was a senior editor with the Gallup Poll for thirteen years. He is author of The Opinion Makers: An Insider Exposes the Truth Behind the Polls (Beacon, 2008; trade paperback edition, 2009). Publishers Weekly calls it a "succinct and damning critique…Keen and witty throughout." He and Andrew E. Smith are writing a book about the New Hampshire Primary.
Andrew E. Smith is Associate Professor and Director of the Survey Center at the University of New Hampshire. He teaches in the Political Science Department and has published numerous scholarly articles on polling. He has also conducted numerous polls for the Boston Globe, USA Today, CNN, Fox News, several local television stations, and various state and local government agencies. He and David W. Moore are writing a book about the New Hampshire Primary.