If you were trying to figure out who would win the South Carolina primary on Saturday, you could have read the polls. Or you could have checked Google data.
The Internet search giant predicted Donald Trump would win the South Carolina primary (an admittedly easy call) as well as a close race between Florida Senator Marco Rubio and Texas Senator Ted Cruz for second place, based on searches by people in the Palmetto State. Google also correctly forecast that day that interest in Vermont Senator Bernie Sanders would spike, though not enough to edge out former Secretary of State Hillary Clinton in the notoriously hard-to-poll Nevada caucuses.
But don't think that means Google has now replaced Gallup, the esteemed polling company that no longer tracks the presidential horse race. Experts inside and outside Google think it's premature to start relying on search trends to see who's going to win the election. Instead, they see search data, along with data from social-media outlet Twitter, as a complement to traditional polling, enriching the raw data with new insights.
“When there’s a lot of information, this sort of 'who’s winning, who’s losing' narrative is a … cue to who I should pay attention to," Shannon McGregor, a researcher at the University of Texas' Twitter Research Group, told TIME. "So in some ways social-media metrics, to the extent that the public is being exposed to them, can help them in that way. It’s more information that they have about what’s going on out there in terms of the election and in terms of how other people are feeling about it.”
Data from Google and Twitter has a couple of advantages over traditional polls. For one, it's faster, which means it can pick up late-breaking surges for candidates further back in the pack. For example, local Twitter data picked up a surge of interest in Cruz just before the Iowa caucuses, while local Google searches showed Ohio Governor John Kasich's last-minute momentum heading into the New Hampshire primary.
Companies like Google and Twitter have been eager to share this data with the public, campaigns and journalists, according to Daniel Kreiss, an assistant professor of media and journalism at the University of North Carolina at Chapel Hill. Elections offer web companies a high-profile opportunity to present themselves as a public service and not just as private businesses.
“They sort of want to be that underlying infrastructure that will help campaigns connect with voters, help voters learn more about candidates, help people get involved,” Kreiss told TIME. “Really sort of serve as that base level of being a key player in the electoral process … companies want to be part of infrastructure for democracy.”
Another social-media platform that's looking to get involved in the election is Facebook, which encourages users to post about watching presidential debates and voting. Facebook releases data on user conversation ahead of primaries, but that data is usually not predictive of the final results. This is because the site is not as public as Twitter and Google, with a lot of user information available only to friends and connections.
McGregor argues that all of the companies' increasing political involvement can have a beneficial side effect, helping generate more interest among the public.
"To the extent that people encounter politics in these personal spaces, it can help broaden the reach of politics, and it can help bring more people into politics by encountering them in these relatively informal and personal spaces," she said.
While social-media posts are publicly expressive, made in conversation with other users, data from search engines can signal latent attitudes, according to Joe DiGrazia, a Neukom fellow at Dartmouth College focusing on computational methodologies in the social sciences. Unlike an answer given to a pollster, a search entered on a site like Google or Bing is honest and immediate: your search history doesn't lie.
This is the new focus group: instead of assembling a small, limited group of voters to watch a debate, media-savvy analysts can just watch user responses on social media, according to Chris Kerns, vice president of research and insights at social-media marketing platform Spredfast.
“What social data does for us is it gives you not only a huge panel of people talking about either a show, or in the case of this weekend’s primaries and caucuses, issues and candidates, but it also gives it to you in seconds,” he said. “That panel is still going to be skewed based on the people that are on Twitter and the people that are talking about politics, it’s still 1,000 times better than the old model.”
Sometimes the focus group gets it right and sometimes it can be way off, according to DiGrazia. If a candidate were involved in a scandal, for instance, you might see a surge in interest or chatter that does not indicate increased support.
“They’re often predictive, because they measure interest in a candidate, and interest is often correlated with support. These things are often predictive of election outcomes," DiGrazia told TIME. "But at the same time you can have situations where they will lead you astray.”
For example, interest in Jeb Bush spiked on Google over the weekend because the former Florida governor dropped out of the race.
Some researchers argue that the data is most useful when limited to a geographic location, like a metropolitan area, and tracked over a period of time. For instance, in New Hampshire, search interest in Kasich spiked on Google on the day of the primary, correctly indicating that he would earn a surprising second-place finish.
The data's speed and low cost relative to polling make it invaluable to DiGrazia, offering quick public-opinion analysis that would normally take a polling outfit more time and a large research budget. "It’s fast and it’s cheap. You can get it quickly and in real time, and you don’t have to spend any money commissioning a poll," he said.
Starting with John McCain’s 2000 presidential campaign and culminating in Barack Obama’s 2008 race to the White House, campaigns have made use of web data to fine-tune their strategies to appeal to and communicate with voters, according to Kreiss, who is the author of Taking Our Country Back, a history of online politics from 2004 to present.
Data, says Kreiss, “helps campaigns orient themselves,” helping them find what works and what doesn’t, but it isn’t the be-all and end-all of campaigning in the digital age. “A lot of this data is messy. It’s often unclear the volume of activity going on around some particular issue and how that translates to things like vote share, how that relates to donations.”
There are pitfalls in relying on data, especially for those who think it will replace polling outright. The age-old Twitter adage applies: retweets do not equal endorsements.
A mention of a candidate online is not an indicator of support, and the person may not be saying anything positive at all. Former Google data scientist Seth Stephens-Davidowitz says researchers need more time to establish a methodology for pulling data from the web.
"Polls took a while to figure out, it wasn’t just overnight. People knew proper polling methodology. We don’t know proper methodology to weight tweets and Google searches,” he said. “It's been established, without a doubt, that there are important insights that couldn’t be found anywhere else in this data.”