How Twitter Knows When You’re Depressed

January 27, 2014 3:48 PM EST

With its 230 million regular users, Twitter has become such a broad stream of personal expression that researchers are beginning to use it as a tool to dig into public health problems. Believe it or not, a scientist out there might actually care about the sandwich you ate for lunch—even if most of your followers don’t.

“Our attitude is that Twitter is the largest observational study of human behavior we’ve ever known, and we’re working very hard to take advantage of it,” explains Tyler McCormick of the Center for Statistics and the Social Sciences at the University of Washington.

What if, for example, an artificial intelligence model could scan your Twitter feed and tell you if you’re at risk for depression? And what if you could receive notices from third parties that warned you to seek help, based solely on an automated scan of your tweets? Eric Horvitz, co-director of Microsoft Research Redmond, has helped pioneer research on Twitter and depression. He says that could one day be a possibility.

“We wondered if we could actually build measures that might be able to detect if someone is severely depressed, just in publicly posted media. What are people telling the world in public spaces?” asks Horvitz. “You might imagine tools that could make people aware of a swing in mood, even before they can feel it themselves.”

Horvitz and a team of researchers helped develop a model that can scan tweets and predict depression in Twitter users, with an accuracy they claim to be 70%. Researchers say the system is still far from perfect. When the model scans your tweets, it misses some signals and doesn’t diagnose many people—about 30%—who really will get depression. And the system has a “false positive” issue, Horvitz said, causing it to incorrectly predict that healthy Twitter users will get depression in about 10% of cases.
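To make the two error types concrete, here is a minimal sketch of how a miss rate and a false-positive rate are computed from a classifier's results. The counts are hypothetical, chosen only to mirror the rough 30% and 10% figures quoted above; they are not taken from the Microsoft study, and the function name is an illustration, not the researchers' code.

```python
def error_rates(true_pos, false_neg, false_pos, true_neg):
    """Return (miss_rate, false_positive_rate) from confusion-matrix counts."""
    # Miss rate: share of truly depressed users the model failed to flag.
    miss_rate = false_neg / (true_pos + false_neg)
    # False-positive rate: share of healthy users the model wrongly flagged.
    false_positive_rate = false_pos / (false_pos + true_neg)
    return miss_rate, false_positive_rate


# Hypothetical counts: 100 depressed users and 100 healthy users.
miss, fpr = error_rates(true_pos=70, false_neg=30, false_pos=10, true_neg=90)
```

The two rates are reported separately because they answer different questions: how many people at risk are overlooked, and how many healthy people are flagged unnecessarily.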

The Microsoft team found 476 Twitter users, 171 of whom were seriously depressed. They went back into users’ Twitter histories as far as a year in advance of their depression diagnosis, examining their tweets for language, level of engagement, mentions of certain medications, and other factors, using computer models to sift through a total of 2.2 million tweets. By comparing depressed Twitter users’ feeds with those of the non-depressed users in the sample, they came up with a method for predicting depression diagnoses before they happened. When they tested the model on a different set of Twitter users, it showed 70% accuracy in predicting depression before its onset.

Some tweets the scientists looked at in the depressed group pretty obviously indicate some level of emotional distress. For example, the study cited tweets like these from their depressed user group:

“Having a job again makes me happy. Less time to be depressed and eat all day while watching sad movies.”

“I want someone to hold me and be there for me when I’m sad.”

“‘Are you okay?’ Yes… I understand that I am upset and hopeless and nothing can help me… I’m okay… but I am not all right.”

Not all users’ feeds are so clear. Microsoft’s researchers looked at factors like the number of tweets users made per day, what time of day users tweeted, how often users interacted with each other, and what kind of language tweeters were using. For example, seemingly depressed tweeters were more likely to post messages late at night (between 9pm and 6am) compared with healthy tweeters, who were most active during the day and after work hours.
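One of these signals, the share of a user's tweets posted late at night, is simple to sketch. The example below is an illustration only, not the researchers' code: the function names are hypothetical, the 9pm-to-6am window comes from the description above, and the timestamps are assumed to already be in the user's local time.

```python
from datetime import datetime


def is_late_night(ts, start_hour=21, end_hour=6):
    """True if the timestamp falls between 9pm and 6am (window wraps midnight)."""
    return ts.hour >= start_hour or ts.hour < end_hour


def late_night_fraction(timestamps):
    """Share of tweets posted in the late-night window; 0.0 for an empty feed."""
    if not timestamps:
        return 0.0
    return sum(is_late_night(t) for t in timestamps) / len(timestamps)


# Hypothetical feed: two late-night tweets and two daytime tweets.
sample = [
    datetime(2014, 1, 20, 23, 15),
    datetime(2014, 1, 21, 3, 40),
    datetime(2014, 1, 21, 12, 5),
    datetime(2014, 1, 21, 18, 30),
]
fraction = late_night_fraction(sample)
```

A single number like this says little on its own; it would be one input among many to a model that looks at a feed over months.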

The team also noticed that certain isolated words in Twitter posts were characteristic of depression. Words like anxiety, severe, appetite, suicidal, nausea, drowsiness, fatigue, nervousness, addictive, attacks, episodes, and sleep were used by depressed users, but more surprisingly, words like she, him, girl, game, men, home, fun, house, favorite, wants, tolerance, cope, amazing, love, care, songs, and movie could be indications of depression as well.

The volume of tweets mattered too, as did the percentage of exchanges: users who are depressed begin to tweet less, and tweet less at other people, indicating a possible loss of social connectedness, said Horvitz. Of course, just because a Twitter user makes a post that includes the words fatigue and house at 4am, that doesn’t mean they’re depressed. The Microsoft team’s classifier looked at users’ feeds over long periods of time and incorporated many factors. A second Microsoft study that focused more on broader populations using slightly different methods achieved similar results, determining depression in tweets with around 70% accuracy.
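The "tweeting at other people" measure can be approximated in a few lines. This is a rough sketch under a stated assumption, namely that a tweet beginning with "@" is directed at another user; the function name and sample tweets are hypothetical and do not come from the study.

```python
def reply_share(tweets):
    """Share of tweets that open with an @-mention; 0.0 for an empty feed."""
    if not tweets:
        return 0.0
    # Assumption: a leading "@" marks a tweet aimed at another user.
    replies = sum(1 for text in tweets if text.lstrip().startswith("@"))
    return replies / len(tweets)


# Hypothetical feed: two directed tweets and two general posts.
sample = [
    "@friend see you tonight",
    "Long day.",
    "@coworker thanks!",
    "Can't sleep.",
]
share = reply_share(sample)
```

Tracking how this share changes over time, rather than its value on any one day, is closer to the kind of shift in social engagement Horvitz describes.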

One area of public health where this kind of research could come in handy is in measuring public reactions to events. Tracking public Twitter feeds after profound or traumatic events could help scientists understand how we’re affected by the news. “We really didn’t used to have many tools available traditionally for that kind of fine-grained analysis,” said Horvitz. “Now there’s a new direction for doing the science.”

McCormick, of the University of Washington, said part of the research he and his team are now doing will involve improving earlier Twitter depression models by weeding out false or misleading data and figuring out areas where depression-related data is being underreported. His team has also identified a group of first-year students at a number of colleges across the country based on their Twitter feeds (hashtags, posts relating to orientation) and is following them for “red flags” that could indicate emotional issues.

A study by the University of California, San Diego will also build on that research. Funded by the federal government’s National Institutes of Health, UCSD’s Michael Conway is creating models that will eventually track depression in communities and figure out how to apply mental health resources and better assess public health. “The ultimate goal of this work is to provide a cost-effective, real-time means of monitoring the prevalence of depression in the general population,” Conway said in an email.

In a post-Snowden era, privacy is a major concern facing any kind of mass-data collection. The Twitter users in the Microsoft study permitted Horvitz and his team to examine their tweets, but a possible future in which computer programs automatically sift through your tweets to make judgments on your health could understandably set off alarms with big data skeptics.

Conway’s team is looking at some of the tough ethical questions involved, by “investigating public attitudes towards the ethics of using social media for public health monitoring,” he says. “This ethical component of the work is particularly important given the evolving role of social media in society and concerns regarding the activities of the NSA.”

It may be some time before the research is developed enough for Twitter to warn individuals at risk for depression to seek help. Horvitz says part of what’s driven his research is the staggering number of suicides in the United States every year due to depression: 30,000. “If we can even save through interventions a few of those 30,000 people each year, it will make this research well worth it,” he said.
