Twitter and History: Future Scholars May Use Your Social Media Data

By Jason Steinhauer / The John W. Kluge Center at the Library of Congress

July 28, 2015 11:00 AM EDT

This post is in collaboration with The John W. Kluge Center at the Library of Congress, which brings together scholars and researchers from around the world to use the Library’s rich collections. The article below was originally published on the Kluge Center blog with the title “Preserving Social Media for Future Historians.”

Information scientist Katrin Weller’s research investigates how future historians might use social media as primary source materials, and how such materials should be preserved. One of two inaugural Kluge Fellows in Digital Studies, Weller was in residence at the Library of Congress from January to June 2015. She sat down with Jason Steinhauer to discuss her research and the prospect of creating a guide to using social media as historical resources.

Hi, Katrin. Your research investigates whether social media data will be primary source material for future historians. Will it be, and why or why not?

Social media data and other online communication data will surely be used by future historians to learn about our times. They won’t be the only source material, as current traditional sources will still remain. But social media are already being used as a new type of data source by contemporary scholars in various disciplines: political science, sociology, linguistics, communication science, geography, physics, computer science and many more. It is logical to assume that future historians will also look at these sources.

For what purpose will it be used and what might future scholars learn from this data?

Social media are used as a platform to discuss major events such as elections, political crises, natural disasters or cultural celebrations. For example, historians may want to discover the first people who reported live from what later became known as the Arab Spring. They will try to identify different locations of protest activities, such as during the Occupy Wall Street movement, based on communication in different social media channels. Social media are also used by numerous politicians and other public figures. Historians may want to retrace what Barack Obama said on Twitter during an election campaign and how people reacted to that in social media conversations.

Social media data can also be used to study aspects of everyday life, including popular culture, fashion, nutrition, health and well-being, or travel. Social media data open a window to everyday communication, little notes and observations, which remind us more of the spoken conversations that are typically ephemeral. It is fascinating to see thoughts on everyday life being shared on that large scale.

The only thing that would prevent this scenario is if the data are no longer accessible due to a lack of preservation efforts.

So what should we be thinking about now to ensure social media is preserved?

It’s a very good question and we still need a lot of work to answer it more comprehensively.

First of all, there is the more general topic of digital long-term preservation. We must ensure that storage devices remain intact and that we still have devices that allow us to run specific file formats. More technical challenges need to be solved for social media data, including how to handle their size and how to make them searchable.

Then there are legal and ethical challenges of archiving social media data. Usually data come from social media platforms, which are operated by large companies–such as Facebook or Yahoo–each of which has its own terms of service. Many social media data are not fully openly available, and access often depends on agreements with the respective companies, which may or may not have an interest in sharing access to their data or discussing archival strategies.

Third, all preservation strategies have to happen within a framework that ensures that the social media users–the people who have actually created the content within a social media platform–and their interests are protected. Here we need usable approaches to protect privacy, for example.

Finally, we need to start working on how to preserve relevant contextual information. A lot of the context of social media data quickly gets lost, but is important when we want to interpret the data. For example, the look and feel of a social media platform changes over time and it is already very difficult to trace how a specific social media platform looked two years ago, which buttons were placed where and which interactions were possible. But the look and feel highly influences how people use social media data and interact with one another.

Much of your research in Germany focuses on Twitter, and its role in documenting how significant events unfold in real time and how people respond to them. In your opinion, will Twitter be the barometer that future historians use to gauge what events and ideas were significant in our times, or will future historians decide that based on other sources, and then look to Twitter to gauge how we responded?

There currently are some connections between what is prominently discussed on Twitter and what makes it into the traditional news. Journalists have started to pick up trending topics from Twitter and are commenting on them on TV or news web sites–and of course Twitter users are commenting on events that make it into the news. The ability to connect users around topics rather than focusing on existing “friendship” connections distinguishes Twitter from other social networking sites such as Facebook and makes it special. Thus it will make sense to look at the relation of Twitter and traditional news in the future. In some cases it may be interesting to mine the whole collection of Twitter data for trends and interesting topics. But in most cases the decision about what is a significant event comes first–by looking at the broader picture of worldwide events and their connections–and social media sources like Twitter will subsequently be mined for reactions to those events.

Statista.com reports that, as of the first quarter of 2015, Twitter averaged 236 million monthly active users–impressive, but only about 3 percent of the world’s total population. How do we accurately gauge Twitter’s significance in being representative of contemporary thoughts and mores?

Exactly, that is what we have to constantly keep in mind. And the most critical argument here is not even the fact that only 3 percent use Twitter. It’s that we are aware that this is not a representative sample but indeed very biased. For example, we know very well that some countries are not represented through Twitter at all. There is the general phenomenon of the digital divide, with populations that are not online at all, and there are countries where other social networking sites are more frequently used or where Twitter may even be prohibited. When we look at tweets around Hurricane Sandy in 2012, they will tell us quite a lot about what was going on in New York City but nothing about what happened in Haiti around the same time. And even among Twitter users in the U.S., Twitter is not representative of the overall population. It is more frequently used by people of specific age groups and with specific backgrounds. That is why it is so important to generate information about user demographics, so that we can understand these biases.

Read the rest of this interview here, at the blog of the John W. Kluge Center at the Library of Congress

Katrin Weller is a senior researcher at GESIS Leibniz Institute for the Social Sciences in Cologne and the author of “Knowledge Representation in the Social Semantic Web.”
