Want to Regulate Facebook and User Data? Fix the Software

April 3, 2018 2:07 PM EDT

Yang is an Assistant Professor of Computer Science at Carnegie Mellon University.

Learning that someone has used your Facebook data for questionable purposes feels like telling a secret to a friend who turns out to be untrustworthy. While it makes sense to go after Facebook for being irresponsible and allowing this to happen, there is a deeper problem, one that is much bigger than Facebook.

The problem is that apps — like the one used by the company which handed over millions of Facebook users’ data to the political firm Cambridge Analytica — force consumers to make an all-or-nothing choice. Either we give up all of our data to the app, or we cannot use the app at all. The so-called dialog boxes that pop up when users download an app often offer little dialog at all.

Wouldn’t it be nice to live in a world in which you could specify exactly how — and for how long — your data may be used, no matter who the data is given to? Imagine if you could get a guarantee that the Yelp app uses your Facebook location information only while you are searching for somewhere to eat and then forgets your location immediately afterward. Or if you could know for sure that Snapchat will only share your camera data when you have taken a Snapchat photo, with exactly the users you intended to share it with, and for the number of seconds specified? (It’s unnerving that Facebook still reportedly has all of some users’ videos — even those they never published — which Facebook has said it is investigating.) Wouldn’t you feel much safer with a guarantee that if you take a personality quiz, the results will be shown only to you and friends you share them with — and not, say, sold to political strategists who are trying to influence major election outcomes? Even if it’s just about what cat you would be. (Full disclosure: I’ve been told I’m a Maine coon cat.) And instead of relying on apps to tell you what the choices are with respect to your data, imagine if you could specify with your data itself — for instance, in your phone settings that go with your camera or microphone — how much you want to limit it being shared. We can use the “Find My iPhone” feature to locate our phone, but we have no way of tracking down where our data goes once we release it to our apps.

While the Federal Trade Commission has been cracking down on software companies in the last few years, its role has largely been to ensure that companies do what they say they do with our data. Some U.S. states have stronger protections: for example, the selfie-matching feature of the Google Arts and Culture app, which matches a user’s selfie to portraits in paintings, was not available in Illinois and Texas, states that require applications to get explicit user consent when obtaining biometric data. (Yes, selfies can be mined for such things.) Some sectors have greater safeguards as well. In general, protections surrounding health data are stronger than protections for other consumer data: the Health Insurance Portability and Accountability Act requires, for instance, individual authorization for the sale of personal health information. In terms of stronger consumer data protection, the European Union will also soon put into place the General Data Protection Regulation (GDPR), which disallows sneaky opt-out tactics for data sharing, permits apps to ask only for data relevant to their own specific uses (this should keep, for instance, flashlight apps from asking for permission to delete other apps) and explicitly forbids using consumer data to uncover political opinions.

But a lack of legislative oversight is only one of the reasons we ended up giving Angry Birds permission to see who we call. The other part of the story is technical: given current software practices, checking security and privacy compliance is a mess. Code for enforcing security and privacy is tangled up with other code, making it hard for both developers and auditors to look at a code base and determine which policies are being enforced.

Think of the spread of sensitive information like that of a contagious disease. With disease, we can rely on self-tracking and quarantines: People will come to the doctor’s office because they’re sick; we can then keep them from infecting other people and ask who else they’ve been in contact with, to see if those people have been infected too. Data does none of this. It does not alert anyone when it has been compromised, for example when it is sent to someone who does not have our permission to possess it. This means we also have no way of knowing if it helped compromise others.

Worse still, because of software’s tangled nature, every line of code is an opportunity for contamination. In order to prevent data contamination, it is currently up to developers and auditors to keep track of how sensitive information spreads through potentially complex interactions within a piece of software. And given that an app like Facebook is 62 million lines of code — well, there are a lot of interactions to keep an eye on.

It is just about impossible to seriously regulate data use with these current practices in place. We need to build security and privacy controls into software tools. Researchers have been developing techniques for doing precisely this. Some techniques can, for instance, ensure that an app can read camera information but not send it across the network to anybody else. My research group, in collaboration with researchers at UC Santa Cruz, UC San Diego, Harvard and MIT, is working on a set of techniques that allow programmers to attach precise, complex rules about data use, like “only my friends near me can see my location between 9 a.m. Monday and 5 p.m. Friday,” directly to sensitive data values. Developers can then write these kinds of policies in one place, and auditors can check them by looking in a single location. (Full disclosure: Facebook has contributed funding to my research group, and we collaborate with two Facebook employees on a non-privacy related aspect of the work. I also worked on backend privacy at Facebook as an intern in 2012.) This work is part of a broader research effort: groups at places like Cornell, Stanford and MIT are also actively working on information flow security techniques for preventing these kinds of leaks. Requiring a software company like Facebook to use such techniques would make it much easier to enforce higher-level regulation.
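To make the idea of attaching a rule directly to a data value concrete, here is a minimal sketch in Python. It is an illustration only, not the research group's actual system: the names `Viewer`, `Protected`, and `friends_nearby_in_work_week` are hypothetical, and the 1 km threshold for "near me" is an assumption made for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Viewer:
    """Hypothetical description of whoever is asking to see the data."""
    name: str
    is_friend: bool
    distance_km: float


@dataclass(frozen=True)
class Protected(Generic[T]):
    """A sensitive value that carries its own policy and a safe fallback."""
    secret: T
    public: T
    policy: Callable[[Viewer, datetime], bool]

    def reveal(self, viewer: Viewer, now: datetime) -> T:
        # The policy is checked in exactly one place, at the point of output.
        return self.secret if self.policy(viewer, now) else self.public


def friends_nearby_in_work_week(viewer: Viewer, now: datetime) -> bool:
    """'Only my friends near me, between 9 a.m. Monday and 5 p.m. Friday.'"""
    day = now.weekday()  # Monday is 0, Sunday is 6
    after_start = day > 0 or now.hour >= 9
    before_end = day < 4 or (day == 4 and now.hour < 17)
    nearby = viewer.distance_km <= 1.0  # assumed meaning of "near me"
    return viewer.is_friend and nearby and after_start and before_end


location = Protected(
    secret="Forbes Avenue, Pittsburgh",
    public="location hidden",
    policy=friends_nearby_in_work_week,
)
```

Because the rule travels with the value, an auditor in this sketch would only need to read `friends_nearby_in_work_week` to know who can see the location, rather than searching every place in the code where the location is used.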

A world where consumers have control over their data privacy will require both regulators and companies to buy in. But switching to a model where consumers specify the permissions on their data across third-party apps would involve significant and coordinated infrastructure changes across companies. Creating the motivation for such a grand shift requires both legislative action and consumer demand for more ownership over our data. The reaction to GDPR has been promising, as companies have already begun to invest more in data protection, and consumer demand has been growing on an impressive scale since the Cambridge Analytica story came out. A good start would be to go through your phone and revoke permissions for, or even better delete, apps that request unreasonable permissions, like access to your call history or permission to delete other apps, when that access does not seem relevant to the functioning of the app. Consider it your civic duty to demand that only you and your connections know what kind of cat your best friend should be.
