Facebook has long been able to recognize the people in your photos and sort images by where they were taken. But it hasn’t been as precise at understanding what’s actually happening in a photo. That’s now beginning to change, thanks to new developments in the Menlo Park, Calif.-based company’s artificial intelligence software.
Facebook says the new tech will improve its user experience in two ways. First, it’ll make it possible to search Facebook for photos based on what’s in them, rather than just by date taken, tags, or location. If you’re trying to find a photo of a paella dish you cooked last year, for example, you’ll be able to simply type “paella” in the Facebook search bar. This, Facebook hopes, will help its users quickly find images without having to remember when they were taken or how they were tagged.
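Facebook hasn’t detailed how its photo search works internally. As a rough sketch of the general idea, suppose a classifier has already attached content labels to each photo. Text search can then reduce to an inverted index that maps each label to the photos it appears in. The photo IDs, labels, and the `build_index` and `search` helpers below are hypothetical.

```python
from collections import defaultdict

# Hypothetical classifier output: photo ID -> predicted content labels.
# These IDs and labels are illustrative assumptions, not Facebook data.
PREDICTED_LABELS = {
    "photo_001": ["paella", "food", "pan"],
    "photo_002": ["person", "guitar", "stage"],
    "photo_003": ["paella", "person", "table"],
    "photo_004": ["beach", "person"],
}


def build_index(predictions):
    """Map each predicted label to the set of photo IDs it was detected in."""
    index = defaultdict(set)
    for photo_id, labels in predictions.items():
        for label in labels:
            index[label.lower()].add(photo_id)
    return index


def search(index, query):
    """Return photo IDs whose predicted labels include every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    # Start from the photos matching the first word, then intersect the rest.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)


index = build_index(PREDICTED_LABELS)
print(search(index, "paella"))         # ['photo_001', 'photo_003']
print(search(index, "person guitar"))  # ['photo_002']
```

Because the lookup runs over predicted labels rather than dates or manual tags, a user doesn’t need to remember when or how a photo was tagged. The results can only be as good as the classifier that produced the labels.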
Second, the upgrade will improve Facebook’s automatic alt text feature, which describes photos aloud to the visually impaired. Before the update, Facebook could describe a photo’s subjects on a rudimentary level – when describing a concert photo, for instance, Facebook might say the shot contains a person, a stage, and a guitar. After the update, Facebook will be able to tell users the specific action that’s occurring in a scene, like “this is a picture of a person playing guitar on stage.” That might seem like a minor upgrade, but it’s a big step forward for image-identification software.
Facebook had previously said it was working on improved photo-recognition technology, but the new search capability is only now beginning to roll out publicly. Services from other technology firms, like Google and Apple, also let users search their photos by content.
The special sauce powering Facebook’s new technology is its computer vision engine, called Lumos. Lumos analyzes the troves of images shared to the social platform each day, giving it plenty of data to crunch and learn from. Lumos relies on a machine-learning technique known as “neural networks,” which are loosely modeled on the way the human brain works. Among many other tasks, neural networks can be trained to recognize specific pieces of information – a network designed to recognize images, for example, would learn how to identify a cat after being shown thousands of photos of different cats.
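To make “learning from examples” concrete, here is a minimal sketch of a single artificial neuron (logistic regression) trained by gradient descent. It is not Lumos. By assumption, each “photo” is reduced to a synthetic two-number feature vector labeled 1 for cat and 0 for not cat, whereas real image networks learn from raw pixels through many layers.

```python
import math
import random

# Synthetic stand-in for labeled photos: cats cluster around (2, 2),
# non-cats around (-2, -2). Purely illustrative data.
rng = random.Random(0)


def make_example(is_cat):
    center = 2.0 if is_cat else -2.0
    features = [rng.gauss(center, 1.0), rng.gauss(center, 1.0)]
    return features, 1 if is_cat else 0


data = [make_example(i % 2 == 0) for i in range(2000)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


weights = [0.0, 0.0]
bias = 0.0
learning_rate = 0.1

# Show the neuron every example several times, nudging the weights each time.
for epoch in range(5):
    for features, label in data:
        prediction = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
        error = prediction - label  # the error signal that drives learning
        weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
        bias -= learning_rate * error


def predict(features):
    """Return 1 ("cat") if the neuron's output is at least 0.5, else 0."""
    score = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
    return 1 if score >= 0.5 else 0


accuracy = sum(predict(f) == y for f, y in data) / len(data)
print(f"training accuracy: {accuracy:.3f}")
```

The same loop, in which the model predicts, measures the error, and adjusts its weights, is what deep networks repeat at far larger scale and across many layers.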
Joaquin Candela, director of Facebook’s Applied Machine Learning team, says that recognizing specific actions, like running or jumping, requires a deeper neural network. But deeper networks are harder to train: the more layers a network has, the harder it becomes for error signals, which the software needs in order to tell correct answers from incorrect ones, to propagate back through every layer.
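A toy calculation shows why depth makes training harder. For illustration, assume a “network” that is just a chain of single-unit layers, each computing h → tanh(w·h). By the chain rule, the error signal reaching the input is a product of one factor per layer. Here each factor is at most w, so with w = 0.9 the signal shrinks roughly geometrically as layers are added. The weight and input values are arbitrary assumptions.

```python
import math


def plain_gradient(depth, weight=0.9, x=0.5):
    """d(output)/d(input) for a chain of `depth` toy layers h -> tanh(weight * h)."""
    h = x
    grad = 1.0
    for _ in range(depth):
        h = math.tanh(weight * h)
        # Chain rule: multiply by this layer's local derivative,
        # weight * (1 - tanh(weight * h_prev)^2), which is at most `weight`.
        grad *= weight * (1.0 - h * h)
    return grad


for depth in (5, 20, 50):
    print(depth, plain_gradient(depth))
```

At 50 layers the error signal reaching the first layer is smaller than 0.9⁵⁰ (about 0.005), so early layers barely learn.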
To solve this problem, Facebook is using a “residual network,” which makes it possible to send error signals deeper into a network, says Candela. “By doing that, you open up the possibility to train networks of a depth that have never been trained before,” he adds.
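The residual idea can be sketched with a scalar toy, assuming each layer adds its output to its input (h → h + tanh(w·h)) instead of replacing it. The identity “shortcut” contributes a 1 to each layer’s local derivative, so for a non-negative weight every factor in the chain-rule product is at least 1 and the error signal cannot fade toward zero as depth grows. Real residual networks wrap this shortcut around blocks of convolutional layers rather than a single number. The weight and input values are arbitrary assumptions.

```python
import math


def residual_gradient(depth, weight=0.9, x=0.5):
    """d(output)/d(input) for `depth` toy residual layers h -> h + tanh(weight * h)."""
    h = x
    grad = 1.0
    for _ in range(depth):
        t = math.tanh(weight * h)
        # Local derivative: 1 from the identity shortcut, plus the tanh branch's
        # term weight * (1 - t^2), which is non-negative when weight >= 0.
        grad *= 1.0 + weight * (1.0 - t * t)
        h = h + t
    return grad


for depth in (5, 20, 50):
    print(depth, residual_gradient(depth))
```

With these defaults the printed gradients stay above 1 at every depth. A plain chain with the same weight would see its gradient shrink toward zero, since each of its per-layer factors is at most 0.9.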
Neural networks may be able to process billions of images at lightning-fast speeds, but they’re a long way from understanding single images as well as humans can. That’s largely because their knowledge is limited to the data on which they were trained. A neural network may have been shown hundreds of kinds of chairs, for instance, but it might get stuck when trying to identify a type of chair it’s never seen. A person, on the other hand, could use context clues (e.g., “oh, there’s a person sitting on it”) to quickly identify a previously unknown chair.
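One way this limitation shows up can be sketched with a hypothetical nearest-centroid classifier. It can only answer with labels it was trained on, so an unfamiliar object gets forced into a known category unless the model is also allowed to say “unknown.” The chair categories, feature vectors, and distance threshold below are all illustrative assumptions, and the sketch does not model the context clues a person would use.

```python
import math

# Hypothetical average feature vectors for chair types seen during training.
KNOWN_CENTROIDS = {
    "office chair": (1.0, 0.0),
    "dining chair": (0.0, 1.0),
    "armchair": (1.0, 1.0),
}


def classify(features, max_distance=None):
    """Return the closest known label, or "unknown" if it is too far away."""
    label, dist = min(
        ((name, math.dist(features, centroid)) for name, centroid in KNOWN_CENTROIDS.items()),
        key=lambda pair: pair[1],
    )
    if max_distance is not None and dist > max_distance:
        return "unknown"
    return label


bean_bag = (5.0, -4.0)  # a hypothetical chair type unlike anything seen in training
print(classify(bean_bag))                    # office chair
print(classify(bean_bag, max_distance=2.0))  # unknown
```

Without a threshold, the model confidently picks the nearest familiar category. With one, it can at least flag the input as unfamiliar, though it still cannot work out what the object is.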
There’s no doubt that artificial intelligence’s abilities are progressing faster than many observers expected. Still, it’s unclear when, if ever, computers will be as good as humans at recognizing images. Candela, however, remains optimistic that new innovations in AI will enable further image-recognition developments. “I think it’s going to be even more exciting as we keep making progress on what we call semantic segmentation, or a semantic understanding of images,” he says. “It’s not only detecting the objects and what’s going on, but also understanding the relations between things and bringing common sense into it.”