While some social media users have learned not to take photographs at face value due to the power of image editing software like Photoshop, video — especially featuring a figure of authority like, say, the President of the United States — is generally accepted as authentic. But that may not be the case for much longer.
Researchers from the University of Washington have figured out how to synthesize photorealistic lip syncing using completely unrelated audio. To demonstrate the power of their project, they transformed the lip movements and gestures of former President Barack Obama in various speeches throughout his political career.
“Given audio of President Barack Obama, we synthesize a high quality video of him speaking with accurate lip sync, composited into a target video clip,” an excerpt from the video’s YouTube description reads. “Trained on many hours of his weekly address footage, a recurrent neural network learns the mapping from raw audio features to mouth shapes.
“Given the mouth shape at each time instant, we synthesize high quality mouth texture, and composite it with proper 3D pose matching to change what he appears to be saying in a target video to match the input audio track,” the description added.
In side-by-side comparisons of the original speeches and the re-synced footage, viewers can see both the differences and the striking similarity between the two Obama videos. While the videos differ in seating and lighting arrangements, the physical gestures and lip movements look realistic despite the two speeches originally containing completely different content.
While it’s easy to consider the ramifications of such technology for the fight against “fake news,” the researchers say it has a number of practical applications. Video synthesis could allow hearing-impaired people to lip-read from audio originally captured through a telephone, and the technology could also be used for special effects in film and video games.