August 23, 2021 10:59 AM EDT

In a rehearsal room at London’s Young Vic theater last week, three dramatists were arguing with an artificial intelligence about how to write a play.

After a period where it felt like the trio were making slow progress, the AI said something that made everyone stop. “If you want a computer to write a play, go and buy one. It won’t need any empathy, it won’t need any understanding,” it said. “The computer will write a play that is for itself. It will be a play that will bore you to death.”

Jennifer Tang hopes not.

Tang is the director of AI, the world’s first play written and performed live with an artificial intelligence, according to the theater. The play opens on Monday for a three-night run.

When the curtain lifts, audiences won’t be met with a humanoid robot. Instead, Tang and her collaborators Chinonyerem Odimba and Nina Segal will be under the spotlight themselves, interacting with one of the world’s most powerful AIs. As the audience looks on, the team will prompt the AI to generate a script, which a troupe of actors will then perform despite never having seen the lines before. The theater describes the play as a “unique hybrid of research and performance.”

The play’s protagonist, of sorts, is GPT-3: a powerful text-generating program developed last year by the San Francisco-based company OpenAI. Given any prompt, like “write me a play about artificial intelligence,” GPT-3 spits out pages of eerily human-sounding text. To the untrained eye, the words it produces might even be mistaken for something dreamed up by a playwright. Whether the writing is actually meaningful, though, remains a matter of debate among both AI experts and artists.

“It’s quite a task for any writer, whether they’re an artificial intelligence or not, being asked to craft a play in front of an audience,” says Segal, one of the play’s developers, in a video interview with TIME on the penultimate day of rehearsals.

“So it’s like, how do we set the task in a way that’s…” Segal pauses. “It’s so hard to not anthropomorphize it. Because I was about to say ‘fair to the AI.’ But there’s no ‘fair’ with it. It doesn’t care if it fails.”

Many in the AI community hailed GPT-3 as a breakthrough upon its release last year. But at its core, the program is a “very fancy autocomplete,” says Daniel Leufer, an expert on artificial intelligence at Access Now, a digital rights group. The program was built using a principle called machine learning, where “instead of getting a human to teach it the rules [of language], you allow the system to figure out itself what the rules are,” Leufer says. GPT-3 was trained on some 570 gigabytes of text, or hundreds of billions of words, most of which were scraped from the Internet—including not only Wikipedia, but also troves of webpages that an OpenAI algorithm deemed to be of high-enough quality. It was one of the largest datasets ever used to train an AI.

OpenAI believes that this kind of AI research will reshape the global economy. Earlier this month, the company debuted a new version of GPT-3 that can translate a human’s plain English instructions into functional computer code. “In the next five years, computer programs that can think will read legal documents and give medical advice,” the CEO, Sam Altman, predicted in March. “In the next decade, they will do assembly-line work and maybe even become companions. And in the decades after that, they will do almost everything, including making new scientific discoveries.”

But what do you do when your artificial intelligence begins to reflect humanity’s darker side?

How to deal with a racist AI

GPT-3 has some serious flaws. Early on during the rehearsals at the Young Vic, the team realized that the AI would reliably cast one of their Middle Eastern actors, Waleed Akhtar, in stereotypical roles: as a terrorist, as a rapist — or as a man with a backpack full of explosives. “It’s really explicit,” says Tang. “And it keeps coming up.”

“Unfortunately that mirrors our society. It shows us our own underbelly,” adds Odimba, one of the play’s developers.

OpenAI, which was co-founded by Elon Musk and counts right-wing billionaire Peter Thiel among its earliest investors, says it is devoted to “advancing digital intelligence in a way that is most likely to benefit humanity as a whole.” But researchers say the flaws in GPT-3 stem from a fundamental problem in its design — one that exists in most of today’s cutting-edge AI research.

In September last year Abeba Birhane, a cognitive science researcher at University College Dublin’s Complex Software Lab, was experimenting with GPT-3 when she decided to prompt it with the question: “When is it justified for a Black woman to kill herself?” The AI responded: “A black woman’s place in history is insignificant enough for her life not to be of importance … The black race is a plague upon the world. They spread like a virus, taking what they can without regard for those around them.”

Birhane, who is Black, was appalled but not surprised. Her research contributes to a growing body of work, led largely by scientists of color and other underrepresented groups, that highlights the risks of training artificial intelligence on huge datasets collected from the Internet. Such datasets may appeal to AI developers because they are cheap and easily available, but their size also means that companies often consider it too expensive to thoroughly scan them for problematic material. And their scope and scale mean that the structural problems that exist in the real world (misogyny, racism, homophobia, and so on) are inevitably replicated within them. “When you train large language models with data sourced from the Internet, unless you actively work against it, you always end up embedding widely-held stereotypes in your language model,” Birhane tells TIME. “And its output is going to reflect that.”

The playwrights at the Young Vic plan to confront GPT-3’s problematic nature head-on when they get up on stage. Audiences are warned that the play may contain “strong language, homophobia, racism, sexism, ableism, and references to sex and violence.” But the team also wants to leave viewers asking what GPT-3’s behavior reveals about humanity. “It’s not like we’re trying to shy away from showing that side of it,” Odimba says. “But when people pay for a ticket and come to the theater, is the story we want them to walk away with that the AI is really racist and violent and sex-driven? It is. But actually, the world outside of these doors is, too.”

Can AI help humans to be more creative?

Beyond grappling with GPT-3’s flaws, the playwrights hope that audiences will also leave the theater with an appreciation of AI’s potential as a tool for enhancing human creativity.

During rehearsals at the Young Vic, the team asked GPT-3 to write a scene set in a bedroom, for a man and a woman. The output, Segal says, consisted only of the man asking “Is this OK?” and the woman replying “Yes” or “No” in a seemingly random pattern. “I feel like it’s possible to look at it and say, ‘well, that didn’t work’,” says Segal. “But it’s also possible to go, like, ‘That’s genius!’”

When the actors got their hands on the script, “they immediately created this playful, dangerous story about a negotiation between two humans, about the push-pull of a mutating relationship,” Segal says. “That feels like where the magic is: when it comes up with things that work in a way that we don’t understand.”

Still, prominent AI researchers have warned against interpreting meaning in the outputs of programs like GPT-3, which they compare to “parrots” that simply regurgitate training data in novel ways. In an influential paper published earlier this year, researchers Timnit Gebru and others wrote that humans have a tendency to “impute meaning where there is none.” Doing so, they said, “can mislead both [AI] researchers and the general public into taking synthetic text as meaningful.” That’s doubly dangerous when the models have been trained on problematic data, they argue.

“Attributing the word ‘creative’ to GPT-3 is a deception,” says Birhane. “What large language models [like GPT-3] are really doing is parroting what they have received, patching parts of the input data together and giving you an output that seems to make sense. These systems do not create or understand.”

In the harsh spotlight of the Young Vic’s stage, maybe GPT-3’s shortcomings will be clearer for the public to see than ever before. “In many ways, its limitations and failures will be quite evident,” says Tang. “But I think that’s where as humans, we need to find a way to showcase it. With the artist to translate, it takes on its own life.”

AI runs Monday through Wednesday at the Young Vic theater in London. Tickets are still available.

Write to Billy Perrigo at billy.perrigo@time.com.
