
A New Yorker cartoon shows a couple sitting atop pieces of rubble, facing each other in an apocalyptic wasteland. Which is the better caption: “Oh well, we’ve survived worse,” or “I’d like to see other people”?

Your guess is better than ChatGPT’s. Last year, Yejin Choi, professor of computer science at the University of Washington and a 2022 recipient of the prestigious MacArthur “genius” grant, co-authored an award-winning paper that tested whether a selection of advanced AI systems could guess the winning entries in New Yorker caption contests and explain why the best caption is funny. According to Choi, cartoon caption writers’ jobs are safe—at least, for now.

Choi’s research focuses on the many ways in which human intelligence differs from that of AIs like ChatGPT. “A calculator can calculate better and faster than I do,” she says, “but it doesn’t mean that a calculator is superior to any of us in other dimensions of intelligence.”

Choi, who was born in South Korea and moved to the U.S. in 2000 to work for Microsoft, has spent much of her career researching whether AI systems can develop common sense and humor. More recently, Choi, 46, has taken an interest in building AI systems that understand social and moral norms. “It started with my interest in equity and diversity,” she says. But Choi soon realized that moral norms are also relevant to alignment—the problem of making sure that AI systems behave as their creators intend. Those concerned about alignment often worry that a rogue AI might develop harmful goals and end up killing humans in the pursuit of those goals. But doing so would clearly be a violation of moral norms, says Choi. It’s not easy, though, to understand how we develop our own sense of morality, she says. “It’s mysterious how humans acquire that.”

To begin with, humans learn right from wrong by instruction, Choi says. This is, in some sense, how many large language models, like the one that powers OpenAI’s ChatGPT, are taught to behave too, in a process called reinforcement learning from human feedback (RLHF). But Choi says the fact that a cleverly crafted instruction can cause an AI system to spit out a problematic response—and that the current training model is “a sort of patchwork or whack-a-mole at scale”—should lead us to keep searching for “a better, more robust, simple solution.”

Choi now spends much of her time thinking about the missing components for teaching moral values to AI systems. But, she cautions, even if this is solved, there are further problems to address. “Alignment assumes that there’s one mathematical objective that you can optimize for,” she says. “But I don’t think human society is like that, due to the fact that we’re just so different from each other.” Given the wide range of cultural norms—even within the same communities, different generations hold very different values—there’s no one correct solution to optimize for, Choi argues. “Somehow we need to figure out how to support these pluralistic values that different individuals have.”

Write to Will Henshall at will.henshall@time.com.