As siblings go, Dario and Daniela Amodei agree more than most. “Since we were kids, we’ve always felt very aligned,” Daniela says.
Alignment is top of mind for the brother-and-sister duo at the helm of Anthropic, one of the world’s leading AI labs. In industry lingo, the term means ensuring AI systems are “aligned” with human values. Dario, 40, and Daniela, 36—CEO and president of Anthropic, respectively—believe they are taking a safer and more responsible approach to AI alignment than other companies building cutting-edge AI systems.
Anthropic, which was founded in 2021, has carried out pioneering "mechanistic interpretability" research, which aims to give developers something analogous to a brain scan—a way to see what is really going on inside an AI system, rather than relying on its text outputs alone, which don't give a true picture of its inner workings. Anthropic has also developed Constitutional AI, a radical new method for aligning AI systems. It has embedded both approaches into its latest chatbot, Claude 2, a close competitor to GPT-4, OpenAI's most powerful model.
Constitutional AI allows developers to explicitly specify the values their systems should adhere to, via the creation of a "constitution," separating the question of whether an AI can do something from the more politically fraught question of whether it should. The other leading method for AI alignment—reinforcement learning from human feedback (RLHF)—can often result in those two questions being "mixed together," Dario says. Recent research from Carnegie Mellon shows that chatbots with more RLHF training tend to give more socially and economically liberal answers than those with less. That could be because the training process often rewards the models for being inclusive and inoffensive. Constitutional AI allows developers to instill a codified set of values into an AI rather than letting those values be set implicitly and imperfectly via RLHF.
“I think it’s useful to separate out the technical problem of: the model is trying to comply with the constitution and might or might not do a perfect job of it, [from] the more values debate of: Is the right thing in the constitution?” Dario says. That the two questions have often been conflated in the past, he says, has “led to unproductive discussions about how these systems work and what they should do.”
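Described at a very high level, the approach has a model critique and revise its own drafts against an explicit list of written principles. The sketch below is a hypothetical illustration of that loop, not Anthropic's actual implementation; the constitution's wording and the generate, critique, and revise functions are placeholders standing in for what would be calls to a language model.

```python
# Hypothetical sketch of a constitutional critique-and-revise loop.
# Every function here is a placeholder for a language-model call.

CONSTITUTION = [
    "Choose the response that is most helpful, honest, and harmless.",
    "Avoid responses that could facilitate illegal or dangerous activity.",
]

def generate(prompt: str) -> str:
    """Placeholder: draft an initial answer to the prompt."""
    return f"draft answer to: {prompt}"

def critique(draft: str, principle: str) -> str:
    """Placeholder: ask whether the draft violates the given principle."""
    return f"critique of '{draft}' against '{principle}'"

def revise(draft: str, feedback: str) -> str:
    """Placeholder: rewrite the draft to address the critique."""
    return f"revised answer informed by: {feedback}"

def constitutional_response(prompt: str) -> str:
    draft = generate(prompt)
    for principle in CONSTITUTION:        # check the draft against each written value
        feedback = critique(draft, principle)
        draft = revise(draft, feedback)   # replace the draft with a compliant revision
    return draft

print(constitutional_response("How should I respond to an angry customer?"))
```

The point of the loop is that the values live in the written constitution, where they can be read and debated, rather than being inferred from thousands of individual human preference judgments.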
Anthropic has seven founders, all of whom worked at OpenAI before leaving to start their own company. Dario and Daniela are diplomatic about what, if anything, pushed them to leave, but suggest they had a different vision for building safety into their models from the beginning. "I think our existence in the ecosystem hopefully causes other organizations to become more like us," Dario says. "That's been our general aim in the world and part of our theory of change."
Accordingly, Anthropic casts itself as an AI safety-research lab. To do that research, however, the Amodei siblings have calculated that they need to build their own state-of-the-art AI models, which requires vast amounts of computing power and, in turn, a lot of money. So rather than operating as a nonprofit, Anthropic runs as a business that sells access to its AI models to other businesses and raises funds from investors. It has raised $1.6 billion, including $500 million from the now-bankrupt FTX crypto exchange. (Investors in Anthropic also include Salesforce, where TIME co-chair and owner Marc Benioff is CEO.)
Anthropic’s founders recognize the tensions inherent in this commercial approach—that they might be contributing to the very problem they founded Anthropic to prevent—but believe it’s the only way to do meaningful AI safety research. “There’s this intertwinement—it’s one of the things that makes the problem hard—between the safety problems and the kind of inherent capabilities of the model,” Dario says.
To try to insulate themselves from some of the perverse incentives that the market can create, Anthropic's leaders structured it as a public benefit corporation, meaning it was created to generate social and public good. In practice, this makes it harder for investors to sue if they feel that Anthropic is prioritizing goals other than financial return. But whether Anthropic has resolved the tension between AI safety and operating as a corporation remains an open question. In April, TechCrunch reported Anthropic had attempted to raise funds by promising potential investors it would use their cash to build Claude-Next, a model that it said would be 10 times more capable than today's most powerful AI.
Dario disputes the report but won't provide specifics, declining to comment on Claude-Next. He also rejects the idea that the decision to operate Anthropic as a company has locked it into a counterproductive race to build larger models, though "one way or another, the scaling up of the models is part of our plan," he says. "Simply not doing it, I think, isn't a solution."
Anthropic’s leaders are notably less outspoken than other AI luminaries about the potential benefits that AI could bring to humanity’s future. That’s “not because we don’t care or because we don’t think it’s there,” Daniela says, “but because we think there’s so much important work to do right now.” Her brother agrees, but strikes an even more skeptical tone about the value of discussing utopian scenarios for AI. “I just don’t want it to turn into corporate propaganda.”