Inside Anthropic, the AI Company Betting That Safety Can Be a Winning Strategy

By Billy Perrigo/San Francisco
Photograph by Ian Allen for TIME

In the summer of 2022, Dario Amodei had a difficult decision to make. Anthropic, the AI company where he is co-founder and CEO, had just finished training a new chatbot that was far more powerful than anything he had seen before. The team working on it at Anthropic’s San Francisco headquarters were in awe of their creation, which they christened Claude.

Releasing Claude, Amodei knew, could earn fame and fortune for Anthropic, a roughly 50-person startup that had only launched the previous year. But he was worried about the potential consequences of unleashing the AI upon the world—so worried, in fact, that he ultimately decided not to, opting instead to continue internal safety testing. Some three months later, a rival lab, OpenAI, launched a similar product called ChatGPT. It set off a frenzy of hype and investment that has reshaped the entire tech industry.

Many Silicon Valley entrepreneurs would see that kind of missed opportunity as the regret of a lifetime. But for Amodei, it was about more than business: he wanted to avoid triggering a race to build bigger, and perhaps more dangerous, AI systems. “I suspect it was the right thing to do,” says Amodei, 41, twirling a lock of curly dark hair between his fingers during a two-hour interview in May. “But it’s not totally clear-cut.”


His uncertainty is understandable, given that a race began anyway and that his decision likely cost Anthropic billions of dollars. But ChatGPT woke regulators up to something Amodei had been worrying about for years: that advanced AI could, if handled poorly, be catastrophically risky. Last July, Amodei testified in front of Senators in Washington, D.C.—arguing that systems powerful enough to “create large-scale destruction” and change the balance of power between nations could exist as soon as 2025.

Others, including OpenAI CEO Sam Altman, had made similar warnings. But many in the AI-safety community felt Amodei had greater credibility, viewing Anthropic’s decision to withhold Claude as a signal of its commitment to prioritizing safety over money and acclaim. The lab was an underdog: the smallest of all the companies building “frontier” AI systems, the youngest, the least well-financed, and the most expressly committed to safety. This reputation has mostly endured, even as Anthropic has raised more than $7 billion from investors including Amazon and Google, expanded to around 500 employees, and launched three generations of its Claude chatbot. (Salesforce, where TIME co-chair and owner Marc Benioff is CEO, has also invested.)

Claude 3, which Anthropic released in March, was by some measures the most capable publicly available AI system at the time, outperforming OpenAI’s GPT-4 and Google’s Gemini. That put Anthropic in the curious position of having a reputation as the most cautious AI company, while also owning—and selling access to—one of today’s most advanced versions of the technology. Three days spent at Anthropic’s headquarters, and interviews with Amodei and nine senior employees, made it clear they don’t see that as a contradiction. “We’re not a company that believes a certain set of things about the dangers that AI systems are going to have,” Amodei says. Figuring out what those dangers really are is “an empirical question”—one he sees as Anthropic’s mission to answer with evidence. That, he says, requires building and studying powerful systems.

Amodei makes the case that the way Anthropic competes in the market can spark what it sees as an essential “race to the top” on safety. To this end, the company has voluntarily constrained itself: pledging not to release AIs above certain capability levels until it can develop sufficiently robust safety measures. Amodei hopes this approach—known as the Responsible Scaling Policy—will pressure competitors to make similar commitments, and eventually inspire binding government regulations. (Anthropic’s main competitors OpenAI and Google DeepMind have since released similar policies.) “We’re not trying to say we’re the good guys and the others are the bad guys,” Amodei says. “We’re trying to pull the ecosystem in a direction where everyone can be the good guy.”

***

Growing up in an Italian-American family in San Francisco, Amodei displayed precocious talent from an early age. As a toddler, he would declare “counting days” and strive to count as high as he could, his sister Daniela recalls their mother saying. By the 11th grade, Dario was taking undergrad math classes at the University of California, Berkeley—but unlike many kids who excel at quantitative subjects, he “was equally interested in the arc of human events,” says Daniela, who is Anthropic’s president and co-founder. The young siblings grew up hearing stories of how, in the 1930s, their maternal grandmother had chained herself to the Italian consulate in Chicago to protest the country’s invasion of Ethiopia. “We thought about and cared about: do people in other parts of the world have what we have?” Daniela says of the family’s attitude. The pair “both felt this immense responsibility for wanting to make the world better,” she recalls.

After a physics Ph.D. at Princeton, Amodei became a machine-learning researcher. In 2016 he joined OpenAI, where he helped discover the so-called scaling laws—which showed empirically that better performance could be achieved by training AI systems with more data and computing power, rather than relying on new algorithms. Amodei grew concerned that those factors, combined with market incentives, could undermine safety. “We’re building a technology that’s powerful and potentially dangerous,” he says. “It’s built from simple components. And anyone can build it if they have enough money.”
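
Those scaling laws are, roughly, power laws: as training compute and data grow by orders of magnitude, a model’s error falls smoothly and predictably. A minimal sketch of that shape is below; the constants are invented for illustration and are not the fits reported in any published paper.

```python
def scaling_law_loss(compute_flops, a=2.5, alpha=0.05):
    """Illustrative power law: loss falls smoothly as training compute grows.

    The constants `a` and `alpha` are made up for this sketch; the actual
    scaling-law papers fit their own values from large training runs.
    """
    return a * compute_flops ** (-alpha)

# Each 10x jump in compute buys a smaller, but predictable, drop in loss.
for flops in (1e18, 1e19, 1e20, 1e21, 1e22):
    print(f"{flops:.0e} FLOPs -> loss ~ {scaling_law_loss(flops):.3f}")
```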

In 2020, Amodei and several colleagues discussed leaving OpenAI, which had just signed a $1 billion deal with Microsoft. Amodei, then vice president for research, distrusted Altman and president Greg Brockman, according to one person who spoke with Amodei at the time. In late 2020, he and six senior staff resigned, and founded Anthropic in early 2021. Seven more OpenAI staff joined soon after. Asked about his reasons for leaving, Amodei is diplomatic. “It all comes down to trust, and having the same values and the same mission alignments,” he says of his co-founders. “We were on the same page. We trusted each other. We were doing this for the right reasons.” Asked if this means he did not trust others at OpenAI, Amodei declines to comment.

Several of Anthropic’s initial employees and funders had ties to effective altruism (EA), a philosophy and movement popular in Silicon Valley that aims to do the most good in the world by using quantitative methods. Effective altruists were some of the earliest people to take seriously the study of catastrophic risks from AI, and many in the AI safety community—though not all—subscribe to the philosophy to varying degrees. EA has become more controversial in the last 18 months, in part because of disgraced cryptocurrency mogul Sam Bankman-Fried, who identified as an EA and is currently serving a 25-year prison sentence for fraud. Through his firm FTX, Bankman-Fried invested $500 million in Anthropic. (The majority of FTX’s stake was sold in March to a consortium of investors; the rest is held by the FTX estate, which has a mandate to make defrauded investors whole.) Some of Anthropic’s earliest funding came from other EA-affiliated investors, including Facebook co-founder Dustin Moskovitz and Skype co-founder Jaan Tallinn. Ties to effective altruism probably go deeper at Anthropic than they do at rival AI labs, though the movement’s stamp on the company appears to have waned as Anthropic has grown to more than 500 people. Neither Dario nor Daniela Amodei has ever personally identified as an EA, a company spokesperson said, but added that the siblings are “clearly sympathetic to some of the ideas that underpin effective altruism.”

From left: Geoffrey Irving and Dario Amodei show an autonomous system that has taught itself to play video games, at OpenAI in San Francisco, on July 10, 2017. Christie Hemm Klok—The New York Times/Redux

In any case, their belief in the transformative nature of AI led Anthropic’s co-founders to structure their new company differently from the one they’d departed. Anthropic is a public benefit corporation, meaning its board is legally empowered to balance returns for investors with a separate mission to ensure that “transformative AI helps people and society flourish.” A separate body of experts in international development, AI safety, and national security, called the Long Term Benefit Trust, has the power to elect and fire a subset of the board: currently one out of five, rising to three out of five by November. (The trust’s members have no equity in the company.) Amodei argues this system aligns the interests of the public, employees, and shareholders, in a way that doesn’t compromise Anthropic’s stability, giving it greater leeway to sacrifice profits if it judges doing so is necessary for safety. “We mostly run the business like normal,” Amodei says, “but when we run into something that affects people outside the market transaction who didn’t consent to that transaction, we’re able to do the right thing.” Still, while the structure is different from OpenAI’s, power ultimately lies with a small, unaccountable group. And while board members are somewhat shielded from shareholder lawsuits, it’s unclear whether the public could sue Anthropic’s board members for not prioritizing safety.

Read More: How Anthropic Designed Itself to Avoid OpenAI’s Mistakes

A fundamental fact underpins most worries about today’s machine-learning systems: they are grown, not designed. Instead of writing explicit code, computer scientists feed huge amounts of data into neural networks, which are pattern-matching systems. With enough data and computing power, neural networks learn—nobody knows exactly how—to speak, do arithmetic, recognize concepts, and make logical connections. But look inside, and all you see is a bunch of inscrutable numbers. “People are often surprised that we don’t understand these systems,” says Chris Olah, an Anthropic co-founder who leads the lab’s interpretability team. “The core reason is because we grow them, rather than create them directly.”
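
A toy version of that process makes the point concrete. The sketch below “grows” a tiny network to learn a trivial function from four examples; the only artifact it leaves behind is a pile of floating-point weights that say nothing legible about what was learned. It is a minimal illustration under toy assumptions, not how frontier models are actually trained.

```python
import numpy as np

# "Grow" a tiny two-layer network to learn XOR from examples.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)   # hidden-layer weights
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # output-layer weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.3
for step in range(30000):
    # Forward pass: the network's "behavior" is just these matrix products.
    h = np.tanh(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: nudge every weight to reduce the prediction error.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * (1 - h ** 2)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(out.round(2).ravel())  # typically close to [0, 1, 1, 0]: it learned XOR
print(W1)                    # ...but the learned "knowledge" is just inscrutable numbers
```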

AI companies including Anthropic are now scaling at a breakneck pace, raising the question of what new capabilities might emerge. Today, researchers seeking to assess if an AI is safe chat with it and examine the outputs. But that approach fails to address the concern that future systems could conceal their dangerous capabilities from humans. “What we’d like to be able to do is look inside the model as an object—like scanning the brain instead of interviewing someone,” Amodei says. In a major breakthrough toward that goal, Anthropic announced in May that researchers had identified millions of “features”—combinations of artificial neurons representing individual concepts—inside a version of Claude. By toggling those features on and off, they could alter Claude’s behavior. This new strategy for addressing both current and hypothetical risks has sparked a wave of optimism at Anthropic. Olah says Anthropic’s bet that this research could be useful for safety is “now starting to pay off.”
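
The published work finds those features by running dictionary-learning methods (sparse autoencoders) over a model’s internal activations; each feature is, loosely, a direction in activation space, and “toggling” one means pushing the activations along that direction. The sketch below is a heavily simplified, hypothetical illustration of that steering idea, with made-up vectors standing in for real activations and a real learned feature; it is not Anthropic’s method or code.

```python
import numpy as np

def feature_strength(activations, feature_direction):
    """How strongly a (hypothetical) feature is expressed in an activation vector."""
    unit = feature_direction / np.linalg.norm(feature_direction)
    return float(activations @ unit)

def clamp_feature(activations, feature_direction, target):
    """Return activations with the feature 'toggled' to a chosen strength.

    Mirrors the idea of steering behavior by editing one feature; real
    features come out of dictionary learning on an actual model, not from
    hand-written vectors like these.
    """
    unit = feature_direction / np.linalg.norm(feature_direction)
    return activations + (target - activations @ unit) * unit

acts = np.array([0.2, -1.3, 0.7, 0.4])      # pretend internal activations
concept = np.array([0.0, 1.0, 1.0, 0.0])    # pretend direction for one concept

print(feature_strength(acts, concept))       # how active the concept is now
steered = clamp_feature(acts, concept, 5.0)  # crank the concept way up
print(feature_strength(steered, concept))    # ~5.0
```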

***

On the day of our interview, Amodei apologizes for being late, explaining that he had to take a call from a “senior government official.” Over the past 18 months he and Jack Clark, another co-founder and Anthropic’s policy chief, have nurtured closer ties with the Executive Branch, lawmakers, and the national-security establishment in Washington, urging the U.S. to stay ahead in AI, especially to counter China. (Several Anthropic staff have security clearances allowing them to access confidential information, according to the company’s head of security and global affairs, who declined to share their names. Clark, who is originally British, recently obtained U.S. citizenship.) During a recent forum at the U.S. Capitol, Clark argued it would be “a chronically stupid thing” for the U.S. to underestimate China on AI, and called for the government to invest in computing infrastructure. “The U.S. needs to stay ahead of its adversaries in this technology,” Amodei says. “But also we need to provide reasonable safeguards.”

Read More: No One Truly Knows How AI Systems Work. A New Discovery Could Change That 

Not everyone believes Anthropic’s narrative about itself. Some critics say that while the lab is doing important safety research, its creation of frontier AI models still heightens dangerous competitive pressures. Others—both skeptics of AI hype and “accelerationists” who want to see AI built as fast as possible—argue that its calls for regulation are a bid for regulatory capture by Big Tech. (Amodei flatly rejects that claim: “It’s just not true that a lot of what we’re advocating for is going to help the large companies.”) Some worry that its relentless focus on so-called “existential” risks is a distraction from nearer-term worries like bias, copyright infringement, and the environmental costs of training new AI models. 

And even if Anthropic succeeds in encouraging an industry-wide “race to the top” on safety, its commitments thus far—including the one to not release unsafe models—have all been voluntary. “What they’ve set up is a process that could easily fall by the wayside to the profit motive,” says Andrew Strait, an associate director at the Ada Lovelace Institute, an AI think tank, referring to Anthropic and its competitors who have made similar commitments. “It’s not a bad thing for companies to be putting these [policies] out, but it’s now on governments to come up with the surrounding regulatory infrastructure to bolster that, and make it so they’re not the ones setting their own thresholds.”

But where others see contradictions, Amodei sees nuance. He envisions different paths depending on what Anthropic learns about the difficulty of making AI safe. If it turns out that the task of aligning AI systems to human values is easy, he wants Anthropic to forge ahead, with a focus on minimizing harms like misuse. If it’s technically difficult, he wants to focus on the breakthroughs necessary to reduce catastrophic risks. And if it’s near impossible, he would want Anthropic to gather “very strong evidence” that would allow him to say to government officials, “There is a clear and present danger.” He simply couldn’t do that today. “I don’t think it would be credible,” he says.

The question remains whether Anthropic can survive long enough to get to that point. Claude 3 cost somewhere between $30 million and $300 million to train, Amodei says, declining to be more specific. He predicts training frontier models in 2024 will cost on the order of $1 billion; the trend suggests the generation after that would cost more like $10 billion. If those models fail to meet expectations, investment could dry up and AI progress would stall. If the exponential trend holds, Anthropic will need more funding to keep up with Google, Microsoft, and Amazon. All are now training their own models in-house, and have far more cash than Anthropic to spend on the computing power demanded by modern AI.

It’s unclear where this money will come from, and what concessions new investors might seek in return. Big tech companies might stump up more cash, perhaps on the condition of a change to Anthropic’s public benefit structure. Anthropic could raise the money itself by selling Claude more aggressively, thus further exposing itself to the perverse incentives of the market. It could turn to the government for funding—an option Amodei says he is open to. If none of those options works, a larger competitor may attempt to acquire Anthropic. But the lab’s executives are confident that its combination of talented staff, proprietary algorithms, and reputation for safety will keep Anthropic independent and at the frontier for years to come. “The essential bet of Anthropic is, we will show in business that a safer technology makes more money,” says policy chief Clark. “So whenever I see competition, I’m like: Cool. Bring it on.”  

Anthropic employees trade in metaphors: brain scanners, “grown” neural networks, races to both top and bottom. Amodei offers one more, comparing his decision not to release Claude in 2022 to the prisoner’s dilemma. In this famous game-theory experiment, two prisoners face a choice: betray the other for a chance at freedom, or stay silent and cooperate for a reduced sentence. If both betray, they each fare worse than if they’d cooperated. It’s a situation where individual incentives lead to worse collective outcomes—a dynamic Amodei sees playing out in the AI industry today. Companies taking risks are rewarded by the market, while responsible actions are punished. “I don’t want us to be in this impossible prisoner’s dilemma,” Amodei says. “I want to change the ecosystem so there is no prisoner’s dilemma, and everyone’s incentivized to do the right thing.”
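
For readers who want the game spelled out, the standard payoff table makes the dynamic concrete: whatever the other player does, each player’s selfish best move is to betray, yet mutual betrayal leaves both worse off than mutual cooperation. The numbers below are the textbook illustration, not anything specific to the AI industry.

```python
# Years in prison for (my choice, other's choice); lower is better.
# Textbook-style payoffs, chosen only to show the structure of the dilemma.
SENTENCE = {
    ("cooperate", "cooperate"): 1,   # both stay silent: light sentence each
    ("cooperate", "betray"):   10,   # I stay silent, the other talks: I take the fall
    ("betray", "cooperate"):    0,   # I talk, the other stays silent: I walk free
    ("betray", "betray"):       5,   # both talk: heavy sentence each
}

def best_response(other_choice):
    # Whatever the other player does, betraying earns me a shorter sentence...
    return min(("cooperate", "betray"), key=lambda me: SENTENCE[(me, other_choice)])

print(best_response("cooperate"), best_response("betray"))  # betray betray
# ...so both betray and serve 5 years each, instead of the 1 year each from cooperating.
print(SENTENCE[("betray", "betray")], SENTENCE[("cooperate", "cooperate")])
```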

—With reporting by Will Henshall/Washington


Write to Billy Perrigo at billy.perrigo@time.com