Employees at Top AI Labs Fear Safety Is an Afterthought

March 11, 2024 9:00 AM EDT

Workers at some of the world’s leading AI companies harbor significant concerns about the safety of their work and the incentives driving their leadership, a report published on Monday claimed.

The report, commissioned by the State Department and written by employees of the company Gladstone AI, makes several recommendations for how the U.S. should respond to what it argues are significant national security risks posed by advanced AI.

The report’s authors spoke with more than 200 experts for the report, including employees at OpenAI, Google DeepMind, Meta and Anthropic—leading AI labs that are all working towards “artificial general intelligence,” a hypothetical technology that could perform most tasks at or above the level of a human. The authors shared excerpts of concerns that employees from some of these labs shared with them privately, without naming the individuals or the specific company that they work for. OpenAI, Google, Meta and Anthropic did not immediately respond to requests for comment.

“We have served, through this project, as a de-facto clearing house for the concerns of frontier researchers who are not convinced that the default trajectory of their organizations would avoid catastrophic outcomes,” Jeremie Harris, the CEO of Gladstone and one of the authors of the report, tells TIME.

One individual at an unspecified AI lab shared worries with the report’s authors that the lab has what the report characterized as a “lax approach to safety” stemming from a desire not to slow down the lab’s work to build more powerful systems. Another individual expressed concern that their lab had insufficient containment measures in place to prevent an AGI from escaping the lab’s control, even though the lab believes AGI is a near-term possibility.

Still others expressed concerns about cybersecurity. “By the private judgment of many of their own technical staff, the security measures in place at many frontier AI labs are inadequate to resist a sustained IP exfiltration campaign by a sophisticated attacker,” the report states. “Given the current state of frontier lab security, it seems likely that such model exfiltration attempts are likely to succeed absent direct U.S. government support, if they have not already.”

Many of the people who shared those concerns did so while wrestling with the calculation that whistleblowing publicly would likely result in them losing their ability to influence key decisions in the future, says Harris. “The level of concern from some of the people in these labs, about the decisionmaking process and how the incentives for management translate into key decisions, is difficult to overstate,” he tells TIME. “The people who are tracking the risk side of the equation most closely, and are in many cases the most knowledgeable, are often the ones with the greatest levels of concern.”

Are you an employee at an AI lab and have concerns that you might consider sharing with a journalist? You can contact the author of this piece on Signal at billyperrigo.01

The fact that today’s AI systems have not yet led to catastrophic outcomes for humanity, the authors say, is not evidence that bigger systems will be safe in the future. “One of the big themes we’ve heard from individuals right at the frontier, on the stuff being developed under wraps right now, is that it’s a bit of a Russian roulette game to some extent,” says Edouard Harris, Gladstone’s chief technology officer, who also co-authored the report. “Look, we pulled the trigger, and hey, we’re fine, so let’s pull the trigger again.”

Many of the world’s governments have woken up to the risk posed by advanced AI systems over the last 12 months. In November, the U.K. hosted an AI Safety Summit where world leaders committed to work together to set international norms for the technology, and in October President Biden issued an executive order setting safety standards for AI labs based in the U.S. Congress, however, has yet to pass an AI law, meaning there are few legal restrictions on what AI labs can and can’t do when it comes to training advanced models.

Biden’s executive order calls on the National Institute of Standards and Technology to set “rigorous standards” for tests that AI systems should have to pass before public release. But the Gladstone report recommends that government regulators not rely heavily on these kinds of AI evaluations, which are today a common practice for testing whether an AI system has dangerous capabilities or behaviors. Evaluations, the report says, “can be undermined and manipulated easily,” because AI models can be superficially tweaked, or “fine tuned,” by their creators to pass evaluations if the questions are known in advance. Crucially, it is easier for these tweaks to simply teach a model to hide dangerous behaviors better than to remove those behaviors altogether.

The report cites a person described as an expert with “direct knowledge” of one AI lab’s practices, who judged that the unnamed lab is gaming evaluations in this way. “AI evaluations can only reveal the presence, but not confirm the absence, of dangerous capabilities,” the report argues. “Over-reliance on AI evaluations could propagate a false sense of security among AI developers [and] regulators.”

