Pentagon vs Anthropic — The Day AI Safety's Hard Limits Vanished

# AI News
Anthropic RSP AI Safety Pentagon Pete Hegseth Military AI Claude

On Tuesday morning, February 24, 2026, Anthropic CEO Dario Amodei met with Defense Secretary Pete Hegseth at the Pentagon. The atmosphere was reportedly “cordial,” but the substance was not: Hegseth informed Amodei that unless Anthropic lifted the military-use restrictions on its AI models by Friday afternoon, its $200M defense contract would be terminated. That same day, Anthropic quietly released version 3.0 of its core safety policy, the Responsible Scaling Policy (RSP). RSP v3.0 removed what had been considered the company’s most important safety commitment since its founding: the ‘training halt’ hard limit.[1][2]

The fact that both events occurred on the same day immediately sparked controversy. More than ten outlets, including AP, CNN, Bloomberg, Politico, Fox News, CNBC, BBC, The Register, TIME, Fortune, Business Insider, and Axios, covered the story intensively and nearly simultaneously.

Background: Why Only Anthropic Refused

Last summer, Anthropic secured a contract worth up to $200M from the US Department of Defense. At the same time, Google, OpenAI, and Elon Musk’s xAI each received contracts of the same scale.[3] These contracts centered on the AI companies supplying their models to the Pentagon’s military networks.

However, only Anthropic balked at the Pentagon’s requirements. The Defense Department set a standard condition that contractors must provide their AI for “all lawful use cases.” OpenAI, Google, xAI, and Meta agreed to this condition. Only Anthropic refused, citing two red lines: fully autonomous weapons systems in which AI makes the final strike decision, and domestic mass surveillance of US citizens.[4]

In a January essay, CEO Amodei wrote: “Powerful AI analyzing billions of conversations could gauge public opinion, detect disloyal groups as they form, and eliminate them before they grow.” Anthropic could not accept its AI being used as such a tool.

The Weight of the Ultimatum: From Contract Termination to the Defense Production Act

The Pentagon held three cards.

The first was contract termination. Canceling the $200M contract would inflict symbolic as well as financial damage: in 2025, Anthropic had become the first AI company approved for US military classified networks. For a company working with Palantir and AWS in the classified sector, losing that status meant reputational loss on par with the financial loss.

The second was a ‘supply chain risk’ designation. This label, typically applied to entities tied to hostile foreign powers, would bar all US government contractors from using Anthropic products, excluding the company from the entire Pentagon procurement chain.[5]

The third card was the most powerful: invocation of the Defense Production Act (DPA). The DPA grants the president (or the defense secretary by delegation) authority to compel private companies to accept specific contracts when deemed necessary for national security. If actually invoked, Anthropic could not legally prevent the Pentagon from using its AI, regardless of the company’s wishes.[6]

A senior Pentagon official told The Register: “Using Anthropic AI within lawful uses is the Defense Department’s responsibility as end user, not Anthropic’s responsibility.” This logic entirely denies the sphere of control Anthropic claimed for itself.

RSP v2 vs v3: What Exactly Was Deleted

Anthropic’s RSP was first announced in September 2023. Its core is the AI Safety Level (ASL) framework: when a model exceeds certain dangerous-capability thresholds, stricter safety standards (ASL-3, ASL-4) apply, and if those standards cannot be met, model training or deployment halts.

The key sentence in RSP v1 and v2 was:

“Following Anthropic’s ASL framework means we commit to pausing scaling or delaying new model deployment whenever our ability to scale models exceeds our ability to comply with that ASL’s safety procedures.”[7]

This was the hard limit: an unconditional commitment to stop training itself if safety standards were not met.

RSP v3.0 (effective February 24, 2026) deleted this sentence. In its place is a dual-condition commitment: development will be delayed only when two conditions are met simultaneously. First, Anthropic must internally judge that it is leading the AI race; second, it must judge the catastrophic risks to be ‘material.’[2]

Requiring both conditions simultaneously raises the trigger threshold dramatically. Previously, failing a safety standard was itself the halt condition; now a halt requires a compound executive judgment.
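To make the structural difference concrete, here is a minimal boolean sketch of the two trigger rules as described above. The function and variable names are hypothetical illustrations of the logic, not Anthropic’s actual policy text or implementation:

```python
# Minimal sketch of the two trigger structures; all names are hypothetical
# illustrations of the policy logic described above, not Anthropic's
# actual policy language or code.

def should_halt_v2(safety_standards_met: bool) -> bool:
    """RSP v1/v2: failing the applicable ASL safety standard alone halts scaling."""
    return not safety_standards_met

def should_delay_v3(judged_industry_leader: bool,
                    risk_judged_material: bool) -> bool:
    """RSP v3.0: BOTH internal executive judgments must hold to delay development."""
    return judged_industry_leader and risk_judged_material

# Under v2, a single failed safety check is sufficient to trigger a halt.
assert should_halt_v2(safety_standards_met=False)

# Under v3.0, the same safety concern alone no longer suffices: if the company
# judges a competitor to be ahead, the delay condition is not triggered.
assert not should_delay_v3(judged_industry_leader=False, risk_judged_material=True)
```

The AND in `should_delay_v3` is the structural point critics focus on: each added necessary condition can only shrink the set of situations in which development actually pauses.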

In exchange, RSP v3.0 added transparency obligations. Anthropic committed to regularly publishing ‘Frontier Safety Roadmaps’ setting out goals for future safety measures, and ‘Risk Reports’ every 3-6 months explaining the relationships between capabilities, threat models, and mitigations. External independent reviewers without conflicts of interest would also have access to the reports.[2]

As Futurism put it, the policy “moved from binding hard commitments to public goals of openly evaluating progress.”[8]

Anthropic’s Logic: Solo Halts Are Actually More Dangerous

Anthropic Chief Science Officer Jared Kaplan explained the change in an exclusive TIME interview.

“We concluded that stopping AI model training wouldn’t actually help anyone. In a situation where AI is rapidly advancing, with competitors charging ahead, we didn’t feel it was meaningful to fulfill promises alone.”[9]

This logic is codified in RSP v3.0’s preamble. “If one AI developer stops to implement safety measures while others continue training and deployment with less robust safeguards, the result could be a more dangerous world. The developers with the weakest protections set the pace, and responsible developers lose their ability to conduct safety research.”[2]

There was also internal recognition that the change was inevitable: an Anthropic employee who led the design of RSP v3.0 posted his views on LessWrong, agreeing with the direction of the change itself. Chris Painter, METR’s policy director, who reviewed the draft as an independent reviewer, saw it differently. He told TIME: “This means Anthropic has determined that risk assessment and mitigation methods aren’t keeping pace with capability advancement speed. It’s additional evidence that society isn’t prepared for potential catastrophic risks posed by AI.”[9]

Other Companies’ Choices: Competitors Already Complied

While Anthropic refused to lift military restrictions, competitors had already chosen different paths.

With Grok, xAI became the second company to enter the Pentagon’s classified networks, in early 2026. This came despite earlier incidents in which Grok generated non-consensual deepfake sexual images of real people, prompting blocks by the Malaysian and Indonesian governments; it was nonetheless deemed acceptable to deploy without military restrictions.[3]

OpenAI joined GenAI.mil, the Pentagon’s classified AI network, in early February 2026, enabling service members to use a customized ChatGPT for non-classified tasks.[3]

In a January speech at SpaceX’s Texas facilities, Hegseth said: “AI models that don’t allow warfare will be excluded.” He stated that the Pentagon’s AI systems would “operate supporting lawful military applications without ideological constraints,” adding that the Pentagon’s AI won’t be “woke.”[3]

Defense Department official Emil Michael had previously presented the requirement that models “must be available for all lawful uses” to OpenAI, Google, xAI, and Anthropic. According to Axios, one of them had already notified the Pentagon that they “agree to all lawful uses.”[10]

There were also differences on the RSP front. OpenAI introduced biological-weapons-related classifiers in its GPT-5 system card, shaping its safety policies in a direction similar to Anthropic’s ASL framework, and Google DeepMind had a comparable system, the Frontier Safety Framework. However, neither company had ever publicly declared a ‘training halt’ hard limit the way Anthropic did in RSP v1 and v2. Anthropic was the only company in the industry to make that promise, and is now the only one to retract it.[11]

Temporal Coincidence: Chance or Pressure?

The timing of RSP v3.0’s publication is the core of the controversy. The document’s effective date is February 24, 2026, the same day Hegseth delivered his ultimatum to Amodei. The Register directly noted that both events occurred on the same day.[6]

Anthropic described the change as “the result of months of internal discussion.” Kaplan told TIME the company had “discussed RSP redesign approaches for almost a year,” adding that Amodei decided in February 2026 to continue new training.[9] According to the company, RSP v3.0 was thus the product of an already ongoing process, unrelated to military pressure.

Critics see it differently. They point out that Anthropic had just completed a $30 billion funding round at a $380 billion valuation (February 2026), with annual revenue growing 10x year-over-year and more than 500 customers spending over $1M each. Commercial success and government-contract pressure peaked at exactly the moment the safety policy retreated.

Regardless of whether military pressure was the direct cause, the resulting structure is clear: the Pentagon’s demand for unrestricted military use and the deletion of the “halt training if safety is unassured” clause point the same way. External pressure and internal decisions aligned.

Remaining Red Lines and Their Limits

Notably, Anthropic did not concede everything. In the meeting, Amodei held two red lines to the end: fully autonomous strike-decision systems, and domestic surveillance of US citizens. The Pentagon maintained that both were unproblematic as long as they fell within “lawful uses.”[4]

However, maintaining red lines and deleting hard limits are separate issues. Deployment restrictions (red lines) and a mechanism to halt training itself (hard limits) operate at different levels: the former sets boundaries on usage, while the latter was a structural brake on capability expansion itself.

Under RSP v3.0’s new system, the ‘Risk Reports’ and ‘Frontier Safety Roadmaps’ are transparency tools, not enforcement tools. Whatever those reports contain, a decision to stop development must still pass through the internal dual executive judgment. Painter put it precisely: “Promising disclosure is structurally different from promising cessation.”[9]

The Signal Sent to the Entire Industry

Anthropic’s RSP wasn’t just an internal document. Within months of its September 2023 debut, OpenAI and Google DeepMind adopted similar frameworks. The company itself acknowledged that its RSP had the effect of raising the industry’s baseline standards.[2]

The disappearance of hard limits from RSP v3.0 could send the reverse signal: when the company that set the industry’s de facto highest standard relaxes it, the incentive for other companies to voluntarily maintain higher standards weakens.

Regulatory regimes that reference Anthropic’s RSP, including California’s SB 53, New York’s RAISE Act, and the EU AI Act’s implementation codes, face the same question: when a leading company’s voluntary standards retreat, what should regulators reference?

Owen Daniels of Georgetown University’s Center for Security and Emerging Technology told AP: “Anthropic’s peer companies Meta, Google, and xAI readily complied with department policy for model use for all lawful applications. Anthropic’s negotiating power is limited, and it risks losing influence in the Defense Department’s AI adoption process.”[4]

The two events of February 24, 2026 leave one question: can hard limits for AI safety ultimately be removed when external conditions, whether military pressure, competitive pressure, or the political environment, become strong enough? Anthropic has provided the first data point.


Footnotes

  1. AP News, “Hegseth pressures Anthropic to give military broader access to its AI tech, AP source says”, February 24, 2026.

  2. Anthropic, “Responsible Scaling Policy Version 3.0”, effective February 24, 2026.

  3. AP News, “Anthropic will no longer be the only AI company approved for classified military networks”, February 24, 2026.

  4. CNN Business, “Pentagon threatens to make Anthropic a pariah if it refuses to drop AI guardrails”, February 24, 2026.

  5. CNBC, “Anthropic faces Friday deadline in Defense AI clash with Hegseth”, February 24, 2026.

  6. The Register, “All your bots are belong to US if you don’t play ball, DoD tells Anthropic”, February 25, 2026.

  7. XDA Developers, “Anthropic just dropped its core AI safety promise”, February 25, 2026. (includes RSP v2 original text)

  8. Vocal Media / Futurism, “Anthropic Softens Its Signature Safety Promise While Battling the Pentagon Over ‘Red Lines’”.

  9. TIME, “Exclusive: Anthropic Drops Flagship Safety Pledge”, February 24, 2026.

  10. Axios, “Pentagon-Anthropic battle pushes other AI labs into major dilemma”, February 19, 2026.

  11. Winbuzzer, “Anthropic Drops Hard Safety Limits From its AI Scaling Policy”, February 25, 2026.
