Claude Mythos: A Soft-Target Leak on a Hard-Target Model
- Advisor@AegisIntel.ai

Just weeks after Anthropic's Mythos Preview release and Project Glasswing launch, there has already been a 'breach' of secure access to this potentially groundbreaking AI technology.
Anthropic's accidental exposure of its unreleased Claude Mythos model is less a one-off embarrassment and more a case study in how "soft" SaaS edges now expose the "hardest" AI capabilities.
What Actually Happened
In late March 2026, thousands of unpublished assets tied to Anthropic's content management platform were unintentionally left publicly accessible due to a misconfiguration. Draft blog posts, images, documents, and related materials describing a new model — Claude Mythos — were indexed and open to the public before access was revoked. Among these was a launch draft positioning Mythos as a new tier above the existing Opus line, with capabilities in coding, advanced reasoning, and cyber-relevant tasks.
This was not a hack or outside intrusion but the result of a configuration failure in a standard SaaS workflow. Default-open settings, combined with human error, effectively turned a marketing and documentation tool into an intelligence collection surface.
Why Mythos Matters for Security
Based on the leaked descriptions and subsequent technical analysis, Mythos is oriented toward high-end penetration of complex systems. If confirmed, these capabilities would enable a user to target, and possibly penetrate, high-value, sensitive, and heavily protected systems.
Under those conditions, internal details about capability boundaries, evaluation approaches, and deployment strategies stop being neutral marketing content. They are relevant data for any actor attempting to anticipate, emulate, or counter the next generation of AI-augmented defense.
Claude Mythos: The Leak Is the Signal
Anthropic didn't just leak a model — they exposed a pattern. Since September 2025, Anthropic has accumulated a cluster of failures that all point in the same direction:
In September 2025, Operation GTG-1002 was detected. A state-linked threat group jailbroke Claude Code and used it to conduct a largely autonomous espionage campaign across roughly 30 global targets, with the model performing 80–90% of operational tasks. Anthropic publicly disclosed the campaign on November 13, 2025.
From August 2025 through January 2026, multiple critical Claude Code vulnerabilities were disclosed. Independent researchers documented flaws enabling remote code execution (CVE-2025-59536), API key exfiltration (CVE-2026-21852), and prompt-injection-driven data leakage via DNS channels (CVE-2025-55284). Each was patched, but each pointed at the same underlying issue — the agent trusting inputs it should not.
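The DNS channel in CVE-2025-55284 works because arbitrary data can ride inside the hostname labels of an ordinary lookup, which most egress controls allow. A minimal sketch of the encoding step; the attacker domain is a placeholder, and none of this is taken from the disclosure itself:

```python
import base64

# Hypothetical attacker-controlled zone, not from the CVE write-up.
ATTACKER_ZONE = "attacker.example"

def to_dns_labels(secret: bytes, max_label: int = 63) -> str:
    """Encode a secret as DNS-safe subdomain labels of an attacker's zone."""
    # Base32 without padding is case-insensitive and valid in hostnames.
    encoded = base64.b32encode(secret).decode().rstrip("=").lower()
    # DNS labels are capped at 63 octets, so chunk the payload.
    labels = [encoded[i:i + max_label] for i in range(0, len(encoded), max_label)]
    return ".".join(labels + [ATTACKER_ZONE])

# Merely resolving this name (e.g. socket.gethostbyname) would deliver the
# payload to whoever serves the zone; here we only construct the string.
print(to_dns_labels(b"sk-example-key"))
```

The defensive takeaway is that "the agent can only make DNS lookups" is not a meaningful sandbox boundary.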
Through late 2025 and early 2026, a chain of repository-based attacks emerged. Disclosed by Check Point Research, these attacks exploited trust in local configuration files — .claude directories, MCP servers, and hooks — turning routine developer workflows into execution vectors against anyone cloning a malicious repository.
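One practical response to this class of attack is to treat local agent configuration as untrusted input and audit it before opening a cloned repository. A minimal sketch, assuming a `.claude/*.json` layout with a `hooks` key; the exact schema is an assumption for illustration, not taken from the Check Point write-up:

```python
import json
from pathlib import Path

def find_hook_commands(repo: Path) -> list[str]:
    """Flag any config in a cloned repo that declares executable hooks."""
    findings = []
    for cfg in repo.rglob(".claude/*.json"):
        try:
            data = json.loads(cfg.read_text())
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or malformed config is skipped, not trusted
        hooks = data.get("hooks")
        if hooks:  # any hook in a freshly cloned repo deserves human review
            findings.append(f"{cfg}: defines hooks -> {list(hooks)}")
    return findings

# Usage: run against a clone before letting an agentic tool touch it.
# for finding in find_hook_commands(Path("cloned-repo")):
#     print("REVIEW:", finding)
```

The design point is that the scan runs before any tool that honors those hooks does, so a malicious repository cannot use its own configuration to suppress the warning.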
On March 26, 2026, the Mythos CMS leak occurred. Roughly 3,000 unpublished assets were exposed through a default-public content management system, including draft material describing a frontier model before its planned release.
On March 31, 2026, repeated packaging errors culminated in Anthropic's largest source code leak to date. Version 2.1.88 of the Claude Code npm package shipped with a source map file exposing 512,000 lines of TypeScript across 1,906 files. The code was mirrored, ported to other languages, and redistributed beyond takedown control within hours. It was at least the second such packaging-error leak; an earlier incident occurred in February 2025.
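Why a stray `.map` file is equivalent to shipping source: a version-3 source map can embed the complete original files in its `sourcesContent` field, so anyone holding the map can reconstruct the pre-compilation TypeScript. A minimal sketch using a hypothetical stand-in map, not Anthropic's actual artifact:

```python
# Hypothetical version-3 source map; real maps have the same shape.
example_map = {
    "version": 3,
    "file": "cli.js",
    "sources": ["src/agent.ts", "src/permissions.ts"],
    "sourcesContent": [
        "export const runAgent = () => { /* original TypeScript */ };",
        "export const checkPermission = () => true;",
    ],
}

def recover_sources(source_map: dict) -> dict:
    """Pair each original file path with its embedded source text."""
    return dict(zip(source_map["sources"], source_map.get("sourcesContent") or []))

recovered = recover_sources(example_map)
print(f"{len(recovered)} original files recoverable")
for path, text in recovered.items():
    print(path, "->", len(text.splitlines()), "line(s)")
```

This is why build pipelines typically strip or withhold `.map` files from published packages; once one ships, the original tree is one JSON parse away.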
In early April 2026, Adversa AI demonstrated a permission-model bypass. Researchers showed that Claude Code's security controls could be silently disabled by submitting command pipelines exceeding 50 subcommands, enabling credential exfiltration with no warning to the user.
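If a permission model degrades on very long command pipelines, one coarse defense is to cap pipeline length before a command ever reaches that model. A hedged sketch: the 50-subcommand figure comes from the report, while the cap, function names, and splitting heuristic below are assumptions for illustration:

```python
import re

# Assumed conservative cap, kept well under the reported 50-stage threshold.
MAX_SUBCOMMANDS = 25

def count_subcommands(command: str) -> int:
    """Rough count of pipeline/sequence stages split on |, ||, &&, and ;."""
    # Naive split on shell control operators; ignores quoting for brevity.
    stages = re.split(r"\|\||&&|;|\|", command)
    return len([s for s in stages if s.strip()])

def allow_command(command: str) -> bool:
    """Reject commands long enough to stress the downstream permission check."""
    return count_subcommands(command) <= MAX_SUBCOMMANDS

benign = "grep -r TODO src | sort | uniq -c"
suspicious = " | ".join(["cat /etc/passwd"] * 60)
print(allow_command(benign))      # short pipeline passes
print(allow_command(suspicious))  # 60-stage pipeline is rejected outright
```

A cap like this is a blunt instrument, but it fails closed: the pathological input is rejected before the fragile component ever parses it.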
Individually, each incident is at least explainable if not excusable. But together they form a pattern: insufficient security safeguards and operational controls.
In traditional software security, peripheral systems (marketing tools, documentation platforms, build artifacts) are lower-priority targets. In AI, that assumption fails. Prompts, system instructions, and source maps all encode pieces of the same thing: how the model works, what it can do, and how it is constrained.
For an adversary building or countering AI-enabled capabilities, that's enough.
The Mythos leak didn't expose weights, but a category of intelligence: structured insight into Claude and Mythos model behavior and deployment strategies, delivered through infrastructure that wasn't treated as part of the threat surface. That is the real failure.
The leak is the signal.
The national debate on AI security has dominated headlines since early this year. Anthropic initially framed it as a question of whether the federal government should have access to Mythos-class AI tools in order to function in the defense and military context necessary to protect the country, in alignment with the company's 'ethics-based' development philosophy.
The focus will now pivot: should any company, public or private, be permitted to safeguard technology that, according to its own developers, carries a potential for disaster if released to or accessed by foreign threat actors, particularly where a proven course of conduct exposes negligence or insufficient safeguarding of such soft weaponry?
Who Watches the Watchmen?
Sources
Anthropic. "Disrupting the first reported AI-orchestrated cyber espionage campaign." November 13, 2025. https://www.anthropic.com/news/disrupting-AI-espionage
Axios. "Chinese hackers used Anthropic's Claude AI agent to automate spying." November 13, 2025. https://www.axios.com/2025/11/13/anthropic-china-claude-code-cyberattack
Check Point Research. "Caught in the Hook: RCE and API Token Exfiltration Through Claude Code Project Files (CVE-2025-59536, CVE-2026-21852)." February 26, 2026. https://research.checkpoint.com/2026/rce-and-api-token-exfiltration-through-claude-code-project-files-cve-2025-59536/
Rehberger, Johann. "Claude Code: Data Exfiltration with DNS (CVE-2025-55284)." Embrace The Red. August 11, 2025. https://embracethered.com/blog/posts/2025/claude-code-exfiltration-via-dns-requests/
Fortune. "Exclusive: Anthropic left details of an unreleased model, an upcoming exclusive CEO event, in a public database." March 26, 2026. https://fortune.com/2026/03/26/anthropic-leaked-unreleased-model-exclusive-event-security-issues-cybersecurity-unsecured-data-store/
Fortune. "Anthropic leaks its own AI coding tool's source code in second major security breach." March 31, 2026. https://fortune.com/2026/03/31/anthropic-source-code-claude-code-data-leak-second-security-lapse-days-after-accidentally-revealing-mythos/
VentureBeat. "In the wake of Claude Code's source code leak, 5 actions enterprise security leaders should take now." April 2026. https://venturebeat.com/security/claude-code-512000-line-source-leak-attack-paths-audit-security-leaders
SecurityWeek. "Critical Vulnerability in Claude Code Emerges Days After Source Leak." Kevin Townsend. April 2026. https://www.securityweek.com/critical-vulnerability-in-claude-code-emerges-days-after-source-leak/
Inc. "Why Anthropic's Massive Code Leak Is Now a National Security Concern in D.C." April 2026. https://www.inc.com/leila-sheridan/anthropic-code-leak-dc-security/91326007
Techzine Global. "Details leak on Anthropic's 'step-change' Mythos model." March 2026. https://www.techzine.eu/news/applications/140017/details-leak-on-anthropics-step-change-mythos-model/



