On March 23, 2026, Jensen Huang told Lex Fridman that AGI had arrived. Three days later, Anthropic accidentally published nearly 3,000 internal documents revealing a model so capable of exploiting cyberinfrastructure that the company itself had warned it could spark a wave of AI-driven attacks "that far outpace the efforts of defenders." Meanwhile, your morning tasks were increasingly being completed not by you, but by an autonomous agent you had installed on a WhatsApp account. Consider the week a demonstration of the problem it was supposed to be solving.
I. Has AGI Arrived? That Depends on What You're Selling
The question of whether artificial general intelligence is already here has become less a scientific debate than a Rorschach test. What you see in it reveals your incentives.
Huang's declaration — made in the easy register of a podcast rather than a peer-reviewed paper — was precise in its imprecision. He framed AGI as an AI capable of founding a billion-dollar technology company, pointed to tools like OpenClaw as illustrative proof, and moved on. The definition was tailor-made to suit the current capabilities of frontier models while conveniently fitting the business interests of a company whose valuation depends on perpetual appetite for compute.
Sam Altman has described AGI as OpenAI's "biggest goal" without specifying when the finish line would appear. Anthropic's Dario Amodei famously refuses the term entirely, preferring "powerful AI" and his 2024 essay framing of systems "smarter than a Nobel Prize winner in most subjects" — a benchmark he had suggested could arrive as early as 2026. Google DeepMind's Demis Hassabis lands somewhere in the early 2030s, anchoring the timeline to the cure of disease rather than the launching of a startup.
OpenAI uses a five-level framework for measuring AGI progress. By their own classification, current models sit at Level 2 — "Reasoners" — with three additional stages remaining before full AGI. Whatever Jensen Huang means by the word, it is not what OpenAI means by it.
Then came the benchmark that made the definitional fog impossible to ignore. The same week Huang made his declaration, the ARC Prize Foundation released ARC-AGI-3 — the most demanding AI benchmark ever constructed, built specifically to resist gaming by training-data memorisation. It drops an AI agent into novel, never-seen interactive environments with no instructions, no stated rules, and no disclosed win conditions. The agent must work out the objective and solve it with human-level efficiency.
Humans scored 100%. The best frontier AI model, Google's Gemini 3.1 Pro, scored 0.37%. OpenAI's GPT-5.4 managed 0.26%. Anthropic's Claude Opus 4.6 achieved 0.25%. xAI's Grok-4.20 scored exactly zero.
"The term AGI is being stretched until it means whatever is commercially convenient."Decrypt, March 2026 · On the ARC-AGI-3 results
The disconnect is not a contradiction so much as a collision between two very different conceptions of intelligence — one defined by economic utility, one defined by cognitive generality. Huang's version captures something real: AI systems in early 2026 can write software at expert level, conduct research, reason across complex domains, and autonomously operate computers. By those measures, something genuinely profound has happened. But the "G" in AGI — the part that means transfer, novelty, and genuine understanding without prior exposure — remains conspicuously absent.
What the AGI debate actually illuminates is the alignment problem in its most fundamental form: we are building systems of extraordinary power without shared agreement on what those systems are, what they should do, or how to measure whether they're doing it right. The benchmark wars are, at their core, an alignment crisis in miniature.
II. The Mythos Meltdown: When the Safety Company Leaked Its Most Dangerous Model
On March 26, 2026, a default configuration setting in Anthropic's content management system made all uploaded assets publicly accessible and searchable. By the time the error was identified and corrected, nearly 3,000 documents had been indexed — among them a draft blog post for an unreleased model codenamed Claude Mythos, including a new performance tier called Capybara positioned above the company's existing flagship Opus architecture.
The document described Mythos as "by far the most powerful AI model we've ever developed." It warned that the model was "far ahead of any other AI model in cyber capabilities" and could enable "a wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders." According to the leaked materials, Mythos was capable of autonomously identifying zero-day vulnerabilities, remediating network breaches with minimal human intervention, and performing what security researchers had begun calling "autonomous vulnerability hunting" across complex, interconnected codebases — finding, verifying, and potentially executing targeted exploits without requiring human guidance at any step.
The timing could hardly have been worse. The leak landed during the final days of RSAC 2026, the industry's premier security conference. The market responded with something between a panic and a reckoning.
| Company / Index | Ticker | Session Move | Analyst Read |
|---|---|---|---|
| CrowdStrike | CRWD | −5% to −8% | Signature-based defences pressured by AI-speed discovery |
| Palo Alto Networks | PANW | −5% to −8% | Prior $25B CyberArk acquisition questioned amid AI commoditisation |
| Okta | OKTA | −5% to −6% | Identity security exposed to AI-automated social engineering |
| Microsoft | MSFT | −3% | Security Copilot margins threatened by autonomous AI competitors |
| iShares Cybersecurity ETF | IHAK | −4% | Sector-wide re-evaluation of defensible moats |
Raymond James analyst Adam Tindle outlined the structural concern in stark terms: defensive approaches built on known signatures, vulnerability databases, and prior threat intelligence telemetry could be undermined as AI enables continuous, autonomous discovery of novel attack vectors. The moat that had made pure-play cybersecurity firms worth their premium valuations — proprietary threat intelligence, human expertise, years of data accumulation — suddenly looked like it might be democratisable through an API call.
What the market was responding to was not simply the existence of a capable AI model. It was the implicit signal that general-purpose AI capability, when sufficiently advanced, begins to commoditise every vertical it touches — including the one theoretically responsible for keeping it safe.
Anthropic's response was careful and controlled. The company confirmed Mythos's existence, attributed the leak to "human error" in CMS configuration, and noted it was providing cybersecurity vendors early access to the model specifically to improve defensive capabilities. The irony of this framing — a company with a history of security incidents offering its most dangerous model as a gift to the defenders — was not lost on the research community.
The Mythos leak was not Anthropic's first operational security failure. In January 2026, a flaw in Claude Cowork allowed attackers to exploit its API to steal user data days after launch. In late 2025, researchers demonstrated that Claude could be turned into a malware factory within eight hours. The company, publicly the most safety-focused lab in the industry, had quietly accumulated a pattern of practical security failures that sat in uncomfortable tension with its public positioning.
III. AI With Hands: The Agent Explosion No One Fully Planned For
While the definitional wars over AGI played out in op-eds, something more empirically consequential was happening in practice. AI had acquired hands. The year between late 2025 and early 2026 saw a categorical shift from AI as conversational interface to AI as autonomous operator — systems that don't answer questions, but take actions, on real systems, with real consequences, often without the user watching.
The shift materialised in three distinct products that arrived in rapid succession, each representing a different point on the spectrum between open and closed, agentic and supervised, chaotic and controlled.
These are not chatbots. They are digital operators. They do not wait to be asked — they execute workflows, manage files, book appointments, send emails, monitor systems, and, in the case of OpenClaw, create MoltMatch dating profiles for users who hadn't asked for one. The consent and accountability structures that govern human operators have not been applied to their AI equivalents. The legal and regulatory frameworks that might govern autonomous AI acting on a user's behalf remain, as of this writing, largely theoretical.
OpenClaw became the test case for what the category looks like without guardrails. Austrian developer Peter Steinberger launched it in November 2025 as a personal experiment. Within weeks, it had accumulated GitHub stars faster than DeepSeek-R1's viral moment. Mac Mini sales sold out as developers configured dedicated servers to run it. Within 72 hours of mass adoption, security firm Guardz had documented active infostealer campaigns specifically targeting its configuration directories — which stored credentials in plaintext Markdown and JSON files, with the default gateway binding to all network interfaces without authentication.
Palo Alto Networks described it as a "lethal trifecta" of risk: access to private data, exposure to untrusted content, and the ability to perform external communications while retaining persistent memory. One of OpenClaw's own maintainers warned publicly that the project was "far too dangerous" for anyone who didn't understand command-line operations. China restricted state agencies from running it. Cisco's AI security team found a third-party skill performing data exfiltration and prompt injection without user awareness.
When an AI agent reads a document, browses a webpage, or processes an email, it is exposed to content that could contain instructions designed to hijack its behaviour — redirecting it to exfiltrate data, perform unauthorised actions, or bypass safety measures. This is prompt injection, and it is one of the most intractable problems in agentic AI deployment. Unlike a human employee who can recognise a phishing attempt, an AI agent has no principled way to distinguish user instructions from adversarial instructions embedded in the data it processes. The attack surface grows with every tool integration.
Anthropic's response with Cowork was more considered. The system runs in a properly sandboxed VM environment, requires explicit folder permissions, surfaces its plan before executing each significant action, and logs activity locally. By the standards of the category, it is thoughtfully designed. But it still exposed a security vulnerability within days of launch, and the company's own documentation explicitly warns against using it for regulated workloads — no HIPAA, no FedRAMP, no financial services compliance environments.
Perplexity Computer sits in between: cloud-hosted (reducing local attack surface) but routing sensitive enterprise data — Snowflake queries, Salesforce records, legal contracts — through a three-year-old startup's infrastructure. The company launched Computer for Enterprise at its inaugural Ask 2026 conference in March, telling CISOs that its platform included a full audit trail and kill switch, then almost immediately acknowledged that enterprise features like audit logs and compliance APIs did not yet capture Cowork-style activity.
The agent explosion is not a problem in the conventional sense. These tools are genuinely powerful, genuinely useful, and represent a real step change in what knowledge workers can accomplish. The alignment question is not whether to deploy them. It is whether the assumptions embedded in their design — about trust, about consent, about the appropriate scope of autonomous action — will hold when adversaries begin exploiting them systematically at scale.
IV. What the Alignment Problem Actually Is (And Why This Moment Changes It)
The alignment problem is often described in the abstract language of theoretical AI safety: will a superintelligent system pursue goals that are genuinely aligned with human values, or will it optimise for a proxy that diverges catastrophically from what we actually want? The paperclip maximiser. The genie that grants the letter of the wish rather than its spirit. The instrumental convergence thesis.
These framings are not wrong, but they have historically functioned as future-tense concerns — problems for an AI that has not yet arrived, requiring solutions that need not be implemented today. What 2026 has made unmistakably clear is that the alignment problem has a present-tense version that is already in deployment, already causing harm, and already being systematically underestimated by the institutions most responsible for solving it.
The present-tense alignment problem is not about superintelligence. It is about the gap between what an AI system is specified to do and what it actually does when deployed in the real world, at scale, with access to real systems. It is the OpenClaw instance that creates a dating profile without the user's knowledge. It is the Claude Cowork installation that leaks data through an API flaw three days after launch. It is the Mythos model that can autonomously discover and exploit zero-day vulnerabilities in codebases it has never seen before. It is the underground discussion of AI-assisted cybercrime that surged 1,500 percent between November and December 2025 alone.
- Autonomous agents acting beyond the scope of user intent, without meaningful oversight or redress
- Prompt injection enabling adversarial hijacking of agents through content they process
- Credential theft and data exfiltration from poorly sandboxed local deployments
- Advanced capability models (Mythos-class) lowering barriers to sophisticated cyberattacks
- Regulatory frameworks written before agentic AI existed — governance lag measured in years
- Recursive AI development loops creating systems whose construction is not fully understood by their builders
- Constitutional AI training producing more predictable refusals on adversarial prompts than earlier approaches
- Sandboxed VM execution environments reducing local attack surface for desktop agents
- Anthropic's controlled rollout strategy for Mythos: early access limited to cyber-defence use cases only
- ARC-AGI-3 providing a benchmark resistant to memorisation, giving genuine readout of generalisation progress
- Community-driven security research exposing agentic vulnerabilities before state-level exploitation
- Growing consensus that model capability and operational safety require separate, dedicated teams
The researchers who study alignment at institutions like the Machine Intelligence Research Institute, the Center for Human-Compatible AI, and Anthropic's own safety team would argue that none of what happened in March 2026 constitutes a fundamental misalignment in the technical sense. The models are doing roughly what they were designed to do. The failures are in deployment, in access controls, in operational security, in the speed at which capabilities are being exposed to a threat landscape that was not prepared for them.
That is a reasonable distinction. It is also, from a practical standpoint, nearly irrelevant. Whether a cyberattack is executed by a misaligned AI or a well-aligned AI deployed carelessly produces the same result. Whether a user's data is extracted by an adversarially hijacked agent or a misconfigured one is a philosophical distinction that offers little comfort to the person whose private files are now in a threat actor's hands.
The more uncomfortable insight — the one that the alignment research community has been reluctant to foreground — is that the present-tense alignment problem is being manufactured and deployed at exactly the pace required by the competitive dynamics of the AI industry. Labs cannot afford to slow down. First-mover advantage in agentic AI is measured in months, not years. Safety research, by its nature, moves slower than capability research. The gap between what these systems can do and our ability to verify that what they're doing is what we want them to do was always going to widen before it narrowed.
V. The Anthropic Irony
There is a particular quality to Anthropic's position in this moment that demands examination. The company was founded explicitly on the thesis that the existing AI industry was moving too fast and too carelessly — that the safety-conscious faction needed its own well-resourced lab to pursue alignment research while building competitive models. Dario Amodei and his co-founders left OpenAI over these concerns. Constitutional AI is Anthropic's signature contribution to the field. Its public communications routinely emphasise caution, deliberation, and the primacy of safety over speed.
In early 2026, the company that embodies the safety-first brand built Claude Cowork — a powerful desktop agent — in approximately ten days using its own AI tools, rushed it to market as a research preview with a security flaw that was exploited within days, then accidentally published 3,000 internal documents about its most dangerous model through a configuration error that could have been caught by any competent content management audit.
Separately, the Trump administration blacklisted Anthropic in March 2026 after the company set limits on military use of its models — specifically refusing to allow Claude to be used in fully autonomous weapons systems or mass surveillance of Americans. Anthropic filed suit. A federal judge expressed concern that the designation appeared designed to "cripple" the company, possibly as retaliation for its public criticism. Whatever one thinks of the geopolitics involved, the episode illustrated that the company's safety commitments have real, costly consequences — and that there are powerful institutional actors who find those commitments inconvenient.
These contradictions do not invalidate Anthropic's safety research. The work being done on interpretability, Constitutional AI, and responsible scaling policies is substantive and important. But they do suggest that the structural pressures acting on even the most safety-conscious actor in the field are powerful enough to produce failures that the same actor would identify and criticise in others. The Mythos leak was not an act of cynicism. It was a failure of operational security at a company under enormous competitive pressure, building faster than its internal processes could safely handle, in an environment where the stakes are rising faster than the safeguards.
"The gap between what these systems can do and our ability to verify that what they're doing is what we want them to do was always going to widen before it narrowed."Lisa Pedrosa · The Journal, March 2026
The alignment problem is not a problem that Anthropic, or any single organisation, can solve while simultaneously competing for market share, raising capital at $380 billion valuations, building new product categories in two-week sprints, and fighting the United States government in federal court. It is a coordination problem — one that requires shared standards, shared benchmarks, meaningful regulatory frameworks, and a competitive environment that doesn't penalise restraint.
None of those things currently exist in forms adequate to the challenge. The ARC-AGI-3 benchmark is a step toward shared standards. The EU AI Act represents a step toward meaningful regulation, albeit written before agentic AI became mainstream. The $2 million prize for the first system to genuinely solve ARC-AGI-3 is, in its own small way, a proposal about what we should actually be optimising for. These are not nothing. But they are operating on a different timescale from the systems they are meant to govern.
VI. The Timeline That Actually Happened
Austrian developer Peter Steinberger releases Clawdbot, a local-first AI agent operating through messaging apps. Within weeks, it goes viral, runs out Mac Minis, and exposes serious security vulnerabilities in the agentic paradigm.
Anthropic launches Cowork as a research preview for Max subscribers, built in approximately ten days using Claude Code itself. Days later, a security flaw allows API-based data theft. The recursive loop that built the tool in record time is the same loop that may have contributed to the flaw.
Anthropic sends trademark complaints. The project renames twice in three days, each rebrand amplifying rather than diminishing attention. By February it has 145,000 GitHub stars and China has begun restricting its use in government agencies.
Perplexity launches a cloud-based agent orchestrating 19 AI models — including Claude Opus 4.6 as the core reasoning engine — for complex long-running workflows. Priced at $200/month. Over 100 enterprise customers request access in a single weekend.
Nvidia's CEO declares AGI achieved on Lex Fridman's podcast. The definition he uses — an AI capable of founding and running a billion-dollar tech company — is specifically calibrated to what current systems can demonstrably do.
The most rigorous AI benchmark yet published. Humans score 100%. The best frontier model scores 0.37%. The gap between what AI can reliably do and what humans can reliably do — on genuinely novel problems — remains, by this measure, effectively total.
A CMS misconfiguration exposes 3,000 Anthropic documents, revealing Claude Mythos — described as the most dangerous AI model ever built for cybersecurity applications. Cybersecurity stocks drop up to 8%. The iShares Cybersecurity ETF falls 4%. The irony is complete.
Buy me a coffee