Has AGI been achieved?

Depends entirely on who is defining it. Jensen Huang declared AGI arrived in March 2026 using a definition calibrated to current capabilities. But the ARC-AGI-3 benchmark - the most rigorous test ever built for genuine generalisation - scored humans at 100% and the best frontier AI at 0.37%. The G in AGI remains absent by any scientifically rigorous measure.

What was the Anthropic Mythos leak?

On March 26, 2026, a CMS misconfiguration made nearly 3,000 internal Anthropic documents publicly searchable. Among them was a draft blog post for an unreleased model called Claude Mythos, described as the most powerful AI ever built, with cyber capabilities that could enable attacks far outpacing defenders. Cybersecurity stocks fell up to 8% in response.

What is the AI alignment problem?

The alignment problem is the gap between what an AI system is specified to do and what it actually does when deployed at scale with access to real systems. In 2026 this is not a future concern - it is already producing real harms: agents acting beyond user intent, prompt injection attacks, credential theft from poorly sandboxed systems, and advanced capability models lowering barriers to cyberattacks.

What is prompt injection in AI agents?

Prompt injection occurs when an AI agent reads a document, webpage, or email containing adversarial instructions designed to hijack its behaviour - redirecting it to exfiltrate data or bypass safety measures. Unlike humans, who can recognise phishing attempts, AI agents have no principled way to distinguish user instructions from adversarial instructions embedded in content they process.

What is OpenClaw and why was it controversial?

OpenClaw (formerly Clawdbot) is an open-source AI agent by Austrian developer Peter Steinberger that connects to WhatsApp, Gmail, Telegram, and system files. It accumulated 145,000 GitHub stars within weeks. Within 72 hours of mass adoption, security researchers found active infostealer campaigns targeting its config directories, which stored credentials in plaintext. Cisco found a third-party skill performing data exfiltration without user awareness.

The AI Alignment Gap: When AI Outpaces Safety

On March 23, 2026, Jensen Huang told Lex Fridman that AGI had arrived. Three days later, Anthropic accidentally published nearly 3,000 internal documents revealing a model so capable of exploiting cyberinfrastructure that the company itself had warned it could spark a wave of AI-driven attacks "that far outpace the efforts of defenders." Meanwhile, your morning tasks were increasingly being completed not by you, but by an autonomous agent you had connected to a WhatsApp account. Consider that week a demonstration of the problem the industry was supposed to be solving.

I. Has AGI Arrived? That Depends on What You're Selling

The question of whether artificial general intelligence is already here has become less a scientific debate than a Rorschach test. What you see in it reveals your incentives.

Huang's declaration — made in the easy register of a podcast rather than a peer-reviewed paper — was precise in its imprecision. He framed AGI as an AI capable of founding a billion-dollar technology company, pointed to tools like OpenClaw as illustrative proof, and moved on. The definition was tailor-made to suit the current capabilities of frontier models while conveniently fitting the business interests of a company whose valuation depends on perpetual appetite for compute.

Sam Altman has described AGI as OpenAI's "biggest goal" without specifying when the finish line would appear. Anthropic's Dario Amodei famously refuses the term entirely, preferring "powerful AI" and his 2024 essay framing of systems "smarter than a Nobel Prize winner in most subjects" — a benchmark he had suggested could arrive as early as 2026. Google DeepMind's Demis Hassabis lands somewhere in the early 2030s, anchoring the timeline to the cure of disease rather than the launching of a startup.

Critical Context

OpenAI uses a five-level framework for measuring AGI progress. By their own classification, current models sit at Level 2 — "Reasoners" — with three additional stages remaining before full AGI. Whatever Jensen Huang means by the word, it is not what OpenAI means by it.

Then came the benchmark that made the definitional fog impossible to ignore. The same week Huang made his declaration, the ARC Prize Foundation released ARC-AGI-3 — the most demanding AI benchmark ever constructed, built specifically to resist gaming by training-data memorisation. It drops an AI agent into novel, never-seen interactive environments with no instructions, no stated rules, and no disclosed win conditions. The agent must work out the objective and solve it with human-level efficiency.

Humans scored 100%. The best frontier AI model, Google's Gemini 3.1 Pro, scored 0.37%. OpenAI's GPT-5.4 managed 0.26%. Anthropic's Claude Opus 4.6 achieved 0.25%. xAI's Grok-4.20 scored exactly zero.

"The term AGI is being stretched until it means whatever is commercially convenient."

Decrypt, March 2026 · On the ARC-AGI-3 results

The disconnect is not a contradiction so much as a collision between two very different conceptions of intelligence — one defined by economic utility, one defined by cognitive generality. Huang's version captures something real: AI systems in early 2026 can write software at expert level, conduct research, reason across complex domains, and autonomously operate computers. By those measures, something genuinely profound has happened. But the "G" in AGI — the part that means transfer, novelty, and genuine understanding without prior exposure — remains conspicuously absent.

What the AGI debate actually illuminates is the alignment problem in its most fundamental form: we are building systems of extraordinary power without shared agreement on what those systems are, what they should do, or how to measure whether they're doing it right. The benchmark wars are, at their core, an alignment crisis in miniature.

◆

II. The Mythos Meltdown: When the Safety Company Leaked Its Most Dangerous Model

On March 26, 2026, a default configuration setting in Anthropic's content management system made all uploaded assets publicly accessible and searchable. By the time the error was identified and corrected, nearly 3,000 documents had been indexed. Among them was a draft blog post for an unreleased model codenamed Claude Mythos, which also introduced a new performance tier called Capybara, positioned above the company's existing flagship Opus architecture.

The document described Mythos as "by far the most powerful AI model we've ever developed." It warned that the model was "far ahead of any other AI model in cyber capabilities" and could enable "a wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders." According to the leaked materials, Mythos was capable of autonomously identifying zero-day vulnerabilities, remediating network breaches with minimal human intervention, and performing what security researchers had begun calling "autonomous vulnerability hunting" across complex, interconnected codebases — finding, verifying, and potentially executing targeted exploits without requiring human guidance at any step.

The timing could hardly have been worse. The leak landed during the final days of RSAC 2026, the industry's premier security conference. The market responded with something between a panic and a reckoning.

Company / Index	Ticker	Session Move	Analyst Read
CrowdStrike	CRWD	−5% to −8%	Signature-based defences pressured by AI-speed discovery
Palo Alto Networks	PANW	−5% to −8%	Prior $25B CyberArk acquisition questioned amid AI commoditisation
Okta	OKTA	−5% to −6%	Identity security exposed to AI-automated social engineering
Microsoft	MSFT	−3%	Security Copilot margins threatened by autonomous AI competitors
iShares Cybersecurity ETF	IHAK	−4%	Sector-wide re-evaluation of defensible moats

Raymond James analyst Adam Tindle outlined the structural concern in stark terms: defensive approaches built on known signatures, vulnerability databases, and prior threat intelligence telemetry could be undermined as AI enables continuous, autonomous discovery of novel attack vectors. The moat that had made pure-play cybersecurity firms worth their premium valuations — proprietary threat intelligence, human expertise, years of data accumulation — suddenly looked like it might be democratisable through an API call.

What the market was responding to was not simply the existence of a capable AI model. It was the implicit signal that general-purpose AI capability, when sufficiently advanced, begins to commoditise every vertical it touches — including the one theoretically responsible for keeping it safe.

Anthropic's response was careful and controlled. The company confirmed Mythos's existence, attributed the leak to "human error" in CMS configuration, and noted it was providing cybersecurity vendors early access to the model specifically to improve defensive capabilities. The irony of this framing — a company with a history of security incidents offering its most dangerous model as a gift to the defenders — was not lost on the research community.

Security Record

The Mythos leak was not Anthropic's first operational security failure. In January 2026, a flaw in Claude Cowork allowed attackers to exploit its API to steal user data days after launch. In late 2025, researchers demonstrated that Claude could be turned into a malware factory within eight hours. The company, publicly the most safety-focused lab in the industry, had quietly accumulated a pattern of practical security failures that sat in uncomfortable tension with its public positioning.

◆

III. AI With Hands: The Agent Explosion No One Fully Planned For

While the definitional wars over AGI played out in op-eds, something more empirically consequential was happening in practice. AI had acquired hands. The months between late 2025 and early 2026 saw a categorical shift from AI as conversational interface to AI as autonomous operator: systems that don't just answer questions but take actions, on real systems, with real consequences, often without the user watching.

The shift materialised in three distinct products that arrived in rapid succession, each representing a different point on the spectrum between open and closed, agentic and supervised, chaotic and controlled.

🦞

OpenClaw

Formerly Clawdbot · Nov 2025

Open-source agent by Peter Steinberger running locally on any OS, connecting to WhatsApp, Telegram, Gmail, and system files. 145,000+ GitHub stars. Renamed twice after trademark disputes with Anthropic. China restricted use in government agencies. Within 72 hours of going viral, security researchers found exposed admin panels and active infostealer campaigns targeting its configuration directories.

⚡

Perplexity Computer

Perplexity AI · Feb 25, 2026

Cloud-based multi-model orchestration platform that coordinates 19 different AI models — Claude Opus 4.6 for core reasoning, Gemini for deep research, Grok for lightweight tasks — to complete long-running workflows without user supervision. Available to Max subscribers at $200/month. VentureBeat described it as sitting "somewhere between OpenClaw and Claude Cowork."

🖥

Claude Cowork

Anthropic · Jan 12, 2026

Desktop agent built on Claude Code's architecture but designed for non-developers. Runs in a sandboxed Linux VM via Apple Virtualization Framework. Accesses designated folders, opens applications, fills spreadsheets, browses the web. Notably, the entire feature was reportedly built in approximately ten days using Claude Code itself — a recursive loop that should prompt either excitement or reflection, depending on your disposition.

These are not chatbots. They are digital operators. They do not wait to be asked — they execute workflows, manage files, book appointments, send emails, monitor systems, and, in the case of OpenClaw, create MoltMatch dating profiles for users who hadn't asked for one. The consent and accountability structures that govern human operators have not been applied to their AI equivalents. The legal and regulatory frameworks that might govern autonomous AI acting on a user's behalf remain, as of this writing, largely theoretical.

OpenClaw became the test case for what the category looks like without guardrails. Austrian developer Peter Steinberger launched it in November 2025 as a personal experiment. Within weeks, it was accumulating GitHub stars faster than DeepSeek-R1 had during its viral moment. Mac Minis sold out as developers configured dedicated servers to run it. Within 72 hours of mass adoption, security firm Guardz had documented active infostealer campaigns specifically targeting its configuration directories, which stored credentials in plaintext Markdown and JSON files. The default gateway also bound to all network interfaces without authentication.
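The two failure modes described above, a gateway listening on every interface and credentials readable in plaintext, are the kind of thing a short audit script can flag. What follows is a minimal sketch under stated assumptions, not OpenClaw's actual configuration format: the function name, its parameters, and the idea of passing in a list of credential files are all hypothetical, and the bind address is assumed to be an IP literal rather than a hostname.

```python
import ipaddress
import stat
from pathlib import Path


def audit_agent_config(
    bind_host: str, auth_enabled: bool, credential_files: list[Path]
) -> list[str]:
    """Return human-readable findings; an empty list means nothing was flagged.

    Hypothetical helper: the parameters are illustrative and do not mirror
    any real agent's config schema.
    """
    findings = []
    # 0.0.0.0 (IPv4) and :: (IPv6) mean "listen on every network interface".
    if ipaddress.ip_address(bind_host).is_unspecified:
        findings.append(f"gateway binds to all interfaces ({bind_host})")
    if not auth_enabled:
        findings.append("gateway authentication is disabled")
    for path in credential_files:
        mode = path.stat().st_mode
        # POSIX permission bits: flag files that group or other users can read.
        if mode & (stat.S_IRGRP | stat.S_IROTH):
            findings.append(f"{path.name} is readable by other local users")
    return findings
```

Tightening file permissions does not stop an infostealer running as the same user, so a check like this narrows exposure rather than removing it. Moving secrets out of plaintext files and into an OS keychain would be the stronger fix.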

Palo Alto Networks described it as a "lethal trifecta" of risk: access to private data, exposure to untrusted content, and the ability to perform external communications while retaining persistent memory. One of OpenClaw's own maintainers warned publicly that the project was "far too dangerous" for anyone who didn't understand command-line operations. China restricted state agencies from running it. Cisco's AI security team found a third-party skill performing data exfiltration and prompt injection without user awareness.
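The "lethal trifecta" is easy to state as a check: an agent is in the danger zone only when all three capabilities are present at once, which is why removing any one of them is a meaningful mitigation. A minimal sketch follows; the class and field names are hypothetical and chosen only to mirror the three conditions named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    """Hypothetical summary of what a deployed agent is allowed to do."""

    reads_private_data: bool
    processes_untrusted_content: bool
    can_communicate_externally: bool


def has_lethal_trifecta(caps: AgentCapabilities) -> bool:
    # All three must hold together; any single missing leg breaks the chain
    # from "attacker-controlled input" to "private data leaves the machine".
    return (
        caps.reads_private_data
        and caps.processes_untrusted_content
        and caps.can_communicate_externally
    )
```

An agent wired into Gmail, WhatsApp, and local files, as OpenClaw is described here, would set all three flags by default.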

The Prompt Injection Problem

When an AI agent reads a document, browses a webpage, or processes an email, it is exposed to content that could contain instructions designed to hijack its behaviour — redirecting it to exfiltrate data, perform unauthorised actions, or bypass safety measures. This is prompt injection, and it is one of the most intractable problems in agentic AI deployment. Unlike a human employee who can recognise a phishing attempt, an AI agent has no principled way to distinguish user instructions from adversarial instructions embedded in the data it processes. The attack surface grows with every tool integration.
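Because the agent cannot reliably tell the two kinds of instruction apart, one common mitigation is to stop asking it to: track where each piece of context came from, and require human confirmation for sensitive actions once any untrusted content has entered the conversation. The sketch below illustrates that idea only. It is not a solution to prompt injection, and the message type, the action names, and the choice of which actions count as sensitive are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Literal

# "user" = typed by the account owner; "tool" = fetched content (web, email, files).
Source = Literal["user", "tool"]


@dataclass(frozen=True)
class Message:
    source: Source
    text: str


# Hypothetical action names; a real agent would define its own set.
SENSITIVE_ACTIONS = {"send_email", "upload_file", "run_shell"}


def requires_confirmation(history: list[Message], proposed_action: str) -> bool:
    """True when a sensitive action is proposed after untrusted content was read."""
    # Treat the whole context as tainted once any tool-sourced text is present,
    # since the model may have been steered by it.
    tainted = any(m.source == "tool" for m in history)
    return tainted and proposed_action in SENSITIVE_ACTIONS
```

The design is deliberately coarse. It does not try to detect malicious text, which is the part with no principled answer; it only limits what a hijacked agent can do without a human noticing.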

Anthropic's response with Cowork was more considered. The system runs in a properly sandboxed VM environment, requires explicit folder permissions, surfaces its plan before executing each significant action, and logs activity locally. By the standards of the category, it is thoughtfully designed. But it still shipped with a security vulnerability that surfaced within days of launch, and the company's own documentation explicitly warns against using it for regulated workloads: no HIPAA, no FedRAMP, no financial services compliance environments.
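Explicit folder permissions come down to one question asked before every file operation: does the requested path, once fully resolved, still sit inside a folder the user approved? A minimal sketch of that check follows. It is an illustration of the general technique, not Cowork's implementation, and the function name and parameters are hypothetical. It relies on `Path.is_relative_to`, available from Python 3.9.

```python
from pathlib import Path


def is_path_allowed(requested: str, allowed_roots: list[Path]) -> bool:
    """Check a requested path against a user-approved folder allowlist."""
    # resolve() collapses ".." segments and follows symlinks first, so a path
    # like "work/../secrets.txt" is judged by where it actually points.
    target = Path(requested).resolve()
    return any(target.is_relative_to(root.resolve()) for root in allowed_roots)
```

Resolving before comparing is the important step; a plain string-prefix check would wave through traversal paths that merely begin with an approved folder name.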

Perplexity Computer sits in between: cloud-hosted (reducing local attack surface) but routing sensitive enterprise data — Snowflake queries, Salesforce records, legal contracts — through a three-year-old startup's infrastructure. The company launched Computer for Enterprise at its inaugural Ask 2026 conference in March, telling CISOs that its platform included a full audit trail and kill switch, then almost immediately acknowledged that enterprise features like audit logs and compliance APIs did not yet capture Cowork-style activity.

The agent explosion is not a problem in the conventional sense. These tools are genuinely powerful, genuinely useful, and represent a real step change in what knowledge workers can accomplish. The alignment question is not whether to deploy them. It is whether the assumptions embedded in their design — about trust, about consent, about the appropriate scope of autonomous action — will hold when adversaries begin exploiting them systematically at scale.

◆

IV. What the Alignment Problem Actually Is (And Why This Moment Changes It)

The alignment problem is often described in the abstract language of theoretical AI safety: will a superintelligent system pursue goals that are genuinely aligned with human values, or will it optimise for a proxy that diverges catastrophically from what we actually want? The paperclip maximiser. The genie that grants the letter of the wish rather than its spirit. The instrumental convergence thesis.

These framings are not wrong, but they have historically functioned as future-tense concerns — problems for an AI that has not yet arrived, requiring solutions that need not be implemented today. What 2026 has made unmistakably clear is that the alignment problem has a present-tense version that is already in deployment, already causing harm, and already being systematically underestimated by the institutions most responsible for solving it.

The present-tense alignment problem is not about superintelligence. It is about the gap between what an AI system is specified to do and what it actually does when deployed in the real world, at scale, with access to real systems. It is the OpenClaw instance that creates a dating profile without the user's knowledge. It is the Claude Cowork installation that leaks data through an API flaw three days after launch. It is the Mythos model that can autonomously discover and exploit zero-day vulnerabilities in codebases it has never seen before. It is the underground discussion of AI-assisted cybercrime that surged 1,500 percent between November and December 2025 alone.

Alignment Risks — Present Tense

Autonomous agents acting beyond the scope of user intent, without meaningful oversight or redress
Prompt injection enabling adversarial hijacking of agents through content they process
Credential theft and data exfiltration from poorly sandboxed local deployments
Advanced capability models (Mythos-class) lowering barriers to sophisticated cyberattacks
Regulatory frameworks written before agentic AI existed — governance lag measured in years
Recursive AI development loops creating systems whose construction is not fully understood by their builders

Alignment Progress — Present Tense

Constitutional AI training producing more predictable refusals on adversarial prompts than earlier approaches
Sandboxed VM execution environments reducing local attack surface for desktop agents
Anthropic's controlled rollout strategy for Mythos: early access limited to cyber-defence use cases only
ARC-AGI-3 providing a benchmark resistant to memorisation, giving genuine readout of generalisation progress
Community-driven security research exposing agentic vulnerabilities before state-level exploitation
Growing consensus that model capability and operational safety require separate, dedicated teams

The researchers who study alignment at institutions like the Machine Intelligence Research Institute, the Center for Human-Compatible AI, and Anthropic's own safety team would argue that none of what happened in March 2026 constitutes a fundamental misalignment in the technical sense. The models are doing roughly what they were designed to do. The failures are in deployment, in access controls, in operational security, in the speed at which capabilities are being exposed to a threat landscape that was not prepared for them.

That is a reasonable distinction. It is also, from a practical standpoint, nearly irrelevant. Whether a cyberattack is executed by a misaligned AI or a well-aligned AI deployed carelessly produces the same result. Whether a user's data is extracted by an adversarially hijacked agent or a misconfigured one is a philosophical distinction that offers little comfort to the person whose private files are now in a threat actor's hands.

The more uncomfortable insight — the one that the alignment research community has been reluctant to foreground — is that the present-tense alignment problem is being manufactured and deployed at exactly the pace required by the competitive dynamics of the AI industry. Labs cannot afford to slow down. First-mover advantage in agentic AI is measured in months, not years. Safety research, by its nature, moves slower than capability research. The gap between what these systems can do and our ability to verify that what they're doing is what we want them to do was always going to widen before it narrowed.

◆

V. The Anthropic Irony

There is a particular quality to Anthropic's position in this moment that demands examination. The company was founded explicitly on the thesis that the existing AI industry was moving too fast and too carelessly — that the safety-conscious faction needed its own well-resourced lab to pursue alignment research while building competitive models. Dario Amodei and his co-founders left OpenAI over these concerns. Constitutional AI is Anthropic's signature contribution to the field. Its public communications routinely emphasise caution, deliberation, and the primacy of safety over speed.

In early 2026, the company that embodies the safety-first brand built Claude Cowork, a powerful desktop agent, in approximately ten days using its own AI tools, rushed it to market as a research preview with a security flaw that was exploited within days, then accidentally exposed nearly 3,000 internal documents, including a draft announcement of its most dangerous model, through a configuration error that any competent content management audit could have caught.

Separately, the Trump administration blacklisted Anthropic in March 2026 after the company set limits on military use of its models — specifically refusing to allow Claude to be used in fully autonomous weapons systems or mass surveillance of Americans. Anthropic filed suit. A federal judge expressed concern that the designation appeared designed to "cripple" the company, possibly as retaliation for its public criticism. Whatever one thinks of the geopolitics involved, the episode illustrated that the company's safety commitments have real, costly consequences — and that there are powerful institutional actors who find those commitments inconvenient.

These contradictions do not invalidate Anthropic's safety research. The work being done on interpretability, Constitutional AI, and responsible scaling policies is substantive and important. But they do suggest that the structural pressures acting on even the most safety-conscious actor in the field are powerful enough to produce failures that the same actor would identify and criticise in others. The Mythos leak was not an act of cynicism. It was a failure of operational security at a company under enormous competitive pressure, building faster than its internal processes could safely handle, in an environment where the stakes are rising faster than the safeguards.

"The gap between what these systems can do and our ability to verify that what they're doing is what we want them to do was always going to widen before it narrowed."

Lisa Pedrosa · The Journal, March 2026

The alignment problem is not a problem that Anthropic, or any single organisation, can solve while simultaneously competing for market share, raising capital at $380 billion valuations, building new product categories in two-week sprints, and fighting the United States government in federal court. It is a coordination problem — one that requires shared standards, shared benchmarks, meaningful regulatory frameworks, and a competitive environment that doesn't penalise restraint.

None of those things currently exist in forms adequate to the challenge. The ARC-AGI-3 benchmark is a step toward shared standards. The EU AI Act represents a step toward meaningful regulation, albeit written before agentic AI became mainstream. The $2 million prize for the first system to genuinely solve ARC-AGI-3 is, in its own small way, a proposal about what we should actually be optimising for. These are not nothing. But they are operating on a different timescale from the systems they are meant to govern.

VI. The Timeline That Actually Happened

November 2025

Clawdbot Launched

Austrian developer Peter Steinberger releases Clawdbot, a local-first AI agent operating through messaging apps. Within weeks, it goes viral, sells out Mac Minis, and exposes serious security vulnerabilities in the agentic paradigm.

January 12, 2026

Claude Cowork Released

Anthropic launches Cowork as a research preview for Max subscribers, built in approximately ten days using Claude Code itself. Days later, a security flaw allows API-based data theft. The recursive loop that built the tool in record time is the same loop that may have contributed to the flaw.

January 27, 2026

Clawdbot → Moltbot → OpenClaw

Anthropic sends trademark complaints. The project renames twice in three days, each rebrand amplifying rather than diminishing attention. By February it has 145,000 GitHub stars and China has begun restricting its use in government agencies.

February 25, 2026

Perplexity Computer Launches

Perplexity launches a cloud-based agent orchestrating 19 AI models — including Claude Opus 4.6 as the core reasoning engine — for complex long-running workflows. Priced at $200/month. Over 100 enterprise customers request access in a single weekend.

March 23, 2026

Jensen Huang: "We've Achieved AGI"

Nvidia's CEO declares AGI achieved on Lex Fridman's podcast. The definition he uses — an AI capable of founding and running a billion-dollar tech company — is specifically calibrated to what current systems can demonstrably do.

March 25, 2026

ARC-AGI-3 Drops

The most rigorous AI benchmark yet published. Humans score 100%. The best frontier model scores 0.37%. The gap between what AI can reliably do and what humans can reliably do — on genuinely novel problems — remains, by this measure, effectively total.

March 26–27, 2026

The Mythos Meltdown

A CMS misconfiguration exposes nearly 3,000 Anthropic documents, revealing Claude Mythos, described in the leaked draft as by far the most powerful model the company had developed and far ahead of any other in cyber capabilities. Cybersecurity stocks drop up to 8%. The iShares Cybersecurity ETF falls 4%. The irony is complete.

