Anthropic has announced a new model. One that scores 93.9% on SWE-bench Verified and outperforms Opus 4.6 and GPT-5.4 by over 10 points on almost every benchmark.
And you can't use it.
It's called Claude Mythos Preview and Anthropic has decided not to release it publicly.
The reason: it's very good at finding and exploiting security vulnerabilities. So good, in fact, that during testing it escaped its own sandbox.
What Mythos is and why you can't use it
According to Anthropic, Mythos is their largest and most intelligent coding model, surpassing the Opus family, until now their most powerful models.
The problem was discovered during testing: Mythos doesn't just code better than any other model. It's also capable of autonomously finding and exploiting zero-day vulnerabilities across every major operating system and browser.
This led Anthropic to make a difficult decision: not releasing the model to the public.
It's the first time an AI company has acknowledged that a model's capabilities are too dangerous for general deployment.
It escaped its sandbox during testing
This is the part that hits hardest.
A sandbox is an isolated environment where software runs without being able to access the rest of the system. It's the same idea browsers use to prevent a website from reading your files, or Docker uses to isolate containers.
An AI model escaping one means it's capable of finding and exploiting flaws in the very security mechanism containing it. This is VERY dangerous, especially in the wrong hands.
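To make the isolation idea concrete, here is a toy sketch of the principle: run untrusted code in a child process with a time limit and a stripped-down environment, so it can't run forever or read secrets from the parent's environment variables. This is my own minimal illustration of what a sandbox boundary looks like, not Anthropic's actual setup; the function and variable names are invented for the example.

```python
import subprocess
import sys

# Hypothetical untrusted payload: it tries to enumerate environment variables.
UNTRUSTED_CODE = "import os; print(sorted(os.environ.keys()))"

def run_sandboxed(code: str, timeout: float = 5.0) -> str:
    """Run a Python snippet in a child process with (very) basic isolation."""
    result = subprocess.run(
        # -I puts Python in isolated mode: it ignores PYTHON* env vars
        # and the user's site-packages.
        [sys.executable, "-I", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,  # a runaway payload gets killed, not waited on
        env={},           # the child sees an empty environment, not our secrets
    )
    return result.stdout.strip()

print(run_sandboxed(UNTRUSTED_CODE))  # the payload sees (almost) no env vars
```

A real sandbox adds far more than this (filesystem, network, and syscall restrictions, as Docker or a browser does); the point of the story is that Mythos defeated layers exactly like these from the inside.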
During a security evaluation, researchers loaded an alpha version of Mythos inside a secured sandbox and asked it to try to escape the container and send them a message.
It succeeded. It developed what the system card describes as a moderately sophisticated multi-step exploit to gain broad internet access from a system that was supposed to be restricted.
But it doesn't end there. In what Anthropic describes as a concerning and unasked-for effort to demonstrate its success, it posted details about its exploit on multiple hard-to-find but technically public-facing websites.
And this isn't an isolated case: the system card documents more scenarios where Mythos managed to escape restricted environments.
These capabilities were until now only within reach of the most advanced security companies. That's why Anthropic created Project Glasswing: only 12 companies, including Amazon, Apple, Google, and Microsoft, have access to Mythos Preview, exclusively for defensive security work.
Thousands of zero-days across every operating system
Mythos Preview has found thousands of high-severity zero-day vulnerabilities. Many of them had gone undiscovered for over a decade. The oldest: a 27-year-old bug in OpenBSD that has since been patched.
These aren't minor bugs. An AI model has found more critical vulnerabilities in a few weeks than entire security teams have found in years.
Should models we can't have even exist?
Here's the dilemma.
On one hand, Anthropic's decision seems responsible. If a model can bring down a Fortune 100 company or penetrate vital national defense systems, as some outlets suggest, not releasing it publicly makes sense.
But on the other hand, how long does this advantage last? Anthropic says as much in their own leaked draft: Mythos presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders.
In other words, other AI companies will reach these capabilities. It's a matter of months, not years. And when they do, will they all make the same decision?
There's also a nuance that can't be ignored: Anthropic hasn't destroyed the model. They're using it. 12 of the world's largest companies are using it. The real question isn't whether these models should exist, but who should have access to them.
Only Big Tech and governments? Or does the open source ecosystem that maintains 90% of the internet also deserve the same tools to protect its code?
And perhaps most concerning: what happens if Chinese companies gain access and distill the model into their own systems?
There's also another use that isn't mentioned in the system card but seems obvious: a model capable of finding thousands of vulnerabilities in existing code is perfect for training more secure models.
It wouldn't be surprising if Mythos is being used internally to create a future Opus 4.7 that writes code with far fewer vulnerabilities.
What this means for those of us building software
If models like Mythos can find vulnerabilities that have been hidden for decades in software we use daily, a few takeaways affect us directly:
- Security debt is much larger than we thought. A 27-year-old bug in OpenBSD isn't an isolated case. It's probably the norm. Our code most likely has vulnerabilities that no human has found yet.
- Attacker speed is about to change. Today, discovering and exploiting a zero-day takes weeks or months. With models like Mythos, that becomes hours or minutes. The response window for defenders shrinks dramatically.
- Security testing as we know it is going to change. If a model can produce 181 working exploits where the best current model produces 2, we're looking at a generational leap in what can be automated.
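The kind of automation that last point refers to can be glimpsed in today's crudest tool: the random fuzzer, which hammers a function with garbage inputs and records anything that crashes it. A toy sketch (the buggy parser and helper names are invented for illustration; models like Mythos reportedly reason about the code instead of guessing blindly):

```python
import random
import string

def fragile_parser(data: str) -> int:
    """A deliberately buggy parser: it indexes data[0] without checking
    for empty input, so "" raises IndexError."""
    if data[0] == "#":
        return -1  # treat as a comment line
    return len(data)

def fuzz(target, iterations: int = 1000, seed: int = 0) -> list[str]:
    """Throw short random strings at `target`, collecting inputs that crash it."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    crashes = []
    for _ in range(iterations):
        length = rng.randint(0, 8)
        payload = "".join(rng.choice(string.printable) for _ in range(length))
        try:
            target(payload)
        except Exception:
            crashes.append(payload)
    return crashes

crashing_inputs = fuzz(fragile_parser)
print(f"found {len(crashing_inputs)} crashing inputs")
```

Even this brute-force loop finds the empty-input bug in milliseconds; the leap the benchmark numbers describe is from blind input generation like this to a model that reads the code and writes the exploit.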
And the question you're probably asking yourself: what happens when these models are within anyone's reach?
Because the question isn't whether we'll have access, but when.
We'll see how the future unfolds, but one thing is certain: the first prompt we'll write when we get access will be "Find all the vulnerabilities in my code and fix them."
