AI Found a 27-Year-Old Bug: Project Glasswing Dev Guide 2026

Nexgismo

2 months ago

TL;DR

What: Anthropic’s Project Glasswing used Claude Mythos Preview to find thousands of zero-day vulnerabilities — including a 27-year-old OpenBSD bug — in every major OS and browser.
Why it matters: AI can now find bugs that survived decades of human review and millions of fuzz tests, shifting the security balance permanently.
What to do: Use the free open-source defending-code-reference-harness or Claude Code’s security plugin to scan your own codebase starting today.
Key shift: Discovery is no longer the bottleneck — verification, triage, and patching are. Plan your workflow accordingly.

Project Glasswing is Anthropic’s initiative to use frontier AI models for defensive cybersecurity at scale. It brings together twelve industry partners, including AWS, Microsoft, Google, and Cisco, and commits $100M in AI usage credits to find and fix vulnerabilities in the world’s most critical software before attackers get there first. The underlying model, Claude Mythos Preview, scored 83.1% on CyberGym (cybersecurity vulnerability reproduction), compared to 66.6% for the previous Opus 4.6: a 16.5-point jump that suggests a real gain in bug-finding capability.

On June 7, 2026, Anthropic’s Project Glasswing became one of the biggest stories in developer security. An AI model found a 27-year-old vulnerability in OpenBSD — one of the most security-hardened operating systems in existence — that allowed a remote attacker to crash any machine just by connecting to it. The same model also uncovered a 16-year-old bug in FFmpeg, in a line of code that automated testing tools had hit five million times without catching the problem. And it chained together multiple Linux kernel vulnerabilities to escalate from ordinary user access to full root control — autonomously, with no human steering. As of May 22, 2026, Anthropic had disclosed 1,596 vulnerabilities through open-source scanning, with 97 already patched. This post covers what Project Glasswing actually is, how it works, and — most importantly — what you can start using in your own projects right now.

What is Project Glasswing and why should developers care?

Project Glasswing is a cross-industry security initiative where Anthropic, AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks pooled resources to put Claude Mythos Preview to work on defensive security. Anthropic committed $100M in model usage credits and $4M in direct donations to open-source security organizations — $2.5M to Alpha-Omega and OpenSSF through the Linux Foundation, and $1.5M to the Apache Software Foundation.

The reason this matters for everyday developers is the threshold that has been crossed. Mythos Preview scored 93.9% on SWE-bench Verified and 83.1% on CyberGym — benchmarks that measure real-world coding and security reproduction ability. That 83.1% figure means: given a known vulnerability, the model can reproduce the exploit eight times out of ten. Humans at top security firms average around the same rate. This is not a marginal improvement — it is the first time an AI model has reached parity with elite human security researchers at finding and exploiting flaws.

The practical implication: attackers who get access to similarly capable models will be able to find vulnerabilities in your software faster than your team can audit it. Defenders who use these tools first have a durable advantage. That is the entire premise of Glasswing — and why twelve of the world’s largest technology companies joined before the public announcement.

How did AI find bugs that survived decades of human review?

The short answer: it reads code the way a senior security researcher does — following variables across files, reasoning about which inputs can be trusted, and thinking about how components interact — rather than just throwing random inputs at a binary and hoping for a crash.

Traditional fuzz testing is powerful but blind. It generates millions of random inputs and watches for crashes. The FFmpeg bug that Mythos Preview found had been hit by fuzz testing five million times without triggering, because the vulnerable path required a specific sequence of state that random generation almost never produces. Mythos Preview read the source code, understood the state machine, and constructed a targeted proof of concept in minutes.
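To make the "specific sequence of state" point concrete, here is a minimal toy sketch. It is not FFmpeg code: `toy_decoder`, the four-byte `MAGIC` trigger, and the 8-byte input length are all hypothetical, chosen only to show why purely random inputs rarely reach a path that needs several steps in the right order.

```python
import random

# Toy stand-in for a stateful decoder; NOT FFmpeg code.
# MAGIC is a hypothetical trigger: the bug is reached only when these
# four bytes arrive consecutively and in this order.
MAGIC = bytes([0x10, 0x20, 0x30, 0x40])


def toy_decoder(data: bytes) -> bool:
    """Return True if the input drives the decoder into the simulated buggy state."""
    state = 0
    for byte in data:
        if byte == MAGIC[state]:
            state += 1
            if state == len(MAGIC):
                return True  # all four steps happened in order
        else:
            # A wrong byte resets progress, but it may itself start a new match.
            state = 1 if byte == MAGIC[0] else 0
    return False


def random_fuzz(trials: int, length: int = 8, seed: int = 0) -> int:
    """Blind fuzzing: count how many random inputs reach the buggy state."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        data = bytes(rng.randrange(256) for _ in range(length))
        hits += toy_decoder(data)
    return hits


def hit_probability(length: int = 8) -> float:
    """Upper-bound estimate of the chance that one random input contains MAGIC."""
    positions = length - len(MAGIC) + 1
    return positions / 256 ** len(MAGIC)
```

Under these assumptions, `hit_probability()` is 5 in 2^32 per input, so a blind fuzzer would need on the order of hundreds of millions of inputs to expect a single hit, while a reader of the source can simply pass `b"\x00" + MAGIC`. Real coverage-guided fuzzers are smarter than this toy loop, but the same gap appears whenever the trigger depends on state that coverage feedback does not reward.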

The 27-year-old OpenBSD vulnerability is even more striking. OpenBSD has a reputation as the most security-audited open-source operating system in the world — every line of its network stack has been manually reviewed multiple times over nearly three decades. The model found a remote crash vulnerability that human reviewers missed across dozens of audit cycles. The Linux kernel chain was more sophisticated still: Mythos Preview identified several independent vulnerabilities and recognized that chaining them together allowed privilege escalation from user to root. A human researcher might spot each flaw individually but not see the chain. The AI reasoned about all of them simultaneously.

If you have been following how AI is already changing developer workflows, this is the same pattern: AI doesn’t replace human judgment, but it can process context at a scale and speed that humans cannot match.

How does the Defending Code Reference Harness work?

The defending-code-reference-harness on GitHub is Anthropic’s open-source reference implementation — a 7-stage pipeline you can clone, customize, and run against your own codebase. Think of it as a blueprint, not a finished product.

The seven stages are: build (compile the target into a Docker container with ASAN, the memory error detector), recon (a lightweight AI agent partitions the codebase into distinct attack surfaces worth exploring separately), find (multiple agents run in parallel, each crafting inputs and looking for reproducible crashes — three consecutive crashes with the same input are required before a finding is recorded), verify (a separate grader agent reproduces each crash in a fresh container the find agent never touched), dedupe (a judge agent compares findings against previously known bugs to filter duplicates), report (structured exploitability analysis is generated for each unique finding, including severity and escalation path), and patch (a patch agent writes a proposed fix, then a grader confirms the binary still builds, the proof-of-concept no longer crashes, existing tests still pass, and fresh find agents cannot bypass the fix).
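The two gating rules in that description (three consecutive crashes before a finding is recorded, and four checks before a patch is accepted) can be sketched in a few lines. This is an illustrative sketch only, not the harness's actual code: the function names, signatures, and the `crashes` callback are hypothetical.

```python
from typing import Callable

# Stage names and order as described in the article.
STAGES = ["build", "recon", "find", "verify", "dedupe", "report", "patch"]

# The article states three consecutive crashes with the same input are required.
REQUIRED_CONSECUTIVE_CRASHES = 3


def is_reproducible(crashes: Callable[[bytes], bool], candidate: bytes,
                    required: int = REQUIRED_CONSECUTIVE_CRASHES) -> bool:
    """Record a finding only if the same input crashes `required` times in a row.

    `crashes` is a hypothetical callback that runs the target once and
    reports whether it crashed.
    """
    return all(crashes(candidate) for _ in range(required))


def patch_accepted(builds: bool, poc_still_crashes: bool,
                   tests_pass: bool, bypass_found: bool) -> bool:
    """Mirror the four grader checks listed for the patch stage."""
    return builds and not poc_still_crashes and tests_pass and not bypass_found
```

The design point is that every gate is a conjunction: one flaky run or one failing check is enough to reject, which trades some recall for findings and fixes that reviewers can trust.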

The ramp-up is designed to get you hands-on within the first week: Day 1 covers threat modeling and a first static scan, Day 2 runs the full autonomous pipeline on a known-vulnerable open-source library, and Days 3–5 customize the pipeline for your stack. Week 2 begins autonomous scanning at scale. The reference implementation targets C/C++ memory bugs but is explicitly designed to be adapted to other languages and vulnerability classes.

This is the same philosophy we cover in our SOLID design principles guide — separating concerns into distinct, verifiable stages makes complex systems auditable and trustworthy.

What security tools can developers use right now?

You do not need to wait for Project Glasswing partner access or enterprise pricing. Three options are available today:

Option 1 — Free Claude Code security plugin. Anthropic released a free security-guidance plugin for the Claude Code CLI that integrates real-time vulnerability detection directly into your terminal workflow. If you are already running Claude Code, install the plugin and it will flag potential security issues as you write code — before anything reaches a pull request.

Option 2 — Defending Code Reference Harness (self-hosted). Clone anthropics/defending-code-reference-harness on GitHub and run /quickstart in Claude Code. The quickstart walks you through threat modeling, sandboxed scanning, and triage on a demo target. You can run this against your own repo once you have worked through the demo. Requires Docker, gVisor for sandboxing, and an Anthropic API key.
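Before cloning, it can help to confirm the three prerequisites listed for Option 2. The sketch below is an assumption-laden preflight check, not part of the harness: it assumes Docker is on your PATH as `docker`, gVisor is installed as its `runsc` runtime binary, and the API key is exposed as the `ANTHROPIC_API_KEY` environment variable (the harness may read its configuration differently). The function name is hypothetical.

```python
import os
import shutil


def missing_prerequisites(env=None, which=shutil.which):
    """Return the prerequisites that appear to be missing on this machine.

    `env` and `which` are injectable so the check can be tested without
    touching the real environment.
    """
    env = os.environ if env is None else env
    missing = []
    if which("docker") is None:
        missing.append("docker")
    if which("runsc") is None:  # runsc is gVisor's runtime binary
        missing.append("gVisor (runsc)")
    if not env.get("ANTHROPIC_API_KEY"):  # assumed variable name
        missing.append("ANTHROPIC_API_KEY")
    return missing
```

Calling `missing_prerequisites()` with no arguments checks the current machine; an empty list means all three look present, though it does not prove Docker is actually configured to use gVisor.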

Option 3 — Claude Security (managed product). The managed product, powered by Opus 4.7, scans entire codebases by reasoning about component interactions and data flow. Findings appear in a dashboard with confidence ratings and severity scores. It is currently in limited beta for Claude Enterprise customers, with Team and Max plan support coming soon. Nothing is applied without your explicit approval — every suggested patch is presented for human review.

There is also a claude-code-security-review GitHub Action that uses Claude to analyze pull requests for security vulnerabilities automatically — a lower-friction entry point if you want CI/CD integration without setting up the full reference harness.

For web developers building AI-powered features, combining security scanning with smart input validation is the next logical step — something we explored in our guide to building smart web forms with AI and JavaScript.

Is the cost of AI vulnerability scanning worth it?

Each agent in the reference harness consumes roughly 10,000 uncached input tokens and 2,000 output tokens per minute. For a company with 100 developers running scans continuously, Anthropic estimates annual token costs around $2.5M. That sounds steep until you benchmark it against incident costs. A single critical vulnerability exploited in production typically runs $1M–$10M in incident response, legal liability, customer churn, and reputation damage. Through that lens, even continuous scanning is cheap insurance.

But continuous scanning is not the expectation. The practical approach is targeted: scan critical network-facing components, new features before shipping, and third-party dependencies with known vulnerability histories. Scoped scans drop token consumption by 80–90% while covering the highest-risk surface area.
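The arithmetic behind these figures is easy to check. The sketch below uses only the numbers quoted above (per-agent token rates, the $2.5M continuous estimate, and the 80–90% reduction); the per-million-token prices are deliberately left as parameters because the article does not state them, so any values you pass in are your own assumption.

```python
INPUT_TOKENS_PER_MIN = 10_000      # uncached input tokens per agent (from the article)
OUTPUT_TOKENS_PER_MIN = 2_000      # output tokens per agent (from the article)
CONTINUOUS_ANNUAL_USD = 2_500_000  # quoted estimate for a 100-developer org


def tokens_per_agent(minutes: int) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) one agent consumes in `minutes`."""
    return INPUT_TOKENS_PER_MIN * minutes, OUTPUT_TOKENS_PER_MIN * minutes


def agent_cost_usd(minutes: int, usd_per_m_input: float,
                   usd_per_m_output: float) -> float:
    """Cost of one agent, given prices per million tokens (caller-supplied)."""
    tokens_in, tokens_out = tokens_per_agent(minutes)
    return tokens_in / 1e6 * usd_per_m_input + tokens_out / 1e6 * usd_per_m_output


def scoped_annual_usd(reduction_percent: int) -> int:
    """Annual cost after cutting token use by `reduction_percent` (80 to 90)."""
    return CONTINUOUS_ANNUAL_USD * (100 - reduction_percent) // 100
```

One agent-hour works out to 600,000 input and 120,000 output tokens, and applying the 80–90% reduction to the $2.5M estimate gives a scoped range of roughly $250,000 to $500,000 per year, assuming cost scales linearly with tokens.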

The real cost insight that most coverage misses: the bottleneck has shifted. Discovery is now trivially parallelizable — you can spin up 50 find agents and cover thousands of code paths simultaneously. The constraint is now verification, triage, and patching. If your team cannot review and act on 200 findings per week, scaling the find stage further just creates noise. Size your pipeline to match your triage capacity, not your scanning ambition.
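Sizing the find stage to triage capacity is a one-line calculation once you know how many verified findings each agent tends to produce. The sketch below is illustrative: the 200-findings-per-week capacity comes from the example above, while the per-agent yield is a hypothetical number you would need to measure on your own codebase.

```python
def max_find_agents(triage_capacity_per_week: int,
                    findings_per_agent_per_week: float) -> int:
    """Largest agent count whose weekly output the team can still review."""
    if findings_per_agent_per_week <= 0:
        raise ValueError("findings_per_agent_per_week must be positive")
    return int(triage_capacity_per_week // findings_per_agent_per_week)


def weekly_backlog_growth(agents: int, findings_per_agent_per_week: float,
                          triage_capacity_per_week: int) -> float:
    """Unreviewed findings added to the backlog each week (0 if capacity suffices)."""
    produced = agents * findings_per_agent_per_week
    return max(0.0, produced - triage_capacity_per_week)
```

With a capacity of 200 findings per week and an assumed yield of 8 verified findings per agent per week, `max_find_agents(200, 8)` gives 25 agents; running 50 agents at the same yield would add 200 unreviewed findings to the backlog every week.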

Tool	Cost	Setup	Best For
Claude Code Security Plugin	Free (existing plan)	Minutes	Real-time dev feedback in terminal
claude-code-security-review (GitHub Action)	API tokens per PR	1 hour	CI/CD automated PR scanning
Defending Code Reference Harness	API tokens (scalable)	1–5 days	Deep autonomous scanning, custom stacks
Claude Security (managed)	Enterprise/Team plan	Hours	Full codebase scan, dashboard, no DevOps

Key Takeaways

Claude Mythos Preview found vulnerabilities in every major operating system and browser, including a 27-year-old OpenBSD bug and a 16-year-old FFmpeg flaw that 5 million fuzz-test runs never caught.
As of May 22, 2026, Anthropic disclosed 1,596 vulnerabilities through open-source scanning — 97 have already been patched by maintainers.
The bottleneck in AI-powered security has shifted from discovery (now trivially parallelizable) to verification, triage, and patching — size your pipeline to your review capacity.
Three free or low-cost entry points exist today: the Claude Code security plugin, the claude-code-security-review GitHub Action, and the open-source defending-code-reference-harness.
Twelve industry partners including AWS, Microsoft, Google, and Cisco are already using Claude Mythos Preview for defensive security — the technology is production-proven, not experimental.

Frequently Asked Questions

What is Project Glasswing from Anthropic?

Project Glasswing is a cross-industry security initiative launched by Anthropic in 2026, bringing together AWS, Apple, Cisco, Microsoft, Google, CrowdStrike, NVIDIA, and others. It uses Claude Mythos Preview — a frontier AI model — to find and fix vulnerabilities in critical software before attackers can exploit them. Anthropic committed $100M in usage credits and $4M in open-source donations to the effort.

How did AI find a 27-year-old bug that humans missed?

Claude Mythos Preview discovered the OpenBSD vulnerability by reading and reasoning about code the way a skilled security researcher would — following variables across files and understanding data flow. Traditional fuzz testing had run the same code millions of times without catching the flaw. The AI’s ability to reason about code context, not just pattern-match inputs, is what made the difference.

Can I use Anthropic’s AI vulnerability scanner today?

Yes. Three options are available: (1) Clone the defending-code-reference-harness on GitHub and run /quickstart in Claude Code for a self-hosted autonomous scanner. (2) Install the free Claude Code security plugin for real-time terminal vulnerability detection. (3) Claude Security (managed, Opus 4.7-powered) is in beta for Claude Enterprise customers, with Team and Max plan support coming soon.

What is the Defending Code Reference Harness?

The Defending Code Reference Harness is an open-source framework (anthropics/defending-code-reference-harness) implementing a 7-stage autonomous scanning pipeline: build, recon, find, verify, dedupe, report, and patch. It is a customizable reference implementation — not a plug-and-play product — that you adapt for your specific codebase, language, and threat model.

Is AI vulnerability scanning worth the cost for small teams?

For most small teams, the lowest-effort options (the Claude Code security plugin and the Claude Security beta) cost nothing extra on existing plans. The self-hosted harness consumes API tokens, roughly $2.5M/year for a 100-developer org scanning continuously, but targeted scans of critical components cost a fraction of that. A single high-severity breach typically costs far more than any scanning bill.

The arrival of Project Glasswing changes the security calculus for every team that ships software. For years, the implicit assumption was that vulnerabilities would be found eventually, whether by a security researcher, a bug bounty hunter, or unfortunately an attacker. That assumption held because finding bugs at scale required rare human expertise. That constraint no longer exists. AI can now scan codebases continuously, reason about code context, and surface vulnerabilities that survived decades of human review. The teams that adapt first, starting with the free tools available today, will have a meaningful head start. Try the Claude Code security plugin in your next project, explore the reference harness on a low-risk internal tool, and think about where your highest-risk code actually lives.