The interpreter we forgot to sandbox

Fri, 19 Jun 2026 00:00:00 +0000

I write a CLAUDE.md for every project I work on, and a small pile of other markdown files besides. They’re how I keep an AI agent on the rails: what the project is, what the conventions are, what it must never do. I lean on them heavily, I change them constantly, and… here’s the uncomfortable bit… I don’t always give a change to one of them the same hard look I’d give a change to the code. They look like notes. They feel like docs.

Somebody worked out that they’re not.

In May, a supply-chain campaign researchers named TrapDoor pushed 384 malicious versions of 34 packages across npm, PyPI and Crates.io. The code did the usual nasty things, hunting out SSH keys, AWS credentials, GitHub tokens and crypto wallets. The new trick was where it hid the instructions. The packages shipped poisoned .cursorrules and CLAUDE.md files, and the attackers also opened pull requests against real projects (LangChain, LangFlow, LlamaIndex, MetaGPT and OpenHands) under titles as innocent as “docs: add .cursorrules with dev standards and build verification”. The payload was a plain-English instruction telling your AI assistant to run a helpful-sounding “security scan” that quietly shipped your secrets to a stranger. And it was written into the file in zero-width Unicode, characters that render as nothing, so you wouldn’t see it even if you looked. Which, on a file marked “docs”, you probably didn’t.

Not a new attack, a new doorway

I want to be careful not to oversell this, because the loud version, “a terrifying new class of AI threat”, isn’t true. It’s a supply-chain attack, the same shape we’ve had for years on npm and PyPI: social engineering, plus a victim who didn’t quite do enough due diligence. I wrote a while back that nobody is coming to clean your supply chain, and nothing about TrapDoor changes that. The package is still the package.

What’s different, and worth the words, is where it goes off. A classic supply-chain payload waits for CI, or for production. This one detonates the moment you open the repository in your editor, on the one machine in the whole chain that nobody audits: your laptop.

Think about what sits on a developer’s machine. Tokens in environment variables. Cloud credentials. An SSH agent holding the keys to your git forge. A logged-in CLI for your package registry. And now an AI agent running with all of it, at your full permissions, and almost none of the guard-rails a CI runner gets. It’s the least sandboxed, most credentialed box you own, and we’ve just pointed an interpreter at it that will read and act on a file an attacker can write. Pop that one machine and you haven’t popped a machine, you’ve been handed the whole keyring and left alone in the building.

Markdown is a programming language now

Here’s the framing I keep coming back to, and I can’t unsee it now. A CLAUDE.md is to an AI agent exactly what a .py is to Python, a .js to Node, a .rb to Ruby. It is source code. The agent is the interpreter. You hand it a file of instructions and it executes them.

And I don’t say that as a complaint. That an agent will read a paragraph of plain English and just do it, no compiler, no ceremony, no forty lines of glue, is one of the more remarkable things to happen to this craft in my working life, and I lean on it every day. The catch is that the very thing that makes it marvellous, that it does what the instructions tell it, is the thing that makes a poisoned instruction file so dangerous. The power and the exposure are the same property.

The only real difference is that conventional language interpreters, and the runtimes around them, have spent decades growing rules to protect you: scopes, permissions, sandboxes, a standard library that asks before it does anything irreversible. The AI interpreter has almost none of that. It reads your prose and does what the prose says, with whatever access you happen to have, and the prose can come from anywhere. We’ve quietly built the most powerful interpreter in the stack, given it the fewest rules, and filed its source code under “documentation”.

You can’t just read it more carefully

The obvious answer is “review the file like code”, and it’s right, but TrapDoor is the reason it isn’t enough on its own. The instructions were written in zero-width Unicode. You can open the diff, read every visible word, approve it in good conscience, and merge something you were never able to see. “Docs: add dev standards” is precisely the pull request you nod through on a Friday afternoon.
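To make "something you were never able to see" concrete, here is a minimal sketch of a check that reports invisible characters in an instruction file. It leans on the Unicode "Cf" (format) category, which covers zero-width spaces and joiners, bidirectional controls and the byte-order mark. The function name find_invisible is mine, not from any existing tool, and treating every Cf character as suspicious is an assumption that will produce some false positives (soft hyphens, joiners inside emoji sequences).

```python
import unicodedata


def find_invisible(text):
    """Report every Unicode format (Cf) character in the text.

    Returns a list of (line, column, code point, name) tuples, 1-indexed.
    Assumption: anything in category Cf is worth a human look. That is a
    deliberately blunt rule, not a complete model of hidden-text tricks.
    """
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf":
                hits.append(
                    (line_no, col, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
                )
    return hits
```

Run over the contents of a changed CLAUDE.md or .cursorrules, an empty result means no format characters were found. It does not mean the visible prose is benign, so it complements reading the diff rather than replacing it.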

So reading carefully is necessary and insufficient. You also need tooling that treats these files as executable: that flags invisible characters, diffs them as code, and refuses to let an agent act on a changed instruction file until a human has actually cleared it. I run a crude version of this already. In CI, if one of my prompt or rules files changes, no AI step is allowed to run until I’ve reviewed it by hand. It isn’t clever, but it closes the worst of the gap. Locally it’s much harder, and right now my real defence is that I’m the only contributor to most of my projects, so the audit is just me, usually noticing after the horse has bolted.
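The CI gate described above can be sketched in a few lines. This is an illustration under assumptions, not my actual pipeline: the set of instruction file names, the idea of a human-maintained table of approved SHA-256 digests, and every function name here are hypothetical.

```python
import hashlib

# Assumption: base names an agent treats as instructions. Extend for your setup.
INSTRUCTION_FILES = {"CLAUDE.md", ".cursorrules"}


def is_instruction_file(path):
    """True if the path's base name is one of the known instruction files."""
    return path.rsplit("/", 1)[-1] in INSTRUCTION_FILES


def digest(data):
    """SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def unapproved(files, approved):
    """List instruction files whose current content no human has cleared.

    files:    {path: bytes} for the files in the checkout
    approved: {path: sha256 hex} recorded by a reviewer after reading the file
    """
    return sorted(
        path
        for path, data in files.items()
        if is_instruction_file(path) and approved.get(path) != digest(data)
    )


def agent_may_run(files, approved):
    """The gate: AI steps run only when every instruction file is cleared."""
    return not unapproved(files, approved)
```

Hashing raw bytes rather than decoded text is the point of the design: an invisible character changes the digest even though the rendered diff looks identical, so the gate closes on edits a reviewer could not have seen.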

Signing won’t save you here

This is the part that stings, because I’ve spent a good chunk of this year building signing and provenance into my tools. A signature proves who published something. It says nothing about whether it’s safe. That was already true for poisoned-but-signed packages, and it lands twice as hard here: you can sign a release flawlessly, with a key the platform can’t forge, and still ship a CLAUDE.md inside it that tells the reader’s agent to rob them. A merged pull request is “signed” by the very act of merging, with perfect provenance, and the instruction in it is still hostile. Provenance is necessary. It was never sufficient, and it’s no defence at all against a payload made of sentences. A signature is only ever as good as the trust you place in the publisher.

So whose job is it?

Primarily, still ours. I said it in the supply-chain piece and I’ll stand on it: the responsibility sits with the developer doing the consuming, to pin, to read, to gate, to not run a stranger’s instructions with the keys to the kingdom in their pocket. And that gets harder, not easier, as we start consuming each other’s agent setups wholesale. The Claude skills marketplace and things like it turn “borrow someone’s CLAUDE.md” into a one-click habit, and every one of those is unreviewed code from a stranger. Each skill needs vetting like the dependency it is.

But it isn’t only on us, and TrapDoor is the argument for better tooling. We have CVE databases, scanners and scorecards for packages, for all their flaws. We have nothing equivalent for an instruction file: no scoring, no advisory feed, no scanner that knows what a poisoned CLAUDE.md looks like. That’s a gap the ecosystem has to close, and it will, eventually. The catch is that the agent vendors will be slow about it. Sandboxing a feature people love precisely because it gets out of your way is a hard, unpopular, multi-quarter job, and I wouldn’t hold my breath.

The most dangerous machine is the one on your desk

Which is why I’m not waiting for them… and nor should you.

The most dangerous machine in your supply chain isn’t a build server or a registry. It’s the laptop you’re reading this on, and we’ve handed an AI the keys to it. The good news is that nearly everything you can do about that, you can do today, with nobody shipping you a feature first. Treat your CLAUDE.md and your rules files as source code, because they are: diff them, scan them for what you can’t see, and gate any agent run on a human clearing the change. Get your secrets out of plaintext environment variables and into something an opportunistic script can’t just read, which is exactly why go-tool-base keeps its credentials in the OS keychain. And vet a borrowed skill or rules file the way you’d vet any dependency, because that’s what it is.
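As a first step towards getting secrets out of plaintext environment variables, it helps to know which ones are there. The sketch below lists variable names that look like credentials. The name pattern is a guess on my part and will both miss things and over-report, and the function names are hypothetical. It only ever returns names, never values.

```python
import os
import re

# Assumption: a rough heuristic for credential-like variable names.
SECRET_NAME = re.compile(
    r"(TOKEN|SECRET|PASSWORD|PASSWD|API_?KEY|ACCESS_?KEY|PRIVATE_?KEY)",
    re.IGNORECASE,
)


def secret_like_names(env):
    """Return the sorted variable names in env that match the heuristic."""
    return sorted(name for name in env if SECRET_NAME.search(name))


def audit_current_env():
    """Apply the heuristic to this process's own environment."""
    return secret_like_names(os.environ)
```

Anything this turns up is readable by every process you start, an agent's shell included, which makes it a candidate for moving into the OS keychain or a similar store.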

None of that is new advice. It’s the same diligence the supply chain has always demanded. We just have to extend it to a file we’d decided was only documentation, running on an interpreter we forgot to sandbox.