Go on PHP Boy Scout

Reloading config without a restart

Mon, 27 Apr 2026 00:00:00 +0000

A config file changes. Someone edits a setting, rotates a credential, flips a feature flag. How does the running process find out? For most processes the answer is blunt: it doesn’t, until you restart it. For a short-lived CLI that’s completely fine. For a long-running service, “just restart it” is a much bigger ask than it sounds.

The default answer is a restart

Configuration lives in a file. The file changes: someone edits a setting, rotates a credential, flips a feature flag. How does the running process find out?

Overwhelmingly, the honest answer is that it doesn’t. A process reads its config once, at startup, and that snapshot is frozen for the life of the process. Change the file and nothing happens until you restart, at which point a fresh process reads the fresh file.

For a short-lived CLI invocation that’s completely fine. It reads config, does its job, exits, and the next invocation reads whatever the file says then. But the same frameworks are also used to build long-running services, and for a service “just restart it” is not the small thing it sounds like.

What a restart actually costs

Restarting a long-running service means every open connection drops. Any in-flight request is lost, or has to be retried by whoever sent it. Caches that took real time to warm are cold again. There’s a window, short but real, where the service simply isn’t serving.

If the thing you changed was a log level, or a feature flag, or a timeout, you’ve paid a disruption wildly out of proportion to the change. And the calculation only gets worse as the service gets more important, because the services you least want to bounce on a whim are exactly the ones that matter most.

Hot-reload: re-read in place

Hot-reload is the alternative, and both go-tool-base and rust-tool-base support it.

The process doesn’t read config once and freeze it. It watches the config file. When the file changes, it re-reads it, re-applies it, and carries on running. No new process, no dropped connections, no cold start. The change lands in the live process.

The shape is the same in both frameworks:

1. A file watcher notices the config file changed. Underneath, this is the operating system’s own file-notification facility, inotify on Linux and its equivalents elsewhere. rust-tool-base reaches it through the notify crate; go-tool-base, through the watcher built into Viper.
2. A debounce step waits for the writes to settle. Saving a file is often several separate operations, and you don’t want to reload three times for one edit.
3. The config is re-parsed from disk.
4. The new config is swapped in atomically.
5. Observers are notified, so the subsystems that care can react.

Steps four and five are the ones worth slowing down on, because they’re where a naive hot-reload quietly goes wrong.
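The debounce step is easy to picture in code. What follows is a minimal sketch in Go, not the code either framework ships: the Debouncer type, its method names and the 50ms quiet period are all invented for illustration. Every event pushes a timer back, and the reload only fires once the events stop.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Debouncer (hypothetical name) collapses a burst of events into one call
// to fn, made once no new event has arrived for the quiet period.
type Debouncer struct {
	mu    sync.Mutex
	timer *time.Timer
	quiet time.Duration
	fn    func()
}

func NewDebouncer(quiet time.Duration, fn func()) *Debouncer {
	return &Debouncer{quiet: quiet, fn: fn}
}

// Trigger records one file event. Each call cancels the pending timer and
// starts a fresh one, so the deadline keeps moving while writes continue.
func (d *Debouncer) Trigger() {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.timer != nil {
		d.timer.Stop()
	}
	d.timer = time.AfterFunc(d.quiet, d.fn)
}

// demoBurst fires three events 5ms apart at a 50ms debouncer, waits for
// the quiet period to pass, and reports how many reloads ran.
func demoBurst() int32 {
	var reloads atomic.Int32
	d := NewDebouncer(50*time.Millisecond, func() { reloads.Add(1) })
	for i := 0; i < 3; i++ {
		d.Trigger()
		time.Sleep(5 * time.Millisecond)
	}
	time.Sleep(200 * time.Millisecond)
	return reloads.Load()
}

func main() {
	fmt.Println("reloads:", demoBurst())
}
```

Three writes in quick succession should come out as a single reload. The quiet period is the thing to tune: too short and a slow editor save still reloads twice, too long and a change feels sluggish.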

The two details that make it safe

The atomic swap. You do not mutate the live config object in place. A reader on another thread, partway through reading it, would see a torn mix of old and new values, and that’s a genuinely nasty class of bug. Instead the process builds a new, complete config value and swaps the pointer to it in a single atomic operation. Any reader sees either the entire old config or the entire new one, never a blend. rust-tool-base does this with arc-swap; go-tool-base does the equivalent. Reads stay cheap and lock-free, and an update is one pointer swap.
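To make the swap concrete, here is a sketch using atomic.Pointer from Go’s standard library (Go 1.19 or later). It is one way to get the behaviour described above and I’m not claiming it is how go-tool-base does it; the Config fields and the Store type are made up for the example.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Config is an immutable snapshot. A reload builds a whole new value
// rather than editing this one. The fields are illustrative.
type Config struct {
	LogLevel string
	Timeout  int // seconds
}

// Store (hypothetical name) holds the pointer to the current snapshot.
type Store struct {
	current atomic.Pointer[Config]
}

func NewStore(initial Config) *Store {
	s := &Store{}
	s.current.Store(&initial)
	return s
}

// Load returns the current snapshot without taking a lock.
// Callers must treat the result as read-only.
func (s *Store) Load() *Config { return s.current.Load() }

// Swap publishes a complete new snapshot in one atomic pointer store.
func (s *Store) Swap(next Config) { s.current.Store(&next) }

func main() {
	store := NewStore(Config{LogLevel: "info", Timeout: 30})

	old := store.Load() // a reader that started before the reload
	store.Swap(Config{LogLevel: "debug", Timeout: 5})

	fmt.Println(old.LogLevel, old.Timeout) // info 30
	cur := store.Load()
	fmt.Println(cur.LogLevel, cur.Timeout) // debug 5
}
```

The reader holding the old pointer keeps a consistent old config for as long as it needs it, and anyone who calls Load after the swap gets the new one. Nobody ever sees a debug log level paired with the old timeout.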

The observer notification. Re-reading the file isn’t the end of the job. Some subsystems have to do something when config changes: a connection pool resizes, a logger changes level, a rate limiter takes a new ceiling. So a hot-reload system has to let those subsystems subscribe. rust-tool-base hands observers a watch::Receiver, a channel that always holds the latest value; go-tool-base exposes an Observable interface. A subsystem subscribes once and reacts every time config changes, for the life of the process.
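The subscribe-once, react-every-time shape can be sketched in a few lines of Go. This is an illustration only: the Notifier type and its Subscribe and Publish methods are invented here and are not the Observable interface go-tool-base actually exposes.

```go
package main

import (
	"fmt"
	"sync"
)

// Config is a stand-in for the real configuration type.
type Config struct{ LogLevel string }

// Notifier (hypothetical name) fans each new config out to every subscriber.
type Notifier struct {
	mu   sync.Mutex
	subs []func(Config)
}

// Subscribe registers a callback that runs on every future Publish.
func (n *Notifier) Subscribe(fn func(Config)) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.subs = append(n.subs, fn)
}

// Publish calls every subscriber with the new config. The subscriber list
// is copied first so callbacks run without the lock held.
func (n *Notifier) Publish(cfg Config) {
	n.mu.Lock()
	subs := append([]func(Config){}, n.subs...)
	n.mu.Unlock()
	for _, fn := range subs {
		fn(cfg)
	}
}

func main() {
	var n Notifier
	level := "info"

	// The logger subscribes once, at startup.
	n.Subscribe(func(c Config) { level = c.LogLevel })

	// Two reloads later it has followed both changes.
	n.Publish(Config{LogLevel: "debug"})
	n.Publish(Config{LogLevel: "warn"})
	fmt.Println(level) // warn
}
```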

Where this earns its keep: a Kubernetes pod

Hot-reload is a nicety on a developer’s laptop. Inside a Kubernetes pod it becomes genuinely valuable, and the reason is a neat fit between how Kubernetes delivers config and how a file watcher works.

In Kubernetes you don’t usually bake configuration into the container image. It lives in ConfigMap and Secret objects, and the clean way to consume them is to mount them as volumes. Mount a ConfigMap as a volume and each key becomes a file in the pod’s filesystem.

Here’s the part that connects to everything above. When you update that ConfigMap or Secret, Kubernetes does not restart your pod. The kubelet notices the object changed and rewrites the projected files inside the still-running pod. The files on disk change underneath a process that never stopped.

That file rewrite is exactly the event a hot-reload watcher exists to catch. So the whole chain becomes:

You kubectl apply an updated ConfigMap, or rotate a Secret.
The kubelet updates the projected files inside the pod.
The framework’s file watcher sees the write.
The config is re-parsed, swapped in atomically, and observers are notified.
The new configuration is live, and the pod never cycled.

You’ve changed a running service, in a running pod, with no rollout, nothing terminated and recreated, no dropped traffic. Rotate a database credential, raise a log level to debug an incident in progress, flip a feature flag: all of it live. For a service where a restart is the very thing you’re trying hard to avoid, the kind of long-running service these frameworks are built for, that’s the difference between a config change being routine and being an event.

The honest caveats

Two things, so this doesn’t read as magic.

First, not everything can be hot-reloaded. Some configuration genuinely needs a restart: the port a server binds to, the size of a thread pool, anything wired up exactly once at process start. Hot-reload covers the large category of settings a subsystem can re-read and re-apply; it doesn’t abolish restarts. A config system worth its salt is clear about which settings are live and which are not.

Second, a Kubernetes gotcha that catches people out. The in-place file update happens for ConfigMaps and Secrets mounted as volumes. Consume the same ConfigMap as environment variables instead, and those are fixed when the container starts and never update, short of a restart. If you want hot-reload in a pod, mount config and secrets as files, not env vars. And even with volumes the update isn’t instant: the kubelet syncs on a period, around a minute by default, so a reload is “within a minute or so”, not “the moment you hit apply”.

What it comes down to

A config file changes, and the default way to pick it up is to restart the process. For a long-running service that restart costs dropped connections, lost work and a cold start, often for a change as small as a log level.

go-tool-base and rust-tool-base both support hot-reload instead: a file watcher catches the change, the config is re-parsed and swapped in atomically so no reader sees torn state, and observers are notified so subsystems can react, all in a live process. The setting where it pays off most is a Kubernetes pod, where ConfigMaps and Secrets mounted as volumes are rewritten in place by the kubelet and the watcher catches that write directly. Mount them as volumes rather than env vars, allow for the kubelet’s sync delay, accept that some settings still need a restart, and within those limits “the config changed” stops meaning “cycle the pod”.

Verifying your own downloads: how I solved it for self-updating CLI tools

Fri, 24 Apr 2026 00:00:00 +0000

Way back in the introduction I promised I’d come back to the self-update integrity checks. Here we are. And the honest starting point is a slightly uncomfortable admission: for a good long while, go-tool-base’s update command was the most trusting line of code in the entire tool.

The most trusting line of code in the tool

Self-update is a lovely feature. The user runs yourtool update, the tool fetches the latest release, swaps itself out, and they’re current. go-tool-base has had this since early on, wired to GitHub, GitLab, Bitbucket, Gitea and a few others.

But look closely at what that feature actually does. It reaches out to the internet, pulls down a file, and then replaces the executable that’s currently running with that file. The next time the user invokes the tool, they’re running whatever those bytes turned out to be.

The original implementation downloaded the release asset over HTTPS and extracted it. HTTPS gets you transport security: the bytes weren’t tampered with in flight. It tells you nothing about whether the bytes were right when they left, or whether they’re even the bytes you meant to fetch. A truncated download, a CDN cache serving a mangled object, a release asset that got swapped after the fact… HTTPS waves all of those straight through. For the one operation in the whole tool that replaces the binary, “we didn’t check” is an uncomfortable place to be sitting.

GoReleaser already does half the job

The good news is that the build side was already producing exactly what I needed. GoReleaser, which builds go-tool-base’s releases, generates a checksums.txt for every release: one SHA-256 per published artefact, the same format sha256sum emits. It was sitting right there as a release asset and nothing was reading it.

So Phase 1 of the integrity work is exactly that: read it.

When update downloads the platform binary, it now also fetches checksums.txt from the same release, looks up the entry for the asset it just pulled, and compares the SHA-256 of the downloaded bytes against the expected hash before anything gets extracted or installed. Mismatch, and the update aborts before it has so much as touched the installed binary. The hash comparison runs in constant time, which is more defence-in-depth than strictly necessary here, but it costs nothing and means every hash comparison in the codebase is the same and reassuringly audit-boring.
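For anyone who wants to see the shape of that check, here is a rough sketch built only on Go’s standard library. It is not the code in go-tool-base: the function names, error values and asset name are invented, and it assumes the manifest uses the two-column layout sha256sum produces.

```go
package main

import (
	"bufio"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

var (
	ErrNoEntry  = errors.New("asset not listed in checksums manifest")
	ErrMismatch = errors.New("checksum mismatch")
)

// expectedSum finds the hex digest for asset in a sha256sum-style manifest,
// where each line is "<hex digest>  <filename>".
func expectedSum(manifest, asset string) (string, error) {
	sc := bufio.NewScanner(strings.NewReader(manifest))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// sha256sum marks binary mode with a leading '*' on the filename.
		if len(fields) == 2 && strings.TrimPrefix(fields[1], "*") == asset {
			return fields[0], nil
		}
	}
	return "", ErrNoEntry
}

// Verify hashes data and compares it to the manifest entry in constant time.
func Verify(manifest, asset string, data []byte) error {
	want, err := expectedSum(manifest, asset)
	if err != nil {
		return err
	}
	wantBytes, err := hex.DecodeString(want)
	if err != nil {
		return fmt.Errorf("malformed digest for %s: %w", asset, err)
	}
	got := sha256.Sum256(data)
	if subtle.ConstantTimeCompare(got[:], wantBytes) != 1 {
		return ErrMismatch
	}
	return nil
}

func main() {
	payload := []byte("release bytes")
	sum := sha256.Sum256(payload)
	manifest := hex.EncodeToString(sum[:]) + "  tool_linux_amd64.tar.gz\n"

	fmt.Println(Verify(manifest, "tool_linux_amd64.tar.gz", payload))            // <nil>
	fmt.Println(Verify(manifest, "tool_linux_amd64.tar.gz", []byte("tampered"))) // checksum mismatch
}
```

The important property is the ordering: Verify has to return nil before anything is extracted or written over the installed binary.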

Fail open, or fail closed?

The interesting design question wasn’t the hashing. It was: what do you do when there is no checksums.txt?

Plenty of older releases predate this feature. A release might have been cut by hand without GoReleaser. If go-tool-base flatly refused to update whenever a manifest was missing, the very act of shipping this feature would brick the update path for every existing tool the moment they upgraded into it. That’s a cure worse than the disease.

So the default is fail-open: no manifest, log a clear warning, proceed. It matches how the existing offline-update path already behaved with its optional .sha256 sidecar, and it keeps upgrades working.

Fail-open as a default is not the same as fail-open being right for everyone, though. A security-sensitive tool should be able to say “no manifest, no update, full stop”. Two ways to get there:

Tool authors flip a compile-time switch (setup.DefaultRequireChecksum = true in main()) and their binary ships fail-closed from day one.
End users override either way through config (update.require_checksum) or an environment variable.

go-tool-base itself ships with the strict setting turned on, because a tool whose entire job is being a careful framework should hold itself to the stricter bar.
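The policy itself is small enough to sketch. Treat the following as an illustration with loud assumptions: the function names are invented, and the precedence shown (environment variable, then config, then the compiled-in default) is my guess at a sensible order for this example, not a statement of what go-tool-base does.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

var ErrManifestRequired = errors.New("no checksums manifest and checksums are required")

// resolveRequire picks the effective setting.
// ASSUMED precedence: environment, then config, then compiled-in default.
// cfg is nil when the config file does not set the key; env is the raw
// value of a (hypothetical) environment variable, empty when unset.
func resolveRequire(compiledDefault bool, cfg *bool, env string) bool {
	if v, err := strconv.ParseBool(env); err == nil {
		return v
	}
	if cfg != nil {
		return *cfg
	}
	return compiledDefault
}

// onMissingManifest applies the policy when a release has no manifest.
func onMissingManifest(require bool) error {
	if require {
		return ErrManifestRequired // fail closed: abort the update
	}
	fmt.Println("warning: release has no checksums manifest; continuing unverified")
	return nil // fail open: warn and carry on
}

func main() {
	strict := true

	// Nothing set anywhere: the fail-open default warns and proceeds.
	fmt.Println(onMissingManifest(resolveRequire(false, nil, "")))

	// Config asks for strictness: the update is refused.
	fmt.Println(onMissingManifest(resolveRequire(false, &strict, "")))

	// A tool compiled strict, with the end user opting out via environment.
	fmt.Println(onMissingManifest(resolveRequire(true, nil, "false")))
}
```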

The honest caveat

Here’s the part I want to be straight about, because security features oversell themselves constantly.

A checksum hosted next to the binary it describes protects you from accidents. Corruption, truncation, a CDN serving stale junk, a release asset that got partially clobbered. It does not protect you from a determined attacker who’s compromised the release platform itself. If someone can replace the binary, they can replace checksums.txt in the same breath, and your tool will cheerfully verify a malicious download against a malicious manifest and pronounce it good.

That’s not a flaw in the implementation. It’s the inherent ceiling of same-origin integrity: the manifest and the artefact share a trust root, so they fall together. Closing that gap needs a signature whose trust root is somewhere the release platform can’t reach, a key the attacker doesn’t have. That’s the next phase of this work, and it’s a bigger piece: GPG-signing the manifest, with the public half both embedded in the binary and published independently so a single platform compromise isn’t enough.

Phase 1 is the floor, not the ceiling. But it’s a floor worth having, because the overwhelming majority of real-world “the download was wrong” incidents are accidents, not attacks, and accidents are exactly what a same-origin checksum catches.

Pulling it together

The update command is the most trusting code in a self-updating tool: it fetches bytes from the internet and then becomes them. go-tool-base now verifies the SHA-256 of every self-update download against the release’s own checksums.txt before installing. It fails open by default so shipping the feature doesn’t strand anyone on an un-updatable version, fails closed for tool authors who ask (go-tool-base itself does), and stays honest that a same-origin checksum stops accidents, not a platform compromise.

Verifying your own downloads is a low bar. The point is that the previous height of that bar was zero.

What survives a port, and what doesn't

Thu, 23 Apr 2026 00:00:00 +0000

Rebuilding go-tool-base in Rust turned out to be the most honest design review I’ve ever sat through, and I didn’t have to do anything except keep going. Porting a framework into a language with completely different idioms forces a separation you can’t fake: the parts that survive the move are design, and the parts that don’t are just habit.

Two columns

When you port a system between languages that don’t share idioms, every piece of it sorts itself into one of two columns, without you having to make the call.

In the first column is the outcome a piece of the design produces: every command receives the framework’s services, configuration is layered with a fixed precedence, commands register themselves, errors carry guidance to the user. In the second column is the mechanism that produced that outcome in the original language.

Things in the first column survive the port. You rebuild them, differently, because the tool genuinely needs them. Things in the second column do not survive. You find their replacement, and the Go version turns out to have been one valid implementation of an idea, not the idea itself. Doing this for go-tool-base, mechanism by mechanism, was more honest about my own design than any amount of sitting and staring at it would have been.

The container

go-tool-base hands every command a Props struct. It carries the logger, the config, the assets, the filesystem handle. Some of it is reached through loosely-typed accessors. It works well, and I wrote a whole post about it.

The outcome is column one: a command should receive one object, and that object should carry the framework’s services so the command doesn’t go assembling them itself. That survived. rust-tool-base (RTB from here on) hands every command an App.

The loosely-typed accessors were column two. In Rust an App is a plain struct with concrete fields, each one an Arc<T> so a clone is a few atomic increments rather than a deep copy. Nothing is keyed by string. Nothing is fetched by name and asserted to a type. The thing the container is for survived; the way Go expressed it did not.

Registration

A go-tool-base command self-registers using a package-level init() function, which Go runs before main() and which appends the command to a global slice.
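Stripped to its bones, the mechanism looks something like this. It’s a sketch with made-up names (Command, Register, registry), squeezed into one file so it runs on its own; in a real project each init() would sit in its own file next to the command it registers.

```go
package main

import "fmt"

// Command is a hypothetical stand-in for a framework command.
type Command struct{ Name string }

// registry is the global slice each command appends itself to.
var registry []Command

// Register adds a command to the global registry.
func Register(c Command) { registry = append(registry, c) }

// Go allows several init() functions in a package and runs all of them
// before main(). Within one file they run in the order they appear.
func init() { Register(Command{Name: "update"}) }
func init() { Register(Command{Name: "version"}) }

func main() {
	// By the time main starts, every command has already registered itself.
	for _, c := range registry {
		fmt.Println(c.Name)
	}
	// Prints:
	// update
	// version
}
```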

The outcome, column one, is that a command lives in its own file and inserts itself into the framework with no central list to edit. That’s genuinely worth keeping.

The init() mechanism is column two, and Rust doesn’t even offer it: Rust deliberately has no code that runs before main(). The replacement is link-time registration through distributed slices, which gets its own post next. Same outcome, no global mutable state, assembled by the linker rather than by a startup phase.

Configuration

go-tool-base layers configuration with a precedence: flags over environment over file over defaults. Some of it is read back through key lookups.
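As a toy model of that precedence, here is a sketch in which each source is just a map and a lookup walks them from highest priority to lowest. The Layer type, the Lookup function and the keys are invented for illustration; the real thing sits on a proper config library.

```go
package main

import "fmt"

// Layer is one source of settings. A missing key means "not set here".
type Layer map[string]string

// Lookup returns the value from the first layer that sets key.
// Layers are passed highest precedence first.
func Lookup(key string, layers ...Layer) (string, bool) {
	for _, l := range layers {
		if v, ok := l[key]; ok {
			return v, true
		}
	}
	return "", false
}

func main() {
	flags := Layer{"log.level": "debug"}
	env := Layer{"log.level": "warn", "timeout": "10s"}
	file := Layer{"timeout": "30s", "region": "eu-west-1"}
	defaults := Layer{"log.level": "info", "timeout": "60s", "region": "us-east-1"}

	for _, key := range []string{"log.level", "timeout", "region"} {
		v, _ := Lookup(key, flags, env, file, defaults)
		fmt.Println(key, "=", v)
	}
	// Prints:
	// log.level = debug
	// timeout = 10s
	// region = eu-west-1
}
```

Note the string keys. That lookup-by-name is exactly the part that turns out to be idiom rather than design.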

The layering and the precedence are column one. They survived exactly. RTB layers config with the same ordering.

The key lookups were column two. In Rust the merged configuration is deserialised into your own serde struct, so a config value is a typed field you access like any other field, and a typo is a compile error instead of a missing key at runtime. The precedence survived; reading values back out of a string-keyed bag did not.

The error path

go-tool-base routes every error through one handler so presentation is consistent, which I also wrote up.

One consistent exit for errors is column one. It survived. What didn’t survive was the handler: RTB has no error-handler object at all, because Rust’s own return-from-main convention plus a report hook does the job the handler was built to do. That one has its own post too.

What the exercise was actually worth

Every mechanism told the same story. The container, the registration, the config access, the error path, the cancellation signal that go-tool-base carries on a context.Context and RTB carries on a CancellationToken. In every case the thing it achieved walked across to Rust untouched, and the Go code that achieved it was left behind.

That’s the useful result. Before this port I couldn’t have told you, for any given pattern in go-tool-base, whether it was load-bearing design or just the idiomatic Go way to write it that day. Now I can, because each one was forced to prove itself by being rebuilt from nothing in a language that flatly wouldn’t accept the original. Whatever survived was real. Whatever I had to replace was always replaceable, which means it was never really the point.

The upshot

Porting a framework into a language with different idioms separates design from habit for free. The outcome a pattern produces is design, and it survives the move. The mechanism that produced it is idiom, and it gets left behind for the new language’s equivalent.

go-tool-base’s Props bag, its init() registration, its key-based config access and its error handler were all idiom. The single context object, self-registration, layered precedence and a consistent error exit were all design, and all four came through to RTB intact. The next three posts take the most interesting replacements one at a time, starting with how a Rust command registers itself when the language won’t run anything before main.

rust-tool-base: the same idea, in a language that argues back

Wed, 22 Apr 2026 00:00:00 +0000

I built go-tool-base because I was sick of rebuilding the same CLI scaffolding every time I started a new Go tool. You’d think that would have taught me a lesson about doing things more than once. Apparently not, because I’ve now started building rust-tool-base: the same idea, the same itch, for Rust.

In my defence, there’s method in it.

The same itch, a different language

go-tool-base exists because I kept writing the same couple of hundred lines of wiring every time I started a new Go CLI. Config loading, logging setup, an update check, an error path, a help system. None of it was the tool. All of it had to be there before the tool could be.

Lately I’ve been learning Rust, and two things collided. The first is how I tend to learn a language. I’ve always picked them up reasonably quickly, and the way I do it isn’t with a tutorial that builds a toy, it’s by rebuilding something whose shape I already know cold, so that every decision is about the language rather than the problem. The second is that every time I started a Rust CLI of any size, I hit the very same gap I’d already filled once in Go.

So rather than learn Rust on a throwaway, I decided to learn it by building rust-tool-base: the same idea, the same niche, for Rust.

The gap in Rust

The Rust ecosystem has a well-earned reputation for sharp, focused crates and a deliberate shortage of big opinionated frameworks. clap for argument parsing, figment for layered config, tracing for logging, miette for errors, ratatui for terminal UI, reqwest and tokio underneath. Each of them is genuinely best-in-class.

What nobody hands you is the assembly. Wiring those into one coherent product, and then adding self-update, AI integration, an MCP server, embedded documentation, credential handling, telemetry and a scaffolder, is real work, and it’s the same work on every project.

The closest existing neighbours stop short of it. cli-batteries is a thin preamble: argument parsing plus a logging subscriber plus panic and signal handling. starbase has a proper session and lifecycle model but is CLI-agnostic and shaped around the moonrepo tooling it came from. cargo-dist and cargo-release are about release packaging, not the runtime. Good tools, all of them, but none is the opinionated, full-lifecycle, scaffolded base that go-tool-base is in the Go world. That space is empty, and rust-tool-base is built to fill it.

Why it is not a port

The obvious way to build this would be to open go-tool-base and translate it file by file. I’m not doing that, and the reason matters enough that it’s the rule the whole project is built around.

go-tool-base is full of Go. It leans on a Props struct that carries the framework’s services in loosely-typed fields. It configures things with functional options. It registers commands using package-level init(). It threads a context.Context through every call. Those are all good, idiomatic Go. Transliterated into Rust they’d become code that argues with the compiler on every single line, because Rust has its own answers to every one of those problems and they are emphatically not the Go answers.

So rust-tool-base reaches the same outcomes by Rust’s means. Commands still self-register, but through link-time machinery instead of init(). There’s still one context object per command, but it’s strongly typed rather than a loosely-keyed bag. Configuration is still layered, but it lands in your own typed struct instead of a string-keyed lookup. Same philosophy, same shape of product, an entirely different ecosystem underneath. The README says it plainly: it’s a sibling, not a port.

Why do it twice at all

Three reasons, and they reinforce each other.

The first is plain usefulness. The next time I want a Rust CLI tool, I want the same head start go-tool-base already gives me in Go.

The second is the learning. Rebuilding a system I understand forces me to meet Rust’s idioms where they actually bite, not where a tutorial gently stages them. You learn ownership properly when a real design is pushing back at you.

The third is the one I didn’t expect, and it’s the subject of the next post. Building the same framework twice, in two languages, turns out to be the cleanest way to find out which of your original decisions were genuine design and which were merely idiom. The design survives the move. The idiom does not. Sorting one from the other has been the most interesting part so far.

Boiling it down

rust-tool-base is the Rust sibling of go-tool-base: the same batteries-included, scaffolded, opinionated CLI framework, aimed at the same gap, which in Rust is the gap between a pile of excellent crates and a coherent product.

It’s not a port. Transliterating Go idioms into Rust produces code that fights the language, so RTB reaches the same outcomes through Rust’s own mechanisms instead. The posts after this one walk through the specific cases: how commands register, how the builder works, how errors are reported, and a few things RTB can do that the Go version structurally can’t. First, though, the thing the exercise taught me about my own design.

The blank import that keeps a dependency out of your binary

Wed, 22 Apr 2026 00:00:00 +0000

go-tool-base can stash your credentials in the OS keychain, which most people building on it are perfectly happy about. But some of them ship into regulated and air-gapped environments where the binary isn’t permitted to contain keychain or session-bus code at all… not dormant, not unused, simply not there.

So I had a feature most users want and a minority must be able to provably not have. The way I ended up solving it is one of my favourite little bits of honest Go.

A feature some users have to be able to not have

go-tool-base needs somewhere to keep secrets: AI provider keys, VCS tokens, the occasional app password. The best home for those on a developer’s machine is the operating system’s own keychain. macOS Keychain, GNOME Keyring or KWallet on Linux via the Secret Service, Windows Credential Manager. So I wanted go-tool-base to support all three. (This is the keychain mode I mentioned back in the credentials post, finally getting the explanation I promised it.)

The Go library for that is go-keyring, and it’s good. The catch is what it drags in behind it. On Linux it talks to the Secret Service over D-Bus, which means godbus. On Windows it pulls wincred. Perfectly reasonable dependencies for a desktop tool.

Now here’s the constraint that made this interesting. Some of the people building tools on go-tool-base don’t ship to developer laptops. They ship into regulated sectors and air-gapped deployments where a security review will scan the binary, enumerate every dependency, and ask pointed questions about anything that does inter-process communication. For those builds, “the keychain code is there but we never call it” is not an acceptable answer. The reviewer’s position, and it’s a fair one, is that code which isn’t in the binary cannot be a finding.

So I had a feature that most users want, and a minority of users must be able to provably not have. Same framework, same release.

Why I didn’t reach for a build tag

The obvious Go answer is a build tag. Compile with -tags keychain to get it, leave the tag off to not. I started down that road. I even spent a while on an inverted version, a nokeychain tag, on the theory that the regulated build should be the one that has to ask, so a forgotten flag fails safe.

It works. It also isn’t very nice. Build tags are invisible at the call site. Nothing in the source tells you that a file only exists in some builds. The two worlds drift, because the tagged-out path isn’t compiled in your normal editor session and quietly rots. And the ergonomics for a downstream consumer are poor: every tool built on go-tool-base would have to know the right magic incantation and thread it through their own release pipeline correctly, forever.

I tried a second approach too: pull the keychain backend out into a completely separate Go module. That genuinely solves the dependency question (a module you don’t require can’t contribute to your go.sum). But a separate module for one backend is clunky. Separate versioning, separate release, separate repo, all for a single file’s worth of behaviour. It felt like using a shipping container to post a letter.

The shape that actually fits: a registry and an init()

The version I’m happy with leans on two boring, well-worn Go mechanisms and lets them do something quietly clever together.

First, pkg/credentials defines a Backend interface and a registry. By default the registry holds a stub backend that politely returns “unsupported” for everything. The framework only ever talks to the registered backend, whatever that happens to be.

Second, the keychain implementation lives in its own package, pkg/credentials/keychain, still inside the same module, no separate release to manage. That package has an init() that registers its go-keyring-backed backend:

//nolint:gochecknoinits // registration via import is the whole point
func init() {
	credentials.RegisterBackend(Backend{})
}

And go-keyring, godbus, wincred, the whole IPC dependency chain, are only imported by that package.

Now the trick. To switch keychain support on, you import the package. You don’t have to use anything from it. A blank import is enough, because a blank import still runs the package’s init():

// cmd/gtb/keychain.go - the entire file.
package main

import _ "gitlab.com/phpboyscout/go-tool-base/pkg/credentials/keychain"

That single line is the on/off switch for the shipped gtb binary. The blank import means init() runs, the keychain backend registers itself, and credential operations start routing through the OS keychain. No flag, no tag, no config.
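The other half of the arrangement, the registry the backend registers into, might look roughly like this. It’s a simplified sketch, not the real pkg/credentials: the one-method Backend interface, ErrUnsupported, the stub and the in-memory backend standing in for go-keyring are all invented for the example. Only the RegisterBackend name comes from the snippet above.

```go
package main

import (
	"errors"
	"fmt"
)

var ErrUnsupported = errors.New("credential backend: unsupported")

// Backend is a hypothetical, trimmed-down version of the interface.
type Backend interface {
	Get(service, account string) (string, error)
}

// stub is what the registry holds until a real backend registers.
type stub struct{}

func (stub) Get(string, string) (string, error) { return "", ErrUnsupported }

// active is the one backend the framework ever talks to.
var active Backend = stub{}

// RegisterBackend replaces the active backend. A keychain package would
// call this from its init().
func RegisterBackend(b Backend) { active = b }

// memory stands in for the go-keyring-backed backend in this demo.
type memory map[string]string

func (m memory) Get(service, account string) (string, error) {
	v, ok := m[service+"/"+account]
	if !ok {
		return "", errors.New("not found")
	}
	return v, nil
}

func main() {
	// Without the blank import nothing registers, so the stub answers.
	_, err := active.Get("mytool", "anthropic.api")
	fmt.Println(err) // credential backend: unsupported

	// With it, init() would have done this before main started.
	RegisterBackend(memory{"mytool/anthropic.api": "s3cret"})
	v, _ := active.Get("mytool", "anthropic.api")
	fmt.Println(v) // s3cret
}
```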

The part that makes it provable

Here’s why this beats the build tag, and it comes down to one guarantee in the Go toolchain: the linker only includes packages that are actually imported.

If cmd/gtb/keychain.go exists, the keychain package is in the import graph, so go-keyring, godbus and wincred are linked in. Delete that one file and rebuild, and the keychain package is no longer reachable from main. Nothing imports it, so it is never compiled or handed to the linker, and the entire go-keyring chain is gone. Not dormant. Not present-but-unused. Absent from the binary.

That’s the bit a regulated build needs. It isn’t a promise that the code won’t run. It’s a structural fact that the code isn’t there, and you can hand a security reviewer an SBOM that proves it. go-keyring won’t appear, because it genuinely isn’t linked.

For a downstream tool built on go-tool-base the story is the same, and just as cheap. Want keychain support? Add the one-line blank import to your own cmd package. Must ship keychain-free? Don’t. Your binary’s dependency graph follows your import graph, exactly as Go always promised it would. The default (no import) is the locked-down one, which is the right way round for a safety property.

Why I like this more than I expected to

Build tags hide a decision in the compiler invocation. This pattern puts the decision in the source, as an import, where it’s greppable, obvious in code review, and impossible to get subtly wrong. There’s a real file called keychain.go whose entire content is one import, and it reads as exactly what it is: a switch.

It’s also just honest Go. No reflection, no plugin loader, no clever runtime. A registry, an init(), and the linker doing the one job it’s always done. The cleverness, such as it is, is in the arrangement, not in any individual piece.

Stepping back

go-tool-base needed OS keychain support for the many, and a way to provably exclude it for the few. Build tags could express the toggle but hid it in the build invocation and rotted in the dark. A separate module solved the dependency question but was far too much machinery for one backend.

Putting the keychain backend in its own package, activated by a blank import (import _) that fires its init(), gets you both: a one-line, in-source, code-reviewable switch, and, because the linker only links what’s imported, a build with the import omitted that contains none of the keychain dependency chain. Provable absence, not promised disuse.

If you’re carrying an optional dependency that some of your users need gone rather than merely idle, this is the pattern. Let the import graph be the feature flag.

Where should a CLI keep your API keys?

Mon, 20 Apr 2026 00:00:00 +0000

Your CLI tool needs the user’s API key. It has to come from somewhere, and it has to survive between runs, so the obvious move is to ask once and write it into the config file. One tidy api_key: line. Job done.

It works beautifully on the first afternoon. And then, months later, it’s quietly become a liability nobody actually decided to create.

The config file that quietly becomes a liability

Your CLI tool needs the user’s API key. It has to come from somewhere, and it has to survive between invocations, so the obvious move is to ask once and write it into the tool’s config file. ~/.config/yourtool/config.yaml, a nice api_key: line, done.

It works on the first afternoon. It keeps working. And then, slowly, it becomes a problem nobody decided to create.

The config file gets committed to a dotfiles repo. It gets caught in a tar of someone’s home directory that lands in a backup bucket. It scrolls past in a screen share. It sits, world-readable, on a shared build box. None of these are exotic. They’re just a Tuesday. The plaintext key was fine right up until the file went somewhere the key shouldn’t, and config files go places.

I didn’t want go-tool-base handing every tool built on it that same slow-motion liability by default. So credential handling got rebuilt around a simple idea: the config file should usually hold a reference to the secret, not the secret itself.

Three modes, and which one you get

go-tool-base supports three ways to store a credential.

Environment-variable reference, the default. The config records the name of an environment variable, not its value:

anthropic:
  api:
    env: ANTHROPIC_API_KEY

The secret itself lives in your shell profile, your direnv setup, or your CI platform’s secret store, wherever you already keep that sort of thing. The config file now contains nothing sensitive at all. You can commit it, back it up, paste it into a bug report. The reference is inert on its own.

OS keychain, opt-in. The config holds a <service>/<account> reference and the actual secret goes into the operating system’s keychain: macOS Keychain, GNOME Keyring or KWallet via the Secret Service, Windows Credential Manager.

anthropic:
  api:
    keychain: mytool/anthropic.api

This one is opt-in by design, because the keychain backend carries dependencies that some deployments simply aren’t allowed to ship. (That opt-in mechanism turned out to be an interesting little problem all of its own, and it gets its own post in a couple of days.)

Literal value, legacy and grudging. The old behaviour. The secret sits in the config in plaintext:

anthropic:
  api:
    key: sk-ant-...

It still works, because breaking every existing tool’s config on an upgrade would be its own kind of vandalism. But it’s the last resort, it’s documented as the last resort, and the setup wizard puts a warning in front of you when you pick it.

The one place literal mode is not allowed

There’s a single hard “no” in all of this. If go-tool-base detects it’s running in CI (CI=true, which most major CI platforms set), the setup flow refuses to write a literal credential and exits non-zero.

The reasoning is that a plaintext secret written during a CI run is a plaintext secret written onto an ephemeral, often shared, frequently-logged machine, by an automated process that no human is watching. That’s the exact situation where the slow-motion liability becomes a fast one. CI environments inject secrets as environment variables already; there’s no good reason for a tool to be writing one to disk there, so go-tool-base simply won’t.
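As a rough illustration of that guard, here is a minimal sketch. The function name and the sentinel error are made up for this example and are not go-tool-base’s actual API; the only thing taken from the text is the CI=true check. The environment lookup is passed in rather than read globally, so a test can decide what “the environment” says.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// errLiteralInCI is a hypothetical sentinel error for this sketch.
var errLiteralInCI = errors.New("refusing to write a literal credential while CI=true")

// checkLiteralAllowed reports whether a plaintext credential may be written.
// The environment lookup is a parameter so a test can supply its own.
func checkLiteralAllowed(getenv func(string) string) error {
	if getenv("CI") == "true" {
		return errLiteralInCI
	}
	return nil
}

func main() {
	if err := checkLiteralAllowed(os.Getenv); err != nil {
		// A real setup flow would stop here and exit non-zero.
		fmt.Println("error:", err)
		return
	}
	fmt.Println("literal mode permitted outside CI")
}
```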

How it decides at runtime

A credential can be configured more than one way at once. You might have an env reference and an old literal key still lurking. So resolution follows a fixed precedence, highest to lowest:

1. The *.env reference. If that env var is set, use it.
2. Otherwise the *.keychain reference. If a keychain entry resolves, use it.
3. Otherwise the literal *.key / *.value, the legacy path.
4. Otherwise a well-known fallback env var (ANTHROPIC_API_KEY and friends), so a tool still picks up the ecosystem-standard variable with no config at all.

The useful property here is that adding a more secure mode transparently wins. Drop an env reference next to an old literal key and the next run uses the env var. You can migrate a credential to a better home without first removing it from its worse one, which makes the migration safe to do incrementally instead of as one nervous big-bang edit.
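The precedence above can be sketched in a few lines. This is an illustration under stated assumptions, not the framework’s resolver: the Credential and Resolver types, their field names and the source labels are all invented here. Only the order of the four steps comes from the text.

```go
package main

import "fmt"

// Credential mirrors the three config shapes. Field names are assumptions.
type Credential struct {
	Env      string // name of an environment variable
	Keychain string // "<service>/<account>" reference
	Key      string // legacy plaintext literal
}

// Resolver carries its lookups as fields so each test can inject its own.
type Resolver struct {
	Getenv   func(string) string
	Keychain func(ref string) (string, bool)
}

// Resolve walks the precedence from highest to lowest and reports
// both the value and which mode supplied it.
func (r Resolver) Resolve(c Credential, fallbackEnv string) (value, source string) {
	if c.Env != "" {
		if v := r.Getenv(c.Env); v != "" {
			return v, "env"
		}
	}
	if c.Keychain != "" && r.Keychain != nil {
		if v, ok := r.Keychain(c.Keychain); ok {
			return v, "keychain"
		}
	}
	if c.Key != "" {
		return c.Key, "literal"
	}
	if v := r.Getenv(fallbackEnv); v != "" {
		return v, "fallback-env"
	}
	return "", "none"
}

func main() {
	env := map[string]string{"ANTHROPIC_API_KEY": "from-env"}
	r := Resolver{
		Getenv:   func(k string) string { return env[k] },
		Keychain: func(string) (string, bool) { return "", false },
	}
	// An env reference sitting next to an old literal: the env var wins.
	c := Credential{Env: "ANTHROPIC_API_KEY", Key: "old-literal"}
	v, src := r.Resolve(c, "ANTHROPIC_API_KEY")
	fmt.Println(v, src)
}
```

Returning the source alongside the value is a design choice for the sketch: it makes the “more secure mode wins” property something a test can assert directly.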

The tool tells on itself

A precedence rule is no use if nobody knows their config still has a plaintext key three layers down. So the built-in doctor command grew a check for exactly that. Run doctor, and if any literal credential is sitting in your config it reports a warning, names the offending keys (the key names, never the values) and points you at how to migrate.
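A check like that can be surprisingly small. The sketch below assumes the config has already been decoded into nested maps and that a literal credential is any non-empty string stored under a key named key or value; both of those are simplifying assumptions, and the function name is hypothetical. The part that matters is that it returns key paths and never touches the values.

```go
package main

import (
	"fmt"
	"sort"
)

// literalCredentialKeys walks a decoded config and returns the dotted paths
// of plaintext credentials. It reports names only, never the secret values.
func literalCredentialKeys(cfg map[string]any) []string {
	var found []string
	var walk func(prefix string, m map[string]any)
	walk = func(prefix string, m map[string]any) {
		for k, v := range m {
			path := k
			if prefix != "" {
				path = prefix + "." + k
			}
			switch val := v.(type) {
			case map[string]any:
				walk(path, val)
			case string:
				if (k == "key" || k == "value") && val != "" {
					found = append(found, path)
				}
			}
		}
	}
	walk("", cfg)
	sort.Strings(found) // map iteration order is random; sort for stable output
	return found
}

func main() {
	cfg := map[string]any{
		"anthropic": map[string]any{
			"api": map[string]any{"key": "plaintext-secret"},
		},
		"openai": map[string]any{
			"api": map[string]any{"env": "OPENAI_API_KEY"},
		},
	}
	for _, path := range literalCredentialKeys(cfg) {
		fmt.Printf("warning: %s holds a literal credential\n", path)
	}
}
```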

It’s not an error. Literal mode is still legal. But the tool will quietly keep reminding you that you left the campsite messier than you could have, until you go and tidy it. (Old Scout habits die hard, and they’ve leaked all the way into the framework.)

The gist

A CLI tool that writes your API key into a plaintext config file isn’t doing anything wrong, exactly. It’s just handing you a liability that activates later, when the file travels somewhere the key shouldn’t. go-tool-base’s answer is three storage modes: an env-var reference by default, the OS keychain on request, and a plaintext literal only as a documented last resort that CI environments can’t use at all. Runtime resolution runs in a fixed precedence so a more secure mode always wins, which makes migrating a credential safe to do gradually. And doctor keeps an eye on the config so a stray plaintext secret doesn’t get to hide forever.

The secret should live in a secret store. The config file should just know its name.

I had the framework audited: every finding was the same shape

Fri, 17 Apr 2026 00:00:00 +0000

When a real security audit lands back in your inbox, the temptation is to read it as a shopping list of unrelated mistakes. Fix one, fix the next, tick them off, move on. I did exactly that the first time. The second time, I noticed something far more useful: the findings weren’t scattered at all. They clustered. Almost every one was the same sentence with the nouns swapped out.

Findings cluster, they don’t scatter

When you get a real security audit back, the instinct is to read it as a list of unrelated mistakes. Finding 1, unrelated to Finding 2, unrelated to Finding 3. Triage each, fix each, move on.

That’s not what the go-tool-base audits looked like once I stopped reading them as a list. The findings clustered. Strip away the specifics and almost every one was the same sentence with the nouns swapped: untrusted input reaches a powerful operation, and nothing checks it in between.

That reframe is worth more than any individual fix, because it turns “we patched some bugs” into “we know where to look next time”. A framework’s attack surface isn’t spread evenly. It’s concentrated at the boundaries: the handful of points where data from outside (a config file, a command-line flag, something typed into a TUI, an HTTP response) flows into machinery that can be made to misbehave. Audit the boundaries and you’ve audited most of the risk. Three examples make the pattern obvious.

Boundary one: a regex compiler

Somewhere in the tool, a user-supplied string gets compiled into a regular expression. A search pattern typed into the docs browser, a filter from a config file. Feeding user input to regexp.Compile feels harmless. It’s just pattern matching, after all.

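It isn’t quite harmless. A regular expression is a tiny program, and some tiny programs are catastrophically expensive. In a backtracking engine, a pattern with the wrong kind of nested repetition can take exponential time to evaluate against a modestly hostile input. That’s the class of bug known as ReDoS. Go’s regexp package is designed to match in time linear in the size of the input, which takes the exponential case off the table, but an oversized pattern, or one stuffed with large counted repetitions, can still cost real memory and CPU to compile and run. A user, or something feeding the user’s config, hands you a pathological pattern and your tool wedges, burning a whole core, on what looked for all the world like a search box.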

The fix isn’t to ban user-supplied regexes. It’s to stop treating “compile this string” as free. go-tool-base routes any regex whose pattern came from outside the binary through a regexutil.CompileBounded helper. It caps the pattern length and puts a hard timeout on compilation. A pattern known at build time can still use plain regexp.MustCompile, because that isn’t a boundary, it’s a constant. The discipline only applies where the input genuinely crosses in.
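To make the shape of such a helper concrete, here is a minimal sketch. It is not the real regexutil.CompileBounded: the signature, the 1024-byte limit and the error values are assumptions for illustration. One caveat worth knowing is that regexp.Compile can’t be cancelled, so on timeout the caller gets control back while the goroutine runs on until the compile finishes.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"time"
)

// maxPatternLen is an assumed limit, chosen only for this sketch.
const maxPatternLen = 1024

var (
	ErrPatternTooLong = errors.New("regex pattern too long")
	ErrCompileTimeout = errors.New("regex compilation timed out")
)

// compileBounded compiles an externally supplied pattern, rejecting
// oversized input up front and giving up if compilation takes too long.
func compileBounded(pattern string, timeout time.Duration) (*regexp.Regexp, error) {
	if len(pattern) > maxPatternLen {
		return nil, ErrPatternTooLong
	}
	type result struct {
		re  *regexp.Regexp
		err error
	}
	// Buffered so the goroutine can always send and exit, even after a timeout.
	done := make(chan result, 1)
	go func() {
		re, err := regexp.Compile(pattern)
		done <- result{re, err}
	}()
	select {
	case r := <-done:
		return r.re, r.err
	case <-time.After(timeout):
		return nil, ErrCompileTimeout
	}
}

func main() {
	re, err := compileBounded(`^v\d+\.\d+$`, 2*time.Second)
	if err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println(re.MatchString("v1.20"))
}
```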

Boundary two: a URL opener

The tool needs to open a URL in the user’s browser, a docs link or an OAuth flow. Under the hood that’s the OS handler: xdg-open, or open, or rundll32.

Now ask where the URL came from. If any part of it is influenced by config, by a server response, by user input, then “open this URL” has quietly become “ask the operating system to do something with an attacker-influenced string”. A file:// URL. A javascript: URL. Something with control characters smuggled into it. The browser-open was never the dangerous part. The unvalidated string was.

So go-tool-base funnels every URL-open through one package, pkg/browser, and that package is a gate. It enforces an allowlist of schemes (https, http, mailto, and nothing else), bounds the length, and rejects control characters before the OS ever sees the string. The rule that makes it stick is that nothing else is allowed to call the OS handler directly. One door, and the door has a lock. A scattered capability with no chokepoint can’t be secured; a capability that has a chokepoint can. (You’ll have spotted the “one door out” idea by now… it’s the same instinct as the single error handler, pointed at security instead of consistency.)
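The gate itself is only a handful of checks. The sketch below is an assumption-laden stand-in for what a package like pkg/browser might do before handing anything to the OS: the function name and the 2048-byte limit are invented, while the scheme allowlist, the length bound and the control-character rejection come straight from the description above.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"unicode"
)

// maxURLLen is an assumed bound for this sketch.
const maxURLLen = 2048

// allowedSchemes is the allowlist named in the text.
var allowedSchemes = map[string]bool{"https": true, "http": true, "mailto": true}

// checkURL validates a URL before it is passed to the OS handler.
func checkURL(raw string) error {
	if raw == "" || len(raw) > maxURLLen {
		return errors.New("url is empty or too long")
	}
	for _, r := range raw {
		if unicode.IsControl(r) {
			return errors.New("url contains control characters")
		}
	}
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("parse url: %w", err)
	}
	if !allowedSchemes[u.Scheme] {
		return fmt.Errorf("scheme %q is not allowed", u.Scheme)
	}
	return nil
}

func main() {
	for _, raw := range []string{
		"https://example.com/docs",
		"file:///etc/passwd",
		"javascript:alert(1)",
	} {
		fmt.Printf("%-28s %v\n", raw, checkURL(raw))
	}
}
```

The validation is the easy half. The half that makes it a gate is the rule that nothing else calls the OS handler, which is a convention enforced in review or by a linter rather than something this function can do on its own.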

Boundary three: a log sink

This one’s the sneakiest, because it runs the wrong way round. The first two boundaries are about dangerous input coming in. This one is about sensitive data leaking out.

The tool handles credentials. It also logs, emits telemetry, and reports errors, and all three of those are exit boundaries: places where strings leave the process for somewhere more persistent and more public, like a log aggregator, an analytics backend, an error tracker. If a token ever ends up in a string that flows to one of those, you haven’t logged an event, you’ve published a secret.

The defence is pkg/redact. Any free-form string heading for an observability surface goes through it first, and it strips the usual suspects: credentials in URL userinfo, sensitive query parameters, Authorization headers, the well-known provider key prefixes (sk-, ghp_, AIza and friends), long opaque tokens. The places most likely to leak, command arguments and error messages in telemetry, get it applied automatically rather than relying on every caller to remember.
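For a feel of what that involves, here is a small sketch of a rule-based redactor. It is not pkg/redact: the rule set, the parameter names it treats as sensitive and the [REDACTED] placeholder are assumptions, and it leaves out the “long opaque token” heuristic entirely because that one needs more care than a sketch can give it.

```go
package main

import (
	"fmt"
	"regexp"
)

// rules is an illustrative subset of redaction patterns, applied in order.
var rules = []struct {
	re   *regexp.Regexp
	repl string
}{
	// Credentials in URL userinfo: scheme://user:pass@host
	{regexp.MustCompile(`([a-zA-Z][a-zA-Z0-9+.-]*://)[^/\s@]+@`), "${1}[REDACTED]@"},
	// Authorization headers, keeping the scheme word for readability.
	{regexp.MustCompile(`(?i)(authorization:\s*)(bearer\s+|basic\s+)?\S+`), "${1}${2}[REDACTED]"},
	// Sensitive query parameters (the name list is an assumption).
	{regexp.MustCompile(`(?i)([?&](?:token|api_key|apikey|access_token|secret)=)[^&\s]+`), "${1}[REDACTED]"},
	// Well-known provider key prefixes.
	{regexp.MustCompile(`\b(sk-|ghp_|AIza)[A-Za-z0-9_-]{8,}`), "${1}[REDACTED]"},
}

// redact runs every rule over s and returns the scrubbed string.
func redact(s string) string {
	for _, r := range rules {
		s = r.re.ReplaceAllString(s, r.repl)
	}
	return s
}

func main() {
	fmt.Println(redact("GET https://bob:hunter2@example.com/x?token=abc123"))
	fmt.Println(redact("Authorization: Bearer abc.def.ghi"))
}
```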

Same pattern as the other two. A boundary, and something standing on it checking what goes through.

The unglamorous part

None of these fixes is clever. There’s no exploit demo, no neat trick to show off. Bound a length. Check a scheme against an allowlist. Run a string through a redactor. The work was almost entirely in noticing the boundary existed, and then making sure everything routes through the one checked path instead of dotting raw calls all over the codebase.

That’s the actual lesson of a security audit, and it’s why the cluster reframe matters. The value wasn’t the dozen-or-so individual fixes. It was learning that the next risk will be at a boundary too, the next place untrusted input meets a powerful operation with nothing in between, and that the job is to find those points and put a single, mandatory, checked door on each.

To sum up

A security audit of a CLI framework reads like a list of unrelated bugs and isn’t one. go-tool-base’s findings nearly all reduced to the same shape: untrusted input reaching a powerful operation unchecked. A regex compiler that needed a length and time bound (regexutil.CompileBounded). A URL opener that needed a scheme allowlist and a single chokepoint (pkg/browser). Log and telemetry sinks that needed credentials redacted on the way out (pkg/redact).

The fixes were structural and dull, which is exactly right. Find your boundaries (config, flags, TUI input, network responses, log and telemetry sinks), give each one a single mandatory checked path, and you’ve spent your audit effort where the risk actually lives.

The test-mocking pattern that races

Thu, 16 Apr 2026 00:00:00 +0000

I’m going to tell you about a bug go-tool-base shipped, because it’s one of those bugs that’s so reasonable-looking you’ll find it in textbooks, conference talks, and an awful lot of otherwise excellent Go code. We had it too. It passed every test on my laptop, every single time, and then quietly fell over on CI while blaming an innocent bystander.

It’s the classic Go trick for mocking a dependency, and it races.

A pattern that looks completely reasonable

Here’s a thing you need to do constantly in Go tests: stop a function from really shelling out. It calls exec.LookPath to find a binary, or exec.Command to run one, and your test very much does not want it touching the real $PATH or spawning a real process.

The Go community has a well-worn answer. Hoist the function into a package-level variable, call that, and let tests reassign it:

// production code
var execLookPath = exec.LookPath

func findTool() (string, error) {
	return execLookPath("sometool")
}

// test
func TestFindTool(t *testing.T) {
	old := execLookPath
	defer func() { execLookPath = old }()
	execLookPath = func(string) (string, error) {
		return "/fake/path", nil
	}
	// ...assert...
}

It’s tidy. No interface to thread through, no constructor to change. You’ll find it in a great deal of Go code, including some very respectable Go code indeed. go-tool-base had it too.

And it works. It works on your machine, it works in code review, it works the first hundred times CI runs it. Which is precisely what makes it dangerous, because it’s wrong, and it’s just been biding its time.

Add one line and it detonates

Go’s t.Parallel() is more or less free performance. Mark your tests with it and the runner overlaps them instead of plodding through one at a time. On a package with a few hundred tests it’s a real, worthwhile speed-up, so naturally you reach for it.

Now picture two tests, both using the pattern above, both marked t.Parallel(). They run concurrently. Test A assigns its fake to execLookPath. Test B assigns its fake to execLookPath. Test A reads execLookPath expecting its own fake. Two goroutines, one variable, writes and reads with nothing synchronising them. That’s a textbook data race, and the textbook is right: the behaviour is undefined. Test A might see B’s fake. The deferred restore might land in the wrong order and leave the variable pointing at a fake after both tests have finished, poisoning a third one for good measure.

The truly nasty part is the intermittency. Whether the race actually bites depends on goroutine scheduling, which depends on machine load and core count. Your laptop running eight tests at once might never lose the coin-toss. A CI runner under load, scheduling differently, loses it and fails a test that has nothing obviously to do with the change in the commit. You re-run the pipeline, it passes, everyone shrugs and moves on. A test suite that fails one run in twenty trains your team to ignore it, and an ignored CI failure is worse than no CI at all.

I can tell you this one from direct, slightly embarrassed experience, because go-tool-base shipped exactly this bug and CI caught it the honest way: green on the laptop, red on the runner, with the failure cheerfully pointing at innocent bystander tests rather than the global that was actually the culprit. go test -race will name it for you if you crank the parallelism up high enough to lose the toss reliably… but you have to go looking, and you only go looking once it’s already ruined an afternoon.

The fix isn’t synchronisation, it’s structure

The instinct is to slap a mutex around the variable. Resist it. A mutex makes the race defined, but it doesn’t make the design any good. You’ve still got global mutable state, you’ve just queued the fight instead of cancelling it. And tests that serialise on a shared lock aren’t really parallel any more, so you’ve also handed back the speed-up you came for in the first place.

The real fix is to not have a shared variable at all. The dependency was always an input to the code; the package-level var was just a way of avoiding saying so out loud. So say it. Inject it.

A struct field:

type Finder struct {
	lookPath func(string) (string, error) // defaults to exec.LookPath
}

func (f *Finder) find() (string, error) {
	return f.lookPath("sometool")
}

Or a functional option, if you’d rather keep the zero value clean. Either way, each test constructs its own Finder with its own fake. There’s no shared variable, so there’s no race, and t.Parallel() is free again because the tests genuinely don’t touch each other.
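The functional-option flavour might look something like this. The names NewFinder, WithLookPath and Option are assumptions for the sketch, not go-tool-base’s actual constructors. The point is that the default is set once in the constructor and a test overrides it on its own instance only.

```go
package main

import (
	"fmt"
	"os/exec"
)

// Finder owns its lookup function; nothing here is package-level state.
type Finder struct {
	lookPath func(string) (string, error)
}

// Option configures a Finder at construction time.
type Option func(*Finder)

// WithLookPath swaps the lookup function, typically for a test double.
func WithLookPath(fn func(string) (string, error)) Option {
	return func(f *Finder) { f.lookPath = fn }
}

// NewFinder returns a Finder that uses exec.LookPath unless told otherwise.
func NewFinder(opts ...Option) *Finder {
	f := &Finder{lookPath: exec.LookPath}
	for _, opt := range opts {
		opt(f)
	}
	return f
}

// Find resolves the tool's binary through the injected lookup.
func (f *Finder) Find() (string, error) {
	return f.lookPath("sometool")
}

func main() {
	// Each caller (or each parallel test) builds its own Finder.
	fake := NewFinder(WithLookPath(func(string) (string, error) {
		return "/fake/path", nil
	}))
	fmt.Println(fake.Find())
}
```

Two tests marked t.Parallel() can each build a Finder like fake above with different doubles, and neither can see the other’s.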

go-tool-base wrote this straight into its standing rules: no package-level mocking hooks, full stop. Dependencies come in through struct fields, functional options, or config fields. (The same injection discipline that makes Props so testable, applied one rung further down.) And to stop everyone hand-rolling the same exec fakes, there’s a small internal package, internal/exectest, with ready-made LookPath and CommandContext doubles you construct per-test. The pattern is gone, and the door it came in through is shut.

The rule worth taking away

A package-level variable that tests reassign is shared mutable state. It reads as a harmless convenience because in a single-threaded test run it behaves like one. t.Parallel() is the thing that reveals it was never harmless, only unobserved.

The general lesson is older than Go: if a value is an input to your code, make it an input. Smuggling it in as a global is borrowing test-time convenience against a debt that comes due, with interest, the day someone wants their tests to run in parallel. Pay cash. Inject the dependency.

Worth remembering

Mocking via a reassignable package-level variable is a beloved Go shortcut and a latent data race. It survives because single-threaded test runs hide it; t.Parallel() exposes it as an intermittent, bystander-blaming CI flake that’s miserable to trace. A mutex only makes the bad design defined. The fix is structural: inject the dependency as a struct field or functional option, so each test owns its own double and there’s no shared state to race over. go-tool-base banned the global-hook pattern outright and ships internal/exectest so nobody’s tempted back to it.

If a piece of code depends on something, let it say so in its signature. Your future self, staring at a CI failure that flatly refuses to reproduce, will thank you.

Testing code that calls an LLM: yes, you actually can

Wed, 08 Apr 2026 00:00:00 +0000

“You can’t test code that calls an AI.” I’ve heard it said with great confidence, and it’s half right, which is the most dangerous kind of right. You genuinely can’t assert on what a non-deterministic model says. But the model isn’t your code, and the bits sitting either side of it most certainly are.

“You can’t test AI code”

It’s a fair worry. Your command calls an LLM. The LLM returns something slightly different every run. A test that asserts response == "..." is broken before you’ve finished typing it. So the conclusion arrives quickly: the AI path can’t be tested, leave it uncovered.

Which is a shame, because the AI call is usually the riskiest line in the whole command.

The conclusion is also wrong. It mistakes “I can’t test the model” for “I can’t test my code”. The model is not your code. Your code is the two pieces sitting on either side of it.

Your code is a prompt and a handler

Strip the command down to what it actually does:

1. It builds a prompt. It assembles a system prompt, the user’s input, perhaps some context, and sends it.
2. The model does something. This is not your code.
3. It takes the response and does something with it. It parses it, branches on it, prints it, stores it.

Steps one and three are entirely yours, and entirely deterministic. The same inputs build the same prompt and handle the same response the same way, every single time. That’s testable. Step two is the only part that isn’t, and step two was never yours to test in the first place.

So the job is to pin step two to a known value, and then test one and three properly.

Test the prompt: snapshot it

Step one produces a prompt, and a prompt is just a string, which means you can pin it.

Both frameworks lean on snapshot testing here. go-tool-base uses a golden-file approach: the prompt your code generates is recorded to a file, and the test re-generates it and compares against that file. rust-tool-base does the same with insta, snapshotting the request body the client would send.

The reason this matters is that the prompt is load-bearing and quietly easy to break. You refactor how context gets assembled. Without noticing, you’ve changed the wording, or the ordering, or dropped a line the model was leaning on. Nothing fails to compile. The behaviour just drifts, silently.

A snapshot test catches exactly that. It fails, it shows you the diff between the old prompt and the new one, and it makes you stop and make a decision. Was this change intended? If yes, you accept the new snapshot and move on. If no, you’ve just caught a bug before it shipped. Either way the prompt never changes by accident, which for AI code is most of the battle.
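The mechanics of a golden-file check fit in one function. What follows is a generic sketch rather than go-tool-base’s own helper: buildPrompt, checkGolden and the update parameter are all hypothetical, and in a real suite this would sit in a _test.go file with the update switch wired to a test flag. The demo runs against a temporary directory so it leaves nothing behind.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// buildPrompt stands in for whatever assembles the real prompt.
func buildPrompt(system, input string) string {
	return "SYSTEM: " + system + "\nUSER: " + input + "\n"
}

// checkGolden compares got with the recorded file. With update set it
// rewrites the file instead, which is how an intended change is accepted.
func checkGolden(path, got string, update bool) error {
	if update {
		if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
			return err
		}
		return os.WriteFile(path, []byte(got), 0o644)
	}
	want, err := os.ReadFile(path)
	if err != nil {
		return fmt.Errorf("read golden file: %w", err)
	}
	if string(want) != got {
		return fmt.Errorf("prompt drifted from %s:\n--- want\n%s--- got\n%s", path, want, got)
	}
	return nil
}

// demo records a snapshot, re-checks it, then checks that drift is caught.
func demo() error {
	dir, err := os.MkdirTemp("", "golden")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	golden := filepath.Join(dir, "testdata", "prompt.golden")
	prompt := buildPrompt("You are a release-notes assistant.", "Summarise v1.2.0")

	if err := checkGolden(golden, prompt, true); err != nil { // record
		return err
	}
	if err := checkGolden(golden, prompt, false); err != nil { // unchanged prompt passes
		return err
	}
	reworded := buildPrompt("You are an assistant.", "Summarise v1.2.0")
	if checkGolden(golden, reworded, false) == nil {
		return errors.New("expected the reworded prompt to be reported as drift")
	}
	return nil
}

func main() {
	if err := demo(); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("ok")
}
```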

Test the handler: mock the response

Step three needs a response to handle, and in a unit test you don’t get that response from the real model. You supply it.

go-tool-base ships generated mocks for the ChatClient interface. A test builds a mock client, tells it “when Ask is called, return this canned value”, and runs the command against it:

mockClient := mock_chat.NewMockChatClient(t)
mockClient.EXPECT().
	Ask(mock.Anything, mock.Anything, mock.AnythingOfType("*main.Analysis")).
	RunAndReturn(func(_ context.Context, _ string, target any) error {
		*(target.(*Analysis)) = Analysis{Severity: "critical"}
		return nil
	})

Because the interface is only four methods, that mock is trivial to set up and complete by construction. rust-tool-base takes the same idea one layer down: HTTP-bound tests use wiremock, which stands up a fake server returning a canned response body. The client makes a real HTTP request; it just goes to a fake endpoint the test controls.

Either way, step two is now fixed to a value you chose, which makes step three deterministic. And that unlocks the tests that actually matter: given a malformed response, does the command fail gracefully? Given a rate-limit error, an empty answer, a field missing? Those are the cases a live model almost never hands you on demand, and a mock hands you every time, on the first run.
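If you’d rather see the idea without a mocking library, a hand-rolled fake does the same job. Everything named here is an assumption for illustration: asker is a one-method slice of the interface, with the Ask signature inferred from the mock shown earlier, and analyse is a made-up handler. The fake lets you serve the happy path, an error, and an empty answer on demand.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Analysis is the structured result the handler expects back.
type Analysis struct{ Severity string }

// asker is a hypothetical one-method slice of the chat client interface.
type asker interface {
	Ask(ctx context.Context, prompt string, target any) error
}

// fakeAsker returns whatever the test configured: an error, or a filled target.
type fakeAsker struct {
	fill func(target any)
	err  error
}

func (f fakeAsker) Ask(_ context.Context, _ string, target any) error {
	if f.err != nil {
		return f.err
	}
	if f.fill != nil {
		f.fill(target)
	}
	return nil
}

var errEmptyAnalysis = errors.New("model returned no severity")

// analyse is the code under test: build a prompt, ask, validate the answer.
func analyse(ctx context.Context, c asker, input string) (Analysis, error) {
	var a Analysis
	if err := c.Ask(ctx, "Analyse: "+input, &a); err != nil {
		return Analysis{}, fmt.Errorf("ask: %w", err)
	}
	if a.Severity == "" {
		return Analysis{}, errEmptyAnalysis
	}
	return a, nil
}

func main() {
	ctx := context.Background()

	good := fakeAsker{fill: func(t any) { *(t.(*Analysis)) = Analysis{Severity: "critical"} }}
	a, err := analyse(ctx, good, "disk full")
	fmt.Println(a, err)

	rateLimited := fakeAsker{err: errors.New("429 rate limited")}
	_, err = analyse(ctx, rateLimited, "disk full")
	fmt.Println(err)

	_, err = analyse(ctx, fakeAsker{}, "disk full") // model answered with nothing
	fmt.Println(err)
}
```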

This is, incidentally, the same discipline as the test-mocking work elsewhere in the framework: the dependency is injected, so the test gets to decide what it does.

What you deliberately don’t test

One honest boundary. None of this tests whether the model gives good answers. That question is real, but it’s a different activity (evaluations, run as their own suite) and not something to mix into the unit tests.

The unit suite’s job is your code: that it builds a sound prompt, and that it handles every shape of response correctly, including the ugly ones. Keep that well away from “is the model clever today”. A unit test that depends on the model being clever is a unit test that fails when the weather changes, and a flaky test just teaches people to ignore the whole suite.

What it comes down to

Code that calls an LLM is testable; the model is not, and those are different statements. Your code is a prompt builder and a response handler, both deterministic, with the model sat in between.

go-tool-base and rust-tool-base converge on the same approach. Snapshot the prompt, with golden files or insta, so a refactor can’t change what you send without a test noticing. Mock the response, with generated ChatClient mocks or a wiremock server, so tests run with no network and you can feed in the malformed and error cases a real model won’t reliably produce. Leave “are the answers any good” to a separate evaluation suite. Test the two halves you own, and the non-determinism in the middle stops being an excuse to leave the riskiest line uncovered.

The AI provider that isn't an API

Mon, 06 Apr 2026 00:00:00 +0000

go-tool-base’s chat package puts five AI providers behind one interface. Four of them are exactly what you’d guess: HTTP calls to OpenAI, Claude, Gemini, and anything OpenAI-compatible. The fifth one isn’t an API at all. It shells out to a binary.

That sounds like a slightly mad thing to want, right up until you’ve worked somewhere the network says no.

The fifth provider shells out

The chat package speaks to five providers through one ChatClient interface. Four of them are what you’d expect: HTTP requests to OpenAI, to Claude, to Gemini, to any OpenAI-compatible endpoint. The tool author picks one in config, and the rest of the code never knows the difference.

The fifth, ProviderClaudeLocal, is different in kind. It doesn’t make an HTTP request at all. It shells out. It runs the claude CLI binary as a child process, passes the prompt in, and reads the answer back from the binary’s output.

That sounds like an odd thing to want until you’ve been stuck in the environment it was built for.

Why you’d want that

Picture a corporate network with its egress locked right down. Outbound HTTPS to api.anthropic.com is blocked by policy. A tool built on go-tool-base that uses AI would simply fall over there. It tries to reach the API, there’s no route, and that’s the end of the feature.

But the developer at that machine has the claude CLI installed, and has run claude login. That binary is permitted. It’s an approved, managed tool, and it has its own sanctioned path out. The direct API call is blocked; the claude command is not.

ProviderClaudeLocal is what bridges those two facts. If your tool’s AI calls go through that already-blessed binary instead of straight at the API, they work, in an environment where the direct call cannot. That’s the whole reason the provider exists. It isn’t faster (a real API call has lower latency) and it isn’t more capable. It’s for the place where the API call simply isn’t an option, and “isn’t an option” is a surprisingly common place to find yourself inside a large organisation.

What it costs, honestly

It’s worth being straight about the trade, because ProviderClaudeLocal is the reduced-capability provider.

It doesn’t do tool calling. It doesn’t do parallel tools. It doesn’t stream. Those need a live, structured connection to the model’s API, and a subprocess that runs once and prints an answer is not that. What it does support is plain chat and structured output, the latter through the binary’s own --json-schema flag.

So the honest positioning, and the package’s documentation says exactly this, is: prefer the API providers when you can reach them, because they’re lower latency and feature-complete. Reach for ProviderClaudeLocal when API access is restricted. You accept the narrower capability set as the price of working at all. For a tool whose AI feature is “answer a question” or “return a structured analysis”, that price is often nothing you’d even notice. For one built on an agentic tool-calling loop, it’s a real limitation, and you’d know to expect it.

How it stays behind the same interface

Here’s the part that makes it pleasant rather than a special case to maintain. Despite being a subprocess and not an API, ProviderClaudeLocal is still a ChatClient. Your feature code calls Chat and Ask exactly the way it would for any other provider.

Everything that makes a subprocess provider awkward stays inside the provider. Spawning the binary, feeding it the prompt, parsing its output, capturing stderr and surfacing it when the binary exits non-zero, and threading multi-turn continuity through session identifiers passed back on the next call with --resume: all of that is the provider’s problem, and all of it sits behind the interface. The code in your tool that uses AI doesn’t know, and has no way to find out, that this particular provider is a child process rather than an HTTPS call.

That’s a unified interface genuinely earning its place. It’s easy to put a uniform face on four things that already work the same way underneath. The real test of the abstraction is whether something that works in a completely different way, a subprocess instead of a socket, can still slot in without the caller changing a line. Here it can. You swap one config value, and a tool that talked to an API now talks through a binary, and nothing downstream so much as blinks.

The bottom line

go-tool-base’s chat package puts five providers behind one ChatClient interface, and ProviderClaudeLocal is the one that isn’t an API. It runs the locally installed, pre-authenticated claude CLI as a subprocess.

It exists for the locked-down environment where outbound HTTPS to the AI API is blocked but the claude binary is allowed: there, AI features keep working where a direct call would fail. The trade is a narrower capability set (no tool calling, no streaming, plain chat and structured output only) so you prefer the API providers when you can reach them and fall back to this when you can’t. And because it’s still a ChatClient, all the subprocess machinery stays hidden, and your code uses it without knowing it’s there. That last part is the real test of an abstraction: a provider that works in an entirely different way still slots in unchanged.

AI conversations you can resume

Sat, 04 Apr 2026 00:00:00 +0000

An AI conversation is, fundamentally, its own history. The model’s next answer depends on everything said so far. And a CLI tool, by its very nature, forgets everything the moment it exits. Put those two facts together and you get the problem: run an AI command, exit, run it again, and you’re talking to someone who’s never met you.

A CLI forgets everything

A long-running service keeps its state in memory for as long as it runs. A CLI tool doesn’t get that luxury. It starts, does one thing, exits. The next invocation is a brand-new process with no memory of the last one.

For most commands that’s exactly right, and you wouldn’t want it any other way. But an AI conversation is a different kind of beast, because a conversation is its history. The model’s next answer depends on everything said so far. Run an AI command, exit, run it again, and you’ve started a fresh conversation with someone who’s never met you. For an interactive assistant, or any AI workflow that unfolds across several invocations, that’s plainly the wrong behaviour. The user expects to pick up where they left off.

Save and restore

The chat package handles this through a PersistentChatClient interface. Like streaming, it’s an optional capability discovered with a type assertion, sitting beside the four-method core rather than bloating it. A client that supports persistence also satisfies this interface:

if pc, ok := client.(chat.PersistentChatClient); ok {
	snapshot, err := pc.Save()
	// store the snapshot somewhere
}

A snapshot is a serialisable value that captures the conversation. You store it. Next run, you load it, Restore it onto a fresh client, re-register your tools, and call Chat again. “Where were we?” works, because the model is handed back the whole history.

A snapshot is opinionated about what it carries

The interesting part is what a snapshot does and doesn’t contain, because that’s a series of deliberate decisions.

It carries the messages, the system prompt, the model name, and tool metadata: the names, descriptions and parameter schemas of the tools that were registered.

It does not carry tool handlers. Handlers are code, not data; you can’t serialise a function meaningfully, so after a restore you re-register them with SetTools. The snapshot remembers that a tool called read_file existed and what its shape was; it doesn’t try to remember the Go function behind it.

And it does not carry API tokens. This is the one to dwell on. A snapshot is a file. A file gets synced, backed up, copied between machines, attached to a support ticket by a user trying to be helpful. A snapshot that carried the API key would be a credential leak the moment it left the laptop it was made on. So the snapshot never contains a token, at all. On restore, the client picks the credential up again the ordinary way, from the environment or the keychain. The conversation and the secret are kept in separate places on purpose, and only one of them is ever in the file.
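One way to picture those decisions is as a struct whose fields are the whole story. The types below are a guess at a plausible shape, not the chat package’s real snapshot type: every type and field name and every JSON tag is an assumption. What the sketch is meant to show is that there is simply no field a handler or a token could end up in.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Message is one turn of the conversation.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// ToolMeta describes a tool's shape. It is data only.
type ToolMeta struct {
	Name        string          `json:"name"`
	Description string          `json:"description"`
	Parameters  json.RawMessage `json:"parameters"` // schema for the arguments
}

// Snapshot is a hypothetical serialisable conversation.
// Deliberately absent: tool handlers (code, re-registered after restore)
// and API tokens (picked up again from the environment or keychain).
type Snapshot struct {
	Model        string     `json:"model"`
	SystemPrompt string     `json:"system_prompt"`
	Messages     []Message  `json:"messages"`
	Tools        []ToolMeta `json:"tools"`
}

func encodeSnapshot(s Snapshot) ([]byte, error) { return json.Marshal(s) }

func decodeSnapshot(b []byte) (Snapshot, error) {
	var s Snapshot
	err := json.Unmarshal(b, &s)
	return s, err
}

func main() {
	s := Snapshot{
		Model:        "example-model",
		SystemPrompt: "You are a helpful assistant.",
		Messages:     []Message{{Role: "user", Content: "Where were we?"}},
		Tools: []ToolMeta{{
			Name:        "read_file",
			Description: "Read a file from disk",
			Parameters:  json.RawMessage(`{"type":"object"}`),
		}},
	}
	b, err := encodeSnapshot(s)
	fmt.Println(string(b), err)
}
```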

Encrypted at rest, if you want it

The package ships a FileStore that writes snapshots as JSON files, with 0600 permissions in a 0700 directory, and it can encrypt them. Pass WithEncryption a 32-byte key and snapshots are written with AES-256-GCM.

That option exists because a conversation can hold sensitive content even when it holds no credential. The log a user pasted in for analysis, the source file they asked the model to review, the internal details tucked into their questions: none of that is an API key, and all of it might be something you’d rather not have sitting in plain JSON in a backup somewhere. Encryption at rest covers it.
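For a feel of what that involves, here is a minimal sketch using only the Go standard library. It is not the FileStore's code: the helper names (newGCM, seal, open, roundTrip), the nonce-prepended layout and the demo file name are all assumptions of mine. What it does show faithfully is AES-256-GCM with a 32-byte key, written out with 0600 permissions inside a 0700 directory.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// newGCM builds an AES-256-GCM AEAD. The key must be exactly 32 bytes.
func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("AES-256 needs a 32-byte key")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts plaintext and prepends the random nonce to the result.
func seal(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open splits off the nonce, then decrypts and authenticates the rest.
func open(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, body := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, body, nil)
}

// roundTrip encrypts and decrypts msg, returning what came back.
func roundTrip(key []byte, msg string) (string, error) {
	sealed, err := seal(key, []byte(msg))
	if err != nil {
		return "", err
	}
	plain, err := open(key, sealed)
	return string(plain), err
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	sealed, err := seal(key, []byte(`{"messages":[]}`))
	if err != nil {
		panic(err)
	}
	// Owner-only directory and file, matching the 0700 and 0600 modes above.
	dir := filepath.Join(os.TempDir(), "snapshots-demo")
	if err := os.MkdirAll(dir, 0o700); err != nil {
		panic(err)
	}
	if err := os.WriteFile(filepath.Join(dir, "demo.json.enc"), sealed, 0o600); err != nil {
		panic(err)
	}
	fmt.Println("wrote", len(sealed), "encrypted bytes")
}
```

GCM is authenticated, so a snapshot that has been tampered with fails to open rather than decrypting into something subtly different.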

The FileStore is also careful about the snapshot identifiers it’s handed. An ID has to be a canonical UUID, and the resolved file path is checked to lie inside the store directory, so a snapshot ID arriving from an untrusted source (a CLI flag, a request payload) can’t be bent into a path-traversal that reads or writes somewhere it shouldn’t. Persisting conversations adds a small filesystem surface, and the store treats it as exactly that.
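Those two checks are small enough to sketch. This is my own illustration, not the store's implementation: snapshotPath, the lowercase-only UUID pattern and the .json suffix are assumptions. It validates the ID first, then independently confirms the resolved path still sits under the store directory.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"regexp"
	"strings"
)

// canonicalUUID matches the lowercase 8-4-4-4-12 hexadecimal form.
// Treating only lowercase as canonical is an assumption of this sketch.
var canonicalUUID = regexp.MustCompile(
	`^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$`)

// snapshotPath returns the file path for id, or an error if the id is not a
// canonical UUID or the resolved path would fall outside storeDir.
func snapshotPath(storeDir, id string) (string, error) {
	if !canonicalUUID.MatchString(id) {
		return "", errors.New("snapshot id is not a canonical UUID")
	}
	root, err := filepath.Abs(storeDir)
	if err != nil {
		return "", err
	}
	full := filepath.Join(root, id+".json")

	// Second, independent check: the path relative to the root must not
	// climb out of it.
	rel, err := filepath.Rel(root, full)
	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", errors.New("snapshot path escapes the store directory")
	}
	return full, nil
}

func main() {
	ids := []string{
		"123e4567-e89b-12d3-a456-426614174000",
		"../../etc/passwd",
	}
	for _, id := range ids {
		path, err := snapshotPath("snapshots", id)
		fmt.Printf("%q -> %q, err=%v\n", id, path, err)
	}
}
```

The UUID check alone already rules out separators and dots; the containment check is there so the guarantee doesn't hang on a single regular expression.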

The short version

A CLI tool forgets everything between invocations, which is correct for most commands and wrong for an AI conversation, because a conversation is its history.

go-tool-base’s chat package lets you persist one. PersistentChatClient saves a snapshot you can store and restore later, picking the conversation back up where it ended. The snapshot is deliberate about its contents: messages, system prompt and tool metadata yes; tool handlers no, because they’re code you re-register; API tokens never, because a snapshot is a file and a file travels. The built-in FileStore can encrypt snapshots at rest with AES-256-GCM and validates snapshot IDs against path traversal. Resumable conversations, without the conversation file turning into a place secrets leak from.

An AI agent that has to make the build pass

Thu, 02 Apr 2026 00:00:00 +0000

Most AI code generation works on a charming little principle I’ll call generate-and-hope. The model writes the code, the model stops at the closing brace, and whether the thing actually compiles is left as an exercise for you. For a snippet you paste into an editor, fine. For a whole generated command, that’s just outsourcing the disappointment.

go-tool-base does something I’m rather happier with: the AI has to make the build pass before it’s allowed to claim it’s done.

Generate and hope

The usual shape of AI code generation is this. You ask for code, the model produces it, and the model’s job ends at the closing brace. Whether it compiles, whether the tests pass, whether the imports even resolve, none of that has been checked. The model produced something that looks right. You find out whether it is right when you build it.

For a snippet you paste into an editor, that’s perfectly fine. The compiler tells you in a second. But go-tool-base’s generator, driven by gtb generate command --script or --prompt, produces a whole command: the implementation, its tests, the lot. “Generate and hope” at that scale means handing the user a project that may or may not build, and quietly making them the one who finds out which.

Drafting is only step one

So the generator doesn’t stop at drafting. Writing the first version of the implementation and its tests is step one of two. Step two is an autonomous repair agent.

Once the draft is on the filesystem, a separate agent takes over. It’s an LLM running in a loop, but a loop aimed at one narrow, checkable job: make this project build and pass its tests. It isn’t asked to be creative. It’s asked to get to green.

A fixed set of tools, and no shell

The agent is not handed a shell. It’s given a fixed, defined set of tools and nothing else. Three of them let it explore and edit the project: list_dir, read_file, write_file. Four more let it verify the project and fix what verification turns up:

go_build runs the build and captures the compiler errors.
go_test runs the tests and captures the failures.
go_get resolves a missing dependency.
golangci_lint runs the project’s linter.

That restriction is the design, not a limitation of it. The agent can’t delete arbitrary files, can’t reach the network, can’t run anything that isn’t on the list. It has exactly what it needs to make code compile and nothing it would need to do damage. Its file writes are confined to the project directory by an explicit path check, so even write_file can’t go wandering up into /etc. A coding agent you’d actually let near a filesystem is one whose abilities are an allowlist, not a denylist. (I keep coming back to that principle through this series… safety as a boundary you draw, not a behaviour you hope for.)
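An allowlist is easy to picture as a lookup table. The sketch below is hypothetical throughout (the registry type, the call method and the stub tool bodies are mine, not the generator's), but it shows the property that matters: a tool name that isn't in the table has no code path at all.

```go
package main

import (
	"errors"
	"fmt"
)

// toolFunc is a simplified tool signature: one string in, one string out.
type toolFunc func(arg string) (string, error)

// registry is the complete set of things the agent may do.
type registry map[string]toolFunc

var errUnknownTool = errors.New("tool is not on the allowlist")

// call dispatches by name and refuses anything that was never registered.
func (r registry) call(name, arg string) (string, error) {
	fn, ok := r[name]
	if !ok {
		return "", fmt.Errorf("%q: %w", name, errUnknownTool)
	}
	return fn(arg)
}

// demoRegistry registers stub versions of the three explore-and-edit tools.
func demoRegistry() registry {
	return registry{
		"list_dir":   func(arg string) (string, error) { return "listing " + arg, nil },
		"read_file":  func(arg string) (string, error) { return "reading " + arg, nil },
		"write_file": func(arg string) (string, error) { return "writing " + arg, nil },
	}
}

func main() {
	r := demoRegistry()

	out, err := r.call("read_file", "main.go")
	fmt.Println(out, err)

	// There is no "shell" entry, so the request has nowhere to go.
	_, err = r.call("shell", "rm -rf /")
	fmt.Println(err)
}
```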

The loop

The repair loop is a ReAct loop, the same reason-act-observe shape as the tool-calling loop, only this time pointed at a goal:

1. The draft is on disk.
2. Verify: run go_build and go_test.
3. If verification failed, read the error logs, the compiler error or the failing test.
4. Reason about the cause: an undefined variable, a missing import, a wrong signature.
5. Act: call write_file to patch the code, or go_get to add the dependency.
6. Loop. Steps two to five repeat until the project is green, or the agent hits its step limit, which defaults to 15.

What makes this work is treating the error output as feedback rather than as a failure to log and walk away from. A compiler error is the single most useful sentence you can hand a model that’s trying to fix code. It says what’s wrong, and usually where. The loop feeds it straight back in, and the model fixes against it.
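Stripped of the model, the control flow of that loop is only a few lines. In this sketch verifyFunc stands in for go_build plus go_test and fixFunc stands in for the model reasoning and patching; both, along with repair and simulate, are hypothetical names of mine. Only the shape and the default budget of 15 come from the description above.

```go
package main

import "fmt"

// verifyFunc stands in for go_build plus go_test. It returns the error
// output, or an empty string when the project is green.
type verifyFunc func() string

// fixFunc stands in for the model reading the output and patching the code.
type fixFunc func(errorOutput string)

// repair loops verify -> fix until green or until maxSteps fixes were tried.
// It reports whether the project ended green and how many fixes it used.
func repair(verify verifyFunc, fix fixFunc, maxSteps int) (green bool, steps int) {
	for steps = 0; steps < maxSteps; steps++ {
		out := verify()
		if out == "" {
			return true, steps
		}
		// The error output is the feedback: hand it straight to the fixer.
		fix(out)
	}
	return verify() == "", steps
}

// simulate fakes a project with a number of outstanding problems, where
// each fix resolves exactly one of them.
func simulate(problems, maxSteps int) (bool, int) {
	remaining := problems
	verify := func() string {
		if remaining == 0 {
			return ""
		}
		return fmt.Sprintf("%d errors remaining", remaining)
	}
	fix := func(string) { remaining-- }
	return repair(verify, fix, maxSteps)
}

func main() {
	green, steps := simulate(3, 15)
	fmt.Println("three problems:", green, steps)

	green, steps = simulate(20, 15)
	fmt.Println("twenty problems:", green, steps)
}
```

The second run is the interesting one: the budget runs out before the project is green, and the caller finds that out from the return value rather than from a claim.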

Verification changes what “done” means

Here’s the real shift, and the agent’s own documentation puts it well: the agent “doesn’t just say it fixed a bug; it uses a Test tool to verify the fix before reporting success.”

A generate-and-hope model reports success when it finishes writing. It has no idea whether the code works, and it isn’t really claiming otherwise. “Done” means “I produced text”. The repair agent reports success when go_build and go_test actually pass. “Done” means “the build is green”. Those are two completely different claims, and only the second is worth anything to the person who asked for the command.

That’s the line between an AI that’s a creative writer and an AI that’s a collaborator you can hand a task to. And when the agent can’t reach green, when it spends its whole step budget and the project is still broken, the generator fails safely: it leaves the best-attempt code in place, commented out so the project still compiles, and tells the user what to finish by hand. There’s also an --agentless flag for anyone who’d rather have a plain single-shot retry than the multi-step agent. The default, though, is the agent, because the default should be code that’s been checked.

Where this leaves us

Most AI code generation generates and hopes: the model writes code and the user discovers whether it works. For a whole generated command, that pushes a may-or-may-not-build project onto the user.

go-tool-base’s generator drafts the command and then hands it to an autonomous repair agent. The agent has a fixed set of tools (explore and edit the project, build it, test it, lint it, fetch dependencies) and no shell at all, with file writes confined to the project directory. It runs a ReAct loop, reading each error and patching against it, until the build is green or it exhausts its steps. The point is what “done” comes to mean: not “the model finished writing”, but “the build passes”. Only one of those is a claim worth trusting.

Stop regex-ing the LLM's prose

Tue, 31 Mar 2026 00:00:00 +0000

Ask an LLM a question and it hands you back prose. Lovely to read, miserable to program against. You wanted the one number buried in the middle of it, and now you’re writing a regular expression to fish a word out of three well-written paragraphs that phrase themselves slightly differently every single time you run them.

There’s a much better way, and it’s the difference between forever interpreting an LLM and actually building on one.

The problem with a paragraph

You ask an LLM to analyse a log file and tell you the severity of what it found and a suggested fix. It comes back with three well-written paragraphs. Somewhere in there is the word “critical”, and somewhere is the fix.

Your program now has to extract those two facts from prose, and prose has no contract. The next run, the model phrases it differently. It leads with a caveat. It says “severe” where last time it said “critical”. It puts the fix first. Anything that worked by finding “critical” in the text is now quietly wrong, and you didn’t change a line. Parsing free text for structured facts is a game you lose slowly.

What you actually wanted was never a paragraph. It was a value: a thing with a severity field and a fix field, that you can branch on and store and pass around like any other.

Ask for the struct, not the prose

go-tool-base’s chat package draws the line with two methods. Chat gives you text. Ask gives you a struct.

You define the Go type you want back:

type Analysis struct {
 Severity string `json:"severity"`
 Fix string `json:"fix"`
}

var result Analysis
if err := client.Ask(ctx, "Analyse this log file: "+logText, &result); err != nil {
 return err
}

The framework generates a JSON Schema from that struct, sends it to the model as the required response format, and unmarshals the reply straight into result. You never lay a finger on the prose. You get result.Severity and result.Fix, typed, ready to use. If you want the model’s answer to drive a switch statement, this is the method that lets it.
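Here is what the consuming side of that looks like once the reply is a value. The sketch skips the model entirely and feeds in a JSON string of the shape the schema would demand; the route function and the severity values it switches on are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Analysis struct {
	Severity string `json:"severity"`
	Fix      string `json:"fix"`
}

// route decodes a schema-shaped reply and branches on the typed field.
// The severity values and the actions are made up for this sketch.
func route(reply string) (string, error) {
	var a Analysis
	if err := json.Unmarshal([]byte(reply), &a); err != nil {
		return "", err
	}
	switch a.Severity {
	case "critical":
		return "page on-call", nil
	case "warning":
		return "open ticket", nil
	default:
		return "log only", nil
	}
}

func main() {
	// Stands in for what the model would return under the schema.
	reply := `{"severity":"critical","fix":"rotate the expired certificate"}`
	action, err := route(reply)
	if err != nil {
		panic(err)
	}
	fmt.Println(action)
}
```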

The struct is the schema is the contract

The detail that makes this hold up over time: you don’t write the schema. The struct is the schema.

The framework derives the JSON Schema from your type. In go-tool-base that’s GenerateSchema[T](); in rust-tool-base the schema comes from your Rust type through schemars. (Yes, there’s a Rust sibling now. I’ll introduce it properly in a few weeks, but it keeps gatecrashing these posts because the two frameworks deliberately share ideas.) Either way there’s one definition, your type, and the schema is just a projection of it.

That matters, because otherwise two things have to agree. There’s the schema you tell the model to obey, and there’s the type you unmarshal the answer into. Hand-write the schema and those two can drift: add a field to the struct, forget to add it to the schema, and the model is never told to produce it, so it silently never appears. Deriving the schema from the type collapses the two into one. They can’t disagree, because there’s only one of them.
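To show the idea of deriving rather than hand-writing, here is a deliberately tiny reflection-based sketch. It is not GenerateSchema[T]() and makes no claim about how that works; schemaFor and jsonType are hypothetical helpers that handle only flat structs of strings, booleans and numbers. The thing to notice is that adding a field to the struct changes the schema with no second edit.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"strings"
)

type Analysis struct {
	Severity string `json:"severity"`
	Fix      string `json:"fix"`
}

// jsonType maps a Go kind to a JSON Schema type name (a small subset only).
func jsonType(k reflect.Kind) string {
	switch k {
	case reflect.String:
		return "string"
	case reflect.Bool:
		return "boolean"
	case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
		return "integer"
	case reflect.Float32, reflect.Float64:
		return "number"
	default:
		return "object"
	}
}

// schemaFor derives a minimal schema from the json tags of a struct value.
// v must be a struct, not a pointer to one.
func schemaFor(v any) map[string]any {
	t := reflect.TypeOf(v)
	props := map[string]any{}
	required := []string{}
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if !f.IsExported() {
			continue
		}
		name := strings.Split(f.Tag.Get("json"), ",")[0]
		if name == "-" {
			continue
		}
		if name == "" {
			name = f.Name
		}
		props[name] = map[string]any{"type": jsonType(f.Type.Kind())}
		required = append(required, name)
	}
	return map[string]any{"type": "object", "properties": props, "required": required}
}

func main() {
	out, err := json.MarshalIndent(schemaFor(Analysis{}), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```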

Both frameworks, with one extra step in Rust

go-tool-base does this with Ask and a ResponseSchema set on the client config. rust-tool-base does it with chat_structured::<T>, where T is any type that’s both deserialisable and JsonSchema.

rust-tool-base adds one step worth calling out. Before it deserialises the model’s reply into your T, it validates the raw response against the schema with a JSON Schema validator. That splits the failure into two distinct, named cases: the response didn’t match the schema, or it matched the schema but still wouldn’t deserialise. A model that returns subtly wrong JSON fails loudly and specifically, with an error that tells you which of those happened, instead of quietly handing you a zero-valued struct that you end up debugging an hour later.
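The zero-valued struct problem is easy to reproduce, so here is a Go analogue of the two-stage idea. It is not rust-tool-base's code, and a simple required-key check stands in for a real JSON Schema validator; decodeStrict, classify and both error values are hypothetical. It shows plain decoding accepting a misspelt key without complaint, and the staged version naming which stage failed.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

type Analysis struct {
	Severity string `json:"severity"`
	Fix      string `json:"fix"`
}

var (
	errSchema = errors.New("response does not match the schema")
	errDecode = errors.New("response matched the schema but would not decode")
)

// decodeStrict checks the shape first, then decodes. The shape check here
// only looks for required keys; a real validator would do far more.
func decodeStrict(reply string) (Analysis, error) {
	var raw map[string]json.RawMessage
	if err := json.Unmarshal([]byte(reply), &raw); err != nil {
		return Analysis{}, fmt.Errorf("%w: %v", errSchema, err)
	}
	for _, key := range []string{"severity", "fix"} {
		if _, ok := raw[key]; !ok {
			return Analysis{}, fmt.Errorf("%w: missing %q", errSchema, key)
		}
	}
	var a Analysis
	if err := json.Unmarshal([]byte(reply), &a); err != nil {
		return Analysis{}, fmt.Errorf("%w: %v", errDecode, err)
	}
	return a, nil
}

// classify names the outcome so callers can branch on it.
func classify(reply string) string {
	_, err := decodeStrict(reply)
	switch {
	case errors.Is(err, errSchema):
		return "schema mismatch"
	case errors.Is(err, errDecode):
		return "decode failure"
	default:
		return "ok"
	}
}

func main() {
	// Plain Unmarshal accepts a misspelt key and leaves a zero-valued struct.
	var loose Analysis
	err := json.Unmarshal([]byte(`{"sevrity":"critical"}`), &loose)
	fmt.Printf("loose decode: %+v, err=%v\n", loose, err)

	for _, reply := range []string{
		`{"severity":"critical","fix":"rotate the key"}`,
		`{"sevrity":"critical"}`,
		`{"severity":5,"fix":"x"}`,
	} {
		fmt.Printf("%s -> %s\n", reply, classify(reply))
	}
}
```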

When you’d reach for it

The line is simple, and it’s about who reads the answer.

If a human reads the answer, prose is right. Chat, free text, let the model write well. A summary, an explanation, an interactive reply: leave all of those as prose.

If a program consumes the answer, you want a value. Classification, extraction, a code review scored out of a hundred with a list of issues, a yes-or-no with reasons: anything where the next thing that happens is your code branching on the result. There, Ask and chat_structured turn the LLM from something you have to interpret into something that returns a value, and a typed value is a thing you can actually build on.

To sum up

An LLM returns prose by default, and prose has no contract, so a program that picks structured facts out of it breaks the moment the model rephrases.

Structured output asks for the value instead. You define a struct, the framework derives a JSON Schema from it, the model is constrained to that shape, and you get a typed result. go-tool-base’s Ask and rust-tool-base’s chat_structured both work this way, with the schema derived from your type so the schema and the type can’t drift; rust-tool-base additionally validates the response against the schema before deserialising. Use it whenever the answer feeds code rather than a human. It’s one of the four methods that make up go-tool-base’s small chat interface, and it’s the one that makes an LLM safe to program against.

Telemetry that asks first

Mon, 30 Mar 2026 00:00:00 +0000

Usage telemetry is genuinely useful. Knowing which commands people actually run, where the errors cluster, whether anyone ever touched the feature you spent a fortnight on… that’s the stuff that makes you a better maintainer. Wanting it is completely legitimate.

The trouble is that the usual way of getting it, on by default and quietly hoovering up everything, is a small betrayal of the people who installed your tool to get a job done. I wasn’t willing to build that, so go-tool-base’s telemetry starts from a different question.

The data you want, and the line you shouldn’t cross

If you maintain a tool, you want to know how it’s actually used. Which commands matter and which are dead weight. Where the error rate spikes. Whether anyone touched the feature you spent that fortnight on. That information makes you a better maintainer, and, to say it again, wanting it is completely legitimate.

The trouble is the standard way of getting it. Telemetry on by default. An opt-out buried three levels down in a settings file nobody reads. And once it’s running, it quietly collects far more than it ever admitted to: the arguments people passed, the paths they were working in, an IP address for good measure.

Every one of those is a small betrayal of someone who installed your tool to get a job done, not to become a data point. And the cost when users notice isn’t a slap on the wrist. It’s trust, and trust in a developer tool does not grow back quickly. A tool that surprises you once with what it was quietly collecting is a tool you uninstall and warn your colleagues about.

So go-tool-base’s telemetry started from a different question. Not “how do we collect the most data” but “how do we collect useful data without ever putting the user in a position they didn’t choose”.

Rule one: it is off until you say otherwise

The foundation is the simplest possible rule, and it’s absolute. Telemetry is never enabled by default. A freshly installed tool built on go-tool-base sends nothing. Not a heartbeat, not a ping, nothing at all.

It only starts collecting when the user makes an explicit, visible choice to let it. Three honest doors: they run telemetry enable, they say yes to a clear prompt during init, or they set TELEMETRY_ENABLED themselves. All three are deliberate acts. None of them is a pre-ticked box or a default they have to discover and then undo.

This is opt-in, and the distinction from a well-hidden opt-out is the entire point. Opt-out telemetry treats consent as something to be assumed and grudgingly reversed. Opt-in treats it as something that has to be given. Only one of those is actually consent.
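In code, "off until you say otherwise" comes down to which way the zero value points. The sketch below is my own reading of the rule, not the framework's implementation: telemetryEnabled is a hypothetical function, the saved choice stands for whatever telemetry enable or the init prompt recorded, and letting the environment variable take precedence is an assumption. Anything unset or unparseable resolves to off.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// telemetryEnabled resolves consent. saved is the user's recorded choice
// (nil means they were never asked); envValue is the content of
// TELEMETRY_ENABLED, or "" when it is unset.
func telemetryEnabled(saved *bool, envValue string) bool {
	if envValue != "" {
		on, err := strconv.ParseBool(envValue)
		// A value that does not parse is not consent.
		return err == nil && on
	}
	if saved != nil {
		return *saved
	}
	// No explicit choice anywhere: stay off.
	return false
}

func main() {
	fmt.Println("fresh install:", telemetryEnabled(nil, ""))

	yes := true
	fmt.Println("after an explicit yes:", telemetryEnabled(&yes, ""))

	fmt.Println("this environment:", telemetryEnabled(nil, os.Getenv("TELEMETRY_ENABLED")))
}
```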

Rule two: no personally identifiable information, full stop

Consent to “some telemetry” is not consent to “any telemetry”, so the second rule constrains what can ever be collected, even from a user who’s opted in.

No personally identifiable information. The framework does not record command arguments (they routinely contain paths, hostnames, the occasional secret someone’s pasted in). It does not record file contents. It does not record IP addresses.

It does need some notion of “distinct installations” for the numbers to mean anything, so it derives a machine ID from a handful of system signals and runs it through SHA-256. What leaves the machine is a hash. It tells you “this is the same install as last week” and tells you precisely nothing about whose install it is, and the hash can’t be walked backwards into the signals it came from.
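The mechanics are a few lines of standard library. Which system signals the framework actually uses isn't stated here, so the ones in this sketch (hostname, operating system, architecture) are placeholders, and machineID is a hypothetical name. It shows signals going in and only a SHA-256 digest coming out.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"runtime"
	"strings"
)

// machineID joins the signals with a separator that cannot appear in them
// and returns the hex-encoded SHA-256 digest. Only the digest is reported.
func machineID(signals ...string) string {
	sum := sha256.Sum256([]byte(strings.Join(signals, "\x00")))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Placeholder signals; the real set is the framework's business.
	host, err := os.Hostname()
	if err != nil {
		host = "unknown"
	}
	fmt.Println(machineID(host, runtime.GOOS, runtime.GOARCH))
}
```

The same inputs always give the same ID, which is what makes "distinct installations" countable without anyone's hostname leaving the machine.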

The events themselves are deliberately thin. Which command ran, roughly how long it took, whether it errored. The shape of usage, not a transcript of it.

Rule three: the author picks the destination

Even with consent given and PII excluded, there’s a third question: where does the data actually go? go-tool-base doesn’t answer that for you, because it can’t. A corporate internal tool, an open-source CLI and an air-gapped utility have completely different right answers.

So the backend is the tool author’s choice. The framework ships several (a noop backend, stdout, a file, plain HTTP, and OpenTelemetry over OTLP) and supports custom ones. The noop backend matters more than it looks: it lets a tool wire up the whole telemetry surface, commands and all, while sending data precisely nowhere. A perfectly reasonable, fully supported configuration.

Pluggable backends also mean the data never has to touch any infrastructure I run. It goes where the tool’s author decides, on their terms. The framework provides the plumbing and stays well out of the destination.
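A pluggable destination is just an interface with more than one implementation. The sketch below invents its own names (Event, Backend, noopBackend, writerBackend, memoryBackend, record) and is not the framework's API. It shows the two properties described above: the event is thin, and the calling code is identical whether the data goes to stdout, to a custom sink, or nowhere.

```go
package main

import (
	"encoding/json"
	"io"
	"os"
	"time"
)

// Event is deliberately thin: which command, roughly how long, did it fail.
type Event struct {
	Command    string `json:"command"`
	DurationMS int64  `json:"duration_ms"`
	Errored    bool   `json:"errored"`
}

// Backend is wherever events go. The tool's author picks the implementation.
type Backend interface {
	Send(Event) error
}

// noopBackend accepts every event and sends it nowhere.
type noopBackend struct{}

func (noopBackend) Send(Event) error { return nil }

// writerBackend writes one JSON line per event, to stdout or to a file.
type writerBackend struct{ w io.Writer }

func (b writerBackend) Send(e Event) error {
	return json.NewEncoder(b.w).Encode(e)
}

// memoryBackend is a custom backend that simply keeps events in a slice.
type memoryBackend struct{ events []Event }

func (m *memoryBackend) Send(e Event) error {
	m.events = append(m.events, e)
	return nil
}

// record is what command code would call. It never learns the destination.
func record(b Backend, command string, took time.Duration, failed bool) error {
	return b.Send(Event{
		Command:    command,
		DurationMS: took.Milliseconds(),
		Errored:    failed,
	})
}

func main() {
	// Wired up, sending precisely nowhere.
	if err := record(noopBackend{}, "init", 120*time.Millisecond, false); err != nil {
		panic(err)
	}
	// Same call, different destination.
	if err := record(writerBackend{w: os.Stdout}, "update", 2*time.Second, true); err != nil {
		panic(err)
	}
}
```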

And a way back out

One last thing, because it’s the part that makes the opt-in real rather than decorative. A user who opted in can opt straight back out, and the package includes a GDPR-aligned deletion path, so “stop, and remove what you have” is an actual supported request rather than a polite fiction.

Consent you can’t withdraw isn’t consent. It’s a one-way door with a friendly sign on it. The deletion path is what keeps the front door an actual door.

The bottom line

Telemetry is genuinely useful to a maintainer and genuinely dangerous to the trust of the people running the tool, and the usual implementation (on by default, opt-out buried, collecting everything) spends that trust recklessly. go-tool-base’s telemetry holds three lines: never enabled without an explicit user action, never collecting personally identifiable information even once enabled, and always sending data to a destination the tool’s author chose, up to and including nowhere. A real deletion path makes the opt-in something you can take back.

You can have your usage numbers. You just have to ask for them, the way you would for anything else that wasn’t yours to begin with.

Letting the AI call your Go functions

Sun, 29 Mar 2026 00:00:00 +0000

An AI that can only produce text can describe your system. An AI that can call your Go functions can actually operate it. That gap, between describing and doing, is the difference between a chatbot and something genuinely useful, and crossing it comes down to one fiddly mechanism: tool-calling, and the loop that drives it.

Talking about the system versus operating it

Wire an AI provider into a CLI command and you get something that can talk. Ask it a question, get a paragraph back. Useful, up to a point.

But notice the ceiling. An AI that can only generate text can describe things. It can tell you what it would do. What it can’t do is look at the actual current state of your system, or take a real action, because it has no hands. It’s reasoning in a vacuum about a world it can’t reach out and touch.

The thing that gives it hands is tool-calling. You hand the AI a set of functions it’s allowed to call. Now, mid-conversation, it can decide it needs to read that file before it can answer, or run that query, or check that status, and actually go and do it, and then reason about the real result. The AI stops describing your system and starts operating it.

The loop is the hard part

Tool-calling has a shape, and the shape is a loop. The literature calls it ReAct: Reason, Act, Observe.

The AI reasons about the prompt and decides whether it needs a tool.
If it does, it acts, asking for a specific tool with specific arguments.
Your code runs the tool and feeds the result back. The AI observes that result.
Round again. Reason about the new information, maybe call another tool, maybe several. Keep going until the AI has what it needs and produces a final text answer with no more tool calls.

Conceptually simple. Tedious and error-prone to implement by hand every single time: parsing the model’s tool-call requests, dispatching to the right function, marshalling arguments in and results out, feeding observations back in the exact format the provider expects, knowing when to stop, and not looping forever if the model gets itself stuck.

That orchestration is pure plumbing, and it’s identical for every tool and every command. So you can probably guess what’s coming: go-tool-base’s chat package owns it. You don’t write the loop. You write the tools.

Defining a tool

A chat.Tool is four things: a name, a description, a parameter schema, and a handler. The description is what the AI reads to decide whether to use the tool, so it’s worth writing well. The schema describes the arguments, and you don’t hand-write it. You write a tagged Go struct and let it generate:

type ReadFileParams struct {
 Path string `json:"path" jsonschema_description:"Relative path to the file"`
}

The struct is the contract. The framework derives the JSON Schema the AI is given straight from those tags, so the schema and the Go type the handler receives can’t drift apart, because they share a single source. The handler is then just an ordinary Go function that takes those parameters and returns a result.
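Putting the four parts side by side helps. The Tool type below is a hypothetical stand-in for chat.Tool, and the handler signature (raw JSON arguments in, a string out) is my assumption rather than the package's real one; readFileTool and invoke are likewise invented. The sketch shows the parameter struct doing double duty: it is what a schema would be derived from, and it is what the handler decodes its arguments into.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
)

type ReadFileParams struct {
	Path string `json:"path" jsonschema_description:"Relative path to the file"`
}

// Tool is a hypothetical stand-in for chat.Tool: name, description,
// parameters and handler.
type Tool struct {
	Name        string
	Description string
	Parameters  any // the tagged struct a schema would be derived from
	Handler     func(ctx context.Context, args json.RawMessage) (string, error)
}

// readFileTool builds the tool. A real handler would read the file; this
// one only echoes the path so the sketch has no filesystem dependency.
func readFileTool() Tool {
	return Tool{
		Name:        "read_file",
		Description: "Read a file from the project and return its contents.",
		Parameters:  ReadFileParams{},
		Handler: func(ctx context.Context, args json.RawMessage) (string, error) {
			var p ReadFileParams
			if err := json.Unmarshal(args, &p); err != nil {
				return "", fmt.Errorf("decode arguments: %w", err)
			}
			return "contents of " + p.Path, nil
		},
	}
}

// invoke plays the framework's part: it passes the model's JSON arguments
// to the handler.
func invoke(t Tool, argsJSON string) (string, error) {
	return t.Handler(context.Background(), json.RawMessage(argsJSON))
}

func main() {
	out, err := invoke(readFileTool(), `{"path":"README.md"}`)
	fmt.Println(out, err)
}
```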

You register your tools with SetTools, call Chat, and that’s the whole of your involvement. The framework runs the ReAct loop and Chat returns the AI’s final text answer once the loop settles.

Two details that show it was built for real use

A couple of decisions in the loop tell you it’s meant for production, not a demo.

Tool errors don’t abort the conversation. When a handler returns an error, the framework doesn’t crash the loop. It hands the error back to the AI as a string, as just another observation. That’s deliberate, and it’s right. A real agent should be able to call a tool, watch it fail, and react: try different arguments, take a different route, or tell the user it couldn’t manage it. A loop that aborted on the first tool error would be far more brittle than the model driving it.

The loop is bounded. There’s a MaxSteps limit, default 20. An AI that gets confused could otherwise call tools forever, and a CLI command that never returns is a worse failure than a wrong answer. The cap guarantees the command terminates. The agent gets room to genuinely work a problem across many steps, but not infinite room to flail about in.
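Both decisions fit in one small loop. This sketch replaces the model with a plain function and is not the chat package's implementation; step, model, tool, run and the two demo functions are hypothetical. It shows a tool error being appended as an observation instead of aborting, and the step cap ending a conversation that would otherwise never finish.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// step is what the fake model returns: a tool to call, or a final answer.
type step struct {
	tool   string // empty means "this is the final answer"
	answer string
}

type model func(observations []string) step
type tool func() (string, error)

var errMaxSteps = errors.New("max steps reached without a final answer")

// run drives the loop. Tool failures become observations; the cap
// guarantees termination.
func run(m model, tools map[string]tool, maxSteps int) (string, error) {
	var observations []string
	for i := 0; i < maxSteps; i++ {
		s := m(observations)
		if s.tool == "" {
			return s.answer, nil
		}
		fn, ok := tools[s.tool]
		if !ok {
			observations = append(observations, "error: unknown tool "+s.tool)
			continue
		}
		out, err := fn()
		if err != nil {
			// Hand the failure back as a string and carry on.
			out = "error: " + err.Error()
		}
		observations = append(observations, out)
	}
	return "", errMaxSteps
}

// demoRecover: the tool fails, the model sees the error and answers anyway.
func demoRecover() (string, error) {
	m := func(obs []string) step {
		if len(obs) > 0 && strings.HasPrefix(obs[len(obs)-1], "error:") {
			return step{answer: "could not read the file"}
		}
		return step{tool: "read_file"}
	}
	tools := map[string]tool{
		"read_file": func() (string, error) { return "", errors.New("permission denied") },
	}
	return run(m, tools, 20)
}

// demoStuck: the model never stops asking, so the cap ends the loop.
func demoStuck(maxSteps int) error {
	m := func([]string) step { return step{tool: "read_file"} }
	tools := map[string]tool{
		"read_file": func() (string, error) { return "ok", nil },
	}
	_, err := run(m, tools, maxSteps)
	return err
}

func main() {
	fmt.Println(demoRecover())
	fmt.Println(demoStuck(20))
}
```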

There’s also parallel tool execution: when the model asks for several tools in a single step (three independent file reads, say) the framework runs them concurrently rather than one after another, because there’s no reason to make the AI sit and wait out a sequence of things that don’t depend on each other.
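Concurrent execution of independent calls is a standard fan-out. The following is a generic sketch with a sync.WaitGroup, not the framework's code; call, runAll and demoReads are hypothetical names. Each goroutine writes to its own slot, so results come back in the order the model asked for them regardless of which finished first.

```go
package main

import (
	"fmt"
	"sync"
)

// call is one tool request from a single model step.
type call struct {
	name string
	run  func() string
}

// runAll executes every call concurrently and returns the results in
// request order. Each goroutine writes only its own index, so no lock
// is needed.
func runAll(calls []call) []string {
	results := make([]string, len(calls))
	var wg sync.WaitGroup
	for i, c := range calls {
		wg.Add(1)
		go func(i int, c call) {
			defer wg.Done()
			results[i] = c.run()
		}(i, c)
	}
	wg.Wait()
	return results
}

// demoReads fakes three independent file reads requested in one step.
func demoReads() []string {
	var calls []call
	for _, path := range []string{"a.go", "b.go", "c.go"} {
		path := path // capture a per-iteration copy
		calls = append(calls, call{
			name: "read_file",
			run:  func() string { return "contents of " + path },
		})
	}
	return runAll(calls)
}

func main() {
	for _, r := range demoReads() {
		fmt.Println(r)
	}
}
```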

Boiling it down

A text-only AI can describe your system; an AI that can call your functions can operate it. Bridging that gap means tool-calling, and tool-calling means the ReAct loop (reason, act, observe, repeat) whose orchestration is fiddly, identical every time, and not a problem worth solving twice.

go-tool-base’s chat package runs the loop for you. You define chat.Tool values (name, description, a tagged parameter struct that generates its own schema, a handler), call SetTools and Chat, and get the final answer. Tool errors go back to the AI as observations so it can recover, and a MaxSteps cap guarantees the command always terminates. You write Go functions. The framework turns them into things an agent can reach for.

Nobody reads the manual

Sun, 29 Mar 2026 00:00:00 +0000

Let me describe the actual lifecycle of a user meeting your CLI tool, because it’s a bit humbling. They run it. It doesn’t quite do what they expected. They run it again with --help. They get a wall of monospaced flag descriptions, skim it, don’t find the thing they wanted, and either give up or go and ask a human who already knows.

Your documentation might be magnificent. It doesn’t matter, because the user never reached it.

The manual loses on location, not quality

That’s the lifecycle, and notice exactly where it breaks. The documentation might be excellent. It might answer their precise question in full. It doesn’t matter, because it’s on a website, in another window, behind a search box, and the user is here, in the terminal, mid-task. The docs lost not on quality but on location. They simply weren’t where the work was.

go-tool-base’s answer starts with a decision about location: the documentation gets embedded into the binary itself. Your docs/ folder ships inside the tool, the same way its default config does. Wherever the tool is installed, the docs are right there alongside it, no network, no browser. That embedding is what makes everything else possible, and there are two things built on top of it.

A browser, in the terminal

The first is the docs command, and it’s not --help with extra steps. It launches a proper Terminal User Interface, built on Bubble Tea.

It has a sidebar, structured from the project’s own zensical.toml or mkdocs.yml, so the docs are a navigable tree rather than one flat scroll. Markdown renders with real formatting through Glamour (colour, tables, lists, headings) instead of collapsing into monospaced soup. There’s live search across every page, regex included.

Compared with man and --help, the difference isn’t a nicer coat of paint. man gives you linear scrolling and grep; this gives you a structured tree, rich rendering and real search. It’s the documentation experience a modern developer expects, except it followed the tool into the terminal instead of demanding the user leave it.

A documentation assistant that won’t make things up

The second thing built on the embedded docs is the one I find genuinely transformative: docs ask.

The user doesn’t navigate anything. They just ask:

mytool docs ask "how do I point this at a self-hosted server?"

and get a direct, specific answer. Under the hood, the framework collates the tool’s embedded markdown and hands it to the configured AI provider (Claude, OpenAI, Gemini, Claude Local, any OpenAI-compatible endpoint) as the context for the question.
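The collation step can be sketched against Go's fs.FS interface, which is what an embedded filesystem satisfies. This is an illustration, not the framework's code: collate, buildPrompt and the wording of the instruction are hypothetical, and fstest.MapFS stands in for the embedded docs only so the example runs without files on disk.

```go
package main

import (
	"fmt"
	"io/fs"
	"strings"
	"testing/fstest"
)

// collate walks the docs filesystem and concatenates every markdown file,
// each under a heading naming its path. Other files are ignored.
func collate(docs fs.FS) (string, error) {
	var b strings.Builder
	err := fs.WalkDir(docs, ".", func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() || !strings.HasSuffix(path, ".md") {
			return nil
		}
		data, err := fs.ReadFile(docs, path)
		if err != nil {
			return err
		}
		fmt.Fprintf(&b, "## %s\n%s\n\n", path, data)
		return nil
	})
	return b.String(), err
}

// buildPrompt puts the closed corpus and the question together.
func buildPrompt(docs fs.FS, question string) (string, error) {
	corpus, err := collate(docs)
	if err != nil {
		return "", err
	}
	instruction := "Answer only from the documentation below. " +
		"If the answer is not there, say so.\n\n"
	return instruction + corpus + "Question: " + question, nil
}

// demoDocs is an in-memory stand-in for the embedded docs folder.
func demoDocs() fs.FS {
	return fstest.MapFS{
		"docs/config.md": {Data: []byte("Set server.url to your host.")},
		"docs/logo.png":  {Data: []byte{0x89, 0x50}},
	}
}

func main() {
	prompt, err := buildPrompt(demoDocs(), "how do I point this at a self-hosted server?")
	if err != nil {
		panic(err)
	}
	fmt.Println(prompt)
}
```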

Now, “an AI answers questions about my tool” should immediately make you nervous, and the correct thing to be nervous about is hallucination. An AI that confidently invents a flag that doesn’t exist, or describes behaviour the tool simply doesn’t have, is worse than no assistant at all, because the user trusts it.

This is where embedding the docs pays off a second time, and it’s why I keep stressing that the corpus is closed. The model is instructed to answer only from the tool’s actual documentation, and the context it’s handed is exactly that documentation and nothing else. It isn’t drawing on a vague memory of similar tools from its training data. It’s answering from this tool’s real, shipped, version-matched docs. The corpus is small, closed and authoritative, which is the combination that keeps the answers honest. “Zero hallucination by design” isn’t a slogan about the model. It’s a property of bounding what the model is allowed to look at, which is the same instinct I leaned on with the mcp command: the safety comes from the boundary you drew, not from trusting the AI to behave itself.

There’s a nice second-order effect, too. The answer is always about the version of the tool the user actually has, because the docs were embedded into that build. No mismatch between a website documenting the latest release and the slightly older binary sitting on the user’s machine.

The upshot

Documentation usually loses to --help not on quality but on location: it’s in a browser, and the user is in the terminal. go-tool-base embeds the docs into the binary and surfaces them two ways: a docs command that’s a real TUI browser with a sidebar, rich markdown and search, and docs ask, which answers natural-language questions using the embedded docs as context.

Because that context is the tool’s own closed, shipped documentation and the model is told to use nothing else, the assistant stays grounded, and it’s always describing the exact version the user is holding. The fix for unread documentation was never to write more of it. It was to put it where the work happens and let it answer back.

BDD where it earns its place, and nowhere else

Sat, 28 Mar 2026 00:00:00 +0000

I have a slightly complicated relationship with BDD. I’ve watched it turn a tangled test suite into something the whole team could read and reason about, and I’ve watched it turn a perfectly good unit test into a paragraph of ceremonial English that nobody benefits from. So when go-tool-base brought in Cucumber-style BDD, the interesting decision wasn’t adopting it. It was being ruthless about where not to.

Two tests that hurt for different reasons

Most of go-tool-base’s tests are ordinary table-driven Go tests, and they’re absolutely fine. A function, a slice of input/expected pairs, a loop. Nobody needs Gherkin to understand a parser test.

But two areas were genuinely painful, and they were painful in the same way: the test had become harder to understand than the thing it was testing.

The first was pkg/controls, the service-lifecycle package. It runs a small state machine (Unknown, Running, Stopping, Stopped) with signal handling, health monitoring, restart policies and graceful shutdown all woven through it. The integration tests for graceful shutdown had grown to over three hundred lines of imperative goroutine and channel coordination. They worked. But reviewing them was a slog, and a test you can’t review with confidence is a test you can’t trust when it fails. The behaviour being checked, “when a shutdown signal arrives mid-startup, the controller stops cleanly”, was a simple sentence buried under a heap of synchronisation scaffolding.

The second was the CLI itself. init, update, doctor are user workflows. “Given a config file with a custom value, when I run init, then the custom value survives the merge.” That’s already a Given/When/Then; it just happened to be written out as Go.

Godog, and the line I drew

Godog is the official Go implementation of Cucumber. You write .feature files in plain Gherkin and bind each step to a Go function. The shutdown scenario stops being three hundred lines of channels and becomes this:

Scenario: graceful shutdown completes within the deadline
 Given a controller with two registered services
 When a shutdown signal is received
 Then both services stop in registration order
 And the controller reports a clean shutdown

The goroutine choreography doesn’t vanish, of course. It moves into the step definitions, written once and reused. What changes is that the scenario is now readable by someone who’s never opened the file before, including someone from an ops team who’ll never write a line of Go but absolutely has opinions about how shutdown should behave.

Here’s the part I want to dwell on, because it’s the part most BDD adoptions get wrong. The first design decision written down for this work was: strategic, not universal. Use Godog only where BDD adds clarity. Keep table-driven Go tests as the baseline everywhere else.

That sounds obvious written down. It is not obvious in practice, because BDD has a gravitational pull. Once a team has feature files, there’s a powerful urge to express everything as feature files, for consistency. And that’s how you end up with Gherkin scenarios for a pure function (Given the number 2, When I double it, Then I get 4) which is pure ceremony. You’ve wrapped a one-line table test in a paragraph of English and a step-definition indirection, and made it actively worse.

The honest test for whether BDD belongs is this: is this test a narrative, or is it a matrix?

A matrix is the same logic with many input/output pairs. That’s a table-driven test, that’s most unit tests, and Gherkin actively harms them. A narrative is a sequence of steps where the ordering and the state between steps is the thing under test, and that’s where Gherkin pays for itself. Lifecycle transitions are narratives. A user running three commands in sequence is a narrative. Doubling a number is not.
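For contrast, here is roughly what the matrix side of that line looks like. This is a minimal sketch rather than anything lifted from go-tool-base: Double and the case names are made up for illustration, and the loop is written as a plain function so it runs on its own. In a real _test.go file it would sit inside a TestDouble(t *testing.T) with a t.Run per row.

```go
package main

import "fmt"

// Double is the pure function under test (hypothetical).
func Double(n int) int { return n * 2 }

// doubleCases is the matrix: one row per input/expected pair.
var doubleCases = []struct {
	name string
	in   int
	want int
}{
	{"zero", 0, 0},
	{"positive", 2, 4},
	{"negative", -3, -6},
}

// failures runs every row through Double and describes each mismatch.
func failures() []string {
	var out []string
	for _, tc := range doubleCases {
		if got := Double(tc.in); got != tc.want {
			out = append(out, fmt.Sprintf("%s: Double(%d) = %d, want %d", tc.name, tc.in, got, tc.want))
		}
	}
	return out
}

func main() {
	fmt.Println("failures:", len(failures()))
}
```

Adding a case is one more row. There is no ordering and no state carried between rows, which is exactly why Gherkin has nothing to add here.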

go-tool-base drew that line and stuck to it. Feature files live in features/ at the project root, where a non-Go developer can find and read them. Step definitions live in test/e2e/, kept well away from the unit tests. And the unit tests stayed exactly what they were, because they were already the right tool.

Made to fit, not bolted on

A couple of smaller decisions kept the BDD layer from feeling like a foreign object.

It runs under go test. There’s no separate Cucumber runner to install or remember. A godog.TestSuite is invoked from an ordinary TestFeatures(t *testing.T), so the BDD scenarios run in the same go test ./... as everything else. CI didn’t need a new concept bolted onto it.

And the CLI end-to-end tests build the gtb binary once and reuse it across every scenario. Compiling a binary per scenario would make the suite slow enough that people would quietly start skipping it, and a test suite people skip is just decoration. Build once, test many.

Stepping back

go-tool-base brought in Godog for BDD, but the decision worth writing about is the restraint. BDD was applied to exactly two things: the service-lifecycle state machine, where a 300-line goroutine tangle became a four-line scenario anyone can review, and CLI workflows, which are Given/When/Then by their very nature. Everywhere else, table-driven Go tests remained the baseline, because wrapping a matrix test in Gherkin makes it worse, not better.

The useful rule: BDD fits a narrative, ordered steps with meaningful state in between, and fights a matrix. Adopt it as a scalpel for the narratives. Resist the pull to turn it into a religion.

An AI interface that fits on one screen

Fri, 27 Mar 2026 00:00:00 +0000

The moment you decide a CLI tool should talk to an LLM, there’s a strong gravitational pull towards reaching for LangChain, or one of its many relatives. It’s the obvious move. It’s also, for most CLI work, a bit like hiring a removals firm to carry a single box up the stairs.

Let me explain why go-tool-base went the other way, and what “the other way” actually looks like.

The instinct, and why it overshoots

When you add AI to a tool, the instinct is to reach for the big general-purpose framework. LangChain and its relatives are capable, and they exist for a real need: orchestrating complex multi-step AI applications, with retrieval pipelines, memory stores, chains of calls, whole fleets of agents.

Now look at what a CLI tool actually needs from an LLM. It needs to send a prompt and get text back. Sometimes it wants structured data back instead of prose. Sometimes it wants to let the model call a few of the tool’s own functions. That’s pretty much the whole list.

Pulling in a framework built to orchestrate retrieval and agent swarms in order to do that is a poor trade. You take on a large new vocabulary of concepts, a wide dependency surface, and a great deal of abstraction you’ll never touch, all to perform three or four operations. The framework isn’t wrong. It’s just answering a far bigger question than the one a CLI tool is asking.

What go-tool-base chose instead

go-tool-base didn’t reach for a framework. The decision is on the record in its own design notes: before a single line was written, LangChain Go, go-openai, Vercel’s AI SDK and around ten other options were evaluated, and not one of them matched what a CLI framework actually needs. So the chat package was built deliberately small.

How small? The entire core ChatClient interface is four methods:

type ChatClient interface {
    Add(prompt string) error
    Chat(ctx context.Context, prompt string) (string, error)
    Ask(question string, target any) error
    SetTools(tools []Tool) error
}

Add appends a message to the conversation. Chat sends a prompt and returns text. Ask sends a prompt and returns a typed Go struct, the model’s answer unmarshalled straight into a value you defined. SetTools hands the model a set of your own functions it’s allowed to call. That’s the whole surface. Downstream code that uses AI never holds anything larger than this, and never has to know which provider is behind it.

The package’s own documentation has a word for this: right-sized. Large enough to solve genuine provider-abstraction complexity, small enough that the full interface fits on a single screen.

“Thin” is not the same as “does little”

This is the part worth being precise about, because “four methods” can sound like “barely does anything”, and that’s the wrong read entirely.

Behind those four methods sits genuinely awkward work. Five providers (OpenAI, Claude, Gemini, a locally installed claude binary, and any OpenAI-compatible endpoint) each with a different wire API, all normalised behind the one interface. A tool-calling loop. Structured output via JSON Schema, made to behave consistently across providers that each express it differently. Error normalisation. Token chunking.

The point of a thin abstraction is not that there’s little underneath it. It’s that the interface stays small while the implementation quietly absorbs the complexity. Four methods on the surface; five provider integrations and a tool-calling loop below the waterline. The thinness is a property of what the caller sees, not of what the package does. A reach-for-LangChain decision gets that backwards: it exposes the caller to all the machinery, whether or not the caller will ever need it.

The core stays small even as features grow

There’s a neat detail in how chat keeps the interface from creeping. The package also supports streaming responses and conversation persistence, both of which are real features with real surface area. Neither of them is in the four-method core.

Instead they’re separate, optional interfaces. A streaming-capable client also satisfies StreamingChatClient; a persistable one also satisfies PersistentChatClient. Code that wants those capabilities does a type assertion to ask for them, and code that doesn’t simply never sees them. So the common path stays four methods forever. New capabilities arrive as opt-in interfaces alongside the core, not as new methods bolted onto it. The thing that fits on one screen keeps fitting on one screen.
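To make the type-assertion idea concrete, here is a small sketch. The ChatClient methods are the four quoted above and the StreamingChatClient name is the package’s, but the Stream method and its signature are my assumption for illustration, as are the two toy clients.

```go
package main

import (
	"context"
	"fmt"
)

// Tool is a placeholder for the package's tool type.
type Tool struct{ Name string }

// ChatClient is the four-method core.
type ChatClient interface {
	Add(prompt string) error
	Chat(ctx context.Context, prompt string) (string, error)
	Ask(question string, target any) error
	SetTools(tools []Tool) error
}

// StreamingChatClient is the optional capability. The Stream method is an
// assumed shape for this sketch, not the package's documented signature.
type StreamingChatClient interface {
	ChatClient
	Stream(ctx context.Context, prompt string, onChunk func(string)) error
}

// basicClient implements only the core.
type basicClient struct{}

func (basicClient) Add(string) error { return nil }
func (basicClient) Chat(_ context.Context, p string) (string, error) {
	return "echo: " + p, nil
}
func (basicClient) Ask(string, any) error { return nil }
func (basicClient) SetTools([]Tool) error { return nil }

// streamingClient adds the optional capability on top of the core.
type streamingClient struct{ basicClient }

func (streamingClient) Stream(_ context.Context, p string, onChunk func(string)) error {
	onChunk("echo: ")
	onChunk(p)
	return nil
}

// Reply streams when the client supports it and falls back to Chat otherwise.
// The bool reports which path was taken.
func Reply(ctx context.Context, c ChatClient, prompt string) (string, bool, error) {
	if sc, ok := c.(StreamingChatClient); ok {
		var out string
		err := sc.Stream(ctx, prompt, func(chunk string) { out += chunk })
		return out, true, err
	}
	out, err := c.Chat(ctx, prompt)
	return out, false, err
}

func main() {
	for _, c := range []ChatClient{basicClient{}, streamingClient{}} {
		text, streamed, _ := Reply(context.Background(), c, "hi")
		fmt.Println(text, streamed)
	}
}
```

Code that never asks for streaming only ever holds a ChatClient, so the optional interface costs it nothing.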

Extensible without forking, testable without a network

Two more properties keep the package small without making it limiting.

It’s extensible. The provider list isn’t closed. A RegisterProvider call lets any package contribute a new provider, and chat.New will route to it. You add a backend without forking pkg/chat or sending a patch upstream.
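The mechanics behind that are nothing more exotic than a registry of factories. The sketch below shows the general pattern only: the Factory type, the RegisterProvider and New signatures, and the cut-down Client interface are all assumptions of mine, not the real pkg/chat API.

```go
package main

import (
	"fmt"
	"sync"
)

// Client stands in for ChatClient to keep the sketch short.
type Client interface{ Name() string }

// Factory builds a client for one provider (hypothetical signature).
type Factory func(apiKey string) (Client, error)

var (
	mu        sync.RWMutex
	providers = map[string]Factory{}
)

// RegisterProvider makes a provider available under name (hypothetical
// signature). Any package can call it, typically from an init function.
func RegisterProvider(name string, f Factory) {
	mu.Lock()
	defer mu.Unlock()
	providers[name] = f
}

// New routes to whichever factory was registered under name.
func New(name, apiKey string) (Client, error) {
	mu.RLock()
	f, ok := providers[name]
	mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("unknown provider %q", name)
	}
	return f(apiKey)
}

// localClient is a toy provider contributed from "outside" the package.
type localClient struct{}

func (localClient) Name() string { return "local" }

func main() {
	RegisterProvider("local", func(string) (Client, error) { return localClient{}, nil })

	c, err := New("local", "")
	fmt.Println(c.Name(), err)

	_, err = New("missing", "")
	fmt.Println(err)
}
```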

And it’s testable. The package ships generated mocks. A downstream tool’s AI features can be tested against a mock ChatClient returning canned responses, with no network, no API key, and no flakiness. Because the interface is four methods, that mock is trivial to set up and complete by construction. A sprawling framework interface is a sprawling thing to fake; a four-method one is not. (I’ll come back to testing AI code properly in a later post, because it deserves a whole article of its own.)
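Here is how small such a double can be. This is a hand-written fake, standing in for the generated mocks the package ships; the Verdict struct and the ReviewCommand function are hypothetical downstream code invented for the example.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
)

// Tool is a placeholder for the package's tool type.
type Tool struct{ Name string }

// ChatClient is the four-method core.
type ChatClient interface {
	Add(prompt string) error
	Chat(ctx context.Context, prompt string) (string, error)
	Ask(question string, target any) error
	SetTools(tools []Tool) error
}

// fakeClient returns canned answers and records every prompt it sees.
type fakeClient struct {
	reply   string   // canned text returned by Chat
	askJSON string   // canned JSON that Ask unmarshals into target
	prompts []string // prompts received, in order
}

func (f *fakeClient) Add(p string) error { f.prompts = append(f.prompts, p); return nil }
func (f *fakeClient) Chat(_ context.Context, p string) (string, error) {
	f.prompts = append(f.prompts, p)
	return f.reply, nil
}
func (f *fakeClient) Ask(q string, target any) error {
	f.prompts = append(f.prompts, q)
	return json.Unmarshal([]byte(f.askJSON), target)
}
func (f *fakeClient) SetTools([]Tool) error { return nil }

// Verdict is the typed answer the hypothetical feature wants back.
type Verdict struct {
	Safe   bool   `json:"safe"`
	Reason string `json:"reason"`
}

// ReviewCommand is the hypothetical code under test. It only knows ChatClient.
func ReviewCommand(c ChatClient, command string) (Verdict, error) {
	var v Verdict
	err := c.Ask("Is this command safe to run? "+command, &v)
	return v, err
}

func main() {
	fake := &fakeClient{askJSON: `{"safe":false,"reason":"deletes files"}`}
	v, err := ReviewCommand(fake, "rm -rf build")
	fmt.Println(v.Safe, v.Reason, err, len(fake.prompts))
}
```

No network, no key, and the whole double is four one-liners.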

The right size

When a CLI tool needs AI, the instinct is a large framework like LangChain. For orchestrating retrieval pipelines and agent swarms, that’s exactly the right tool. For sending a prompt, getting a struct back, and letting the model call a few functions, it’s enormous overkill.

go-tool-base’s chat package is the deliberate alternative, chosen only after LangChain Go and a dozen others were weighed up and rejected. Its core ChatClient interface is four methods. Underneath sit five normalised providers, a tool-calling loop, structured output and error handling, but the caller sees four methods and never learns which provider is active. Streaming and persistence are opt-in interfaces beside the core, not additions to it. It extends without forking and tests without a network. Right-sized: the complexity is real, but it lives under the interface rather than in it.

Half your users don't have eyes

Wed, 25 Mar 2026 00:00:00 +0000

Run a command in your favourite CLI tool and look at what comes back. Colour. Neatly aligned columns. A friendly little summary sentence. Lovely… if you happen to be a human with eyes.

But a good half of any tool’s users aren’t people at all. They’re scripts, CI pipelines, bits of automation. And that pretty output you’re so proud of is, to them, actively hostile.

Your tool has two audiences and only serves one

I made more or less this same point about AI assistants when I argued that your CLI is already an AI tool. The machines are users too. Here it isn’t an AI doing the calling, it’s a humble shell script, but the principle is identical.

Run a CLI command and look at what comes back. Colour. Aligned columns. A friendly summary sentence. It’s designed for a person reading a terminal, and for a person reading a terminal it’s great.

Now picture the other half of your users. A deploy script that needs to know which version is installed. A CI job that runs doctor and wants to fail the build on one specific check. A bit of automation gluing your tool to three others. None of them have eyes. They have parsers.

So what do they do with your beautiful human output? They butcher it. They grep for a keyword, awk out the third field, sed off a prefix. It works in the demo. Then someone rewords a status line, or adds a column, or the colour codes shift, and every script downstream breaks at once. Silently, too, because a broken grep returns nothing rather than an error. You changed a sentence and quietly took out somebody’s pipeline without ever knowing.

The human-readable output was never the contract. It just got used as one, because it was the only output there was.

Give the machines their own channel

The fix is not to make the human output more parseable. That’s a trap. You’d be constraining prose meant for people in order to satisfy programs, and end up serving neither of them well. The fix is to give programs their own output format, declared and stable, kept well away from the prose.

So every command built with go-tool-base gets a --output flag. Leave it alone and you get the friendly human rendering. Pass --output json and you get something a parser can actually rely on.

And not just some JSON. JSON with a fixed shape.

One envelope, every command

The temptation with JSON output is to let each command emit whatever structure happens to suit it. Don’t. A consumer scripting against five of your commands then has to learn five shapes, and “where’s the actual payload?” has a different answer every single time.

go-tool-base wraps every command’s JSON in one standard Response envelope:

{
  "status": "success",
  "command": "deploy",
  "data": {
    "environment": "production",
    "version": "1.4.0",
    "replicas": 3
  }
}

status says how it went. command says what produced it. data holds the command-specific payload, and only the payload. Every built-in command (version, doctor, update, init) emits exactly this shape. So does every command you write, because pkg/output hands you the envelope rather than letting you freelance:

format, err := cmd.Flags().GetString("output")
if err != nil {
    return err
}
w := output.NewWriter(os.Stdout, output.Format(format))

// Same envelope for every command; only Data changes.
return w.Write(output.Response{
    Status:  output.StatusSuccess,
    Command: "deploy",
    Data:    result,
})

The consumer-side payoff is the whole point. A script can check .status without ever touching .data. It can pull .data.version and know the field is there because it’s typed, not scraped. It learns the envelope once, and every command in your tool, and every tool built on the framework, honours it. The contract is explicit, versioned, and the same everywhere, which is precisely what the abused human output never was.
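From the consumer’s side, that looks something like the sketch below. The Envelope struct simply mirrors the three fields shown above; DeployedVersion is a hypothetical consumer, and treating anything other than "success" as a failure is my assumption, since the other status values aren’t listed here.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Envelope mirrors the three fields of the Response shape. Data stays raw so
// status can be checked without knowing the command-specific payload.
type Envelope struct {
	Status  string          `json:"status"`
	Command string          `json:"command"`
	Data    json.RawMessage `json:"data"`
}

// DeployedVersion checks status first and only then decodes data.version.
func DeployedVersion(raw []byte) (string, error) {
	var env Envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return "", fmt.Errorf("decode envelope: %w", err)
	}
	if env.Status != "success" {
		return "", fmt.Errorf("%s reported status %q", env.Command, env.Status)
	}

	var payload struct {
		Version string `json:"version"`
	}
	if err := json.Unmarshal(env.Data, &payload); err != nil {
		return "", fmt.Errorf("decode data: %w", err)
	}
	if payload.Version == "" {
		return "", errors.New("data.version missing")
	}
	return payload.Version, nil
}

func main() {
	out := []byte(`{"status":"success","command":"deploy","data":{"environment":"production","version":"1.4.0","replicas":3}}`)
	v, err := DeployedVersion(out)
	fmt.Println(v, err)
}
```

Nothing in there greps, and nothing in there cares how the human rendering is worded.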

The human output gets to relax

There’s a quiet second benefit, and it’s my favourite kind: the sort you get for free. Once programs have their own reliable channel, the human output is freed. It no longer has to stay accidentally parseable. You can reword a status line, add colour, restructure a table, make it genuinely nicer to read, and not break a single script, because no script is reading it any more. They’re all over on --output json, where the real contract lives.

Two audiences, two formats, each one actually suited to its reader. That’s the deal a CLI tool ought to be offering, and most of them don’t.

In short

A CLI tool that only emits human-readable output is only half-built, because half its users are programs that end up grep-ing prose and shattering the moment that prose changes. go-tool-base gives every command a --output json flag and one standard Response envelope (status, command, data) used identically by every built-in command and by anything you write through pkg/output. Machines get a stable, explicit, learn-it-once contract; humans get output that’s now free to be properly readable, because nothing fragile depends on its wording any more.

If your tool will ever be called by another program (and it will), give that program a front door. Don’t make it climb in through the window.

Lifecycle management for when your CLI grows up into a service

Tue, 24 Mar 2026 00:00:00 +0000

There’s a moment in the life of a lot of CLI tools where they stop being a CLI tool. Nobody quite decides it. It just happens. Someone needs the thing to also expose a little HTTP endpoint, or poll a queue, or run a scheduler, so it grows a serve command… and the honest command-line utility you wrote is suddenly a long-running service wearing a CLI as a hat.

And a service needs a whole pile of production plumbing that a one-shot command never did.

The command that stops being a command

go-tool-base is CLI-first. It is not CLI-only, and the reason is a pattern I’ve watched play out more times than I can count.

A tool starts its life as an honest command-line utility. It runs, it does its thing, it exits. Then someone needs it to expose a small HTTP endpoint. Or poll a queue. Or run a scheduler. So it grows a serve command, or a run command, and the moment it does, the thing that was a CLI tool is now a long-running service that happens to have a CLI bolted on the front.

And a long-running service needs a whole category of plumbing a one-shot command never did. It has to start things up in a sensible order. It has to shut them down gracefully when someone sends a SIGTERM, finishing in-flight work rather than dropping it on the floor. It has to tell an orchestrator whether it’s alive, and whether it’s ready. It has to do something sensible when one of its internal services quietly falls over at 3am.

Hand-rolled, that’s a few hundred lines of goroutine choreography, channel-wrangling and signal handling that every such tool reinvents, slightly differently and slightly wrong each time. It’s the first-afternoon problem all over again, just turning up later in the project’s life. So go-tool-base ships it: pkg/controls.

A controller and the things it controls

The model is small. A Controller manages any number of services, each of which satisfies a Controllable interface, which at heart is just a StartFunc and a StopFunc. An HTTP server, a background worker, a scheduler, anything with a “begin” and an “end”.

You register your services with the controller and it owns their collective lifecycle. They share a common set of channels (errors, OS signals, health, control messages) so the whole set can react together. A SIGTERM doesn’t get caught by one service off in a corner; it reaches the controller, and the controller takes everything down in order, each StopFunc handed a context with a deadline so that one sulking service can’t wedge the whole shutdown forever.

That ordering and timeout handling is the bit nobody enjoys writing and everybody needs. Centralising it means a tool that adds a second service later inherits correct coordinated shutdown for free, rather than discovering on its first production SIGTERM that it only half shuts down.
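To give a feel for what is being centralised, here is a stripped-down sketch of ordered, deadline-bounded shutdown. It is not the pkg/controls implementation. The service struct, stopOne and stopAll are illustrative names, and giving each service its own timeout (rather than one shared deadline) is an assumption.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// StopFunc shuts one service down and should return once ctx is done.
type StopFunc func(ctx context.Context) error

type service struct {
	name string
	stop StopFunc
}

// stopOne gives a single service a deadline and refuses to wait past it,
// even if the StopFunc ignores its context.
func stopOne(s service, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	done := make(chan error, 1) // buffered so a late StopFunc never blocks
	go func() { done <- s.stop(ctx) }()

	select {
	case err := <-done:
		return err
	case <-ctx.Done():
		return fmt.Errorf("%s: %w", s.name, ctx.Err())
	}
}

// stopAll stops services in registration order and reports which ones
// failed to stop cleanly within their deadline.
func stopAll(services []service, timeout time.Duration) (order, failed []string) {
	for _, s := range services {
		order = append(order, s.name)
		if err := stopOne(s, timeout); err != nil {
			failed = append(failed, s.name)
		}
	}
	return order, failed
}

const demoTimeout = 50 * time.Millisecond

// demoServices returns one well-behaved service and one that sulks.
func demoServices() []service {
	return []service{
		{"http", func(context.Context) error { return nil }},
		{"worker", func(ctx context.Context) error {
			<-ctx.Done() // never finishes on its own
			return ctx.Err()
		}},
	}
}

func main() {
	order, failed := stopAll(demoServices(), demoTimeout)
	fmt.Println(order, failed)
}
```

Even this toy version has a goroutine, a buffered channel and a select in it, which is rather the point: it’s fiddly enough that you want exactly one copy.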

Probes, because something is usually watching

If the service ends up in Kubernetes (and a lot of them do) the orchestrator wants to ask two different questions, and they really are different questions.

Liveness: are you alive, or are you wedged and in need of a kill? Readiness: are you alive and able to take traffic right now? A service can quite easily be live but not ready… still warming a cache, still waiting on a dependency. Conflate the two and you get yourself killed during a slow startup, or sent traffic before you can actually serve it.

controls keeps them separate. You attach a WithLiveness probe and a WithReadiness probe to a service, each just a function returning a health report, and the controller exposes them. The tool answers Kubernetes honestly, in Kubernetes’ own terms, without you hand-wiring two more HTTP handlers.
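The distinction is easy to see in a few lines. This sketch uses plain net/http rather than controls itself: the Probe type (a function returning an error) is a simplification of the health report mentioned above, and probeHandler and statusOf are names made up for the example.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Probe reports nil when healthy. A simplification for this sketch.
type Probe func() error

var errNotReady = errors.New("cache still warming")

// probeHandler turns a probe into an HTTP handler: 200 when healthy,
// 503 with the reason when not.
func probeHandler(p Probe) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		if err := p(); err != nil {
			http.Error(w, err.Error(), http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
}

// statusOf calls a handler in-process and returns the HTTP status code.
func statusOf(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Code
}

func main() {
	cacheWarm := false

	live := probeHandler(func() error { return nil })
	ready := probeHandler(func() error {
		if !cacheWarm {
			return errNotReady
		}
		return nil
	})

	fmt.Println(statusOf(live), statusOf(ready)) // live, but not ready yet
	cacheWarm = true
	fmt.Println(statusOf(live), statusOf(ready)) // live and ready
}
```

Two probes, two answers. The first line of output is the state an orchestrator must not mistake for “dead”.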

Self-healing, but only if you ask

The last piece is what happens when a service fails. A worker’s StartFunc returns an error. Health checks start failing. In a hand-rolled setup this is where you either crash the whole process or write yourself a bespoke restart loop.

controls has a supervisor that can restart a failed service for you, and the important word in that sentence is can. It’s off by default. A service is only supervised if you hand it a RestartPolicy at registration:

controls.WithRestartPolicy(controls.RestartPolicy{
    MaxRestarts:            5,
    InitialBackoff:         time.Second,
    MaxBackoff:             30 * time.Second,
    HealthFailureThreshold: 3,
})

With a policy in place, the controller restarts the service if its StartFunc errors out, or if it racks up more consecutive health-check failures than the threshold allows. Restarts back off exponentially, from InitialBackoff up to a MaxBackoff ceiling, so a service that’s failing because its database is down doesn’t sit there hammering that database flat with a tight restart loop. MaxRestarts caps the attempts, because a service that’s failed five times in a row is not going to be rescued by a sixth go, and at that point honest failure beats a thrashing pretence of health.
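The arithmetic is worth seeing once. The sketch below reuses three of the field names from the policy above, but the Backoff method is hypothetical, and doubling on every attempt is my assumption about what “exponentially” means here; the real supervisor may use a different factor or add jitter.

```go
package main

import (
	"fmt"
	"time"
)

// RestartPolicy repeats the fields this sketch needs.
type RestartPolicy struct {
	MaxRestarts    int
	InitialBackoff time.Duration
	MaxBackoff     time.Duration
}

// Backoff returns the delay before restart number attempt (1-based) and
// whether that restart is allowed at all.
func (p RestartPolicy) Backoff(attempt int) (time.Duration, bool) {
	if attempt < 1 || attempt > p.MaxRestarts {
		return 0, false // out of attempts: fail honestly
	}
	d := p.InitialBackoff
	for i := 1; i < attempt; i++ {
		d *= 2
		if d >= p.MaxBackoff {
			return p.MaxBackoff, true // hit the ceiling, stop doubling
		}
	}
	if d > p.MaxBackoff {
		d = p.MaxBackoff
	}
	return d, true
}

func main() {
	p := RestartPolicy{MaxRestarts: 5, InitialBackoff: time.Second, MaxBackoff: 30 * time.Second}
	for attempt := 1; attempt <= 6; attempt++ {
		d, ok := p.Backoff(attempt)
		fmt.Println(attempt, d, ok)
	}
}
```

With the policy shown above, five doubling restarts never actually reach the 30-second ceiling; it only starts to bite if you raise MaxRestarts.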

Opt-in matters here. Automatic restarts are exactly right for a resilient daemon and exactly wrong for a tool where a failure should stop the line and get a human’s attention. The framework doesn’t make that call for you. It gives you the supervisor and lets you point it at the services that genuinely want it.

The bottom line

A surprising number of CLI tools become long-running services the day they grow a serve command, and the day they do, they need coordinated startup, graceful ordered shutdown, real liveness and readiness probes, and a considered answer to a service falling over. That’s a few hundred lines of fiddly, easy-to-get-wrong plumbing.

pkg/controls provides it: a Controller over Controllable services with shared channels and deadline-bounded graceful shutdown, separate Kubernetes-style liveness and readiness probes, and an opt-in supervisor that restarts failed services with exponential backoff and a restart ceiling. Your tool can start as a command and grow into a daemon without that growth turning into a rewrite.

CLI-first, but not stuck there.

Middleware for CLI commands, not just web servers

Tue, 24 Mar 2026 00:00:00 +0000

Every CLI tool past a certain size grows a category of logic that doesn’t really belong to any one command, and yet has to happen for loads of them. Timing. An auth check. Panic recovery, so a crash becomes a clean error instead of a stack-trace all over someone’s terminal. A log line saying the command started and how it finished.

Web frameworks sorted this out years ago. CLIs, for some reason, mostly still copy-paste it around.

The logic that belongs to no single command

That category of logic doesn’t belong to any one command, yet needs to happen for many of them. Time how long the command took. Check the user is authenticated before a command that needs it. Recover from a panic so a crash becomes a clean error rather than a stack-trace vomited across the screen. Log that the command started and how it ended.

None of that is the command’s job. The deploy command’s job is to deploy. But timing and recovery and auth still have to happen around it, and around build, and around sync.

Put that logic inside each command’s RunE and you’ve copied the same six lines into thirty functions, which means thirty places to fix when the logging format changes and thirty chances to forget one of them. Cross-cutting concerns copied by hand don’t stay consistent. They drift, every time.

Web frameworks already solved this

This is not a new problem. It’s about the oldest problem in web frameworks, and they settled on an answer a long time ago: middleware. Gin has it, Echo has it, every HTTP stack you’ve ever touched has it. A middleware is a wrapper that sits around a handler, runs its cross-cutting logic, and calls through to the handler in the middle.

A CLI command is, structurally, just a handler too. So go-tool-base brings the same pattern to the Cobra command tree, with the same functional Chain shape:

type Middleware func(
    next func(cmd *cobra.Command, args []string) error,
) func(cmd *cobra.Command, args []string) error

A middleware receives the next handler in the chain and returns a new handler that wraps it. You compose a stack of them, and each command’s real RunE runs in the middle of the onion. Write the timing logic once, as one middleware, and every command in the chain is timed. Change the log format once and all thirty commands change with it, because there was only ever one copy. (The “write it once, in a place where everyone inherits it” drum again, which I will keep banging until the series runs out.)
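The composition itself is only a few lines. To keep this sketch dependency-free I’ve swapped the Cobra signature for a simplified Handler that takes only args. The Chain function and its ordering (first middleware listed is the outermost layer) are assumptions for illustration, not necessarily how go-tool-base’s own Chain is written.

```go
package main

import "fmt"

// Handler is a simplified stand-in for a Cobra RunE function.
type Handler func(args []string) error

// Middleware wraps one handler in another.
type Middleware func(next Handler) Handler

// Chain wraps h so that the first middleware listed is the outermost layer.
func Chain(h Handler, mws ...Middleware) Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// trace records when its layer is entered and left.
func trace(name string, log *[]string) Middleware {
	return func(next Handler) Handler {
		return func(args []string) error {
			*log = append(*log, name+" before")
			err := next(args)
			*log = append(*log, name+" after")
			return err
		}
	}
}

func main() {
	var log []string
	run := Chain(
		func(args []string) error { log = append(log, "deploy"); return nil },
		trace("timing", &log),
		trace("logging", &log),
	)
	_ = run(nil)
	fmt.Println(log)
}
```

The recorded order is the onion: each layer’s “before” on the way in, the command in the middle, each layer’s “after” on the way back out.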

“But Cobra already has PreRun”

It does, and this is the objection worth answering properly, because Cobra ships PersistentPreRun and PreRun hooks and they look, at a glance, like they cover this.

They don’t, and the reason is structural. A PreRun hook is a thing that happens before the command. That’s all it is. It can’t run anything after. It can’t wrap the command in a defer. It can’t catch a panic the command throws. It can’t measure how long the command took, because measuring a duration needs a start point and an end point, and the hook only owns the start.

A middleware wraps the entire execution. Because it’s a function that calls next() in its own body, it straddles the command:

// HandlerFunc names the handler signature used in Middleware above.
func TimingMiddleware(next HandlerFunc) HandlerFunc {
    return func(cmd *cobra.Command, args []string) error {
        start := time.Now()
        err := next(cmd, args) // the command runs here
        log.Debug("command finished", "took", time.Since(start))
        return err
    }
}

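Before, after, and around. A recovery middleware can defer a function that calls recover(), which a PreRun hook structurally cannot do, because the hook has already returned by the time the command runs. (A bare defer recover() wouldn’t work either, incidentally: recover only stops a panic when it’s called from inside the deferred function.) An auth middleware can check a condition and return an error instead of calling next() at all, refusing to let the command run in the first place. A plain PreRun can’t veto the command; it runs, and then the command runs regardless. The error-returning PreRunE variant can abort the command by returning an error, but that’s the only one of these jobs it can do, because it still finishes before the command starts.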
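Both behaviours fit in a short sketch. As before, Handler is a simplified stand-in for the Cobra signature so the example runs without dependencies, and Recover, RequireAuth and ErrNotAuthenticated are illustrative names rather than go-tool-base’s own.

```go
package main

import (
	"errors"
	"fmt"
)

// Handler is a simplified stand-in for a Cobra RunE function.
type Handler func(args []string) error

// Middleware wraps one handler in another.
type Middleware func(next Handler) Handler

// Recover turns a panic in any inner layer into an ordinary error.
// recover must be called from inside the deferred function, and the named
// result lets that function replace the return value.
func Recover(next Handler) Handler {
	return func(args []string) (err error) {
		defer func() {
			if r := recover(); r != nil {
				err = fmt.Errorf("command panicked: %v", r)
			}
		}()
		return next(args)
	}
}

var ErrNotAuthenticated = errors.New("not authenticated")

// RequireAuth vetoes the command: when the check fails, next is never called.
func RequireAuth(loggedIn func() bool) Middleware {
	return func(next Handler) Handler {
		return func(args []string) error {
			if !loggedIn() {
				return ErrNotAuthenticated
			}
			return next(args)
		}
	}
}

func main() {
	crash := func([]string) error { panic("nil map write") }
	fmt.Println(Recover(crash)(nil))

	ran := false
	guarded := RequireAuth(func() bool { return false })(func([]string) error {
		ran = true
		return nil
	})
	fmt.Println(guarded(nil), ran)
}
```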

PreRun is a notification that the command is about to happen. Middleware is control over whether and how it happens. For genuine cross-cutting concerns you need the second thing, not the first.

To sum up

Timing, auth, recovery and logging are cross-cutting concerns: necessary for many commands, owned by none. Hand-copied into every RunE, they drift out of sync. Web frameworks fixed this with middleware years ago, and a CLI command is structurally just another handler.

go-tool-base brings the functional Chain middleware pattern to the Cobra command tree. A middleware wraps a command’s whole execution, so it acts before and after and can decide whether the command runs at all… strictly more than Cobra’s PreRun hooks, which only fire beforehand and can’t wrap, recover or time the command (and can only veto it through the error-returning PreRunE variant). Write the concern once, wrap the chain, and every command inherits it consistently.

A logging interface that doesn't leak its backend

Mon, 23 Mar 2026 00:00:00 +0000

The same tool, in two different lives, wants two completely different kinds of log.

On my laptop I want logs I can actually read: colour, alignment, friendly timestamps. The very same tool running as a daemon in a container wants none of that. It wants structured JSON, one object a line, ready for a log aggregator to swallow. And in a test I want the logger to shut up entirely. The interesting question is what it costs you to move between the three.

The same tool wants different logs

On a developer’s machine the tool is a CLI. You want logs that are pleasant to read in a terminal: colour, alignment, human-friendly timestamps. The charmbracelet logger does that beautifully.

Then the very same tool grows a serve command and gets deployed as a daemon in a container. Now coloured terminal output is worse than useless. The log aggregator wants structured JSON, one object per line, machine-parseable. slog does that.

And in tests you want neither. You want the logger to exist, satisfy the interface, and stay completely silent.

That’s three different logging backends, wanted by one tool across three different lives. So what does switching between them actually cost?

What it costs depends on what your packages imported

If your packages import a concrete logger, if pkg/config and pkg/setup and twenty others each have import "github.com/charmbracelet/log" and take a *log.Logger, then the backend is welded into the entire codebase. Switching to JSON for the container build means editing the import and the parameter type in every single one of those packages. The backend has leaked. A detail that should have been one decision has become a property of a hundred files.

go-tool-base doesn’t let it leak. Every package in the framework accepts a logger.Logger, an interface, and nothing else. No package anywhere imports a concrete logging library. A package states, in its types, “I need something I can log through”, and stops right there. It has no idea, and no way to find out, what’s actually on the other end.

// what every package depends on
type Logger interface {
    Debug(msg string, args ...any)
    Info(msg string, args ...any)
    Warn(msg string, args ...any)
    Error(msg string, args ...any)
    // ...
}

The backend gets chosen once, at the top, when the tool builds its Props. It travels down to every package as the interface, through the Props container. The packages underneath never see the concrete type, so the concrete type can change without a single one of them noticing. (There’s that “decide it once, in one place” theme again. I did warn you it runs through everything.)

Three backends, and the swap is one line

go-tool-base ships three implementations of that interface:

charmbracelet (logger.NewCharm(w, opts...)). Coloured, styled, for humans at a terminal. The CLI default.
slog JSON, a slog-backed backend emitting structured JSON, for daemons and containers feeding a log aggregator.
noop, which does precisely nothing, for tests that want a real Logger and total silence.

Switching the tool from a friendly CLI logger to container-ready JSON is a change to the one line in main() that constructs the logger. That’s the lot. pkg/config doesn’t change. pkg/setup doesn’t change. None of the twenty packages change, because none of them ever knew which backend they had. The decision was always one line; the interface is what kept it one line.

The noop backend deserves its own mention, because it’s the one people underrate. A test for a command shouldn’t be spraying log output all over the test run, but the command still needs a non-nil Logger to function. logger.NewNoop() gives you exactly that: interface satisfied, output binned, test quiet. And because it’s just another implementation of the same interface, no test needs any special logging machinery. It passes a different backend, exactly the way the container build does.
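Here is the swap in miniature, using only the standard library. The Logger interface is cut down to the four methods shown earlier; Backend, JSON, Noop, loadConfig and render are names invented for this sketch and are not the framework’s constructors. Conveniently, *slog.Logger already has these four methods with matching signatures, so it satisfies the interface without an adapter.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log/slog"
)

// Logger is the only thing package code depends on.
type Logger interface {
	Debug(msg string, args ...any)
	Info(msg string, args ...any)
	Warn(msg string, args ...any)
	Error(msg string, args ...any)
}

// Backend builds a Logger that writes to w (hypothetical type).
type Backend func(w io.Writer) Logger

// JSON returns a slog-backed logger emitting one JSON object per line.
func JSON(w io.Writer) Logger { return slog.New(slog.NewJSONHandler(w, nil)) }

// noop satisfies Logger and discards everything.
type noop struct{}

func (noop) Debug(string, ...any) {}
func (noop) Info(string, ...any)  {}
func (noop) Warn(string, ...any)  {}
func (noop) Error(string, ...any) {}

// Noop ignores the writer and returns the silent logger.
func Noop(io.Writer) Logger { return noop{} }

// loadConfig is hypothetical package code. It never learns which backend it has.
func loadConfig(log Logger, path string) {
	log.Info("config loaded", "path", path)
}

// render runs loadConfig against a backend and returns what was written.
func render(b Backend) string {
	var buf bytes.Buffer
	loadConfig(b(&buf), "/etc/mytool.yaml")
	return buf.String()
}

func main() {
	fmt.Print(render(JSON))          // structured line for an aggregator
	fmt.Printf("%q\n", render(Noop)) // nothing written at all
}
```

loadConfig is identical in both calls. The only thing that changed is the argument handed in at the top.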

The general shape

There’s nothing exotic going on here. It’s “depend on interfaces, not implementations”, which every Go developer has had drilled into them at some point. The bit worth holding onto is where the rule actually pays out, and it’s at the seams between a stable core and a detail you know full well you’ll want to vary.

A logging backend is exactly such a detail. You will want it different in a terminal, in a container, and in a test. So the thing your code depends on has to be the interface, and the concrete backend has to be chosen at one well-known point and nowhere else. Get that boundary right and “we need JSON logs in production” is a one-line change. Get it wrong and it’s a refactor and a bad afternoon.

What it comes down to

One tool legitimately wants three different logging backends across its life: coloured output in a terminal, structured JSON in a container, silence in a test. The cost of moving between them is decided entirely by whether your packages imported a concrete logger or an interface.

go-tool-base’s packages depend only on logger.Logger, never a backend. Three implementations ship (charmbracelet, slog JSON, noop) and the backend is chosen once, in main(), then carried everywhere as the interface through Props. Switching is one line at the top, because the detail was never allowed to leak into the hundred files below it.

Errors that tell the user what to do next

Sun, 22 Mar 2026 00:00:00 +0000

Here’s an error message I’ve been on the receiving end of more times than I’d care to count:

error: failed to read config file

True. Also completely useless! I now know something is broken and I haven’t the faintest idea what to do about it. Which file? Why couldn’t it be read? Should I create it, run some init command, fix a permission, set an environment variable? The message states the problem and then abandons me at it, rather like a sat-nav cheerfully announcing “you have arrived” in the middle of a motorway.

A message is not a fix

The instinct, the moment you notice this, is to go and write a better message:

error: failed to read config file at ~/.config/mytool/config.yaml.
Run 'mytool init' to create one, or set MYTOOL_CONFIG to point at an existing file.

Better for the human, no question. But look at what you’ve just done to the error as a value. The recovery advice is now welded into the error string. Any code that wants to ask “is this the config-missing error?” is reduced to substring-matching English prose. Reword the advice and you break the check. So you’ve helped the user and quietly sabotaged the program at the same time, because you’ve made one poor little string do two completely incompatible jobs… being a stable identity for code, and being friendly guidance for people.

Why I changed error libraries

go-tool-base started out on github.com/go-errors/errors. It’s a perfectly fine library and it gave us stack traces. What it didn’t give us was any way to attach human guidance to an error without shoving it into the message string. So the codebase did exactly the daft thing I just described: multi-line suggestion text baked straight into errors.Errorf calls, user-facing content and programmatic identity all mashed into one value.

That’s the whole reason for the migration to github.com/cockroachdb/errors. Not novelty, and not because I fancied a weekend of find-and-replace. One specific capability: cockroachdb/errors lets you attach a hint to an error as a separate, structured field.

return errors.WithHint(
    errors.New("failed to read config file"),
    "Run 'mytool init' to create one, or set MYTOOL_CONFIG to point at an existing file.",
)

Now there are two things, cleanly apart. errors.New("failed to read config file") is the identity… stable, matchable, the program’s handle on the error. The hint is the guidance… for the human, and rewordable as much as you like without breaking a single check, because no check ever looks at it. errors.Is and errors.As work properly through every wrapper layer, so code matches on identity and never has to read prose.
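If you want to see why the separation works without pulling in the library, the idea fits in a few lines of standard-library Go. To be clear, this is a toy imitation of the concept and not how cockroachdb/errors implements it: the hinted type, WithHint, HintOf and IsConfigUnreadable below are all written for this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrConfigUnreadable is the stable identity that code matches on.
var ErrConfigUnreadable = errors.New("failed to read config file")

// hinted carries guidance beside the wrapped error, not inside its message.
type hinted struct {
	err  error
	hint string
}

func (h *hinted) Error() string { return h.err.Error() } // message unchanged
func (h *hinted) Unwrap() error { return h.err }

// WithHint attaches human guidance as a separate field.
func WithHint(err error, hint string) error { return &hinted{err: err, hint: hint} }

// HintOf returns the first hint found in the error chain, or "".
func HintOf(err error) string {
	var h *hinted
	if errors.As(err, &h) {
		return h.hint
	}
	return ""
}

// IsConfigUnreadable matches on identity and never reads prose.
func IsConfigUnreadable(err error) bool { return errors.Is(err, ErrConfigUnreadable) }

func main() {
	err := fmt.Errorf("loading settings: %w",
		WithHint(ErrConfigUnreadable, "Run 'mytool init' to create one."))

	fmt.Println(IsConfigUnreadable(err)) // identity survives both wrappers
	fmt.Println(err)                     // the message carries no advice
	fmt.Println(HintOf(err))             // the advice lives in its own field
}
```

Reword the hint as often as you like; IsConfigUnreadable never notices.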

The migration brought a few other things worth having. Stack traces print with a plain %+v instead of a type assertion. Errors can carry structured, machine-readable metadata. Multiple errors from concurrent work can be combined as a first-class value. But the hint is the one that actually changed the user’s day, because the hint is the recovery step, stored where it belongs.

One door out, and it knows where the help is

Separating the hint is only half of it. The other half is making sure those hints actually reach the user, every time, and that comes down to having a single way out.

Every go-tool-base command returns its errors the idiomatic Cobra way, through RunE. They all funnel into one Execute() wrapper at the root, which routes every error (runtime failure, flag parse error, pre-run failure) through one ErrorHandler. One door out. So error presentation gets decided in exactly one place, and no command can render an error differently from the command sat next to it.

And because there’s one handler, it can pull off something the individual commands never could. The framework knows your tool’s metadata, including its configured support channel, be it a Slack workspace or a Teams channel. So the error handler can finish a fatal error not just with the what and the recovery hint, but with where to go if the hint didn’t help:

error: failed to read config file
hint: Run 'mytool init' to create one, or set MYTOOL_CONFIG.
 Still stuck? Ask in #mytool-support on Slack.

The user is never left at a dead end. The error tells them what broke, the hint tells them the most likely fix, and if that’s still not enough the handler tells them which door to go and knock on. A failure becomes a signpost instead of a full stop.
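
Roughly, the single door looks like the sketch below. Everything in it is a stand-in I've invented for illustration (the hinted type, the Render method, the execute function and the SupportChannel field); the real ErrorHandler and Execute() wrapper have their own shapes. The point is only that one function decides what every failure looks like.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// hinted is a hypothetical stand-in for an error carrying a separate hint.
type hinted struct {
	err  error
	hint string
}

func (h *hinted) Error() string { return h.err.Error() }
func (h *hinted) Unwrap() error { return h.err }

// ErrorHandler is the one place where presentation is decided.
type ErrorHandler struct {
	SupportChannel string // assumed to come from the tool's metadata; empty means none
}

// Render turns any error into the text shown to the user.
func (h ErrorHandler) Render(err error) string {
	var b strings.Builder
	fmt.Fprintf(&b, "error: %v\n", err)
	var he *hinted
	if errors.As(err, &he) {
		fmt.Fprintf(&b, "hint: %s\n", he.hint)
	}
	if h.SupportChannel != "" {
		fmt.Fprintf(&b, "Still stuck? Ask in %s.\n", h.SupportChannel)
	}
	return b.String()
}

// execute stands in for the root wrapper: every command's error leaves through here.
func execute(h ErrorHandler, run func() error) (output string, exitCode int) {
	if err := run(); err != nil {
		return h.Render(err), 1
	}
	return "", 0
}

// failingCommand plays the part of any command's RunE returning a hinted error.
func failingCommand() error {
	return &hinted{
		err:  errors.New("failed to read config file"),
		hint: "Run 'mytool init' to create one, or set MYTOOL_CONFIG.",
	}
}

func main() {
	h := ErrorHandler{SupportChannel: "#mytool-support on Slack"}
	out, code := execute(h, failingCommand)
	fmt.Print(out)
	fmt.Println("exit code:", code)
}
```

No command formats anything itself. It returns an error and the handler does the rest, which is why two commands can't drift apart in how they report failure.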

The short version

An error that only reports what went wrong leaves the user stranded, and the obvious fix (writing the recovery advice into the message) quietly wrecks the error as a value, because now your code has to substring-match prose just to work out what it’s looking at.

go-tool-base moved from go-errors to cockroachdb/errors to get hints: a structured, separate field for human guidance that leaves the error’s identity clean for errors.Is and errors.As. Every command’s errors leave through one Execute() wrapper and one ErrorHandler, so presentation stays consistent, and because that handler knows the tool’s support channel it can point a stuck user at real help.

State the problem for the program. Give the fix to the human. And for pity’s sake, keep the two in different fields.

Many embedded filesystems, one merged view

Sat, 21 Mar 2026 00:00:00 +0000

Go’s embed package is one of those features that makes you slightly giddy the first time you use it. One //go:embed directive and your default config, your templates, your docs are all baked into the binary. The tool just works the moment it’s installed, with nothing external to lose or forget to ship.

And then you go and build something modular on top of it, and you discover the catch nobody warned you about.

`embed.FS` is an island

An embed.FS has a property that’s easy to miss until it bites: it’s local to the package that declared it. The //go:embed directive can only see files in its own source file’s directory or below it, and its patterns can’t reach upwards with .. to borrow a neighbour’s. So in any project bigger than a toy, you don’t have an embedded filesystem. You have many. The root package embeds one. Each feature, each subcommand that ships its own templates or defaults, embeds another. They’re islands, one per package, and Go gives you no native way to make them behave as a whole.

For most files that’s perfectly fine. A feature’s templates can stay on the feature’s island; nothing else needs them.

It stops being fine the moment features need to contribute to something shared.

The shared-config problem

Here’s the case that forces the issue. A go-tool-base tool has a global config.yaml of defaults, embedded at the root. Now you add a feature, and that feature has its own configuration keys, with their own sensible defaults.

Where do those defaults go?

The naive answer is: edit the root config.yaml and add the feature’s section. And that’s a genuinely bad answer, because it inverts the dependency. The root config now has to know about every feature. Add a feature, edit the centre. Remove one, edit the centre again. The central file becomes a pinch point that every feature has to reach into, and a modular architecture where you can’t add a module without editing the core isn’t really modular at all… it just has more files.

What you actually want is for the feature to ship its own slice of default config, on its own island, and for the global config the tool reads to somehow already contain it. The feature contributes; the centre doesn’t budge.

`props.Assets`: merge the islands

That’s the job of props.Assets. (Yes, it lives on Props, the load-bearing container I keep going on about. Most of the good stuff does.) It’s a layer that implements the standard fs.FS interface, and into it you Register each embed.FS under a name:

// root main.go
Assets: props.NewAssets(props.AssetMap{"root": &assets}),

// a feature's command constructor
//go:embed assets/*
var assets embed.FS

func NewCmdFeature(p *props.Props) *cobra.Command {
 p.Assets.Register("feature", &assets)
 // ...
}

Now Props carries one Assets value that represents all the islands as a single filesystem. The root’s files and every registered feature’s files, addressable through one fs.FS. Each registration is named, so the islands stay individually identifiable, but they read as one.
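
To make "many islands, one fs.FS" concrete, here's a stdlib-only sketch. It is a much-simplified stand-in, not the real props.Assets: mergedFS, island and the helper functions are hypothetical names, fstest.MapFS plays the part of embed.FS so the example runs without any files on disk, and "first registered island that has the path wins" is my assumption for the plain-file case.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"testing/fstest"
)

// island pairs a registered filesystem with its name.
type island struct {
	name string
	fsys fs.FS
}

// mergedFS is a hypothetical, much-simplified stand-in for props.Assets.
type mergedFS struct{ islands []island }

// Register adds a named filesystem to the merged view.
func (m *mergedFS) Register(name string, fsys fs.FS) {
	m.islands = append(m.islands, island{name: name, fsys: fsys})
}

// Open implements fs.FS by asking each island in registration order.
func (m *mergedFS) Open(name string) (fs.File, error) {
	for _, is := range m.islands {
		f, err := is.fsys.Open(name)
		if err == nil {
			return f, nil
		}
		if !errors.Is(err, fs.ErrNotExist) {
			return nil, err
		}
	}
	return nil, &fs.PathError{Op: "open", Path: name, Err: fs.ErrNotExist}
}

// demoAssets builds a merged view from two in-memory islands.
func demoAssets() *mergedFS {
	m := &mergedFS{}
	m.Register("root", fstest.MapFS{"docs/index.md": {Data: []byte("root docs")}})
	m.Register("feature", fstest.MapFS{"templates/feature.tmpl": {Data: []byte("hello from feature")}})
	return m
}

// readString reads a whole file through any fs.FS.
func readString(fsys fs.FS, name string) (string, error) {
	b, err := fs.ReadFile(fsys, name)
	return string(b), err
}

func main() {
	assets := demoAssets()
	for _, name := range []string{"docs/index.md", "templates/feature.tmpl"} {
		s, err := readString(assets, name)
		fmt.Println(name, "->", s, err)
	}
}
```

Because mergedFS satisfies fs.FS, anything that already takes an fs.FS (fs.ReadFile, fs.WalkDir, a template loader) can read across the islands without knowing they exist.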

That alone solves the addressing problem. The genuinely clever part is what happens for structured files.

Opening a file that exists in several places

When you Open a path through props.Assets and that path has a structured extension (.yaml, .yml, .json, .csv), it doesn’t simply return the first match it stumbles across. It does this:

Discovery. It finds every instance of that path, across every registered filesystem.
Parsing. It unmarshals each one.
Merging. It deep-merges the parsed data, using mergo.
Re-serialisation. It hands you back a single fs.File whose contents are the combined, merged result.

So picture the shared-config problem again, only solved this time. The root ships a config.yaml with the base defaults. Each feature ships a config.yaml on its own island carrying only its own keys. Nobody edits anybody else’s file. When the init command opens config.yaml through props.Assets, it doesn’t get the root’s copy. It gets the deep-merge of the root’s copy and every registered feature’s copy: one config.yaml that contains every default in the tool, assembled at runtime from contributions that never knew about each other.
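
Here's a sketch of those four steps, under some loudly stated assumptions. It uses JSON so it can stay inside the standard library, it hand-rolls a small deepMerge instead of calling mergo, it lets later islands win on conflicting scalar keys (my choice, not necessarily the framework's), and it returns bytes rather than an fs.File. The names openMerged, deepMerge and demoMerged are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io/fs"
	"testing/fstest"
)

// deepMerge copies src into dst. Nested maps are merged key by key;
// any other value in src replaces the one in dst.
func deepMerge(dst, src map[string]any) {
	for k, v := range src {
		srcMap, srcIsMap := v.(map[string]any)
		dstMap, dstIsMap := dst[k].(map[string]any)
		if srcIsMap && dstIsMap {
			deepMerge(dstMap, srcMap)
			continue
		}
		dst[k] = v
	}
}

// openMerged discovers every copy of name, parses each one,
// merges them in order, and re-serialises the result.
func openMerged(name string, islands ...fs.FS) ([]byte, error) {
	merged := map[string]any{}
	found := false
	for _, fsys := range islands {
		data, err := fs.ReadFile(fsys, name) // discovery
		if errors.Is(err, fs.ErrNotExist) {
			continue // this island doesn't ship the file
		}
		if err != nil {
			return nil, err
		}
		var parsed map[string]any
		if err := json.Unmarshal(data, &parsed); err != nil { // parsing
			return nil, fmt.Errorf("parsing %s: %w", name, err)
		}
		deepMerge(merged, parsed) // merging
		found = true
	}
	if !found {
		return nil, fs.ErrNotExist
	}
	return json.Marshal(merged) // re-serialisation
}

// demoMerged merges a root config with one feature's contribution.
func demoMerged() (string, error) {
	root := fstest.MapFS{"config.json": {Data: []byte(`{"log":{"level":"info"}}`)}}
	feature := fstest.MapFS{"config.json": {Data: []byte(`{"feature":{"enabled":true},"log":{"format":"text"}}`)}}
	b, err := openMerged("config.json", root, feature)
	return string(b), err
}

func main() {
	out, err := demoMerged()
	fmt.Println(out, err)
}
```

The merged document keeps log.level from the root and gains log.format and feature.enabled from the feature, and neither file had to mention the other.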

A feature contributes its defaults simply by existing and registering. The centre never changes. That’s the modular property the naive approach couldn’t give you, and it generalises well beyond config… the same merge applies to a shared commands.csv, or any structured file features want to add rows or keys to.

There’s also a Mount method for attaching an arbitrary fs.FS at a virtual path, which is handy for surfacing something external (a temp directory, say) as part of the same tree. But the structured merge is the feature that really earns Assets its place.

Boiling it down

embed.FS is per-package by design, so a modular CLI ends up with many embedded filesystems, one island per feature. Most of the time that’s fine. It fails specifically when features need to contribute to a shared resource like the global config.yaml, because the naive fix forces every feature to reach in and edit a central file.

props.Assets merges all the registered islands into a single fs.FS, and for structured files it goes further: opening a .yaml, .json or .csv discovers every copy across every island, deep-merges them, and returns the combined whole. A feature drops its own defaults onto its own island, registers, and the merged config the tool reads already includes them. Contribution without coupling, which is rather the whole point of being modular in the first place.

Props: the container that does the heavy lifting

Sat, 21 Mar 2026 00:00:00 +0000

I name-dropped Props back in the introduction and then rather glossed over it, which was a bit unfair of me, because it’s the single most important design decision in the whole framework. So let’s give it the attention it actually deserves.

And the best place to start, oddly enough, is the name.

Start with the name

The container at the centre of go-tool-base is called Props, and the name is doing real work, so we’ll start there.

It is not short for “properties”, though it does hold a few. A prop is the heavy timber or steel beam that stops a structure quietly collapsing in on itself. And for anyone who follows the rugby: a prop is the position in the scrum, the broad-shouldered forward whose entire job is to provide structural support so everyone else can get on with the game.

That’s the design brief, in a single word. Props is not where the clever, flashy work happens. It scores no tries. It’s the unglamorous, load-bearing thing that holds the framework up so that your actual command logic gets to be the interesting part. Understand the name and you understand what the struct is for.

What it carries

Props is the single object passed to every command constructor in a go-tool-base tool. It holds the dependencies a command might need:

Tool, metadata about the CLI (name, summary, release source).
Logger, the logging abstraction.
Config, the loaded configuration container.
FS, a filesystem abstraction (afero), so a command never touches the real disk directly.
Assets, the embedded-resource manager.
Version, build information.
ErrorHandler, the centralised error reporter.

A command constructor’s signature is, accordingly, boring on purpose:

func NewCmdExample(p *props.Props) *cobra.Command { ... }

One parameter. Everything the command could possibly need is reachable through it. No globals, no init()-time wiring, no twelve-argument constructor that quietly grows a thirteenth argument next month.

Why a struct, and not `context.Context`

Here’s the design decision I actually want to defend, because it’s the one Go developers tend to raise an eyebrow at. Go already has a well-known way to carry things through a call tree: context.Context. So why not just put the logger and the config in the context and pass that around?

Because context.Context carries its values as interface{}, and that’s the wrong trade for dependencies.

Pull a dependency out of a context and you get this:

l := ctx.Value("logger").(logger.Logger) // a runtime type assertion

That one line has two separate ways to hurt you. The key is a bare string, so a typo compiles perfectly happily and then fails at runtime, when the lookup quietly hands back nil. The type assertion is unchecked, so if nothing, or the wrong thing, is sitting under that key, your tool panics in front of a user. Neither failure is visible to the compiler. Neither is visible to your IDE. You find out when it breaks, which is to say at the worst possible time.

Pull the same dependency out of Props and you get this:

p.Logger.Info("starting") // a field access

p.Logger is a typed field. If it doesn’t exist, or you’ve used it wrong, the code simply doesn’t compile. Your IDE autocompletes it. Refactor the Logger interface and every misuse lights up at build time. There’s no runtime type assertion, because there’s no interface{} to assert from in the first place.

context.Context is the right tool for what it was designed for: cancellation, deadlines, request-scoped signals that genuinely cross API boundaries. It’s the wrong tool for “here are my program’s services”, because it trades away the compiler’s help for a flexibility you really don’t want here. Dependencies should be declared, somewhere the compiler checks them. Props is that somewhere.
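
A small runnable sketch of the difference, if you'd like to watch it happen. The Logger interface, stdoutLogger and the cut-down Props here are illustrative stand-ins rather than the framework's real types, and I've used the checked form of the type assertion so the failure shows up as a false instead of a panic.

```go
package main

import (
	"context"
	"fmt"
)

// Logger is an illustrative stand-in for the framework's logging abstraction.
type Logger interface{ Info(msg string) }

type stdoutLogger struct{}

func (stdoutLogger) Info(msg string) { fmt.Println("INFO", msg) }

// loggerFromContext looks a logger up by string key. The checked assertion
// reports failure as ok == false rather than panicking.
func loggerFromContext(ctx context.Context, key string) (Logger, bool) {
	l, ok := ctx.Value(key).(Logger)
	return l, ok
}

// Props is a cut-down, hypothetical container with one typed field.
type Props struct{ Logger Logger }

// demoContext stores a logger under "logger", then looks it up
// once with the right key and once with a typo.
func demoContext() (rightKey, typoKey bool) {
	ctx := context.WithValue(context.Background(), "logger", stdoutLogger{})
	_, rightKey = loggerFromContext(ctx, "logger")
	_, typoKey = loggerFromContext(ctx, "loger") // compiles fine, finds nothing
	return rightKey, typoKey
}

func main() {
	right, typo := demoContext()
	fmt.Println("right key found:", right)
	fmt.Println("typo key found:", typo)

	// The struct version has no key to mistype: p.Loger would not compile.
	p := &Props{Logger: stdoutLogger{}}
	p.Logger.Info("starting")
}
```

The typo'd lookup is perfectly legal Go. The only thing standing between it and a panic is whether somebody remembered the two-value form of the assertion.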

What you get back for it

That one decision pays out in three currencies.

Testability. A command is now a pure function of its Props. To test it, you build a Props with the doubles you want (an in-memory FS instead of the real disk, a no-op Logger, a config you’ve populated by hand) and call the constructor. No global state to reset between tests, no monkey-patching, no init() order to puzzle over. The dependency is an argument, so the test just passes a different one.

Consistency. Cross-cutting changes have exactly one place to happen. When the global --debug flag flips the log level, it does so on the Logger inside Props, and because every command reads its logger from the same Props, every command gets the new level. No command can drift, because none of them owns its own copy.

Extensibility. Adding a new framework-wide service is just adding a field to one struct. Every command can immediately reach it; none of them needed touching to make it reachable.
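
The testability point is the easiest to show in code. This is a sketch with a hypothetical two-field Props, not the real struct: it uses fs.FS where the framework uses afero so that it stays inside the standard library, and runGreet, recordingLogger and demoGreet are invented for the example.

```go
package main

import (
	"fmt"
	"io/fs"
	"testing/fstest"
)

// Logger is an illustrative stand-in for the framework's logging abstraction.
type Logger interface{ Info(msg string) }

// Props is a cut-down, hypothetical container with just two dependencies.
type Props struct {
	Logger Logger
	FS     fs.FS // the real framework uses afero; fs.FS keeps this sketch stdlib-only
}

// recordingLogger is a test double that remembers what was logged.
type recordingLogger struct{ lines []string }

func (r *recordingLogger) Info(msg string) { r.lines = append(r.lines, msg) }

// runGreet stands in for a command's logic. Everything it needs comes from p.
func runGreet(p *Props) (string, error) {
	name, err := fs.ReadFile(p.FS, "name.txt")
	if err != nil {
		return "", err
	}
	p.Logger.Info("greeting " + string(name))
	return "hello, " + string(name), nil
}

// demoGreet runs the command against doubles: an in-memory
// filesystem and a logger that records instead of printing.
func demoGreet() (string, []string, error) {
	log := &recordingLogger{}
	p := &Props{
		Logger: log,
		FS:     fstest.MapFS{"name.txt": {Data: []byte("scout")}},
	}
	out, err := runGreet(p)
	return out, log.lines, err
}

func main() {
	out, logged, err := demoGreet()
	fmt.Println(out, logged, err)
}
```

Nothing touched the real disk and nothing global needed resetting. The test chose its own dependencies by passing a different Props.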

To sum up

Props is the dependency-injection container at the heart of go-tool-base: one struct, passed to every command, holding the logger, config, filesystem, assets, error handler and tool metadata. It’s a concrete struct rather than a context.Context payload entirely on purpose, because dependencies belong somewhere the compiler can check them, not behind a string key and a hopeful runtime type assertion. That single choice buys you testability, consistency and easy extension.

The name says it best, really. Props doesn’t score the tries. It’s the broad-shouldered thing in the scrum that stops the whole framework folding, so the rest of your code is free to go and play.

Design your whole CLI in one file

Fri, 20 Mar 2026 00:00:00 +0000

Here’s a question that sounds trivial and really isn’t: where, exactly, does a CLI tool’s structure live? Not the logic of each command… the structure. Which commands exist, what they’re called, which flags they take, what’s nested under what.

I’d never properly thought to ask it until go-tool-base forced me to, and the honest answer turned out to be a little bit embarrassing.

Where does a CLI’s structure actually live?

Picture a CLI tool with twenty commands, some nested under others. In a typical project, where does its structure live? The honest answer is “smeared across the codebase”. It’s in twenty cmd.go files. It’s in the AddCommand calls that stitch them together. It’s in the flag registrations. To understand the shape of the tool you have to read all of it and assemble the picture in your head, because the picture exists nowhere as a single thing you can point at.

That’s a strange state of affairs for the single most important design fact about a CLI. The command tree is the tool’s interface, it’s the thing users actually touch, and yet it hasn’t got a home.

The manifest gives it one

go-tool-base’s generator gives that structure a home: .gtb/manifest.yaml. The manifest is a single readable file describing the command tree. Every command, its name, its short description, its flags, its place in the hierarchy, whether it carries assets or an initialiser. The shape of the whole tool, in one place you can open and read top to bottom.

And the manifest isn’t documentation about the project. It’s the thing the project’s wiring is generated from. When you run regenerate project, the generator reads the manifest and rebuilds the boilerplate to match it: the command registration, the AddCommand wiring, the flag definitions. The manifest is the source of truth, and the Go wiring is its output.

Design-first, when you want it

This unlocks a way of working that the smeared-across-the-codebase approach simply can’t offer. You can design the interface first, in the manifest, and let the code follow.

Want to rename a command? Edit one line in the manifest, run regenerate, and the rename propagates through every wiring file that ever mentioned it. Want to move a subcommand under a different parent? Change its place in the manifest hierarchy and regenerate. Want to add a flag to three related commands? Add it in the manifest, in three obvious places, and regenerate, instead of going on a little hunting expedition for three flag-registration blocks scattered across the tree.

You’re editing the tool’s interface as a design, in the file whose entire job is to hold that design, and the generator does the mechanical donkey-work of making the code reflect it. The thing you change is the thing that describes the structure. The code is downstream.

If that shape sounds familiar, it should. It’s the same instinct behind spec-driven and test-driven development: write down what the thing should be before you assemble how it works, and keep that statement of intent as a first-class, living artefact rather than a comment that quietly rots in a corner. The manifest is a spec for your command tree, and regenerate is what keeps the implementation honest to it.

It doesn’t trap you

There’s an obvious worry about any generated-from-a-manifest system: am I now locked into editing the manifest? What if I just want to open a Go file and write some Go like a normal person?

You can. The generator is careful not to own everything. It owns the wiring (the registration and the structural boilerplate) and it leaves your command logic well alone. The RunE function where your command actually does its work is yours; the manifest hasn’t got an opinion about it. And the generator tracks the files it produces by content hash, so if you do hand-edit something it generated, regeneration notices and asks before overwriting rather than steamrolling you. That mechanism turned out interesting enough to get its own post.

So the manifest is an option, not a cage. Design-first via the manifest when that suits the change. Drop into Go directly when that suits it better. The two stay in sync because regeneration reconciles them, not because one of them has been forbidden.

Pulling it together

A CLI’s command tree is its most important design surface, and in most projects it has no single home… it gets reconstructed in your head from twenty scattered files every time you need to reason about it. go-tool-base gives it one: .gtb/manifest.yaml, a readable description of the whole tree that the generator rebuilds the wiring code from. Edit the manifest, run regenerate, and the boilerplate follows.

It makes CLI structure something you design in one place, in the spirit of spec-driven development, while still leaving you free to write Go directly when that’s the better tool for the job. The manifest is the spec for your interface. The generator just keeps the code faithful to it.

Scaffolding that respects your edits

Fri, 20 Mar 2026 00:00:00 +0000

When I introduced go-tool-base I made a passing promise to come back to “the generator that won’t clobber your edits”. This is me keeping it, partly because it’s the feature I’m quietly most proud of, and partly because it took the most head-scratching of anything to get right.

The problem it solves is one that every code generator runs into eventually, usually the hard way and usually at the worst possible moment.

The generator’s awkward second act

A project generator has an easy first act. gtb generate skeleton, and you’ve got a complete, wired, idiomatic Go CLI project. Everyone’s happy, me included.

The second act is the hard one. The framework moves on. A convention changes, a new built-in capability appears, the recommended CI shape shifts. Your project, scaffolded three months ago, is now subtly out of date, and you’d quite like the generator to drag it back up to spec.

Except by now it isn’t a fresh scaffold. It’s your project. You tuned the CI workflow. You rewrote the justfile. You added a stanza to the Dockerfile that took an afternoon and a fair bit of swearing to get right. The generated files and your edited files are one and the same files.

A naive generator handles this with breathtaking confidence: it regenerates everything from the template and overwrites the lot. Run it once, lose your afternoon. You learn that lesson exactly once and then never run regeneration again, which means the upkeep feature you were sold is dead on arrival. A scaffold you can’t safely re-run is just a one-shot cp with extra steps.

What the generator needs to know

The thing standing between “safe to overwrite” and “absolutely do not” is a single fact: has this file changed since the generator last wrote it?

If it hasn’t, the file is still pristine boilerplate and the generator owns it. Overwrite away. If it has, a human has been in there, and the generator must not touch it without asking first.

The generator can’t just eyeball that, of course. It needs a record. So every time gtb generate writes a file, it computes a SHA-256 of the content and stores it in the project’s manifest, .gtb/manifest.yaml, as a Hashes map of relative path to hash. The manifest is the generator’s memory of the exact bytes it last produced.
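
As a sketch of that record-keeping, here's a stand-in for the Hashes map using only the standard library. The manifest type and its record and unchanged methods are hypothetical, and storing the digest as a hex string is my assumption; the only things I'm taking from the real design are SHA-256 and a map of relative path to hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// manifest is a hypothetical in-memory stand-in for the Hashes map
// kept in .gtb/manifest.yaml.
type manifest struct {
	Hashes map[string]string // relative path -> hex-encoded SHA-256 (encoding assumed)
}

// hashContent returns the SHA-256 of content as a hex string.
func hashContent(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// record stores the hash of the bytes the generator just wrote.
func (m *manifest) record(relPath string, content []byte) {
	if m.Hashes == nil {
		m.Hashes = map[string]string{}
	}
	m.Hashes[relPath] = hashContent(content)
}

// unchanged reports whether the bytes now on disk are exactly
// the bytes the generator last wrote for relPath.
func (m *manifest) unchanged(relPath string, current []byte) bool {
	stored, ok := m.Hashes[relPath]
	return ok && stored == hashContent(current)
}

func main() {
	m := &manifest{}
	m.record("justfile", []byte("build:\n\tgo build ./...\n"))
	fmt.Println("pristine:", m.unchanged("justfile", []byte("build:\n\tgo build ./...\n")))
	fmt.Println("edited:  ", m.unchanged("justfile", []byte("build:\n\tgo build -race ./...\n")))
}
```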

Regeneration becomes a three-way decision

With that record in hand, regeneration stops being “overwrite everything” and becomes a per-file decision with three branches.

The file doesn’t exist. Easy. Write it, store its hash.

The file exists and its current hash matches the manifest. It’s byte-for-byte what the generator last wrote, so nobody has touched it. The generator owns it outright, regenerates from the template and updates the stored hash. No prompt, no fuss. This is the common case, and it’s silent precisely because it’s safe.

The file exists and its hash does not match. Someone has been in there since generation. The generator stops and asks. It will not silently overwrite your hard-won afternoon. You decide: take the new version, or keep yours.

The detail I’m genuinely fond of is what happens when you decline. Declining is non-fatal. Generation carries on with the rest of the files, and the manifest keeps the file’s stored hash rather than dropping it. That matters more than it looks, because it means the file stays tracked. Next time you regenerate, the generator can still tell that file has been modified, and still asks. Skipping a file once doesn’t quietly evict it from the generator’s awareness forever. It stays a known, watched, customised file across every future run.
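
Stripped of the file I/O and the prompt itself, the decision is small enough to sketch in a few lines. The action type and the decide and hashAfterPrompt functions are names I've invented for illustration, and treating a file with no stored hash as "modified" is my assumption about the edge case, not something the real generator promises.

```go
package main

import "fmt"

// action is what regeneration does with one file.
type action int

const (
	write     action = iota // file is missing: write it and store its hash
	overwrite               // file is pristine: regenerate silently
	ask                     // file was modified: stop and prompt
)

// decide picks a branch from whether the file exists and whether its
// current hash matches the one stored in the manifest.
func decide(exists bool, currentHash, storedHash string) action {
	switch {
	case !exists:
		return write
	case currentHash == storedHash:
		return overwrite
	default:
		return ask
	}
}

// hashAfterPrompt returns the hash the manifest should hold once the user
// has answered. Declining keeps the old hash, so the file stays tracked
// and is still detected as modified on the next run.
func hashAfterPrompt(accepted bool, storedHash, newHash string) string {
	if accepted {
		return newHash
	}
	return storedHash
}

func main() {
	fmt.Println(decide(false, "", "") == write)
	fmt.Println(decide(true, "abc", "abc") == overwrite)
	fmt.Println(decide(true, "xyz", "abc") == ask)
	fmt.Println(hashAfterPrompt(false, "abc", "new"))
}
```

The last function is the bit I'm fond of. If declining replaced the stored hash with the hash of your edited file, the next run would see a match and happily overwrite it.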

When you want it to stop asking

Per-file prompting is the right default, but for files you’ve permanently taken ownership of, being asked on every single regeneration is just noise. If you’ve rewritten the CI workflows wholesale and you are never, ever going back to the generated version, you don’t want a prompt. You want the generator to leave them well alone and not bring it up again.

That’s what .gtb/ignore is for. It sits next to the manifest and takes gitignore-style patterns:

# I own the CI workflows now
.github/workflows/**

# ...except the release workflow, keep that managed
!.github/workflows/release.yml

# and my build config
justfile
Dockerfile

Anything matching is skipped during regeneration with no prompt at all. Patterns evaluate top to bottom and later ones win, so the negation (!) behaves the way you’d expect from .gitignore: exclude a whole directory, then claw one file back.
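
The "later ones win" rule is easy to get subtly wrong, so here's a sketch of it. This is a deliberately tiny matcher, not the real one and nowhere near full gitignore semantics: it understands only a trailing /** (everything under a directory) and whole-path globs via path.Match. The matches and ignored functions are hypothetical names.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matches supports two pattern forms only: "dir/**" for everything
// under dir, and path.Match globs compared against the whole path.
func matches(pattern, file string) bool {
	if strings.HasSuffix(pattern, "/**") {
		dir := strings.TrimSuffix(pattern, "/**")
		return strings.HasPrefix(file, dir+"/")
	}
	ok, err := path.Match(pattern, file)
	return err == nil && ok
}

// ignored walks the patterns top to bottom. Every matching pattern
// overrides the verdict so far, so the last match wins.
func ignored(patterns []string, file string) bool {
	result := false
	for _, p := range patterns {
		p = strings.TrimSpace(p)
		if p == "" || strings.HasPrefix(p, "#") {
			continue // blank line or comment
		}
		negate := strings.HasPrefix(p, "!")
		if negate {
			p = p[1:]
		}
		if matches(p, file) {
			result = !negate
		}
	}
	return result
}

// demoPatterns is the example ignore file from the text, line by line.
var demoPatterns = []string{
	"# I own the CI workflows now",
	".github/workflows/**",
	"",
	"# ...except the release workflow, keep that managed",
	"!.github/workflows/release.yml",
	"",
	"# and my build config",
	"justfile",
	"Dockerfile",
}

func main() {
	for _, f := range []string{".github/workflows/ci.yml", ".github/workflows/release.yml", "justfile", "main.go"} {
		fmt.Println(f, "ignored:", ignored(demoPatterns, f))
	}
}
```

Swap the order of the first two patterns and release.yml ends up ignored again, because the directory rule would then be the last one to match it.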

It’s a deliberate escalation ladder. Unmodified files are handled silently. Modified files get a prompt. Files you’ve formally claimed get total silence. Each rung asks for less of your attention than the last, and you choose how far up to climb, file by file.

Stepping back

A generator earns its keep twice: once when it scaffolds your project, and then continuously, every time it drags that project back up to the framework’s current shape. The second job is worth nothing if regeneration flattens your customisations, because you’ll simply stop running it, and who could blame you.

go-tool-base’s generator gets around that by remembering. It hashes every file it writes into .gtb/manifest.yaml, and on regeneration it re-hashes before overwriting: unchanged files it owns and updates silently, changed files it stops and asks about, and .gtb/ignore lets you mark files as permanently yours. Skipped files stay tracked, so the generator never loses sight of what you’ve made your own.

The point of a scaffold isn’t the first five minutes. It’s that you can still run it in month three without holding your breath.

Your CLI is already an AI tool

Thu, 19 Mar 2026 00:00:00 +0000

“Make it work with AI” has become one of those requests that lands on a developer’s desk with a thud and not much further detail attached. My instinct, the first time, was to brace for a big lump of integration work… a bespoke adapter for this assistant, another for that one, a treadmill of little wrappers stretching off into the distance.

Turns out I’d already done most of the work. So have you, if your CLI tool is any good. Let me explain what I mean.

You already described your capabilities

Stop and think for a second about what a well-built CLI tool actually is. It’s a set of named operations, each with a human-readable description, each taking a set of typed, named, documented parameters. You wrote all of that already, because a CLI without it is unusable by people.

Now look at what an AI assistant needs in order to call a tool. A set of named operations. A description of each, so it knows when to reach for them. A typed parameter schema for each, so it knows how to call them.

It’s the same list! A good CLI is already, structurally, a description of a set of capabilities. The information an AI agent needs isn’t extra work you have to go and do. It’s work you finished the moment your --help output was any good.

The only thing missing is a translator. Something that takes “this is a CLI” and presents it as “this is a set of tools an AI can call”.

MCP is that translator, and it’s a standard

The temptation, when you want your tool to be AI-usable, is to sit down and write an integration. A little adapter for Claude Desktop. Another for Cursor. Another for whatever turns up next month. Each one a bespoke wrapper, each one a thing to maintain, and the list never stops growing because new assistants keep appearing. That’s the treadmill I was bracing for.

The Model Context Protocol exists to kill that list. MCP is an open standard for how an AI model discovers and calls local tools. Implement it once and your tool works with every assistant that speaks it. Write once, not once-per-client.

So go-tool-base implements it once, in the framework, for everyone. (That’s rather the theme of this whole series, if you hadn’t spotted it yet… do the annoying thing once, properly, in a place where every tool inherits it.)

The `mcp` command, and the mapping it does for free

Every tool built on go-tool-base inherits a built-in mcp command. Run it:

mytool mcp

and the tool starts a JSON-RPC server over standard I/O, speaking MCP. That’s the whole user-facing surface. One command.

Behind it, the framework walks your Cobra command tree and maps it straight onto MCP tool definitions:

Each command becomes a tool.
Each command’s short description becomes the tool’s description, the text the AI reads to decide whether this is the tool it wants.
Each command’s flags and arguments become the tool’s JSON Schema parameters.

There’s no second schema to write and then keep in sync (and we all know how well “keep these two things aligned by hand” tends to go). The command tree is the schema. Add a new command to your CLI and it’s a new tool for the agent, automatically, with the description and flags you already gave it. Nobody has to remember to update an MCP manifest, because there’s no separate MCP manifest to forget about.
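
To show how mechanical that mapping is, here's a sketch with stand-in types. Command and Flag are hypothetical simplifications of what Cobra already knows about a command, and toTool is an invented name, not the framework's function. The output shape (name, description, inputSchema) follows the MCP tool definition format; the way flag types map to JSON Schema types is assumed here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Flag and Command are hypothetical stand-ins for a Cobra command's metadata.
type Flag struct {
	Name     string
	Type     string // a JSON Schema type such as "string" or "boolean" (assumed mapping)
	Usage    string
	Required bool
}

type Command struct {
	Name  string
	Short string
	Flags []Flag
}

// property, inputSchema and Tool model an MCP tool definition.
type property struct {
	Type        string `json:"type"`
	Description string `json:"description,omitempty"`
}

type inputSchema struct {
	Type       string              `json:"type"`
	Properties map[string]property `json:"properties"`
	Required   []string            `json:"required,omitempty"`
}

type Tool struct {
	Name        string      `json:"name"`
	Description string      `json:"description"`
	InputSchema inputSchema `json:"inputSchema"`
}

// toTool maps one command onto one tool: the name stays the name, the short
// description becomes the tool description, and flags become schema properties.
func toTool(c Command) Tool {
	schema := inputSchema{Type: "object", Properties: map[string]property{}}
	for _, f := range c.Flags {
		schema.Properties[f.Name] = property{Type: f.Type, Description: f.Usage}
		if f.Required {
			schema.Required = append(schema.Required, f.Name)
		}
	}
	return Tool{Name: c.Name, Description: c.Short, InputSchema: schema}
}

// demoCommand is an example command with one required and one optional flag.
func demoCommand() Command {
	return Command{
		Name:  "greet",
		Short: "Print a greeting",
		Flags: []Flag{
			{Name: "name", Type: "string", Usage: "who to greet", Required: true},
			{Name: "shout", Type: "boolean", Usage: "use capitals"},
		},
	}
}

func main() {
	b, err := json.MarshalIndent(toTool(demoCommand()), "", "  ")
	fmt.Println(string(b), err)
}
```

Every field in the tool definition is read off the command. There is nothing in it that a human had to write a second time.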

Configuring an assistant to use it

On the assistant’s side it’s just as undramatic. You tell your AI client (Claude Desktop, Cursor, anything MCP-aware) to launch mytool mcp. From then on the assistant:

Starts your tool in MCP mode when it boots.
Discovers every command as a callable tool.
Calls the right one, with the right parameters, when a user’s request needs it.

Your CLI tool has quietly become something the AI can pick up and use, mid-conversation, on its own initiative.

The safety property worth noticing

Now, “let an AI run things on my machine” is rightly a sentence that makes people nervous. It makes me nervous, and I built the thing. So it’s worth noticing the constraint sitting quietly in this design.

The AI can only call what you defined. The tools it sees are exactly the commands in your tree, and the parameters it can pass are exactly the flags and arguments you declared, validated against the JSON Schema generated from them.

It can’t invent a command. It can’t pass a parameter you never defined. The boundary of what the agent can do is the boundary of what your CLI does, and you drew that boundary already, back when you built the tool. Exposing the CLI over MCP doesn’t widen the surface one inch. It just makes the existing surface reachable. The AI isn’t running things. It’s running your commands, the ones you wrote, tested and shipped, and nothing else.
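
If it helps to see the boundary as code, here's a sketch of the kind of check that enforces it. It is an allow-list reduced to its bare bones, not the framework's actual JSON Schema validation, and the registry type, its check method and the demo values are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// registry maps each exposed tool name to the parameter names it declares.
// In this sketch it stands in for the schema generated from the command tree.
type registry map[string]map[string]bool

// check rejects calls to tools that were never defined and
// arguments that the tool never declared.
func (r registry) check(tool string, args map[string]any) error {
	params, ok := r[tool]
	if !ok {
		return fmt.Errorf("unknown tool %q", tool)
	}
	var unknown []string
	for name := range args {
		if !params[name] {
			unknown = append(unknown, name)
		}
	}
	if len(unknown) > 0 {
		sort.Strings(unknown) // stable message regardless of map order
		return fmt.Errorf("tool %q: unknown parameters %v", tool, unknown)
	}
	return nil
}

// allowed is a convenience wrapper that reduces check to a yes or no.
func allowed(r registry, tool string, args map[string]any) bool {
	return r.check(tool, args) == nil
}

// demoRegistry exposes a single tool with two declared parameters.
func demoRegistry() registry {
	return registry{"greet": {"name": true, "shout": true}}
}

func main() {
	r := demoRegistry()
	fmt.Println(r.check("greet", map[string]any{"name": "scout"}))
	fmt.Println(r.check("greet", map[string]any{"force": true}))
	fmt.Println(r.check("delete-everything", nil))
}
```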

The gist

A CLI tool, built properly, is already a structured description of a set of capabilities: named operations, descriptions, typed parameters. Which is also exactly what an AI agent needs in order to call a tool. The gap between the two is only a translator, and writing a bespoke one per assistant is a treadmill you don’t need to step onto.

go-tool-base puts the translator in the framework. Every tool gets an mcp command that serves the command tree over the Model Context Protocol… commands become tools, descriptions become descriptions, flags become JSON Schema parameters, with no second schema to maintain. Point any MCP-aware assistant at it and your CLI is an agent-callable tool, bounded to exactly the commands you shipped.

You did the hard part when you built a good CLI. MCP just opens the door you’d already framed.

go-tool-base: I got tired of reinventing the wheel

Wed, 18 Mar 2026 00:00:00 +0000

If you’ve written more than two or three command-line tools in Go, you’ll recognise the shape of the first afternoon. I certainly do! You reach for Cobra for the command tree, Viper for config, and then you start the part nobody ever puts in the README… the plumbing.

Where does config live? A file, an env var, an embedded default? In what order do they override each other? How does the tool tell the user there’s a newer version, and how does it actually update itself? What does logging look like, and is it the same logging the next tool will use? And how do you wire all of that into each command without every command reaching into a pile of globals?

None of it is hard. That’s the problem! It’s not hard, it’s just there, every single time, and every single time I’d find myself reinventing it slightly differently to the last time. Different override precedence here. A subtly different update flow there. Logging that didn’t quite match the tool I’d written three months earlier. Each new tool was a fresh re-litigation of decisions I’d already made and then promptly forgotten.

Now, I’ve banged on about the Boy Scout rule for years (leave the codebase better than you found it), but it has an uncomfortable corollary. If you keep turning up to the same campsite and finding it in the same mess, at some point the honest thing to do is to stop tidying it and go and build a better campsite.

First, just packages

So I started pulling the recurring pieces out into their own packages. Nothing grand. A config package that did the hierarchical merge the way I always ended up doing it anyway. A version package that knew how to compare semver and spot a development build. A setup package that handled first-run bootstrap and self-updating from a release. They lived as separate repos, and if you go digging through my GitHub history you can still find the scruffy ancestors of them scattered about.

Starting with separate packages was the right first move. It forced each piece to stand on its own and earn its keep on a real project before I trusted it on the next one. A package that’s only ever been used in the repo it was born in hasn’t really been tested… it’s just been agreed with.

But separate packages come with a tax. Each one has its own release cadence, its own changelog, its own CI. Worse, they have to agree with each other at the seams, and when they’re versioned independently those seams drift. I’d bump the config package, and the setup package that depended on it would quietly need a matching bump, and the tool that used both would need telling about both. I’d traded “reinvent the wheel” for “keep a dozen wheels in sync”, and I’m really not convinced that’s a better deal.

Then, one library

Once the packages had been used enough (used in anger, on real tools, by people who weren’t me) the shape of them stopped moving. The interfaces settled. The arguments about precedence and defaults were over, because the answers had survived contact with reality.

That’s the point where separate packages stop being a virtue and start being friction. So I forged them into one and called it go-tool-base. One module, one version number, one changelog, and one set of seams that are now internal and can’t drift, because they ship together.

The heart of it is a dependency-injection container, a Props struct, that holds the things every command needs: the logger, the config, the embedded assets, the filesystem handle, the error handler, the tool’s own metadata. Commands are handed Props explicitly rather than reaching for globals, which means a command is just a function of its inputs and is therefore trivially testable. That one decision has quietly paid for itself on every tool I’ve built since.

Around that container sits all the stuff I was so tired of rewriting: hierarchical config, structured logging, version checking, self-update from GitHub or GitLab releases, an interactive TUI documentation browser, AI integration, service lifecycle management. A new tool inherits the lot and gets to spend its first afternoon on the thing that’s actually novel… its own logic.

Finally, a generator

A library still leaves you staring at a blank main.go. You still have to know the conventions, wire the container, lay out the directories, register the commands. All knowable, but all boilerplate. And boilerplate is exactly the enemy I set out to kill in the first place.

So go-tool-base ships a generator. gtb generate skeleton scaffolds a complete, working, idiomatic project: directory layout, the wired Props container, the command tree, CI, the whole lot. gtb generate command adds a new command and registers it for you. The generator also handles upkeep: when the framework’s conventions move, it can regenerate the scaffolding of an existing project without trampling all over the code you’ve written on top. (That last bit turned out to be a properly interesting problem in its own right, and a future post.)

The goal is blunt. Creating a CLI tool should be about the tool, not the scaffolding. The first afternoon should be spent on the part that’s actually worth writing.

One thing I was careful about

There’s a nasty failure mode with “batteries-included” frameworks: the day you outgrow them, they hold you hostage. You either stay inside the framework’s worldview forever, or you face a rewrite. I’ve been burned by that before and I had no intention of inflicting it on anyone else.

So go-tool-base generates idiomatic, standard-library-compliant Go. There’s no magic runtime you can’t see, no clever code you couldn’t have written by hand. If you ever outgrow the framework the generated code stands on its own and you walk away with a perfectly normal Go project. A framework should be a starting point you’re glad you took, not a room you can’t get out of.

Where this leaves me

go-tool-base exists because I was spending the first afternoon of every Go CLI tool rebuilding the same plumbing, and rebuilding it slightly wrong relative to last time. It started life as separate packages so each piece could earn its place on real projects; once they’d stopped moving I forged them into a single library so the seams couldn’t drift; and then I wrapped a generator around it so a new tool starts as a working project rather than a blank file.

It’s a framework for the unglamorous 80% (config, versioning, updates, logging, lifecycle) so you can spend your time on the 20% that’s actually yours.

Over the coming posts I’ll dig into the individual pieces… the generator that won’t clobber your edits, the credential handling, the self-update integrity checks, and a few Go techniques I’m rather pleased with along the way. Stay tuned!
