<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>AI on PHP Boy Scout</title><link>https://blog-570662.gitlab.io/tags/ai/</link><description>Recent content in AI on PHP Boy Scout</description><generator>Hugo -- gohugo.io</generator><language>en-gb</language><copyright>Matt Cockayne</copyright><lastBuildDate>Sat, 02 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://blog-570662.gitlab.io/tags/ai/index.xml" rel="self" type="application/rss+xml"/><item><title>Supporting a provider, or actually using it</title><link>https://blog-570662.gitlab.io/supporting-a-provider-or-actually-using-it/</link><pubDate>Sat, 02 May 2026 00:00:00 +0000</pubDate><guid>https://blog-570662.gitlab.io/supporting-a-provider-or-actually-using-it/</guid><description>&lt;img src="https://blog-570662.gitlab.io/supporting-a-provider-or-actually-using-it/cover-supporting-a-provider-or-actually-using-it.png" alt="Featured image of post Supporting a provider, or actually using it" /&gt;&lt;p&gt;If your CLI tool talks to an AI model, you don&amp;rsquo;t want to hard-wire one vendor. So you reach for a single client interface over several providers, which is the right call. The trap is the next step: build that interface on only what every provider has in common, and you quietly throw away the very features that made you want a particular provider in the first place. rust-tool-base&amp;rsquo;s &lt;code&gt;rtb-ai&lt;/code&gt; refuses to make that trade.&lt;/p&gt;
&lt;h2 id="the-pull-toward-one-interface"&gt;The pull toward one interface
&lt;/h2&gt;&lt;p&gt;If your CLI tool talks to an AI model, hard-wiring one vendor is a poor bet. One user has an Anthropic key, another an OpenAI key. Someone&amp;rsquo;s on Gemini. Someone runs Ollama locally because their data can&amp;rsquo;t leave the building. Someone points at an OpenAI-compatible endpoint from a provider you&amp;rsquo;ve never heard of. You don&amp;rsquo;t want a separate code path for each, so you want one &lt;code&gt;AiClient&lt;/code&gt; that all of them slot behind.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;rtb-ai&lt;/code&gt; gets that unification from the &lt;code&gt;genai&lt;/code&gt; crate, which already speaks to Anthropic, OpenAI, Gemini, Ollama and OpenAI-compatible endpoints. One interface, five providers, the tool author picks one in config. The Go sibling makes the same bet: go-tool-base&amp;rsquo;s &lt;code&gt;chat&lt;/code&gt; package also unifies several providers, behind &lt;a class="link" href="https://blog-570662.gitlab.io/an-ai-interface-that-fits-on-one-screen/" &gt;an interface deliberately kept to four methods&lt;/a&gt;. So far this is the obvious design, and if it were the whole design there&amp;rsquo;d be nothing to write about.&lt;/p&gt;
&lt;h2 id="what-unified-quietly-costs-you"&gt;What &amp;ldquo;unified&amp;rdquo; quietly costs you
&lt;/h2&gt;&lt;p&gt;Here&amp;rsquo;s the catch in any unified interface. It can only expose what every provider behind it has in common.&lt;/p&gt;
&lt;p&gt;The common subset is plain chat. Messages go in, text comes out, optionally streamed token by token. That&amp;rsquo;s real and it&amp;rsquo;s useful and every provider does it. But the common subset is also the &lt;em&gt;floor&lt;/em&gt;, and the features that make a particular provider worth choosing are almost never on the floor. They&amp;rsquo;re the things only that provider does.&lt;/p&gt;
&lt;p&gt;Anthropic is the sharp example, because it has three features that matter and not one of them is common-subset.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Prompt caching.&lt;/strong&gt; You can mark the stable parts of a request, the system prompt and the tool list, as cacheable. The provider keeps them warm, and on the next turn you aren&amp;rsquo;t billed to re-send and re-process text that didn&amp;rsquo;t change. On a long agent loop, where the same large system prompt rides along on every single turn, that&amp;rsquo;s a substantial saving in both cost and latency.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Extended thinking.&lt;/strong&gt; The model works through a hard problem in a visible, budgeted reasoning pass before it commits to an answer, and you can see that reasoning.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Citations.&lt;/strong&gt; Structured references back to source material in the response.&lt;/p&gt;
&lt;p&gt;A client built strictly on the common subset can&amp;rsquo;t express any of those. It has no field for them, because four of the five providers wouldn&amp;rsquo;t know what to do with the field. So a purely lowest-common-denominator client would &amp;ldquo;support&amp;rdquo; Anthropic and then use it badly, leaving its best features unreachable. Support as a checkbox, not as the point.&lt;/p&gt;
&lt;h2 id="the-escape-hatch"&gt;The escape hatch
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;rtb-ai&lt;/code&gt;&amp;rsquo;s answer is to not choose. It runs two implementations under one interface.&lt;/p&gt;
&lt;p&gt;For OpenAI, Gemini, Ollama and OpenAI-compatible endpoints, calls route through &lt;code&gt;genai&lt;/code&gt;, the unified path. For Anthropic, every method drops to a direct &lt;code&gt;reqwest&lt;/code&gt; implementation straight against the Messages API. Same &lt;code&gt;AiClient&lt;/code&gt; on the surface, a different implementation underneath, selected by which provider the config names.&lt;/p&gt;
&lt;p&gt;And the request type has deliberate room for the difference:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-rust" data-lang="rust"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;pub&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;ChatRequest&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;pub&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;system&lt;/span&gt;: &lt;span class="nb"&gt;Option&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;pub&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;: &lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;pub&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;: &lt;span class="nb"&gt;Option&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;f32&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;pub&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;max_tokens&lt;/span&gt;: &lt;span class="nb"&gt;Option&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;u32&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="sd"&gt;/// Anthropic-only: enables prompt caching at every stable point.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="sd"&gt;/// Ignored on non-Anthropic providers.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;pub&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;cache_control&lt;/span&gt;: &lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="sd"&gt;/// Anthropic-only: extended-thinking budget. `None` disables.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="sd"&gt;/// Ignored on non-Anthropic providers.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;pub&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;thinking&lt;/span&gt;: &lt;span class="nb"&gt;Option&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ThinkingMode&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Set &lt;code&gt;cache_control&lt;/code&gt; and the Anthropic-direct path inserts cache breakpoints at the three stable points: the system prompt, the tool list, and the first message. Set &lt;code&gt;thinking&lt;/code&gt; and it adds the thinking block, and streaming surfaces a separate &lt;code&gt;ThinkingToken&lt;/code&gt; event so you can show the reasoning apart from the answer. On a non-Anthropic provider, both fields are simply ignored. The interface carries them; only the implementation that understands them acts on them.&lt;/p&gt;
&lt;h2 id="a-hatch-not-a-leak"&gt;A hatch, not a leak
&lt;/h2&gt;&lt;p&gt;It&amp;rsquo;s worth being precise about why this isn&amp;rsquo;t the thing it superficially resembles, which is a leaky abstraction.&lt;/p&gt;
&lt;p&gt;A leaky abstraction is one where implementation details bleed through that you didn&amp;rsquo;t intend and can&amp;rsquo;t reason about. The abstraction quietly fails to abstract, and you&amp;rsquo;re left guessing which provider you&amp;rsquo;re really talking to.&lt;/p&gt;
&lt;p&gt;This is the opposite of that. The two Anthropic-only fields aren&amp;rsquo;t a leak. They&amp;rsquo;re named, documented as Anthropic-only, inert everywhere else, and right there in the public type for anyone to see. The interface is uniform for the common case and &lt;em&gt;deliberately, visibly&lt;/em&gt; non-uniform at exactly the points where uniformity would have cost you the good features. You opt into provider-specifics by setting a field. You stay fully portable by leaving it at its default. Nothing bleeds; you decide.&lt;/p&gt;
&lt;p&gt;The same design line explains what &lt;em&gt;does&lt;/em&gt; stay in the unified path. Structured output, &lt;code&gt;chat_structured::&amp;lt;T&amp;gt;&lt;/code&gt;, sends a JSON Schema derived from your Rust type with the request and validates the reply against it before handing you a typed &lt;code&gt;T&lt;/code&gt;. That&amp;rsquo;s a portability win that costs nothing across providers, so it belongs in the common interface. The split isn&amp;rsquo;t &amp;ldquo;Anthropic versus the rest&amp;rdquo;. It&amp;rsquo;s &amp;ldquo;features that are free to unify go in the unified path; features that aren&amp;rsquo;t get a designed door&amp;rdquo;. Prompt caching and extended thinking get the door, because flattening them away would be the expensive kind of convenient.&lt;/p&gt;
&lt;h2 id="to-sum-up"&gt;To sum up
&lt;/h2&gt;&lt;p&gt;A CLI tool that integrates AI wants one client over several providers, and a unified interface can only expose what those providers share. The shared floor is plain chat, and the features worth choosing a provider &lt;em&gt;for&lt;/em&gt;, like Anthropic&amp;rsquo;s prompt caching, extended thinking and citations, are never on the floor.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;rtb-ai&lt;/code&gt; keeps both. &lt;code&gt;genai&lt;/code&gt; provides the unified path across five providers; an Anthropic-direct &lt;code&gt;reqwest&lt;/code&gt; path drops below the abstraction for the features &lt;code&gt;genai&lt;/code&gt; can&amp;rsquo;t reach, and &lt;code&gt;ChatRequest&lt;/code&gt; carries the Anthropic-only fields openly, ignored elsewhere. Uniform where uniformity is free, with a designed escape hatch where it isn&amp;rsquo;t. That&amp;rsquo;s the difference between supporting a provider and actually using it.&lt;/p&gt;</description></item><item><title>Testing code that calls an LLM: yes, you actually can</title><link>https://blog-570662.gitlab.io/testing-code-that-calls-an-llm/</link><pubDate>Wed, 08 Apr 2026 00:00:00 +0000</pubDate><guid>https://blog-570662.gitlab.io/testing-code-that-calls-an-llm/</guid><description>&lt;img src="https://blog-570662.gitlab.io/testing-code-that-calls-an-llm/cover-testing-code-that-calls-an-llm.png" alt="Featured image of post Testing code that calls an LLM: yes, you actually can" /&gt;&lt;p&gt;&amp;ldquo;You can&amp;rsquo;t test code that calls an AI.&amp;rdquo; I&amp;rsquo;ve heard it said with great confidence, and it&amp;rsquo;s half right, which is the most dangerous kind of right. You genuinely can&amp;rsquo;t assert on what a non-deterministic model says. But the model isn&amp;rsquo;t your code, and the bits sitting either side of it most certainly are.&lt;/p&gt;
&lt;h2 id="you-cant-test-ai-code"&gt;&amp;ldquo;You can&amp;rsquo;t test AI code&amp;rdquo;
&lt;/h2&gt;&lt;p&gt;It&amp;rsquo;s a fair worry. Your command calls an LLM. The LLM returns something slightly different every run. A test that asserts &lt;code&gt;response == &amp;quot;...&amp;quot;&lt;/code&gt; is broken before you&amp;rsquo;ve finished typing it. So the conclusion arrives quickly: the AI path can&amp;rsquo;t be tested, leave it uncovered.&lt;/p&gt;
&lt;p&gt;Which is a shame, because the AI call is usually the riskiest line in the whole command.&lt;/p&gt;
&lt;p&gt;The conclusion is also wrong. It mistakes &amp;ldquo;I can&amp;rsquo;t test the model&amp;rdquo; for &amp;ldquo;I can&amp;rsquo;t test my code&amp;rdquo;. The model is not your code. Your code is the two pieces sitting on either side of it.&lt;/p&gt;
&lt;h2 id="your-code-is-a-prompt-and-a-handler"&gt;Your code is a prompt and a handler
&lt;/h2&gt;&lt;p&gt;Strip the command down to what it actually does:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;It builds a prompt. It assembles a system prompt, the user&amp;rsquo;s input, perhaps some context, and sends it.&lt;/li&gt;
&lt;li&gt;The model does something. This is not your code.&lt;/li&gt;
&lt;li&gt;It takes the response and does something with it. It parses it, branches on it, prints it, stores it.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Steps one and three are entirely yours, and entirely deterministic. The same inputs build the same prompt and handle the same response the same way, every single time. That&amp;rsquo;s testable. Step two is the only part that isn&amp;rsquo;t, and step two was never yours to test in the first place.&lt;/p&gt;
&lt;p&gt;So the job is to pin step two to a known value, and then test one and three properly.&lt;/p&gt;
&lt;h2 id="test-the-prompt-snapshot-it"&gt;Test the prompt: snapshot it
&lt;/h2&gt;&lt;p&gt;Step one produces a prompt, and a prompt is just a string, which means you can pin it.&lt;/p&gt;
&lt;p&gt;Both frameworks lean on snapshot testing here. go-tool-base uses a golden-file approach: the prompt your code generates is recorded to a file, and the test re-generates it and compares against that file. rust-tool-base does the same with &lt;code&gt;insta&lt;/code&gt;, snapshotting the request body the client would send.&lt;/p&gt;
&lt;p&gt;The reason this matters is that the prompt is load-bearing and quietly easy to break. You refactor how context gets assembled. Without noticing, you&amp;rsquo;ve changed the wording, or the ordering, or dropped a line the model was leaning on. Nothing fails to compile. The behaviour just drifts, silently.&lt;/p&gt;
&lt;p&gt;A snapshot test catches exactly that. It fails, it shows you the diff between the old prompt and the new one, and it makes you stop and make a decision. Was this change intended? If yes, you accept the new snapshot and move on. If no, you&amp;rsquo;ve just caught a bug before it shipped. Either way the prompt never changes by accident, which for AI code is most of the battle.&lt;/p&gt;
&lt;h2 id="test-the-handler-mock-the-response"&gt;Test the handler: mock the response
&lt;/h2&gt;&lt;p&gt;Step three needs a response to handle, and in a unit test you don&amp;rsquo;t get that response from the real model. You supply it.&lt;/p&gt;
&lt;p&gt;go-tool-base ships generated mocks for the &lt;code&gt;ChatClient&lt;/code&gt; interface. A test builds a mock client, tells it &amp;ldquo;when &lt;code&gt;Ask&lt;/code&gt; is called, return &lt;em&gt;this&lt;/em&gt; canned value&amp;rdquo;, and runs the command against it:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nx"&gt;mockClient&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;mock_chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;NewMockChatClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nx"&gt;mockClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;EXPECT&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;Ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;mock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Anything&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;mock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Anything&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;mock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AnythingOfType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#34;*main.Analysis&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;RunAndReturn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;_&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;_&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;.(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;Analysis&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Analysis&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;Severity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;critical&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Because &lt;a class="link" href="https://blog-570662.gitlab.io/an-ai-interface-that-fits-on-one-screen/" &gt;the interface is only four methods&lt;/a&gt;, that mock is trivial to set up and complete by construction. rust-tool-base takes the same idea one layer down: HTTP-bound tests use &lt;code&gt;wiremock&lt;/code&gt;, which stands up a fake server returning a canned response body. The client makes a real HTTP request; it just goes to a fake endpoint the test controls.&lt;/p&gt;
&lt;p&gt;Either way, step two is now fixed to a value you chose, which makes step three deterministic. And that unlocks the tests that actually matter: given a malformed response, does the command fail gracefully? Given a rate-limit error, an empty answer, a field missing? Those are the cases a live model almost never hands you on demand, and a mock hands you every time, on the first run.&lt;/p&gt;
&lt;p&gt;This is, incidentally, the same discipline as &lt;a class="link" href="https://blog-570662.gitlab.io/the-test-mocking-pattern-that-races/" &gt;the test-mocking work elsewhere in the framework&lt;/a&gt;: the dependency is injected, so the test gets to decide what it does.&lt;/p&gt;
&lt;h2 id="what-you-deliberately-dont-test"&gt;What you deliberately don&amp;rsquo;t test
&lt;/h2&gt;&lt;p&gt;One honest boundary. None of this tests whether the model gives &lt;em&gt;good&lt;/em&gt; answers. That question is real, but it&amp;rsquo;s a different activity (evaluations, run as their own suite) and not something to mix into the unit tests.&lt;/p&gt;
&lt;p&gt;The unit suite&amp;rsquo;s job is your code: that it builds a sound prompt, and that it handles every shape of response correctly, including the ugly ones. Keep that well away from &amp;ldquo;is the model clever today&amp;rdquo;. A unit test that depends on the model being clever is a unit test that fails when the weather changes, and a flaky test just teaches people to ignore the whole suite.&lt;/p&gt;
&lt;h2 id="what-it-comes-down-to"&gt;What it comes down to
&lt;/h2&gt;&lt;p&gt;Code that calls an LLM is testable; the model is not, and those are different statements. Your code is a prompt builder and a response handler, both deterministic, with the model sat in between.&lt;/p&gt;
&lt;p&gt;go-tool-base and rust-tool-base converge on the same approach. Snapshot the prompt, with golden files or &lt;code&gt;insta&lt;/code&gt;, so a refactor can&amp;rsquo;t change what you send without a test noticing. Mock the response, with generated &lt;code&gt;ChatClient&lt;/code&gt; mocks or a &lt;code&gt;wiremock&lt;/code&gt; server, so tests run with no network and you can feed in the malformed and error cases a real model won&amp;rsquo;t reliably produce. Leave &amp;ldquo;are the answers any good&amp;rdquo; to a separate evaluation suite. Test the two halves you own, and the non-determinism in the middle stops being an excuse to leave the riskiest line uncovered.&lt;/p&gt;</description></item><item><title>The AI provider that isn't an API</title><link>https://blog-570662.gitlab.io/the-ai-provider-that-isnt-an-api/</link><pubDate>Mon, 06 Apr 2026 00:00:00 +0000</pubDate><guid>https://blog-570662.gitlab.io/the-ai-provider-that-isnt-an-api/</guid><description>&lt;img src="https://blog-570662.gitlab.io/the-ai-provider-that-isnt-an-api/cover-the-ai-provider-that-isnt-an-api.png" alt="Featured image of post The AI provider that isn't an API" /&gt;&lt;p&gt;go-tool-base&amp;rsquo;s &lt;code&gt;chat&lt;/code&gt; package puts five AI providers behind one interface. Four of them are exactly what you&amp;rsquo;d guess: HTTP calls to OpenAI, Claude, Gemini, and anything OpenAI-compatible. The fifth one isn&amp;rsquo;t an API at all. It shells out to a binary.&lt;/p&gt;
&lt;p&gt;That sounds like a slightly mad thing to want, right up until you&amp;rsquo;ve worked somewhere the network says no.&lt;/p&gt;
&lt;h2 id="the-fifth-provider-shells-out"&gt;The fifth provider shells out
&lt;/h2&gt;&lt;p&gt;The &lt;code&gt;chat&lt;/code&gt; package speaks to five providers through one &lt;code&gt;ChatClient&lt;/code&gt; interface. Four of them are what you&amp;rsquo;d expect: HTTP requests to OpenAI, to Claude, to Gemini, to any OpenAI-compatible endpoint. The tool author picks one in config, and the rest of the code never knows the difference.&lt;/p&gt;
&lt;p&gt;The fifth, &lt;code&gt;ProviderClaudeLocal&lt;/code&gt;, is different in kind. It doesn&amp;rsquo;t make an HTTP request at all. It shells out. It runs the &lt;code&gt;claude&lt;/code&gt; CLI binary as a child process, passes the prompt in, and reads the answer back from the binary&amp;rsquo;s output.&lt;/p&gt;
&lt;p&gt;That sounds like an odd thing to want until you&amp;rsquo;ve been stuck in the environment it was built for.&lt;/p&gt;
&lt;h2 id="why-youd-want-that"&gt;Why you&amp;rsquo;d want that
&lt;/h2&gt;&lt;p&gt;Picture a corporate network with its egress locked right down. Outbound HTTPS to &lt;code&gt;api.anthropic.com&lt;/code&gt; is blocked by policy. A tool built on go-tool-base that uses AI would simply fall over there. It tries to reach the API, there&amp;rsquo;s no route, and that&amp;rsquo;s the end of the feature.&lt;/p&gt;
&lt;p&gt;But the developer at that machine has the &lt;code&gt;claude&lt;/code&gt; CLI installed, and has run &lt;code&gt;claude login&lt;/code&gt;. That binary is permitted. It&amp;rsquo;s an approved, managed tool, and it has its own sanctioned path out. The direct API call is blocked; the &lt;code&gt;claude&lt;/code&gt; command is not.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;ProviderClaudeLocal&lt;/code&gt; is what bridges those two facts. If your tool&amp;rsquo;s AI calls go &lt;em&gt;through&lt;/em&gt; that already-blessed binary instead of straight at the API, they work, in an environment where the direct call cannot. That&amp;rsquo;s the whole reason the provider exists. It isn&amp;rsquo;t faster (a real API call has lower latency) and it isn&amp;rsquo;t more capable. It&amp;rsquo;s for the place where the API call simply isn&amp;rsquo;t an option, and &amp;ldquo;isn&amp;rsquo;t an option&amp;rdquo; is a surprisingly common place to find yourself inside a large organisation.&lt;/p&gt;
&lt;h2 id="what-it-costs-honestly"&gt;What it costs, honestly
&lt;/h2&gt;&lt;p&gt;It&amp;rsquo;s worth being straight about the trade, because &lt;code&gt;ProviderClaudeLocal&lt;/code&gt; is the reduced-capability provider.&lt;/p&gt;
&lt;p&gt;It doesn&amp;rsquo;t do tool calling. It doesn&amp;rsquo;t do parallel tools. It doesn&amp;rsquo;t stream. Those need a live, structured connection to the model&amp;rsquo;s API, and a subprocess that runs once and prints an answer is not that. What it &lt;em&gt;does&lt;/em&gt; support is plain chat and structured output, the latter through the binary&amp;rsquo;s own &lt;code&gt;--json-schema&lt;/code&gt; flag.&lt;/p&gt;
&lt;p&gt;So the honest positioning, and the package&amp;rsquo;s documentation says exactly this, is: prefer the API providers when you can reach them, because they&amp;rsquo;re lower latency and feature-complete. Reach for &lt;code&gt;ProviderClaudeLocal&lt;/code&gt; when API access is restricted. You accept the narrower capability set as the price of working at all. For a tool whose AI feature is &amp;ldquo;answer a question&amp;rdquo; or &amp;ldquo;return a structured analysis&amp;rdquo;, that price is often nothing you&amp;rsquo;d even notice. For one built on an agentic tool-calling loop, it&amp;rsquo;s a real limitation, and you&amp;rsquo;d know to expect it.&lt;/p&gt;
&lt;h2 id="how-it-stays-behind-the-same-interface"&gt;How it stays behind the same interface
&lt;/h2&gt;&lt;p&gt;Here&amp;rsquo;s the part that makes it pleasant rather than a special case to maintain. Despite being a subprocess and not an API, &lt;code&gt;ProviderClaudeLocal&lt;/code&gt; is still a &lt;code&gt;ChatClient&lt;/code&gt;. Your feature code calls &lt;code&gt;Chat&lt;/code&gt; and &lt;code&gt;Ask&lt;/code&gt; exactly the way it would for any other provider.&lt;/p&gt;
&lt;p&gt;Everything that makes a subprocess provider awkward stays inside the provider. Spawning the binary, feeding it the prompt, parsing its output, capturing &lt;code&gt;stderr&lt;/code&gt; and surfacing it when the binary exits non-zero, and threading multi-turn continuity through session identifiers passed back on the next call with &lt;code&gt;--resume&lt;/code&gt;: all of that is the provider&amp;rsquo;s problem, and all of it sits behind the interface. The code in your tool that uses AI doesn&amp;rsquo;t know, and has no way to find out, that this particular provider is a child process rather than an HTTPS call.&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s a unified interface genuinely earning its place. It&amp;rsquo;s easy to put a uniform face on four things that already work the same way underneath. The real test of the abstraction is whether something that works in a &lt;em&gt;completely&lt;/em&gt; different way, a subprocess instead of a socket, can still slot in without the caller changing a line. Here it can. You swap one config value, and a tool that talked to an API now talks through a binary, and nothing downstream so much as blinks.&lt;/p&gt;
&lt;h2 id="the-bottom-line"&gt;The bottom line
&lt;/h2&gt;&lt;p&gt;go-tool-base&amp;rsquo;s &lt;code&gt;chat&lt;/code&gt; package puts five providers behind one &lt;code&gt;ChatClient&lt;/code&gt; interface, and &lt;code&gt;ProviderClaudeLocal&lt;/code&gt; is the one that isn&amp;rsquo;t an API. It runs the locally installed, pre-authenticated &lt;code&gt;claude&lt;/code&gt; CLI as a subprocess.&lt;/p&gt;
&lt;p&gt;It exists for the locked-down environment where outbound HTTPS to the AI API is blocked but the &lt;code&gt;claude&lt;/code&gt; binary is allowed: there, AI features keep working where a direct call would fail. The trade is a narrower capability set (no tool calling, no streaming, plain chat and structured output only) so you prefer the API providers when you can reach them and fall back to this when you can&amp;rsquo;t. And because it&amp;rsquo;s still a &lt;code&gt;ChatClient&lt;/code&gt;, all the subprocess machinery stays hidden, and your code uses it without knowing it&amp;rsquo;s there. That last part is the real test of an abstraction: a provider that works in an entirely different way still slots in unchanged.&lt;/p&gt;</description></item><item><title>AI conversations you can resume</title><link>https://blog-570662.gitlab.io/ai-conversations-you-can-resume/</link><pubDate>Sat, 04 Apr 2026 00:00:00 +0000</pubDate><guid>https://blog-570662.gitlab.io/ai-conversations-you-can-resume/</guid><description>&lt;img src="https://blog-570662.gitlab.io/ai-conversations-you-can-resume/cover-ai-conversations-you-can-resume.png" alt="Featured image of post AI conversations you can resume" /&gt;&lt;p&gt;An AI conversation is, fundamentally, its own history. The model&amp;rsquo;s next answer depends on everything said so far. And a CLI tool, by its very nature, forgets everything the moment it exits. Put those two facts together and you get the problem: run an AI command, exit, run it again, and you&amp;rsquo;re talking to someone who&amp;rsquo;s never met you.&lt;/p&gt;
&lt;h2 id="a-cli-forgets-everything"&gt;A CLI forgets everything
&lt;/h2&gt;&lt;p&gt;A long-running service keeps its state in memory for as long as it runs. A CLI tool doesn&amp;rsquo;t get that luxury. It starts, does one thing, exits. The next invocation is a brand-new process with no memory of the last one.&lt;/p&gt;
&lt;p&gt;For most commands that&amp;rsquo;s exactly right, and you wouldn&amp;rsquo;t want it any other way. But an AI conversation is a different kind of beast, because a conversation &lt;em&gt;is&lt;/em&gt; its history. The model&amp;rsquo;s next answer depends on everything said so far. Run an AI command, exit, run it again, and you&amp;rsquo;ve started a fresh conversation with someone who&amp;rsquo;s never met you. For an interactive assistant, or any AI workflow that unfolds across several invocations, that&amp;rsquo;s plainly the wrong behaviour. The user expects to pick up where they left off.&lt;/p&gt;
&lt;h2 id="save-and-restore"&gt;Save and restore
&lt;/h2&gt;&lt;p&gt;The &lt;code&gt;chat&lt;/code&gt; package handles this through a &lt;code&gt;PersistentChatClient&lt;/code&gt; interface. Like streaming, it&amp;rsquo;s an optional capability discovered with a type assertion, sitting beside &lt;a class="link" href="https://blog-570662.gitlab.io/an-ai-interface-that-fits-on-one-screen/" &gt;the four-method core&lt;/a&gt; rather than bloating it. A client that supports persistence also satisfies this interface:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;pc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.(&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;PersistentChatClient&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;snapshot&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;pc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Save&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// store the snapshot somewhere&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;A snapshot is a serialisable value that captures the conversation. You store it. Next run, you load it, &lt;code&gt;Restore&lt;/code&gt; it onto a fresh client, re-register your tools, and call &lt;code&gt;Chat&lt;/code&gt; again. &amp;ldquo;Where were we?&amp;rdquo; works, because the model is handed back the whole history.&lt;/p&gt;
&lt;h2 id="a-snapshot-is-opinionated-about-what-it-carries"&gt;A snapshot is opinionated about what it carries
&lt;/h2&gt;&lt;p&gt;The interesting part is what a snapshot does and doesn&amp;rsquo;t contain, because that&amp;rsquo;s a series of deliberate decisions.&lt;/p&gt;
&lt;p&gt;It carries the messages, the system prompt, the model name, and tool &lt;em&gt;metadata&lt;/em&gt;: the names, descriptions and parameter schemas of the tools that were registered.&lt;/p&gt;
&lt;p&gt;It does not carry tool &lt;em&gt;handlers&lt;/em&gt;. Handlers are code, not data; you can&amp;rsquo;t serialise a function meaningfully, so after a restore you re-register them with &lt;code&gt;SetTools&lt;/code&gt;. The snapshot remembers that a tool called &lt;code&gt;read_file&lt;/code&gt; existed and what its shape was; it doesn&amp;rsquo;t try to remember the Go function behind it.&lt;/p&gt;
&lt;p&gt;And it does not carry API tokens. This is the one to dwell on. A snapshot is a file. A file gets synced, backed up, copied between machines, attached to a support ticket by a user trying to be helpful. A snapshot that carried the API key would be a credential leak the moment it left the laptop it was made on. So the snapshot never contains a token, at all. On restore, the client picks the credential up again the ordinary way, from &lt;a class="link" href="https://blog-570662.gitlab.io/where-should-a-cli-keep-your-api-keys/" &gt;the environment or the keychain&lt;/a&gt;. The conversation and the secret are kept in separate places on purpose, and only one of them is ever in the file.&lt;/p&gt;
&lt;h2 id="encrypted-at-rest-if-you-want-it"&gt;Encrypted at rest, if you want it
&lt;/h2&gt;&lt;p&gt;The package ships a &lt;code&gt;FileStore&lt;/code&gt; that writes snapshots as JSON files, with &lt;code&gt;0600&lt;/code&gt; permissions in a &lt;code&gt;0700&lt;/code&gt; directory, and it can encrypt them. Pass &lt;code&gt;WithEncryption&lt;/code&gt; a 32-byte key and snapshots are written with AES-256-GCM.&lt;/p&gt;
&lt;p&gt;That option exists because a conversation can hold sensitive content even when it holds no credential. The log a user pasted in for analysis, the source file they asked the model to review, the internal details tucked into their questions: none of that is an API key, and all of it might be something you&amp;rsquo;d rather not have sitting in plain JSON in a backup somewhere. Encryption at rest covers it.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;FileStore&lt;/code&gt; is also careful about the snapshot identifiers it&amp;rsquo;s handed. An ID has to be a canonical UUID, and the resolved file path is checked to lie inside the store directory, so a snapshot ID arriving from an untrusted source (a CLI flag, a request payload) can&amp;rsquo;t be bent into a path-traversal that reads or writes somewhere it shouldn&amp;rsquo;t. Persisting conversations adds a small filesystem surface, and the store treats it as exactly that.&lt;/p&gt;
&lt;h2 id="the-short-version"&gt;The short version
&lt;/h2&gt;&lt;p&gt;A CLI tool forgets everything between invocations, which is correct for most commands and wrong for an AI conversation, because a conversation is its history.&lt;/p&gt;
&lt;p&gt;go-tool-base&amp;rsquo;s &lt;code&gt;chat&lt;/code&gt; package lets you persist one. &lt;code&gt;PersistentChatClient&lt;/code&gt; saves a snapshot you can store and restore later, picking the conversation back up where it ended. The snapshot is deliberate about its contents: messages, system prompt and tool metadata yes; tool handlers no, because they&amp;rsquo;re code you re-register; API tokens never, because a snapshot is a file and a file travels. The built-in &lt;code&gt;FileStore&lt;/code&gt; can encrypt snapshots at rest with AES-256-GCM and validates snapshot IDs against path traversal. Resumable conversations, without the conversation file turning into a place secrets leak from.&lt;/p&gt;</description></item><item><title>An AI agent that has to make the build pass</title><link>https://blog-570662.gitlab.io/an-ai-agent-that-has-to-make-the-build-pass/</link><pubDate>Thu, 02 Apr 2026 00:00:00 +0000</pubDate><guid>https://blog-570662.gitlab.io/an-ai-agent-that-has-to-make-the-build-pass/</guid><description>&lt;img src="https://blog-570662.gitlab.io/an-ai-agent-that-has-to-make-the-build-pass/cover-an-ai-agent-that-has-to-make-the-build-pass.png" alt="Featured image of post An AI agent that has to make the build pass" /&gt;&lt;p&gt;Most AI code generation works on a charming little principle I&amp;rsquo;ll call generate-and-hope. The model writes the code, the model stops at the closing brace, and whether the thing actually compiles is left as an exercise for you. For a snippet you paste into an editor, fine. For a whole generated command, that&amp;rsquo;s just outsourcing the disappointment.&lt;/p&gt;
&lt;p&gt;go-tool-base does something I&amp;rsquo;m rather happier with: the AI has to make the build pass before it&amp;rsquo;s allowed to claim it&amp;rsquo;s done.&lt;/p&gt;
&lt;h2 id="generate-and-hope"&gt;Generate and hope
&lt;/h2&gt;&lt;p&gt;The usual shape of AI code generation is this. You ask for code, the model produces it, and the model&amp;rsquo;s job ends at the closing brace. Whether it compiles, whether the tests pass, whether the imports even resolve, none of that has been checked. The model produced something that &lt;em&gt;looks&lt;/em&gt; right. You find out whether it &lt;em&gt;is&lt;/em&gt; right when you build it.&lt;/p&gt;
&lt;p&gt;For a snippet you paste into an editor, that&amp;rsquo;s perfectly fine. The compiler tells you in a second. But go-tool-base&amp;rsquo;s generator, driven by &lt;code&gt;gtb generate command --script&lt;/code&gt; or &lt;code&gt;--prompt&lt;/code&gt;, produces a whole command: the implementation, its tests, the lot. &amp;ldquo;Generate and hope&amp;rdquo; at that scale means handing the user a project that may or may not build, and quietly making them the one who finds out which.&lt;/p&gt;
&lt;h2 id="drafting-is-only-step-one"&gt;Drafting is only step one
&lt;/h2&gt;&lt;p&gt;So the generator doesn&amp;rsquo;t stop at drafting. Writing the first version of the implementation and its tests is step one of two. Step two is an autonomous repair agent.&lt;/p&gt;
&lt;p&gt;Once the draft is on the filesystem, a separate agent takes over. It&amp;rsquo;s an LLM running in a loop, but a loop aimed at one narrow, checkable job: make this project build and pass its tests. It isn&amp;rsquo;t asked to be creative. It&amp;rsquo;s asked to get to green.&lt;/p&gt;
&lt;h2 id="a-fixed-set-of-tools-and-no-shell"&gt;A fixed set of tools, and no shell
&lt;/h2&gt;&lt;p&gt;The agent is not handed a shell. It&amp;rsquo;s given a fixed, defined set of tools and nothing else. Three of them let it explore and edit the project: &lt;code&gt;list_dir&lt;/code&gt;, &lt;code&gt;read_file&lt;/code&gt;, &lt;code&gt;write_file&lt;/code&gt;. Four of them let it verify the project:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;go_build&lt;/code&gt; runs the build and captures the compiler errors.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;go_test&lt;/code&gt; runs the tests and captures the failures.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;go_get&lt;/code&gt; resolves a missing dependency.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;golangci_lint&lt;/code&gt; runs the project&amp;rsquo;s linter.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That restriction is the design, not a limitation of it. The agent can&amp;rsquo;t delete arbitrary files, can&amp;rsquo;t reach the network, can&amp;rsquo;t run anything that isn&amp;rsquo;t on the list. It has exactly what it needs to make code compile and nothing it would need to do damage. Its file writes are confined to the project directory by an explicit path check, so even &lt;code&gt;write_file&lt;/code&gt; can&amp;rsquo;t go wandering up into &lt;code&gt;/etc&lt;/code&gt;. A coding agent you&amp;rsquo;d actually let near a filesystem is one whose abilities are an allowlist, not a denylist. (I keep coming back to that principle through this series&amp;hellip; safety as a boundary you draw, not a behaviour you hope for.)&lt;/p&gt;
&lt;h2 id="the-loop"&gt;The loop
&lt;/h2&gt;&lt;p&gt;The repair loop is a ReAct loop, the same reason-act-observe shape as &lt;a class="link" href="https://blog-570662.gitlab.io/letting-the-ai-call-your-go-functions/" &gt;the tool-calling loop&lt;/a&gt;, only this time pointed at a goal:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The draft is on disk.&lt;/li&gt;
&lt;li&gt;Verify: run &lt;code&gt;go_build&lt;/code&gt; and &lt;code&gt;go_test&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;If verification failed, read the error logs, the compiler error or the failing test.&lt;/li&gt;
&lt;li&gt;Reason about the cause: an undefined variable, a missing import, a wrong signature.&lt;/li&gt;
&lt;li&gt;Act: call &lt;code&gt;write_file&lt;/code&gt; to patch the code, or &lt;code&gt;go_get&lt;/code&gt; to add the dependency.&lt;/li&gt;
&lt;li&gt;Loop. Steps two to five repeat until the project is green, or the agent hits its step limit, which defaults to 15.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;What makes this work is treating the error output as &lt;em&gt;feedback&lt;/em&gt; rather than as a failure to log and walk away from. A compiler error is the single most useful sentence you can hand a model that&amp;rsquo;s trying to fix code. It says what&amp;rsquo;s wrong, and usually where. The loop feeds it straight back in, and the model fixes against it.&lt;/p&gt;
&lt;h2 id="verification-changes-what-done-means"&gt;Verification changes what &amp;ldquo;done&amp;rdquo; means
&lt;/h2&gt;&lt;p&gt;Here&amp;rsquo;s the real shift, and the agent&amp;rsquo;s own documentation puts it well: the agent &amp;ldquo;doesn&amp;rsquo;t just say it fixed a bug; it uses a Test tool to verify the fix before reporting success.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;A generate-and-hope model reports success when it finishes &lt;em&gt;writing&lt;/em&gt;. It has no idea whether the code works, and it isn&amp;rsquo;t really claiming otherwise. &amp;ldquo;Done&amp;rdquo; means &amp;ldquo;I produced text&amp;rdquo;. The repair agent reports success when &lt;code&gt;go_build&lt;/code&gt; and &lt;code&gt;go_test&lt;/code&gt; actually &lt;em&gt;pass&lt;/em&gt;. &amp;ldquo;Done&amp;rdquo; means &amp;ldquo;the build is green&amp;rdquo;. Those are two completely different claims, and only the second is worth anything to the person who asked for the command.&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s the line between an AI that&amp;rsquo;s a creative writer and an AI that&amp;rsquo;s a collaborator you can hand a task to. And when the agent can&amp;rsquo;t reach green, when it spends its whole step budget and the project is still broken, the generator fails safely: it leaves the best-attempt code in place, commented out so the project still compiles, and tells the user what to finish by hand. There&amp;rsquo;s also an &lt;code&gt;--agentless&lt;/code&gt; flag for anyone who&amp;rsquo;d rather have a plain single-shot retry than the multi-step agent. The default, though, is the agent, because the default should be code that&amp;rsquo;s been checked.&lt;/p&gt;
&lt;h2 id="where-this-leaves-us"&gt;Where this leaves us
&lt;/h2&gt;&lt;p&gt;Most AI code generation generates and hopes: the model writes code and the user discovers whether it works. For a whole generated command, that pushes a may-or-may-not-build project onto the user.&lt;/p&gt;
&lt;p&gt;go-tool-base&amp;rsquo;s generator drafts the command and then hands it to an autonomous repair agent. The agent has a fixed set of tools (explore and edit the project, build it, test it, lint it, fetch dependencies) and no shell at all, with file writes confined to the project directory. It runs a ReAct loop, reading each error and patching against it, until the build is green or it exhausts its steps. The point is what &amp;ldquo;done&amp;rdquo; comes to mean: not &amp;ldquo;the model finished writing&amp;rdquo;, but &amp;ldquo;the build passes&amp;rdquo;. Only one of those is a claim worth trusting.&lt;/p&gt;</description></item><item><title>Stop regex-ing the LLM's prose</title><link>https://blog-570662.gitlab.io/stop-regexing-the-llms-prose/</link><pubDate>Tue, 31 Mar 2026 00:00:00 +0000</pubDate><guid>https://blog-570662.gitlab.io/stop-regexing-the-llms-prose/</guid><description>&lt;img src="https://blog-570662.gitlab.io/stop-regexing-the-llms-prose/cover-stop-regexing-the-llms-prose.png" alt="Featured image of post Stop regex-ing the LLM's prose" /&gt;&lt;p&gt;Ask an LLM a question and it hands you back prose. Lovely to read, miserable to program against. You wanted the one number buried in the middle of it, and now you&amp;rsquo;re writing a regular expression to fish a word out of three well-written paragraphs that phrase themselves slightly differently every single time you run them.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s a much better way, and it&amp;rsquo;s the difference between forever interpreting an LLM and actually building on one.&lt;/p&gt;
&lt;h2 id="the-problem-with-a-paragraph"&gt;The problem with a paragraph
&lt;/h2&gt;&lt;p&gt;You ask an LLM to analyse a log file and tell you the severity of what it found and a suggested fix. It comes back with three well-written paragraphs. Somewhere in there is the word &amp;ldquo;critical&amp;rdquo;, and somewhere is the fix.&lt;/p&gt;
&lt;p&gt;Your program now has to &lt;em&gt;extract&lt;/em&gt; those two facts from prose, and prose has no contract. The next run, the model phrases it differently. It leads with a caveat. It says &amp;ldquo;severe&amp;rdquo; where last time it said &amp;ldquo;critical&amp;rdquo;. It puts the fix first. Anything that worked by finding &amp;ldquo;critical&amp;rdquo; in the text is now quietly wrong, and you didn&amp;rsquo;t change a line. Parsing free text for structured facts is a game you lose slowly.&lt;/p&gt;
&lt;p&gt;What you actually wanted was never a paragraph. It was a value: a thing with a &lt;code&gt;severity&lt;/code&gt; field and a &lt;code&gt;fix&lt;/code&gt; field, that you can branch on and store and pass around like any other.&lt;/p&gt;
&lt;h2 id="ask-for-the-struct-not-the-prose"&gt;Ask for the struct, not the prose
&lt;/h2&gt;&lt;p&gt;go-tool-base&amp;rsquo;s &lt;code&gt;chat&lt;/code&gt; package draws the line with two methods. &lt;code&gt;Chat&lt;/code&gt; gives you text. &lt;code&gt;Ask&lt;/code&gt; gives you a struct.&lt;/p&gt;
&lt;p&gt;You define the Go type you want back:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Analysis&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Severity&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;`json:&amp;#34;severity&amp;#34;`&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Fix&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;`json:&amp;#34;fix&amp;#34;`&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Analysis&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;Analyse this log file: &amp;#34;&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="nx"&gt;logText&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The framework generates a JSON Schema from that struct, sends it to the model as the required response format, and unmarshals the reply straight into &lt;code&gt;result&lt;/code&gt;. You never lay a finger on the prose. You get &lt;code&gt;result.Severity&lt;/code&gt; and &lt;code&gt;result.Fix&lt;/code&gt;, typed, ready to use. If you want the model&amp;rsquo;s answer to drive a &lt;code&gt;switch&lt;/code&gt; statement, this is the method that lets it.&lt;/p&gt;
&lt;h2 id="the-struct-is-the-schema-is-the-contract"&gt;The struct is the schema is the contract
&lt;/h2&gt;&lt;p&gt;The detail that makes this hold up over time: you don&amp;rsquo;t write the schema. The struct &lt;em&gt;is&lt;/em&gt; the schema.&lt;/p&gt;
&lt;p&gt;The framework derives the JSON Schema from your type. In go-tool-base that&amp;rsquo;s &lt;code&gt;GenerateSchema[T]()&lt;/code&gt;; in rust-tool-base the schema comes from your Rust type through &lt;code&gt;schemars&lt;/code&gt;. (Yes, there&amp;rsquo;s a Rust sibling now. I&amp;rsquo;ll &lt;a class="link" href="https://blog-570662.gitlab.io/rust-tool-base-the-same-idea/" &gt;introduce it properly&lt;/a&gt; in a few weeks, but it keeps gatecrashing these posts because the two frameworks deliberately share ideas.) Either way there&amp;rsquo;s one definition, your type, and the schema is just a projection of it.&lt;/p&gt;
&lt;p&gt;That matters, because otherwise two things have to agree. There&amp;rsquo;s the schema you tell the model to obey, and there&amp;rsquo;s the type you unmarshal the answer into. Hand-write the schema and those two can drift: add a field to the struct, forget to add it to the schema, and the model is never told to produce it, so it silently never appears. Deriving the schema from the type collapses the two into one. They can&amp;rsquo;t disagree, because there&amp;rsquo;s only one of them.&lt;/p&gt;
&lt;h2 id="both-frameworks-with-one-extra-step-in-rust"&gt;Both frameworks, with one extra step in Rust
&lt;/h2&gt;&lt;p&gt;go-tool-base does this with &lt;code&gt;Ask&lt;/code&gt; and a &lt;code&gt;ResponseSchema&lt;/code&gt; set on the client config. rust-tool-base does it with &lt;code&gt;chat_structured::&amp;lt;T&amp;gt;&lt;/code&gt;, where &lt;code&gt;T&lt;/code&gt; is any type that&amp;rsquo;s both deserialisable and &lt;code&gt;JsonSchema&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;rust-tool-base adds one step worth calling out. Before it deserialises the model&amp;rsquo;s reply into your &lt;code&gt;T&lt;/code&gt;, it &lt;em&gt;validates&lt;/em&gt; the raw response against the schema with a JSON Schema validator. That splits the failure into two distinct, named cases: the response didn&amp;rsquo;t match the schema, or it matched the schema but still wouldn&amp;rsquo;t deserialise. A model that returns subtly wrong JSON fails loudly and specifically, with an error that tells you which of those happened, instead of quietly handing you a zero-valued struct that you end up debugging an hour later.&lt;/p&gt;
&lt;h2 id="when-youd-reach-for-it"&gt;When you&amp;rsquo;d reach for it
&lt;/h2&gt;&lt;p&gt;The line is simple, and it&amp;rsquo;s about who reads the answer.&lt;/p&gt;
&lt;p&gt;If a &lt;em&gt;human&lt;/em&gt; reads the answer, prose is right. &lt;code&gt;Chat&lt;/code&gt;, free text, let the model write well. A summary, an explanation, an interactive reply: leave all of those as prose.&lt;/p&gt;
&lt;p&gt;If a &lt;em&gt;program&lt;/em&gt; consumes the answer, you want a value. Classification, extraction, a code review scored out of a hundred with a list of issues, a yes-or-no with reasons: anything where the next thing that happens is your code branching on the result. There, &lt;code&gt;Ask&lt;/code&gt; and &lt;code&gt;chat_structured&lt;/code&gt; turn the LLM from something you have to interpret into something that returns a value, and a typed value is a thing you can actually build on.&lt;/p&gt;
&lt;h2 id="to-sum-up"&gt;To sum up
&lt;/h2&gt;&lt;p&gt;An LLM returns prose by default, and prose has no contract, so a program that picks structured facts out of it breaks the moment the model rephrases.&lt;/p&gt;
&lt;p&gt;Structured output asks for the value instead. You define a struct, the framework derives a JSON Schema from it, the model is constrained to that shape, and you get a typed result. go-tool-base&amp;rsquo;s &lt;code&gt;Ask&lt;/code&gt; and rust-tool-base&amp;rsquo;s &lt;code&gt;chat_structured&lt;/code&gt; both work this way, with the schema derived from your type so the schema and the type can&amp;rsquo;t drift; rust-tool-base additionally validates the response against the schema before deserialising. Use it whenever the answer feeds code rather than a human. It&amp;rsquo;s one of the four methods that make up &lt;a class="link" href="https://blog-570662.gitlab.io/an-ai-interface-that-fits-on-one-screen/" &gt;go-tool-base&amp;rsquo;s small chat interface&lt;/a&gt;, and it&amp;rsquo;s the one that makes an LLM safe to program against.&lt;/p&gt;</description></item><item><title>Letting the AI call your Go functions</title><link>https://blog-570662.gitlab.io/letting-the-ai-call-your-go-functions/</link><pubDate>Sun, 29 Mar 2026 00:00:00 +0000</pubDate><guid>https://blog-570662.gitlab.io/letting-the-ai-call-your-go-functions/</guid><description>&lt;img src="https://blog-570662.gitlab.io/letting-the-ai-call-your-go-functions/cover-letting-the-ai-call-your-go-functions.png" alt="Featured image of post Letting the AI call your Go functions" /&gt;&lt;p&gt;An AI that can only produce text can &lt;em&gt;describe&lt;/em&gt; your system. An AI that can call your Go functions can actually operate it. That gap, between describing and doing, is the difference between a chatbot and something genuinely useful, and crossing it comes down to one fiddly mechanism: tool-calling, and the loop that drives it.&lt;/p&gt;
&lt;h2 id="talking-about-the-system-versus-operating-it"&gt;Talking about the system versus operating it
&lt;/h2&gt;&lt;p&gt;Wire an AI provider into a CLI command and you get something that can talk. Ask it a question, get a paragraph back. Useful, up to a point.&lt;/p&gt;
&lt;p&gt;But notice the ceiling. An AI that can only generate text can &lt;em&gt;describe&lt;/em&gt; things. It can tell you what it would do. What it can&amp;rsquo;t do is look at the actual current state of your system, or take a real action, because it has no hands. It&amp;rsquo;s reasoning in a vacuum about a world it can&amp;rsquo;t reach out and touch.&lt;/p&gt;
&lt;p&gt;The thing that gives it hands is tool-calling. You hand the AI a set of functions it&amp;rsquo;s allowed to call. Now, mid-conversation, it can decide it needs to &lt;em&gt;read that file&lt;/em&gt; before it can answer, or &lt;em&gt;run that query&lt;/em&gt;, or &lt;em&gt;check that status&lt;/em&gt;, and actually go and do it, and then reason about the real result. The AI stops describing your system and starts operating it.&lt;/p&gt;
&lt;h2 id="the-loop-is-the-hard-part"&gt;The loop is the hard part
&lt;/h2&gt;&lt;p&gt;Tool-calling has a shape, and the shape is a loop. The literature calls it ReAct: Reason, Act, Observe.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The AI &lt;strong&gt;reasons&lt;/strong&gt; about the prompt and decides whether it needs a tool.&lt;/li&gt;
&lt;li&gt;If it does, it &lt;strong&gt;acts&lt;/strong&gt;, asking for a specific tool with specific arguments.&lt;/li&gt;
&lt;li&gt;Your code runs the tool and feeds the result back. The AI &lt;strong&gt;observes&lt;/strong&gt; that result.&lt;/li&gt;
&lt;li&gt;Round again. Reason about the new information, maybe call another tool, maybe several. Keep going until the AI has what it needs and produces a final text answer with no more tool calls.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Conceptually simple. Tedious and error-prone to implement by hand every single time: parsing the model&amp;rsquo;s tool-call requests, dispatching to the right function, marshalling arguments in and results out, feeding observations back in the exact format the provider expects, knowing when to stop, and not looping forever if the model gets itself stuck.&lt;/p&gt;
&lt;p&gt;That orchestration is pure plumbing, and it&amp;rsquo;s identical for every tool and every command. So you can probably guess what&amp;rsquo;s coming: go-tool-base&amp;rsquo;s &lt;code&gt;chat&lt;/code&gt; package owns it. You don&amp;rsquo;t write the loop. You write the tools.&lt;/p&gt;
&lt;h2 id="defining-a-tool"&gt;Defining a tool
&lt;/h2&gt;&lt;p&gt;A &lt;code&gt;chat.Tool&lt;/code&gt; is four things: a name, a description, a parameter schema, and a handler. The description is what the AI reads to decide &lt;em&gt;whether&lt;/em&gt; to use the tool, so it&amp;rsquo;s worth writing well. The schema describes the arguments, and you don&amp;rsquo;t hand-write it. You write a tagged Go struct and let it generate:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ReadFileParams&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Path&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;`json:&amp;#34;path&amp;#34; jsonschema_description:&amp;#34;Relative path to the file&amp;#34;`&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The struct is the contract. The framework derives the JSON Schema the AI is given straight from those tags, so the schema and the Go type the handler receives can&amp;rsquo;t drift apart, because they share a single source. The handler is then just an ordinary Go function that takes those parameters and returns a result.&lt;/p&gt;
&lt;p&gt;You register your tools with &lt;code&gt;SetTools&lt;/code&gt;, call &lt;code&gt;Chat&lt;/code&gt;, and that&amp;rsquo;s the whole of your involvement. The framework runs the ReAct loop and &lt;code&gt;Chat&lt;/code&gt; returns the AI&amp;rsquo;s final text answer once the loop settles.&lt;/p&gt;
&lt;h2 id="two-details-that-show-it-was-built-for-real-use"&gt;Two details that show it was built for real use
&lt;/h2&gt;&lt;p&gt;A couple of decisions in the loop tell you it&amp;rsquo;s meant for production, not a demo.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Tool errors don&amp;rsquo;t abort the conversation.&lt;/strong&gt; When a handler returns an error, the framework doesn&amp;rsquo;t crash the loop. It hands the error &lt;em&gt;back to the AI as a string&lt;/em&gt;, as just another observation. That&amp;rsquo;s deliberate, and it&amp;rsquo;s right. A real agent should be able to call a tool, watch it fail, and react: try different arguments, take a different route, or tell the user it couldn&amp;rsquo;t manage it. A loop that aborted on the first tool error would be far more brittle than the model driving it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The loop is bounded.&lt;/strong&gt; There&amp;rsquo;s a &lt;code&gt;MaxSteps&lt;/code&gt; limit, default 20. An AI that gets confused could otherwise call tools forever, and a CLI command that never returns is a worse failure than a wrong answer. The cap guarantees the command terminates. The agent gets room to genuinely work a problem across many steps, but not infinite room to flail about in.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s also parallel tool execution: when the model asks for several tools in a single step (three independent file reads, say) the framework runs them concurrently rather than one after another, because there&amp;rsquo;s no reason to make the AI sit and wait out a sequence of things that don&amp;rsquo;t depend on each other.&lt;/p&gt;
&lt;h2 id="boiling-it-down"&gt;Boiling it down
&lt;/h2&gt;&lt;p&gt;A text-only AI can describe your system; an AI that can call your functions can operate it. Bridging that gap means tool-calling, and tool-calling means the ReAct loop (reason, act, observe, repeat) whose orchestration is fiddly, identical every time, and not a problem worth solving twice.&lt;/p&gt;
&lt;p&gt;go-tool-base&amp;rsquo;s &lt;code&gt;chat&lt;/code&gt; package runs the loop for you. You define &lt;code&gt;chat.Tool&lt;/code&gt; values (name, description, a tagged parameter struct that generates its own schema, a handler), call &lt;code&gt;SetTools&lt;/code&gt; and &lt;code&gt;Chat&lt;/code&gt;, and get the final answer. Tool errors go back to the AI as observations so it can recover, and a &lt;code&gt;MaxSteps&lt;/code&gt; cap guarantees the command always terminates. You write Go functions. The framework turns them into things an agent can reach for.&lt;/p&gt;</description></item><item><title>Nobody reads the manual</title><link>https://blog-570662.gitlab.io/nobody-reads-the-manual/</link><pubDate>Sun, 29 Mar 2026 00:00:00 +0000</pubDate><guid>https://blog-570662.gitlab.io/nobody-reads-the-manual/</guid><description>&lt;img src="https://blog-570662.gitlab.io/nobody-reads-the-manual/cover-nobody-reads-the-manual.png" alt="Featured image of post Nobody reads the manual" /&gt;&lt;p&gt;Let me describe the actual lifecycle of a user meeting your CLI tool, because it&amp;rsquo;s a bit humbling. They run it. It doesn&amp;rsquo;t quite do what they expected. They run it again with &lt;code&gt;--help&lt;/code&gt;. They get a wall of monospaced flag descriptions, skim it, don&amp;rsquo;t find the thing they wanted, and either give up or go and ask a human who already knows.&lt;/p&gt;
&lt;p&gt;Your documentation might be magnificent. It doesn&amp;rsquo;t matter, because the user never reached it.&lt;/p&gt;
&lt;h2 id="the-manual-loses-on-location-not-quality"&gt;The manual loses on location, not quality
&lt;/h2&gt;&lt;p&gt;That&amp;rsquo;s the lifecycle, and notice exactly where it breaks. The documentation might be excellent. It might answer their precise question in full. It doesn&amp;rsquo;t matter, because it&amp;rsquo;s on a website, in another window, behind a search box, and the user is &lt;em&gt;here&lt;/em&gt;, in the terminal, mid-task. The docs lost not on quality but on &lt;em&gt;location&lt;/em&gt;. They simply weren&amp;rsquo;t where the work was.&lt;/p&gt;
&lt;p&gt;go-tool-base&amp;rsquo;s answer starts with a decision about location: the documentation gets embedded into the binary itself. Your &lt;code&gt;docs/&lt;/code&gt; folder ships &lt;em&gt;inside&lt;/em&gt; the tool, the same way its default config does. Wherever the tool is installed, the docs are right there alongside it, no network, no browser. That embedding is what makes everything else possible, and there are two things built on top of it.&lt;/p&gt;
&lt;h2 id="a-browser-in-the-terminal"&gt;A browser, in the terminal
&lt;/h2&gt;&lt;p&gt;The first is the &lt;code&gt;docs&lt;/code&gt; command, and it&amp;rsquo;s not &lt;code&gt;--help&lt;/code&gt; with extra steps. It launches a proper Terminal User Interface, built on Bubble Tea.&lt;/p&gt;
&lt;p&gt;It has a sidebar, structured from the project&amp;rsquo;s own &lt;code&gt;zensical.toml&lt;/code&gt; or &lt;code&gt;mkdocs.yml&lt;/code&gt;, so the docs are a navigable tree rather than one flat scroll. Markdown renders with real formatting through Glamour (colour, tables, lists, headings) instead of collapsing into monospaced soup. There&amp;rsquo;s live search across every page, regex included.&lt;/p&gt;
&lt;p&gt;Compared with &lt;code&gt;man&lt;/code&gt; and &lt;code&gt;--help&lt;/code&gt;, the difference isn&amp;rsquo;t a nicer coat of paint. &lt;code&gt;man&lt;/code&gt; gives you linear scrolling and grep; this gives you a structured tree, rich rendering and real search. It&amp;rsquo;s the documentation experience a modern developer expects, except it followed the tool &lt;em&gt;into&lt;/em&gt; the terminal instead of demanding the user leave it.&lt;/p&gt;
&lt;h2 id="a-documentation-assistant-that-wont-make-things-up"&gt;A documentation assistant that won&amp;rsquo;t make things up
&lt;/h2&gt;&lt;p&gt;The second thing built on the embedded docs is the one I find genuinely transformative: &lt;code&gt;docs ask&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The user doesn&amp;rsquo;t navigate anything. They just ask:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;mytool docs ask &lt;span class="s2"&gt;&amp;#34;how do I point this at a self-hosted server?&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and get a direct, specific answer. Under the hood, the framework collates the tool&amp;rsquo;s embedded markdown and hands it to the configured AI provider (Claude, OpenAI, Gemini, Claude Local, any OpenAI-compatible endpoint) as the context for the question.&lt;/p&gt;
&lt;p&gt;Now, &amp;ldquo;an AI answers questions about my tool&amp;rdquo; should immediately make you nervous, and the correct thing to be nervous about is hallucination. An AI that confidently invents a flag that doesn&amp;rsquo;t exist, or describes behaviour the tool simply doesn&amp;rsquo;t have, is worse than no assistant at all, because the user &lt;em&gt;trusts&lt;/em&gt; it.&lt;/p&gt;
&lt;p&gt;This is where embedding the docs pays off a second time, and it&amp;rsquo;s why I keep stressing that the corpus is &lt;em&gt;closed&lt;/em&gt;. The model is instructed to answer &lt;strong&gt;only&lt;/strong&gt; from the tool&amp;rsquo;s actual documentation, and the context it&amp;rsquo;s handed is exactly that documentation and nothing else. It isn&amp;rsquo;t drawing on a vague memory of similar tools from its training data. It&amp;rsquo;s answering from this tool&amp;rsquo;s real, shipped, version-matched docs. The corpus is small, closed and authoritative, which is the combination that keeps the answers honest. &amp;ldquo;Zero hallucination by design&amp;rdquo; isn&amp;rsquo;t a slogan about the model. It&amp;rsquo;s a property of bounding what the model is allowed to look at, which is the same instinct I &lt;a class="link" href="https://blog-570662.gitlab.io/your-cli-is-already-an-ai-tool/" &gt;leaned on with the &lt;code&gt;mcp&lt;/code&gt; command&lt;/a&gt;: the safety comes from the boundary you drew, not from trusting the AI to behave itself.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s a nice second-order effect, too. The answer is always about the version of the tool the user actually has, because the docs were embedded into &lt;em&gt;that build&lt;/em&gt;. No mismatch between a website documenting the latest release and the slightly older binary sitting on the user&amp;rsquo;s machine.&lt;/p&gt;
&lt;h2 id="the-upshot"&gt;The upshot
&lt;/h2&gt;&lt;p&gt;Documentation usually loses to &lt;code&gt;--help&lt;/code&gt; not on quality but on location: it&amp;rsquo;s in a browser, and the user is in the terminal. go-tool-base embeds the docs into the binary and surfaces them two ways: a &lt;code&gt;docs&lt;/code&gt; command that&amp;rsquo;s a real TUI browser with a sidebar, rich markdown and search, and &lt;code&gt;docs ask&lt;/code&gt;, which answers natural-language questions using the embedded docs as context.&lt;/p&gt;
&lt;p&gt;Because that context is the tool&amp;rsquo;s own closed, shipped documentation and the model is told to use nothing else, the assistant stays grounded, and it&amp;rsquo;s always describing the exact version the user is holding. The fix for unread documentation was never to write more of it. It was to put it where the work happens and let it answer back.&lt;/p&gt;</description></item><item><title>An AI interface that fits on one screen</title><link>https://blog-570662.gitlab.io/an-ai-interface-that-fits-on-one-screen/</link><pubDate>Fri, 27 Mar 2026 00:00:00 +0000</pubDate><guid>https://blog-570662.gitlab.io/an-ai-interface-that-fits-on-one-screen/</guid><description>&lt;img src="https://blog-570662.gitlab.io/an-ai-interface-that-fits-on-one-screen/cover-an-ai-interface-that-fits-on-one-screen.png" alt="Featured image of post An AI interface that fits on one screen" /&gt;&lt;p&gt;The moment you decide a CLI tool should talk to an LLM, there&amp;rsquo;s a strong gravitational pull towards reaching for LangChain, or one of its many relatives. It&amp;rsquo;s the obvious move. It&amp;rsquo;s also, for most CLI work, a bit like hiring a removals firm to carry a single box up the stairs.&lt;/p&gt;
&lt;p&gt;Let me explain why go-tool-base went the other way, and what &amp;ldquo;the other way&amp;rdquo; actually looks like.&lt;/p&gt;
&lt;h2 id="the-instinct-and-why-it-overshoots"&gt;The instinct, and why it overshoots
&lt;/h2&gt;&lt;p&gt;When you add AI to a tool, the instinct is to reach for the big general-purpose framework. LangChain and its relatives are capable, and they exist for a real need: orchestrating complex multi-step AI applications, with retrieval pipelines, memory stores, chains of calls, whole fleets of agents.&lt;/p&gt;
&lt;p&gt;Now look at what a CLI tool actually needs from an LLM. It needs to send a prompt and get text back. Sometimes it wants structured data back instead of prose. Sometimes it wants to let the model call a few of the tool&amp;rsquo;s own functions. That&amp;rsquo;s pretty much the whole list.&lt;/p&gt;
&lt;p&gt;Pulling in a framework built to orchestrate retrieval and agent swarms in order to do &lt;em&gt;that&lt;/em&gt; is a poor trade. You take on a large new vocabulary of concepts, a wide dependency surface, and a great deal of abstraction you&amp;rsquo;ll never touch, all to perform three or four operations. The framework isn&amp;rsquo;t wrong. It&amp;rsquo;s just answering a far bigger question than the one a CLI tool is asking.&lt;/p&gt;
&lt;h2 id="what-go-tool-base-chose-instead"&gt;What go-tool-base chose instead
&lt;/h2&gt;&lt;p&gt;go-tool-base didn&amp;rsquo;t reach for a framework. The decision is on the record in its own design notes: before a single line was written, LangChain Go, go-openai, Vercel&amp;rsquo;s AI SDK and around ten other options were evaluated, and not one of them matched what a CLI framework actually needs. So the &lt;code&gt;chat&lt;/code&gt; package was built deliberately small.&lt;/p&gt;
&lt;p&gt;How small? The entire core &lt;code&gt;ChatClient&lt;/code&gt; interface is four methods:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ChatClient&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;interface&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;Chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;Ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;question&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;SetTools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tools&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="nx"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;code&gt;Add&lt;/code&gt; appends a message to the conversation. &lt;code&gt;Chat&lt;/code&gt; sends a prompt and returns text. &lt;code&gt;Ask&lt;/code&gt; sends a prompt and returns a &lt;em&gt;typed Go struct&lt;/em&gt;, the model&amp;rsquo;s answer unmarshalled straight into a value you defined. &lt;code&gt;SetTools&lt;/code&gt; hands the model a set of your own functions it&amp;rsquo;s allowed to call. That&amp;rsquo;s the whole surface. Downstream code that uses AI never holds anything larger than this, and never has to know which provider is behind it.&lt;/p&gt;
&lt;p&gt;The package&amp;rsquo;s own documentation has a word for this: right-sized. Large enough to solve genuine provider-abstraction complexity, small enough that the full interface fits on a single screen.&lt;/p&gt;
&lt;h2 id="thin-is-not-the-same-as-does-little"&gt;&amp;ldquo;Thin&amp;rdquo; is not the same as &amp;ldquo;does little&amp;rdquo;
&lt;/h2&gt;&lt;p&gt;This is the part worth being precise about, because &amp;ldquo;four methods&amp;rdquo; can sound like &amp;ldquo;barely does anything&amp;rdquo;, and that&amp;rsquo;s the wrong read entirely.&lt;/p&gt;
&lt;p&gt;Behind those four methods sits genuinely awkward work. Five providers (OpenAI, Claude, Gemini, a locally installed &lt;code&gt;claude&lt;/code&gt; binary, and any OpenAI-compatible endpoint) each with a different wire API, all normalised behind the one interface. A &lt;a class="link" href="https://blog-570662.gitlab.io/letting-the-ai-call-your-go-functions/" &gt;tool-calling loop&lt;/a&gt;. Structured output via JSON Schema, made to behave consistently across providers that each express it differently. Error normalisation. Token chunking.&lt;/p&gt;
&lt;p&gt;The point of a thin abstraction is not that there&amp;rsquo;s little underneath it. It&amp;rsquo;s that the &lt;em&gt;interface&lt;/em&gt; stays small while the &lt;em&gt;implementation&lt;/em&gt; quietly absorbs the complexity. Four methods on the surface; five provider integrations and a tool-calling loop below the waterline. The thinness is a property of what the caller sees, not of what the package does. A reach-for-LangChain decision gets that backwards: it exposes the caller to all the machinery, whether or not the caller will ever need it.&lt;/p&gt;
&lt;h2 id="the-core-stays-small-even-as-features-grow"&gt;The core stays small even as features grow
&lt;/h2&gt;&lt;p&gt;There&amp;rsquo;s a neat detail in how &lt;code&gt;chat&lt;/code&gt; keeps the interface from creeping. The package also supports streaming responses and conversation persistence, both of which are real features with real surface area. Neither of them is in the four-method core.&lt;/p&gt;
&lt;p&gt;Instead they&amp;rsquo;re &lt;em&gt;separate, optional&lt;/em&gt; interfaces. A streaming-capable client also satisfies &lt;code&gt;StreamingChatClient&lt;/code&gt;; a persistable one also satisfies &lt;code&gt;PersistentChatClient&lt;/code&gt;. Code that wants those capabilities does a type assertion to ask for them, and code that doesn&amp;rsquo;t simply never sees them. So the common path stays four methods forever. New capabilities arrive as opt-in interfaces alongside the core, not as new methods bolted onto it. The thing that fits on one screen keeps fitting on one screen.&lt;/p&gt;
&lt;h2 id="extensible-without-forking-testable-without-a-network"&gt;Extensible without forking, testable without a network
&lt;/h2&gt;&lt;p&gt;Two more properties keep the package small without making it limiting.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s extensible. The provider list isn&amp;rsquo;t closed. A &lt;code&gt;RegisterProvider&lt;/code&gt; call lets any package contribute a new provider, and &lt;code&gt;chat.New&lt;/code&gt; will route to it. You add a backend without forking &lt;code&gt;pkg/chat&lt;/code&gt; or sending a patch upstream.&lt;/p&gt;
&lt;p&gt;And it&amp;rsquo;s testable. The package ships generated mocks. A downstream tool&amp;rsquo;s AI features can be tested against a mock &lt;code&gt;ChatClient&lt;/code&gt; returning canned responses, with no network, no API key, and no flakiness. Because the interface is four methods, that mock is trivial to set up and complete by construction. A sprawling framework interface is a sprawling thing to fake; a four-method one is not. (I&amp;rsquo;ll come back to testing AI code properly &lt;a class="link" href="https://blog-570662.gitlab.io/testing-code-that-calls-an-llm/" &gt;in a later post&lt;/a&gt;, because it deserves a whole article of its own.)&lt;/p&gt;
&lt;h2 id="the-right-size"&gt;The right size
&lt;/h2&gt;&lt;p&gt;When a CLI tool needs AI, the instinct is a large framework like LangChain. For orchestrating retrieval pipelines and agent swarms, that&amp;rsquo;s exactly the right tool. For sending a prompt, getting a struct back, and letting the model call a few functions, it&amp;rsquo;s enormous overkill.&lt;/p&gt;
&lt;p&gt;go-tool-base&amp;rsquo;s &lt;code&gt;chat&lt;/code&gt; package is the deliberate alternative, chosen only after LangChain Go and a dozen others were weighed up and rejected. Its core &lt;code&gt;ChatClient&lt;/code&gt; interface is four methods. Underneath sit five normalised providers, a tool-calling loop, structured output and error handling, but the caller sees four methods and never learns which provider is active. Streaming and persistence are opt-in interfaces beside the core, not additions to it. It extends without forking and tests without a network. Right-sized: the complexity is real, but it lives under the interface rather than in it.&lt;/p&gt;</description></item><item><title>Your CLI is already an AI tool</title><link>https://blog-570662.gitlab.io/your-cli-is-already-an-ai-tool/</link><pubDate>Thu, 19 Mar 2026 00:00:00 +0000</pubDate><guid>https://blog-570662.gitlab.io/your-cli-is-already-an-ai-tool/</guid><description>&lt;img src="https://blog-570662.gitlab.io/your-cli-is-already-an-ai-tool/cover-your-cli-is-already-an-ai-tool.png" alt="Featured image of post Your CLI is already an AI tool" /&gt;&lt;p&gt;&amp;ldquo;Make it work with AI&amp;rdquo; has become one of those requests that lands on a developer&amp;rsquo;s desk with a thud and not much further detail attached. My instinct, the first time, was to brace for a big lump of integration work&amp;hellip; a bespoke adapter for this assistant, another for that one, a treadmill of little wrappers stretching off into the distance.&lt;/p&gt;
&lt;p&gt;Turns out I&amp;rsquo;d already done most of the work. So have you, if your CLI tool is any good. Let me explain what I mean.&lt;/p&gt;
&lt;h2 id="you-already-described-your-capabilities"&gt;You already described your capabilities
&lt;/h2&gt;&lt;p&gt;Stop and think for a second about what a well-built CLI tool actually is. It&amp;rsquo;s a set of named operations, each with a human-readable description, each taking a set of typed, named, documented parameters. You wrote all of that already, because a CLI without it is unusable by &lt;em&gt;people&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Now look at what an AI assistant needs in order to call a tool. A set of named operations. A description of each, so it knows when to reach for them. A typed parameter schema for each, so it knows how to call them.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s the same list! A good CLI is already, structurally, a description of a set of capabilities. The information an AI agent needs isn&amp;rsquo;t extra work you have to go and do. It&amp;rsquo;s work you finished the moment your &lt;code&gt;--help&lt;/code&gt; output was any good.&lt;/p&gt;
&lt;p&gt;The only thing missing is a translator. Something that takes &amp;ldquo;this is a CLI&amp;rdquo; and presents it as &amp;ldquo;this is a set of tools an AI can call&amp;rdquo;.&lt;/p&gt;
&lt;h2 id="mcp-is-that-translator-and-its-a-standard"&gt;MCP is that translator, and it&amp;rsquo;s a standard
&lt;/h2&gt;&lt;p&gt;The temptation, when you want your tool to be AI-usable, is to sit down and write an integration. A little adapter for Claude Desktop. Another for Cursor. Another for whatever turns up next month. Each one a bespoke wrapper, each one a thing to maintain, and the list never stops growing because new assistants keep appearing. That&amp;rsquo;s the treadmill I was bracing for.&lt;/p&gt;
&lt;p&gt;The Model Context Protocol exists to kill that list. MCP is an open standard for how an AI model discovers and calls local tools. Implement it once and your tool works with every assistant that speaks it. Write once, not once-per-client.&lt;/p&gt;
&lt;p&gt;So go-tool-base implements it once, in the framework, for everyone. (That&amp;rsquo;s rather the theme of this whole series, if you hadn&amp;rsquo;t spotted it yet&amp;hellip; do the annoying thing once, properly, in a place where every tool inherits it.)&lt;/p&gt;
&lt;h2 id="the-mcp-command-and-the-mapping-it-does-for-free"&gt;The &lt;code&gt;mcp&lt;/code&gt; command, and the mapping it does for free
&lt;/h2&gt;&lt;p&gt;Every tool built on go-tool-base inherits a built-in &lt;code&gt;mcp&lt;/code&gt; command. Run it:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;mytool mcp
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and the tool starts a JSON-RPC server over standard I/O, speaking MCP. That&amp;rsquo;s the whole user-facing surface. One command.&lt;/p&gt;
&lt;p&gt;Behind it, the framework walks your Cobra command tree and maps it straight onto MCP tool definitions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Each &lt;strong&gt;command&lt;/strong&gt; becomes a &lt;strong&gt;tool&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Each command&amp;rsquo;s &lt;strong&gt;short description&lt;/strong&gt; becomes the &lt;strong&gt;tool&amp;rsquo;s description&lt;/strong&gt;, the text the AI reads to decide whether this is the tool it wants.&lt;/li&gt;
&lt;li&gt;Each command&amp;rsquo;s &lt;strong&gt;flags and arguments&lt;/strong&gt; become the tool&amp;rsquo;s &lt;strong&gt;JSON Schema parameters&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There&amp;rsquo;s no second schema to write and then keep in sync (and we all know how well &amp;ldquo;keep these two things aligned by hand&amp;rdquo; tends to go). The command tree &lt;em&gt;is&lt;/em&gt; the schema. Add a new command to your CLI and it&amp;rsquo;s a new tool for the agent, automatically, with the description and flags you already gave it. Nobody has to remember to update an MCP manifest, because there&amp;rsquo;s no separate MCP manifest to forget about.&lt;/p&gt;
&lt;h2 id="configuring-an-assistant-to-use-it"&gt;Configuring an assistant to use it
&lt;/h2&gt;&lt;p&gt;On the assistant&amp;rsquo;s side it&amp;rsquo;s just as undramatic. You tell your AI client (Claude Desktop, Cursor, anything MCP-aware) to launch &lt;code&gt;mytool mcp&lt;/code&gt;. From then on the assistant:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Starts your tool in MCP mode when it boots.&lt;/li&gt;
&lt;li&gt;Discovers every command as a callable tool.&lt;/li&gt;
&lt;li&gt;Calls the right one, with the right parameters, when a user&amp;rsquo;s request needs it.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Your CLI tool has quietly become something the AI can pick up and use, mid-conversation, on its own initiative.&lt;/p&gt;
&lt;h2 id="the-safety-property-worth-noticing"&gt;The safety property worth noticing
&lt;/h2&gt;&lt;p&gt;Now, &amp;ldquo;let an AI run things on my machine&amp;rdquo; is rightly a sentence that makes people nervous. It makes me nervous, and I built the thing. So it&amp;rsquo;s worth noticing the constraint sitting quietly in this design.&lt;/p&gt;
&lt;p&gt;The AI can only call what you defined. The tools it sees are exactly the commands in your tree, and the parameters it can pass are exactly the flags and arguments you declared, validated against the JSON Schema generated from them.&lt;/p&gt;
&lt;p&gt;It can&amp;rsquo;t invent a command. It can&amp;rsquo;t pass a parameter you never defined. The boundary of what the agent can do is the boundary of what your CLI does, and you drew that boundary already, back when you built the tool. Exposing the CLI over MCP doesn&amp;rsquo;t widen the surface one inch. It just makes the existing surface reachable. The AI isn&amp;rsquo;t running &lt;em&gt;things&lt;/em&gt;. It&amp;rsquo;s running &lt;em&gt;your commands&lt;/em&gt;, the ones you wrote, tested and shipped, and nothing else.&lt;/p&gt;
&lt;h2 id="the-gist"&gt;The gist
&lt;/h2&gt;&lt;p&gt;A CLI tool, built properly, is already a structured description of a set of capabilities: named operations, descriptions, typed parameters. Which is also exactly what an AI agent needs in order to call a tool. The gap between the two is only a translator, and writing a bespoke one per assistant is a treadmill you don&amp;rsquo;t need to step onto.&lt;/p&gt;
&lt;p&gt;go-tool-base puts the translator in the framework. Every tool gets an &lt;code&gt;mcp&lt;/code&gt; command that serves the command tree over the Model Context Protocol&amp;hellip; commands become tools, descriptions become descriptions, flags become JSON Schema parameters, with no second schema to maintain. Point any MCP-aware assistant at it and your CLI is an agent-callable tool, bounded to exactly the commands you shipped.&lt;/p&gt;
&lt;p&gt;You did the hard part when you built a good CLI. MCP just opens the door you&amp;rsquo;d already framed.&lt;/p&gt;</description></item></channel></rss>