<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Agents on PHP Boy Scout</title><link>https://blog-570662.gitlab.io/tags/agents/</link><description>Recent content in Agents on PHP Boy Scout</description><generator>Hugo -- gohugo.io</generator><language>en-gb</language><copyright>Matt Cockayne</copyright><lastBuildDate>Thu, 02 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://blog-570662.gitlab.io/tags/agents/index.xml" rel="self" type="application/rss+xml"/><item><title>An AI agent that has to make the build pass</title><link>https://blog-570662.gitlab.io/an-ai-agent-that-has-to-make-the-build-pass/</link><pubDate>Thu, 02 Apr 2026 00:00:00 +0000</pubDate><guid>https://blog-570662.gitlab.io/an-ai-agent-that-has-to-make-the-build-pass/</guid><description>&lt;img src="https://blog-570662.gitlab.io/an-ai-agent-that-has-to-make-the-build-pass/cover-an-ai-agent-that-has-to-make-the-build-pass.png" alt="Featured image of post An AI agent that has to make the build pass" /&gt;&lt;p&gt;Most AI code generation works on a charming little principle I&amp;rsquo;ll call generate-and-hope. The model writes the code, the model stops at the closing brace, and whether the thing actually compiles is left as an exercise for you. For a snippet you paste into an editor, fine. For a whole generated command, that&amp;rsquo;s just outsourcing the disappointment.&lt;/p&gt;
&lt;p&gt;go-tool-base does something I&amp;rsquo;m rather happier with: the AI has to make the build pass before it&amp;rsquo;s allowed to claim it&amp;rsquo;s done.&lt;/p&gt;
&lt;h2 id="generate-and-hope"&gt;Generate and hope
&lt;/h2&gt;&lt;p&gt;The usual shape of AI code generation is this. You ask for code, the model produces it, and the model&amp;rsquo;s job ends at the closing brace. Whether it compiles, whether the tests pass, whether the imports even resolve, none of that has been checked. The model produced something that &lt;em&gt;looks&lt;/em&gt; right. You find out whether it &lt;em&gt;is&lt;/em&gt; right when you build it.&lt;/p&gt;
&lt;p&gt;For a snippet you paste into an editor, that&amp;rsquo;s perfectly fine. The compiler tells you in a second. But go-tool-base&amp;rsquo;s generator, driven by &lt;code&gt;gtb generate command --script&lt;/code&gt; or &lt;code&gt;--prompt&lt;/code&gt;, produces a whole command: the implementation, its tests, the lot. &amp;ldquo;Generate and hope&amp;rdquo; at that scale means handing the user a project that may or may not build, and quietly making them the one who finds out which.&lt;/p&gt;
&lt;h2 id="drafting-is-only-step-one"&gt;Drafting is only step one
&lt;/h2&gt;&lt;p&gt;So the generator doesn&amp;rsquo;t stop at drafting. Writing the first version of the implementation and its tests is step one of two. Step two is an autonomous repair agent.&lt;/p&gt;
&lt;p&gt;Once the draft is on the filesystem, a separate agent takes over. It&amp;rsquo;s an LLM running in a loop, but a loop aimed at one narrow, checkable job: make this project build and pass its tests. It isn&amp;rsquo;t asked to be creative. It&amp;rsquo;s asked to get to green.&lt;/p&gt;
&lt;h2 id="a-fixed-set-of-tools-and-no-shell"&gt;A fixed set of tools, and no shell
&lt;/h2&gt;&lt;p&gt;The agent is not handed a shell. It&amp;rsquo;s given a fixed, defined set of tools and nothing else. Three of them let it explore and edit the project: &lt;code&gt;list_dir&lt;/code&gt;, &lt;code&gt;read_file&lt;/code&gt;, &lt;code&gt;write_file&lt;/code&gt;. Four of them let it verify the project:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;go_build&lt;/code&gt; runs the build and captures the compiler errors.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;go_test&lt;/code&gt; runs the tests and captures the failures.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;go_get&lt;/code&gt; resolves a missing dependency.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;golangci_lint&lt;/code&gt; runs the project&amp;rsquo;s linter.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That restriction is the design, not a limitation of it. The agent can&amp;rsquo;t delete arbitrary files, can&amp;rsquo;t reach the network, can&amp;rsquo;t run anything that isn&amp;rsquo;t on the list. It has exactly what it needs to make code compile and nothing it would need to do damage. Its file writes are confined to the project directory by an explicit path check, so even &lt;code&gt;write_file&lt;/code&gt; can&amp;rsquo;t go wandering up into &lt;code&gt;/etc&lt;/code&gt;. A coding agent you&amp;rsquo;d actually let near a filesystem is one whose abilities are an allowlist, not a denylist. (I keep coming back to that principle through this series&amp;hellip; safety as a boundary you draw, not a behaviour you hope for.)&lt;/p&gt;
&lt;h2 id="the-loop"&gt;The loop
&lt;/h2&gt;&lt;p&gt;The repair loop is a ReAct loop, the same reason-act-observe shape as &lt;a class="link" href="https://blog-570662.gitlab.io/letting-the-ai-call-your-go-functions/" &gt;the tool-calling loop&lt;/a&gt;, only this time pointed at a goal:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The draft is on disk.&lt;/li&gt;
&lt;li&gt;Verify: run &lt;code&gt;go_build&lt;/code&gt; and &lt;code&gt;go_test&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;If verification failed, read the error logs, the compiler error or the failing test.&lt;/li&gt;
&lt;li&gt;Reason about the cause: an undefined variable, a missing import, a wrong signature.&lt;/li&gt;
&lt;li&gt;Act: call &lt;code&gt;write_file&lt;/code&gt; to patch the code, or &lt;code&gt;go_get&lt;/code&gt; to add the dependency.&lt;/li&gt;
&lt;li&gt;Loop. Steps two to five repeat until the project is green, or the agent hits its step limit, which defaults to 15.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;What makes this work is treating the error output as &lt;em&gt;feedback&lt;/em&gt; rather than as a failure to log and walk away from. A compiler error is the single most useful sentence you can hand a model that&amp;rsquo;s trying to fix code. It says what&amp;rsquo;s wrong, and usually where. The loop feeds it straight back in, and the model fixes against it.&lt;/p&gt;
&lt;h2 id="verification-changes-what-done-means"&gt;Verification changes what &amp;ldquo;done&amp;rdquo; means
&lt;/h2&gt;&lt;p&gt;Here&amp;rsquo;s the real shift, and the agent&amp;rsquo;s own documentation puts it well: the agent &amp;ldquo;doesn&amp;rsquo;t just say it fixed a bug; it uses a Test tool to verify the fix before reporting success.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;A generate-and-hope model reports success when it finishes &lt;em&gt;writing&lt;/em&gt;. It has no idea whether the code works, and it isn&amp;rsquo;t really claiming otherwise. &amp;ldquo;Done&amp;rdquo; means &amp;ldquo;I produced text&amp;rdquo;. The repair agent reports success when &lt;code&gt;go_build&lt;/code&gt; and &lt;code&gt;go_test&lt;/code&gt; actually &lt;em&gt;pass&lt;/em&gt;. &amp;ldquo;Done&amp;rdquo; means &amp;ldquo;the build is green&amp;rdquo;. Those are two completely different claims, and only the second is worth anything to the person who asked for the command.&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s the line between an AI that&amp;rsquo;s a creative writer and an AI that&amp;rsquo;s a collaborator you can hand a task to. And when the agent can&amp;rsquo;t reach green, when it spends its whole step budget and the project is still broken, the generator fails safely: it leaves the best-attempt code in place, commented out so the project still compiles, and tells the user what to finish by hand. There&amp;rsquo;s also an &lt;code&gt;--agentless&lt;/code&gt; flag for anyone who&amp;rsquo;d rather have a plain single-shot retry than the multi-step agent. The default, though, is the agent, because the default should be code that&amp;rsquo;s been checked.&lt;/p&gt;
&lt;h2 id="where-this-leaves-us"&gt;Where this leaves us
&lt;/h2&gt;&lt;p&gt;Most AI code generation generates and hopes: the model writes code and the user discovers whether it works. For a whole generated command, that pushes a may-or-may-not-build project onto the user.&lt;/p&gt;
&lt;p&gt;go-tool-base&amp;rsquo;s generator drafts the command and then hands it to an autonomous repair agent. The agent has a fixed set of tools (explore and edit the project, build it, test it, lint it, fetch dependencies) and no shell at all, with file writes confined to the project directory. It runs a ReAct loop, reading each error and patching against it, until the build is green or it exhausts its steps. The point is what &amp;ldquo;done&amp;rdquo; comes to mean: not &amp;ldquo;the model finished writing&amp;rdquo;, but &amp;ldquo;the build passes&amp;rdquo;. Only one of those is a claim worth trusting.&lt;/p&gt;</description></item><item><title>Letting the AI call your Go functions</title><link>https://blog-570662.gitlab.io/letting-the-ai-call-your-go-functions/</link><pubDate>Sun, 29 Mar 2026 00:00:00 +0000</pubDate><guid>https://blog-570662.gitlab.io/letting-the-ai-call-your-go-functions/</guid><description>&lt;img src="https://blog-570662.gitlab.io/letting-the-ai-call-your-go-functions/cover-letting-the-ai-call-your-go-functions.png" alt="Featured image of post Letting the AI call your Go functions" /&gt;&lt;p&gt;An AI that can only produce text can &lt;em&gt;describe&lt;/em&gt; your system. An AI that can call your Go functions can actually operate it. That gap, between describing and doing, is the difference between a chatbot and something genuinely useful, and crossing it comes down to one fiddly mechanism: tool-calling, and the loop that drives it.&lt;/p&gt;
&lt;h2 id="talking-about-the-system-versus-operating-it"&gt;Talking about the system versus operating it
&lt;/h2&gt;&lt;p&gt;Wire an AI provider into a CLI command and you get something that can talk. Ask it a question, get a paragraph back. Useful, up to a point.&lt;/p&gt;
&lt;p&gt;But notice the ceiling. An AI that can only generate text can &lt;em&gt;describe&lt;/em&gt; things. It can tell you what it would do. What it can&amp;rsquo;t do is look at the actual current state of your system, or take a real action, because it has no hands. It&amp;rsquo;s reasoning in a vacuum about a world it can&amp;rsquo;t reach out and touch.&lt;/p&gt;
&lt;p&gt;The thing that gives it hands is tool-calling. You hand the AI a set of functions it&amp;rsquo;s allowed to call. Now, mid-conversation, it can decide it needs to &lt;em&gt;read that file&lt;/em&gt; before it can answer, or &lt;em&gt;run that query&lt;/em&gt;, or &lt;em&gt;check that status&lt;/em&gt;, and actually go and do it, and then reason about the real result. The AI stops describing your system and starts operating it.&lt;/p&gt;
&lt;h2 id="the-loop-is-the-hard-part"&gt;The loop is the hard part
&lt;/h2&gt;&lt;p&gt;Tool-calling has a shape, and the shape is a loop. The literature calls it ReAct: Reason, Act, Observe.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The AI &lt;strong&gt;reasons&lt;/strong&gt; about the prompt and decides whether it needs a tool.&lt;/li&gt;
&lt;li&gt;If it does, it &lt;strong&gt;acts&lt;/strong&gt;, asking for a specific tool with specific arguments.&lt;/li&gt;
&lt;li&gt;Your code runs the tool and feeds the result back. The AI &lt;strong&gt;observes&lt;/strong&gt; that result.&lt;/li&gt;
&lt;li&gt;Round again. Reason about the new information, maybe call another tool, maybe several. Keep going until the AI has what it needs and produces a final text answer with no more tool calls.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Conceptually simple. Tedious and error-prone to implement by hand every single time: parsing the model&amp;rsquo;s tool-call requests, dispatching to the right function, marshalling arguments in and results out, feeding observations back in the exact format the provider expects, knowing when to stop, and not looping forever if the model gets itself stuck.&lt;/p&gt;
&lt;p&gt;That orchestration is pure plumbing, and it&amp;rsquo;s identical for every tool and every command. So you can probably guess what&amp;rsquo;s coming: go-tool-base&amp;rsquo;s &lt;code&gt;chat&lt;/code&gt; package owns it. You don&amp;rsquo;t write the loop. You write the tools.&lt;/p&gt;
&lt;h2 id="defining-a-tool"&gt;Defining a tool
&lt;/h2&gt;&lt;p&gt;A &lt;code&gt;chat.Tool&lt;/code&gt; is four things: a name, a description, a parameter schema, and a handler. The description is what the AI reads to decide &lt;em&gt;whether&lt;/em&gt; to use the tool, so it&amp;rsquo;s worth writing well. The schema describes the arguments, and you don&amp;rsquo;t hand-write it. You write a tagged Go struct and let it generate:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ReadFileParams&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Path&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;`json:&amp;#34;path&amp;#34; jsonschema_description:&amp;#34;Relative path to the file&amp;#34;`&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The struct is the contract. The framework derives the JSON Schema the AI is given straight from those tags, so the schema and the Go type the handler receives can&amp;rsquo;t drift apart, because they share a single source. The handler is then just an ordinary Go function that takes those parameters and returns a result.&lt;/p&gt;
&lt;p&gt;You register your tools with &lt;code&gt;SetTools&lt;/code&gt;, call &lt;code&gt;Chat&lt;/code&gt;, and that&amp;rsquo;s the whole of your involvement. The framework runs the ReAct loop and &lt;code&gt;Chat&lt;/code&gt; returns the AI&amp;rsquo;s final text answer once the loop settles.&lt;/p&gt;
&lt;h2 id="two-details-that-show-it-was-built-for-real-use"&gt;Two details that show it was built for real use
&lt;/h2&gt;&lt;p&gt;A couple of decisions in the loop tell you it&amp;rsquo;s meant for production, not a demo.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Tool errors don&amp;rsquo;t abort the conversation.&lt;/strong&gt; When a handler returns an error, the framework doesn&amp;rsquo;t crash the loop. It hands the error &lt;em&gt;back to the AI as a string&lt;/em&gt;, as just another observation. That&amp;rsquo;s deliberate, and it&amp;rsquo;s right. A real agent should be able to call a tool, watch it fail, and react: try different arguments, take a different route, or tell the user it couldn&amp;rsquo;t manage it. A loop that aborted on the first tool error would be far more brittle than the model driving it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The loop is bounded.&lt;/strong&gt; There&amp;rsquo;s a &lt;code&gt;MaxSteps&lt;/code&gt; limit, default 20. An AI that gets confused could otherwise call tools forever, and a CLI command that never returns is a worse failure than a wrong answer. The cap guarantees the command terminates. The agent gets room to genuinely work a problem across many steps, but not infinite room to flail about in.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s also parallel tool execution: when the model asks for several tools in a single step (three independent file reads, say) the framework runs them concurrently rather than one after another, because there&amp;rsquo;s no reason to make the AI sit and wait out a sequence of things that don&amp;rsquo;t depend on each other.&lt;/p&gt;
&lt;h2 id="boiling-it-down"&gt;Boiling it down
&lt;/h2&gt;&lt;p&gt;A text-only AI can describe your system; an AI that can call your functions can operate it. Bridging that gap means tool-calling, and tool-calling means the ReAct loop (reason, act, observe, repeat) whose orchestration is fiddly, identical every time, and not a problem worth solving twice.&lt;/p&gt;
&lt;p&gt;go-tool-base&amp;rsquo;s &lt;code&gt;chat&lt;/code&gt; package runs the loop for you. You define &lt;code&gt;chat.Tool&lt;/code&gt; values (name, description, a tagged parameter struct that generates its own schema, a handler), call &lt;code&gt;SetTools&lt;/code&gt; and &lt;code&gt;Chat&lt;/code&gt;, and get the final answer. Tool errors go back to the AI as observations so it can recover, and a &lt;code&gt;MaxSteps&lt;/code&gt; cap guarantees the command always terminates. You write Go functions. The framework turns them into things an agent can reach for.&lt;/p&gt;</description></item></channel></rss>