Generate a command from a script or a sentence with go-tool-base

You’ve got a Python script that already does the job. It’s sat in a tools/ directory somewhere, it works, and every few weeks someone copies it onto a laptop that doesn’t have the right version of pandas and it falls over. You’d like it to be a proper subcommand of your tool, a real Go binary you can ship, but porting it means the cobra wiring, the options struct, a test file, and a fight with the linter before any of it lands.

Or you don’t even have the script. You’ve just got a sentence in your head: “something that pings a list of URLs and tells me which ones are slow.” The logic is five minutes of thought; the boilerplate around it is the afternoon.

gtb generate command is built for exactly that gap. Hand it a script or hand it a sentence, and it writes the Go, the tests and the docs, then sends an autonomous agent through the result to make sure the thing actually builds, passes its tests and survives golangci-lint before it ever reaches your working tree.

Two ways in, the same files out

There are two flags, and they’re mutually exclusive:

--script <file> converts an existing bash, Python or JavaScript script.
--prompt "<text>" (or a path to a file) generates from a plain-English description.

Both land in the same place. A generated command called csv-stats gives you:

pkg/cmd/csv-stats/cmd.go: the cobra registration. This one is read-only; the generator owns it and will regenerate it.
pkg/cmd/csv-stats/main.go: the implementation, where your logic lives and where you’re free to edit.
pkg/cmd/csv-stats/main_test.go: a test file.
docs/commands/csv-stats/index.md: AI-written docs for the command.

The provider and model come from your config (ai.provider) or the --provider / --model flags. Everything below was generated with Claude Opus. We’ll take each route in turn, script first.

From a script: `csv_stats.py` becomes `csv-stats`

Here’s the script I want as a native subcommand. It reads a CSV and reports, per column, the row count, how many values are empty, and min/max/mean for the numeric ones. Nothing exotic, but enough that porting it by hand is a chore. Copy it into a file called csv_stats.py if you want to follow along:

#!/usr/bin/env python3
"""Summarise a CSV file's columns.

For every column it reports the row count and how many values are empty; for
columns whose values are numeric it also reports min, max and mean. A single
column can be selected with --column.

usage: csv_stats.py [--column NAME] <file.csv>
"""
import argparse
import csv
import sys


def is_number(value):
    """True if value parses as a float."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def summarise(path, only_column=None):
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames is None:
            print("error: empty CSV", file=sys.stderr)
            return 1

        columns = list(reader.fieldnames)
        if only_column is not None:
            if only_column not in columns:
                print(f"error: no such column: {only_column}", file=sys.stderr)
                return 1
            columns = [only_column]

        counts = {c: 0 for c in columns}
        nulls = {c: 0 for c in columns}
        numbers = {c: [] for c in columns}

        for row in reader:
            for c in columns:
                value = row.get(c, "")
                counts[c] += 1
                if value is None or value.strip() == "":
                    nulls[c] += 1
                elif is_number(value):
                    numbers[c].append(float(value))

    header = f"{'column':<20}{'count':>8}{'nulls':>8}{'min':>12}{'max':>12}{'mean':>12}"
    print(header)
    print("-" * len(header))
    for c in columns:
        nums = numbers[c]
        if nums:
            cmin = f"{min(nums):.2f}"
            cmax = f"{max(nums):.2f}"
            cmean = f"{sum(nums) / len(nums):.2f}"
        else:
            cmin = cmax = cmean = "-"
        print(f"{c:<20}{counts[c]:>8}{nulls[c]:>8}{cmin:>12}{cmax:>12}{cmean:>12}")
    return 0


def main():
    parser = argparse.ArgumentParser(description="Summarise a CSV file's columns.")
    parser.add_argument("csvfile", help="path to the CSV file")
    parser.add_argument("--column", help="only summarise this column")
    args = parser.parse_args()
    return summarise(args.csvfile, args.column)


if __name__ == "__main__":
    sys.exit(main())

One command points the generator at it:

gtb generate command \
  --name csv-stats \
  --short "Summarise CSV columns" \
  --script ./csv_stats.py

What lands is not a transliteration. The Python kept everything in one function; the Go that came out is decomposed into named pieces, opens the file through the project’s injected filesystem (props.FS, an afero Fs) rather than os, and reports through the structured logger rather than print:

func summarise(fs afero.Fs, path, onlyColumn string) ([]string, error) {
	handle, err := fs.Open(path)
	if err != nil {
		return nil, errors.Wrapf(err, "failed to open CSV file %q", path)
	}
	defer func() {
		_ = handle.Close()
	}()

	reader := csv.NewReader(handle)
	reader.FieldsPerRecord = -1

	columns, indexByName, err := readColumns(reader, onlyColumn)
	if err != nil {
		return nil, err
	}

	stats := make(map[string]*columnStats, len(columns))
	for _, c := range columns {
		stats[c] = &columnStats{numbers: []float64{}}
	}

	for {
		record, readErr := reader.Read()
		if readErr != nil {
			if errors.Is(readErr, io.EOF) {
				break
			}

			return nil, errors.Wrap(readErr, "failed to read CSV record")
		}

		accumulate(stats, columns, indexByName, record)
	}

	return formatReport(columns, stats), nil
}

That decomposition, into readColumns, accumulate, formatReport, summaryValues and a couple of small formatting helpers, is the interesting part, and it didn’t come for free. The first thing the agent did after writing the code was build it, test it, and lint it. golangci-lint’s cyclop rule flagged a single fat summarise function well over its complexity ceiling of 10. So the agent read the file back, split the work into focused functions, and ran the checks again. It only stopped once the build, the tests and the linter were all clean. The tidy shape above is the agent arguing with the linter and winning, not the model’s first guess.
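To make that shape concrete, here is a minimal sketch of what helpers like accumulate, formatReport and summaryValues could look like. It is my own reconstruction from the Python script's behaviour, not the generated code from the tarball: the columnStats fields beyond numbers, and every function body, are assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// columnStats is a hypothetical per-column accumulator. The post only shows
// the numbers field; count and nulls are assumed here.
type columnStats struct {
	count   int
	nulls   int
	numbers []float64
}

// accumulate folds one CSV record into the running stats, roughly mirroring
// the Python inner loop: count every row, treat blank or missing values as
// nulls, and keep any value that parses as a float.
func accumulate(stats map[string]*columnStats, columns []string, indexByName map[string]int, record []string) {
	for _, c := range columns {
		s := stats[c]
		s.count++

		value := ""
		if i, ok := indexByName[c]; ok && i < len(record) {
			value = strings.TrimSpace(record[i])
		}

		if value == "" {
			s.nulls++
			continue
		}

		if n, err := strconv.ParseFloat(value, 64); err == nil {
			s.numbers = append(s.numbers, n)
		}
	}
}

// summaryValues renders min, max and mean to two decimals, or "-" for a
// column with no numeric values.
func summaryValues(nums []float64) (string, string, string) {
	if len(nums) == 0 {
		return "-", "-", "-"
	}

	lo, hi, sum := nums[0], nums[0], 0.0
	for _, n := range nums {
		if n < lo {
			lo = n
		}
		if n > hi {
			hi = n
		}
		sum += n
	}

	mean := sum / float64(len(nums))
	return fmt.Sprintf("%.2f", lo), fmt.Sprintf("%.2f", hi), fmt.Sprintf("%.2f", mean)
}

// formatReport reproduces the Python column widths (20, 8, 8, 12, 12, 12).
func formatReport(columns []string, stats map[string]*columnStats) []string {
	header := fmt.Sprintf("%-20s%8s%8s%12s%12s%12s", "column", "count", "nulls", "min", "max", "mean")
	lines := []string{header, strings.Repeat("-", len(header))}

	for _, c := range columns {
		s := stats[c]
		lo, hi, mean := summaryValues(s.numbers)
		lines = append(lines, fmt.Sprintf("%-20s%8d%8d%12s%12s%12s", c, s.count, s.nulls, lo, hi, mean))
	}

	return lines
}

func main() {
	columns := []string{"name", "price"}
	indexByName := map[string]int{"name": 0, "price": 1}
	stats := map[string]*columnStats{"name": {}, "price": {}}

	for _, rec := range [][]string{{"apple", "1.50"}, {"pear", ""}, {"plum", "2.50"}} {
		accumulate(stats, columns, indexByName, rec)
	}

	for _, line := range formatReport(columns, stats) {
		fmt.Println(line)
	}
}
```

Each function does one small thing, which is the kind of split that keeps a cyclomatic-complexity linter quiet. Note that strconv.ParseFloat and Python's float() don't accept exactly the same set of strings, so a faithful port would want a test or two around the edges.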

Then it just runs. In the demo I scaffolded the project without the init feature, so the tool reads sensible defaults and needs no config step: csv-stats sample.csv prints real per-column counts, nulls and numeric stats. With the default features you’d run toolbox init, or pass --config, first. The full generated command, the three files and nothing else, is here: csv-stats-command.tar.gz.

From a sentence: a URL health-checker

No script this time. Just a description of the command I wish I had. --prompt takes a raw string, but a description with any detail to it is easier to read, and to keep, in a file, so I dropped it in healthcheck-prompt.txt:

Concurrently GET a list of URLs and report each one’s HTTP status and latency.
Flags:
--timeout: the per-request timeout
--file: read URLs from a file, one per line
--json: machine-readable output
Use httptest in the tests so they need no network.

The prompt describes what I want the command to do, including how the flags should behave. I declare the flags themselves up front with --flag (more on why that split matters below) and point the generator at the file:

gtb generate command \
  --name healthcheck \
  --short "Check URL health concurrently" \
  --flag "timeout:duration:per-request timeout:false:t:false:5s" \
  --flag "file:string:read URLs from a file, one per line" \
  --flag "json:bool:machine-readable output" \
  --prompt ./healthcheck-prompt.txt

And the flags feed straight in. RunHealthcheck reads the URL file from opts.File, the deadline from opts.Timeout, and the output format from opts.Json, then fans the requests out across goroutines, each writing into its own slot, exactly the way you’d write it by hand:

func RunHealthcheck(ctx context.Context, props *props.Props, opts *HealthcheckOptions, args []string) error {
	urls, err := collectURLs(props.FS, opts.File, args)
	if err != nil {
		return errors.Wrap(err, "failed to collect URLs")
	}

	if len(urls) == 0 {
		return errors.New("no URLs provided; pass URLs as arguments or via --file")
	}

	timeout := opts.Timeout
	if timeout <= 0 {
		timeout = defaultTimeout
	}

	client := &http.Client{Timeout: timeout}

	results := make([]result, len(urls))

	var wg sync.WaitGroup

	for i, u := range urls {
		wg.Add(1)
		go func(idx int, target string) {
			defer wg.Done()

			results[idx] = checkURL(ctx, client, target, timeout)
		}(i, u)
	}

	wg.Wait()

	return reportResults(props, opts.Json, results)
}
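RunHealthcheck leans on a checkURL helper that isn't shown. As a rough sketch of what it might do, assuming a hypothetical result struct and a body I wrote myself to match the call signature above, it could look like this. The stubTransport and demo helper are scaffolding for the example only, so it runs without touching the network.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

// result is a hypothetical shape for one check; the generated struct may differ.
type result struct {
	URL     string
	Status  int
	Latency time.Duration
	Err     string
}

// checkURL issues a single GET bounded by timeout and records the status
// code and how long the round trip took. Failures are captured on the
// result rather than returned, so one bad URL never aborts the batch.
func checkURL(ctx context.Context, client *http.Client, target string, timeout time.Duration) result {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	res := result{URL: target}

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, target, nil)
	if err != nil {
		res.Err = err.Error()
		return res
	}

	start := time.Now()
	resp, err := client.Do(req)
	res.Latency = time.Since(start)

	if err != nil {
		res.Err = err.Error()
		return res
	}
	defer func() {
		_ = resp.Body.Close()
	}()

	res.Status = resp.StatusCode

	return res
}

// stubTransport answers every request with a fixed status code, standing in
// for a real server in this example.
type stubTransport struct{ status int }

func (s stubTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	return &http.Response{
		StatusCode: s.status,
		Body:       http.NoBody,
		Header:     make(http.Header),
		Request:    req,
	}, nil
}

// demo runs one check against the stub transport.
func demo(status int, target string) result {
	client := &http.Client{Transport: stubTransport{status: status}}
	return checkURL(context.Background(), client, target, time.Second)
}

func main() {
	r := demo(http.StatusNotFound, "http://example.invalid/")
	fmt.Println(r.URL, r.Status, r.Err)
}
```

Because every goroutine in RunHealthcheck writes only to results[idx], a helper that returns a plain value like this needs no mutex around the slice.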

I asked for the tests to use httptest so they’d need no network, and they do. Each case spins up a local server, so go test is hermetic and the agent’s own test run during repair stays self-contained. It wrote cases for the flags too; this one drives --json:

func TestRunHealthcheck_JSONOutput(t *testing.T) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusNotFound)
	}))
	defer srv.Close()

	p := newTestProps()
	opts := &healthcheck.HealthcheckOptions{
		Timeout: 5 * time.Second,
		Json:    true,
	}

	err := healthcheck.RunHealthcheck(context.Background(), p, opts, []string{srv.URL})
	if err != nil {
		t.Fatalf("unexpected error: %v", err)
	}
}

Same as before, it builds and runs: point it at a few URLs and it GETs them concurrently, reporting each status and latency. The full generated command is here: healthcheck-command.tar.gz.

What “self-repair” actually means

The agent isn’t a single shot at the model with a hopeful prompt. It’s a loop with real tools: it reads the project layout, reads the files it needs, and runs go build, go test and golangci-lint. When something fails, it reads the relevant code, rewrites it, and runs the checks again. It only declares success once all three pass with nothing outstanding. The repair agent’s instructions are deliberately blunt on that last point: a clean build and passing tests don’t count as done if the linter still has something to say.
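The control flow is easier to see in code than in prose. This is a deliberately simplified sketch of a check-and-repair loop, not gtb's implementation: the check type, repairLoop and the way maxSteps is counted are all my own assumptions, and the checks are stubs where a real agent would shell out to go build, go test and golangci-lint.

```go
package main

import (
	"errors"
	"fmt"
)

// check is one verification step. In a real run each would invoke a tool
// such as go build, go test or golangci-lint.
type check struct {
	name string
	run  func() error
}

// errBudgetExhausted is returned when the step budget runs out while a
// check is still failing.
var errBudgetExhausted = errors.New("step budget exhausted")

// firstFailure runs the checks in order and reports the first one that fails.
func firstFailure(checks []check) (string, error) {
	for _, c := range checks {
		if err := c.run(); err != nil {
			return c.name, err
		}
	}

	return "", nil
}

// repairLoop alternates between verifying and repairing. It returns nil only
// after a pass in which every check succeeded, so a clean build with an
// unhappy linter never counts as done.
func repairLoop(checks []check, repair func(failed string, cause error), maxSteps int) (int, error) {
	for step := 1; step <= maxSteps; step++ {
		failed, cause := firstFailure(checks)
		if cause == nil {
			return step, nil
		}

		repair(failed, cause)
	}

	return maxSteps, errBudgetExhausted
}

func main() {
	lintFindings := 2 // pretend the linter has two complaints to work through

	checks := []check{
		{"build", func() error { return nil }},
		{"test", func() error { return nil }},
		{"lint", func() error {
			if lintFindings > 0 {
				return fmt.Errorf("%d findings", lintFindings)
			}
			return nil
		}},
	}

	repair := func(failed string, cause error) {
		fmt.Printf("repairing after %s: %v\n", failed, cause)
		lintFindings--
	}

	steps, err := repairLoop(checks, repair, 10)
	fmt.Println(steps, err)
}
```

The real agent has more freedom than this, since the model chooses which tool to call next, but the exit condition is the same idea: all checks green in a single pass, or the budget runs out.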

A few flags shape how it runs:

--max-steps N raises the agent’s reasoning budget. The default is plenty for a command like these two, but a genuinely hairy conversion can run long, and this stops it stopping short.
--agentless skips the agent entirely and uses the older retry loop, if you’d rather keep the generation cheap and do the polishing yourself.
--non-interactive withholds the agent’s ability to ask you a question mid-run. It defaults on when the CI environment variable is set, so the thing never blocks a pipeline waiting for an answer that isn’t coming.
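For the CI default in particular, the decision could be as small as the sketch below. It rests on two assumptions of mine that the docs would need to confirm: that an explicitly passed flag beats the environment, and that any non-empty CI value counts as set.

```go
package main

import (
	"fmt"
	"os"
)

// nonInteractive decides whether the agent may ask questions mid-run.
// Assumption: an explicit flag wins; otherwise a non-empty CI environment
// variable switches interactivity off.
func nonInteractive(flagChanged, flagValue bool, lookupEnv func(string) (string, bool)) bool {
	if flagChanged {
		return flagValue
	}

	v, ok := lookupEnv("CI")

	return ok && v != ""
}

func main() {
	// With no explicit flag, the answer depends on the current environment.
	fmt.Println(nonInteractive(false, false, os.LookupEnv))
}
```

Passing the environment lookup in as a function keeps the rule testable without mutating the process environment.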

Flags you declare, logic it writes

--timeout, --file and --json arrived as real flags on the command, but not because the prompt mentioned them. Flags are the generator’s job, not the prompt’s, and that split is deliberate. You declare each one with --flag (or the interactive wizard), as I did above, and the generator wires it onto the options struct and into the read-only cmd.go registration, which hands that struct straight to your Run function. The prompt is left to describe behaviour: what --timeout should bound, what --file should read, what --json should change.

So the agent, told exactly which option fields exist, wrote its logic against opts.Timeout, opts.File and opts.Json rather than inventing anything, and the finished command’s --help lists them with the 5s default and the -t shorthand I asked for. Leave the --flag declarations off and it still works: the generator hands the agent an empty options struct, and it keeps those values as locals with sensible defaults, ready for a flag to be wired in later.
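If it helps to picture the split, here is a small stand-in for what the generated wiring amounts to. It swaps cobra for the standard library's flag package so the sketch has no dependencies, and it is not the contents of cmd.go: the field names come from the code above, while the field types, bindFlags and parseOptions are my assumptions.

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// HealthcheckOptions mirrors the three declared flags. The types are
// inferred from the --flag declarations (duration, string, bool).
type HealthcheckOptions struct {
	Timeout time.Duration
	File    string
	Json    bool
}

// bindFlags plays the role of the generated registration: every declared
// flag is bound to a field on the options struct. The stdlib flag package
// has no shorthand concept, so "t" is registered as a second name.
func bindFlags(fs *flag.FlagSet, opts *HealthcheckOptions) {
	fs.DurationVar(&opts.Timeout, "timeout", 5*time.Second, "per-request timeout")
	fs.DurationVar(&opts.Timeout, "t", 5*time.Second, "shorthand for --timeout")
	fs.StringVar(&opts.File, "file", "", "read URLs from a file, one per line")
	fs.BoolVar(&opts.Json, "json", false, "machine-readable output")
}

// parseOptions parses args and returns the populated options plus any
// remaining positional arguments (the URLs).
func parseOptions(args []string) (*HealthcheckOptions, []string, error) {
	fs := flag.NewFlagSet("healthcheck", flag.ContinueOnError)
	opts := &HealthcheckOptions{}
	bindFlags(fs, opts)

	if err := fs.Parse(args); err != nil {
		return nil, nil, err
	}

	return opts, fs.Args(), nil
}

func main() {
	opts, urls, err := parseOptions([]string{"--timeout", "2s", "--json", "https://example.com"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	fmt.Println(opts.Timeout, opts.Json, urls)
}
```

The point of the arrangement is that the Run function only ever sees the struct. It never parses a flag itself, which is why the registration can be regenerated freely without touching your logic.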

The one thing you don’t do is hand-edit cmd.go: it’s regenerated every time you add a flag or change the command, so reach for --flag, never the file. When a generation finishes, the quickest sanity check is the command’s own --help, which shows the flags it actually exposes.

One thing to keep in mind: the model isn’t deterministic. Run the same prompt twice and you’ll get two slightly different commands. If the first one isn’t quite right, regenerate, or nudge the prompt. Treat the output the way you’d treat a capable colleague’s first PR: read it, run it, and own what you merge.

And is it the best possible code, the best design? Probably not. That depends on the model you can afford to point at it, how much detail you put in the prompt, and a bit of luck on the day. What you can count on is a working starting point: something that builds, has tests, and uses proper Go idioms and the project’s own patterns, instead of a blank file and an afternoon of boilerplate. From there it’s yours to shape.

Where that leaves you

The generator does the boilerplate and has the argument with the linter so you don’t have to. What it can’t do is decide whether the command it built is the command you actually wanted. That part is still yours, which is rather the point. The full docs for both flags live in the AI conversion guide and the command generation reference, and they’re the place to go when you want the flags this post didn’t cover.