You’ve got a Python script that already does the job. It’s sat in a tools/
directory somewhere, it works, and every few weeks someone copies it onto a
laptop that doesn’t have the right version of pandas and it falls over. You’d
like it to be a proper subcommand of your tool, a real Go binary you can ship,
but porting it means the cobra wiring, the options struct, a test file, and a
fight with the linter before any of it lands.
Or you don’t even have the script. You’ve just got a sentence in your head: “something that pings a list of URLs and tells me which ones are slow.” The logic is five minutes of thought; the boilerplate around it is the afternoon.
gtb generate command is built for exactly that gap. Hand it a script or hand
it a sentence, and it writes the Go, the tests and the docs, then sends an
autonomous agent through the result to make sure the thing actually builds,
passes its tests and survives golangci-lint before it ever reaches your
working tree.
Two ways in, the same files out
There are two flags, and they’re mutually exclusive:
- --script <file> converts an existing bash, Python or JavaScript script.
- --prompt "<text>" (or a path to a file) generates from a plain-English description.
Both land in the same place. A generated command called csv-stats gives you:
- pkg/cmd/csv-stats/cmd.go: the cobra registration. This one is read-only; the generator owns it and will regenerate it.
- pkg/cmd/csv-stats/main.go: the implementation, where your logic lives and where you’re free to edit.
- pkg/cmd/csv-stats/main_test.go: a test file.
- docs/commands/csv-stats/index.md: AI-written docs for the command.
The provider and model come from your config (ai.provider) or the
--provider / --model flags. Everything below was generated with Claude
Opus. We’ll take the two routes, script and prompt, in turn.
From a script: csv_stats.py becomes csv-stats
Here’s the script I want as a native subcommand. It reads a CSV and reports,
per column, the row count, how many values are empty, and min/max/mean for the
numeric ones. Nothing exotic, but enough that porting it by hand is a chore.
Copy it into a file called csv_stats.py if you want to follow along:
#!/usr/bin/env python3
"""Summarise a CSV file's columns.

For every column it reports the row count and how many values are empty; for
columns whose values are numeric it also reports min, max and mean. A single
column can be selected with --column.

usage: csv_stats.py [--column NAME] <file.csv>
"""
import argparse
import csv
import sys


def is_number(value):
    """True if value parses as a float."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def summarise(path, only_column=None):
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames is None:
            print("error: empty CSV", file=sys.stderr)
            return 1
        columns = list(reader.fieldnames)
        if only_column is not None:
            if only_column not in columns:
                print(f"error: no such column: {only_column}", file=sys.stderr)
                return 1
            columns = [only_column]
        counts = {c: 0 for c in columns}
        nulls = {c: 0 for c in columns}
        numbers = {c: [] for c in columns}
        for row in reader:
            for c in columns:
                value = row.get(c, "")
                counts[c] += 1
                if value is None or value.strip() == "":
                    nulls[c] += 1
                elif is_number(value):
                    numbers[c].append(float(value))
    header = f"{'column':<20}{'count':>8}{'nulls':>8}{'min':>12}{'max':>12}{'mean':>12}"
    print(header)
    print("-" * len(header))
    for c in columns:
        nums = numbers[c]
        if nums:
            cmin = f"{min(nums):.2f}"
            cmax = f"{max(nums):.2f}"
            cmean = f"{sum(nums) / len(nums):.2f}"
        else:
            cmin = cmax = cmean = "-"
        print(f"{c:<20}{counts[c]:>8}{nulls[c]:>8}{cmin:>12}{cmax:>12}{cmean:>12}")
    return 0


def main():
    parser = argparse.ArgumentParser(description="Summarise a CSV file's columns.")
    parser.add_argument("csvfile", help="path to the CSV file")
    parser.add_argument("--column", help="only summarise this column")
    args = parser.parse_args()
    return summarise(args.csvfile, args.column)


if __name__ == "__main__":
    sys.exit(main())
One command points the generator at it:
gtb generate command \
  --name csv-stats \
  --short "Summarise CSV columns" \
  --script ./csv_stats.py
What lands is not a transliteration. The Python kept everything in one function;
the Go that came out is decomposed into named pieces, opens the file through the
project’s injected filesystem (props.FS, an afero Fs) rather than os, and
reports through the structured logger rather than print:
func summarise(fs afero.Fs, path, onlyColumn string) ([]string, error) {
	handle, err := fs.Open(path)
	if err != nil {
		return nil, errors.Wrapf(err, "failed to open CSV file %q", path)
	}
	defer func() {
		_ = handle.Close()
	}()

	reader := csv.NewReader(handle)
	reader.FieldsPerRecord = -1

	columns, indexByName, err := readColumns(reader, onlyColumn)
	if err != nil {
		return nil, err
	}

	stats := make(map[string]*columnStats, len(columns))
	for _, c := range columns {
		stats[c] = &columnStats{numbers: []float64{}}
	}

	for {
		record, readErr := reader.Read()
		if readErr != nil {
			if errors.Is(readErr, io.EOF) {
				break
			}
			return nil, errors.Wrap(readErr, "failed to read CSV record")
		}
		accumulate(stats, columns, indexByName, record)
	}

	return formatReport(columns, stats), nil
}
That decomposition, into readColumns, accumulate, formatReport,
summaryValues and a couple of small formatting helpers, is the interesting
part, and it didn’t come for free. The first thing the agent did after writing the code was build it, test
it, and lint it. golangci-lint’s cyclop rule flagged a single fat
summarise function well over its complexity ceiling of 10. So the agent read
the file back, split the work into focused functions, and ran the checks again.
It only stopped once the build, the tests and the linter were all clean. The
tidy shape above is the agent arguing with the linter and winning, not the
model’s first guess.
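To make that decomposition concrete, here is a minimal sketch of what helpers shaped like accumulate and summaryValues might look like. It is not the generated code: only the numbers field of columnStats appears in the excerpt above, so the count and nulls fields and both function bodies are my own assumptions, reconstructed from what the Python did.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// columnStats holds the per-column state the Python kept in three dicts.
// Only numbers is visible in the generated excerpt; count and nulls are
// assumed names.
type columnStats struct {
	count   int
	nulls   int
	numbers []float64
}

// accumulate folds one CSV record into the running stats. A record that is
// shorter than the header is treated as having an empty value, which matches
// the Python's "value is None" branch.
func accumulate(stats map[string]*columnStats, columns []string, indexByName map[string]int, record []string) {
	for _, c := range columns {
		s := stats[c]
		s.count++

		value := ""
		if i := indexByName[c]; i < len(record) {
			value = strings.TrimSpace(record[i])
		}
		if value == "" {
			s.nulls++
			continue
		}
		// Non-numeric text is counted but contributes no number.
		if n, err := strconv.ParseFloat(value, 64); err == nil {
			s.numbers = append(s.numbers, n)
		}
	}
}

// summaryValues returns min, max and mean formatted to two decimals, or
// dashes when the column held no numeric values.
func summaryValues(nums []float64) (lo, hi, mean string) {
	if len(nums) == 0 {
		return "-", "-", "-"
	}
	minV, maxV, sum := nums[0], nums[0], 0.0
	for _, n := range nums {
		if n < minV {
			minV = n
		}
		if n > maxV {
			maxV = n
		}
		sum += n
	}
	return fmt.Sprintf("%.2f", minV), fmt.Sprintf("%.2f", maxV), fmt.Sprintf("%.2f", sum/float64(len(nums)))
}

func main() {
	columns := []string{"name", "price"}
	indexByName := map[string]int{"name": 0, "price": 1}
	stats := map[string]*columnStats{"name": {}, "price": {}}

	for _, record := range [][]string{{"tea", "2.50"}, {"milk", ""}, {"jam", "4.00"}} {
		accumulate(stats, columns, indexByName, record)
	}
	for _, c := range columns {
		lo, hi, mean := summaryValues(stats[c].numbers)
		// Same column widths as the Python f-string: <20, >8, >8, >12, >12, >12.
		fmt.Printf("%-20s%8d%8d%12s%12s%12s\n", c, stats[c].count, stats[c].nulls, lo, hi, mean)
	}
}
```

Each helper has only a handful of branches, which is the kind of shape a complexity linter such as cyclop is happy with.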
Then it just runs. In the demo I scaffolded the project without the init
feature, so the tool reads sensible defaults and needs no config step, and
csv-stats sample.csv prints real per-column counts, nulls and numeric stats
(with the default features you’d run toolbox init, or pass --config, first).
The full generated command, the three files and nothing else, is here:
csv-stats-command.tar.gz.
From a sentence: a URL health-checker
No script this time. Just a description of the command I wish I had. --prompt
takes a raw string, but a description with any detail to it is easier to read,
and to keep, in a file, so I dropped it in healthcheck-prompt.txt:
Concurrently GET a list of URLs and report each one’s HTTP status and latency.
Flags:
- --timeout: the per-request timeout
- --file: read URLs from a file, one per line
- --json: machine-readable output

Use httptest in the tests so they need no network.
The prompt describes what I want the command to do, including how the flags
should behave. The flags themselves I declare up front with --flag (more on why
that split matters below), and point the generator at the file:
gtb generate command \
  --name healthcheck \
  --short "Check URL health concurrently" \
  --flag "timeout:duration:per-request timeout:false:t:false:5s" \
  --flag "file:string:read URLs from a file, one per line" \
  --flag "json:bool:machine-readable output" \
  --prompt ./healthcheck-prompt.txt
And the flags feed straight in. RunHealthcheck reads the URL file from
opts.File, the deadline from opts.Timeout, and the output format from
opts.Json, then fans the requests out across goroutines, each writing into its
own slot, exactly the way you’d write it by hand:
func RunHealthcheck(ctx context.Context, props *props.Props, opts *HealthcheckOptions, args []string) error {
	urls, err := collectURLs(props.FS, opts.File, args)
	if err != nil {
		return errors.Wrap(err, "failed to collect URLs")
	}

	if len(urls) == 0 {
		return errors.New("no URLs provided; pass URLs as arguments or via --file")
	}

	timeout := opts.Timeout
	if timeout <= 0 {
		timeout = defaultTimeout
	}

	client := &http.Client{Timeout: timeout}
	results := make([]result, len(urls))

	var wg sync.WaitGroup
	for i, u := range urls {
		wg.Add(1)
		go func(idx int, target string) {
			defer wg.Done()
			results[idx] = checkURL(ctx, client, target, timeout)
		}(i, u)
	}
	wg.Wait()

	return reportResults(props, opts.Json, results)
}
I asked for the tests to use httptest so they’d need no network, and they do.
Each case spins up a local server, so go test is hermetic and the agent’s own
test run during repair stays self-contained. It wrote cases for the flags too;
this one drives --json:
func TestRunHealthcheck_JSONOutput(t *testing.T) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusNotFound)
	}))
	defer srv.Close()

	p := newTestProps()
	opts := &healthcheck.HealthcheckOptions{
		Timeout: 5 * time.Second,
		Json:    true,
	}

	err := healthcheck.RunHealthcheck(context.Background(), p, opts, []string{srv.URL})
	if err != nil {
		t.Fatalf("unexpected error: %v", err)
	}
}
Same as before, it builds and runs: point it at a few URLs and it GETs them concurrently, reporting each status and latency. The full generated command is here: healthcheck-command.tar.gz.
What “self-repair” actually means
The agent isn’t a single shot at the model with a hopeful prompt. It’s a loop
with real tools: it reads the project layout, reads the files it needs, and runs
go build, go test and golangci-lint. When something fails, it reads the
relevant code, rewrites it, and runs the checks again. It only declares success
once all three pass with nothing outstanding. The
repair agent’s instructions
are deliberately blunt on that last point: a clean build and passing tests don’t
count as done if the linter still has something to say.
A few flags shape how it runs:
- --max-steps N raises the agent’s reasoning budget. The default is plenty for a command like these two, but a genuinely hairy conversion can run long, and this stops it stopping short.
- --agentless skips the agent entirely and uses the older retry loop, if you’d rather keep the generation cheap and do the polishing yourself.
- --non-interactive withholds the agent’s ability to ask you a question mid-run. It defaults on when the CI environment variable is set, so the thing never blocks a pipeline waiting for an answer that isn’t coming.
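The CI default for --non-interactive is a common pattern, and a minimal sketch of it looks like this. The nonInteractive function is hypothetical, and two details are assumptions on my part: that an explicitly passed flag beats the environment, and that any non-empty CI value counts as set.

```go
package main

import (
	"fmt"
	"os"
)

// nonInteractive resolves the effective setting. An explicitly passed flag
// wins; otherwise a non-empty CI variable switches non-interactive mode on.
// lookup is injected so the function can be tested without touching the
// real environment.
func nonInteractive(flagSet, flagValue bool, lookup func(string) string) bool {
	if flagSet {
		return flagValue
	}
	return lookup("CI") != ""
}

func main() {
	// os.Getenv has the func(string) string shape that lookup expects.
	fmt.Println("non-interactive:", nonInteractive(false, false, os.Getenv))
}
```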
Flags you declare, logic it writes
The --timeout, --file and --json options arrived as real flags on the command, but
not because the prompt mentioned them. Flags are the generator’s job, not the
prompt’s, and that split is deliberate. You declare each one with --flag (or the
interactive wizard), as I did above, and the generator wires it onto the options
struct and into the read-only cmd.go registration, which hands that struct
straight to your Run function. The prompt is left to describe behaviour: what
--timeout should bound, what --file should read, what --json should change.
So the agent, told exactly which option fields exist, wrote its logic against
opts.Timeout, opts.File and opts.Json rather than inventing anything, and
the finished command’s --help lists them with the 5s default and the -t
shorthand I asked for. Leave the --flag declarations off and it still works: the generator
hands the agent an empty options struct, and it keeps those values as locals with
sensible defaults, ready for a flag to be wired in later.
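The split is easier to see in miniature. The sketch below uses the standard library’s flag package as a stand-in for cobra, so it is not what the generated cmd.go contains; bindFlags, parse and run are hypothetical names. The point is only the division of labour: one function owns the flag definitions and fills the options struct, the other sees nothing but the populated struct.

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// HealthcheckOptions mirrors the fields the command's logic reads.
type HealthcheckOptions struct {
	Timeout time.Duration
	File    string
	Json    bool
}

// bindFlags plays the part of the generated registration: it owns the flag
// names, defaults and help text, and writes the parsed values into opts.
func bindFlags(fs *flag.FlagSet, opts *HealthcheckOptions) {
	fs.DurationVar(&opts.Timeout, "timeout", 5*time.Second, "per-request timeout")
	fs.DurationVar(&opts.Timeout, "t", 5*time.Second, "shorthand for --timeout")
	fs.StringVar(&opts.File, "file", "", "read URLs from a file, one per line")
	fs.BoolVar(&opts.Json, "json", false, "machine-readable output")
}

// parse builds the options from raw arguments and returns the positionals.
func parse(argv []string) (*HealthcheckOptions, []string, error) {
	fs := flag.NewFlagSet("healthcheck", flag.ContinueOnError)
	opts := &HealthcheckOptions{}
	bindFlags(fs, opts)

	if err := fs.Parse(argv); err != nil {
		return nil, nil, err
	}
	return opts, fs.Args(), nil
}

// run is the hand-editable side: it never touches flag definitions.
func run(opts *HealthcheckOptions, args []string) string {
	return fmt.Sprintf("timeout=%s file=%q json=%t urls=%d", opts.Timeout, opts.File, opts.Json, len(args))
}

func main() {
	opts, args, err := parse([]string{"-t", "2s", "--json", "https://example.com"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(run(opts, args)) // timeout=2s file="" json=true urls=1
}
```

Because run only depends on the struct, adding or renaming a flag changes bindFlags and the struct, never the logic’s call sites.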
The one thing you don’t do is hand-edit cmd.go: it’s regenerated every time you
add a flag or change the command, so reach for --flag, never the file. When a
generation finishes, the quickest sanity check is the command’s own --help,
which shows the flags it actually exposes.
One thing to keep in mind: the model isn’t deterministic. Run the same prompt twice and you’ll get two slightly different commands. If the first one isn’t quite right, regenerate, or nudge the prompt. Treat the output the way you’d treat a capable colleague’s first PR: read it, run it, and own what you merge.
And is it the best possible code, the best design? Probably not. That depends on the model you can afford to point at it, how much detail you put in the prompt, and a bit of luck on the day. What you can count on is a working starting point: something that builds, has tests, and uses proper Go idioms and the project’s own patterns, instead of a blank file and an afternoon of boilerplate. From there it’s yours to shape.
Where that leaves you
The generator does the boilerplate and has the argument with the linter so you don’t have to. What it can’t do is decide whether the command it built is the command you actually wanted. That part is still yours, which is rather the point. The full docs for both flags live in the AI conversion guide and the command generation reference, and they’re the place to go when you want the flags the prompt didn’t.
