<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>CI/CD on PHP Boy Scout</title><link>https://blog-570662.gitlab.io/tags/ci-cd/</link><description>Recent content in CI/CD on PHP Boy Scout</description><generator>Hugo -- gohugo.io</generator><language>en-gb</language><copyright>Matt Cockayne</copyright><lastBuildDate>Wed, 20 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://blog-570662.gitlab.io/tags/ci-cd/index.xml" rel="self" type="application/rss+xml"/><item><title>Two bugs that taught me the rules</title><link>https://blog-570662.gitlab.io/two-bugs-that-taught-me-the-rules/</link><pubDate>Wed, 20 May 2026 00:00:00 +0000</pubDate><guid>https://blog-570662.gitlab.io/two-bugs-that-taught-me-the-rules/</guid><description>&lt;img src="https://blog-570662.gitlab.io/two-bugs-that-taught-me-the-rules/cover-two-bugs-that-taught-me-the-rules.png" alt="Featured image of post Two bugs that taught me the rules" /&gt;&lt;p&gt;Some bugs are interesting because they&amp;rsquo;re subtle. These two were interesting because they were the exact opposite&amp;hellip; in each case the tool had a hard rule I simply didn&amp;rsquo;t know about, and its error message couldn&amp;rsquo;t be bothered to tell me what that rule was. Both came out of building the infrastructure toolchain, both cost me a good deal more time than they had any right to, and both are the sort of thing that looks blindingly obvious the moment you know it and utterly baffling until you do.&lt;/p&gt;
&lt;p&gt;So here they are, written down, partly to save you the bother and partly so I don&amp;rsquo;t go and forget them myself.&lt;/p&gt;
&lt;h2 id="bug-one-the-rule-less-job-that-skips-your-merge-requests"&gt;Bug one: the rule-less job that skips your merge requests
&lt;/h2&gt;&lt;p&gt;The &lt;code&gt;cicd&lt;/code&gt; gate components, in their first cut, shipped with no &lt;code&gt;rules:&lt;/code&gt; block. They were dead simple jobs: lint, scan, validate. No conditions, because they should just always run. Obviously.&lt;/p&gt;
&lt;p&gt;They ran on branch pipelines. On merge requests, they didn&amp;rsquo;t run at all! The gates that were the entire point of the components were simply absent from the one place you&amp;rsquo;d most want to see them&amp;hellip; the merge request.&lt;/p&gt;
&lt;p&gt;The cause is a GitLab CI rule that&amp;rsquo;s remarkably easy to go years without ever learning: a job with no &lt;code&gt;rules:&lt;/code&gt; block runs only on branch and tag pipelines. It does not run on merge-request pipelines. So &amp;ldquo;no conditions&amp;rdquo; doesn&amp;rsquo;t mean &amp;ldquo;runs everywhere&amp;rdquo; at all. It means &amp;ldquo;runs everywhere except a merge request&amp;rdquo;, which is about the least intuitive default I can think of.&lt;/p&gt;
&lt;p&gt;The fix is faintly absurd, and that&amp;rsquo;s exactly what makes it stick. You add an &lt;em&gt;unconditional&lt;/em&gt; rule: &lt;code&gt;rules: [{ when: on_success }]&lt;/code&gt;. The content of that rule does precisely nothing. It always matches. What actually matters is that the job now &lt;em&gt;has&lt;/em&gt; a &lt;code&gt;rules:&lt;/code&gt; block at all, because merely having one is what makes a job eligible for merge-request pipelines. A rule whose content is meaningless, added solely so the block exists. That&amp;rsquo;s the fix. I&amp;rsquo;ll admit I stared at it for a moment.&lt;/p&gt;
&lt;h2 id="bug-two-the-import-block-that-only-works-at-the-root"&gt;Bug two: the import block that only works at the root
&lt;/h2&gt;&lt;p&gt;The second one came from &lt;code&gt;terraform-aws-security-baseline&lt;/code&gt;. The &lt;code&gt;account-hardening&lt;/code&gt; module needed to adopt a resource that already existed in the account, which is exactly what OpenTofu&amp;rsquo;s &lt;code&gt;import {}&lt;/code&gt; block is for. So an &lt;code&gt;import&lt;/code&gt; block went into the &lt;code&gt;account-hardening&lt;/code&gt; module, right next to the resource it was adopting. The natural home for it, surely.&lt;/p&gt;
&lt;p&gt;OpenTofu disagreed, and rejected it outright. The rule: an &lt;code&gt;import&lt;/code&gt; block is only allowed in the &lt;em&gt;root&lt;/em&gt; module. It can&amp;rsquo;t live inside a child module. A module that wants one of its own resources imported can&amp;rsquo;t declare that import itself&amp;hellip; the import has to be declared up at the root, and the root caller does the adopting.&lt;/p&gt;
&lt;p&gt;The fix was to take the &lt;code&gt;import&lt;/code&gt; block out of the module and document caller-side adoption instead. The module describes the resource, and the root configuration that calls the module is where the &lt;code&gt;import&lt;/code&gt; actually lives.&lt;/p&gt;
&lt;h2 id="the-shape-they-share"&gt;The shape they share
&lt;/h2&gt;&lt;p&gt;Two unrelated bugs, in two completely different tools, and the same shape sitting underneath both of them.&lt;/p&gt;
&lt;p&gt;In each case the tool has a hard structural rule. Where a block is allowed to live. What makes a job eligible for a particular kind of pipeline. And in each case the error told me the tool was unhappy without telling me &lt;em&gt;which&lt;/em&gt; rule I&amp;rsquo;d broken, so the obvious next move (debugging my own logic) was the wrong move entirely. There was nothing wrong with the logic. The thing was simply in a place the tool doesn&amp;rsquo;t allow, or missing a block the tool quietly insists on.&lt;/p&gt;
&lt;p&gt;The lasting lesson here isn&amp;rsquo;t the two specific rules, useful as they are to know. It&amp;rsquo;s the reflex. When something that should obviously work just doesn&amp;rsquo;t, and the error is unhelpful, stop debugging your logic and start suspecting a structural rule about &lt;em&gt;where&lt;/em&gt; something is allowed to be, or &lt;em&gt;whether&lt;/em&gt; a thing is eligible in the first place. GitLab CI and OpenTofu both have a handful of these, and you mostly learn them the hard way, by tripping over them. Knowing the shape of the category at least means the next one costs you an hour instead of a whole afternoon.&lt;/p&gt;
&lt;h2 id="worth-remembering"&gt;Worth remembering
&lt;/h2&gt;&lt;p&gt;Two bugs from building the toolchain, one shape. A GitLab CI job with no &lt;code&gt;rules:&lt;/code&gt; block runs on branches and tags but silently not on merge requests, and the fix is an unconditional &lt;code&gt;rules:&lt;/code&gt; block whose content does nothing and whose mere existence is the entire point. An OpenTofu &lt;code&gt;import&lt;/code&gt; block gets rejected inside a child module, because imports are only legal at the root, so the caller adopts and the module just describes.&lt;/p&gt;
&lt;p&gt;Neither error named the rule it was enforcing, and that&amp;rsquo;s the category to watch for. When sound logic fails against an unhelpful error, suspect a structural rule about where a thing may live or whether it&amp;rsquo;s even eligible&amp;hellip; not a bug in what you actually wrote. It&amp;rsquo;ll save you an afternoon. It certainly cost me a couple.&lt;/p&gt;</description></item><item><title>Reviewed, then applied</title><link>https://blog-570662.gitlab.io/reviewed-then-applied/</link><pubDate>Mon, 18 May 2026 00:00:00 +0000</pubDate><guid>https://blog-570662.gitlab.io/reviewed-then-applied/</guid><description>&lt;img src="https://blog-570662.gitlab.io/reviewed-then-applied/cover-reviewed-then-applied.png" alt="Featured image of post Reviewed, then applied" /&gt;&lt;p&gt;The genuinely dangerous moment in infrastructure-as-code isn&amp;rsquo;t the apply. It&amp;rsquo;s the gap between the plan a human read and approved, and the change that actually runs a moment later. If those two are different computations (and by default they are) then nobody really reviewed the thing that touched your account. The &lt;code&gt;infra&lt;/code&gt; repo closes that gap from both ends.&lt;/p&gt;
&lt;h2 id="the-gap-between-reviewed-and-ran"&gt;The gap between &amp;ldquo;reviewed&amp;rdquo; and &amp;ldquo;ran&amp;rdquo;
&lt;/h2&gt;&lt;p&gt;Here&amp;rsquo;s the moment in infrastructure-as-code where things go wrong.&lt;/p&gt;
&lt;p&gt;Someone opens a merge request. CI runs &lt;code&gt;tofu plan&lt;/code&gt; and the output is there to review: these three resources change, this one is destroyed. A human reads it, decides it&amp;rsquo;s correct, approves, merges. Then &lt;code&gt;apply&lt;/code&gt; runs.&lt;/p&gt;
&lt;p&gt;The trap is in what &lt;code&gt;apply&lt;/code&gt; actually applies. If &lt;code&gt;apply&lt;/code&gt; does its own fresh &lt;code&gt;tofu plan&lt;/code&gt; and then applies &lt;em&gt;that&lt;/em&gt;, the change that runs is not necessarily the change that was reviewed. State can have moved. A provider can have drifted. Someone else can have applied something in between. The reviewed plan and the applied change are two separate computations done at two different moments, and every difference between those moments is a change nobody looked at.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;infra&lt;/code&gt; closes that gap from both ends.&lt;/p&gt;
&lt;h2 id="plan-as-an-artifact"&gt;Plan as an artifact
&lt;/h2&gt;&lt;p&gt;The first end is making the reviewed plan and the applied plan the &lt;em&gt;same object&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;tofu-plan&lt;/code&gt; component runs the plan and saves it. It writes &lt;code&gt;tfplan.cache&lt;/code&gt;, OpenTofu&amp;rsquo;s binary plan file, as a CI artifact. It also writes &lt;code&gt;tfplan.json&lt;/code&gt;, which GitLab renders as a plan widget right in the merge request: the add, change and destroy summary, there to review without leaving the MR.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;tofu-apply&lt;/code&gt; component then does &lt;em&gt;not&lt;/em&gt; re-plan. It applies that saved &lt;code&gt;tfplan.cache&lt;/code&gt;. And OpenTofu itself enforces the safety net: applying a stale plan file, one captured against a state that has since moved, is rejected by the tool. So what reaches the account is provably the plan that was reviewed, or it&amp;rsquo;s nothing at all. There&amp;rsquo;s no third option where something unreviewed slips through.&lt;/p&gt;
&lt;h2 id="applying-is-a-human-decision"&gt;Applying is a human decision
&lt;/h2&gt;&lt;p&gt;The second end is &lt;em&gt;when&lt;/em&gt; apply runs.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;infra&lt;/code&gt; is trunk-based: it dropped the &lt;code&gt;develop&lt;/code&gt; branch and works on &lt;code&gt;main&lt;/code&gt;. But a naive trunk setup auto-applies every push to &lt;code&gt;main&lt;/code&gt;, which means there&amp;rsquo;s no human gate at all, just whatever the last merge happened to contain.&lt;/p&gt;
&lt;p&gt;So the gate is built explicitly. &lt;code&gt;releaser-pleaser&lt;/code&gt; keeps a release merge request open against &lt;code&gt;main&lt;/code&gt;. Ordinary merges to &lt;code&gt;main&lt;/code&gt; run plans but apply nothing. The apply happens only when a person &lt;em&gt;merges the release MR&lt;/em&gt;. Merging it cuts a release tag, and the tag pipeline is what runs &lt;code&gt;tofu-apply&lt;/code&gt;, against the plan banked by the latest &lt;code&gt;main&lt;/code&gt; pipeline.&lt;/p&gt;
&lt;p&gt;The effect is that the act of applying to the account is the deliberate, visible act of merging the release request. Nothing reaches the account because a commit landed. It reaches the account because a person decided a release should go out and merged it. (Which, after the &lt;a class="link" href="https://blog-570662.gitlab.io/why-we-left-github-for-gitlab/" &gt;accidental &lt;code&gt;v2.0.0&lt;/code&gt;&lt;/a&gt; that kicked off the whole GitLab move, is a discipline I&amp;rsquo;d freshly relearned the value of.)&lt;/p&gt;
&lt;h2 id="the-guard-on-the-gate"&gt;The guard on the gate
&lt;/h2&gt;&lt;p&gt;There&amp;rsquo;s one more piece, because a gate is only as good as its precondition.&lt;/p&gt;
&lt;p&gt;A &lt;code&gt;verify-main-plan&lt;/code&gt; job blocks the release MR from being mergeable unless the latest &lt;code&gt;main&lt;/code&gt; pipeline is green. You can&amp;rsquo;t cut a release, and therefore can&amp;rsquo;t apply, on top of a &lt;code&gt;main&lt;/code&gt; whose plan didn&amp;rsquo;t even succeed. The human gate has its own gate: the thing you&amp;rsquo;re about to merge has to be standing on a known-good plan before you&amp;rsquo;re allowed to merge it.&lt;/p&gt;
&lt;h2 id="the-bottom-line"&gt;The bottom line
&lt;/h2&gt;&lt;p&gt;The risk in infrastructure-as-code is the gap between the plan a human reviewed and the change that runs, because a re-plan at apply time is a different computation from the one that was approved.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;infra&lt;/code&gt; closes it twice over. &lt;code&gt;tofu-plan&lt;/code&gt; saves the plan as a &lt;code&gt;tfplan.cache&lt;/code&gt; artifact and renders it as a merge-request widget; &lt;code&gt;tofu-apply&lt;/code&gt; applies that exact artifact, and OpenTofu rejects it outright if the state has moved underneath it. And applying is gated on a human merging a &lt;code&gt;releaser-pleaser&lt;/code&gt; release request, not on a push, with a &lt;code&gt;verify-main-plan&lt;/code&gt; check making sure that request can only be merged on top of a green plan. What gets applied is what was reviewed, when a person decided it should be.&lt;/p&gt;</description></item><item><title>CI you include, not copy</title><link>https://blog-570662.gitlab.io/ci-you-include-not-copy/</link><pubDate>Sat, 16 May 2026 00:00:00 +0000</pubDate><guid>https://blog-570662.gitlab.io/ci-you-include-not-copy/</guid><description>&lt;img src="https://blog-570662.gitlab.io/ci-you-include-not-copy/cover-ci-you-include-not-copy.png" alt="Featured image of post CI you include, not copy" /&gt;&lt;p&gt;Every infrastructure repo runs the same CI: lint the OpenTofu, scan it, validate it, plan, apply. The first repo, you write that &lt;code&gt;.gitlab-ci.yml&lt;/code&gt; by hand. The second, you copy it. By the third, you&amp;rsquo;ve got three copies of the same pipeline quietly drifting apart, which is the exact problem you&amp;rsquo;d never tolerate in application code. The &lt;code&gt;cicd&lt;/code&gt; repo is the fix, and it&amp;rsquo;s just the library-first instinct pointed at the pipeline.&lt;/p&gt;
&lt;h2 id="the-gitlab-ciyml-you-keep-copying"&gt;The &lt;code&gt;.gitlab-ci.yml&lt;/code&gt; you keep copying
&lt;/h2&gt;&lt;p&gt;The infrastructure repos in this series all run the same CI gate jobs: format and validate the OpenTofu, lint it, scan it for security issues and secrets, and on the deploy side, plan and apply.&lt;/p&gt;
&lt;p&gt;The first repo, you write that &lt;code&gt;.gitlab-ci.yml&lt;/code&gt; by hand. The second repo needs the same jobs, so you copy it. The third repo, you copy it again. Now there are three copies of the same pipeline, and they do what copies always do. They drift. A fix you make in one repo&amp;rsquo;s CI doesn&amp;rsquo;t reach the other two. A tightened scan rule lands in the repo you were working in and nowhere else. It&amp;rsquo;s the copy-paste problem, exactly as it shows up in application code, just written in YAML and therefore that bit easier to pretend isn&amp;rsquo;t code.&lt;/p&gt;
&lt;h2 id="gitlab-has-a-feature-for-exactly-this"&gt;GitLab has a feature for exactly this
&lt;/h2&gt;&lt;p&gt;GitLab CI/CD Components are the answer to that problem. A component is a reusable, versioned piece of pipeline that you publish, and other projects pull in with an &lt;code&gt;include:&lt;/code&gt; pinned to a version:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;include&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="nt"&gt;component&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;gitlab.com/phpboyscout/cicd/tofu-lint@v0.5.0&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;That&amp;rsquo;s a library import, for pipeline. The component has a defined interface, a version, and a home in GitLab&amp;rsquo;s CI/CD Catalog. A consuming repo includes it instead of carrying its own copy, and when the component improves, the consumer moves a version pin rather than re-copying YAML.&lt;/p&gt;
&lt;h2 id="why-a-monorepo-of-components"&gt;Why a monorepo of components
&lt;/h2&gt;&lt;p&gt;The &lt;code&gt;cicd&lt;/code&gt; repo holds all of the components together: &lt;code&gt;tofu-lint&lt;/code&gt;, &lt;code&gt;tofu-security&lt;/code&gt;, &lt;code&gt;tofu-validate&lt;/code&gt;, &lt;code&gt;tofu-plan&lt;/code&gt;, &lt;code&gt;tofu-apply&lt;/code&gt;, and more. One project, not one project per component.&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s a deliberate call, and the reason is how GitLab versions things. A version is a tag, and a tag belongs to a &lt;em&gt;project&lt;/em&gt;. A component&amp;rsquo;s version is its project&amp;rsquo;s tag. So a monorepo of components, versioned together as one tag stream, is the natural unit: a consumer pins &lt;code&gt;@v0.5.0&lt;/code&gt; and gets a known-good &lt;em&gt;set&lt;/em&gt; of components that were tested together, rather than juggling a separate version for each one.&lt;/p&gt;
&lt;h2 id="authoring-discipline"&gt;Authoring discipline
&lt;/h2&gt;&lt;p&gt;A component is a file under &lt;code&gt;templates/&lt;/code&gt;, and it opens with a &lt;code&gt;spec: inputs:&lt;/code&gt; block: the typed inputs, their defaults, the component&amp;rsquo;s public interface.&lt;/p&gt;
&lt;p&gt;The discipline that keeps the library usable is that a component must be consumer-agnostic. It never hardcodes a token, and it never names a particular consumer&amp;rsquo;s variable. Inputs have sensible defaults, and a consuming repo overrides them. A component that reaches out and assumes something about the repo including it is a component that works in one repo and surprises the next. An authoring guide in the repo keeps that consistent across everyone who adds a component.&lt;/p&gt;
&lt;h2 id="the-self-test-you-cannot-fully-write"&gt;The self-test you cannot fully write
&lt;/h2&gt;&lt;p&gt;The &lt;code&gt;cicd&lt;/code&gt; repo tests its own components with a self-test pipeline. It&amp;rsquo;s worth knowing where that self-test stops.&lt;/p&gt;
&lt;p&gt;When a repo tests its own components by running them in child pipelines, GitLab masks &lt;code&gt;$CI_PIPELINE_SOURCE&lt;/code&gt; as &lt;code&gt;parent_pipeline&lt;/code&gt;. A component&amp;rsquo;s &lt;code&gt;rules:&lt;/code&gt;, which often branch on the pipeline source to behave differently for a merge request than for a branch or a tag, therefore can&amp;rsquo;t be exercised honestly by the self-test: the source they&amp;rsquo;d branch on has been flattened. The self-test covers what it can, and the component &lt;code&gt;rules:&lt;/code&gt; are, in the end, validated by real consumers using them for real. That&amp;rsquo;s a genuine limit, and naming it is better than pretending the self-test proves more than it does. (It&amp;rsquo;s also, not coincidentally, the exact &lt;code&gt;rules:&lt;/code&gt; quirk that bit me in &lt;a class="link" href="https://blog-570662.gitlab.io/two-bugs-that-taught-me-the-rules/" &gt;one of the two bugs&lt;/a&gt; I closed the series with.)&lt;/p&gt;
&lt;h2 id="the-same-instinct-again"&gt;The same instinct, again
&lt;/h2&gt;&lt;p&gt;This blog keeps circling the same instinct. go-tool-base exists because the same CLI scaffolding kept getting rewritten, so it was &lt;a class="link" href="https://blog-570662.gitlab.io/introducing-go-tool-base/" &gt;extracted into a library&lt;/a&gt;. &lt;code&gt;cicd&lt;/code&gt; is that instinct pointed at the pipeline: the same gate jobs kept getting copied between repos, so they were extracted into a versioned, included library.&lt;/p&gt;
&lt;p&gt;Stop copy-pasting. Publish, version, include. It&amp;rsquo;s true for CLI code, and it turns out to be just as true for the YAML that builds and ships it.&lt;/p&gt;
&lt;h2 id="the-gist"&gt;The gist
&lt;/h2&gt;&lt;p&gt;Every infrastructure repo needs the same CI, and copying the &lt;code&gt;.gitlab-ci.yml&lt;/code&gt; between them produces copies that drift apart. GitLab CI/CD Components fix it: reusable, versioned pipeline that a repo &lt;code&gt;include:&lt;/code&gt;s and pins, instead of carrying its own copy.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;cicd&lt;/code&gt; is a monorepo of those components, versioned together as one tag stream, because GitLab tags a project and a component&amp;rsquo;s version is its project&amp;rsquo;s tag. Components are authored consumer-agnostic, with typed &lt;code&gt;spec: inputs:&lt;/code&gt; and no hardcoded assumptions, and their &lt;code&gt;rules:&lt;/code&gt; are validated by real use because the self-test can&amp;rsquo;t see the pipeline source. It&amp;rsquo;s the library-first instinct, applied to CI: publish it once, include it everywhere, fix it in one place.&lt;/p&gt;</description></item><item><title>One image for the whole toolchain</title><link>https://blog-570662.gitlab.io/one-image-for-the-whole-toolchain/</link><pubDate>Fri, 15 May 2026 00:00:00 +0000</pubDate><guid>https://blog-570662.gitlab.io/one-image-for-the-whole-toolchain/</guid><description>&lt;img src="https://blog-570662.gitlab.io/one-image-for-the-whole-toolchain/cover-one-image-for-the-whole-toolchain.png" alt="Featured image of post One image for the whole toolchain" /&gt;&lt;p&gt;Every CI gate job across the infrastructure repos reaches for the same pile of tools: OpenTofu, tflint, trivy, checkov, gitleaks, terraform-docs, the AWS CLI. Installing that pile per job is both slow and quietly dangerous, because nothing pins it consistently. &lt;code&gt;infra-tools&lt;/code&gt; is the obvious fix (one image, one source of truth for versions), but two of its build decisions are less obvious and worth a look: it publishes with &lt;code&gt;crane&lt;/code&gt; instead of a second build, and it deliberately lets its own vulnerability scan fail.&lt;/p&gt;
&lt;h2 id="the-same-pile-of-tools-in-every-repo"&gt;The same pile of tools, in every repo
&lt;/h2&gt;&lt;p&gt;Every infrastructure repo in this series runs the same CI gate jobs: format and validate the OpenTofu, lint it, scan it for security problems and secrets, check the docs. Those jobs need a specific set of tools, and it&amp;rsquo;s the same set in every repo.&lt;/p&gt;
&lt;p&gt;Install them per job and you pay twice. You pay in time, because every pipeline downloads and installs the whole set again. And you pay in drift, because unless every repo pins every tool identically, the repos slowly diverge on which version of trivy or tflint they actually run, and a check that passes in one repo fails in another for no reason anyone can see.&lt;/p&gt;
&lt;h2 id="one-image-one-source-of-truth"&gt;One image, one source of truth
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;infra-tools&lt;/code&gt; is the answer: a single Debian-based container image with the whole toolchain baked in. Every CI job in every repo uses it with one &lt;code&gt;image:&lt;/code&gt; line.&lt;/p&gt;
&lt;p&gt;The real value isn&amp;rsquo;t the convenience. It&amp;rsquo;s that the image is the &lt;em&gt;one place&lt;/em&gt; tool versions are pinned. The Go-based tools are pinned in a &lt;code&gt;mise.toml&lt;/code&gt;. &lt;code&gt;checkov&lt;/code&gt;, which has no mise plugin, is pinned in a requirements file installed with pipx. The AWS CLI is pinned by a build argument. Three mechanisms, because the tools come from three kinds of source, but one image, and every pin wired to Renovate so a version bump arrives as a reviewable pull request. There&amp;rsquo;s exactly one answer to &amp;ldquo;what version of trivy does the toolchain use&amp;rdquo;, and it lives here.&lt;/p&gt;
&lt;h2 id="publishing-with-crane-not-a-second-build"&gt;Publishing with crane, not a second build
&lt;/h2&gt;&lt;p&gt;A build-pipeline detail that took a real bug to discover.&lt;/p&gt;
&lt;p&gt;The pipeline builds the image with kaniko, which builds images without a privileged Docker daemon, something that matters a great deal on shared CI runners. Then it scans the image, then it publishes it.&lt;/p&gt;
&lt;p&gt;The obvious way to write the publish stage is &amp;ldquo;build the image and push it&amp;rdquo;. But kaniko has no mode for &amp;ldquo;just push this tarball I already built&amp;rdquo;. A second kaniko invocation re-executes the entire Dockerfile from the top, including a second &lt;code&gt;mise install&lt;/code&gt;, which makes a fresh round of calls to GitHub&amp;rsquo;s API to fetch tools. GitHub&amp;rsquo;s anonymous API limit is low and shared by IP, so on a CI runner that second install reliably trips a &lt;code&gt;403&lt;/code&gt; rate-limit. (Yes, another &lt;code&gt;403&lt;/code&gt;. They do get everywhere.)&lt;/p&gt;
&lt;p&gt;So the publish stage doesn&amp;rsquo;t rebuild. It uses &lt;code&gt;crane&lt;/code&gt; to push the exact image tarball the build stage already produced. The image is built once. And because the published bytes are the same bytes the scan stage scanned, there&amp;rsquo;s no gap between &amp;ldquo;the image we checked&amp;rdquo; and &amp;ldquo;the image we shipped&amp;rdquo;.&lt;/p&gt;
&lt;h2 id="soft-failing-the-scanner-on-purpose"&gt;Soft-failing the scanner on purpose
&lt;/h2&gt;&lt;p&gt;The decision that looks wrong until you see the reasoning: the pipeline scans the image with trivy, and trivy is allowed to fail without failing the pipeline.&lt;/p&gt;
&lt;p&gt;A vulnerability scanner that doesn&amp;rsquo;t gate the build sounds like a scanner switched off. It isn&amp;rsquo;t. It&amp;rsquo;s a scanner pointed at something it can&amp;rsquo;t helpfully gate.&lt;/p&gt;
&lt;p&gt;The tools in the image are prebuilt Go binaries. trivy inspects them, reads the version of the Go runtime each was compiled with, and reports every known CVE in that Go runtime. Those findings are real, but they aren&amp;rsquo;t &lt;em&gt;mine&lt;/em&gt; to fix. The only fix is the upstream tool rebuilding itself against a patched Go. With seven such tools in the image, at any given moment one of them is usually a little behind on its Go version.&lt;/p&gt;
&lt;p&gt;A hard gate would mean the image becomes unpublishable whenever any single upstream lags, over a CVE in code I don&amp;rsquo;t own and can&amp;rsquo;t patch. That&amp;rsquo;s not a security control; it&amp;rsquo;s a way to be unable to ship. So the scan is &lt;code&gt;allow_failure&lt;/code&gt;. The findings stay fully visible, and the residual count is genuinely useful as a &lt;em&gt;metric&lt;/em&gt; for how far behind upstream the toolchain has drifted. It just doesn&amp;rsquo;t block shipping an image whose only &amp;ldquo;vulnerabilities&amp;rdquo; are other people&amp;rsquo;s build timelines.&lt;/p&gt;
&lt;h2 id="what-it-comes-down-to"&gt;What it comes down to
&lt;/h2&gt;&lt;p&gt;The infrastructure repos all run the same CI gate jobs, needing the same tools, so &lt;code&gt;infra-tools&lt;/code&gt; bakes the whole toolchain into one image and pins every version in one place, wired to Renovate.&lt;/p&gt;
&lt;p&gt;Two build choices are worth copying. The publish stage uses &lt;code&gt;crane&lt;/code&gt; to push the already-built, already-scanned tarball, because a second kaniko build would re-run &lt;code&gt;mise install&lt;/code&gt; and hit GitHub&amp;rsquo;s anonymous rate limit, and because pushing the scanned bytes means shipping exactly what was checked. And the trivy scan is deliberately &lt;code&gt;allow_failure&lt;/code&gt;, because it reports Go-runtime CVEs in prebuilt upstream binaries that no change to this repo can fix, so a hard gate would only make the image unshippable over someone else&amp;rsquo;s lag.&lt;/p&gt;</description></item><item><title>A 403 you can't fix in IAM</title><link>https://blog-570662.gitlab.io/a-403-you-cant-fix-in-iam/</link><pubDate>Thu, 14 May 2026 00:00:00 +0000</pubDate><guid>https://blog-570662.gitlab.io/a-403-you-cant-fix-in-iam/</guid><description>&lt;img src="https://blog-570662.gitlab.io/a-403-you-cant-fix-in-iam/cover-a-403-you-cant-fix-in-iam.png" alt="Featured image of post A 403 you can't fix in IAM" /&gt;&lt;p&gt;&lt;a class="link" href="https://blog-570662.gitlab.io/no-access-keys-in-ci/" &gt;The OIDC post&lt;/a&gt; explained the handshake that lets a GitLab pipeline deploy to AWS with no stored key. This is the story of the first time I got it wrong, and spent an afternoon fixing the wrong thing. The error was a flat 403 from AWS, and the maddening part is that no amount of editing the IAM policy was ever going to fix it.&lt;/p&gt;
&lt;h2 id="a-403-on-the-first-real-run"&gt;A 403 on the first real run
&lt;/h2&gt;&lt;p&gt;The OIDC post covered the handshake: GitLab CI mints a signed token, AWS exchanges it for short-lived credentials against a role whose trust policy names the pipeline. During the GitLab migration I wired exactly that up for the &lt;code&gt;infra&lt;/code&gt; repo, including a trust policy condition meant to let merge-request pipelines run a plan.&lt;/p&gt;
&lt;p&gt;The first merge request that should have triggered &lt;code&gt;tofu-plan&lt;/code&gt; didn&amp;rsquo;t run it. The job failed, and the error from AWS was a flat &lt;code&gt;AccessDenied&lt;/code&gt;. A 403.&lt;/p&gt;
&lt;h2 id="the-instinct-and-why-it-wastes-an-afternoon"&gt;The instinct, and why it wastes an afternoon
&lt;/h2&gt;&lt;p&gt;The instinct on an IAM 403 is immediate and almost always right: the policy&amp;rsquo;s wrong, so go and edit the policy. Tighten the condition. Loosen the condition. Check the wildcard. Re-read the &lt;code&gt;sub&lt;/code&gt; pattern character by character.&lt;/p&gt;
&lt;p&gt;All of that was wasted, and it was wasted for a reason that took me far too long to see. The trust policy wasn&amp;rsquo;t matching the &lt;em&gt;wrong&lt;/em&gt; value. It was matching a value that &lt;em&gt;does not exist&lt;/em&gt;. No amount of editing a condition makes it match a thing that&amp;rsquo;s never present.&lt;/p&gt;
&lt;h2 id="what-is-actually-in-the-token"&gt;What is actually in the token
&lt;/h2&gt;&lt;p&gt;GitLab&amp;rsquo;s OIDC token has a &lt;code&gt;sub&lt;/code&gt; claim that encodes the pipeline&amp;rsquo;s context, and part of that encoding is a &lt;code&gt;ref_type&lt;/code&gt;. I&amp;rsquo;d assumed &lt;code&gt;ref_type&lt;/code&gt; could be &lt;code&gt;branch&lt;/code&gt;, &lt;code&gt;tag&lt;/code&gt;, or &lt;code&gt;mr&lt;/code&gt;, because a pipeline can certainly be a branch pipeline, a tag pipeline, or a merge-request pipeline. So the trust policy, for the plan job, matched a &lt;code&gt;sub&lt;/code&gt; containing &lt;code&gt;ref_type:mr&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;That assumption was wrong. GitLab&amp;rsquo;s &lt;code&gt;ref_type&lt;/code&gt; is &lt;code&gt;branch&lt;/code&gt; or &lt;code&gt;tag&lt;/code&gt;. That&amp;rsquo;s the entire set. There is no &lt;code&gt;mr&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;A merge-request pipeline doesn&amp;rsquo;t run against a merge-request ref. It runs against the source &lt;em&gt;branch&lt;/em&gt;. So its token&amp;rsquo;s &lt;code&gt;sub&lt;/code&gt; carries &lt;code&gt;ref_type:branch&lt;/code&gt;, like any other branch pipeline. The trust policy condition asked for &lt;code&gt;ref_type:mr&lt;/code&gt;, GitLab never puts &lt;code&gt;mr&lt;/code&gt; in a token, the condition was therefore never true, and every merge-request pipeline got a 403. Forever, until the policy stopped asking for a claim that isn&amp;rsquo;t real.&lt;/p&gt;
&lt;h2 id="the-fix-and-the-lesson-worth-more-than-the-fix"&gt;The fix, and the lesson worth more than the fix
&lt;/h2&gt;&lt;p&gt;The fix is small once it&amp;rsquo;s visible: match &lt;code&gt;ref_type:branch&lt;/code&gt; and narrow it down by branch name or project path instead. An afternoon of policy edits, and the actual change is one word.&lt;/p&gt;
&lt;p&gt;The lesson is the part worth keeping. When an OIDC trust fails, the useful question is never &amp;ldquo;is my policy clever enough&amp;rdquo;. It&amp;rsquo;s &amp;ldquo;what&amp;rsquo;s &lt;em&gt;actually in the token&lt;/em&gt;&amp;rdquo;. An OIDC trust policy can only ever match the claims the identity provider genuinely asserts, and the gap between what a provider asserts and what you &lt;em&gt;assumed&lt;/em&gt; it asserts is precisely where this class of bug lives.&lt;/p&gt;
&lt;p&gt;So the move, when an OIDC handshake 403s, is to get hold of a real token and decode it. Look at the actual &lt;code&gt;sub&lt;/code&gt;, the actual claims, the actual values. Match what&amp;rsquo;s there. A 403 that survives every sensible edit to the policy is usually not a policy that&amp;rsquo;s too loose or too strict. It&amp;rsquo;s a policy matching a claim that was never going to be in the token.&lt;/p&gt;
&lt;h2 id="the-habit-it-left-behind"&gt;The habit it left behind
&lt;/h2&gt;&lt;p&gt;I wired an OIDC trust policy to let merge-request pipelines plan, by matching a &lt;code&gt;sub&lt;/code&gt; claim with &lt;code&gt;ref_type:mr&lt;/code&gt;. The first real merge request got a 403, and no edit to the policy fixed it, because GitLab&amp;rsquo;s &lt;code&gt;ref_type&lt;/code&gt; is only ever &lt;code&gt;branch&lt;/code&gt; or &lt;code&gt;tag&lt;/code&gt;. A merge-request pipeline runs on a branch ref, so the &lt;code&gt;mr&lt;/code&gt; value the policy demanded was never in any token.&lt;/p&gt;
&lt;p&gt;The fix was one word. The habit it left behind is the valuable bit: when an OIDC trust fails, stop editing the policy and go and read a real token. A trust policy can only match what the provider actually asserts, and &amp;ldquo;what I assumed it asserts&amp;rdquo; is where the 403 was hiding the whole time. (If this shape of bug feels familiar by the end of the series, that&amp;rsquo;s not an accident: I &lt;a class="link" href="https://blog-570662.gitlab.io/two-bugs-that-taught-me-the-rules/" &gt;come back to it&lt;/a&gt; with two more from exactly the same family.)&lt;/p&gt;</description></item><item><title>Why go-tool-base left GitHub for GitLab</title><link>https://blog-570662.gitlab.io/why-we-left-github-for-gitlab/</link><pubDate>Mon, 11 May 2026 00:00:00 +0000</pubDate><guid>https://blog-570662.gitlab.io/why-we-left-github-for-gitlab/</guid><description>&lt;img src="https://blog-570662.gitlab.io/why-we-left-github-for-gitlab/cover-why-we-left-github-for-gitlab.png" alt="Featured image of post Why go-tool-base left GitHub for GitLab" /&gt;&lt;p&gt;A botched version bump made me stop and actually look at where go-tool-base lived, and I didn&amp;rsquo;t much like what I saw. GitHub had spent months quietly falling over, and when Mitchell Hashimoto (GitHub user #1299, no less) publicly walked Ghostty off the platform, it stopped feeling like just my problem. I&amp;rsquo;ve been a GitLab fan for years, so the move was less a leap and more an overdue nudge. This is the &lt;em&gt;why&lt;/em&gt;, not the &lt;em&gt;how&lt;/em&gt;.&lt;/p&gt;
&lt;h2 id="it-started-with-a-wrong-number"&gt;It started with a wrong number
&lt;/h2&gt;&lt;p&gt;Every migration has a trigger, and mine was embarrassingly small. A commit landed on &lt;code&gt;main&lt;/code&gt; carrying a &lt;code&gt;BREAKING CHANGE:&lt;/code&gt; footer it didn&amp;rsquo;t really deserve. Semantic-release did exactly what it&amp;rsquo;s told to do with that footer: it cut a major version. go-tool-base lurched from the v1 line straight to v2.0.0, and a chain of things that keyed off the version went sideways with it.&lt;/p&gt;
&lt;p&gt;It was fixable. It wasn&amp;rsquo;t a disaster. But it was the kind of small, stupid breakage that makes you stop and actually &lt;em&gt;look&lt;/em&gt; at your setup instead of just patching it and moving on. And when I looked, the version bump wasn&amp;rsquo;t the thing that bothered me. It was everything around it.&lt;/p&gt;
&lt;h2 id="the-platform-had-been-quietly-failing"&gt;The platform had been quietly failing
&lt;/h2&gt;&lt;p&gt;I&amp;rsquo;d been losing time to GitHub for months. Not dramatically. No single outage you&amp;rsquo;d write home about, just a steady drip of Actions queues that wouldn&amp;rsquo;t drain, pull requests that wouldn&amp;rsquo;t merge, the occasional morning where the thing simply wasn&amp;rsquo;t there. You absorb it. You re-run the job. You make a coffee and try again. You tell yourself it&amp;rsquo;s a blip.&lt;/p&gt;
&lt;p&gt;The trouble with a steady drip is that you stop counting it. It becomes weather.&lt;/p&gt;
&lt;h2 id="the-canary-left-the-mine"&gt;The canary left the mine
&lt;/h2&gt;&lt;p&gt;Then, in late April, &lt;a class="link" href="https://mitchellh.com/" target="_blank" rel="noopener"
 &gt;Mitchell Hashimoto&lt;/a&gt; (co-founder of HashiCorp, creator of Vagrant, Terraform and the Ghostty terminal) published &lt;a class="link" href="https://mitchellh.com/writing/ghostty-leaving-github" target="_blank" rel="noopener"
 &gt;&lt;em&gt;Ghostty Is Leaving GitHub&lt;/em&gt;&lt;/a&gt;, and &lt;em&gt;The Register&lt;/em&gt; &lt;a class="link" href="https://www.theregister.com/software/2026/04/29/mitchell-hashimoto-says-github-no-longer-for-serious-work/5227505" target="_blank" rel="noopener"
 &gt;picked it up&lt;/a&gt; a day later under the headline &amp;ldquo;GitHub &amp;rsquo;no longer a place for serious work&amp;rsquo;&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;This is not a man with a casual relationship to GitHub. He&amp;rsquo;s, by his own account, user #1299, joined February 2008. He called it &amp;ldquo;the place that has made me the most happy&amp;rdquo;. And he still wrote this:&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;This is no longer a place for serious work if it just blocks you out for hours per day, every day.&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;p&gt;The detail that landed hardest for me wasn&amp;rsquo;t a quote, it was a habit. He&amp;rsquo;d kept a journal for a month, marking an &amp;ldquo;X&amp;rdquo; on every day a GitHub outage had cost him working time. &lt;em&gt;Almost every day had an X.&lt;/em&gt; Reading that, I realised I&amp;rsquo;d been having the same month. I&amp;rsquo;d just never been disciplined enough to write it down. He&amp;rsquo;d turned my vague &amp;ldquo;it&amp;rsquo;s been flaky lately&amp;rdquo; into a row of crosses on a calendar.&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;I want to ship software and it doesn&amp;rsquo;t want me to ship software.&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;p&gt;When the person who&amp;rsquo;s been on the platform for eighteen years and &lt;em&gt;loves&lt;/em&gt; it says that out loud, it stops being your private grumble. It&amp;rsquo;s the canary, and the canary has stopped singing.&lt;/p&gt;
&lt;h2 id="why-gitlab-and-not-just-somewhere-else"&gt;Why GitLab, and not just &amp;ldquo;somewhere else&amp;rdquo;
&lt;/h2&gt;&lt;p&gt;Being annoyed at GitHub is a reason to leave. It is not, on its own, a reason to pick a destination. The destination has to be a positive choice.&lt;/p&gt;
&lt;p&gt;For me GitLab was an easy one, because I&amp;rsquo;ve been a fan for years. Long enough, in fact, to have &lt;em&gt;also&lt;/em&gt; been a reliable grumbler about their pricing tiers, which is how you know it&amp;rsquo;s a real relationship and not a honeymoon. What I&amp;rsquo;ve always rated is the model: GitLab treats source hosting, CI/CD, the package registry, releases and Pages as &lt;em&gt;one integrated product&lt;/em&gt;, not a marketplace of bolted-on parts you assemble yourself.&lt;/p&gt;
&lt;p&gt;That integration is the actual prize. On the old setup, &amp;ldquo;CI&amp;rdquo; meant a folder of separate GitHub Actions workflow files, each pinned, each its own little world. On GitLab it&amp;rsquo;s a single &lt;code&gt;.gitlab-ci.yml&lt;/code&gt; pipeline with proper stages (lint, test, security, docs, release) and the release stage talks to the built-in package registry and Pages without me wiring up a single external credential. The CI job that builds the project can authenticate to the things the project needs &lt;em&gt;because they&amp;rsquo;re the same platform&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s a second-order benefit too. A migration is a rare licence to fix things you&amp;rsquo;d never otherwise touch. Moving gave me the cover to reset go-tool-base&amp;rsquo;s versioning cleanly (back to a sensible &lt;code&gt;v0.x&lt;/code&gt; line, the accidental &lt;code&gt;v2.0.0&lt;/code&gt; left behind as a cautionary tale) and to move the module path to its new home in one deliberate change rather than a thousand apologetic ones.&lt;/p&gt;
&lt;h2 id="what-im-not-going-to-claim"&gt;What I&amp;rsquo;m not going to claim
&lt;/h2&gt;&lt;p&gt;I&amp;rsquo;m not going to tell you GitHub is finished, or that GitLab never has a bad day, because it does, everyone does. This isn&amp;rsquo;t a teardown. GitHub gave go-tool-base a perfectly good home for its first year, and the archived mirror is still sitting there, read-only, pointing anyone who finds it at the new place.&lt;/p&gt;
&lt;p&gt;What changed is simpler than a grand verdict. The friction crossed a line, someone I respect said the quiet part loudly enough that I couldn&amp;rsquo;t keep filing it under &amp;ldquo;weather&amp;rdquo;, and the place I&amp;rsquo;d have moved to anyway was sitting right there with a better model. Sometimes the prudent move and the move you secretly wanted turn out to be the same move, and you just need a wrong version number to give you permission.&lt;/p&gt;
&lt;h2 id="boiling-it-down"&gt;Boiling it down
&lt;/h2&gt;&lt;p&gt;go-tool-base moved from GitHub to GitLab in May 2026. The proximate cause was a self-inflicted version-bump mess; the real cause was months of GitHub unreliability that I&amp;rsquo;d stopped consciously noticing until Mitchell Hashimoto&amp;rsquo;s very public departure named it for me. GitLab was a positive pick, not just an escape hatch: its integrated CI/CD, registry, releases and Pages are one product rather than a kit, and that integration is genuinely worth having. The migration also bought a clean versioning restart as a bonus.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;ve been absorbing a steady drip of friction and telling yourself it&amp;rsquo;s normal: try the calendar trick. Mark the X&amp;rsquo;s for a month. The page will tell you something you already half-know.&lt;/p&gt;</description></item><item><title>No access keys in CI</title><link>https://blog-570662.gitlab.io/no-access-keys-in-ci/</link><pubDate>Fri, 08 May 2026 00:00:00 +0000</pubDate><guid>https://blog-570662.gitlab.io/no-access-keys-in-ci/</guid><description>&lt;img src="https://blog-570662.gitlab.io/no-access-keys-in-ci/cover-no-access-keys-in-ci.png" alt="Featured image of post No access keys in CI" /&gt;&lt;p&gt;A long-lived AWS access key, sitting in a CI system, is just about the single credential I&amp;rsquo;d most like to be rid of. It&amp;rsquo;s powerful, it never expires unless someone remembers to rotate it (nobody remembers to rotate it), and it lives in one of the most attractive targets in the whole supply chain. For infrastructure that&amp;rsquo;s eventually going to hold a release-signing key, it&amp;rsquo;s exactly the wrong place to start. So the &lt;code&gt;phpboyscout&lt;/code&gt; infrastructure has no AWS access key in CI at all. None.&lt;/p&gt;
&lt;h2 id="the-access-key-you-dont-want"&gt;The access key you don&amp;rsquo;t want
&lt;/h2&gt;&lt;p&gt;A CI pipeline that runs &lt;code&gt;tofu apply&lt;/code&gt; against AWS needs AWS credentials. The traditional way to give it some is an IAM user with an access key pair, pasted into the CI system as a masked variable.&lt;/p&gt;
&lt;p&gt;Look at what that key is. It&amp;rsquo;s long-lived: it works until someone remembers to rotate it, and rotating it is a chore, so mostly nobody does. It&amp;rsquo;s powerful: it can apply infrastructure, so it can do nearly anything. And it&amp;rsquo;s sitting in a CI system, which is one of the most attractive targets in your whole supply chain. You&amp;rsquo;ve taken your highest-value credential and stored a permanent copy of it in a place built for running automated jobs.&lt;/p&gt;
&lt;p&gt;For infrastructure that&amp;rsquo;s going to hold a release-signing key, that&amp;rsquo;s precisely the wrong starting point. So the &lt;code&gt;phpboyscout&lt;/code&gt; infrastructure has no AWS access key in CI at all. Not a well-guarded one. None.&lt;/p&gt;
&lt;h2 id="federation-instead-of-a-stored-secret"&gt;Federation instead of a stored secret
&lt;/h2&gt;&lt;p&gt;The replacement is OIDC federation, and the shape of it is worth walking through, because it&amp;rsquo;s genuinely different from &amp;ldquo;a secret, but better&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;A modern CI platform can mint an OIDC token. GitLab does this with an &lt;code&gt;id_tokens:&lt;/code&gt; block: at job time, GitLab issues a short-lived JSON Web Token, signed by GitLab, that asserts a set of facts. This is project X. This is pipeline Y. This is running on ref Z, of this type.&lt;/p&gt;
&lt;p&gt;AWS can consume that. The &lt;code&gt;sts:AssumeRoleWithWebIdentity&lt;/code&gt; call takes such a token and, if it satisfies an IAM role&amp;rsquo;s trust policy, returns short-lived AWS credentials for that role. The trust policy is where the control lives: it names GitLab as a trusted token issuer, and it constrains the token&amp;rsquo;s &lt;code&gt;sub&lt;/code&gt; claim so that only the specific project, and the specific refs, you intend can assume the role.&lt;/p&gt;
&lt;p&gt;Put it together: the pipeline asks GitLab for a token, hands it to AWS, and gets back credentials that last about an hour and are scoped to one role. Nothing long-lived is stored anywhere. The credential exists only for the job that needs it, and it can&amp;rsquo;t be stolen from a CI variable store, because it was never in one.&lt;/p&gt;
&lt;h2 id="two-halves-of-one-handshake"&gt;Two halves of one handshake
&lt;/h2&gt;&lt;p&gt;That handshake is built by two of the repos in this series, each owning one side.&lt;/p&gt;
&lt;p&gt;&lt;a class="link" href="https://blog-570662.gitlab.io/the-bootstrap-that-does-almost-nothing/" &gt;&lt;code&gt;terraform-aws-bootstrap&lt;/code&gt;&lt;/a&gt; builds the AWS half, in its &lt;code&gt;automation-iam&lt;/code&gt; module: it registers GitLab as an OIDC identity provider in the account, and it creates the automation role with the trust policy that decides which pipelines may assume it.&lt;/p&gt;
&lt;p&gt;The CI components build the consuming half: the &lt;code&gt;id_tokens:&lt;/code&gt; block that asks GitLab for the JWT, and then simply letting the AWS provider&amp;rsquo;s own credential chain perform the exchange. The pipeline doesn&amp;rsquo;t call &lt;code&gt;sts&lt;/code&gt; by hand. It presents the token; the SDK does the rest.&lt;/p&gt;
&lt;h2 id="the-gotcha-dont-set-a-profile"&gt;The gotcha: don&amp;rsquo;t set a profile
&lt;/h2&gt;&lt;p&gt;There&amp;rsquo;s one quiet way to break this, and a stack can look completely correct while doing it.&lt;/p&gt;
&lt;p&gt;The AWS SDK finds credentials by walking a chain of sources in order. The web-identity path, the one that uses the OIDC token, is one link in that chain. It triggers off environment variables the CI sets up automatically.&lt;/p&gt;
&lt;p&gt;But if the &lt;code&gt;aws&lt;/code&gt; provider block has a hardcoded &lt;code&gt;profile = &amp;quot;...&amp;quot;&lt;/code&gt;, the SDK takes the &lt;em&gt;profile&lt;/em&gt; link of the chain instead, and never reaches the web-identity link. A &lt;code&gt;profile&lt;/code&gt; line is the sort of thing that ends up in a provider block from someone&amp;rsquo;s local development setup, where it&amp;rsquo;s exactly right. Committed and run in CI, it silently short-circuits the federation. The pipeline either fails to find credentials, or finds the wrong ones.&lt;/p&gt;
&lt;p&gt;The rule is simple once you know it: the provider block that runs in CI must not name a &lt;code&gt;profile&lt;/code&gt;. Leave the chain free to find the web identity. It&amp;rsquo;s the kind of bug that teaches you to be precise about &lt;em&gt;which&lt;/em&gt; link of the credential chain you&amp;rsquo;re actually relying on.&lt;/p&gt;
&lt;h2 id="the-bottom-line"&gt;The bottom line
&lt;/h2&gt;&lt;p&gt;Giving CI an AWS access key means storing your most powerful, longest-lived credential in one of your most exposed systems. OIDC federation removes it entirely. The CI platform mints a short-lived signed token, AWS exchanges it via &lt;code&gt;AssumeRoleWithWebIdentity&lt;/code&gt; for hour-long credentials against a role whose trust policy names the exact pipeline, and nothing permanent is stored.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;terraform-aws-bootstrap&lt;/code&gt; builds the AWS side, the identity provider and the trust policy; the CI components build the consuming side, the token request. The one trap is a hardcoded &lt;code&gt;profile&lt;/code&gt; in the provider block, which short-circuits the SDK&amp;rsquo;s credential chain before it reaches the web-identity path. Get that right, and a pipeline deploys to AWS as a verifiable, short-lived identity, with no key to steal.&lt;/p&gt;</description></item></channel></rss>