From allow_failure to blocking

There’s a special kind of CI job that everyone on a team quietly learns to ignore: the one marked allow_failure: true. It runs, it goes red, the pipeline goes green anyway, and after the third time you stop looking at it. I inherited six of those when I moved rust-tool-base’s CI to GitLab. Over a few days I turned three of them into real gates, and the interesting part was never the YAML. It was working out which ones had earned the right to block, and which hadn’t.

What allow_failure actually buys you

allow_failure: true is genuinely useful, and quietly corrosive. It lets a job report a problem without stopping the pipeline, which is exactly right for a check that’s noisy, not yet stable, or guarding against something you can’t fix this minute. The trouble is that a warning nobody is forced to act on is a warning nobody acts on. Leave a job advisory long enough and it becomes scenery: red, ignored, pointless. So an advisory check is really a promise, “I’ll make this blocking once it’s trustworthy”, and a promise you only ever mean to keep is just a lie you haven’t noticed yet.
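For anyone who hasn't met the keyword, this is roughly what an advisory job looks like in .gitlab-ci.yml. It's a minimal sketch: the job name and script are hypothetical placeholders, not taken from the real pipeline.

```yaml
# Hypothetical advisory job; name and command are illustrative only.
noisy-check:
  stage: test
  script:
    - ./scripts/noisy-check.sh   # placeholder command
  # A non-zero exit is recorded as a warning on this job.
  # The pipeline still passes and later stages still run.
  allow_failure: true
```

Delete the last line and the same failure stops the pipeline. That one line is the entire difference between a warning and a gate.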

When I migrated rust-tool-base from GitHub Actions to GitLab CI, the move landed six jobs as allow_failure: true: the macOS and Windows tests, the integration tests, cargo-audit, trivy, and coverage. That wasn’t laziness. A migration is the wrong moment to also be fighting flaky gates. But it left me holding six promises to either keep or admit I wasn’t going to.

A check earns the right to block

Here’s the rule I settled on. A check earns the right to fail your build when two things are true: it’s meaningful (a red result is a real problem, not noise) and it’s reliable (it goes red only when there genuinely is a problem, and it can actually run to completion). Flip a check to blocking before both hold and you haven’t raised the bar. You’ve taught the team to force-merge past red, which is worse than no gate at all, because now the red means nothing.

Three of my six crossed that line within a few days. Three deliberately didn’t. The reasons are the whole story.

trivy: blocked once there was nothing to block on

trivy scans the dependency tree for HIGH and CRITICAL advisories. It went across as advisory for an honest reason: the Cargo.lock at migration time already carried two known HIGH/CRITICAL advisories I hadn’t cleared yet, a path-traversal in gix-validate and a DNS-rebinding issue in rmcp. Make trivy blocking with those sitting there and the pipeline is red from day one, over problems I already knew about and was already fixing. So it stayed advisory until the dependency bumps cleared both, and then the allow_failure line came out. The gate never changed. The tree underneath it got clean enough to stand on.
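A sketch of what such a gate might look like, assuming the public aquasec/trivy image and a filesystem scan of the repository; the image tag and exact flags are my assumptions, not a copy of the real job.

```yaml
trivy:
  stage: test
  image:
    name: aquasec/trivy:latest   # assumed image; pin a version in practice
    entrypoint: [""]             # override the image entrypoint so GitLab can run the script
  script:
    # Scan lockfiles in the repo; exit non-zero only on HIGH or CRITICAL findings.
    - trivy fs --scanners vuln --severity HIGH,CRITICAL --exit-code 1 .
  # allow_failure: true          # the advisory line, removed once the tree was clean
```

The scan command is identical in the advisory and blocking versions. Only the commented-out line differs.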

integration-tests: blocked once it could actually run

The integration tests stand up a real Gitea in a Docker-in-Docker service and talk to it. They were advisory for a different reason: they couldn’t reliably run. dind needs a privileged runner, and the suite was resolving the container host with a hardcoded 127.0.0.1 that didn’t hold everywhere. Blocking a job that fails for infrastructure reasons rather than code reasons is the fastest way to make people distrust the entire pipeline. So the fix wasn’t in the YAML. It was making the thing dependable: privileged mode enabled on the runner, and the host resolved through the test library’s own get_host() instead of a hardcoded address. Once it ran the same way every time, it earned the gate.
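To show why a hardcoded 127.0.0.1 is fragile here, this is a sketch of a typical dind job under GitLab's Docker executor. The image and the test target name are assumptions for illustration.

```yaml
integration-tests:
  stage: test
  image: rust:latest                   # assumed image
  services:
    - docker:dind                      # only starts on a runner with privileged mode enabled
  variables:
    DOCKER_HOST: tcp://docker:2375     # the daemon lives in the service container, reached by its alias
    DOCKER_TLS_CERTDIR: ""             # disable TLS so the plain 2375 port is used
  script:
    - cargo test --test integration    # hypothetical test target name
```

In this layout, containers started by the tests publish their ports on the dind service, not on the job container's loopback. An address that works against a local Docker daemon can therefore fail in CI, and asking the library for the host avoids that.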

coverage: blocked once it could run at all, then once it cleared the bar

Coverage is the two-step one, and my favourite, because it nearly didn’t make it for a thoroughly undramatic reason: it ran out of memory. cargo llvm-cov instruments every test binary, and linking hundreds of instrumented object files needs more RAM than the shared medium runner had, so the job bus-errored on the link. I tagged it onto a larger runner, and then the shared SaaS runners were switched off entirely, so the tag matched nothing and the job sat pending forever.

The fix was a self-hosted homelab runner with the RAM the instrumented link actually needs. I moved coverage there but kept it advisory for one run, to confirm the box could finish the build before I trusted it. It did, at 73.22% line coverage, so I set the gate to fail under 70% and made it blocking. Three points of headroom: enough that ordinary churn won’t trip it, tight enough that a real drop will. A coverage gate pinned to the current number is a tripwire that fires on the very next commit; set it a touch below and it catches regressions instead of normal life.
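The shape of that gate, as a sketch: the runner tag and image are hypothetical, while --fail-under-lines is cargo-llvm-cov's own threshold flag.

```yaml
coverage:
  stage: test
  image: rust:latest               # assumed image
  tags:
    - homelab                      # hypothetical tag; if no registered runner carries it, the job stays pending
  before_script:
    - rustup component add llvm-tools-preview
    - cargo install cargo-llvm-cov --locked
  script:
    # Exit non-zero when line coverage falls below 70% (73.22% was measured at the time).
    - cargo llvm-cov --workspace --fail-under-lines 70
```

With the threshold living in the command, making the job blocking is again a matter of leaving out allow_failure.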

The three I left advisory, on purpose

The point was never “block everything”. Three jobs are still allow_failure in the current pipeline, deliberately. The macOS and Windows tests run on SaaS runners that bill by the minute; they’re worth running, not worth blocking every merge of a Linux-first project over a quota I’m choosing to ration. And cargo-audit stays advisory because cargo-deny already does the blocking advisory check: cargo-audit is a second opinion from a different database, and a second opinion that can veto isn’t a second opinion, it’s a duplicate gate that will eventually disagree with the first and block you on the difference.
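Side by side, the two advisory scanners might look like this. It's a sketch that assumes both tools are already installed in the job image.

```yaml
cargo-deny:
  stage: test
  script:
    - cargo deny check advisories   # blocking: a finding fails the pipeline

cargo-audit:
  stage: test
  script:
    - cargo audit                   # second opinion on the same lockfile
  allow_failure: true               # advisory: reported, never a veto
```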

That’s the same rule from the other side. Those three haven’t earned the right to block, because blocking them would cost more than it ever caught.

The upshot

allow_failure: true is fine as a waiting room and corrosive as a destination. Every advisory check is a promise to make it blocking once it’s both meaningful and reliable, and the job is to keep the promise or admit you won’t. trivy earned its gate when the advisories cleared, the integration tests when they ran the same way every time, coverage when it had a runner with enough memory and a threshold set just below the current mark. The three I left advisory earned that standing too, by costing more to block than they’d catch. The YAML is one deleted line per job. Knowing which line to delete, and when, is the whole skill.