There’s a special kind of CI job that everyone on a team quietly learns to
ignore: the one marked allow_failure: true. It runs, it goes red, the
pipeline goes green anyway, and after the third time you stop looking at it. I
inherited six of those when I moved rust-tool-base’s CI to GitLab. Over a few
days I turned three of them into real gates, and the interesting part was never
the YAML. It was working out which ones had earned the right to block, and
which hadn’t.
What allow_failure actually buys you
allow_failure: true is genuinely useful, and quietly corrosive. It lets a job
report a problem without stopping the pipeline, which is exactly right for a
check that’s noisy, not yet stable, or guarding against something you can’t fix
this minute. The trouble is that a warning nobody is forced to act on is a
warning nobody acts on. Leave a job advisory long enough and it becomes
scenery: red, ignored, pointless. So an advisory check is really a promise,
“I’ll make this blocking once it’s trustworthy”, and a promise you only ever
mean to keep is just a lie you haven’t noticed yet.
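
For concreteness, here is a minimal sketch of an advisory job in .gitlab-ci.yml. The job name, stage and script are placeholders for illustration, not the project's actual config.

```yaml
# Hypothetical advisory job. A non-zero exit is recorded against the job,
# but the pipeline carries on as if the job had passed.
advisory-check:
  stage: test
  script:
    - ./run-noisy-check.sh   # placeholder command
  allow_failure: true        # deleting this line turns the job into a gate
```

Everything in the rest of this post comes down to when that last line is safe to delete.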
When I migrated rust-tool-base from GitHub Actions to GitLab CI,
the move landed six jobs as allow_failure: true: the macOS and Windows tests,
the integration tests, cargo-audit, trivy, and coverage. That wasn’t
laziness. A migration is the wrong moment to also be fighting flaky gates. But
it left me holding six promises to either keep or admit I wasn’t going to.
A check earns the right to block
Here’s the rule I settled on. A check earns the right to fail your build when two things are true: it’s meaningful (a red result is a real problem, not noise) and it’s reliable (it goes red only when there genuinely is a problem, and it can actually run to completion). Flip a check to blocking before both hold and you haven’t raised the bar; you’ve taught the team to force-merge past red. That is worse than no gate at all, because now the red means nothing.
Three of my six crossed that line within a few days. Three deliberately didn’t. The reasons are the whole story.
trivy: blocked once there was nothing to block on
trivy scans the dependency tree for HIGH and CRITICAL advisories. It went across as
advisory for an honest reason: the Cargo.lock at migration time already
carried two known HIGH/CRITICAL advisories I hadn’t cleared yet, a
path-traversal in gix-validate and a DNS-rebinding issue in rmcp. Make
trivy blocking with those sitting there and the pipeline is red from day one,
over problems I already knew about and was already fixing. So it stayed
advisory until the dependency bumps cleared both, and then the allow_failure
line came out. The gate never changed. The tree underneath it got clean enough
to stand on.
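
A blocking trivy job along these lines might look like the sketch below. The job name, image tag and stage are assumptions; the severity filter matches the one described above, and the scan is assumed to be a filesystem scan of the repository, which picks up Cargo.lock.

```yaml
# Sketch only: job name, stage and image are assumptions.
trivy:
  stage: test
  image:
    name: aquasec/trivy:latest
    entrypoint: [""]   # the image's entrypoint is the trivy binary; clear it so GitLab can run the script
  script:
    # Scan the repository (including Cargo.lock) and exit non-zero
    # if any HIGH or CRITICAL finding is reported.
    - trivy fs --severity HIGH,CRITICAL --exit-code 1 .
  # allow_failure: true   <- the line that came out once both advisories were cleared
```

The command is the same in advisory and blocking form. Only the allow_failure line differs.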
integration-tests: blocked once it could actually run
The integration tests stand up a real Gitea in a Docker-in-Docker service and talk to it. They were
advisory for a different reason: they couldn’t reliably run. dind needs a
privileged runner, and the suite was resolving the container host with a
hardcoded 127.0.0.1 that didn’t hold everywhere. Blocking a job that fails
for infrastructure reasons rather than code reasons is the fastest way to make
people distrust the entire pipeline. So the fix wasn’t in the YAML; it was
making the thing dependable: privileged mode enabled on the runner, and the
host resolved through the test library’s own get_host() instead of a hardcoded
address. Once it ran the same way every time, it earned the gate.
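
The CI side of that setup can be sketched as follows. This is an assumption-laden outline rather than the project's real job: the image, the test target name and the non-TLS daemon port are illustrative. It also shows why a hardcoded 127.0.0.1 is fragile here: with a dind service the Docker daemon, and any ports the test containers publish, live on the service host (docker), not on the job container's loopback.

```yaml
# Sketch only: image, variables and test target are assumptions.
integration-tests:
  stage: test
  image: rust:latest
  services:
    - docker:dind               # needs privileged = true in the runner's config.toml
  variables:
    DOCKER_HOST: tcp://docker:2375   # the daemon runs in the service container
    DOCKER_TLS_CERTDIR: ""           # disable TLS so the plain 2375 port is used
  script:
    - cargo test --test integration  # hypothetical integration test target
  # no allow_failure: the job blocks once it runs the same way every time
```

A host lookup that reads the daemon address from the environment returns the right answer both here and on a laptop, which a literal 127.0.0.1 cannot do.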
coverage: blocked once it could run at all, then once it cleared the bar
Coverage is the two-step one, and my favourite, because it nearly didn’t make
it for a thoroughly undramatic reason: it ran out of memory. cargo llvm-cov
instruments every test binary, and linking hundreds of instrumented object
files needs more RAM than the shared medium runner had, so the job bus-errored
on the link. I tagged it onto a larger runner, and then the shared SaaS runners
were switched off entirely, so the tag matched nothing and the job sat pending
forever.
The fix was a self-hosted homelab runner with the RAM the instrumented link actually needs. I moved coverage there but kept it advisory for one run, to confirm the box could finish the build before I trusted it. It did, at 73.22% line coverage, so I set the gate to fail under 70% and made it blocking. Three points of headroom: enough that ordinary churn won’t trip it, tight enough that a real drop will. A coverage gate pinned to the current number is a tripwire that fires on the very next commit; set it a touch below and it catches regressions instead of normal life.
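
A minimal sketch of the resulting job, assuming cargo-llvm-cov is already installed in the job image and that the self-hosted runner is registered with a tag like the hypothetical one below:

```yaml
# Sketch only: job name, stage and runner tag are assumptions.
coverage:
  stage: test
  tags:
    - homelab   # hypothetical tag that routes the job to the self-hosted runner
  script:
    # Exits non-zero when line coverage drops below 70%,
    # about three points under the measured 73.22%.
    - cargo llvm-cov --workspace --fail-under-lines 70
  # allow_failure: true   <- kept for one confirming run, then removed
```

Putting the threshold in the command rather than in a separate script keeps the gate and the measurement in one place, so the number that fails the build is the number the tool itself reports.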
The three I left advisory, on purpose
The point was never “block everything”. Three jobs are still allow_failure in
the current pipeline, deliberately. The macOS and Windows tests run on SaaS
runners that bill by the
minute; they’re worth running, not worth blocking every merge of a Linux-first
project over a quota I’m choosing to ration. And cargo-audit stays advisory
because cargo-deny already does the blocking advisory check: cargo-audit is a
second opinion from a different database, and a second opinion that can veto
isn’t a second opinion, it’s a duplicate gate that will eventually disagree with
the first and block you on the difference.
That’s the same rule from the other side. Those three haven’t earned the right to block, because blocking them would cost more than it ever caught.
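
The split between the two advisory scanners can be sketched like this. Job names and stages are assumptions, and both tools are assumed to be installed in the job image; the point is only which job carries allow_failure.

```yaml
# Sketch only: job names and stage are assumptions.
cargo-deny:
  stage: test
  script:
    - cargo deny check advisories   # the blocking advisory check
  # no allow_failure: this is the gate

cargo-audit:
  stage: test
  script:
    - cargo audit                   # second opinion from a different database
  allow_failure: true               # reports, never vetoes
```

The macOS and Windows test jobs would carry the same allow_failure: true line, for the cost reason rather than the duplication one.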
The upshot
allow_failure: true is fine as a waiting room and corrosive as a destination.
Every advisory check is a promise to make it blocking once it’s both meaningful
and reliable, and the job is to keep the promise or admit you won’t. trivy
earned its gate when the advisories cleared, the integration tests when they
ran the same way every time, coverage when it had a runner with enough memory
and a threshold set just below the current mark. The three I left advisory
earned that standing too, by costing more to block than they’d catch. The YAML
is one deleted line per job. Knowing which line to delete, and when, is the
whole skill.
