Verification
A technical walkthrough of the evidence gates that decide whether a session is allowed to close — what each gate checks, what it writes to disk, and how the stop-ceremony hook wires the gates into every supported client.
You will learn
- The five gates TRW enforces before it lets the agent close a session.
- The evidence each gate emits and where that evidence is written on disk.
- How the stop-ceremony hook fires across every supported client.
- When an override is the right call and how to make the override visible.
- Where to look when a gate fails and the agent claims it passed.
What is verification
Done means evidence, not assertion. Verification is the phase where the agent is required to produce artifacts that prove its claims — test output, type-check output, coverage numbers, a review verdict — before delivery is permitted. If the evidence does not exist, the agent has not finished, no matter how confident the prose sounds.
The point of the phase is not to add ceremony. The point is to separate the agent's narration from the build's ground truth. Narration is persuasive by nature; ground truth is a file on disk. Verification forces the agent to write the file and then step aside so a reviewer — human or otherwise — can read it directly.
The five gates
Each gate is independent. Each writes evidence you can inspect after the fact. The agent cannot advance past the verification phase until all five report pass.
| Gate | What it checks | Evidence emitted |
|---|---|---|
| tests_pass | Full test suite runs and reports zero failures. | Structured pytest summary captured in build-status.yaml. |
| types_clean | Static type checker reports zero errors under strict mode. | Type-checker output appended to build-status.yaml. |
| coverage_threshold | Line coverage meets or exceeds the configured floor. | Coverage percentage and totals recorded with the floor used. |
| review_clean | Independent adversarial auditor returns no block-level findings. | Per-finding verdict (pass / warn / block) with citations. |
| deliver_clock | trw_deliver fires with all prior gates satisfied. | Run closure entry with learnings persisted and phase transitions logged. |
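Read as logic, the table is a conjunction: every gate must report pass, and a gate with no recorded result counts as blocking. The following is a minimal sketch of that rule, not TRW's implementation. It assumes gate outcomes have already been loaded into a plain dictionary keyed by gate name; the `GateResult` type, its fields, and the function names are hypothetical and do not reflect the actual build-status.yaml schema.

```python
from dataclasses import dataclass

# Gate names exactly as they appear in the table above.
GATES = (
    "tests_pass",
    "types_clean",
    "coverage_threshold",
    "review_clean",
    "deliver_clock",
)


@dataclass(frozen=True)
class GateResult:
    """Hypothetical in-memory record of one gate's outcome."""

    passed: bool
    evidence: str  # pointer to where the evidence was written


def blocking_gates(results: dict[str, GateResult]) -> list[str]:
    """Return the gates that failed or never reported.

    A missing entry is treated the same as a failure:
    no evidence means the gate has not passed.
    """
    return [
        name
        for name in GATES
        if name not in results or not results[name].passed
    ]


def may_advance(results: dict[str, GateResult]) -> bool:
    """True only when all five gates report pass."""
    return not blocking_gates(results)
```

Treating a missing result as blocking mirrors the principle stated earlier: if the evidence does not exist, the agent has not finished.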
Stop-ceremony hook
The stop-ceremony hook fires on the session-stop lifecycle event — the moment your client is about to close the agent's session. The hook inspects whether the gates ran and whether trw_deliver was called. If either is missing, it rejects the stop event and prepends a warning to the next tool response. Because the trigger is a lifecycle event and not a client-specific command, the same enforcement wiring applies in every supported client — only the configuration path differs.
The hook allows up to two rejections before an operator override becomes available. That cap is intentional: rigor should win close calls, but the hook must not trap a human who has a real reason to close.
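The rejection cap can be pictured as a small state machine. The sketch below is illustrative only: `SessionState`, its field names, and the `"await_override"` outcome are assumptions made for the example, and the behaviour once the cap is reached without an override is one plausible reading of the text rather than documented behaviour.

```python
from dataclasses import dataclass

MAX_REJECTIONS = 2  # the cap described above


@dataclass
class SessionState:
    """Hypothetical snapshot of what the hook inspects."""

    build_check_called: bool
    deliver_called: bool
    rejections_used: int = 0


def on_stop(state: SessionState, operator_override: bool = False) -> str:
    """Decide what to do with a stop event.

    Returns "allow", "reject", or "await_override" (an assumed
    third outcome once the rejection cap has been reached).
    """
    if state.build_check_called and state.deliver_called:
        return "allow"

    # Below the cap the hook rejects and counts the rejection;
    # an override is not yet available, so the flag is ignored.
    if state.rejections_used < MAX_REJECTIONS:
        state.rejections_used += 1
        return "reject"

    # Cap reached: no further rejections are counted, and an
    # explicit operator override can release the session.
    return "allow" if operator_override else "await_override"
```

In this sketch an override requested before the cap is ignored, which matches the statement that the override becomes available only after two rejections.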
Hook path per client
# .claude/hooks/stop-ceremony.sh
# Registered under "Stop" event in .claude/settings.json.
# Rejects the stop event if trw_build_check or trw_deliver
# has not been called this session.
Overrides and thresholds
Each gate is independently toggleable and each threshold is configurable in .trw/config.yaml. Common overrides: lowering the coverage floor during a prototype spike, disabling the adversarial-auditor gate for a docs-only change, or relaxing a gate while a flaky suite is being repaired. The stop-ceremony hook itself should stay on — it is the mechanism that makes every other gate unskippable.
Any override is logged in build-status.yaml alongside the run, so a reviewer can see whether a session cleared the standard bar or a reduced one.
Debugging a failing gate
When a gate blocks delivery, start from the evidence, not the agent's explanation. The same three files answer most failures:
- .trw/context/build-status.yaml: latest gate outcomes with timestamps, thresholds used, and raw tool summaries.
- .trw/runs/<run-id>/meta/summary.yaml: the run's phase transitions and what was claimed versus what was measured.
- .trw/context/ceremony-state.json: whether the stop-ceremony hook rejected a close and how many rejections have been used.
If the run is stuck across multiple retries, look for configuration drift: a coverage floor that was tightened, a type-checker flag that was added to strict mode, or a new test file that was committed without fixtures. The build-status history is ordered by run, so a diff between the last green run and the current failing run usually points at the regression in under a minute.
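The green-versus-failing comparison is easy to do by hand, and just as easy to script. Here is a minimal sketch, assuming the two runs' records have already been parsed into nested dictionaries with string keys. The function name and the sample keys (`coverage`, `floor`, `percent`, `tests`, `failed`) are hypothetical and are not taken from the real build-status.yaml layout.

```python
def diff_runs(green: dict, failing: dict, prefix: str = "") -> list[str]:
    """List dotted paths whose values differ between two run records.

    Nested dictionaries are walked recursively; any other value is
    compared directly. Keys present on only one side show up with
    None on the other side.
    """
    changes = []
    for key in sorted(set(green) | set(failing)):
        path = f"{prefix}{key}"
        old, new = green.get(key), failing.get(key)
        if isinstance(old, dict) and isinstance(new, dict):
            changes.extend(diff_runs(old, new, prefix=path + "."))
        elif old != new:
            changes.append(f"{path}: {old!r} -> {new!r}")
    return changes


# Illustrative records: the measured coverage is unchanged,
# but the floor was tightened between the two runs.
last_green = {"coverage": {"floor": 80, "percent": 84.0}, "tests": {"failed": 0}}
current = {"coverage": {"floor": 90, "percent": 84.0}, "tests": {"failed": 0}}

for line in diff_runs(last_green, current):
    print(line)
```

A result that shows a changed threshold next to an unchanged measurement points to configuration drift rather than a code regression.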
Next steps