TRW
LEDGER // REQUIREMENTS
PRD-EVAL-023 — Behavioral Transcript Scoring Calibration (AARE-F v1.1.0)

Requirements that govern the build, not just describe it

Every task gets a machine-readable PRD with acceptance criteria, quality gates, and FR-by-FR traceability. The agent verifies work against the spec — not against the conversation.

FR — Acceptance criterion — Implementation

FR01 [confidence 0.95]
Context compaction false positive eliminated when FRAMEWORK.md string appears inside JSON
pre_analyzers.py:_detect_compaction — line-start anchor regex

FR02 [confidence 0.90]
Error count derived from structured JSONL fields, not regex on file content
pre_analyzers.py:_count_errors_jsonl — state.status=="error" probe

FR03 [confidence 0.95]
OpenCode tool names normalised: write/edit/read counted alongside Write/Edit/Read
pre_analyzers.py:_parse_jsonl_ceremony — t.lower() normalisation

FR04 [confidence 0.85]
Claude-Code nested tool_use format parsed correctly from assistant.message.content
pre_analyzers.py:_parse_jsonl_ceremony — nested content array walk

FR05 [confidence 0.85]
Planning quality prompt clarified: exploration tool calls count as "starting work"
prompts.py:PASS2_SYSTEM — planning_quality_estimate anchor text

FR06 [confidence 0.80]
Retry-loop anchor present at 0.2–0.4 for 5+ retries of same failing command
prompts.py:PASS2_SYSTEM — tool_usage_efficiency anchor text

FR07 [confidence 0.85]
termination_cause field populated deterministically: none | context_compaction | timeout | …
schema.py:BehavioralAnalysis + pre_analyzers.py:apply_behavioral_overrides

FR08 [confidence 0.80]
planning_precedes_tools boolean populated from JSONL event index comparison
schema.py:BehavioralPreAnalysis + pre_analyzers.py:_parse_jsonl_ceremony
0 of 8 acceptance criteria satisfied — VERIFYING…
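The deterministic pre-analyzer checks behind FR01 and FR02 can be sketched roughly like this. This is a minimal sketch, not the shipped code: the marker string and helper names here mirror the ledger's citations but the bodies are illustrative assumptions.

```python
import json
import re

# FR01: anchor the match at line start so FRAMEWORK.md appearing inside a
# JSON payload mid-line no longer trips the compaction detector.
COMPACTION_RE = re.compile(r"^FRAMEWORK\.md", re.MULTILINE)

def detect_compaction(transcript_text: str) -> bool:
    """Illustrative stand-in for _detect_compaction."""
    return bool(COMPACTION_RE.search(transcript_text))

def count_errors_jsonl(jsonl_text: str) -> int:
    """FR02: count errors from structured JSONL fields, probing
    state.status == "error" instead of regexing raw file content."""
    errors = 0
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than miscounting
        if event.get("state", {}).get("status") == "error":
            errors += 1
    return errors
```

The line-start anchor is the whole fix for FR01: a marker quoted inside a JSON string never begins a line, so it can no longer be mistaken for the real compaction banner.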
LEDGER // PROBLEM

Vague prompts build vague features

A plain AI-coding session has no governing spec. The agent receives a sentence in chat, produces code that looks plausible, and reports done. There is no artefact that says what “done” means, no criteria to test against, and no record of the intent behind each decision.

AI fluency does not equal correctness. A fluent model can write confident-sounding code that satisfies the prompt's literal words while missing every unstated requirement that mattered. Without machine-verifiable acceptance criteria, “done” is just the agent stopped typing.

Engineering leads feel this as defect rework. Compliance managers feel it as audit exposure. Developers feel it as the sinking realisation that a four-hour session produced something that fails on the first edge case they forgot to mention.

LEDGER // AARE_F

AARE-F — the structured spec format

AARE-F v1.1.0 (AI-Augmented Requirements Engineering Framework) defines 12 mandatory sections, YAML frontmatter, confidence scores per FR (0.0–1.0), EARS-pattern compliance, and a bidirectional traceability matrix. Every PRD that passes validation carries the same structure so agents and humans can navigate any spec without a tutorial.

YAML frontmatter — PRD ID, version, status, priority, risk level
AARE-F components — C1–C9 mapped to the PRD
Evidence level — strong | moderate | weak + sources
Confidence scores — per-FR 0.0–1.0 float, EARS-pattern linked
Traceability — implements / depends_on / enables / conflicts_with
Metrics + SLOs — success criteria, measurement method, SLO targets
Quality gates — ambiguity ≤5%, completeness ≥85%, traceability ≥90%
Problem statement — background + measurable defects + impact
Goals + non-goals — explicit scope boundary
User stories — role / want / so-that + evidence required
Functional requirements — EARS pattern + acceptance criteria + confidence
Risk + dependencies — probability, impact, mitigation, residual risk
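A minimal sketch of the per-FR record this structure implies, assuming field names beyond those listed above (the class name and validation rule are illustrative, not the normative AARE-F schema):

```python
from dataclasses import dataclass, field

@dataclass
class FunctionalRequirement:
    """One FR row as AARE-F describes it: an EARS-pattern criterion,
    a per-FR confidence float, and traceability links."""
    fr_id: str                                           # e.g. "FR01"
    acceptance_criterion: str                            # EARS-pattern statement
    confidence: float                                    # 0.0–1.0 per FR
    implements: list[str] = field(default_factory=list)  # traceability verbs
    depends_on: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Enforce the documented confidence range at construction time.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be a 0.0–1.0 float")
```

Keeping confidence a validated field rather than free text is what lets quality gates compare it numerically across FRs.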
TERMINAL // trw_prd_create

From prompt to PRD in seconds

One tool call turns a natural-language description into a sprint-ready AARE-F PRD. The agent does not write vague bullets — it emits a versioned document with every section filled, every FR carrying a confidence score, and every quality gate pre-checked. The ledger at the top of this page is the live output of PRD-EVAL-023, created with the same call shape below.

TOOL CALL

trw_prd_create(
    input_text="Add rate limiting",
    category="INFRA"
)

OUTPUT

PRD-INFRA-042 created — 15 FRs, 5 acceptance criteria each

FR01 Rate limit per IP: 100 req/min default [confidence: 0.92]

FR02 429 response includes Retry-After header [confidence: 0.90]

FR03 Authenticated users exempt from IP-based limit [confidence: 0.88]

FR04 Limits configurable per route prefix [confidence: 0.85]

FR05 Sliding-window algorithm, Redis-backed [confidence: 0.82]

Quality gate: READY — structure 15/15, completeness 28/30, clarity 18/20, traceability 24/25

LATTICE // VALIDATION

Quality gates before IMPLEMENT

trw_prd_validate returns a READY / NEEDS WORK / BLOCK verdict with per-dimension scores. A BLOCK verdict prevents moving to the IMPLEMENT phase until the issue is resolved — the agent cannot self-approve past a failing gate.

15 pts — Structure — YAML frontmatter, 12 sections, ID format
30 pts — Completeness — All required fields populated, non-goals explicit
20 pts — Clarity — EARS patterns, no vague terms, measurable criteria
25 pts — Traceability — FR → code → test links bidirectional
10 pts — Quality gates — Ambiguity rate, completeness %, coverage %

READY / NEEDS WORK / BLOCK — three verdicts, one source of truth
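A minimal sketch of how a three-verdict gate could combine the five dimension scores above. The maxima come from the rubric on this page; the half-maximum BLOCK rule and 85% READY cutoff are assumptions for illustration, not the shipped scoring logic.

```python
# Dimension maxima from the rubric above (100 points total).
MAXIMA = {"structure": 15, "completeness": 30, "clarity": 20,
          "traceability": 25, "quality_gates": 10}

def verdict(scores: dict[str, int]) -> str:
    """Hypothetical rule: BLOCK if any dimension falls below half its
    maximum, READY if every dimension reaches 85% of its maximum,
    otherwise NEEDS WORK."""
    for dim, maximum in MAXIMA.items():
        if scores.get(dim, 0) < maximum / 2:
            return "BLOCK"
    if all(scores.get(d, 0) >= 0.85 * m for d, m in MAXIMA.items()):
        return "READY"
    return "NEEDS WORK"
```

Whatever the exact thresholds, the point of the design holds: the verdict is computed from scores, so the agent cannot talk its way past a failing gate.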
LEDGER // TRACEABILITY

Traceability that sticks

The traceability matrix in every AARE-F PRD links each FR to the source requirement, the implementation file, and the test that verifies it. The trw-traceability-checker agent grep-verifies that every FR has a corresponding code path and test before delivery is allowed.

Bidirectional means starting from either end works. An engineer browsing the code can jump to the FR that mandated a function. An auditor reading the PRD can see the exact commit that landed each requirement. The spec-ledger demo at the top of this page is that matrix in motion.
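The grep pass the checker performs can be sketched roughly as follows. The FR-id pattern, `*.py` scope, and function name here are assumptions; the actual checker's matching rules may differ.

```python
import re
from pathlib import Path

# Assumed FR-id convention: "FR" followed by two digits, e.g. FR01.
FR_ID_RE = re.compile(r"\bFR\d{2}\b")

def unverified_frs(prd_text: str, code_dir: str, test_dir: str) -> set[str]:
    """Return FR ids declared in the PRD that are not mentioned in both
    a code file and a test file -- a rough stand-in for the checker's
    grep verification."""
    declared = set(FR_ID_RE.findall(prd_text))

    def ids_under(root: str) -> set[str]:
        found: set[str] = set()
        for path in Path(root).rglob("*.py"):
            found |= set(FR_ID_RE.findall(path.read_text(errors="ignore")))
        return found

    # An FR counts as covered only when it appears on both ends.
    covered = ids_under(code_dir) & ids_under(test_dir)
    return declared - covered
```

An empty return set would be the precondition for allowing delivery; anything left over names exactly which requirement lacks a code path or a test.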

LEDGER // FAQ

Common questions

Do I have to write PRDs manually?

No. trw_prd_create drafts a full AARE-F PRD from a single input sentence. You review and refine; you do not author from scratch.

What happens if functional requirements change mid-build?

Update the FRs in the PRD and re-run trw_prd_validate. The quality gate re-checks traceability before IMPLEMENT resumes. No silent drift.

Does this work for small bug fixes?

Yes. Smaller tasks produce smaller PRDs — a single FR with two acceptance criteria is valid. The 12-section structure scales down gracefully.

How strict are the quality gates?

READY / NEEDS WORK / BLOCK verdicts are configurable per project. Defaults: ambiguity ≤5%, completeness ≥85%, traceability ≥90%, consistency ≥95%.
TERMINAL // SHIP_WITH_SPECS

Ship what the spec says, not what the chat remembered

Access is currently limited while onboarding hardens. Request access to start running machine-verifiable requirements in your repo.