Requirements that govern the build, not just describe it
Every task gets a machine-readable PRD with acceptance criteria, quality gates, and FR-by-FR traceability. The agent verifies work against the spec — not against the conversation.
Context compaction false positive eliminated when FRAMEWORK.md string appears inside JSON
confidence: 0.95
pre_analyzers.py:_detect_compaction — line-start anchor regex
Error count derived from structured JSONL fields, not regex on file content
confidence: 0.90
pre_analyzers.py:_count_errors_jsonl — state.status=="error" probe
OpenCode tool names normalised: write/edit/read counted alongside Write/Edit/Read
confidence: 0.95
pre_analyzers.py:_parse_jsonl_ceremony — t.lower() normalisation
Claude-Code nested tool_use format parsed correctly from assistant.message.content
confidence: 0.85
pre_analyzers.py:_parse_jsonl_ceremony — nested content array walk
Planning quality prompt clarified: exploration tool calls count as "starting work"
confidence: 0.85
prompts.py:PASS2_SYSTEM — planning_quality_estimate anchor text
Retry-loop anchor present at 0.2–0.4 for 5+ retries of same failing command
confidence: 0.80
prompts.py:PASS2_SYSTEM — tool_usage_efficiency anchor text
termination_cause field populated deterministically: none | context_compaction | timeout | …
confidence: 0.85
schema.py:BehavioralAnalysis + pre_analyzers.py:apply_behavioral_overrides
planning_precedes_tools boolean populated from JSONL event index comparison
confidence: 0.80
schema.py:BehavioralPreAnalysis + pre_analyzers.py:_parse_jsonl_ceremony
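The ledger entries above describe deterministic pre-analysis over session JSONL. A minimal sketch of three of those ideas follows: a line-start anchored regex so a marker inside a JSON string is not a false positive, error counting from structured fields rather than regex over raw text, and lower-casing tool names so OpenCode's write/edit/read land in the same buckets as Write/Edit/Read. The function names and event fields here mirror the ledger wording but are assumptions for illustration, not the actual pre_analyzers.py code.

```python
import json
import re
from collections import Counter

# Line-start anchor: only a line that *begins* with the marker counts,
# so "FRAMEWORK.md" appearing inside a JSON string does not match.
COMPACTION_RE = re.compile(r"^FRAMEWORK\.md", re.MULTILINE)

def detect_compaction(transcript_text: str) -> bool:
    return bool(COMPACTION_RE.search(transcript_text))

def analyze_jsonl(lines):
    """Count errors and tool calls from structured JSONL events."""
    errors = 0
    tools = Counter()
    for line in lines:
        event = json.loads(line)
        # Structured probe, not regex: an error is state.status == "error".
        if event.get("state", {}).get("status") == "error":
            errors += 1
        # Normalise tool names so "write" and "Write" are counted together.
        tool = event.get("tool")
        if tool:
            tools[tool.lower()] += 1
    return errors, tools
```

The anchoring detail is what eliminates the false positive: `re.MULTILINE` makes `^` match at every line start, and a marker quoted inside a one-line JSON record never sits at a line start.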
Vague prompts build vague features
A plain AI-coding session has no governing spec. The agent receives a sentence in chat, produces code that looks plausible, and reports “done”. There is no artefact that says what “done” means, no criteria to test against, and no record of the intent behind each decision.
AI fluency does not equal correctness. A fluent model can write confident-sounding code that satisfies the prompt's literal words while missing every unstated requirement that mattered. Without machine-verifiable acceptance criteria, “done” just means the agent stopped typing.
Engineering leads feel this as defect rework. Compliance managers feel it as audit exposure. Developers feel it as the sinking realisation that a four-hour session produced something that fails on the first edge case they forgot to mention.
AARE-F — the structured spec format
AARE-F v1.1.0 (AI-Augmented Requirements Engineering Framework) defines 12 mandatory sections, YAML frontmatter, confidence scores per FR (0.0–1.0), EARS-pattern compliance, and a bidirectional traceability matrix. Every PRD that passes validation carries the same structure so agents and humans can navigate any spec without a tutorial.
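EARS (Easy Approach to Requirements Syntax) constrains each requirement to a small set of sentence templates — ubiquitous ("The <system> shall …"), event-driven ("When …, the <system> shall …"), state-driven ("While …"), and so on — which is what makes compliance machine-checkable. A rough sketch of such a check, using illustrative patterns rather than the actual AARE-F validator:

```python
import re

# Rough EARS sentence templates. Illustrative only — not the AARE-F validator.
EARS_PATTERNS = [
    re.compile(r"^When .+, the .+ shall .+", re.IGNORECASE),     # event-driven
    re.compile(r"^While .+, the .+ shall .+", re.IGNORECASE),    # state-driven
    re.compile(r"^If .+, then the .+ shall .+", re.IGNORECASE),  # unwanted behaviour
    re.compile(r"^Where .+, the .+ shall .+", re.IGNORECASE),    # optional feature
    re.compile(r"^The .+ shall .+", re.IGNORECASE),              # ubiquitous
]

def is_ears_compliant(requirement: str) -> bool:
    """True if the requirement matches at least one EARS template."""
    return any(p.match(requirement.strip()) for p in EARS_PATTERNS)
```

A requirement like "When a client exceeds 100 req/min, the gateway shall return 429" passes; "Rate limiting should probably be fast" does not — there is no trigger, no system, and no testable response.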
From prompt to PRD in seconds
One tool call turns a natural-language description into a sprint-ready AARE-F PRD. The agent does not write vague bullets — it emits a versioned document with every section filled, every FR carrying a confidence score, and every quality gate pre-checked. The ledger at the top of this page is the live output of PRD-EVAL-023, created with the same call shape below.
trw_prd_create(
    input_text="Add rate limiting",
    category="INFRA"
)
PRD-INFRA-042 created — 15 FRs, 5 acceptance criteria each
FR01 Rate limit per IP: 100 req/min default [confidence: 0.92]
FR02 429 response includes Retry-After header [confidence: 0.90]
FR03 Authenticated users exempt from IP-based limit [confidence: 0.88]
FR04 Limits configurable per route prefix [confidence: 0.85]
FR05 Sliding-window algorithm, Redis-backed [confidence: 0.82]
Quality gate: READY — structure 15/15, completeness 28/30, clarity 18/20, traceability 24/25
Quality gates before IMPLEMENT
trw_prd_validate returns a READY / NEEDS WORK / BLOCK verdict with per-dimension scores. A BLOCK verdict prevents moving to the IMPLEMENT phase until the issue is resolved — the agent cannot self-approve past a failing gate.
Structure: YAML frontmatter, 12 sections, ID format
Completeness: All required fields populated, non-goals explicit
Clarity: EARS patterns, no vague terms, measurable criteria
Traceability: FR → code → test links bidirectional
Quality gates: Ambiguity rate, completeness %, coverage %
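The verdict can be pictured as a pure function over per-dimension scores. A minimal sketch, assuming each dimension reports an achieved/maximum pair like the sample gate line above; the 0.85 and 0.60 cut-offs are illustrative assumptions, not the thresholds trw_prd_validate actually applies:

```python
# Illustrative verdict logic. The 0.85 / 0.60 cut-offs are assumptions,
# not the thresholds trw_prd_validate ships with.
READY_FLOOR = 0.85
BLOCK_FLOOR = 0.60

def gate_verdict(scores: dict[str, tuple[int, int]]) -> str:
    """scores maps dimension name -> (achieved, maximum)."""
    ratios = [achieved / maximum for achieved, maximum in scores.values()]
    if any(r < BLOCK_FLOOR for r in ratios):
        return "BLOCK"       # any failing dimension blocks IMPLEMENT
    if all(r >= READY_FLOOR for r in ratios):
        return "READY"
    return "NEEDS WORK"
```

The shape matters more than the numbers: BLOCK is decided by the worst dimension, so a single failing dimension cannot be averaged away by strong scores elsewhere.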
Traceability that sticks
The traceability matrix in every AARE-F PRD links each FR to the source requirement, the implementation file, and the test that verifies it. The trw-traceability-checker agent grep-verifies that every FR has a corresponding code path and test before delivery is allowed.
Bidirectional means starting from either end works. An engineer browsing the code can jump to the FR that mandated a function. An auditor reading the PRD can see the exact commit that landed each requirement. The spec-ledger demo at the top of this page is that matrix in motion.
Pairs with
Verification
Acceptance criteria from the PRD are what verification gates check. No spec means no verifiable "done."
Workflows
The PLAN phase produces the PRD; the VALIDATE phase runs checks against it. Requirements and workflows share the same phase clock.
Memory
High-impact PRD decisions get promoted into persistent memory so future tasks inherit the rationale without re-reading the doc.
Common questions
Do I have to write PRDs manually?
What happens if functional requirements change mid-build?
Does this work for small bug fixes?
How strict are the quality gates?