Why TRW
AI coding tools forget everything between sessions. TRW is the methodology layer that makes the forgetting stop — so session 50 starts where session 49 left off, not where session 1 did.
You will learn
- Why stateless AI sessions produce the same gotchas, misses, and regressions over and over.
- What "knowledge compounding" means in practical terms, and how to tell if your project has it.
- The value hierarchy TRW enforces — truthfulness, quality, knowledge, velocity — and why the ordering is not decorative.
- How work looks without TRW, how it looks with TRW, and the cost of adopting it.
- Which projects benefit from TRW today, and which projects should skip it.
The problem
Every AI coding tool you have used resets to zero at the start of a session. The model does not remember your codebase. It does not remember the three hours you spent last Tuesday proving that the ORM double-flushes transactions when you call session.refresh() inside a context manager. It does not remember that your test suite has a 40-second warm start because of the Alembic migration check, or that the CI runner needs PYTHONDONTWRITEBYTECODE=1 to avoid a permissions error in the Docker layer.
So you explain it again. Session 1, session 5, session 23 — each one starts cold. You spend ten to twenty minutes re-establishing context that the previous session already established, and then you try to do real work on top of a foundation that has to be rebuilt from rubble every time.
Specific symptoms engineers report after a few months of AI-assisted development:
- The same gotcha solved five times in five different sessions, because nobody wrote it down in a place the next agent would read.
- The same test-coverage miss shipped three releases in a row, because the reviewer is a different agent each time and none of them inherit the lesson from the last one.
- The same architectural decision re-litigated every time a new feature touches it, because the reasoning exists only in a closed chat transcript nobody can search.
- The same lint-config footgun hit repeatedly, because "I'll remember this next time" is a lie you tell yourself about a system that cannot remember anything.
The generation quality is not the problem. The forgetting is.
What compounding means
Knowledge compounds when every session builds on the discoveries of the sessions before it. Session 50 starts with what session 1 discovered, validated, and survived. The gotcha from session 7 is already loaded into the new agent's context before it reads its first line of code. The convention established in session 14 is enforced automatically by a hook, so session 50 cannot accidentally violate it.
This is different from "writing it in the README." A README is a static document that does not participate in retrieval. It is read once if the agent is diligent, forgotten by the next session, and stale within a month. Compounding knowledge is indexed, scored by impact, surfaced when relevant, and pruned when it stops being useful.
A project has knowledge compounding when the 50th session demonstrably starts from a higher baseline than the 1st. You can test this: pick a problem that got solved in session 10 and hand the repo to a fresh agent in session 50. If the fresh agent rediscovers the same problem, nothing compounded. If the fresh agent inherits the solution before it starts work, you have compounding.
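The compounding test can be sketched in a few lines. Everything below is a hypothetical illustration: the `LearningStore` class, its file format, and its methods are invented for this example and are not TRW's API.

```python
import json
import tempfile
from pathlib import Path


class LearningStore:
    """Hypothetical file-backed store illustrating knowledge compounding."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.entries = json.loads(path.read_text()) if path.exists() else []

    def record(self, session: int, topic: str, lesson: str) -> None:
        # Persist the discovery so later sessions can inherit it.
        self.entries.append({"session": session, "topic": topic, "lesson": lesson})
        self.path.write_text(json.dumps(self.entries))

    def recall(self, topic: str) -> list[str]:
        # A fresh session loads prior lessons before touching any code.
        return [e["lesson"] for e in self.entries if e["topic"] == topic]


path = Path(tempfile.mkdtemp()) / "learnings.json"

# Session 10 records the hard-won discovery.
LearningStore(path).record(10, "orm", "session.refresh() double-flushes in a context manager")

# Session 50: a fresh store inherits the lesson instead of re-deriving it.
fresh = LearningStore(path)
assert fresh.recall("orm") == ["session.refresh() double-flushes in a context manager"]
```

The point of the sketch is the last two lines: the fresh instance starts from the stored baseline, which is exactly what the test in the paragraph above checks for.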
The value hierarchy
TRW agents operate under a strict, ordered value hierarchy. When two values conflict, the earlier value takes absolute precedence. The ordering is not a preference — it is a defense against the specific ways AI-assisted development fails under pressure.
1. Truthfulness
Never misrepresent the state of the work. A fluent false report does more damage than an awkward honest one. Tests that were skipped must be reported as skipped. Code that was generated but not read must be reported as generated-but-not-read. This is non-negotiable because every other value depends on accurate ground truth.
2. Quality
The work survives independent inspection. Tests pass. Types check. The reviewer — human or agent — can follow the reasoning. Quality comes second because truth without quality produces honestly-reported bad code, which is still better than dishonestly-reported bad code.
3. Knowledge
The session's discoveries outlive the session. A correct fix that nobody records is a correct fix that will be re-derived next quarter. Knowledge comes third because it exists in service of future truthfulness and future quality — it is the mechanism by which compounding happens.
4. Velocity
Ship. But last. Velocity comes last because it is the value most frequently used to rationalise damage to the first three. "We don't have time to write the test." "We don't have time to record the learning." "We don't have time to verify the claim." Those sentences are almost always wrong — the unrecorded learning will cost more later than the test would have cost now.
When an agent is about to cut a corner, the hierarchy tells it which corner it is cutting. If the corner is on Truthfulness, it stops. If the corner is on Quality, it stops unless explicitly authorised. If the corner is on Knowledge, it stops unless the discovery is genuinely trivial. If the corner is on Velocity, it may proceed — but only if the first three are intact.
Without TRW vs with TRW
This is not a feature comparison. It is a behaviour comparison between two operating models — one where the agent resets each session, and one where the agent inherits the team's accumulated context.
Without TRW
- Session opens. Agent re-reads the README, maybe the top-level source tree, forms a fresh mental model.
- Known gotchas are rediscovered the slow way, one by one.
- Decisions made in earlier sessions are absent. New decisions may contradict them silently.
- Tests skipped by the previous agent are not flagged. The gap persists until someone notices in production.
- Completion is claimed when the agent feels done. Nobody verifies that build-check passed or that tests actually ran.
- Session closes. Discoveries evaporate. The next session will encounter the same walls.
With TRW
- Session opens with prior learnings recalled, active run resumed, and protocol reminders loaded — before the agent sees its first prompt.
- Known gotchas are surfaced by the recall layer when the query is relevant, preventing the re-derivation loop.
- Decisions made in earlier sessions are indexed and searchable. Contradictions surface before they ship.
- Test-coverage gaps are caught by the verification gate. Delivery blocks until the gate has evidence.
- Completion requires passing build-check and test-coverage gates. "Feels done" is not accepted as evidence.
- Session closes via deliver. Discoveries are persisted, impact-scored, and promoted into startup context for the next session.
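The with-TRW lifecycle above reduces to a single loop shape. The sketch below is schematic: every function in it is a placeholder invented for illustration, standing in for the real hooks and gates rather than reproducing them.

```python
# Schematic sketch of one TRW session. None of these functions are TRW's
# real API; they are stand-ins showing the shape of the lifecycle.

def session_start() -> dict:
    # Recall prior learnings, resume any active run, load protocol reminders.
    return {"learnings": ["CI needs PYTHONDONTWRITEBYTECODE=1"], "active_run": None}

def do_work(task: str, context: dict) -> dict:
    # The model generates the change; TRW only governs the surrounding protocol.
    return {"task": task, "built": True, "tests_ran": True, "discoveries": ["new gotcha"]}

def gates_pass(result: dict) -> bool:
    # Completion requires evidence from build-check and test-coverage,
    # not the agent's feeling of being done.
    return result["built"] and result["tests_ran"]

def deliver(result: dict) -> list[str]:
    # Persist discoveries so the next session starts from a higher baseline.
    return result["discoveries"]


context = session_start()
result = do_work("add retry logic", context)
assert gates_pass(result), "completion claim rejected: gates lack evidence"
persisted = deliver(result)
```

The asymmetry to notice: the session begins with inherited context and cannot end until the gates hold evidence, which is the whole behavioural difference from the without-TRW list.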
What TRW is not
TRW is a methodology layer that sits above your existing tools. It is deliberately not several other things, and being clear about the non-goals is the easiest way to understand the goal.
Not an IDE.
TRW does not replace Claude Code, Cursor, opencode, Codex, or Aider. It integrates with them. You keep the tool you like and TRW adds the persistence layer underneath.
Not a new model.
TRW does not generate code. It governs the work done by the model you already use. If a better model arrives next quarter, TRW works with it unchanged.
Not a cloud service.
TRW is local-first. Memory lives in your repo under .trw/. Nothing leaves the machine unless you opt into shared services. Remove the config entry and TRW disappears from the session.
Not a silver bullet for bad requirements.
If you hand an agent a vague task, TRW will help the agent execute the vague task consistently. It will not turn the vague task into a good task. Requirements quality is upstream of anything a methodology layer can fix.
When to adopt
The setup cost is non-zero. Installing TRW, pointing it at the right client, and training yourself to call trw_session_start() and trw_deliver() reliably takes a couple of hours. After that, the overhead per session is a few seconds at start and a few seconds at end.
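Calling the pair reliably is the part that takes training. One way to make the pairing hard to forget is a wrapper, sketched below with stand-in stub bodies: only the two function names come from the text, and their real signatures are assumed.

```python
from contextlib import contextmanager

# Stand-ins for the real tools named in the text; real signatures are assumed.
calls: list[str] = []

def trw_session_start() -> None:
    calls.append("start")

def trw_deliver() -> None:
    calls.append("deliver")

@contextmanager
def trw_session():
    """Hypothetical wrapper ensuring start and deliver always run as a pair."""
    trw_session_start()
    try:
        yield
    finally:
        # deliver runs even if the work raises, so discoveries are not lost
        trw_deliver()


with trw_session():
    pass  # the actual session work goes here

assert calls == ["start", "deliver"]
```

With the pair wrapped, the per-session overhead is exactly the few seconds at start and end that the paragraph above describes.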
Adopt TRW when you see these signals:
- You or your team work on the same codebase across many AI sessions, and you notice the same problems being re-solved.
- The codebase has non-obvious conventions, historical decisions, or integration gotchas that take more than five minutes to explain to a fresh agent.
- You have been burned by false completion claims — code that "works" until the CI runs, or tests that "pass" because they were never executed.
- You run multi-step work that spans context compaction, and you have lost progress to an unexpected window shrink.
- You work on the codebase often enough that saving one rediscovery per week pays back the setup cost in under a month.
When not to adopt
TRW is not appropriate for every situation. Skip it when the overhead will not pay back:
- One-shot scripts. If the code is throwaway and the context dies with the task, there is no future session to compound into. Use the agent raw.
- Single-session work. A weekend prototype does not benefit from memory infrastructure. Finish the work, take what you learned, and move on.
- Teams that will not use any structure. TRW enforces protocol through hooks. If the team's pattern is to bypass every gate on principle, the framework becomes friction instead of leverage. Fix the cultural question first.
- Codebases with no recurring problems. Some small projects are uniform enough that a fresh agent has nothing useful to inherit. If you cannot name three gotchas your project has, you may not need a memory layer.
Where to go next
If the case above matches your situation, install TRW and run one session end-to-end before reading anything else. If you want the mental model first, read core concepts. If you want to see the mechanism, read memory.