This post introduces ftl, a Claude Code orchestrator that builds knowledge over time.
Before Opus 4.5, agentic harnesses focused on working around the two worst tendencies of LLMs: scope creep and over-engineering. Coding agents felt like overeager junior-savants that had to be carefully steered whenever projects became even moderately complex.
Opus 4.5 broke this pattern. If you've been using it, you've likely felt the shift. We are now living through the transformation of LLM agents from erratic assistants into true collaborators.
ftl is built on this shift. While previous harnesses constrained models to prevent drift, ftl persists knowledge across sessions to build on what we've already done instead of always starting from an empty context window.
This post goes over what it does and how it fits together.
ftl is built on five principles:
| Principle | What it means |
|---|---|
| Memory compounds | Each task leaves the system smarter |
| Verify first | Shape work by starting with proof-of-success |
| Bounded scope | Workspace files are explicit so humans can audit agent boundaries |
| Present over future | Implement current requests, not anticipated needs |
| Edit over create | Modify what exists before creating something new |
These aren't new. In fact, they read like the 101s of good software development. But anyone who's worked with coding agents knows that the models like to work and stay busy. Every part of ftl is built around these principles to turn them into the orchestrator's north star.
/ftl <task> → router → builder → learner → workspace/
↓ ↓
/ftl campaign → planner → tasks → synthesizer → memory
↓ ↓
└─────────── queries precedent ───────────┘
Tasks produce workspace files capturing decisions, reasoning, and patterns. Memory indexes these into a queryable knowledge graph. Campaigns coordinate multi-task objectives, querying memory for precedent before planning.
Each completed task makes the system smarter. Patterns emerge over time to influence future work.
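To make the indexing step concrete, here is a minimal sketch of how tags in workspace files could be turned into a lookup table. It is not ftl's actual implementation: the function name `index_tags`, the sample file contents, and the assumption that tags look like `#pattern/name` or `#constraint/name` are illustrative choices.

```python
import re
from collections import defaultdict

# Assumption: tags take the form #pattern/<name> or #constraint/<name>.
TAG_RE = re.compile(r"#(pattern|constraint)/([\w-]+)")


def index_tags(workspace_files: dict[str, str]) -> dict[str, list[str]]:
    """Map each tag to the workspace files that mention it (hypothetical helper)."""
    index: dict[str, list[str]] = defaultdict(list)
    for name, text in sorted(workspace_files.items()):
        for kind, tag in TAG_RE.findall(text):
            key = f"#{kind}/{tag}"
            if name not in index[key]:
                index[key].append(name)
    return dict(index)


# Made-up workspace contents, for illustration only.
files = {
    "001_add-auth_complete.md": "## Key Findings\n#pattern/session-token-flow #constraint/no-global-state",
    "002_oauth-google_complete_from-001.md": "## Key Findings\n#pattern/session-token-flow",
}
index = index_tags(files)
print(index["#pattern/session-token-flow"])
```

A lookup like this is roughly what a "find decisions using a pattern" query needs: given a tag, return every decision that recorded it.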
ftl coordinates six specialized agents:
| Agent | Role |
|---|---|
| Router | Route + explore + anchor. Creates workspace for full tasks. |
| Builder | TDD implementation within Delta. Test-first, edit-over-create. |
| Reflector | Failure diagnosis. Returns RETRY with strategy or ESCALATE to human. |
| Learner | Extract patterns to Key Findings + index to memory. |
| Planner | Verification-first campaign decomposition. |
| Synthesizer | Cross-campaign meta-pattern extraction. |
The division of labor is deliberate. Each agent does one thing well. The Router decides what kind of task this is. The Builder implements exactly what was anchored. The Reflector handles failures without hiding them. The Learner makes sure completed work feeds back into the system.
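As an illustration of what "exactly what was anchored" can mean in practice, the sketch below checks a list of changed files against the files declared in scope (the Delta of a workspace file). The function `out_of_scope` and the example paths are hypothetical; ftl may enforce scope differently.

```python
from pathlib import PurePosixPath


def out_of_scope(changed: list[str], delta: list[str]) -> list[str]:
    """Return the changed files that fall outside every Delta entry.

    Assumption: a Delta entry is either a file path or a directory that
    covers everything beneath it.
    """
    def allowed(path: str) -> bool:
        p = PurePosixPath(path)
        for entry in delta:
            e = PurePosixPath(entry)
            if p == e or e in p.parents:
                return True
        return False

    return [f for f in changed if not allowed(f)]


# Made-up scope and change list, for illustration only.
delta = ["src/auth", "tests/test_auth.py"]
changed = ["src/auth/session.py", "tests/test_auth.py", "src/billing/invoice.py"]
violations = out_of_scope(changed, delta)
print(violations)
```

Because the scope is an explicit list, a human can run the same check by eye: anything in the diff that is not under a Delta entry is drift.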
The Router makes the first call: is this a full task or direct execution?
Direct tasks are simple enough to execute immediately. Fix a typo. Add a log statement. Update a config value. No workspace file, no memory - just do it.
Full tasks get the complete treatment:
The workspace file is the key artifact. It captures not just what was done, but the transformation, the scope, and the thinking traces. This is what memory indexes.
Full tasks produce workspace files in workspace/:
# NNN: [Decision Title]
## Question
[What decision does this resolve?]
## Precedent
[Injected from memory - patterns, antipatterns, related decisions]
## Options Considered
[Alternatives explored and rejected]
## Decision
[Explicit choice with rationale]
## Implementation
Path: [Input] → [Processing] → [Output]
Delta: [files in scope]
Verify: [test command]
## Thinking Traces
[Exploration, dead ends, discoveries]
## Delivered
[What was implemented]
## Key Findings
#pattern/name #constraint/name
Naming follows: NNN_task-slug_status[_from-NNN].md
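A small sketch of how this naming pattern could be parsed. It assumes that NNN is a three-digit number, that slugs are lowercase with hyphens, and that the status is one of active, complete, or blocked; `WorkspaceName` and `parse_name` are hypothetical names, not part of ftl.

```python
import re
from typing import NamedTuple, Optional

# Assumptions: three-digit NNN, lowercase hyphenated slug, three known statuses.
NAME_RE = re.compile(
    r"^(?P<num>\d{3})_(?P<slug>[a-z0-9-]+)_(?P<status>active|complete|blocked)"
    r"(?:_from-(?P<parent>\d{3}))?\.md$"
)


class WorkspaceName(NamedTuple):
    number: int
    slug: str
    status: str
    parent: Optional[int]  # the NNN this task builds on, if any


def parse_name(filename: str) -> Optional[WorkspaceName]:
    """Split a workspace filename into its parts; None if it does not match."""
    m = NAME_RE.match(filename)
    if m is None:
        return None
    parent = m.group("parent")
    return WorkspaceName(
        number=int(m.group("num")),
        slug=m.group("slug"),
        status=m.group("status"),
        parent=int(parent) if parent else None,
    )


parsed = parse_name("002_oauth-google_complete_from-001.md")
print(parsed)
```

Keeping status and lineage in the filename means a plain directory listing already shows what is in flight and what each task builds on.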
The status is one of active, complete, or blocked. The optional _from-NNN suffix indicates lineage (the task builds on a prior task).
There is a single source of truth: .ftl/memory.json
This memory stores decisions, patterns, signals, and development lineage. When you query for precedent, it searches this graph. When a task completes, the Learner indexes new patterns here.
The learning mechanism is signals. When a pattern works, you signal positive. When it causes problems, signal negative. Over time, successful patterns surface more readily in queries. Failed patterns fade. The graph learns which approaches work in your codebase.
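One way to picture how signals could influence ranking is a smoothed success rate per pattern. This is an assumption for illustration, not ftl's actual scoring: the `Pattern` class, the `rank` function, and the choice of Laplace smoothing are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pattern:
    tag: str
    positive: int = 0
    negative: int = 0

    def signal(self, outcome: str) -> None:
        """Record a '+' or '-' outcome for this pattern."""
        if outcome == "+":
            self.positive += 1
        elif outcome == "-":
            self.negative += 1
        else:
            raise ValueError(f"expected '+' or '-', got {outcome!r}")

    @property
    def score(self) -> float:
        # Laplace-smoothed success rate: unsignaled patterns start at 0.5.
        return (self.positive + 1) / (self.positive + self.negative + 2)


def rank(patterns: list[Pattern]) -> list[Pattern]:
    """Order patterns so that the best-scoring ones surface first."""
    return sorted(patterns, key=lambda p: p.score, reverse=True)


# Made-up patterns and signals, for illustration only.
a = Pattern("#pattern/session-token-flow")
b = Pattern("#pattern/global-singleton")
a.signal("+")
a.signal("+")
b.signal("-")
ranked = [p.tag for p in rank([a, b])]
print(ranked)
```

The smoothing keeps a single early signal from dominating: one negative lowers a pattern's score but does not bury it outright.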
| Command | Purpose |
|---|---|
| `/ftl:ftl <task>` | Execute task (routes to direct or full) |
| `/ftl:ftl campaign <objective>` | Plan and execute multi-task campaign |
| `/ftl:ftl query <topic>` | Surface relevant precedent from memory |
| `/ftl:ftl status` | Combined campaign + workspace status |

| Command | Purpose |
|---|---|
| `/ftl:workspace` | Query state, lineage, tags |
| `/ftl:close` | Complete active task manually |

| Command | Purpose |
|---|---|
| `/ftl:learn` | Force pattern synthesis |
| `/ftl:signal +/- #pattern` | Mark pattern outcome (+/-) |
| `/ftl:trace #pattern` | Find decisions using a pattern |
| `/ftl:impact <file>` | Find decisions affecting a file |
| `/ftl:age [days]` | Find stale decisions |
| `/ftl:decision NNN` | Full decision record with traces |
Tasks are too granular. Projects are too permanent. What we need is something in between: a measurable objective spanning multiple tasks, with verification planning upfront.
A campaign is an objective that decomposes into multiple tasks with state that persists across sessions. The Planner thinks verification-first: "How will we prove this is done?" Then it queries memory for relevant precedent before decomposing the work.
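The sketch below shows one way a campaign with a verification command and persistent state could be modeled. The `Campaign` and `Task` classes, the JSON layout, and the sample test commands are assumptions made for illustration; they do not describe ftl's real storage format.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class Task:
    slug: str
    verify: str  # command that proves this task is done
    done: bool = False


@dataclass
class Campaign:
    objective: str
    verify: str  # decided up front: how the whole objective is proven
    tasks: list[Task] = field(default_factory=list)

    def next_task(self) -> Optional[Task]:
        """Return the first unfinished task, or None when all are done."""
        return next((t for t in self.tasks if not t.done), None)

    def save(self, path: Path) -> None:
        path.write_text(json.dumps(asdict(self), indent=2))

    @classmethod
    def load(cls, path: Path) -> "Campaign":
        data = json.loads(path.read_text())
        data["tasks"] = [Task(**t) for t in data["tasks"]]
        return cls(**data)


# Made-up campaign, for illustration only.
campaign = Campaign(
    objective="implement OAuth with Google and GitHub",
    verify="pytest tests/test_oauth.py",
    tasks=[
        Task("oauth-google", "pytest tests/test_oauth.py -k google"),
        Task("oauth-github", "pytest tests/test_oauth.py -k github"),
    ],
)
campaign.tasks[0].done = True

# Simulate a new session: write the state to disk, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "campaign.json"
    campaign.save(state_file)
    resumed = Campaign.load(state_file)

print(resumed.next_task().slug)
```

Because the state lives on disk rather than in the context window, a fresh session can pick up at the next unfinished task.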
# Simple task - routes to direct, no workspace
/ftl:ftl fix typo in README
# Complex task - routes to full, creates workspace
/ftl:ftl add user authentication
# Multi-task campaign
/ftl:ftl campaign implement OAuth with Google and GitHub
# Query what you've done before
/ftl:ftl query session handling
# Check status
/ftl:ftl status
# Mark a pattern as successful
/ftl:signal + #pattern/session-token-flow
# Find what touched a file
/ftl:impact src/auth/
When a task fails, the Reflector classifies the problem:
The word "success" is doing important work here. Escalation isn't failure. It's the system working correctly. An agent that says "this is beyond my scope, here's what I tried" is more valuable than one that silently produces broken code.
I believe this is one of the most important principles for building reliable agents. The confidence to escalate is a feature.
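The retry-or-escalate loop can be sketched in a few lines. This is a simplified illustration under stated assumptions: `Verdict`, `run_with_reflection`, the attempt limit, and the sample strategy string are hypothetical, and the real Reflector is an agent rather than a plain function.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    action: str  # "RETRY" or "ESCALATE"
    strategy: Optional[str] = None  # guidance for the next attempt on RETRY


def run_with_reflection(
    build: Callable[[Optional[str]], bool],
    reflect: Callable[[int], Verdict],
    max_attempts: int = 3,  # assumed limit, not taken from ftl
) -> str:
    """Run build, asking reflect what to do after each failure."""
    strategy: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        if build(strategy):
            return "complete"
        verdict = reflect(attempt)
        if verdict.action == "ESCALATE":
            return "escalated"
        strategy = verdict.strategy
    # Out of attempts: hand the task to a human instead of looping forever.
    return "escalated"


def flaky_build(strategy: Optional[str]) -> bool:
    # Stand-in for the Builder: succeeds only with the suggested strategy.
    return strategy == "mock the token endpoint"


result = run_with_reflection(
    flaky_build, lambda attempt: Verdict("RETRY", "mock the token endpoint")
)
gave_up = run_with_reflection(lambda s: False, lambda attempt: Verdict("ESCALATE"))
print(result, gave_up)
```

Both outcomes are explicit return values, so an escalation is reported just as clearly as a completed task rather than being hidden behind further retries.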
Use ftl when:
Skip ftl when:
Knowing when to reach for these tools - and when not to - is itself a skill.
# Add the crinzo-plugins marketplace
claude plugin marketplace add https://github.com/enzokro/crinzo-plugins
# Install ftl
claude plugin install ftl@crinzo-plugins
Or from inside Claude Code:
/plugin marketplace add https://github.com/enzokro/crinzo-plugins
/plugin install ftl@crinzo-plugins
Opus 4.5 proved that agents can be true collaborators. What was missing was the architecture to let that collaboration compound.
ftl is one attempt at that architecture. Tasks that stay bounded. Memory that learns. Campaigns that persist across sessions.
The core insight is simple: context loss is an architecture problem, not a capability problem. If we structure knowledge to persist, it will persist. If we track what works, the system learns. If we create explicit boundaries, scope stays contained.
The models are ready. We just needed to build the scaffolding.