ftl: Claude Code orchestrator with memory

This post introduces ftl - a Claude Code orchestrator that builds knowledge over time.

Introduction

Before Opus 4.5, agentic harnesses focused on working around the two worst tendencies of LLMs: scope creep and over-engineering. Coding agents felt like overeager junior-savants that had to be carefully steered whenever projects became even moderately complex.

Opus 4.5 broke this pattern. If you've been using it, you've likely felt the shift. We are now living through the transformation of LLM agents from erratic assistants to true collaborators.

ftl is built on this shift. While previous harnesses constrained models to prevent drift, ftl persists knowledge across sessions to build on what we've already done instead of always starting from an empty context window.

This post goes over what it does and how it fits together.

Philosophy

ftl is built on five principles:

Principle            What it means
Memory compounds     Each task leaves the system smarter
Verify first         Shape work by starting with proof-of-success
Bounded scope        Workspace files are explicit so humans can audit agent boundaries
Present over future  Implement current requests, not anticipated needs
Edit over create     Modify what exists before creating something new

These aren't new. In fact, they read like Software Development 101. But anyone who's worked with coding agents knows that the models like to work and stay busy, which pulls them away from these basics. Every part of ftl is designed to make these principles the orchestrator's north star.

The development loop

/ftl <task> → router → builder → learner → workspace/
      ↓                                         ↓
/ftl campaign → planner → tasks → synthesizer → memory
      ↓                                         ↓
      └─────────── queries precedent ───────────┘

Tasks produce workspace files capturing decisions, reasoning, and patterns. Memory indexes these into a queryable knowledge graph. Campaigns coordinate multi-task objectives, querying memory for precedent before planning.

Each completed task makes the system smarter. Patterns emerge over time to influence future work.

Agents

ftl coordinates six specialized agents:

Agent        Role
Router       Route + explore + anchor. Creates workspace for full tasks.
Builder      TDD implementation within Delta. Test-first, edit-over-create.
Reflector    Failure diagnosis. Returns RETRY with strategy or ESCALATE to human.
Learner      Extract patterns to Key Findings + index to memory.
Planner      Verification-first campaign decomposition.
Synthesizer  Cross-campaign meta-pattern extraction.

The division of labor is deliberate. Each agent does one thing well. The Router decides what kind of task this is. The Builder implements exactly what was anchored. The Reflector handles failures without hiding them. The Learner makes sure completed work feeds back into the system.

Task routing

The Router makes the first call: is this a full task or direct execution?

Direct tasks are simple enough to execute immediately. Fix a typo. Add a log statement. Update a config value. No workspace file, no memory - just do it.

Full tasks get the complete treatment:

  1. Anchor: Create a workspace file with Path, Delta, and the thinking that led there.
  2. Build: Implement exactly what was anchored. Test-first. Run verification.
  3. Learn (conditional): Extract reusable patterns if they emerged.

The workspace file is the key artifact. It captures not just what was done, but the transformation (Path), the scope (Delta), and the thinking traces. This is what memory indexes.
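The Delta is written down before the Builder starts. That turns staying in scope into something you can check rather than hope for. Below is a minimal Python sketch of such a check. It assumes Delta is a list of file paths and directory prefixes, with directories written with a trailing `/`. The `Anchor` class and the `covers` and `out_of_scope` functions are hypothetical illustrations, not part of ftl. The post doesn't say whether ftl enforces Delta mechanically.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class Anchor:
    """Hypothetical in-memory view of a workspace file's Implementation section."""
    path: str                                       # Input -> Processing -> Output
    delta: list[str] = field(default_factory=list)  # files or directories in scope
    verify: str = ""                                # test command run after building


def covers(entry: str, changed: str) -> bool:
    """True if `changed` is the Delta entry itself or lives under a Delta directory."""
    entry_path, changed_path = PurePosixPath(entry), PurePosixPath(changed)
    return changed_path == entry_path or entry_path in changed_path.parents


def out_of_scope(changed_files: list[str], anchor: Anchor) -> list[str]:
    """Return every changed file that no Delta entry covers."""
    return [f for f in changed_files if not any(covers(e, f) for e in anchor.delta)]


# Hypothetical anchor for an auth task
anchor = Anchor(
    path="login form -> session check -> signed token",
    delta=["src/auth/", "tests/test_auth.py"],
    verify="pytest tests/test_auth.py",
)
print(out_of_scope(["src/auth/session.py", "tests/test_auth.py", "src/config.py"], anchor))
# ['src/config.py']
```

Comparing path components rather than raw string prefixes keeps `src/authz.py` from slipping in under a `src/auth/` entry.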

Workspace format

Full tasks produce workspace files in workspace/:

# NNN: [Decision Title]

## Question
[What decision does this resolve?]

## Precedent
[Injected from memory - patterns, antipatterns, related decisions]

## Options Considered
[Alternatives explored and rejected]

## Decision
[Explicit choice with rationale]

## Implementation
Path: [Input] → [Processing] → [Output]
Delta: [files in scope]
Verify: [test command]

## Thinking Traces
[Exploration, dead ends, discoveries]

## Delivered
[What was implemented]

## Key Findings
#pattern/name #constraint/name

Naming follows: NNN_task-slug_status[_from-NNN].md

  • Status: active, complete, blocked
  • _from-NNN indicates lineage (builds on prior task)
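The naming convention is regular enough to parse mechanically, which is what makes lineage queryable from filenames alone. Here is a small Python sketch. It assumes NNN is always three digits and that slugs contain only lowercase letters, digits, and hyphens; the post states neither. `parse_workspace_name` and `lineage` are hypothetical helpers, and the filenames are invented examples.

```python
import re
from typing import NamedTuple, Optional

# NNN_task-slug_status[_from-NNN].md, assuming three-digit numbers and
# slugs made of lowercase letters, digits, and hyphens (no underscores).
WORKSPACE_NAME = re.compile(
    r"^(?P<num>\d{3})_(?P<slug>[a-z0-9-]+)_(?P<status>active|complete|blocked)"
    r"(?:_from-(?P<parent>\d{3}))?\.md$"
)


class WorkspaceName(NamedTuple):
    num: str
    slug: str
    status: str
    parent: Optional[str]  # the task this one builds on, if any


def parse_workspace_name(filename: str) -> WorkspaceName:
    match = WORKSPACE_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a workspace filename: {filename!r}")
    return WorkspaceName(match["num"], match["slug"], match["status"], match["parent"])


def lineage(filename: str, by_num: dict[str, str]) -> list[str]:
    """Follow _from-NNN links back toward the root task; `by_num` maps NNN to filename."""
    chain: list[str] = []
    current = parse_workspace_name(filename)
    while current.num not in chain:  # guard against accidental cycles
        chain.append(current.num)
        if current.parent is None or current.parent not in by_num:
            break
        current = parse_workspace_name(by_num[current.parent])
    return chain


files = [
    "001_session-store_complete.md",
    "002_google-oauth_complete_from-001.md",
    "004_github-oauth_active_from-002.md",
]
by_num = {parse_workspace_name(f).num: f for f in files}
print(lineage("004_github-oauth_active_from-002.md", by_num))  # ['004', '002', '001']
```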

Memory

There is a single source of truth: .ftl/memory.json

This memory stores decisions, patterns, signals, and development lineage. When you query for precedent, it searches this graph. When a task completes, the Learner indexes new patterns here.
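The post doesn't document the layout of `.ftl/memory.json`, so the Python sketch below assumes a simple hypothetical schema. It has a `decisions` map keyed by task number and a `patterns` map from each tag to the decisions that used it. It shows the indexing step in miniature: pulling `#pattern/...` and `#constraint/...` tags out of a workspace file's Key Findings section and linking them back to the decision. `key_findings_tags`, `index_decision`, and `save_memory` are illustrative names, not ftl's.

```python
import json
import re
from pathlib import Path

TAG = re.compile(r"#(?:pattern|constraint)/[\w-]+")


def key_findings_tags(workspace_text: str) -> list[str]:
    """Collect #pattern/... and #constraint/... tags from the Key Findings section."""
    _, found, rest = workspace_text.partition("## Key Findings")
    if not found:
        return []
    section = rest.split("\n## ", 1)[0]  # stop at the next heading, if any
    return sorted(set(TAG.findall(section)))


def index_decision(memory: dict, num: str, workspace_text: str) -> dict:
    """Record a decision and link each of its tags back to it (hypothetical schema)."""
    tags = key_findings_tags(workspace_text)
    memory.setdefault("decisions", {})[num] = {"tags": tags}
    patterns = memory.setdefault("patterns", {})
    for tag in tags:
        entry = patterns.setdefault(tag, {"decisions": []})
        if num not in entry["decisions"]:
            entry["decisions"].append(num)
    return memory


def save_memory(memory: dict, path: Path = Path(".ftl/memory.json")) -> None:
    """Persist the graph as JSON (the file layout is an assumption)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(memory, indent=2))


text = """# 004: Signed session tokens

## Decision
Use signed cookies instead of server-side sessions.

## Key Findings
#pattern/session-token-flow #constraint/no-server-state
"""
memory = index_decision({}, "004", text)
print(memory["patterns"]["#pattern/session-token-flow"])  # {'decisions': ['004']}
```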

The learning mechanism is signals. When a pattern works, you signal positive; when it causes problems, you signal negative. Over time, successful patterns surface more readily in queries while failed patterns fade. The graph learns which approaches work in your codebase.
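The post doesn't say how signals are weighted. One simple way to get the behavior it describes, where good patterns rise and failed ones fade, is to rank by a smoothed success rate. The Python sketch below uses a Laplace-smoothed ratio. That is a swapped-in choice rather than ftl's documented formula, and the `Pattern`, `signal`, and `rank` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pattern:
    tag: str
    positive: int = 0
    negative: int = 0

    @property
    def score(self) -> float:
        # Laplace-smoothed success rate: an unsignalled pattern starts at 0.5,
        # and every + or - signal moves it toward 1.0 or 0.0.
        return (self.positive + 1) / (self.positive + self.negative + 2)


def signal(patterns: dict[str, Pattern], tag: str, outcome: str) -> None:
    """Record a + or - outcome for a pattern."""
    if outcome not in ("+", "-"):
        raise ValueError("outcome must be '+' or '-'")
    pattern = patterns.setdefault(tag, Pattern(tag))
    if outcome == "+":
        pattern.positive += 1
    else:
        pattern.negative += 1


def rank(patterns: dict[str, Pattern], matches: list[str]) -> list[str]:
    """Order query matches so well-signalled patterns surface first."""
    return sorted(matches, key=lambda tag: patterns.get(tag, Pattern(tag)).score, reverse=True)


patterns: dict[str, Pattern] = {}
signal(patterns, "#pattern/session-token-flow", "+")
signal(patterns, "#pattern/session-token-flow", "+")
signal(patterns, "#pattern/global-auth-singleton", "-")
print(rank(patterns, ["#pattern/global-auth-singleton", "#pattern/session-token-flow", "#pattern/new-idea"]))
# ['#pattern/session-token-flow', '#pattern/new-idea', '#pattern/global-auth-singleton']
```

Smoothing keeps a single early signal from dominating. An untested pattern sits at 0.5, between one that has worked twice (0.75) and one that has failed once (about 0.33).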

Commands

Core

Command                          Purpose
/ftl:ftl <task>                  Execute task (routes to direct or full)
/ftl:ftl campaign <objective>    Plan and execute multi-task campaign
/ftl:ftl query <topic>           Surface relevant precedent from memory
/ftl:ftl status                  Combined campaign + workspace status

Workspace

Command           Purpose
/ftl:workspace    Query state, lineage, tags
/ftl:close        Complete active task manually

Memory

Command                     Purpose
/ftl:learn                  Force pattern synthesis
/ftl:signal +/- #pattern    Mark pattern outcome (+/-)
/ftl:trace #pattern         Find decisions using a pattern
/ftl:impact <file>          Find decisions affecting a file
/ftl:age [days]             Find stale decisions
/ftl:decision NNN           Full decision record with traces
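The trace, impact, and age commands are all lookups over the same decision records. The Python sketch below shows roughly what each lookup involves. It assumes each decision in memory records its tags, its Delta, and an ISO completion date. It also reads "stale" as "older than N days". That record layout, the sample data, and the function names are hypothetical, not ftl's actual implementation.

```python
from datetime import date, timedelta
from pathlib import PurePosixPath
from typing import Optional


def _overlaps(a: str, b: str) -> bool:
    """True if two paths are the same or one is a directory containing the other."""
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa == pb or pa in pb.parents or pb in pa.parents


def trace(memory: dict, tag: str) -> list[str]:
    """Decisions whose Key Findings include `tag` (in the spirit of /ftl:trace)."""
    return sorted(n for n, d in memory["decisions"].items() if tag in d["tags"])


def impact(memory: dict, path: str) -> list[str]:
    """Decisions whose Delta touches `path` (in the spirit of /ftl:impact)."""
    return sorted(n for n, d in memory["decisions"].items()
                  if any(_overlaps(entry, path) for entry in d["delta"]))


def age(memory: dict, days: int, today: Optional[date] = None) -> list[str]:
    """Decisions completed more than `days` days ago (in the spirit of /ftl:age)."""
    cutoff = (today or date.today()) - timedelta(days=days)
    return sorted(n for n, d in memory["decisions"].items()
                  if date.fromisoformat(d["date"]) < cutoff)


# Invented sample records using the assumed layout
MEMORY = {
    "decisions": {
        "001": {"tags": ["#pattern/session-token-flow"],
                "delta": ["src/auth/"], "date": "2024-01-15"},
        "002": {"tags": ["#pattern/session-token-flow", "#constraint/no-server-state"],
                "delta": ["src/auth/session.py"], "date": "2024-06-01"},
        "003": {"tags": [], "delta": ["README.md"], "date": "2024-06-10"},
    }
}
print(trace(MEMORY, "#pattern/session-token-flow"))  # ['001', '002']
print(impact(MEMORY, "src/auth/session.py"))         # ['001', '002']
print(age(MEMORY, 30, today=date(2024, 6, 20)))      # ['001']
```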

Campaigns

Tasks are too granular. Projects are too permanent. What we need is something in between: a measurable objective spanning multiple tasks, with verification planning upfront.

A campaign is an objective that decomposes into multiple tasks with state that persists across sessions. The Planner thinks verification-first: "How will we prove this is done?" Then it queries memory for relevant precedent before decomposing the work.

The campaign flow

  1. Plan with verification-first thinking. "How will we prove this is done?"
  2. Query memory for relevant precedent.
  3. Delegate each task to the Router → Builder flow.
  4. Gate on workspace files. No workspace file = task isn't done.
  5. Reflect on failures via the Reflector.
  6. Synthesize cross-campaign patterns after completion.
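Step 4 is what keeps a campaign honest: a task counts only once a completed workspace file exists. The Python sketch below covers steps 3 through 5 and leaves out planning, memory queries, and synthesis. `execute` and `reflect` are placeholders for the Router → Builder flow and the Reflector. Stopping the campaign on escalation and the three-attempt retry budget are assumptions; the post specifies neither.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Callable, Iterable


@dataclass
class CampaignTask:
    num: str           # workspace number, e.g. "003"
    slug: str          # task slug, e.g. "google-oauth"
    attempts: int = 0


def has_complete_workspace(num: str, filenames: Iterable[str]) -> bool:
    """The gate: a task is done only when a complete workspace file exists for it."""
    return any(fnmatch(name, f"{num}_*_complete*.md") for name in filenames)


def run_campaign(
    tasks: list[CampaignTask],
    execute: Callable[[CampaignTask], None],      # placeholder for Router -> Builder
    reflect: Callable[[CampaignTask], str],       # placeholder Reflector: "RETRY" or "ESCALATE"
    list_workspace: Callable[[], Iterable[str]],  # e.g. names of files in workspace/
    max_attempts: int = 3,                        # assumed retry budget, not from the post
) -> tuple[list[str], list[str]]:
    """Run tasks in order; return (done, escalated) task numbers."""
    done: list[str] = []
    escalated: list[str] = []
    for task in tasks:
        while True:
            task.attempts += 1
            execute(task)
            if has_complete_workspace(task.num, list_workspace()):
                done.append(task.num)
                break
            if task.attempts >= max_attempts or reflect(task) == "ESCALATE":
                escalated.append(task.num)
                return done, escalated  # stop and hand control back to the human
    return done, escalated
```

Passing `list_workspace` in as a function keeps the gate testable. In a real repository it might be something like `lambda: [p.name for p in Path("workspace").glob("*.md")]`.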

Example session

# Simple task - routes to direct, no workspace
/ftl:ftl fix typo in README

# Complex task - routes to full, creates workspace
/ftl:ftl add user authentication

# Multi-task campaign
/ftl:ftl campaign implement OAuth with Google and GitHub

# Query what you've done before
/ftl:ftl query session handling

# Check status
/ftl:ftl status

# Mark a pattern as successful
/ftl:signal + #pattern/session-token-flow

# Find what touched a file
/ftl:impact src/auth/

Escalation as success

When a task fails, the Reflector classifies the problem:

  • Execution (code wrong) → RETRY with fix
  • Approach (design wrong) → RETRY with new strategy
  • Scope/Environment (external issue) → ESCALATE to human

The word "success" is doing important work here. Escalation isn't failure. It's the system working correctly. An agent that says "this is beyond my scope, here's what I tried" is more valuable than one that silently produces broken code.

I believe this is one of the most important principles for building reliable agents. The confidence to escalate is a feature.

When to use

Use ftl when:

  • Work should persist as precedent and compound over time
  • You want bounded, reviewable scope
  • Knowledge should build and evolve over sessions
  • Complex objectives need coordination

Skip ftl when:

  • Exploratory prototyping where you want the models to wander
  • Quick one-offs with no future value
  • Simple queries you'd ask Claude directly

Knowing when to reach for these tools - and when not to - is itself a skill.

Installation

# Add the crinzo-plugins marketplace
claude plugin marketplace add https://github.com/enzokro/crinzo-plugins

# Install ftl
claude plugin install ftl@crinzo-plugins

Or from inside Claude Code:

/plugin marketplace add https://github.com/enzokro/crinzo-plugins
/plugin install ftl@crinzo-plugins

Conclusion

Opus 4.5 proved that agents can be true collaborators. What was missing was the architecture to let that collaboration compound.

ftl is one attempt at that architecture. Tasks that stay bounded. Memory that learns. Campaigns that persist across sessions.

The core insight is simple: context loss is an architecture problem, not a capability problem. If we structure knowledge to persist, it will persist. If we track what works, the system learns. If we create explicit boundaries, scope stays contained.

The models are ready. We just needed to build the scaffolding.