Feature List as Immutable Contract

Nikola Balic (@nibzard)

Problem

Long-running agents exhibit several failure modes when tasked with building complete applications:

Premature victory declaration: Agent declares "done" after implementing a fraction of requirements
Scope creep via test deletion: Agent "passes" tests by deleting or weakening them rather than fixing code
Hallucinated completeness: Agent loses track of what was actually implemented versus planned
Feature drift: Without a fixed specification, agents may substitute easier features for harder ones
Progress amnesia: Across sessions, agents forget what's done vs. pending

Solution

Define all features upfront in a structured, immutable format that agents can read but cannot meaningfully game:

1. Comprehensive Feature Specification

Create a JSON file with ALL required features before any implementation begins:

{
  "features": [
    {
      "id": "auth-001",
      "category": "functional",
      "description": "New chat button creates fresh conversation",
      "steps": [
        "Click 'New Chat' button in sidebar",
        "Verify URL changes to new conversation ID",
        "Verify message input is empty and focused",
        "Verify no previous messages are displayed"
      ],
      "passes": false
    },
    {
      "id": "auth-002",
      "category": "functional",
      "description": "User can log out and session is cleared",
      "steps": [
        "Click user profile menu",
        "Click 'Log out' option",
        "Verify redirect to login page",
        "Verify protected routes are inaccessible"
      ],
      "passes": false
    }
  ]
}

2. Immutability Constraints

Enforce through prompt instructions:

Agent MAY set passes: true after verification
Agent MAY NOT delete features from the list
Agent MAY NOT modify acceptance criteria/steps
Agent MAY NOT mark features as "not applicable"

Two implementation variations:

Static list: Features hardcoded at compile time (maximum security, requires redeploy to change)
Dynamic-but-immutable: Features loaded at startup then frozen (config changes via restart, used by LangChain/CrewAI)

3. Verification Requirements

Features are only marked passing after:

Implementation is complete
Manual or automated testing confirms all steps pass
Agent has actually exercised the feature (not just written code)

graph TD A[Feature List Created] --> B[All Features: passes=false] B --> C{Select Next Feature} C --> D[Implement Feature] D --> E[Test Feature] E --> F{All Steps Pass?} F -->|No| D F -->|Yes| G[Set passes=true] G --> H{More Features?} H -->|Yes| C H -->|No| I[Project Complete] style A fill:#e1f5fe style I fill:#c8e6c9 style G fill:#fff9c4

How to use it

Creating effective feature lists:

Include 100-200+ features for complex applications
Be specific in acceptance criteria (observable behaviors, not implementation details)
Group by category (functional, UI, performance, security)
Include edge cases and error handling as separate features
Write features as a human tester would verify them

Prompt enforcement:

CRITICAL RULES:
1. You MUST NOT delete or modify any feature in feature-list.json
2. You MUST NOT edit the "steps" or "description" fields
3. You MAY ONLY change "passes" from false to true
4. You MUST actually test the feature before marking it passing
5. If a feature seems impossible, ask the user - do NOT skip it

Verification tooling:

Use browser automation (Puppeteer, Playwright) for E2E verification
Run features as a user would, not just unit tests
Capture evidence (screenshots, logs) when marking features passing

Trade-offs

Pros:

Prevents premature victory declaration with clear completion criteria
Creates immutable record of requirements that survives session boundaries
Makes agent progress measurable (X of Y features passing)
Eliminates "pass by deletion" attack vector
Provides natural work queue for incremental sessions

Cons:

Requires significant upfront investment in feature specification
Not suitable for exploratory or research-oriented work
May miss emergent requirements discovered during implementation
Rigid format doesn't accommodate changing requirements
Large feature lists can overwhelm agent context

Security implications:

Guaranteed by Immutable Contract	Not Guaranteed (requires additional patterns)
No unauthorized tool access	Prompt injection in parameters
Predictable attack surface	Authorization bypass
Schema validation prevents injection	Output exfiltration

When to use:

Building complete applications with known requirements
Projects spanning many agent sessions
When agent accountability is important
Replicable workflows (same feature list for multiple similar projects)

When to avoid:

Exploratory prototyping
Research tasks with unclear scope
Small, single-session tasks
Rapidly evolving requirements

References

Anthropic Engineering: Effective Harnesses for Long-Running Agents
Action-Selector Pattern (Beurer-Kellner et al., 2025)
Related: Initializer-Maintainer Dual Agent Architecture — extends this pattern with two-agent lifecycle
Related: Action-Selector Pattern — alternative approach using allowlists
Related: Sandboxed Tool Authorization — complementary pattern for capability restriction

Source: https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents