Feature List as Immutable Contract
Problem
Long-running agents exhibit several failure modes when tasked with building complete applications:
- Premature victory declaration: Agent declares "done" after implementing a fraction of requirements
- Scope creep via test deletion: Agent "passes" tests by deleting or weakening them rather than fixing code
- Hallucinated completeness: Agent loses track of what was actually implemented versus planned
- Feature drift: Without a fixed specification, agents may substitute easier features for harder ones
- Progress amnesia: Across sessions, agents forget what's done vs. pending
Solution
Define all features upfront in a structured, immutable format that agents can read but cannot meaningfully game:
1. Comprehensive Feature Specification
Create a JSON file with ALL required features before any implementation begins:
{
"features": [
{
"id": "auth-001",
"category": "functional",
"description": "New chat button creates fresh conversation",
"steps": [
"Click 'New Chat' button in sidebar",
"Verify URL changes to new conversation ID",
"Verify message input is empty and focused",
"Verify no previous messages are displayed"
],
"passes": false
},
{
"id": "auth-002",
"category": "functional",
"description": "User can log out and session is cleared",
"steps": [
"Click user profile menu",
"Click 'Log out' option",
"Verify redirect to login page",
"Verify protected routes are inaccessible"
],
"passes": false
}
]
}
2. Immutability Constraints
Enforce through prompt instructions:
- Agent MAY set
passes: trueafter verification - Agent MAY NOT delete features from the list
- Agent MAY NOT modify acceptance criteria/steps
- Agent MAY NOT mark features as "not applicable"
3. Verification Requirements
Features are only marked passing after:
- Implementation is complete
- Manual or automated testing confirms all steps pass
- Agent has actually exercised the feature (not just written code)
graph TD
A[Feature List Created] --> B[All Features: passes=false]
B --> C{Select Next Feature}
C --> D[Implement Feature]
D --> E[Test Feature]
E --> F{All Steps Pass?}
F -->|No| D
F -->|Yes| G[Set passes=true]
G --> H{More Features?}
H -->|Yes| C
H -->|No| I[Project Complete]
style A fill:#e1f5fe
style I fill:#c8e6c9
style G fill:#fff9c4
How to use it
Creating effective feature lists:
- Include 100-200+ features for complex applications
- Be specific in acceptance criteria (observable behaviors, not implementation details)
- Group by category (functional, UI, performance, security)
- Include edge cases and error handling as separate features
- Write features as a human tester would verify them
Prompt enforcement:
CRITICAL RULES:
1. You MUST NOT delete or modify any feature in feature-list.json
2. You MUST NOT edit the "steps" or "description" fields
3. You MAY ONLY change "passes" from false to true
4. You MUST actually test the feature before marking it passing
5. If a feature seems impossible, ask the user - do NOT skip it
Verification tooling:
- Use browser automation (Puppeteer, Playwright) for E2E verification
- Run features as a user would, not just unit tests
- Capture evidence (screenshots, logs) when marking features passing
Trade-offs
Pros:
- Prevents premature victory declaration with clear completion criteria
- Creates immutable record of requirements that survives session boundaries
- Makes agent progress measurable (X of Y features passing)
- Eliminates "pass by deletion" attack vector
- Provides natural work queue for incremental sessions
Cons:
- Requires significant upfront investment in feature specification
- Not suitable for exploratory or research-oriented work
- May miss emergent requirements discovered during implementation
- Rigid format doesn't accommodate changing requirements
- Large feature lists can overwhelm agent context
When to use:
- Building complete applications with known requirements
- Projects spanning many agent sessions
- When agent accountability is important
- Replicable workflows (same feature list for multiple similar projects)
When to avoid:
- Exploratory prototyping
- Research tasks with unclear scope
- Small, single-session tasks
- Rapidly evolving requirements