Skill Library Evolution
Problem
Agents frequently solve similar problems across different sessions or workflows. Without a mechanism to preserve and reuse working code, agents must rediscover solutions each time, wasting tokens and time. Organizations want agents to build up capability over time rather than starting from scratch every session.
Solution
Agents persist working code implementations as reusable functions in a skills/ directory. Over time, these implementations evolve into well-documented, tested "skills" that become higher-level capabilities the agent can leverage.
Evolution path: ad-hoc code → saved snippet → reusable function → documented skill → composable capability.
Basic pattern:
# Session 1: Agent writes code to solve a problem
import inspect

def analyze_sentiment(text):
    # Implementation discovered through experimentation
    response = llm.complete(f"Analyze sentiment: {text}")
    return parse_sentiment(response)

# Agent saves the working solution to the skills/ directory
with open("skills/analyze_sentiment.py", "w") as f:
    f.write(inspect.getsource(analyze_sentiment))
Later session: Agent discovers and uses existing skill
# Agent explores the skills directory
import os
skills = os.listdir("skills/")
# Finds: ['analyze_sentiment.py', 'extract_entities.py', ...]

# Imports and uses the existing skill
from skills.analyze_sentiment import analyze_sentiment
result = analyze_sentiment("Customer feedback here...")
Evolved skill with documentation:
# skills/analyze_sentiment.py
"""
SKILL: Sentiment Analysis

Analyzes text sentiment using LLM completion and structured parsing.

Args:
    text (str): Text to analyze
    granularity (str): 'binary' or 'fine-grained' (default: 'binary')

Returns:
    dict: {'sentiment': str, 'confidence': float, 'aspects': list}

Example:
    >>> analyze_sentiment("Great product, fast shipping!")
    {'sentiment': 'positive', 'confidence': 0.92, 'aspects': ['product', 'shipping']}

Tested: 2024-01-15
Success rate: 94% on validation set
"""

def analyze_sentiment(text, granularity='binary'):
    # Refined implementation
    pass
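The refined body is left open above; a minimal sketch is shown below, assuming the same llm.complete helper as in the basic pattern. The prompt wording, label sets, and fallback values are illustrative assumptions, not part of the original skill.

# Hypothetical refinement (sketch only): fulfills the documented contract
# using the same assumed llm.complete helper as the basic pattern.
import json

def analyze_sentiment(text, granularity='binary'):
    labels = ('positive', 'negative') if granularity == 'binary' \
        else ('positive', 'negative', 'neutral', 'mixed')
    prompt = (
        f"Classify the sentiment of this text as one of {labels}. "
        f"Return JSON with keys: sentiment, confidence, aspects.\n\n{text}"
    )
    response = llm.complete(prompt)
    try:
        return json.loads(response)
    except (ValueError, TypeError):
        # Conservative fallback when the model returns malformed JSON
        return {'sentiment': 'unknown', 'confidence': 0.0, 'aspects': []}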
How to use it
Implementation phases:

1. Ad-hoc → Saved
   - Agent writes code to solve an immediate problem
   - If the solution works, save it to the skills/ directory
   - Use descriptive names: skills/pdf_to_markdown.py

2. Saved → Reusable
   - Refactor for generalization (parameterize hard-coded values); see the refactoring sketch after this list
   - Add basic error handling
   - Create a simple function signature

3. Reusable → Documented
   - Add docstrings with purpose, parameters, returns, and examples
   - Include any prerequisites or dependencies
   - Note when the skill was last tested or validated

4. Documented → Capability
   - Agent can discover skills through directory listing
   - Skills become part of the agent's effective capability set
   - Skills are composed into higher-level workflows
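As an illustration of the Saved → Reusable step, the sketch below refactors a saved one-off snippet into a parameterized function with basic error handling. It borrows the csv_to_json name from the organization tree below; the file names, signature, and error handling are assumptions.

# Before (Saved): one-off snippet with hard-coded paths
# rows = list(csv.DictReader(open("report_2024_q1.csv")))
# json.dump(rows, open("report.json", "w"))

# After (Reusable): parameterized, with basic error handling
import csv
import json

def csv_to_json(csv_path, json_path, encoding="utf-8"):
    """Convert a CSV file to a JSON array of row objects."""
    try:
        with open(csv_path, newline="", encoding=encoding) as f:
            rows = list(csv.DictReader(f))
    except (FileNotFoundError, UnicodeDecodeError) as exc:
        raise ValueError(f"Could not read {csv_path}: {exc}") from exc
    with open(json_path, "w", encoding=encoding) as f:
        json.dump(rows, f, indent=2)
    return len(rows)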
Skill organization:
skills/
├── README.md # Index of available skills
├── data_processing/
│ ├── csv_to_json.py
│ └── filter_outliers.py
├── api_integration/
│ ├── github_pr_summary.py
│ └── slack_notify.py
├── text_analysis/
│ ├── sentiment.py
│ └── extract_entities.py
└── tests/
└── test_sentiment.py # Validation tests for skills
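The tests/ entry can stay lightweight. A minimal sketch of tests/test_sentiment.py, assuming pytest and the return contract documented in the evolved skill above (inputs and assertions are illustrative):

# tests/test_sentiment.py (illustrative; assumes pytest is installed)
from skills.text_analysis.sentiment import analyze_sentiment

def test_returns_expected_shape():
    result = analyze_sentiment("Great product, fast shipping!")
    assert set(result) >= {'sentiment', 'confidence', 'aspects'}

def test_positive_example():
    result = analyze_sentiment("Great product, fast shipping!")
    assert result['sentiment'] == 'positive'
    assert 0.0 <= result['confidence'] <= 1.0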
Discovery pattern:
# Agent explores available skills
import os

def discover_skills():
    """List available skills with descriptions."""
    skills = []
    for root, dirs, files in os.walk("skills/"):
        for file in files:
            if file.endswith(".py") and file != "__init__.py":
                path = os.path.join(root, file)
                # Extract the module docstring (extract_docstring is a
                # helper that returns it, or None if absent)
                with open(path) as f:
                    content = f.read()
                docstring = extract_docstring(content)
                skills.append({
                    'path': path,
                    'name': file[:-3],
                    'description': docstring.split('\n')[0] if docstring else ''
                })
    return skills
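Discovered skills can then be loaded and composed into higher-level workflows. The sketch below uses importlib to load a skill module from its file path; load_skill, the function names, and the composed report are illustrative assumptions, not a required API.

import importlib.util

def load_skill(path, function_name):
    """Dynamically import a skill module from its file path."""
    spec = importlib.util.spec_from_file_location("skill_module", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, function_name)

# Compose discovered skills into a higher-level workflow
available = {s['name']: s['path'] for s in discover_skills()}
sentiment = load_skill(available['sentiment'], 'analyze_sentiment')
entities = load_skill(available['extract_entities'], 'extract_entities')

feedback = "Customer feedback here..."
report = {'sentiment': sentiment(feedback), 'entities': entities(feedback)}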
Trade-offs
Pros:
- Builds agent capability over time
- Reduces redundant problem-solving across sessions
- Creates organizational knowledge base in code form
- Skills can be tested, versioned, and improved
- Enables composition of higher-level capabilities
- Reduces token consumption (reuse vs. rewrite)
Cons:
- Requires discipline to save and organize skills
- Skills can become stale or outdated
- Need maintenance and testing infrastructure
- Namespace conflicts as the skill library grows large
- Agents must be prompted to check existing skills before writing new code (see the prompt sketch below)
- Quality varies (not all saved code is good code)
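One way to handle the prompting concern is a standing instruction in the agent's system prompt; the wording below is only an assumed example.

# Illustrative system-prompt fragment (wording is an assumption)
SKILL_REUSE_INSTRUCTION = """
Before writing new code, list the skills/ directory and read skills/README.md.
If an existing skill solves the task, import and reuse it.
If you write new working code, save it under skills/ with a descriptive name
and a docstring, then update skills/README.md.
"""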
Maintenance requirements:
- Regular review of skill quality and relevance (see the review sketch after this list)
- Testing framework for skill validation
- Deprecation policy for outdated skills
- Documentation standards for new skills
- Version control to track skill evolution
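The review sketch referenced above shows one way to automate part of this: scanning skill docstrings for the "Tested:" date convention used earlier and flagging stale entries. The regex, field name, and age threshold are assumptions.

# Illustrative maintenance check: flag skills whose docstring lacks a
# recent "Tested:" date (date format and threshold are assumptions).
import datetime
import re

def find_stale_skills(skills, max_age_days=90):
    stale = []
    today = datetime.date.today()
    for skill in skills:  # entries from discover_skills()
        with open(skill['path']) as f:
            content = f.read()
        match = re.search(r"Tested:\s*(\d{4}-\d{2}-\d{2})", content)
        if not match:
            stale.append((skill['name'], 'never tested'))
            continue
        tested = datetime.date.fromisoformat(match.group(1))
        if (today - tested).days > max_age_days:
            stale.append((skill['name'], f'last tested {tested}'))
    return stale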
Success factors:
- Clear naming conventions
- Good documentation from the start
- Encourage skill reuse through prompting
- Periodic skill library review and curation
- Examples and test cases for each skill
References
- Anthropic Engineering: Code Execution with MCP (2024)
- Related: Compounding Engineering Pattern