Iterative Prompt & Skill Refinement

Nikola Balic (@nibzard)

Problem

Agent usage reveals gaps in prompts, skills, and tools, but how do you systematically improve them? When a workflow fails or behaves sub-optimally, you need mechanisms to capture that feedback and iterate. No single approach catches everything; you need a multi-pronged refinement strategy.

Solution

Implement multiple complementary refinement mechanisms that work together. No single mechanism catches all issues, so layer several approaches.

Four key mechanisms:

1. Responsive Feedback (Primary)

  • Monitor internal #ai channel for issues
  • Skim workflow interactions daily
  • This is the most valuable ongoing source of improvement

2. Owner-Led Refinement (Secondary)

  • Store prompts in editable documents (Notion, Google Docs)
  • Most prompts editable by anyone at the company
  • Include prompt links in workflow outputs (Slack messages, Jira comments)
  • Prompts must be discoverable + editable

3. Claude-Enhanced Refinement (Specialized)

  • Use Datadog MCP to pull logs into the skill repository
  • Skills are a "platform" used by many workflows
  • Often maintained by a central AI team, not individual owners

4. Dashboard Tracking (Quantitative)

  • Track workflow run frequency and errors
  • Track tool usage (how often each skill loads)
  • Data-driven prioritization of improvements (see the sketch after the diagram)

graph TD
    A[Workflow Runs] --> B[Feedback Channel: #ai]
    A --> C[Owner Edits Prompts]
    A --> D[Datadog Logs → Claude]
    A --> E[Dashboards: Metrics]
    B --> F[Identify Issues]
    C --> F
    D --> F
    E --> F
    F --> G[Update Prompts/Skills]
    G --> A
    style B fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    style E fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
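
A minimal sketch of the quantitative layer (mechanism 4), assuming simple in-process counters; a real deployment would emit these to Datadog or StatsD instead. All names here are illustrative:

from collections import Counter

# Illustrative in-memory metrics; swap for StatsD/Datadog emitters in production.
workflow_runs = Counter()    # runs per workflow
workflow_errors = Counter()  # errors per workflow
skill_loads = Counter()      # how often each skill loads

def record_run(workflow: str, skills_used: list[str], error: bool = False) -> None:
    """Record one workflow run for dashboard aggregation."""
    workflow_runs[workflow] += 1
    if error:
        workflow_errors[workflow] += 1
    for skill in skills_used:
        skill_loads[skill] += 1

def error_rate(workflow: str) -> float:
    """Error rate per workflow, used to prioritize refinement work."""
    runs = workflow_runs[workflow]
    return workflow_errors[workflow] / runs if runs else 0.0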

How to use it

Implementation checklist:

  • [ ] Feedback channel: Internal Slack/Discord for agent issues
  • [ ] Editable prompts: Store in Notion/docs, not code
  • [ ] Prompt links: Include in every workflow output
  • [ ] Log access: Datadog/observability with MCP integration
  • [ ] Dashboards: Track workflow runs, errors, tool usage

Refinement workflow:

# After each workflow run, include a link to the prompt that produced it,
# so anyone reading the output can jump straight to the editable source.
workflow_result = {
    "output": "...",
    "prompt_link": "https://notion.so/prompt-abc123",
}
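
Continuing from workflow_result above, the link can ride along in the message itself so the prompt stays discoverable (mechanism 2). A sketch using slack_sdk; the token handling and channel name are assumptions:

import os
from slack_sdk import WebClient

client = WebClient(token=os.environ["SLACK_BOT_TOKEN"])

# Post the output with a link back to the editable prompt, so whoever
# spots a problem in #ai can fix the prompt directly.
client.chat_postMessage(
    channel="#ai",  # assumed feedback channel from this pattern
    text=f"{workflow_result['output']}\n\nPrompt: {workflow_result['prompt_link']}",
)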

Discovery strategy:

  • Daily: Skim feedback channel, review workflow interactions
  • Weekly: Review dashboard metrics for error spikes
  • Ad-hoc: Pull logs when specific issues are reported (see the sketch after this list)
  • Quarterly: Comprehensive prompt/skill audit
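
For those ad-hoc pulls, a sketch assuming the official datadog-api-client Python package with DD_API_KEY/DD_APP_KEY in the environment; the Datadog MCP route from mechanism 3 wraps the same kind of query behind a tool, and the query string here is illustrative:

from datadog_api_client import ApiClient, Configuration
from datadog_api_client.v2.api.logs_api import LogsApi
from datadog_api_client.v2.model.logs_list_request import LogsListRequest
from datadog_api_client.v2.model.logs_query_filter import LogsQueryFilter

# Pull recent error logs for a workflow so they can be fed into the
# skill repository for Claude-enhanced refinement.
with ApiClient(Configuration()) as api_client:
    response = LogsApi(api_client).list_logs(
        body=LogsListRequest(
            filter=LogsQueryFilter(
                query="service:agent-workflows status:error",  # illustrative query
                _from="now-1d",
                to="now",
            )
        )
    )
    for log in response.data:
        print(log.attributes.message)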

Post-run evals (next step):

Include a subjective eval after each run:

  • Was this workflow effective?
  • What would have made it better?
  • Human-in-the-loop to nudge evolution
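
A sketch of what capturing that eval might look like; the schema and workflow name are assumptions, not a prescribed format:

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class PostRunEval:
    """Subjective, human-in-the-loop evaluation attached to one run."""
    workflow: str
    effective: bool    # Was this workflow effective?
    improvement: str   # What would have made it better?
    reviewed_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

# Example: a reviewer nudges the workflow's evolution after a run.
eval_record = PostRunEval(
    workflow="jira-triage",  # hypothetical workflow name
    effective=False,
    improvement="Needs the customer tier before prioritizing tickets.",
)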

Trade-offs

Pros:

  • Multi-layered: Catches issues different mechanisms miss
  • Continuous: Always improving, not episodic
  • Accessible: Anyone can contribute to improvement
  • Data-driven: Dashboards prioritize what matters
  • Skill-sharing: Central team can maintain platform-level skills

Cons:

  • No silver bullet: Can't eliminate any mechanism
  • Maintenance overhead: Multiple systems to manage
  • Permission complexity: Need balanced edit access
  • Alert fatigue: Too many signals can overwhelm

Workflow archetypes:

Type                          Refinement Strategy
Chatbots                      Post-run evals + human-in-the-loop
Well-understood workflows     Code-driven (deterministic)
Not-yet-understood workflows  The open question

Open challenge: How to scalably identify and iterate on "not-yet-well-understood" workflows without product engineers implementing each individually?
