Custom Sandboxed Background Agent

Nikola Balic (@nibzard)

Problem

Off-the-shelf coding agents (e.g., Devin, Claude Code, Cursor) tend to fall short in at least one of these ways:

Too generic - Not deeply integrated with company-specific dev environments, tools, and workflows
Vendor-locked - Tightly coupled to one model provider, limiting flexibility and creating dependency
Limited context - Cannot access internal infrastructure, private repos, or company-specific tooling

Companies need coding agents that:

Work within their specific development environment
Can iterate with closed feedback loops (compiler, linter, tests)
Provide real-time visibility into agent progress
Are model-agnostic, so providers can be switched as needed

Solution

Build a custom background agent that runs in a sandboxed environment identical to the one developers use, with:

1. Sandboxed Execution Environment
- Use infrastructure like Modal for ephemeral, sandboxed dev environments
- Agent runs with the same context as developers: codebase, dependencies, dev tools
- Isolated from production but mirroring the development setup

2. Real-Time Communication Layer
- WebSocket connection streams stdout/stderr to the client
- User sees agent progress in real time (pushed, not polled)
- Two-way communication for prompts and status updates
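As a rough illustration of the streaming half of this layer, the sketch below forwards a subprocess's stdout and stderr line by line to a `send` callback. It uses only Python's standard `asyncio`. The `send` callback is a hypothetical stand-in for a WebSocket library's send method, and the JSON message shape (`stream`, `data`) is an assumption, not something the source specifies.

```python
import asyncio
import json
from typing import Awaitable, Callable

# Hypothetical: in a real service this would be a WebSocket connection's send method.
Send = Callable[[str], Awaitable[None]]


async def _pump(reader: asyncio.StreamReader, stream: str, send: Send) -> None:
    """Forward each line from one pipe as a small JSON message."""
    while True:
        line = await reader.readline()
        if not line:  # EOF: the process closed this pipe
            break
        text = line.decode(errors="replace").rstrip("\r\n")
        await send(json.dumps({"stream": stream, "data": text}))


async def run_and_stream(cmd: list[str], send: Send) -> int:
    """Run cmd, pushing output to the client as it is produced (no polling)."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    # Drain both pipes concurrently so neither can fill up and block the process.
    await asyncio.gather(
        _pump(proc.stdout, "stdout", send),
        _pump(proc.stderr, "stderr", send),
    )
    exit_code = await proc.wait()
    await send(json.dumps({"stream": "status", "data": f"exit {exit_code}"}))
    return exit_code
```

Keeping `send` as a plain async callable means the same function can be driven by a WebSocket handler, a test harness, or a log writer without changes.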

3. Closed Feedback Loop
- Agent iterates autonomously on machine-readable feedback
- Compiler errors, linter warnings, and test failures guide each iteration
- Iterative refinement rather than one-shot implementation
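The loop described here can be sketched as a check-then-fix cycle with an iteration budget. This is a minimal sketch under stated assumptions: `propose_fix` is a hypothetical callback that hands the failure output to the model and applies its edits, and the default budget of five iterations is arbitrary.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    ok: bool
    output: str  # combined stdout and stderr, fed back to the agent verbatim


def run_check(cmd: list[str], cwd: str = ".") -> CheckResult:
    """Run one check (compiler, linter, or test runner) and capture its output."""
    proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    return CheckResult(ok=proc.returncode == 0, output=proc.stdout + proc.stderr)


def iterate_until_green(
    propose_fix: Callable[[str], None],  # hypothetical: asks the model to edit files
    check: Callable[[], CheckResult],
    max_iterations: int = 5,
) -> bool:
    """Alternate between checking and fixing until checks pass or the budget runs out."""
    for _ in range(max_iterations):
        result = check()
        if result.ok:
            return True
        propose_fix(result.output)  # failures become context for the next attempt
    return check().ok  # final verdict after the last fix
```

A caller might pass `lambda: run_check(["pytest", "-q"])` as `check`. The iteration cap matters in practice, since an agent that never converges would otherwise hold its sandbox indefinitely.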

4. Model-Agnostic Architecture
- Support multiple frontier models via a pluggable interface
- Switch between providers without rewriting infrastructure
- Leverage the best model for each task type
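One way to picture the pluggable interface is a small protocol that every provider adapter satisfies, plus a routing table keyed by task type. Everything named here (`ModelClient`, `EchoModel`, `ROUTES`, the task-type strings) is hypothetical and chosen for illustration. A real adapter would wrap a vendor SDK behind the same `complete` method.

```python
from typing import Callable, Protocol


class ModelClient(Protocol):
    """Minimal surface the agent depends on; each provider gets its own adapter."""

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Stand-in provider for illustration only; it just echoes the prompt."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


# Hypothetical routing table: task type -> factory for the preferred model.
ROUTES: dict[str, Callable[[], ModelClient]] = {
    "refactor": lambda: EchoModel("provider-a"),
    "test-writing": lambda: EchoModel("provider-b"),
}


def model_for(task_type: str, default: str = "refactor") -> ModelClient:
    """Pick a model by task type, falling back to the default route."""
    factory = ROUTES.get(task_type, ROUTES[default])
    return factory()
```

Because the agent loop only ever calls `complete`, swapping providers is a change to `ROUTES` rather than to the surrounding infrastructure.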

5. Company-Specific Integration
- Deep integration with internal tooling, scripts, and workflows
- Access to private repos, internal APIs, and documentation
- Customized to the team's specific development practices

Example

How to use it

Architecture Components:
- Sandbox provider: Modal, sprites.dev, or custom container orchestration
- WebSocket server: For real-time bidirectional communication
- Model abstraction layer: Interface supporting multiple LLM providers
- Feedback loop integrations: Parser for compiler/linter/test output
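For the feedback loop integrations component, a parser might start as small as the sketch below. It assumes tool output in the common `path:line:col: message` or `path:line: message` shape that many compilers and linters emit. The `Diagnostic` type and the regular expression are illustrative choices, and the pattern is a heuristic: it will not handle Windows drive-letter paths or multi-line messages.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Diagnostic:
    path: str
    line: int
    column: Optional[int]  # None when the tool reports no column
    message: str


# Matches "path:line:col: message" and "path:line: message".
_PATTERN = re.compile(
    r"^(?P<path>[^:\s]+):(?P<line>\d+):(?:(?P<col>\d+):)?\s*(?P<msg>.+)$"
)


def parse_diagnostics(output: str) -> list[Diagnostic]:
    """Extract structured diagnostics; lines that do not match are skipped."""
    found = []
    for raw in output.splitlines():
        m = _PATTERN.match(raw.strip())
        if m:
            col = int(m["col"]) if m["col"] else None
            found.append(Diagnostic(m["path"], int(m["line"]), col, m["msg"]))
    return found
```

Structured records like these let the agent open the right file at the right line instead of re-reading a raw log on every iteration.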

Implementation Steps:
1. Create a sandbox service that spins up isolated dev environments
2. Build a WebSocket layer for prompt submission and progress streaming
3. Implement a model-agnostic agent interface
4. Add feedback loop integrations (test parsing, error ingestion)
5. Connect to version control for branch/PR creation
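The version-control step could begin as a thin wrapper over standard git commands, as in this sketch. The remote name `origin` and the function name `publish_branch` are assumptions. The runner is injectable so the command sequence can be inspected or tested without a repository. Opening the pull request itself depends on the hosting platform and is left out.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def _run_git(args: Sequence[str]) -> None:
    """Default runner: execute git in the current directory, raising on failure."""
    subprocess.run(["git", *args], check=True)


def publish_branch(branch: str, message: str, run: Runner = _run_git) -> list[list[str]]:
    """Commit the agent's working-tree changes to a new branch and push it."""
    steps = [
        ["checkout", "-b", branch],
        ["add", "--all"],
        ["commit", "-m", message],
        ["push", "--set-upstream", "origin", branch],  # assumes a remote named "origin"
    ]
    for args in steps:
        run(args)
    return steps
```

Inside a sandbox, the default runner would operate on the checked-out repository. Passing a recording function as `run` gives a dry run of the same steps.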

Why Custom vs. Off-the-Shelf:
- Off-the-shelf agents (Devin, Cursor) work well for generic tasks
- But they can't deeply integrate with your company's specific infrastructure
- Building custom lets you optimize for your workflows, tools, and security requirements

Trade-offs

Pros:
Deep integration with company-specific tools and workflows
Model flexibility - not locked into one provider
Real-time visibility into agent progress and intermediate steps
Same context as developers - agent works in identical environment
Custom feedback loops tailored to your stack
Cons:
Engineering overhead - requires building and maintaining infrastructure
Security considerations - agent needs access to repos and credentials
Ongoing maintenance - unlike SaaS solutions, you own the ops burden
Requires DevOps expertise - sandbox management, WebSocket scaling

References

Source: https://engineering.ramp.com/post/why-we-built-our-background-agent