Skip to content

Progressive Disclosure for Large Files NEW

Nikola Balic (@nibzard)

Problem

Large files (PDFs, DOCXs, images) overwhelm the context window when loaded naively. A 5-10MB PDF may contain only 10-20KB of relevant text/tables, but the entire file is often shoved into context, wasting tokens and degrading performance.

Solution

Apply progressive disclosure: load file metadata first, then provide tools to load content on-demand.

Core approach:

  1. Always include file metadata in the prompt (not full content):

Files: - id: f_a1 name: my_image.png size: 500,000 preloaded: false - id: f_b3 name: report.pdf size: 8500000 preloaded: false

  1. Optionally preload first N KB of appropriate mimetypes (configurable per-workflow, can be 0)

  2. Provide three file operations:

  3. load_file(id) - Load entire file into context

  4. peek_file(id, start, stop) - Load a section of file
  5. extract_file(id) - Transform PDF/DOCX/PPT into simplified text

  6. Include a large_files skill explaining when/how to use these tools

# Agent workflow for document comparison
1. Prompt includes file metadata for report_2024.pdf and report_2025.pdf
2. Agent sees large PDFs, checks large_files skill
3. Agent calls: extract_file("report_2024.pdf")
4. Agent calls: extract_file("report_2025.pdf")
5. Agent compares extracted summaries using minimal context
# Agent workflow for image analysis
1. Prompt includes metadata for screenshot.png
2. Agent sees PNG type, calls: load_file("screenshot.png")
3. Image content is loaded, agent analyzes visual content

How to use it

Best for:

  • Document comparison workflows (multiple PDFs)
  • Ticket systems with file attachments (images, PDFs)
  • Data export analysis (large reports in various formats)
  • Any workflow where agents need file content but files are large

Implementation considerations:

  • File id should be a stable reference for tool calls
  • extract_file should return simplified text (tables, text content)
  • Consider making extract_file return a virtual file_id for very large extractions
  • Preloading first N KB is optional - can give agent initial context without full load

Tool design:

def load_file(file_id: str) -> str:
    """Load entire file content into context window."""

def peek_file(file_id: str, start: int, stop: int) -> str:
    """Load a specific byte range from file."""

def extract_file(file_id: str) -> str:
    """Convert PDF/DOCX/PPT to simplified text representation."""

Trade-offs

Pros:

  • Enables working with files much larger than context window
  • Agent has control over what/when to load
  • Reusable across workflows via large_files skill
  • Extracted content is often 100x smaller than original file

Cons:

  • Adds tool call overhead (multiple round-trips)
  • Requires preloading heuristics (how much is enough?)
  • Extraction from complex formats (DOCX) can be slow without native dependencies
  • Agent may make poor loading decisions without proper guidance

Trade-offs in preloading:

  • Preloading: Gives agent immediate context but reduces control
  • No preloading: Maximum agent control but requires explicit load calls

References