Anthropic engineers discuss building AI agents that run for hours, covering challenges, harness design, and model improvements.
Key Takeaways
- Building long-running AI agents requires addressing context limitations, planning, and self-evaluation challenges.
- Model improvements and harness design must co-evolve to extend agent runtime effectively.
- Anthropic’s Agent SDK offers reusable primitives to support complex, multi-hour agent workflows.
- Iterative feedback loops and agent negotiation improve task completion accuracy and coherence.
- Developers can start experimenting with harnesses and long-running agents without needing Anthropic’s internal tools.
Summary
- Ash Prabaker and Andrew Wilson from Anthropic present techniques to build AI agents capable of running for extended periods, such as 5-6 hours or more.
- They discuss challenges including limited context windows, context rot, context anxiety, poor planning, and models' inability to judge their own output accurately.
- Two main solutions are improving the model itself and enhancing the harness or scaffolding around the model.
- The evolution of Anthropic's models and harnesses is highlighted, showing progress from short runs to agents running for days.
- The Anthropic Agent SDK provides primitives like core agent loops, tool integrations, permission systems, and sub-agent delegation.
- They emphasize co-evolution of model capabilities and harness improvements to achieve longer, more coherent agent runs.
- Examples include Claude Code and its internal loops, negotiation between agents on task completion, and iterative improvement via feedback.
- The importance of reading execution traces and continuous iteration is stressed for building robust long-running agents.
- The session covers experimental techniques and state-of-the-art approaches to improve agent self-evaluation and task planning.
- They encourage developers to experiment with harness components and note that Anthropic's internal tools are not required to start building long-running agents.
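The feedback-loop idea from the summary can be illustrated with a small sketch. This is not the Anthropic Agent SDK or anything shown in the session: `run_with_feedback`, `compact`, `Verdict`, and the toy generator and evaluator are hypothetical names invented for illustration, and the "models" are plain Python functions standing in for real model calls. It assumes a harness that keeps a separate evaluator (since models judge their own output poorly), feeds the evaluator's feedback into the next attempt, bounds the working notes to limit context growth, and records a trace that can be read afterwards.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Verdict:
    """Result of a separate evaluation pass over a draft (hypothetical type)."""
    passed: bool
    feedback: str


def compact(notes: List[str], max_items: int) -> List[str]:
    """Bound the notes list by folding the oldest entries into one placeholder.

    A real harness would ask a model to write the summary; this sketch only
    records how many entries were folded away.
    """
    if len(notes) <= max_items:
        return notes
    dropped = len(notes) - (max_items - 1)
    return [f"[summary of {dropped} earlier entries]"] + notes[dropped:]


def run_with_feedback(
    task: str,
    generate: Callable[[str, List[str]], str],
    evaluate: Callable[[str, str], Verdict],
    max_rounds: int = 5,
    max_notes: int = 6,
) -> Tuple[str, List[str]]:
    """Generate, evaluate, and retry with feedback until the evaluator passes.

    Returns the last draft and a trace with one line per round.
    """
    notes: List[str] = []
    trace: List[str] = []
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = generate(task, notes)
        verdict = evaluate(task, draft)
        trace.append(f"round {round_no}: passed={verdict.passed} ({verdict.feedback})")
        if verdict.passed:
            break
        # Carry the failed draft and the critique into the next attempt.
        notes.append(f"draft: {draft}")
        notes.append(f"feedback: {verdict.feedback}")
        notes = compact(notes, max_notes)
    return draft, trace


def toy_generate(task: str, notes: List[str]) -> str:
    """Stand-in for a model call: adds one step per piece of feedback seen."""
    count = 1 + sum(1 for note in notes if note.startswith("feedback:"))
    return ", ".join(f"step {i}" for i in range(1, count + 1))


def toy_evaluate(task: str, draft: str) -> Verdict:
    """Stand-in for an evaluator agent: requires at least three steps."""
    found = draft.count("step")
    return Verdict(passed=found >= 3, feedback=f"need 3 steps, got {found}")


if __name__ == "__main__":
    result, run_trace = run_with_feedback("write a plan", toy_generate, toy_evaluate)
    print(result)
    for line in run_trace:
        print(line)
```

Keeping the evaluator separate from the generator is the design choice being illustrated here; in a real harness both callables would wrap model calls, and the trace would hold the full execution record rather than one line per round.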
Chapters
- 00:00 Introduction and session overview
- 04:10 Challenges in building long-running agents
- 08:17 Solutions: model improvements and harness design
- 13:17 History and evolution of Anthropic’s agent technology
- 16:32 Agent negotiation and task verification techniques
- 19:37 Iterative feedback and trace analysis
- 25:15 Experimental harness techniques and state-of-the-art approaches
- 30:40 Closing remarks and developer encouragement