What is Programmatic Tool Calling?
Programmatic tool calling lets an AI agent write code that invokes tools inside a sandboxed execution environment — instead of requiring a separate model round-trip for every tool call. The agent writes a script, the runtime executes it, and tool calls happen directly from the code. Only the final result is returned to the model's context window.
Why It Matters
Traditional tool use requires the model to generate a tool call, wait for the result, then decide the next step — one round-trip at a time. Programmatic tool calling eliminates this overhead for multi-step workflows.
Fewer Round-Trips
Call multiple tools in a single code execution instead of one model turn per tool.
Lower Token Usage
Intermediate results stay in the sandbox — only the summary enters the context window.
Data Filtering
Process and filter large tool outputs in code before they reach the model.
Native Control Flow
Use loops, conditionals, and error handling — the model writes real code, not just JSON calls.
How It Works
The flow involves four steps between the agent, sandbox, and your tool server.
Agent Writes Code
The model generates a Python script that calls your tools as async functions.
Sandbox Executes
The code runs in a sandboxed container. When a tool function is called, execution pauses.
Tool Runs Externally
Your server receives the tool call, executes it, and returns the result to the sandbox.
Result to Model
Once the script finishes, only the final output is added to the model's context.
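The four steps above can be sketched in code. Everything here is an illustrative assumption, not a real API: `execute_on_server` stands in for the runtime's transport to your tool server (in a real sandbox, this is where execution pauses until your server returns a result), and `query_database` shows how a tool can be exposed to agent-written code as an ordinary async function.

```python
import asyncio

async def execute_on_server(tool_name: str, arguments: dict) -> dict:
    # Stand-in for the real transport: a real runtime would pause the
    # sandbox here and resume it when your tool server returns a result.
    await asyncio.sleep(0)  # simulate the external round-trip
    return {"tool": tool_name, "ok": True, "args": arguments}

async def query_database(sql: str) -> dict:
    # From the agent's perspective this is just a local async function,
    # but the actual work happens outside the sandbox.
    return await execute_on_server("query_database", {"sql": sql})

async def agent_script():
    # Agent-written code: the tool call looks local, runs externally.
    return await query_database("SELECT 1")

print(asyncio.run(agent_script()))
```

The key point is that only what the script ultimately prints or returns is added to the model's context; the round-trips between sandbox and tool server never consume model turns.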
Traditional vs Programmatic
See how the two approaches differ for a task that queries three database regions.
Traditional Tool Use
Each tool call requires a full model round-trip. For N tool calls, that's N+1 inference passes: one per call, plus one for the final answer.
Model → tool call → result → Model → tool call → result → Model → tool call → result → Model → final answer
4 inference passes
Programmatic Tool Calling
The agent writes one script that calls all N tools; a second pass turns the returned summary into the final answer. Two inference passes total, regardless of N.
Model → code (3 tool calls + aggregation) → final answer
2 inference passes
Example: Programmatic Database Query
The agent writes Python that loops over regions, calls a database tool, and aggregates results — all in one execution.
regions = ["West", "East", "Central", "North", "South"]
results = {}
for region in regions:
    data = await query_database(
        f"SELECT SUM(revenue) as total FROM sales WHERE region='{region}'"
    )
    results[region] = data[0]["total"]

# Aggregate in code — only the summary reaches the model
top_region = max(results, key=results.get)
print(f"Top region: {top_region} ({results[top_region]:,.0f})")
print(f"All regions total: {sum(results.values()):,.0f}")
Use Cases
Programmatic tool calling shines when agents need to do more than one-shot tool calls.
Batch Processing
Query a database for each of 50 regions in a loop, aggregate results, and return a summary — all in one execution.
Conditional Logic
Check file size first, then decide whether to read the full file or just a summary. No wasted round-trips.
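A minimal sketch of that pattern, with hypothetical tools (`stat_file`, `read_file`, `summarize_file` are stubbed here so the example runs; none of them is a real API):

```python
import asyncio

async def stat_file(path: str) -> dict:
    return {"size_bytes": 5_000_000}  # stub: pretend the file is large

async def read_file(path: str) -> str:
    return "full contents"  # stub

async def summarize_file(path: str) -> str:
    return "summary"  # stub

async def smart_read(path: str, limit: int = 1_000_000) -> str:
    # One cheap metadata call decides which expensive call to make,
    # without an extra model round-trip in between.
    info = await stat_file(path)
    if info["size_bytes"] > limit:
        return await summarize_file(path)
    return await read_file(path)

print(asyncio.run(smart_read("report.csv")))
```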
Data Filtering
Fetch 10,000 log entries, filter to only errors, and return the last 10 — keeping the context window clean.
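Sketched below with a hypothetical `fetch_logs` tool (stubbed so the example runs). The 10,000-entry payload exists only inside the sandbox; just the filtered tail would reach the model:

```python
import asyncio

async def fetch_logs(limit: int) -> list:
    # Stub: pretend the tool returned `limit` log entries,
    # with an ERROR every 100th entry.
    return [
        {"id": i, "level": "ERROR" if i % 100 == 0 else "INFO"}
        for i in range(limit)
    ]

async def recent_errors() -> list:
    entries = await fetch_logs(10_000)  # large payload stays in the sandbox
    errors = [e for e in entries if e["level"] == "ERROR"]
    return errors[-10:]  # only these ten entries reach the model

print(asyncio.run(recent_errors()))
```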
Early Termination
Check endpoints in sequence and stop as soon as a healthy one is found. No need to check all of them.
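A sketch of early termination, again with a hypothetical tool (`check_endpoint` is stubbed so the example runs):

```python
import asyncio

async def check_endpoint(url: str) -> bool:
    # Stub: pretend only the second endpoint is healthy.
    return url.endswith("-2")

async def first_healthy(endpoints: list):
    for url in endpoints:
        if await check_endpoint(url):
            return url  # stop early: later endpoints are never checked
    return None

print(asyncio.run(first_healthy(["api-1", "api-2", "api-3"])))
```

With traditional tool use, the model would have to issue each health check as a separate turn; here the loop exits after the first success without any further model involvement.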
The allowed_callers Concept
When defining tools, you specify which contexts can invoke them. This controls whether a tool can only be called directly by the model, only from within code execution, or both.
For clarity, it's best to choose one mode per tool rather than enabling both. This gives the model clearer guidance on how to use each tool.
Direct Only
The model calls the tool directly via the standard tool-use flow. This is the default.
"allowed_callers": ["direct"]
Code Execution Only
The tool can only be invoked from within a sandboxed code execution environment.
"allowed_callers": ["code_execution"]
Both Modes
The tool can be called either directly or from code. Use sparingly — it can confuse tool selection.
"allowed_callers": ["direct", "code_execution"]
Key Takeaways
1. Programmatic tool calling lets agents write code that calls tools, eliminating per-tool round-trips.
2. Intermediate results stay in the sandbox and never enter the model's context window, saving tokens.
3. Use allowed_callers to control whether tools are invoked directly, from code, or both.
4. Best for batch processing, conditional workflows, data filtering, and multi-step tool chains.
5. Multiple AI providers are implementing this pattern as a way to make agents faster and more efficient.