A/B test agent deployments
Run two versions of an agent side by side, route a percentage of traffic to each, and compare results from the workflow DAG.
Run two versions of the same agent at the same time. Each registers itself with the control plane under a different version string and variant tag. A router reasoner discovers both and splits traffic by hashing on the request key.
# agent_blue.py — current production version
from agentfield import Agent
app = Agent(
node_id="summarizer-blue",
version="2.0.0",
tags=["summarizer", "variant:control"],
)
@app.reasoner()
async def summarize(text: str) -> dict:
result = await app.ai(system="Summarize concisely.", user=text)
return {"variant": "control", "summary": str(result)}
app.run()# agent_green.py — candidate version
from agentfield import Agent
app = Agent(
node_id="summarizer-green",
version="3.0.0",
tags=["summarizer", "variant:treatment"],
)
@app.reasoner()
async def summarize(text: str) -> dict:
result = await app.ai(
system="Summarize concisely. Lead with the headline finding.",
user=text,
)
return {"variant": "treatment", "summary": str(result)}
app.run()# router.py — splits traffic 90/10
import hashlib
from agentfield import Agent
app = Agent(node_id="router")
@app.reasoner()
async def route_summarize(request_id: str, text: str) -> dict:
# Discover both variants — health-aware so dead agents drop out automatically
candidates = app.discover(tags=["summarizer"], health_status="active")
# Stable bucketing on request_id — same request always hits the same variant
bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
target_tag = "variant:treatment" if bucket < 10 else "variant:control"
chosen = next(
cap for cap in candidates.json.capabilities
if target_tag in (cap.tags or [])
)
target = f"{chosen.agent_id}.summarize"
result = await app.call(target, text=text)
app.note(f"routed {request_id} to {target}", tags=["ab-test", target_tag])
return result
app.run()After running both for a day, pull the workflow DAG to compare cost, latency, and outcomes by variant tag. Both registered versions show up side by side in the Agent nodes page with their tags, version strings, and live health:
What this gives you
- Two agent versions in the same registry, no load balancer needed.
- Stable hash-based bucketing keeps the same user on the same variant.
- The DAG records which variant served each request, so you can analyze the split offline.
Next
Orchestrate Claude Code, Codex, and Gemini alongside your agents
Dispatch a task to a coding agent — Claude Code, Codex, Gemini CLI, or OpenCode — and get back schema-validated output with cost and turn caps.
Access control between agents
Allow or deny cross-agent calls based on tags, function names, and even argument values.