A/B test agent deployments

Run two versions of an agent side by side, route a percentage of traffic to each, and compare results from the workflow DAG.

Run two versions of the same agent at the same time. Each registers itself with the control plane under a different version string and variant tag. A router reasoner discovers both and splits traffic by hashing on the request key.

# agent_blue.py — current production version
from agentfield import Agent

app = Agent(
    node_id="summarizer-blue",
    version="2.0.0",
    tags=["summarizer", "variant:control"],
)

@app.reasoner()
async def summarize(text: str) -> dict:
    result = await app.ai(system="Summarize concisely.", user=text)
    return {"variant": "control", "summary": str(result)}

app.run()

# agent_green.py — candidate version
from agentfield import Agent

app = Agent(
    node_id="summarizer-green",
    version="3.0.0",
    tags=["summarizer", "variant:treatment"],
)

@app.reasoner()
async def summarize(text: str) -> dict:
    result = await app.ai(
        system="Summarize concisely. Lead with the headline finding.",
        user=text,
    )
    return {"variant": "treatment", "summary": str(result)}

app.run()

# router.py — splits traffic 90/10
import hashlib
from agentfield import Agent

app = Agent(node_id="router")

@app.reasoner()
async def route_summarize(request_id: str, text: str) -> dict:
    # Discover both variants — health-aware so dead agents drop out automatically
    candidates = app.discover(tags=["summarizer"], health_status="active")

    # Stable bucketing on request_id — same request always hits the same variant
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    target_tag = "variant:treatment" if bucket < 10 else "variant:control"

    chosen = next(
        cap for cap in candidates.json.capabilities
        if target_tag in (cap.tags or [])
    )
    target = f"{chosen.agent_id}.summarize"

    result = await app.call(target, text=text)
    app.note(f"routed {request_id} to {target}", tags=["ab-test", target_tag])
    return result

app.run()

After running both for a day, pull the workflow DAG to compare cost, latency, and outcomes by variant tag. Both registered versions show up side by side in the Agent nodes page with their tags, version strings, and live health:

Agent nodes page showing both agents online with their auth tags, version strings, and live heartbeat — code-forge expanded with all 17 reasoner endpoints visible

What this gives you

Two agent versions in the same registry, no load balancer needed.
Stable hash-based bucketing keeps the same user on the same variant.
The DAG records which variant served each request, so you can analyze the split offline.

Next