A/B test agent deployments

Run two versions of an agent side by side, route a percentage of traffic to each, and compare results from the workflow DAG.

Both versions share one node_id but register with different version values. When callers use the normal execution endpoint, the control plane splits traffic between the registered versions.

# agent_v1.py -- control
from agentfield import Agent

app = Agent(
    node_id="summarizer",
    version="2.0.0",
    tags=["summarizer", "variant:control"],
)

@app.reasoner()
async def summarize(text: str) -> dict:
    result = await app.ai(system="Summarize concisely.", user=text)
    return {"variant": "control", "summary": str(result)}

app.run(port=9200)

# agent_v2.py -- treatment
from agentfield import Agent

app = Agent(
    node_id="summarizer",
    version="3.0.0",
    tags=["summarizer", "variant:treatment"],
)

@app.reasoner()
async def summarize(text: str) -> dict:
    result = await app.ai(
        system="Summarize concisely. Lead with the headline finding.",
        user=text,
    )
    return {"variant": "treatment", "summary": str(result)}

app.run(port=9201)

Call the shared target. The control plane chooses a healthy registered version and returns the selected version in the response headers.

curl -i -X POST http://localhost:8080/api/v1/execute/summarizer.summarize \
  -H "Content-Type: application/json" \
  -d '{"input": {"text": "AgentField gives agents backend-shaped deployment controls."}}'

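# Look for the routed version in the response headers.
# The value is whichever version was selected for this request, for example:
# X-Routed-Version: 3.0.0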
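
To check the split empirically, one option is to send a batch of requests and count the `X-Routed-Version` values. The sketch below uses only the Python standard library. The endpoint and header name come from the curl example above; the helper names (`routed_version`, `tally_versions`, `sample_split`) and the default sample size are illustrative assumptions, not part of AgentField.

```python
import json
import urllib.request
from collections import Counter

# Same shared target as the curl example above.
EXECUTE_URL = "http://localhost:8080/api/v1/execute/summarizer.summarize"


def routed_version(headers) -> str:
    """Return the X-Routed-Version value, or 'unknown' if the header is absent."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {k.lower(): v for k, v in dict(headers).items()}
    return lowered.get("x-routed-version", "unknown")


def tally_versions(header_sets) -> Counter:
    """Count how many responses were routed to each version."""
    return Counter(routed_version(h) for h in header_sets)


def sample_split(n: int = 50, text: str = "hello") -> Counter:
    """Send n requests to the shared target and tally the routed versions."""
    body = json.dumps({"input": {"text": text}}).encode()
    header_sets = []
    for _ in range(n):
        req = urllib.request.Request(
            EXECUTE_URL,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            header_sets.append(dict(resp.headers.items()))
    return tally_versions(header_sets)
```

With both agents and the control plane running, `print(sample_split(100))` prints a `Counter` keyed by version string. Each call triggers a real model invocation, so keep the sample small.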

Adjust the rollout with traffic weights:

# Send roughly 90% to v2.0.0 and 10% to v3.0.0.
curl -X PUT http://localhost:8080/api/v1/connector/reasoners/summarizer/versions/2.0.0/weight \
  -H "X-Connector-Token: $AGENTFIELD_CONNECTOR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"weight": 90}'

curl -X PUT http://localhost:8080/api/v1/connector/reasoners/summarizer/versions/3.0.0/weight \
  -H "X-Connector-Token: $AGENTFIELD_CONNECTOR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"weight": 10}'
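
The word "roughly" matters here: under weighted random selection, observed shares fluctuate around the configured ratio, most noticeably on small samples. The simulation below illustrates that effect. It assumes weights are relative and that each request is routed independently; it is not the control plane's actual selection code, and `expected_shares` and `simulate_split` are hypothetical names.

```python
import random
from collections import Counter


def expected_shares(weights: dict) -> dict:
    """Normalize relative weights into the fraction of traffic each version should get."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one version needs a positive weight")
    return {version: weight / total for version, weight in weights.items()}


def simulate_split(weights: dict, n: int, seed: int = 0) -> Counter:
    """Route n simulated requests by weighted random choice and count the picks."""
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    versions = list(weights)
    picks = rng.choices(versions, weights=[weights[v] for v in versions], k=n)
    return Counter(picks)


if __name__ == "__main__":
    weights = {"2.0.0": 90, "3.0.0": 10}
    print("expected:", expected_shares(weights))
    for n in (20, 200, 20_000):
        print(n, dict(simulate_split(weights, n)))
```

The practical consequence is that a 10% treatment arm needs enough total traffic before its metrics are worth comparing.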

For rollback, set the treatment weight to 0 or stop the candidate agent. The execution API keeps the same target, so callers do not need a deploy-time switch.
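
A rollback can be scripted so it is one function call during an incident. The sketch below builds the same PUT request as the curl commands above using the standard library. The URL path, header, and payload shape are taken from those commands; `build_weight_request` and `rollback` are hypothetical helper names, and the response format is not assumed.

```python
import json
import os
import urllib.request

# Same connector endpoint prefix as the curl examples above.
BASE_URL = "http://localhost:8080/api/v1/connector/reasoners"


def build_weight_request(node_id: str, version: str, weight: int, token: str):
    """Build (but do not send) the PUT request that sets one version's weight."""
    url = f"{BASE_URL}/{node_id}/versions/{version}/weight"
    return urllib.request.Request(
        url,
        data=json.dumps({"weight": weight}).encode(),
        headers={
            "X-Connector-Token": token,
            "Content-Type": "application/json",
        },
        method="PUT",
    )


def rollback(node_id: str, treatment_version: str) -> int:
    """Set the treatment weight to 0 and return the HTTP status code."""
    token = os.environ["AGENTFIELD_CONNECTOR_TOKEN"]
    req = build_weight_request(node_id, treatment_version, 0, token)
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

For the example in this guide, the call would be `rollback("summarizer", "3.0.0")`.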

If the choice needs request-specific logic, keep the split inside a reasoner instead:

# router.py -- choose variants with custom logic
import hashlib
from agentfield import Agent

app = Agent(node_id="summarizer-router")

@app.reasoner()
async def route_summarize(request_id: str, text: str) -> dict:
    # Discover both variants -- health-aware so dead agents drop out automatically.
    candidates = app.discover(tags=["summarizer"], health_status="active")

    # Stable bucketing on request_id -- same request always hits the same variant.
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    target_tag = "variant:treatment" if bucket < 10 else "variant:control"

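    # Pick the first healthy capability carrying the chosen variant tag.
    # Passing a default avoids a bare StopIteration when no agent matches.
    chosen = next(
        (cap for cap in candidates.json.capabilities
         if target_tag in (cap.tags or [])),
        None,
    )
    if chosen is None:
        raise LookupError(f"no active agent tagged {target_tag!r}")
    target = f"{chosen.agent_id}.summarize"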

    result = await app.call(target, text=text)
    app.note(f"routed {request_id} to {target}", tags=["ab-test", target_tag])
    return result

app.run()
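
The bucketing step in the router can be checked in isolation, without any agents running. The sketch below extracts the same SHA-256 logic into two standalone functions; `bucket_for` and `variant_for` are hypothetical names, and the 10% threshold mirrors the router above.

```python
import hashlib


def bucket_for(request_id: str, buckets: int = 100) -> int:
    """Map a request ID to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(request_id.encode()).hexdigest()
    return int(digest, 16) % buckets


def variant_for(request_id: str, treatment_percent: int = 10) -> str:
    """Return the variant tag for a request; buckets below the threshold get treatment."""
    if bucket_for(request_id) < treatment_percent:
        return "variant:treatment"
    return "variant:control"
```

Because the hash depends only on `request_id`, a retry of the same request lands on the same variant. Raising `treatment_percent` only moves requests from control to treatment, so anything already in treatment stays there as the rollout widens.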

After running both for a day, compare cost, latency, errors, and output quality by routed version or variant tag. Both registered versions show up side by side in the Agent nodes page with their tags, version strings, and live health:

[Screenshot: Agent nodes page showing both agents online with their auth tags, version strings, and live heartbeat; code-forge is expanded with all 17 reasoner endpoints visible.]
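
For the comparison itself, a small aggregation over exported execution records is often enough to start. The sketch below groups records by variant and computes request count, error rate, mean latency, and total cost. The record fields (`variant`, `latency_ms`, `error`, `cost_usd`) are assumptions for illustration; how records are exported from the control plane is not covered here.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_variant(records):
    """Aggregate per-variant metrics from a list of execution records.

    Each record is assumed to be a dict with the hypothetical keys
    'variant', 'latency_ms', 'error' (bool), and 'cost_usd'.
    """
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["variant"]].append(rec)

    summary = {}
    for variant, rows in grouped.items():
        summary[variant] = {
            "requests": len(rows),
            "error_rate": sum(1 for r in rows if r["error"]) / len(rows),
            "mean_latency_ms": mean(r["latency_ms"] for r in rows),
            "total_cost_usd": sum(r["cost_usd"] for r in rows),
        }
    return summary
```

Output quality usually needs a separate pass, such as human review or a scoring rubric, since it cannot be read off these operational fields.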

What this gives you

  • Native control-plane routing for canaries and A/B tests without changing callers.
  • Traffic weights you can adjust through REST as the rollout progresses.
  • A programmable reasoner pattern when routing depends on request content, account rules, or an LLM decision.
