Realtime voice sessions
A live voice agent in one decorator — control plane owns the provider, every turn lands in the workflow DAG.
A live browser voice call wired through AgentField. The provider key never touches the browser, every tool call lands in the workflow DAG, and the whole thing is one decorator.

from agentfield import Agent

app = Agent(node_id="voice-support")


@app.reasoner()
async def resolve_turn(turn: dict) -> dict:
    # Normal AgentField reasoner: retrieval, policy, app.call, escalation, anything.
    return {
        "spoken_response": "I found the order. It's stuck in customs — want me to start a replacement?",
        "next_action": "offer_replacement",
    }


@app.session(
    "voice",
    provider="openai",
    transport="webrtc",
    model="gpt-realtime-2",
    modalities=["audio", "text"],
    voice="marin",
    tools=["voice-support.resolve_turn"],  # provider-visible allowlist
    tags=["support:voice"],  # access-control tag
)
async def voice(session):
    turn = await session.input()  # next live turn from the provider
    result = await session.call("voice-support.resolve_turn", turn=turn)  # routes through the control plane
    await session.say(result["spoken_response"])  # speaks back over the same transport


app.run()

Start it from the CLI. The browser then exchanges its WebRTC SDP through the control plane, not directly with OpenAI:

af session start voice-support.voice \
  --provider openai --transport webrtc \
  --model gpt-realtime-2 --voice marin

What this gives you
- The browser never holds the provider key. AgentField proxies the WebRTC SDP exchange and owns the provider boundary.
- session.call(...) is handler-controlled orchestration; tools=[...] is the provider-visible allowlist for autonomous tool calls during the live audio loop. Both route through execute/async with X-Session-ID attached, so every tool call lands in the session's workflow DAG.
- Provider and transport are explicit. AgentField validates the pair (openai/webrtc, openai/websocket, openrouter/audio_turns) and refuses to silently switch.
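
The explicit provider/transport pairing in the last bullet can be pictured as a lookup against a fixed set of pairs. The following is a minimal sketch of that idea, not AgentField's actual implementation: SUPPORTED_PAIRS, UnsupportedPairError, and validate_pair are hypothetical names, and only the three pairs themselves are taken from this page.

```python
# Hypothetical sketch: none of these names come from the AgentField SDK.
SUPPORTED_PAIRS = frozenset({
    ("openai", "webrtc"),
    ("openai", "websocket"),
    ("openrouter", "audio_turns"),
})


class UnsupportedPairError(ValueError):
    """Raised when a provider/transport pair is not in the supported set."""


def validate_pair(provider: str, transport: str) -> tuple[str, str]:
    """Return the pair unchanged if it is supported; otherwise raise.

    There is deliberately no fallback branch: an unsupported pair is an
    error, never a silent switch to a different transport.
    """
    pair = (provider, transport)
    if pair not in SUPPORTED_PAIRS:
        supported = ", ".join(f"{p}/{t}" for p, t in sorted(SUPPORTED_PAIRS))
        raise UnsupportedPairError(
            f"unsupported pair {provider}/{transport}; supported: {supported}"
        )
    return pair
```

Under this shape, a request such as validate_pair("openrouter", "webrtc") fails loudly instead of quietly falling back to a pair that happens to work.
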
Compose it
Once a voice call is a session, it composes with everything else in AgentField. A few patterns this primitive unlocks:
The model picks tools mid-conversation. Add targets to tools=[...] and the live realtime model can decide, while it's still speaking, that it needs to look up an order or open an approval — the call goes through execute/async and lands in the workflow DAG like any other reasoner work.

@app.session("voice", provider="openai", transport="webrtc",
             tools=["orders.lookup_order", "refunds.request_approval"])
async def voice(session):
    ...

Same decorator, server-side transport. Swap one keyword and the same handler runs behind a websocket, which is useful for server-driven bridges (telephony, kiosks) where the browser is not the client.

@app.session("phone", provider="openai", transport="websocket", model="gpt-realtime-2")
async def phone(session): ...

Hand off to a multi-agent workflow. session.call(...) is just app.call(...) with the session ID attached, so any voice turn can trigger a full reasoner pipeline (retrieval → policy → human approval → response) without leaving the call.

result = await session.call("support.handle_turn", turn=turn)  # spawns a normal DAG
await session.say(result["spoken_response"])

Every voice turn is a verifiable execution. Sessions inherit AgentField's identity and audit surfaces: the same DIDs, signed credentials, and workflow DAG that back reasoner work also back the voice loop. Query a session's runs with af session workflows <session_id>.
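
To make the audit claim more concrete, the sketch below shows one way a session-scoped call could be shaped so that its run can later be grouped under its session. It is an illustration under stated assumptions, not the SDK's code: SessionCall and build_session_call are hypothetical, the path layout and body shape are guesses, and handler-initiated calls are assumed to bypass the allowlist. Only the X-Session-ID header name, the execute/async route, and the tools allowlist come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionCall:
    """Hypothetical request shape for a call made inside a live session."""
    path: str
    headers: dict
    body: dict


def build_session_call(session_id, target, payload, *, allowlist=None):
    """Build the request for one tool call made during a session.

    allowlist=None models a handler-initiated session.call(...), assumed
    here to be unrestricted. A list models provider-initiated calls, which
    may only reach targets named in tools=[...].
    """
    if allowlist is not None and target not in allowlist:
        raise PermissionError(f"{target} is not in the session's tools allowlist")
    return SessionCall(
        path=f"execute/async/{target}",        # assumed path layout
        headers={"X-Session-ID": session_id},  # ties the run to the session
        body={"input": payload},               # assumed body shape
    )
```

Because both kinds of call carry the same header, a per-session query only needs to filter runs on that one value.
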
Run the example
Full runnable project — Python agent + browser WebRTC page:
examples/python_agent_nodes/voice_dictation
Next