Realtime voice sessions

A live voice agent in one decorator: the control plane owns the provider, and every turn lands in the workflow DAG.

This example wires a live browser voice call through AgentField sessions. The provider key never touches the browser, every tool call lands in the workflow DAG, and the whole thing is one decorator. The same agent follows in Python, TypeScript, and Go.

from agentfield import Agent

app = Agent(node_id="voice-support")

@app.reasoner()
async def resolve_turn(turn: dict) -> dict:
    # Normal AgentField reasoner — retrieval, policy, app.call, escalation, anything.
    return {
        "spoken_response": "I found the order. It's stuck in customs — want me to start a replacement?",
        "next_action": "offer_replacement",
    }

@app.session(
    "voice",
    provider="openai",
    transport="webrtc",
    model="gpt-realtime-2",
    modalities=["audio", "text"],
    voice="marin",
    tools=["voice-support.resolve_turn"],   # provider-visible allowlist
    tags=["support:voice"],                 # access-control tag
)
async def voice(session):
    turn = await session.input()                                          # next live turn from the provider
    result = await session.call("voice-support.resolve_turn", turn=turn)  # routes through the control plane
    await session.say(result["spoken_response"])                          # speaks back over the same transport

app.run()

import { Agent } from "@agentfield/sdk";

const agent = new Agent({ nodeId: "voice-support" });

agent.reasoner("resolveTurn", async (ctx) => {
  // Normal AgentField reasoner — retrieval, policy, ctx.call, escalation, anything.
  return {
    spokenResponse: "I found the order. It's stuck in customs — want me to start a replacement?",
    nextAction: "offer_replacement",
  };
});

agent.session("voice", {
  provider: "openai",
  transport: "webrtc",
  model: "gpt-realtime-2",
  modalities: ["audio", "text"],
  voice: "marin",
  tools: ["voice-support.resolveTurn"],   // provider-visible allowlist
  tags: ["support:voice"],                // access-control tag
}, async (session) => {
  const turn = await session.input();                                       // next live turn from the provider
  const result = await session.call("voice-support.resolveTurn", { turn }); // routes through the control plane
  await session.say(result.spokenResponse);                                 // speaks back over the same transport
});

agent.serve();

package main

import (
    "context"
    "log"

    "github.com/Agent-Field/agentfield/sdk/go/agent"
)

func main() {
    a, err := agent.New(agent.Config{NodeID: "voice-support"})
    if err != nil {
        log.Fatal(err)
    }

    // Normal AgentField reasoner — retrieval, policy, a.Call, escalation, anything.
    a.RegisterReasoner("resolve_turn", func(ctx context.Context, input map[string]any) (any, error) {
        return map[string]any{
            "spoken_response": "I found the order. It's stuck in customs — want me to start a replacement?",
            "next_action":     "offer_replacement",
        }, nil
    })

    // Declare the realtime voice session. The control plane owns the provider
    // boundary and the live audio loop; the Go SDK registers the session
    // definition (provider, transport, model, voice, tools, tags).
    if err := a.RegisterSession("voice", "openai", "webrtc",
        agent.WithSessionModel("gpt-realtime-2"),
        agent.WithSessionModalities("audio", "text"),
        agent.WithSessionVoice("marin"),
        agent.WithSessionTools("voice-support.resolve_turn"), // provider-visible allowlist
        agent.WithSessionTags("support:voice"),               // access-control tag
    ); err != nil {
        log.Fatal(err)
    }

    log.Fatal(a.Run(context.Background()))
}

Start it from the CLI — the browser then exchanges its WebRTC SDP through the control plane, not directly with OpenAI:

af session start voice-support.voice \
  --provider openai --transport webrtc \
  --model gpt-realtime-2 --voice marin

What this gives you

The browser never holds the provider key. AgentField proxies the WebRTC SDP exchange and owns the provider boundary.
session.call(...) is handler-controlled orchestration; tools=[...] is the provider-visible allowlist for autonomous tool calls during the live audio loop. Both route through execute/async with X-Session-ID attached, so every tool call lands in the session's workflow DAG.
Provider and transport are explicit. AgentField validates the pair (openai/webrtc, openai/websocket, openrouter/audio_turns) and refuses to silently switch to a different one.
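
To make the pair validation concrete, here is a minimal sketch of the idea in plain Python. It is not AgentField's implementation: the names SUPPORTED_PAIRS, UnsupportedSessionPair, and validate_pair are hypothetical, and only the three pairs listed above come from this page. The point it illustrates is that an unsupported combination raises an error instead of being swapped for a supported one.

```python
# Illustrative sketch only; not AgentField's actual validation code.
# The three pairs are the ones named on this page; all identifiers are hypothetical.
SUPPORTED_PAIRS = {
    ("openai", "webrtc"),
    ("openai", "websocket"),
    ("openrouter", "audio_turns"),
}


class UnsupportedSessionPair(ValueError):
    """Raised when a provider/transport combination is not in the supported set."""


def validate_pair(provider: str, transport: str) -> tuple[str, str]:
    """Return the pair unchanged if supported; raise rather than substitute another."""
    pair = (provider, transport)
    if pair not in SUPPORTED_PAIRS:
        supported = ", ".join(f"{p}/{t}" for p, t in sorted(SUPPORTED_PAIRS))
        raise UnsupportedSessionPair(
            f"unsupported pair {provider}/{transport}; supported: {supported}"
        )
    return pair
```

Failing loudly keeps the declared transport and the transport actually in use from drifting apart, which matters when the browser and the control plane both depend on it.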

Compose it

Once a voice call is a session, it composes with everything else in AgentField. A few patterns this primitive unlocks:

The model picks tools mid-conversation. Add targets to tools=[...] and the live realtime model can decide, while it's still speaking, that it needs to look up an order or open an approval — the call goes through execute/async and lands in the workflow DAG like any other reasoner work.

@app.session("voice", provider="openai", transport="webrtc",
    tools=["orders.lookup_order", "refunds.request_approval"])
async def voice(session):
    ...
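
One way to picture the allowlist is as a gate in front of the dispatcher: a tool call the model initiates is forwarded only if its target was declared in tools=[...]. The sketch below assumes that behavior and uses hypothetical names throughout (make_tool_gate, ToolNotAllowed, and the dispatch callable standing in for the control-plane execution path); it is not the AgentField API.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable

# Hypothetical stand-in for the control-plane execution path: (target, arguments) -> result.
Dispatcher = Callable[[str, dict], Awaitable[Any]]


class ToolNotAllowed(PermissionError):
    """Raised when the model requests a target outside the declared allowlist."""


def make_tool_gate(allowlist: Iterable[str], dispatch: Dispatcher) -> Dispatcher:
    """Wrap a dispatcher so only allowlisted targets are ever forwarded."""
    allowed = frozenset(allowlist)

    async def handle_provider_tool_call(target: str, arguments: dict) -> Any:
        # Reject before dispatching, so a disallowed target never executes.
        if target not in allowed:
            raise ToolNotAllowed(f"{target} is not in the session allowlist")
        return await dispatch(target, arguments)

    return handle_provider_tool_call
```

Under this assumption, the session above would forward orders.lookup_order and refunds.request_approval and reject every other target the model asks for.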

Same decorator, server-side transport. Swap one keyword and the same handler runs behind a websocket — useful for server-driven bridges (telephony, kiosks) where the browser is not the client.

@app.session("phone", provider="openai", transport="websocket", model="gpt-realtime-2")
async def phone(session): ...

Hand off to a multi-agent workflow. session.call(...) is just app.call(...) with the session ID attached, so any voice turn can trigger a full reasoner pipeline (retrieval → policy → human approval → response) without leaving the call.

result = await session.call("support.handle_turn", turn=turn)   # spawns a normal DAG
await session.say(result["spoken_response"])
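
The relationship between session.call(...) and app.call(...) can be sketched as a thin wrapper that adds the session header before delegating. This is an assumption-laden illustration, not the SDK's source: SessionSketch, AppCall, and recording_app_call are hypothetical, and only the X-Session-ID header name comes from this page.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical signature for an app-level call: (target, payload, headers) -> result.
AppCall = Callable[[str, dict, dict], Awaitable[Any]]


class SessionSketch:
    """Stand-in for a session object; not the AgentField class."""

    def __init__(self, session_id: str, app_call: AppCall) -> None:
        self.session_id = session_id
        self._app_call = app_call

    async def call(self, target: str, **payload: Any) -> Any:
        # Same call path as the app-level call, plus the session header,
        # so the execution can be grouped under this session.
        headers = {"X-Session-ID": self.session_id}
        return await self._app_call(target, payload, headers)


async def recording_app_call(target: str, payload: dict, headers: dict) -> dict:
    """Fake control-plane call that echoes back what it received."""
    return {"target": target, "payload": payload, "headers": headers}


async def demo() -> dict:
    session = SessionSketch("sess-123", recording_app_call)
    return await session.call("support.handle_turn", turn={"text": "where is my order?"})
```

Running asyncio.run(demo()) shows the fake call receiving the target and payload unchanged alongside the session header, which is all the wrapper adds.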

Every voice turn is a verifiable execution. Sessions inherit AgentField's identity and audit surfaces: the same DIDs, signed credentials, and workflow DAG that back reasoner work also back the voice loop. Query a session's runs with af session workflows <session_id>.

Run the example

Full runnable project — Python agent + browser WebRTC page: examples/python_agent_nodes/voice_dictation
