
Sessions

Realtime and multimodal ingress for agent workflows, routed through the AgentField control plane.

One @app.session decorator — live audio in, tools called, workflows traced

Sessions are long-lived entrypoints for realtime or multimodal interactions. Use them when the caller is not making a single request-response call, but opening an interaction that may include audio, text, tool calls, and multiple turns.

The important boundary is the control plane:

  • The browser or external client starts an AgentField session.
  • The control plane owns the provider boundary and realtime transport.
  • Agent work still happens through reasoners and app.call.
  • Tool calls from the session are routed through execute/async with the session ID attached.

That means a voice call can still produce normal AgentField workflows, DAGs, replay surfaces, and provenance records.

Explicit provider and transport

AgentField validates provider and transport, but does not infer one from the other or switch providers for you. If a provider does not support a transport, the control plane returns a clear validation error.

Python:

from agentfield import Agent

app = Agent(node_id="voice-support-af")

# provider and transport are both set explicitly; AgentField validates the pair
# but never infers one from the other.
@app.session(
    "voice",
    provider="openai",
    model="gpt-realtime-2",
    transport="webrtc",
    modalities=["audio", "text"],
    voice="marin",
    tools=["support.resolve_voice_turn"],
    tags=["support:voice", "pii:limited"],
)
async def voice(session):
    turn = await session.input()  # wait for the next turn from the live session
    result = await session.call("support.resolve_voice_turn", turn=turn)  # delegate to a reasoner
    await session.say(result["spoken_response"])  # speak the reasoner's answer back

TypeScript:

import { Agent } from "@agentfield/sdk";

const agent = new Agent({ nodeId: "voice-support-af" });

agent.session("voice", {
  provider: "openai",
  model: "gpt-realtime-2",
  transport: "webrtc",
  modalities: ["audio", "text"],
  voice: "marin",
  tools: ["support.resolveVoiceTurn"],
  tags: ["support:voice", "pii:limited"],
}, async (session) => {
  const turn = await session.input();
  const result = await session.call("support.resolveVoiceTurn", { turn });
  await session.say(result.spokenResponse);
});

Go:

// app is an agent instance created earlier with the AgentField Go SDK; setup is not shown.
app.RegisterSession("voice", "openai", "webrtc",
    agent.WithSessionModel("gpt-realtime-2"),
    agent.WithSessionModalities("audio", "text"),
    agent.WithSessionVoice("marin"),
    agent.WithSessionTools("support.resolve_voice_turn"),
    agent.WithSessionTags("support:voice", "pii:limited"),
)

Sessions vs Reasoners

Reasoners are callable units of work. Sessions are ingress surfaces that can call reasoners.

| Primitive | Use it for | Lifecycle |
|---|---|---|
| Reasoner | One typed decision, workflow step, or API capability | Request-response or async execution |
| Session | Realtime voice, live text, multimodal turns, browser calls | Long-lived interaction with provider setup |

Most applications use both: the session handles realtime input and output, while reasoners do the structured business work.

What the tools Option Means

The tools option is not how your session handler gets access to reasoners. Your handler can call reasoners directly with session.call(...) or app.call(...).

@app.session("voice", provider="openai", transport="webrtc")
async def voice(session):
    turn = await session.input()
    result = await session.call("support.resolve_voice_turn", turn=turn)
    await session.say(result["spoken_response"])
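Because the handler only touches session.input(), session.call(...), and session.say(...), its logic can be exercised offline against a stand-in object. The following is a minimal sketch under that assumption: FakeSession and fake_resolve_voice_turn are hypothetical test helpers, not part of the AgentField SDK, and the handler is written without the @app.session decorator so it runs without a control plane or realtime provider.

```python
import asyncio


class FakeSession:
    """Hypothetical in-memory stand-in for an AgentField session object.

    It mimics only the three methods the handler uses:
    input(), call(target, **kwargs), and say(text).
    """

    def __init__(self, turns, reasoners):
        self._turns = list(turns)    # queued user turns to hand out in order
        self._reasoners = reasoners  # target name -> async callable
        self.spoken = []             # everything the handler "said"
        self.calls = []              # (target, kwargs) pairs, for assertions

    async def input(self):
        return self._turns.pop(0)

    async def call(self, target, **kwargs):
        self.calls.append((target, kwargs))
        return await self._reasoners[target](**kwargs)

    async def say(self, text):
        self.spoken.append(text)


# Same body as the decorated handler, minus @app.session.
async def voice(session):
    turn = await session.input()
    result = await session.call("support.resolve_voice_turn", turn=turn)
    await session.say(result["spoken_response"])


# Hypothetical canned reasoner standing in for support.resolve_voice_turn.
async def fake_resolve_voice_turn(turn):
    return {"spoken_response": f"You said: {turn}"}


session = FakeSession(
    turns=["where is my order?"],
    reasoners={"support.resolve_voice_turn": fake_resolve_voice_turn},
)
asyncio.run(voice(session))
print(session.spoken)  # ['You said: where is my order?']
```

Since the fake records every call, a test can assert both what the handler said and which target it called with which arguments, without opening a realtime connection.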

Use tools=[...] when the realtime loop itself needs a provider-visible allowlist of AgentField capabilities. For example, during a live audio call, the realtime model may decide it needs to look up an order or request an approval before answering. The tool allowlist tells AgentField which targets may be invoked through the session tool endpoint.

@app.session(
    "voice",
    provider="openai",
    transport="webrtc",
    tools=[
        "orders.lookup_order",
        "refunds.request_approval",
    ],
)
async def voice(session):
    ...

Each entry is an AgentField target. It is exposed to the live session as an allowed tool, then routed back through the control plane:

provider/client tool call
  -> POST /api/v1/session-instances/:session_id/tools/:tool
  -> POST /api/v1/execute/async/:target

So the distinction is:

| API | Who decides to call it? | What it does |
|---|---|---|
| session.call(...) | Your session handler code | Calls a reasoner or skill directly from the handler |
| tools=[...] | The realtime provider/client tool loop | Allows selected AgentField targets to be invoked autonomously during the live session |
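The session tool endpoint's core job can be pictured as an allowlist check followed by a forward to execute/async. The sketch below assumes that the :tool path segment is the AgentField target name itself and that the session ID travels in the forwarded request body. route_tool_call, ToolNotAllowed, the body field names, and the sample IDs are hypothetical; this is not AgentField's actual request schema.

```python
class ToolNotAllowed(Exception):
    """Raised when a tool call names a target outside the session's tools=[...] list."""


def route_tool_call(session_id, tool, allowlist, arguments):
    """Turn one session tool call into an execute/async request description.

    Hypothetical sketch: the returned dict only describes the request;
    nothing is sent over the network.
    """
    # Only targets declared in tools=[...] may be invoked by the realtime loop.
    if tool not in allowlist:
        raise ToolNotAllowed(
            f"{tool!r} is not in the tools allowlist for session {session_id!r}"
        )
    return {
        "method": "POST",
        "path": f"/api/v1/execute/async/{tool}",
        # Hypothetical field names: the session ID is attached so the resulting
        # workflow can be traced back to the live session.
        "body": {"session_id": session_id, "input": arguments},
    }


ALLOWLIST = {"orders.lookup_order", "refunds.request_approval"}

request = route_tool_call("sess_123", "orders.lookup_order", ALLOWLIST, {"order_id": "A-42"})
print(request["path"])  # /api/v1/execute/async/orders.lookup_order

try:
    route_tool_call("sess_123", "billing.issue_refund", ALLOWLIST, {})
except ToolNotAllowed as err:
    print(err)
```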

Tags and Access Control

Sessions can declare tags just like reasoners and skills. Those tags are proposed at registration time, approved through the same access-management flow, and included in the target tag set used when a caller starts the session.

@app.session(
    "voice",
    provider="openai",
    transport="webrtc",
    tags=["support:voice", "pii:limited"],
)
async def voice(session):
    ...

Use session tags for ingress-level policy: who can start the live session, which data class the session may touch, or which team owns the interaction. Use reasoner and skill tags for the work the session calls after it starts.
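AgentField's actual policy evaluation is configured through its access-management flow and is not spelled out on this page. Purely to illustrate ingress-level tagging, the sketch below assumes a simple rule: a session can be started only after every proposed tag has been approved, and only by a caller holding a grant for each of those tags. can_start_session and its fail-closed behavior are assumptions, not documented AgentField semantics.

```python
SESSION_TAGS = ["support:voice", "pii:limited"]  # tags proposed in @app.session(...)


def can_start_session(caller_grants, proposed_tags, approved_tags):
    """Hypothetical ingress check for starting a session.

    Assumed rule: fail closed while any proposed tag is still awaiting
    approval; otherwise the caller must hold a grant for every session tag.
    """
    proposed = set(proposed_tags)
    if proposed - set(approved_tags):
        return False  # at least one tag has not been approved yet
    return proposed <= set(caller_grants)


print(can_start_session({"support:voice", "pii:limited"}, SESSION_TAGS, SESSION_TAGS))      # True
print(can_start_session({"support:voice"}, SESSION_TAGS, SESSION_TAGS))                     # False
print(can_start_session({"support:voice", "pii:limited"}, SESSION_TAGS, ["support:voice"]))  # False
```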

Control-Plane Flow

browser/client
  -> POST /api/v1/session-targets/:target/start
  -> POST /api/v1/session-instances/:session_id/realtime-offer
  -> POST /api/v1/session-instances/:session_id/tools/:tool
       -> POST /api/v1/execute/async/:target

The session itself is not a shortcut around AgentField. It is a control-plane entrypoint that keeps the realtime provider boundary separate from the reasoner workflow boundary.
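From the client's side, the flow is three requests against the control plane; the execute/async hop is made by the control plane itself. The sketch below only builds those client-side paths. It assumes the session ID comes back from the start call (the response format is not documented here), and the helper names, the sample target voice-support-af.voice, and the sample session ID are hypothetical.

```python
from urllib.parse import quote

API_PREFIX = "/api/v1"


def _segment(value):
    # Percent-encode one path segment so an ID containing "/" cannot add segments.
    return quote(value, safe="")


def start_path(target):
    return f"{API_PREFIX}/session-targets/{_segment(target)}/start"


def realtime_offer_path(session_id):
    return f"{API_PREFIX}/session-instances/{_segment(session_id)}/realtime-offer"


def tool_path(session_id, tool):
    return f"{API_PREFIX}/session-instances/{_segment(session_id)}/tools/{_segment(tool)}"


def client_requests(target, session_id, tool):
    """Ordered POST paths a client issues for one session with one tool call."""
    return [
        start_path(target),               # 1. start the session target
        realtime_offer_path(session_id),  # 2. send the realtime connection offer
        tool_path(session_id, tool),      # 3. forward a provider tool call
    ]


for path in client_requests("voice-support-af.voice", "sess_123", "orders.lookup_order"):
    print("POST", path)
```

Encoding each segment separately is a defensive choice: path parameters may originate from user or provider input, and a stray slash should not change which endpoint is hit.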

Provider and Transport Matrix

| Provider | Transport | Use it for |
|---|---|---|
| openai | webrtc | Browser realtime voice and audio sessions |
| openai | websocket | Server-side realtime sessions |
| openrouter | audio_turns | Turn-based audio input/output calls |

Unsupported combinations fail early with an error like:

Unsupported session transport 'webrtc' for provider 'openrouter'. Supported transports: audio_turns. AgentField does not infer or switch providers; set provider and transport explicitly.
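The early check can be sketched as a lookup against the matrix with no fallback branch. This is a minimal illustration, not AgentField's implementation: SUPPORTED_TRANSPORTS mirrors only the three rows in the table, validate_session and SessionConfigError are hypothetical names, and the unknown-provider and missing-value messages are invented for the sketch. Only the transport-mismatch message reproduces the one quoted above.

```python
SUPPORTED_TRANSPORTS = {
    "openai": ("webrtc", "websocket"),
    "openrouter": ("audio_turns",),
}


class SessionConfigError(ValueError):
    """Raised for an invalid or incomplete provider/transport combination."""


def validate_session(provider, transport):
    """Validate an explicit provider/transport pair; never infer or substitute either."""
    if not provider or not transport:
        raise SessionConfigError("Both provider and transport must be set explicitly.")
    if provider not in SUPPORTED_TRANSPORTS:
        raise SessionConfigError(f"Unsupported session provider '{provider}'.")
    supported = SUPPORTED_TRANSPORTS[provider]
    if transport not in supported:
        # No fallback to another transport or provider: the caller must fix the config.
        raise SessionConfigError(
            f"Unsupported session transport '{transport}' for provider '{provider}'. "
            f"Supported transports: {', '.join(supported)}. "
            "AgentField does not infer or switch providers; "
            "set provider and transport explicitly."
        )


validate_session("openai", "webrtc")  # a supported pair passes silently

try:
    validate_session("openrouter", "webrtc")
except SessionConfigError as err:
    print(err)
```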
