app.ai()
Universal LLM interface with multimodal support and intelligent type detection
Universal interface to Large Language Models with automatic multimodal detection, structured output validation, and hierarchical configuration. Handles text, images, audio, and files with intelligent response wrapping.
Return Type Behavior:
- With schema parameter: Returns a validated Pydantic model instance
- Without schema parameter: Returns a MultimodalResponse object (backward compatible as a string)
Basic Example
from agentfield import Agent
app = Agent(node_id="assistant")
# Simple text-to-text
response = await app.ai("What is the capital of France?")
print(response) # "The capital of France is Paris."
# System + user pattern
response = await app.ai(
system="You are a geography expert.",
user="What is the capital of France?"
)
Parameters
Parameter Handling: The SDK passes all **kwargs directly to LiteLLM without hard-coding parameters. Provider-specific transformations (e.g., OpenAI's max_tokens → max_completion_tokens) are handled in litellm_adapters.py.
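As a sketch of this passthrough, the call below forwards generation parameters as kwargs; seed and stop are standard LiteLLM/OpenAI parameters, but support varies by provider, so treat them as illustrative assumptions rather than guaranteed features.
# Kwargs are forwarded to LiteLLM as-is (provider support may vary)
response = await app.ai(
    "List three project name ideas.",
    max_tokens=200,   # adapted per provider where required (e.g., max_completion_tokens)
    seed=42,          # reproducible sampling, if the provider supports it
    stop=["\n\n"]     # stop sequence passed through unchanged
)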
Common Patterns
Structured Output with Pydantic
Enforce type-safe, validated responses using Pydantic models.
from pydantic import BaseModel
class SentimentAnalysis(BaseModel):
sentiment: str # positive, negative, neutral
confidence: float # 0.0 to 1.0
keywords: list[str]
reasoning: str
@app.reasoner
async def analyze_sentiment(text: str) -> SentimentAnalysis:
"""Returns validated Pydantic object, not raw text."""
return await app.ai(
system="Analyze sentiment systematically.",
user=text,
schema=SentimentAnalysis # Automatic validation
)
# Usage
result = await analyze_sentiment("I love this product!")
print(result.sentiment) # "positive"
print(result.confidence) # 0.95
print(result.keywords) # ["love", "product"]
The schema parameter automatically augments the system prompt with strict schema adherence instructions. The SDK validates the LLM response and returns a typed Pydantic instance.
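Because the response is validated into a typed model, standard Pydantic features apply. The sketch below is an assumption about typical usage (not SDK-specific behavior): Field constraints cause out-of-range values to fail validation rather than pass through silently.
from pydantic import BaseModel, Field

class SentimentAnalysis(BaseModel):
    sentiment: str
    confidence: float = Field(ge=0.0, le=1.0)  # reject values outside 0.0-1.0
    keywords: list[str]

result = await app.ai(
    system="Analyze sentiment systematically.",
    user="I love this product!",
    schema=SentimentAnalysis
)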
Multimodal Input - Automatic Detection
Agentfield automatically detects and processes images, audio, and files from URLs or local paths.
# Image from URL - automatically detected
response = await app.ai(
"Describe this image in detail.",
"https://example.com/product-photo.jpg"
)
# Local image file - automatically converted to base64
response = await app.ai(
"What's in this screenshot?",
"./screenshots/error-message.png"
)
# Audio file - automatically processed
response = await app.ai(
"Transcribe this audio.",
"./recordings/meeting-notes.mp3"
)
# Mix multiple types
response = await app.ai(
"Compare the audio description with the visual content.",
"./product-review.wav",
"https://example.com/product-image.jpg",
"Additional context: Premium product line."
)
Automatic detection handles: image URLs, local image files (jpg, png, gif, webp), audio files (wav, mp3, flac, ogg), base64 data URLs, and raw bytes. No manual type specification needed.
Multimodal Input - Explicit Control
Use input classes for precise control over multimodal content.
from agentfield import Text, Image, Audio
from agentfield import image_from_url, image_from_file, audio_from_file
# Explicit multimodal composition
response = await app.ai(
Text(text="Describe this chart and the audio commentary"),
image_from_url("https://example.com/sales-chart.png"),
audio_from_file("./presenter-notes.wav")
)
# Convenience functions
response = await app.ai(
"Analyze the presentation",
image_from_file("./slide1.png"),
image_from_file("./slide2.png"),
audio_from_file("./narration.mp3")
)
Multimodal Response Handling
When called without a schema parameter, app.ai() returns a MultimodalResponse object that works as a string but provides rich multimodal access.
This section applies only when no schema parameter is provided. When
using schema, the method returns a validated Pydantic model instance
instead.
# Backward compatible - works as string
response = await app.ai("Generate a greeting with audio")
print(response) # Prints text content
print(str(response)) # Explicit string conversion
# Access multimodal content
if response.has_audio:
response.audio.save("greeting.wav")
response.audio.play() # Requires pygame
if response.has_images:
for i, image in enumerate(response.images):
image.save(f"generated_{i}.png")
image.show() # Requires PIL
# Check content types
print(f"Has audio: {response.has_audio}")
print(f"Has images: {response.has_images}")
print(f"Is multimodal: {response.is_multimodal}")
# Save all content at once
saved_files = response.save_all("./output", prefix="ai_response")
# Returns: {"text": "path/to/text.txt", "audio": "path/to/audio.wav", ...}
Configuration Overrides
Override agent defaults on a per-call basis using hierarchical configuration.
Why Override Configurations?
- Cost Optimization: Use cheaper models (gpt-4o-mini) for simple tasks, expensive models (gpt-4o) only when needed
- Task-Specific Performance: Different models excel at different tasks (reasoning vs speed vs multimodal)
- Quality Control: Adjust temperature for deterministic outputs (0.0) vs creative generation (1.2+)
- Token Management: Set appropriate max_tokens based on expected response length
# Agent defaults for most operations
app = Agent(
node_id="writer",
ai_config=AIConfig(
model="openai/gpt-4o-mini", # Cost-effective default
temperature=0.7,
max_tokens=1000
)
)
# Override for creative writing (better quality, higher cost)
creative_story = await app.ai(
"Write a sci-fi short story.",
model="openai/gpt-4o", # Better model for creative tasks
temperature=1.2, # More creative
max_tokens=2000 # Longer output
)
# Override for precise analysis (deterministic, lower cost)
analysis = await app.ai(
"Analyze this data for errors.",
temperature=0.0, # Deterministic
max_tokens=500 # Concise
)
# Provider-specific parameters
response = await app.ai(
"Generate diverse ideas.",
top_p=0.9, # LiteLLM parameter
frequency_penalty=0.5, # OpenAI parameter
presence_penalty=0.3 # OpenAI parameter
)
Configuration hierarchy: Agent defaults → Method parameters → Runtime kwargs. Later values override earlier ones. All LiteLLM parameters are supported via **kwargs.
Reasoner Pattern with Model Selection
Pass model configuration through reasoners for flexible AI routing.
from pydantic import BaseModel
class Analysis(BaseModel):
summary: str
complexity: str
recommendations: list[str]
@app.reasoner
async def analyze_document(
document: str,
model: str = "openai/gpt-4o-mini" # Accept model as parameter
) -> Analysis:
"""Analyze document with configurable model selection."""
# Route to appropriate model based on task complexity
return await app.ai(
system="You are a document analyzer.",
user=f"Analyze: {document}",
schema=Analysis,
model=model # Pass through to app.ai()
)
# Use cheap model for simple documents
simple_analysis = await analyze_document(
"Short memo about meeting",
model="openai/gpt-4o-mini"
)
# Use powerful model for complex documents
complex_analysis = await analyze_document(
"50-page technical specification",
model="openai/gpt-4o"
)
This pattern enables cost-aware AI routing: use cheaper models by default, upgrade to powerful models only when complexity demands it. Particularly useful for multi-step reasoners where different steps have different requirements.
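A hypothetical two-step reasoner along those lines: a cheap model condenses the raw input, and the stronger model only sees the condensed text.
@app.reasoner
async def summarize_document(document: str) -> str:
    # Step 1: cheap extraction pass over the full document
    bullets = await app.ai(
        system="Extract the key points as short bullets.",
        user=document,
        model="openai/gpt-4o-mini"
    )
    # Step 2: higher-quality synthesis on the much shorter bullet list
    summary = await app.ai(
        system="Write an executive summary from these bullets.",
        user=str(bullets),
        model="openai/gpt-4o"
    )
    return str(summary)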
Image Generation
# Works with both DALL-E and OpenRouter
result = await app.ai_with_vision(
"A beautiful sunset over mountains",
model="dall-e-3" # or "openrouter/google/gemini-2.5-flash-image-preview"
)
result.images[0].save("sunset.png")
See full example: examples/python_agent_nodes/image_generation_hello_world/
Audio Generation Beta
Generate audio responses using specialized audio models. Supports schema parameter via **kwargs for structured output.
# Basic audio generation (returns MultimodalResponse)
audio_result = await app.ai_with_audio("Say hello warmly")
audio_result.audio.save("greeting.wav")
audio_result.audio.play()
# With structured output (returns Pydantic model)
from pydantic import BaseModel
class Greeting(BaseModel):
text: str
tone: str
greeting = await app.ai_with_audio(
"Say hello warmly",
schema=Greeting # Returns Greeting instance, not MultimodalResponse
)
print(greeting.text) # Access as Pydantic model
# Customize voice and format
audio_result = await app.ai_with_audio(
"Explain quantum computing in simple terms",
voice="nova", # alloy, echo, fable, onyx, nova, shimmer
format="mp3", # wav, mp3
model="openai/gpt-4o-audio-preview"
)
# Access both text and audio
print(audio_result.text) # Text version
audio_result.audio.save("explanation.mp3")
# OpenAI direct mode with instructions (bypasses LiteLLM)
audio_result = await app.ai_with_audio(
"Provide a warm, professional greeting",
voice="alloy",
format="wav",
model="openai/gpt-4o-mini-tts",
mode="openai_direct",
instructions="Speak slowly and clearly with enthusiasm",
speed=0.9
)
Image Generation Beta
Generate images using DALL-E or other image models. Always returns MultimodalResponse with images.
LiteLLM Dependency: Image generation capabilities are determined by LiteLLM's supported providers. Agentfield passes requests directly to LiteLLM's aimage_generation() API. Available models, sizes, and features depend on what LiteLLM supports for your configured provider.
ai_with_vision() does not support the schema parameter - it's for image generation, not text completion.
# Basic image generation (always returns MultimodalResponse)
image_result = await app.ai_with_vision("A sunset over mountains")
image_result.images[0].save("sunset.png")
image_result.images[0].show()
# Customize image parameters
image_result = await app.ai_with_vision(
"A futuristic cityscape with flying cars",
size="1792x1024", # 256x256, 512x512, 1024x1024, 1792x1024, 1024x1792
quality="hd", # standard, hd
style="vivid", # vivid, natural (DALL-E 3 only)
model="openai/dall-e-3"
)
# Access generated image
image = image_result.images[0]
image.save("cityscape.png")
print(image.revised_prompt) # See how DALL-E interpreted the prompt
Explicit Multimodal Control
Request specific output modalities for complex workflows. Supports schema parameter via **kwargs for structured output.
# Request text + audio output (returns MultimodalResponse)
result = await app.ai_with_multimodal(
"Describe this image and provide audio narration",
image_from_url("https://example.com/chart.jpg"),
modalities=["text", "audio"],
audio_config={"voice": "nova", "format": "wav"}
)
# Access all outputs
print(result.text) # Text description
result.audio.save("narration.wav") # Audio narration
# With structured output (returns Pydantic model)
from pydantic import BaseModel
class ImageAnalysis(BaseModel):
description: str
key_elements: list[str]
analysis = await app.ai_with_multimodal(
"Analyze this chart",
image_from_url("https://example.com/chart.jpg"),
schema=ImageAnalysis # Returns ImageAnalysis instance
)
print(analysis.description) # Access as Pydantic model
# Complex multimodal workflow
result = await app.ai_with_multimodal(
Text(text="Create a presentation summary"),
image_from_file("./slide1.png"),
image_from_file("./slide2.png"),
audio_from_file("./presenter-audio.wav"),
modalities=["text", "audio"],
audio_config={"voice": "alloy", "format": "mp3"},
model="openai/gpt-4o-audio-preview"
)
Streaming Responses
Enable streaming for real-time output processing. Returns async generator instead of complete response.
# Enable streaming
stream = await app.ai(
"Write a long essay about AI.",
stream=True,
max_tokens=2000
)
# Process chunks as they arrive
async for chunk in stream:
if hasattr(chunk.choices[0].delta, 'content'):
content = chunk.choices[0].delta.content
if content:
print(content, end='', flush=True)
When stream=True, app.ai() returns an async generator that yields response chunks as they arrive from the LLM. This enables real-time display and reduces perceived latency for long responses.
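A common variant is to accumulate the streamed text while displaying it so the full response is available afterwards; this sketch reuses the chunk structure shown above.
stream = await app.ai("Write a long essay about AI.", stream=True)
parts = []
async for chunk in stream:
    content = getattr(chunk.choices[0].delta, "content", None)
    if content:
        print(content, end="", flush=True)  # real-time display
        parts.append(content)               # keep for later use
essay = "".join(parts)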
Error Handling and Fallbacks
Agentfield automatically handles rate limits and provides fallback models.
# Configure fallback models in AIConfig
app = Agent(
node_id="resilient-agent",
ai_config=AIConfig(
model="openai/gpt-4o",
fallback_models=[
"openai/gpt-4o-mini",
"anthropic/claude-3-haiku"
],
enable_rate_limit_retry=True, # Automatic retry with exponential backoff
rate_limit_max_retries=3,
rate_limit_base_delay=1.0,
rate_limit_max_delay=60.0
)
)
# Automatic fallback on failure
try:
response = await app.ai("Analyze this complex data")
# If gpt-4o fails, automatically tries gpt-4o-mini, then claude-3-haiku
except Exception as e:
print(f"All models failed: {e}")
# Manual error handling
try:
response = await app.ai(
"Generate analysis",
model="openai/gpt-4o"
)
except Exception as e:
# Fallback to simpler model
response = await app.ai(
"Generate analysis",
model="openai/gpt-4o-mini",
temperature=0.0 # More deterministic for reliability
)
Rate limiting is enabled by default. Agentfield automatically retries with exponential backoff (1s → 2s → 4s → ...) up to max_delay. A circuit breaker prevents cascading failures.
Response Object
The MultimodalResponse object provides comprehensive access to all response content. This is returned when no schema parameter is provided to app.ai(), ai_with_audio(), or ai_with_multimodal().
When a schema parameter is provided, these methods return a validated
Pydantic model instance instead of MultimodalResponse.
Properties
Methods
AudioOutput Methods
ImageOutput Methods
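The sketch below exercises the properties and methods used throughout this page (text, has_audio, has_images, is_multimodal, audio.save/play, image.save/show, save_all); no attributes beyond those shown in the examples above are assumed.
response = await app.ai("Generate a greeting with audio")
print(response.text)            # text content (same as str(response))
print(response.is_multimodal)   # True when audio or images are present
if response.has_audio:
    response.audio.save("greeting.wav")     # write audio to disk
    response.audio.play()                   # requires pygame
if response.has_images:
    response.images[0].save("image_0.png")  # write first image to disk
    response.images[0].show()               # requires PIL
saved = response.save_all("./output", prefix="response")  # save all content at once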
Specialized Methods
ai_with_audio()
Optimized for audio generation with automatic model selection.
Returns: MultimodalResponse (without schema) or Pydantic model instance (with schema)
ai_with_vision()
Generate images with LiteLLM or OpenRouter. Routes automatically based on model name.
Returns: MultimodalResponse with generated images
Examples:
# DALL-E (LiteLLM)
result = await app.ai_with_vision("A sunset over mountains")
result.images[0].save("output.png")
# OpenRouter (Gemini)
result = await app.ai_with_vision(
"A futuristic city",
model="openrouter/google/gemini-2.5-flash-image-preview",
image_config={"aspect_ratio": "16:9"}
)
# Base64 data
result = await app.ai_with_vision("A landscape", response_format="b64_json")ai_with_multimodal()
Explicit control over input and output modalities.
Returns: MultimodalResponse (without schema) or Pydantic model instance (with schema)
Advanced Features
Automatic Prompt Trimming
Agentfield automatically trims prompts to fit model context windows using token-aware trimming.
# Long conversation - automatically trimmed to fit context window
messages = [
{"role": "system", "content": "You are a helpful assistant."},
# ... hundreds of messages ...
]
response = await app.ai(messages)
# Agentfield uses LiteLLM's trim_messages() to keep within token limits
# Preserves system message and recent context
Trimming uses LiteLLM's token counter for accurate token counting. It preserves system messages and trims from the middle using a "middle-out" strategy. Configurable via max_input_tokens in AIConfig.
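As a sketch (assuming AIConfig accepts max_input_tokens as described above), you can lower the trimming threshold below the model's hard limit:
app = Agent(
    node_id="summarizer",
    ai_config=AIConfig(
        model="openai/gpt-4o-mini",
        max_input_tokens=8000  # trim prompts above ~8k tokens, middle-out
    )
)
# Long message histories passed to app.ai() are trimmed before the call
response = await app.ai(messages)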
Memory Integration Coming Soon
Development Status: The memory_scope parameter is defined in the SDK but automatic memory injection is not yet implemented. Currently, memory must be manually retrieved and passed via the context parameter.
# Current approach - manual memory retrieval
@app.reasoner
async def context_aware_chat(user_message: str, user_id: str) -> str:
# Manually retrieve memory
history = await app.memory.get(f"user_{user_id}_history", default=[])
# Pass as context
response = await app.ai(
system="You are a helpful assistant.",
user=user_message,
context={"history": history} # Manual injection
)
return response
Automatic injection via the memory_scope parameter is planned for a future release.
Best Practices
Use Pydantic Schemas for Reliability
Structured output is more reliable than parsing free-form text.
# ❌ Unreliable - parsing free text
response = await app.ai("Return JSON with sentiment and confidence")
data = json.loads(response) # May fail if LLM doesn't format correctly
# ✅ Reliable - enforced schema
class Analysis(BaseModel):
sentiment: str
confidence: float
response = await app.ai("Analyze sentiment", schema=Analysis)
# Guaranteed to be valid Analysis object
Schema Complexity vs Model Capability: Complex nested schemas may fail with smaller models. If using gpt-4o-mini or similar, keep schemas simple (2-3 levels deep max). For complex schemas with deep nesting, use more capable models like gpt-4o or claude-3-opus.
# ❌ Too complex for mini models
class ComplexSchema(BaseModel):
level1: dict[str, dict[str, list[dict[str, Any]]]]
# ✅ Simple schema works reliably
class SimpleSchema(BaseModel):
category: str
items: list[str]
Handle Multimodal Content Safely
Always check for content availability before accessing.
response = await app.ai_with_audio("Generate greeting")
# ✅ Safe access
if response.has_audio:
response.audio.save("greeting.wav")
else:
print("No audio generated, using text:", response.text)
# ❌ Unsafe - may raise AttributeError
response.audio.save("greeting.wav") # Fails if audio is NoneUse Appropriate Models for Tasks
Different models excel at different tasks.
# Fast, cheap tasks - use mini models
quick_summary = await app.ai(
"Summarize in one sentence: ...",
model="openai/gpt-4o-mini"
)
# Complex reasoning - use full models
deep_analysis = await app.ai(
"Analyze this complex dataset: ...",
model="openai/gpt-4o"
)
# Multimodal tasks - use specialized models
audio_response = await app.ai_with_audio(
"Explain this concept",
model="openai/gpt-4o-audio-preview"
)
Leverage Configuration Hierarchy
Set sensible defaults, override when needed.
# Agent-level defaults for most operations
app = Agent(
node_id="assistant",
ai_config=AIConfig(
model="openai/gpt-4o-mini", # Fast, cheap default
temperature=0.7,
max_tokens=1000
)
)
# Override for specific needs
creative_output = await app.ai(
"Write a poem",
model="openai/gpt-4o", # Better model
temperature=1.2 # More creative
)
Performance Considerations
Token Counting Overhead:
- Prompt trimming uses LiteLLM's token counter for accuracy
- Only triggered when prompt exceeds model's context window
- Minimal overhead for typical prompts (< 5ms)
- Uses "middle-out" trimming strategy to preserve context
Rate Limiting:
- Enabled by default with exponential backoff
- Adds ~0ms overhead when no rate limits hit
- Prevents cascading failures in production
Multimodal Processing:
- Image/audio conversion to base64 adds ~50-200ms
- Cached after first conversion
- Use explicit classes to skip auto-detection
Fallback Models:
- No overhead if primary model succeeds
- Automatic retry adds ~1-5s on failure
- Configure fallback_models in AIConfig
Related
- Agent Node - Initialize agents with AIConfig
- @app.reasoner - Use app.ai() within reasoners
- app.memory - Inject memory into AI context
- AIConfig - Configure model defaults and fallbacks
- Execution Context - Track AI calls in workflows