app.ai()
Universal LLM interface with multimodal support and intelligent type detection
Universal interface to Large Language Models with automatic multimodal detection, structured output validation, and hierarchical configuration. Handles text, images, audio, and files with intelligent response wrapping.
Return Type Behavior:
- With schema parameter: Returns a validated Pydantic model instance
- Without schema parameter: Returns a MultimodalResponse object (backward compatible as a string)
Basic Example
from agentfield import Agent
app = Agent(node_id="assistant")
# Simple text-to-text
response = await app.ai("What is the capital of France?")
print(response) # "The capital of France is Paris."
# System + user pattern
response = await app.ai(
system="You are a geography expert.",
user="What is the capital of France?"
)
Parameters
Parameter Handling: The SDK passes all **kwargs directly to LiteLLM without hard-coding parameters. Provider-specific transformations (e.g., OpenAI's max_tokens → max_completion_tokens) are handled in litellm_adapters.py.
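As a sketch of this passthrough, the call below forwards generation parameters as kwargs; seed and stop are standard LiteLLM/OpenAI parameters, but support varies by provider, so treat them as illustrative assumptions rather than guaranteed features.
# Kwargs are forwarded to LiteLLM as-is (provider support may vary)
response = await app.ai(
    "List three project name ideas.",
    max_tokens=200,   # adapted per provider where required (e.g., max_completion_tokens)
    seed=42,          # reproducible sampling, if the provider supports it
    stop=["\n\n"]     # stop sequence passed through unchanged
)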
Common Patterns
Structured Output with Pydantic
Enforce type-safe, validated responses using Pydantic models.
from pydantic import BaseModel
class SentimentAnalysis(BaseModel):
sentiment: str # positive, negative, neutral
confidence: float # 0.0 to 1.0
keywords: list[str]
reasoning: str
@app.reasoner
async def analyze_sentiment(text: str) -> SentimentAnalysis:
"""Returns validated Pydantic object, not raw text."""
return await app.ai(
system="Analyze sentiment systematically.",
user=text,
schema=SentimentAnalysis # Automatic validation
)
# Usage
result = await analyze_sentiment("I love this product!")
print(result.sentiment) # "positive"
print(result.confidence) # 0.95
print(result.keywords) # ["love", "product"]
The schema parameter automatically augments the system prompt with strict schema adherence instructions. The SDK validates the LLM response and returns a typed Pydantic instance.
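Because the response is validated into a typed model, standard Pydantic features apply. The sketch below is an assumption about typical usage (not SDK-specific behavior): Field constraints cause out-of-range values to fail validation rather than pass through silently.
from pydantic import BaseModel, Field

class SentimentAnalysis(BaseModel):
    sentiment: str
    confidence: float = Field(ge=0.0, le=1.0)  # reject values outside 0.0-1.0
    keywords: list[str]

result = await app.ai(
    system="Analyze sentiment systematically.",
    user="I love this product!",
    schema=SentimentAnalysis
)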
Multimodal Input - Automatic Detection
Agentfield automatically detects and processes images, audio, and files from URLs or local paths.
# Image from URL - automatically detected
response = await app.ai(
"Describe this image in detail.",
"https://example.com/product-photo.jpg"
)
# Local image file - automatically converted to base64
response = await app.ai(
"What's in this screenshot?",
"./screenshots/error-message.png"
)
# Audio file - automatically processed
response = await app.ai(
"Transcribe this audio.",
"./recordings/meeting-notes.mp3"
)
# Mix multiple types
response = await app.ai(
"Compare the audio description with the visual content.",
"./product-review.wav",
"https://example.com/product-image.jpg",
"Additional context: Premium product line."
)
Automatic detection handles: image URLs, local image files (jpg, png, gif, webp), audio files (wav, mp3, flac, ogg), base64 data URLs, and raw bytes. No manual type specification needed.
Multimodal Input - Explicit Control
Use input classes for precise control over multimodal content.
from agentfield import Text, Image, Audio
from agentfield import image_from_url, image_from_file, audio_from_file
# Explicit multimodal composition
response = await app.ai(
Text(text="Describe this chart and the audio commentary"),
image_from_url("https://example.com/sales-chart.png"),
audio_from_file("./presenter-notes.wav")
)
# Convenience functions
response = await app.ai(
"Analyze the presentation",
image_from_file("./slide1.png"),
image_from_file("./slide2.png"),
audio_from_file("./narration.mp3")
)
Multimodal Response Handling
When called without a schema parameter, app.ai() returns a MultimodalResponse object that works as a string but provides rich multimodal access.
This section applies only when no schema parameter is provided. When
using schema, the method returns a validated Pydantic model instance
instead.
# Backward compatible - works as string
response = await app.ai("Generate a greeting with audio")
print(response) # Prints text content
print(str(response)) # Explicit string conversion
# Access multimodal content
if response.has_audio:
response.audio.save("greeting.wav")
response.audio.play() # Requires pygame
if response.has_images:
for i, image in enumerate(response.images):
image.save(f"generated_{i}.png")
image.show() # Requires PIL
# Check content types
print(f"Has audio: {response.has_audio}")
print(f"Has images: {response.has_images}")
print(f"Is multimodal: {response.is_multimodal}")
# Save all content at once
saved_files = response.save_all("./output", prefix="ai_response")
# Returns: {"text": "path/to/text.txt", "audio": "path/to/audio.wav", ...}
Configuration Overrides
Override agent defaults on a per-call basis using hierarchical configuration.
Why Override Configurations?
- Cost Optimization: Use cheaper models (gpt-4o-mini) for simple tasks, expensive models (gpt-4o) only when needed
- Task-Specific Performance: Different models excel at different tasks (reasoning vs speed vs multimodal)
- Quality Control: Adjust temperature for deterministic outputs (0.0) vs creative generation (1.2+)
- Token Management: Set appropriate max_tokens based on expected response length
# Agent defaults for most operations
app = Agent(
node_id="writer",
ai_config=AIConfig(
model="openai/gpt-4o-mini", # Cost-effective default
temperature=0.7,
max_tokens=1000
)
)
# Override for creative writing (better quality, higher cost)
creative_story = await app.ai(
"Write a sci-fi short story.",
model="openai/gpt-4o", # Better model for creative tasks
temperature=1.2, # More creative
max_tokens=2000 # Longer output
)
# Override for precise analysis (deterministic, lower cost)
analysis = await app.ai(
"Analyze this data for errors.",
temperature=0.0, # Deterministic
max_tokens=500 # Concise
)
# Provider-specific parameters
response = await app.ai(
"Generate diverse ideas.",
top_p=0.9, # LiteLLM parameter
frequency_penalty=0.5, # OpenAI parameter
presence_penalty=0.3 # OpenAI parameter
)
Configuration hierarchy: Agent defaults → Method parameters → Runtime kwargs. Later values override earlier ones. All LiteLLM parameters are supported via **kwargs.
Reasoner Pattern with Model Selection
Pass model configuration through reasoners for flexible AI routing.
from pydantic import BaseModel
class Analysis(BaseModel):
summary: str
complexity: str
recommendations: list[str]
@app.reasoner
async def analyze_document(
document: str,
model: str = "openai/gpt-4o-mini" # Accept model as parameter
) -> Analysis:
"""Analyze document with configurable model selection."""
# Route to appropriate model based on task complexity
return await app.ai(
system="You are a document analyzer.",
user=f"Analyze: {document}",
schema=Analysis,
model=model # Pass through to app.ai()
)
# Use cheap model for simple documents
simple_analysis = await analyze_document(
"Short memo about meeting",
model="openai/gpt-4o-mini"
)
# Use powerful model for complex documents
complex_analysis = await analyze_document(
"50-page technical specification",
model="openai/gpt-4o"
)
This pattern enables cost-aware AI routing: use cheaper models by default, upgrade to powerful models only when complexity demands it. Particularly useful for multi-step reasoners where different steps have different requirements.
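A hypothetical two-step reasoner along those lines: a cheap model condenses the raw input, and the stronger model only sees the condensed text.
@app.reasoner
async def summarize_document(document: str) -> str:
    # Step 1: cheap extraction pass over the full document
    bullets = await app.ai(
        system="Extract the key points as short bullets.",
        user=document,
        model="openai/gpt-4o-mini"
    )
    # Step 2: higher-quality synthesis on the much shorter bullet list
    summary = await app.ai(
        system="Write an executive summary from these bullets.",
        user=str(bullets),
        model="openai/gpt-4o"
    )
    return str(summary)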
Image Generation
# Works with both DALL-E and OpenRouter
result = await app.ai_with_vision(
"A beautiful sunset over mountains",
model="dall-e-3" # or "openrouter/google/gemini-2.5-flash-image-preview"
)
result.images[0].save("sunset.png")
See full example: examples/python_agent_nodes/image_generation_hello_world/
Audio Generation Beta
Generate audio responses using specialized audio models. Supports schema parameter via **kwargs for structured output.
# Basic audio generation (returns MultimodalResponse)
audio_result = await app.ai_with_audio("Say hello warmly")
audio_result.audio.save("greeting.wav")
audio_result.audio.play()
# With structured output (returns Pydantic model)
from pydantic import BaseModel
class Greeting(BaseModel):
text: str
tone: str
greeting = await app.ai_with_audio(
"Say hello warmly",
schema=Greeting # Returns Greeting instance, not MultimodalResponse
)
print(greeting.text) # Access as Pydantic model
# Customize voice and format
audio_result = await app.ai_with_audio(
"Explain quantum computing in simple terms",
voice="nova", # alloy, echo, fable, onyx, nova, shimmer
format="mp3", # wav, mp3
model="openai/gpt-4o-audio-preview"
)
# Access both text and audio
print(audio_result.text) # Text version
audio_result.audio.save("explanation.mp3")
# OpenAI direct mode with instructions (bypasses LiteLLM)
audio_result = await app.ai_with_audio(
"Provide a warm, professional greeting",
voice="alloy",
format="wav",
model="openai/gpt-4o-mini-tts",
mode="openai_direct",
instructions="Speak slowly and clearly with enthusiasm",
speed=0.9
)
Image Generation Beta
Generate images using DALL-E or other image models. Always returns MultimodalResponse with images.
LiteLLM Dependency: Image generation capabilities are determined by LiteLLM's supported providers. Agentfield passes requests directly to LiteLLM's aimage_generation() API. Available models, sizes, and features depend on what LiteLLM supports for your configured provider.
ai_with_vision() does not support the schema parameter - it's for image generation, not text completion.
# Basic image generation (always returns MultimodalResponse)
image_result = await app.ai_with_vision("A sunset over mountains")
image_result.images[0].save("sunset.png")
image_result.images[0].show()
# Customize image parameters
image_result = await app.ai_with_vision(
"A futuristic cityscape with flying cars",
size="1792x1024", # 256x256, 512x512, 1024x1024, 1792x1024, 1024x1792
quality="hd", # standard, hd
style="vivid", # vivid, natural (DALL-E 3 only)
model="openai/dall-e-3"
)
# Access generated image
image = image_result.images[0]
image.save("cityscape.png")
print(image.revised_prompt) # See how DALL-E interpreted the prompt
Explicit Multimodal Control
Request specific output modalities for complex workflows. Supports schema parameter via **kwargs for structured output.
# Request text + audio output (returns MultimodalResponse)
result = await app.ai_with_multimodal(
"Describe this image and provide audio narration",
image_from_url("https://example.com/chart.jpg"),
modalities=["text", "audio"],
audio_config={"voice": "nova", "format": "wav"}
)
# Access all outputs
print(result.text) # Text description
result.audio.save("narration.wav") # Audio narration
# With structured output (returns Pydantic model)
from pydantic import BaseModel
class ImageAnalysis(BaseModel):
description: str
key_elements: list[str]
analysis = await app.ai_with_multimodal(
"Analyze this chart",
image_from_url("https://example.com/chart.jpg"),
schema=ImageAnalysis # Returns ImageAnalysis instance
)
print(analysis.description) # Access as Pydantic model
# Complex multimodal workflow
result = await app.ai_with_multimodal(
Text(text="Create a presentation summary"),
image_from_file("./slide1.png"),
image_from_file("./slide2.png"),
audio_from_file("./presenter-audio.wav"),
modalities=["text", "audio"],
audio_config={"voice": "alloy", "format": "mp3"},
model="openai/gpt-4o-audio-preview"
)
Streaming Responses
Enable streaming for real-time output processing. Returns async generator instead of complete response.
# Enable streaming
stream = await app.ai(
"Write a long essay about AI.",
stream=True,
max_tokens=2000
)
# Process chunks as they arrive
async for chunk in stream:
if hasattr(chunk.choices[0].delta, 'content'):
content = chunk.choices[0].delta.content
if content:
print(content, end='', flush=True)
When stream=True, app.ai() returns an async generator that yields response chunks as they arrive from the LLM. This enables real-time display and reduces perceived latency for long responses.
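A common variant is to accumulate the streamed text while displaying it so the full response is available afterwards; this sketch reuses the chunk structure shown above.
stream = await app.ai("Write a long essay about AI.", stream=True)
parts = []
async for chunk in stream:
    content = getattr(chunk.choices[0].delta, "content", None)
    if content:
        print(content, end="", flush=True)  # real-time display
        parts.append(content)               # keep for later use
essay = "".join(parts)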
Error Handling and Fallbacks
Agentfield automatically handles rate limits and provides fallback models.
# Configure fallback models in AIConfig
app = Agent(
node_id="resilient-agent",
ai_config=AIConfig(
model="openai/gpt-4o",
fallback_models=[
"openai/gpt-4o-mini",
"anthropic/claude-3-haiku"
],
enable_rate_limit_retry=True, # Automatic retry with exponential backoff
rate_limit_max_retries=3,
rate_limit_base_delay=1.0,
rate_limit_max_delay=60.0
)
)
# Automatic fallback on failure
try:
response = await app.ai("Analyze this complex data")
# If gpt-4o fails, automatically tries gpt-4o-mini, then claude-3-haiku
except Exception as e:
print(f"All models failed: {e}")
# Manual error handling
try:
response = await app.ai(
"Generate analysis",
model="openai/gpt-4o"
)
except Exception as e:
# Fallback to simpler model
response = await app.ai(
"Generate analysis",
model="openai/gpt-4o-mini",
temperature=0.0 # More deterministic for reliability
)
Rate limiting is enabled by default. Agentfield automatically retries with exponential backoff (1s → 2s → 4s → ...) up to max_delay. A circuit breaker prevents cascading failures.
Response Object
The MultimodalResponse object provides comprehensive access to all response content. This is returned when no schema parameter is provided to app.ai(), ai_with_audio(), or ai_with_multimodal().
When a schema parameter is provided, these methods return a validated
Pydantic model instance instead of MultimodalResponse.
Properties
Methods
AudioOutput Methods
ImageOutput Methods
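The sketch below exercises the properties and methods used throughout this page (text, has_audio, has_images, is_multimodal, audio.save/play, image.save/show, save_all); no attributes beyond those shown in the examples above are assumed.
response = await app.ai("Generate a greeting with audio")
print(response.text)            # text content (same as str(response))
print(response.is_multimodal)   # True when audio or images are present
if response.has_audio:
    response.audio.save("greeting.wav")     # write audio to disk
    response.audio.play()                   # requires pygame
if response.has_images:
    response.images[0].save("image_0.png")  # write first image to disk
    response.images[0].show()               # requires PIL
saved = response.save_all("./output", prefix="response")  # save all content at once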
Specialized Methods
ai_with_audio()
Optimized for audio generation with automatic model selection.
Returns: MultimodalResponse (without schema) or Pydantic model instance (with schema)
ai_with_vision()
Generate images with LiteLLM or OpenRouter. Routes automatically based on model name.
Returns: MultimodalResponse with generated images
Examples:
# DALL-E (LiteLLM)
result = await app.ai_with_vision("A sunset over mountains")
result.images[0].save("output.png")
# OpenRouter (Gemini)
result = await app.ai_with_vision(
"A futuristic city",
model="openrouter/google/gemini-2.5-flash-image-preview",
image_config={"aspect_ratio": "16:9"}
)
# Base64 data
result = await app.ai_with_vision("A landscape", response_format="b64_json")ai_with_multimodal()
Explicit control over input and output modalities.
Returns: MultimodalResponse (without schema) or Pydantic model instance (with schema)
Advanced Features
Automatic Prompt Trimming
Agentfield automatically trims prompts to fit model context windows using token-aware trimming.
# Long conversation - automatically trimmed to fit context window
messages = [
{"role": "system", "content": "You are a helpful assistant."},
# ... hundreds of messages ...
]
response = await app.ai(messages)
# Agentfield uses LiteLLM's trim_messages() to keep within token limits
# Preserves system message and recent context
Trimming uses LiteLLM's token counter for accurate token counting. It preserves system messages and trims from the middle using a "middle-out" strategy. Configurable via max_input_tokens in AIConfig.
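As a sketch (assuming AIConfig accepts max_input_tokens as described above), you can lower the trimming threshold below the model's hard limit:
app = Agent(
    node_id="summarizer",
    ai_config=AIConfig(
        model="openai/gpt-4o-mini",
        max_input_tokens=8000  # trim prompts above ~8k tokens, middle-out
    )
)
# Long message histories passed to app.ai() are trimmed before the call
response = await app.ai(messages)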
Memory Integration Coming Soon
Development Status: The memory_scope parameter is defined in the SDK but automatic memory injection is not yet implemented. Currently, memory must be manually retrieved and passed via the context parameter.
# Current approach - manual memory retrieval
@app.reasoner
async def context_aware_chat(user_message: str, user_id: str) -> str:
# Manually retrieve memory
history = await app.memory.get(f"user_{user_id}_history", default=[])
# Pass as context
response = await app.ai(
system="You are a helpful assistant.",
user=user_message,
context={"history": history} # Manual injection
)
return response
Automatic injection via the memory_scope parameter is planned for a future release.
Best Practices
Use Pydantic Schemas for Reliability
Structured output is more reliable than parsing free-form text.
# ❌ Unreliable - parsing free text
response = await app.ai("Return JSON with sentiment and confidence")
data = json.loads(response) # May fail if LLM doesn't format correctly
# ✅ Reliable - enforced schema
class Analysis(BaseModel):
sentiment: str
confidence: float
response = await app.ai("Analyze sentiment", schema=Analysis)
# Guaranteed to be valid Analysis object
Schema Complexity vs Model Capability: Complex nested schemas may fail with smaller models. If using gpt-4o-mini or similar, keep schemas simple (2-3 levels deep max). For complex schemas with deep nesting, use more capable models like gpt-4o or claude-3-opus.
# ❌ Too complex for mini models
class ComplexSchema(BaseModel):
level1: dict[str, dict[str, list[dict[str, Any]]]]
# ✅ Simple schema works reliably
class SimpleSchema(BaseModel):
category: str
items: list[str]
Handle Multimodal Content Safely
Always check for content availability before accessing.
response = await app.ai_with_audio("Generate greeting")
# ✅ Safe access
if response.has_audio:
response.audio.save("greeting.wav")
else:
print("No audio generated, using text:", response.text)
# ❌ Unsafe - may raise AttributeError
response.audio.save("greeting.wav") # Fails if audio is NoneUse Appropriate Models for Tasks
Different models excel at different tasks.
# Fast, cheap tasks - use mini models
quick_summary = await app.ai(
"Summarize in one sentence: ...",
model="openai/gpt-4o-mini"
)
# Complex reasoning - use full models
deep_analysis = await app.ai(
"Analyze this complex dataset: ...",
model="openai/gpt-4o"
)
# Multimodal tasks - use specialized models
audio_response = await app.ai_with_audio(
"Explain this concept",
model="openai/gpt-4o-audio-preview"
)
Leverage Configuration Hierarchy
Set sensible defaults, override when needed.
# Agent-level defaults for most operations
app = Agent(
node_id="assistant",
ai_config=AIConfig(
model="openai/gpt-4o-mini", # Fast, cheap default
temperature=0.7,
max_tokens=1000
)
)
# Override for specific needs
creative_output = await app.ai(
"Write a poem",
model="openai/gpt-4o", # Better model
temperature=1.2 # More creative
)
Performance Considerations
Token Counting Overhead:
- Prompt trimming uses LiteLLM's token counter for accuracy
- Only triggered when prompt exceeds model's context window
- Minimal overhead for typical prompts (< 5ms)
- Uses "middle-out" trimming strategy to preserve context
Rate Limiting:
- Enabled by default with exponential backoff
- Adds ~0ms overhead when no rate limits hit
- Prevents cascading failures in production
Multimodal Processing:
- Image/audio conversion to base64 adds ~50-200ms
- Cached after first conversion
- Use explicit classes to skip auto-detection
Fallback Models:
- No overhead if primary model succeeds
- Automatic retry adds ~1-5s on failure
- Configure fallback_models in AIConfig
Related
- Agent Node - Initialize agents with AIConfig
- @app.reasoner - Use app.ai() within reasoners
- app.memory - Inject memory into AI context
- AIConfig - Configure model defaults and fallbacks
- Execution Context - Track AI calls in workflows