Media Generation
Generate images, video, audio, and transcriptions from your agents with a unified API backed by pluggable media providers, using the same method pattern as `app.ai()`.
Agents that generate reports, product listings, marketing content, or customer-facing assets often need more than text. AgentField's media generation API lets you create images, narrate text, produce video, and transcribe audio through a unified interface backed by pluggable providers like fal.ai, DALL-E, and ElevenLabs.
```python
from agentfield import Agent, AIConfig

app = Agent(
    node_id="product-listing-generator",
    ai_config=AIConfig(fal_api_key="your-fal-key"),  # or set FAL_KEY env var
)

@app.reasoner()
async def generate_product_listing(product: dict) -> dict:
    # Generate a product image from the description
    image = await app.ai_generate_image(
        prompt=f"Professional product photo: {product['name']}, {product['description']}",
        model="fal-ai/flux/schnell",
        size="square_hd",
    )
    image.images[0].save(f"/output/{product['id']}_hero.png")

    # Generate audio narration for the listing
    audio = await app.ai_generate_audio(
        text=f"Describe this product in an engaging way: {product['name']}",
        model="fal-ai/f5-tts",
        voice="alloy",
    )
    if audio.audio:
        audio.audio.save(f"/output/{product['id']}_narration.mp3")

    # Generate a short product demo video
    video = await app.ai_generate_video(
        prompt=f"Product demonstration video: {product['name']} in use",
        model="fal-ai/minimax-video/image-to-video",
        duration=10,
    )
    if video.files:
        video.files[0].save(f"/output/{product['id']}_demo.mp4")

    # Transcribe customer review audio
    transcript = await app.ai_transcribe_audio(
        audio_url=product["review_audio_url"],
        model="fal-ai/whisper",
    )

    return {
        "image_url": image.images[0].url if image.images else None,
        "audio_url": audio.audio.url if audio.audio else None,
        "video_url": video.files[0].url if video.files else None,
        "transcript": transcript.text,
    }
```

```typescript
// Media generation is Python-only.
// TypeScript agents can call a Python agent's media generation
// capabilities via agent-to-agent calls:
const result = await agent.call('media-agent.generateProductImage', {
  prompt: `Professional product photo: ${product.name}`,
  model: 'fal-ai/flux/schnell',
});
console.log(result.imageUrl);
```

```go
// Media generation is Python-only.
// Go agents can call a Python agent's media generation
// capabilities via agent-to-agent calls:
result, _ := app.Call(ctx, "media-agent.generateProductImage", map[string]any{
    "prompt": fmt.Sprintf("Professional product photo: %s", product.Name),
    "model":  "fal-ai/flux/schnell",
})
fmt.Println(result["imageUrl"])
```

What you get back
The example above performs four output-producing operations in one workflow: image generation, audio generation, video generation, and transcription. Each result comes back as a typed object with saveable files and URLs, not as an ad hoc provider response you have to normalize yourself:
```json
{
  "image_url": "https://cdn.example.com/product_hero.webp",
  "audio_url": "https://cdn.example.com/product_narration.mp3",
  "video_url": "https://cdn.example.com/product_demo.mp4",
  "transcript": "Customer review text..."
}
```
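The `None` fallbacks in the reasoner's return block are what make this payload safe to build even when a generation step produces no files. Here is a minimal standalone sketch of that normalization pattern, using hypothetical `MediaFile` and `ImageResult` dataclasses as stand-ins for AgentField's actual typed result objects:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical stand-ins for the library's typed media results;
# the names are illustrative, not AgentField's real classes.
@dataclass
class MediaFile:
    url: str

@dataclass
class ImageResult:
    images: List[MediaFile] = field(default_factory=list)

def listing_payload(image: ImageResult, transcript_text: str) -> dict:
    # Mirror the return block above: emit the first file's URL,
    # or None when the generation step returned no files.
    return {
        "image_url": image.images[0].url if image.images else None,
        "transcript": transcript_text,
    }
```

The same guard applies to `audio.audio` and `video.files`: check the optional attribute before calling `.save()` or reading `.url`, so a failed or empty generation degrades to `None` instead of raising.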