What is the Inference API?
The Inference API gives you access to hundreds of AI models through a single, consistent interface. Generate text, images, and videos without managing infrastructure or juggling multiple provider SDKs. Capabilities:- Text Generation: Chat completions, tool calling, vision, structured output
- Image Generation: Text-to-image, image-to-image editing
- Video Generation: Text-to-video, image-to-video, reference-to-video, video-to-video editing
Authentication
All requests require a bearer token:Endpoints
| Endpoint | Method | Description |
|---|---|---|
/chat/completions | POST | Text generation (chat, vision, tool use) |
/images/generate | POST | Image generation |
/images/edit | POST | Image editing |
/videos/generate | POST | Video generation |
/ai/queue | POST | Async image/video generation |
/media/generations/status/:namespace/*model_name | GET | Poll async generation status |
/media/generations/:namespace/:generation_id | DELETE | Cancel a queued generation |
/evaluations/models | GET | List available models |
/evaluations/models/:id | GET | Get model details and parameter schema |
Common Parameters
These parameters are accepted across multiple endpoints:| Parameter | Type | Description |
|---|---|---|
model | string | Required. The model to use (e.g. gpt-4o-mini, flux-2-dev, kling-video-o3-pro-reference-to-video). |
response_format | string | "url" (default) returns a hosted URL. "b64_json" returns base64-encoded bytes inline. Supported on image and video endpoints. |
target_namespace | string | Namespace to save results and bill to. Defaults to your user. Can be an organization name. |
Discovering Models
List all models, optionally filtered by capability:json_request_schema field with the complete parameter definitions, types, defaults, and constraints for that model.
Pricing
Pricing varies by model:| Method | How it works | Examples |
|---|---|---|
token | Per input/output token | GPT, Claude, Gemini |
per_image | Fixed cost per image | FLUX, DALL-E |
per_video_output_second | Cost per second of output video | Kling, Sora |
input_cost_per_token, output_cost_per_token, cost_per_image, video_cost_per_second, video_cost_per_second_with_audio, video_cost_per_second_high_res.
Error Format
Errors use one of two formats:unauthenticated, invalid_params, resource_not_found, unknown_error.
Quick Starts
Chat
Text generation in minutes
Image Generation
Text-to-image in minutes
Video Generation
Text-to-video in minutes
API Reference
Chat Completions
Text generation, vision, tool calling
Image Generation
Text-to-image generation
Image Editing
Edit images with text prompts
Video Generation
Text-to-video, image-to-video, multi-shot
Async Queue
Background image/video generation
Model References
Kling O3 Pro: Reference to Video
Multi-shot video with reference images, elements, and audio