kling-video-o3-pro-reference-to-video
Pricing: $0.224/second of output video (same rate with audio)
Input: image + text | Output: video
Endpoint
Cost = total video duration x $0.224/s. A default 5-second video costs ~$1.12.
For fire-and-forget or batch generation, use /ai/queue instead.
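The pricing rule above (total duration times the per-second rate) can be sketched in a few lines of Python. This is an illustration only: `estimate_cost` is a hypothetical helper, not part of the API, and `Decimal` is used so the result is not affected by floating-point rounding.

```python
from decimal import Decimal

# Rate taken from the pricing line above (USD per second of output video).
RATE_PER_SECOND = Decimal("0.224")


def estimate_cost(total_seconds: int) -> Decimal:
    """Estimated USD cost for a video of the given total duration."""
    return RATE_PER_SECOND * total_seconds


print(estimate_cost(5))   # 1.120 (the default 5-second video)
print(estimate_cost(15))  # 3.360 (the 15-second maximum)
```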
Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | yes | — | "kling-video-o3-pro-reference-to-video" |
| prompt | string | one of | — | Single prompt for the video. Use this or multi_prompt, not both. Max 512 characters. |
| multi_prompt | array | one of | — | Multi-shot prompts. See multi_prompt below. |
| duration | integer | no | 5 | Duration in seconds when using prompt. |
| input_image | array of URIs | no | — | Reference images for style/appearance (max 4 combined with elements). Reference in prompts as @Image1, @Image2, etc. |
| start_image_url | string (URI) | no | — | First frame of the video. The model extends from this image. |
| tail_image_url | string (URI) | no | — | Last frame of the video. Requires start_image_url. The model fills in between the frames. |
| elements | array of objects | no | — | Structured element references for characters/objects. See elements below. |
| negative_prompt | string | no | "blur, distort, and low quality" | Text describing what to avoid in the generated video. |
| aspect_ratio | string | no | "16:9" | "9:16", "1:1", or "16:9". |
| generate_audio | boolean | no | false | Generate native audio. Supports Chinese and English voice output. |
| response_format | string | no | "url" | "url" returns a hosted URL. "b64_json" returns base64-encoded video bytes inline. |
| target_namespace | string | no | current user | Namespace to save results and bill to. Can be an organization name. |
prompt vs multi_prompt
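Use either prompt or multi_prompt, not both. Sending both returns the error Cannot provide both 'prompt' and 'multi_prompt'.
Sending neither (or an empty multi_prompt: []) returns the error Either 'prompt' or 'multi_prompt' must be provided.
With prompt, the duration defaults to 5 seconds. Override it with duration.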
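A client-side check can catch these cases before a request is sent. The sketch below is one possible way to mirror the documented rules locally. `check_prompt_fields` is a hypothetical helper, and treating an empty string or empty list as "not provided" is a choice made here, not confirmed server behavior. The sample payload also shows the default duration being overridden.

```python
MAX_PROMPT_CHARS = 512  # documented limit for a single prompt


def check_prompt_fields(payload: dict) -> None:
    """Raise ValueError if prompt/multi_prompt usage breaks the documented rules."""
    has_prompt = bool(payload.get("prompt"))
    has_multi = bool(payload.get("multi_prompt"))  # an empty list counts as missing
    if has_prompt and has_multi:
        raise ValueError("Cannot provide both 'prompt' and 'multi_prompt'")
    if not has_prompt and not has_multi:
        raise ValueError("Either 'prompt' or 'multi_prompt' must be provided")
    if has_prompt and len(payload["prompt"]) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt exceeds {MAX_PROMPT_CHARS} characters")


# A single prompt with the default 5-second duration overridden to 8 seconds.
payload = {
    "model": "kling-video-o3-pro-reference-to-video",
    "prompt": "A paper boat drifts down a rain-filled gutter",
    "duration": 8,
}
check_prompt_fields(payload)  # passes silently
```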
multi_prompt
Array of shot objects. Each shot generates a segment of the video.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| prompt | string | yes | — | Prompt for this shot. Max 512 characters. |
| duration | integer | no | 5 | Duration of this shot in seconds (1-15). |
Duration Constraints
| Constraint | Value |
|---|---|
| Minimum total duration | 3 seconds |
| Maximum total duration | 15 seconds |
| Maximum per shot | 15 seconds |
| Default per shot | 5 seconds |
The configurations below show how these limits apply:
| Configuration | Total | Result |
|---|---|---|
| Single shot, duration: 1 | 1s | Fails |
| Single shot, duration: 2 | 2s | Fails |
| Single shot, duration: 3 | 3s | Works |
| Two shots: duration: 2 + duration: 1 | 3s | Works |
| Two shots: duration: 1 + duration: 1 | 2s | Fails |
| Single shot, duration: 15 | 15s | Works |
| Three shots: duration: 5 + duration: 5 + duration: 5 | 15s | Works |
| Three shots: duration: 5 + duration: 5 + duration: 6 | 16s | Fails |
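The duration limits above can be checked before sending a request. The following is a minimal sketch under the stated constraints (1-15 seconds per shot, 3-15 seconds in total, 5 seconds when a shot omits duration). `total_duration` is a hypothetical helper and its error text is not the API's.

```python
MIN_TOTAL, MAX_TOTAL = 3, 15  # total duration limits in seconds
MIN_SHOT, MAX_SHOT = 1, 15    # per-shot limits in seconds
DEFAULT_SHOT = 5              # used when a shot omits duration


def total_duration(shots: list) -> int:
    """Return the total duration of the shots, or raise ValueError if a limit is broken."""
    total = 0
    for index, shot in enumerate(shots, start=1):
        seconds = shot.get("duration", DEFAULT_SHOT)
        if not MIN_SHOT <= seconds <= MAX_SHOT:
            raise ValueError(f"shot {index}: {seconds}s is outside {MIN_SHOT}-{MAX_SHOT}s")
        total += seconds
    if not MIN_TOTAL <= total <= MAX_TOTAL:
        raise ValueError(f"total {total}s is outside {MIN_TOTAL}-{MAX_TOTAL}s")
    return total


shots = [
    {"prompt": "Wide shot of the harbor", "duration": 2},
    {"prompt": "Close-up of the ship's bell", "duration": 1},
]
print(total_duration(shots))  # 3
```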
elements
Array of element objects for character/object reference. Reference them in prompts as @Element1, @Element2, etc.
| Field | Type | Required | Description |
|---|---|---|---|
| frontal_image_url | string (URI) | yes | Front view of the reference object or character. |
| reference_image_urls | array of URIs | no | Additional angles. Max 3 images per element. |
A request accepts at most 4 references in total, counting elements together with input_image references.
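The element fields and limits can be sketched as two small helpers. Both `make_element` and `reference_tags` are hypothetical names. The sketch assumes the combined limit of 4 counts each input_image entry and each element once, and that @Image and @Element tags are numbered from 1 in array order. Neither detail is spelled out here, so treat both as assumptions.

```python
MAX_REFERENCES = 4          # input_image entries plus elements (assumed counting rule)
MAX_ANGLES_PER_ELEMENT = 3  # documented cap on reference_image_urls


def make_element(frontal_image_url: str, reference_image_urls=None) -> dict:
    """Build one element object, enforcing the per-element image cap."""
    angles = list(reference_image_urls or [])
    if len(angles) > MAX_ANGLES_PER_ELEMENT:
        raise ValueError(f"at most {MAX_ANGLES_PER_ELEMENT} reference images per element")
    element = {"frontal_image_url": frontal_image_url}
    if angles:
        element["reference_image_urls"] = angles
    return element


def reference_tags(input_image: list, elements: list) -> list:
    """Return the prompt tags for the given references (assumes 1-based array order)."""
    if len(input_image) + len(elements) > MAX_REFERENCES:
        raise ValueError(f"at most {MAX_REFERENCES} references in total")
    tags = [f"@Image{i}" for i in range(1, len(input_image) + 1)]
    tags += [f"@Element{i}" for i in range(1, len(elements) + 1)]
    return tags


# Placeholder URLs for illustration only.
images = ["https://example.com/style.jpg"]
elements = [
    make_element("https://example.com/hero-front.jpg", ["https://example.com/hero-side.jpg"]),
    make_element("https://example.com/robot-front.jpg"),
]
print(reference_tags(images, elements))  # ['@Image1', '@Element1', '@Element2']
```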
Examples
Minimal: text only
input_image is optional. Without it the model generates purely from the prompt.
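As an illustration, a text-only request body might look like the following when built in Python. Only the two fields needed for this case are set. The endpoint URL and authentication are not shown in this section, so the sketch stops at the JSON body. The prompt text is made up.

```python
import json

# Smallest body the parameter table allows: the model name plus a single prompt.
payload = {
    "model": "kling-video-o3-pro-reference-to-video",
    "prompt": "A lighthouse on a cliff at dusk, waves breaking below",
}

print(json.dumps(payload, indent=2))
```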
Single prompt with reference image
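A sketch of a request body for this case, written as a Python dict. The image URL is a placeholder and the prompt is invented. The @Image1 tag refers to the first entry of input_image, as described in the parameter table.

```python
import json

payload = {
    "model": "kling-video-o3-pro-reference-to-video",
    # @Image1 points at the first (and only) entry in input_image.
    "prompt": "The dancer from @Image1 spins slowly under a single spotlight",
    "input_image": ["https://example.com/dancer.jpg"],  # placeholder URL
    "duration": 5,
    "aspect_ratio": "9:16",
}

print(json.dumps(payload, indent=2))
```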
Multi-shot with reference image
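A sketch of a multi-shot body, again as a Python dict with a placeholder URL and invented prompts. Each shot carries its own duration, and the total (4 + 6 = 10 seconds) stays within the 3-15 second range.

```python
import json

payload = {
    "model": "kling-video-o3-pro-reference-to-video",
    # multi_prompt replaces prompt; the two must not be sent together.
    "multi_prompt": [
        {"prompt": "The fox from @Image1 peers out of a snowy burrow", "duration": 4},
        {"prompt": "The fox bounds across the frozen field", "duration": 6},
    ],
    "input_image": ["https://example.com/fox.jpg"],  # placeholder URL
}

total_seconds = sum(shot.get("duration", 5) for shot in payload["multi_prompt"])
print(total_seconds)  # 10
print(json.dumps(payload, indent=2))
```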
With start/end frames and elements
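A sketch that combines a first frame, a last frame, and one element in a single body. All URLs are placeholders and the prompt is invented. Whether every combination of these fields is accepted together is not confirmed here beyond what the parameter table states.

```python
import json

payload = {
    "model": "kling-video-o3-pro-reference-to-video",
    "prompt": "@Element1 crosses the stone bridge as the fog lifts",
    "start_image_url": "https://example.com/first-frame.jpg",  # first frame
    "tail_image_url": "https://example.com/last-frame.jpg",    # last frame
    "elements": [
        {
            "frontal_image_url": "https://example.com/traveler-front.jpg",
            "reference_image_urls": [  # optional extra angles, max 3
                "https://example.com/traveler-side.jpg",
                "https://example.com/traveler-back.jpg",
            ],
        }
    ],
    "duration": 8,
}

# tail_image_url is only valid when start_image_url is also present.
if "tail_image_url" in payload and "start_image_url" not in payload:
    raise ValueError("tail_image_url requires start_image_url")

print(json.dumps(payload, indent=2))
```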
Response (response_format: "url")
Response (response_format: "b64_json")
Using with /ai/queue
Recommended for video generation. Returns immediately, processes in the background.
Enqueue
Poll
A count of 0 means all generations are complete.
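Since only the meaning of count is stated here, the polling sketch below keeps the HTTP call abstract. `get_pending_count` is a hypothetical callable standing in for whatever request returns the poll count, and the interval and timeout defaults are arbitrary choices, not documented values.

```python
import time
from typing import Callable


def wait_until_done(
    get_pending_count: Callable[[], int],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
) -> bool:
    """Poll until the pending count reaches 0. Returns False if the timeout passes first."""
    deadline = time.monotonic() + timeout_s
    while True:
        if get_pending_count() == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Stand-in for real poll responses: two pending, one pending, then done.
fake_counts = iter([2, 1, 0])
print(wait_until_done(lambda: next(fake_counts), interval_s=0.01))  # True
```

Passing the poll call in as a function keeps the loop testable without a network connection.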
Cancel
Errors
| Error | Cause | Fix |
|---|---|---|
| Cannot provide both 'prompt' and 'multi_prompt' | Sent both fields | Use one or the other |
| Either 'prompt' or 'multi_prompt' must be provided | Neither sent, or empty array | Provide at least one |
| Field required | multi_prompt item missing prompt | Every shot needs a prompt string |
| duration value '2' is invalid | Total duration < 3 seconds | Ensure total across shots >= 3 |
| Total shot duration (16s) exceeds maximum allowed (15s) | Total duration > 15 seconds | Keep total at 15 seconds or less |
| Input should be '1', '2', ... or '15' | Single shot > 15 | Keep each shot at 15 seconds or less |
| num_generations must be an integer between 1 and 4 | Invalid count (via /ai/queue) | Use 1-4 |
Other Kling Models
| Model | Input | Use Case | Cost/sec |
|---|---|---|---|
| kling-video-v2-6-pro-text-to-video | Text only | Simple text-to-video | $0.070 |
| kling-video-v2-6-pro-image-to-video | Image | Animate a single image | $0.070 |
| kling-video-o3-pro-image-to-video | Image + text | Higher quality image animation | $0.224 |
| kling-video-o3-pro-reference-to-video | Images + text | Reference-conditioned, multi-shot | $0.224 |
| kling-video-o3-pro-video-to-video-edit | Video | Edit existing video | $0.336 |
| kling-video-v3-pro-motion-control | Text + image + video | Camera/motion control | $0.168 |