Overview
LemonData provides access to video generation models through a single unified API. Video generation is asynchronous: submit a request, receive a task ID and poll_url, then poll for the final result.
The model inventory changes over time. For the latest public availability, use the Models API or visit the Models page.
If a create response returns poll_url, call that exact URL. When it points to /v1/tasks/{id}, treat that as the canonical fixed status endpoint.
Audio behavior is model-dependent. In LemonData, Veo 3 family requests default to audio-on when output_audio is omitted. Some public models are silent-only or do not expose a stable toggle.
For production integrations, prefer publicly reachable https URLs over inline base64 for images, videos, and audio. Inline data: URLs are still supported by compatible models, but URLs are easier to retry, inspect, and debug.
Async Workflow
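The create → task ID + poll_url → poll loop described above can be sketched as a generic polling helper. This is a minimal sketch: the `status` field name and the terminal values `"succeeded"` and `"failed"` are assumptions, so check them against your actual task payloads.

```python
import time


def poll_until_done(fetch_status, interval=5.0, timeout=600.0):
    """Poll a task until it reaches a terminal state.

    fetch_status: zero-argument callable returning the task payload as a dict,
    e.g. lambda: requests.get(poll_url, headers=headers).json().
    The "status" field and "succeeded"/"failed" values are assumed names.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        task = fetch_status()
        if task.get("status") in ("succeeded", "failed"):
            return task
        time.sleep(interval)
    raise TimeoutError("task did not finish within the timeout")
```

Injecting the fetch callable keeps the retry logic separate from HTTP details, which also makes the loop easy to test without network access.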
Public Operations
LemonData’s current public video contract centers on these operations:
text-to-video
image-to-video
reference-to-video
start-end-to-video
video-to-video
motion-control
The request contract also accepts audio-to-video and video-extension for model-specific flows, but no generally enabled public model in this docs build currently advertises either capability.
Capability Matrix
Legend: ✅ Supported by at least one currently enabled public model in that provider family | ❌ Not currently represented by an enabled public model
| Series | T2V | I2V | Reference | Start-End | V2V | Motion |
|---|---|---|---|---|---|---|
| OpenAI | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Kuaishou | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Google | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| ByteDance | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| MiniMax | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Alibaba | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
| Shengshu | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| xAI | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ |
| Other | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ |
Capability Definitions
T2V (Text-to-Video): Generate video from a text prompt.
I2V (Image-to-Video): Animate a starting image. For the broadest compatibility, provide image_url.
Reference: Condition generation on one or more reference images via reference_images.
Start-End: Control the first and last frames with start_image and end_image.
V2V (Video-to-Video): Use an existing video as the primary source input.
Motion: Combine a subject image with a motion reference video.
Current Public Model Inventory
OpenAI
| Model | Public operations |
|---|---|
| sora-2 | Text-to-video, image-to-video |
| sora-2-pro | Text-to-video, image-to-video |
| sora-2-pro-storyboard | Image-to-video |
Kuaishou
| Model | Public operations |
|---|---|
| kling-3.0-motion-control | Motion control |
| kling-3.0-video | Text-to-video, image-to-video, start-end-to-video |
| kling-v2.5-turbo-pro | Text-to-video, image-to-video, start-end-to-video |
| kling-v2.5-turbo-std | Text-to-video, image-to-video |
| kling-v2.6-pro | Text-to-video, image-to-video, start-end-to-video |
| kling-v2.6-std | Text-to-video, image-to-video |
| kling-v3.0-pro | Text-to-video, image-to-video, start-end-to-video |
| kling-v3.0-std | Text-to-video, image-to-video, start-end-to-video |
| kling-video-o1-pro | Text-to-video, image-to-video, reference-to-video, start-end-to-video, video-to-video |
| kling-video-o1-std | Text-to-video, image-to-video, reference-to-video, start-end-to-video, video-to-video |
Google
| Model | Public operations |
|---|---|
| veo3 | Text-to-video, image-to-video |
| veo3-fast | Text-to-video, image-to-video |
| veo3-pro | Text-to-video, image-to-video |
| veo3.1 | Text-to-video, image-to-video, reference-to-video, start-end-to-video |
| veo3.1-fast | Text-to-video, image-to-video, reference-to-video, start-end-to-video |
| veo3.1-pro | Text-to-video, image-to-video, start-end-to-video |
ByteDance
| Model | Public operations |
|---|---|
| seedance-1.5-pro | Text-to-video, image-to-video |
MiniMax
| Model | Public operations |
|---|---|
| hailuo-2.3-fast | Image-to-video |
| hailuo-2.3-pro | Text-to-video, image-to-video |
| hailuo-2.3-standard | Text-to-video, image-to-video |
Alibaba
| Model | Public operations |
|---|---|
| wan-2.2-plus | Text-to-video, image-to-video |
| wan-2.5 | Text-to-video, image-to-video |
| wan-2.6 | Text-to-video, image-to-video, reference-to-video |
Shengshu
| Model | Public operations |
|---|---|
| viduq2 | Text-to-video, reference-to-video |
| viduq2-pro | Image-to-video, reference-to-video, start-end-to-video |
| viduq2-pro-fast | Image-to-video, start-end-to-video |
| viduq2-turbo | Image-to-video, start-end-to-video |
| viduq3-pro | Text-to-video, image-to-video, start-end-to-video |
| viduq3-turbo | Text-to-video, image-to-video, start-end-to-video |
xAI
| Model | Public operations |
|---|---|
| grok-imagine-image-to-video | Image-to-video |
| grok-imagine-text-to-video | Text-to-video |
| grok-imagine-upscale | Video-to-video |
Other
| Model | Public operations |
|---|---|
| topaz-video-upscale | Video-to-video |
Usage Examples
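The examples below assume `requests` is installed and that `BASE` and `headers` are already defined. A minimal setup sketch; the environment-variable name and the base URL here are placeholders, not official values:

```python
import os

# The usage examples below also rely on the third-party "requests" package:
#   pip install requests
# then: import requests

API_KEY = os.environ.get("LEMONDATA_API_KEY", "YOUR_API_KEY")  # assumed variable name
BASE = "https://api.example.com/v1"  # placeholder base URL; use your real endpoint

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
```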
Text-to-Video
response = requests.post(
    f"{BASE}/videos/generations",
    headers=headers,
    json={
        "model": "sora-2",
        "prompt": "A calm cinematic shot of a cat walking through a sunlit garden.",
        "operation": "text-to-video",
        "duration": 4,
        "aspect_ratio": "16:9",
    },
)
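Per the Overview, the create response carries a task ID and a poll_url, and the exact poll_url should be called when present. A hedged sketch of picking the poll target (the `id` and `poll_url` field names come from the Overview; the fallback path shape mirrors the documented /v1/tasks/{id} endpoint):

```python
def extract_poll_target(create_response: dict) -> str:
    """Return the URL to poll for task status.

    Prefer the exact poll_url from the create response; fall back to the
    /v1/tasks/{id} style status path when poll_url is absent.
    """
    poll_url = create_response.get("poll_url")
    if poll_url:
        return poll_url
    task_id = create_response["id"]
    return f"/v1/tasks/{task_id}"
```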
Image-to-Video
response = requests.post(
    f"{BASE}/videos/generations",
    headers=headers,
    json={
        "model": "hailuo-2.3-standard",
        "prompt": "The scene begins from the provided image and adds gentle natural motion.",
        "operation": "image-to-video",
        "image_url": "https://example.com/portrait.jpg",
        "duration": 6,
        "aspect_ratio": "16:9",
    },
)
Reference-to-Video
response = requests.post(
    f"{BASE}/videos/generations",
    headers=headers,
    json={
        "model": "veo3.1",
        "prompt": "Keep the same subject identity and palette while adding subtle motion.",
        "operation": "reference-to-video",
        "reference_images": [
            "https://example.com/ref-a.jpg",
            "https://example.com/ref-b.jpg",
        ],
        "duration": 8,
        "resolution": "720p",
        "aspect_ratio": "9:16",
    },
)
Start-End-to-Video
response = requests.post(
    f"{BASE}/videos/generations",
    headers=headers,
    json={
        "model": "viduq2-pro",
        "prompt": "Smooth transition from day to night.",
        "operation": "start-end-to-video",
        "start_image": "https://example.com/city-day.jpg",
        "end_image": "https://example.com/city-night.jpg",
        "duration": 5,
        "resolution": "720p",
        "aspect_ratio": "16:9",
    },
)
Video-to-Video
response = requests.post(
    f"{BASE}/videos/generations",
    headers=headers,
    json={
        "model": "topaz-video-upscale",
        "operation": "video-to-video",
        "video_url": "https://example.com/source.mp4",
        "prompt": "Upscale this clip while preserving the original motion.",
    },
)
Motion Control
response = requests.post(
    f"{BASE}/videos/generations",
    headers=headers,
    json={
        "model": "kling-3.0-motion-control",
        "operation": "motion-control",
        "prompt": "Keep the subject stable while following the motion reference.",
        "image_url": "https://example.com/subject.png",
        "video_url": "https://example.com/motion.mp4",
        "resolution": "720p",
    },
)
Parameters Reference
| Parameter | Type | Notes |
|---|---|---|
| operation | string | Explicit operation is recommended in production. |
| image_url | string | Preferred image input form for broad cross-model compatibility. |
| image | string | Inline data URL; useful for debugging and small local integrations. |
| reference_images | string[] | Canonical public field for reference-image conditioning. |
| reference_image_type | string | Optional asset/style selector when supported. |
| video_url | string | Required for current public video-to-video and motion-control models. |
| audio_url | string | Used by model-specific audio-conditioned flows when available. |
| output_audio | boolean | Veo 3 family defaults to true when omitted. |
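For the inline image field, the value is a data: URL built from the raw image bytes. A small helper sketch (the default MIME type here is an assumption; match it to your actual file format):

```python
import base64


def to_data_url(raw: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as an inline data: URL for the image field."""
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

As noted above, prefer publicly reachable URLs in production; inline data URLs are best kept for debugging and small local integrations.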
Model Selection Guide
Best Quality
veo3.1-pro, kling-video-o1-pro, and viduq3-pro are strong choices when fidelity matters more than speed.
Fastest Public Options
veo3.1-fast, hailuo-2.3-fast, and viduq3-turbo are good starting points for faster iteration.
Reference-Heavy Flows
Use veo3.1, veo3.1-fast, wan-2.6, or kling-video-o1-pro/std when you need dedicated reference-image conditioning.
Video-to-Video
topaz-video-upscale, grok-imagine-upscale, and kling-video-o1-pro/std cover the current generally enabled public video-to-video paths.
Billing
Billing is model-dependent. Some public video models are effectively priced per request, while others are priced per second. Check the Models page or the Pricing API for the current public price surface.