Overview
For coding agents, discover the current recommended video shortlist first with `GET /v1/models?recommended_for=video`, then send the selected model explicitly to this endpoint, follow the returned `poll_url`, and poll for the result.

For the most reliable polling behavior, follow the exact `poll_url` returned by the create request. If a create response returns `poll_url`, call that exact URL. When it points to `/v1/tasks/{id}`, treat that as the canonical fixed status endpoint.

The async task identifier may surface as `id` or `task_id` depending on the adapter. Treat them as the same task identity.

For production integrations, prefer publicly reachable `https` URLs for images, videos, and audio. Inline `data:` URLs remain supported for compatible models, but large base64 payloads are harder to retry, inspect, and debug.

Request Body
- Video model ID. The API default is `sora-2`. See the Video Generation Guide for the current public model matrix and supported capabilities.
- Text description of the video to generate. Required for most public video models.
- Video operation to run. Supported contract values include `text-to-video`, `image-to-video`, `reference-to-video`, `start-end-to-video`, `video-to-video`, `video-extension`, `audio-to-video`, and `motion-control`. LemonData can infer the operation from the supplied inputs, but an explicit `operation` is recommended for production reliability.
- Publicly accessible URL of the starting image for image-to-video generation. For best cross-model compatibility, prefer `image_url`.
- Inline image as a data URL (for example, `data:image/jpeg;base64,...`). Supported by compatible models, but `image_url` provides the broadest compatibility across public video models.
- Reference image inputs for models that support dedicated reference conditioning. Provide up to 3 items. Public `https` URLs are recommended; compatible models also accept inline `data:` URLs.
- Optional reference role for models that distinguish between `asset` and `style` references.
- Publicly accessible URL of the source video. Required for `video-to-video` style flows and for motion-control models that combine a subject image with a motion reference video.
- Publicly accessible audio URL for models that support `audio-to-video`.
- Provider-specific task identifier used by some continuation, extension, or derivative flows.
- Model-specific extension start offset used by some `video-extension` flows.
- Model-specific extension multiplier or repeat count used by some `video-extension` flows.
- Video duration in seconds (model-dependent).
- Aspect ratio (for example, `16:9`, `9:16`, or `1:1`).
- Model-dependent output resolution (for example, `720p`, `1080p`, or `4k`).
- Model-dependent audio output toggle. In LemonData, Veo 3 family requests default to `true` when this field is omitted. Other public video models follow their governed default behavior. The camelCase alias `outputAudio` is accepted for compatibility.
- Frames per second (1-120) for models that expose FPS control.
- What to avoid in the generated video.
- Random seed for reproducible generation.
- Prompt adherence strength (0-20) for models that expose CFG-style control.
- Motion intensity (0-1) for models that expose it.
- URL or compatible image input for the first frame in `start-end-to-video`.
- URL or compatible image input for the last frame in `start-end-to-video`.
- Model-specific size tier used by some OpenAI-compatible video models.
- Optional watermark toggle for models that expose it.
- Model-specific effect selector for specialized editing flows.
- A unique identifier for the end-user.
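The fields above can be combined into a small client-side request builder. This is an illustrative sketch: `operation` and its supported values come from the contract above, while field names such as `model`, `prompt`, and `aspect_ratio` are assumed for the example.

```python
# Sketch of a minimal create-request body for this endpoint.
# `operation` values are from the documented contract; `model`,
# `prompt`, and `aspect_ratio` are assumed field names.

SUPPORTED_OPERATIONS = {
    "text-to-video", "image-to-video", "reference-to-video",
    "start-end-to-video", "video-to-video", "video-extension",
    "audio-to-video", "motion-control",
}

def build_video_request(prompt: str, model: str = "sora-2",
                        operation: str = "text-to-video", **extra) -> dict:
    """Build a request body, rejecting unknown operation values early."""
    if operation not in SUPPORTED_OPERATIONS:
        raise ValueError(f"unsupported operation: {operation}")
    body = {"model": model, "prompt": prompt, "operation": operation}
    body.update(extra)
    return body

body = build_video_request("A red fox running through snow",
                           aspect_ratio="16:9")
```

Passing the operation explicitly, as recommended above, avoids relying on server-side inference for production traffic.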
Compatibility Notes
- Canonical public fields are snake_case: `reference_images`, `reference_image_type`, and `output_audio`.
- For compatibility, LemonData also accepts the camelCase aliases `referenceImages`, `referenceImageType`, and `outputAudio`.
- If `operation` is omitted, LemonData infers it from the supplied inputs. For production traffic, an explicit `operation` is still recommended.
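The alias rules above can be applied client-side before sending a request, so outgoing bodies always use the canonical snake_case spellings. A minimal sketch, covering only the three documented aliases:

```python
# Normalize the documented camelCase aliases to the canonical
# snake_case fields. Anything else passes through unchanged.

ALIASES = {
    "referenceImages": "reference_images",
    "referenceImageType": "reference_image_type",
    "outputAudio": "output_audio",
}

def normalize_fields(body: dict) -> dict:
    out = {}
    for key, value in body.items():
        canonical = ALIASES.get(key, key)
        if canonical in out and canonical != key:
            # Both spellings present: keep the snake_case value.
            continue
        out[canonical] = value
    return out
```

If both spellings appear in one request, this sketch prefers the canonical snake_case value.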
Media Input Best Practices
- Prefer publicly reachable `https` URLs over inline base64 for `image_url`, `reference_images`, `video_url`, and `audio_url`.
- Avoid mixing inline base64 and remote URLs in the same request when possible; one representation per request is easier to reason about and debug.
- If you use signed URLs, keep them valid long enough to cover retries and asynchronous task creation.
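These checks are easy to automate before submitting a request. The following sketch flags fragile media inputs; the audit itself is an illustration, not part of the API, though the field names follow the request contract above.

```python
# Flag media inputs that may be fragile in production: inline base64
# payloads and non-https URLs for the documented media fields.

MEDIA_FIELDS = ("image_url", "reference_images", "video_url", "audio_url")

def audit_media_inputs(body: dict) -> list:
    warnings = []
    for field in MEDIA_FIELDS:
        values = body.get(field)
        if values is None:
            continue
        if isinstance(values, str):
            values = [values]
        for value in values:
            if value.startswith("data:"):
                warnings.append(f"{field}: inline base64 is harder to retry and debug")
            elif not value.startswith("https://"):
                warnings.append(f"{field}: prefer a publicly reachable https URL")
    return warnings
```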
Response
- Canonical async task identifier. Treat this as the same identity as `task_id` when both are present.
- Canonical async task identifier for polling. This is the same task identity used by async status endpoints.
- Preferred polling URL for this task. Use this exact path when checking status.
- Initial status: `pending`.
- Unix timestamp when the task was created.
- Model used.
- Direct video URL when the result is already available.
- Single video payload with `url`, `duration`, `width`, and `height` when available.
- Multiple video payloads when the provider returns more than one output.
- Error message or structured error object when the task fails.
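The documented polling pattern (follow the exact `poll_url`, treat `id` and `task_id` as one identity) can be sketched as follows. The terminal status values `succeeded` and `failed` are assumptions for illustration; the docs above only specify the initial `pending` status. `fetch` is injected so the loop stays transport-agnostic; in production it would perform an authenticated GET against the returned URL.

```python
import time

def task_identity(task: dict) -> str:
    # `id` and `task_id` name the same task identity.
    return task.get("id") or task.get("task_id")

def poll_task(create_response: dict, fetch, interval: float = 2.0,
              max_attempts: int = 60) -> dict:
    """Poll the exact poll_url until a terminal status or timeout.

    Terminal statuses "succeeded"/"failed" are assumed for this sketch.
    """
    poll_url = create_response["poll_url"]  # always use the exact URL
    for _ in range(max_attempts):
        status = fetch(poll_url)
        if status.get("status") in ("succeeded", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"task {task_identity(create_response)} did not finish")
```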
Image to Video
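A minimal request body for this flow might look like the following sketch. `operation` and `image_url` follow the contract above; the `model` and `prompt` field names and the example URL are assumptions.

```python
# Hypothetical image-to-video request body. kling-v2.6-pro supports
# image-to-video per the model tables below; the image URL is a placeholder.
body = {
    "model": "kling-v2.6-pro",
    "operation": "image-to-video",
    "prompt": "The subject slowly turns toward the camera",
    "image_url": "https://example.com/subject.jpg",
}
```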
Reference to Video
Use `operation=reference-to-video` when the model supports dedicated reference-image conditioning. For LemonData's public contract, pass reference assets through `reference_images`.
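A sketch of such a request follows. `operation`, `reference_images`, and `reference_image_type` come from the contract above; the `model` and `prompt` field names and the URLs are assumptions.

```python
# Hypothetical reference-to-video request body. viduq2 advertises
# reference-to-video in the tables below; URLs are placeholders.
body = {
    "model": "viduq2",
    "operation": "reference-to-video",
    "prompt": "The character walks through a neon-lit street",
    "reference_images": [            # up to 3 items, https preferred
        "https://example.com/character.png",
        "https://example.com/outfit.png",
    ],
    "reference_image_type": "asset", # or "style", where supported
}
```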
Keyframe Control
Use `start_image` and `end_image` to control the first and last frames.
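For example, a start-end-to-video body could be sketched as below. `start_image`, `end_image`, and `operation` are contract fields; the `model` and `prompt` field names and the URLs are assumptions.

```python
# Hypothetical keyframe-control request body. kling-v3.0-pro lists
# start-end-to-video support in the tables below; URLs are placeholders.
body = {
    "model": "kling-v3.0-pro",
    "operation": "start-end-to-video",
    "prompt": "Smooth transition between the two frames",
    "start_image": "https://example.com/first-frame.jpg",
    "end_image": "https://example.com/last-frame.jpg",
}
```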
Video to Video
Use `operation=video-to-video` when the model accepts an existing video as the primary input.
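A sketch of the request shape, with `operation` and `video_url` from the contract above; the `model` and `prompt` field names and the URL are assumptions.

```python
# Hypothetical video-to-video request body. kling-video-o1-pro lists
# video-to-video support in the tables below; the URL is a placeholder.
body = {
    "model": "kling-video-o1-pro",
    "operation": "video-to-video",
    "prompt": "Restyle the footage as watercolor animation",
    "video_url": "https://example.com/source.mp4",
}
```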
Motion Control
Use `operation=motion-control` when the model expects both a subject image and a motion reference video. LemonData maps the public `image_url` + `video_url` request shape to the upstream motion-control contract.
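The combined shape can be sketched as below. `operation`, `image_url`, and `video_url` come from the contract above; the `model` field name and the URLs are assumptions.

```python
# Hypothetical motion-control request body. kling-3.0-motion-control is
# the motion-control model in the tables below; URLs are placeholders.
body = {
    "model": "kling-3.0-motion-control",
    "operation": "motion-control",
    "image_url": "https://example.com/subject.jpg",     # subject image
    "video_url": "https://example.com/motion-ref.mp4",  # motion reference
}
```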
Audio-to-Video and Video Extension Availability
LemonData's public contract accepts `audio-to-video` and `video-extension` for model-specific flows, but the list of generally enabled public video models in this docs build does not include a broad public model that advertises either capability. Use the Models API or the Models page to confirm current availability before integrating those operations.
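That confirmation step can be automated against a model listing. The response shape below (a list of entries with `id` and `operations`) is an assumption for illustration; adapt it to whatever the Models API actually returns.

```python
# Sketch: filter a Models API style listing for a given operation
# before integrating audio-to-video or video-extension flows.
# The {"id": ..., "operations": [...]} shape is assumed.

def models_supporting(models: list, operation: str) -> list:
    return [m["id"] for m in models
            if operation in m.get("operations", [])]

catalog = [
    {"id": "sora-2", "operations": ["text-to-video", "image-to-video"]},
    {"id": "viduq2", "operations": ["text-to-video", "reference-to-video"]},
]
```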
Currently Enabled Public Video Models
This list is aligned with the current enabled public video model inventory in this docs build. For the freshest state, query the Models API.
OpenAI
| Model | Public operations |
|---|---|
| sora-2 | Text-to-video, image-to-video |
| sora-2-pro | Text-to-video, image-to-video |
| sora-2-pro-storyboard | Image-to-video |
Kuaishou
| Model | Public operations |
|---|---|
| kling-3.0-motion-control | Motion control |
| kling-3.0-video | Text-to-video, image-to-video, start-end-to-video |
| kling-v2.5-turbo-pro | Text-to-video, image-to-video, start-end-to-video |
| kling-v2.5-turbo-std | Text-to-video, image-to-video |
| kling-v2.6-pro | Text-to-video, image-to-video, start-end-to-video |
| kling-v2.6-std | Text-to-video, image-to-video |
| kling-v3.0-pro | Text-to-video, image-to-video, start-end-to-video |
| kling-v3.0-std | Text-to-video, image-to-video, start-end-to-video |
| kling-video-o1-pro | Text-to-video, image-to-video, reference-to-video, start-end-to-video, video-to-video |
| kling-video-o1-std | Text-to-video, image-to-video, reference-to-video, start-end-to-video, video-to-video |
Google
| Model | Public operations |
|---|---|
| veo3 | Text-to-video, image-to-video |
| veo3-fast | Text-to-video, image-to-video |
| veo3-pro | Text-to-video, image-to-video |
| veo3.1 | Text-to-video, image-to-video, reference-to-video, start-end-to-video |
| veo3.1-fast | Text-to-video, image-to-video, reference-to-video, start-end-to-video |
| veo3.1-pro | Text-to-video, image-to-video, start-end-to-video |
ByteDance
| Model | Public operations |
|---|---|
| seedance-1.5-pro | Text-to-video, image-to-video |
MiniMax
| Model | Public operations |
|---|---|
| hailuo-2.3-fast | Image-to-video |
| hailuo-2.3-pro | Text-to-video, image-to-video |
| hailuo-2.3-standard | Text-to-video, image-to-video |
Alibaba
| Model | Public operations |
|---|---|
| wan-2.2-plus | Text-to-video, image-to-video |
| wan-2.5 | Text-to-video, image-to-video |
| wan-2.6 | Text-to-video, image-to-video, reference-to-video |
Shengshu
| Model | Public operations |
|---|---|
| viduq2 | Text-to-video, reference-to-video |
| viduq2-pro | Image-to-video, reference-to-video, start-end-to-video |
| viduq2-pro-fast | Image-to-video, start-end-to-video |
| viduq2-turbo | Image-to-video, start-end-to-video |
| viduq3-pro | Text-to-video, image-to-video, start-end-to-video |
| viduq3-turbo | Text-to-video, image-to-video, start-end-to-video |
xAI
| Model | Public operations |
|---|---|
| grok-imagine-image-to-video | Image-to-video |
| grok-imagine-text-to-video | Text-to-video |
| grok-imagine-upscale | Video-to-video |
Other
| Model | Public operations |
|---|---|
| topaz-video-upscale | Video-to-video |