13.6B Unified Video Model

LongCat Video AI Generator: Text to Video & Image to Video

Turn prompts, still images, or partial clips into coherent, minutes-long 720p/30fps videos. One architecture for text-to-video, image-to-video, and video continuation—built for production workflows without color drift.

480p: 1 credit/sec · 720p: 2 credits/sec

Generate 720p videos at 30fps in minutes. Extend scenes with native continuation pretraining—minutes-long output without color drifting or quality degradation.

LongCat Video Overview

A unified 13.6B foundational model for text-to-video, image-to-video, and long-form continuation.

LongCat-Video is a foundational video generation model with 13.6 billion parameters, developed by Meituan’s LongCat team. It delivers strong performance across text-to-video, image-to-video, and video-continuation—three tasks unified in a single architecture so teams can ideate, animate, and extend stories without switching models.

The model is designed as a first step toward world models: it excels at efficient, high-quality long video generation. Native pretraining on video-continuation tasks enables minutes-long narratives while preserving subject identity, lighting, and color consistency across extended durations.

For production teams, LongCat Video outputs 720p resolution at 30 frames per second within practical runtimes. A coarse-to-fine strategy along temporal and spatial axes, combined with Block Sparse Attention, keeps inference efficient even at higher resolutions—suitable for SaaS dashboards, marketing pipelines, and rapid creative iteration.

Quality is reinforced through multi-reward reinforcement learning from human feedback (RLHF) using Group Relative Policy Optimization (GRPO). Benchmarks cited in public materials show performance comparable to leading open-source video models and recent commercial solutions.

Whether you start from a text brief, a product photo, or an existing clip, LongCat Video gives you one consistent backbone for short social loops, ad concepts, and longer story-driven sequences—ready to export as edit-friendly MP4 assets.

Key Features of LongCat Video

Unified tasks, long-form continuity, efficient 720p inference, and GRPO-aligned quality.

Unified Architecture for Multiple Tasks

Text-to-video, image-to-video, and video-continuation live in one framework. A single model natively supports all three modes with consistent quality—no juggling separate checkpoints per workflow.

Long Video Generation

Pretrained on continuation tasks, LongCat Video produces minutes-long sequences without color drifting or quality degradation. Ideal for multi-beat ads, explainers, and narrative content that must hold together end to end.

Efficient Inference at 720p / 30fps

Generate professional 720p, 30fps clips within minutes. Coarse-to-fine generation plus Block Sparse Attention optimizes speed at high resolution for real production schedules.

Strong Performance with Multi-Reward RLHF

GRPO-trained alignment improves motion quality, aesthetics, and prompt adherence. Evaluations on internal and public benchmarks position LongCat Video alongside top open and commercial video generators.

Prompt-Guided Motion Control

Direct camera movement, subject animation, and environmental effects with natural-language prompts. Specify dolly shots, parallax, gestures, and atmosphere instead of hand-keying every frame.

Flexible Duration & Frame Control

Scale clip length by frame count at 30fps—from punchy 2–3 second loops to longer 4–6 second beats and extended continuations—so output length matches your channel and story needs.

Explore the LongCat Video AI Showcase

Sample outputs across text-to-video, image-to-video, and continuation—each card includes the prompt so you can reproduce or adapt results in your workflow.

Text-to-Video

Cinematic Text-to-Video

A graceful ballerina in a flowing white dress dances slowly on the surface of a calm shallow lake at sunrise. She gently extends her arms and turns her body in an elegant ballet pose, creating soft ripples around her feet. The water reflects her full figure like a mirror, with a perfect inverted reflection beneath her. Mist floats above the lake, distant forest silhouettes line the horizon, and the sky glows with soft pastel orange, pink, and blue morning light. Cinematic wide shot, dreamy atmosphere, smooth slow camera movement, natural body motion, realistic water ripples, elegant choreography, soft haze, high detail, serene and poetic mood.

Text-to-Video

Surreal Scene Generation

Macro cinematic close-up of a ripe red strawberry wrapped inside a transparent crystal lattice shell on a warm wooden table. Sunlight shines through the glass-like net, creating sparkling reflections, rainbow highlights, and soft shadows. Shallow depth of field, slow camera push-in, luxurious surreal product shot, ultra realistic, high detail.

Image-to-Video

Product Image-to-Video

First-person view of a humanoid robot at a wooden office desk. Two white robotic arms with black joints interact with everyday objects: a laptop, glass of water, smartphone, teddy bear, and tissue box. The robot hand slowly reaches for the glass with precise finger movement. Warm daylight, modern workspace, realistic shadows, smooth stable camera, cinematic realism.

Image-to-Video

Portrait Motion from Still

Gentle head movement, natural blinking, soft background bokeh shift, studio lighting unchanged, realistic skin texture.

Video Continuation

Long-Form Continuation

First-person POV mountain biking downhill on a rocky alpine trail, handlebars and gloved hands visible, turquoise lake and snow-capped mountains in the background, bright sunlight, GoPro wide-angle view, fast motion, slight camera shake, realistic outdoor adventure footage.

Video Continuation

Multi-Beat Narrative Extension

Extend establishing shot into follow-up beats—camera intent and subject identity remain stable as duration increases past one minute.

Why Choose LongCat Video

One model, minutes-long output, production speed, and benchmark-competitive quality.

One Model for Every Video Task

Stop maintaining separate pipelines for generation, animation, and extension. LongCat Video unifies all three modes so creative direction stays consistent from first frame to final export.

Minutes-Long Without Drift

Continuation-native pretraining reduces color shift and identity collapse—the pain points that break long-form AI video. Ship longer stories clients can actually approve.

Production-Ready Speed & Resolution

720p at 30fps in minutes fits weekly content cadences and ad iteration. Block Sparse Attention keeps high-resolution runs practical behind credit-based SaaS pricing.

Benchmark-Competitive Quality

Multi-reward GRPO alignment targets motion coherence, prompt fidelity, and visual appeal—on par with leading open models and recent commercial offerings in published evaluations.

How to Use LongCat Video

From input to export in three steps: text, image, or clip in; 720p video out.

STEP 01

Choose Your Input Mode

Start with a text prompt for text-to-video, upload a still image for image-to-video, or provide an existing clip for video continuation. Describe subject, style, camera motion, and atmosphere in natural language.
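
As a rough illustration of the advice above, a prompt can be assembled from the four elements this step names (subject, style, camera motion, atmosphere). The `ShotPrompt` class and its field names are hypothetical helpers made up for this sketch; they are not part of any LongCat Video interface, which accepts plain natural-language text.

```python
from dataclasses import dataclass


@dataclass
class ShotPrompt:
    """Hypothetical container for the four prompt elements named in this step."""
    subject: str
    style: str = ""
    camera_motion: str = ""
    atmosphere: str = ""

    def render(self) -> str:
        # Keep only the non-empty parts and join them into one prompt string.
        parts = [self.subject, self.style, self.camera_motion, self.atmosphere]
        return ", ".join(p.strip() for p in parts if p.strip())


prompt = ShotPrompt(
    subject="A ripe red strawberry on a warm wooden table",
    style="macro cinematic close-up, shallow depth of field",
    camera_motion="slow camera push-in",
    atmosphere="soft sunlight, sparkling reflections",
)
print(prompt.render())
```

Keeping camera motion in its own field makes it easy to test several motion variants against the same subject and style.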

STEP 02

Set Duration & Motion

Pick duration in 5-second steps at 30fps—from short social loops to longer beats. Refine motion with specific prompts: dolly, pan, zoom, subject gestures, and environmental effects.

STEP 03

Generate, Extend & Export

Produce your first pass, then extend with continuation for minutes-long narratives. Download MP4 at 720p/30fps and drop into your editor, ad platform, or CMS workflow.

LongCat Video Use Cases

From ads and social content to previz and long-form explainers—see where unified video generation delivers the most value.

Marketing & Advertising

Turn product photography into dynamic video ads with camera movement and atmospheric effects. Test multiple motion variants from one hero image before committing to a full shoot.

Social & Content Creation

Animate travel stills, portraits, and B-roll for Reels, Shorts, and TikTok. Extend a strong opening frame into longer narrative clips without reshooting.

Concept & Pre-Visualization

Prototype storyboards and concept art as motion previews. Explore camera and pacing options in hours instead of days—ideal for agency pitches and film previz.

Long-Form Explainers & Stories

Build multi-beat explainers and serialized content with continuation that preserves identity, lighting, and color—reducing revision loops on long sequences.

User Voices

What Users Say About LongCat Video

Teams across marketing, agencies, SaaS, and entertainment rely on LongCat Video for production-ready motion.

We turned a single product shot into a minute-long hero sequence. Identity stayed locked and motion felt natural—export was fast enough for our weekly drops.

Long-form ads from one still image.
Maya R. · Creative Director, DTC Apparel
5.0 / 5

I continued scenes from one frame and got smooth 720p/30fps clips without switching tools. It genuinely changed my production cadence.

Unified pipeline for solo creators.
Alex P. · Creator, Education YouTube
5.0 / 5

We prototyped ad concepts in hours. Characters stayed consistent across beats so clients approved story logic before we shot anything.

Faster client approvals on storyboards.
Jordan C. · Producer, Boutique Agency
5.0 / 5

Scene extensions matched lighting and camera intent—no weird color drift. Our explainer series finally scales past 60 seconds.

Long-form continuity for explainers.
Priya S. · Head of Content, B2B SaaS
5.0 / 5

Image-to-video plus continuation in one flow cut revision loops in half. We iterate motion prompts instead of re-rendering entire timelines.

Half the revisions on launch work.
Diego M. · Motion Designer, Launch Campaigns
5.0 / 5

Perfect for stitching trailer beats—extends action with coherent motion, then we selectively regenerate moments to refine pacing.

Trailer previz without full mocap.
Elena K. · Trailer Editor, Indie Game Studio
5.0 / 5
LongCat AI Pricing

Choose Your Credit Pack

One-time purchases for LongCat Video and LongCat Avatar. Credits never expire—use them across generation, editing, and avatar workflows.

Base

$9.9 one-time
90 Credits
Up to 18 video generations
Audio-driven avatar generation
480p, 720p, 1080p resolution
Super-realistic lip synchronization
Natural human dynamics
Up to 30s audio duration
Long-term identity consistency

Pro (Most Popular)

$29.9 one-time
400 Credits
Up to 80 video generations
Audio-driven avatar generation
480p, 720p, 1080p resolution
Super-realistic lip synchronization
Natural human dynamics
Multi-Character support
Up to 30s audio duration
Long-term identity consistency
Priority processing

Ultimate

$49.9 one-time
800 Credits
Up to 160 video generations
Audio-driven avatar generation
480p, 720p, 1080p resolution
Super-realistic lip synchronization
Natural human dynamics
Multi-Character interactions
Long-form video generation
Up to 30s audio duration
Long-term identity consistency
Priority processing
Production-ready quality

Creator

$99.9 one-time
1800 Credits
Up to 360 video generations
Audio-driven avatar generation
480p, 720p, 1080p resolution
Super-realistic lip synchronization
Natural human dynamics
Multi-Character & infinite-length support
Long-form video generation
Up to 30s audio duration
Long-term identity consistency
Highest priority processing
Production-ready architecture
Commercial license

Choose one-time credits • Flexible billing options

One-time purchase · Credits never expire · Secure payments · Email support: support@longcatai.net
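
To estimate how far a pack goes, the per-second rates quoted on this page (480p: 1 credit/sec, 720p: 2 credits/sec) can be combined with the pack sizes. The sketch below is only an estimate: the function names are made up for illustration, and the 5-second clip length is an assumption inferred from the "up to N" figures (90 credits / 18 videos = 5 credits each), not something the page states. Avatar generation may be billed differently.

```python
# Rates quoted on this page; the clip length used below is an assumption.
CREDITS_PER_SECOND = {"480p": 1, "720p": 2}
PACKS = {"Base": 90, "Pro": 400, "Ultimate": 800, "Creator": 1800}


def clip_cost(seconds: int, resolution: str) -> int:
    """Credits needed for one clip of the given length and resolution."""
    return seconds * CREDITS_PER_SECOND[resolution]


def clips_per_pack(pack_credits: int, seconds: int, resolution: str) -> int:
    """Whole clips a pack covers; leftover credits are ignored."""
    return pack_credits // clip_cost(seconds, resolution)


for name, credits in PACKS.items():
    at_480p = clips_per_pack(credits, 5, "480p")
    at_720p = clips_per_pack(credits, 5, "720p")
    print(f"{name}: {at_480p} clips at 480p, {at_720p} clips at 720p")
```

Under these assumptions the 480p counts match the "up to" figures listed for each pack, and 720p clips of the same length cost twice as much.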

FAQ of LongCat Video

Everything you need to know about unified video generation with LongCat Video.

What is LongCat Video?

LongCat Video is Meituan’s unified 13.6B video generation model for text-to-video, image-to-video, and video continuation, optimized for minutes-long 720p/30fps output with efficient inference.

Which generation modes does it support?

Three native modes: generate from text, animate a still image, or continue an existing clip, all within one architecture for a consistent look and identity.

How long can generated videos be?

The model is pretrained for continuation and can produce minutes-long sequences without color drift or quality degradation, depending on your frame count and workflow settings.

What resolution and frame rate does it output?

Production workflows target 720p (1280×720) at 30 frames per second. Duration scales with frame count: for example, 90 frames ≈ 3 seconds and 180 frames ≈ 6 seconds.
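
The frame-count arithmetic above is simple division by the frame rate. A minimal sketch, assuming a fixed 30fps as stated on this page; the function names are illustrative only and do not correspond to any LongCat Video setting.

```python
FPS = 30  # frame rate stated on this page


def frames_to_seconds(frames: int, fps: int = FPS) -> float:
    """Clip duration in seconds for a given frame count."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frames / fps


def seconds_to_frames(seconds: float, fps: int = FPS) -> int:
    """Frame count needed for a target duration, rounded to a whole frame."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return round(seconds * fps)


print(frames_to_seconds(90))   # 3.0
print(frames_to_seconds(180))  # 6.0
print(seconds_to_frames(5))    # 150
```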

How does LongCat Video differ from other video models?

It unifies generation and extension in one model, with continuation pretraining and attention optimizations that stabilize identity and color over longer durations.

Can I control motion with prompts?

Yes. Prompts control camera movement (dolly, pan, zoom), subject animation, and environmental effects. Specific motion descriptions yield more predictable results than generic “make it dynamic” prompts.

Which image formats are supported for image-to-video?

Common formats such as JPG, JPEG, PNG, WebP, GIF, and AVIF are supported for still-image inputs.

Is LongCat Video suitable for SaaS products?

Yes. Credit-based pricing, queue-based generation, and MP4 export fit multi-tenant products. Integrate it behind your own auth, storage, and billing layers.

Get started

Start Creating with LongCat Video Today

Generate text-to-video, image-to-video, and continuation clips at 720p/30fps. Build minutes-long narratives with one unified model—ready for your next campaign or content pipeline.