
LTX-2 Complete Guide 2026: Review + How To

32 min read
Hero image: LTX-2 Complete Guide 2026 title card with Lightricks open-source AI video model branding.

Hero image is AI-generated. See our AI-disclosure policy.

TL;DR: LTX-2.3 is Lightricks' open-weights AI video model, released March 5, 2026. It is the only True Model that ships open weights with synchronized 4K video plus audio in a single diffusion pass. fal.ai pricing: Fast 0.04 USD/s at 1080p; Pro 0.06 USD/s for i2v or 0.08 USD/s for t2v; Quality 0.0024 USD/MP (0.0027 USD/MP with LoRA). The LTX-2 Community License is free under 10M USD ARR and is NOT Apache 2.0. IP indemnification, capped at the lower of annual fees or 1M USD, exists only on the sales-gated direct API. Self-hosting needs 16 GB of VRAM on a Blackwell card for audio-synced 1080p, not the marketed 12 GB. C2PA credentials are not embedded by default; manual c2patool work is required for EU AI Act Article 50 compliance from August 2, 2026.

LTX-2 is Lightricks’ open-weights AI video generation model, with LTX-2.3 (released March 5, 2026) as the current 22-billion-parameter version. It is the only True Model in the 2026 video stack that ships open weights and generates synchronized 4K video plus audio in a single diffusion pass, the only one offering native 4K at 50 fps, and the only open-weights model with a documented (if capped) IP indemnification path, available through Lightricks’ sales-gated direct API. The architecture is an asymmetric dual-stream Diffusion Transformer combining a 14-billion-parameter video stream with a 5-billion-parameter audio stream and bidirectional cross-modal attention, per the arXiv 2601.03233 paper. This guide covers product identity, the three fal.ai product tiers, ComfyUI self-host workflow with VRAM math, head-to-head benchmarks against the True Model lineup, and the EU AI Act and California AB 853 compliance picture for client work shipping after August 2, 2026.


What Is LTX-2?

Answer capsule. LTX-2 is Lightricks’ open-weights AI video foundation model, currently shipping as LTX-2.3 (released March 5, 2026) at 22 billion parameters. The architecture is an asymmetric dual-stream Diffusion Transformer: a 14B-parameter video stream with 3D Rotary Positional Embedding (RoPE) plus a 5B-parameter audio stream with 1D RoPE, joined by bidirectional cross-modal attention. It is the only True Model in 2026 that generates synchronized video and audio in a single diffusion pass with open downloadable weights.

Lightricks is a Jerusalem-headquartered creative-software company founded in 2013 by Zeev Farbman and four co-founders. The consumer product portfolio (Facetune, Videoleap, Photoleap) has accumulated more than 730 million downloads with 15 million monthly active users per the Lightricks corporate page. The company reached a 1.8 billion USD valuation following a 130 million USD Series D in September 2021, with total cumulative funding of 335 million USD across four rounds. The LTX research division is led by Yoav HaCohen as Generative AI Research Team Lead, who is the first author on the LTX-2 arXiv paper.

The LTX-2 lineage:

  • LTX-Video (LTX-1) released late 2024 / early 2025 as the original 2B-class video-only model with no audio capability.
  • LTX-2 (19B) released January 6, 2026 as the first dual-stream architecture combining a 14B video stream and a 5B audio stream with native synchronized output.
  • LTX-2.3 (22B) released March 5, 2026 with a rebuilt Variational Autoencoder for sharper textures, an upgraded HiFi-GAN audio vocoder, gated attention text conditioning, native 9:16 portrait support, 24/48 FPS options, last-frame interpolation, and LoRA fine-tuning.

Native audio generation is the architectural achievement. Rather than the conventional “generate video, dub audio in post” pipeline, LTX-2.3 denoises video and audio latents inside the same diffusion pass with cross-attention. The audio path processes input as 16 kHz mel-spectrograms encoded by the Audio VAE into 128-dimensional latent tokens at roughly one token per 1/25 of a second, then a HiFi-GAN V1 vocoder with doubled generator capacity reconstructs the output as 24 kHz stereo waveforms with lip sync, foley, and ambient sound. The text encoder is Google Gemma 3 12B Instruction-Tuned, adapted via the UL2 objective and producing specialized “thinking tokens” that improve semantic stability and phonetic accuracy.
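To make the audio-path numbers concrete, here is a back-of-envelope sketch. It assumes only the approximate figures quoted in the paragraph above (about 25 latent tokens per second, 128 dimensions per token, 24 kHz stereo output); the constants are not read from the model config, and the function names are illustrative.

```python
# Rough arithmetic for the LTX-2.3 audio path, using the figures quoted above.
# These constants are approximations taken from the text, not values read
# from the model config.
AUDIO_TOKENS_PER_SECOND = 25    # "roughly one token per 1/25 of a second"
AUDIO_LATENT_DIM = 128          # 128-dimensional latent tokens
OUTPUT_SAMPLE_RATE_HZ = 24_000  # HiFi-GAN V1 output rate
OUTPUT_CHANNELS = 2             # stereo output


def audio_latent_shape(duration_s: float) -> tuple[int, int]:
    """Approximate (token_count, latent_dim) of the audio latent for one clip."""
    return round(duration_s * AUDIO_TOKENS_PER_SECOND), AUDIO_LATENT_DIM


def output_sample_count(duration_s: float) -> int:
    """Total PCM samples across all channels the vocoder has to reconstruct."""
    return round(duration_s * OUTPUT_SAMPLE_RATE_HZ) * OUTPUT_CHANNELS


def samples_per_token_per_channel() -> int:
    """How many output samples per channel each latent token stands in for."""
    return OUTPUT_SAMPLE_RATE_HZ // AUDIO_TOKENS_PER_SECOND


if __name__ == "__main__":
    tokens, dim = audio_latent_shape(10)
    print(f"10 s clip: {tokens} tokens x {dim} dims")
    print(f"vocoder output: {output_sample_count(10)} samples")
    print(f"each token covers {samples_per_token_per_channel()} samples per channel")
```

Under these assumptions a 10-second clip is carried by 250 latent tokens, and the vocoder expands each token into 960 samples per channel.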


LTX-2 Specifications and Architecture

Answer capsule. LTX-2.3 generates synchronized video and audio up to 20 seconds at resolutions from 480p through native 4K (2160p) at frame rates up to 50 fps. The aspect ratio supports both 16:9 landscape and 9:16 portrait (added in 2.3). The Video VAE compresses 33 frames of 512x512 pixel data into a latent tensor with a compression ratio exceeding 150x. Frame counts must follow the formula (n times 8) plus 1, so valid lengths step by eight from 9 (9, 17, 25, 33, and so on), with 57, 121, and 241 among the common choices. Width and height must be strictly divisible by 32. Any deviation fails the generation request.
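The two hard rules above are easy to check locally before a request is sent. The sketch below assumes n starts at 1 (so the smallest valid count is 9, the first value the text lists); the function names are illustrative and not part of any Lightricks or fal.ai SDK.

```python
def is_valid_frame_count(frames: int) -> bool:
    """True when frames == 8 * n + 1 for some n >= 1 (9, 17, 25, 33, ...)."""
    return frames >= 9 and (frames - 1) % 8 == 0


def is_valid_dimension(pixels: int) -> bool:
    """True when a width or height is a positive multiple of 32."""
    return pixels > 0 and pixels % 32 == 0


def valid_frame_counts(max_frames: int) -> list[int]:
    """All valid frame counts up to and including max_frames."""
    return list(range(9, max_frames + 1, 8))


def check_request(width: int, height: int, frames: int) -> list[str]:
    """Collect every rule violation instead of stopping at the first one."""
    problems = []
    if not is_valid_dimension(width):
        problems.append(f"width {width} is not divisible by 32")
    if not is_valid_dimension(height):
        problems.append(f"height {height} is not divisible by 32")
    if not is_valid_frame_count(frames):
        problems.append(f"frame count {frames} is not (n * 8) + 1")
    return problems


if __name__ == "__main__":
    print(valid_frame_counts(57))
    print(check_request(1920, 1080, 121))
    print(check_request(1920, 1088, 121))
```

The check surfaces one detail worth knowing: 1080 is not a multiple of 32 (1080 / 32 = 33.75), and neither is 2160, so workflows that take raw pixel dimensions need a compliant height such as 1088. The fal.ai endpoints take a resolution string instead.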

Specifications verified against the Hugging Face Lightricks/LTX-2.3 model card and the arXiv paper:

| Specification | LTX-2.3 |
|---|---|
| Total parameters | 22 billion (14B video + 5B audio + connector layers) |
| Architecture | Asymmetric dual-stream Diffusion Transformer |
| Transformer layers | 48 |
| Maximum resolution | 2160p (4K) native |
| Maximum frame rate | 50 fps |
| Maximum duration | 20 seconds |
| Native audio | 24 kHz stereo with lip sync, foley, ambient |
| Audio sample rate (input) | 16 kHz mel-spectrograms |
| Audio sample rate (output) | 24 kHz HiFi-GAN V1 vocoder |
| Aspect ratios | 16:9 landscape, 9:16 portrait |
| Text encoder | Google Gemma 3 12B IT (UL2 adapted) |
| Text encoder context (effective) | ~1,000 tokens (~750 English words) |
| Image conditioning | Start frame, end frame, multi-keyframe |
| LoRA fine-tuning | Supported on Quality tier |

The Gemma 3 12B encoder is a load-bearing technical detail with operational consequences. Although the fal.ai API exposes a 10,000-character prompt field, the encoder context window is throttled to roughly 1,000 tokens, and anything past that is silently dropped before the diffusion sampler sees it. Operators writing long narrative prompts past 750 English words are wasting characters.
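A rough way to catch over-long prompts before they are silently truncated is sketched below. It is a heuristic only: it converts words to tokens with the "~1,000 tokens ≈ 750 words" ratio quoted above, whereas an exact count would need the Gemma 3 tokenizer itself. The constants and function names are assumptions for illustration.

```python
# Heuristic only: converts words to tokens with the "~1,000 tokens ~ 750 words"
# ratio quoted above. An exact count needs the Gemma 3 tokenizer itself.
ENCODER_TOKEN_BUDGET = 1_000
WORDS_PER_1000_TOKENS = 750
API_CHAR_LIMIT = 10_000


def estimate_tokens(prompt: str) -> int:
    """Rough token estimate from the word count, rounded up."""
    words = len(prompt.split())
    return (words * 1_000 + WORDS_PER_1000_TOKENS - 1) // WORDS_PER_1000_TOKENS


def trim_to_budget(prompt: str, budget: int = ENCODER_TOKEN_BUDGET) -> str:
    """Keep only the leading words the encoder is estimated to see.

    Note: words are re-joined with single spaces, so line breaks are lost.
    """
    max_words = budget * WORDS_PER_1000_TOKENS // 1_000
    return " ".join(prompt.split()[:max_words])


def prompt_report(prompt: str) -> dict:
    """Summarize API-level and encoder-level limits for one prompt."""
    tokens = estimate_tokens(prompt)
    return {
        "chars": len(prompt),
        "over_api_limit": len(prompt) > API_CHAR_LIMIT,
        "estimated_tokens": tokens,
        "likely_truncated": tokens > ENCODER_TOKEN_BUDGET,
    }


if __name__ == "__main__":
    long_prompt = " ".join(["rain"] * 800)
    print(prompt_report(long_prompt))
    print(len(trim_to_budget(long_prompt).split()), "words kept")
```

An 800-word prompt passes the 10,000-character API check but is estimated at about 1,067 tokens, which is exactly the case the paragraph above warns about.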

NVIDIA has featured LTX-2 at both GDC 2026 (RTX AI Garage) and CES 2026, with NVFP4 and NVFP8 datatype support delivering up to 2.5x performance gains and 60 percent lower VRAM usage on RTX 50-series cards. The Lightricks team trained LTX-2 on Google Cloud TPUs via JAX, then partnered with NVIDIA for inference optimization, per the Lightricks Google Cloud case study.


Pricing and Access Paths in 2026

Answer capsule. LTX-2.3 ships in three fal.ai product tiers with two billing models. Fast at 0.04 USD per second 1080p for either text-to-video or image-to-video. Pro at 0.06 USD per second 1080p for image-to-video or 0.08 USD per second 1080p for text-to-video (a 33 percent T2V premium most articles miss). Quality on the separate ltx-2.3-quality SKU bills per megapixel at 0.0024075 USD, or 0.0027075 USD with LoRA fine-tuning. All three tiers include native audio. There is no 720p tier; the floor is 1080p across all endpoints.

Infographic: LTX-2.3 fal.ai pricing matrix 2026 (Fast, Pro, and Quality tiers; all tiers include native audio; no 720p tier).

Cloud pricing matrix (fal.ai, verified 2026-06-05)

| Endpoint | 1080p | 1440p | 2160p (4K) |
|---|---|---|---|
| Fast text-to-video | 0.04 USD/s | 0.08 USD/s | 0.16 USD/s |
| Fast image-to-video | 0.04 USD/s | 0.08 USD/s | 0.16 USD/s |
| Pro image-to-video | 0.06 USD/s | 0.12 USD/s | 0.24 USD/s |
| Pro text-to-video | 0.08 USD/s | 0.16 USD/s | 0.32 USD/s |
| Quality image-to-video | 0.0024075 USD/MP | 0.0024075 USD/MP | 0.0024075 USD/MP |
| Quality audio-to-video | 0.0024075 USD/MP | 0.0024075 USD/MP | 0.0024075 USD/MP |
| Quality LoRA image-to-video | 0.0027075 USD/MP | 0.0027075 USD/MP | 0.0027075 USD/MP |

A 10-second 1080p generation costs 0.40 USD on Fast, 0.60 USD on Pro image-to-video, 0.80 USD on Pro text-to-video, or 1.20 USD on Quality image-to-video. A 60-second social media campaign requiring six 10-second clips runs 2.40 USD on Fast versus 3.60 USD on Pro image-to-video. That same campaign on Veo 3.1 Quality at 0.40 USD per second would cost 24.00 USD, which is the operator-relevant arbitrage decision documented later in the head-to-head section.
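The per-clip figures above can be reproduced with a small calculator. This is a sketch: the prices are copied from the matrix above, and the Quality-tier formula is an assumption that billing counts width x height x frame count at 1920x1080 for 1080p, with 24 fps frame counts.

```python
# Prices copied from the fal.ai matrix above (verified 2026-06-05 per the article).
PER_SECOND_USD = {
    ("fast", "t2v"): {"1080p": 0.04, "1440p": 0.08, "2160p": 0.16},
    ("fast", "i2v"): {"1080p": 0.04, "1440p": 0.08, "2160p": 0.16},
    ("pro", "i2v"): {"1080p": 0.06, "1440p": 0.12, "2160p": 0.24},
    ("pro", "t2v"): {"1080p": 0.08, "1440p": 0.16, "2160p": 0.32},
}
QUALITY_USD_PER_MP = {"base": 0.0024075, "lora": 0.0027075}

# Assumption: Quality bills width x height x frame count at these frame sizes.
FRAME_SIZE = {"1080p": (1920, 1080), "1440p": (2560, 1440), "2160p": (3840, 2160)}


def per_second_cost(tier: str, mode: str, resolution: str, seconds: float) -> float:
    """Cost of one Fast or Pro clip."""
    return PER_SECOND_USD[(tier, mode)][resolution] * seconds


def quality_cost(resolution: str, frames: int, lora: bool = False) -> float:
    """Cost of one Quality-tier clip under the per-megapixel assumption above."""
    width, height = FRAME_SIZE[resolution]
    megapixels = width * height * frames / 1_000_000
    return megapixels * QUALITY_USD_PER_MP["lora" if lora else "base"]


if __name__ == "__main__":
    print(f"10 s Fast:        {per_second_cost('fast', 't2v', '1080p', 10):.2f} USD")
    print(f"10 s Pro I2V:     {per_second_cost('pro', 'i2v', '1080p', 10):.2f} USD")
    print(f"10 s Pro T2V:     {per_second_cost('pro', 't2v', '1080p', 10):.2f} USD")
    print(f"10 s Quality I2V: {quality_cost('1080p', frames=240):.2f} USD")  # 24 fps
    campaign = 6 * per_second_cost("fast", "t2v", "1080p", 10)
    print(f"6 x 10 s on Fast: {campaign:.2f} USD")
```

With 240 frames the Quality figure lands at about 1.20 USD, matching the text. With 121 frames and LoRA it lands near the 0.68 USD quoted for a 5-second character shot in the use-case section.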

Alternative cloud hosts

  • Replicate runs the lightricks/ltx-2-retake endpoint at 0.10 USD per video-second for editing existing clips.
  • Lightricks LTX Studio consumer subscription at ltx.io/studio/pricing bundles credits: Free with 800 one-time credits, Lite at 12 USD per month (personal only), Standard at 28 USD per month with commercial license, Pro at 100 USD per month with 110,000 credits.
  • Lightricks direct API is sales-gated at console.ltx.video. This is the only access path with documented IP indemnification.
  • Krea AI added LTX-2.3 hosting on April 30, 2026 as a bundled seat subscription.
  • WaveSpeed, Apatero, MindStudio all expose LTX-2.3 inference for niche workflows.

Self-host

The weights download free from Hugging Face under the LTX-2 Community License. Hardware costs amortize against either an RTX 5090 (32 GB) at approximately 3,200 USD or an RTX 4090 (24 GB) at approximately 2,000 USD on the used market. Break-even versus fal.ai Fast at 0.04 USD per second lands at approximately 25 minutes of finished output per month on the RTX 4090 or 37 minutes per month on the RTX 5090. Critically, the RTX 4090 has a documented duty-cycle ceiling around 8 minutes per month for audio-synced 1080p work because the Gemma 3 text encoder cannot offload cleanly without thrashing the PCIe bus.


How to Use LTX-2: Cloud Tutorial (fal.ai)

Answer capsule. Sign up at fal.ai, generate an API key, and call the text-to-video or image-to-video endpoint with the core parameters: prompt, image URL (image-to-video only), duration, resolution, aspect ratio, and audio toggle. A 10-second 1080p Fast generation costs 0.40 USD and returns a downloadable MP4 in 30 to 90 seconds. Width and height must be divisible by 32. Frame counts must follow the formula (n times 8) plus 1.

Infographic: LTX-2.3 cloud workflow on fal.ai in four steps (account setup, choose endpoint, configure parameters, execute and download).

Step 1: Account setup

Navigate to fal.ai, complete account registration, and attach a billing method. New accounts receive a modest free credit pool sufficient for initial testing. Generate an API key from the developer dashboard and store it securely in environment variables, never in code.

Step 2: Choose the right endpoint

Three product tiers with distinct strategic uses:

  • Fast for rapid prototyping, B-roll iteration, and cost-sensitive bulk generation
  • Pro for client-deliverable quality with the I2V vs T2V cost tradeoff
  • Quality for LoRA fine-tuning, audio-to-video workflows, and per-megapixel billing on long-duration shorts

Step 3: Required parameters

{
  "prompt": "A middle-aged man speaks in a slow-paced voice, 'I remember that day.' He pauses and looks to the side, then continues, 'It changed everything.'",
  "image_url": "https://your-host.com/first-frame.png",
  "duration": 10,
  "resolution": "1080p",
  "aspect_ratio": "16:9",
  "generate_audio": true
}
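One way to submit a payload like the one above from Python is the fal.ai client (`pip install fal-client`). This is a minimal sketch, assuming the client's `fal_client.subscribe` and `fal_client.upload_file` helpers and its `FAL_KEY` environment variable. The endpoint ID is a hypothetical placeholder; look up the exact LTX-2.3 path on fal.ai. The response is returned whole rather than assuming where the video URL sits in it.

```python
import os

# Hypothetical endpoint ID: look up the exact LTX-2.3 path on fal.ai's model page.
ENDPOINT_ID = "fal-ai/ltx-2.3/image-to-video"


def submit_generation(arguments: dict, endpoint_id: str = ENDPOINT_ID) -> dict:
    """Submit one request through fal's queue and block until it completes."""
    # fal_client reads the key from FAL_KEY; fail early with a clear message.
    if not os.environ.get("FAL_KEY"):
        raise RuntimeError("FAL_KEY is not set; export it instead of hard-coding the key")
    import fal_client  # pip install fal-client

    return fal_client.subscribe(endpoint_id, arguments=arguments, with_logs=True)


def upload_local_image(path: str) -> str:
    """Upload a local first frame to fal storage and return its hosted URL."""
    import fal_client

    return fal_client.upload_file(path)


if __name__ == "__main__":
    payload = {
        "prompt": "A middle-aged man speaks in a slow-paced voice, 'I remember that day.'",
        "image_url": upload_local_image("first-frame.png"),  # hypothetical local file
        "duration": 10,
        "resolution": "1080p",
        "aspect_ratio": "16:9",
        "generate_audio": True,
    }
    result = submit_generation(payload)
    print(result)  # inspect the response; the video URL's key path is not assumed here
```

Uploading through the client rather than passing a third-party URL sidesteps the 403 fetch failures described in the parameter notes. The Python `upload_file` helper appears to play the role that `fal.storage.upload()` plays in the JavaScript SDK.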

Parameter notes:

  • prompt accepts up to 10,000 characters at the API level but the Gemma 3 encoder context throttles to roughly 1,000 tokens (~750 English words). Past that, content is silently dropped.
  • image_url is optional for text-to-video and required for image-to-video. Pinterest, Instagram, and Reddit URLs frequently fail because their hosts 403 the fal fetcher; upload via the fal.storage.upload() SDK call instead.
  • duration must produce a frame count matching the (n times 8) plus 1 formula. At 24 fps, valid durations include 1 second (24 frames rounds to 25), 2.5 seconds (60 rounds to 57), 5 seconds (120 rounds to 121), and 10 seconds (240 rounds to 241).
  • resolution accepts “1080p”, “1440p”, “2160p”. There is no “720p” tier on any LTX-2.3 fal.ai endpoint.
  • aspect_ratio accepts “16:9” or “9:16”. Portrait was added in LTX-2.3.
  • generate_audio is a boolean. Setting it to false does not currently reduce per-second cost.

Step 4: Common errors

| Error | Cause | Fix |
|---|---|---|
| HTTP 422 resolution | Specified a tier not exposed (e.g., 720p) | Use 1080p, 1440p, or 2160p |
| HTTP 422 frame count | Duration does not match (n*8)+1 formula | Round to nearest valid frame count |
| HTTP 422 dimension | Width or height not divisible by 32 | Use standard resolutions |
| HTTP 403 image fetch | Upstream image host blocks fal fetcher | Upload via fal.storage.upload() |

How to Use LTX-2: Self-Host Tutorial (ComfyUI v0.16)

Answer capsule. Install ComfyUI v0.16, add the ComfyUI-LTXVideo custom node package from Lightricks, download the LTX-2.3 weights matching your VRAM tier (NVFP4 for RTX 5090, FP8 for RTX 4090, Q4_K GGUF for 16 GB cards), download the Gemma 3 12B text encoder, download the separate audio VAE (102 MB, not auto-fetched by installers), and launch with python -m main --reserve-vram 5 to prevent first-step crashes. The audio VAE omission is the single most common cause of “video has no sound” first-install failures.

Infographic: LTX-2.3 ComfyUI v0.16 node graph (Gemma 3 text encoder, KSampler, separately downloaded audio VAE, video plus audio combine, MP4 output), with the --reserve-vram 5 launch callout.

Step 1: Install ComfyUI v0.16 and the LTX-2 node package

Update ComfyUI to v0.16 or later. Open ComfyUI Manager, search for “LTXVideo”, install the official Lightricks/ComfyUI-LTXVideo package, and restart. Windows users running ComfyUI Desktop must install Git from git-scm.com first; the Desktop build does not ship with Git and Manager-installed custom nodes will fail without it.

Step 2: Download the weights for your hardware

Match your quantization to your VRAM:

| Hardware tier | Recommended quantization | Hugging Face download |
|---|---|---|
| RTX 6000 Ada (48 GB) or A100 80 GB | bf16 full | Lightricks/LTX-2.3 |
| RTX 5090 (32 GB) | NVFP4 | Lightricks/LTX-2.3-nvfp4 |
| RTX 4090 (24 GB) | FP8 | Lightricks/LTX-2.3-fp8 |
| RTX 5080 (16 GB) or RTX 4080 SUPER | NVFP4 + projection-only Gemma | NVFP4 base + extracted text encoder |
| RTX 4070 Ti SUPER (16 GB) | Q4_K GGUF | unsloth/LTX-2.3-GGUF |
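The table above can be read as a simple lookup from VRAM and architecture to a download. The sketch below follows the table's rows but gates every NVFP4 choice on a Blackwell flag, per this guide's note that NVFP4 runs only on RTX 50-series cards; that is why a 16 GB Ada card lands on the GGUF row here. The function name and thresholds are illustrative.

```python
def recommend_quantization(vram_gb: int, blackwell: bool) -> str:
    """Map a card's VRAM and architecture onto the quantization table above."""
    if vram_gb >= 48:
        return "bf16 full (Lightricks/LTX-2.3)"
    if vram_gb >= 32 and blackwell:
        return "NVFP4 (Lightricks/LTX-2.3-nvfp4)"
    if vram_gb >= 24:
        return "FP8 (Lightricks/LTX-2.3-fp8)"
    if vram_gb >= 16 and blackwell:
        # NVFP4 video weights plus a projection-only Gemma 3 extract.
        return "NVFP4 base + projection-only Gemma extract"
    if vram_gb >= 16:
        return "Q4_K GGUF (unsloth/LTX-2.3-GGUF)"
    raise ValueError("below the 16 GB floor for audio-synced output")


if __name__ == "__main__":
    for card, vram, is_blackwell in [("RTX 5090", 32, True), ("RTX 4090", 24, False),
                                     ("RTX 5080", 16, True), ("16 GB Ada card", 16, False)]:
        print(f"{card}: {recommend_quantization(vram, is_blackwell)}")
```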

Place main checkpoints in models/checkpoints/. Place the Gemma 3 12B text encoder in models/text_encoders/. Place the spatial and temporal latent upscalers (ltx-2.3-spatial-upscaler-x2-1.1 and ltx-2.3-temporal-upscaler-x2-1.0) in models/latent_upscale_models/.

Step 3: Download the audio VAE separately (load-bearing)

The audio VAE is a 102 MB file at huggingface.co/Lightricks/LTX-2/tree/main/audio that auto-installers and the popup template do not fetch automatically. Without it, video generates correctly but the MP4 plays silent with no error message. Per GitHub issue 316, this is the most common first-install failure. Place the file in models/vae/.
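Because the missing audio VAE fails silently, a quick pre-flight check of the model folders is worth running before the first generation. This is a sketch under naming assumptions: it looks for weight files (`.safetensors` or `.gguf`) in the folders named in Steps 2 and 3, and it identifies the Gemma encoder, the upscalers, and the audio VAE by filename substrings that you may need to adjust to the files you actually downloaded.

```python
from pathlib import Path

MODEL_SUFFIXES = {".safetensors", ".gguf"}

# (sub-folder under models/, filename substring, what it is). The substrings are
# naming assumptions; adjust them to the files you actually downloaded.
REQUIRED = [
    ("checkpoints", "", "LTX-2.3 checkpoint"),
    ("text_encoders", "gemma", "Gemma 3 12B text encoder"),
    ("latent_upscale_models", "spatial-upscaler", "spatial latent upscaler"),
    ("latent_upscale_models", "temporal-upscaler", "temporal latent upscaler"),
    ("vae", "audio", "audio VAE (102 MB, not fetched by installers)"),
]


def model_files(folder: Path) -> list[str]:
    """Lower-cased names of weight files directly inside folder."""
    if not folder.is_dir():
        return []
    return [
        p.name.lower()
        for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in MODEL_SUFFIXES
    ]


def missing_models(comfy_root: str) -> list[str]:
    """List every required model the ComfyUI install appears to be missing."""
    models = Path(comfy_root) / "models"
    missing = []
    for sub, needle, label in REQUIRED:
        if not any(needle in name for name in model_files(models / sub)):
            missing.append(f"models/{sub}/: {label}")
    return missing


if __name__ == "__main__":
    import sys

    problems = missing_models(sys.argv[1] if len(sys.argv) > 1 else ".")
    for line in problems:
        print("MISSING", line)
    sys.exit(1 if problems else 0)
```

Run it with the ComfyUI root as the only argument; a non-zero exit code means at least one folder looks incomplete.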

Step 4: Build the canonical node graph

Standard audio-visual LTX-2.3 ComfyUI workflow:

  1. Text prompt feeds LTXAVTextEncoderLoader (Gemma 3 instructions)
  2. Visual generation runs through standard KSampler with 8-12 steps and CFG 1.0 for distilled models
  3. Audio generation routes through LTXVAudioVAELoader which decodes 16 kHz mel-spectrogram latents into 24 kHz waveforms
  4. Outputs merge into a video-combined node multiplexing the .wav track into the .mp4 container

For dual-GPU setups (RTX 3090 plus RTX 4060 Ti is a common community pattern), swap standard loader nodes for the low_vram_loaders.py modules in the LTXVideo repository and use CheckPointLoaderSimpleDisTorch2MultiGPU to isolate the Gemma 3 text encoder on the secondary GPU.
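Once the canonical graph works in the browser, it can be exported in ComfyUI's API format and driven from a script. A minimal sketch, assuming a local server on ComfyUI's default port 8188 and its `/prompt` endpoint; only nodes whose `class_type` is `KSampler` are patched, and the workflow filename is hypothetical. The 8-12 step, CFG 1.0 range is the guidance for distilled models quoted above.

```python
import json
import urllib.request


def set_sampler_params(workflow: dict, steps: int = 8, cfg: float = 1.0) -> int:
    """Set steps and CFG on every KSampler node of an API-format workflow.

    Returns the number of nodes changed.
    """
    if not 8 <= steps <= 12:
        raise ValueError("distilled LTX-2.3 workflows are quoted at 8-12 steps")
    changed = 0
    for node in workflow.values():
        if node.get("class_type") == "KSampler":
            node["inputs"]["steps"] = steps
            node["inputs"]["cfg"] = cfg
            changed += 1
    return changed


def queue_workflow(workflow: dict, server: str = "http://127.0.0.1:8188") -> dict:
    """POST the workflow to a running ComfyUI server's /prompt endpoint."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        f"{server}/prompt", data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    # "ltx23_workflow_api.json" is a hypothetical file exported in API format.
    with open("ltx23_workflow_api.json", encoding="utf-8") as f:
        workflow = json.load(f)
    print("KSampler nodes patched:", set_sampler_params(workflow, steps=10))
    print(queue_workflow(workflow))  # the response includes a prompt_id on success
```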

Step 5: Launch with the safety buffer

python -m main --reserve-vram 5

The --reserve-vram 5 flag forces ComfyUI to maintain a 5 GB safety buffer, preventing the catastrophic out-of-memory crash that occurs during the massive memory spike on the first diffusion step. Without this flag, even 24 GB cards routinely crash on the initial generation per r/comfyui dual-GPU workflow threads.


LTX-2 VRAM and GPU Requirements

Answer capsule. The minimum honest spec for stable audio-synced LTX-2.3 generation at 1080p is 16 GB of VRAM on a Blackwell-architecture card running NVFP4 with a projection-only Gemma 3 extract. The widely-cited 12 GB minimum claim is wrong because it counts only video weights and silently omits the Gemma 3 12B text encoder, which adds 8.8 GB in FP4 or 28 GB at full precision. Full 20-second 4K generation requires 24 GB on an RTX 4090 or 32 GB on an RTX 5090 for headroom.

Infographic: LTX-2.3 VRAM by quantization 2026 reference table, with the Gemma 3 12B text encoder adding 8.8 GB in FP4 or 28 GB at full precision.

VRAM by quantization tier

| Quantization | File size | Peak VRAM (inference) | Peak VRAM with audio | Minimum GPU |
|---|---|---|---|---|
| bf16 full | ~44 GB | ~50 GB+ | ~58 GB+ | A100 80 GB or 2x RTX 6000 Ada |
| FP8 cast | ~22 GB | ~26 GB | ~34 GB | RTX 5090 32 GB or A100 |
| NVFP4 | ~11 GB | ~14 GB | ~22 GB | RTX 5090 (full audio) or RTX 5080 (video only) |
| Q4_K GGUF | ~12-14 GB | ~15 GB | ~24 GB | RTX 4090 or RTX 6000 Ada |

The Gemma 3 12B text encoder is the silent VRAM trap. Community benchmarks claiming “LTX-2 runs on a 12 GB GPU” omit the encoder entirely. With encoder loaded, the actual VRAM floor for any audio-synced output is 16 GB. NVFP4 quantization is only available on Blackwell-architecture cards (RTX 50-series), per NVIDIA RTX AI Garage CES 2026 coverage.

Generation time benchmarks (10-second 1080p with audio)

| GPU | Quantization | Estimated wall-clock |
|---|---|---|
| RTX 5090 | NVFP4 | ~7-9 minutes |
| RTX 4090 | FP8 | ~12-15 minutes |
| RTX 4080 SUPER (16 GB) | NVFP4 video only | ~15-18 minutes (no audio) |
| A100 80 GB | bf16 | ~4-6 minutes |
| RTX 6000 Ada | bf16 | ~6-8 minutes |

Per the Zenn LTX-2.3 RTX 5090 benchmark, LTX-2.3 inference is between 5.7x and 14x faster than Wan 2.2 on equivalent hardware across prompt categories.

Self-host vs cloud break-even math

Amortizing an RTX 5090 (approximately 3,200 USD market price per GPU Poet pricing data) over a 36-month lifespan plus electricity yields an effective cost per finished second of approximately 0.020 USD at about twice the break-even volume, roughly half the fal.ai Fast rate. Break-even versus fal.ai Fast at 0.04 USD per second:

  • RTX 4090 (2,000 USD used) breaks even at approximately 25 minutes of finished 1080p output per month, but the duty-cycle ceiling on audio-synced work is approximately 8 minutes per month due to Gemma encoder offload thrashing, so for audio-synced work it never reaches break-even.
  • RTX 5090 (3,200 USD new) breaks even at approximately 37 minutes of finished output per month with no duty-cycle constraint.
  • Below those thresholds, fal.ai Fast wins on total cost.
  • Above approximately 60 minutes per month, RTX 5090 self-host wins decisively.

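For high-volume operators (more than roughly 2.2 hours, or about 133 minutes, of finished output per month), an RTX 5090 pays for itself inside the first month versus Veo 3.1 Quality (0.40 USD per second) routing, because 3,200 USD divided by 0.40 USD per second is 8,000 seconds of output.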
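The break-even figures in this section follow from straight-line amortization. The sketch below is hardware-only by default (`power_usd_per_month` is zero), so its results are floors: it gives about 37 minutes for the RTX 5090 and about 23 minutes for the RTX 4090, and the article's ~25-minute RTX 4090 figure presumably folds in electricity. The function names and the optional power term are illustrative assumptions.

```python
def breakeven_minutes_per_month(gpu_usd: float, cloud_usd_per_s: float,
                                lifespan_months: int = 36,
                                power_usd_per_month: float = 0.0) -> float:
    """Monthly finished output at which self-hosting costs the same as the cloud."""
    monthly_cost = gpu_usd / lifespan_months + power_usd_per_month
    return monthly_cost / cloud_usd_per_s / 60


def self_host_usd_per_second(gpu_usd: float, minutes_per_month: float,
                             lifespan_months: int = 36,
                             power_usd_per_month: float = 0.0) -> float:
    """Effective self-host cost per finished second at a given monthly volume."""
    monthly_cost = gpu_usd / lifespan_months + power_usd_per_month
    return monthly_cost / (minutes_per_month * 60)


if __name__ == "__main__":
    print(f"RTX 5090 vs Fast: {breakeven_minutes_per_month(3200, 0.04):.1f} min/month")
    print(f"RTX 4090 vs Fast: {breakeven_minutes_per_month(2000, 0.04):.1f} min/month")
    # One-month payback against Veo 3.1 Quality: amortize over a single month.
    print("RTX 5090 vs Veo, 1 month: "
          f"{breakeven_minutes_per_month(3200, 0.40, lifespan_months=1):.1f} min")
    print(f"RTX 5090 at 75 min/month: {self_host_usd_per_second(3200, 75):.4f} USD/s")
```

Because the monthly cost is fixed, the cost per finished second halves each time volume doubles. That is why roughly 75 minutes per month, about twice the RTX 5090 break-even, lands near the 0.020 USD per second figure.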


LTX-2 Prompt Guide

Answer capsule. LTX-2.3 prompts work best as single flowing paragraphs in present tense, written for the Gemma 3 12B text encoder which interprets long descriptive narrative better than terse keyword strings. Camera direction uses explicit cinematographer terminology (slow dolly in, handheld tracking, whip pan). Dialogue must be in quotation marks with physical acting beats between lines. Avoid bracket-style multi-shot timing markers like “SHOT 1 (0-2s):” because they render literally as on-screen text rather than triggering shot changes.

Structural rules

Write the prompt as a single paragraph in present tense. Match prompt length to video duration: a brief two-sentence prompt applied to a 20-second generation forces the model to hallucinate filler action and degrades temporal consistency. Wide establishing shots require extensive environmental description; extreme close-ups require precise facial morphology and skin-texture detail. The effective text encoder context is approximately 1,000 tokens (~750 English words) despite the 10,000-character API field; anything past that is silently dropped.

Camera control vocabulary

Use precise constraints to reduce unwanted wobble and jitter:

  • slow dolly in, dolly back
  • handheld tracking shot, static lock-off
  • whip pan right, slow pan left
  • low-angle push-in, crane shot, aerial pullback

In the negative prompting field, exclude unwanted behaviors: no Dutch angle, no rolling shutter wobble, no lens distortion. For deeper cinematic prompt grammar across all True Models, see the Cinematic AI Video Prompts 2026 pillar.

Dialogue and audio sync

Spoken dialogue must be enclosed in quotation marks with physical acting beats inserted between lines. Example from the LTX-2.3 Prompt Guide on the Lightricks blog:

A middle-aged man speaks in a slow-paced voice, “I remember that day.” He pauses and looks to the side, then continues, “It changed everything.”

Avoid abstract emotional labels like “the man is sad” and use physical cues instead: “his brow furrows, his voice cracks”. Physical cues drive both visual facial animation and audio vocoder tone.

For atmospheric audio, describe the soundscape directly in the prompt: soft ambient music, the sound of rain on pavement, distant coffee shop chatter.
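When prompts are generated programmatically, for example for UGC ad variants, a small builder keeps the quoted-line-plus-physical-beat structure consistent. The sketch below is a hypothetical helper, not a Lightricks API. It reproduces the Lightricks example above (with straight quotation marks) and appends the soundscape as plain prose rather than as a labeled field.

```python
def build_dialogue_prompt(opening: str, first_line: str,
                          followups: tuple[tuple[str, str], ...] = (),
                          soundscape: tuple[str, ...] = ()) -> str:
    """Assemble one flowing paragraph: quoted lines separated by physical beats.

    opening:    speaker plus delivery, e.g. "A man speaks in a slow-paced voice"
    first_line: the first spoken line, without quotation marks
    followups:  (physical beat, spoken line) pairs for every later line
    soundscape: optional ambient sounds described in plain words
    """
    parts = [f'{opening}, "{first_line}"']
    for beat, line in followups:
        parts.append(f'{beat}, "{line}"')
    if soundscape:
        parts.append("In the background we hear " + ", ".join(soundscape) + ".")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_dialogue_prompt(
        "A middle-aged man speaks in a slow-paced voice",
        "I remember that day.",
        (("He pauses and looks to the side, then continues", "It changed everything."),),
    ))
```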

Anti-pattern: bracket multi-shot timing

Prompts in the format “SHOT 1 (0-2s):” get rendered LITERALLY as captions on the video. Operators coming from Seedance (where this syntax works for shot blocking) will get burned. Use prose temporal connectors such as “A beat of silence, then she turns...” or “Three seconds pass before the camera lifts...” instead.
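A simple lint pass can catch bracket timing markers before they burn a generation. The regular expression below is an assumption about what such markers look like ("SHOT", a number, a parenthesized time range ending in "s", then a colon). It flags them but does not rewrite them, since turning a marker into good prose is a judgment call.

```python
import re

# Matches timing markers such as "SHOT 1 (0-2s):" or "shot 3 (4.5 - 6 s):".
SHOT_MARKER = re.compile(
    r"\bSHOT\s*\d+\s*\(\s*\d+(?:\.\d+)?\s*[-–]\s*\d+(?:\.\d+)?\s*s\s*\)\s*:",
    re.IGNORECASE,
)


def find_shot_markers(prompt: str) -> list[str]:
    """Return every bracket-style timing marker found in the prompt."""
    return SHOT_MARKER.findall(prompt)


def lint_prompt(prompt: str) -> list[str]:
    """Human-readable warnings; an empty list means no markers were found."""
    return [
        f"'{marker}' will likely render as on-screen text; use a prose connector instead"
        for marker in find_shot_markers(prompt)
    ]


if __name__ == "__main__":
    for warning in lint_prompt("SHOT 1 (0-2s): she walks in. SHOT 2 (2-5s): she sits."):
        print(warning)
```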

Multi-image and omni-reference conditioning

LTX-2.3 does not support multi-reference arrays equivalent to Seedance’s 12-file omni-reference. Character persistence is solved via three alternative paths:

  1. Start frame plus end frame chaining through the standard image_url and end_image_url parameters
  2. LoRA fine-tuning on the Quality tier for true character lock across shots
  3. LTXV Adain Latent ComfyUI node for extracting style characteristics from reference latents
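Path 1 amounts to a chain in which each segment's end frame becomes the next segment's start frame. A minimal sketch that builds the request list is shown below. It assumes the `image_url` and `end_image_url` parameter names given above and reuses the other payload keys from the cloud tutorial; whether a given endpoint accepts `end_image_url` should be checked against that endpoint's schema, and the URLs in the usage example are hypothetical.

```python
def chain_keyframes(prompts: list[str], keyframe_urls: list[str], *,
                    duration_s: float = 5, resolution: str = "1080p") -> list[dict]:
    """Build one request per segment; segment i runs keyframe i -> keyframe i + 1."""
    if len(keyframe_urls) != len(prompts) + 1:
        raise ValueError("need exactly one more keyframe than segment prompts")
    return [
        {
            "prompt": prompt,
            "image_url": keyframe_urls[i],          # start frame
            "end_image_url": keyframe_urls[i + 1],  # end frame, reused as next start
            "duration": duration_s,
            "resolution": resolution,
            "generate_audio": True,
        }
        for i, prompt in enumerate(prompts)
    ]


if __name__ == "__main__":
    urls = [f"https://example.com/keyframe-{i}.png" for i in range(3)]  # hypothetical
    for segment in chain_keyframes(["She enters the cafe.", "She sits by the window."], urls):
        print(segment["image_url"], "->", segment["end_image_url"])
```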

LTX Studio additionally exposes the “Flux Kontext” tool which lets users pull the pose from Image A while adopting the lighting profile of Image B, per the LTX Studio multi-image references blog post.


LTX-2 vs Veo 3.1, Kling 3.0, Seedance 2.0, Wan 2.7, Happy Horse 1.0

Answer capsule. LTX-2.3 wins clearly on open weights, native 4K at 50 fps, longest duration ceiling (20 seconds), cheapest cloud per-second rate with audio, and the only capped IP indemnification path among non-Veo True Models. It loses clearly on dialog-grade audio (Veo 3.1’s 48 kHz dialogue beats LTX’s 24 kHz), multi-subject physics under load (Seedance 2.0 wins), rigid character lock across 6+ shot storyboards (Kling 3.0 wins), and raw blind-test quality (Happy Horse 1.0 leads on the Artificial Analysis Video Arena).

Infographic: LTX-2.3 vs the True Model video lineup 2026 (Veo 3.1, Kling 3.0, Seedance 2.0, Wan 2.7, Happy Horse 1.0), head-to-head capability matrix.

Capability matrix

| Capability | LTX-2.3 | Veo 3.1 | Kling 3.0 | Seedance 2.0 | Wan 2.7 | Happy Horse 1.0 |
|---|---|---|---|---|---|---|
| Max duration | 20s | 8s native, extendable | 10s | 15s | 6-10s | 5-10s |
| Max resolution | Native 4K@50fps | 1080p | 1080p Pro | 720p native, 1080p upscale | 1080p | 1080p |
| Native audio | 24 kHz stereo | 48 kHz dialogue | None | Dual-branch | Post-composited | Native lip-sync |
| Character consistency | Moderate (start/end frame) | Strong | Strong (rigid lock) | Strong (12-file omni-ref) | Moderate | Strong |
| Multi-subject physics | Moderate | Strong | Moderate | Strong | Moderate | Moderate |
| Open weights | Yes | No | No | No | Yes | No |
| IP indemnification (primary cloud) | Capped (lower of fees or 1M USD, direct API only) | Yes (uncapped, Vertex AI) | No | No | No (Apache 2.0 weights carry no indemnity) | Pending |
| fal.ai cost per second, 1080p (Fast/Pro) | 0.04 / 0.06-0.08 USD | 0.40 USD Quality | ~0.11 USD | 0.30 USD Standard | 0.04-0.10 USD | Pending |

When LTX-2 is the right tool

  • B-roll arbitrage: 5-second 1080p clip costs 0.20 USD on Fast versus 2.00 USD on Veo 3.1 Quality, a 10x cost differential at viewing-distance-equivalent quality. For Pro I2V the differential is 6.7x; for Pro T2V it’s 5x; for Quality it’s 3.3x.
  • Ambient and establishing shots where no on-camera dialog is needed
  • Privacy-locked workflows for medical, legal, or defense work that cannot transmit data to external cloud servers (self-host on RTX 6000 Ada is the standard pattern)
  • UGC ad pipelines where character + dialog under 20 seconds is the deliverable shape
  • High-volume internal content at break-even volumes above 60 minutes finished output per month on RTX 5090 self-host

When LTX-2 is the wrong tool

  • Client hero shots requiring uncapped IP indemnification: Veo 3.1 via Google Vertex AI wins under Google Cloud Service Specific Terms Section 14 with no capped liability
  • Dialog-grade hero shots: Veo 3.1’s 48 kHz synchronized dialogue beats LTX-2.3’s 24 kHz stereo for broadcast-quality speech
  • Extended narrative sequences past 20 seconds: Veo 3.1’s chain-extension capability beats LTX-2.3’s hard 20-second ceiling
  • Rigid character lock across 6+ shot storyboards: Kling 3.0’s character consistency mechanism wins
  • Multi-subject combat or contact physics: Seedance 2.0 wins for fight choreography and intricate collision

For the broader True Model lineup framing, see the Best AI Video Tools 2026 Tech Stack pillar. For the deepest comparison data on the Seedance side, see Seedance vs Kling vs Veo and the Seedance vs Kling vs Sora 2 API comparison.


Use Cases and Operator Stories

Answer capsule. LTX-2.3 has fundamentally altered agency B-roll economics. The most prominent production use case is high-volume background generation routed away from Veo 3.1 Quality to LTX-2.3 Fast on fal.ai. A standard 60-second social campaign requiring six 10-second clips costs 2.40 USD on Fast or 3.60 USD on Pro image-to-video, versus 24.00 USD on Veo 3.1 Quality. Local privacy-locked workflows on RTX 6000 Ada hardware are the second major use case for enterprise operators handling film studio or defense contractor IP.

Five production use cases

  1. B-roll arbitrage: routing background plates, establishing shots, and ambient transition clips from Veo or Kling to LTX-2.3 Fast. The 10x cost cut at indistinguishable viewing-distance quality is the highest-leverage routing decision in the 2026 stack.

  2. Ambient and establishing scenes without dialogue: 5-10 second cinematic plates where audio quality is secondary. LTX-2.3 Fast at 1080p with generate_audio: false delivers usable output at 0.04 USD per second.

  3. UGC ad pipelines (15-30 seconds with character + dialog): LTX-2.3 hits all three requirements simultaneously (audio with passable dialog quality, character lock within a single ad via start/end frame chaining, duration above 10 seconds).

  4. Character reels via LoRA fine-tuning: the Quality tier with LoRA support enables true character persistence across multiple shots, at 0.0027075 USD per megapixel of generated video data. A 5-second 1080p character shot runs approximately 0.68 USD.

  5. Privacy-locked enterprise workflows: legal, medical, defense, and pre-release film studio clients cannot legally transmit proprietary prompt data or reference imagery to external APIs. Self-host on RTX 6000 Ada or A100 hardware in air-gapped environments is the standard solution.

Outreach: operator case studies wanted

AI Video Bootcamp is sourcing operator case studies for v2 of this guide. Operators running LTX-2.3 in production for client work, especially with documented monthly finished-output volume and a cost-per-second comparison versus prior cloud spend, are invited to reach out via the AI Video Bootcamp homepage contact form. Specific data points being collected: GPU hardware in use, monthly finished output volume, cost-per-second versus prior cloud spend, and one anonymized client deliverable example.


Community Pulse: What Operators Are Saying

Answer capsule. Sentiment across r/StableDiffusion, r/comfyui, and the broader operator community is polarized but predominantly positive. The most-cited praise focuses on the audio capabilities (bypassing secondary TTS tools) and the Gemma 3 text encoder’s prompt adherence. The most-cited frustrations focus on VRAM constraints, the gap between marketing claims (12 GB minimum) and operational reality (16 GB true minimum for audio sync), and the instability of ComfyUI-LTXVideo custom nodes during version updates.

The launch r/StableDiffusion thread for LTX-2 hit approximately 700 upvotes within 48 hours and remains the most-cited community reference. The LTX-2 Hugging Face model page has accumulated over 948,000 monthly downloads, the highest among open-weights video models in 2026.

Lightricks co-founder and CEO Zeev Farbman ran an AMA on r/StableDiffusion announcing the open-source release, stating: “We just open-sourced LTX-2, a production-ready audio-video AI model.” The thread sits at /r/StableDiffusion/comments/1q7dzq2 with deep technical Q&A.

Recurring undocumented prompting tricks from community sources:

  • Anchor sentence opener: starting the prompt with a high-detail first sentence anchors the diffusion process and improves overall coherence
  • Quantization stacking: combining the Q4_K_M GGUF LTX checkpoint with the FP8-scaled Gemma encoder and a rank175_fp8 distilled LoRA at CFG 4 and 20 steps delivers official-demo-quality output on consumer hardware (RTX 4090)
  • Prose temporal connectors: replacing bracket timing markers (SHOT 1 (0-2s):) with prose (A beat of silence, then...) eliminates the on-screen text rendering bug

Cloud-Managed vs Self-Host: Decision Tree

Answer capsule. Self-host LTX-2.3 if the operator has an RTX 5090 already, generates more than 37 minutes of finished output per month, AND can accept zero IP indemnification. Use fal.ai Fast for any cloud workflow under that volume threshold. Use Lightricks LTX Studio for non-technical creators. Use the sales-gated Lightricks direct API when capped IP indemnification is required.

Operator decision tree

  1. Do you generate client deliverables requiring uncapped IP indemnification?
    • Yes -> Route hero shots through Veo 3.1 on Vertex AI. Use LTX-2.3 only for B-roll and internal work.
    • No -> Continue.
  2. Do you generate more than 60 minutes of finished output per month?
    • Yes -> RTX 5090 self-host on NVFP4 wins on cost.
    • No -> Continue.
  3. Do you have an RTX 5090 already?
    • Yes -> Self-host on NVFP4 still wins for incremental output cost.
    • No -> fal.ai Fast at 0.04 USD per second is the default.
  4. Are you a non-technical creator?
    • Yes -> LTX Studio Standard at 28 USD per month with commercial license.
    • No -> fal.ai Fast or Pro tier on the developer dashboard.
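The decision tree above can be expressed as a small function. This is a sketch: the `Operator` fields and the route strings are hypothetical labels, and `volume_threshold_minutes` defaults to the tree's 60 minutes (the answer capsule uses 37), so treat the value as a tunable assumption.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    needs_uncapped_indemnification: bool
    monthly_output_minutes: float
    owns_rtx_5090: bool
    non_technical: bool


def recommend_route(op: Operator, volume_threshold_minutes: float = 60) -> str:
    # Step 1: uncapped indemnification is not available from any LTX-2.3 path.
    if op.needs_uncapped_indemnification:
        return "Veo 3.1 on Vertex AI for hero shots; LTX-2.3 for B-roll and internal work"
    # Step 2: high monthly volume favors local NVFP4 inference.
    if op.monthly_output_minutes > volume_threshold_minutes:
        return "Self-host on RTX 5090 (NVFP4)"
    # Step 3: an already-owned RTX 5090 makes incremental output cheapest locally.
    if op.owns_rtx_5090:
        return "Self-host on RTX 5090 (NVFP4)"
    # Step 4: without local hardware, route by technical comfort.
    if op.non_technical:
        return "LTX Studio Standard (28 USD/month)"
    return "fal.ai Fast or Pro"
```

Keeping the threshold as a parameter makes it easy to plug in the break-even figure that matches an operator's actual self-host costs rather than a fixed rule of thumb.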

Cloud host inventory

Host | Target user | Pricing model | Indemnification
fal.ai | Developer | Per second (Fast/Pro) or per megapixel (Quality) | None
Replicate | Developer | Per video-second on retake endpoint | None
LTX Studio | Non-technical creator | Monthly credit subscription | None for consumer plans
Lightricks direct API | Enterprise buyer | Sales-gated per-second | Capped at lower of fees or 1M USD
Krea AI | Designer | Seat subscription bundle | None
WaveSpeed, MindStudio, Apatero | Niche workflows | Varies | None

The AI Video Bootcamp team is building PromptWise, a managed-hosting platform for AI image, video, and sound generation launching soon. PromptWise will host LTX-2.3 alongside the broader True Models lineup as a curriculum-aligned cloud option for AVB members who want the LTX-2 cost advantages without ComfyUI self-host setup or per-vendor API key management across fal.ai, Replicate, and the Lightricks direct API. This section will be updated with pricing and signup details closer to launch.


Compliance and Commercial Use

Answer capsule. LTX-2.3 weights ship under the LTX-2 Community License Agreement, which is free for commercial use under 10 million USD annual revenue but is NOT Apache 2.0 despite frequent misattribution. IP indemnification is zero on self-host, fal.ai, Replicate, and LTX Studio consumer plans; only the sales-gated Lightricks direct API offers indemnification, capped at the lower of annual fees paid or 1 million USD. C2PA Content Credentials are NOT embedded by default in any access path, creating a significant compliance gap once EU AI Act Article 50 enforcement starts on August 2, 2026.

Infographic: LTX-2.3 IP indemnification by access path. Self-host (Hugging Face weights), fal.ai, Replicate, and LTX Studio consumer plans (Free, Lite, Standard, Pro) offer zero indemnification; the sales-gated Lightricks direct API offers indemnification capped at the lower of annual fees or 1M USD. Recommended pairing for client work: hero shots through Veo 3.1 on Vertex AI (uncapped indemnification), B-roll through LTX-2.3.

The LTX-2 Community License (read carefully)

The license file at github.com/Lightricks/LTX-2/blob/main/LICENSE does not meet the Open Source Initiative definition of open source. Three defining operational constraints:

  1. Revenue cap: free commercial use only for entities generating less than 10 million USD in annual revenue across all subsidiaries and affiliates under common corporate control. Entities exceeding this threshold must negotiate a paid Commercial Use Agreement directly with Lightricks.
  2. Asymmetric indemnification: redistributors must defend Lightricks against claims arising from their use. Lightricks does not indemnify free-weight users in the reverse direction.
  3. Redistribution restrictions: external API exposure may be treated as redistribution depending on clause interpretation.

Third-party blog posts describing LTX-2 as “Apache 2.0” or “fully open source” are incorrect.
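The revenue cap in constraint 1 applies to combined revenue across the corporate group, which is easy to miss when a small studio sits inside a larger parent. A minimal sketch of that arithmetic only, assuming the simplified reading summarized above (the license text itself governs); the function and entity names are hypothetical.

```python
from typing import Mapping

# Constraint 1 as summarized above: free commercial use only below this figure.
REVENUE_CAP_USD = 10_000_000


def free_commercial_use(revenue_usd_by_entity: Mapping[str, float]) -> bool:
    """Return True if combined annual revenue is under the cap.

    revenue_usd_by_entity should list the operating entity plus every
    subsidiary and affiliate under common corporate control; the cap applies
    to the combined total, not to each entity separately.
    """
    return sum(revenue_usd_by_entity.values()) < REVENUE_CAP_USD
```

A studio earning 4 million USD inside a group whose parent earns 7 million USD is over the cap, even though neither entity is over it alone.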

Indemnification by access path

Access path | IP indemnification
Self-host from Hugging Face | None. Operator bears all risk.
fal.ai endpoints | None disclosed. Pass-through inference.
Replicate endpoints | None disclosed.
LTX Studio consumer (Free, Lite, Standard, Pro) | None.
LTX Studio Enterprise | Custom terms; unverified without a contract.
Lightricks direct API (sales-gated) | Yes. Capped at lower of annual fees or 1M USD per the LTX 2 API License Agreement.
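The only non-zero row reduces to a simple formula: the cap is the lower of annual fees paid and 1 million USD. A minimal sketch, assuming the access-path keys below (hypothetical labels) and returning `None` for Enterprise, where terms depend on the contract.

```python
from typing import Optional

DIRECT_API_CAP_USD = 1_000_000.0
ZERO_INDEMNITY_PATHS = {"self_host", "fal_ai", "replicate", "ltx_studio_consumer"}


def indemnity_cap_usd(access_path: str, annual_fees_paid_usd: float = 0.0) -> Optional[float]:
    """Maximum IP indemnification available on a given access path."""
    if access_path == "lightricks_direct_api":
        # Capped at the lower of annual fees paid or 1M USD.
        return min(annual_fees_paid_usd, DIRECT_API_CAP_USD)
    if access_path == "ltx_studio_enterprise":
        return None  # custom terms: only the signed contract can answer this
    if access_path in ZERO_INDEMNITY_PATHS:
        return 0.0
    raise ValueError(f"unknown access path: {access_path}")
```

The practical consequence: a buyer paying 250,000 USD a year on the direct API is covered for at most 250,000 USD, not 1 million USD.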

For the broader 2026 indemnification picture across the AI image market, see the AI Image Generators A-Z Encyclopedia. The three-paths-one-trap framing applies in modified form to LTX-2.

EU AI Act Article 50 (effective August 2, 2026)

Article 50 mandates machine-readable disclosure that content is AI-generated. Penalty ceiling: up to 15 million EUR or 3 percent of total worldwide annual turnover, whichever is higher. LTX-2.3 does not embed C2PA Content Credentials by default in any access path. Operators serving EU clients after August 2, 2026 must add C2PA manifests manually via c2patool or the Adobe CAI SDK. This is the single biggest operator-facing compliance gotcha. The AI Disclosure Compliance 2026 pillar documents the workflow.
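A post-render step can write a minimal manifest declaring the clip as AI-generated and hand it to c2patool. This is a sketch under stated assumptions: it uses the `c2pa.actions` assertion with the IPTC `trainedAlgorithmicMedia` source type and the `c2patool <input> -m <manifest> -o <output>` invocation; manifest field names and signing configuration vary between c2patool versions, and production use needs your own signing certificate rather than the tool's test credentials. The claim-generator string and helper names are hypothetical.

```python
import json
import subprocess
from pathlib import Path
from typing import List

# IPTC digital source type for media produced by a trained generative model.
TRAINED_ALGORITHMIC_MEDIA = (
    "http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia"
)


def build_manifest(claim_generator: str) -> dict:
    """Minimal manifest: one c2pa.actions assertion marking AI creation."""
    return {
        "claim_generator": claim_generator,
        "assertions": [
            {
                "label": "c2pa.actions",
                "data": {
                    "actions": [
                        {
                            "action": "c2pa.created",
                            "digitalSourceType": TRAINED_ALGORITHMIC_MEDIA,
                        }
                    ]
                },
            }
        ],
    }


def c2patool_command(video_in: str, manifest_path: str, video_out: str) -> List[str]:
    return ["c2patool", video_in, "-m", manifest_path, "-o", video_out]


def add_credentials(video_in: str, video_out: str,
                    claim_generator: str = "example-pipeline/1.0",
                    run: bool = True) -> List[str]:
    # Write the manifest next to the output file, then invoke c2patool.
    manifest_path = Path(video_out).with_suffix(".manifest.json")
    manifest_path.write_text(json.dumps(build_manifest(claim_generator), indent=2))
    cmd = c2patool_command(video_in, str(manifest_path), video_out)
    if run:
        subprocess.run(cmd, check=True)  # raises if c2patool reports an error
    return cmd
```

Running this as the last step after the final render, rather than before editing, avoids the credentials being stripped by a later export.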

California AB 853 (operative August 2, 2026)

AB 853 enforces both manifest (visible) disclosures and latent (embedded cryptographic) metadata on AI-generated video. Stripping embedded metadata during export constitutes a direct violation. The civil penalty is 5,000 USD per violation per day, enforced by the California Attorney General. Operators must ensure IPTC 2025.1 headers and C2PA signatures survive the final MP4 render through Adobe Premiere, DaVinci Resolve, or CapCut export.
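A quick way to catch an editor that silently drops Content Credentials is to inspect the exported MP4's top-level boxes. The sketch below walks ISO BMFF box headers and looks for a `uuid` box carrying the C2PA extended type; the specific UUID value is an assumption to verify against the C2PA specification, and the check only confirms the box is present, not that the signature still validates (use c2patool for that). Function names are hypothetical.

```python
import struct
import uuid
from typing import Iterator, Tuple

# Assumption: C2PA embeds its manifest store in an ISO BMFF 'uuid' box with this
# extended type. Verify the value against the C2PA specification you target.
C2PA_BOX_UUID = uuid.UUID("d8fec3d6-1b0e-483c-9297-5828877ec481")


def iter_top_level_boxes(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (box_type, extended_type) for each top-level MP4/ISO BMFF box.

    extended_type is the 16-byte UUID for 'uuid' boxes and b"" otherwise.
    """
    offset = 0
    while offset + 8 <= len(data):
        size, raw_type = struct.unpack(">I4s", data[offset:offset + 8])
        header = 8
        if size == 1:  # 64-bit largesize follows the 4-byte type
            if offset + 16 > len(data):
                raise ValueError(f"truncated largesize at offset {offset}")
            (size,) = struct.unpack(">Q", data[offset + 8:offset + 16])
            header = 16
        elif size == 0:  # box runs to the end of the file
            size = len(data) - offset
        if size < header or offset + size > len(data):
            raise ValueError(f"malformed box at offset {offset}")
        box_type = raw_type.decode("latin-1")
        extended = data[offset + header:offset + header + 16] if box_type == "uuid" else b""
        yield box_type, extended
        offset += size


def contains_c2pa_box(data: bytes) -> bool:
    return any(
        box_type == "uuid" and extended == C2PA_BOX_UUID.bytes
        for box_type, extended in iter_top_level_boxes(data)
    )


def exported_file_has_c2pa(path: str) -> bool:
    # Reads the whole file for simplicity; stream it for very large masters.
    with open(path, "rb") as f:
        return contains_c2pa_box(f.read())
```

Running `exported_file_has_c2pa` on the Premiere, Resolve, or CapCut output (not the pre-edit source) is the point of the check.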

Training data provenance

Lightricks' public statements indicate LTX-2 was trained on licensed Getty Images and Shutterstock content. Because contractual indemnification is absent on most access paths, this licensed-data posture is the main structural IP-risk mitigant. It is meaningfully stronger than that of competitors trained on scraped public web data, but it is not a substitute for vendor IP indemnification.

Geographic data residency

fal.ai infrastructure is US-based. The Lightricks direct API routes through Lightricks' own infrastructure, as documented in the Lightricks Trust Center. EU client work routed through US-based infrastructure requires a Transfer Impact Assessment under GDPR Article 46.


Last reviewed by Daniel Riley on June 5, 2026. For the broader 2026 AI video model landscape, see the Best AI Video Tools 2026 Tech Stack, the Seedance 2.0 Complete Guide, and the Cinematic AI Video Prompts 2026 pillar. For the open-weights trilogy comparison, see the Happy Horse 1.0 pillar. New to AI Video Bootcamp? Start with What is AI Video Bootcamp.


Frequently Asked Questions

Is LTX-2 free?
LTX-2 weights are free to download from Hugging Face under the LTX-2 Community License Agreement. Commercial use is free for organizations generating under 10 million USD in annual revenue across all affiliates. Above that threshold, a paid commercial license must be negotiated directly with Lightricks. The license is NOT Apache 2.0 despite common misattribution; indemnification flows TO Lightricks (redistributors must defend Lightricks), not from Lightricks to users.
How much does LTX-2 cost on fal.ai?
fal.ai hosts three LTX-2.3 product tiers. Fast text-to-video and image-to-video both cost 0.04 USD per second at 1080p, 0.08 USD at 1440p, and 0.16 USD at 4K. Pro image-to-video is 0.06 USD per second at 1080p and Pro text-to-video is 0.08 USD per second at 1080p. The separate Quality SKU bills per megapixel at 0.0024075 USD with LoRA fine-tuning at 0.0027075 USD per megapixel. All tiers include native audio at no extra cost.
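The rates above fold into a small per-clip estimator. A sketch, assuming the (tier, mode, resolution) keys below, which are illustrative labels rather than fal.ai endpoint IDs; the Quality SKU takes billed megapixels as an input because this sketch does not assume how fal.ai counts megapixels for video.

```python
# Per-second USD rates quoted in this FAQ for fal.ai's LTX-2.3 endpoints.
PER_SECOND_RATES = {}
for mode in ("t2v", "i2v"):
    PER_SECOND_RATES[("fast", mode, "1080p")] = 0.04
    PER_SECOND_RATES[("fast", mode, "1440p")] = 0.08
    PER_SECOND_RATES[("fast", mode, "4k")] = 0.16
PER_SECOND_RATES[("pro", "i2v", "1080p")] = 0.06
PER_SECOND_RATES[("pro", "t2v", "1080p")] = 0.08

# Quality SKU bills per megapixel; the LoRA variant costs slightly more.
QUALITY_USD_PER_MP = 0.0024075
QUALITY_LORA_USD_PER_MP = 0.0027075


def per_second_cost_usd(tier: str, mode: str, resolution: str, seconds: float) -> float:
    key = (tier, mode, resolution)
    if key not in PER_SECOND_RATES:
        raise KeyError(f"no quoted rate for {key}")
    return PER_SECOND_RATES[key] * seconds


def quality_cost_usd(billed_megapixels: float, lora: bool = False) -> float:
    # billed_megapixels must come from fal.ai's own billing definition.
    rate = QUALITY_LORA_USD_PER_MP if lora else QUALITY_USD_PER_MP
    return rate * billed_megapixels
```

A 10-second Fast 1080p clip comes to 0.40 USD and the same clip on Pro text-to-video to 0.80 USD, which is why Fast is the default for iteration.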
What VRAM do I need to run LTX-2 locally?
The widely-cited 12 GB minimum claim is wrong; it counts only video weights and omits the Gemma 3 12B text encoder which adds 8.8 GB in FP4 or 28 GB at full precision. The actual minimum for stable audio-synced 1080p generation is 16 GB on a Blackwell card such as the RTX 5080. Full 20-second 4K generation requires 24 GB minimum on an RTX 4090 or 32 GB on an RTX 5090 for headroom.
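Weight memory is simply parameter count times bits per parameter; the helper below is a minimal sketch of that nominal arithmetic (the name `weight_gb` is hypothetical). It deliberately ignores activations, attention buffers, the VAE, and framework overhead, which is exactly what weights-only VRAM claims leave out.

```python
def weight_gb(params_billion: float, bits_per_param: float) -> float:
    """Nominal weight storage in GB (1 GB = 1e9 bytes) for a dense model.

    Counts weights only and assumes every tensor uses the same precision.
    """
    return params_billion * bits_per_param / 8
```

`weight_gb(22, 4)` gives 11.0 GB for the full 22B model at 4 bits, and `weight_gb(12, 16)` gives 24.0 GB for a 12B text encoder at 16 bits. Shipped checkpoints usually run somewhat larger than these nominal figures because some tensors are typically kept at higher precision, and none of these numbers include activation memory.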
Does LTX-2 have an API?
Yes. LTX-2.3 is accessible through fal.ai (recommended developer path), Replicate (lightricks org), the sales-gated Lightricks direct API at console.ltx.video, and through partner platforms including WaveSpeed, MindStudio, Apatero, and Krea AI (added April 30, 2026). The Lightricks direct API is the only access path with documented IP indemnification, capped at the lower of annual fees paid or 1 million USD per the LTX 2 API License Agreement.
Is LTX-2 open source?
Technically no, despite frequent marketing claims. The LTX-2 Community License Agreement does not meet the Open Source Initiative definition because it includes a commercial-use revenue cap (10 million USD ARR), redistribution restrictions, and asymmetric indemnification terms. The weights are “source-available” for academic and small-commercial use. True OSI-compliant open source under Apache 2.0 in the 2026 video model lineup includes Wan 2.7 and parts of the Hunyuan family.