ComfyUI HunyuanVideo Guide: Text-to-Video Generation Setup
How to set up and run Tencent's HunyuanVideo model in ComfyUI for text-to-video generation — model downloads, workflow setup, and optimization tips.
What is HunyuanVideo?
HunyuanVideo is Tencent's open-source video generation model. It produces high-quality videos from text descriptions with good motion coherence and visual fidelity. It's one of the first major video models to receive native ComfyUI support.
Hardware Requirements
| Configuration | VRAM | Notes |
|---|---|---|
| BF16 (full precision) | 24 GB+ | Best quality |
| FP8 weight type | 12–16 GB | Good quality with lower VRAM |
| Reduced resolution + FP8 | 8–12 GB | Usable but slower |
Model Download
Download these files and place them in the corresponding ComfyUI folders:
Diffusion Model
| File | Size | Location | Download |
|---|---|---|---|
| hunyuan_video_t2v_720p_bf16.safetensors | ~25.6 GB | models/diffusion_models/ | HuggingFace |
Text Encoders
| File | Size | Location | Download |
|---|---|---|---|
| clip_l.safetensors | ~246 MB | models/text_encoders/ | HuggingFace |
| llava_llama3_fp8_scaled.safetensors | ~9 GB | models/text_encoders/ | HuggingFace |
VAE
| File | Size | Location | Download |
|---|---|---|---|
| hunyuan_video_vae_bf16.safetensors | ~493 MB | models/vae/ | HuggingFace |
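Once the downloads above are done, a quick sanity check can confirm that every file landed in the right folder. This is a minimal sketch, not part of ComfyUI itself; the root path in the usage comment is an assumption you should replace with your own install location.

```python
from pathlib import Path

# Expected files from the download tables above, relative to the ComfyUI root.
EXPECTED = [
    "models/diffusion_models/hunyuan_video_t2v_720p_bf16.safetensors",
    "models/text_encoders/clip_l.safetensors",
    "models/text_encoders/llava_llama3_fp8_scaled.safetensors",
    "models/vae/hunyuan_video_vae_bf16.safetensors",
]

def missing_models(comfyui_root):
    """Return the expected model files that are not present under comfyui_root."""
    root = Path(comfyui_root)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]

# Example usage (path is hypothetical -- point it at your own install):
# for rel in missing_models("/opt/ComfyUI"):
#     print("missing:", rel)
```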
Directory Structure
ComfyUI/
├── models/
│ ├── diffusion_models/
│ │ └── hunyuan_video_t2v_720p_bf16.safetensors
│ ├── text_encoders/
│ │ ├── clip_l.safetensors
│ │ └── llava_llama3_fp8_scaled.safetensors
│ └── vae/
│       └── hunyuan_video_vae_bf16.safetensors
Supported Resolutions
HunyuanVideo supports multiple aspect ratios:
| Resolution | 9:16 | 16:9 | 1:1 |
|---|---|---|---|
| 540p | 544x960 | 960x544 | 720x720 |
| 720p (recommended) | 720x1280 | 1280x720 | 960x960 |
The frame count is typically 73 or 129.
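The specific counts 73 and 129 come from the VAE's temporal compression: HunyuanVideo's VAE compresses the time axis by a factor of 4, so valid frame counts have the form 4k + 1 (73 = 4×18 + 1, 129 = 4×32 + 1). A small sketch of that rule, assuming the 4× factor:

```python
# Assumes HunyuanVideo's 4x temporal VAE compression, so valid frame
# counts have the form 4*k + 1.
def is_valid_frame_count(n: int) -> bool:
    return n >= 1 and (n - 1) % 4 == 0

def latent_frames(n: int) -> int:
    """Number of latent frames the sampler actually works on."""
    return (n - 1) // 4 + 1

print(is_valid_frame_count(73), latent_frames(73))    # True 19
print(is_valid_frame_count(129), latent_frames(129))  # True 33
```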
Workflow Setup
Key Nodes
- UNET Loader — loads `hunyuan_video_t2v_720p_bf16.safetensors`. Set weight type to `default` (or `fp8` for lower VRAM)
- DualCLIPLoader — loads both `clip_l.safetensors` and `llava_llama3_fp8_scaled.safetensors`, type set to `hunyuan_video`
- VAE Loader — loads the VAE model
- EmptyHunyuanLatentVideo — sets video dimensions and frame count
- CLIP Text Encode — your video description prompt
- FluxGuidance — controls prompt adherence (default: 6.0)
- KSampler — sampler: `euler`, scheduler: `simple`, steps: 20–30
- VAEDecodeTiled — decodes video (use tiled version for memory efficiency)
- Save Animated WEBP — saves the output
Use VAEDecodeTiled instead of VAEDecode — it processes the video in tiles and uses significantly less memory. Set tile_size to 256 and overlap to 64. Reduce these values if you run into memory issues.
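The node list above can be sketched in ComfyUI's API (JSON) format, which shows how the pieces wire together. Node IDs and exact input names here are illustrative assumptions — export your own workflow with "Save (API Format)" to get the authoritative field names for your ComfyUI version.

```python
# Illustrative fragment of a HunyuanVideo workflow in ComfyUI API format.
# An input value of ["node_id", output_index] links one node to another.
workflow = {
    "1": {"class_type": "UNETLoader",
          "inputs": {"unet_name": "hunyuan_video_t2v_720p_bf16.safetensors",
                     "weight_dtype": "default"}},
    "2": {"class_type": "DualCLIPLoader",
          "inputs": {"clip_name1": "clip_l.safetensors",
                     "clip_name2": "llava_llama3_fp8_scaled.safetensors",
                     "type": "hunyuan_video"}},
    "3": {"class_type": "EmptyHunyuanLatentVideo",
          "inputs": {"width": 1280, "height": 720,
                     "length": 73, "batch_size": 1}},
    "4": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "a girl walking through a field of sunflowers",
                     "clip": ["2", 0]}},
    "5": {"class_type": "FluxGuidance",
          "inputs": {"guidance": 6.0, "conditioning": ["4", 0]}},
}
```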
Running the Workflow
- Verify all models are loaded in the correct nodes
- Set video dimensions and frame count in EmptyHunyuanLatentVideo
- Write a detailed prompt describing the scene, motion, and style
- Click Run (`Ctrl+Enter`)
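Instead of clicking Run, you can also queue a workflow programmatically against a running ComfyUI server via its `/prompt` HTTP endpoint. A minimal sketch, assuming the default server address `127.0.0.1:8188`:

```python
import json
import urllib.request

def build_payload(workflow: dict) -> bytes:
    """Serialize an API-format workflow into the body /prompt expects."""
    return json.dumps({"prompt": workflow}).encode("utf-8")

def queue_prompt(workflow: dict, server: str = "127.0.0.1:8188") -> dict:
    """Queue a workflow on a running ComfyUI server; returns the server's reply."""
    req = urllib.request.Request(
        f"http://{server}/prompt",
        data=build_payload(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```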
Prompt Tips
HunyuanVideo responds well to structured prompts:
[Subject], [Action], [Scene], [Style], [Quality]
Example:
a girl with long black hair wearing a white dress walking through
a field of sunflowers, golden hour lighting, cinematic composition,
high quality, detailed
Use detailed English descriptions. Include motion words (walking, running, flowing) for better animation.
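The template above is easy to automate when generating many clips. A small helper, with hypothetical default style and quality strings taken from the example:

```python
def build_prompt(subject: str, action: str, scene: str,
                 style: str = "cinematic composition",
                 quality: str = "high quality, detailed") -> str:
    """Assemble a [Subject], [Action], [Scene], [Style], [Quality] prompt."""
    return ", ".join([subject, action, scene, style, quality])

prompt = build_prompt(
    "a girl with long black hair wearing a white dress",
    "walking through a field of sunflowers",
    "golden hour lighting",
)
```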
Memory Optimization
If you're running into VRAM limits:
- Switch to FP8 — In the UNET Loader, change weight type to `fp8_e4m3fn`
- Reduce VAEDecodeTiled settings — Lower tile_size to 128, overlap to 32
- Lower resolution and frame count — Use 540p instead of 720p, fewer frames
- Close other applications — HunyuanVideo is memory-intensive
Common Issues and Fixes
Out of memory
- Use FP8 weight type in UNET Loader
- Reduce tile_size and overlap in VAEDecodeTiled
- Use 540p resolution with 73 frames instead of 129
Slow generation
- Video generation is inherently slow — a 5-second clip can take 10–30 minutes depending on hardware
- Reduce steps to 20 and use fewer frames
- Use lower resolution for test runs, then increase for final output
Poor video quality
- Increase sampling steps to 25–30
- Adjust FluxGuidance value (try 5.0–8.0)
- Write more detailed, specific prompts
- Try different samplers (euler_ancestral, dpmpp_2m)
EmptyHunyuanLatentVideo node not found
- Update ComfyUI to the latest version — this node was added in a recent update
Related Guides
- Wan Video Guide — Alibaba's video generation models
- FramePack Guide — Low-VRAM video generation
- Text to Image — Basic image generation concepts