ComfyUI HunyuanVideo Guide: Text-to-Video Generation Setup
How to set up and run Tencent's HunyuanVideo model in ComfyUI for text-to-video generation — model downloads, workflow setup, and optimization tips.
What is HunyuanVideo?
HunyuanVideo is Tencent's open-source video generation model. It produces high-quality videos from text descriptions with good motion coherence and visual fidelity. It's one of the first major video models to receive native ComfyUI support.
Hardware Requirements
| Configuration | VRAM | Notes |
|---|---|---|
| BF16 (full precision) | 24 GB+ | Best quality |
| FP8 weight type | 12–16 GB | Good quality with lower VRAM |
| Reduced resolution + FP8 | 8–12 GB | Usable but slower |
Model Download
Download these files and place them in the corresponding ComfyUI folders:
Diffusion Model
| File | Size | Location | Download |
|---|---|---|---|
| hunyuan_video_t2v_720p_bf16.safetensors | ~25.6 GB | models/diffusion_models/ | HuggingFace |
Text Encoders
| File | Size | Location | Download |
|---|---|---|---|
| clip_l.safetensors | ~246 MB | models/text_encoders/ | HuggingFace |
| llava_llama3_fp8_scaled.safetensors | ~9 GB | models/text_encoders/ | HuggingFace |
VAE
| File | Size | Location | Download |
|---|---|---|---|
| hunyuan_video_vae_bf16.safetensors | ~493 MB | models/vae/ | HuggingFace |
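Once the downloads above are done, a quick sanity check can confirm that every file landed in the right folder. This is a minimal sketch, not part of ComfyUI itself; the root path in the usage comment is an assumption you should replace with your own install location.

```python
from pathlib import Path

# Expected files from the download tables above, relative to the ComfyUI root.
EXPECTED = [
    "models/diffusion_models/hunyuan_video_t2v_720p_bf16.safetensors",
    "models/text_encoders/clip_l.safetensors",
    "models/text_encoders/llava_llama3_fp8_scaled.safetensors",
    "models/vae/hunyuan_video_vae_bf16.safetensors",
]

def missing_models(comfyui_root):
    """Return the expected model files that are not present under comfyui_root."""
    root = Path(comfyui_root)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]

# Example usage (path is hypothetical -- point it at your own install):
# for rel in missing_models("/opt/ComfyUI"):
#     print("missing:", rel)
```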
Directory Structure
ComfyUI/
├── models/
│ ├── diffusion_models/
│ │ └── hunyuan_video_t2v_720p_bf16.safetensors
│ ├── text_encoders/
│ │ ├── clip_l.safetensors
│ │ └── llava_llama3_fp8_scaled.safetensors
│ └── vae/
│       └── hunyuan_video_vae_bf16.safetensors
Supported Resolutions
HunyuanVideo supports multiple aspect ratios:
| Resolution | 9:16 | 16:9 | 1:1 |
|---|---|---|---|
| 540p | 544x960 | 960x544 | 720x720 |
| 720p (recommended) | 720x1280 | 1280x720 | 960x960 |
The frame count is typically 73 or 129.
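The specific counts 73 and 129 come from the VAE's temporal compression: HunyuanVideo's VAE compresses the time axis by a factor of 4, so valid frame counts have the form 4k + 1 (73 = 4×18 + 1, 129 = 4×32 + 1). A small sketch of that rule, assuming the 4× factor:

```python
# Assumes HunyuanVideo's 4x temporal VAE compression, so valid frame
# counts have the form 4*k + 1.
def is_valid_frame_count(n: int) -> bool:
    return n >= 1 and (n - 1) % 4 == 0

def latent_frames(n: int) -> int:
    """Number of latent frames the sampler actually works on."""
    return (n - 1) // 4 + 1

print(is_valid_frame_count(73), latent_frames(73))    # True 19
print(is_valid_frame_count(129), latent_frames(129))  # True 33
```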
Workflow Setup
Key Nodes
- UNET Loader — loads `hunyuan_video_t2v_720p_bf16.safetensors`. Set weight type to `default` (or `fp8` for lower VRAM)
- DualCLIPLoader — loads both `clip_l.safetensors` and `llava_llama3_fp8_scaled.safetensors`, type set to `hunyuan_video`
- VAE Loader — loads the VAE model
- EmptyHunyuanLatentVideo — sets video dimensions and frame count
- CLIP Text Encode — your video description prompt
- FluxGuidance — controls prompt adherence (default: 6.0)
- KSampler — sampler: `euler`, scheduler: `simple`, steps: 20–30
- VAEDecodeTiled — decodes video (use tiled version for memory efficiency)
- Save Animated WEBP — saves the output
Use VAEDecodeTiled instead of VAEDecode — it processes the video in tiles and uses significantly less memory. Set tile_size to 256 and overlap to 64. Reduce these values if you run into memory issues.
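The node list above can be sketched in ComfyUI's API (JSON) format, which shows how the pieces wire together. Node IDs and exact input names here are illustrative assumptions — export your own workflow with "Save (API Format)" to get the authoritative field names for your ComfyUI version.

```python
# Illustrative fragment of a HunyuanVideo workflow in ComfyUI API format.
# An input value of ["node_id", output_index] links one node to another.
workflow = {
    "1": {"class_type": "UNETLoader",
          "inputs": {"unet_name": "hunyuan_video_t2v_720p_bf16.safetensors",
                     "weight_dtype": "default"}},
    "2": {"class_type": "DualCLIPLoader",
          "inputs": {"clip_name1": "clip_l.safetensors",
                     "clip_name2": "llava_llama3_fp8_scaled.safetensors",
                     "type": "hunyuan_video"}},
    "3": {"class_type": "EmptyHunyuanLatentVideo",
          "inputs": {"width": 1280, "height": 720,
                     "length": 73, "batch_size": 1}},
    "4": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "a girl walking through a field of sunflowers",
                     "clip": ["2", 0]}},
    "5": {"class_type": "FluxGuidance",
          "inputs": {"guidance": 6.0, "conditioning": ["4", 0]}},
}
```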
Running the Workflow
- Verify all models are loaded in the correct nodes
- Set video dimensions and frame count in EmptyHunyuanLatentVideo
- Write a detailed prompt describing the scene, motion, and style
- Click Run (`Ctrl+Enter`)
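Instead of clicking Run, you can also queue a workflow programmatically against a running ComfyUI server via its `/prompt` HTTP endpoint. A minimal sketch, assuming the default server address `127.0.0.1:8188`:

```python
import json
import urllib.request

def build_payload(workflow: dict) -> bytes:
    """Serialize an API-format workflow into the body /prompt expects."""
    return json.dumps({"prompt": workflow}).encode("utf-8")

def queue_prompt(workflow: dict, server: str = "127.0.0.1:8188") -> dict:
    """Queue a workflow on a running ComfyUI server; returns the server's reply."""
    req = urllib.request.Request(
        f"http://{server}/prompt",
        data=build_payload(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```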
Prompt Tips
HunyuanVideo responds well to structured prompts:
[Subject], [Action], [Scene], [Style], [Quality]
Example:
a girl with long black hair wearing a white dress walking through
a field of sunflowers, golden hour lighting, cinematic composition,
high quality, detailed
Use detailed English descriptions. Include motion words (walking, running, flowing) for better animation.
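The template above is easy to automate when generating many clips. A small helper, with hypothetical default style and quality strings taken from the example:

```python
def build_prompt(subject: str, action: str, scene: str,
                 style: str = "cinematic composition",
                 quality: str = "high quality, detailed") -> str:
    """Assemble a [Subject], [Action], [Scene], [Style], [Quality] prompt."""
    return ", ".join([subject, action, scene, style, quality])

prompt = build_prompt(
    "a girl with long black hair wearing a white dress",
    "walking through a field of sunflowers",
    "golden hour lighting",
)
```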
Memory Optimization
If you're running into VRAM limits:
- Switch to FP8 — In the UNET Loader, change weight type to `fp8_e4m3fn`
- Reduce VAEDecodeTiled settings — Lower tile_size to 128, overlap to 32
- Lower resolution and frame count — Use 540p instead of 720p, fewer frames
- Close other applications — HunyuanVideo is memory-intensive
Common Issues and Fixes
Out of memory
- Use FP8 weight type in UNET Loader
- Reduce tile_size and overlap in VAEDecodeTiled
- Use 540p resolution with 73 frames instead of 129
Slow generation
- Video generation is inherently slow — a 5-second clip can take 10–30 minutes depending on hardware
- Reduce steps to 20 and use fewer frames
- Use lower resolution for test runs, then increase for final output
Poor video quality
- Increase sampling steps to 25–30
- Adjust FluxGuidance value (try 5.0–8.0)
- Write more detailed, specific prompts
- Try different samplers (euler_ancestral, dpmpp_2m)
EmptyHunyuanLatentVideo node not found
- Update ComfyUI to the latest version — this node was added in a recent update
Related Guides
- Wan Video Guide — Alibaba's video generation models
- FramePack Guide — Low-VRAM video generation
- Text to Image — Basic image generation concepts