ComfyUI Wan Video Guide: Text-to-Video & Image-to-Video Generation
Complete guide to generating AI videos with Wan 2.1 and Wan 2.2 models in ComfyUI — model downloads, T2V and I2V workflows, and VRAM options.
What is Wan?
Wan is an open-source video generation model family from Alibaba, licensed under Apache 2.0 (commercial use allowed). It covers text-to-video (T2V) and image-to-video (I2V) generation with two main releases:
| Version | Release | Key Feature |
|---|---|---|
| Wan 2.1 | Feb 2025 | Solid baseline, 14B and 1.3B parameter versions |
| Wan 2.2 | Mid 2025 | MoE architecture, film-level aesthetics, 5B hybrid model, first-last frame generation |
Hardware Requirements
| Model | VRAM | Notes |
|---|---|---|
| Wan 2.2 5B (Hybrid) | 8 GB+ | Best entry point — supports both T2V and I2V |
| Wan 2.1/2.2 14B FP8 | 12–16 GB | Good balance |
| Wan 2.1/2.2 14B FP16 | 16–24 GB | Best quality |
| Wan 2.1 GGUF Q4 | 8 GB+ | Quantized for lower VRAM |

If you're new to video generation, start with the Wan 2.2 5B model. It handles both text-to-video and image-to-video in a single model and works on 8 GB VRAM with ComfyUI's native offloading.
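The VRAM figures above roughly track raw weight size: parameter count times bytes per parameter, plus overhead for activations and the text encoder. A quick back-of-envelope sketch (the 4.5 bits/param figure for Q4 is an approximation that includes quantization scales; offloading is why the 5B FP16 model can run on 8 GB despite ~10 GB of weights):

```python
def weight_size_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate size of model weights in GB: params x bits / 8."""
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

# Wan 14B at common precisions (GGUF Q4 uses ~4.5 bits/param incl. scales)
for label, bits in [("FP16", 16), ("FP8", 8), ("GGUF Q4", 4.5)]:
    print(f"14B {label}: ~{weight_size_gb(14, bits):.1f} GB weights")
print(f"5B FP16: ~{weight_size_gb(5, 16):.1f} GB weights")
```

Actual peak VRAM is higher than the weight size alone, since latents and the UMT5 text encoder also occupy memory during sampling.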
Wan 2.2: Recommended Starting Point
Wan 2.2 introduces a MoE (Mixture of Experts) architecture with separate high-noise and low-noise expert models for better quality. The 5B hybrid model is ideal for beginners — it handles both T2V and I2V in a single model.
Wan 2.2 5B Setup (Easiest)
Models (place in corresponding folders):
| File | Location | Download |
|---|---|---|
| wan2.2_ti2v_5B_fp16.safetensors | models/diffusion_models/ | HuggingFace |
| wan2.2_vae.safetensors | models/vae/ | HuggingFace |
| umt5_xxl_fp8_e4m3fn_scaled.safetensors | models/text_encoders/ | HuggingFace |
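One way to confirm all three files landed in the right folders before launching is a small check script. This is a sketch; `COMFY_ROOT` is an assumed install path you should adjust to your own:

```python
from pathlib import Path

COMFY_ROOT = Path("ComfyUI")  # assumed install location; adjust to yours

# Expected folder -> filename mapping from the table above
EXPECTED = {
    "models/diffusion_models": "wan2.2_ti2v_5B_fp16.safetensors",
    "models/vae": "wan2.2_vae.safetensors",
    "models/text_encoders": "umt5_xxl_fp8_e4m3fn_scaled.safetensors",
}

def check_models(root: Path) -> list[str]:
    """Return relative paths of missing model files; empty if all are in place."""
    return [f"{d}/{f}" for d, f in EXPECTED.items() if not (root / d / f).exists()]

for missing in check_models(COMFY_ROOT):
    print("missing:", missing)
```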
Workflow: Update ComfyUI to the latest version, then go to Workflows → Browse Templates → Video and select "Wan2.2 5B video generation".
Steps:
- Load the diffusion model, text encoder, and VAE in the corresponding nodes
- Write a video description in the CLIP Text Encoder node
- (Optional) Load an image for I2V mode — enable the Load Image node with `Ctrl+B`
- Adjust frame count via the `length` parameter
- Click Run (`Ctrl+Enter`)
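The same run can also be queued programmatically through ComfyUI's HTTP API (`POST /prompt` on the default port 8188). A minimal sketch, assuming a workflow JSON exported from ComfyUI in API format — the filename here is a placeholder:

```python
import json
import urllib.request

COMFY_URL = "http://127.0.0.1:8188"  # default ComfyUI server address

def build_prompt_request(workflow: dict) -> urllib.request.Request:
    """Wrap an API-format workflow dict in a POST request to /prompt."""
    data = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{COMFY_URL}/prompt",
        data=data,
        headers={"Content-Type": "application/json"},
    )

def queue_workflow(workflow: dict) -> dict:
    """Submit the workflow to a running ComfyUI instance; returns the queue response."""
    with urllib.request.urlopen(build_prompt_request(workflow)) as resp:
        return json.load(resp)  # includes the queued prompt_id

# workflow = json.load(open("wan22_5b_t2v_api.json"))  # your exported workflow
# print(queue_workflow(workflow))
```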
Wan 2.2 14B T2V Setup
For higher quality text-to-video, the 14B version uses two diffusion models (high-noise and low-noise experts):
| File | Location | Download |
|---|---|---|
| wan2.2_t2v_high_noise_14B_fp8_scaled.safetensors | models/diffusion_models/ | HuggingFace |
| wan2.2_t2v_low_noise_14B_fp8_scaled.safetensors | models/diffusion_models/ | HuggingFace |
| wan_2.1_vae.safetensors | models/vae/ | HuggingFace |
| umt5_xxl_fp8_e4m3fn_scaled.safetensors | models/text_encoders/ | Same as above |
Wan 2.2 14B I2V Setup
For image-to-video, download the I2V-specific diffusion models:
| File | Download |
|---|---|
| wan2.2_i2v_high_noise_14B_fp16.safetensors | HuggingFace |
| wan2.2_i2v_low_noise_14B_fp16.safetensors | HuggingFace |
Wan 2.2 First-Last Frame Video
A unique mode that generates a video transitioning from a start frame to an end frame. Uses the same I2V models — load two images as first and last frames, and ComfyUI interpolates the motion between them.
Wan 2.1 Setup (Alternative)
Wan 2.1 remains a solid option with broader community tooling (Kijai wrapper, GGUF versions).
ComfyUI Native T2V
| File | Location | Download |
|---|---|---|
| wan2.1_t2v_14B_fp8_e4m3fn.safetensors | models/diffusion_models/ | HuggingFace |
| umt5_xxl_fp8_e4m3fn_scaled.safetensors | models/text_encoders/ | HuggingFace |
| wan_2.1_vae.safetensors | models/vae/ | HuggingFace |
For I2V, also download:
- I2V diffusion model: 480p or 720p
- CLIP Vision: clip_vision_h.safetensors (place in
models/clip_vision/)
GGUF Version (Low VRAM)
Requires the ComfyUI-GGUF plugin.
| File | Download |
|---|---|
| T2V GGUF models | city96/Wan2.1-T2V-14B-gguf |
| I2V GGUF models | city96/Wan2.1-I2V-14B-720P-gguf |
T2V and I2V use separate diffusion models. Make sure you download the correct one for your workflow — they are not interchangeable.
Saving Videos as MP4
ComfyUI's default output is .webp. To save as MP4, install the ComfyUI-VideoHelperSuite plugin and use the Video Combine node. All generated videos are saved to ComfyUI/output/.
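If you would rather not install a plugin, an already-saved animation can be converted with ffmpeg (assuming ffmpeg is on your PATH; note that only fairly recent ffmpeg builds decode animated WebP, older ones read just the first frame). A sketch that builds the conversion command, with example filenames:

```python
import subprocess

def webp_to_mp4_cmd(src: str, dst: str, fps: int = 16) -> list[str]:
    """Build an ffmpeg command converting an animation to H.264 MP4.
    yuv420p and even dimensions keep the output widely playable."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-vf", f"scale=trunc(iw/2)*2:trunc(ih/2)*2,fps={fps}",
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        dst,
    ]

# subprocess.run(webp_to_mp4_cmd("ComfyUI/output/ComfyUI_00001_.webp", "out.mp4"), check=True)
```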
Common Issues and Fixes
Out of memory during video generation
- Use the Wan 2.2 5B model (works on 8 GB VRAM)
- Use FP8 or GGUF quantized models
- Reduce resolution (480p instead of 720p)
- Reduce frame count
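The last two fixes work because activation memory grows roughly with the latent volume: frames × (height/8) × (width/8) after VAE downscaling. A rough comparison (the ×8 spatial factor matches typical video VAEs; the channel count and per-element cost are assumptions, and Wan's VAE also compresses temporally, so treat this as scaling intuition only):

```python
def latent_elements(frames: int, height: int, width: int, ch: int = 16) -> int:
    """Approximate latent element count after 8x spatial VAE downscaling."""
    return frames * (height // 8) * (width // 8) * ch

e720 = latent_elements(81, 720, 1280)  # 81 frames at 720p
e480 = latent_elements(81, 480, 832)   # same length at 480p
print(f"720p is ~{e720 / e480:.1f}x the latent volume of 480p")
```

Halving the frame count halves the latent volume; dropping from 720p to 480p cuts it by more than half, which is why these are the first knobs to turn on an OOM.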
Video has visual drift or inconsistent motion
- Wan 2.2's MoE architecture significantly reduces drift compared to 2.1
- Write more specific motion descriptions in your prompt
- Try first-last frame mode for controlled transitions
T2V model specified but I2V model needed (or vice versa)
- T2V and I2V use separate diffusion models — make sure you download the correct one
- I2V workflows also require a CLIP Vision model that T2V does not
Models don't appear in node dropdown
- Verify files are in the correct folder (`diffusion_models/`, not `checkpoints/`)
- Restart ComfyUI after adding new model files
Related Guides
- HunyuanVideo Guide — Tencent's video generation model
- FramePack Guide — Low-VRAM video generation with FramePack
- Text to Image — Basic ComfyUI image generation