ComfyUI FramePack Guide: Generate Videos with Just 6GB VRAM
How to use FramePack in ComfyUI for low-VRAM video generation — setup, first-last frame workflows, and comparison of available custom nodes.
What is FramePack?
FramePack is a video generation technique from Lvmin Zhang (the creator of ControlNet) and collaborators at Stanford University. Its breakthrough is lowering the VRAM requirement for video generation from 12+ GB to just 6 GB, making it accessible on consumer GPUs like the RTX 3060.
Key innovations:
| Feature | Description |
|---|---|
| Dynamic context compression | Key frames keep a full context budget (1536 tokens); transitional frames are heavily compressed (192 tokens) |
| Drift-resistant sampling | Bidirectional memory prevents image drift and maintains motion continuity |
| Low VRAM | Generates 60-second videos on 6 GB VRAM |
| First + last frame control | Define start and end images, FramePack generates the motion between them |
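The bounded-context idea behind dynamic compression can be shown with simple arithmetic: if each step back in time shrinks a frame's token budget geometrically, the total context converges no matter how long the video gets. A toy sketch in Python (the 1536 and 192 figures come from the table above; the fixed 1/8 schedule and the drop-off are illustrative assumptions, not FramePack's exact rule):

```python
def framepack_context(num_frames: int, full_tokens: int = 1536, ratio: int = 8) -> int:
    """Toy model of dynamic context compression.

    The most recent frame keeps the full token budget (1536); each frame
    further back gets 1/ratio of the previous budget (1536 -> 192 -> 24 ...).
    Frames whose budget rounds to zero contribute nothing in this toy
    (the real method merges old frames rather than dropping them).
    """
    total, budget = 0, full_tokens
    for _ in range(num_frames):
        if budget == 0:
            break
        total += budget
        budget //= ratio
    return total

# Context stays bounded: 100 frames cost barely more than 2.
print(framepack_context(2))    # 1536 + 192 = 1728
print(framepack_context(100))  # converges to 1536 + 192 + 24 + 3 = 1755
```

This is why generation cost stays roughly constant per frame instead of growing with video length.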
Available ComfyUI Implementations
Three community plugins implement FramePack in ComfyUI:
| Plugin | Author | First-Last Frame | Recommended |
|---|---|---|---|
| ComfyUI-FramePackWrapper | Kijai | Yes | Yes — repackaged models, best compatibility |
| ComfyUI_RH_FramePack | HM-RunningHub | Yes | No — uses original repo structure, larger disk usage |
| TTP_Comfyui_FramePack_SE | TTPlanetPig | Yes | No — fork of above, same limitations |
We recommend the Kijai version. It uses repackaged model files compatible with other ComfyUI workflows, and has the most reliable compatibility.
Setup: Kijai FramePackWrapper
1. Install Required Plugins
Install these four custom nodes via ComfyUI Manager:
- ComfyUI-FramePackWrapper — may require Git install via Manager
- ComfyUI-KJNodes
- ComfyUI-VideoHelperSuite
- ComfyUI_essentials
2. Download Models
Diffusion Model (choose one):
| File | Precision | Size | VRAM | Download |
|---|---|---|---|---|
| FramePackI2V_HY_fp8_e4m3fn.safetensors | FP8 | 16.3 GB | Lower | HuggingFace |
| FramePackI2V_HY_bf16.safetensors | BF16 | 25.7 GB | Higher | HuggingFace |
Other required models:
| File | Location | Download |
|---|---|---|
| clip_l.safetensors | models/text_encoders/ | HuggingFace |
| llava_llama3_fp16.safetensors | models/text_encoders/ | HuggingFace |
| sigclip_vision_patch14_384.safetensors | models/clip_vision/ | HuggingFace |
| hunyuan_video_vae_bf16.safetensors | models/vae/ | HuggingFace |
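Once everything is downloaded, a quick script can confirm each file sits in the folder the table expects. A minimal sketch, assuming your ComfyUI root is `./ComfyUI` and you chose the FP8 diffusion model (adjust both to your setup):

```python
from pathlib import Path

# Expected ComfyUI subfolder for each required model file.
# The ComfyUI root below is an assumption; point it at your install.
COMFY_ROOT = Path("ComfyUI")

REQUIRED = {
    "diffusion_models": ["FramePackI2V_HY_fp8_e4m3fn.safetensors"],
    "text_encoders": ["clip_l.safetensors", "llava_llama3_fp16.safetensors"],
    "clip_vision": ["sigclip_vision_patch14_384.safetensors"],
    "vae": ["hunyuan_video_vae_bf16.safetensors"],
}

def missing_models(root: Path = COMFY_ROOT) -> list[str]:
    """Return relative paths of required model files that are absent."""
    missing = []
    for subdir, files in REQUIRED.items():
        for name in files:
            if not (root / "models" / subdir / name).is_file():
                missing.append(str(Path("models") / subdir / name))
    return missing

if __name__ == "__main__":
    gone = missing_models()
    print("All models present." if not gone else "Missing:\n" + "\n".join(gone))
```

Run it from the directory that contains your ComfyUI folder before loading the workflow.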
3. File Structure
ComfyUI/
├── models/
│ ├── diffusion_models/
│ │ └── FramePackI2V_HY_fp8_e4m3fn.safetensors
│ ├── text_encoders/
│ │ ├── clip_l.safetensors
│ │ └── llava_llama3_fp16.safetensors
│ ├── clip_vision/
│ │ └── sigclip_vision_patch14_384.safetensors
│ └── vae/
│       └── hunyuan_video_vae_bf16.safetensors

Running the Workflow
First-Last Frame Video Generation
- Load FramePackModel → select your diffusion model
- DualCLIPLoader → load clip_l.safetensors and llava_llama3_fp16.safetensors
- Load CLIP Vision → load sigclip_vision_patch14_384.safetensors
- Load VAE → load hunyuan_video_vae_bf16.safetensors
- CLIP Text Encoder → describe the motion you want
- Load Image (first frame) → your starting image
- Load Image (last frame) → your ending image (optional — bypass if not needed)
- FramePackSampler → set total_second_length (e.g., 5 seconds)
- Click Run (Ctrl+Enter)
If you only want image-to-video without a last frame, bypass (Ctrl+B) the last frame input node and its connected processing nodes.
Writing Motion Prompts
FramePack works best with motion-focused prompts. The FramePack creator provides a useful pattern:
Describe subject first, then motion, then environment.
Good examples:
- The girl dances gracefully, with clear movements, full of charm
- A cat jumps from the table to the floor, landing softly
- Camera slowly pans across a mountain landscape at sunset
Tip: Prefer dynamic motions (dancing, jumping, running) over static ones (standing, sitting) for more impressive results.
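The subject-then-motion-then-environment pattern is easy to template if you generate many prompts. A throwaway helper (the function name and structure are purely illustrative, not part of any FramePack API):

```python
def motion_prompt(subject: str, motion: str, environment: str = "") -> str:
    """Build a motion-focused prompt: subject first, then motion, then setting."""
    parts = [subject.strip(), motion.strip()]
    if environment:
        parts.append(environment.strip())
    return ", ".join(parts)

print(motion_prompt("A cat", "jumps from the table to the floor",
                    "landing softly on a wooden kitchen floor"))
```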
Common Issues and Fixes
Widget disappears after loading workflow
- Update your ComfyUI frontend to version 1.16.9 or later
- This is a known frontend bug that affects FramePack workflows
Out of memory
- Use the FP8 model variant (16.3 GB vs 25.7 GB)
- Reduce total_second_length to 2–3 seconds
- Close other applications
Video has inconsistent motion or visual drift
- FramePack's bidirectional sampling should minimize drift, but very long videos (30+ seconds) may still show some
- Use first-last frame mode to anchor the start and end states
- Break long videos into shorter segments
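Chaining segments is straightforward to plan: reuse each segment's final frame as the next segment's first frame, so every boundary is anchored. A small planning sketch (the 5-second maximum is an arbitrary choice, not a FramePack limit):

```python
def plan_segments(total_seconds: float, max_segment: float = 5.0) -> list[tuple[float, float]]:
    """Split a target duration into (start, end) segments of at most max_segment
    seconds. The last frame rendered for one segment becomes the first-frame
    input of the next, which keeps motion anchored across the seams."""
    segments = []
    t = 0.0
    while t < total_seconds:
        end = min(t + max_segment, total_seconds)
        segments.append((t, end))
        t = end
    return segments

print(plan_segments(12.0))  # [(0.0, 5.0), (5.0, 10.0), (10.0, 12.0)]
```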
Plugins not found in ComfyUI Manager
- ComfyUI-FramePackWrapper may not be registered in the Manager — install it via Git URL in the Manager's "Install via Git" option
Related Guides
- HunyuanVideo Guide — Full HunyuanVideo T2V setup
- Wan Video Guide — Alternative video generation models
- Install Custom Nodes — How to install plugins