ComfyUI SDXL Guide: Setup, Refiner Workflow & Best Practices
How to use Stable Diffusion XL (SDXL) in ComfyUI — model setup, base + refiner workflow, resolution tips, and ReVision for image-guided generation.
What is SDXL?
Stable Diffusion XL (SDXL) is Stability AI's high-resolution image generation model. Compared to SD 1.5, SDXL produces significantly better image quality, handles complex compositions well, and generates more coherent text in images.
Key differences from SD 1.5:
| Aspect | SD 1.5 | SDXL |
|---|---|---|
| Native resolution | 512x512 | 1024x1024 |
| Model size | ~2 GB | ~6.5 GB |
| Text quality | Poor | Good |
| Composition | Basic | Complex scenes handled well |
| Architecture | Single model | Base + optional Refiner |
Setup
Model Download
| Model | Purpose | Download |
|---|---|---|
| SDXL Base | Main generation model | HuggingFace |
| SDXL Refiner (optional) | Enhances detail in the final generation steps | HuggingFace |
Place both in ComfyUI/models/checkpoints/.
Resolution
SDXL was trained at 1024x1024. Use resolutions whose total pixel count is approximately 1 megapixel:
| Aspect Ratio | Resolution |
|---|---|
| 1:1 | 1024x1024 |
| 3:4 | 896x1152 |
| 16:9 | 1344x768 |
| 9:16 | 768x1344 |
| 21:9 | 1536x640 |
Resolutions far from 1 MP (such as 512x512 or 2048x2048) produce noticeably worse results; stay near the native pixel count.
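As a quick sanity check before queuing a generation, a small helper (hypothetical, not part of ComfyUI) can verify that a width/height pair stays near SDXL's ~1 MP training budget and keeps both sides latent-friendly:

```python
def is_sdxl_friendly(width: int, height: int,
                     target_px: int = 1024 * 1024,
                     tolerance: float = 0.15) -> bool:
    """Return True if the resolution is near SDXL's ~1 MP training budget
    and both sides are multiples of 64 (aligned to the latent grid)."""
    if width % 64 or height % 64:
        return False
    return abs(width * height - target_px) / target_px <= tolerance
```

Every entry in the table above passes this check, while 512x512 and 2048x2048 fail the pixel-count test.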
Basic Text-to-Image Workflow
The simplest SDXL workflow is structured identically to an SD 1.5 workflow:
- Load Checkpoint → SDXL Base model
- CLIP Text Encode (positive) → your prompt
- CLIP Text Encode (negative) → elements to avoid
- Empty Latent Image → 1024x1024
- KSampler → steps: 25–30, cfg: 6–8
- VAE Decode → Save Image
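For scripted generation, the same graph can be submitted to ComfyUI's HTTP API as a JSON "prompt" object. A minimal sketch as a Python dict (node IDs, seed, prompts, and the checkpoint filename `sd_xl_base_1.0.safetensors` are illustrative placeholders; class names follow ComfyUI's built-in nodes, though exact input sets can vary between ComfyUI versions):

```python
# Basic SDXL text-to-image graph in ComfyUI's API "prompt" format.
# Each key is a node ID; links are ["source_node_id", output_index].
# CheckpointLoaderSimple outputs: 0 = MODEL, 1 = CLIP, 2 = VAE.
prompt = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",
          "inputs": {"clip": ["1", 1], "text": "a cozy cabin at dusk"}},
    "3": {"class_type": "CLIPTextEncode",
          "inputs": {"clip": ["1", 1], "text": "blurry, low quality"}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0],
                     "negative": ["3", 0], "latent_image": ["4", 0],
                     "seed": 42, "steps": 25, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal",
                     "denoise": 1.0}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    "7": {"class_type": "SaveImage",
          "inputs": {"images": ["6", 0], "filename_prefix": "sdxl"}},
}
```

Posting this dict as `{"prompt": prompt}` to a running ComfyUI instance's `/prompt` endpoint queues the generation.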
Base + Refiner Workflow
SDXL's unique feature is the two-stage generation pipeline. The Base model handles the main composition and structure, then the Refiner model improves fine detail and texture quality in a second pass.
How It Works
- The Base model generates for the first portion of sampling steps (e.g., 20 out of 25 total)
- The Refiner takes over for the remaining steps, refining details without changing composition
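The handoff point is just a step index. A tiny helper (hypothetical, for illustration) computes the base/refiner split from a total step count and a base fraction:

```python
def refiner_split(total_steps: int, base_fraction: float = 0.8):
    """Return (end_at_step for the base, start_at_step for the refiner).

    With the common 0.8 fraction and 25 total steps, the base samples
    steps 0-19 and the refiner finishes steps 20-24.
    """
    handoff = round(total_steps * base_fraction)
    return handoff, handoff
```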
Node Setup
- Load Checkpoint (Base) → SDXL Base
- Load Checkpoint (Refiner) → SDXL Refiner
- CLIP Text Encode (x2) → connected to Base's CLIP
- Empty Latent Image → 1024x1024
- KSampler (Advanced) (Base) → steps: 25, end_at_step: 20, return_with_leftover_noise enabled (the start/end step controls live on the KSampler (Advanced) node, not the basic KSampler)
- KSampler (Advanced) (Refiner) → steps: 25, start_at_step: 20, add_noise disabled, takes the Base sampler's latent output
- VAE Decode → Save Image (using Refiner's VAE)
You can give the Base and Refiner different prompts. For example, use a detailed composition prompt for the Base and a quality-focused prompt for the Refiner.
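In ComfyUI's API "prompt" format, the handoff is expressed with two KSamplerAdvanced nodes. A sketch of just those two nodes (upstream IDs "1"–"4" for the base checkpoint, its conditioning, and the latent, and "10"–"12" for the refiner checkpoint and its conditioning, are placeholders):

```python
# Base -> Refiner handoff using ComfyUI's KSamplerAdvanced node.
# The base keeps its leftover noise; the refiner adds none, so it
# continues the same denoising trajectory from step 20.
handoff = {
    "20": {"class_type": "KSamplerAdvanced",   # base pass, steps 0-19
           "inputs": {"model": ["1", 0], "positive": ["2", 0],
                      "negative": ["3", 0], "latent_image": ["4", 0],
                      "add_noise": "enable", "noise_seed": 42,
                      "steps": 25, "cfg": 7.0,
                      "sampler_name": "euler", "scheduler": "normal",
                      "start_at_step": 0, "end_at_step": 20,
                      "return_with_leftover_noise": "enable"}},
    "21": {"class_type": "KSamplerAdvanced",   # refiner pass, steps 20-24
           "inputs": {"model": ["10", 0], "positive": ["11", 0],
                      "negative": ["12", 0], "latent_image": ["20", 0],
                      "add_noise": "disable", "noise_seed": 42,
                      "steps": 25, "cfg": 7.0,
                      "sampler_name": "euler", "scheduler": "normal",
                      "start_at_step": 20, "end_at_step": 10000,
                      "return_with_leftover_noise": "disable"}},
}
```

The critical details are that both nodes share the same total `steps`, the base's `end_at_step` equals the refiner's `start_at_step`, and only the base pass adds noise.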
When to Skip the Refiner
The Refiner adds detail but also adds processing time and VRAM usage. Skip it when:
- You're doing quick iterations or testing prompts
- Your GPU has limited VRAM
- The base output already looks good
Many community checkpoints fine-tuned from SDXL don't benefit from the refiner.
SDXL ReVision: Image-Guided Generation
ReVision lets you use images as conceptual input instead of (or alongside) text prompts. It extracts visual concepts from reference images and generates new images inspired by them.
Setup
Download the CLIP-G Vision model:
- clip_vision_g.safetensors → place in ComfyUI/models/clip_vision/
Workflow
- Load Image → your reference image(s)
- CLIP Vision Encode → extracts visual features
- unCLIP Conditioning → applies features as conditioning
- KSampler → generates inspired by the reference
You can combine multiple reference images by chaining unCLIP Conditioning nodes. The strength parameter controls how strongly each image influences the output.
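The chaining works by feeding each unCLIP Conditioning node's output into the next one's conditioning input. A sketch in ComfyUI's API "prompt" format (node "2" is assumed to be a CLIP Text Encode providing the base conditioning; IDs, filenames, and strengths are placeholders, and input names may differ slightly across ComfyUI versions):

```python
# Two reference images chained through unCLIPConditioning nodes.
# Both CLIPVisionEncode nodes reuse the same loaded CLIP-G vision model.
revision = {
    "30": {"class_type": "CLIPVisionLoader",
           "inputs": {"clip_name": "clip_vision_g.safetensors"}},
    "31": {"class_type": "LoadImage", "inputs": {"image": "ref_a.png"}},
    "32": {"class_type": "CLIPVisionEncode",
           "inputs": {"clip_vision": ["30", 0], "image": ["31", 0]}},
    "33": {"class_type": "unCLIPConditioning",   # first reference
           "inputs": {"conditioning": ["2", 0],
                      "clip_vision_output": ["32", 0],
                      "strength": 1.0, "noise_augmentation": 0.0}},
    "34": {"class_type": "LoadImage", "inputs": {"image": "ref_b.png"}},
    "35": {"class_type": "CLIPVisionEncode",
           "inputs": {"clip_vision": ["30", 0], "image": ["34", 0]}},
    "36": {"class_type": "unCLIPConditioning",   # chained second reference
           "inputs": {"conditioning": ["33", 0],
                      "clip_vision_output": ["35", 0],
                      "strength": 0.6, "noise_augmentation": 0.0}},
}
```

Node "36"'s output then feeds the KSampler's positive conditioning input; each node's strength scales that image's influence independently.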
Common Issues and Fixes
Output quality is poor or blurry
- Verify you're generating at ~1MP resolution (1024x1024 or equivalent)
- 512x512 will produce very poor results with SDXL
VRAM too low for Base + Refiner
- Use the Base model alone — results are still good
- Or use a community fine-tuned SDXL checkpoint that doesn't need a refiner
LoRA not working with SDXL
- Make sure the LoRA is specifically made for SDXL — SD1.5 LoRAs are incompatible
- SDXL LoRAs are labeled accordingly on Civitai and HuggingFace
Colors look washed out
- Try a different VAE — some SDXL checkpoints benefit from an external VAE
- The SDXL Refiner's VAE often produces better colors
Related Guides
- Text to Image — Basic generation concepts
- Flux Guide — Next-generation model with even better quality
- SD 3.5 Guide — Latest Stability AI model
- LoRA Guide — Fine-tuning with adapters