ComfyUI SDXL Guide: Setup, Refiner Workflow & Best Practices
How to use Stable Diffusion XL (SDXL) in ComfyUI — model setup, base + refiner workflow, resolution tips, and ReVision for image-guided generation.
What is SDXL?
Stable Diffusion XL (SDXL) is Stability AI's high-resolution image generation model. Compared to SD 1.5, SDXL produces significantly better image quality, handles complex compositions well, and generates more coherent text in images.
Key differences from SD 1.5:
| Aspect | SD 1.5 | SDXL |
|---|---|---|
| Native resolution | 512x512 | 1024x1024 |
| Model size | ~2 GB | ~6.5 GB |
| Text quality | Poor | Good |
| Composition | Basic | Complex scenes handled well |
| Architecture | Single model | Base + optional Refiner |
Setup
Model Download
| Model | Purpose | Download |
|---|---|---|
| SDXL Base | Main generation model | HuggingFace |
| SDXL Refiner (optional) | Enhances detail in the final generation steps | HuggingFace |
Place both in ComfyUI/models/checkpoints/.
Resolution
SDXL was trained at 1024x1024. Use resolutions whose total pixel count is approximately 1 megapixel:
| Aspect Ratio | Resolution |
|---|---|
| 1:1 | 1024x1024 |
| 3:4 | 896x1152 |
| 16:9 | 1344x768 |
| 9:16 | 768x1344 |
| 21:9 | 1536x640 |
Resolutions far from 1 MP (such as 512x512 or 2048x2048) produce noticeably worse results; stay near the native pixel count.
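As a quick sanity check before queuing a generation, a small helper (hypothetical, not part of ComfyUI) can verify that a width/height pair stays near SDXL's ~1 MP training budget and keeps both sides latent-friendly:

```python
def is_sdxl_friendly(width: int, height: int,
                     target_px: int = 1024 * 1024,
                     tolerance: float = 0.15) -> bool:
    """Return True if the resolution is near SDXL's ~1 MP training budget
    and both sides are multiples of 64 (aligned to the latent grid)."""
    if width % 64 or height % 64:
        return False
    return abs(width * height - target_px) / target_px <= tolerance
```

Every entry in the table above passes this check, while 512x512 and 2048x2048 fail the pixel-count test.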
Basic Text-to-Image Workflow
The simplest SDXL workflow is structured identically to an SD 1.5 workflow:
- Load Checkpoint → SDXL Base model
- CLIP Text Encode (positive) → your prompt
- CLIP Text Encode (negative) → elements to avoid
- Empty Latent Image → 1024x1024
- KSampler → steps: 25–30, cfg: 6–8
- VAE Decode → Save Image
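For scripted generation, the same graph can be submitted to ComfyUI's HTTP API as a JSON "prompt" object. A minimal sketch as a Python dict (node IDs, seed, prompts, and the checkpoint filename `sd_xl_base_1.0.safetensors` are illustrative placeholders; class names follow ComfyUI's built-in nodes, though exact input sets can vary between ComfyUI versions):

```python
# Basic SDXL text-to-image graph in ComfyUI's API "prompt" format.
# Each key is a node ID; links are ["source_node_id", output_index].
# CheckpointLoaderSimple outputs: 0 = MODEL, 1 = CLIP, 2 = VAE.
prompt = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",
          "inputs": {"clip": ["1", 1], "text": "a cozy cabin at dusk"}},
    "3": {"class_type": "CLIPTextEncode",
          "inputs": {"clip": ["1", 1], "text": "blurry, low quality"}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0],
                     "negative": ["3", 0], "latent_image": ["4", 0],
                     "seed": 42, "steps": 25, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal",
                     "denoise": 1.0}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    "7": {"class_type": "SaveImage",
          "inputs": {"images": ["6", 0], "filename_prefix": "sdxl"}},
}
```

Posting this dict as `{"prompt": prompt}` to a running ComfyUI instance's `/prompt` endpoint queues the generation.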
Base + Refiner Workflow
SDXL's unique feature is the two-stage generation pipeline. The Base model handles the main composition and structure, then the Refiner model improves fine detail and texture quality in a second pass.
How It Works
- The Base model generates for the first portion of sampling steps (e.g., 20 out of 25 total)
- The Refiner takes over for the remaining steps, refining details without changing composition
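The handoff point is just a step index. A tiny helper (hypothetical, for illustration) computes the base/refiner split from a total step count and a base fraction:

```python
def refiner_split(total_steps: int, base_fraction: float = 0.8):
    """Return (end_at_step for the base, start_at_step for the refiner).

    With the common 0.8 fraction and 25 total steps, the base samples
    steps 0-19 and the refiner finishes steps 20-24.
    """
    handoff = round(total_steps * base_fraction)
    return handoff, handoff
```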
Node Setup
- Load Checkpoint (Base) → SDXL Base
- Load Checkpoint (Refiner) → SDXL Refiner
- CLIP Text Encode (x2) → connected to Base's CLIP
- Empty Latent Image → 1024x1024
- KSampler (Advanced) (Base) → steps: 25, end_at_step: 20, return_with_leftover_noise enabled (the start/end step controls live on the KSampler (Advanced) node, not the basic KSampler)
- KSampler (Advanced) (Refiner) → steps: 25, start_at_step: 20, add_noise disabled, takes the Base sampler's latent output
- VAE Decode → Save Image (using Refiner's VAE)
You can give the Base and Refiner different prompts. For example, use a detailed composition prompt for the Base and a quality-focused prompt for the Refiner.
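In ComfyUI's API "prompt" format, the handoff is expressed with two KSamplerAdvanced nodes. A sketch of just those two nodes (upstream IDs "1"–"4" for the base checkpoint, its conditioning, and the latent, and "10"–"12" for the refiner checkpoint and its conditioning, are placeholders):

```python
# Base -> Refiner handoff using ComfyUI's KSamplerAdvanced node.
# The base keeps its leftover noise; the refiner adds none, so it
# continues the same denoising trajectory from step 20.
handoff = {
    "20": {"class_type": "KSamplerAdvanced",   # base pass, steps 0-19
           "inputs": {"model": ["1", 0], "positive": ["2", 0],
                      "negative": ["3", 0], "latent_image": ["4", 0],
                      "add_noise": "enable", "noise_seed": 42,
                      "steps": 25, "cfg": 7.0,
                      "sampler_name": "euler", "scheduler": "normal",
                      "start_at_step": 0, "end_at_step": 20,
                      "return_with_leftover_noise": "enable"}},
    "21": {"class_type": "KSamplerAdvanced",   # refiner pass, steps 20-24
           "inputs": {"model": ["10", 0], "positive": ["11", 0],
                      "negative": ["12", 0], "latent_image": ["20", 0],
                      "add_noise": "disable", "noise_seed": 42,
                      "steps": 25, "cfg": 7.0,
                      "sampler_name": "euler", "scheduler": "normal",
                      "start_at_step": 20, "end_at_step": 10000,
                      "return_with_leftover_noise": "disable"}},
}
```

The critical details are that both nodes share the same total `steps`, the base's `end_at_step` equals the refiner's `start_at_step`, and only the base pass adds noise.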
When to Skip the Refiner
The Refiner adds detail but also adds processing time and VRAM usage. Skip it when:
- You're doing quick iterations or testing prompts
- Your GPU has limited VRAM
- The base output already looks good
Many community checkpoints fine-tuned from SDXL don't benefit from the refiner.
SDXL ReVision: Image-Guided Generation
ReVision lets you use images as conceptual input instead of (or alongside) text prompts. It extracts visual concepts from reference images and generates new images inspired by them.
Setup
Download the CLIP-G Vision model:
- clip_vision_g.safetensors → place in ComfyUI/models/clip_vision/
Workflow
- Load Image → your reference image(s)
- CLIP Vision Encode → extracts visual features
- unCLIP Conditioning → applies features as conditioning
- KSampler → generates inspired by the reference
You can combine multiple reference images by chaining unCLIP Conditioning nodes. The strength parameter controls how strongly each image influences the output.
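The chaining works by feeding each unCLIP Conditioning node's output into the next one's conditioning input. A sketch in ComfyUI's API "prompt" format (node "2" is assumed to be a CLIP Text Encode providing the base conditioning; IDs, filenames, and strengths are placeholders, and input names may differ slightly across ComfyUI versions):

```python
# Two reference images chained through unCLIPConditioning nodes.
# Both CLIPVisionEncode nodes reuse the same loaded CLIP-G vision model.
revision = {
    "30": {"class_type": "CLIPVisionLoader",
           "inputs": {"clip_name": "clip_vision_g.safetensors"}},
    "31": {"class_type": "LoadImage", "inputs": {"image": "ref_a.png"}},
    "32": {"class_type": "CLIPVisionEncode",
           "inputs": {"clip_vision": ["30", 0], "image": ["31", 0]}},
    "33": {"class_type": "unCLIPConditioning",   # first reference
           "inputs": {"conditioning": ["2", 0],
                      "clip_vision_output": ["32", 0],
                      "strength": 1.0, "noise_augmentation": 0.0}},
    "34": {"class_type": "LoadImage", "inputs": {"image": "ref_b.png"}},
    "35": {"class_type": "CLIPVisionEncode",
           "inputs": {"clip_vision": ["30", 0], "image": ["34", 0]}},
    "36": {"class_type": "unCLIPConditioning",   # chained second reference
           "inputs": {"conditioning": ["33", 0],
                      "clip_vision_output": ["35", 0],
                      "strength": 0.6, "noise_augmentation": 0.0}},
}
```

Node "36"'s output then feeds the KSampler's positive conditioning input; each node's strength scales that image's influence independently.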
Common Issues and Fixes
Output quality is poor or blurry
- Verify you're generating at ~1MP resolution (1024x1024 or equivalent)
- 512x512 will produce very poor results with SDXL
VRAM too low for Base + Refiner
- Use the Base model alone — results are still good
- Or use a community fine-tuned SDXL checkpoint that doesn't need a refiner
LoRA not working with SDXL
- Make sure the LoRA is specifically made for SDXL — SD1.5 LoRAs are incompatible
- SDXL LoRAs are labeled accordingly on Civitai and HuggingFace
Colors look washed out
- Try a different VAE — some SDXL checkpoints benefit from an external VAE
- The SDXL Refiner's VAE often produces better colors
Related Guides
- Text to Image — Basic generation concepts
- Flux Guide — Next-generation model with even better quality
- SD 3.5 Guide — Latest Stability AI model
- LoRA Guide — Fine-tuning with adapters