Running this model on 24GB of VRAM
#6
by zhiTTime - opened
Here is the complete code from the Qwen-Image-Edit community discussion. I modified it slightly:
import torch
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
from transformers import Qwen2_5_VLForConditionalGeneration
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from diffusers import QwenImageEditPlusPipeline, QwenImageTransformer2DModel
from diffusers.utils import load_image
model_id = "Qwen/Qwen-Image-Edit-2509"
torch_dtype = torch.bfloat16
device = "cuda"
quantization_config = DiffusersBitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    llm_int8_skip_modules=["transformer_blocks.0.img_mod"],
)
transformer = QwenImageTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=torch_dtype,
)
transformer = transformer.to("cpu")
quantization_config = TransformersBitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
text_encoder = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id,
    subfolder="text_encoder",
    quantization_config=quantization_config,
    torch_dtype=torch_dtype,
)
text_encoder = text_encoder.to("cpu")
pipe = QwenImageEditPlusPipeline.from_pretrained(
    model_id, transformer=transformer, text_encoder=text_encoder, torch_dtype=torch_dtype
)
# Optionally load a Lightning LoRA to speed up inference. Load only one of the two:
# the 8-step LoRA below, or the 4-step LoRA (commented out).
pipe.load_lora_weights(
    "lightx2v/Qwen-Image-Lightning", weight_name="Qwen-Image-Lightning-8steps-V1.1.safetensors"
)
# pipe.load_lora_weights(
#     "lightx2v/Qwen-Image-Lightning", weight_name="Qwen-Image-Lightning-4steps-V1.0-bf16.safetensors"
# )
pipe.enable_model_cpu_offload()
generator = torch.Generator(device="cuda").manual_seed(21)
image = load_image("/path/to/your/image.jpg").convert("RGB")
prompt = "your prompt here"
# Set num_inference_steps to 8 or 4 to match the Lightning LoRA you loaded
image = pipe(image=image, prompt=prompt, num_inference_steps=8, generator=generator).images[0]
image.save("qwenimageedit.png")
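For a rough sense of why 4-bit quantization is what makes a 24GB card workable, here is a back-of-the-envelope sketch of weight memory at different precisions. The parameter counts are assumptions (roughly 20B for the transformer and roughly 7B for the Qwen2.5-VL text encoder), not figures from this thread, and the estimate ignores activations, the VAE, and quantization metadata such as NF4 scaling constants.

```python
# Back-of-the-envelope weight memory; ignores activations, the VAE, and NF4 metadata.
GIB = 1024 ** 3


def weight_gib(num_params: float, bits_per_param: float) -> float:
    """Memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / GIB


# ASSUMED approximate parameter counts, not taken from this thread.
components = {"transformer": 20e9, "text_encoder": 7e9}

for name, params in components.items():
    bf16 = weight_gib(params, 16)
    nf4 = weight_gib(params, 4)
    print(f"{name}: bf16 ~{bf16:.1f} GiB, 4-bit ~{nf4:.1f} GiB")
```

Because `enable_model_cpu_offload()` keeps only one component on the GPU at a time, the number that matters most is the largest single component plus whatever the activations need on top of it.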
zhiTTime changed discussion status to closed
zhiTTime changed discussion status to open
I'm pretty sure I got torchao going. Let me run some more tests and then I'll paste the code in. I just want to make sure, that's all. I'm running tests that mirror what they display in the README.
Do you have any examples of the bnb-quantized outputs? @zhiTTime
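If it helps to put a number on such a comparison, one option is PSNR between a quantized and an unquantized output generated with the same input image, prompt, and seed. The sketch below is a hypothetical helper, not something used in this thread; it assumes both images are the same size and mode and have been flattened to raw 8-bit buffers, for example with PIL's `Image.tobytes()`.

```python
import math


def psnr(a: bytes, b: bytes) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length 8-bit buffers."""
    if len(a) != len(b) or not a:
        raise ValueError("buffers must be non-empty and the same length")
    # Mean squared error over every channel value.
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        return math.inf  # identical buffers
    return 10 * math.log10(255 ** 2 / mse)
```

With two PIL images `full` and `quant` (hypothetical names), the call would be `psnr(full.tobytes(), quant.tobytes())`. Higher values mean the outputs are closer; a visual side-by-side is still worth posting, since PSNR does not capture perceptual quality well.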