Update 19/02/26: uploaded v2 with some layers kept in BF16 and longer calibration (more steps, adjusted learning rate, etc.). Clearly better than my first version; it doesn't get "fuzzy" on edges or hair.


NVFP4 versions of Qwen-Image-2512 and Qwen2.5-VL-7B-Instruct, both created from the BF16 versions in the Comfy-Org/Qwen repo, using the Silveroxides/convert_to_quant script. Drop-in replacement in Comfy; there's a bit of quality loss, as with nunchaku models or other NVFP4 quants.

Conversion command:

```
convert_to_quant -i qwen_image_2512_bf16.safetensors -o qwen-image2512-nvfp4.safetensors --nvfp4 --comfy_quant --qwen
```
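For intuition only (this is not the convert_to_quant script's actual implementation), NVFP4 stores weights as 4-bit E2M1 values with a small scale per 16-element block. A minimal NumPy sketch of that idea, simplified to a plain float scale (real NVFP4 uses FP8 E4M3 block scales plus a tensor-level FP32 scale):

```python
import numpy as np

# Magnitudes representable in FP4 (E2M1), the element format NVFP4 packs weights into.
FP4_LEVELS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_block(block: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantize one block to FP4 values plus a single per-block scale."""
    amax = float(np.abs(block).max())
    scale = amax / 6.0 if amax > 0 else 1.0  # map the block max onto FP4's max (6)
    scaled = block / scale
    # Snap each magnitude to the nearest representable FP4 level, keeping the sign.
    idx = np.abs(np.abs(scaled)[:, None] - FP4_LEVELS[None, :]).argmin(axis=1)
    codes = np.sign(scaled) * FP4_LEVELS[idx]
    return codes, scale

def dequantize_block(codes: np.ndarray, scale: float) -> np.ndarray:
    return codes * scale

rng = np.random.default_rng(0)
w = rng.normal(size=16).astype(np.float32)  # NVFP4 uses 16-element blocks
codes, scale = quantize_block(w)
w_hat = dequantize_block(codes, scale)
print("max abs error:", np.abs(w - w_hat).max())
```

The small block size is why the quality loss stays modest: each group of 16 weights gets its own scale, so one outlier only degrades its own block.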

⚠️ I strongly suspect a bottleneck somewhere (memory bandwidth? dequant ops?) that throttles the GPU in some cases; running small batches instead of single images seems to remove it. I only have surface-level knowledge of this stuff and don't know how to troubleshoot it.

| Batch of 4 | Qwen-Image-2512-fp8 | Qwen-Image-2512-nvfp4 |
|---|---|---|
| Vanilla | 194s (45.6s per image) | 76s (18.3s per image) |
| Lightning | 20.4s (5s per image) | 7.5s (1.75s per image) |

(Wall time, on a 5090. Vanilla = 50 steps & CFG 4; Lightning = 8 steps & CFG 1 with the Lightning LoRA.)
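From the batch wall times above, the NVFP4 checkpoint's speedup over FP8 works out to roughly 2.5–2.7x in both modes:

```python
# Batch-of-4 wall times in seconds from the table above: (fp8, nvfp4)
timings = {"Vanilla": (194.0, 76.0), "Lightning": (20.4, 7.5)}

for mode, (fp8_s, nvfp4_s) in timings.items():
    print(f"{mode}: {fp8_s / nvfp4_s:.2f}x faster")
# → Vanilla: 2.55x faster
# → Lightning: 2.72x faster
```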
