What is it?

#1 opened by qpqpqpqpqpqp

Pruned Chroma.1?

Is chroma based on Z-image possible?

"Is chroma based on Z-image possible?" Oops, it seems it's another model. Also, do you remember the Lumina2 "Chroma"? lodestone(s) stopped updating it.

Z-image?

No? It says:
RuntimeError: Error(s) in loading state_dict for NextDiT:
size mismatch for x_embedder.weight: copying a param with shape torch.Size([3840, 128]) from checkpoint, the shape in current model is torch.Size([3840, 64]).
(I tried to load it in fp8.)
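
For what it's worth, that mismatch is exactly what the VAE swap would produce: NextDiT's x_embedder is a linear patch embedding whose input width is latent_channels × patch_size², so 64 → 128 is consistent with the latent channel count doubling (e.g. 16 → 32 with a 2×2 patch; those counts are inferred from the error, not confirmed configs). A minimal PyTorch sketch under that assumption:

```python
# Sketch of the x_embedder mismatch. Channel counts (16 vs 32) are
# assumptions inferred from the error, not confirmed model configs.
import torch.nn as nn

patch_size = 2
hidden_dim = 3840

old_embedder = nn.Linear(16 * patch_size**2, hidden_dim)  # weight: [3840, 64]
new_embedder = nn.Linear(32 * patch_size**2, hidden_dim)  # weight: [3840, 128]

# Loading the new checkpoint's weights into the stock layer reproduces
# the reported RuntimeError (size mismatch [3840, 128] vs [3840, 64]).
try:
    old_embedder.load_state_dict(new_embedder.state_dict())
except RuntimeError as e:
    print(e)
```

So fp8 isn't the problem; the checkpoint simply isn't the stock Z-Image arch.
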
Newbie 0.1?

It's a modified Z-Image with the Flux 2 VAE, some slight arch changes, and a custom loss.
This model is not ready yet; it literally just started training yesterday.

Am I dreaming? A new year, a new surprise?

I have a question: why not use the Z-Image-De-Turbo model as the base model?

This is exciting, but @lodestones, will LoRAs made for Chroma1-HD still work on this, or will I have to retrain all of my LoRAs? I'm pretty sure my Chroma LoRAs don't work with Z-Image.

"I'm pretty sure my chroma loras don't work with z-image" Yes, don't work, of course

@lodestones, by the way, why did you decide not to wait for the base version of Z-Image? It seemed to me that the distilled version is less flexible for fine-tuning.

@Yndear this is not fine-tuning, this is closer to pretraining, and instead of a random init I'm using Z-Image as the "initial seed".

The arch is different: it uses a DeCo head and the Flux 2 VAE. The loss is different too: it's an fm-x0 loss instead of fm-velocity.

This arch + loss combo has the huge benefit of ridiculously fast convergence, but even with that it's still costly to pretrain one from scratch.
It's better to have some residual knowledge as the initial seed than to start from a blank slate.
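
For context on the loss terminology: both are flow-matching objectives over the same linear data-to-noise path and differ only in what the network regresses. A hedged sketch (the interpolation convention and the absence of loss weighting are assumptions, not lodestones' actual training code):

```python
# Contrast of fm-velocity vs fm-x0 under a linear interpolation path.
# A real trainer picks one parameterization; both are shown for comparison.
import torch
import torch.nn.functional as F

def fm_losses(model, x0, t):
    """x0: clean latents [B, ...]; t: timesteps in (0, 1), shape [B]."""
    noise = torch.randn_like(x0)
    t_ = t.view(-1, *([1] * (x0.dim() - 1)))
    xt = (1 - t_) * x0 + t_ * noise  # t=0 is data, t=1 is noise

    pred = model(xt, t)

    # fm-velocity: regress the path's constant velocity, v = noise - x0.
    loss_velocity = F.mse_loss(pred, noise - x0)

    # fm-x0: regress the clean sample directly; at sampling time the
    # velocity is recovered from the prediction, v = (xt - pred) / t.
    loss_x0 = F.mse_loss(pred, x0)

    return loss_velocity, loss_x0
```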

Is the text encoder still Qwen3-4B, or will it be replaced with a larger one?

Still Qwen.
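
For reference, conditioning on Qwen3-4B typically means feeding the DiT the LLM's per-token hidden states rather than a pooled vector. A minimal sketch with transformers (the layer choice and the lack of a prompt template are assumptions, not this repo's exact pipeline):

```python
# Extract per-token text embeddings from Qwen3-4B for conditioning.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B")
enc = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B", torch_dtype=torch.bfloat16, output_hidden_states=True
)

with torch.no_grad():
    ids = tok("a photo of a red fox in the snow", return_tensors="pt")
    out = enc(**ids)
    # Last-layer hidden states, [1, seq_len, hidden_dim]; these feed
    # the DiT's text-conditioning path.
    text_embeds = out.hidden_states[-1]
```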

Is there a ComfyUI workflow that works with this new model? Also, the same question for the WIP Radiance model: its ComfyUI workflow is very old, and I'm wondering if there's a newer one that works better.
