SDXL ControlNet - Brightness Control
This is a ControlNet trained for Stable Diffusion XL that controls image generation through brightness/grayscale information. It enables precise control over lighting, tonal values, and overall brightness structure in generated images.
Model Description
This ControlNet allows you to condition SDXL image generation using grayscale/brightness maps. By providing a grayscale image as input, you can control the brightness distribution and lighting structure of the generated image while maintaining creative freedom through text prompts.
Key features:
- 🎨 Control brightness and lighting in generated images
- 🖼️ Works with SDXL at 512x512 resolution
- 🔄 Compatible with standard SDXL pipelines
- 💡 Trained on high-quality aesthetic images
Intended uses:
- Image recoloring and colorization
- Lighting control in text-to-image generation
- Artistic image manipulation
- Photo enhancement and stylization
Training Details
Training Data
The model was trained on 100,000 samples from the latentcat/grayscale_image_aesthetic_3M dataset, which consists of:
- High-quality aesthetic images
- Each image paired with its grayscale/brightness version (see the sketch after this list)
- Images resized to 512x512 resolution
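The exact preprocessing code is not included here; below is a minimal sketch of how such image/conditioning pairs can be produced with Pillow (the helper name and file path are illustrative, not from the training pipeline):

from PIL import Image

def make_conditioning_pair(path: str, size: int = 512):
    """Illustrative helper: resize an image and derive its grayscale conditioning map."""
    image = Image.open(path).convert("RGB").resize((size, size))
    # Single-channel luminance ("L" mode) approximates a brightness map;
    # convert back to RGB so downstream tooling sees 3 channels.
    conditioning = image.convert("L").convert("RGB")
    return image, conditioning

image, conditioning = make_conditioning_pair("path/to/photo.jpg")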
Training Configuration
- Base Model: stabilityai/stable-diffusion-xl-base-1.0
- ControlNet Architecture: SDXL-based (initialized from the UNet)
- Training Resolution: 512x512
- Training Steps: 782 (1 epoch)
- Batch Size: 32 per device
- Gradient Accumulation Steps: 4 (effective batch size: 128)
- Learning Rate: 1e-5
- Mixed Precision: FP16
- Hardware: NVIDIA H100 80GB
- Training Time: ~49 minutes
- Optimizer: 8-bit Adam
Training Script
Training was performed using the diffusers ControlNet training script with the following key parameters:
accelerate launch --mixed_precision="fp16" train_controlnet_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --dataset_name="<local_dataset_path>" \
  --conditioning_image_column="conditioning_image" \
  --image_column="image" \
  --caption_column="text" \
  --mixed_precision="fp16" \
  --resolution=512 \
  --learning_rate=1e-5 \
  --train_batch_size=32 \
  --gradient_accumulation_steps=4 \
  --num_train_epochs=1 \
  --checkpointing_steps=390 \
  --enable_xformers_memory_efficient_attention \
  --use_8bit_adam
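Note that the --use_8bit_adam flag relies on the bitsandbytes package, which must be installed separately:

pip install bitsandbytes  # required for --use_8bit_adam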
Usage
Installation
pip install diffusers transformers accelerate torch
pip install xformers  # optional, required for enable_xformers_memory_efficient_attention()
Basic Usage
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel
import torch
from PIL import Image

# Load ControlNet
controlnet = ControlNetModel.from_pretrained(
    "Oysiyl/controlnet-brightness-sdxl-100k",
    torch_dtype=torch.float16
)

# Load SDXL pipeline
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16
)
pipe.enable_xformers_memory_efficient_attention()
pipe.to("cuda")

# Load your grayscale/brightness control image.
# The ControlNet expects 3 channels, so convert to RGB;
# the model was trained at 512x512, so that resolution works best.
control_image = Image.open("path/to/grayscale_image.png").convert("RGB")

# Generate image
prompt = "a beautiful landscape, highly detailed, vibrant colors"
negative_prompt = "blurry, low quality, distorted"

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=control_image,
    num_inference_steps=30,
    controlnet_conditioning_scale=0.8,  # adjust between 0.0 and 2.0
    guidance_scale=7.5,
).images[0]
image.save("output.png")
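For the recoloring use case mentioned above, the control image can be derived directly from an existing photo; the prompt then chooses the palette. A minimal sketch reusing the pipe object from the previous snippet (the path and prompt are illustrative):

from PIL import Image

# Derive a brightness map from an existing photo (illustrative path)
photo = Image.open("path/to/photo.jpg").convert("L").resize((512, 512))
control_image = photo.convert("RGB")  # back to 3 channels for the pipeline

image = pipe(
    prompt="a cozy autumn street, warm golden light, rich colors",
    negative_prompt="blurry, low quality",
    image=control_image,
    num_inference_steps=30,
    controlnet_conditioning_scale=0.8,
).images[0]
image.save("recolored.png")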
ControlNet Conditioning Scale
The controlnet_conditioning_scale parameter controls how strongly the brightness map influences the generation:
- 0.3-0.5: Weak control, more creative freedom
- 0.6-0.8: Balanced control (recommended)
- 0.9-1.2: Strong control, closely follows brightness structure
- 1.3-2.0: Very strong control, minimal deviation
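The best value depends on the control image and prompt. A quick way to choose one is to sweep the scale with a fixed seed, so only the conditioning strength varies between outputs; this sketch reuses pipe, prompt, negative_prompt, and control_image from the snippets above:

import torch

generator = torch.Generator(device="cuda")
for scale in [0.4, 0.6, 0.8, 1.0, 1.2]:
    generator.manual_seed(42)  # reset the seed so only the scale differs
    image = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        image=control_image,
        num_inference_steps=30,
        controlnet_conditioning_scale=scale,
        generator=generator,
    ).images[0]
    image.save(f"output_scale_{scale}.png")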
Limitations and Bias
Current Limitations
⚠️ Important: This model (trained for 1 epoch) has limited ability to preserve sharp geometric patterns:
- ❌ Not recommended for QR codes - The model does not preserve QR code structure reliably
- ❌ Limited geometric pattern preservation - Sharp edges and precise patterns may not be maintained
- ⚠️ Natural images only - Best suited for natural scenes, landscapes, portraits, etc.
For geometric patterns and QR codes, consider using:
- Canny ControlNet (edge detection; see the example after this list)
- Tile ControlNet (pattern coherence)
- Purpose-built QR ControlNet models
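As an illustration, a Canny ControlNet drops into the same pipeline class. The sketch below assumes the publicly available diffusers/controlnet-canny-sdxl-1.0 checkpoint and requires opencv-python; the source path and prompt are illustrative:

import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel

# Assumed checkpoint: a Canny-conditioned SDXL ControlNet
canny_controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0",
    torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=canny_controlnet,
    torch_dtype=torch.float16
).to("cuda")

# Edge maps preserve the sharp structure that brightness maps lose
source = np.array(Image.open("path/to/pattern.png").convert("RGB"))
edges = cv2.Canny(source, 100, 200)
canny_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

image = pipe(
    prompt="a mosaic wall, intricate tilework",
    image=canny_image,
    num_inference_steps=30,
).images[0]
image.save("canny_output.png")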
Known Issues
Training Duration: This model was trained for only 1 epoch. The original SD 1.5 brightness ControlNet was trained for 2 epochs, which may explain the difference in geometric pattern preservation.
Dataset Focus: Trained exclusively on natural aesthetic images, which may limit performance on synthetic/geometric content.
Bias
The model inherits biases from:
- The SDXL base model
- The grayscale_image_aesthetic_3M dataset (aesthetic-focused images)
Available Checkpoints
This repository includes multiple checkpoints from the training run (see the loading example after this list):
- Root directory: Final model (step 782, end of epoch 1)
- checkpoint-390: Mid-training checkpoint (50% completion)
- checkpoint-780: Near-final checkpoint (99% completion)
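Intermediate checkpoints can be loaded with the subfolder argument of from_pretrained. The sketch below assumes each checkpoint folder stores the weights in a controlnet subdirectory, as the diffusers training script saves them; adjust the path if the repository layout differs:

from diffusers import ControlNetModel
import torch

# Load the mid-training checkpoint instead of the final weights
controlnet = ControlNetModel.from_pretrained(
    "Oysiyl/controlnet-brightness-sdxl-100k",
    subfolder="checkpoint-390/controlnet",  # assumed layout
    torch_dtype=torch.float16
)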
Planned Improvements
🚀 Version 2 (Planned): Training for 2 epochs to match the original SD 1.5 brightness ControlNet methodology, which should improve:
- Geometric pattern preservation
- Sharp edge retention
- Overall structural coherence
Citation
If you use this model, please cite:
@misc{controlnet-brightness-sdxl-100k,
  author = {Oysiyl},
  title = {SDXL ControlNet - Brightness Control},
  year = {2025},
  publisher = {HuggingFace},
  journal = {HuggingFace Model Hub},
  howpublished = {\url{https://huggingface.co/Oysiyl/controlnet-brightness-sdxl-100k}}
}
Acknowledgments
- Training methodology inspired by latentcat's SD 1.5 brightness ControlNet
- Built with 🤗 Diffusers
- Base model: Stable Diffusion XL by Stability AI
- Dataset: grayscale_image_aesthetic_3M by latentcat
License
This model is released under the Apache 2.0 License. See the LICENSE file for details.
The base SDXL model has its own license terms which you should review at stabilityai/stable-diffusion-xl-base-1.0.