---
pipeline_tag: text-to-image
inference: false
library_name: tensorrt
license: other
license_name: stabilityai-ai-community
license_link: LICENSE.md
tags:
- tensorrt
- sd3.5-medium
- text-to-image
- onnx
extra_gated_prompt: >-
  By clicking "Agree", you agree to the [License Agreement](https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/LICENSE.md) and acknowledge Stability AI's [Privacy Policy](https://stability.ai/privacy-policy).
extra_gated_fields:
  Name: text
  Email: text
  Country: country
  Organization or Affiliation: text
  Receive email updates and promotions on Stability AI products, services, and research?:
    type: select
    options:
    - 'Yes'
    - 'No'
  What do you intend to use the model for?:
    type: select
    options:
    - Research
    - Personal use
    - Creative Professional
    - Startup
    - Enterprise
  I agree to the License Agreement and acknowledge Stability AI's Privacy Policy: checkbox
language:
- en
---

# Stable Diffusion 3.5 Medium TensorRT

## Introduction

This repository hosts the **TensorRT-optimized version** of **Stable Diffusion 3.5 Medium**, developed in collaboration between [Stability AI](https://stability.ai) and [NVIDIA](https://huggingface.co/nvidia). This implementation leverages NVIDIA's TensorRT deep learning inference library to deliver significant performance improvements while maintaining the exceptional image quality of the original model.

Stable Diffusion 3.5 Medium is a Multimodal Diffusion Transformer (MMDiT) text-to-image model with improved performance in image quality, typography, complex prompt understanding, and resource efficiency. The TensorRT optimization makes these capabilities accessible for production deployment and real-time applications.

## Model Details

### Model Description

This repository holds the ONNX exports of the T5, MMDiT, and VAE models in BF16 precision.

## Performance using TensorRT 10.13

#### Timings for 30 steps at 1024x1024

| Accelerator | Precision | CLIP-G   | CLIP-L  | T5      | MMDiT x 30 | VAE Decoder | Total      |
|-------------|-----------|----------|---------|---------|------------|-------------|------------|
| H100        | BF16      | 16.52 ms | 6.83 ms | 8.46 ms | 2358.34 ms | 72.58 ms    | 2496.63 ms |

## Usage Example

1. Follow the [setup instructions](https://github.com/NVIDIA/TensorRT/blob/release/sd35/demo/Diffusion/README.md) on launching a TensorRT NGC container.

```shell
git clone https://github.com/NVIDIA/TensorRT.git
cd TensorRT
git checkout release/sd35
docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/pytorch:25.01-py3 /bin/bash
```

2. Install libraries and requirements

```shell
cd demo/Diffusion
python3 -m pip install --upgrade pip
pip3 install -r requirements.txt
python3 -m pip install --pre --upgrade --extra-index-url https://pypi.nvidia.com tensorrt-cu12
```

3. Generate a HuggingFace user access token

To download the Stable Diffusion 3.5 Medium checkpoints, please request access on the [Stable Diffusion 3.5 Medium](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium) page. You will then need to obtain a `read` access token for the HuggingFace Hub and export it as shown below. See [instructions](https://huggingface.co/docs/hub/security-tokens).

```bash
export HF_TOKEN=
```

4. Perform TensorRT optimized inference:

- **Stable Diffusion 3.5 Medium in BF16 precision**

```
python3 demo_txt2img_sd35.py \
  "a beautiful photograph of Mt. Fuji during cherry blossom" \
  --version=3.5-medium \
  --bf16 \
  --download-onnx-models \
  --denoising-steps=30 \
  --guidance-scale 3.5 \
  --build-static-batch \
  --use-cuda-graph \
  --hf-token=$HF_TOKEN
```
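
The demo script downloads the ONNX exports (via `--download-onnx-models`) and builds the TensorRT engines internally before generating images. If you only want to build or benchmark an engine for a single ONNX export yourself, `trtexec` (shipped with the TensorRT installation in the NGC container) can be used. The sketch below is a minimal example under stated assumptions: the filenames `mmdit.onnx` and `mmdit_bf16.plan` are placeholders, and it assumes a trtexec build (TensorRT 10.x) that supports the `--bf16` flag.

```shell
# Minimal sketch with assumed filenames: build a standalone BF16 engine from one
# of the ONNX exports. Replace mmdit.onnx with the path of the export you downloaded.
trtexec --onnx=mmdit.onnx \
        --bf16 \
        --saveEngine=mmdit_bf16.plan
```

Note that this step is optional; the demo pipeline above already performs the equivalent engine builds for all components as part of inference.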