---
pipeline_tag: text-to-image
inference: false
library_name: tensorrt
license: other
license_name: stabilityai-ai-community
license_link: LICENSE.md
tags:
- tensorrt
- sd3.5-medium
- text-to-image
- onnx
extra_gated_prompt: >-
  By clicking "Agree", you agree to the [License Agreement](https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/LICENSE.md) and acknowledge Stability AI's [Privacy Policy](https://stability.ai/privacy-policy).
extra_gated_fields:
  Name: text
  Email: text
  Country: country
  Organization or Affiliation: text
  Receive email updates and promotions on Stability AI products, services, and research?:
    type: select
    options:
    - 'Yes'
    - 'No'
  What do you intend to use the model for?:
    type: select
    options:
    - Research
    - Personal use
    - Creative Professional
    - Startup
    - Enterprise
  I agree to the License Agreement and acknowledge Stability AI's Privacy Policy: checkbox
language:
- en
---

# Stable Diffusion 3.5 Medium TensorRT

## Introduction

This repository hosts the **TensorRT-optimized version** of **Stable Diffusion 3.5 Medium**, developed in collaboration between [Stability AI](https://stability.ai) and [NVIDIA](https://huggingface.co/nvidia). This implementation leverages NVIDIA's TensorRT deep learning inference library to deliver significant performance improvements while maintaining the exceptional image quality of the original model.

Stable Diffusion 3.5 Medium is a Multimodal Diffusion Transformer (MMDiT) text-to-image model with improved performance in image quality, typography, complex prompt understanding, and resource efficiency. The TensorRT optimization makes these capabilities accessible for production deployment and real-time applications.

## Model Details

### Model Description

This repository holds the ONNX exports of the T5, MMDiT, and VAE models in BF16 precision.

## Performance using TensorRT 10.13

#### Timings for 30 steps at 1024x1024

| Accelerator | Precision | CLIP-G   | CLIP-L  | T5      | MMDiT x 30 | VAE Decoder | Total      |
|-------------|-----------|----------|---------|---------|------------|-------------|------------|
| H100        | BF16      | 16.52 ms | 6.83 ms | 8.46 ms | 2358.34 ms | 72.58 ms    | 2496.63 ms |

## Usage Example

1. Follow the [setup instructions](https://github.com/NVIDIA/TensorRT/blob/release/sd35/demo/Diffusion/README.md) on launching a TensorRT NGC container.

```shell
git clone https://github.com/NVIDIA/TensorRT.git
cd TensorRT
git checkout release/sd35
docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/pytorch:25.01-py3 /bin/bash
```

2. Install libraries and requirements

```shell
cd demo/Diffusion
python3 -m pip install --upgrade pip
pip3 install -r requirements.txt
python3 -m pip install --pre --upgrade --extra-index-url https://pypi.nvidia.com tensorrt-cu12
```

3. Generate a HuggingFace user access token

To download the Stable Diffusion 3.5 Medium checkpoints, please request access on the [Stable Diffusion 3.5 Medium](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium) page. You will then need to obtain a `read` access token for the HuggingFace Hub and export it as shown below. See [instructions](https://huggingface.co/docs/hub/security-tokens).

```bash
export HF_TOKEN=
```

4. Perform TensorRT optimized inference:

- **Stable Diffusion 3.5 Medium in BF16 precision**

```
python3 demo_txt2img_sd35.py \
  "a beautiful photograph of Mt. Fuji during cherry blossom" \
  --version=3.5-medium \
  --bf16 \
  --download-onnx-models \
  --denoising-steps=30 \
  --guidance-scale 3.5 \
  --build-static-batch \
  --use-cuda-graph \
  --hf-token=$HF_TOKEN
```
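
The demo script downloads the ONNX exports (via `--download-onnx-models`) and builds the TensorRT engines internally before generating images. If you only want to build or benchmark an engine for a single ONNX export yourself, `trtexec` (shipped with the TensorRT installation in the NGC container) can be used. The sketch below is a minimal example under stated assumptions: the filenames `mmdit.onnx` and `mmdit_bf16.plan` are placeholders, and it assumes a trtexec build (TensorRT 10.x) that supports the `--bf16` flag.

```shell
# Minimal sketch with assumed filenames: build a standalone BF16 engine from one
# of the ONNX exports. Replace mmdit.onnx with the path of the export you downloaded.
trtexec --onnx=mmdit.onnx \
        --bf16 \
        --saveEngine=mmdit_bf16.plan
```

Note that this step is optional; the demo pipeline above already performs the equivalent engine builds for all components as part of inference.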