---
license: mit
datasets:
- ILSVRC/imagenet-1k
model-index:
- name: Taming-VQGAN
  results:
  - task:
      type: image-generation
    dataset:
      name: ILSVRC/imagenet-1k
      type: ILSVRC/imagenet-1k
    metrics:
    - name: rFID
      type: rFID
      value: 7.96
    - name: InceptionScore
      type: InceptionScore
      value: 115.9
    - name: LPIPS
      type: LPIPS
      value: 0.306
    - name: PSNR
      type: PSNR
      value: 20.2
    - name: SSIM
      type: SSIM
      value: 0.52
    - name: CodebookUsage
      type: CodebookUsage
      value: 0.445
---
This model is the Taming VQGAN tokenizer with a 10-bit vocabulary (1024 codebook entries), converted into a format compatible with the MaskBit codebase. It uses a downsampling factor of 16 and was trained on ImageNet at a resolution of 256×256.

You can find more details on the VQGAN in the original [repository](https://github.com/CompVis/taming-transformers) or [paper](https://arxiv.org/abs/2012.09841). All credit for this model belongs to Patrick Esser, Robin Rombach and Björn Ommer.
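
To make the tokenizer settings above concrete, here is a minimal sketch of the arithmetic they imply: how many tokens represent one image and how large the codebook is. The function name `token_grid_stats` is a hypothetical helper written for illustration. It is not part of the MaskBit or taming-transformers codebases; only the three constants come from this model card.

```python
# Values stated in this model card.
IMAGE_RESOLUTION = 256    # input images are 256x256
DOWNSAMPLING_FACTOR = 16  # spatial downsampling of the encoder
VOCAB_BITS = 10           # "10-bit" vocabulary


def token_grid_stats(resolution: int, factor: int, vocab_bits: int) -> dict:
    """Hypothetical helper: derive token-grid sizes from the tokenizer settings."""
    grid_side = resolution // factor      # tokens along one image side
    num_tokens = grid_side * grid_side    # tokens per image
    codebook_size = 2 ** vocab_bits       # number of distinct token ids
    bits_per_image = num_tokens * vocab_bits  # size of the discrete code
    return {
        "grid_side": grid_side,
        "num_tokens": num_tokens,
        "codebook_size": codebook_size,
        "bits_per_image": bits_per_image,
    }


stats = token_grid_stats(IMAGE_RESOLUTION, DOWNSAMPLING_FACTOR, VOCAB_BITS)
print(stats)
```

Under these settings each image maps to a 16×16 grid of token ids, each drawn from 1024 possible values.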