
---
license: mit
datasets:
- ILSVRC/imagenet-1k
model-index:
- name: Taming-VQGAN
  results:
  - task:
      type: image-generation
    dataset:
      name: ILSVRC/imagenet-1k
      type: ILSVRC/imagenet-1k
    metrics:
    - name: rFID
      type: rFID
      value: 7.96
    - name: InceptionScore
      type: InceptionScore
      value: 115.9
    - name: LPIPS
      type: LPIPS
      value: 0.306
    - name: PSNR
      type: PSNR
      value: 20.2
    - name: SSIM
      type: SSIM
      value: 0.52
    - name: CodebookUsage
      type: CodebookUsage
      value: 0.445
---

This model is the Taming VQGAN tokenizer with a 10-bit vocabulary (2^10 = 1024 codebook entries), converted into a format compatible with the MaskBit codebase. It uses a downsampling factor of 16 and was trained on ImageNet at a resolution of 256×256.
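
As a rough illustration of what these numbers imply, the sketch below works out the token grid and codebook size from the values stated above. It assumes square 256×256 inputs and that the downsampling factor applies to each spatial axis; the helper name `latent_grid` is hypothetical and is not part of the MaskBit or taming-transformers codebases.

```python
def latent_grid(image_resolution: int, downsampling_factor: int, vocabulary_bits: int) -> dict:
    """Derive tokenizer shape figures from the model card values.

    Assumes a square input and per-axis downsampling (an assumption,
    not something confirmed by the model card).
    """
    tokens_per_side = image_resolution // downsampling_factor
    tokens_per_image = tokens_per_side ** 2
    codebook_size = 2 ** vocabulary_bits
    return {
        "tokens_per_side": tokens_per_side,
        "tokens_per_image": tokens_per_image,
        "codebook_size": codebook_size,
        # Total bits needed to store one image as token indices.
        "bits_per_image": tokens_per_image * vocabulary_bits,
    }


# Values taken from the description above: resolution 256, factor 16, 10 bits.
stats = latent_grid(image_resolution=256, downsampling_factor=16, vocabulary_bits=10)
for key, value in stats.items():
    print(f"{key}: {value}")
```

Under these assumptions, each image maps to a 16×16 grid of 256 tokens, each drawn from 1024 possible codebook entries.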

More details on the VQGAN are available in the original [repository](https://github.com/CompVis/taming-transformers) or [paper](https://arxiv.org/abs/2012.09841). All credit for this model belongs to Patrick Esser, Robin Rombach and Björn Ommer.