| title: xVASynth TTS | |
| emoji: π§ββοΈπ§ββοΈπ§ββοΈ | |
| colorFrom: gray | |
| colorTo: gray | |
| sdk: gradio | |
| python_version: 3.9 | |
| sdk_version: 4.20.0 | |
| models: | |
| - Pendrokar/xvapitch_nvidia | |
| - Pendrokar/xvapitch_expresso | |
| - Pendrokar/TorchMoji | |
| - Pendrokar/xvasynth_lojban | |
| - Pendrokar/xvasynth_cabal | |
| app_file: app.py | |
| app_port: 7860 | |
| tags: | |
| - tts | |
| - t2s | |
| - sts | |
| - s2s | |
| pinned: true | |
| preload_from_hub: | |
| - Pendrokar/xvapitch_nvidia | |
| - Pendrokar/xvapitch_expresso | |
| - Pendrokar/TorchMoji | |
| - Pendrokar/xvasynth_lojban | |
| - Pendrokar/xvasynth_cabal | |
| license: gpl-3.0 | |
| thumbnail: https://huggingface.co/spaces/Pendrokar/xVASynth/raw/main/thumbnail.png | |
| short_description: CPU powered, low RTF, emotional, multilingual TTS | |
| DanRuta's xVASynth, GitHub repo: [https://github.com/DanRuta/xVA-Synth](https://github.com/DanRuta/xVA-Synth) | |
| Papers: | |
| - VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech - https://arxiv.org/abs/2106.06103 | |
| - YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for Everyone - https://arxiv.org/abs/2112.02418 | |
| Referenced papers within code: | |
| - Multi-head attention with Relative Positional embedding - https://arxiv.org/pdf/1809.04281.pdf | |
| - Transformer with Relative Positional Encoding - https://arxiv.org/abs/1803.02155 | |
| - SDP (Stochastic Duration Predictor, from the VITS paper) - https://arxiv.org/pdf/2106.06103.pdf | |
| - Spline Flow - https://arxiv.org/abs/1906.04032 | |
| Extra: | |
| - DeepMoji - https://arxiv.org/abs/1708.00524 | |
| xVA FastPitch: | |
| - [1] [FastPitch: Parallel Text-to-speech with Pitch Prediction](https://arxiv.org/abs/2006.06873) | |
| - [2] [One TTS Alignment To Rule Them All](https://arxiv.org/abs/2108.10447) | |
| Used datasets: Unknown/Non-permissible data |
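
The short description above mentions a low RTF (real-time factor): the time spent synthesizing divided by the duration of the audio produced, so values below 1.0 mean faster than real time. As a minimal sketch of how that figure could be measured, the helper names `real_time_factor` and `measure_rtf` below are hypothetical, and `synthesize` stands in for any callable that returns the number of audio samples it generated. None of these names come from xVASynth itself.

```python
import time
from typing import Callable


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return RTF = synthesis time / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def measure_rtf(synthesize: Callable[[str], int], text: str, sample_rate: int) -> float:
    """Time one call to a hypothetical synthesize(text) and return its RTF.

    `synthesize` is assumed to return the number of audio samples produced.
    """
    start = time.perf_counter()
    num_samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, num_samples / sample_rate)
```

For example, `real_time_factor(0.5, 2.0)` describes half a second of compute for two seconds of speech. Averaging several runs of `measure_rtf` would give a steadier figure than a single call.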