DC-TTS Geralt Voice Model

A Deep Convolutional Text-to-Speech (DC-TTS) model trained to synthesize speech in the voice of Geralt of Rivia from The Witcher series.

Model Description

This model is part of the Deepstory project, which combines Natural Language Generation, Text-to-Speech, and animation technologies to create interactive storytelling experiences.

The DC-TTS architecture is based on the paper:

Hideyuki Tachibana, Katsuya Uenoyama, Shunsuke Aihara. "Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention" (arXiv:1710.08969)

Model Architecture

This model consists of two components:

Text2Mel Network

Converts text input to mel-spectrograms.

| Parameter | Value |
|---|---|
| Embedding Dimension (e) | 128 |
| Hidden Unit Dimension (d) | 512 |
| Vocabulary | PE abcdefghijklmnopqrstuvwxyz'.,!? |
| Max Characters (N) | 259 |
| Max Mel Frames (T) | 326 |
| Basic Block Type | Gated Convolution |
| Normalization | Layer Normalization |
| Dropout Rate | 0.05 |
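
To illustrate how the vocabulary above turns text into model input, here is a minimal sketch of character-to-index encoding. The vocabulary string and the limit of 259 characters come from the table; the names `VOCAB`, `CHAR2IDX`, and `encode` are hypothetical, and treating "P" as a padding symbol and counting the "E" (EOS) marker toward the limit are assumptions rather than confirmed details of this project.

```python
# Vocabulary copied from the Text2Mel parameter table.
# "E" marks end of sequence; "P" is assumed to be a padding symbol.
VOCAB = "PE abcdefghijklmnopqrstuvwxyz'.,!?"
CHAR2IDX = {ch: i for i, ch in enumerate(VOCAB)}
MAX_CHARS = 259  # Max Characters (N) from the table


def encode(text):
    """Map already-normalized lowercase text to vocabulary indices, appending EOS."""
    sequence = text + "E"
    # Assumption: the EOS marker counts toward the character limit.
    if len(sequence) > MAX_CHARS:
        raise ValueError(f"input too long: {len(sequence)} > {MAX_CHARS}")
    return [CHAR2IDX[ch] for ch in sequence]


print(encode("hi!"))  # [10, 11, 32, 1]
```

Any character outside the vocabulary raises a `KeyError` here, which is why input text has to be normalized before encoding.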

SSRN (Spectrogram Super-Resolution Network)

Upsamples mel-spectrograms to full spectrograms for audio synthesis.

| Parameter | Value |
|---|---|
| Hidden Unit Dimension (c) | 640 (512 + 128) |
| Number of Mel Bins (f) | 80 |
| FFT Points | 2048 |
| Full Spectrogram Dimension | 1025 |
| Reduction Rate | 4 |
| Basic Block Type | Residual |
| Normalization | Weight Normalization |
| Weight Initialization | Kaiming |
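
The figures in the two tables are related by simple arithmetic, sketched below. The sketch assumes that the SSRN stretches the time axis by the reduction rate, as described in the DC-TTS paper, and that each full-resolution frame advances by the hop length listed under Audio Parameters. The function names are hypothetical.

```python
N_FFT = 2048            # FFT Points
REDUCTION_RATE = 4      # Reduction Rate
MAX_MEL_FRAMES = 326    # Max Mel Frames (T) from the Text2Mel table
SAMPLE_RATE = 22050     # Hz, from Audio Parameters
HOP_LENGTH = 276        # samples, from Audio Parameters


def ssrn_output_shape(mel_frames):
    """Return (frequency bins, frames) of the full spectrogram for an (80, mel_frames) input."""
    return (N_FFT // 2 + 1, mel_frames * REDUCTION_RATE)


def approx_duration_seconds(mel_frames):
    """Estimate audio length, assuming one hop per full-resolution frame."""
    return mel_frames * REDUCTION_RATE * HOP_LENGTH / SAMPLE_RATE


print(ssrn_output_shape(MAX_MEL_FRAMES))                  # (1025, 1304)
print(round(approx_duration_seconds(MAX_MEL_FRAMES), 2))  # 16.32
```

Under these assumptions, the 326-frame limit corresponds to roughly 16 seconds of audio per utterance.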

Audio Parameters

| Parameter | Value |
|---|---|
| Sample Rate | 22050 Hz |
| Frame Shift | 0.0125 s (12.5 ms) |
| Frame Length | 0.05 s (50 ms) |
| Hop Length | 276 samples |
| Win Length | 1102 samples |
| Power | 1.5 |
| Preemphasis | 0.97 |
| Max dB | 100 |
| Reference dB | 20 |
| Griffin-Lim Iterations | 50 |
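
As a rough illustration of how the dB and pre-emphasis parameters are typically applied when a predicted spectrogram is turned back into audio, the sketch below follows the convention of the widely used reference DC-TTS implementation. It is an assumption that `spectrogram2wav` in this project does the same; the function names are hypothetical, and the Griffin-Lim step (where the magnitude would be raised to the Power value before phase reconstruction) is omitted.

```python
MAX_DB = 100        # Max dB
REF_DB = 20         # Reference dB
PREEMPHASIS = 0.97  # Preemphasis coefficient


def denormalize(value):
    """Map a normalized magnitude in [0, 1] back to a linear amplitude."""
    clipped = min(max(value, 0.0), 1.0)
    decibels = clipped * MAX_DB - MAX_DB + REF_DB
    return 10.0 ** (decibels / 20.0)


def deemphasize(samples, coeff=PREEMPHASIS):
    """Invert the pre-emphasis filter y[n] = x[n] - coeff * x[n-1]."""
    restored = []
    previous = 0.0
    for sample in samples:
        previous = sample + coeff * previous
        restored.append(previous)
    return restored
```

With these values, a normalized magnitude of 1.0 maps to +20 dB and 0.0 maps to -80 dB before conversion to linear amplitude.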

Files

  • t2m_step-102000_first.pth - Text2Mel model checkpoint
  • ssrn.pth - SSRN model checkpoint

Usage

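import numpy as np
import torch
from modules.dctts import Text2Mel, SSRN, hp, spectrogram2wav

# normalize_text is used below but is not imported in the original snippet.
# It must be supplied by the caller and return text that contains only
# characters from hp.vocab.

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load both checkpoints and switch the networks to inference mode
text2mel = Text2Mel(hp.vocab).to(device).eval()
text2mel.load_state_dict(torch.load('t2m_step-102000_first.pth', map_location=device)['state_dict'])

ssrn = SSRN().to(device).eval()
ssrn.load_state_dict(torch.load('ssrn.pth', map_location=device)['state_dict'])

# Synthesize speech
def synthesize(text, timeout=10000):
    """Return a waveform for `text`; `timeout` caps the number of decoding steps."""
    normalized_text = normalize_text(text) + "E"  # E: EOS
    # np.int64 is used instead of np.long, which several NumPy releases do not provide
    L = torch.from_numpy(np.array([[hp.char2idx[char] for char in normalized_text]], np.int64)).to(device)
    zeros = torch.from_numpy(np.zeros((1, hp.n_mels, 1), np.float32)).to(device)
    Y = zeros

    with torch.no_grad():
        # Generate mel frames autoregressively, one step per iteration
        for i in range(timeout):
            _, Y_t, A = text2mel(L, Y, monotonic_attention=True)
            Y = torch.cat((zeros, Y_t), -1)
            # Stop once the latest frame attends most strongly to the EOS character
            _, attention = torch.max(A[0, :, -1], 0)
            if L[0, attention.item()] == hp.vocab.index('E'):
                break

        # Upsample the mel-spectrogram to a full spectrogram
        _, Z = ssrn(Y)
        Z = Z.cpu().numpy()

    # Convert the spectrogram to a waveform (Griffin-Lim)
    wav = spectrogram2wav(Z[0, :, :].T)
    return wav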
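
The `synthesize` function returns a waveform rather than a file. A minimal sketch for writing such a waveform to disk with only the Python standard library follows. It assumes the samples are floats in the range [-1, 1] and uses the 22050 Hz sample rate listed under Audio Parameters; `save_wav` is a hypothetical helper, not part of Deepstory.

```python
import struct
import wave


def save_wav(samples, path, sample_rate=22050):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # guard against overshoot
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)        # mono
        wav_file.setsampwidth(2)        # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
```

For example, `save_wav(synthesize("wind's howling."), "geralt.wav")` would store the result, provided the returned array matches the assumed value range.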

Training Data

The model was trained on audio samples of Geralt's voice from The Witcher 3: Wild Hunt video game.

Intended Use

This model is intended for:

  • Research and experimentation in speech synthesis
  • Creative projects and fan content
  • Educational purposes

Limitations

  • The model works best with English text
  • Vocabulary is limited to lowercase letters and basic punctuation
  • Audio quality may vary depending on input text complexity
  • The character voice is based on copyrighted material
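
Because the vocabulary is restricted to lowercase letters and basic punctuation, input text has to be reduced to that character set before synthesis. The sketch below shows one way to do so: it strips accents, lowercases, replaces unsupported characters with spaces, and collapses repeated spaces. `simple_normalize` is a hypothetical stand-in and is not claimed to match the project's own `normalize_text`.

```python
import re
import unicodedata

# Characters the model accepts, excluding the special "P" and "E" symbols.
ALLOWED = "abcdefghijklmnopqrstuvwxyz'.,!? "


def simple_normalize(text):
    """Reduce arbitrary text to the model's lowercase character set."""
    # Decompose accented letters and drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    lowered = stripped.lower()
    # Replace anything outside the vocabulary with a space.
    filtered = "".join(ch if ch in ALLOWED else " " for ch in lowered)
    # Collapse runs of spaces and trim the ends.
    return re.sub(r" +", " ", filtered).strip()


print(simple_normalize("Wind's Howling!"))  # wind's howling!
```

Digits and symbols are dropped rather than spelled out in this sketch, so numbers would need to be written as words beforehand.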

Citation

If you use this model, please cite the original DC-TTS paper and the Deepstory project:

@article{tachibana2018efficiently,
  title={Efficiently trainable text-to-speech system based on deep convolutional networks with guided attention},
  author={Tachibana, Hideyuki and Uenoyama, Katsuya and Aihara, Shunsuke},
  journal={arXiv preprint arXiv:1710.08969},
  year={2018}
}

@misc{deepstory,
  author = {Siu King Wai},
  title = {Deepstory},
  year = {2020},
  publisher = {GitHub},
  url = {https://github.com/thetobysiu/deepstory}
}

License

This model is released under the MIT License. Please note that the voice characteristics are based on copyrighted material from The Witcher 3: Wild Hunt.

Acknowledgments
