DC-TTS Geralt Voice Model

A Deep Convolutional Text-to-Speech (DC-TTS) model trained to synthesize speech in the voice of Geralt of Rivia from The Witcher series.

Model Description

This model is part of the Deepstory project, which combines Natural Language Generation, Text-to-Speech, and animation technologies to create interactive storytelling experiences.

The DC-TTS architecture is based on the paper:

Hideyuki Tachibana, Katsuya Uenoyama, Shunsuke Aihara. "Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention" (arXiv:1710.08969)

Model Architecture

This model consists of two components:

Text2Mel Network

Converts text input to mel-spectrograms.

Parameter	Value
Embedding Dimension (e)	128
Hidden Unit Dimension (d)	512
Vocabulary	`PE abcdefghijklmnopqrstuvwxyz'.,!?`
Max Characters (N)	259
Max Mel Frames (T)	326
Basic Block Type	Gated Convolution
Normalization	Layer Normalization
Dropout Rate	0.05
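
The vocabulary string above also serves as the character-to-index table: the usage code looks characters up through `hp.char2idx`. A minimal sketch of that mapping follows, assuming each index is simply the character's position in the vocabulary string and that `P` is the padding symbol (the usage code only confirms `E` as the end-of-sequence marker). The names `VOCAB`, `CHAR2IDX`, and `encode` are illustrative and not part of the Deepstory API.

```python
# Illustrative sketch: VOCAB mirrors the table above, helper names are hypothetical.
VOCAB = "PE abcdefghijklmnopqrstuvwxyz'.,!?"

# Assumption: a character's index is its position in the vocabulary string.
CHAR2IDX = {char: idx for idx, char in enumerate(VOCAB)}

# Characters a caller may pass in: everything except the P and E markers.
ALLOWED = set(VOCAB[2:])


def encode(text, pad_to=None):
    """Map already-normalized text to indices and append the EOS marker 'E'.

    If pad_to is given, the result is right-padded with the index of 'P'
    (assumed here to be the padding symbol).
    """
    unknown = sorted(set(text) - ALLOWED)
    if unknown:
        raise ValueError(f"characters outside the vocabulary: {unknown}")
    ids = [CHAR2IDX[char] for char in text] + [CHAR2IDX["E"]]
    if pad_to is not None:
        ids += [CHAR2IDX["P"]] * (pad_to - len(ids))
    return ids


print(encode("hi, geralt!"))
```

Rejecting unknown characters up front, rather than skipping them silently, makes it obvious when the input still needs normalization.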

SSRN (Spectrogram Super-Resolution Network)

Upsamples the coarse mel-spectrogram produced by Text2Mel to a full-resolution linear spectrogram, which is then converted to audio with Griffin-Lim.

Parameter	Value
Hidden Unit Dimension (c)	640 (512 + 128)
Number of Mel Bins (f)	80
FFT Points	2048
Full Spectrogram Dimension	1025
Reduction Rate	4
Basic Block Type	Residual
Normalization	Weight Normalization
Weight Initialization	Kaiming
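
The SSRN dimensions follow from simple arithmetic on the values listed above. The sketch below shows how the full spectrogram dimension falls out of the FFT size. The time-axis upsampling by the reduction rate is an assumption taken from the DC-TTS paper's description of SSRN, and `ssrn_output_shape` is a hypothetical helper name.

```python
# Values copied from the tables above.
N_MELS = 80
N_FFT = 2048
REDUCTION_RATE = 4


def ssrn_output_shape(mel_frames):
    """Expected SSRN output shape for an input of shape (N_MELS, mel_frames)."""
    # A one-sided STFT of N_FFT points has N_FFT // 2 + 1 frequency bins.
    n_freq = N_FFT // 2 + 1
    # Assumption (from the paper): SSRN stretches the time axis by the reduction rate.
    return (n_freq, mel_frames * REDUCTION_RATE)


print(ssrn_output_shape(326))
```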

Audio Parameters

Parameter	Value
Sample Rate	22050 Hz
Frame Shift	0.0125s (12.5ms)
Frame Length	0.05s (50ms)
Hop Length	276 samples
Win Length	1102 samples
Power	1.5
Preemphasis	0.97
Max dB	100
Reference dB	20
Griffin-Lim Iterations	50
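
The hop and window lengths are the frame shift and frame length expressed in samples. As a quick check at the listed sample rate, the sketch below computes both with exact fractions: the window length comes out as 1102 either way, while the hop length is 275 when truncated and 276 when rounded. Which convention the Deepstory code applies is not stated in this card, so the helper name and the choice between the two are assumptions.

```python
from fractions import Fraction

# Values copied from the Audio Parameters table.
SAMPLE_RATE = 22050
FRAME_SHIFT = Fraction("0.0125")  # seconds
FRAME_LENGTH = Fraction("0.05")   # seconds


def seconds_to_samples(seconds, sample_rate=SAMPLE_RATE):
    """Convert a duration to samples, reporting both integer conventions."""
    exact = seconds * sample_rate  # exact rational value, no float error
    return {
        "exact": float(exact),
        "truncated": int(exact),   # what int(sr * seconds) would give
        "rounded": round(exact),   # nearest integer, ties to even
    }


hop = seconds_to_samples(FRAME_SHIFT)
win = seconds_to_samples(FRAME_LENGTH)
print("hop length:", hop)
print("win length:", win)
```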

Files

t2m_step-102000_first.pth - Text2Mel model checkpoint
ssrn.pth - SSRN model checkpoint

Usage

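import numpy as np
import torch
from modules.dctts import Text2Mel, SSRN, hp, spectrogram2wav

# normalize_text is used below but not imported in this snippet; it is
# assumed to be provided by the Deepstory codebase.

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load both checkpoints and switch the networks to inference mode
text2mel = Text2Mel(hp.vocab).to(device).eval()
text2mel.load_state_dict(torch.load('t2m_step-102000_first.pth', map_location=device)['state_dict'])

ssrn = SSRN().to(device).eval()
ssrn.load_state_dict(torch.load('ssrn.pth', map_location=device)['state_dict'])

# Synthesize speech
def synthesize(text, timeout=10000):
    """Return a waveform for `text`; `timeout` caps the number of decoding steps."""
    normalized_text = normalize_text(text) + "E"  # E: EOS
    # Character indices, shape (1, N). np.int64 replaces np.long, which was removed in NumPy 1.24.
    L = torch.from_numpy(np.array([[hp.char2idx[char] for char in normalized_text]], np.int64)).to(device)
    # A single all-zero mel frame seeds the autoregressive decoder
    zeros = torch.from_numpy(np.zeros((1, hp.n_mels, 1), np.float32)).to(device)
    Y = zeros

    with torch.no_grad():
        for i in range(timeout):
            _, Y_t, A = text2mel(L, Y, monotonic_attention=True)
            Y = torch.cat((zeros, Y_t), -1)
            # Stop once the newest frame attends most strongly to the EOS character
            _, attention = torch.max(A[0, :, -1], 0)
            if L[0, attention.item()] == hp.vocab.index('E'):
                break

        # Upsample the mel-spectrogram to a full spectrogram
        _, Z = ssrn(Y)
        Z = Z.cpu().numpy()

    # Invert the spectrogram to a waveform
    wav = spectrogram2wav(Z[0, :, :].T)
    return wav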
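
`synthesize` returns the waveform as an array of samples rather than a file. One way to store it, sketched here with only the Python standard library, is to write 16-bit PCM at the model's 22050 Hz sample rate. This assumes the samples are floats roughly within [-1, 1], which the snippet above does not confirm; `save_wav` is a hypothetical helper and not part of Deepstory.

```python
import struct
import wave

SAMPLE_RATE = 22050  # from the Audio Parameters table


def save_wav(samples, path, sample_rate=SAMPLE_RATE):
    """Write an iterable of float samples to a mono 16-bit PCM WAV file."""
    # Clip to [-1, 1] (assumed input range) and scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
```

With the models loaded as shown earlier, a call such as `save_wav(synthesize("wind's howling."), "geralt.wav")` would write the result to disk.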

Training Data

The model was trained on audio samples of Geralt's voice from The Witcher 3: Wild Hunt video game.

Intended Use

This model is intended for:

Research and experimentation in speech synthesis
Creative projects and fan content
Educational purposes

Limitations

The model works best with English text
Vocabulary is limited to lowercase letters a-z, spaces, and the punctuation marks ' . , ! ? (digits and other symbols are not supported)
Audio quality may vary depending on input text complexity
The character voice is based on copyrighted material
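
Because of the vocabulary limit listed above, input text has to be reduced to the supported characters before synthesis. The project's own `normalize_text` is not shown in this card, so the following is only a plausible stand-in: it strips accents, lowercases, replaces unsupported characters with spaces, and collapses repeated spaces. The function name `simple_normalize` and these exact rules are assumptions. In particular, digits are dropped here rather than spelled out.

```python
import re
import unicodedata


def simple_normalize(text):
    """Reduce text to lowercase a-z, space, and the punctuation ' . , ! ?"""
    # Decompose accented characters and drop the combining marks ("é" -> "e").
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    lowered = stripped.lower()
    # Replace anything outside the supported character set with a space.
    cleaned = re.sub(r"[^a-z'.,!? ]", " ", lowered)
    # Collapse runs of spaces and trim the ends.
    return re.sub(r" +", " ", cleaned).strip()


print(simple_normalize("Wind's HOWLING... Café #3!"))
```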

Citation

If you use this model, please cite the original DC-TTS paper and the Deepstory project:

@article{tachibana2018efficiently,
  title={Efficiently trainable text-to-speech system based on deep convolutional networks with guided attention},
  author={Tachibana, Hideyuki and Uenoyama, Katsuya and Aihara, Shunsuke},
  journal={arXiv preprint arXiv:1710.08969},
  year={2018}
}

@misc{deepstory,
  author = {Siu King Wai},
  title = {Deepstory},
  year = {2020},
  publisher = {GitHub},
  url = {https://github.com/thetobysiu/deepstory}
}

License

This model is released under the MIT License. Please note that the voice characteristics are based on copyrighted material from The Witcher 3: Wild Hunt.

Acknowledgments

Original DC-TTS implementation: tugstugi/pytorch-dc-tts
The Witcher 3: Wild Hunt by CD Projekt Red
