VoXtream2 is a zero-shot, full-stream TTS model with dynamic speaking-rate control: the rate can be changed on the fly, mid-utterance.
python voxtream/run.py \
--prompt-audio assets/audio/english_male.wav \
--text "In general, however, some method is then needed to evaluate each approximation." \
--output "output_stream.wav"
python voxtream/run.py \
--prompt-audio assets/audio/english_female.wav \
--text "Staff do not always do enough to prevent violence." \
--output "full_stream_2sps.wav" \
--full-stream \
--spk-rate 2.0
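To run the same full-stream command at several speaking rates, the invocation can be assembled programmatically. The sketch below is only an illustration: it uses the flags shown in the commands above (`--prompt-audio`, `--text`, `--output`, `--full-stream`, `--spk-rate`), while the helper name `build_voxtream_command`, the rate values 3.0 and 4.0, and the output file naming are assumptions, not part of the project. The accepted range and unit of `--spk-rate` are not stated here, so check them before relying on these values.

```python
import shlex
from typing import List, Optional


def build_voxtream_command(
    prompt_audio: str,
    text: str,
    output: str,
    full_stream: bool = False,
    spk_rate: Optional[float] = None,
) -> List[str]:
    """Build the argument list for voxtream/run.py (hypothetical helper).

    Only the flags that appear in the example commands are used.
    """
    cmd = [
        "python", "voxtream/run.py",
        "--prompt-audio", prompt_audio,
        "--text", text,
        "--output", output,
    ]
    if full_stream:
        cmd.append("--full-stream")
    if spk_rate is not None:
        cmd += ["--spk-rate", str(spk_rate)]
    return cmd


# Illustrative rates; 2.0 comes from the example above, the others are assumed.
commands = [
    build_voxtream_command(
        prompt_audio="assets/audio/english_female.wav",
        text="Staff do not always do enough to prevent violence.",
        output=f"full_stream_{rate:g}sps.wav",
        full_stream=True,
        spk_rate=rate,
    )
    for rate in (2.0, 3.0, 4.0)
]

# Print shell-ready commands instead of executing them.
for cmd in commands:
    print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, or the argument list can be passed to `subprocess.run(cmd, check=True)` from the repository root once the values have been checked.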
The model was trained on the Emilia and HiFiTTS2 datasets.
Any organization or individual is prohibited from using any technology mentioned in this paper to generate someone's speech without his/her consent, including but not limited to government leaders, political figures, and celebrities. If you do not comply with this item, you could be in violation of copyright laws.