E2/F5 TTS
This is an unofficial demo of the following TTS models:
- E2-TTS (Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS)
- F5-TTS (A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching)
This demo is based on the F5-TTS codebase, which is based on an unofficial E2-TTS implementation.
The checkpoints support English and Chinese.
If you're having issues, try converting your reference audio to WAV or MP3, clipping it to 15s, and shortening your prompt. If you're still running into issues, please open a community Discussion.
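The conversion and clipping step can be scripted. The following is a minimal sketch, not part of the demo: it assumes ffmpeg is on your PATH, and the helper names `build_ffmpeg_command` and `prepare_reference`, as well as the `_clip.wav` suffix, are hypothetical choices made for illustration.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst, max_seconds=15):
    """Return an ffmpeg argument list that converts src to dst, keeping the first max_seconds."""
    return [
        "ffmpeg",
        "-y",                    # overwrite dst if it already exists
        "-i", str(src),          # input file in any format ffmpeg can decode
        "-t", str(max_seconds),  # keep only the first max_seconds of audio
        str(dst),                # output format is inferred from the extension
    ]


def prepare_reference(src, max_seconds=15):
    """Write a clipped WAV copy next to src and return its path (requires ffmpeg on PATH)."""
    src = Path(src)
    dst = src.with_name(src.stem + "_clip.wav")  # hypothetical naming scheme
    subprocess.run(build_ffmpeg_command(src, dst, max_seconds), check=True)
    return dst
```

Calling `prepare_reference("my_voice.m4a")` would write `my_voice_clip.wav` next to the input, which can then be uploaded as the reference audio.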
The model is licensed under CC-BY-NC, so this demo cannot be used for commercial purposes.
NOTE: Reference text will be automatically transcribed with Whisper if not provided. For best results, keep your reference clips short (<15s). Ensure the audio is fully uploaded before generating.
The model tends to produce silences, especially on longer audio. The demo can remove these silences if needed. Note that silence removal is an experimental feature: it may produce strange results, and it increases generation time.
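To illustrate the general idea of silence removal, here is a minimal sketch that shortens long runs of near-silent samples. It is not the demo's implementation: the function name `trim_long_silences`, the amplitude threshold, and the maximum silence length are all assumptions chosen for the example, and the input is assumed to be a list of floats in the range -1.0 to 1.0.

```python
def trim_long_silences(samples, sample_rate, threshold=0.01, max_silence_s=0.5):
    """Shorten every run of near-silent samples to at most max_silence_s seconds."""
    max_run = int(max_silence_s * sample_rate)  # allowed silent samples per run
    out = []
    run = 0  # length of the current silent run
    for s in samples:
        if abs(s) < threshold:
            run += 1
            if run > max_run:
                continue  # drop samples beyond the allowed silence length
        else:
            run = 0  # a voiced sample ends the silent run
        out.append(s)
    return out
```

A simple amplitude threshold like this can also clip quiet speech, which is one way such a feature can produce strange results.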
Run Locally
Run this demo locally on CPU, CUDA, or MPS on Apple Silicon (requires macOS >= 14).
First, ensure ffmpeg is installed, then run:
git clone https://huggingface.co/spaces/mrfakename/E2-F5-TTS
cd E2-F5-TTS
python -m pip install -r requirements.txt
python app_local.py
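Before launching, it can help to confirm that ffmpeg is reachable and to see which device is available. The sketch below is an assumption-laden illustration rather than code from the demo: it assumes PyTorch is the backend, the names `pick_device` and `preflight` are hypothetical, and the CUDA, then MPS, then CPU preference order is a common convention that the demo may or may not follow.

```python
import shutil


def pick_device(cuda_available, mps_available):
    """Prefer CUDA, then MPS, then fall back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def preflight():
    """Report whether ffmpeg is on PATH and which device PyTorch could use."""
    has_ffmpeg = shutil.which("ffmpeg") is not None
    try:
        import torch  # assumption: PyTorch is installed as the demo's backend
        device = pick_device(torch.cuda.is_available(),
                             torch.backends.mps.is_available())
    except ImportError:
        device = "cpu"  # without PyTorch there is nothing to query
    return has_ffmpeg, device
```

Running `print(preflight())` from the cloned directory prints a pair such as whether ffmpeg was found and the name of the preferred device.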
Unofficial demo by mrfakename