Kokoro v1
Model Capabilities
- Text-to-speech: 24 kHz mono output
- Multilingual: American/British English, Spanish, French, Hindi, Italian, Japanese, Brazilian Portuguese, Mandarin Chinese
- 54 voices across 9 languages, naming convention <lang><gender>_<name> (e.g. af_heart, bm_george, jf_alpha)
The full description can be found on the original model page.
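If you need to pick voices programmatically, the naming convention can be split into its parts. The following is a minimal sketch, not part of NobodyWho: `VoiceId`, `parse_voice_id` and `language_name` are hypothetical helpers, the gender letters `f`/`m` are inferred from the example ids, and the language-letter table is an assumption based on the upstream Kokoro voice list, so check it against the original model page.

```rust
/// Parsed form of a voice id such as `af_heart` (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct VoiceId<'a> {
    lang: char,
    gender: char,
    name: &'a str,
}

/// Splits `<lang><gender>_<name>` into its parts.
/// Returns `None` if the id does not match that shape.
fn parse_voice_id(id: &str) -> Option<VoiceId<'_>> {
    let (prefix, name) = id.split_once('_')?;
    let mut chars = prefix.chars();
    let lang = chars.next()?;
    let gender = chars.next()?;
    // The prefix must be exactly two characters and the name non-empty.
    // Assumption: gender is encoded as 'f' or 'm', as in the example ids.
    if chars.next().is_some() || name.is_empty() || !matches!(gender, 'f' | 'm') {
        return None;
    }
    Some(VoiceId { lang, gender, name })
}

/// Assumed mapping from language letter to language name.
/// Not confirmed by this model card; verify against the original model page.
fn language_name(lang: char) -> Option<&'static str> {
    match lang {
        'a' => Some("American English"),
        'b' => Some("British English"),
        'e' => Some("Spanish"),
        'f' => Some("French"),
        'h' => Some("Hindi"),
        'i' => Some("Italian"),
        'j' => Some("Japanese"),
        'p' => Some("Brazilian Portuguese"),
        'z' => Some("Mandarin Chinese"),
        _ => None,
    }
}
```

For example, `parse_voice_id("bm_george")` yields language letter `b`, gender `m` and name `george`, while an id without an underscore yields `None`.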
Getting Started
Run with NobodyWho (the model is fetched and cached on first use):
use nobodywho::tts::{Tts, TtsConfig};
let tts = Tts::new(TtsConfig::kokoro("NobodyWho/kokoro-v1"))?;
let wav = tts.synthesize("Hello from NobodyWho!")?;
std::fs::write("hello.wav", wav)?;
Benchmarks
Measured with NobodyWho on Apple M4 Pro, CPU:
| Input | Audio | Wall-clock time | Real-time factor |
|---|---|---|---|
| 10 words | 3.3s | ~0.48s | 6.9× |
| 30 words | 10.9s | ~1.35s | 8.1× |
| 70 words | 18.7s | ~2.26s | 8.3× |
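The real-time factor in the table is the audio duration divided by the wall-clock synthesis time, so values above 1 mean faster than real time. A small sketch of that arithmetic follows, useful if you want to reproduce the numbers on your own hardware; `real_time_factor` and `sample_count` are hypothetical helper names, and the only model fact used is the 24 kHz mono output rate stated above.

```rust
/// Output sample rate stated on this model card (24 kHz, mono).
const SAMPLE_RATE_HZ: f64 = 24_000.0;

/// Real-time factor: seconds of audio produced per second of wall-clock time.
fn real_time_factor(audio_secs: f64, wall_secs: f64) -> f64 {
    audio_secs / wall_secs
}

/// Number of mono samples in a clip of the given duration,
/// rounded to the nearest whole sample.
fn sample_count(audio_secs: f64) -> usize {
    (audio_secs * SAMPLE_RATE_HZ).round() as usize
}
```

For the first row, `real_time_factor(3.3, 0.48)` gives 6.875, which the table rounds to 6.9×.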
Credits
Original model and training by @hexgrad. Thanks!