Kokoro v1

Model Capabilities

Text-to-speech — 24 kHz mono output
Multilingual — American/British English, Spanish, French, Hindi, Italian, Japanese, Brazilian Portuguese, Mandarin Chinese
54 voices across 9 languages, named by the convention <lang><gender>_<name> (e.g. af_heart, bm_george, jf_alpha)
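The voice naming convention can be decoded mechanically. The sketch below splits a voice ID into language, gender, and name. The helper names (`VoiceId`, `parse_voice`, `language_for`) are hypothetical and not part of NobodyWho. The f/m gender letters follow from the examples above. The single-letter language codes beyond a, b, and j (e, f, h, i, p, z) are an assumption taken from the upstream Kokoro voice list.

```rust
/// Hypothetical helper: a Kokoro voice ID such as "af_heart" split into its parts,
/// following the <lang><gender>_<name> convention.
#[derive(Debug, PartialEq)]
struct VoiceId<'a> {
    language: &'static str,
    gender: &'static str,
    name: &'a str,
}

/// Assumed letter-to-language mapping, based on the upstream Kokoro voice list.
fn language_for(code: char) -> Option<&'static str> {
    match code {
        'a' => Some("American English"),
        'b' => Some("British English"),
        'e' => Some("Spanish"),
        'f' => Some("French"),
        'h' => Some("Hindi"),
        'i' => Some("Italian"),
        'j' => Some("Japanese"),
        'p' => Some("Brazilian Portuguese"),
        'z' => Some("Mandarin Chinese"),
        _ => None,
    }
}

/// Returns None if the ID does not match <lang><gender>_<name>.
fn parse_voice(id: &str) -> Option<VoiceId<'_>> {
    // Split once on '_' into the two-letter prefix and the voice name.
    let (prefix, name) = id.split_once('_')?;
    let mut chars = prefix.chars();
    let (lang, gender) = (chars.next()?, chars.next()?);
    // The prefix must be exactly two characters and the name non-empty.
    if chars.next().is_some() || name.is_empty() {
        return None;
    }
    let gender = match gender {
        'f' => "female",
        'm' => "male",
        _ => return None,
    };
    Some(VoiceId { language: language_for(lang)?, gender, name })
}

fn main() {
    for id in ["af_heart", "bm_george", "jf_alpha"] {
        println!("{id} -> {:?}", parse_voice(id));
    }
}
```

Unknown language letters and malformed IDs yield None rather than a guess, so a typo in a voice name surfaces before it reaches the model.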

The full description can be found on the original model page, hexgrad/Kokoro-82M.

Getting Started

Run with NobodyWho (the model is fetched and cached on first use):

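use nobodywho::tts::{Tts, TtsConfig};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load Kokoro; the model is fetched and cached on first use.
    let tts = Tts::new(TtsConfig::kokoro("NobodyWho/kokoro-v1"))?;
    // Synthesize speech and save the resulting WAV data to disk.
    let wav = tts.synthesize("Hello from NobodyWho!")?;
    std::fs::write("hello.wav", wav)?;
    Ok(())
}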
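The return type of `synthesize` is not documented here. Assuming it yields a standard RIFF/WAVE byte buffer (it is written straight to hello.wav), the std-only sketch below reads the `fmt ` and `data` chunks to confirm the 24 kHz mono output and compute the clip length. `WavInfo`, `wav_info`, and the test-input builder `silent_wav` are hypothetical helpers, not NobodyWho APIs, and the parser only handles the plain `fmt `/`data` chunk layout.

```rust
/// Format details read from a RIFF/WAVE byte buffer.
#[derive(Debug, PartialEq)]
struct WavInfo {
    channels: u16,
    sample_rate: u32,
    bits_per_sample: u16,
    duration_secs: f64,
}

fn u16_le(b: &[u8], at: usize) -> Option<u16> {
    Some(u16::from_le_bytes(b.get(at..at + 2)?.try_into().ok()?))
}

fn u32_le(b: &[u8], at: usize) -> Option<u32> {
    Some(u32::from_le_bytes(b.get(at..at + 4)?.try_into().ok()?))
}

/// Walks the chunks after the 12-byte RIFF header, reading `fmt ` then `data`.
/// Assumes the buffer is RIFF/WAVE; returns None otherwise.
fn wav_info(bytes: &[u8]) -> Option<WavInfo> {
    if bytes.get(0..4)? != b"RIFF" || bytes.get(8..12)? != b"WAVE" {
        return None;
    }
    let mut pos = 12;
    let mut fmt: Option<(u16, u32, u16)> = None;
    while pos + 8 <= bytes.len() {
        let id = &bytes[pos..pos + 4];
        let size = u32_le(bytes, pos + 4)? as usize;
        let body = pos + 8;
        if id == b"fmt " {
            // Channels at +2, sample rate at +4, bits per sample at +14.
            fmt = Some((u16_le(bytes, body + 2)?, u32_le(bytes, body + 4)?, u16_le(bytes, body + 14)?));
        } else if id == b"data" {
            let (channels, sample_rate, bits) = fmt?;
            let bytes_per_sec = sample_rate as f64 * channels as f64 * (bits as f64 / 8.0);
            let duration_secs = size as f64 / bytes_per_sec;
            return Some(WavInfo { channels, sample_rate, bits_per_sample: bits, duration_secs });
        }
        // Chunks are padded to an even number of bytes.
        pos = body + size + (size & 1);
    }
    None
}

/// Builds a minimal 16-bit mono PCM WAV of silence (test input only).
fn silent_wav(sample_rate: u32, num_samples: u32) -> Vec<u8> {
    let data_len = num_samples * 2; // 2 bytes per 16-bit mono sample
    let mut v = Vec::with_capacity(44 + data_len as usize);
    v.extend_from_slice(b"RIFF");
    v.extend_from_slice(&(36 + data_len).to_le_bytes());
    v.extend_from_slice(b"WAVEfmt ");
    v.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk size
    v.extend_from_slice(&1u16.to_le_bytes()); // PCM
    v.extend_from_slice(&1u16.to_le_bytes()); // mono
    v.extend_from_slice(&sample_rate.to_le_bytes());
    v.extend_from_slice(&(sample_rate * 2).to_le_bytes()); // byte rate
    v.extend_from_slice(&2u16.to_le_bytes()); // block align
    v.extend_from_slice(&16u16.to_le_bytes()); // bits per sample
    v.extend_from_slice(b"data");
    v.extend_from_slice(&data_len.to_le_bytes());
    v.resize(44 + data_len as usize, 0); // silent samples
    v
}

fn main() {
    // 1.5 s of silence at 24 kHz mono, standing in for Kokoro output.
    let info = wav_info(&silent_wav(24_000, 36_000)).expect("valid WAV");
    println!("{info:?}");
}
```

With real output, `wav_info(&std::fs::read("hello.wav")?)` would be expected to report `sample_rate: 24000` and `channels: 1`, and `duration_secs` gives the audio length used in the benchmarks.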

Benchmarks

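Measured with NobodyWho, CPU inference on an Apple M4 Pro. Real-time factor is the audio duration divided by wall-clock synthesis time, so higher is faster: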

Input      Audio length   Wall-clock time   Real-time factor
10 words   3.3 s          ~0.48 s           6.9×
30 words   10.9 s         ~1.35 s           8.1×
70 words   18.7 s         ~2.26 s           8.3×
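The real-time factor column can be reproduced from the other two. The sketch below assumes the definition implied by the numbers (audio seconds divided by wall-clock seconds); `real_time_factor` is a hypothetical helper name.

```rust
/// Real-time factor as used in the benchmark table: seconds of audio
/// produced per second of wall-clock time (inferred from the table's numbers).
fn real_time_factor(audio_secs: f64, wallclock_secs: f64) -> f64 {
    audio_secs / wallclock_secs
}

fn main() {
    // (input, audio seconds, wall-clock seconds) copied from the benchmark table.
    let rows = [("10 words", 3.3, 0.48), ("30 words", 10.9, 1.35), ("70 words", 18.7, 2.26)];
    for (input, audio, wall) in rows {
        println!("{input}: {:.1}x", real_time_factor(audio, wall));
    }
}
```

Some papers define RTF the other way round (processing time divided by audio duration); under that convention these runs would be roughly 0.12 to 0.15.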

Credits

Original model and training by @hexgrad. Thanks!


Model tree for NobodyWho/Kokoro-82M

Base model: yl4579/StyleTTS2-LJSpeech
Finetuned: hexgrad/Kokoro-82M
Quantized: 39 models, including this one