Kokoro v1

Model Capabilities

  • Text-to-speech: 24 kHz mono output
  • Multilingual: American/British English, Spanish, French, Hindi, Italian, Japanese, Brazilian Portuguese, Mandarin Chinese
  • 54 voices across 9 languages, naming convention <lang><gender>_<name> (e.g. af_heart, bm_george, jf_alpha)
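
The naming convention can be checked mechanically. The sketch below is illustrative only and is not part of the NobodyWho API: `VoiceId`, `parse_voice_id`, and `language_name` are hypothetical helpers. The letter-to-language table is an assumption based on the upstream Kokoro voice list; only the prefixes `af`, `bm`, and `jf` appear in the examples above.

```rust
/// Parsed form of a voice id such as `af_heart` (hypothetical helper type).
#[derive(Debug, PartialEq)]
pub struct VoiceId<'a> {
    pub lang: char,   // first letter of the prefix, e.g. 'a'
    pub gender: char, // second letter of the prefix: 'f' or 'm'
    pub name: &'a str,
}

/// Splits `<lang><gender>_<name>` into its parts.
/// Returns `None` if the id does not follow the convention.
pub fn parse_voice_id(id: &str) -> Option<VoiceId<'_>> {
    let (prefix, name) = id.split_once('_')?;
    let mut chars = prefix.chars();
    let lang = chars.next()?;
    let gender = chars.next()?;
    // The prefix must be exactly two characters and the name non-empty.
    if chars.next().is_some() || name.is_empty() {
        return None;
    }
    if !matches!(gender, 'f' | 'm') {
        return None;
    }
    Some(VoiceId { lang, gender, name })
}

/// Assumed mapping from the language letter to a language name.
/// Taken from the upstream Kokoro voice list, not from this model card.
pub fn language_name(lang: char) -> Option<&'static str> {
    match lang {
        'a' => Some("American English"),
        'b' => Some("British English"),
        'e' => Some("Spanish"),
        'f' => Some("French"),
        'h' => Some("Hindi"),
        'i' => Some("Italian"),
        'j' => Some("Japanese"),
        'p' => Some("Brazilian Portuguese"),
        'z' => Some("Mandarin Chinese"),
        _ => None,
    }
}
```

For example, `parse_voice_id("bm_george")` yields language letter `b`, gender `m`, and name `george`, while an id without an underscore is rejected.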

The full description can be found at the original model page.

Getting Started

Run with NobodyWho (the model is fetched and cached on first use):

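use nobodywho::tts::{Tts, TtsConfig};

// Load the Kokoro model (downloaded and cached on first use).
let tts = Tts::new(TtsConfig::kokoro("NobodyWho/kokoro-v1"))?;
// Synthesize speech for the given text.
let wav = tts.synthesize("Hello from NobodyWho!")?;
// Write the synthesized audio to disk.
std::fs::write("hello.wav", wav)?;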

Benchmarks

Measured with nobodywho on Apple M4 Pro, CPU:

Input      Audio    Wallclock   Real-time factor
10 words   3.3s     ~0.48s      6.9×
30 words   10.9s    ~1.35s      8.1×
70 words   18.7s    ~2.26s      8.3×
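
The real-time factor in the table appears to be audio duration divided by wallclock time (for the first row, 3.3 / 0.48 ≈ 6.9). A minimal sketch of that arithmetic follows, assuming this definition. `audio_seconds` and `real_time_factor` are hypothetical helpers, not NobodyWho functions; the sample rate comes from the 24 kHz mono figure listed under Model Capabilities.

```rust
/// Output sample rate stated in the model card: 24 kHz, mono.
pub const SAMPLE_RATE_HZ: u32 = 24_000;

/// Duration in seconds of a mono clip containing `num_samples` samples.
pub fn audio_seconds(num_samples: usize) -> f64 {
    num_samples as f64 / SAMPLE_RATE_HZ as f64
}

/// Real-time factor: seconds of audio produced per second of wallclock time.
/// Values above 1.0 mean synthesis runs faster than playback.
pub fn real_time_factor(audio_secs: f64, wallclock_secs: f64) -> f64 {
    audio_secs / wallclock_secs
}
```

To reproduce a row on other hardware, one could time the `synthesize` call with `std::time::Instant` and pass the elapsed seconds as `wallclock_secs`.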

Credits

Original model and training by @hexgrad. Thanks!
