Kokoro

Text-to-Speech

apache-2.0

Other

Kokoro is an open-weight text-to-speech (TTS) model with 82 million parameters. Despite its lightweight architecture, it delivers quality comparable to larger models while being significantly faster and more cost-efficient.

Model Facts

Architecture:

StyleTTS 2: https://arxiv.org/abs/2306.07691
ISTFTNet: https://arxiv.org/abs/2203.02395
Decoder only: no diffusion, no encoder release
Architected by: Li et al. @ https://github.com/yl4579/StyleTTS2
Trained by: @rzvzn on Discord
Languages: Multiple

Model SHA256 Hash: 496dba118d1a58f5f3db2efc88dbdc216e0483fc89fe6e47ee1f2c53f18ad1e4
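
The published hash can be used to check that a downloaded checkpoint is intact. The following is a minimal sketch using only Python's standard library; the helper names `sha256_of_file` and `verify_checkpoint` are illustrative, and the model card does not say which file the hash covers, so the path you pass in is your own assumption.

```python
import hashlib

# SHA256 hash as published on the model card.
EXPECTED_SHA256 = "496dba118d1a58f5f3db2efc88dbdc216e0483fc89fe6e47ee1f2c53f18ad1e4"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns empty bytes (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path, expected=EXPECTED_SHA256):
    """Return True if the file's digest matches the expected hash (case-insensitive)."""
    return sha256_of_file(path) == expected.strip().lower()
```

Calling `verify_checkpoint("path/to/checkpoint")` with the location of your downloaded weights (a placeholder path here) returns `True` only when the file matches the published hash. Reading in chunks keeps memory use flat regardless of checkpoint size.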

Training Details

Data: Kokoro was trained exclusively on permissive/non-copyrighted audio data and IPA phoneme labels. Examples of permissive/non-copyrighted audio include:

Public domain audio
Audio licensed under Apache, MIT, etc.
Synthetic audio[1] generated by closed[2] TTS models from large providers

[1] https://copyright.gov/ai/ai_policy_guidance.pdf
[2] No synthetic audio from open TTS models or "custom voice clones"

Total Dataset Size: A few hundred hours of audio