Kokoro

Text-to-Speech

apache-2.0

Other

Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers quality comparable to that of larger models while being significantly faster and more cost-efficient.

Model Facts

Architecture:
  StyleTTS 2: https://arxiv.org/abs/2306.07691
  ISTFTNet: https://arxiv.org/abs/2203.02395
  Decoder only: no diffusion, no encoder release
Architected by: Li et al. @ https://github.com/yl4579/StyleTTS2
Trained by: @rzvzn on Discord
Languages: Multiple

Model SHA256 Hash: 496dba118d1a58f5f3db2efc88dbdc216e0483fc89fe6e47ee1f2c53f18ad1e4
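
A published hash like the one above is typically used to confirm that a downloaded weights file is intact and unmodified. The following is a minimal sketch of such a check using only the Python standard library. The file name `model_weights.bin` and the helper names `sha256_of_file` and `matches_published_hash` are hypothetical and chosen for illustration; the page does not say which file the hash covers, so substitute the path of the file you actually downloaded.

```python
import hashlib
from pathlib import Path

# Hash copied from the "Model SHA256 Hash" line above.
EXPECTED_SHA256 = "496dba118d1a58f5f3db2efc88dbdc216e0483fc89fe6e47ee1f2c53f18ad1e4"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """True if the file's digest equals the expected hex string (case-insensitive)."""
    return sha256_of_file(path) == expected.strip().lower()


if __name__ == "__main__":
    # Hypothetical local path (an assumption, not a name given by the source).
    weights = Path("model_weights.bin")
    if not weights.exists():
        print(f"{weights} not found; point this at your downloaded weights file")
    elif matches_published_hash(weights):
        print("OK: file matches the published SHA-256 hash")
    else:
        print("MISMATCH: file differs from the published SHA-256 hash")
```

Reading in chunks keeps memory use flat regardless of file size, which matters for model weights that can run to hundreds of megabytes.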

Training Details

Data: Kokoro was trained exclusively on permissive/non-copyrighted audio data and IPA phoneme labels. Examples of permissive/non-copyrighted audio include:

Public domain audio
Audio licensed under Apache, MIT, etc.
Synthetic audio[1] generated by closed[2] TTS models from large providers

[1] https://copyright.gov/ai/ai_policy_guidance.pdf
[2] No synthetic audio from open TTS models or "custom voice clones"

Total Dataset Size: A few hundred hours of audio