Kokoro

Text-to-Speech
apache-2.0

Kokoro is an open-weight text-to-speech (TTS) model with 82 million parameters. Despite its lightweight architecture, it delivers quality comparable to that of larger models while being significantly faster and more cost-efficient.

Model Facts

Architecture:

  • StyleTTS 2: https://arxiv.org/abs/2306.07691

  • ISTFTNet: https://arxiv.org/abs/2203.02395

  • Decoder only: no diffusion, no encoder release

  • Architected by: Li et al. @ https://github.com/yl4579/StyleTTS2

  • Trained by: @rzvzn on Discord

  • Languages: Multiple

Model SHA256 Hash: 496dba118d1a58f5f3db2efc88dbdc216e0483fc89fe6e47ee1f2c53f18ad1e4
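
The published hash can be used to check that a downloaded weights file is intact. The following is a minimal sketch, assuming the hash above covers the whole weights file as downloaded; the file path is supplied by the user, and the helper names `sha256_of_file` and `matches_published_hash` are illustrative, not part of any Kokoro tooling.

```python
import hashlib
import sys

# SHA256 hash as published in the Model Facts above.
EXPECTED_SHA256 = "496dba118d1a58f5f3db2efc88dbdc216e0483fc89fe6e47ee1f2c53f18ad1e4"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large weight files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Hypothetical helper: True if the file's digest equals the expected hash."""
    return sha256_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Usage: python verify_hash.py <path-to-downloaded-weights>
    if len(sys.argv) != 2:
        sys.exit("usage: python verify_hash.py <path-to-weights-file>")
    ok = matches_published_hash(sys.argv[1])
    print("hash matches" if ok else "hash MISMATCH")
    sys.exit(0 if ok else 1)
```

A mismatch usually means an incomplete download or a different model version than the one this hash was published for.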

Training Details

Data: Kokoro was trained exclusively on permissive/non-copyrighted audio data and IPA phoneme labels. Examples of permissive/non-copyrighted audio include:

  • Public domain audio
  • Audio licensed under Apache, MIT, etc.
  • Synthetic audio[1] generated by closed[2] TTS models from large providers

[1] https://copyright.gov/ai/ai_policy_guidance.pdf
[2] No synthetic audio from open TTS models or "custom voice clones"

Total Dataset Size: A few hundred hours of audio