Kokoro

Text-to-speech
apache-2.0
Chip: Other

Kokoro is an open-weight text-to-speech (TTS) model with 82 million parameters. Despite its lightweight architecture, it delivers quality comparable to larger models while being significantly faster and more cost-efficient.

Model Facts

Architecture:

  • StyleTTS 2: https://arxiv.org/abs/2306.07691

  • ISTFTNet: https://arxiv.org/abs/2203.02395

  • Decoder only: no diffusion, no encoder release

  • Architected by: Li et al. @ https://github.com/yl4579/StyleTTS2

  • Trained by: @rzvzn on Discord

  • Languages: Multiple

Model SHA256 Hash: 496dba118d1a58f5f3db2efc88dbdc216e0483fc89fe6e47ee1f2c53f18ad1e4
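
A published hash is useful for confirming that a downloaded weights file was not corrupted or altered. The following is a minimal sketch of that check using only Python's standard library. The card does not say which file the hash covers, so the filename `kokoro_model.pth` is a hypothetical placeholder, and the helper names `sha256_of_file` and `matches_published_hash` are illustrative, not part of any Kokoro tooling.

```python
import hashlib
from pathlib import Path

# Hash copied from the "Model SHA256 Hash" line of this model card.
EXPECTED_SHA256 = "496dba118d1a58f5f3db2efc88dbdc216e0483fc89fe6e47ee1f2c53f18ad1e4"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """True if the file's digest equals the expected hex digest."""
    return sha256_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Hypothetical local filename (an assumption); use the path you downloaded to.
    model_path = Path("kokoro_model.pth")
    if model_path.exists():
        print("match" if matches_published_hash(model_path) else "MISMATCH")
    else:
        print(f"{model_path} not found")
```

Reading in chunks keeps memory use constant regardless of file size, which matters little for a model this small but makes the helper reusable for larger checkpoints.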

Training Details

Data: Kokoro was trained exclusively on permissive/non-copyrighted audio data and IPA phoneme labels. Examples of permissive/non-copyrighted audio include:

  • Public domain audio
  • Audio licensed under Apache, MIT, etc
  • Synthetic audio[1] generated by closed[2] TTS models from large providers

[1] https://copyright.gov/ai/ai_policy_guidance.pdf
[2] No synthetic audio from open TTS models or "custom voice clones"

Total Dataset Size: A few hundred hours of audio