CLOVA Voice

Text-to-Speech

N/A

Naver

Overview

CLOVA Voice is Naver's top-of-the-line Text-to-Speech API. When text is sent to the server, the synthesized speech is returned as an audio file in .mp3 or .wav format. Each request supports up to 2,000 characters. The API also accepts optional parameters such as Volume, Speed, Pitch, and Emotion to customize the synthesized speech.
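Because each request is capped at 2,000 characters, longer text has to be split before it is sent. The following is a minimal sketch of that preparation step, not the official client. It assumes that sentence boundaries are acceptable split points. The function names and the payload field names (`speaker`, `text`, `volume`, `speed`, `pitch`, `emotion`, `format`) are hypothetical labels for the parameters described above. The endpoint URL and authentication are not given on this page, so the sketch stops at building the request body.

```python
import re

MAX_CHARS = 2000  # per-request limit stated in the overview


def split_for_tts(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters,
    preferring to break at sentence boundaries."""
    sentences = re.split(r"(?<=[.!?。！？])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def build_payload(text: str, *, speaker: str, volume: int = 0, speed: int = 0,
                  pitch: int = 0, emotion: int = 0,
                  audio_format: str = "mp3") -> dict:
    """Assemble one request body. Field names are assumptions, not the
    documented schema; check the official API reference before use."""
    if len(text) > MAX_CHARS:
        raise ValueError(f"text exceeds {MAX_CHARS} characters")
    if audio_format not in ("mp3", "wav"):
        raise ValueError("audio_format must be 'mp3' or 'wav'")
    return {
        "speaker": speaker,      # hypothetical voice identifier
        "text": text,
        "volume": volume,
        "speed": speed,
        "pitch": pitch,
        "emotion": emotion,
        "format": audio_format,
    }


if __name__ == "__main__":
    long_text = "This is a sample sentence. " * 200
    for chunk in split_for_tts(long_text):
        payload = build_payload(chunk, speaker="example-speaker")
        print(len(payload["text"]), payload["format"])
```

Each resulting payload would be sent as a separate request, and the returned audio files concatenated in order on the client side.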

Features

High-quality synthesized voice

CLOVA Voice uses a CLOVA artificial intelligence (AI) service that can build a voice synthesizer for new speakers and styles from just 40 minutes of audio and the matching text, without a complicated transcription process. This technology generates natural, clean synthetic voices that sound almost like real human voices.

Natural-sounding voices with emotion

Even when reading the same text, CLOVA's technology can make the voice sound either happy or sad. More voices with different emotions and styles are coming soon, ranging from the serious tone of a newscaster to friendly and neutral tones.

Optimal for a wide range of content

On top of advanced speech synthesis technology that provides voices infused with emotion, CLOVA Voice supports Korean, English, Chinese, Japanese, and Spanish.