CLOVA Voice
CLOVA Voice is Naver's top-of-the-line Text-to-Speech API. When text is sent to the server, the synthesized speech is returned as an audio file in .mp3 or .wav format. Each request supports up to 2,000 characters. The API also provides additional parameters such as Volume, Speed, Pitch, and Emotion for customized speech synthesis.
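As a rough illustration of the request shape described above, the following Python sketch builds a form-encoded request body and enforces the 2,000-character limit on the client side. It makes no network call. The field names (text, speaker, volume, speed, pitch, emotion, format), the default values, and the speaker name are assumptions made for illustration, not confirmed API details, and the helper functions are hypothetical. Check the official API reference for the exact names and value ranges.

```python
from urllib.parse import urlencode

MAX_CHARS = 2000  # per-request character limit stated in the overview


def build_tts_form(text, speaker="example_speaker", volume=0, speed=0,
                   pitch=0, emotion=0, audio_format="mp3"):
    """Return a form-encoded body for one synthesis request.

    All field names and defaults here are assumed for illustration.
    """
    if not text:
        raise ValueError("text must not be empty")
    if len(text) > MAX_CHARS:
        raise ValueError(f"text is {len(text)} characters; limit is {MAX_CHARS}")
    if audio_format not in ("mp3", "wav"):
        raise ValueError("audio_format must be 'mp3' or 'wav'")
    return urlencode({
        "speaker": speaker,
        "text": text,
        "volume": volume,
        "speed": speed,
        "pitch": pitch,
        "emotion": emotion,
        "format": audio_format,
    })


def split_for_tts(text, limit=MAX_CHARS):
    """Split long text into chunks that each fit in one request.

    This is a naive fixed-width split; it may cut in the middle of a word
    or sentence, so real code would prefer to break at sentence boundaries.
    """
    return [text[i:i + limit] for i in range(0, len(text), limit)]


if __name__ == "__main__":
    long_text = "Hello from CLOVA Voice. " * 200  # longer than one request allows
    for chunk in split_for_tts(long_text):
        body = build_tts_form(chunk, speed=2, audio_format="wav")
        print(len(chunk), "characters ->", len(body), "bytes of form data")
```

Each chunk would be sent as a separate request, and the returned audio files concatenated or played in sequence. Whether the service counts characters the same way as Python's len() is also an assumption.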
CLOVA Voice is built on CLOVA's artificial intelligence (AI) technology, which can create a voice synthesizer for a new speaker or style from just 40 minutes of audio and text, without a complicated transcription process. This technology enables you to generate synthetic voices that are natural and clean, and that sound almost like real human voices.
Even when reading the same text, you can use CLOVA's technology to make it sound either happy or sad. More voices with different emotions and styles are coming soon, ranging from the serious tones of newscasters to friendly and neutral tones.
In addition to advanced speech synthesis technology that provides voices infused with emotion, CLOVA Voice supports multiple languages: Korean, English, Chinese, Japanese, and Spanish.