Speech Synthesis API


The Speech Synthesis API NuGet package lets your applications, tools, or devices convert input text into natural, human-like synthesized speech. Choose from standard and neural voices, or create a custom voice specific to your product or brand.

Text-to-speech technology allows content creators to interact with their users in different ways. Text-to-speech can improve accessibility by providing users with an option to interact with content audibly. Whether the user is visually impaired, has learning disabilities, or needs navigation information while driving, text-to-speech can enhance an existing experience. Text-to-speech is also a valuable add-on for bots and voice assistants.

With Speech Synthesis Markup Language (SSML), an XML-based markup language, developers using the text-to-speech service can specify how input text is converted to synthesized speech. With SSML, you can adjust the pitch, pronunciation, speaking rate, volume, and more. For more information, see the SSML page.
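As a rough illustration of what such markup looks like, the sketch below wraps plain text in a minimal SSML document with prosody settings. It is written in Python using only the standard library, since the markup itself is independent of the calling language. The element and attribute names (speak, voice, prosody, rate, volume, pitch) come from the W3C SSML specification; the function name build_ssml and the voice name "example-voice" are hypothetical, and whether this package accepts SSML in exactly this shape is an assumption to check against the SSML page.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C SSML specification.
SSML_NS = "http://www.w3.org/2001/10/synthesis"
# Standard XML namespace, used for the xml:lang attribute.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_ssml(text, voice_name, rate="medium", volume="medium",
               pitch="medium", lang="en-US"):
    """Wrap plain text in a minimal SSML document (hypothetical helper).

    rate, volume and pitch are passed through as SSML prosody attributes,
    for example rate="slow", volume="loud", pitch="high".
    """
    # Serialize SSML elements without a namespace prefix.
    ET.register_namespace("", SSML_NS)

    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": lang},
    )
    voice = ET.SubElement(speak, f"{{{SSML_NS}}}voice", {"name": voice_name})
    prosody = ET.SubElement(
        voice,
        f"{{{SSML_NS}}}prosody",
        {"rate": rate, "volume": volume, "pitch": pitch},
    )
    # ElementTree escapes characters such as & and < when serializing.
    prosody.text = text
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    # "example-voice" is a placeholder, not a voice shipped with the package.
    print(build_ssml("Turn left in 200 meters.", "example-voice",
                     rate="slow", volume="loud"))
```

Building the document with an XML library rather than string concatenation keeps the output well-formed even when the input text contains reserved characters.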

Standard voices

Standard voices are created using statistical parametric synthesis and/or concatenative synthesis techniques. These voices are highly intelligible and sound natural. You can easily enable your apps to speak in more than 45 languages, with a wide range of voice options. These voices offer accurate pronunciation, including support for abbreviations, acronym expansions, date/time interpretation, polyphones, and more. Use standard voices to improve the accessibility of your applications and services by allowing users to interact with your content audibly.

Neural voices

Neural voices use deep neural networks to overcome the limits of traditional speech synthesis systems in matching the stress and intonation patterns of spoken language and in synthesizing units of speech into a computer voice.

Standard speech synthesis breaks prosody down into separate linguistic analysis and acoustic prediction steps governed by independent models, which can result in a muffled synthetic voice. Neural voices instead perform prosody prediction and voice synthesis simultaneously, producing a more fluid and natural-sounding voice.

Custom voices

Voice customization lets you create a recognizable, one-of-a-kind voice for your brand. To create your custom voice font, make a studio recording and upload the associated scripts as training data. The service then creates a unique voice model tuned to your recording.

To install this NuGet package, run the following command in the Package Manager Console:

Install-Package Speech.Synthesis.API -Version 1.0.0