Google Announces General Availability of Cloud Text-to-Speech, New Features in Cloud Speech-to-Text

Google recently announced the general availability of Cloud Text-to-Speech and updates to Cloud Speech-to-Text.

 
General availability of Cloud Text-to-Speech lets developers add natural-sounding speech to their apps. Cloud Text-to-Speech was announced in March this year, and since then customers have asked for WaveNet support in more languages. WaveNet is a technology that mimics human voices and gives listeners a more natural experience.
 
To help customers build apps for other languages, Google has added 13 new languages and 17 new WaveNet voices to Cloud Text-to-Speech. It now supports 14 languages with 30 standard voices and 26 WaveNet voices. You can find the list of supported languages and voices here.
 
 
 
Google has also added Audio Profiles (beta) to Cloud Text-to-Speech, which let users tune the output for different kinds of playback hardware.
 
"You can now specify whether audio is intended to be played over phone lines, headphones, or speakers, and we’ll optimize the audio for playback. For example, if the audio your application produces is listened to primarily on headphones, you can create synthetic speech from Cloud Text-to-Speech API that is optimized specifically for headphones," Google posted.
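As a rough illustration of how an audio profile might be requested, the sketch below builds a JSON body for a Text-to-Speech synthesize call. The helper name `build_synthesize_request` and the `AUDIO_PROFILES` mapping are hypothetical, and the `effectsProfileId` field and profile identifiers are assumptions based on the public REST API rather than anything stated in the announcement, so check them against the current documentation. The code only builds the payload and sends nothing.

```python
import json

# Assumed profile identifiers for the three playback targets named in the quote.
# Verify these against the Cloud Text-to-Speech documentation before use.
AUDIO_PROFILES = {
    "phone": "telephony-class-application",
    "headphones": "headphone-class-device",
    "speakers": "large-home-entertainment-class-device",
}


def build_synthesize_request(text, playback="headphones",
                             language_code="en-US", voice_name=None):
    """Return a JSON-serializable request body (hypothetical helper)."""
    voice = {"languageCode": language_code}
    if voice_name:
        # Optional: a specific standard or WaveNet voice name.
        voice["name"] = voice_name
    return {
        "input": {"text": text},
        "voice": voice,
        "audioConfig": {
            "audioEncoding": "MP3",
            # Assumed field that selects the audio profile for playback.
            "effectsProfileId": [AUDIO_PROFILES[playback]],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_synthesize_request("Hello, world"), indent=2))
```

Keeping the profile names in one mapping means the application can talk about "phone", "headphones", or "speakers" and swap the underlying identifiers in a single place.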
 
Cloud Speech-to-Text with new features:
 
Google Cloud Speech-to-Text has been updated with new features: speaker diarization, multi-channel recognition, language auto-detect, and word-level confidence.
 
Multi-channel recognition transcribes multiple channels of audio and tags each word with the channel it came from, so you can identify what was said by which person.
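To make the idea concrete, here is a minimal sketch that groups transcripts by channel. It assumes a response shaped like the REST API's JSON, where each result carries a `channelTag` and a list of `alternatives`; that shape and the helper name `transcripts_by_channel` are assumptions for illustration, not details from the announcement.

```python
from collections import defaultdict


def transcripts_by_channel(response):
    """Join the top transcript of each result, grouped by channel tag."""
    by_channel = defaultdict(list)
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue  # nothing was recognized for this result
        # Assumed field: channelTag identifies the audio channel of the result.
        channel = result.get("channelTag", 0)
        by_channel[channel].append(alternatives[0]["transcript"].strip())
    return {ch: " ".join(parts) for ch, parts in sorted(by_channel.items())}


if __name__ == "__main__":
    sample = {
        "results": [
            {"channelTag": 1, "alternatives": [{"transcript": "hello how can I help"}]},
            {"channelTag": 2, "alternatives": [{"transcript": "my order is late"}]},
        ]
    }
    print(transcripts_by_channel(sample))
```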
 
Speaker diarization helps when audio isn't separated into channels. The number of speakers can be passed as an API parameter, and machine learning is used to tag each word with a speaker number. The tags are updated continually as more data is received, which makes the service more accurate at identifying who is speaking and what they said.
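One way to use the per-word speaker numbers is to fold consecutive words from the same speaker into turns. The sketch below assumes each word is a dictionary with `word` and `speakerTag` keys, mirroring the REST API's JSON; the helper name `speaker_turns` is hypothetical.

```python
from itertools import groupby


def speaker_turns(words):
    """Collapse consecutive words with the same speaker tag into turns.

    Returns a list of (speaker_tag, text) tuples in spoken order.
    """
    turns = []
    # groupby only merges adjacent items, which preserves the conversation order.
    for tag, group in groupby(words, key=lambda w: w["speakerTag"]):
        turns.append((tag, " ".join(w["word"] for w in group)))
    return turns


if __name__ == "__main__":
    sample = [
        {"word": "hi", "speakerTag": 1},
        {"word": "there", "speakerTag": 1},
        {"word": "hello", "speakerTag": 2},
        {"word": "bye", "speakerTag": 1},
    ]
    for tag, text in speaker_turns(sample):
        print(f"Speaker {tag}: {text}")
```

Because the tags can be revised as more audio arrives, it is probably safest to rebuild the turns from the latest word list rather than appending to earlier output.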
 
Language auto-detect lets developers add up to four language codes to each query. The API automatically identifies which language was spoken and returns the transcript of the audio in that language.
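A minimal sketch of building such a query follows. It assumes the four codes are sent as one primary `languageCode` plus up to three `alternativeLanguageCodes`, which matches the REST API as far as we know but is not spelled out in the announcement; the helper name `build_recognition_config` is hypothetical.

```python
MAX_LANGUAGE_CODES = 4  # limit taken from the article


def build_recognition_config(language_codes):
    """Build a recognition config dict from one to four language codes.

    The first code is treated as the primary language; the rest are
    passed as alternatives (assumed field: alternativeLanguageCodes).
    """
    codes = list(language_codes)
    if not 1 <= len(codes) <= MAX_LANGUAGE_CODES:
        raise ValueError(
            f"expected 1 to {MAX_LANGUAGE_CODES} language codes, got {len(codes)}"
        )
    config = {"languageCode": codes[0]}
    if len(codes) > 1:
        config["alternativeLanguageCodes"] = codes[1:]
    return config


if __name__ == "__main__":
    print(build_recognition_config(["en-US", "es-ES", "fr-FR"]))
```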
 
 
Another feature added to Cloud Speech-to-Text is word-level confidence scores, which let developers build apps that highlight specific words and, based on the score, prompt users to repeat those words as needed.
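The "prompt users to repeat" logic could look something like the sketch below. It assumes each word is a dictionary with `word` and `confidence` keys, as in the REST API's JSON; the helper name `words_to_confirm` and the 0.8 threshold are arbitrary choices for illustration.

```python
def words_to_confirm(words, threshold=0.8):
    """Return the words whose confidence score falls below the threshold.

    Words with no confidence value are treated as low confidence.
    """
    return [w["word"] for w in words if w.get("confidence", 0.0) < threshold]


if __name__ == "__main__":
    sample = [
        {"word": "transfer", "confidence": 0.97},
        {"word": "fifty", "confidence": 0.62},
        {"word": "dollars", "confidence": 0.91},
    ]
    unclear = words_to_confirm(sample)
    if unclear:
        print("Could you repeat: " + ", ".join(unclear) + "?")
```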
 
For more details and product demos, you can visit Cloud Text-to-Speech and Cloud Speech-to-Text.
 
To learn more about the related technologies, you can follow the C# Corner Cloud section.