SSML in Alexa Skills Kit

Pradeep Yadav

What is SSML?

Speech Synthesis Markup Language (SSML) is an XML-based markup language for speech synthesis applications. It is a W3C recommendation used in voice browser applications. The Alexa Skills Kit (ASK) supports a subset of SSML tags.

You can use SSML in your skill response to gain additional control over how Alexa generates speech. Alexa automatically handles normal punctuation, such as pausing after a period or speaking a sentence that ends in a question mark as a question.

Let's understand how to construct output speech using SSML:

// build the SSML response
var speech = new SsmlOutputSpeech();
speech.Ssml = "<speak>This is an SSML response.</speak>";
// build the response using ResponseBuilder
var finalResponse = ResponseBuilder.Tell(speech);
return finalResponse;

In the above example, we created an instance of SsmlOutputSpeech and set its Ssml property to a string wrapped in the speak tag. We then passed that instance to ResponseBuilder to build the response.
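Because SSML is XML, characters such as & and < in dynamic text have to be escaped before they go inside the speak tag, or the markup will be invalid. The following is a minimal sketch of one way to do that. The class name SsmlText and the method Wrap are hypothetical and are not part of Alexa.NET; the sketch relies only on SecurityElement.Escape from the .NET base class library.

```csharp
using System;
using System.Security;

public static class SsmlText
{
    // Hypothetical helper (not part of Alexa.NET): escapes the XML reserved
    // characters in plain text and wraps the result in a <speak> root element.
    public static string Wrap(string plainText)
    {
        // SecurityElement.Escape replaces <, >, ", ' and & with XML entities.
        // It returns null for null input, so fall back to an empty string.
        string escaped = SecurityElement.Escape(plainText) ?? string.Empty;
        return "<speak>" + escaped + "</speak>";
    }
}
```

The returned string could then be assigned to speech.Ssml in the same way as the literal in the example above.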

Now let's go over some of the tags.

Amazon effect: This tag applies an Amazon-specific effect, such as whispering, to the speech. The effect is selected with the name attribute, as shown below:

<speak>
I want to tell you a secret.
<amazon:effect name="whispered">I am not a real human.</amazon:effect>.
Can you believe it?
</speak>
Now let's look at the same SSML in a C# example:
// build the SSML response
var speech = new SsmlOutputSpeech();
speech.Ssml = @"<speak>I want to tell you a secret. <amazon:effect name=""whispered"">I am not a real human.</amazon:effect>. Can you believe it?</speak>";
// build the response using ResponseBuilder
var finalResponse = ResponseBuilder.Tell(speech);
return finalResponse;

In the above example, the name attribute is set to whispered. Alexa speaks the first sentence as usual, but the sentence inside the <amazon:effect> tag is whispered.

You can try the above sample under Voice & Tone in the Test tab of your skill.

Audio: To play an MP3 file in a response, use the audio tag. Its src attribute takes the URL of the MP3 file. Keep the following requirements in mind when using MP3 files:

The MP3 file must be hosted on an internet-accessible HTTPS endpoint with a trusted (not self-signed) SSL certificate.
The audio file cannot be longer than 240 seconds.
The bit rate must be 48 kbps.
The sample rate must be 22050 Hz, 24000 Hz, or 16000 Hz.
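
As a rough illustration of the first requirement, the sketch below rejects audio sources that are not served over HTTPS before they are placed in an audio tag. The class AudioSource and the method LooksAllowed are hypothetical names. The check also accepts the soundbank:// prefix used by the sound library clip in this article's example. It only inspects the URL prefix, so it cannot verify the certificate, duration, bit rate, or sample rate.

```csharp
using System;

public static class AudioSource
{
    // Hypothetical helper: a prefix check only. It accepts HTTPS URLs and
    // soundbank:// references, and rejects everything else (including http://).
    public static bool LooksAllowed(string src)
    {
        if (string.IsNullOrWhiteSpace(src))
        {
            return false;
        }

        return src.StartsWith("https://", StringComparison.OrdinalIgnoreCase)
            || src.StartsWith("soundbank://", StringComparison.OrdinalIgnoreCase);
    }
}
```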

Place the audio tag inside the speak tag, as shown below:

// build the SSML response
var speech = new SsmlOutputSpeech();
speech.Ssml = @"<speak>Welcome to Ride Hailer.<audio src=""soundbank://soundlibrary/transportation/amzn_sfx_car_accelerate_01""/>You can order a ride, or request a fare estimate. Which will it be?</speak>";

// build the response using ResponseBuilder
var finalResponse = ResponseBuilder.Tell(speech);
return finalResponse;

The above example plays a clip from the ASK sound library. Alternatively, you can host your own MP3 file, for example on Amazon S3, and provide its HTTPS URL in the src attribute. The sound library is documented here:

https://developer.amazon.com/docs/custom-skills/ask-soundlibrary.html

Break: This tag adds a pause to the speech. You can set the length of the pause with either the strength or the time attribute, as described below.
time: The duration of the pause in seconds or milliseconds, for example 3s or 3000ms. The maximum is 10s (10000ms).
strength: One of the following values:
none
x-weak
weak
medium
strong
x-strong

If you do not specify either attribute, the pause defaults to medium strength.
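
When the pause length comes from a variable rather than a literal, it can help to clamp it to the 10-second limit described above. The following is a minimal sketch of that idea. SsmlBreak and ForDuration are hypothetical names and not part of any Alexa library, and emitting the value in milliseconds is just one possible choice.

```csharp
using System;

public static class SsmlBreak
{
    // Longest pause described above: 10s, which is 10000ms.
    private const int MaxMilliseconds = 10000;

    // Hypothetical helper: builds a <break> tag with a time attribute,
    // clamping the requested duration to the range 0..10000 ms.
    public static string ForDuration(TimeSpan pause)
    {
        long requested = (long)pause.TotalMilliseconds;
        long clamped = Math.Max(0, Math.Min(MaxMilliseconds, requested));
        return "<break time=\"" + clamped + "ms\"/>";
    }
}
```

For example, SsmlBreak.ForDuration(TimeSpan.FromSeconds(3)) produces a 3000ms break tag that can be concatenated into the text inside the speak tag.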

Let's look at the example below:

// build the SSML response
var speech = new SsmlOutputSpeech();
speech.Ssml = @"<speak>There is a three second pause here <break time=""3s""/> then the speech continues. A pause can last up to ten seconds.</speak>";
// build the response using ResponseBuilder
var finalResponse = ResponseBuilder.Tell(speech);
return finalResponse;