SSML in Alexa Skills Kit

In this blog, we will learn about SSML in the Alexa Skills Kit (ASK).

What is SSML?

 
Speech Synthesis Markup Language (SSML) is an XML-based markup language for speech synthesis applications. It is a W3C recommendation that came out of the W3C's voice browser work. ASK supports a subset of SSML tags.
  
You can use SSML in your skill's response to get additional control over how the speech is generated. Alexa already handles normal punctuation on its own, such as pausing after a period or speaking a sentence that ends in a question mark as a question.
 
Let's understand how to construct output speech using SSML:
    // assumes the Alexa.NET package (using Alexa.NET; using Alexa.NET.Response;)
    // build the SSML response
    var speech = new SsmlOutputSpeech();
    speech.Ssml = "<speak>This is an SSML response.</speak>";
    // build the response using ResponseBuilder
    var finalResponse = ResponseBuilder.Tell(speech);
    return finalResponse;
     
In the above example, we created an instance of SsmlOutputSpeech and set its Ssml property to text wrapped in the speak tag. You then pass that instance to ResponseBuilder.
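
Because SSML is XML, plain text that comes from user input or a database has to be escaped before it goes inside the speak tag; otherwise a stray & or < makes the markup invalid. The following is a minimal sketch of one way to do that. The SsmlText class and its Wrap method are hypothetical helpers, not part of ASK or Alexa.NET, and the sketch relies only on SecurityElement.Escape from the .NET base library.

```csharp
using System.Security;

// Hypothetical helper for illustration; not part of Alexa.NET or ASK.
public static class SsmlText
{
    // Escapes the XML-reserved characters (&, <, >, ", ') in plain text
    // and wraps the result in a <speak> element.
    public static string Wrap(string plainText)
    {
        string escaped = SecurityElement.Escape(plainText ?? string.Empty);
        return "<speak>" + escaped + "</speak>";
    }
}
```

With this helper, speech.Ssml = SsmlText.Wrap("Tom & Jerry"); sets the property to <speak>Tom &amp;amp; Jerry</speak>. The helper is meant for plain text only; a string that already contains SSML tags would have its tags escaped as well.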
     
Now let's go over some of the tags.
• Amazon effect: This tag applies effects such as whispering to speech. It has a name attribute, as shown below:
    <speak>
    I want to tell you a secret.
    <amazon:effect name="whispered">I am not a real human.</amazon:effect>.
    Can you believe it?
    </speak>

Now let's understand it with an example:
    // build the SSML response
    var speech = new SsmlOutputSpeech();
    speech.Ssml = @"<speak>I want to tell you a secret. <amazon:effect name=""whispered"">I am not a real human.</amazon:effect>. Can you believe it?</speak>";

    // build the response using ResponseBuilder
    var finalResponse = ResponseBuilder.Tell(speech);
    return finalResponse;
         
In the above example, the name attribute is set to whispered. Alexa speaks the first sentence as usual, but the sentence inside the <amazon:effect> tag is whispered.
       
You can try the above sample under Voice & Tone in the Test tab of your skill.
       
• Audio: To play an MP3 file in a response, use the audio tag. Its src attribute takes the URL of the MP3 file. Keep the following points in mind when using MP3 files:
  • The MP3 file must be hosted at an internet-accessible HTTPS endpoint with a trusted SSL certificate; self-signed certificates are not accepted.
  • The audio file cannot be longer than 240 seconds.
  • The bit rate must be 48 kbps.
  • The sample rate must be 22050 Hz, 24000 Hz, or 16000 Hz.
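
The HTTPS requirement from the list above can be checked in code before the tag is built. The following is a minimal sketch under that assumption; the SsmlAudio class, its Tag method, and the example.com URL are hypothetical and not part of ASK or Alexa.NET. It only covers self-hosted files, and it cannot verify the certificate, duration, bit rate, or sample rate, which still have to be checked on the hosted file itself.

```csharp
using System;
using System.Security;

// Hypothetical helper for illustration; not part of Alexa.NET or ASK.
public static class SsmlAudio
{
    // Builds an <audio> tag for a self-hosted MP3 file.
    // Throws if the URL is not an absolute HTTPS URL.
    public static string Tag(string mp3Url)
    {
        if (!Uri.TryCreate(mp3Url, UriKind.Absolute, out var uri) ||
            uri.Scheme != Uri.UriSchemeHttps)
        {
            throw new ArgumentException("Audio must be served from an HTTPS URL.", nameof(mp3Url));
        }

        // Escape the URL so characters such as & stay valid inside the XML attribute.
        return "<audio src=\"" + SecurityElement.Escape(mp3Url) + "\"/>";
    }
}
```

For example, SsmlAudio.Tag("https://example.com/sounds/intro.mp3") returns <audio src="https://example.com/sounds/intro.mp3"/>, while a URL starting with http:// throws an ArgumentException.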
You need to use the audio tag inside the speak tag, as shown below:
    // build the SSML response
    var speech = new SsmlOutputSpeech();
    speech.Ssml = @"<speak>Welcome to Ride Hailer.<audio src=""soundbank://soundlibrary/transportation/amzn_sfx_car_accelerate_01""/>You can order a ride, or request a fare estimate. Which will it be?</speak>";
    // build the response using ResponseBuilder
    var finalResponse = ResponseBuilder.Tell(speech);
    return finalResponse;
The above example uses a sound from a sound library. Alternatively, you can store the MP3 file on S3 and provide its URL. You can also use the below library:
       
• Break: This tag adds a pause to the speech. You can set the length of the pause with the strength or time attribute, as described below:
  • time: The duration of the pause in seconds or milliseconds, up to a maximum of 10s or 10000ms.
  • strength: One of the following values:
    • none
    • x-weak
    • weak
    • medium
    • strong
    • x-strong
If you do not specify any attribute, medium is used as the default.
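
The limits above can be enforced when the tag is generated from code. The following is a minimal sketch; the SsmlBreak class and its ForTime and ForStrength methods are hypothetical helpers, not part of ASK or Alexa.NET.

```csharp
using System;

// Hypothetical helper for illustration; not part of Alexa.NET or ASK.
public static class SsmlBreak
{
    // Upper limit for the time attribute: 10 seconds (10000 ms).
    private const int MaxMilliseconds = 10000;

    // Builds <break time="...ms"/> and rejects durations outside 0..10000 ms.
    public static string ForTime(int milliseconds)
    {
        if (milliseconds < 0 || milliseconds > MaxMilliseconds)
        {
            throw new ArgumentOutOfRangeException(
                nameof(milliseconds), "Pause must be between 0 and 10000 ms.");
        }
        return "<break time=\"" + milliseconds + "ms\"/>";
    }

    // Builds <break strength="..."/> and rejects values that are not in the list above.
    public static string ForStrength(string strength)
    {
        string[] allowed = { "none", "x-weak", "weak", "medium", "strong", "x-strong" };
        if (Array.IndexOf(allowed, strength) < 0)
        {
            throw new ArgumentException("Unknown break strength: " + strength, nameof(strength));
        }
        return "<break strength=\"" + strength + "\"/>";
    }
}
```

Here SsmlBreak.ForTime(3000) returns <break time="3000ms"/>, and SsmlBreak.ForTime(15000) throws because it exceeds the ten second limit.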
       
Let's see the below example:
    // build the SSML response
    var speech = new SsmlOutputSpeech();
    speech.Ssml = @"<speak>There is a three second pause here <break time=""3s""/> then the speech continues. A pause can last up to ten seconds.</speak>";
    // build the response using ResponseBuilder
    var finalResponse = ResponseBuilder.Tell(speech);
    return finalResponse;