Convert Text to Audio using Azure and .NET 8


In the ever-evolving landscape of technology, accessibility remains a critical focal point. As developers, it's our responsibility to ensure that our applications are usable by individuals of all abilities.

Text-to-audio conversion, powered by Azure Cognitive Services, presents a robust solution to this challenge. In this article, we delve into the process of integrating text-to-audio functionality into .NET applications, exploring its implementation, use cases, and the transformative impact it can have on accessibility.

Use Cases and Applications

  1. Narrating Content for Visually Impaired Users: By integrating text-to-audio functionality, applications can audibly convey website content, documents, or educational materials, thereby enhancing accessibility for visually impaired individuals.
  2. Interactive Voice Response Systems: Incorporating natural-sounding speech into IVR systems enriches user experience, offering intuitive navigation through menus, prompts, and feedback mechanisms.
  3. Instructions and Notifications: Text-to-audio conversion serves to alleviate cognitive strain by audibly delivering instructions, alerts, or notifications within applications, thereby reducing reliance on visual interaction.
  4. Language Learning Applications: Text-to-audio capabilities facilitate pronunciation guidance, text passage narration, and listening exercises within language learning applications, fostering enhanced language acquisition.

Setting Up Azure Cognitive Services

The first step is to create an Azure Cognitive Services resource.

  1. Register for a free Azure account if you don't already have one.
  2. In the Azure portal, create a new Speech Service resource.
    Once the resource is deployed, note Key 1 and the Location/Region from the 'Keys and Endpoint' tab of the resource. These details are needed to connect your application to the Text-to-Speech service.
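
Rather than pasting Key 1 into source code, the key and region can be read from environment variables. The following is a minimal sketch under stated assumptions: the variable names SPEECH_KEY and SPEECH_REGION and the SpeechSettings helper class are illustrative choices, not names required by Azure or by this article's sample.

```csharp
using System;

// Hypothetical helper: reads the Speech resource settings from environment
// variables so that the key never appears in source control.
public static class SpeechSettings
{
    // Assumed variable names; any names work as long as they are set in your shell.
    public const string KeyVariable = "SPEECH_KEY";
    public const string RegionVariable = "SPEECH_REGION";

    // Returns the key and region, or throws if either variable is missing.
    public static (string Key, string Region) Load()
    {
        return (Require(KeyVariable), Require(RegionVariable));
    }

    // Returns the trimmed value of the variable, or throws a descriptive
    // error if it is unset or blank.
    private static string Require(string name)
    {
        string? value = Environment.GetEnvironmentVariable(name);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException(
                $"Environment variable '{name}' is not set.");
        }
        return value.Trim();
    }
}
```

The two values returned by SpeechSettings.Load() can then be supplied wherever the application needs the key and region, instead of hard-coded strings.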

Integrating Azure Speech service with .NET

Create a .NET 8 Console App project in Visual Studio, and then install the Microsoft.CognitiveServices.Speech NuGet package.

Alternatively, add the package from the command line:

dotnet add package Microsoft.CognitiveServices.Speech

Sample code for Program.cs

using Microsoft.CognitiveServices.Speech;
using System.Media;

namespace TextToSpeech.Azure.NET8
{
    public class Program
    {
        public static async Task Main()
        {
            try
            {
                await SynthesizeAndPlayAudioAsync();
            }
            catch (Exception ex)
            {
                Console.WriteLine($"An error occurred: {ex.Message}");
            }
        }

        private static async Task SynthesizeAndPlayAudioAsync()
        {
            // Placeholder values: in real code, load these from secure storage or app settings
            string key = "";
            string region = "";

            var speechConfig = SpeechConfig.FromSubscription(key, region);

            Console.WriteLine("Enter the text to synthesize:");
            string text = Console.ReadLine() ?? string.Empty;

            Console.WriteLine("Choose a voice:");
            Console.WriteLine("1. en-US-GuyNeural");
            Console.WriteLine("2. en-US-JennyNeural");
            Console.WriteLine("3. en-US-AriaNeural");
            string voiceChoice = Console.ReadLine();

            string voiceName;
            switch (voiceChoice)
            {
                case "1":
                    voiceName = "en-US-GuyNeural";
                    break;
                case "2":
                    voiceName = "en-US-JennyNeural";
                    break;
                case "3":
                    voiceName = "en-US-AriaNeural";
                    break;
                default:
                    voiceName = "en-US-GuyNeural"; // Default to GuyNeural
                    break;
            }

            speechConfig.SpeechSynthesisVoiceName = voiceName;

            // A null AudioConfig stops the SDK from playing the audio on the default
            // speaker itself; the audio is returned in memory and played by SoundPlayer below.
            Microsoft.CognitiveServices.Speech.Audio.AudioConfig? noSpeakerOutput = null;
            using var synthesizer = new SpeechSynthesizer(speechConfig, noSpeakerOutput);

            using var result = await synthesizer.SpeakTextAsync(text);

            if (result.Reason == ResultReason.SynthesizingAudioCompleted)
            {
                // The SDK's default output is a RIFF (WAV) format, which SoundPlayer
                // can read from a stream.
                using var memoryStream = new MemoryStream(result.AudioData);
                using var player = new SoundPlayer(memoryStream);
                player.PlaySync();
            }
            else
            {
                Console.WriteLine($"Speech synthesis failed: {result.Reason}");
            }
        }
    }
}

Replace the empty key and region values with your own from the 'Keys and Endpoint' tab. Note that SoundPlayer (System.Media) is Windows-only; in a .NET 8 console project it is provided by the System.Windows.Extensions NuGet package.

  1. Main Method: The entry point of the program, where asynchronous execution starts. It calls the SynthesizeAndPlayAudioAsync method.
  2. SynthesizeAndPlayAudioAsync Method: This method handles the speech synthesis and audio playback logic asynchronously.
    • It initializes the SpeechConfig object using a subscription key and region.
    • It prompts the user to enter the text to synthesize and select a voice from a predefined list.
    • Based on the user's voice choice, it sets the appropriate voice for synthesis.
    • It creates a SpeechSynthesizer object with the configured SpeechConfig and no audio output device, so the SDK returns the audio in memory instead of playing it itself.
    • It awaits SpeakTextAsync and checks the Reason of the returned result.
    • When synthesis completes, it plays the synthesized audio using a SoundPlayer; otherwise it prints the failure reason.
  3. Exception Handling: The program catches and displays any exceptions that occur during execution.
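
A rejected key or a wrong region is typically reported through the synthesis result (with a Canceled reason) rather than as an exception, so the try/catch in Main may not surface it. The sketch below shows one way to print the cancellation details. The SynthesisDiagnostics class and its method name are hypothetical; the SDK types it calls (SpeechSynthesisCancellationDetails, CancellationReason) come from the Microsoft.CognitiveServices.Speech package.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

// Hypothetical helper, not part of the article's sample.
public static class SynthesisDiagnostics
{
    // Synthesizes the text and, if the request is canceled, prints the reason.
    public static async Task SpeakWithDiagnosticsAsync(SpeechSynthesizer synthesizer, string text)
    {
        using SpeechSynthesisResult result = await synthesizer.SpeakTextAsync(text);

        if (result.Reason == ResultReason.SynthesizingAudioCompleted)
        {
            Console.WriteLine($"Synthesized {result.AudioData.Length} bytes of audio.");
        }
        else if (result.Reason == ResultReason.Canceled)
        {
            var details = SpeechSynthesisCancellationDetails.FromResult(result);
            Console.WriteLine($"Canceled: {details.Reason}");

            // Error details are only meaningful when the cancellation was caused by an error.
            if (details.Reason == CancellationReason.Error)
            {
                Console.WriteLine($"Error code: {details.ErrorCode}");
                Console.WriteLine($"Error details: {details.ErrorDetails}");
            }
        }
    }
}
```

Calling this helper with an existing SpeechSynthesizer in place of a bare SpeakTextAsync call should make authentication and connectivity problems easier to diagnose.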

Testing the conversion

Build and run the application. It prompts for the text and a voice, converts the text to speech, and plays the synthesized audio immediately. Saving the audio to a file is also possible, but it requires a different audio output configuration than the one the sample uses.
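
If a file is wanted instead of immediate playback, the SDK's AudioConfig.FromWavFileOutput can route the audio to disk. This is a rough sketch rather than part of the original sample: the WavFileExample class, its method name, and the fixed voice are assumptions made for illustration.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

// Hypothetical helper, not part of the article's sample.
public static class WavFileExample
{
    // Synthesizes the text and writes it to a WAV file at outputPath.
    public static async Task SaveToWavAsync(string key, string region, string text, string outputPath)
    {
        var speechConfig = SpeechConfig.FromSubscription(key, region);
        speechConfig.SpeechSynthesisVoiceName = "en-US-JennyNeural"; // assumed voice

        // Route the synthesized audio to a file instead of the default speaker.
        using var audioConfig = AudioConfig.FromWavFileOutput(outputPath);
        using var synthesizer = new SpeechSynthesizer(speechConfig, audioConfig);

        using var result = await synthesizer.SpeakTextAsync(text);

        Console.WriteLine(result.Reason == ResultReason.SynthesizingAudioCompleted
            ? $"Saved speech to {outputPath}"
            : $"Speech synthesis failed: {result.Reason}");
    }
}
```

Because this variant does not use SoundPlayer, it has no Windows-only playback dependency.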

The following passage was converted from text to speech:

Discover the enchanting beauty of Tunisia! From pristine beaches to bustling souks and ancient ruins, Tunisia offers an unforgettable experience for every traveler. Explore the historic sites of Carthage, savor delicious Mediterranean cuisine, and immerse yourself in Tunisian hospitality. Whether you're a history buff, a water sports enthusiast, or simply seeking relaxation under the sun, Tunisia has something for everyone.


You can view the outcome in this video: https://vimeo.com/918035859

Conclusion

The integration of Azure Cognitive Services with .NET presents a powerful solution for converting text to audio. By leveraging Azure's robust infrastructure and .NET's flexibility, developers can enhance accessibility and enrich user experience across a wide range of applications.

This integration not only facilitates accessibility for users with visual impairments but also opens up new avenues for delivering content in natural-sounding audio formats, ultimately broadening the reach and impact of digital applications.

References: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/text-to-speech

Thank you for reading. Please share your questions, thoughts, or feedback in the comments section; I appreciate your feedback and encouragement.

Happy Documenting!