Creating A Voice Assistant Using Python And Its Libraries

Introduction

 
Many of us use voice assistants built by companies around the world, such as Amazon's Alexa, Google Assistant, and Apple's Siri. In this article, we will create a simple voice assistant using the Python programming language and several of its libraries. As I said in my earlier articles, Python has a large collection of libraries for different purposes, and we will combine a few of them here.
 

Installing Libraries

 
To build this program we need to install a few third-party libraries. They can be installed with the following terminal commands. The ssl, webbrowser, random, time and os modules used later are part of Python's standard library and do not need to be installed. The yfinance package is needed only for the stock-price feature in Step 4, where it is used as yf.

    pip install PyAudio
    pip install SpeechRecognition
    pip install playsound
    pip install gTTS
    pip install certifi
    pip install yfinance

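
Before going further, it can help to confirm that the installed libraries can actually be found by Python. The following is a minimal sketch and not part of the original program. The helper name missing_modules and the REQUIRED list are assumptions made for illustration. It only checks that each module can be located, not that it works (PyAudio, for example, still needs a working microphone).

```python
from importlib.util import find_spec

# Import names (not pip package names) used by the assistant.
REQUIRED = ["pyaudio", "speech_recognition", "playsound", "gtts", "certifi"]


def missing_modules(names):
    """Return the module names from `names` that Python cannot locate."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required libraries were found.")
```

Note that the import name can differ from the name given to pip: SpeechRecognition is imported as speech_recognition and gTTS as gtts.
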
Step 1 - Importing libraries
 
First, we import the libraries and functions the program needs: speech_recognition to recognise what we say, playsound to play audio files, gTTS (Google Text-to-Speech) to give the assistant a voice, random to generate random names for the audio files the program creates, ctime to tell the time when asked, webbrowser to open the system's browser, and os to delete the audio files once they have been played.

    import speech_recognition as sr  # recognise speech
    import playsound  # to play an audio file
    from gtts import gTTS  # google text to speech
    import random
    from time import ctime  # get time details
    import webbrowser  # open browser
    import yfinance as yf  # stock prices (used as yf in Step 4)
    import ssl
    import certifi
    import time
    import os  # to remove created audio files

Step 2 - Recognition of Speech
 
The next important step is speech recognition. For the system to understand what we say, we need to convert the voice input to text. To learn more about this concept, please refer to my friend Naveenkumar Paramasivam's article on the subject. We also need to save the name of the person who is interacting with the system.

    r = sr.Recognizer()  # initialise a recogniser

    # listen for audio and convert it to text:
    def record_audio(ask=False):
        with sr.Microphone() as source:  # microphone as source
            if ask:
                speak(ask)
            audio = r.listen(source)  # listen for the audio via source
            voice_data = ''
            try:
                voice_data = r.recognize_google(audio)  # convert audio to text
            except sr.UnknownValueError:  # error: recognizer does not understand
                speak('I did not get that')
            except sr.RequestError:  # error: recognizer is not connected
                speak('Sorry, the service is down')
            print(f">> {voice_data.lower()}")  # print what user said
            return voice_data.lower()

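
Because record_audio returns an empty string when recognition fails, the caller may want to try again a few times before giving up. The sketch below shows one way to do that. The helper listen_with_retries and its max_attempts parameter are hypothetical and not part of the original program. It accepts any function that returns the recognised text, so it can be tried out without a microphone.

```python
def listen_with_retries(listen_once, max_attempts=3):
    """Call listen_once() until it returns non-empty text or attempts run out.

    listen_once is any zero-argument function returning the recognised
    text, or '' when nothing was understood (as record_audio does).
    """
    for _ in range(max_attempts):
        text = listen_once()
        if text:
            return text
    return ""  # nothing was understood in any attempt
```

In the assistant this could be called as listen_with_retries(record_audio), assuming record_audio is defined as above.
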
Step 3 - Replying to the question
 
To make the system reply during the conversation, we use Google's text-to-speech library. The reply is first prepared as text; the gtts library then converts it to an audio file, which is played and afterwards deleted.

    def speak(audio_string):
        tts = gTTS(text=audio_string, lang='en')  # text to speech (voice)
        r = random.randint(1, 20000000)
        audio_file = 'audio' + str(r) + '.mp3'
        tts.save(audio_file)  # save as mp3
        playsound.playsound(audio_file)  # play the audio file
        print(f"May Day: {audio_string}")  # print what app said
        os.remove(audio_file)  # remove audio file

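
In speak, the audio file is left behind if saving or playback raises an error, and a random number does not strictly guarantee a unique file name. One possible alternative, sketched below with the standard tempfile module, is a small context manager that hands out a unique path and always cleans it up. The name temporary_mp3 is an assumption made for this example, not something from the original program.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_mp3(prefix="audio"):
    """Yield a unique .mp3 path and delete the file afterwards, even on error."""
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=".mp3")
    os.close(fd)  # close our handle so another library can write to the path
    try:
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)
```

Inside speak, the body could then become "with temporary_mp3() as audio_file:" followed by the same tts.save(audio_file) and playsound.playsound(audio_file) calls, with no explicit os.remove.
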
Step 4 - Responding to the questions
 
Finally, we need to make the system respond to the user's questions. In this program, I have included just a few kinds of response: stock prices, searching Google and YouTube, telling the time, and some basic conversation. You may customize it to suit your requirements. The complete program is given below.
    import speech_recognition as sr  # recognise speech
    import playsound  # to play an audio file
    from gtts import gTTS  # google text to speech
    import random
    from time import ctime  # get time details
    import webbrowser  # open browser
    import yfinance as yf  # stock prices
    import ssl
    import certifi
    import time
    import os  # to remove created audio files

    class person:
        name = ''
        def setName(self, name):
            self.name = name

    def there_exists(terms):
        # voice_data is the global variable set in the main loop below
        for term in terms:
            if term in voice_data:
                return True
        return False

    r = sr.Recognizer()  # initialise a recogniser

    # listen for audio and convert it to text:
    def record_audio(ask=False):
        with sr.Microphone() as source:  # microphone as source
            if ask:
                speak(ask)
            audio = r.listen(source)  # listen for the audio via source
            voice_data = ''
            try:
                voice_data = r.recognize_google(audio)  # convert audio to text
            except sr.UnknownValueError:  # error: recognizer does not understand
                speak('I did not get that')
            except sr.RequestError:  # error: recognizer is not connected
                speak('Sorry, the service is down')
            print(f">> {voice_data.lower()}")  # print what user said
            return voice_data.lower()

    # get string and make an audio file to be played
    def speak(audio_string):
        tts = gTTS(text=audio_string, lang='en')  # text to speech (voice)
        r = random.randint(1, 20000000)
        audio_file = 'audio' + str(r) + '.mp3'
        tts.save(audio_file)  # save as mp3
        playsound.playsound(audio_file)  # play the audio file
        print(f"May Day: {audio_string}")  # print what app said
        os.remove(audio_file)  # remove audio file

    def respond(voice_data):
        speak('How can I help you?')
        # 1: greeting
        if there_exists(['hey', 'hi', 'hello']):
            greetings = [f"hey, how can I help you {person_obj.name}", f"hey, what's up? {person_obj.name}", f"I'm listening {person_obj.name}", f"how can I help you? {person_obj.name}", f"hello {person_obj.name}"]
            greet = greetings[random.randint(0, len(greetings) - 1)]
            speak(greet)

        # 2: name
        if there_exists(["what is your name", "what's your name", "tell me your name"]):
            if person_obj.name:
                speak("my name is May Day")
            else:
                speak("my name is May Day. what's your name?")

        if there_exists(["my name is", "i am"]):
            # split on the whole phrase, so names containing "is" are not cut
            if "my name is" in voice_data:
                person_name = voice_data.split("my name is")[-1].strip()
            else:
                person_name = voice_data.split("i am")[-1].strip()
            speak(f"okay, i will remember that {person_name}")
            person_obj.setName(person_name)  # remember name in person object

        # 3: greeting
        if there_exists(["how are you", "how are you doing"]):
            speak(f"I'm very well, thanks for asking {person_obj.name}")

        # 4: time
        if there_exists(["what's the time", "tell me the time", "what time is it"]):
            # split() without arguments copes with the padded day in ctime()
            now = ctime().split()[3].split(":")[0:2]
            if now[0] == "00":
                hours = '12'
            else:
                hours = now[0]
            minutes = now[1]
            speak(f'{hours} {minutes}')

        # 5: search google
        if there_exists(["search for"]) and 'youtube' not in voice_data:
            search_term = voice_data.split("for")[-1].strip()
            url = f"https://google.com/search?q={search_term}"
            webbrowser.get().open(url)
            speak(f'Here is what I found for {search_term} on google')

        # 6: search youtube
        if there_exists(["youtube"]):
            search_term = voice_data.split("for")[-1].strip()
            url = f"https://www.youtube.com/results?search_query={search_term}"
            webbrowser.get().open(url)
            speak(f'Here is what I found for {search_term} on youtube')

        # 7: get stock price
        if there_exists(["price of"]):
            search_term = voice_data.lower().split(" of ")[-1].strip()  # strip removes whitespace after/before a term in string
            stocks = {
                "apple": "AAPL",
                "microsoft": "MSFT",
                "facebook": "FB",
                "tesla": "TSLA",
                "bitcoin": "BTC-USD"
            }
            try:
                stock = stocks[search_term]
                stock = yf.Ticker(stock)
                price = stock.info["regularMarketPrice"]

                speak(f'price of {search_term} is {price} {stock.info["currency"]} {person_obj.name}')
            except Exception:
                speak('oops, something went wrong')

        # 8: exit
        if there_exists(["exit", "quit", "goodbye"]):
            speak("going offline")
            exit()


    time.sleep(1)

    person_obj = person()
    while True:
        voice_data = record_audio()  # get the voice input
        respond(voice_data)  # respond
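
The program matches commands with plain substring tests, so 'hi' also matches inside 'this', splitting on "for" cuts a query such as "information" in the middle, and search terms are placed in the URL without encoding. The sketch below shows one way these pieces could be written as small functions that can be tested without a microphone. The names contains_phrase, search_url and spoken_time are hypothetical and not part of the original program. Note also that spoken_time uses a 12-hour clock, which differs from the 24-hour reading in the program.

```python
import re
from datetime import datetime
from urllib.parse import quote_plus


def contains_phrase(terms, voice_data):
    """Return True if any term occurs in voice_data as whole words."""
    return any(
        re.search(rf"\b{re.escape(term)}\b", voice_data) for term in terms
    )


def search_url(voice_data, base="https://google.com/search?q="):
    """Build a search URL from the words after the first 'for '."""
    _, _, term = voice_data.partition("for ")
    return base + quote_plus(term.strip())  # encode spaces and symbols


def spoken_time(now=None):
    """Return the time as 'hours minutes' on a 12-hour clock."""
    if now is None:
        now = datetime.now()
    return now.strftime("%I %M").lstrip("0")


if __name__ == "__main__":
    print(contains_phrase(["hi", "hello"], "what is this"))
    print(search_url("search for python tutorials"))
    print(spoken_time())
```

Passing voice_data as an argument, rather than reading a global variable as there_exists does, is a design choice that keeps each function independent of the main loop.
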

Conclusion

 
Although this is not a perfect personal assistant, it should give you a starting point for building one of your own. You may customize this code to suit your requirements. Thank you.

