🚀 Harnessing the Power of Natural Language Processing using Microsoft Azure AI

What is Natural Language Processing?

Natural Language Processing (NLP) is a subfield of computer science and artificial intelligence (AI) that deals with the interaction between computers and humans using natural language. It involves teaching computers to understand, interpret, and generate human language, which is often complex, ambiguous, and context-dependent.

Steps to create Natural Language Processing using Microsoft Azure AI

Here are some of the basic steps in a Natural Language Processing workflow. The Python examples use the NLTK library.

Tokenization: Tokenization breaks a piece of text into smaller units called tokens, such as sentences, words, or phrases. These tokens then serve as the basis for further analysis, such as sentiment analysis, part-of-speech tagging, or named entity recognition. There are two common kinds of tokenizer: the Sentence Tokenizer and the Word Tokenizer.

Python Code for Sentence Tokenizer

#Sentence Tokenizer
# Requires the NLTK Punkt tokenizer data, e.g. nltk.download('punkt') ('punkt_tab' in newer NLTK releases)
from nltk.tokenize import sent_tokenize
my_text="Hello Vikram, how are you? I hope everything is going well. Today is a good day, see you dude"
print(sent_tokenize(my_text))

Output

['Hello Vikram, how are you?', 'I hope everything is going well.', 'Today is a good day, see you dude']

Python Code for Word Tokenizer

#Word Tokenizer
from nltk.tokenize import word_tokenize
my_text="Hello Vikram, how are you? I hope everything is going well. Today is a good day, see you dude"
print(word_tokenize(my_text))

Output

['Hello', 'Vikram', ',', 'how', 'are', 'you', '?', 'I', 'hope', 'everything', 'is', 'going', 'well', '.', 'Today', 'is', 'a', 'good', 'day', ',', 'see', 'you', 'dude']
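To make the idea behind both tokenizers concrete, here is a minimal sketch that uses only Python's built-in re module. The function names simple_sent_tokenize and simple_word_tokenize are made up for this illustration, and the rules are deliberately simplistic. This is not how NLTK's tokenizers are implemented; it only approximates their behaviour on simple input.

```python
import re

def simple_sent_tokenize(text):
    """Split after '.', '!' or '?' when followed by whitespace and an uppercase letter."""
    return re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())

def simple_word_tokenize(text):
    """Return runs of word characters and single punctuation marks as separate tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

my_text = "Hello Vikram, how are you? I hope everything is going well. Today is a good day, see you dude"
print(simple_sent_tokenize(my_text))
print(simple_word_tokenize(my_text))
```

A rule this simple would wrongly split after an abbreviation such as "Mr. Smith", which is the kind of case a trained tokenizer is meant to handle.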

Stop Words: Stop words are common words that are filtered out of text during natural language processing (NLP) tasks because they contribute little to the meaning of the text. Examples of stop words include "a," "an," "the," "and," "in," "on," "at," and so on.

Python Code for Stop Words

#Stop Words
# Requires the NLTK stop-word corpus, e.g. nltk.download('stopwords')
from nltk.corpus import stopwords
print(stopwords.words('english'))

Output

['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', "don't", 'should', "should've", 'now', 'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', 'aren', "aren't", 'couldn', "couldn't", 'didn', "didn't", 'doesn', "doesn't", 'hadn', "hadn't", 'hasn', "hasn't", 'haven', "haven't", 'isn', "isn't", 'ma', 'mightn', "mightn't", 'mustn', "mustn't", 'needn', "needn't", 'shan', "shan't", 'shouldn', "shouldn't", 'wasn', "wasn't", 'weren', "weren't", 'won', "won't", 'wouldn', "wouldn't"]
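The list above is normally used as a filter. The sketch below shows one way that filtering might look, using a small hand-picked subset of the list and a simple regex tokenizer so that it runs without NLTK. The names STOP_WORDS and remove_stop_words are assumptions made for this example.

```python
import re

# Small illustrative subset of the English stop-word list printed above.
STOP_WORDS = {"i", "is", "a", "are", "you", "how", "the", "and", "in", "on", "at"}

def remove_stop_words(text, stop_words=STOP_WORDS):
    """Tokenize on word characters and drop tokens whose lowercase form is a stop word."""
    tokens = re.findall(r"\w+", text)
    return [t for t in tokens if t.lower() not in stop_words]

print(remove_stop_words("Today is a good day, how are you?"))
```

With NLTK installed, set(stopwords.words('english')) could be passed as stop_words instead of the subset. A set is used because membership tests on it are fast.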

Stemming: Stemming is the process of reducing a word to its base or root form, known as a stem, by stripping affixes (usually suffixes). The purpose of stemming in Natural Language Processing (NLP) is to map different forms of a word to a common base form, which simplifies the analysis of text data. A stem is not necessarily a dictionary word.

Python Code for Stemming

#Stemming
from nltk.stem import PorterStemmer
stemmer = PorterStemmer()
words = ["running", "working", "increases", "decreases", "history"]
for word in words:
    stemmed_word = stemmer.stem(word)
    print(f"{word} --> {stemmed_word}")

Output

running --> run
working --> work
increases --> increas
decreases --> decreas
history --> histori
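The suffix-stripping idea can be sketched in a few lines of plain Python. The function naive_stem and its suffix list are hypothetical and far cruder than the Porter algorithm used above. They only illustrate the principle.

```python
def naive_stem(word, suffixes=("ing", "es", "s")):
    """Strip the first matching suffix, keeping at least three characters of the word."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for word in ["running", "working", "increases", "decreases", "history"]:
    print(f"{word} --> {naive_stem(word)}")
```

This sketch turns "running" into "runn" and leaves "history" untouched, whereas PorterStemmer returns "run" and "histori". The Porter algorithm applies further rewrite rules after the suffix is removed, which accounts for the difference.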

Lemmatization: Lemmatization is the process of reducing a word to its base or dictionary form, known as a lemma, by considering the context and morphology of the word. Unlike stemming, which simply removes the suffixes or prefixes to obtain a base form, lemmatization takes into account the part of speech and other linguistic factors to generate the correct lemma for a word.

Python Code for Lemmatization

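#Lemmatization
# Requires the WordNet corpus, e.g. nltk.download('wordnet')
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
words = ["running", "working", "increases", "decreases", "history"]
for word in words:
    # pos='v' requests the verb lemma; without it the lemmatizer assumes a noun
    lemmatized_word = lemmatizer.lemmatize(word, pos='v')
    print(f"{word} --> {lemmatized_word}")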

Output

running --> run
working --> work
increases --> increase
decreases --> decrease
history --> history
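To show why the part of speech matters, here is a toy dictionary-lookup lemmatizer. The LEMMAS table and the lookup_lemma function are invented for this illustration and cover only a handful of words. A real lemmatizer such as WordNetLemmatizer draws on a full lexical database instead.

```python
# Hypothetical mini-dictionary mapping (word, part of speech) to a lemma.
# 'v' = verb, 'n' = noun, mirroring the pos argument used above.
LEMMAS = {
    ("running", "v"): "run",
    ("increases", "v"): "increase",
    ("increases", "n"): "increase",
    ("meeting", "v"): "meet",
    ("meeting", "n"): "meeting",
}

def lookup_lemma(word, pos="n"):
    """Return the dictionary lemma for (word, pos), or the word unchanged if unknown."""
    return LEMMAS.get((word.lower(), pos), word)

print(lookup_lemma("meeting", pos="v"))  # meet
print(lookup_lemma("meeting", pos="n"))  # meeting
print(lookup_lemma("history", pos="v"))  # history (not in the table, returned as-is)
```

The same surface form "meeting" maps to different lemmas depending on whether it is read as a verb or a noun. A stemmer, which looks only at the characters of the word, cannot make that distinction.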

Named Entity Recognition (NER): Named Entity Recognition (NER) is a task in Natural Language Processing (NLP) that automatically identifies named entities in text and classifies them into predefined categories. Named entities are people, organizations, locations, and other things that a text refers to by name. Typical categories include person names, organization names, and locations.

Python Code for Named Entity Recognition (NER)

#Named Entity Recognition (NER)
import nltk
# Requires NLTK data for the tokenizer, POS tagger, and NE chunker (see nltk.download)
# sample text
text = "Microsoft is looking at buying U.K. startup for $2 billion"
# tokenize text
tokens = nltk.word_tokenize(text)
# tag tokens with parts of speech
tagged_tokens = nltk.pos_tag(tokens)
# perform named entity recognition on tagged tokens
named_entities = nltk.ne_chunk(tagged_tokens)
# print named entities
print(named_entities)

Output

(S
  (ORGANIZATION Microsoft/NNP)
  is/VBZ
  looking/VBG
  at/IN
  buying/VBG
  (GPE U.K./NNP)
  (ORGANIZATION startup/NN)
  for/IN
  $/$
  2/CD
  billion/CD)

Summary

In this article, we learned about Natural Language Processing capabilities and the steps to create Natural Language Processing using Microsoft Azure AI: tokenization, stop-word removal, stemming, lemmatization, and named entity recognition. NLP bridges the gap between human language and machine language, letting people interact with machines in a more natural and intuitive way and letting machines understand and process human language more intelligently.