Azure Cognitive Services - Text Analytics

Ashirwad Satapathi

Introduction

Azure Cognitive Services is a collection of APIs, SDKs, and services that help developers build intelligent applications without extensive knowledge of AI or data science. Instead of spending years developing and training accurate models for a cognitive task, developers can draw on Microsoft's experience in building cognitive solutions by calling these APIs or SDKs to add cognitive capabilities to their own applications.

Azure Cognitive Services are categorized into five main types:

Vision APIs
Speech APIs
Language APIs
Search APIs
Decision APIs

Each of these broad categories contains a number of services. In this article, we will discuss one service from the Language APIs, called Text Analytics.

What is Text Analytics?

Text Analytics is a cognitive service that is part of the Language APIs of Azure Cognitive Services. It helps discover insights in unstructured text using natural language processing. The API lets developers add text analytics capabilities to their applications without any prior experience in machine learning or natural language processing.

The Text Analytics API provides four key types of analysis:

Sentiment Analysis
Key Phrase Extraction
Language Detection
Named Entity Recognition

What is Sentiment Analysis?

Sentiment Analysis is a way to find out a person's opinion about a person, product, or service by analyzing the feedback they have provided as raw text. In Text Analytics, a sentiment can be categorized as Positive, Negative, or Neutral. The API also returns scores between 0 and 1. In version 2.x this is a single sentiment score, where 0 represents highly negative and 1 represents highly positive. In version 3.0, which this article uses, the API returns a sentiment label together with a confidence score between 0 and 1 for each of the positive, neutral, and negative classes.
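As a rough illustration of how such a result might be read in code, the sketch below picks the label and its confidence score out of one document of a v3.0-style sentiment response. The sample dictionary is hand-written for illustration and is not real service output, the helper name summarize_sentiment is hypothetical, and the sentiment and confidenceScores field names are assumed to follow the v3.0 response format.

```python
# Hand-written sample in the assumed shape of one v3.0 sentiment result
# (illustrative only, not real service output).
sample_document = {
    "id": "1",
    "sentiment": "positive",
    "confidenceScores": {"positive": 0.9, "neutral": 0.08, "negative": 0.02},
}


def summarize_sentiment(document):
    """Return (id, label, confidence score for that label) for one document."""
    scores = document["confidenceScores"]
    label = document["sentiment"]
    # A label such as "mixed" has no score of its own, so fall back to the
    # strongest of the per-class scores in that case.
    score = scores.get(label, max(scores.values()))
    return document["id"], label, score


print(summarize_sentiment(sample_document))
```

Keeping the label and the score together makes it easy to, for example, ignore results whose confidence falls below a threshold your application chooses.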

What is Key Phrase Extraction?

Key Phrase Extraction refers to identifying the terms or words that best describe the subject or context of a document. The objective of using this service is to extract quality key phrases that give a higher-level summary of a document or set of textual content.

What is Language Detection?

Language Detection enables your application to detect the language of the text a user sends. For each raw text or document you submit, the API returns a single language code together with a confidence score. The confidence score indicates how confident the service is that the submitted text is written in the detected language.

What is Named Entity Recognition?

Named Entity Recognition identifies entities in your text and classifies them into categories such as person, place, quantity, measurement, and many more. This helps establish the right context when performing text analytics on subjects from varied fields. For example, if your application has to analyze text that mentions an entity it has no prior knowledge of, the analysis can become difficult; recognizing the entity and its category first gives the application something to work with. This is a handy feature for solving real-world problems.
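To make the idea of entities and categories more concrete, here is a minimal sketch that groups recognized entities by category. The sample dictionary is hand-written for illustration and is not real service output; the text, category, and confidenceScore field names are assumed to follow the v3.0 entity recognition response, and the group_entities helper and its min_score parameter are hypothetical.

```python
from collections import defaultdict

# Hand-written sample in the assumed shape of one v3.0 entity recognition
# result (illustrative only, not real service output).
sample_document = {
    "id": "1",
    "entities": [
        {"text": "Seattle", "category": "Location", "confidenceScore": 0.9},
        {"text": "last week", "category": "DateTime", "confidenceScore": 0.8},
        {"text": "Redmond", "category": "Location", "confidenceScore": 0.85},
    ],
}


def group_entities(document, min_score=0.5):
    """Map each category to the entity texts found with enough confidence."""
    grouped = defaultdict(list)
    for entity in document.get("entities", []):
        if entity.get("confidenceScore", 0.0) >= min_score:
            grouped[entity["category"]].append(entity["text"])
    return dict(grouped)


print(group_entities(sample_document))
```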

Use Case Scenarios for Text Analytics

Text Analytics has various real-world use cases. It can be used to perform tasks such as:

Analyzing Survey Results
Categorizing and Classifying unstructured documents
Performing Opinion mining
Gathering Insights by doing Feedback Analytics

How to consume the Text Analytics API?

Step 1

Create an Azure resource for Text Analytics, then get the keys that are generated for it. A key is sent with each request to authenticate it. Two keys are usually available, and you can use either one of them in your application.

Step 2

Next, formulate a request and send it to the endpoint of the Text Analytics resource you created earlier. In this article, we send the request, along with the necessary information, using a Python script. We store our API key in the subscription_key variable and the endpoint of our resource in the endpoint variable.

import requests
from pprint import pprint
# Key and endpoint of the Text Analytics resource created in Step 1
subscription_key = "<enter-your-api-key>"
endpoint = "<enter-your-endpoint>"

To detect language, append /text/analytics/v3.0/languages to the Text Analytics endpoint declared above. To perform sentiment analysis instead, you would append /text/analytics/v3.0/sentiment to the endpoint.

So let’s append /text/analytics/v3.0/languages to the endpoint and store the result in a variable langUrl.

langUrl = endpoint + "/text/analytics/v3.0/languages"
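Because the path is joined by plain string concatenation, an endpoint that ends with a trailing slash would produce a double slash in the URL, which may or may not be accepted. The sketch below is one way to guard against that. The build_url helper, the API_PATH constant, and the example endpoint are hypothetical names introduced here for illustration; only the two paths come from the text above.

```python
API_PATH = "/text/analytics/v3.0/"


def build_url(endpoint, operation):
    """Join the resource endpoint and an operation name such as "languages"."""
    # Strip any trailing slash so the result never contains "//text".
    return endpoint.rstrip("/") + API_PATH + operation


# Placeholder endpoint, used only to show the shape of the result.
example_endpoint = "https://example.cognitiveservices.azure.com/"
print(build_url(example_endpoint, "languages"))
print(build_url(example_endpoint, "sentiment"))
```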

Now that we have defined the URL and subscription key, let’s prepare the payload for the API and store it in a variable documentData. The payload is simply the data we want Text Analytics to process, in this case the texts whose languages should be detected. To get a valid response, the request body must follow the structure used in documentData: a documents list in which each item has an id and a text.

documentData= {"documents": [
{"id": "1", "text": "This is a document written in English."},
{"id": "2", "text": "यह हिंदी में लिखा गया है"},
{"id": "3", "text": "ଏହା ଓଡିଆରେ ଲେଖାଯାଇଛି |"}
]}
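When the texts come from elsewhere in your program, the same structure can be generated rather than typed by hand. The following is a minimal sketch, assuming sequential string ids starting at "1" are acceptable for your use; the build_documents helper is a hypothetical name, not part of any SDK.

```python
def build_documents(texts):
    """Wrap a list of strings in the {"documents": [...]} request structure."""
    return {
        "documents": [
            {"id": str(index), "text": text}
            for index, text in enumerate(texts, start=1)
        ]
    }


payload = build_documents([
    "This is a document written in English.",
    "यह हिंदी में लिखा गया है",
])
print(payload["documents"][0])
```

The ids matter because the response refers to each document by its id, so they are what lets you match results back to the original texts.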

With everything needed to call the API in place, let’s use the requests library of Python to send a request to the API and store the returned JSON data in a variable called languages.

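# The subscription key is sent in the Ocp-Apim-Subscription-Key header
headers = {"Ocp-Apim-Subscription-Key": subscription_key}
# Send the payload prepared above as the JSON request body
response = requests.post(langUrl, headers=headers, json=documentData)
languages = response.json()
pprint(languages)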

Note

We are using pprint here to pretty-print the JSON data and show it in a structured way.
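The script above assumes the call succeeds. If the key or endpoint is wrong, the service answers with an HTTP error status, and it can help to fail early instead of printing an error body. Here is a hedged sketch of the same request wrapped in a function. The post_documents name and its post parameter are hypothetical; the parameter exists only so that a stand-in for requests.post can be passed in when testing without a network.

```python
def post_documents(url, key, payload, post=None, timeout=10):
    """POST the payload and return the parsed JSON, raising on HTTP errors."""
    if post is None:
        import requests  # imported lazily so a stub can be supplied in tests
        post = requests.post
    headers = {"Ocp-Apim-Subscription-Key": key}
    response = post(url, headers=headers, json=payload, timeout=timeout)
    # Raises an exception for 4xx/5xx responses, e.g. a wrong key or endpoint.
    response.raise_for_status()
    return response.json()


# Intended usage with the variables defined earlier in this article:
# languages = post_documents(langUrl, subscription_key, documentData)
```

The timeout of 10 seconds is an arbitrary choice for this sketch; without one, a request can wait indefinitely for a response.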

Step 3

Run this script. If it runs successfully, you should see output like the following, with the detected language and a confidence score for each document.

{'documents': [{'detectedLanguage': {'confidenceScore': 1.0,
                                     'iso6391Name': 'en',
                                     'name': 'English'},
                'id': '1',
                'warnings': []},
               {'detectedLanguage': {'confidenceScore': 1.0,
                                     'iso6391Name': 'hi',
                                     'name': 'Hindi'},
                'id': '2',
                'warnings': []},
               {'detectedLanguage': {'confidenceScore': 1.0,
                                     'iso6391Name': 'or',
                                     'name': 'Oriya'},
                'id': '3',
                'warnings': []}],
 'errors': [],
 'modelVersion': '2020-07-01'}
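In an application you would usually pull individual fields out of this response rather than print it whole. The sketch below shows one way to do that, using a trimmed copy of the response above as input. The summarize_languages helper is a hypothetical name, and returning the errors list alongside the rows is a design choice of this sketch so that callers do not silently ignore documents the service could not process.

```python
def summarize_languages(result):
    """Return (rows, errors): one row per detected document, plus any errors."""
    rows = [
        (
            doc["id"],
            doc["detectedLanguage"]["name"],
            doc["detectedLanguage"]["iso6391Name"],
            doc["detectedLanguage"]["confidenceScore"],
        )
        for doc in result.get("documents", [])
    ]
    return rows, result.get("errors", [])


# Trimmed copy of the response shown above.
languages = {
    "documents": [
        {"id": "1", "warnings": [], "detectedLanguage": {
            "name": "English", "iso6391Name": "en", "confidenceScore": 1.0}},
        {"id": "2", "warnings": [], "detectedLanguage": {
            "name": "Hindi", "iso6391Name": "hi", "confidenceScore": 1.0}},
        {"id": "3", "warnings": [], "detectedLanguage": {
            "name": "Oriya", "iso6391Name": "or", "confidenceScore": 1.0}},
    ],
    "errors": [],
}

rows, errors = summarize_languages(languages)
for doc_id, name, code, score in rows:
    print(f"{doc_id}: {name} ({code}), confidence {score}")
```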

Conclusion

Text Analytics makes it easy for developers to add NLP capabilities to their applications: you make API calls to it, it does all the hard work of processing and analyzing the text for you, and it returns the results as a JSON response. There are many use cases where Text Analytics can be incorporated into an existing application to add NLP capabilities and make it smarter. It is easy to add to an existing codebase, since you simply need to call an API. Alternatively, you can use the client libraries for the Text Analytics API of Azure Cognitive Services.

Note

Some of the content was taken from here and from Microsoft's samples on Text Analytics.