Image Captioning With Azure Cognitive Services

This article is a hands-on tutorial for building an image recognition system with Azure Cognitive Services. Image analysis and captioning are non-trivial computer vision problems that are typically solved with machine learning. Azure Cognitive Services makes it easy to integrate image analysis features into any application.

Much can be learned from community events. Events such as Azure Summit enable developers, engineers, solutions architects, and enthusiasts to learn new skills. Check the Azure Summit website to keep up with recent developments in Azure.

Computer Vision

Computer Vision is a branch of Artificial Intelligence that deals mainly with processing visual data such as images and videos. Many machine learning models, built on a variety of algorithms, can be used to perform different tasks. Some of the key Computer Vision tasks are described below.

Image Analysis

Image Analysis is the process of extracting information from an image using image processing techniques. It ranges from reading barcodes to face recognition and object detection.

Image/Photo Captioning

Image Captioning is the process of generating a text description of an image after analyzing it. The caption is based on the objects, locations, and faces recognized and the actions detected in the image.

Uses of Image Captioning

  • Image description generation for the visually impaired
  • Natural language description automation
  • Recommendations for editing applications
  • Usage for social media
  • Analysis for virtual assistants
  • Keyword generation for search indexing of images
  • Clustering images by location, color, detected faces, detected objects, and more

Azure Cognitive Services

Azure Cognitive Services enables organizations to build cognitive intelligence into applications through client library SDKs and REST APIs. It lets developers integrate cognitive features into applications without prior knowledge of machine learning, data science, or artificial intelligence. From Computer Vision to Natural Language Processing and Conversational AI, Azure Cognitive Services supports a wide range of applications.
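As a rough illustration of the REST route, the sketch below builds (without sending) an HTTP request for the image description operation. The function name build_describe_request is hypothetical, and the v3.2 URL path and header names are assumptions that should be checked against the REST reference for your own resource.

```python
import urllib.request
from urllib.parse import urlencode


def build_describe_request(endpoint, key, image_bytes, max_candidates=1):
    """Build (but do not send) a POST request for the 'describe' operation."""
    # Assumed URL layout for Computer Vision v3.2; adjust to your resource.
    query = urlencode({"maxCandidates": max_candidates, "language": "en"})
    url = f"{endpoint.rstrip('/')}/vision/v3.2/describe?{query}"
    headers = {
        "Ocp-Apim-Subscription-Key": key,            # the resource key
        "Content-Type": "application/octet-stream",  # raw image bytes in the body
    }
    return urllib.request.Request(url, data=image_bytes, headers=headers, method="POST")


# Placeholder values only; nothing is sent over the network here.
request = build_describe_request("https://example.invalid/", "dummy-key", b"\x89PNG")
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would perform the call. The rest of this tutorial uses the Python SDK instead, which wraps these details.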


Python

Python is one of the easiest and most widely used programming languages in the world:

  • Taught to students as a first programming language
  • Clear syntax and consistent indentation make code easy to understand
  • Active communities of library and module developers


Anaconda

Anaconda is a free, easy-to-install distribution for scientific computing. It includes a package manager and an environment manager, along with a collection of over 720 open-source packages for the Python and R programming languages, backed by free community support. It supports Windows, Linux, and macOS and ships with Jupyter Notebook.

Jupyter Notebook 

Jupyter Notebook combines an interactive coding environment with a presentation and teaching tool, and it is used widely for scientific computing.

Today, we’ll learn to develop an image analysis system that captions an image by analyzing the picture with Azure Cognitive Services. The tutorial is written in Python in a Jupyter Notebook running in an Anaconda environment. For the code and the dependencies needed for your setup, visit GitHub.

Step 1 - Create a Cognitive Service in Azure

This step is essential because Azure Cognitive Services runs the AI algorithms that process the data we supply and perform the computations we need. The service handles the AI part of the system, whether that involves machine learning or deep learning, so we can build applications on its output without having to create models of our own.

To create the Cognitive Service, see the previous article, How to Create a Cognitive Service in Azure.

Step 2

Obtain Key 1 and the Endpoint from the running Azure Cognitive Service and use them in the following code as the values of cog_key and cog_endpoint, respectively.

Step 3

cog_key = 'Key1 value' 
cog_endpoint = 'Endpoint value eg.' 
print('Ready to use cognitive services at {} using key {}'.format(cog_endpoint, cog_key))  
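Hardcoding the key in a notebook makes it easy to leak when the notebook is shared. As one possible alternative, the sketch below reads both values from environment variables and masks the key before printing it. The variable names COG_KEY and COG_ENDPOINT and both helper functions are hypothetical choices for this example, not part of the original notebook.

```python
import os


def load_cognitive_settings(env=os.environ):
    """Return (key, endpoint) from the given mapping, or raise a clear error."""
    # COG_KEY and COG_ENDPOINT are hypothetical names chosen for this sketch.
    key = env.get("COG_KEY", "").strip()
    endpoint = env.get("COG_ENDPOINT", "").strip()
    if not key or not endpoint:
        raise RuntimeError("Set COG_KEY and COG_ENDPOINT before running the notebook")
    return key, endpoint.rstrip("/")


def mask(secret, visible=4):
    """Hide all but the last few characters so the key can be printed safely."""
    return "*" * max(len(secret) - visible, 0) + secret[-visible:]


# Demo with placeholder values; call load_cognitive_settings() with no
# arguments to read from the real environment instead.
demo_key, demo_endpoint = load_cognitive_settings(
    {"COG_KEY": "0123456789abcdef", "COG_ENDPOINT": "https://example.invalid/"}
)
print('Ready to use cognitive services at {} using key {}'.format(demo_endpoint, mask(demo_key)))
```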

Step 4 - Analyzing Image 

With the key and endpoint in place, we can proceed to image analysis. Running the code below generates a description of the image in the file “a.jpg”.

from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials 
from python_code import vision 
import os 
%matplotlib inline 
# Get the path to an image file 
image_path = os.path.join('data', 'vision', 'a.jpg') 
# Get a client for the computer vision service 
computervision_client = ComputerVisionClient(cog_endpoint, CognitiveServicesCredentials(cog_key)) 
# Get a description from the computer vision service 
with open(image_path, "rb") as image_stream:
    description = computervision_client.describe_image_in_stream(image_stream)
# Display image and caption (code in helper_scripts/ 
vision.show_image_caption(image_path, description)

The output caption describes what the AI could analyze from the image, along with a confidence percentage.
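The helper vision.show_image_caption is not listed in this article, so the sketch below only illustrates how a caption and its confidence might be read from the result. It assumes the result exposes a captions list whose items carry text and confidence (a value between 0 and 1). The function format_captions is hypothetical, and the stand-in object lets the example run without calling the service.

```python
from types import SimpleNamespace


def format_captions(description):
    """Turn a describe-image result into printable caption lines."""
    # Assumes each caption exposes .text and .confidence (a 0..1 float).
    if not description.captions:
        return ["No caption detected"]
    return [f"{c.text} (confidence: {c.confidence:.1%})" for c in description.captions]


# Stand-in object shaped like the service response, so the sketch runs offline.
fake_description = SimpleNamespace(
    captions=[SimpleNamespace(text="a dog sitting on grass", confidence=0.9123)]
)
for line in format_captions(fake_description):
    print(line)
```

With a real result, the same function could be called as format_captions(description) right after the describe call.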

Step 5

Now test with a new image:

# Get the path to an image file 
image_path = os.path.join('data', 'vision', 'fa.jpg')  
# Get a description from the computer vision service 
with open(image_path, "rb") as image_stream:
    description = computervision_client.describe_image_in_stream(image_stream)
# Display image and caption (code in helper_scripts/ 
vision.show_image_caption(image_path, description) 

The description looks great, doesn’t it? This is the power of Azure Cognitive Services. What you can build with it is now limited only by your imagination. Go ahead and create applications with this service to solve problems.
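Captioning one image at a time extends naturally to a whole folder. The sketch below loops over the image files in a directory and keeps the first caption returned for each. The function caption_folder and its extension filter are hypothetical. The only SDK call it relies on is describe_image_in_stream, the same one used in the steps above.

```python
import os


def caption_folder(client, folder, extensions=(".jpg", ".jpeg", ".png")):
    """Return {file name: first caption text or None} for images in a folder."""
    results = {}
    for name in sorted(os.listdir(folder)):
        if not name.lower().endswith(extensions):
            continue  # skip anything that is not an image file
        # Open in binary mode and close the file as soon as the call returns.
        with open(os.path.join(folder, name), "rb") as image_stream:
            description = client.describe_image_in_stream(image_stream)
        captions = description.captions
        results[name] = captions[0].text if captions else None
    return results
```

In the notebook this could be called as caption_folder(computervision_client, os.path.join('data', 'vision')). Each file results in a separate request to the service, so large folders are subject to whatever limits apply to your pricing tier.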

Step 6

Next, we generate tags from the image to make it searchable. This technique can benefit a wide range of applications.

# Get the path to an image file 
image_path = os.path.join('data', 'vision', 'fa.jpg')   
# Specify the features we want to analyze 
features = ['Description', 'Tags', 'Adult', 'Objects', 'Faces']   
# Get an analysis from the computer vision service 
with open(image_path, "rb") as image_stream:
    analysis = computervision_client.analyze_image_in_stream(image_stream, visual_features=features)
# Show the results of analysis (code in helper_scripts/ 
vision.show_image_analysis(image_path, analysis) 

We specified the features we wanted analyzed, and the result covers each of them, including the ratings requested through the 'Adult' feature. The service also generates multiple tags on its own. That is a lot of capability for very little code.
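To show how generated tags could make images searchable, the sketch below keeps only confident tags and builds a simple tag-to-images index. It assumes the analysis result exposes a tags list whose items carry name and confidence. The helper names, the stand-in data, and the 0.8 threshold are all hypothetical choices for illustration.

```python
from collections import defaultdict
from types import SimpleNamespace


def confident_tags(analysis, min_confidence=0.8):
    """Return tag names whose confidence meets the threshold."""
    # Assumes analysis.tags is a list of objects with .name and .confidence.
    return [t.name for t in (analysis.tags or []) if t.confidence >= min_confidence]


def build_tag_index(analyses, min_confidence=0.8):
    """Map each tag to the sorted list of image paths that carry it."""
    index = defaultdict(set)
    for image_path, analysis in analyses.items():
        for tag in confident_tags(analysis, min_confidence):
            index[tag].add(image_path)
    return {tag: sorted(paths) for tag, paths in index.items()}


def _fake(*pairs):
    """Build a stand-in analysis result so the sketch runs offline."""
    return SimpleNamespace(tags=[SimpleNamespace(name=n, confidence=c) for n, c in pairs])


analyses = {
    "fa.jpg": _fake(("person", 0.99), ("outdoor", 0.95), ("tree", 0.40)),
    "a.jpg": _fake(("outdoor", 0.90), ("dog", 0.85)),
}
tag_index = build_tag_index(analyses)
print(tag_index["outdoor"])
```

Looking up a tag in the index returns every image that carries it, which is the core of a keyword search over an image collection. Low-confidence tags such as "tree" in the stand-in data are left out.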

In this article, we learned about Image Analysis and Image Captioning and discussed the benefits of image captioning. We then went through a hands-on, step-by-step process to create our own AI program using Azure Cognitive Services. For reference in case of any issues, the GitHub link is attached; it contains the dependency guidelines for your setup and a pre-worked notebook to check out. Try it for yourself: the aim of this article is to empower you to build AI-enabled applications. This could be your first step into AI.