Generate Captions For Image Using Python

Extracting information from images has gained a lot of attention lately. The reason is not that we prefer text to images, but that much more can be done once the text is extracted. Extracted text can be used for things like:

  • Sentiment analysis
  • Text translation
  • Immersive reading, and many more.

This article explains how to generate captions for an image using Azure Computer Vision. Captions can be generated in either of two ways:

  • Using the SDK
  • Using the REST API

Whichever option is selected, the result is the same. With the SDK, a developer can choose from several languages:

  • C#
  • Node.js
  • Python
  • Java
  • Go

Once a language is chosen, the next step is to create a Computer Vision resource in Azure. Log in to the Azure portal and search for Computer Vision; this opens a dialog as shown below.

In the above dialog,

  • Azure Subscription - Select the active Azure subscription
  • Resource Group - Create a new resource group or select an existing one
  • Region - Select the region closest to you
  • Name - Provide a unique name for this Computer Vision instance
  • Pricing Tier - Select the pricing tier that fits your requirements
  • AI Notice - Read the AI notice and tick the checkbox

Once all these fields are populated, clicking Review + Create creates the resource. When the resource is ready, open Keys and Endpoint and note the keys, the endpoint, and the region.

You can use either Key 1 or Key 2.
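Rather than pasting a key into the source file, the key and endpoint can be read from environment variables. The following is only a minimal sketch: the variable names VISION_KEY and VISION_ENDPOINT and the helper load_vision_settings are hypothetical choices made for this example, not names defined by Azure.

```python
import os


def load_vision_settings(env=None):
    """Read the key and endpoint from environment variables.

    VISION_KEY and VISION_ENDPOINT are hypothetical names chosen for this
    sketch; use whatever names suit your setup.
    """
    env = os.environ if env is None else env
    required = ("VISION_KEY", "VISION_ENDPOINT")
    # Collect every missing or empty variable so the error lists them all
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {"key": env["VISION_KEY"], "endpoint": env["VISION_ENDPOINT"]}
```

Calling load_vision_settings() with no argument reads from the real environment; passing a dictionary makes the function easy to try out without touching it.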

Code to Generate Image Captions using Python

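from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials

# Values copied from the Keys and Endpoint page of the resource
region = "westus"
key = "KEY"
endpoint = ""

# Path to the local image to caption (placeholder, point it at your own file)
imagePath = "image.jpg"

credentials = CognitiveServicesCredentials(key)
client = ComputerVisionClient(endpoint=endpoint, credentials=credentials)

# Send the image bytes to the service
with open(imagePath, 'rb') as img:
    result = client.describe_image_in_stream(img)

# Print the text of each caption the service returned
for caption in result.captions:
    print(caption.text)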


In the above code, key, region, and endpoint are the values copied from the Azure portal in the previous step.
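The service can return more than one candidate caption, each with a confidence score. As a small sketch of picking the most confident one, the helper below (best_caption, a hypothetical name) works on any objects that have text and confidence attributes. The sample objects and their confidence values are invented for illustration and are not real service output.

```python
from types import SimpleNamespace


def best_caption(captions):
    """Return the text of the highest-confidence caption, or None if empty."""
    if not captions:
        return None
    top = max(captions, key=lambda c: c.confidence)
    return top.text


# Stand-in objects shaped like caption results (text + confidence).
# The confidence values here are made up for the example.
sample = [
    SimpleNamespace(text="a close-up of a person", confidence=0.31),
    SimpleNamespace(text="a woman with red hair", confidence=0.52),
]
print(best_caption(sample))
```

With the real client, several candidates can be requested by passing max_candidates to describe_image_in_stream, for example client.describe_image_in_stream(img, max_candidates=3), and the returned result.captions can then be passed to best_caption.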

The image input is shown below.

The output is:

a woman with red hair
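For comparison, the REST API option mentioned at the start can be sketched with the Python standard library alone. This is a rough outline under assumptions, not a definitive client: it assumes the v3.2 Describe Image path is available for your resource, and the helper names build_describe_request and extract_captions are hypothetical. The first function only builds the request; nothing is sent until it is passed to urllib.request.urlopen.

```python
import json
import urllib.request


def build_describe_request(endpoint, key, image_bytes, max_candidates=1):
    """Build (but do not send) a POST request for the Describe Image operation.

    The /vision/v3.2/describe path is an assumption of this sketch; check
    which API version your resource supports.
    """
    base = endpoint.rstrip("/")
    url = f"{base}/vision/v3.2/describe?maxCandidates={max_candidates}"
    headers = {
        "Ocp-Apim-Subscription-Key": key,            # Key 1 or Key 2
        "Content-Type": "application/octet-stream",  # raw image bytes in the body
    }
    return urllib.request.Request(url, data=image_bytes, headers=headers, method="POST")


def extract_captions(response_body):
    """Pull the caption texts out of the JSON body returned by the service."""
    payload = json.loads(response_body)
    captions = payload.get("description", {}).get("captions", [])
    return [c["text"] for c in captions]
```

To actually call the service, read the image bytes from a file, pass the request to urllib.request.urlopen, and hand the response body to extract_captions.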

You can follow this link for a complete demonstration of this flow. Hope you enjoyed reading this. Happy learning!