Image Analysis Using Computer Vision API - Cognitive Services

In this article, I will talk about the Computer Vision API, one of the popular Cognitive Services, and how to use it for image analysis.

Introduction

 
The Computer Vision API is one of the most powerful image analysis APIs provided by Microsoft Azure. It uses a highly trained model to process an image and return valuable information about that image.
 
This API can analyze an image that we either upload or reference by URL. The image can be analyzed for several different aspects, and we can choose the visual features that we are interested in.
 
To start with the Computer Vision API, we must have a valid subscription key. If we have an Azure subscription, we can log in and generate a subscription key. We can also get a trial subscription key from here.
 
If we have an Azure subscription and want to generate a subscription key, we first need to log in to the Azure Portal and create an Azure resource for the Computer Vision API.
 
Click on "Create a resource", then select "AI + Machine Learning", and select "Computer Vision".
 
 
When we click on "Computer Vision", it opens the following screen, where we need to fill in some basic information about the new Computer Vision API resource.
 
  • Name: The name of the Vision API resource.
  • Subscription: The Azure subscription that we are using for the Computer Vision API.
  • Location: The location of the resource. We should select the location nearest to our customers to reduce response latency.
  • Pricing tier: We need to select a pricing tier based on our requirements.
  • Resource Group: We can create a new resource group or select an existing one.
When we click the "Create" button, the Vision API resource is deployed, and we can access it from the Azure Portal dashboard.
 
 
We can view the subscription details and the endpoint on the "Overview" tab, and the keys on the "Keys" tab, where we also have an option to regenerate them.
 
 
The endpoint is based on the location that we chose when creating the resource. If we are using a free trial subscription, the default endpoint is in West US. In the endpoint, we also need to specify the API version that we want to use.
 
That completes the configuration of our Computer Vision API in the Azure Portal. The next step is to consume the analysis service from our C# code and analyze an image.
 
I am using the following endpoint to demonstrate the example.
 
Request URL
  https://westus2.api.cognitive.microsoft.com/vision/v2.0/analyze?[visualFeatures][&details][&language]

The image must meet the following prerequisites:
  • Image must be in the format: JPEG, PNG, GIF, BMP
  • Image size must be less than 4MB
  • Image dimensions should be at least 50x50
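
The prerequisites above can be checked on the client before any request is sent. The following is only a minimal sketch: the class name ImagePrerequisites and its methods are hypothetical, the file extension is used as a rough stand-in for the real image format, and the 50x50 dimension rule is not checked at all.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of the Computer Vision API or the article's source code.
public static class ImagePrerequisites
{
    // Limits taken from the prerequisites listed in this article.
    private const long MaxSizeInBytes = 4L * 1024 * 1024;
    private static readonly string[] AllowedExtensions =
        { ".jpg", ".jpeg", ".png", ".gif", ".bmp" };

    // Returns null when the checks pass, otherwise a short reason.
    // Assumption: the extension reflects the actual image format.
    public static string Check(string fileName, long sizeInBytes)
    {
        string extension = Path.GetExtension(fileName).ToLowerInvariant();
        if (!AllowedExtensions.Contains(extension))
        {
            return "Unsupported image format: " + extension;
        }
        if (sizeInBytes >= MaxSizeInBytes)
        {
            return "Image size must be less than 4 MB.";
        }
        return null;
    }

    // Convenience overload that reads the size from a file on disk.
    public static string CheckFile(string path)
    {
        return Check(path, new FileInfo(path).Length);
    }
}
```

Calling ImagePrerequisites.CheckFile(imageFilePath) before the request gives an early, readable error instead of a status code 400 from the service.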
Request Parameters
 
This service accepts three optional parameters.
 
VisualFeatures
 
It is a string parameter that indicates which visual features to analyze. The service returns results only for the selected features. We can pass multiple values to this parameter as a comma-separated list.
 
The following are valid feature types:
  • Adult: It is used to detect whether an image is pornographic or contains sexually suggestive content.
  • Brands: It is used to detect various brands within an image. It is available in English only.
  • Categories: It categorizes the image content according to a taxonomy defined in the documentation.
  • Color: It determines the colors within the image, such as the accent color.
  • Description: It is used to describe the image in a complete sentence in one of the supported languages.
  • Faces: It is used to detect faces within an image. It also returns coordinates, gender, and age if faces are detected.
  • ImageType: It is used to detect whether an image is clip art or a line drawing.
  • Objects: It is used to detect various objects within an image. The object names are available in English only.
  • Tags: It returns the tags related to the uploaded image.
Details 
 
It is a string parameter that requests domain-specific details. We can pass multiple values to this parameter as a comma-separated list.
 
The following are valid detail types:
  • Celebrities: It is used to detect celebrities within the image.
  • Landmarks: It is used to detect landmarks within the image.
Language 
 
It is a string parameter that indicates the language in which the results are returned. The default value of this parameter is "en". The following languages are supported:
  • en - English, Default
  • es - Spanish
  • ja - Japanese
  • pt - Portuguese
  • zh - Simplified Chinese 
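
To make the three optional parameters concrete, here is a small sketch that assembles the request URI from them. The class AnalyzeUriBuilder and its method names are hypothetical, and it assumes the endpoint is passed in the form shown under "Request URL", ending with the version segment.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of any SDK.
public static class AnalyzeUriBuilder
{
    // Builds "<endPoint>/analyze?..." from the three optional parameters.
    // Null or empty arguments are left out of the query string.
    public static string Build(
        string endPoint,
        IEnumerable<string> visualFeatures = null,
        IEnumerable<string> details = null,
        string language = null)
    {
        var query = new List<string>();
        AddList(query, "visualFeatures", visualFeatures);
        AddList(query, "details", details);
        if (!string.IsNullOrEmpty(language))
        {
            query.Add("language=" + Uri.EscapeDataString(language));
        }

        string uri = endPoint.TrimEnd('/') + "/analyze";
        return query.Count == 0 ? uri : uri + "?" + string.Join("&", query);
    }

    private static void AddList(List<string> query, string name, IEnumerable<string> values)
    {
        if (values == null) return;

        // Escape each value, then join the values with commas.
        string joined = string.Join(",", values.Select(v => Uri.EscapeDataString(v)));
        if (joined.Length > 0)
        {
            query.Add(name + "=" + joined);
        }
    }
}
```

For example, Build("https://westus2.api.cognitive.microsoft.com/vision/v2.0/", new[] { "Categories", "Tags" }, new[] { "Landmarks" }, "en") returns https://westus2.api.cognitive.microsoft.com/vision/v2.0/analyze?visualFeatures=Categories,Tags&details=Landmarks&language=en.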
Request headers
 
We must set the following headers when calling the service.
  • Content-Type
    The media type of the body sent to the API. There are three content types to choose from. If we are sending an image URL, the content type should be "application/json". If we are sending the byte data of the image, the content type should be "application/octet-stream" or "multipart/form-data".

  • Ocp-Apim-Subscription-Key
    Subscription key that provides access to the Computer Vision API.
Request body
 
The request body is either the raw image binary or an image URL. The request must be sent using the POST method.
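
The example later in this article sends raw image bytes, so here is a hedged sketch of the other option, sending an image URL as "application/json". It assumes the body is a JSON object with a single "url" property, which is my understanding of the service documentation and is not stated in this article. The class AnalyzeByUrl is hypothetical, and System.Text.Json (available in .NET Core 3.0 and later) is used only to keep the sketch free of extra packages.

```csharp
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not taken from the article's source code.
public static class AnalyzeByUrl
{
    // Assumption: the service expects a body of the form {"url":"<image URL>"}.
    public static StringContent CreateBody(string imageUrl)
    {
        string json = JsonSerializer.Serialize(new { url = imageUrl });
        return new StringContent(json, Encoding.UTF8, "application/json");
    }

    // Sends the POST request and returns the raw response body as a string.
    public static async Task<string> PostAsync(
        HttpClient client, string requestUri, string subscriptionKey, string imageUrl)
    {
        using (var request = new HttpRequestMessage(HttpMethod.Post, requestUri))
        {
            request.Headers.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
            request.Content = CreateBody(imageUrl);

            using (HttpResponseMessage response = await client.SendAsync(request))
            {
                return await response.Content.ReadAsStringAsync();
            }
        }
    }
}
```

Setting the subscription key on the request rather than on the client's default headers lets one HttpClient instance be shared across calls.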
 
Response
 
The API returns status code 200 when the request executes successfully, along with JSON that contains information about the image. Status code 400 means there is a problem with our request. The possible errors are an invalid image URL, an invalid image format, an image size that is too large, an unsupported format, an unsupported language, and a bad argument.
 
Status code 415 indicates that an unsupported media type was provided with the request. Status code 500 indicates an internal server error; the possible reasons are that the server failed to process the request, a timeout, or an internal server error.
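
The status codes described above could be turned into readable messages with something like the following sketch. The class AnalyzeStatus and the message texts are hypothetical; only the four status codes and their meanings come from this article.

```csharp
using System.Net;

// Hypothetical helper for illustration.
public static class AnalyzeStatus
{
    // Maps the status codes described in this article to short messages.
    public static string Describe(HttpStatusCode statusCode)
    {
        switch ((int)statusCode)
        {
            case 200:
                return "Success: the response body contains the analysis JSON.";
            case 400:
                return "Bad request: check the image URL, format, size, language, and arguments.";
            case 415:
                return "Unsupported media type: check the Content-Type header.";
            case 500:
                return "Internal server error: the server failed to process the request or timed out.";
            default:
                return "Unexpected status code: " + (int)statusCode;
        }
    }
}
```

A caller could log AnalyzeStatus.Describe(response.StatusCode) whenever the status code is not 200.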
 
Example
 
In the following example code, I call the Computer Vision API using HttpClient and pass the image bytes in the request body. If the API returns a successful status code, I deserialize the response into an appropriate model and return it.
    using System;
    using System.IO;
    using System.Net;
    using System.Net.Http;
    using System.Net.Http.Headers;
    using System.Threading.Tasks;
    using Newtonsoft.Json;

    public class AnalyzeImageService
    {
        // AnalyzeObjectModel is the response model; its definition is not shown in this article.
        public async Task<AnalyzeObjectModel> MakeRequest(string imageFilePath, string subscriptionKey, string endPoint)
        {
            AnalyzeObjectModel responseData = new AnalyzeObjectModel();
            try
            {
                // Dispose the client once the call has completed.
                using (HttpClient client = new HttpClient())
                {
                    // Request headers.
                    client.DefaultRequestHeaders.Add(
                        "Ocp-Apim-Subscription-Key", subscriptionKey);

                    // Request parameters.
                    string requestParameters = "visualFeatures=Categories,Description,Objects,Tags";

                    // Assemble the URI for the REST API call.
                    string uri = endPoint + "analyze" + "?" + requestParameters;

                    HttpResponseMessage response;

                    // Request body. Posts a locally stored JPEG image.
                    byte[] byteData = GetImageAsByteArray(imageFilePath);

                    using (ByteArrayContent content = new ByteArrayContent(byteData))
                    {
                        // This example uses the content type "application/octet-stream".
                        // The API also accepts "application/json" (for an image URL)
                        // and "multipart/form-data".
                        content.Headers.ContentType =
                            new MediaTypeHeaderValue("application/octet-stream");

                        // Make the REST API call and wait for the response.
                        response = await client.PostAsync(uri, content);
                    }

                    // Read the JSON response.
                    string result = await response.Content.ReadAsStringAsync();

                    // Deserialize only when the call succeeded with 200 OK.
                    if (response.StatusCode == HttpStatusCode.OK)
                    {
                        responseData = JsonConvert.DeserializeObject<AnalyzeObjectModel>(result);
                    }
                }
            }
            catch (Exception e)
            {
                Console.WriteLine("\n" + e.Message);
            }
            return responseData;
        }

        internal static byte[] GetImageAsByteArray(string imageFilePath)
        {
            // Read the whole image file into memory.
            return File.ReadAllBytes(imageFilePath);
        }
    }
JSON
    {
      "categories": [
        {
          "name": "people_many",
          "score": 0.92578125
        }
      ],
      "tags": [
        {
          "name": "person",
          "confidence": 0.9990049004554749
        },
        {
          "name": "indoor",
          "confidence": 0.9931796789169312
        },
        {
          "name": "clothing",
          "confidence": 0.9565780162811279
        },
        {
          "name": "computer",
          "confidence": 0.9460503458976746
        },
        {
          "name": "floor",
          "confidence": 0.9100943207740784
        },
        {
          "name": "laptop",
          "confidence": 0.8889466524124146
        },
        {
          "name": "furniture",
          "confidence": 0.7828801870346069
        },
        {
          "name": "chair",
          "confidence": 0.7779731750488281
        },
        {
          "name": "table",
          "confidence": 0.7184659242630005
        },
        {
          "name": "working",
          "confidence": 0.7051602005958557
        },
        {
          "name": "man",
          "confidence": 0.6698439121246338
        },
        {
          "name": "people",
          "confidence": 0.6486355662345886
        },
        {
          "name": "desk",
          "confidence": 0.611915647983551
        },
        {
          "name": "office building",
          "confidence": 0.5795468688011169
        },
        {
          "name": "whiteboard",
          "confidence": 0.5002871155738831
        }
      ],
      "description": {
        "tags": [
          "person",
          "indoor",
          "laptop",
          "man",
          "table",
          "people",
          "sitting",
          "computer",
          "room",
          "small",
          "group",
          "woman",
          "child",
          "front",
          "young",
          "using",
          "kitchen",
          "boy",
          "desk",
          "living",
          "doing",
          "board",
          "holding",
          "standing",
          "riding"
        ],
        "captions": [
          {
            "text": "a group of people sitting at a table using a laptop",
            "confidence": 0.8144636347294436
          }
        ]
      },
      "objects": [
        {
          "rectangle": {
            "x": 1120,
            "y": 453,
            "w": 154,
            "h": 186
          },
          "object": "person",
          "confidence": 0.54
        },
        {
          "rectangle": {
            "x": 1386,
            "y": 439,
            "w": 155,
            "h": 173
          },
          "object": "person",
          "confidence": 0.524
        },
        {
          "rectangle": {
            "x": 365,
            "y": 349,
            "w": 347,
            "h": 514
          },
          "object": "person",
          "confidence": 0.713
        },
        {
          "rectangle": {
            "x": 1198,
            "y": 485,
            "w": 297,
            "h": 279
          },
          "object": "person",
          "confidence": 0.609
        },
        {
          "rectangle": {
            "x": 1807,
            "y": 439,
            "w": 376,
            "h": 378
          },
          "object": "person",
          "confidence": 0.546
        },
        {
          "rectangle": {
            "x": 1381,
            "y": 512,
            "w": 390,
            "h": 602
          },
          "object": "person",
          "confidence": 0.638
        },
        {
          "rectangle": {
            "x": 1873,
            "y": 435,
            "w": 561,
            "h": 672
          },
          "object": "person",
          "confidence": 0.801
        },
        {
          "rectangle": {
            "x": 816,
            "y": 574,
            "w": 532,
            "h": 590
          },
          "object": "person",
          "confidence": 0.525
        }
      ],
      "requestId": "e1963238-cb4e-4ab1-924a-84b4545f9101",
      "metadata": {
        "width": 2442,
        "height": 1166,
        "format": "Jpeg"
      }
    }
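
The AnalyzeObjectModel class used in the example is not shown in this article, so the following is only a guess at what a matching model might look like, derived from the JSON response above. All class names here (AnalyzeResult and the others) are hypothetical. The parser uses System.Text.Json instead of the JsonConvert call from the example so that the sketch needs no extra package; the same classes should also work with JsonConvert.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical model classes derived from the sample JSON response.
public class Category { public string Name { get; set; } public double Score { get; set; } }
public class Tag { public string Name { get; set; } public double Confidence { get; set; } }
public class Caption { public string Text { get; set; } public double Confidence { get; set; } }

public class Description
{
    public List<string> Tags { get; set; }
    public List<Caption> Captions { get; set; }
}

public class Rectangle
{
    public int X { get; set; }
    public int Y { get; set; }
    public int W { get; set; }
    public int H { get; set; }
}

public class DetectedObject
{
    public Rectangle Rectangle { get; set; }
    public string Object { get; set; }
    public double Confidence { get; set; }
}

public class Metadata
{
    public int Width { get; set; }
    public int Height { get; set; }
    public string Format { get; set; }
}

public class AnalyzeResult
{
    public List<Category> Categories { get; set; }
    public List<Tag> Tags { get; set; }
    public Description Description { get; set; }
    public List<DetectedObject> Objects { get; set; }
    public string RequestId { get; set; }
    public Metadata Metadata { get; set; }
}

public static class AnalyzeResultParser
{
    // The JSON uses camelCase names, so matching is made case-insensitive.
    public static AnalyzeResult Parse(string json)
    {
        var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };
        return JsonSerializer.Deserialize<AnalyzeResult>(json, options);
    }
}
```

With a model like this, the caption of the sample response would be available as result.Description.Captions[0].Text, and each detected person as an entry in result.Objects.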
 

Summary

 
The Computer Vision API analyzes an image and returns information about that image. It provides features such as categorizing the image content, determining the accent color, detecting faces within the image, and detecting objects and tags. This article should be useful for beginners to the Azure Cognitive Services APIs. Azure provides this kind of service at a low cost. You can check the pricing here.
 
You can view and download source code from GitHub.