Extracting Text From An Image Using Azure Cognitive Services

In this article, we will learn how to read, or extract, text from an image, whether the text is handwritten or printed.

Reading text from an image involves two things: Computer Vision and NLP (Natural Language Processing). Computer Vision detects and reads the text, and NLP is then used to make sense of the identified text. This article focuses specifically on the text extraction part.

How Computer Vision Performs Text Extraction

To perform text extraction, Computer Vision provides two APIs:

  • OCR API
  • Read API

The OCR API supports many languages and is well suited to images that contain relatively small amounts of text. If an image contains a lot of text (a text-dominated image), the Read API is the better option.
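The two APIs are also called at different REST endpoints. The sketch below builds the request URL for each one. It assumes the Computer Vision v3.2 paths /vision/v3.2/ocr and /vision/v3.2/read/analyze, and BuildRequestUri is a hypothetical helper name, so adjust the version and endpoint to match your own resource.

```csharp
using System;

// Hypothetical helper: builds the request URL for either API.
// Assumes the Computer Vision v3.2 REST paths; change the version to match your resource.
static string BuildRequestUri(string resourceEndpoint, bool useReadApi)
{
    string baseUri = resourceEndpoint.TrimEnd('/');
    return useReadApi
        ? baseUri + "/vision/v3.2/read/analyze"                              // Read: replies with an Operation-Location to poll
        : baseUri + "/vision/v3.2/ocr?language=unk&detectOrientation=true";  // OCR: recognized text comes back directly
}

string endpoint = "https://<your-resource>.cognitiveservices.azure.com/";
Console.WriteLine(BuildRequestUri(endpoint, useReadApi: false));
Console.WriteLine(BuildRequestUri(endpoint, useReadApi: true));
```

The query string on the OCR URL mirrors the requestParameters used in the OCR sample code; the Read request in the sample is sent without parameters.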

The OCR API returns its results as Regions, Lines, and Words. A region is an area of the image that contains text, so the output hierarchy is: regions, the lines of text in each region, and the words in each line.
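To see the Regions >> Lines >> Words hierarchy in practice, here is a minimal sketch that walks an OCR API JSON response and rebuilds the text line by line. It uses System.Text.Json rather than the Newtonsoft JToken used in the samples, GetOcrLines is a hypothetical name, and the sample JSON is a made-up, trimmed-down response.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper: walks an OCR API response (regions -> lines -> words)
// and returns one string per OCR line. Assumes the response shape shown in sampleJson.
static List<string> GetOcrLines(string json)
{
    var result = new List<string>();
    using JsonDocument doc = JsonDocument.Parse(json);
    foreach (JsonElement region in doc.RootElement.GetProperty("regions").EnumerateArray())
    {
        foreach (JsonElement line in region.GetProperty("lines").EnumerateArray())
        {
            // An OCR line only lists its words, so join the word texts with spaces.
            var words = new List<string>();
            foreach (JsonElement word in line.GetProperty("words").EnumerateArray())
                words.Add(word.GetProperty("text").GetString() ?? "");
            result.Add(string.Join(" ", words));
        }
    }
    return result;
}

// Made-up, trimmed-down OCR response for illustration.
string sampleJson = @"{
  ""language"": ""en"",
  ""orientation"": ""Up"",
  ""regions"": [
    { ""boundingBox"": ""10,10,200,60"",
      ""lines"": [
        { ""boundingBox"": ""10,10,200,25"",
          ""words"": [ { ""boundingBox"": ""10,10,60,25"", ""text"": ""Hello"" },
                       { ""boundingBox"": ""70,10,200,25"", ""text"": ""world"" } ] },
        { ""boundingBox"": ""10,40,120,60"",
          ""words"": [ { ""boundingBox"": ""10,40,120,60"", ""text"": ""Azure"" } ] } ] } ]
}";

foreach (string textLine in GetOcrLines(sampleJson))
    Console.WriteLine(textLine);
```

Because OCR lines carry no text of their own, the words have to be joined back together; the bounding boxes (left, top, width, height) can be used if you need the layout as well.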

The Read API works very well with images that are heavily loaded with text; the best example of a text-dominated image is a scanned or printed document. Here the output hierarchy is Pages, Lines, and Words. Because this API deals with a large number of lines and words, it works asynchronously: the first request returns an Operation-Location URL, and the application polls that URL for the result instead of being blocked until the whole document is read. The OCR API, in contrast, works synchronously and returns the recognized text directly in its response.
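The Read API result is nested differently. Once the operation has finished, the JSON returned from the Operation-Location URL holds one entry per page, each with its lines and words. The sketch below assumes the v3.x response shape (analyzeResult.readResults); GetReadLines is a hypothetical helper, and the sample JSON is made up and trimmed.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper: walks a Read API result (pages -> lines -> words).
// Assumes the v3.x shape where analyzeResult.readResults holds one entry per page.
static List<(int Page, string Text, int WordCount)> GetReadLines(string json)
{
    var result = new List<(int Page, string Text, int WordCount)>();
    using JsonDocument doc = JsonDocument.Parse(json);
    JsonElement root = doc.RootElement;

    // Only read analyzeResult once the operation has succeeded.
    if (root.GetProperty("status").GetString() != "succeeded")
        return result;

    foreach (JsonElement page in root.GetProperty("analyzeResult").GetProperty("readResults").EnumerateArray())
    {
        int pageNumber = page.GetProperty("page").GetInt32();
        foreach (JsonElement line in page.GetProperty("lines").EnumerateArray())
        {
            // Unlike OCR lines, each Read line already carries its full text.
            string text = line.GetProperty("text").GetString() ?? "";
            int wordCount = line.GetProperty("words").GetArrayLength();
            result.Add((pageNumber, text, wordCount));
        }
    }
    return result;
}

// Made-up, trimmed-down Read result for illustration (two pages, one line each).
string sampleJson = @"{
  ""status"": ""succeeded"",
  ""analyzeResult"": {
    ""version"": ""3.2.0"",
    ""readResults"": [
      { ""page"": 1, ""unit"": ""pixel"",
        ""lines"": [ { ""text"": ""Invoice 42"", ""boundingBox"": [10,10,150,10,150,30,10,30],
                       ""words"": [ { ""text"": ""Invoice"", ""confidence"": 0.99 },
                                    { ""text"": ""42"", ""confidence"": 0.98 } ] } ] },
      { ""page"": 2, ""unit"": ""pixel"",
        ""lines"": [ { ""text"": ""Thank you"", ""boundingBox"": [10,10,120,10,120,30,10,30],
                       ""words"": [ { ""text"": ""Thank"", ""confidence"": 0.97 },
                                    { ""text"": ""you"", ""confidence"": 0.99 } ] } ] } ]
  }
}";

foreach (var (pageNo, lineText, words) in GetReadLines(sampleJson))
    Console.WriteLine($"Page {pageNo}: {lineText} ({words} words)");
```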

The following table summarizes when to use which API:

|                  | OCR API                          | Read API                                       |
|------------------|----------------------------------|------------------------------------------------|
| Best suited for  | Relatively small amounts of text | Text-dominated images, e.g. scanned documents |
| Output hierarchy | Regions >> Lines >> Words        | Pages >> Lines >> Words                        |
| Execution        | Synchronous                      | Asynchronous                                   |

Sample Code for OCR API

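using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;
using Newtonsoft.Json.Linq;

// subscriptionKey and uriBase are assumed to be static fields holding your Computer Vision
// key and the OCR endpoint, e.g. https://<your-endpoint>/vision/v3.2/ocr
static async Task ExtractTextUsingOCR(string imageFilePath)
{
    try
    {
        HttpClient client = new HttpClient();

        // Request headers.
        client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", subscriptionKey);

        // language=unk asks the service to detect the language;
        // detectOrientation=true lets it detect and correct the text orientation.
        string requestParameters = "language=unk&detectOrientation=true";
        string uri = uriBase + "?" + requestParameters;
        HttpResponseMessage response;

        // Read the image into a byte array.
        byte[] dataBytes = GetByteArrayOfImage(imageFilePath);

        // Add the byte array as an octet stream to the request body.
        using (ByteArrayContent content = new ByteArrayContent(dataBytes))
        {
            content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");

            // Call the API; this response already contains the recognized text.
            response = await client.PostAsync(uri, content);
        }

        string contentString = await response.Content.ReadAsStringAsync();

        // Display the JSON output (Regions >> Lines >> Words).
        Console.WriteLine("\nResponse:\n\n{0}\n", JToken.Parse(contentString).ToString());
    }
    catch (Exception e)
    {
        Console.WriteLine("\n" + e.Message);
    }
}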
    
static byte[] GetByteArrayOfImage(string imageFilePath)
{
    // Read the whole image file into a byte array;
    // File.ReadAllBytes opens, reads and closes the file.
    return File.ReadAllBytes(imageFilePath);
}

Sample Code for Read API

// Needs the System, System.Linq, System.Net.Http, System.Net.Http.Headers,
// System.Threading.Tasks and Newtonsoft.Json.Linq namespaces. uriBase is assumed here
// to be the Read endpoint, e.g. https://<your-endpoint>/vision/v3.2/read/analyze
static async Task ExtractTextUsingReadAPI(string imageFilePath)
{
    try
    {
        HttpResponseMessage response;
        HttpClient client = new HttpClient();

        // Request headers.
        client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
        string url = uriBase;

        // operationLocation stores the URI of the second REST API method,
        // returned by the first REST API method.
        string operationLocation;

        byte[] dataBytes = GetByteArrayOfImage(imageFilePath);
        using (ByteArrayContent content = new ByteArrayContent(dataBytes))
        {
            content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");

            // First method: submit the image; the service replies immediately (202 Accepted).
            response = await client.PostAsync(url, content);
        }

        if (response.IsSuccessStatusCode)
            operationLocation = response.Headers.GetValues("Operation-Location").FirstOrDefault();
        else
        {
            string errorContent = await response.Content.ReadAsStringAsync();
            Console.WriteLine("\n\nResponse:\n{0}\n", JToken.Parse(errorContent).ToString());
            return;
        }

        // Poll the second method once per second, for at most 30 attempts,
        // until the operation reports "succeeded" or "failed".
        string contentString;
        string status;
        int i = 0;
        do
        {
            // Wait without blocking the thread (Thread.Sleep would block it).
            await Task.Delay(1000);

            // Second method: fetch the result from the Operation-Location URL.
            response = await client.GetAsync(operationLocation);
            contentString = await response.Content.ReadAsStringAsync();
            status = (string)JToken.Parse(contentString)["status"];
            ++i;
        }
        while (i < 30 && status != "succeeded" && status != "failed");

        if (status != "succeeded")
        {
            // Either the service reported "failed" or all 30 attempts ran out first.
            Console.WriteLine("\nRead operation did not succeed (status: {0}).\n", status);
            return;
        }

        // Display output content.
        Console.WriteLine("\nResponse:\n\n{0}\n", JToken.Parse(contentString).ToString());
    }
    catch (Exception e)
    {
        Console.WriteLine("\n" + e.Message);
    }
}
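The polling loop in the Read sample can also be factored into a small reusable helper. This is a sketch under two assumptions: the operation status is one of notStarted, running, succeeded or failed, and fetchResultAsync stands in for the GET on the Operation-Location URL, which lets the loop be exercised offline against a simulated service. PollReadResultAsync and FakeFetch are hypothetical names.

```csharp
using System;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical helper: polls until the Read operation reaches a final status.
// fetchResultAsync stands in for "GET the Operation-Location URL and return the body".
// Returns the final JSON body, or null if maxAttempts run out first (a timeout).
static async Task<string?> PollReadResultAsync(
    Func<Task<string>> fetchResultAsync, int maxAttempts, TimeSpan delay)
{
    for (int attempt = 0; attempt < maxAttempts; attempt++)
    {
        await Task.Delay(delay);                 // wait without blocking a thread
        string body = await fetchResultAsync();
        using JsonDocument doc = JsonDocument.Parse(body);
        string? status = doc.RootElement.GetProperty("status").GetString();

        // "notStarted" and "running" mean keep polling; these two are final.
        if (status == "succeeded" || status == "failed")
            return body;
    }
    return null;
}

// Simulated service: reports "running" twice, then "succeeded".
int calls = 0;
Task<string> FakeFetch()
{
    calls++;
    string status = calls < 3 ? "running" : "succeeded";
    return Task.FromResult($"{{\"status\": \"{status}\"}}");
}

string? finalJson = await PollReadResultAsync(FakeFetch, maxAttempts: 30, delay: TimeSpan.FromMilliseconds(10));
Console.WriteLine(finalJson ?? "Timeout happened.");
```

Treating "failed" as a final status means an unreadable image ends the loop immediately instead of consuming all remaining attempts, and the caller can tell a failure (body with "failed") from a timeout (null).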

Do watch the attached video for a demo and code walk-through.

Happy learning!