Microsoft AI School - Extracting Information from Images and Documents

Microsoft AI School offers a learning path that teaches you how to implement text extraction solutions for images and documents using the Computer Vision and Form Recognizer services. Follow the given link to dive into this learning path - "Extract text from images and documents."

Before diving into the course, there are a few prerequisites that you must fulfill:

  • You must have an active Azure account.
  • You should possess the ability to navigate the Azure portal.
  • Finally, you should have minimal programming knowledge of either Python or C#.

This learning path has two modules. Read ahead to get a quick peek at each module of this learning path.

Module 1: Read Text in Images and Documents with the Computer Vision Service

Organizations around the world deal with thousands of images that contain embedded text, and they need to extract that text and store it in a database. These scanned images can contain text in numerous formats and multiple languages, which makes manual extraction extremely challenging. AI services like Microsoft Azure's Computer Vision handle such scenarios with prebuilt models that process images and extract information from them.

In the first module of this learning path, you will learn how to:

  • Use the Azure Computer Vision service with SDKs and the REST API.
  • Identify how the Computer Vision service is used to read text from images with the help of the OCR API and the Read API.
  • Develop an application capable of reading handwritten and printed texts.

Computer Vision is an AI service in Azure that analyses image and video content. It offers two APIs to read text from images:

  1. The OCR API - reads small to medium volumes of text in multiple languages from images.
  2. The Read API - reads small to large volumes of text in multiple languages from images and PDF documents. It has higher accuracy than the OCR API.

In the first few units of this module, you will learn how to access both technologies through the REST API or a client library and work with the JSON response they return. You will then apply these concepts in a hands-on exercise.
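To make the REST flow more concrete, here is a minimal Python sketch of the two halves of a Read API call: building the request that submits an image, and pulling the text lines out of the JSON result. It assumes the v3.2 REST shape (the `/vision/v3.2/read/analyze` path, the `Ocp-Apim-Subscription-Key` header, and a result with `analyzeResult.readResults[].lines[].text`), so check it against the API version your resource uses. The resource name, key, image URL, and sample result are placeholders written for illustration, and the request is only built, never sent.

```python
import json
import urllib.request


def build_read_request(endpoint, key, image_url):
    """Build (but do not send) the POST that submits an image URL to the Read API.

    Assumes the v3.2 REST path; the service is expected to answer with
    202 Accepted and an Operation-Location header to poll for the result.
    """
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/vision/v3.2/read/analyze",
        data=body,
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_lines(read_result):
    """Return recognized text lines, page by page, from a finished Read result."""
    if read_result.get("status") != "succeeded":
        return []  # still running, failed, or not a Read result
    pages = read_result.get("analyzeResult", {}).get("readResults", [])
    return [line["text"] for page in pages for line in page.get("lines", [])]


# Hand-written sample in the assumed v3.2 shape, not real service output.
SAMPLE_RESULT = {
    "status": "succeeded",
    "analyzeResult": {
        "readResults": [
            {
                "page": 1,
                "lines": [
                    {"text": "Contoso Coffee", "boundingBox": [10, 10, 200, 10, 200, 40, 10, 40]},
                    {"text": "Total 3.50", "boundingBox": [10, 60, 150, 60, 150, 90, 10, 90]},
                ],
            }
        ]
    },
}

# Placeholder endpoint, key, and image URL (hypothetical values).
request = build_read_request(
    "https://example-resource.cognitiveservices.azure.com/",
    "<your-key>",
    "https://example.com/receipt.jpg",
)
print(request.get_method(), request.full_url)
print(extract_lines(SAMPLE_RESULT))
```

In a real application you would send the request, read the `Operation-Location` header from the response, and poll that URL until the status is no longer running before calling `extract_lines`.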

Here's an overview of the units covered in this module:

Module 2: Extract Data from Forms with Form Recognizer

Every industry uses forms to communicate valuable information. Imagine manually identifying and extracting huge volumes of information from thousands of forms and then re-entering the necessary information elsewhere. This would be an extremely tedious task without the help of AI. This is where the Azure Form Recognizer service comes into the picture.

Form Recognizer is a cognitive service in Azure that extracts data from forms at scale and with high accuracy, and it is accessed through REST APIs and client library SDKs. It uses Optical Character Recognition capabilities and deep learning models to extract key-value pairs and table data from documents. It is composed of the following services:

  • Layout Service: accepts JPEG, PNG, PDF, and TIFF files as inputs and returns a JSON file with text location in bounding boxes, tables, text content, selection marks, and document structure.
  • Prebuilt Models: detect and extract information from document images and return it as structured JSON output. Currently, Form Recognizer supports the following prebuilt models:
    • Receipts
    • Business Cards
    • Invoices
  • Custom Models: extract data from forms that are specific to your business needs. Custom Models can be trained by calling the Train Custom Model API using:
    • Unsupervised Learning 
    • Supervised Learning
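As a rough illustration of how the two training modes differ at the API level, the sketch below builds the JSON body for a Train Custom Model call. It assumes the v2.1 REST request shape, in which a `useLabelFile` flag selects between training with labels (supervised) and without labels (unsupervised). The helper name `build_train_body` and the SAS URL placeholder are hypothetical, and the body is only constructed, not sent.

```python
import json


def build_train_body(container_sas_url, labeled, prefix=""):
    """Build the JSON body for a Train Custom Model request (assumed v2.1 shape).

    labeled=True  -> supervised training, using label files stored with the forms
    labeled=False -> unsupervised training, using the sample forms only
    """
    return {
        "source": container_sas_url,  # SAS URL of the blob container holding the training forms
        "sourceFilter": {"prefix": prefix, "includeSubFolders": False},
        "useLabelFile": labeled,
    }


# Placeholder SAS URL (hypothetical); replace with your own container's URL.
unsupervised_body = build_train_body("<container-SAS-URL>", labeled=False)
supervised_body = build_train_body("<container-SAS-URL>", labeled=True)

print(json.dumps(supervised_body, indent=2))
```

Under the same v2.1 assumption, this body would be posted to the `/formrecognizer/v2.1/custom/models` path on your resource's endpoint, with the subscription key in the `Ocp-Apim-Subscription-Key` header.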

The Form Recognizer service has the following uses:

  • Process automation
  • Knowledge mining
  • Industry-specific applications

Note: To integrate Form Recognizer into your applications or workflows, you can access it through the REST API or the client library SDKs. Form Recognizer is also available through a user interface known as the Form OCR Test Tool (FOTT), which can perform layout extraction and model training.
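Whichever access route you choose, the application ultimately has to unpack the JSON that the service returns. The following Python sketch shows one way to rebuild a table and collect key-value fields from an analyze result. It assumes the v2.1 response layout (`pageResults[].tables[].cells[]` with `rowIndex` and `columnIndex`, and `documentResults[].fields`); the helper names and the sample data are written for illustration and are not real service output.

```python
def table_to_grid(table):
    """Rebuild a table as a list of rows from its flat list of cells."""
    grid = [["" for _ in range(table["columns"])] for _ in range(table["rows"])]
    for cell in table["cells"]:
        grid[cell["rowIndex"]][cell["columnIndex"]] = cell["text"]
    return grid


def fields_to_text(document, min_confidence=0.0):
    """Map field names to their text, skipping fields below min_confidence."""
    return {
        name: field["text"]
        for name, field in document["fields"].items()
        if field.get("confidence", 0.0) >= min_confidence
    }


# Hand-written sample in the assumed v2.1 shape, not real service output.
SAMPLE_ANALYZE_RESULT = {
    "pageResults": [
        {
            "page": 1,
            "tables": [
                {
                    "rows": 2,
                    "columns": 2,
                    "cells": [
                        {"rowIndex": 0, "columnIndex": 0, "text": "Item"},
                        {"rowIndex": 0, "columnIndex": 1, "text": "Price"},
                        {"rowIndex": 1, "columnIndex": 0, "text": "Coffee"},
                        {"rowIndex": 1, "columnIndex": 1, "text": "3.50"},
                    ],
                }
            ],
        }
    ],
    "documentResults": [
        {
            "docType": "prebuilt:receipt",
            "fields": {
                "MerchantName": {"type": "string", "text": "Contoso Coffee", "confidence": 0.98},
                "Total": {"type": "number", "text": "3.50", "confidence": 0.95},
                "Tip": {"type": "number", "text": "0.5O", "confidence": 0.41},
            },
        }
    ],
}

first_table = SAMPLE_ANALYZE_RESULT["pageResults"][0]["tables"][0]
receipt = SAMPLE_ANALYZE_RESULT["documentResults"][0]

print(table_to_grid(first_table))
print(fields_to_text(receipt, min_confidence=0.8))
```

Filtering on the confidence score is a design choice rather than a requirement: it lets an application route low-confidence fields, such as the misread tip in the sample, to a person for review instead of storing them directly.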

Thus, the objective of this module is to teach you how to:

  • Automate processes with Form Recognizer's Layout service, prebuilt models, and custom models.
  • Use the OCR capabilities of Form Recognizer with SDKs, REST API, and Form OCR Test Tool (FOTT).
  • Use supervised and unsupervised training to develop and test custom models.

Here's an overview of the units covered in this module:


If you or your organization has large amounts of data that need to be extracted from images and other documents, Microsoft Azure's Computer Vision and Form Recognizer services can help you automate the task. To master the concepts of data extraction from images and documents in Azure, please follow the given link:

Happy AI coding!