Automatically Extract Text And Data With Amazon Textract

Amazon Textract is a machine learning service that automatically extracts text and data from virtually any document.

Amazon recently announced the general availability of Amazon Textract, a fully managed service that uses machine learning to automatically extract text and data, including from tables and forms, in virtually any document.
According to the company, Textract can extract data without manual review, custom code, or machine learning experience. It goes beyond simple optical character recognition (OCR) to identify the contents of form fields, the information stored in tables, and the context in which data appears: for example, a name or social security number on a tax form, or a product SKU and quantity in a warehouse inventory report.
Source: Amazon 
The company said that Textract’s API supports a range of document inputs, such as scans, PDFs, and photos. You can use it alongside database and analytics services such as Amazon Elasticsearch Service and Amazon DynamoDB, and with other machine learning services such as Amazon Comprehend.
Textract reads scanned files stored in an Amazon S3 bucket and returns the data as JSON text annotated with the page number, section, form labels, and data types. You can load the data into business software, such as spreadsheets and payroll systems, or you can analyze and query the data.
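To make the S3-to-JSON flow concrete, here is a minimal Python sketch, not an official sample. It assumes the boto3 SDK and its detect_document_text call, a response whose "Blocks" entries carry "BlockType", "Text", and "Page" fields, and valid AWS credentials. The helper names, the sample response, and the default region are illustrative assumptions, not part of the announcement.

```python
from collections import defaultdict


def lines_by_page(response):
    """Group the text of LINE blocks in a Textract-style response by page."""
    pages = defaultdict(list)
    for block in response.get("Blocks", []):
        if block.get("BlockType") == "LINE":
            # "Page" may be absent for single-page input, so default to 1.
            pages[block.get("Page", 1)].append(block.get("Text", ""))
    return dict(pages)


def detect_text_in_s3(bucket, key, region="us-east-1"):
    """Ask Textract to read a document stored in S3 (needs AWS credentials).

    The bucket, key, and region arguments are placeholders for your own values.
    """
    import boto3  # imported here so lines_by_page works without the AWS SDK

    client = boto3.client("textract", region_name=region)
    return client.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )


# Hand-written sample (an assumption) mimicking the shape of a real response.
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE", "Page": 1},
        {"BlockType": "LINE", "Page": 1, "Text": "Name: Jane Doe"},
        {"BlockType": "WORD", "Page": 1, "Text": "Name:"},
        {"BlockType": "LINE", "Page": 2, "Text": "Quantity: 12"},
    ]
}

print(lines_by_page(sample_response))
# → {1: ['Name: Jane Doe'], 2: ['Quantity: 12']}
```

With real credentials, lines_by_page(detect_text_in_s3("my-bucket", "scan.png")) would run the same grouping on a live response; the bucket and file names here are made up.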
Amazon Textract is currently available in the US East (Ohio), US East (N. Virginia), US West (Oregon), and EU (Ireland) regions.
