Amazon Textract

Learn about Amazon Textract and its pricing model

Amazon (AWS) Textract is a machine learning (ML) service that uses OCR to automatically extract text, handwriting, and data from scanned documents such as PDFs.

Major Features

Optical Character Recognition (OCR)

Amazon Textract uses Optical Character Recognition (OCR) technology to automatically detect printed text, handwriting, and numbers in a scan or rendering of a document, such as a legal document or a scan of a book. 

OCR Workflow
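
To make the OCR step concrete, here is a minimal Python sketch of reading detected lines out of a text-detection response. It assumes the boto3 SDK, where the synchronous call is `detect_document_text` and the response carries a flat `Blocks` list. `detect_lines` and `FakeTextractClient` are hypothetical names used only for illustration, and the fake client returns made-up data so the sketch runs without AWS credentials.

```python
from typing import Any, Dict, List


def detect_lines(client: Any, image_bytes: bytes) -> List[str]:
    """Run OCR on one image and return the detected lines of text."""
    # detect_document_text is the synchronous OCR call on a Textract client.
    response: Dict[str, Any] = client.detect_document_text(
        Document={"Bytes": image_bytes}
    )
    # The response is a flat list of blocks; each LINE block holds one line.
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


class FakeTextractClient:
    """Hypothetical stand-in that mimics the response shape (illustrative data)."""

    def detect_document_text(self, Document: Dict[str, Any]) -> Dict[str, Any]:
        return {
            "Blocks": [
                {"BlockType": "PAGE", "Id": "p1"},
                {"BlockType": "LINE", "Id": "l1", "Text": "Patient name: Jane Doe"},
                {"BlockType": "WORD", "Id": "w1", "Text": "Patient"},
                {"BlockType": "LINE", "Id": "l2", "Text": "Date: 2021-03-04"},
            ]
        }


lines = detect_lines(FakeTextractClient(), b"not-a-real-image")
print(lines)
```

Against the real service, the fake client would be replaced by `boto3.client("textract")`, with valid AWS credentials and a supported image passed as bytes.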

Form Extraction

Amazon Textract enables you to detect key-value pairs in document images automatically so that you can retain the inherent context of the document without any manual intervention.

Form Extraction Workflow
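
As a rough illustration of how key-value pairs come back, the Python sketch below walks a form-analysis response of the kind returned by boto3's `analyze_document` with `FeatureTypes=["FORMS"]`. It assumes the documented block layout: `KEY_VALUE_SET` blocks tagged `KEY` or `VALUE`, linked by `VALUE` and `CHILD` relationships. The helper names and the sample response are hypothetical.

```python
from typing import Any, Dict, List

Block = Dict[str, Any]


def text_of(block: Block, blocks_by_id: Dict[str, Block]) -> str:
    """Join the WORD children of a block into a single string."""
    words: List[str] = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = blocks_by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
    return " ".join(words)


def key_value_pairs(response: Dict[str, Any]) -> Dict[str, str]:
    """Map each detected form key to the text of its linked value."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    pairs: Dict[str, str] = {}
    for block in response["Blocks"]:
        is_key = (
            block["BlockType"] == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        value_parts: List[str] = []
        for rel in block.get("Relationships", []):
            # A KEY block points at its VALUE block(s) through a VALUE relationship.
            if rel["Type"] == "VALUE":
                value_parts += [text_of(blocks_by_id[i], blocks_by_id) for i in rel["Ids"]]
        pairs[text_of(block, blocks_by_id)] = " ".join(value_parts)
    return pairs


# Hypothetical response fragment, trimmed to the fields the parser reads.
sample_form = {
    "Blocks": [
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                           {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w3", "w4"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Policy"},
        {"Id": "w2", "BlockType": "WORD", "Text": "number:"},
        {"Id": "w3", "BlockType": "WORD", "Text": "AB"},
        {"Id": "w4", "BlockType": "WORD", "Text": "1234"},
    ]
}

pairs = key_value_pairs(sample_form)
print(pairs)
```

Keeping the key and value linked through block IDs is what lets the extracted data retain the form's context rather than arriving as loose text.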

Table Extraction

Amazon Textract preserves the row and column structure of data stored in tables during extraction, so each extracted cell can be mapped back to its position in the table.

Table Extraction Workflow
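
The following Python sketch shows one way to rebuild a table as rows and columns from a table-analysis response, such as the one boto3's `analyze_document` returns with `FeatureTypes=["TABLES"]`. It assumes the documented layout in which a `TABLE` block lists `CELL` children and each cell carries 1-based `RowIndex` and `ColumnIndex` values. The function names and sample data are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Block = Dict[str, Any]


def cell_text(cell: Block, blocks_by_id: Dict[str, Block]) -> str:
    """Join the WORD children of a cell into a single string."""
    words: List[str] = []
    for rel in cell.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = blocks_by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def table_rows(response: Dict[str, Any]) -> List[List[List[str]]]:
    """Return every table in the response as a list of rows of cell strings."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    tables: List[List[List[str]]] = []
    for block in response["Blocks"]:
        if block["BlockType"] != "TABLE":
            continue
        cells: Dict[Tuple[int, int], str] = {}
        for rel in block.get("Relationships", []):
            if rel["Type"] != "CHILD":
                continue
            for cell_id in rel["Ids"]:
                cell = blocks_by_id[cell_id]
                if cell["BlockType"] == "CELL":
                    position = (cell["RowIndex"], cell["ColumnIndex"])
                    cells[position] = cell_text(cell, blocks_by_id)
        n_rows = max((r for r, _ in cells), default=0)
        n_cols = max((c for _, c in cells), default=0)
        # Row and column indexes are 1-based; missing cells become empty strings.
        tables.append(
            [[cells.get((r, c), "") for c in range(1, n_cols + 1)]
             for r in range(1, n_rows + 1)]
        )
    return tables


def _cell(cell_id: str, row: int, col: int, word_id: str) -> Block:
    """Build a hypothetical CELL block for the sample below."""
    return {"Id": cell_id, "BlockType": "CELL", "RowIndex": row, "ColumnIndex": col,
            "Relationships": [{"Type": "CHILD", "Ids": [word_id]}]}


sample_table = {
    "Blocks": [
        {"Id": "t1", "BlockType": "TABLE",
         "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]}]},
        _cell("c1", 1, 1, "w1"), _cell("c2", 1, 2, "w2"),
        _cell("c3", 2, 1, "w3"), _cell("c4", 2, 2, "w4"),
        {"Id": "w1", "BlockType": "WORD", "Text": "Item"},
        {"Id": "w2", "BlockType": "WORD", "Text": "Qty"},
        {"Id": "w3", "BlockType": "WORD", "Text": "Gauze"},
        {"Id": "w4", "BlockType": "WORD", "Text": "3"},
    ]
}

tables = table_rows(sample_table)
print(tables)
```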

Use Cases

Amazon Textract is used for OCR and document analysis applications. Some use cases are:

  • Quickly extract information
  • Scan healthcare and insurance forms
  • Accelerate form processing

Getting Started

Tutorials, Documentation, Hands on Training –

Amazon Textract Pricing

Amazon Textract has three different APIs: Detect Document Text API, Analyze Document API, and Analyze Expense API.

The Detect Document Text API uses OCR technology to extract text and handwriting from a provided document.

The Analyze Document API has two functions, forms and tables, with different pricing levels.

The Analyze Expense API extracts data from invoices and receipts.
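
For the expense API, a small Python sketch of collecting summary fields from a response may help. It assumes the response shape returned by boto3's `analyze_expense`, where each entry in `ExpenseDocuments` has `SummaryFields` with a `Type` and a `ValueDetection`. The function name and the sample receipt are hypothetical, and line items are left out for brevity.

```python
from typing import Any, Dict


def expense_summary(response: Dict[str, Any]) -> Dict[str, str]:
    """Collect summary fields (for example vendor and total) from an expense response."""
    summary: Dict[str, str] = {}
    for document in response.get("ExpenseDocuments", []):
        for field in document.get("SummaryFields", []):
            # Type.Text is the normalized field name; ValueDetection.Text is the value.
            field_type = field.get("Type", {}).get("Text", "")
            value = field.get("ValueDetection", {}).get("Text", "")
            if field_type:
                summary[field_type] = value
    return summary


# Hypothetical response fragment for a single receipt (illustrative data).
sample_expense = {
    "ExpenseDocuments": [
        {
            "SummaryFields": [
                {"Type": {"Text": "VENDOR_NAME"},
                 "ValueDetection": {"Text": "Corner Pharmacy"}},
                {"Type": {"Text": "TOTAL"},
                 "ValueDetection": {"Text": "$42.10"}},
            ]
        }
    ]
}

summary = expense_summary(sample_expense)
print(summary)
```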

For example, if you would like to extract text from 5 million pages of medical transcripts in a month, the charge is calculated as follows:

Total pages processed = 5,000,000

Price per page = $0.0015 for the first 1 million pages and $0.0006 for each page after that (here, the remaining 4 million)

Total charge per month = $0.0015 * 1,000,000 + $0.0006 * 4,000,000 = $1,500 + $2,400 = $3,900
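
The same tiered calculation can be sketched in Python. The rates and the 1 million page threshold are the ones quoted in this example and may not match current AWS pricing. The constant and function names are hypothetical. `Decimal` is used so the small per-page prices do not pick up floating-point rounding error.

```python
from decimal import Decimal

# Figures taken from the example above; treat them as assumptions, not current prices.
FIRST_TIER_PAGES = 1_000_000
FIRST_TIER_PRICE = Decimal("0.0015")  # dollars per page, first 1 million pages
OVER_TIER_PRICE = Decimal("0.0006")   # dollars per page beyond 1 million


def monthly_charge(pages: int) -> Decimal:
    """Return the monthly charge in dollars for a number of processed pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    first_tier = min(pages, FIRST_TIER_PAGES)
    remaining = pages - first_tier
    return first_tier * FIRST_TIER_PRICE + remaining * OVER_TIER_PRICE


charge = monthly_charge(5_000_000)
print(f"${charge:,.2f}")
```

For 5,000,000 pages this reproduces the $3,900 total worked out above: $1,500 for the first tier plus $2,400 for the remaining pages.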

More information found in –

