Learn about Amazon Textract and its pricing model
Amazon (AWS) Textract is a machine learning (ML) service that uses OCR to automatically extract text, handwriting, and data from scanned documents such as PDFs.
Major Features
Optical Character Recognition (OCR)
Amazon Textract uses Optical Character Recognition (OCR) technology to automatically detect printed text, handwriting, and numbers in a scan or rendering of a document, such as a legal document or a scan of a book.
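As a rough sketch of how OCR output might be consumed from Python: the text detection response is a dictionary whose `Blocks` list holds items with a `BlockType` (such as `PAGE`, `LINE`, or `WORD`) and, for text blocks, a `Text` field. The helper names `detect_text` and `extract_lines` and the sample response are illustrative assumptions, not part of the service. `detect_text` additionally assumes `boto3` is installed and AWS credentials are configured.

```python
def detect_text(image_path):
    """Send a local image to Textract (assumes boto3 and AWS credentials)."""
    import boto3  # imported lazily so the parsing helper works without it

    client = boto3.client("textract")
    with open(image_path, "rb") as f:
        return client.detect_document_text(Document={"Bytes": f.read()})


def extract_lines(response):
    """Return the text of every LINE block, in the order it appears."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


# Hand-written sample shaped like a Textract response (not real output).
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Patient Name: Jane Doe", "Confidence": 99.1},
        {"BlockType": "WORD", "Text": "Patient", "Confidence": 99.3},
        {"BlockType": "LINE", "Text": "Date: 2021-03-04", "Confidence": 98.7},
    ]
}

print(extract_lines(sample_response))
```

Filtering on `LINE` rather than `WORD` keeps each detected line as one string, which is usually easier to read back than individual words.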
Amazon Textract enables you to detect key-value pairs in document images automatically so that you can retain the inherent context of the document without any manual intervention.
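To illustrate how key-value pairs might be read back, here is a minimal sketch. When a document is analyzed with the forms feature (in `boto3`, `analyze_document` with `FeatureTypes=["FORMS"]`), the response contains `KEY_VALUE_SET` blocks: a key block points to its value block through a `VALUE` relationship, and both point to their `WORD` blocks through `CHILD` relationships. The function names and the sample response below are assumptions made for illustration. The sketch also ignores non-word children such as checkboxes.

```python
def _child_text(block, blocks_by_id):
    """Join the text of the WORD blocks listed in a block's CHILD relationship."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = blocks_by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def extract_key_values(response):
    """Map each detected form key to the text of its linked value."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    pairs = {}
    for block in response["Blocks"]:
        is_key = (
            block["BlockType"] == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        value_text = ""
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                value_text = " ".join(
                    _child_text(blocks_by_id[value_id], blocks_by_id)
                    for value_id in rel["Ids"]
                )
        pairs[_child_text(block, blocks_by_id)] = value_text
    return pairs


# Hand-written sample shaped like a forms response (not real output).
sample_form = {
    "Blocks": [
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                           {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Policy"},
        {"Id": "w2", "BlockType": "WORD", "Text": "Number:"},
        {"Id": "w3", "BlockType": "WORD", "Text": "AB-1234"},
    ]
}

print(extract_key_values(sample_form))
```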
Amazon Textract preserves the row and column structure of data stored in tables during extraction.
Amazon Textract is used for OCR and document analysis applications. Some of the use cases are:
- Quickly extract information
- Scan healthcare and insurance forms
- Accelerate form processing
Tutorials, documentation, and hands-on training: https://aws.amazon.com/textract/resources/
Amazon Textract Pricing
Amazon Textract has three different APIs: Detect Document Text API, Analyze Document API, and Analyze Expense API.
Detect Document Text API uses OCR technology to extract text and handwriting from a provided document.
Analyze Document API has two functions, forms and tables, with different pricing levels.
Analyze Expense API extracts data from invoices and receipts.
For example, if you would like to extract text from 5 million pages of medical transcripts in a month, the charge is calculated as follows:
Total pages processed = 5,000,000
Price per page = $0.0015 for the first 1 million pages and $0.0006 for the remaining 4 million pages
Total charge per month = $0.0015 * 1,000,000 + $0.0006 * 4,000,000 = $1,500 + $2,400 = $3,900
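The same tiered calculation can be sketched as a small Python function. The rates and the 1 million page tier boundary are taken from the example above. The function and constant names are illustrative, and actual prices may differ by region or change over time. `Decimal` is used here to avoid floating-point rounding in currency amounts.

```python
from decimal import Decimal

# Figures from the worked example above; confirm current rates on the
# pricing page before relying on them.
FIRST_TIER_PAGES = 1_000_000
FIRST_TIER_RATE = Decimal("0.0015")   # USD per page, first 1 million pages
SECOND_TIER_RATE = Decimal("0.0006")  # USD per page beyond 1 million


def monthly_charge(pages):
    """Return the monthly charge in USD for a given number of pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    first_tier = min(pages, FIRST_TIER_PAGES)
    second_tier = pages - first_tier  # zero when pages fit in the first tier
    return first_tier * FIRST_TIER_RATE + second_tier * SECOND_TIER_RATE


charge = monthly_charge(5_000_000)
print(f"Total charge per month: ${charge:,.2f}")
```

For 5,000,000 pages this reproduces the $3,900 total from the calculation above. Volumes at or below 1 million pages are billed entirely at the first-tier rate.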
More information is available at https://aws.amazon.com/textract/pricing/