Amazon Textract

Developer Guide

Amazon Textract: Developer Guide

Copyright © Amazon Web Services, Inc. and/or its affiliates. All rights reserved.

Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be affiliated with, connected to, or sponsored by Amazon.

Table of Contents

What is Amazon Textract? ... 1

First-Time Amazon Textract Users ... 2

Using Amazon Textract with an AWS SDK ... 2

How It Works ... 3

Detecting Text ... 3

Analyzing Documents ... 4

Analyzing Invoices and Receipts ... 5

Analyzing Identity Documents ... 7

Input Documents ... 8

Amazon Textract Response Objects ... 9

Text Detection and Document Analysis Response Objects ... 10

Invoice and Receipt Response Objects ... 23

Identity Documentation Response Objects ... 25

Item Location on a Document Page ... 26

Bounding Box ... 27

Polygon ... 29

Getting Started ... 30

Step 1: Set Up an Account ... 30

Sign Up for AWS ... 30

Create an IAM User ... 31

Next Step ... 31

Step 2: Set Up the AWS CLI and AWS SDKs ... 31

Next Step ... 33

Step 3: Get Started Using the AWS CLI and AWS SDK API ... 33

Formatting the AWS CLI Examples ... 33

Processing Documents with Synchronous Operations ... 34

Calling Amazon Textract Synchronous Operations ... 34

Request ... 34

Response ... 36

Detecting Document Text ... 84

Analyzing Document Text ... 92

Analyzing Invoice and Receipt Documents ... 100

Analyzing ID Documents ... 109

Processing Documents with Asynchronous Operations ... 113

Calling Asynchronous Operations ... 113

Starting Text Detection ... 114

Getting the Completion Status of an Amazon Textract Analysis Request ... 115

Getting Amazon Textract Text Detection Results ... 116

Configuring Asynchronous Operations ... 123

Giving Amazon Textract Access to Your Amazon SNS Topic ... 124

Detecting or Analyzing Text in a Multipage Document ... 124

Performing Asynchronous Operations ... 125

Amazon Textract Results Notification ... 142

Handling Throttled Calls and Dropped Connections ... 144

Best Practices for Amazon Textract ... 148

Provide an Optimal Input Document ... 148

Use Confidence Scores ... 148

Consider Using Human Review ... 148

Tutorials ... 149

Prerequisites ... 149

Extracting Key-Value Pairs from a Form Document ... 149

Exporting Tables into a CSV File ... 151

Creating an AWS Lambda Function ... 158

To call the DetectDocumentText operation from a Lambda function: ... 158

Additional Code Samples ... 160

Code examples ... 162

Actions ... 162

Analyze a document ... 163

Detect text in a document ... 164

Get data about a document analysis job ... 167

Start asynchronous analysis of a document ... 168

Start asynchronous text detection ... 170

Cross-service examples ... 171

Create an Amazon Textract explorer application ... 172

Detect entities in text extracted from an image ... 173

Amazon A2I and Amazon Textract ... 174

Core Concepts of Amazon A2I ... 174

Human Review Activation Conditions ... 174

Human review workflow (flow definition) ... 175

Human loops ... 176

Get Started Using Amazon A2I ... 176

Create a Human Review Workflow ... 177

Analyze the Document ... 180

Monitor Human Loop ... 181

View Output Data and Worker Metrics ... 182

Security ... 185

Data Protection ... 185

Encryption in Amazon Textract ... 186

Internetwork Traffic Privacy ... 186

Identity and Access Management ... 186

Audience ... 187

Authenticating With Identities ... 187

Managing Access Using Policies ... 189

How Amazon Textract Works with IAM ... 191

Identity-Based Policy Examples ... 193

Troubleshooting ... 195

Logging and Monitoring ... 197

Monitoring ... 197

CloudWatch Metrics for Amazon Textract ... 200

Logging Amazon Textract API Calls with AWS CloudTrail ... 201

Amazon Textract Information in CloudTrail ... 201

Understanding Amazon Textract Log File Entries ... 203

Compliance Validation ... 204

Resilience ... 205

Infrastructure Security ... 205

Configuration and Vulnerability Analysis ... 205

VPC endpoints (AWS PrivateLink) ... 205

Considerations for Amazon Textract VPC endpoints ... 206

Creating an interface VPC endpoint for Amazon Textract ... 206

Creating a VPC endpoint policy for Amazon Textract ... 206

API Reference ... 208

Actions ... 208

AnalyzeDocument ... 209

AnalyzeExpense ... 214

AnalyzeID ... 219

DetectDocumentText ... 222

GetDocumentAnalysis ... 226

GetDocumentTextDetection ... 231

GetExpenseAnalysis ... 236

StartDocumentAnalysis ... 242

StartDocumentTextDetection ... 247

StartExpenseAnalysis ... 251

Data Types ... 255

AnalyzeIDDetections ... 256

Block ... 257

BoundingBox ... 261

Document ... 262

DocumentLocation ... 263

DocumentMetadata ... 264

ExpenseDetection ... 265

ExpenseDocument ... 266

ExpenseField ... 267

ExpenseType ... 268

Geometry ... 269

HumanLoopActivationOutput ... 270

HumanLoopConfig ... 271

HumanLoopDataAttributes ... 272

IdentityDocument ... 273

IdentityDocumentField ... 274

LineItemFields ... 275

LineItemGroup ... 276

NormalizedValue ... 277

NotificationChannel ... 278

OutputConfig ... 279

Point ... 280

Relationship ... 281

S3Object ... 282

Warning ... 283

Limits ... 284

Amazon Textract ... 284

Document History ... 286

AWS glossary ... 287

What is Amazon Textract?

Amazon Textract makes it easy to add document text detection and analysis to your applications. Using Amazon Textract, customers can:

• Detect typed and handwritten text in a variety of documents, including financial reports, medical records, and tax forms.

• Extract text, forms, and tables from documents with structured data, using the Amazon Textract Document Analysis API.

• Process invoices and receipts with the AnalyzeExpense API.

• Process ID documents, such as driver's licenses and passports issued by the U.S. government, using the AnalyzeID API.

Amazon Textract is based on the same proven, highly scalable, deep-learning technology that was developed by Amazon's computer vision scientists to analyze billions of images and videos daily. You don't need any machine learning expertise to use it. Amazon Textract includes simple, easy-to-use APIs that can analyze image files and PDF files. Amazon Textract is always learning from new data, and Amazon is continually adding new features to the service.

The following are common use cases for using Amazon Textract:

Creating an intelligent search index – Using Amazon Textract, you can create libraries of the text that is detected in image and PDF files.

Using intelligent text extraction for natural language processing (NLP) – Amazon Textract provides you with control over how text is grouped as an input for NLP applications. It can extract text as words and lines. It also groups text by table cells if Amazon Textract document table analysis is enabled.

Accelerating the capture and normalization of data from different sources – Amazon Textract enables text and tabular data extraction from a wide variety of documents, such as financial documents, research reports, and medical notes. With Amazon Textract Analyze Document APIs, you can easily and quickly extract unstructured and structured data from your documents.

Automating data capture from forms – Amazon Textract enables structured data to be extracted from forms. With Amazon Textract Analysis APIs, you can build extraction capabilities into existing business workflows so that user data submitted through forms can be extracted into a usable format.

Some of the benefits of using Amazon Textract include:

Integration of document text detection into your apps – Amazon Textract removes the complexity of building text detection capabilities into your applications by making powerful and accurate analysis available with a simple API. You don’t need computer vision or deep learning expertise to use Amazon Textract to detect document text. With Amazon Textract Text APIs, you can easily build text detection into any web, mobile, or connected device application.

Scalable document analysis – Amazon Textract enables you to analyze and extract data quickly from millions of documents, which can accelerate decision making.

Low cost – With Amazon Textract, you only pay for the documents you analyze. There are no minimum fees or upfront commitments. You can get started for free, and save more as you grow with our tiered pricing model.

With synchronous processing, Amazon Textract can analyze single-page documents for applications where latency is critical. Amazon Textract also provides asynchronous operations to extend support to multipage documents.

First-Time Amazon Textract Users

If this is your first time using Amazon Textract, we recommend that you read the following sections in order:

1. How Amazon Textract Works (p. 3) – This section introduces the Amazon Textract components and how they work together for an end-to-end experience.

2. Getting Started with Amazon Textract (p. 30) – In this section, you set up your account and test the Amazon Textract API.

Using Amazon Textract with an AWS SDK

AWS software development kits (SDKs) are available for many popular programming languages. Each SDK provides an API, code examples, and documentation that make it easier for developers to build applications in their preferred language.

SDK documentation | Code examples
AWS SDK for C++ | AWS SDK for C++ code examples
AWS SDK for Go | AWS SDK for Go code examples
AWS SDK for Java | AWS SDK for Java code examples
AWS SDK for JavaScript | AWS SDK for JavaScript code examples
AWS SDK for .NET | AWS SDK for .NET code examples
AWS SDK for PHP | AWS SDK for PHP code examples
AWS SDK for Python (Boto3) | AWS SDK for Python (Boto3) code examples
AWS SDK for Ruby | AWS SDK for Ruby code examples

Example availability

Can't find what you need? Request a code example by using the Provide feedback link at the bottom of this page.

How Amazon Textract Works

Amazon Textract enables you to detect and analyze text in single or multipage input documents (see Input Documents (p. 8)).

Amazon Textract provides operations for the following actions.

• Detecting text only. For more information, see Detecting Text (p. 3).

• Detecting and analyzing relationships between text. For more information, see Analyzing Documents (p. 4).

• Detecting and analyzing text in invoices and receipts. For more information, see Analyzing Invoices and Receipts (p. 5).

• Detecting and analyzing text in government identity documents. For more information, see Analyzing Identity Documents (p. 7).

Amazon Textract provides synchronous operations for processing small, single-page documents with near real-time responses. For more information, see Processing Documents with Synchronous Operations (p. 34). Amazon Textract also provides asynchronous operations that you can use to process larger, multipage documents. Asynchronous responses aren't in real time. For more information, see Processing Documents with Asynchronous Operations (p. 113).

When an Amazon Textract operation processes a document, the results are returned in an array of Block (p. 257) objects or an array of ExpenseDocument (p. 266) objects. Both types of object contain information that's detected about items, including their location on the document and their relationship to other items on the document. For more information, see Amazon Textract Response Objects (p. 9). For examples that show how to use Block objects, see Tutorials (p. 149).

Topics

• Detecting Text (p. 3)

• Analyzing Documents (p. 4)

• Analyzing Invoices and Receipts (p. 5)

• Analyzing Identity Documents (p. 7)

• Input Documents (p. 8)

• Amazon Textract Response Objects (p. 9)

• Item Location on a Document Page (p. 26)

Detecting Text

Amazon Textract provides synchronous and asynchronous operations that return only the text detected in a document. For both sets of operations, the following information is returned in multiple Block (p. 257) objects.

• The lines and words of detected text

• The relationships between the lines and words of detected text

• The page that the detected text appears on

• The location of the lines and words of text on the document page

For more information, see Lines and Words of Text (p. 12).

To detect text synchronously, use the DetectDocumentText (p. 222) API operation, and pass a document file as input. The entire set of results is returned by the operation. For more information and an example, see Processing Documents with Synchronous Operations (p. 34).

Note

The Amazon Rekognition API operation DetectText is different from DetectDocumentText. You use DetectText to detect text in live scenes, such as posters or road signs.

To detect text asynchronously, use StartDocumentTextDetection (p. 247) to start processing an input document file. To get the results, call GetDocumentTextDetection (p. 231). The results are returned in one or more responses from GetDocumentTextDetection. For more information and an example, see Processing Documents with Asynchronous Operations (p. 113).
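The asynchronous flow described above can be sketched as follows. This is a minimal illustration, not the guide's reference implementation: it assumes `client` behaves like `boto3.client("textract")`, it polls for completion instead of waiting for a completion notification, and the helper name `wait_for_text_detection`, the polling interval, and the retry limit are hypothetical choices.

```python
import time


def wait_for_text_detection(client, bucket, name, poll_seconds=5.0, max_polls=120):
    """Start an asynchronous text detection job and collect all result blocks.

    `client` is assumed to behave like boto3.client("textract").
    `bucket` and `name` identify the input document in Amazon S3.
    """
    job = client.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": name}}
    )
    job_id = job["JobId"]

    # Poll until the job leaves the IN_PROGRESS state (simplification:
    # a production system would usually react to a completion notification).
    for _ in range(max_polls):
        response = client.get_document_text_detection(JobId=job_id)
        if response["JobStatus"] != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Job {job_id} did not finish in time")

    if response["JobStatus"] != "SUCCEEDED":
        raise RuntimeError(f"Job {job_id} ended with status {response['JobStatus']}")

    # Results can span several responses; follow NextToken until it is absent.
    blocks = list(response.get("Blocks", []))
    while "NextToken" in response:
        response = client.get_document_text_detection(
            JobId=job_id, NextToken=response["NextToken"]
        )
        blocks.extend(response.get("Blocks", []))
    return blocks
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub object before you point it at a real account.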

Analyzing Documents

Amazon Textract analyzes documents and forms for relationships among detected text. Amazon Textract analysis operations return three categories of document extraction: text, forms, and tables. The analysis of invoices and receipts is handled through a different process. For more information, see Analyzing Invoices and Receipts (p. 5).

Text Extraction

The raw text extracted from a document. For more information, see Lines and words of text (p. 12).

Form Extraction

Form data is linked to text items extracted from a document. Amazon Textract represents form data as key-value pairs. In the following example, one of the lines of text detected by Amazon Textract is Name: Jane Doe. Amazon Textract also identifies a key (Name:) and a value (Jane Doe). For more information, see Form data (Key-value pairs) (p. 14).

Name: Jane Doe

Address: 123 Any Street, Anytown, USA

Birth date: 12-26-1980

Key-value pairs are also used to represent check boxes or option buttons (radio buttons) that are extracted from forms.

Male: ☑

For more information, see Selection elements (p. 18).

Table Extraction

Amazon Textract can extract tables, table cells, and the items within table cells. You can write code that returns the results in a JSON, .csv, or .txt file.

Name | Address
Ana Carolina | 123 Any Town

For more information, see Tables (p. 16). Selection elements can also be extracted from tables. For more information, see Selection elements (p. 18).

For analyzed items, Amazon Textract returns the following in multiple Block (p. 257) objects:

• The lines and words of detected text

• The content of detected items

• The relationship between detected items

• The page that the item was detected on

• The location of the item on the document page

You can use synchronous or asynchronous operations to analyze text in a document. To analyze text synchronously, use the AnalyzeDocument (p. 209) operation, and pass a document as input. AnalyzeDocument returns the entire set of results. For more information, see Analyzing Document Text with Amazon Textract (p. 92).

To analyze text asynchronously, use StartDocumentAnalysis (p. 242) to start processing. To get the results, call GetDocumentAnalysis (p. 226). The results are returned in one or more responses from GetDocumentAnalysis. For more information and an example, see Detecting or Analyzing Text in a Multipage Document (p. 124).

To specify which type of analysis to perform, you can use the FeatureTypes list input parameter. Add TABLES to the list to return information about the tables that are detected in the input document—for example, table cells, cell text, and selection elements in cells. Add FORMS to return word relationships, such as key-value pairs and selection elements. To perform both types of analysis, add both TABLES and FORMS to FeatureTypes.

All lines and words that are detected in the document are included in the response (including text not related to the value of FeatureTypes).
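As a rough sketch of how FeatureTypes might be assembled for a synchronous call, consider the following. It assumes `client` behaves like `boto3.client("textract")`. The helper names `analyze_document` and `count_block_types` are hypothetical, and rejecting an empty selection is a choice made for this sketch.

```python
def analyze_document(client, document_bytes, tables=False, forms=False):
    """Call AnalyzeDocument with the requested analysis types.

    `client` is assumed to behave like boto3.client("textract").
    """
    feature_types = []
    if tables:
        feature_types.append("TABLES")
    if forms:
        feature_types.append("FORMS")
    if not feature_types:
        # Assumption for this sketch: require at least one analysis type.
        raise ValueError("Select at least one of tables or forms")

    response = client.analyze_document(
        Document={"Bytes": document_bytes},
        FeatureTypes=feature_types,
    )
    return response["Blocks"]


def count_block_types(blocks):
    """Count how many blocks of each BlockType a response contains."""
    counts = {}
    for block in blocks:
        block_type = block["BlockType"]
        counts[block_type] = counts.get(block_type, 0) + 1
    return counts
```

Counting block types is a quick way to confirm that LINE and WORD blocks are present regardless of which feature types you requested.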

Analyzing Invoices and Receipts

Amazon Textract extracts relevant data such as contact information, items purchased, and vendor name, from almost any invoice or receipt without the need for any templates or configuration. Invoices and receipts often use various layouts, making it difficult and time-consuming to manually extract data at scale. Amazon Textract uses ML to understand the context of invoices and receipts and automatically extracts data such as invoice or receipt date, invoice or receipt number, item prices, total amount, and payment terms to suit your business needs.

Amazon Textract also identifies vendor names that are critical for your workflows but may not be explicitly labeled. For example, Amazon Textract can find the vendor name on a receipt even if it's only indicated within a logo at the top of the page without an explicit key-value pair combination.

Amazon Textract also makes it easy for you to consolidate input from diverse receipts and invoices that use different words for the same concept. For example, Amazon Textract maps relationships between field names in different documents such as customer no., customer number, and account ID, outputting standard taxonomy as INVOICE_RECEIPT_ID. In this case, Amazon Textract represents data consistently across different document types. Fields that do not align with the standard taxonomy are categorized as OTHER.

The following is a list of the standard fields that AnalyzeExpense currently supports:

• Vendor Name: VENDOR_NAME

• Total: TOTAL

• Receiver Address: RECEIVER_ADDRESS

• Invoice/Receipt Date: INVOICE_RECEIPT_DATE

• Invoice/Receipt ID: INVOICE_RECEIPT_ID

• Payment Terms: PAYMENT_TERMS

• Subtotal: SUBTOTAL

• Due Date: DUE_DATE

• Tax: TAX

• Invoice Tax Payer ID (SSN/ITIN or EIN): TAX_PAYER_ID

• Item Name: ITEM_NAME

• Item Price: PRICE

• Item Quantity: QUANTITY

The AnalyzeExpense API returns the following elements for a given document page:

• The number of receipts or invoices within a page represented as ExpenseIndex

• The standardized name for individual fields represented as Type

• The actual name of the field as it appears on the document, represented as LabelDetection

• The value of the corresponding field represented as ValueDetection

• The number of pages within the submitted document represented as Pages

• The page number on which the field, value, or line items were detected, represented as PageNumber

• The geometry, which includes the bounding box and coordinates location of the individual field, value, or line items on the page, represented as Geometry

• The confidence score associated with each piece of data detected on the document, represented as Confidence

• The entire row of individual line items purchased, represented as EXPENSE_ROW

The following is a portion of the API output for a receipt processed by AnalyzeExpense. It shows the Total: $55.64 entry in the document extracted as the standard field TOTAL, with the label text as it appears on the document (“Total:”), a label confidence score of 97.1, the value $55.64, page number 1, and the bounding box and polygon coordinates of both the label and the value:

{
    "Type": {
        "Text": "TOTAL",
        "Confidence": 99.94717407226562
    },
    "LabelDetection": {
        "Text": "Total:",
        "Geometry": {
            "BoundingBox": {
                "Width": 0.09809663146734238,
                "Height": 0.0234375,
                "Left": 0.36822840571403503,
                "Top": 0.8017578125
            },
            "Polygon": [
                {"X": 0.36822840571403503, "Y": 0.8017578125},
                {"X": 0.466325044631958, "Y": 0.8017578125},
                {"X": 0.466325044631958, "Y": 0.8251953125},
                {"X": 0.36822840571403503, "Y": 0.8251953125}
            ]
        },
        "Confidence": 97.10792541503906
    },
    "ValueDetection": {
        "Text": "$55.64",
        "Geometry": {
            "BoundingBox": {
                "Width": 0.10395314544439316,
                "Height": 0.0244140625,
                "Left": 0.66837477684021,
                "Top": 0.802734375
            },
            "Polygon": [
                {"X": 0.66837477684021, "Y": 0.802734375},
                {"X": 0.7723279595375061, "Y": 0.802734375},
                {"X": 0.7723279595375061, "Y": 0.8271484375},
                {"X": 0.66837477684021, "Y": 0.8271484375}
            ]
        },
        "Confidence": 99.85165405273438
    },
    "PageNumber": 1
}
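A field shaped like the preceding output can be flattened into a small record for downstream use. The following is a sketch only. The helper name `summarize_expense_field` is hypothetical, the sample omits the Geometry members for brevity, and reporting the lowest of the three confidence scores is a design choice made here, not something the API prescribes.

```python
def summarize_expense_field(field):
    """Flatten one AnalyzeExpense field into a plain dictionary."""
    label = field.get("LabelDetection", {})
    value = field.get("ValueDetection", {})
    return {
        "type": field["Type"]["Text"],    # standardized name, e.g. TOTAL
        "label": label.get("Text"),       # text as printed on the document
        "value": value.get("Text"),       # detected value
        "page": field.get("PageNumber"),
        # Conservative choice for this sketch: keep the weakest score.
        "confidence": min(
            field["Type"]["Confidence"],
            label.get("Confidence", 100.0),
            value.get("Confidence", 100.0),
        ),
    }


# Sample taken from the output shown above, with Geometry omitted.
sample_field = {
    "Type": {"Text": "TOTAL", "Confidence": 99.94717407226562},
    "LabelDetection": {"Text": "Total:", "Confidence": 97.10792541503906},
    "ValueDetection": {"Text": "$55.64", "Confidence": 99.85165405273438},
    "PageNumber": 1,
}

summary = summarize_expense_field(sample_field)
```

Fields without an explicit label, such as a vendor name found only in a logo, have no label text, so the sketch returns `None` for `label` in that case.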

You can use synchronous operations to analyze an invoice or receipt. To analyze these documents, you use the AnalyzeExpense operation and pass a receipt or invoice to it. AnalyzeExpense returns the entire set of results. For more information, see Analyzing Invoices and Receipts with Amazon Textract (p. 100).

To analyze invoices and receipts asynchronously, use StartExpenseAnalysis (p. 251) to start processing an input document file. To get the results, call GetExpenseAnalysis (p. 236). The results for a given call to StartExpenseAnalysis (p. 251) are returned by GetExpenseAnalysis. For more information and an example, see Processing Documents with Asynchronous Operations (p. 113).

Analyzing Identity Documents

Amazon Textract can extract relevant information from passports, driver's licenses, and other identity documentation issued by the US Government using the AnalyzeID API. With AnalyzeID, businesses can quickly and accurately extract information from IDs such as U.S. driver's licenses, state IDs, and passports that have different templates or formats. The AnalyzeID API returns two categories of data types:

• Key-value pairs available on ID such as Date of Birth, Date of Issue, ID #, Class, and Restrictions.

• Implied fields on the document that may not have explicit keys associated with them such as Name, Address, and Issued By.

Key names are standardized within the response. For example, if your driver's license says LIC# (license number) and your passport says Passport No, the AnalyzeID response returns the standardized key as “Document ID” along with the raw key (for example, LIC#). This standardization lets customers easily combine information across many IDs that use different terms for the same concept.

AnalyzeID returns information in structures called IdentityDocumentFields. These are JSON structures containing two pieces of information: the normalized Type and the Value associated with the Type. Both also have a confidence score. For more information, see Identity Documentation Response Objects (p. 25).

You can use synchronous operations to analyze a driver's license or passport. To analyze these documents, you use the AnalyzeID operation and pass an identity document to it. AnalyzeID returns the entire set of results. For more information, see Analyzing Identity Documentation with Amazon Textract (p. 109).

Note

Some identity documents, such as driver's licenses, have two sides. You can pass the front and back images of driver licenses as separate images within the same Analyze ID API request.
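The Type and Value structure described above might be consumed as in the following sketch. The key names used here (`IdentityDocuments`, `IdentityDocumentFields`, `Type`, `ValueDetection`, `Text`, `Confidence`) are taken from the AnalyzeID API reference rather than from this section, so verify them against the response you receive. The helper name `id_fields_to_dict` and the sample values are hypothetical.

```python
def id_fields_to_dict(response, min_confidence=0.0):
    """Map each identity document in a response to {type: value}.

    Fields whose value confidence is below `min_confidence` are skipped.
    """
    documents = []
    for document in response.get("IdentityDocuments", []):
        fields = {}
        for field in document.get("IdentityDocumentFields", []):
            value = field.get("ValueDetection", {})
            if value.get("Confidence", 0.0) >= min_confidence:
                fields[field["Type"]["Text"]] = value.get("Text", "")
        documents.append(fields)
    return documents


# Hypothetical response fragment, for illustration only.
sample_response = {
    "IdentityDocuments": [
        {
            "IdentityDocumentFields": [
                {
                    "Type": {"Text": "FIRST_NAME", "Confidence": 99.0},
                    "ValueDetection": {"Text": "JANE", "Confidence": 98.5},
                },
                {
                    "Type": {"Text": "DATE_OF_BIRTH", "Confidence": 99.0},
                    "ValueDetection": {"Text": "12/26/1980", "Confidence": 60.0},
                },
            ]
        }
    ]
}

all_fields = id_fields_to_dict(sample_response)
confident_fields = id_fields_to_dict(sample_response, min_confidence=90.0)
```

When you submit the front and back of a driver's license in one request, you would expect one entry per submitted image in the returned list.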

Input Documents

A suitable input for an Amazon Textract operation is a single or multipage document. Some examples are a legal document, a form, an ID, or a letter. A form is a document with questions or prompts for a user to provide answers. Some examples are a patient registration form, a tax form, or an insurance claim form.

A document can be in JPEG, PNG, PDF or TIFF format. With PDF and TIFF format files, you can process multipage documents. For information about how Amazon Textract represents documents as Block objects, see Text Detection and Document Analysis Response Objects (p. 10).

The following is an acceptable input document example.

For information about document limits, see Hard Limits in Amazon Textract (p. 284).

For Amazon Textract synchronous operations, you can use input documents that are stored in an Amazon S3 bucket, or you can pass base64-encoded image bytes. For more information, see Calling Amazon Textract Synchronous Operations (p. 34). For asynchronous operations, you need to supply input documents in an Amazon S3 bucket. For more information, see Calling Amazon Textract Asynchronous Operations (p. 113).
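The two input styles can be sketched as small helpers that build the request parameters. This is an illustration under assumptions: the helper names are hypothetical, and the dictionary shapes follow the `Document` and `DocumentLocation` request parameters as used with the AWS SDK for Python (Boto3). With an AWS SDK you typically pass raw bytes and the SDK handles the base64 encoding, so check your SDK's documentation before encoding the bytes yourself.

```python
def build_document(image_bytes=None, bucket=None, name=None):
    """Build the Document parameter for a synchronous operation.

    Pass either raw image bytes or an Amazon S3 bucket and object name.
    """
    if image_bytes is not None and (bucket or name):
        raise ValueError("Pass either image bytes or an S3 location, not both")
    if image_bytes is not None:
        return {"Bytes": image_bytes}
    if bucket and name:
        return {"S3Object": {"Bucket": bucket, "Name": name}}
    raise ValueError("Provide image bytes, or both bucket and name")


def build_document_location(bucket, name):
    """Build the DocumentLocation parameter for an asynchronous operation.

    Asynchronous operations accept only documents stored in Amazon S3.
    """
    if not (bucket and name):
        raise ValueError("Asynchronous operations need both bucket and name")
    return {"S3Object": {"Bucket": bucket, "Name": name}}
```

For example, `build_document(bucket="my-bucket", name="form.png")` produces a value you could pass as `Document=` to a synchronous operation, where `my-bucket` and `form.png` are placeholder names.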

Amazon Textract Response Objects

Amazon Textract operations return different types of objects depending on the operations run. For detecting text, and analyzing a generic document, the operation returns a Block object. For analyzing an invoice or receipt, the operation returns an ExpenseDocuments object. For analyzing identity documentation, the operation returns an IdentityDocumentFields object. For more information about these response objects, see the following sections:

Topics

• Text Detection and Document Analysis Response Objects (p. 10)

• Invoice and Receipt Response Objects (p. 23)

• Identity Documentation Response Objects (p. 25)

Text Detection and Document Analysis Response Objects

When Amazon Textract processes a document, it creates a list of Block (p. 257) objects for the detected or analyzed text. Each block contains information about a detected item, where it's located, and the confidence that Amazon Textract has in the accuracy of the processing.

A document is made up of the following types of Block objects.

• Pages (p. 11)

• Lines and words of text (p. 12)

• Form Data (Key-value pairs) (p. 14)

• Tables and Cells (p. 16)

• Selection elements (p. 18)

The contents of a block depend on the operation you call. If you call one of the text detection operations, the pages, lines, and words of detected text are returned. For more information, see Detecting Text (p. 3). If you call one of the document analysis operations, information about detected pages, key-value pairs, tables, selection elements, and text is returned. For more information, see Analyzing Documents (p. 4).

Some Block object fields are common to both types of processing. For example, each block has a unique identifier.

For examples that show how to use Block objects, see Tutorials (p. 149).

Document Layout

Amazon Textract returns a representation of a document as a list of different types of Block objects that are linked in a parent-to-child relationship or a key-value pair. Metadata that provides the number of pages in a document is also returned. The following is the JSON for a typical Block object of type PAGE.

{
    "Blocks": [
        {
            "Geometry": {
                "BoundingBox": {
                    "Width": 1.0,
                    "Top": 0.0,
                    "Left": 0.0,
                    "Height": 1.0
                },
                "Polygon": [
                    {"Y": 0.0, "X": 0.0},
                    {"Y": 0.0, "X": 1.0},
                    {"Y": 1.0, "X": 1.0},
                    {"Y": 1.0, "X": 0.0}
                ]
            },
            "Relationships": [
                {
                    "Type": "CHILD",
                    "Ids": [
                        "2602b0a6-20e3-4e6e-9e46-3be57fd0844b",
                        "82aedd57-187f-43dd-9eb1-4f312ca30042",
                        "52be1777-53f7-42f6-a7cf-6d09bdc15a30",
                        "7ca7caa6-00ef-4cda-b1aa-5571dfed1a7c"
                    ]
                }
            ],
            "BlockType": "PAGE",
            "Id": "8136b2dc-37c1-4300-a9da-6ed8b276ea97"
        }...
    ],
    "DocumentMetadata": {
        "Pages": 1
    }
}

A document is made from one or more PAGE blocks. Each page contains a list of child blocks for the primary items detected on the page, such as lines of text and tables. For more information, see Pages (p. 11).

You can determine the type of a Block object by inspecting the BlockType field.

A Block object contains a list of related Block objects in the Relationships field, which is an array of Relationship (p. 281) objects. A Relationships array is either of type CHILD or of type VALUE. An array of type CHILD is used to list the items that are children of the current block. For example, if the current block is of type LINE, Relationships contains a list of IDs for the WORD blocks that make up the line of text. An array of type VALUE is used to contain key-value pairs. You can determine the type of the relationship by inspecting the Type field of the Relationship object.

Child blocks don't have information about their parent Block objects.
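Because child blocks carry no reference to their parents, you might build that lookup yourself from the CHILD relationships. The following is a minimal sketch. The helper names and the short sample IDs are hypothetical, and the sample blocks include only the fields the helpers read.

```python
def child_ids(block):
    """Return the IDs listed in a block's CHILD relationships."""
    ids = []
    for relationship in block.get("Relationships") or []:
        if relationship["Type"] == "CHILD":
            ids.extend(relationship["Ids"])
    return ids


def build_parent_map(blocks):
    """Map each child block ID to the ID of the block that lists it."""
    parents = {}
    for block in blocks:
        for child_id in child_ids(block):
            parents[child_id] = block["Id"]
    return parents


# Hypothetical blocks with shortened IDs, for illustration only.
sample_blocks = [
    {"Id": "page-1", "BlockType": "PAGE",
     "Relationships": [{"Type": "CHILD", "Ids": ["line-1"]}]},
    {"Id": "line-1", "BlockType": "LINE",
     "Relationships": [{"Type": "CHILD", "Ids": ["word-1", "word-2"]}]},
    {"Id": "word-1", "BlockType": "WORD"},
    {"Id": "word-2", "BlockType": "WORD"},
]

parent_of = build_parent_map(sample_blocks)
```

With the map in place, `parent_of["word-1"]` gives the ID of the enclosing LINE block, and looking that ID up again gives the PAGE block.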

For examples that show Block information, see Processing Documents with Synchronous Operations (p. 34).

Confidence

Amazon Textract operations return the percentage confidence that Amazon Textract has in the accuracy of the detected item. To get the confidence, use the Confidence field of the Block object. A higher value indicates a higher confidence. Depending on the scenario, detections with a low confidence might need visual confirmation by a human.
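One way to act on the Confidence field is to split detections around a threshold and route the low-scoring ones to a human reviewer. This is a sketch under assumptions: the threshold of 90 is an arbitrary illustrative value, the helper name is hypothetical, and blocks that carry no Confidence field are simply skipped.

```python
def split_by_confidence(blocks, threshold=90.0):
    """Separate blocks into accepted and needs-review lists.

    `threshold` is a percentage between 0 and 100. Blocks without a
    Confidence field are ignored by this sketch.
    """
    accepted, needs_review = [], []
    for block in blocks:
        confidence = block.get("Confidence")
        if confidence is None:
            continue
        if confidence >= threshold:
            accepted.append(block)
        else:
            needs_review.append(block)
    return accepted, needs_review


# Hypothetical blocks, for illustration only.
sample_blocks = [
    {"Id": "a", "BlockType": "PAGE"},
    {"Id": "b", "BlockType": "LINE", "Text": "Hello, world.", "Confidence": 99.6},
    {"Id": "c", "BlockType": "WORD", "Text": "w0rld", "Confidence": 71.2},
]

accepted, needs_review = split_by_confidence(sample_blocks)
```

The right threshold depends on the cost of an error in your application, so treat the default here as a starting point to tune.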

Geometry

Amazon Textract operations, with the exception of identity analysis, return information about the location of detected items on a document page. To get the location, use the Geometry field of the Block object. For more information, see Item Location on a Document Page (p. 26).

Pages

A document consists of one or more pages. A Block (p. 257) object of type PAGE exists for each page of the document. A PAGE block object contains a list of the child IDs for the lines of text, key-value pairs, and tables that are detected on the document page.

The JSON for a PAGE block looks similar to the following.

{
    "Geometry": ....
    "Relationships": [
        {
            "Type": "CHILD",
            "Ids": [
                "2602b0a6-20e3-4e6e-9e46-3be57fd0844b", // Line - Hello, world.
                "82aedd57-187f-43dd-9eb1-4f312ca30042", // Line - How are you?
                "52be1777-53f7-42f6-a7cf-6d09bdc15a30",
                "7ca7caa6-00ef-4cda-b1aa-5571dfed1a7c"
            ]
        }
    ],
    "BlockType": "PAGE",
    "Id": "8136b2dc-37c1-4300-a9da-6ed8b276ea97" // Page identifier
},

If you're using asynchronous operations with a multipage document that's in PDF format, you can determine the page that a block is located on by inspecting the Page field of the Block object. A scanned image (an image in JPEG, PNG, PDF, or TIFF format) is considered to be a single-page document, even if there's more than one document page on the image. Asynchronous operations always return a Page value of 1 for scanned images.

The total number of pages is returned in the Pages field of DocumentMetadata. DocumentMetadata is returned with each list of Block objects returned by an Amazon Textract operation.

Lines and Words of Text

Detected text that's returned by Amazon Textract operations is returned in a list of Block (p. 257) objects. These objects represent lines of text or textual words that are detected on a document page. The following text shows two lines of text that are made from multiple words.

This is text.

In two separate lines.

Detected text is returned in the Text field of a Block object. The BlockType field determines if the text is a line of text (LINE) or a word (WORD). A WORD is one or more ISO basic Latin script characters that aren't separated by spaces. A LINE is a string of tab-delimited and contiguous words.

Additionally, Amazon Textract determines whether a piece of text was handwritten or printed and reports the result in the TextType field. The values are HANDWRITING and PRINTED, respectively.

The other Block properties are common to all block types, such as the ID, confidence, and geometry information. For more information, see Text Detection and Document Analysis Response Objects (p. 10).

To detect only lines and words, you can use DetectDocumentText (p. 222) or StartDocumentTextDetection (p. 247). For more information, see Detecting Text (p. 3). To get the detected text (lines and words) and information about how it relates to other parts of the document, such as tables, you can use AnalyzeDocument (p. 209) or StartDocumentAnalysis (p. 242). For more information, see Analyzing Documents (p. 4).

PAGE, LINE, and WORD blocks are related to each other in a parent-to-child relationship. A PAGE block is the parent for all LINE block objects on a document page. Because a LINE can have one or more words, the Relationships array for a LINE block stores the IDs for child WORD blocks that make up the line of text.

The following diagram shows how the line Hello, world. in the text Hello, world. How are you? is represented by Block objects.

The following is the JSON output from DetectDocumentText when the sentence Hello, world. How are you? is detected. The first example is the JSON for the document page. Note how the CHILD IDs enable you to navigate through the document.

{

"Geometry": {...}, "Relationships": [ {

"Type": "CHILD", "Ids": [

"d7fbd604-d609-4d69-857d-247a3f591238", // Line - Hello, world.

"b6c19a93-6493-4d8e-958f-853c8f7ca055" // Line - How are you?

] } ],

"BlockType": "PAGE",

"Id": "56ec1d77-171f-4881-9852-2b5b7e761608"

},

The following is the JSON for the LINE block that represents the line Hello, world.:

{

"Relationships": [ {

"Type": "CHILD", "Ids": [

"7f97e2ca-063e-47a8-981c-8beee31afc01", // Word - Hello,

"4b990aa0-af96-4369-b90f-dbe02538ed21" // Word - world.

] } ],

"Confidence": 99.63229370117188, "Geometry": {...},

"Text": "Hello, world.", "BlockType": "LINE",

"Id": "d7fbd604-d609-4d69-857d-247a3f591238"

},

The following is the JSON for the WORD block for the word Hello,:

{

"Geometry": {...}, "Text": "Hello,", "TextType": "PRINTED", "BlockType": "WORD",

"Confidence": 99.74746704101562,

"Id": "7f97e2ca-063e-47a8-981c-8beee31afc01"

},

The final JSON is the WORD block for the word world.:


{ "Geometry": {...}, "Text": "world.", "TextType": "PRINTED", "BlockType": "WORD",

"Confidence": 99.5171127319336,

"Id": "4b990aa0-af96-4369-b90f-dbe02538ed21"

},
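The parent-to-child navigation described above can be sketched in Python as follows. This is a minimal illustration, not part of any AWS SDK: the helper names child_ids and lines_with_words are hypothetical, and the sample list only contains the blocks shown in the preceding JSON (the second LINE block is not shown in this guide, so the sketch skips its ID).

```python
def child_ids(block):
    """Return the Ids from every CHILD relationship of a block, in order."""
    ids = []
    for relationship in block.get("Relationships", []):
        if relationship["Type"] == "CHILD":
            ids.extend(relationship["Ids"])
    return ids


def lines_with_words(blocks):
    """Return (line_text, [word_text, ...]) for each LINE under each PAGE."""
    by_id = {block["Id"]: block for block in blocks}
    result = []
    for page in blocks:
        if page["BlockType"] != "PAGE":
            continue
        for line_id in child_ids(page):
            line = by_id.get(line_id)
            # Skip IDs that are missing from the list or are not LINE blocks.
            if line is None or line["BlockType"] != "LINE":
                continue
            words = [by_id[i]["Text"] for i in child_ids(line) if i in by_id]
            result.append((line["Text"], words))
    return result


# Blocks copied from the preceding JSON, with Geometry and Confidence omitted.
sample_blocks = [
    {
        "BlockType": "PAGE",
        "Id": "56ec1d77-171f-4881-9852-2b5b7e761608",
        "Relationships": [{
            "Type": "CHILD",
            "Ids": [
                "d7fbd604-d609-4d69-857d-247a3f591238",  # Line - Hello, world.
                "b6c19a93-6493-4d8e-958f-853c8f7ca055",  # Line - How are you? (not in sample)
            ],
        }],
    },
    {
        "BlockType": "LINE",
        "Id": "d7fbd604-d609-4d69-857d-247a3f591238",
        "Text": "Hello, world.",
        "Relationships": [{
            "Type": "CHILD",
            "Ids": [
                "7f97e2ca-063e-47a8-981c-8beee31afc01",
                "4b990aa0-af96-4369-b90f-dbe02538ed21",
            ],
        }],
    },
    {"BlockType": "WORD", "Id": "7f97e2ca-063e-47a8-981c-8beee31afc01", "Text": "Hello,"},
    {"BlockType": "WORD", "Id": "4b990aa0-af96-4369-b90f-dbe02538ed21", "Text": "world."},
]

for text, words in lines_with_words(sample_blocks):
    print(text, words)
```

Building the Id lookup once keeps each relationship hop a dictionary access rather than a scan of the whole block list.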

Form Data (Key-Value Pairs)

Amazon Textract can extract form data from documents as key-value pairs. For example, in the following text, Amazon Textract can identify a key (Name:) and a value (Ana Carolina).

Name: Ana Carolina

Detected key-value pairs are returned as Block (p. 257) objects in the responses from AnalyzeDocument (p. 209) and GetDocumentAnalysis (p. 226). You can use the FeatureTypes input parameter to retrieve information about key-value pairs, tables, or both. For key-value pairs only, use the value FORMS. For an example, see Extracting Key-Value Pairs from a Form Document (p. 149). For general information about how a document is represented by Block objects, see Text Detection and Document Analysis Response Objects (p. 10).

Block objects with the type KEY_VALUE_SET are the containers for KEY or VALUE Block objects that store information about linked text items detected in a document. You can use the EntityTypes attribute to determine whether a block is a KEY or a VALUE.

• A KEY object contains information about the key for linked text. For example, Name:. A KEY block has two relationship lists. A relationship of type VALUE is a list that contains the ID of the VALUE block associated with the key. A relationship of type CHILD is a list of IDs for the WORD blocks that make up the text of the key.

• A VALUE object contains information about the text associated with a key. In the preceding example, Ana Carolina is the value for the key Name:. A VALUE block has a relationship with a list of CHILD blocks that identify WORD blocks. Each WORD block contains one of the words that make up the text of the value. A VALUE object can also contain information about selected elements. For more information, see Selection Elements (p. 18).

Each instance of a KEY_VALUE_SET Block object is a child of the PAGE Block object that corresponds to the current page.

The following diagram shows how the key-value pair Name: Ana Carolina is represented by Block objects.

The following examples show how the key-value pair Name: Ana Carolina is represented by JSON.

The PAGE block has CHILD blocks of type KEY_VALUE_SET for each KEY and VALUE block detected in the document.

{

"Geometry": ....,

"Relationships": [ {

"Type": "CHILD", "Ids": [

"2602b0a6-20e3-4e6e-9e46-3be57fd0844b", "82aedd57-187f-43dd-9eb1-4f312ca30042",

"52be1777-53f7-42f6-a7cf-6d09bdc15a30", // Key - Name:

"7ca7caa6-00ef-4cda-b1aa-5571dfed1a7c" // Value - Ana Carolina

] } ],

"BlockType": "PAGE",

"Id": "8136b2dc-37c1-4300-a9da-6ed8b276ea97" // Page identifier

},

The following JSON shows that the KEY block (52be1777-53f7-42f6-a7cf-6d09bdc15a30) has a relationship with the VALUE block (7ca7caa6-00ef-4cda-b1aa-5571dfed1a7c). It also has a CHILD block for the WORD block (c734fca6-c4c4-415c-b6c1-30f7510b72ee) that contains the text for the key (Name:).

{ "Relationships": [ {

"Type": "VALUE", "Ids": [

"7ca7caa6-00ef-4cda-b1aa-5571dfed1a7c" // Value identifier

] }, {

"Type": "CHILD", "Ids": [

"c734fca6-c4c4-415c-b6c1-30f7510b72ee" // Name:

] } ],

"Confidence": 51.55965805053711, "Geometry": ....,

"BlockType": "KEY_VALUE_SET", "EntityTypes": [

"KEY"

],

"Id": "52be1777-53f7-42f6-a7cf-6d09bdc15a30" // Key identifier

},

The following JSON shows that VALUE block 7ca7caa6-00ef-4cda-b1aa-5571dfed1a7c has a CHILD list of IDs for the WORD blocks that make up the text of the value (Ana and Carolina).

{ "Relationships": [ {

"Type": "CHILD", "Ids": [

"db553509-64ef-4ecf-ad3c-bea62cc1cd8a", // Ana

"e5d7646c-eaa2-413a-95ad-f4ae19f53ef3" // Carolina

]

} ],

"Confidence": 51.55965805053711, "Geometry": ....,

"BlockType": "KEY_VALUE_SET", "EntityTypes": [

"VALUE"

],

"Id": "7ca7caa6-00ef-4cda-b1aa-5571dfed1a7c" // Value identifier


}

The following JSON shows the Block objects for the words Name:, Ana, and Carolina.

{ "Geometry": {...}, "Text": "Name:", "TextType": "PRINTED",

"BlockType": "WORD",

"Confidence": 99.56285858154297,

"Id": "c734fca6-c4c4-415c-b6c1-30f7510b72ee"

}, {

"Geometry": {...}, "Text": "Ana", "TextType": "PRINTED", "BlockType": "WORD",

"Confidence": 99.52057647705078,

"Id": "db553509-64ef-4ecf-ad3c-bea62cc1cd8a"

}, {

"Geometry": {...}, "Text": "Carolina", "TextType": "PRINTED", "BlockType": "WORD",

"Confidence": 99.84207916259766,

"Id": "e5d7646c-eaa2-413a-95ad-f4ae19f53ef3"

},
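The KEY, VALUE, and WORD relationships shown in the preceding JSON can be resolved into readable text with a short Python sketch. The helper names (relationship_ids, block_text, key_value_pairs) are hypothetical and not part of an AWS SDK; the sample list repeats the blocks from the example with Geometry and Confidence omitted.

```python
def relationship_ids(block, relationship_type):
    """Return the Ids from relationships of the given type (CHILD or VALUE)."""
    ids = []
    for relationship in block.get("Relationships", []):
        if relationship["Type"] == relationship_type:
            ids.extend(relationship["Ids"])
    return ids


def block_text(block, by_id):
    """Join the Text of the WORD children of a KEY or VALUE block."""
    words = []
    for child_id in relationship_ids(block, "CHILD"):
        child = by_id.get(child_id)
        if child is not None and child["BlockType"] == "WORD":
            words.append(child["Text"])
    return " ".join(words)


def key_value_pairs(blocks):
    """Map the text of each KEY block to the text of its linked VALUE block."""
    by_id = {block["Id"]: block for block in blocks}
    pairs = {}
    for block in blocks:
        is_key = (block["BlockType"] == "KEY_VALUE_SET"
                  and "KEY" in block.get("EntityTypes", []))
        if not is_key:
            continue
        values = [block_text(by_id[value_id], by_id)
                  for value_id in relationship_ids(block, "VALUE")
                  if value_id in by_id]
        pairs[block_text(block, by_id)] = " ".join(values)
    return pairs


# Blocks from the Name: Ana Carolina example above.
sample_blocks = [
    {
        "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
        "Id": "52be1777-53f7-42f6-a7cf-6d09bdc15a30",
        "Relationships": [
            {"Type": "VALUE", "Ids": ["7ca7caa6-00ef-4cda-b1aa-5571dfed1a7c"]},
            {"Type": "CHILD", "Ids": ["c734fca6-c4c4-415c-b6c1-30f7510b72ee"]},
        ],
    },
    {
        "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
        "Id": "7ca7caa6-00ef-4cda-b1aa-5571dfed1a7c",
        "Relationships": [
            {"Type": "CHILD", "Ids": ["db553509-64ef-4ecf-ad3c-bea62cc1cd8a",
                                      "e5d7646c-eaa2-413a-95ad-f4ae19f53ef3"]},
        ],
    },
    {"BlockType": "WORD", "Id": "c734fca6-c4c4-415c-b6c1-30f7510b72ee", "Text": "Name:"},
    {"BlockType": "WORD", "Id": "db553509-64ef-4ecf-ad3c-bea62cc1cd8a", "Text": "Ana"},
    {"BlockType": "WORD", "Id": "e5d7646c-eaa2-413a-95ad-f4ae19f53ef3", "Text": "Carolina"},
]

print(key_value_pairs(sample_blocks))
```

Because two keys on a form can have the same text, a dictionary keyed on key text is a simplification; a list of (key, value) tuples would preserve duplicates.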

Tables

Amazon Textract can extract tables and the cells in a table. For example, when the following table is detected on a form, Amazon Textract detects a table with four cells.

Name Address

Ana Carolina 123 Any Town

Detected tables are returned as Block (p. 257) objects in the responses from AnalyzeDocument (p. 209) and GetDocumentAnalysis (p. 226). You can use the FeatureTypes input parameter to retrieve information about key-value pairs, tables, or both. For tables only, use the value TABLES. For an example, see Exporting Tables into a CSV File (p. 151). For general information about how a document is represented by Block objects, see Text Detection and Document Analysis Response Objects (p. 10).

The following diagram shows how a single cell in a table is represented by Block objects.

A cell contains WORD blocks for detected words, and SELECTION_ELEMENT blocks for selection elements such as check boxes.


The following is partial JSON for the preceding table, which has four cells.

The PAGE Block object has a list of CHILD Block IDs for the TABLE block and each LINE of text that's detected.

{

"Geometry": {...}, "Relationships": [ {

"Type": "CHILD", "Ids": [

"f2a4ad7b-f21d-4966-b548-c859b84f66a4", // Line - Name

"4dce3516-ffeb-45e0-92a2-60770e9cb744", // Line - Address

"ee506578-768f-4696-8f4b-e4917e429f50", // Line - Ana Carolina

"33fc7223-411b-4399-8a90-ccd3c5a2c196", // Line - 123 Any Town

"3f9665be-379d-4ae7-be44-d02f32b049c2" // Table

] } ],

"BlockType": "PAGE",

"Id": "78c3ce84-ae70-418e-add7-27058418adf6"

},

The TABLE block includes a list of child IDs for the cells within the table. A TABLE block also includes geometry information for the table location in the document. The following JSON shows that the table has four cells, which are listed in the Ids array.

{ "Geometry": {...}, "Relationships": [ {

"Type": "CHILD", "Ids": [

"505e9581-0d1c-42fb-a214-6ff736822e8c", "6fca44d4-d3d3-46ab-b22f-7fca1fbaaf02", "9778bd78-f3fe-4ae1-9b78-e6d29b89e5e9", "55404b05-ae12-4159-9003-92b7c129532e"

] } ],

"BlockType": "TABLE",

"Confidence": 92.5705337524414,

"Id": "3f9665be-379d-4ae7-be44-d02f32b049c2"

},

The Block type for the table cells is CELL. The Block object for each cell includes information about the cell location compared to other cells in the table. It also includes geometry information for the location of the cell on the document. In the preceding example, 505e9581-0d1c-42fb-a214-6ff736822e8c is the child ID for the cell that contains the word Name. The following example is the information for the cell.

{ "Geometry": {...}, "Relationships": [ {

"Type": "CHILD", "Ids": [

"e9108c8e-0167-4482-989e-8b6cd3c3653e"

] } ],


"Confidence": 100.0, "RowSpan": 1,

"RowIndex": 1, "ColumnIndex": 1, "ColumnSpan": 1, "BlockType": "CELL",

"Id": "505e9581-0d1c-42fb-a214-6ff736822e8c"

},

Each cell has a location in a table, with the first cell being 1,1. In the preceding example, the cell with the value Name is at row 1, column 1. The cell with the value 123 Any Town is at row 2, column 2. A CELL Block object contains this information in the RowIndex and ColumnIndex fields. The child list contains the IDs for the WORD Block objects that contain the text that's within the cell. The words in the list are in the order in which they're detected, from the top left of the cell to the bottom right of the cell. In the preceding example, the cell has a child ID with the value e9108c8e-0167-4482-989e-8b6cd3c3653e. The following output is for the WORD Block with the ID value of e9108c8e-0167-4482-989e-8b6cd3c3653e:

{

"Geometry": {...},

"Text": "Name",

"TextType": "PRINTED",

"BlockType": "WORD",

"Confidence": 99.81139373779297,

"Id": "e9108c8e-0167-4482-989e-8b6cd3c3653e"

},
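As a rough illustration of how RowIndex and ColumnIndex can be used, the following Python sketch rebuilds a table as a list of rows. The function name table_to_rows is hypothetical, the block IDs in the sample are made-up placeholders rather than the IDs from the JSON above, and the sketch ignores RowSpan and ColumnSpan, so merged cells would need extra handling.

```python
def table_to_rows(table_block, by_id):
    """Rebuild a TABLE block as a list of rows, each a list of cell strings."""
    cells = {}
    for relationship in table_block.get("Relationships", []):
        if relationship["Type"] != "CHILD":
            continue
        for cell_id in relationship["Ids"]:
            cell = by_id.get(cell_id)
            if cell is None or cell["BlockType"] != "CELL":
                continue
            # Collect the WORD children of the cell in detection order.
            words = []
            for cell_rel in cell.get("Relationships", []):
                if cell_rel["Type"] == "CHILD":
                    for word_id in cell_rel["Ids"]:
                        word = by_id.get(word_id)
                        if word is not None and word["BlockType"] == "WORD":
                            words.append(word["Text"])
            cells[(cell["RowIndex"], cell["ColumnIndex"])] = " ".join(words)
    if not cells:
        return []
    row_count = max(row for row, _ in cells)
    column_count = max(column for _, column in cells)
    # RowIndex and ColumnIndex start at 1, so iterate from 1 inclusive.
    return [[cells.get((row, column), "") for column in range(1, column_count + 1)]
            for row in range(1, row_count + 1)]


def _cell(cell_id, row, column, word_ids):
    """Build a minimal CELL block for the sample data (placeholder IDs)."""
    return {"BlockType": "CELL", "Id": cell_id, "RowIndex": row, "ColumnIndex": column,
            "RowSpan": 1, "ColumnSpan": 1,
            "Relationships": [{"Type": "CHILD", "Ids": word_ids}]}


sample_blocks = [
    {"BlockType": "TABLE", "Id": "table-1",
     "Relationships": [{"Type": "CHILD", "Ids": ["c11", "c12", "c21", "c22"]}]},
    _cell("c11", 1, 1, ["w1"]),
    _cell("c12", 1, 2, ["w2"]),
    _cell("c21", 2, 1, ["w3", "w4"]),
    _cell("c22", 2, 2, ["w5", "w6", "w7"]),
    {"BlockType": "WORD", "Id": "w1", "Text": "Name"},
    {"BlockType": "WORD", "Id": "w2", "Text": "Address"},
    {"BlockType": "WORD", "Id": "w3", "Text": "Ana"},
    {"BlockType": "WORD", "Id": "w4", "Text": "Carolina"},
    {"BlockType": "WORD", "Id": "w5", "Text": "123"},
    {"BlockType": "WORD", "Id": "w6", "Text": "Any"},
    {"BlockType": "WORD", "Id": "w7", "Text": "Town"},
]

by_id = {block["Id"]: block for block in sample_blocks}
rows = table_to_rows(by_id["table-1"], by_id)
print(rows)
```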

Selection Elements

Amazon Textract can detect selection elements such as option buttons (radio buttons) and check boxes on a document page. Selection elements can be detected in form data (p. 14) and in tables (p. 16).

For example, when the following table is detected on a form, Amazon Textract detects the check boxes in the table cells.

Agree Neutral Disagree

Good Service ☑ ☐ ☐

Easy to Use ☐ ☑ ☐

Fair Price ☑ ☐ ☐

Detected selection elements are returned as Block (p. 257) objects in the responses from AnalyzeDocument (p. 209) and GetDocumentAnalysis (p. 226).

Note

You can use the FeatureTypes input parameter to retrieve information about key-value pairs, tables, or both. For example, if you filter on tables, the response includes the selection elements that are detected in tables. Selection elements that are detected in key-value pairs aren't included in the response.

Information about a selection element is contained in a Block object of type SELECTION_ELEMENT.

To determine the status of a selectable element, use the SelectionStatus field of the SELECTION_ELEMENT block. The status can be either SELECTED or NOT_SELECTED. For example, the value of SelectionStatus for the previous image is SELECTED.

A SELECTION_ELEMENT Block object is associated with either a key-value pair or a table cell. A SELECTION_ELEMENT Block object contains bounding box information for a selection element in the Geometry field. A SELECTION_ELEMENT Block object isn't a child of a PAGE Block object.


Form Data (Key-Value Pairs)

A key-value pair is used to represent a selection element that's detected on a form. The KEY block contains the text for the selection element. The VALUE block contains the SELECTION_ELEMENT block. The following diagram shows how selection elements are represented by Block (p. 257) objects.

For more information about key-value pairs, see Form Data (Key-Value Pairs) (p. 14).

The following JSON snippet shows the key for a key-value pair that contains a selection element (Male ☑). The child ID (Id bd14cfd5-9005-498b-a7f3-45ceb171f0ff) is the ID of the WORD block that contains the text for the selection element (Male). The value ID (Id 24aaac7f-fcce-49c7-a4f0-3688b05586d4) is the ID of the VALUE block that contains the SELECTION_ELEMENT block object.

{

"Relationships": [ {

"Type": "VALUE", "Ids": [

"24aaac7f-fcce-49c7-a4f0-3688b05586d4" // Value containing Selection Element

] }, {

"Type": "CHILD", "Ids": [

"bd14cfd5-9005-498b-a7f3-45ceb171f0ff" // WORD - Male

]

} ],

"Confidence": 94.15619659423828, "Geometry": {

"BoundingBox": {

"Width": 0.022914813831448555, "Top": 0.08072036504745483, "Left": 0.18966935575008392, "Height": 0.014860388822853565 },

"Polygon": [ {

"Y": 0.08072036504745483, "X": 0.18966935575008392 },

{

"Y": 0.08072036504745483, "X": 0.21258416771888733 },

{

"Y": 0.09558075666427612, "X": 0.21258416771888733 },

{

"Y": 0.09558075666427612, "X": 0.18966935575008392 }

] },

"BlockType": "KEY_VALUE_SET", "EntityTypes": [

"KEY"

],

"Id": "a118dc43-d5f7-49a2-a20a-5f876d9ffd79"


}

The following JSON snippet is the WORD block for the word Male. The WORD block also has a parent LINE block.

{ "Geometry": {

"BoundingBox": {

"Width": 0.022464623674750328, "Top": 0.07842985540628433, "Left": 0.18863198161125183, "Height": 0.01617223583161831 },

"Polygon": [ {

"Y": 0.07842985540628433, "X": 0.18863198161125183 },

{

"Y": 0.07842985540628433, "X": 0.2110965996980667 },

{

"Y": 0.09460209310054779, "X": 0.2110965996980667 },

{

"Y": 0.09460209310054779, "X": 0.18863198161125183 }

] },

"Text": "Male", "BlockType": "WORD",

"Confidence": 54.06439208984375,

"Id": "bd14cfd5-9005-498b-a7f3-45ceb171f0ff"

},

The VALUE block has a child (Id f2f5e8cd-e73a-4e99-a095-053acd3b6bfb) that is the SELECTION_ELEMENT block.

{ "Relationships": [ {

"Type": "CHILD", "Ids": [

"f2f5e8cd-e73a-4e99-a095-053acd3b6bfb" // Selection element

]

} ],

"Confidence": 94.15619659423828, "Geometry": {

"BoundingBox": {

"Width": 0.017281491309404373, "Top": 0.07643391191959381, "Left": 0.2271782010793686, "Height": 0.026274094358086586 },

"Polygon": [ {

"Y": 0.07643391191959381, "X": 0.2271782010793686 },


{

"Y": 0.07643391191959381, "X": 0.24445968866348267 },

{

"Y": 0.10270800441503525, "X": 0.24445968866348267 },

{

"Y": 0.10270800441503525, "X": 0.2271782010793686 }

] },

"BlockType": "KEY_VALUE_SET", "EntityTypes": [

"VALUE"

],

"Id": "24aaac7f-fcce-49c7-a4f0-3688b05586d4"

}

The following JSON is the SELECTION_ELEMENT block. The value of SelectionStatus indicates that the check box is selected.

{

"Geometry": {

"BoundingBox": {

"Width": 0.020316146314144135, "Top": 0.07575977593660355, "Left": 0.22590067982673645, "Height": 0.027631107717752457 },

"Polygon": [ {

"Y": 0.07575977593660355, "X": 0.22590067982673645 },

{

"Y": 0.07575977593660355, "X": 0.2462168186903 },

{

"Y": 0.1033908873796463, "X": 0.2462168186903 },

{

"Y": 0.1033908873796463, "X": 0.22590067982673645 }

] },

"BlockType": "SELECTION_ELEMENT", "SelectionStatus": "SELECTED", "Confidence": 74.14942932128906,

"Id": "f2f5e8cd-e73a-4e99-a095-053acd3b6bfb"

}

Table Cells

Amazon Textract can detect selection elements inside a table cell. For example, the cells in the following table have check boxes.


Agree Neutral Disagree

Good Service ☑ ☐ ☐

Easy to Use ☐ ☑ ☐

Fair Price ☑ ☐ ☐

A CELL block can contain child SELECTION_ELEMENT objects for selection elements, as well as child WORD blocks for detected text.

For more information about tables, see Tables (p. 16).

The TABLE Block object for the previous table looks similar to this.

{ "Geometry": {...}, "Relationships": [ {

"Type": "CHILD", "Ids": [

"652c09eb-8945-473d-b1be-fa03ac055928",
"37efc5cc-946d-42cd-aa04-e68e5ed4741d",
"4a44940a-435a-4c5c-8a6a-7fea341fa295",
"2de20014-9a3b-4e26-b453-0de755144b1a",
"8ed78aeb-5c9a-4980-b669-9e08b28671d2",
"1f8e1c68-2c97-47b2-847c-a19619c02ca9",
"9927e1d1-6018-4960-ac17-aadb0a94f4d9",
"68f0ed8b-a887-42a5-b618-f68b494a6034",
"fcba16e0-6bd7-4ea5-b86e-36e8330b68ea",
"2250357c-ae34-4ed9-86da-45dac5a5e903",
"c63ad40d-5a14-4646-a8df-2d4304213dbc", // Cell
"2b8417dc-e65f-4fcd-aa0f-61a23f1e8cb0",
"26c62932-72f0-4dc2-9893-1ae27829c060",
"27f291cc-abf4-4c23-aa24-676abe99cb1e",
"7e5ce028-1bcd-4d9f-ad42-15ac181c5b47",
"bf32e3d2-efa2-4fc1-b09b-ab9cc52ff734"

] } ],

"BlockType": "TABLE",

"Confidence": 99.99993896484375,

"Id": "f66eac36-2e74-406e-8032-14d1c14e0b86"

}

The CELL Block object (Id c63ad40d-5a14-4646-a8df-2d4304213dbc) for the cell that contains the check box for Good Service looks like the following. It includes a child Block (Id 26d122fd-c5f4-4b53-92c4-0ae92730ee1e) that is the SELECTION_ELEMENT Block object for the check box.

{

"Geometry": {...}, "Relationships": [ {

"Type": "CHILD", "Ids": [

"26d122fd-c5f4-4b53-92c4-0ae92730ee1e" // Selection Element

] } ],

"Confidence": 79.741689682006836,

"RowSpan": 1, "RowIndex": 3, "ColumnIndex": 3, "ColumnSpan": 1, "BlockType": "CELL",

"Id": "c63ad40d-5a14-4646-a8df-2d4304213dbc"

}

The SELECTION_ELEMENT Block object for the check box is as follows. The value of SelectionStatus indicates that the check box is selected.

{

"Geometry": {...},

"BlockType": "SELECTION_ELEMENT", "SelectionStatus": "SELECTED", "Confidence": 88.79517364501953,

"Id": "26d122fd-c5f4-4b53-92c4-0ae92730ee1e"

}
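Because a CELL block can hold both WORD and SELECTION_ELEMENT children, code that reads a cell usually has to branch on BlockType. The following Python sketch shows one way to do that, using the CELL and SELECTION_ELEMENT blocks from the preceding example. The function name cell_contents is hypothetical.

```python
def cell_contents(cell_block, by_id):
    """Return the cell's children as strings: word text or selection status."""
    parts = []
    for relationship in cell_block.get("Relationships", []):
        if relationship["Type"] != "CHILD":
            continue
        for child_id in relationship["Ids"]:
            child = by_id.get(child_id)
            if child is None:
                continue
            if child["BlockType"] == "WORD":
                parts.append(child["Text"])
            elif child["BlockType"] == "SELECTION_ELEMENT":
                # SelectionStatus is either SELECTED or NOT_SELECTED.
                parts.append(child["SelectionStatus"])
    return parts


# Blocks from the preceding example, with Geometry and Confidence omitted.
sample_blocks = [
    {
        "BlockType": "CELL",
        "Id": "c63ad40d-5a14-4646-a8df-2d4304213dbc",
        "RowIndex": 3, "ColumnIndex": 3, "RowSpan": 1, "ColumnSpan": 1,
        "Relationships": [{"Type": "CHILD",
                           "Ids": ["26d122fd-c5f4-4b53-92c4-0ae92730ee1e"]}],
    },
    {
        "BlockType": "SELECTION_ELEMENT",
        "Id": "26d122fd-c5f4-4b53-92c4-0ae92730ee1e",
        "SelectionStatus": "SELECTED",
    },
]

by_id = {block["Id"]: block for block in sample_blocks}
print(cell_contents(sample_blocks[0], by_id))
```

The same branching applies to a VALUE block in a key-value pair, since its CHILD list can also point to a SELECTION_ELEMENT block.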

Invoice and Receipt Response Objects

When you submit an invoice or a receipt to the AnalyzeExpense API, it returns a series of ExpenseDocuments objects. Each ExpenseDocument is further separated into LineItemGroups and SummaryFields. Most invoices and receipts contain information such as the vendor name, receipt number, receipt date, or total amount. AnalyzeExpense returns this information under SummaryFields.

Receipts and invoices also contain details about the items purchased. The AnalyzeExpense API returns this information under LineItemGroups. The ExpenseIndex field uniquely identifies the expense, and associates the appropriate SummaryFields and LineItemGroups detected in that expense.

The most granular level of data in the AnalyzeExpense response consists of Type, ValueDetection, and LabelDetection (Optional). The individual entities are:

• Type (p. 24): Refers to what kind of information is detected on a high level.

• LabelDetection (p. 24): Refers to the label of an associated value within the text of the document. LabelDetection is optional and is returned only if the label is written on the document.

• ValueDetection (p. 24): Refers to the value of the label or type returned.

The AnalyzeExpense API also detects ITEM, QUANTITY, and PRICE within line items as normalized fields.

If a line item on the receipt image contains other text, such as a SKU or a detailed description, that text is included in the JSON as EXPENSE_ROW, as shown in the following example:

{

"Type": {

"Text": "EXPENSE_ROW",

"Confidence": 99.95216369628906 },

"ValueDetection": {

"Text": "Banana 5 $2.5", "Geometry": {

… },

"Confidence": 98.11214447021484 }

}

The example above shows how the AnalyzeExpense API returns the entire row on a receipt that contains line item information about 5 bananas sold for $2.5.


Type

Following is an example of the standard or normalized type of the key-value pair:

{

"PageNumber": 1, "Type": {

"Text": "VENDOR_NAME", "Confidence": 70.0 },

"ValueDetection": { "Geometry": { ... }, "Text": "AMAZON",

"Confidence": 87.89806365966797 }

}

The receipt did not have “Vendor Name” explicitly listed. However, the AnalyzeExpense API recognized the document as a receipt and categorized the value “AMAZON” as Type VENDOR_NAME.

LabelDetection

Following is an example of text as it is shown on a customer document page:

{

"PageNumber": 1, "Type": {

"Text": "OTHER", "Confidence": 70.0 },

"LabelDetection": { "Geometry": { ... }, "Text": "CASHIER",

"Confidence": 88.19171142578125 },

"ValueDetection": { "Geometry": { ... }, "Text": "Mina",

"Confidence": 87.89806365966797 }

}

The example document contained “CASHIER Mina”. The AnalyzeExpense API extracted the label as written (CASHIER) and returned it under LabelDetection. For implied values such as “Vendor Name”, where the key isn't explicitly shown on the receipt, the AnalyzeExpense API doesn't return LabelDetection.

ValueDetection

The following example shows the value of the key-value pair.

{

"PageNumber": 1, "Type": {

"Text": "OTHER",


"Confidence": 70.0 },

"LabelDetection": { "Geometry": { ... }, "Text": "CASHIER",

"Confidence": 88.19171142578125 },

"ValueDetection": { "Geometry": { ... }, "Text": "Mina",

"Confidence": 87.89806365966797 }

}

In the example, the document contained “CASHIER Mina”. The AnalyzeExpense API detected the Cashier value as Mina and returned it under ValueDetection.
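The Type, LabelDetection, and ValueDetection entities described above can be flattened into simple records with a few lines of Python. This is a sketch under stated assumptions: the function name summarize_fields is hypothetical, and the input is a list of field objects shaped like the Type and LabelDetection examples in this section (Geometry omitted). Because LabelDetection is optional, the sketch reads it defensively.

```python
def summarize_fields(fields):
    """Flatten expense fields into dicts with type, label, and value text."""
    rows = []
    for field in fields:
        rows.append({
            "type": field["Type"]["Text"],
            # LabelDetection is only present when the label is written on the document.
            "label": field.get("LabelDetection", {}).get("Text"),
            "value": field.get("ValueDetection", {}).get("Text"),
        })
    return rows


# Fields copied from the Type and LabelDetection examples above.
sample_fields = [
    {
        "PageNumber": 1,
        "Type": {"Text": "VENDOR_NAME", "Confidence": 70.0},
        "ValueDetection": {"Text": "AMAZON", "Confidence": 87.89806365966797},
    },
    {
        "PageNumber": 1,
        "Type": {"Text": "OTHER", "Confidence": 70.0},
        "LabelDetection": {"Text": "CASHIER", "Confidence": 88.19171142578125},
        "ValueDetection": {"Text": "Mina", "Confidence": 87.89806365966797},
    },
]

for row in summarize_fields(sample_fields):
    print(row)
```

For the implied VENDOR_NAME field the label comes back as None, which mirrors the absence of LabelDetection in the response.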

Identity Documentation Response Objects

When you submit an identity document to the AnalyzeID API, it returns a series of IdentityDocumentField objects. Each of these objects contains Type and ValueDetection. Type records the normalized field that Amazon Textract detects, and ValueDetection records the text associated with the normalized field.

Below is an example of an IdentityDocumentField, shortened for brevity.

{ "DocumentMetadata": { "Pages": 1

},

"IdentityDocumentFields": [ {

"Type": {

"Text": "first name"

},

"ValueDetection": { "Text": "jennifer",

"Confidence": 99.99908447265625 }

}, {

"Type": {

"Text": "last name"

},

"ValueDetection": { "Text": "sample",

"Confidence": 99.99758911132812 }

},

These are two examples of IdentityDocumentFields cut from a longer response. There is a separation between the type detected and the value for that type. Here, the types are the first name and the last name, respectively.

This structure repeats with all contained information. If a type is not recognized as a normalized field, it will be listed as "other".

Following is a list of normalized fields for Driver's Licenses:

• first name


• last name

• middle name

• suffix

• city in address

• zip code in address

• state in address

• county

• document number

• expiration date

• date of birth

• state name

• date of issue

• class

• restrictions

• endorsements

• id type

• veteran

• address

Following is a list of normalized fields for U.S. passports:

• first name

• last name

• middle name

• document number

• expiration date

• date of birth

• place of birth

• date of issue

• id type
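To work with the normalized fields listed above, it can help to group the detected values by field name. The following Python sketch does that for a list of IdentityDocumentField objects like the ones in the shortened example. The function name and the min_confidence parameter are hypothetical. Values are collected in lists because fields reported as "other" can plausibly occur more than once in a document.

```python
def identity_fields_by_type(fields, min_confidence=0.0):
    """Group detected values by normalized field name, filtering by confidence."""
    grouped = {}
    for field in fields:
        name = field["Type"]["Text"]
        detection = field.get("ValueDetection", {})
        if detection.get("Confidence", 0.0) < min_confidence:
            continue
        grouped.setdefault(name, []).append(detection.get("Text", ""))
    return grouped


# Fields copied from the shortened IdentityDocumentFields example.
sample_fields = [
    {"Type": {"Text": "first name"},
     "ValueDetection": {"Text": "jennifer", "Confidence": 99.99908447265625}},
    {"Type": {"Text": "last name"},
     "ValueDetection": {"Text": "sample", "Confidence": 99.99758911132812}},
]

print(identity_fields_by_type(sample_fields))
```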

Item Location on a Document Page

Amazon Textract operations return the location and geometry of items found on a document page.

DetectDocumentText (p. 222) and GetDocumentTextDetection (p. 231) return the location and geometry for lines and words, while AnalyzeDocument (p. 209) and GetDocumentAnalysis (p. 226) return the location and geometry of key-value pairs, tables, cells, and selection elements.

To determine where an item is on a document page, use the bounding box (Geometry (p. 269)) information returned by the Amazon Textract operation in a Block (p. 257) object. The Geometry object contains two types of location and geometric information for detected items:

• An axis-aligned BoundingBox (p. 261) object that contains the top-left coordinate and the width and height of the item.

• A polygon object that describes the outline of the item, specified as an array of Point (p. 280) objects that contain X (horizontal axis) and Y (vertical axis) document page coordinates of each point.


The JSON for a Block object looks similar to the following. Note the BoundingBox and Polygon fields.

{ "Geometry": {

"BoundingBox": {

"Width": 0.053907789289951324, "Top": 0.08913730084896088, "Left": 0.11085548996925354, "Height": 0.013171200640499592 },

"Polygon": [ {

"Y": 0.08985357731580734, "X": 0.11085548996925354 },

{

"Y": 0.08913730084896088, "X": 0.16447919607162476 },

{

"Y": 0.10159222036600113, "X": 0.16476328670978546 },

{

"Y": 0.10230850428342819, "X": 0.11113958805799484 }

] },

"Text": "Name:", "TextType": "PRINTED", "BlockType": "WORD",

"Confidence": 99.56285858154297,

"Id": "c734fca6-c4c4-415c-b6c1-30f7510b72ee"

},

You can use geometry information to draw bounding boxes around detected items. For an example that uses BoundingBox and Polygon information to draw boxes around lines and vertical lines at the start and end of each word, see Detecting Document Text with Amazon Textract (p. 84). The example output is similar to the following.

Bounding Box

A bounding box (BoundingBox) has the following properties:

• Height – The height of the bounding box as a ratio of the overall document page height.

• Left – The X coordinate of the top-left point of the bounding box as a ratio of the overall document page width.

• Top – The Y coordinate of the top-left point of the bounding box as a ratio of the overall document page height.

• Width – The width of the bounding box as a ratio of the overall document page width.

Each BoundingBox property has a value between 0 and 1. The value is a ratio of the overall image width (applies to Left and Width) or height (applies to Height and Top). For example, if the input image is 700 x 200 pixels, and the top-left coordinate of the bounding box is (350,50) pixels, the API returns a Left value of 0.5 (350/700) and a Top value of 0.25 (50/200).

The following diagram shows the range of a document page that each BoundingBox property covers.


To display the bounding box with the correct location and size, you have to multiply the BoundingBox values by the document page width or height (depending on the value you want) to get the pixel values.

You use the pixel values to display the bounding box. For example, consider a document page that is 608 pixels wide and 588 pixels high, with the following bounding box values for analyzed text:

BoundingBox.Left: 0.3922065

BoundingBox.Top: 0.15567766

BoundingBox.Width: 0.284666

BoundingBox.Height: 0.2930403

The location of the text bounding box in pixels is calculated as follows, with fractional pixels dropped:

Left coordinate = BoundingBox.Left (0.3922065) * document page width (608) = 238

Top coordinate = BoundingBox.Top (0.15567766) * document page height (588) = 91

Bounding box width = BoundingBox.Width (0.284666) * document page width (608) = 173

Bounding box height = BoundingBox.Height (0.2930403) * document page height (588) = 172
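The arithmetic above can be checked with a small Python helper. This is a minimal sketch: the function name bounding_box_to_pixels is hypothetical, and it truncates fractional pixels with int() to match the worked values rather than rounding.

```python
def bounding_box_to_pixels(box, page_width, page_height):
    """Convert a ratio-based BoundingBox dict to whole-pixel values.

    Left and Width scale with the page width; Top and Height scale with
    the page height. Fractional pixels are truncated.
    """
    left = int(box["Left"] * page_width)
    top = int(box["Top"] * page_height)
    width = int(box["Width"] * page_width)
    height = int(box["Height"] * page_height)
    return left, top, width, height


example_box = {
    "Left": 0.3922065,
    "Top": 0.15567766,
    "Width": 0.284666,
    "Height": 0.2930403,
}

# A 608 x 588 pixel page, as in the worked example.
print(bounding_box_to_pixels(example_box, 608, 588))  # (238, 91, 173, 172)
```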

You use these values to display a bounding box around the analyzed text. The following Java and Python examples demonstrate how to display a bounding box.

Java

// Assumes that the enclosing class defines a float field named scale
// (the display scale factor) and imports java.awt.Color and java.awt.Graphics2D.
public void ShowBoundingBox(int imageHeight, int imageWidth, BoundingBox box, Graphics2D g2d) {
    float left = imageWidth * box.getLeft();
    float top = imageHeight * box.getTop();

    // Display bounding box.
    g2d.setColor(new Color(0, 212, 0));
    g2d.drawRect(Math.round(left / scale), Math.round(top / scale),
        Math.round((imageWidth * box.getWidth()) / scale),
        Math.round((imageHeight * box.getHeight()) / scale));
}

Python

This Python example takes in the response returned by the DetectDocumentText (p. 222) API operation, together with the input document opened as a Pillow Image object.

from PIL import ImageDraw

def process_text_detection(response, image):
    # Get the text blocks
    blocks = response['Blocks']
    width, height = image.size
    draw = ImageDraw.Draw(image)
    print('Detected Document Text')

    # Draw a bounding box around each detected line of text
    for block in blocks:
        if block['BlockType'] == "LINE":
            box = block['Geometry']['BoundingBox']
            left = width * box['Left']
            top = height * box['Top']
            draw.rectangle([left, top, left + (width * box['Width']), top + (height * box['Height'])], outline='black')

    # Display the image
    image.show()
    return len(blocks)

Polygon

The polygon returned by AnalyzeDocument is an array of Point (p. 280) objects. Each Point has an X and Y coordinate for a specific location on the document page. Like the BoundingBox coordinates, the polygon coordinates are normalized to the document width and height, and are between 0 and 1.

You can use points in the polygon array to display a finer-grain bounding box around a Block object.

You calculate the position of each polygon point on the document page by using the same technique used for BoundingBoxes. Multiply the X coordinate by the document page width, and multiply the Y coordinate by the document page height.

The following example shows how to display the vertical lines of a polygon.

public void ShowPolygonVerticals(int imageHeight, int imageWidth, List<Point> points, Graphics2D g2d) {
    g2d.setColor(new Color(0, 212, 0));
    Object[] parry = points.toArray();
    g2d.setStroke(new BasicStroke(2));

    // Line between polygon points 0 and 3.
    g2d.drawLine(Math.round(((Point) parry[0]).getX() * imageWidth),
        Math.round(((Point) parry[0]).getY() * imageHeight),
        Math.round(((Point) parry[3]).getX() * imageWidth),
        Math.round(((Point) parry[3]).getY() * imageHeight));

    // Line between polygon points 1 and 2.
    g2d.setColor(new Color(255, 0, 0));
    g2d.drawLine(Math.round(((Point) parry[1]).getX() * imageWidth),
        Math.round(((Point) parry[1]).getY() * imageHeight),
        Math.round(((Point) parry[2]).getX() * imageWidth),
        Math.round(((Point) parry[2]).getY() * imageHeight));

}
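The same scaling can be sketched in Python for a whole polygon. The function name polygon_to_pixels is hypothetical, and the 1000 x 1000 pixel page size is an assumption chosen only to make the numbers easy to read. The sample points are the Polygon values from the Block JSON shown earlier in this section.

```python
def polygon_to_pixels(polygon, page_width, page_height):
    """Scale normalized Polygon points to (x, y) pixel tuples."""
    return [(round(point["X"] * page_width), round(point["Y"] * page_height))
            for point in polygon]


# Polygon of the WORD block for Name: from the earlier JSON example.
sample_polygon = [
    {"Y": 0.08985357731580734, "X": 0.11085548996925354},
    {"Y": 0.08913730084896088, "X": 0.16447919607162476},
    {"Y": 0.10159222036600113, "X": 0.16476328670978546},
    {"Y": 0.10230850428342819, "X": 0.11113958805799484},
]

# Assumed page size of 1000 x 1000 pixels, for illustration only.
pixel_points = polygon_to_pixels(sample_polygon, 1000, 1000)
print(pixel_points)
```

The resulting list of tuples is in the form that drawing libraries commonly accept for polygon outlines, for example Pillow's ImageDraw.polygon.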
