Amazon Textract

Amazon Textract: Developer Guide

Copyright © Amazon Web Services, Inc. and/or its affiliates. All rights reserved.

Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be affiliated with, connected to, or sponsored by Amazon.

What is Amazon Textract?

Amazon Textract makes it easy to add document text detection and analysis to your applications. Using Amazon Textract, you can:

• Detect typed and handwritten text in a variety of documents, including financial reports, medical records, and tax forms.

• Extract text, forms, and tables from documents with structured data, using the Amazon Textract Document Analysis API.

• Process invoices and receipts with the AnalyzeExpense API.

• Process ID documents such as driver's licenses and passports issued by the U.S. government, using the AnalyzeID API.

Amazon Textract is based on the same proven, highly scalable, deep-learning technology that was developed by Amazon's computer vision scientists to analyze billions of images and videos daily. You don't need any machine learning expertise to use it. Amazon Textract includes simple, easy-to-use APIs that can analyze image files and PDF files. Amazon Textract is always learning from new data, and Amazon is continually adding new features to the service.

The following are common use cases for Amazon Textract:

• Creating an intelligent search index – Using Amazon Textract, you can create libraries of text that is detected in image and PDF files.

• Using intelligent text extraction for natural language processing (NLP) – Amazon Textract provides you with control over how text is grouped as an input for NLP applications. It can extract text as words and lines. It also groups text by table cells if Amazon Textract document table analysis is enabled.

• Accelerating the capture and normalization of data from different sources – Amazon Textract enables text and tabular data extraction from a wide variety of documents, such as financial documents, research reports, and medical notes. With Amazon Textract Analyze Document APIs, you can easily and quickly extract unstructured and structured data from your documents.

• Automating data capture from forms – Amazon Textract enables structured data to be extracted from forms. With Amazon Textract Analysis APIs, you can build extraction capabilities into existing business workflows so that user data submitted through forms can be extracted into a usable format.

Some of the benefits of using Amazon Textract include:

• Integration of document text detection into your apps – Amazon Textract removes the complexity of building text detection capabilities into your applications by making powerful and accurate analysis available with a simple API. You don’t need computer vision or deep learning expertise to use Amazon Textract to detect document text. With Amazon Textract Text APIs, you can easily build text detection into any web, mobile, or connected device application.

• Scalable document analysis – Amazon Textract enables you to analyze and extract data quickly from millions of documents, which can accelerate decision making.

• Low cost – With Amazon Textract, you only pay for the documents you analyze. There are no minimum fees or upfront commitments. You can get started for free, and save more as you grow with our tiered pricing model.

With synchronous processing, Amazon Textract can analyze single-page documents for applications where latency is critical. Amazon Textract also provides asynchronous operations to extend support to multipage documents.

First-Time Amazon Textract Users

If this is your first time using Amazon Textract, we recommend that you read the following sections in order:

1. How Amazon Textract Works (p. 3) – This section introduces the Amazon Textract components and how they work together for an end-to-end experience.

2. Getting Started with Amazon Textract (p. 30) – In this section, you set up your account and test the Amazon Textract API.

Using Amazon Textract with an AWS SDK

AWS software development kits (SDKs) are available for many popular programming languages. Each SDK provides an API, code examples, and documentation that make it easier for developers to build applications in their preferred language.

SDK documentation Code examples

AWS SDK for C++ AWS SDK for C++ code examples

AWS SDK for Go AWS SDK for Go code examples

AWS SDK for Java AWS SDK for Java code examples

AWS SDK for JavaScript AWS SDK for JavaScript code examples

AWS SDK for .NET AWS SDK for .NET code examples

AWS SDK for PHP AWS SDK for PHP code examples

AWS SDK for Python (Boto3) AWS SDK for Python (Boto3) code examples

AWS SDK for Ruby AWS SDK for Ruby code examples

How Amazon Textract Works

Amazon Textract enables you to detect and analyze text in single or multipage input documents (see Input Documents (p. 8)).

Amazon Textract provides operations for the following actions.

• Detecting text only. For more information, see Detecting Text (p. 3).

• Detecting and analyzing relationships between text. For more information, see Analyzing Documents (p. 4).

• Detecting and analyzing text in invoices and receipts. For more information, see Analyzing Invoices and Receipts (p. 5).

• Detecting and analyzing text in government identity documents. For more information, see Analyzing Identity Documents (p. 7).

Amazon Textract provides synchronous operations for processing small, single-page documents with near real-time responses. For more information, see Processing Documents with Synchronous Operations (p. 34). Amazon Textract also provides asynchronous operations that you can use to process larger, multipage documents. Asynchronous responses aren't in real time. For more information, see Processing Documents with Asynchronous Operations (p. 113).

When an Amazon Textract operation processes a document, the results are returned in an array of Block (p. 257) objects or an array of ExpenseDocument (p. 266) objects. Both objects contain information that's detected about items, including their location on the document and their relationship to other items on the document. For more information, see Amazon Textract Response Objects (p. 9). For examples that show how to use Block objects, see Tutorials (p. 149).

Topics

• Detecting Text (p. 3)

• Analyzing Documents (p. 4)

• Analyzing Invoices and Receipts (p. 5)

• Analyzing Identity Documents (p. 7)

• Input Documents (p. 8)

• Amazon Textract Response Objects (p. 9)

• Item Location on a Document Page (p. 26)

Detecting Text

Amazon Textract provides synchronous and asynchronous operations that return only the text detected in a document. For both sets of operations, the following information is returned in multiple Block (p. 257) objects.

• The lines and words of detected text

• The relationships between the lines and words of detected text

• The page that the detected text appears on

• The location of the lines and words of text on the document page

For more information, see Lines and Words of Text (p. 12).

To detect text synchronously, use the DetectDocumentText (p. 222) API operation, and pass a document file as input. The entire set of results is returned by the operation. For more information and an example, see Processing Documents with Synchronous Operations (p. 34).

Note

The Amazon Rekognition API operation DetectText is different from DetectDocumentText. You use DetectText to detect text in live scenes, such as posters or road signs.

To detect text asynchronously, use StartDocumentTextDetection (p. 247) to start processing an input document file. To get the results, call GetDocumentTextDetection (p. 231). The results are returned in one or more responses from GetDocumentTextDetection. For more information and an example, see Processing Documents with Asynchronous Operations (p. 113).
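Because asynchronous results can span more than one response, a caller typically loops until no further response is pending. The following is a minimal sketch of that loop, not a definitive implementation. It assumes that each response is a dictionary with a Blocks list and an optional NextToken pagination field, and fetch_page is a hypothetical callable standing in for whatever function actually calls GetDocumentTextDetection.

```python
from typing import Callable, Dict, List, Optional


def collect_blocks(fetch_page: Callable[[Optional[str]], Dict]) -> List[Dict]:
    """Gather Block dictionaries from every response of one asynchronous job.

    fetch_page is a hypothetical callable: it receives the pagination token
    (None for the first call) and returns a single response dictionary.
    """
    blocks: List[Dict] = []
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        blocks.extend(page.get("Blocks", []))
        # Assumption: a missing or empty NextToken means this was the last response.
        token = page.get("NextToken")
        if not token:
            return blocks
```

With an AWS SDK, fetch_page could wrap the SDK's GetDocumentTextDetection call for a given job ID. The helper itself only depends on the shape of the response dictionaries.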

Analyzing Documents

Amazon Textract analyzes documents and forms for relationships among detected text. Amazon Textract analysis operations return three categories of document extraction: text, forms, and tables. The analysis of invoices and receipts is handled through a different process. For more information, see Analyzing Invoices and Receipts (p. 5).

Text Extraction

The raw text extracted from a document. For more information, see Lines and words of text (p. 12).

Form Extraction

Form data is linked to text items extracted from a document. Amazon Textract represents form data as key-value pairs. In the following example, one of the lines of text detected by Amazon Textract is Name: Jane Doe. Amazon Textract also identifies a key (Name:) and a value (Jane Doe). For more information, see Form data (Key-value pairs) (p. 14).

Name: Jane Doe

Address: 123 Any Street, Anytown, USA

Birth date: 12-26-1980

Key-value pairs are also used to represent check boxes or option buttons (radio buttons) that are extracted from forms.

Male: ☑

For more information, see Selection elements (p. 18).

Table Extraction

Amazon Textract can extract tables, table cells, and the items within table cells, and can be programmed to return the results in a JSON, .csv, or .txt file.

Name            Address

Ana Carolina    123 Any Town

For more information, see Tables (p. 16). Selection elements can also be extracted from tables. For more information, see Selection elements (p. 18).

For analyzed items, Amazon Textract returns the following in multiple Block (p. 257) objects:

• The lines and words of detected text

• The content of detected items

• The relationship between detected items

• The page that the item was detected on

• The location of the item on the document page

You can use synchronous or asynchronous operations to analyze text in a document. To analyze text synchronously, use the AnalyzeDocument (p. 209) operation, and pass a document as input. AnalyzeDocument returns the entire set of results. For more information, see Analyzing Document Text with Amazon Textract (p. 92).

To analyze text asynchronously, use StartDocumentAnalysis (p. 242) to start processing. To get the results, call GetDocumentAnalysis (p. 226). The results are returned in one or more responses from GetDocumentAnalysis. For more information and an example, see Detecting or Analyzing Text in a Multipage Document (p. 124).

To specify which type of analysis to perform, you can use the FeatureTypes list input parameter. Add TABLES to the list to return information about the tables that are detected in the input document—for example, table cells, cell text, and selection elements in cells. Add FORMS to return word relationships, such as key-value pairs and selection elements. To perform both types of analysis, add both TABLES and FORMS to FeatureTypes.

All lines and words that are detected in the document are included in the response (including text not related to the value of FeatureTypes).
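To make the FeatureTypes choice concrete, the following sketch assembles the input parameters for an analysis request. It is illustrative only. The function name is hypothetical, and the Document, S3Object, Bucket, and Name keys are assumptions about the request shape that this section does not show. Only TABLES and FORMS are accepted, because those are the two values described above.

```python
from typing import Dict, Iterable, List

# The two analysis types described in this section.
VALID_FEATURES = ("TABLES", "FORMS")


def build_analyze_request(bucket: str, name: str, features: Iterable[str]) -> Dict:
    """Return a parameter dictionary for an analysis call (hypothetical helper)."""
    requested: List[str] = []
    for feature in features:
        upper = feature.upper()
        if upper not in VALID_FEATURES:
            raise ValueError(f"unsupported feature type: {feature}")
        if upper not in requested:  # keep each feature type once, in request order
            requested.append(upper)
    if not requested:
        raise ValueError("at least one feature type is required")
    return {
        # Assumed request shape: a document stored in an Amazon S3 bucket.
        "Document": {"S3Object": {"Bucket": bucket, "Name": name}},
        "FeatureTypes": requested,
    }
```

Passing both values, for example build_analyze_request("my-bucket", "form.png", ["TABLES", "FORMS"]), requests both types of analysis. The bucket and file names here are placeholders.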

Analyzing Invoices and Receipts

Amazon Textract extracts relevant data such as contact information, items purchased, and vendor name, from almost any invoice or receipt without the need for any templates or configuration. Invoices and receipts often use various layouts, making it difficult and time-consuming to manually extract data at scale. Amazon Textract uses ML to understand the context of invoices and receipts and automatically extracts data such as invoice or receipt date, invoice or receipt number, item prices, total amount, and payment terms to suit your business needs.

Amazon Textract also identifies vendor names that are critical for your workflows but may not be explicitly labeled. For example, Amazon Textract can find the vendor name on a receipt even if it's only indicated within a logo at the top of the page without an explicit key-value pair combination.

Amazon Textract also makes it easy for you to consolidate input from diverse receipts and invoices that use different words for the same concept. For example, Amazon Textract maps relationships between field names in different documents such as customer no., customer number, and account ID, outputting standard taxonomy as INVOICE_RECEIPT_ID. In this case, Amazon Textract represents data consistently across different document types. Fields that do not align with the standard taxonomy are categorized as OTHER.

The following is a list of the standard fields that AnalyzeExpense currently supports:

• Vendor Name: VENDOR_NAME

• Total: TOTAL

• Receiver Address: RECEIVER_ADDRESS

• Invoice/Receipt Date: INVOICE_RECEIPT_DATE

• Invoice/Receipt ID: INVOICE_RECEIPT_ID

• Payment Terms: PAYMENT_TERMS

• Subtotal: SUBTOTAL

• Due Date: DUE_DATE

• Tax: TAX

• Invoice Tax Payer ID (SSN/ITIN or EIN): TAX_PAYER_ID

• Item Name: ITEM_NAME

• Item Price: PRICE

• Item Quantity: QUANTITY

The AnalyzeExpense API returns the following elements for a given document page:

• The number of receipts or invoices within a page represented as ExpenseIndex

• The standardized name for individual fields represented as Type

• The actual name of the field as it appears on the document, represented as LabelDetection

• The value of the corresponding field represented as ValueDetection

• The number of pages within the submitted document represented as Pages

• The page number on which the field, value, or line items was detected, represented as PageNumber

• The geometry, which includes the bounding box and coordinates location of the individual field, value, or line items on the page, represented as Geometry

• The confidence score associated with each piece of data detected on the document, represented as Confidence

• The entire row of individual line items purchased, represented as EXPENSE_ROW

The following is a portion of the API output for a receipt processed by AnalyzeExpense. It shows the document text Total: $55.64 extracted as the standard field TOTAL, with the label text “Total:” (confidence score 97.1), the value “$55.64”, page number 1, and the bounding box and polygon coordinates:

{
    "Type": {
        "Text": "TOTAL",
        "Confidence": 99.94717407226562
    },
    "LabelDetection": {
        "Text": "Total:",
        "Geometry": {
            "BoundingBox": {
                "Width": 0.09809663146734238,
                "Height": 0.0234375,
                "Left": 0.36822840571403503,
                "Top": 0.8017578125
            },
            "Polygon": [
                {
                    "X": 0.36822840571403503,
                    "Y": 0.8017578125
                },
                {
                    "X": 0.466325044631958,
                    "Y": 0.8017578125
                },
                {
                    "X": 0.466325044631958,
                    "Y": 0.8251953125
                },
                {
                    "X": 0.36822840571403503,
                    "Y": 0.8251953125
                }
            ]
        },
        "Confidence": 97.10792541503906
    },
    "ValueDetection": {
        "Text": "$55.64",
        "Geometry": {
            "BoundingBox": {
                "Width": 0.10395314544439316,
                "Height": 0.0244140625,
                "Left": 0.66837477684021,
                "Top": 0.802734375
            },
            "Polygon": [
                {
                    "X": 0.66837477684021,
                    "Y": 0.802734375
                },
                {
                    "X": 0.7723279595375061,
                    "Y": 0.802734375
                },
                {
                    "X": 0.7723279595375061,
                    "Y": 0.8271484375
                },
                {
                    "X": 0.66837477684021,
                    "Y": 0.8271484375
                }
            ]
        },
        "Confidence": 99.85165405273438
    },
    "PageNumber": 1
}

You can use synchronous operations to analyze an invoice or receipt. To analyze these documents, you use the AnalyzeExpense operation and pass a receipt or invoice to it. AnalyzeExpense returns the entire set of results. For more information, see Analyzing Invoices and Receipts with Amazon Textract (p. 100).

To analyze invoices and receipts asynchronously, use StartExpenseAnalysis (p. 251) to start processing an input document file. To get the results, call GetExpenseAnalysis (p. 236), which returns the results for a given call to StartExpenseAnalysis. For more information and an example, see Processing Documents with Asynchronous Operations (p. 113).
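As an illustration of how the Type, LabelDetection, ValueDetection, and PageNumber elements fit together, the following sketch flattens one expense field, shaped like the TOTAL sample in this section, into a compact summary. The function name and the keys of the returned dictionary are hypothetical choices made for this example.

```python
from typing import Dict, Optional


def summarize_expense_field(field: Dict) -> Dict[str, Optional[object]]:
    """Reduce one expense field dictionary to its text values and lowest confidence."""
    field_type = field.get("Type", {})
    label = field.get("LabelDetection", {})
    value = field.get("ValueDetection", {})
    # Collect whichever confidence scores are present on the three parts.
    confidences = [
        part["Confidence"] for part in (field_type, label, value) if "Confidence" in part
    ]
    return {
        "type": field_type.get("Text"),    # standardized name, for example TOTAL
        "label": label.get("Text"),        # name as printed on the document
        "value": value.get("Text"),        # detected value
        "page": field.get("PageNumber"),
        "lowest_confidence": min(confidences) if confidences else None,
    }
```

Reporting the lowest of the three confidence scores is a conservative design choice for this sketch: a field is treated as only as reliable as its least certain part.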

Analyzing Identity Documents

Amazon Textract can extract relevant information from passports, driver's licenses, and other identity documentation issued by the U.S. government using the AnalyzeID API. With AnalyzeID, businesses can quickly and accurately extract information from IDs such as U.S. driver's licenses, state IDs, and passports that have different templates or formats. The AnalyzeID API returns two categories of data types:

• Key-value pairs available on the ID, such as Date of Birth, Date of Issue, ID #, Class, and Restrictions.

• Implied fields on the document that may not have explicit keys associated with them, such as Name, Address, and Issued By.

Key names are standardized within the response. For example, if your driver's license says LIC# (license number) and your passport says Passport No, the AnalyzeID response returns the standardized key as “Document ID” along with the raw key (for example, LIC#). This standardization lets customers easily combine information across many IDs that use different terms for the same concept.

AnalyzeID returns information in structures called IdentityDocumentFields. These are JSON structures containing two pieces of information: the normalized Type and the Value associated with the Type. Both also have a confidence score. For more information, see Identity Documentation Response Objects (p. 25).
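A list of such fields can be reduced to a simple lookup table. The following is a rough sketch under stated assumptions: each field is taken to be a dictionary whose Type and ValueDetection entries each hold Text and Confidence keys. The ValueDetection key name, the function name, and the default threshold are assumptions for illustration and are not shown in this section.

```python
from typing import Dict, List


def identity_fields_to_dict(fields: List[Dict], min_confidence: float = 0.0) -> Dict[str, str]:
    """Map each normalized Type to its value text, skipping low-confidence values."""
    result: Dict[str, str] = {}
    for field in fields:
        key = field["Type"]["Text"]
        # Assumption: the value is stored under a ValueDetection entry.
        value = field.get("ValueDetection", {})
        if value.get("Confidence", 0.0) >= min_confidence:
            result[key] = value.get("Text", "")
    return result
```

Because key names are standardized, the resulting dictionaries from different identity documents can be compared or merged by key.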

You can use synchronous operations to analyze a driver's license or passport. To analyze these documents, you use the AnalyzeID operation and pass an identity document to it. AnalyzeID returns the entire set of results. For more information, see Analyzing Identity Documentation with Amazon Textract (p. 109).

Note

Some identity documents, such as driver's licenses, have two sides. You can pass the front and back images of driver licenses as separate images within the same Analyze ID API request.

Input Documents

A suitable input for an Amazon Textract operation is a single or multipage document. Some examples are a legal document, a form, an ID, or a letter. A form is a document with questions or prompts for a user to provide answers. Some examples are a patient registration form, a tax form, or an insurance claim form.

A document can be in JPEG, PNG, PDF, or TIFF format. With PDF and TIFF format files, you can process multipage documents. For information about how Amazon Textract represents documents as Block objects, see Text Detection and Document Analysis Response Objects (p. 10).

The following is an acceptable input document example.

For information about document limits, see Hard Limits in Amazon Textract (p. 284).

For Amazon Textract synchronous operations, you can use input documents that are stored in an Amazon S3 bucket, or you can pass base64-encoded image bytes. For more information, see Calling Amazon Textract Synchronous Operations (p. 34). For asynchronous operations, you need to supply input documents in an Amazon S3 bucket. For more information, see Calling Amazon Textract Asynchronous Operations (p. 113).

Amazon Textract Response Objects

Amazon Textract operations return different types of objects depending on the operations run. For detecting text and analyzing a generic document, the operation returns a Block object. For analyzing an invoice or receipt, the operation returns an ExpenseDocuments object. For analyzing identity documentation, the operation returns an IdentityDocumentFields object. For more information about these response objects, see the following sections:

Topics

• Text Detection and Document Analysis Response Objects (p. 10)

• Invoice and Receipt Response Objects (p. 23)

• Identity Documentation Response Objects (p. 25)

Text Detection and Document Analysis Response Objects

When Amazon Textract processes a document, it creates a list of Block (p. 257) objects for the detected or analyzed text. Each block contains information about a detected item, where it's located, and the confidence that Amazon Textract has in the accuracy of the processing.

A document is made up of the following types of Block objects.

• Pages (p. 11)

• Lines and words of text (p. 12)

• Form Data (Key-value pairs) (p. 14)

• Tables and Cells (p. 16)

• Selection elements (p. 18)

The contents of a block depend on the operation you call. If you call one of the text detection operations, the pages, lines, and words of detected text are returned. For more information, see Detecting Text (p. 3). If you call one of the document analysis operations, information about detected pages, key-value pairs, tables, selection elements, and text is returned. For more information, see Analyzing Documents (p. 4).

Some Block object fields are common to both types of processing. For example, each block has a unique identifier.

For examples that show how to use Block objects, see Tutorials (p. 149).

Document Layout

Amazon Textract returns a representation of a document as a list of different types of Block objects that are linked in a parent-to-child relationship or a key-value pair. Metadata that provides the number of pages in a document is also returned. The following is the JSON for a typical Block object of type PAGE.

{
    "Blocks": [
        {
            "Geometry": {
                "BoundingBox": {
                    "Width": 1.0,
                    "Top": 0.0,
                    "Left": 0.0,
                    "Height": 1.0
                },
                "Polygon": [
                    {
                        "Y": 0.0,
                        "X": 0.0
                    },
                    {
                        "Y": 0.0,
                        "X": 1.0
                    },
                    {
                        "Y": 1.0,
                        "X": 1.0
                    },
                    {
                        "Y": 1.0,
                        "X": 0.0
                    }
                ]
            },
            "Relationships": [
                {
                    "Type": "CHILD",
                    "Ids": [
                        "2602b0a6-20e3-4e6e-9e46-3be57fd0844b",
                        "82aedd57-187f-43dd-9eb1-4f312ca30042",
                        "52be1777-53f7-42f6-a7cf-6d09bdc15a30",
                        "7ca7caa6-00ef-4cda-b1aa-5571dfed1a7c"
                    ]
                }
            ],
            "BlockType": "PAGE",
            "Id": "8136b2dc-37c1-4300-a9da-6ed8b276ea97"
        }...
    ],
    "DocumentMetadata": {
        "Pages": 1
    }
}

A document is made from one or more PAGE blocks. Each page contains a list of child blocks for the primary items detected on the page, such as lines of text and tables. For more information, see Pages (p. 11).

You can determine the type of a Block object by inspecting the BlockType field.

A Block object contains a list of related Block objects in the Relationships field, which is an array of Relationship (p. 281) objects. A Relationships array is either of type CHILD or of type VALUE. An array of type CHILD is used to list the items that are children of the current block. For example, if the current block is of type LINE, Relationships contains a list of IDs for the WORD blocks that make up the line of text. An array of type VALUE is used to contain key-value pairs. You can determine the type of the relationship by inspecting the Type field of the Relationship object.

Child blocks don't have information about their parent Block objects.
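Because child blocks carry no reference to their parent, code that needs to walk upward, for example from a WORD to its LINE, has to build that mapping itself. The following is a minimal sketch of one way to do so from a list of Block dictionaries. The function name is hypothetical.

```python
from typing import Dict, List


def build_parent_index(blocks: List[Dict]) -> Dict[str, str]:
    """Map each child block ID to the ID of the block that lists it as a CHILD."""
    parents: Dict[str, str] = {}
    for block in blocks:
        for relationship in block.get("Relationships", []):
            # VALUE relationships link keys to values; only CHILD expresses containment.
            if relationship["Type"] != "CHILD":
                continue
            for child_id in relationship["Ids"]:
                parents[child_id] = block["Id"]
    return parents
```

Looking up a block ID in the returned dictionary yields the ID of its parent. IDs that are absent from the dictionary, such as the ID of a PAGE block, have no parent.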

For examples that show Block information, see Processing Documents with Synchronous Operations (p. 34).

Confidence

Amazon Textract operations return the percentage confidence that Amazon Textract has in the accuracy of the detected item. To get the confidence, use the Confidence field of the Block object. A higher value indicates a higher confidence. Depending on the scenario, detections with a low confidence might need visual confirmation by a human.
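One way to act on the Confidence field is to route low-scoring detections to a human reviewer. The following sketch selects those blocks. The function name and the default threshold of 90.0 are arbitrary assumptions for illustration, and a suitable threshold depends on the scenario.

```python
from typing import Dict, List


def low_confidence_blocks(blocks: List[Dict], threshold: float = 90.0) -> List[Dict]:
    """Return the blocks whose Confidence is below the threshold.

    Blocks without a Confidence field are skipped rather than flagged.
    """
    return [
        block
        for block in blocks
        if "Confidence" in block and block["Confidence"] < threshold
    ]
```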

Geometry

Amazon Textract operations, with the exception of identity analysis, return information about the location of detected items on a document page. To get the location, use the Geometry field of the Block object. For more information, see Item Location on a Document Page (p. 26).
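The sample PAGE block in this guide has a BoundingBox with Width and Height of 1.0, which suggests that the values are ratios of the overall page dimensions. Under that assumption, the following sketch converts a BoundingBox to pixel coordinates for an image of known size. The function name is hypothetical.

```python
from typing import Dict, Tuple


def bounding_box_to_pixels(
    bounding_box: Dict[str, float], page_width: int, page_height: int
) -> Tuple[int, int, int, int]:
    """Convert a ratio-based BoundingBox to (left, top, width, height) in pixels."""
    # Assumption: Left and Width scale with page width, Top and Height with page height.
    left = round(bounding_box["Left"] * page_width)
    top = round(bounding_box["Top"] * page_height)
    width = round(bounding_box["Width"] * page_width)
    height = round(bounding_box["Height"] * page_height)
    return left, top, width, height
```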

Pages

A document consists of one or more pages. A Block (p. 257) object of type PAGE exists for each page of the document. A PAGE block object contains a list of the child IDs for the lines of text, key-value pairs, and tables that are detected on the document page.

The JSON for a PAGE block looks similar to the following.

{
    "Geometry": ....
    "Relationships": [
        {
            "Type": "CHILD",
            "Ids": [
                "2602b0a6-20e3-4e6e-9e46-3be57fd0844b", // Line - Hello, world.
                "82aedd57-187f-43dd-9eb1-4f312ca30042", // Line - How are you?
                "52be1777-53f7-42f6-a7cf-6d09bdc15a30",
                "7ca7caa6-00ef-4cda-b1aa-5571dfed1a7c"
            ]
        }
    ],
    "Id": "8136b2dc-37c1-4300-a9da-6ed8b276ea97" // Page identifier
},

If you're using asynchronous operations with a multipage document that's in PDF format, you can determine the page that a block is located on by inspecting the Page field of the Block object. A scanned image (an image in JPEG, PNG, PDF, or TIFF format) is considered to be a single-page document, even if there's more than one document page on the image. Asynchronous operations always return a Page value of 1 for scanned images.

The total number of pages is returned in the Pages field of DocumentMetadata. DocumentMetadata is returned with each list of Block objects returned by an Amazon Textract operation.
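For multipage results, the Page field can be used to bucket blocks per page. The following is a small sketch of that grouping. The function name is hypothetical, and treating a block without a Page field as belonging to page 1 is an assumption made for this example.

```python
from collections import defaultdict
from typing import Dict, List


def blocks_by_page(blocks: List[Dict]) -> Dict[int, List[Dict]]:
    """Group Block dictionaries by their Page value, preserving input order."""
    pages: Dict[int, List[Dict]] = defaultdict(list)
    for block in blocks:
        # Assumption: a missing Page field means a single-page document (page 1).
        pages[block.get("Page", 1)].append(block)
    return dict(pages)
```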

Lines and Words of Text

Detected text that's returned by Amazon Textract operations is returned in a list of Block (p. 257) objects. These objects represent lines of text or textual words that are detected on a document page. The following text shows two lines of text that are made from multiple words.

This is text.

In two separate lines.

Detected text is returned in the Text field of a Block object. The BlockType field determines if the text is a line of text (LINE) or a word (WORD). A WORD is one or more ISO basic Latin script characters that aren't separated by spaces. A LINE is a string of tab-delimited and contiguous words.

Additionally, Amazon Textract determines whether a piece of text was handwritten or printed and reports it in the TextType field, with the value HANDWRITING or PRINTED respectively.

The other Block properties are common to all block types, such as the ID, confidence, and geometry information. For more information, see Text Detection and Document Analysis Response Objects (p. 10).

To detect only lines and words, you can use DetectDocumentText (p. 222) or StartDocumentTextDetection (p. 247). For more information, see Detecting Text (p. 3). To get the detected text (lines and words) and information about how it relates to other parts of the document, such as tables, you can use AnalyzeDocument (p. 209) or StartDocumentAnalysis (p. 242). For more information, see Analyzing Documents (p. 4).

PAGE, LINE, and WORD blocks are related to each other in a parent-to-child relationship. A PAGE block is the parent for all LINE block objects on a document page. Because a LINE can have one or more words, the Relationships array for a LINE block stores the IDs for child WORD blocks that make up the line of text.

The following diagram shows how the line Hello, world. in the text Hello, world. How are you? is represented by Block objects.

The following is the JSON output from DetectDocumentText when the sentence Hello, world. How are you? is detected. The first example is the JSON for the document page. Note how the CHILD IDs enable you to navigate through the document.

{
    "Geometry": {...},
    "Relationships": [
        {
            "Type": "CHILD",
            "Ids": [
                "d7fbd604-d609-4d69-857d-247a3f591238", // Line - Hello, world.
                "b6c19a93-6493-4d8e-958f-853c8f7ca055" // Line - How are you?
            ]
        }
    ],
    "Id": "56ec1d77-171f-4881-9852-2b5b7e761608"
},

The following is the JSON for the LINE block that makes up the line Hello, world.:

{
    "Relationships": [
        {
            "Type": "CHILD",
            "Ids": [
                "7f97e2ca-063e-47a8-981c-8beee31afc01", // Word - Hello,
                "4b990aa0-af96-4369-b90f-dbe02538ed21" // Word - world.
            ]
        }
    ],
    "Confidence": 99.63229370117188,
    "Geometry": {...},
    "Text": "Hello, world.",
    "BlockType": "LINE",
    "Id": "d7fbd604-d609-4d69-857d-247a3f591238"
},

The following is the JSON for the WORD block for the word Hello,:

{
    "Geometry": {...},
    "Text": "Hello,",
    "TextType": "PRINTED",
    "BlockType": "WORD",
    "Confidence": 99.74746704101562,
    "Id": "7f97e2ca-063e-47a8-981c-8beee31afc01"
},

The final JSON is the WORD block for the word world.:

{
    "Geometry": {...},
    "Text": "world.",
    "TextType": "PRINTED",
    "BlockType": "WORD",
    "Confidence": 99.5171127319336,
    "Id": "4b990aa0-af96-4369-b90f-dbe02538ed21"
},
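The CHILD relationships shown in these examples can be followed programmatically. The following sketch, which uses a hypothetical function name, resolves each LINE block to the text of its child WORD blocks by looking up the IDs in the line's Relationships array.

```python
from typing import Dict, List, Tuple


def lines_with_words(blocks: List[Dict]) -> List[Tuple[str, List[str]]]:
    """Return (line text, [word texts]) for every LINE block, in input order."""
    by_id = {block["Id"]: block for block in blocks}
    result: List[Tuple[str, List[str]]] = []
    for block in blocks:
        if block.get("BlockType") != "LINE":
            continue
        words: List[str] = []
        for relationship in block.get("Relationships", []):
            if relationship["Type"] != "CHILD":
                continue
            for child_id in relationship["Ids"]:
                child = by_id.get(child_id)
                # Skip IDs that are missing from the list or are not WORD blocks.
                if child is not None and child.get("BlockType") == "WORD":
                    words.append(child["Text"])
        result.append((block["Text"], words))
    return result
```

Indexing the blocks by Id first keeps each lookup constant-time, which matters for documents that produce many blocks.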

Form Data (Key-Value Pairs)

Amazon Textract can extract form data from documents as key-value pairs. For example, in the following text, Amazon Textract can identify a key (Name:) and a value (Ana Carolina).

Name: Ana Carolina

Detected key-value pairs are returned as Block (p. 257) objects in the responses from AnalyzeDocument (p. 209) and GetDocumentAnalysis (p. 226). You can use the FeatureTypes input parameter to retrieve information about key-value pairs, tables, or both. For key-value pairs only, use the value FORMS. For an example, see Extracting Key-Value Pairs from a Form Document (p. 149). For general information about how a document is represented by Block objects, see Text Detection and Document Analysis Response Objects (p. 10).

Block objects with the type KEY_VALUE_SET are the containers for KEY or VALUE Block objects that store information about linked text items detected in a document. You can use the EntityTypes attribute to determine if a block is a KEY or a VALUE.

• A KEY object contains information about the key for linked text. For example, Name:. A KEY block has two relationship lists. A relationship of type VALUE is a list that contains the ID of the VALUE block associated with the key. A relationship of type CHILD is a list of IDs for the WORD blocks that make up the text of the key.

• A VALUE object contains information about the text associated with a key. In the preceding example, Ana Carolina is the value for the key Name:. A VALUE block has a relationship with a list of CHILD blocks that identify WORD blocks. Each WORD block contains one of the words that make up the text of the value. A VALUE object can also contain information about selected elements. For more information, see Selection Elements (p. 18).

Each instance of a KEY_VALUE_SET Block object is a child of the PAGE Block object that corresponds to the current page.

The following diagram shows how the key-value pair Name: Ana Carolina is represented by Block objects.

The following examples show how the key-value pair Name: Ana Carolina is represented by JSON.

The PAGE block has CHILD blocks of type KEY_VALUE_SET for each KEY and VALUE block detected in the document.

{
    "Geometry": ....
    "Relationships": [
        {
            "Type": "CHILD",
            "Ids": [
                "2602b0a6-20e3-4e6e-9e46-3be57fd0844b",
                "82aedd57-187f-43dd-9eb1-4f312ca30042",
                "52be1777-53f7-42f6-a7cf-6d09bdc15a30", // Key - Name:
                "7ca7caa6-00ef-4cda-b1aa-5571dfed1a7c" // Value - Ana Carolina
            ]
        }
    ],
    "Id": "8136b2dc-37c1-4300-a9da-6ed8b276ea97" // Page identifier
},

The following JSON shows that the KEY block (52be1777-53f7-42f6-a7cf-6d09bdc15a30) has a relationship with the VALUE block (7ca7caa6-00ef-4cda-b1aa-5571dfed1a7c). It also has a CHILD block for the WORD block (c734fca6-c4c4-415c-b6c1-30f7510b72ee) that contains the text for the key (Name:).

{
    "Relationships": [
        {
            "Type": "VALUE",
            "Ids": [
                "7ca7caa6-00ef-4cda-b1aa-5571dfed1a7c" // Value identifier
            ]
        },
        {
            "Type": "CHILD",
            "Ids": [
                "c734fca6-c4c4-415c-b6c1-30f7510b72ee" // Name:
            ]
        }
    ],
    "Confidence": 51.55965805053711,
    "Geometry": ....,
    "BlockType": "KEY_VALUE_SET",
    "EntityTypes": [
        "KEY"
    ],
    "Id": "52be1777-53f7-42f6-a7cf-6d09bdc15a30" // Key identifier
},

The following JSON shows that VALUE block 7ca7caa6-00ef-4cda-b1aa-5571dfed1a7c has a CHILD list of IDs for the WORD blocks that make up the text of the value (Ana and Carolina).

{
    "Relationships": [
        {
            "Type": "CHILD",
            "Ids": [
                "db553509-64ef-4ecf-ad3c-bea62cc1cd8a", // Ana
                "e5d7646c-eaa2-413a-95ad-f4ae19f53ef3" // Carolina
            ]
        }
    ],
    "Confidence": 51.55965805053711,
    "Geometry": ....,
    "BlockType": "KEY_VALUE_SET",
    "EntityTypes": [
        "VALUE"
    ],
    "Id": "7ca7caa6-00ef-4cda-b1aa-5571dfed1a7c" // Value identifier
}

The following JSON shows the Block objects for the words Name:, Ana, and Carolina.

{
    "Geometry": {...},
    "Text": "Name:",
    "TextType": "PRINTED",
    "BlockType": "WORD",
    "Confidence": 99.56285858154297,
    "Id": "c734fca6-c4c4-415c-b6c1-30f7510b72ee"
},
{
    "Geometry": {...},
    "Text": "Ana",
    "TextType": "PRINTED",
    "BlockType": "WORD",
    "Confidence": 99.52057647705078,
    "Id": "db553509-64ef-4ecf-ad3c-bea62cc1cd8a"
},
{
    "Geometry": {...},
    "Text": "Carolina",
    "TextType": "PRINTED",
    "BlockType": "WORD",
    "Confidence": 99.84207916259766,
    "Id": "e5d7646c-eaa2-413a-95ad-f4ae19f53ef3"
},
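
The relationships described above can be followed in code. The following Python sketch is one possible way to do it, not the method used by any AWS sample: it assumes the Blocks list from a response is available as a list of dictionaries, and the function names child_words and key_value_pairs are hypothetical. The sample data is a trimmed copy of the blocks shown above, with IDs shortened to their first eight characters.

```python
def child_words(block, block_map):
    """Join the Text of the WORD blocks listed in a block's CHILD relationship."""
    words = []
    for relationship in block.get("Relationships", []):
        if relationship["Type"] == "CHILD":
            for child_id in relationship["Ids"]:
                child = block_map[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def key_value_pairs(blocks):
    """Map the text of each KEY block to the text of its VALUE block."""
    block_map = {block["Id"]: block for block in blocks}
    pairs = {}
    for block in blocks:
        if block["BlockType"] != "KEY_VALUE_SET" or "KEY" not in block.get("EntityTypes", []):
            continue
        value_text = ""
        for relationship in block.get("Relationships", []):
            if relationship["Type"] == "VALUE":
                value_text = " ".join(
                    child_words(block_map[value_id], block_map)
                    for value_id in relationship["Ids"]
                )
        pairs[child_words(block, block_map)] = value_text
    return pairs


# Trimmed copies of the blocks above (Geometry and Confidence omitted, IDs shortened).
sample_blocks = [
    {"Id": "52be1777", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["7ca7caa6"]},
                       {"Type": "CHILD", "Ids": ["c734fca6"]}]},
    {"Id": "7ca7caa6", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["db553509", "e5d7646c"]}]},
    {"Id": "c734fca6", "BlockType": "WORD", "Text": "Name:"},
    {"Id": "db553509", "BlockType": "WORD", "Text": "Ana"},
    {"Id": "e5d7646c", "BlockType": "WORD", "Text": "Carolina"},
]

print(key_value_pairs(sample_blocks))  # {'Name:': 'Ana Carolina'}
```

Building a dictionary keyed by Id first keeps each relationship lookup constant time, which matters because every KEY block requires several lookups.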

Tables

Amazon Textract can extract tables and the cells in a table. For example, when the following table is detected on a form, Amazon Textract detects a table with four cells.

Name            Address

Ana Carolina    123 Any Town

Detected tables are returned as Block (p. 257) objects in the responses from AnalyzeDocument (p. 209) and GetDocumentAnalysis (p. 226). You can use the FeatureTypes input parameter to retrieve information about key-value pairs, tables, or both. For tables only, use the value TABLES. For an example, see Exporting Tables into a CSV File (p. 151). For general information about how a document is represented by Block objects, see Text Detection and Document Analysis Response Objects (p. 10).
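
As a rough illustration of the FeatureTypes parameter, the following Python sketch builds the request arguments for AnalyzeDocument without calling the service. The helper name build_analyze_document_request, the bucket name, and the document name are hypothetical. The FORMS value for key-value pairs and the Document/S3Object request shape are taken from the AnalyzeDocument API reference rather than from this section.

```python
def build_analyze_document_request(bucket, document_name, tables=True, forms=False):
    """Build keyword arguments for an AnalyzeDocument call on a document stored in S3."""
    feature_types = []
    if tables:
        feature_types.append("TABLES")  # table and cell blocks
    if forms:
        feature_types.append("FORMS")   # key-value pair blocks
    if not feature_types:
        raise ValueError("Select at least one feature type")
    return {
        "Document": {"S3Object": {"Bucket": bucket, "Name": document_name}},
        "FeatureTypes": feature_types,
    }


# Hypothetical bucket and object names.
request = build_analyze_document_request("my-bucket", "form.png")
print(request["FeatureTypes"])  # ['TABLES']

# With boto3 installed and AWS credentials configured, the call would look like:
#   import boto3
#   response = boto3.client("textract").analyze_document(**request)
```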

The following diagram shows how a single cell in a table is represented by Block objects.

A cell contains WORD blocks for detected words, and SELECTION_ELEMENT blocks for selection elements such as check boxes.

The following is partial JSON for the preceding table, which has four cells.

The PAGE Block object has a list of CHILD Block IDs for the TABLE block and each LINE of text that's detected.

{
    "Relationships": [
        {
            "Type": "CHILD",
            "Ids": [
                "f2a4ad7b-f21d-4966-b548-c859b84f66a4", // Line - Name
                "4dce3516-ffeb-45e0-92a2-60770e9cb744", // Line - Address
                "ee506578-768f-4696-8f4b-e4917e429f50", // Line - Ana Carolina
                "33fc7223-411b-4399-8a90-ccd3c5a2c196", // Line - 123 Any Town
                "3f9665be-379d-4ae7-be44-d02f32b049c2"  // Table
            ]
        }
    ],
    "Id": "78c3ce84-ae70-418e-add7-27058418adf6"
},

The TABLE block includes a list of child IDs for the cells within the table. A TABLE block also includes geometry information for the table location in the document. The following JSON shows that the table has four cells, which are listed in the Ids array.

{
    "Geometry": {...},
    "Relationships": [
        {
            "Type": "CHILD",
            "Ids": [
                "505e9581-0d1c-42fb-a214-6ff736822e8c",
                "6fca44d4-d3d3-46ab-b22f-7fca1fbaaf02",
                "9778bd78-f3fe-4ae1-9b78-e6d29b89e5e9",
                "55404b05-ae12-4159-9003-92b7c129532e"
            ]
        }
    ],
    "BlockType": "TABLE",
    "Confidence": 92.5705337524414,
    "Id": "3f9665be-379d-4ae7-be44-d02f32b049c2"
},

The Block type for the table cells is CELL. The Block object for each cell includes information about the cell location compared to other cells in the table. It also includes geometry information for the location of the cell on the document. In the preceding example, 505e9581-0d1c-42fb-a214-6ff736822e8c is the child ID for the cell that contains the word Name. The following example is the information for the cell.

{
    "Geometry": {...},
    "Relationships": [
        {
            "Type": "CHILD",
            "Ids": [
                "e9108c8e-0167-4482-989e-8b6cd3c3653e"
            ]
        }
    ],
    "Confidence": 100.0,
    "RowSpan": 1,
    "RowIndex": 1,
    "ColumnIndex": 1,
    "ColumnSpan": 1,
    "BlockType": "CELL",
    "Id": "505e9581-0d1c-42fb-a214-6ff736822e8c"
},

Each cell has a location in a table, with the first cell being 1,1. In the preceding example, the cell with the value Name is at row 1, column 1. The cell with the value 123 Any Town is at row 2, column 2. A cell block object contains this information in the RowIndex and ColumnIndex fields. The child list contains the IDs for the WORD Block objects that contain the text that's within the cell. The words in the list are in the order in which they're detected, from the top left of the cell to the bottom right of the cell. In the preceding example, the cell has a child ID with the value e9108c8e-0167-4482-989e-8b6cd3c3653e. The following output is for the WORD Block with the ID value of e9108c8e-0167-4482-989e-8b6cd3c3653e:

{
    "Geometry": {...},
    "Text": "Name",
    "TextType": "PRINTED",
    "BlockType": "WORD",
    "Confidence": 99.81139373779297,
    "Id": "e9108c8e-0167-4482-989e-8b6cd3c3653e"
},
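
The TABLE, CELL, and WORD relationships above are enough to rebuild the rows of a table. The following Python sketch shows one way this could be done, assuming the blocks are plain dictionaries. The function table_rows, the helper constructors, and the short block IDs in the sample data are hypothetical; only the field names come from the JSON above.

```python
def table_rows(table_block, block_map):
    """Return the table as a list of rows, using the 1-based RowIndex and ColumnIndex."""
    cells = {}
    for relationship in table_block.get("Relationships", []):
        if relationship["Type"] != "CHILD":
            continue
        for cell_id in relationship["Ids"]:
            cell = block_map[cell_id]
            if cell["BlockType"] != "CELL":
                continue
            words = [
                block_map[word_id]["Text"]
                for cell_rel in cell.get("Relationships", [])
                if cell_rel["Type"] == "CHILD"
                for word_id in cell_rel["Ids"]
                if block_map[word_id]["BlockType"] == "WORD"
            ]
            cells[(cell["RowIndex"], cell["ColumnIndex"])] = " ".join(words)
    if not cells:
        return []
    row_count = max(row for row, _ in cells)
    column_count = max(column for _, column in cells)
    return [
        [cells.get((row, column), "") for column in range(1, column_count + 1)]
        for row in range(1, row_count + 1)
    ]


def word(block_id, text):
    return {"Id": block_id, "BlockType": "WORD", "Text": text}


def cell(block_id, row, column, word_ids):
    return {"Id": block_id, "BlockType": "CELL", "RowIndex": row, "ColumnIndex": column,
            "Relationships": [{"Type": "CHILD", "Ids": word_ids}]}


# Hypothetical stand-in for the four-cell table above, with made-up short IDs.
sample_blocks = [
    {"Id": "table", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c11", "c12", "c21", "c22"]}]},
    cell("c11", 1, 1, ["w1"]), cell("c12", 1, 2, ["w2"]),
    cell("c21", 2, 1, ["w3", "w4"]), cell("c22", 2, 2, ["w5", "w6", "w7"]),
    word("w1", "Name"), word("w2", "Address"), word("w3", "Ana"), word("w4", "Carolina"),
    word("w5", "123"), word("w6", "Any"), word("w7", "Town"),
]
block_map = {block["Id"]: block for block in sample_blocks}
rows = table_rows(block_map["table"], block_map)
print(rows)  # [['Name', 'Address'], ['Ana Carolina', '123 Any Town']]
```

This sketch ignores RowSpan and ColumnSpan, so a merged cell appears only at its top-left position.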

Selection Elements

Amazon Textract can detect selection elements such as option buttons (radio buttons) and check boxes on a document page. Selection elements can be detected in form data (p. 14) and in tables (p. 16).

For example, when the following table is detected on a form, Amazon Textract detects the check boxes in the table cells.

                Agree    Neutral    Disagree

Good Service      ☑         ☐          ☐

Easy to Use       ☐         ☑          ☐

Fair Price        ☑         ☐          ☐

Detected selection elements are returned as Block (p. 257) objects in the responses from AnalyzeDocument (p. 209) and GetDocumentAnalysis (p. 226).

Note

You can use the FeatureTypes input parameter to retrieve information about key-value pairs, tables, or both. For example, if you ﬁlter on tables, the response includes the selection elements that are detected in tables. Selection elements that are detected in key-value pairs aren't included in the response.

Information about a selection element is contained in a Block object of type SELECTION_ELEMENT.

To determine the status of a selection element, use the SelectionStatus field of the SELECTION_ELEMENT block. The status can be either SELECTED or NOT_SELECTED. For example, the value of SelectionStatus for the previous image is SELECTED.

A SELECTION_ELEMENT Block object is associated with either a key-value pair or a table cell. A SELECTION_ELEMENT Block object contains bounding box information for a selection element in the Geometry ﬁeld. A SELECTION_ELEMENT Block object isn't a child of a PAGE Block object.

Form Data (Key-Value Pairs)

A key-value pair is used to represent a selection element that's detected on a form. The KEY block contains the text for the selection element. The VALUE block contains the SELECTION_ELEMENT block. The following diagram shows how selection elements are represented by Block (p. 257) objects.

For more information about key-value pairs, see Form Data (Key-Value Pairs) (p. 14).

The following JSON snippet shows the key for a key-value pair that contains a selection element (Male ☑). The child ID (Id bd14cfd5-9005-498b-a7f3-45ceb171f0ff) is the ID of the WORD block that contains the text for the selection element (Male). The value ID (Id 24aaac7f-fcce-49c7-a4f0-3688b05586d4) is the ID of the VALUE block that contains the SELECTION_ELEMENT block object.

{
    "Relationships": [
        {
            "Type": "VALUE",
            "Ids": [
                "24aaac7f-fcce-49c7-a4f0-3688b05586d4" // Value containing Selection Element
            ]
        },
        {
            "Type": "CHILD",
            "Ids": [
                "bd14cfd5-9005-498b-a7f3-45ceb171f0ff" // WORD - male
            ]
        }
    ],
    "Confidence": 94.15619659423828,
    "Geometry": {
        "BoundingBox": {
            "Width": 0.022914813831448555,
            "Top": 0.08072036504745483,
            "Left": 0.18966935575008392,
            "Height": 0.014860388822853565
        },
        "Polygon": [
            {"Y": 0.08072036504745483, "X": 0.18966935575008392},
            {"Y": 0.08072036504745483, "X": 0.21258416771888733},
            {"Y": 0.09558075666427612, "X": 0.21258416771888733},
            {"Y": 0.09558075666427612, "X": 0.18966935575008392}
        ]
    },
    "BlockType": "KEY_VALUE_SET",
    "EntityTypes": [
        "KEY"
    ],
    "Id": "a118dc43-d5f7-49a2-a20a-5f876d9ffd79"
}

The following JSON snippet is the WORD block for the word Male. The WORD block also has a parent LINE block.

{
    "Geometry": {
        "BoundingBox": {...},
        "Polygon": [
            {"Y": 0.07842985540628433, "X": 0.18863198161125183},
            {"Y": 0.07842985540628433, "X": 0.2110965996980667},
            {"Y": 0.09460209310054779, "X": 0.2110965996980667},
            {"Y": 0.09460209310054779, "X": 0.18863198161125183}
        ]
    },
    "Text": "Male",
    "BlockType": "WORD",
    "Confidence": 54.06439208984375,
    "Id": "bd14cfd5-9005-498b-a7f3-45ceb171f0ff"
},

The VALUE block has a child (Id f2f5e8cd-e73a-4e99-a095-053acd3b6bfb) that is the SELECTION_ELEMENT block.

{
    "Relationships": [
        {
            "Type": "CHILD",
            "Ids": [
                "f2f5e8cd-e73a-4e99-a095-053acd3b6bfb" // Selection element
            ]
        }
    ],
    "Confidence": 94.15619659423828,
    "Geometry": {
        "BoundingBox": {...},
        "Polygon": [
            {"Y": 0.07643391191959381, "X": 0.2271782010793686},
            {"Y": 0.07643391191959381, "X": 0.24445968866348267},
            {"Y": 0.10270800441503525, "X": 0.24445968866348267},
            {"Y": 0.10270800441503525, "X": 0.2271782010793686}
        ]
    },
    "BlockType": "KEY_VALUE_SET",
    "EntityTypes": [
        "VALUE"
    ],
    "Id": "24aaac7f-fcce-49c7-a4f0-3688b05586d4"
}

The following JSON is the SELECTION_ELEMENT block. The value of SelectionStatus indicates that the check box is selected.

{
    "Geometry": {
        "BoundingBox": {...},
        "Polygon": [
            {"Y": 0.07575977593660355, "X": 0.22590067982673645},
            {"Y": 0.07575977593660355, "X": 0.2462168186903},
            {"Y": 0.1033908873796463, "X": 0.2462168186903},
            {"Y": 0.1033908873796463, "X": 0.22590067982673645}
        ]
    },
    "BlockType": "SELECTION_ELEMENT",
    "SelectionStatus": "SELECTED",
    "Confidence": 74.14942932128906,
    "Id": "f2f5e8cd-e73a-4e99-a095-053acd3b6bfb"
}
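
The chain from KEY block to VALUE block to SELECTION_ELEMENT block can be walked to find out which form options are selected. The following Python sketch is one way to do so under the assumption that the blocks are plain dictionaries. The function name selection_fields is hypothetical, and the sample data is a trimmed copy of the four blocks above with IDs shortened to their first eight characters.

```python
def selection_fields(blocks):
    """Map the text of each KEY block to True or False for its selection element."""
    block_map = {block["Id"]: block for block in blocks}

    def children(block, block_type):
        # Child blocks of the given type, in the order listed in the CHILD relationship.
        return [
            block_map[child_id]
            for relationship in block.get("Relationships", [])
            if relationship["Type"] == "CHILD"
            for child_id in relationship["Ids"]
            if block_map[child_id]["BlockType"] == block_type
        ]

    result = {}
    for block in blocks:
        if block["BlockType"] != "KEY_VALUE_SET" or "KEY" not in block.get("EntityTypes", []):
            continue
        label = " ".join(w["Text"] for w in children(block, "WORD"))
        for relationship in block.get("Relationships", []):
            if relationship["Type"] != "VALUE":
                continue
            for value_id in relationship["Ids"]:
                for element in children(block_map[value_id], "SELECTION_ELEMENT"):
                    result[label] = element["SelectionStatus"] == "SELECTED"
    return result


# Trimmed copies of the blocks above (Geometry and Confidence omitted, IDs shortened).
sample_blocks = [
    {"Id": "a118dc43", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["24aaac7f"]},
                       {"Type": "CHILD", "Ids": ["bd14cfd5"]}]},
    {"Id": "bd14cfd5", "BlockType": "WORD", "Text": "Male"},
    {"Id": "24aaac7f", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["f2f5e8cd"]}]},
    {"Id": "f2f5e8cd", "BlockType": "SELECTION_ELEMENT", "SelectionStatus": "SELECTED"},
]

print(selection_fields(sample_blocks))  # {'Male': True}
```

Keys whose VALUE block holds only text are skipped, so the result contains only the fields that are backed by a selection element.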

Table Cells

Amazon Textract can detect selection elements inside a table cell. For example, the cells in the following table have check boxes.

                Agree    Neutral    Disagree

Good Service      ☑         ☐          ☐

Easy to Use       ☐         ☑          ☐

Fair Price        ☑         ☐          ☐

A CELL block can contain child SELECTION_ELEMENT objects for selection elements, as well as child WORD blocks for detected text.

For more information about tables, see Tables (p. 16).

The TABLE Block object for the previous table looks similar to this.

{
    "Relationships": [
        {
            "Type": "CHILD",
            "Ids": [
                "652c09eb-8945-473d-b1be-fa03ac055928",
                "37efc5cc-946d-42cd-aa04-e68e5ed4741d",
                "4a44940a-435a-4c5c-8a6a-7fea341fa295",
                "2de20014-9a3b-4e26-b453-0de755144b1a",
                "8ed78aeb-5c9a-4980-b669-9e08b28671d2",
                "1f8e1c68-2c97-47b2-847c-a19619c02ca9",
                "9927e1d1-6018-4960-ac17-aadb0a94f4d9",
                "68f0ed8b-a887-42a5-b618-f68b494a6034",
                "fcba16e0-6bd7-4ea5-b86e-36e8330b68ea",
                "2250357c-ae34-4ed9-86da-45dac5a5e903",
                "c63ad40d-5a14-4646-a8df-2d4304213dbc", // Cell
                "2b8417dc-e65f-4fcd-aa0f-61a23f1e8cb0",
                "26c62932-72f0-4dc2-9893-1ae27829c060",
                "27f291cc-abf4-4c23-aa24-676abe99cb1e",
                "7e5ce028-1bcd-4d9f-ad42-15ac181c5b47",
                "bf32e3d2-efa2-4fc1-b09b-ab9cc52ff734"
            ]
        }
    ],
    "BlockType": "TABLE",
    "Confidence": 99.99993896484375,
    "Id": "f66eac36-2e74-406e-8032-14d1c14e0b86"
}

The CELL Block object (Id c63ad40d-5a14-4646-a8df-2d4304213dbc) for the cell that contains the check box Good Service looks like the following. It includes a child Block (Id = 26d122fd-c5f4-4b53-92c4-0ae92730ee1e) that is the SELECTION_ELEMENT Block object for the check box.

{
    "Relationships": [
        {
            "Type": "CHILD",
            "Ids": [
                "26d122fd-c5f4-4b53-92c4-0ae92730ee1e" // Selection Element
            ]
        }
    ],
    "Confidence": 79.741689682006836,
    "RowSpan": 1,
    "RowIndex": 3,
    "ColumnIndex": 3,
    "ColumnSpan": 1,
    "BlockType": "CELL",
    "Id": "c63ad40d-5a14-4646-a8df-2d4304213dbc"
}

The SELECTION_ELEMENT Block object for the check box is as follows. The value of SelectionStatus indicates that the check box is selected.

{

"Geometry": {...},

"BlockType": "SELECTION_ELEMENT", "SelectionStatus": "SELECTED", "Confidence": 88.79517364501953,

"Id": "26d122fd-c5f4-4b53-92c4-0ae92730ee1e"

}

Invoice and Receipt Response Objects

When you submit an invoice or a receipt to the AnalyzeExpense API, it returns a series of ExpenseDocument objects. Each ExpenseDocument is further separated into LineItemGroups and SummaryFields. Most invoices and receipts contain information such as the vendor name, receipt number, receipt date, or total amount. AnalyzeExpense returns this information under SummaryFields.

Receipts and invoices also contain details about the items purchased. The AnalyzeExpense API returns this information under LineItemGroups. The ExpenseIndex ﬁeld uniquely identiﬁes the expense, and associates the appropriate SummaryFields and LineItemGroups detected in that expense.

The most granular level of data in the AnalyzeExpense response consists of Type, ValueDetection, and LabelDetection (optional). The individual entities are:

• Type (p. 24): Refers to the kind of information that is detected, at a high level.

• LabelDetection (p. 24): Refers to the label of an associated value within the text of the document. LabelDetection is optional and is returned only if the label is written on the document.

• ValueDetection (p. 24): Refers to the value of the label or type returned.

The AnalyzeExpense API also detects ITEM, QUANTITY, and PRICE within line items as normalized ﬁelds.

If a line item on the receipt image contains other text, such as a SKU or a detailed description, that text is included in the JSON as EXPENSE_ROW, as shown in the following example:

{
    "Type": {
        "Text": "EXPENSE_ROW",
        "Confidence": 99.95216369628906
    },
    "ValueDetection": {
        "Text": "Banana 5 $2.5",
        "Geometry": {
            …
        },
        "Confidence": 98.11214447021484
    }
}

The example above shows how the AnalyzeExpense API returns the entire row on a receipt that contains line item information about 5 bananas sold for $2.5.

Type

The following is an example of the standard, or normalized, type of the key-value pair:

{

"PageNumber": 1, "Type": {

"Text": "VENDOR_NAME", "Confidence": 70.0 },

"ValueDetection": { "Geometry": { ... }, "Text": "AMAZON",

"Confidence": 87.89806365966797 }

}

The receipt did not have “Vendor Name” explicitly listed. However, the AnalyzeExpense API recognized the document as a receipt and categorized the value “AMAZON” as Type VENDOR_NAME.

LabelDetection

Following is an example of text as it is shown on a customer document page:

{
    "Type": {
        "Text": "OTHER",
        "Confidence": 70.0
    },
    "LabelDetection": {
        "Geometry": { ... },
        "Text": "CASHIER",
        "Confidence": 88.19171142578125
    },
    "ValueDetection": {
        "Geometry": { ... },
        "Text": "Mina",
        "Confidence": 87.89806365966797
    }
}

The example document contained “CASHIER Mina”. The AnalyzeExpense API extracted the label as it appears on the document and returned it under LabelDetection. For implied values such as “Vendor Name”, where the key isn't explicitly shown on the receipt, the AnalyzeExpense API doesn't return LabelDetection.

ValueDetection

The following example shows the “value” of the key-value pair.

{
    "Type": {
        "Text": "OTHER",
        "Confidence": 70.0
    },
    "LabelDetection": {
        "Geometry": { ... },
        "Text": "CASHIER",
        "Confidence": 88.19171142578125
    },
    "ValueDetection": {
        "Geometry": { ... },
        "Text": "Mina",
        "Confidence": 87.89806365966797
    }
}

In the example, the document contained “CASHIER Mina”. The AnalyzeExpense API detected the Cashier value as Mina and returned it under ValueDetection.
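
To illustrate how Type, LabelDetection, and ValueDetection fit together, the following Python sketch flattens the SummaryFields of one expense document into (type, label, value) tuples. It is a minimal sketch, assuming the document is a plain dictionary. The function name summary_rows is hypothetical, and the sample document simply combines the two fields shown above into one illustrative ExpenseDocument.

```python
def summary_rows(expense_document):
    """Return (type, label, value) tuples for the SummaryFields of one ExpenseDocument."""
    rows = []
    for field in expense_document.get("SummaryFields", []):
        # LabelDetection is optional, so the label is None for implied fields.
        label = field.get("LabelDetection", {}).get("Text")
        rows.append((field["Type"]["Text"], label, field["ValueDetection"]["Text"]))
    return rows


# Illustrative document built from the VENDOR_NAME and CASHIER examples above.
sample_document = {
    "ExpenseIndex": 1,
    "SummaryFields": [
        {"Type": {"Text": "VENDOR_NAME", "Confidence": 70.0},
         "ValueDetection": {"Text": "AMAZON", "Confidence": 87.89806365966797}},
        {"Type": {"Text": "OTHER", "Confidence": 70.0},
         "LabelDetection": {"Text": "CASHIER", "Confidence": 88.19171142578125},
         "ValueDetection": {"Text": "Mina", "Confidence": 87.89806365966797}},
    ],
}

for row in summary_rows(sample_document):
    print(row)
# ('VENDOR_NAME', None, 'AMAZON')
# ('OTHER', 'CASHIER', 'Mina')
```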

Identity Documentation Response Objects

When you submit an identity document to the AnalyzeID API, it returns a series of IdentityDocumentField objects. Each of these objects contains Type and ValueDetection. Type records the normalized field that Amazon Textract detects, and ValueDetection records the text associated with the normalized field.

Below is an example of an IdentityDocumentField, shortened for brevity.

{
    "DocumentMetadata": {
        "Pages": 1
    },
    "IdentityDocumentFields": [
        {
            "Type": {
                "Text": "first name"
            },
            "ValueDetection": {
                "Text": "jennifer",
                "Confidence": 99.99908447265625
            }
        },
        {
            "Type": {
                "Text": "last name"
            },
            "ValueDetection": {
                "Text": "sample",
                "Confidence": 99.99758911132812
            }
        },

These are two examples of IdentityDocumentFields cut from a longer response. Each field separates the type that was detected from the value for that type. Here, the types are the first name and the last name, respectively.

This structure repeats with all contained information. If a type is not recognized as a normalized ﬁeld, it will be listed as "other".
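
Because each field pairs a type with a value, a list of fields can be collapsed into a simple lookup. The following Python sketch assumes the fields are plain dictionaries shaped like the example above. The function name fields_to_dict and the min_confidence parameter are hypothetical additions for illustration.

```python
def fields_to_dict(identity_document_fields, min_confidence=0.0):
    """Map each field's Type text to its ValueDetection text.

    Fields whose value confidence is below min_confidence are skipped.
    """
    result = {}
    for field in identity_document_fields:
        value = field["ValueDetection"]
        if value.get("Confidence", 0.0) >= min_confidence:
            result[field["Type"]["Text"]] = value["Text"]
    return result


# The two fields from the example above.
sample_fields = [
    {"Type": {"Text": "first name"},
     "ValueDetection": {"Text": "jennifer", "Confidence": 99.99908447265625}},
    {"Type": {"Text": "last name"},
     "ValueDetection": {"Text": "sample", "Confidence": 99.99758911132812}},
]

print(fields_to_dict(sample_fields))
# {'first name': 'jennifer', 'last name': 'sample'}
```

A dictionary keeps only one value per type, so if a document yields several fields of type "other", a list of pairs would be a safer container than this sketch uses.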

Following is a list of normalized ﬁelds for Driver's Licenses:

• ﬁrst name

• last name

• middle name

• suﬃx

• city in address

• zip code in address

• state in address

• county

• document number

• expiration date

• date of birth

• state name

• date of issue

• class

• restrictions

• endorsements

• id type

• veteran

• address

Following is a list of normalized fields for U.S. passports:

• ﬁrst name

• last name

• middle name

• document number

• expiration date

• date of birth

• place of birth

• date of issue

• id type

Item Location on a Document Page

Amazon Textract operations return the location and geometry of items found on a document page. DetectDocumentText (p. 222) and GetDocumentTextDetection (p. 231) return the location and geometry for lines and words, while AnalyzeDocument (p. 209) and GetDocumentAnalysis (p. 226) return the location and geometry of key-value pairs, tables, cells, and selection elements.

To determine where an item is on a document page, use the bounding box (Geometry (p. 269)) information returned by the Amazon Textract operation in a Block (p. 257) object. The Geometry object contains two types of location and geometric information for detected items:

• An axis-aligned BoundingBox (p. 261) object that contains the top-left coordinate and the width and height of the item.

• A polygon object that describes the outline of the item, speciﬁed as an array of Point (p. 280) objects that contain X (horizontal axis) and Y (vertical axis) document page coordinates of each point.

The JSON for a Block object looks similar to the following. Note the BoundingBox and Polygon ﬁelds.

{
    "Geometry": {
        "BoundingBox": {...},
        "Polygon": [
            {"Y": 0.08985357731580734, "X": 0.11085548996925354},
            {"Y": 0.08913730084896088, "X": 0.16447919607162476},
            {"Y": 0.10159222036600113, "X": 0.16476328670978546},
            {"Y": 0.10230850428342819, "X": 0.11113958805799484}
        ]
    },
    "Text": "Name:",
    "TextType": "PRINTED",
    "BlockType": "WORD",
    "Confidence": 99.56285858154297,
    "Id": "c734fca6-c4c4-415c-b6c1-30f7510b72ee"
},

You can use geometry information to draw bounding boxes around detected items. For an example that uses BoundingBox and Polygon information to draw boxes around lines and vertical lines at the start and end of each word, see Detecting Document Text with Amazon Textract (p. 84). The example output is similar to the following.

Bounding Box

A bounding box (BoundingBox) has the following properties:

• Height – The height of the bounding box as a ratio of the overall document page height.

• Left – The X coordinate of the top-left point of the bounding box as a ratio of the overall document page width.

• Top – The Y coordinate of the top-left point of the bounding box as a ratio of the overall document page height.

• Width – The width of the bounding box as a ratio of the overall document page width.

Each BoundingBox property has a value between 0 and 1. The value is a ratio of the overall image width (applies to Left and Width) or height (applies to Height and Top). For example, if the input image is 700 x 200 pixels, and the top-left coordinate of the bounding box is (350,50) pixels, the API returns a Left value of 0.5 (350/700) and a Top value of 0.25 (50/200).

The following diagram shows the range of a document page that each BoundingBox property covers.

To display the bounding box with the correct location and size, multiply the BoundingBox values by the document page width or height (depending on the value you want) to get the pixel values.

You use the pixel values to display the bounding box. For example, consider a document page that is 608 pixels wide and 588 pixels high, with the following bounding box values for analyzed text:

BoundingBox.Left: 0.3922065
BoundingBox.Top: 0.15567766
BoundingBox.Width: 0.284666
BoundingBox.Height: 0.2930403

The location of the text bounding box in pixels is calculated as follows:

Left coordinate = BoundingBox.Left (0.3922065) * document page width (608) = 238

Top coordinate = BoundingBox.Top (0.15567766) * document page height (588) = 91

Bounding box width = BoundingBox.Width (0.284666) * document page width (608) = 173

Bounding box height = BoundingBox.Height (0.2930403) * document page height (588) = 172

Each result is truncated to a whole number of pixels. For example, 0.15567766 * 588 is approximately 91.54, which becomes 91.
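
The arithmetic above can be checked with a few lines of Python. This is a minimal sketch with a hypothetical function name, bounding_box_to_pixels, and it assumes the bounding box is a dictionary with the Left, Top, Width, and Height keys. It drops the fractional part with int(), which is what reproduces the whole-pixel values listed above.

```python
def bounding_box_to_pixels(box, page_width, page_height):
    """Convert ratio-based BoundingBox values to whole pixels (fractions are dropped)."""
    left = int(box["Left"] * page_width)
    top = int(box["Top"] * page_height)
    width = int(box["Width"] * page_width)
    height = int(box["Height"] * page_height)
    return left, top, width, height


box = {"Left": 0.3922065, "Top": 0.15567766, "Width": 0.284666, "Height": 0.2930403}
print(bounding_box_to_pixels(box, 608, 588))  # (238, 91, 173, 172)
```

Using round() instead of int() would give a top coordinate of 92 for this box, so pick one convention and apply it consistently when drawing.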

You use these values to display a bounding box around the analyzed text. The following Java and Python examples demonstrate how to display a bounding box.

Java

public void ShowBoundingBox(int imageHeight, int imageWidth, BoundingBox box, Graphics2D g2d) {

    // scale is assumed to be a numeric field of the enclosing class
    // (the factor by which the displayed image is scaled down).
    float left = imageWidth * box.getLeft();
    float top = imageHeight * box.getTop();

    // Display bounding box.
    g2d.setColor(new Color(0, 212, 0));
    g2d.drawRect(Math.round(left / scale), Math.round(top / scale),
            Math.round((imageWidth * box.getWidth()) / scale),
            Math.round((imageHeight * box.getHeight()) / scale));
}

Python

This Python example takes in the response returned by the DetectDocumentText (p. 222) API operation, together with the document page as a Pillow (PIL) Image object.

from PIL import ImageDraw

def process_text_detection(response, image):
    # image is assumed to be a PIL Image of the analyzed page, loaded by the caller.

    # Get the text blocks
    blocks = response['Blocks']
    width, height = image.size
    draw = ImageDraw.Draw(image)
    print('Detected Document Text')

    # Draw a bounding box around each detected line
    for block in blocks:
        if block['BlockType'] == "LINE":
            box = block['Geometry']['BoundingBox']
            left = width * box['Left']
            top = height * box['Top']
            draw.rectangle([left, top, left + (width * box['Width']), top + (height * box['Height'])], outline='black')

    # Display the image
    image.show()
    return len(blocks)

Polygon

The polygon returned by AnalyzeDocument is an array of Point (p. 280) objects. Each Point has an X and Y coordinate for a speciﬁc location on the document page. Like the BoundingBox coordinates, the polygon coordinates are normalized to the document width and height, and are between 0 and 1.

You can use points in the polygon array to display a ﬁner-grain bounding box around a Block object.

You calculate the position of each polygon point on the document page by using the same technique used for BoundingBoxes. Multiply the X coordinate by the document page width, and multiply the Y coordinate by the document page height.

The following example shows how to display the vertical lines of a polygon.

public void ShowPolygonVerticals(int imageHeight, int imageWidth, List<Point> points, Graphics2D g2d) {

    Object[] parry = points.toArray();
    g2d.setStroke(new BasicStroke(2));

    // Left vertical edge: from point 0 (top left) to point 3 (bottom left).
    g2d.drawLine(Math.round(((Point) parry[0]).getX() * imageWidth),
            Math.round(((Point) parry[0]).getY() * imageHeight),
            Math.round(((Point) parry[3]).getX() * imageWidth),
            Math.round(((Point) parry[3]).getY() * imageHeight));

    // Right vertical edge: from point 1 (top right) to point 2 (bottom right).
    g2d.drawLine(Math.round(((Point) parry[1]).getX() * imageWidth),
            Math.round(((Point) parry[1]).getY() * imageHeight),
            Math.round(((Point) parry[2]).getX() * imageWidth),
            Math.round(((Point) parry[2]).getY() * imageHeight));
}
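
For comparison, the same polygon calculation can be sketched in Python. The function names polygon_to_pixels and vertical_edges are hypothetical. The sketch assumes that the polygon is a list of four dictionaries with X and Y keys, ordered top left, top right, bottom right, bottom left, which is the ordering the Java example relies on when it pairs points 0 and 3 and points 1 and 2.

```python
def polygon_to_pixels(polygon, page_width, page_height):
    """Convert normalized polygon points to (x, y) pixel tuples."""
    return [(round(p["X"] * page_width), round(p["Y"] * page_height)) for p in polygon]


def vertical_edges(pixel_points):
    """Return the left and right edges as ((x1, y1), (x2, y2)) pairs."""
    top_left, top_right, bottom_right, bottom_left = pixel_points
    return [(top_left, bottom_left), (top_right, bottom_right)]


# Illustrative polygon on a 700 x 200 pixel page (values chosen for easy arithmetic).
polygon = [
    {"X": 0.5, "Y": 0.25},
    {"X": 0.75, "Y": 0.25},
    {"X": 0.75, "Y": 0.5},
    {"X": 0.5, "Y": 0.5},
]
points = polygon_to_pixels(polygon, 700, 200)
print(points)                  # [(350, 50), (525, 50), (525, 100), (350, 100)]
print(vertical_edges(points))  # [((350, 50), (350, 100)), ((525, 50), (525, 100))]
```

Each edge pair can be passed to a drawing call such as Pillow's ImageDraw line method to reproduce what the Java example draws.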
