Amazon Comprehend

Academic year: 2022

Amazon Comprehend

Developer Guide

Amazon Comprehend: Developer Guide

Copyright © Amazon Web Services, Inc. and/or its affiliates. All rights reserved.

Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be affiliated with, connected to, or sponsored by Amazon.

Table of Contents

What Is Amazon Comprehend? ... 1

Amazon Comprehend Insights ... 1

Comprehend Custom ... 1

Document Clustering (Topic Modeling) ... 2

Examples ... 2

Benefits ... 2

Are You a First-time User of Amazon Comprehend? ... 3

How It Works ... 4

Supported Languages ... 5

Supported Languages ... 5

Languages Supported by Amazon Comprehend Features ... 5

Getting Started ... 7

Step 1: Set Up an Account ... 7

Sign Up for AWS ... 7

Create an IAM User ... 7

Next Step ... 8

Step 2: Set Up the AWS CLI ... 8

Next Step ... 9

Step 3: Getting Started Using the Console ... 9

Analyzing Documents Using the Console ... 9

Creating and Using Custom Entity Recognizer ... 16

Creating and Using Custom Classifiers ... 20

Model Versioning with Amazon Comprehend ... 25

Creating a Topic Modeling Job Using the Console ... 27

Creating an Events Detection Job Using the Console ... 29

Step 4: Getting Started Using the API ... 30

Detecting the Dominant Language ... 30

Detecting Named Entities ... 33

Detecting Key Phrases ... 35

Detecting PII ... 37

Labeling Documents with PII ... 38

Detecting Sentiment ... 39

Detecting Syntax ... 41

Using Custom Classification ... 44

Detecting Custom Entities ... 48

Detecting Events ... 52

Topic Modeling ... 54

Using the Batch APIs ... 60

Solution: Analyzing Text with OpenSearch ... 65

Tutorial: Analyzing Insights from Reviews ... 66

Prerequisites ... 67

Step 1: Adding Documents to Amazon S3 ... 68

Prerequisites ... 69

Download Sample Data ... 69

Create an Amazon S3 Bucket ... 69

(Console Only) Create Folders ... 70

Upload the Input Data ... 70

Step 2: (CLI Only) Creating an IAM Role ... 71

Prerequisites ... 72

Create an IAM Role ... 72

Attach an IAM Policy to the IAM Role ... 73

Step 3: Running Analysis Jobs ... 74

Prerequisites ... 74

Analyze Sentiment and Entities ... 74

Step 4: Preparing the Output ... 77

Prerequisites ... 77

Download the Output ... 77

Extract the Output Files ... 78

Upload the Extracted Files ... 79

Load the Data into an AWS Glue Data Catalog ... 80

Prepare the Data for Analysis ... 83

Step 5: Visualizing the Output ... 85

Prerequisites ... 85

Give Amazon QuickSight Access ... 85

Import the Datasets ... 86

Create a Sentiment Visualization ... 86

Create an Entities Visualization ... 87

Publish a Dashboard ... 88

Clean Up ... 89

Custom Classification ... 90

Multi-Class and Multi-Label Modes ... 90

Asynchronous Classification ... 91

Training a Custom Classifier ... 91

Creating Training Data ... 92

Multi-Class Mode ... 92

Multi-Label Mode ... 94

Testing the Training Data ... 96

Running an Asynchronous Classification Job ... 96

The Classification Job ... 98

Real-time Analysis with Custom Classification ... 100

Creating an Endpoint for Custom Classification ... 100

Running Real-Time Custom Classification ... 101

Metrics ... 102

Metrics ... 103

Improving Your Custom Classifier's Performance ... 106

Confusion Matrix ... 106

Custom Entity Recognition ... 109

Training custom entity recognizers ... 109

Annotations ... 111

Entity lists ... 120

Detecting Custom Entities with a Batch Job ... 122

Detecting Custom Entities in Real Time ... 125

Creating an Endpoint ... 126

Running Entity Detection ... 127

Metrics ... 127

Improving Performance ... 129

Managing Endpoints ... 131

Endpoints Overview ... 131

Monitoring an Endpoint ... 132

Updating an Endpoint ... 133

Using Trusted Advisor ... 134

Amazon Comprehend Underutilized Endpoints ... 135

Amazon Comprehend Endpoint Access Risk ... 136

Deleting an Endpoint ... 137

Auto Scaling with Endpoints ... 137

Target Tracking ... 138

Scheduled Scaling ... 140

Copying Custom Models Between AWS Accounts ... 143

Sharing a Custom Model ... 143

Before You Begin ... 144

Resource-Based Policies for Custom Models ... 147

Step 1: Add a Resource-Based Policy to a Custom Model ... 147

Step 2: Provide the Details That Others Need to Import ... 149

Importing a Custom Model ... 150

Before You Begin ... 150

Importing a Custom Model ... 152

Text Analysis APIs ... 155

Detect Entities ... 155

Detect Events ... 157

Supported Types for Entities, Events, and Arguments ... 158

Detect Key Phrases ... 162

Detect the Dominant Language ... 163

Detect PII ... 167

PII Entity Types ... 167

Locate PII Entities ... 169

Redact PII Entities ... 170

Label Documents with PII ... 171

Label Documents with PII Entity Types ... 171

Determine Sentiment ... 172

Analyze Syntax ... 173

Topic Modeling ... 175

Document Processing Modes ... 179

Single-Document Processing ... 179

Asynchronous Batch Processing ... 179

Prerequisites ... 180

Starting an Analysis Job ... 180

Monitoring Analysis Jobs ... 181

Getting Analysis Results ... 181

Multiple Document Synchronous Processing ... 184

Using S3 Object Lambda Access Points for PII ... 187

Controlling Access to Documents with PII ... 187

Creating an Amazon S3 Object Lambda Access Point to Control Access to Documents ... 188

Invoking an Amazon S3 Object Lambda Access Point to Control Access to Documents ... 188

Redacting PII from Documents ... 189

Creating an Amazon S3 Object Lambda Access Point to Redact PII from Documents ... 188

Invoking an Amazon S3 Object Lambda Access Point to Redact PII from Documents ... 188

Tagging ... 191

Tagging a new resource ... 191

Viewing, editing, and deleting tags ... 192

Security ... 194

Data Protection ... 194

KMS Encryption in Amazon Comprehend ... 195

Cross-service Confused Deputy Prevention ... 197

Using a Virtual Private Cloud (VPC) ... 199

VPC endpoints (AWS PrivateLink) ... 202

Authentication and Access Control ... 203

Authentication ... 204

Access Control ... 204

Overview of Managing Access ... 205

Using Identity-Based Policies (IAM Policies) for Amazon Comprehend ... 207

Amazon Comprehend API Permissions Reference ... 212

AWS Managed Policies ... 212

Logging Amazon Comprehend API Calls with AWS CloudTrail ... 215

Amazon Comprehend Information in CloudTrail ... 215

Examples: Amazon Comprehend Log File Entries ... 217

Compliance Validation ... 223

Resilience ... 224

Infrastructure Security ... 224

Permissions Required for a Custom Asynchronous Analysis Job ... 225

Guidelines and Quotas ... 226

Supported Regions ... 226

Overall Quotas ... 226

Throttling When Using Single Transactions ... 226

Multiple Document Operations ... 226

Asynchronous Operations ... 227

Document Classification ... 227

Language Detection ... 228

Events ... 229

Topic Modeling ... 229

Entity Recognition ... 229

API Reference ... 232

Actions ... 232

BatchDetectDominantLanguage ... 234

BatchDetectEntities ... 237

BatchDetectKeyPhrases ... 240

BatchDetectSentiment ... 243

BatchDetectSyntax ... 246

ClassifyDocument ... 249

ContainsPiiEntities ... 252

CreateDocumentClassifier ... 254

CreateEndpoint ... 260

CreateEntityRecognizer ... 264

DeleteDocumentClassifier ... 270

DeleteEndpoint ... 272

DeleteEntityRecognizer ... 274

DeleteResourcePolicy ... 276

DescribeDocumentClassificationJob ... 278

DescribeDocumentClassifier ... 281

DescribeDominantLanguageDetectionJob ... 284

DescribeEndpoint ... 287

DescribeEntitiesDetectionJob ... 289

DescribeEntityRecognizer ... 292

DescribeEventsDetectionJob ... 295

DescribeKeyPhrasesDetectionJob ... 297

DescribePiiEntitiesDetectionJob ... 300

DescribeResourcePolicy ... 303

DescribeSentimentDetectionJob ... 306

DescribeTopicsDetectionJob ... 309

DetectDominantLanguage ... 312

DetectEntities ... 315

DetectKeyPhrases ... 319

DetectPiiEntities ... 322

DetectSentiment ... 324

DetectSyntax ... 327

ImportModel ... 330

ListDocumentClassificationJobs ... 334

ListDocumentClassifiers ... 337

ListDocumentClassifierSummaries ... 340

ListDominantLanguageDetectionJobs ... 342

ListEndpoints ... 345

ListEntitiesDetectionJobs ... 348

ListEntityRecognizers ... 351

ListEntityRecognizerSummaries ... 355

ListEventsDetectionJobs ... 357

ListKeyPhrasesDetectionJobs ... 360

ListPiiEntitiesDetectionJobs ... 363

ListSentimentDetectionJobs ... 366

ListTagsForResource ... 369

ListTopicsDetectionJobs ... 371

PutResourcePolicy ... 374

StartDocumentClassificationJob ... 377

StartDominantLanguageDetectionJob ... 382

StartEntitiesDetectionJob ... 387

StartEventsDetectionJob ... 392

StartKeyPhrasesDetectionJob ... 396

StartPiiEntitiesDetectionJob ... 401

StartSentimentDetectionJob ... 405

StartTopicsDetectionJob ... 410

StopDominantLanguageDetectionJob ... 415

StopEntitiesDetectionJob ... 417

StopEventsDetectionJob ... 419

StopKeyPhrasesDetectionJob ... 421

StopPiiEntitiesDetectionJob ... 423

StopSentimentDetectionJob ... 425

StopTrainingDocumentClassifier ... 427

StopTrainingEntityRecognizer ... 429

TagResource ... 431

UntagResource ... 433

UpdateEndpoint ... 435

Data Types ... 437

AugmentedManifestsListItem ... 439

BatchDetectDominantLanguageItemResult ... 441

BatchDetectEntitiesItemResult ... 442

BatchDetectKeyPhrasesItemResult ... 443

BatchDetectSentimentItemResult ... 444

BatchDetectSyntaxItemResult ... 445

BatchItemError ... 446

ClassifierEvaluationMetrics ... 447

ClassifierMetadata ... 449

DocumentClass ... 450

DocumentClassificationJobFilter ... 451

DocumentClassificationJobProperties ... 452

DocumentClassifierFilter ... 455

DocumentClassifierInputDataConfig ... 456

DocumentClassifierOutputDataConfig ... 458

DocumentClassifierProperties ... 459

DocumentClassifierSummary ... 463

DocumentLabel ... 465

DocumentReaderConfig ... 466

DominantLanguage ... 467

DominantLanguageDetectionJobFilter ... 468

DominantLanguageDetectionJobProperties ... 469

EndpointFilter ... 472

EndpointProperties ... 473

EntitiesDetectionJobFilter ... 476

EntitiesDetectionJobProperties ... 477

Entity ... 480

EntityLabel ... 482

EntityRecognizerAnnotations ... 483

EntityRecognizerDocuments ... 484

EntityRecognizerEntityList ... 485

EntityRecognizerEvaluationMetrics ... 486

EntityRecognizerFilter ... 487

EntityRecognizerInputDataConfig ... 488

EntityRecognizerMetadata ... 490

EntityRecognizerMetadataEntityTypesListItem ... 491

EntityRecognizerProperties ... 492

EntityRecognizerSummary ... 495

EntityTypesEvaluationMetrics ... 497

EntityTypesListItem ... 498

EventsDetectionJobFilter ... 499

EventsDetectionJobProperties ... 500

InputDataConfig ... 503

KeyPhrase ... 505

KeyPhrasesDetectionJobFilter ... 506

KeyPhrasesDetectionJobProperties ... 507

OutputDataConfig ... 510

PartOfSpeechTag ... 511

PiiEntitiesDetectionJobFilter ... 512

PiiEntitiesDetectionJobProperties ... 513

PiiEntity ... 516

PiiOutputDataConfig ... 518

RedactionConfig ... 519

SentimentDetectionJobFilter ... 520

SentimentDetectionJobProperties ... 521

SentimentScore ... 524

SyntaxToken ... 525

Tag ... 526

TopicsDetectionJobFilter ... 527

TopicsDetectionJobProperties ... 528

VpcConfig ... 531

Common Errors ... 531

Common Parameters ... 533

Document History ... 536

What Is Amazon Comprehend?

Amazon Comprehend uses natural language processing (NLP) to extract insights about the content of documents. Amazon Comprehend processes any text file in UTF-8 format, image files (JPG, PNG, or TIFF), and semi-structured documents (PDF or Word files). It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document. Use Amazon Comprehend to create new products based on understanding the structure of documents. For example, using Amazon Comprehend you can search social networking feeds for mentions of products or scan an entire document repository for key phrases.

Topics

• Amazon Comprehend Insights (p. 1)

• Comprehend Custom (p. 1)

• Document Clustering (Topic Modeling) (p. 2)

• Examples (p. 2)

• Benefits (p. 2)

• Are You a First-time User of Amazon Comprehend? (p. 3)

Amazon Comprehend Insights

You work with one or more documents at a time to evaluate their content and gain insights about them.

Some of the insights that Amazon Comprehend develops about a document include:

Entities – Amazon Comprehend returns a list of entities, such as people, places, and locations, identified in a document. For more information, see Detect Entities (p. 155).

Key phrases – Amazon Comprehend extracts key phrases that appear in a document. For example, a document about a basketball game might return the names of the teams, the name of the venue, and the final score. For more information, see Detect Key Phrases (p. 162).

PII – Amazon Comprehend analyzes documents to detect personal data that could be used to identify an individual, such as an address, bank account number, or phone number. For more information, see Detect Personally Identifiable Information (PII) (p. 167).

Language – Amazon Comprehend identifies the dominant language in a document. Amazon Comprehend can identify 100 languages. For more information, see Detect the Dominant Language (p. 163).

Sentiment – Amazon Comprehend determines the emotional sentiment of a document. Sentiment can be positive, neutral, negative, or mixed. For more information, see Determine Sentiment (p. 172).

Syntax – Amazon Comprehend parses each word in your document and determines the part of speech for the word. For example, in the sentence "It is raining today in Seattle," "it" is identified as a pronoun, "raining" is identified as a verb, and "Seattle" is identified as a proper noun. For more information, see Analyze Syntax (p. 173).

Comprehend Custom

Customize Comprehend for your specific requirements without the skillset required to build machine learning-based NLP solutions. Using automatic machine learning, or AutoML, Comprehend Custom builds customized NLP models on your behalf, using data you already have.

Custom Classification – Create custom document classifiers to organize your documents into your own categories. For each classification label, provide a set of documents that best represent that label and train your classifier on it. Once trained, a classifier can be used on any number of unlabeled document sets. You can use the console for a code-free experience or install the latest AWS SDK. For more information, see Custom Classification (p. 90).

Custom Entities – Create custom entity types that analyze text for your specific terms and noun-based phrases. You can train custom entities to extract terms like policy numbers, or phrases that imply a customer escalation. To train the model, you provide a list of the entities and a set of documents that contain them. Once the model is trained, you can submit analysis jobs against it to extract your custom entities. For more information, see Custom Entity Recognition (p. 109).

Document Clustering (Topic Modeling)

You can also use Amazon Comprehend to examine a corpus of documents to organize them based on similar keywords within them. Document clustering (topic modeling) is useful to organize a large corpus of documents into topics or clusters that are similar based on the frequency of words within them.

Topic modeling is an asynchronous process: you submit a set of documents for processing and get the results later, when processing is complete. Amazon Comprehend performs topic modeling on large document sets. For best results, include at least 1,000 documents when you submit a topic modeling job. For more information, see Topic Modeling (p. 175).

Examples

The following examples show how you might use the Amazon Comprehend operations in your applications.

Example 1: Find documents about a subject

Find the documents about a particular subject using Amazon Comprehend topic modeling. Scan a set of documents to determine the topics discussed, and to find the documents associated with each topic. You can specify the number of topics that Amazon Comprehend should return from the document set.
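As a rough illustration of Example 1, the following Python sketch assembles the parameters for a topic modeling job and, optionally, submits it with the AWS SDK for Python (boto3). The bucket paths, role ARN, and helper names are placeholders invented for this sketch; the parameter names follow the StartTopicsDetectionJob operation listed in the API Reference.

```python
def build_topics_job_request(input_s3_uri, output_s3_uri, role_arn, number_of_topics=10):
    """Assemble keyword arguments for a topic modeling job (hypothetical helper)."""
    if number_of_topics < 1:
        raise ValueError("number_of_topics must be a positive integer")
    return {
        # ONE_DOC_PER_LINE treats every line of each input file as one document.
        "InputDataConfig": {"S3Uri": input_s3_uri, "InputFormat": "ONE_DOC_PER_LINE"},
        "OutputDataConfig": {"S3Uri": output_s3_uri},
        "DataAccessRoleArn": role_arn,
        "NumberOfTopics": number_of_topics,
    }


def start_topics_job(request):
    """Submit the job; needs boto3 and AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the request builder works without the SDK

    client = boto3.client("comprehend")
    return client.start_topics_detection_job(**request)["JobId"]


# Placeholder values for illustration only.
request = build_topics_job_request(
    "s3://example-bucket/input/",
    "s3://example-bucket/output/",
    "arn:aws:iam::111122223333:role/ExampleComprehendRole",
    number_of_topics=5,
)
```

Because the job is asynchronous, the returned job ID would then be used to check the job status before reading results from the output location.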

Example 2: Find out how customers feel about your products

If your company publishes a catalog, let Amazon Comprehend tell you what customers think of your products. Send each customer comment to the DetectSentiment operation and it will tell you whether customers feel positive, negative, neutral, or mixed about a product.
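One way to picture Example 2 is the Python sketch below, which sends each comment to DetectSentiment through boto3 and tallies the results. The helper names and the sample responses are made up for illustration; the sample dictionaries only mimic the `Sentiment` field of a real response.

```python
from collections import Counter


def tally_sentiment(responses):
    """Count how many responses fall under each sentiment label."""
    return Counter(response["Sentiment"] for response in responses)


def analyze_comments(comments, language_code="en"):
    """Call DetectSentiment once per comment; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so tally_sentiment can be tried offline

    client = boto3.client("comprehend")
    return [
        client.detect_sentiment(Text=comment, LanguageCode=language_code)
        for comment in comments
    ]


# Hand-written stand-ins for service responses, not real output.
sample_responses = [
    {"Sentiment": "POSITIVE"},
    {"Sentiment": "NEGATIVE"},
    {"Sentiment": "POSITIVE"},
    {"Sentiment": "MIXED"},
]
summary = tally_sentiment(sample_responses)
```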

Example 3: Discover what matters to your customers

Use Amazon Comprehend topic modeling to discover the topics that your customers are talking about on your forums and message boards, then use entity detection to determine the people, places, and things that they associate with the topic. Finally, use sentiment analysis to determine how your customers feel about a topic.

Benefits

Some of the benefits of using Amazon Comprehend include:

Integrate powerful natural language processing into your apps—Amazon Comprehend removes the complexity of building text analysis capabilities into your applications by making powerful and accurate natural language processing available with a simple API. You don't need textual analysis expertise to take advantage of the insights that Amazon Comprehend produces.

Deep learning based natural language processing—Amazon Comprehend uses deep learning technology to accurately analyze text. Our models are constantly trained with new data across multiple domains to improve accuracy.

Scalable natural language processing—Amazon Comprehend enables you to analyze millions of documents so that you can discover the insights that they contain.

Integrate with other AWS services—Amazon Comprehend is designed to work seamlessly with other AWS services like Amazon S3, AWS KMS, and AWS Lambda. Store your documents in Amazon S3, or analyze real-time data with Kinesis Data Firehose. Support for AWS Identity and Access Management (IAM) makes it easy to securely control access to Amazon Comprehend operations. Using IAM, you can create and manage AWS users and groups to grant the appropriate access to your developers and end users.

Encryption of output results and volume data—Amazon S3 already enables you to encrypt your input documents, and Amazon Comprehend extends this even further. By using your own KMS key, you can encrypt not only the output results of your job, but also the data on the storage volume attached to the compute instance that processes the analysis job. The result is significantly enhanced security.

Low cost—With Amazon Comprehend, you only pay for the documents that you analyze. There are no minimum fees or upfront commitments.

Are You a First-time User of Amazon Comprehend?

If you are a first-time user of Amazon Comprehend, we recommend that you read the following sections in order:

1. How It Works (p. 4) – This section introduces Amazon Comprehend concepts.

2. Getting Started with Amazon Comprehend (p. 7) – In this section, you set up your account and test Amazon Comprehend.

3. API Reference (p. 232) – In this section, you'll find reference documentation for Amazon Comprehend operations.

How It Works

Amazon Comprehend uses a pre-trained model to examine and analyze a document or set of documents to gather insights about it. This model is continuously trained on a large body of text so that there is no need for you to provide training data.

Amazon Comprehend can examine and analyze documents in a variety of languages, depending on the specific feature. For more information, see Amazon Comprehend supported languages.

Additionally, Amazon Comprehend's Detect the Dominant Language (p. 163) operation can examine documents and determine the dominant language out of a far wider variety of different languages. For more information, see Languages Supported in Amazon Comprehend (p. 5).

With Amazon Comprehend, you can perform the following on your documents:

• Detect the Dominant Language (p. 163) — Examine text to determine the dominant language.

• Detect Entities (p. 155) — Detect textual references to the names of people, places, and items as well as references to dates and quantities.

• Detect Key Phrases (p. 162) — Find key phrases such as "good morning" in a document or set of documents.

• Detect Personally Identifiable Information (PII) (p. 167) — Analyze documents to detect personal data that could be used to identify an individual, such as an address, bank account number, or phone number.

• Determine Sentiment (p. 172) — Analyze documents and determine the dominant sentiment of the text.

• Analyze Syntax (p. 173) — Parse the words in your text and show the part of speech for each word, which helps you understand the content of the document.

• Topic Modeling (p. 175) — Search the content of documents to determine common themes and topics.

Each operation can be processed in several ways:

• Single-Document Processing (p. 179) — You call Amazon Comprehend with a single document and receive a synchronous response.

• Multiple Document Synchronous Processing (p. 184) — You call Amazon Comprehend with a collection of up to 25 documents and receive a synchronous response.

• Asynchronous Batch Processing (p. 179) — You put a collection of documents into an Amazon S3 bucket and start an asynchronous operation to analyze the documents. The results of the analysis are returned in an S3 bucket.
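The 25-document limit on multiple-document synchronous calls means larger collections have to be split up by the caller. A minimal Python sketch of that splitting follows; `batches` and `batch_detect_sentiment` are hypothetical helper names, and the boto3 call is shown only as one possible use.

```python
MAX_BATCH_SIZE = 25  # limit stated above for multiple-document synchronous calls


def batches(documents, size=MAX_BATCH_SIZE):
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(documents), size):
        yield documents[start:start + size]


def batch_detect_sentiment(documents, language_code="en"):
    """Analyze any number of documents, 25 at a time; requires boto3 and credentials."""
    import boto3  # imported lazily so the splitting logic works without the SDK

    client = boto3.client("comprehend")
    results = []
    for group in batches(documents):
        response = client.batch_detect_sentiment(TextList=group, LanguageCode=language_code)
        # Each result's Index is relative to its own batch, not the full list.
        results.extend(response["ResultList"])
    return results


documents = [f"document {i}" for i in range(60)]
batch_sizes = [len(group) for group in batches(documents)]
```

For collections much larger than this, the asynchronous mode with documents in Amazon S3 is likely the better fit.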

Each operation can be encrypted both during communication and processing.

By using the integrated AWS KMS encryption, you maintain control over who can access your encrypted data.

You can optionally provide a custom KMS key when you create your analysis job, and your data will be encrypted on the storage volume attached to the ML compute instance processing the job. You can also provide a key to encrypt your output results as they are sent to the S3 bucket. If you have set up encryption on the S3 bucket that holds your input documents, this can provide you with end-to-end security.

For more information, see KMS Encryption in Amazon Comprehend (p. 195).
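To make the two optional keys concrete, the Python sketch below adds them to the parameters of an asynchronous analysis job. It assumes the `VolumeKmsKeyId` and `OutputDataConfig.KmsKeyId` parameters accepted by the Start*Job operations; the helper name, bucket paths, role ARN, and key IDs are placeholders.

```python
def with_encryption(job_request, volume_kms_key_id=None, output_kms_key_id=None):
    """Return a copy of a job request with optional KMS keys added (hypothetical helper)."""
    request = dict(job_request)
    request["OutputDataConfig"] = dict(job_request["OutputDataConfig"])
    if volume_kms_key_id:
        # Encrypts data on the storage volume attached to the compute instance.
        request["VolumeKmsKeyId"] = volume_kms_key_id
    if output_kms_key_id:
        # Encrypts the results written to the output S3 location.
        request["OutputDataConfig"]["KmsKeyId"] = output_kms_key_id
    return request


# Placeholder values for illustration only.
base_request = {
    "InputDataConfig": {"S3Uri": "s3://example-bucket/input/", "InputFormat": "ONE_DOC_PER_LINE"},
    "OutputDataConfig": {"S3Uri": "s3://example-bucket/output/"},
    "DataAccessRoleArn": "arn:aws:iam::111122223333:role/ExampleComprehendRole",
    "LanguageCode": "en",
}
encrypted_request = with_encryption(
    base_request,
    volume_kms_key_id="example-volume-key-id",
    output_kms_key_id="example-output-key-id",
)
```

The resulting dictionary could be passed as keyword arguments to a call such as boto3's `start_sentiment_detection_job`.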

Languages Supported in Amazon Comprehend

Amazon Comprehend supports a wide variety of languages for its various features. The languages supported and the features that support them can be seen in the following tables.

Topics

• Supported Languages (p. 5)

• Languages Supported by Amazon Comprehend Features (p. 5)

Supported Languages

Amazon Comprehend (except the Detect Dominant Language feature) supports the following languages for one or more features.

Code | Language
de | German
en | English
es | Spanish
it | Italian
pt | Portuguese
fr | French
ja | Japanese
ko | Korean
hi | Hindi
ar | Arabic
zh | Chinese (simplified)
zh-TW | Chinese (traditional)

Note: Amazon Comprehend identifies the language using identifiers from RFC 5646. If there is a 2-letter ISO 639-1 identifier, with a regional subtag if necessary, it uses that. Otherwise, it uses the ISO 639-2 3-letter code. For more information about RFC 5646, see the IETF Tools web site.

Languages Supported by Amazon Comprehend Features

Feature | Supported Languages
Detect the Dominant Language (p. 163) | See Detect the Dominant Language (p. 163).
Detect Entities (p. 155) | All supported languages.
Detect Key Phrases (p. 162) | All supported languages.
Detect Personally Identifiable Information (PII) (p. 167) | English
Label Documents with Personally Identifiable Information (PII) (p. 171) | English
Determine Sentiment (p. 172) | All supported languages.
Analyze Syntax (p. 173) | German (de), English (en), Spanish (es), French (fr), Italian (it), and Portuguese (pt).
Topic Modeling (p. 175) | Not dependent on the language used. Does not support character-based languages such as Chinese, Japanese, and Korean.
Custom Classification (p. 90) | German (de), English (en), Spanish (es), French (fr), Italian (it), and Portuguese (pt).
Custom Entity Recognition (p. 109) | German (de), English (en), Spanish (es), French (fr), Italian (it), and Portuguese (pt).
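An application that accepts text in several languages may want to check the tables above before calling a feature. The Python sketch below encodes them as plain sets. The feature keys and the `supports` helper are names chosen for this sketch, and the data reflects only what this guide lists, so it would need updating if the service adds languages.

```python
# Language codes from the "Supported Languages" table.
ALL_LANGUAGES = {"de", "en", "es", "it", "pt", "fr", "ja", "ko", "hi", "ar", "zh", "zh-TW"}
# The six languages listed for syntax analysis and the custom features.
SIX_LANGUAGES = {"de", "en", "es", "fr", "it", "pt"}

# Feature keys are hypothetical labels, not API operation names.
FEATURE_LANGUAGES = {
    "entities": ALL_LANGUAGES,
    "key_phrases": ALL_LANGUAGES,
    "pii": {"en"},
    "sentiment": ALL_LANGUAGES,
    "syntax": SIX_LANGUAGES,
    "custom_classification": SIX_LANGUAGES,
    "custom_entity_recognition": SIX_LANGUAGES,
}


def supports(feature, language_code):
    """Return True if the table lists `language_code` for `feature`."""
    return language_code in FEATURE_LANGUAGES[feature]
```

For example, `supports("sentiment", "ja")` is true, while `supports("syntax", "ja")` is false because Japanese is not among the six syntax languages.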

Getting Started with Amazon Comprehend

To get started using Amazon Comprehend, set up an AWS account and create an AWS Identity and Access Management (IAM) user. To use Amazon Comprehend from the AWS Command Line Interface (AWS CLI), download and configure the AWS CLI.

Topics

• Step 1: Set Up an AWS Account and Create an Administrator User (p. 7)

• Step 2: Set Up the AWS Command Line Interface (AWS CLI) (p. 8)

• Step 3: Getting Started Using the Amazon Comprehend Console (p. 9)

• Step 4: Getting Started Using the Amazon Comprehend API (p. 30)

• Solution: Analyzing Text with Amazon Comprehend and Amazon OpenSearch (p. 65)

Step 1: Set Up an AWS Account and Create an Administrator User

Before you use Amazon Comprehend for the first time, complete the following tasks:

1. Sign Up for AWS (p. 7)

2. Create an IAM User (p. 7)

Sign Up for AWS

When you sign up for Amazon Web Services (AWS), your AWS account is automatically signed up for all AWS services, including Amazon Comprehend. You are charged only for the services that you use.

With Amazon Comprehend, you pay only for the resources that you use. If you are a new AWS customer, you can get started with Amazon Comprehend for free. For more information, see AWS Free Usage Tier.

If you already have an AWS account, skip to the next section.

To create an AWS account

1. Open https://portal.aws.amazon.com/billing/signup.

2. Follow the online instructions.

Part of the sign-up procedure involves receiving a phone call and entering a verification code on the phone keypad.

Record your AWS account ID because you'll need it for the next task.

Create an IAM User

Services in AWS, such as Amazon Comprehend, require that you provide credentials when you access them. This allows the service to determine whether you have permissions to access the service's resources.

We strongly recommend that you access AWS using AWS Identity and Access Management (IAM), not the credentials for your AWS account. To use IAM to access AWS, create an IAM user, add the user to an IAM group with administrative permissions, and then grant administrative permissions to the IAM user. You can then access AWS using a special URL and the IAM user's credentials.

The Getting Started exercises in this guide assume that you have a user with administrator privileges, adminuser.

To create an administrator user and sign in to the console

1. Create an administrator user called adminuser in your AWS account. For instructions, see Creating Your First IAM User and Administrators Group in the IAM User Guide.

2. Sign in to the AWS Management Console using a special URL. For more information, see How Users Sign In to Your Account in the IAM User Guide.

For more information about IAM, see the following:

• AWS Identity and Access Management (IAM)

• Getting started

• IAM User Guide

Next Step

Step 2: Set Up the AWS Command Line Interface (AWS CLI) (p. 8)

Step 2: Set Up the AWS Command Line Interface (AWS CLI)

You don't need the AWS CLI to perform the steps in the Getting Started exercises. However, some of the other exercises in this guide do require it. If you prefer, you can skip this step and go to Step 3: Getting Started Using the Amazon Comprehend Console (p. 9), and set up the AWS CLI later.

To set up the AWS CLI

1. Download and configure the AWS CLI. For instructions, see the following topics in the AWS Command Line Interface User Guide:

• Getting Set Up with the AWS Command Line Interface

• Configuring the AWS Command Line Interface

2. In the AWS CLI config file, add a named profile for the administrator user:

[profile adminuser]
aws_access_key_id = adminuser access key ID
aws_secret_access_key = adminuser secret access key
region = aws-region

You use this profile when executing the AWS CLI commands. For more information about named profiles, see Named Profiles in the AWS Command Line Interface User Guide. For a list of AWS Regions, see Regions and Endpoints in the Amazon Web Services General Reference.

3. Verify the setup by typing the following help command at the command prompt:

aws help
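Beyond `aws help`, it can be useful to confirm that the named profile from the steps above is well formed. The Python sketch below parses config text with the standard-library `configparser` module and returns the profile's settings. The `profile_settings` helper and the example values are invented for illustration, and reading your actual config file is left to the caller.

```python
import configparser


def profile_settings(config_text, profile="adminuser"):
    """Return the settings of a named profile from AWS CLI config text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = f"profile {profile}"  # named profiles use a "profile " prefix in the config file
    if not parser.has_section(section):
        raise KeyError(f"no [{section}] section found")
    return dict(parser[section])


# Example config text with placeholder credentials, not real keys.
example_config = """
[profile adminuser]
aws_access_key_id = EXAMPLE_ACCESS_KEY_ID
aws_secret_access_key = EXAMPLE_SECRET_ACCESS_KEY
region = us-east-1
"""
settings = profile_settings(example_config)
```

With boto3 installed, the same profile could then be selected with `boto3.Session(profile_name="adminuser")`.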

Next Step

Step 3: Getting Started Using the Amazon Comprehend Console (p. 9)

Step 3: Getting Started Using the Amazon Comprehend Console

The easiest way to get started using Amazon Comprehend is to use the console to analyze a short text file. If you haven't reviewed the concepts and terminology in How It Works (p. 4), we recommend that you do that before proceeding.

Topics

• Analyzing Documents Using the Console (p. 9)

• Creating and Using Custom Entity Recognizer (p. 16)

• Creating and Using Custom Classifiers (p. 20)

• Model Versioning with Amazon Comprehend (p. 25)

• Creating a Topic Modeling Job Using the Console (p. 27)

• Creating an Events Detection Job Using the Console (p. 29)

Analyzing Documents Using the Console

The Amazon Comprehend console enables you to analyze the contents of documents up to 5,000 characters long. The results are shown in the console so that you can review the analysis.

To start analyzing documents, sign in to the AWS Management Console and open the Amazon Comprehend console.

You can replace the sample text with your own text either in English or one of the other languages supported by Amazon Comprehend and then choose Analyze to get an analysis of your text. Below the text being analyzed, the Results pane shows more information about the text.

Entities

The Entities tab lists each entity that Amazon Comprehend detected in the input text, along with its category and confidence level. The results are color-coded to indicate different entity types such as organizations, locations, dates, and persons. For more information, see Detect Entities (p. 155).
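The same information shown on the Entities tab is available programmatically. The Python sketch below filters a DetectEntities-style response by confidence score. The sample response is hand-written with invented scores, and `confident_entities` is a hypothetical helper; only the field names (`Entities`, `Text`, `Type`, `Score`) follow the API's response shape.

```python
def confident_entities(response, min_score=0.9):
    """Return (text, type, score) for entities at or above a confidence threshold."""
    return [
        (entity["Text"], entity["Type"], entity["Score"])
        for entity in response["Entities"]
        if entity["Score"] >= min_score
    ]


# Hand-written sample shaped like a DetectEntities response (not real output).
sample_response = {
    "Entities": [
        {"Text": "Amazon", "Type": "ORGANIZATION", "Score": 0.98},
        {"Text": "Seattle", "Type": "LOCATION", "Score": 0.95},
        {"Text": "next week", "Type": "DATE", "Score": 0.62},
    ]
}
high_confidence = confident_entities(sample_response)
```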

Key phrases

The Key phrases tab lists key noun phrases that Amazon Comprehend detected in the input text and the associated confidence level. For more information, see Detect Key Phrases (p. 162).

Language

The Language tab shows the dominant language of the text and Amazon Comprehend's level of confidence that it has detected the dominant language correctly. Amazon Comprehend can recognize 100 languages. For more information, see Detect the Dominant Language (p. 163).

PII

The PII tab lists entities in your input text that contain personally identifiable information (PII). A PII entity is a textual reference to personal data that could be used to identify an individual, such as an address, bank account number, or phone number. For more information, see Detect Personally Identifiable Information (PII) (p. 167).

The PII tab provides two analysis modes:

• Offsets

• Labels

Offsets

The Offsets analysis mode identifies the location of PII in your text documents. For more information, see Locate PII Entities (p. 169).
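Offsets make it straightforward to mask PII in your own code. The Python sketch below replaces each detected span with asterisks, assuming entities shaped like those returned by DetectPiiEntities (`Type`, `BeginOffset`, `EndOffset`). The sample text and entities are made up, and `redact` is a hypothetical helper rather than part of any SDK.

```python
def redact(text, pii_entities, mask="*"):
    """Replace every character inside a PII span with the mask character."""
    characters = list(text)
    for entity in pii_entities:
        # EndOffset is treated as exclusive, matching Python slice semantics.
        for index in range(entity["BeginOffset"], entity["EndOffset"]):
            characters[index] = mask
    return "".join(characters)


# Hand-written example; offsets were counted by hand for this ASCII string.
sample_text = "Call Jane at 555-0100."
sample_entities = [
    {"Type": "NAME", "BeginOffset": 5, "EndOffset": 9},
    {"Type": "PHONE", "BeginOffset": 13, "EndOffset": 21},
]
redacted = redact(sample_text, sample_entities)
```

Masking character by character keeps the text length unchanged, so the offsets of other entities remain valid.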

Labels

The Labels analysis mode checks for the presence of PII in your text document and returns the labels of identified PII entity types. For more information, see Label Documents with PII Entity Types (p. 171).

Sentiment

The Sentiment tab shows the overall emotional sentiment of the text. Sentiment can be rated neutral, positive, negative, or mixed. Each sentiment has a confidence rating, which is Amazon Comprehend's estimate of how likely that sentiment is to be the dominant one. For more information, see Determine Sentiment (p. 172).


Syntax

The Syntax tab shows a breakdown of each element in the text, along with its part of speech and the associated confidence score. For more information, see Analyze Syntax (p. 173).
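The part-of-speech breakdown can be filtered in code as well. The sketch below assumes that `response` has the shape returned by the DetectSyntax operation, where each token in `SyntaxTokens` carries a `PartOfSpeech` entry with a `Tag` and a `Score`. The helper name `tokens_with_tag` is illustrative.

```python
def tokens_with_tag(response, tag):
    """Return the text of every token whose part-of-speech tag equals tag.

    response is assumed to contain a "SyntaxTokens" list in which each
    token has "Text" and "PartOfSpeech" ({"Tag": ..., "Score": ...}).
    """
    return [
        token["Text"]
        for token in response["SyntaxTokens"]
        if token["PartOfSpeech"]["Tag"] == tag
    ]
```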


Creating and Using Custom Entity Recognizer

You can create custom entity recognizers using the Amazon Comprehend console. This section shows you how to create and train a custom entity recognizer and then how to create an entity recognizer job.

Creating a Custom Entity Recognizer Using the Console - CSV Format

To create the custom entity recognizer, first provide a dataset to train your model. With this dataset, include one of the following: a set of annotated documents, or a list of entities and their type labels along with a set of documents containing those entities. For more information, see Custom Entity Recognition (p. 109).

To train a custom entity recognizer with a CSV file

1. Sign in to the AWS Management Console and open the Amazon Comprehend console.


2. From the left menu, choose Customization and then choose Custom entity recognition.

3. Choose Create new model.

4. Give the recognizer a name. The name must be unique within the Region and account.

5. Select the language.

6. Under Custom entity type, enter a custom label that you want the recognizer to find in the dataset.

The entity type must be uppercase, and if it consists of more than one word, separate the words with an underscore.

7. Choose Add type.

8. If you want to add an additional entity type, enter it, and then choose Add type. If you want to remove one of the entity types you've added, choose Remove type and then choose the entity type to remove from the list. A maximum of 25 entity types can be listed.

9. To encrypt your training job, choose Recognizer encryption and then choose whether to use a KMS key associated with the current account, or one from another account.

• If you are using a key associated with the current account, for KMS key ID choose the key ID.

• If you are using a key associated with a different account, for KMS key ARN enter the ARN for the key ID.

Note

For more information on creating and using KMS keys and the associated encryption, see Key Management Service (KMS).

10. Under Data specifications, choose the format of your training documents:

• CSV file: A CSV file that supplements your training documents. The CSV file contains information about the custom entities that your trained model will detect. The required format of the file depends on whether you are providing annotations or an entity list.

• Augmented manifest: A labeled dataset that is produced by Amazon SageMaker Ground Truth. This file is in JSON lines format. Each line is a complete JSON object that contains a training document and its labels. Each label annotates a named entity in the training document. You can provide up to 5 augmented manifest files.

For more information about available formats, and for examples, see Training custom entity recognizers (p. 109).

11. Under Training type, choose the training type to use:

• Using annotations and training docs

• Using entity list and training docs

If choosing annotations, enter the URL of the annotations file in Amazon S3. You can also navigate to the bucket or folder in Amazon S3 where the annotation files are located and choose Browse S3.

If choosing entity list, enter the URL of the entity list in Amazon S3. You can also navigate to the bucket or folder in Amazon S3 where the entity list is located and choose Browse S3.

12. Enter the URL of an input dataset containing the training documents in Amazon S3. You can also navigate to the bucket or folder in Amazon S3 where the training documents are located and choose Select folder.

13. Under Test dataset, select how you want to evaluate the performance of your trained model. You can do this for both annotations and entity list training types.

• Autosplit: Automatically selects 10% of your provided training data to use as testing data.


• (Optional) Customer provided: When you select customer provided, you can specify exactly what test data you want to use.

14. If you select Customer provided test dataset, enter the URL of the annotations file in Amazon S3.

You can also navigate to the bucket or folder in Amazon S3 where the annotation files are located and choose Select folder.

15. In the Choose an IAM role section, either select an existing IAM role or create a new one.

Choose an existing IAM role – Select this option if you already have an IAM role with permissions to access the input and output Amazon S3 buckets.

Create a new IAM role – Select this option when you want to create a new IAM role with the proper permissions for Amazon Comprehend to access the input and output buckets.

Note

If the input documents are encrypted, the IAM role used must have kms:Decrypt permission. For more information, see Permissions Required to Use KMS Encryption (p. 207).

16. (Optional) To launch your resources into Amazon Comprehend from a VPC, enter the VPC ID under VPC or choose the ID from the drop-down list.

1. Choose the subnet under Subnet(s). After you select the first subnet, you can choose additional ones.

2. Under Security Group(s), choose the security group to use if you specified one. After you select the first security group, you can choose additional ones.

Note

When you use a VPC with your custom entity recognition job, the DataAccessRole used for the Create and Start operations must have permissions to the VPC from which the input documents and the output bucket are accessed.

17. (Optional) To add a tag to the custom entity recognizer, enter a key-value pair under Tags. Choose Add tag. To remove this pair before creating the recognizer, choose Remove tag.

18. Choose Train.

The new recognizer then appears in the list, showing its status. It first shows as Submitted. It then shows Training for a recognizer that is processing training documents, Trained for a recognizer that is ready to use, and In error for a recognizer that has an error. You can choose a recognizer to get more information about it, including any error messages.
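The choices made in the CSV procedure above map roughly onto the `InputDataConfig` structure of the CreateEntityRecognizer operation. The following sketch builds that structure and applies the naming and count rules stated in the steps. The function name, the S3 URIs in the usage note, and the exact validation logic are assumptions for illustration. The service performs its own validation.

```python
MAX_ENTITY_TYPES = 25  # limit stated in the procedure above


def recognizer_input_config(entity_types, documents_s3_uri,
                            annotations_s3_uri=None, entity_list_s3_uri=None):
    """Build an InputDataConfig dict for CSV-based recognizer training.

    Exactly one of annotations_s3_uri or entity_list_s3_uri must be given,
    matching the two training types offered in the console.
    """
    if not 1 <= len(entity_types) <= MAX_ENTITY_TYPES:
        raise ValueError("provide between 1 and 25 entity types")
    for name in entity_types:
        # Uppercase, with underscores rather than spaces between words.
        if not name or name != name.upper() or any(ch.isspace() for ch in name):
            raise ValueError(f"invalid entity type: {name!r}")
    if (annotations_s3_uri is None) == (entity_list_s3_uri is None):
        raise ValueError("provide exactly one of annotations or entity list")

    config = {
        "DataFormat": "COMPREHEND_CSV",
        "EntityTypes": [{"Type": name} for name in entity_types],
        "Documents": {"S3Uri": documents_s3_uri},
    }
    if annotations_s3_uri is not None:
        config["Annotations"] = {"S3Uri": annotations_s3_uri}
    else:
        config["EntityList"] = {"S3Uri": entity_list_s3_uri}
    return config
```

With a boto3 client, the result could be passed as the `InputDataConfig` argument of `create_entity_recognizer`, together with `RecognizerName`, `LanguageCode`, and `DataAccessRoleArn`.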

Creating a Custom Entity Recognizer Using the Console - Augmented Manifest

To train a custom entity recognizer with a plain text, PDF, or Word document

1. Sign in to the AWS Management Console and open the Amazon Comprehend console.

2. From the left menu, choose Customization and then choose Custom entity recognition.

3. Choose Train recognizer.

4. Give the recognizer a name. The name must be unique within the Region and account.

5. Select the language. Note: If you're training with PDF or Word documents, English is the supported language.

6. Under Custom entity type, enter a custom label that you want the recognizer to find in the dataset.

The entity type must be uppercase, and if it consists of more than one word, separate the words with an underscore.

7. Choose Add type.


8. If you want to add an additional entity type, enter it, and then choose Add type. If you want to remove one of the entity types you've added, choose Remove type and then choose the entity type to remove from the list. A maximum of 25 entity types can be listed.

9. To encrypt your training job, choose Recognizer encryption and then choose whether to use a KMS key associated with the current account, or one from another account.

• If you are using a key associated with the current account, for KMS key ID choose the key ID.

• If you are using a key associated with a different account, for KMS key ARN enter the ARN for the key ID.

Note

For more information on creating and using KMS keys and the associated encryption, see Key Management Service (KMS).

10. Under Training data, choose Augmented manifest as your data format:

Augmented manifest: A labeled dataset that is produced by Amazon SageMaker Ground Truth. This file is in JSON lines format. Each line in the file is a complete JSON object that contains a training document and its labels. Each label annotates a named entity in the training document.

You can provide up to 5 augmented manifest files. If you are using PDF documents for training data, you must select Augmented manifest.

For each file, you can name up to 5 attributes to use as training data.

For more information about available formats, and for examples, see Training custom entity recognizers (p. 109).

11. Select the training model type.

If you selected Plain text documents, under Input location, enter the Amazon S3 URL of the Amazon SageMaker Ground Truth augmented manifest file. You can also navigate to the bucket or folder in Amazon S3 where the augmented manifest files are located and choose Select folder.

12. Under Attribute name, enter the name of the attribute that contains your annotations. If the file contains annotations from multiple chained labeling jobs, add an attribute for each job. In this case, each attribute contains the set of annotations from a labeling job. Note: You can provide up to 5 attribute names for each file.

13. Select Add.

14. If you selected PDF, Word documents, under Input location, enter the Amazon S3 URL of the Amazon SageMaker Ground Truth augmented manifest file. You can also navigate to the bucket or folder in Amazon S3 where the augmented manifest files are located and choose Select folder.

15. Enter the S3 prefix for your Annotation data files. These are the PDF documents that you labeled.

16. Enter the S3 prefix for your Source documents. These are the original PDF documents (data objects) that you provided to Ground Truth for your labeling job.

17. Enter the attribute names that contain your annotations. Note: You can provide up to 5 attribute names for each file. Any attributes in your file that you don't specify are ignored.

18. In the IAM role section, either select an existing IAM role or create a new one.

Choose an existing IAM role – Select this option if you already have an IAM role with permissions to access the input and output Amazon S3 buckets.

Create a new IAM role – Select this option when you want to create a new IAM role with the proper permissions for Amazon Comprehend to access the input and output buckets.

Note

If the input documents are encrypted, the IAM role used must have kms:Decrypt permission. For more information, see Permissions Required to Use KMS Encryption (p. 207).


19. (Optional) To launch your resources into Amazon Comprehend from a VPC, enter the VPC ID under VPC or choose the ID from the drop-down list.

1. Choose the subnet under Subnet(s). After you select the first subnet, you can choose additional ones.

2. Under Security Group(s), choose the security group to use if you specified one. After you select the first security group, you can choose additional ones.

Note

When you use a VPC with your custom entity recognition job, the DataAccessRole used for the Create and Start operations must have permissions to the VPC from which the input documents and the output bucket are accessed.

20. (Optional) To add a tag to the custom entity recognizer, enter a key-value pair under Tags. Choose Add tag. To remove this pair before creating the recognizer, choose Remove tag.

21. Choose Train.

The new recognizer then appears in the list, showing its status. It first shows as Submitted. It then shows Training for a recognizer that is processing training documents, Trained for a recognizer that is ready to use, and In error for a recognizer that has an error. You can choose a recognizer to get more information about it, including any error messages.
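For plain text training data, the augmented manifest settings in the procedure above correspond roughly to the `AugmentedManifests` list in the `InputDataConfig` of the CreateEntityRecognizer operation. The sketch below enforces the two limits mentioned in the steps (up to 5 manifest files and up to 5 attribute names per file). The function name and the input shape (a list of `(s3_uri, attribute_names)` pairs) are hypothetical.

```python
MAX_MANIFESTS = 5   # manifest file limit stated in the procedure above
MAX_ATTRIBUTES = 5  # attribute name limit per manifest file


def augmented_manifest_config(entity_types, manifests):
    """Build an InputDataConfig dict for augmented manifest training.

    manifests is a list of (s3_uri, attribute_names) pairs, one per
    SageMaker Ground Truth augmented manifest file.
    """
    if not 1 <= len(manifests) <= MAX_MANIFESTS:
        raise ValueError("provide between 1 and 5 augmented manifest files")
    entries = []
    for s3_uri, attribute_names in manifests:
        if not 1 <= len(attribute_names) <= MAX_ATTRIBUTES:
            raise ValueError("provide between 1 and 5 attribute names per file")
        entries.append({"S3Uri": s3_uri, "AttributeNames": list(attribute_names)})
    return {
        "DataFormat": "AUGMENTED_MANIFEST",
        "EntityTypes": [{"Type": name} for name in entity_types],
        "AugmentedManifests": entries,
    }
```

For chained labeling jobs, each job's attribute name would be listed in `attribute_names` for the same manifest file.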

Creating and Using Custom Classifiers

You can create and train custom classifiers using the console, and then run asynchronous classification jobs to analyze your documents. You can also add an endpoint to the same custom model and run custom classification requests to gain real-time (synchronous) insights about your text. This section shows you how to create a classifier using the console, how to use it to run an asynchronous classification job, and how to create an endpoint for it and run a real-time classification request.

Topics

• Creating a Custom Classifier (Console) (p. 20)

• Running an Asynchronous Custom Classification Job (p. 22)

• Creating a Real-time Custom Classification Request (p. 24)

Creating a Custom Classifier (Console)

Create a custom document classifier to identify the categories of a set of documents.

To train the classifier, you need a set of training documents. You label these documents with the categories that you want the document classifier to recognize. For more information on these training documents, see Custom Classification (p. 90).

To train a document classifier

1. Sign in to the AWS Management Console and open the Amazon Comprehend console.

2. From the left menu, choose Customization and then choose Custom Classification.

3. Choose Create new model.

4. Give the classifier a name. The name must be unique within your account and current Region.

5. Select the language of the training documents. You can train a document classifier using any of the languages that work with Amazon Comprehend. However, you can only train the classifier in one language. To learn more, see Languages Supported by Amazon Comprehend (p. 5).


6. (Optional) If you want to encrypt the data in the storage volume while your training job is being processed, choose Classifier encryption and then choose whether to use a KMS key associated with your current account, or one from another account.

• If you are using a key associated with the current account, choose the key ID for KMS key ID.

• If you are using a key associated with a different account, enter the ARN for the key ID under KMS key ARN.

Note

For more information on creating and using KMS keys and the associated encryption, see Key Management Service (KMS).

7. Under Data specifications, choose which classifier mode to use.

• Single-label mode: Choose this option if the categories you are assigning to documents are mutually exclusive and you are training your classifier to assign one and only one label to each document.

• Multi-label mode: Choose this option if multiple categories can be applied to a document at the same time and you are training your classifier to assign one, many, all, or no labels to each document.

8. If you chose Multi-label mode, for Delimiter for labels, choose the character that separates labels when a line contains more than one label.

9. Under Data format, choose the format of your training documents:

• CSV file: A two-column CSV file, where labels are provided in the first column, and documents are provided in the second.

• Augmented manifest: A labeled dataset that is produced by Amazon SageMaker Ground Truth. This file is in JSON lines format. Each line is a complete JSON object that contains a training document and its associated labels.

For more information about these formats, and for examples, see Training a Custom Classifier (p. 91).

10. Under Training dataset, enter the location of the Amazon S3 bucket that contains your training documents or navigate to it by choosing Select folder. The IAM role you're using for access permissions for the training job must have reading permissions for the S3 bucket.

11. Under Test dataset, select how you want to evaluate the performance of your trained model. You can do this for both annotations and entity list training types.

• Autosplit: Automatically selects 10% of your provided training data to use as testing data.

• (Optional) Customer provided: When you select customer provided, you can specify exactly what test data you want to use. If you select Customer provided test dataset, enter the URL of the annotations file in Amazon S3. You can also navigate to the bucket or folder in Amazon S3 where the annotation files are located and choose Select folder.

12. (Optional) If you want Amazon Comprehend to create a confusion matrix that provides metrics on how well the classifier performed during training, enter the location of an Amazon S3 bucket where it will be saved. For more information, see Confusion Matrix (p. 106).

(Optional) If you choose to encrypt the output result from your training job, choose Encryption and then choose whether to use a KMS key associated with the current account, or one from another account.

• If you are using a key associated with the current account, choose the key alias for KMS key ID.

• If you are using a key associated with a different account, enter the ARN for the key alias or ID under KMS key ID.


13. Select Choose an existing IAM role, and then choose an existing IAM role that has read permissions for the S3 bucket that contains your training documents. Only roles that have a trust policy that begins with comprehend.amazonaws.com are valid.

If you don't already have an IAM role with these permissions, choose Create an IAM role to make one. Choose the access permissions to grant this role, and then choose a name suffix to distinguish the role from IAM roles in your account.

Note

If the input documents are encrypted, the IAM role used must also have kms:Decrypt permission. For more information, see Permissions Required to Use KMS Encryption (p. 207).

14. (Optional) To launch your resources into Amazon Comprehend from a VPC, enter the VPC ID under VPC or choose the ID from the drop-down list.

1. Choose the subnet under Subnet(s). After you select the first subnet, you can choose additional ones.

2. Under Security Group(s), choose the security group to use if you specified one. After you select the first security group, you can choose additional ones.

Note

When you use a VPC with your classification job, the DataAccessRole used for the Create and Start operations must have permissions to the VPC from which the input documents and the output bucket are accessed.

15. (Optional) To add a tag to the custom classifier, enter a key-value pair under Tags. Choose Add tag. To remove this pair before creating the classifier, choose Remove tag. For more information, see Tagging your resources (p. 191).

16. Choose Create.

The new classifier will then appear in the list, showing its status. It will first show as Submitted. It will then show Training for a classifier that is processing training documents, Trained for a classifier that is ready to use, and In error for a classifier that has an error. You can click on a job to get more information about the classifier, including any error messages.
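The two-column CSV format and the classifier mode choice from the procedure above can be sketched in Python as follows. The helper names, the `|` delimiter default, and the values in the usage note are assumptions made for illustration. `MULTI_CLASS` and `MULTI_LABEL` are the API values that correspond to single-label and multi-label mode.

```python
import csv
import io


def training_csv(rows, label_delimiter="|"):
    """Render training rows as two-column CSV: labels first, document second.

    rows is an iterable of (labels, document_text) pairs, where labels is
    a list of strings. Multiple labels are joined with label_delimiter.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for labels, document in rows:
        writer.writerow([label_delimiter.join(labels), document])
    return buffer.getvalue()


def classifier_request(name, language_code, role_arn, training_s3_uri,
                       multi_label=False, label_delimiter="|"):
    """Build keyword arguments for the CreateDocumentClassifier operation."""
    input_config = {"DataFormat": "COMPREHEND_CSV", "S3Uri": training_s3_uri}
    if multi_label:
        input_config["LabelDelimiter"] = label_delimiter
    return {
        "DocumentClassifierName": name,
        "LanguageCode": language_code,
        "DataAccessRoleArn": role_arn,
        "InputDataConfig": input_config,
        "Mode": "MULTI_LABEL" if multi_label else "MULTI_CLASS",
    }
```

With a boto3 client, the request could be submitted as `client.create_document_classifier(**classifier_request(...))` after the CSV file has been uploaded to the S3 location named in `training_s3_uri`.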

Running an Asynchronous Custom Classification Job

Once you have created a custom document classifier, you can use it to categorize a group of documents.


To create a custom asynchronous classification job

1. Sign in to the AWS Management Console and open the Amazon Comprehend console.

2. From the left menu, choose Customization and then choose Custom classification.

3. Choose Create job.

4. Give the classification job a name. The name must be unique within your account and current Region.

5. Under Analysis type, choose Custom classification.

6. From Select classifier, choose the custom classifier to use.

7. (Optional) If you choose to encrypt the data in the storage volume while your classification job is processed, choose Job encryption and then choose whether to use a KMS key associated with the current account, or one from another account.

• If you are using a key associated with the current account, choose the key ID for KMS key ID.

• If you are using a key associated with a different account, enter the ARN for the key ID under KMS key ARN.

Note

For more information on creating and using KMS keys and the associated encryption, see Key Management Service (KMS).

8. Under Input data, enter the location of the Amazon S3 bucket that contains your input documents or navigate to it by choosing Select folder. This bucket must be in the same region as the API that you are calling. The IAM role you're using for access permissions for the classification job must have reading permissions for the S3 bucket.

9. (Optional) Choose the format of the documents to be classified under Input format. These can be one document per file, or one document per line in a single file.

10. Under Output data, enter the location of the Amazon S3 bucket where Amazon Comprehend should write the job's output data or navigate to it by choosing Select folder. This bucket must be in the same region as the API that you are calling. The IAM role you're using for access permissions for the classification job must have write permissions for the S3 bucket.

11. (Optional) If you choose to encrypt the output result from your job, choose Encryption and then choose whether to use a KMS key associated with the current account, or one from another account.

• If you are using a key associated with the current account, choose the key alias or ID for KMS key ID.

• If you are using a key associated with a different account, enter the ARN for the key alias or ID under KMS key ID.

12. (Optional) To launch your resources into Amazon Comprehend from a VPC, enter the VPC ID under VPC or choose the ID from the drop-down list.

1. Choose the subnet under Subnet(s). After you select the first subnet, you can choose additional ones.

2. Under Security Group(s), choose the security group to use if you specified one. After you select the first security group, you can choose additional ones.

Note

When you use a VPC with your classification job, the DataAccessRole used for the Create and Start operations must have permissions to the VPC from which the input documents and the output bucket are accessed.

13. Choose Create job to create the document classification job.
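The required settings from the job procedure above correspond roughly to the arguments of the StartDocumentClassificationJob operation. The following sketch assembles them. The function name is hypothetical, and the optional encryption and VPC settings are omitted for brevity.

```python
def classification_job_request(job_name, classifier_arn, role_arn,
                               input_s3_uri, output_s3_uri,
                               one_doc_per_line=False):
    """Build keyword arguments for an asynchronous classification job.

    one_doc_per_line selects between the two input formats described in
    the procedure: one document per file, or one document per line.
    """
    input_format = "ONE_DOC_PER_LINE" if one_doc_per_line else "ONE_DOC_PER_FILE"
    return {
        "JobName": job_name,
        "DocumentClassifierArn": classifier_arn,
        "DataAccessRoleArn": role_arn,
        "InputDataConfig": {"S3Uri": input_s3_uri, "InputFormat": input_format},
        "OutputDataConfig": {"S3Uri": output_s3_uri},
    }
```

With a boto3 client, this could be submitted as `client.start_document_classification_job(**request)`.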


Creating a Real-time Custom Classification Request

In addition to using the custom document classifier to run asynchronous jobs, you can also use it to run synchronous custom classification requests to gain real-time insight into the categories in your document. To do this, first create an endpoint and set its level of data throughput, and then run the real-time analysis.

Note

Using real-time analysis will result in additional cost to your account. This cost is determined by how long the endpoint is operating and the level of throughput you set.

The level of throughput assigned to an endpoint is measured in inference units, each of which represents data throughput of 100 characters per second. You can provision the endpoint with up to 10 inference units. This level of throughput can be adjusted to meet your needs by updating the endpoint.

Once you have completed your real-time analysis, you should delete the endpoint because the charge for it will continue as long as it's active. You can easily create another endpoint whenever you need it.
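The throughput arithmetic in the note above can be worked through in a few lines. This sketch uses only the two figures given in the text (100 characters per second per inference unit, and at most 10 units per endpoint). The function name is illustrative.

```python
import math

CHARS_PER_SECOND_PER_UNIT = 100  # throughput of one inference unit
MAX_INFERENCE_UNITS = 10         # per-endpoint limit stated above


def inference_units_needed(chars_per_second):
    """Return the smallest number of inference units for a target throughput."""
    if chars_per_second <= 0:
        raise ValueError("throughput must be positive")
    units = math.ceil(chars_per_second / CHARS_PER_SECOND_PER_UNIT)
    if units > MAX_INFERENCE_UNITS:
        raise ValueError("target exceeds the throughput of a single endpoint")
    return units
```

For example, a workload that peaks at 250 characters per second needs 3 inference units, because 2 units provide only 200 characters per second.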

To create an endpoint

1. Sign in to the AWS Management Console and open the Amazon Comprehend console.

2. From the left menu, choose Customization and then choose Custom Classification.

3. From the Classifiers list, choose the name of the custom model for which you want to create the endpoint and follow the link. The Endpoints list on the custom model details page is displayed.

Note

Previously created endpoints are shown on the model's details page, along with the model with which they're associated.

4. Under Endpoints, choose Create endpoint.

5. Give the endpoint a name. The name must be unique within the AWS Region and account.

6. Enter the number of inference units to assign to the endpoint. Each unit represents a throughput of 100 characters per second. You can assign up to a maximum of 10 inference units per endpoint.

7. (Optional) To add a tag to the endpoint, enter a key-value pair under Tags and choose Add tag. To remove this pair before creating the endpoint, choose Remove tag.

8. Choose Create endpoint. The Endpoints list is displayed, with the new endpoint showing Creating.

Once it shows Ready, the endpoint can be used for real-time analysis.
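Creating an endpoint and waiting for it to become usable can also be scripted. The following is a minimal sketch, assuming that `client` is a boto3 Amazon Comprehend client and that the console's Ready state corresponds to the `IN_SERVICE` status reported by the DescribeEndpoint operation. The function name, polling interval, and retry count are illustrative.

```python
import time


def create_endpoint_and_wait(client, name, model_arn, inference_units=1,
                             poll_seconds=30, max_polls=40, sleep=time.sleep):
    """Create an endpoint and poll until it is in service.

    Returns the endpoint ARN. Raises RuntimeError if creation fails and
    TimeoutError if the endpoint is not ready after max_polls checks.
    """
    endpoint_arn = client.create_endpoint(
        EndpointName=name,
        ModelArn=model_arn,
        DesiredInferenceUnits=inference_units,
    )["EndpointArn"]
    for _ in range(max_polls):
        properties = client.describe_endpoint(EndpointArn=endpoint_arn)
        status = properties["EndpointProperties"]["Status"]
        if status == "IN_SERVICE":
            return endpoint_arn
        if status == "FAILED":
            raise RuntimeError(f"endpoint creation failed: {endpoint_arn}")
        sleep(poll_seconds)
    raise TimeoutError(f"endpoint not ready after {max_polls} checks")
```

The `sleep` parameter exists only so that the wait can be replaced in tests. In normal use the default is sufficient.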

To run a real-time custom classification request

1. Sign in to the AWS Management Console and open the Amazon Comprehend console.

2. From the left menu, choose Real-time analysis.

3. Under Input type, choose Custom for Analysis type.

4. For Select endpoint, choose the endpoint that you want to use. This endpoint is linked to a specific custom model.

5. Enter the text you want to analyze.

6. Choose Analyze. The text analysis based on your custom model is displayed, along with a confidence assessment of the analysis.
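The real-time request above has a programmatic counterpart in the ClassifyDocument operation. The sketch below returns the highest-scoring category. It assumes that `client` is a boto3 Amazon Comprehend client and that the response lists results under `Classes` (single-label models) or `Labels` (multi-label models). The function name is hypothetical.

```python
def top_class(client, endpoint_arn, text):
    """Return (name, score) of the best category, or None if there is none.

    `client` is assumed to be a boto3 Comprehend client, and endpoint_arn
    the ARN of an endpoint that is ready for real-time analysis.
    """
    response = client.classify_document(Text=text, EndpointArn=endpoint_arn)
    candidates = response.get("Classes") or response.get("Labels") or []
    if not candidates:
        return None
    best = max(candidates, key=lambda item: item["Score"])
    return best["Name"], best["Score"]
```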

To update your endpoint

1. Sign in to the AWS Management Console and open the Amazon Comprehend console.


2. From the left menu, choose Customization and then choose Custom classification.

3. From the Classifiers list, choose the name of the custom model for which you want to update the endpoint and follow the link. The custom model details page is displayed.

4. Navigate to the Endpoints list, choose the name of the endpoint you want to update and follow the link.

5. Choose Edit.

6. Enter the updated number of inference units to assign to the endpoint. Each unit represents a throughput of 100 characters per second. You can assign up to a maximum of 10 inference units per endpoint.

Note

The cost of using an endpoint is based on the amount of time it operates and its throughput (based on the number of inference units). Increasing the number of inference units therefore increases the cost of operation. For more information, see Amazon Comprehend Pricing.

7. Choose Edit endpoint. The endpoint details page is displayed.

8. Confirm that the endpoint is updating by choosing the model name from the breadcrumbs at the top of the page. On the custom model details page, navigate to the Endpoints list and verify that it shows Updating next to the endpoint. When the update is complete, it will show Ready.

To delete your endpoint

1. Sign in to the AWS Management Console and open the Amazon Comprehend console.

2. From the left menu, choose Customization and then choose Custom classification.

3. From the Classifiers list, choose the name of the custom model associated with the endpoint you want to delete and follow the link. The custom model details page is displayed.

4. Navigate to the Endpoints list, choose the name of the endpoint to delete and follow the link.

5. Choose Delete.

Note

All endpoints associated with a custom model must be deleted before that model itself can be removed.

6. Choose Delete again to confirm the deletion. The custom model details page is displayed. Confirm that the endpoint you deleted shows deleting next to it. When it's deleted, the endpoint is removed from the Endpoints list.
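Because every endpoint must be deleted before its model can be removed, it can be convenient to delete all of a model's endpoints in one pass. The following is a minimal sketch, assuming that `client` is a boto3 Amazon Comprehend client. The function name is illustrative. Deletion is asynchronous, so the model itself can only be removed after the endpoints have finished deleting.

```python
def delete_model_endpoints(client, model_arn):
    """Request deletion of every endpoint attached to model_arn.

    Returns the ARNs for which deletion was requested. Endpoints that are
    already being deleted are skipped.
    """
    requested = []
    kwargs = {"Filter": {"ModelArn": model_arn}}
    while True:
        page = client.list_endpoints(**kwargs)
        for endpoint in page["EndpointPropertiesList"]:
            if endpoint.get("Status") == "DELETING":
                continue
            client.delete_endpoint(EndpointArn=endpoint["EndpointArn"])
            requested.append(endpoint["EndpointArn"])
        token = page.get("NextToken")
        if not token:
            return requested
        kwargs["NextToken"] = token
```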

Model Versioning with Amazon Comprehend

Artificial intelligence and machine learning (AI/ML) is all about rapid experimentation. With Amazon Comprehend, you train and build models that you use to gain insight into your data. With model versioning, you can keep track of your modeling history and the scores associated with your models as you provide more or different sets of data. You can use versioning with your custom classification models or your custom entity recognition models. By comparing your versions over time, you can see how well each one performed and which parameters led to your best results.

When you train a new version of an existing custom classifier model or entity recognition model, all you need to do is create a new version from the model details page and all the details populate for you. The new version will have the same name as your earlier model — what we call the versionID — although you will give it a unique version name during creation. As you add new versions to a model, you can see all the previous versions and their details in one view from the model details page. With versioning, you can see how model performance changes as you make changes to your training dataset.
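Comparing versions can be done outside the console too. The sketch below ranks the versions of a custom classifier by F1 score. It assumes that the input is shaped like the `DocumentClassifierPropertiesList` returned by the ListDocumentClassifiers operation, where each item may carry a `VersionName` and `ClassifierMetadata.EvaluationMetrics.F1Score`. The function name is hypothetical, and F1 is only one of several metrics you might compare.

```python
def versions_by_f1(classifier_properties_list):
    """Return (version_name, f1_score) pairs sorted from best to worst.

    Entries without an F1 score, such as versions that are still
    training, are skipped.
    """
    scored = []
    for properties in classifier_properties_list:
        metrics = properties.get("ClassifierMetadata", {}).get("EvaluationMetrics", {})
        if "F1Score" in metrics:
            scored.append((properties.get("VersionName", ""), metrics["F1Score"]))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```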


Create a new Custom classifier version (console)

1. Sign in to the AWS Management Console and open the Amazon Comprehend console.

2. From the left menu, choose Customization and then choose Custom classification.

3. From the Classifiers list, choose the name of the custom model from which you want to create a new version. The custom model details page is displayed.

4. On the top right, select Create new model. A screen opens with prepopulated details from the parent custom classification model.

5. Under Version name add a unique name to the new version.

6. Under version details, you can change the language and number of labels associated with your new model.

7. Under the Data specifications section, configure how you want to provide the data to your new version. Make sure to provide the full dataset, which includes documents from your previous model and your new documents. You can change the Classifier mode (single-label or multi-label), Data format (CSV file or Augmented manifest), your Training dataset, and your Test dataset (autosplit or your custom test data configuration).

8. (Optional) Update the S3 location for your output data.

9. Under Access permissions, create or use an existing IAM role.

10. (Optional) Update your VPC settings.


11. (Optional) Add tags to your new version to help keep track of the details.

For more information about creating custom classifiers, see Custom Classification (p. 90) and Creating and Using Custom Classifiers (p. 20).

Create a new Custom entity recognizer version (console)

1. Sign in to the AWS Management Console and open the Amazon Comprehend console.

2. From the left menu, choose Customization and then choose Custom entity recognition.

3. From the Recognizer model list, choose the name of the recognizer from which you want to create a new version. The details page is displayed.

4. On the top right, select Train new version. A screen opens with prepopulated details from the parent entity recognizer.

5. Under Version name add a unique name to the new version.

6. Under Custom entity type, add the custom label or labels that you want the recognizer to identify in your dataset and select Add type. Choose a custom entity type from the annotations or entity list you've provided. The recognizer will then use all of the included entity types to identify entities in the dataset when running your job. Each entity type must be uppercase, with words separated by an underscore if it uses multiple words. A maximum of 25 types is allowed.

7. (Optional) Select Recognizer encryption to encrypt the data in the storage volume while your job is being processed.

8. Under the Training data section, specify the annotation and data format details (CSV file or Augmented manifest), your Training dataset, and your Test dataset (autosplit or your custom test data configuration).

9. (Optional) Update the S3 location for your output data.

10. Under Access permissions, create or use an existing IAM role.

11. (Optional) Update your VPC settings.

12. (Optional) Add tags to your new version to help keep track of the details.

To learn more about custom entity recognizers, see Custom Entity Recognition (p. 109) and Creating a Custom Entity Recognizer Using the Console (p. 16).

Creating a Topic Modeling Job Using the Console

You can use the Amazon Comprehend console to create and manage asynchronous topic detection jobs.

To create a topic modeling job

1. Sign in to the AWS Management Console and open the Amazon Comprehend console.

2. From the left menu, choose Analysis Jobs and then choose Create.

3. Under Job settings, give the job a name. The name must be unique within the region and account.

4. For Analysis Type, choose Topic Modeling.

5. (Optional) If you choose to encrypt the data in the storage volume while your job is processed, choose Job encryption and then choose whether to use a KMS key associated with the current account, or one from another account.

• If you are using a key associated with the current account, for KMS key ID choose the key ID.

• If you are using a key associated with a different account, for KMS key ARN enter the ARN for the key ID.

Note

For more information on creating and using KMS keys and the associated encryption, see Key Management Service (KMS).

6. Choose the data source to use. You can use either sample data or you can analyze your own data stored in an Amazon S3 bucket.

If you choose to use your own data, provide the following information:

S3 data location – An Amazon S3 data bucket that contains the documents to analyze. You can choose the folder icon to browse to the location of your data. The bucket must be in the same region as the API that you are calling.

Input format – Optionally choose whether input data is contained in one document per file, or if there is one document per line in a file.

Number of topics – The number of topics to return.

7. (Optional) If you choose to encrypt the output result from your job, choose Encryption and then choose whether to use a KMS key associated with the current account, or one from another account.

• If you are using a key associated with the current account, for KMS key ID choose the key alias or ID.

• If you are using a key associated with a different account, for KMS key ID enter the ARN for the key alias or ID.

8. In the Choose an IAM role section, either select an existing IAM role or create a new one.

Choose an existing IAM role – Choose this option if you already have an IAM role with permissions to access the input and output Amazon S3 buckets.

Create a new IAM role – Choose this option when you want to create a new IAM role with the proper permissions for Amazon Comprehend to access the input and output buckets. For more information about the permissions given to the IAM role, see Role-Based Permissions Required for Asynchronous Operations (p. 210).

Note

If the input documents are encrypted, the IAM role used must have KMS:Decrypt permission. For more information, see Permissions Required to Use KMS Encryption (p. 207).

9. (Optional) To launch your resources into Amazon Comprehend from a VPC, enter the VPC ID under VPC or choose the ID from the drop-down list.

1. Choose the subnet under Subnet(s). After you select the first subnet, you can choose additional ones.

2. Under Security Group(s), choose the security group to use if you specified one. After you select the first security group, you can choose additional ones.

Note

When you use a VPC with your topic modeling job, the DataAccessRole that is used for the Create and Start operations must have permissions to the VPC from which the input documents and the output bucket are accessed.

10. When you have finished filling out the form, choose Create job to create and start the topic detection job.
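The console fields in the steps above correspond roughly to the parameters of the `StartTopicsDetectionJob` operation. The following is a minimal sketch that only assembles the request as a Python dictionary, so it runs without AWS credentials. The helper name `build_topics_job_request`, the bucket names, and the role ARN are hypothetical placeholders. The dictionary keys are the parameter names that operation accepts.

```python
VALID_INPUT_FORMATS = {"ONE_DOC_PER_FILE", "ONE_DOC_PER_LINE"}


def build_topics_job_request(job_name, input_s3_uri, output_s3_uri, role_arn,
                             number_of_topics=None,
                             input_format="ONE_DOC_PER_FILE",
                             volume_kms_key_id=None, output_kms_key_id=None,
                             subnets=None, security_group_ids=None):
    """Assemble keyword arguments for a topic detection job request."""
    if input_format not in VALID_INPUT_FORMATS:
        raise ValueError(f"unsupported input format: {input_format!r}")

    request = {
        "JobName": job_name,                                   # step 3
        "InputDataConfig": {"S3Uri": input_s3_uri,             # step 6
                            "InputFormat": input_format},
        "OutputDataConfig": {"S3Uri": output_s3_uri},
        "DataAccessRoleArn": role_arn,                         # step 8
    }
    if number_of_topics is not None:
        request["NumberOfTopics"] = number_of_topics           # step 6
    if volume_kms_key_id:
        request["VolumeKmsKeyId"] = volume_kms_key_id          # step 5
    if output_kms_key_id:
        request["OutputDataConfig"]["KmsKeyId"] = output_kms_key_id  # step 7
    if subnets and security_group_ids:
        request["VpcConfig"] = {                               # step 9
            "Subnets": list(subnets),
            "SecurityGroupIds": list(security_group_ids),
        }
    return request


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    request = build_topics_job_request(
        job_name="example-topics-job",
        input_s3_uri="s3://example-input-bucket/docs/",
        output_s3_uri="s3://example-output-bucket/results/",
        role_arn="arn:aws:iam::111122223333:role/ExampleComprehendRole",
        number_of_topics=10,
        input_format="ONE_DOC_PER_LINE",
    )
    print(request)
```

With the AWS SDK for Python installed and credentials configured, the resulting dictionary could be passed as `boto3.client("comprehend").start_topics_detection_job(**request)`. That call is left out of the sketch so that the example stays runnable offline.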

The new job appears in the job list with the status field showing the status of the job. The field can be IN_PROGRESS for a job that is processing, COMPLETED for a job that has finished successfully,
