Amazon Kendra

Developer Guide

Amazon Kendra: Developer Guide

Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be affiliated with, connected to, or sponsored by Amazon.

What is Amazon Kendra?

Amazon Kendra is a highly accurate and intelligent search service that enables your users to search unstructured and structured data using natural language processing and advanced search algorithms. It returns specific answers to questions, giving users an experience that's close to interacting with a human expert. It is highly scalable and capable of meeting performance demands, tightly integrated with other AWS services such as Amazon S3 and Amazon Lex, and offers enterprise-grade security.

Amazon Kendra users can ask the following types of questions, or queries:

• Factoid questions — Simple who, what, when, or where questions, such as Who is on duty today? or Where is the nearest service center to Seattle? Factoid questions have fact-based answers that can be returned in the form of a single word or phrase. The answer is retrieved from a FAQ or from your indexed documents.

• Descriptive questions — Questions whose answer could be a sentence, passage, or an entire document. For example, How do I connect my Echo Plus to my network? or How do I get tax benefits for lower income families?

• Keyword searches — Questions where the intent and scope are not clear. For example, keynote address. Because "address" can have several meanings, Amazon Kendra infers the user's intent behind the search query to return relevant information aligned with the intended meaning. Amazon Kendra uses deep learning models to handle this kind of query.

Benefits of Amazon Kendra

Amazon Kendra has the following benefits:

• Accuracy — Unlike traditional search services that use keyword searches where results are based on basic keyword matching and ranking, Amazon Kendra attempts to understand the context of the question. Amazon Kendra searches across your data and goes beyond traditional search to return the most relevant word, snippet, or document for your query. Amazon Kendra uses machine learning to improve search results over time.

• Simplicity — Amazon Kendra provides a console and API for managing your documents that you want to search. You can use a simple search API to integrate Amazon Kendra into your client applications, such as websites or mobile applications.

• Connectivity — Amazon Kendra can connect to third-party data repositories or data sources such as Microsoft SharePoint. You can easily index and search your documents using your data source.

• User Access Control — Amazon Kendra delivers highly secure enterprise search for your search applications. Your search results reflect the security model of your organization. Customers are responsible for authenticating and authorizing users to gain access to their search application.

Amazon Kendra Developer Edition

The Amazon Kendra Developer Edition provides all of the features of Amazon Kendra at a lower cost. It includes a free tier that provides 750 hours of use. The Developer Edition is ideal for exploring how Amazon Kendra indexes your documents, trying out features, and developing applications that use Amazon Kendra.

The Developer Edition provides the following:

• Up to 5 indexes with up to 5 data sources each.

• 10,000 documents or 3 GB of extracted text.

• Approximately 4,000 queries per day or 0.05 queries per second.

• Runs in 1 availability zone (AZ) – see Availability Zones (data centers in AWS regions)

You should not use the Developer Edition for a production application. The Developer Edition doesn't provide any guarantees of latency or availability.

Amazon Kendra Enterprise Edition

Use Amazon Kendra Enterprise Edition when you want to index your entire enterprise document library or when your application is ready for use in a production environment.

The Enterprise Edition provides the following:

• Up to 5 indexes with up to 50 data sources each.

• 100,000 documents or 30 GB of extracted text.

• Approximately 8,000 queries per day or 0.1 queries per second.

• Runs in 3 availability zones (AZ) – see Availability Zones (data centers in AWS regions)

You can increase this quota using the Service Quotas console.

Pricing for Amazon Kendra

You can get started for free with the Amazon Kendra Developer Edition that provides usage of up to 750 hours for the first 30 days. After your trial expires, you are charged for all provisioned Amazon Kendra indexes, even if they are empty and no queries are executed. After the trial expires, there are additional charges for scanning and syncing documents using the Amazon Kendra data sources.

For a complete list of charges and prices, see Amazon Kendra pricing.

Are you a first-time Amazon Kendra user?

If you are a first-time user of Amazon Kendra, we recommend that you read the following sections in order:

1. How Amazon Kendra works (p. 3) – Introduces the Amazon Kendra components and describes how you use them to create a search solution.

2. Getting started (p. 46) – Explains how to set up your account and test the Amazon Kendra search API.

3. Creating an index (p. 75) – Explains how to use Amazon Kendra to create a search index and to add data sources to sync your documents.

4. Adding documents directly to an index (p. 85) – Explains how to add documents directly to an Amazon Kendra index.

5. Searching indexes (p. 157) – Explains how to use the Amazon Kendra search API to search an index.

6. Deploying Amazon Kendra (p. 36) – Provides a sample application you can use to deploy Amazon Kendra to your website.

How Amazon Kendra works

Amazon Kendra provides the search functionality for your application. It indexes documents that you add directly or that it syncs from your third-party document repository, and it intelligently serves relevant information to your users.

You can use Amazon Kendra to create an updatable index of documents of a variety of types, including plain text, HTML files, Microsoft Word documents, Microsoft PowerPoint presentations, and PDF files.

Amazon Kendra integrates with other services. For example, you can power Amazon Lex chat bots with Amazon Kendra search to provide answers to users' questions. You can use an Amazon S3 bucket as a data source for your Amazon Kendra index. And you can set up AWS Identity and Access Management to control access to Amazon Kendra resources.

Amazon Kendra has the following components:

• The index, which allows your documents to be searched. You create the index from a source or repository of documents.

• A source repository, which contains the documents to index.

• A data source that syncs the documents in your source repository to an Amazon Kendra index. You can automatically synchronize a data source with an Amazon Kendra index so that new, updated, and deleted files in the source repository are updated in the index.

• A document addition API that adds documents directly to the index.

You can use Amazon Kendra through the console or the API. You can create, update, and delete indexes.

Deleting an index deletes all data sources and permanently deletes all of your document information from Amazon Kendra.

Topics

• Index (p. 3)

• Documents (p. 5)

• Data sources (p. 6)

• Queries (p. 7)

• Tags (p. 8)

Index

An index holds the contents of your documents and is structured in a way to make the documents searchable. The way you add documents to the index depends on how you store your documents.

• If you store your documents in some kind of repository, such as an Amazon S3 bucket or a Microsoft SharePoint site, you use a data source connector to index your documents from your repository.

• If you don't store your documents in a repository, you use the BatchPutDocument API to directly index your documents.

• For FAQ questions and answers, which must be stored in an Amazon Simple Storage Service (Amazon S3) bucket, you upload them from the bucket.

You can create indexes with the Amazon Kendra console, the AWS CLI, or an AWS SDK. For information about the types of documents that can be indexed, see Types of documents (p. 5).

Index fields

An index contains fields that you map to the attributes of your document. Attributes could include, for example, the document title, main body text, last updated date, and other attributes contained within the structure of your documents. You can also create custom attributes such as the figure description, or the business department the document is associated with. Index fields, which you map to your document attributes, provide the schema for your index. Amazon Kendra uses the fields to search your documents.

After you map your fields to your document attributes, you can search on the information in those fields.

Amazon Kendra has 15 reserved fields, which you can map to your document attributes:

• _authors – A list of one or more authors responsible for the content of the document.

• _category – A category that places a document in a specific group.

• _created_at – The date and time in ISO 8601 format that the document was created. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30 PM (plus 10 seconds) in Central European Time.

• _data_source_id – The identifier of the data source that contains the document.

• _document_body – The content of the document.

• _document_id – A unique identifier for the document.

• _document_title – The title of the document.

• _excerpt_page_number – The page number in a PDF file where the document excerpt appears. If your index was created before September 8, 2020, you must re-index your documents before you can use this attribute.

• _faq_id – If this is an FAQ question and answer, a unique identifier for them.

• _file_type – The file type of the document, such as pdf or doc.

• _last_updated_at – The date and time in ISO 8601 format that the document was last updated. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30 PM (plus 10 seconds) in Central European Time.

• _source_uri – The URI where the document is available. For example, the URI of the document on a company website.

• _version – An identifier for the specific version of a document.

• _view_count – The number of times that the document has been viewed.

• _language_code (String) – The code for a language that applies to the document. This defaults to English if you do not specify a language. For more information on supported languages, including their codes, see Adding documents in languages other than English.
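The ISO 8601 format used by _created_at and _last_updated_at can be produced with the Python standard library. The following is a minimal sketch; the helper name to_iso8601 is hypothetical, and the fixed +01:00 offset stands in for Central European Time as in the example above.

```python
from datetime import datetime, timedelta, timezone


def to_iso8601(moment: datetime) -> str:
    """Format a timezone-aware datetime as an ISO 8601 string with a UTC offset."""
    if moment.tzinfo is None:
        # A naive datetime has no offset, so the result would be ambiguous.
        raise ValueError("expected a timezone-aware datetime")
    return moment.isoformat(timespec="seconds")


# Fixed +01:00 offset, matching the example value in the field list.
cet = timezone(timedelta(hours=1))
created_at = to_iso8601(datetime(2012, 3, 25, 12, 30, 10, tzinfo=cet))
print(created_at)  # 2012-03-25T12:30:10+01:00
```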

You can also create custom fields, which you can use like the reserved fields for search and display, and to create facets.

There are four types of custom fields:

• Date

• Number

• String

• String list

You create a custom field using the console or by using the UpdateIndex API. After you create a custom field, you map it to a document attribute, just as you do with a reserved field. If you added a document to the index with the BatchPutDocument API, you map the attributes with the API. For documents indexed from an Amazon S3 data source, you map the attributes using a metadata file that contains a JSON structure that describes the document attributes. For documents indexed with a database or a data source that allows field mapping, you map attributes with the console or the data source configuration.

For more information, see Searching indexes.
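As a rough illustration of the two mapping paths described above, the sketch below builds the arguments for an UpdateIndex call that adds one custom field, and the JSON text of a metadata file for a document in an Amazon S3 data source. The helper names and the example field department are hypothetical. The type strings and the DocumentId, Title, and Attributes keys are assumptions based on the service API and metadata file layout; check them against the API reference before relying on them.

```python
import json

# The four custom field types listed above, mapped to the type strings
# that the UpdateIndex API is assumed to use.
FIELD_TYPES = {
    "Date": "DATE_VALUE",
    "Number": "LONG_VALUE",
    "String": "STRING_VALUE",
    "String list": "STRING_LIST_VALUE",
}


def custom_field_update(index_id, name, kind, facetable=False):
    """Build keyword arguments for an UpdateIndex call that adds one custom field."""
    return {
        "Id": index_id,
        "DocumentMetadataConfigurationUpdates": [
            {
                "Name": name,
                "Type": FIELD_TYPES[kind],
                "Search": {
                    "Facetable": facetable,
                    "Searchable": True,
                    "Displayable": True,
                },
            }
        ],
    }


def s3_metadata_file(document_id, title, attributes):
    """Build the JSON text of a metadata file describing one document's attributes."""
    return json.dumps(
        {"DocumentId": document_id, "Title": title, "Attributes": attributes},
        indent=2,
    )


update = custom_field_update("index-id", "department", "String", facetable=True)
metadata = s3_metadata_file("doc-1", "Contract template", {"department": "legal"})
```

With an AWS SDK for Python client, the first dictionary could be passed as `client.update_index(**update)`; the second string would be saved alongside the document in the bucket.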

Searching indexes

After you create an index, you can start searching your documents. For more information, see Searching indexes.

Documents

Amazon Kendra can index many types of documents.

Topics

• Types of documents (p. 5)

• Document attributes (p. 6)

Types of documents

An index can include both structured and unstructured text:

• Structured text

    • Frequently asked questions and answers

• Unstructured text

    • HTML files

    • Microsoft PowerPoint presentations

    • Microsoft Word documents

    • Plain text documents

    • PDFs

You can add documents directly to an index by calling the BatchPutDocument API. You can also add documents from a data source. For information about adding files to a data source, see Adding documents from a data source. For an example that shows how to add Microsoft Word documents directly to an index from an Amazon S3 bucket, see Adding documents from an Amazon S3 bucket.

An index can contain multiple documents and multiple types of documents.

HTML

You add an HTML format file to an index the same way that you add a plain text file.

Plain text

You can add plain text files to an index using the BatchPutDocument API or from a data source. For an example of adding a plain text document directly to an index, see Adding documents with the API.

Microsoft Word document

Microsoft Word format files can be added to an index as binary data, from an Amazon S3 bucket, or from an Amazon Kendra data source.

Microsoft PowerPoint document

Microsoft PowerPoint format files can be added to an index as binary data, from an Amazon S3 bucket, or from an Amazon Kendra data source.

Portable document format (PDF)

PDF format files can be added to an index as binary data, from an Amazon S3 bucket, or from an Amazon Kendra data source.

Frequently asked questions and answers

Frequently asked question and answer format documents are used to answer questions such as How tall is the Space Needle? You can specify multiple questions that return the same answer. You specify the questions and answers in a comma-separated values (CSV) file stored in an Amazon S3 bucket.

For an example, see Adding questions and answers directly to an index.
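To make the CSV layout concrete, the following sketch writes one row per question so that several questions can share an answer. The column order (question, answer, source URI), the absence of a header row, the helper name faq_csv, and the example.com URI are assumptions for illustration; confirm the expected layout in the FAQ documentation.

```python
import csv
import io


def faq_csv(entries):
    """Render (questions, answer, source_uri) entries as CSV text, one row per question."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for questions, answer, source_uri in entries:
        # Repeat the answer for every phrasing of the question.
        for question in questions:
            writer.writerow([question, answer, source_uri])
    return buffer.getvalue()


entries = [
    (
        ["How tall is the Space Needle?", "What is the height of the Space Needle?"],
        "605 feet",
        "https://example.com/space-needle",  # hypothetical source URI
    )
]
faq_text = faq_csv(entries)
print(faq_text)
```

The resulting text would be uploaded to the Amazon S3 bucket that holds your FAQ file.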

Document attributes

A document has attributes associated with it. Attributes are the properties of a document or the elements contained within its structure. For example, each of your documents might contain a title, body text, and an author. You can also add your own custom attributes to your documents.

Custom attributes are attributes that you specify for your own needs. For example, if your index searches tax documents, you might specify a custom attribute for the type of tax document such as W-2, 1099, and so on.

Before you can use a document attribute in a query, it must be mapped to an index field. For example, the title attribute can be mapped to the field _document_title. For more information, see Mapping fields. To add a new attribute, you must create an index field to map the attribute to. You create index fields using the console or by using the UpdateIndex API.

You can use document attributes to filter responses and to make faceted search suggestions. For example, you can filter a response to only return a specific version of a document, or you can filter searches to only return 1099 type of tax documents that match the search term. For more information, see Filtering queries.

You can also use document attributes to manually tune the query response. For example, you can choose to increase the importance of the title field to increase the weight that Amazon Kendra assigns to the field when determining which documents to return in the response. For more information, see Tuning search relevance.

If you are adding a document directly to an index, you specify the attributes in the Document input parameter to the BatchPutDocument (p. 325) API. You specify the custom attribute values in a DocumentAttribute (p. 560) object array. If you are using a data source, the method that you use to add the document attributes depends on the data source. For more information, see Creating custom document attributes (p. 131).
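The following sketch shows one way the Document parameter and its attribute array might be assembled for a plain text document. The helper names, the custom attribute tax_document_type, and the index ID are hypothetical. The client argument is assumed to be an AWS SDK for Python Kendra client (for example, `boto3.client("kendra")`), and the request and response keys are assumed to match the BatchPutDocument reference.

```python
def text_document(doc_id, title, body, attributes):
    """Build one Document entry with string-valued custom attributes."""
    return {
        "Id": doc_id,
        "Title": title,
        "Blob": body.encode("utf-8"),
        "ContentType": "PLAIN_TEXT",
        # Each custom attribute becomes one DocumentAttribute object.
        "Attributes": [
            {"Key": key, "Value": {"StringValue": value}}
            for key, value in attributes.items()
        ],
    }


def index_documents(kendra_client, index_id, documents):
    """Send documents to the index and return the IDs of any that failed."""
    response = kendra_client.batch_put_document(IndexId=index_id, Documents=documents)
    return [failed["Id"] for failed in response.get("FailedDocuments", [])]


document = text_document(
    "w2-2020",
    "W-2 form",
    "Wage and tax statement",
    {"tax_document_type": "W-2"},  # hypothetical custom attribute
)
```

The custom attribute must already be mapped to an index field before it can be used in a query, as described above.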

Data sources

A data source is a location, such as an Amazon Simple Storage Service (Amazon S3) bucket, where you store the documents for indexing. You can automatically synchronize data sources with an Amazon Kendra index so that new, updated, or deleted documents in the data source are also added, updated, or deleted in the index.

Supported data sources are:

• Amazon S3 buckets

• Confluence instances

• Google Workspace Drives

• Amazon RDS for MySQL and Amazon RDS for PostgreSQL databases

• Confluence cloud and Confluence server

• Custom data sources

• Microsoft OneDrive for Business

• Microsoft SharePoint online and SharePoint server (versions 2013 and 2016)

• Salesforce sites

• ServiceNow instances

• Amazon Kendra Web Crawler

• Amazon WorkDocs

• Amazon FSx

Supported document formats are: plain text, Microsoft Word, Microsoft PowerPoint, HTML, and PDF. For more information, see Types of documents (p. 5).

Note

To create an index, you don't need a data source. You can add documents directly to an index.

For more information, see Adding documents directly to an index (p. 85).

To index documents using a data source:

1. Create an index (p. 75).

2. Create a data source (p. 94).

For a walkthrough with the Amazon Kendra console or with the AWS CLI, see Getting started (p. 46).

Queries

To get answers, users query an index. Users can use natural language in their queries. The response contains information, such as the title, a text excerpt, and the location of documents in the index that provide the best answer.

Amazon Kendra uses all of the information that you provide about your documents, not just the contents of the documents, to determine whether a document is relevant to the query. For example, if your index contains information about when documents were last updated, you can tell Amazon Kendra to assign a higher relevance to documents that were updated more recently.

A query can also contain criteria for how to filter the response so that Amazon Kendra returns only documents that satisfy the filter criteria. For example, if you created an index field called department, you can filter the response so that only documents with the department field set to legal are returned.

For more information, see Filtering queries (p. 167).
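A filtered query like the department example might be expressed as follows. This is a sketch: the helper names are hypothetical, and the AttributeFilter shape and the response keys are assumed to match the Query API reference. The field department must exist in your index for the filter to apply.

```python
def filtered_query(index_id, query_text, field, value):
    """Build keyword arguments for a Query call that keeps only documents where field equals value."""
    return {
        "IndexId": index_id,
        "QueryText": query_text,
        "AttributeFilter": {
            "EqualsTo": {"Key": field, "Value": {"StringValue": value}}
        },
    }


def result_summaries(response):
    """Extract (title, excerpt, location) tuples from a query response."""
    return [
        (
            item.get("DocumentTitle", {}).get("Text", ""),
            item.get("DocumentExcerpt", {}).get("Text", ""),
            item.get("DocumentURI", ""),
        )
        for item in response.get("ResultItems", [])
    ]


request = filtered_query("index-id", "How do I file a contract?", "department", "legal")
```

With an AWS SDK for Python client, the request could be sent as `client.query(**request)` and the response passed to result_summaries.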

You can influence the results of a query by tuning the relevance of individual fields in the index. Tuning changes the importance of a field on the results. For example, if you raise the importance of documents with the category new, documents with this category are more likely to be included in the response. For more information, see Tuning search relevance (p. 186).
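The category example could be sketched as an UpdateIndex request that raises the importance of one value of the _category field. The helper name boost_category is hypothetical. The Relevance keys (Importance, ValueImportanceMap) and the 1 to 10 range are assumptions about the API, so verify them against the relevance tuning documentation.

```python
def boost_category(index_id, category, importance=10):
    """Build UpdateIndex arguments that boost documents whose _category equals category."""
    if not 1 <= importance <= 10:
        # Assumed valid range for importance values.
        raise ValueError("importance must be between 1 and 10")
    return {
        "Id": index_id,
        "DocumentMetadataConfigurationUpdates": [
            {
                "Name": "_category",
                "Type": "STRING_VALUE",
                "Relevance": {
                    "Importance": importance,
                    # Per-value boost: only the named category is raised.
                    "ValueImportanceMap": {category: importance},
                },
            }
        ],
    }


tuning = boost_category("index-id", "new")
```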

For more information about using queries, see Searching indexes (p. 157).

Tagging resources

If you're using the Amazon Kendra console, you can tag resources when you create them or add tags later. You can also use the console to update or remove tags.

If you're using the AWS Command Line Interface (AWS CLI) or the Amazon Kendra API, use the following operations to manage tags for your resources:

• CreateDataSource (p. 332) – Apply tags when you create a data source.

• CreateFaq (p. 345) – Apply tags when you create an FAQ.

• CreateIndex (p. 349) – Apply tags when you create an index.

• ListTagsForResource (p. 454) – View the tags associated with a resource.

• TagResource (p. 479) – Add and modify tags for a resource.

• UntagResource (p. 481) – Remove tags from a resource.

Tag restrictions

The following restrictions apply to tags on Amazon Kendra resources:

• Maximum number of tags – 50

• Maximum key length – 128 characters

• Maximum value length – 256 characters

• Valid characters for key and value – a–z, A–Z, space, and the following characters: _ . : / = + - and @

• Keys and values are case sensitive

• Don't use aws: as a prefix for keys; it's reserved for AWS use
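The restrictions above can be checked locally before calling TagResource. The following sketch mirrors the list literally, including the character set exactly as written; the function names are hypothetical, and the service itself remains the authority on what it accepts.

```python
import re

MAX_TAGS = 50
MAX_KEY_LENGTH = 128
MAX_VALUE_LENGTH = 256
# Letters, space, and the characters _ . : / = + - @ as listed above.
VALID_TEXT = re.compile(r"[A-Za-z _.:/=+\-@]*")


def tag_problems(tags):
    """Return a list of problems found in a {key: value} tag mapping; empty means the tags pass."""
    problems = []
    if len(tags) > MAX_TAGS:
        problems.append(f"too many tags: {len(tags)} > {MAX_TAGS}")
    for key, value in tags.items():
        if len(key) > MAX_KEY_LENGTH:
            problems.append(f"key too long: {key!r}")
        if len(value) > MAX_VALUE_LENGTH:
            problems.append(f"value too long for key {key!r}")
        if key.startswith("aws:"):
            problems.append(f"reserved prefix in key {key!r}")
        if not VALID_TEXT.fullmatch(key) or not VALID_TEXT.fullmatch(value):
            problems.append(f"invalid characters in tag {key!r}")
    return problems


def to_tag_list(tags):
    """Convert a {key: value} mapping to the Key/Value list form used by tagging operations."""
    return [{"Key": key, "Value": value} for key, value in tags.items()]


print(tag_problems({"Department": "Legal"}))  # []
```

Because keys and values are case sensitive, Department and department count as two different keys.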

Setting up Amazon Kendra

Before using Amazon Kendra, you must have an Amazon Web Services (AWS) account. After you have an AWS account, you can access Amazon Kendra through the Amazon Kendra console, the AWS Command Line Interface (AWS CLI), or the AWS SDKs.

This guide includes examples for AWS CLI, Java, and Python.

Topics

• Sign up for AWS (p. 9)

• Regions and endpoints (p. 9)

• Setting up the AWS CLI (p. 9)

• Setting up the AWS SDKs (p. 10)

Sign up for AWS

When you sign up for Amazon Web Services (AWS), your account is automatically signed up for all services in AWS, including Amazon Kendra. You are charged only for the services that you use.

If you have an AWS account already, skip to the next task. If you don't have an AWS account, use the following procedure to create one.

To sign up for AWS

1. Open https://aws.amazon.com, and then choose Create an AWS Account.

2. Follow the on-screen instructions to complete the account creation. Note your 12-digit AWS account number. Part of the sign-up procedure involves receiving a phone call and entering a PIN using the phone keypad.

3. Create an AWS Identity and Access Management (IAM) admin user. See Creating Your First IAM User and Group in the AWS Identity and Access Management User Guide for instructions.

Regions and endpoints

An endpoint is a URL that is the entry point for a web service. Each endpoint is associated with a specific AWS region. If you use a combination of the Amazon Kendra console, the AWS CLI, and the AWS SDKs, pay attention to their default regions, because all Amazon Kendra components that you use together (index, data sources, queries, and so on) must be in the same region. For the regions and endpoints supported by Amazon Kendra, see Regions and Endpoints.

Setting up the AWS CLI

The AWS Command Line Interface (AWS CLI) is a unified developer tool for managing AWS services, including Amazon Kendra. We recommend that you install it.

1. To install the AWS CLI, follow the instructions in Installing the AWS Command Line Interface in the AWS Command Line Interface User Guide.

2. To configure the AWS CLI and set up a profile to call the AWS CLI, follow the instructions in Configuring the AWS CLI in the AWS Command Line Interface User Guide.

3. To confirm that the AWS CLI profile is configured properly, run the following command:

aws configure --profile default

If your profile has been configured correctly, you will see output similar to the following:

AWS Access Key ID [****************52FQ]:

AWS Secret Access Key [****************xgyZ]:

Default region name [us-west-2]:

Default output format [json]:

4. To verify that the AWS CLI is configured for use with Amazon Kendra, run the following command:

aws kendra help

If the AWS CLI is configured correctly, you will see a list of the supported AWS CLI commands for Amazon Kendra, Amazon Kendra runtime, and Amazon Kendra events.

Setting up the AWS SDKs

Download and install the AWS SDKs that you want to use. This guide provides examples for Python. For information about other AWS SDKs, see Tools for Amazon Web Services.

IAM access roles for Amazon Kendra

When you create an index, data source, or an FAQ, Amazon Kendra needs access to the AWS resources required to create the Amazon Kendra resource. You must create an AWS Identity and Access Management (IAM) policy before you create the Amazon Kendra resource. When you call the operation, you provide the Amazon Resource Name (ARN) of the role with the policy attached. For example, if you are calling the BatchPutDocument API to add documents from an Amazon S3 bucket, you provide Amazon Kendra with a role with a policy that has access to the bucket.

You can create a new IAM role in the Amazon Kendra console or choose an existing IAM role to use. The console displays roles that have the string "kendra" or "Kendra" in the role name.

The following topics provide details for the required policies. If you create IAM roles using the Amazon Kendra console, these policies are created for you.

Topics

• IAM roles for indexes (p. 11)

• IAM roles for the BatchPutDocument API (p. 13)

• IAM roles for data sources (p. 14)

• IAM roles for frequently asked questions (p. 29)

• IAM roles for query suggestions (p. 30)

• IAM roles for principal mapping of users and groups (p. 31)

• IAM roles for AWS Single Sign-On (p. 32)

• IAM roles for Amazon Kendra experiences (p. 32)

• IAM roles for Custom Document Enrichment (p. 34)

IAM roles for indexes

When you create an index, you must provide an IAM role with permission to write to Amazon CloudWatch. You must also provide a trust policy that allows Amazon Kendra to assume the role. The following are the policies that must be provided.

A role policy to allow Amazon Kendra to access a CloudWatch log.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "cloudwatch:PutMetricData",
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "cloudwatch:namespace": "Kendra"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": "logs:DescribeLogGroups",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "logs:CreateLogGroup",
            "Resource": "arn:aws:logs:region:account ID:log-group:/aws/kendra/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:DescribeLogStreams",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:region:account ID:log-group:/aws/kendra/*:log-stream:*"
        }
    ]
}

A role policy to allow Amazon Kendra to access AWS Secrets Manager. If you are using user context with Secrets Manager as a key location, you can use the following policy.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "cloudwatch:PutMetricData",
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "cloudwatch:namespace": "Kendra"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": "logs:DescribeLogGroups",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "logs:CreateLogGroup",
            "Resource": "arn:aws:logs:region:account ID:log-group:/aws/kendra/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:DescribeLogStreams",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:region:account ID:log-group:/aws/kendra/*:log-stream:*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetSecretValue"
            ],
            "Resource": [
                "arn:aws:secretsmanager:region:account ID:secret:secret_id"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt"
            ],
            "Resource": [
                "arn:aws:kms:region:account ID:key/key_id"
            ],
            "Condition": {
                "StringLike": {
                    "kms:ViaService": [
                        "secretsmanager.*.amazonaws.com"
                    ]
                }
            }
        }
    ]
}

A trust policy to allow Amazon Kendra to assume a role.

{
    "Version": "2012-10-17",
    "Statement": {
        "Effect": "Allow",
        "Principal": {
            "Service": "kendra.amazonaws.com"
        },
        "Action": "sts:AssumeRole"
    }
}
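If you create the role programmatically rather than in the console, the region and account ID placeholders in the policies above need to be filled in. The following sketch generates the CloudWatch role policy and the trust policy as Python dictionaries. The function name, role setup, and the sample region and account ID are hypothetical; the statements are copied from the policies shown in this section.

```python
import json


def index_role_policy(region, account_id):
    """Build the CloudWatch role policy for an index role, filling in region and account ID."""
    log_group = f"arn:aws:logs:{region}:{account_id}:log-group:/aws/kendra/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "cloudwatch:PutMetricData",
                "Resource": "*",
                "Condition": {"StringEquals": {"cloudwatch:namespace": "Kendra"}},
            },
            {"Effect": "Allow", "Action": "logs:DescribeLogGroups", "Resource": "*"},
            {"Effect": "Allow", "Action": "logs:CreateLogGroup", "Resource": log_group},
            {
                "Effect": "Allow",
                "Action": [
                    "logs:DescribeLogStreams",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": f"{log_group}:log-stream:*",
            },
        ],
    }


# Trust policy that lets the Amazon Kendra service assume the role.
KENDRA_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": {
        "Effect": "Allow",
        "Principal": {"Service": "kendra.amazonaws.com"},
        "Action": "sts:AssumeRole",
    },
}

# Sample values for illustration only.
policy_json = json.dumps(index_role_policy("us-west-2", "123456789012"), indent=4)
trust_json = json.dumps(KENDRA_TRUST_POLICY, indent=4)
```

The two JSON strings could then be supplied when creating the role and attaching the policy, for example through the IAM console or an AWS SDK.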

IAM roles for the BatchPutDocument API

Warning

Amazon Kendra does not use a bucket policy that grants permissions to an Amazon Kendra principal to interact with an Amazon S3 bucket. Instead, it uses IAM roles. Make sure that Amazon Kendra is not included as a trusted member in your bucket policy. This helps you avoid data security issues caused by accidentally granting permissions to arbitrary principals.

When you use the BatchPutDocument API to index documents in an Amazon S3 bucket, you must provide Amazon Kendra with an IAM role with access to the bucket. You must also provide a trust policy that allows Amazon Kendra to assume the role. If the documents in the bucket are encrypted, you must provide permission to use the AWS KMS customer master key (CMK) to decrypt the documents.

A required role policy to allow Amazon Kendra to access an Amazon S3 bucket.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::bucket name/*"
            ]
        }
    ]
}

A required trust policy to allow Amazon Kendra to assume a role.

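{
    "Version": "2012-10-17",
    "Statement": {
        "Sid": "AllowKendraToAssumeAttachedRole",
        "Effect": "Allow",
        "Principal": {
            "Service": "kendra.amazonaws.com"
        },
        "Action": "sts:AssumeRole"
    }
}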

An optional role policy to allow Amazon Kendra to use an AWS KMS customer master key (CMK) to decrypt documents in an Amazon S3 bucket.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt"
            ],
            "Resource": [
                "arn:aws:kms:region:account ID:key/key ID"
            ]
        }
    ]
}

IAM roles for data sources

When you use the CreateDataSource API, you must give Amazon Kendra an IAM role that has permission to access the database resources. The specific permissions required depend on the data source.

Topics

• IAM roles for Amazon S3 data sources (p. 15)

• IAM roles for Confluence server data sources (p. 16)

• IAM roles for database data sources (p. 17)

• IAM roles for Google Workspace Drive data sources (p. 18)

• IAM roles for Microsoft OneDrive data sources (p. 19)

• IAM roles for Salesforce data sources (p. 20)

• IAM roles for ServiceNow data sources (p. 21)

• IAM roles for Microsoft SharePoint data sources (p. 22)

• Virtual private cloud (VPC) IAM role (p. 24)

• IAM roles for web crawler data sources (p. 25)

• IAM roles for Amazon WorkDocs data sources (p. 25)

• IAM roles for Amazon FSx data sources (p. 27)

IAM roles for Amazon S3 data sources

Warning

Amazon Kendra does not use a bucket policy that grants permissions to an Amazon Kendra principal to interact with an Amazon S3 bucket. Instead, it uses IAM roles. Make sure that Amazon Kendra is not included as a trusted member in your bucket policy. This helps you avoid data security issues caused by accidentally granting permissions to arbitrary principals.

When you use an Amazon S3 bucket as a data source, you supply a role that has permission to access the bucket, and to use the BatchPutDocument and BatchDeleteDocument operations. If the documents in the Amazon S3 bucket are encrypted, you must provide permission to use the AWS KMS customer master key (CMK) to decrypt the documents.

A required role policy to allow Amazon Kendra to use an Amazon S3 bucket as a data source.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::bucket name/*"
            ],
            "Effect": "Allow"
        },
        {
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::bucket name"
            ],
            "Effect": "Allow"
        },
        {
            "Effect": "Allow",
            "Action": [
                "kendra:BatchPutDocument",
                "kendra:BatchDeleteDocument"
            ],
            "Resource": [
                "arn:aws:kendra:region:account ID:index/index ID"
            ]
        }
    ]
}

An optional role policy to allow Amazon Kendra to use an AWS KMS customer master key (CMK) to decrypt documents in an Amazon S3 bucket.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt"
            ],
            "Resource": [
                "arn:aws:kms:region:account ID:key/key ID"
            ]
        }
    ]
}

IAM roles for Conﬂuence server data sources

When you use a Confluence server as a data source, you provide a role with the following policies:

• Permission to access the AWS Secrets Manager secret that contains the credentials necessary to connect to the Confluence server. For more information about the contents of the secret, see Using an Atlassian Confluence data source (p. 100).

• Permission to use the AWS KMS customer master key (CMK) to decrypt the user name and password secret stored by Secrets Manager.

• Permission to use the BatchPutDocument and BatchDeleteDocument operations to update the index.

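{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetSecretValue"
            ],
            "Resource": [
                "arn:aws:secretsmanager:region:account ID:secret:secret ID"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt"
            ],
            "Resource": [
                "arn:aws:kms:region:account ID:key/key ID"
            ],
            "Condition": {
                "StringLike": {
                    "kms:ViaService": [
                        "secretsmanager.*.amazonaws.com"
                    ]
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "kendra:BatchPutDocument",
                "kendra:BatchDeleteDocument"
            ],
            "Resource": "arn:aws:kendra:region:account ID:index/index ID"
        }
    ]
}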

If you are using a VPC, provide a policy that gives Amazon Kendra access to the required resources. See Virtual private cloud (VPC) IAM role (p. 24) for the required policy.

IAM roles for database data sources

When you use a database as a data source, you provide Amazon Kendra with a role that has the permissions necessary for connecting to the database. These include:

• Permission to access the AWS Secrets Manager secret that contains the user name and password for the database site. For more information about the contents of the secret, see Using a database data source (p. 109).

• Permission to access the Amazon S3 bucket that contains the SSL certificate used to communicate with the database site.

{

],

"Resource": [

] }, {

"kms:Decrypt"

],

"Resource": [

] }, {

],

"Resource": [

"Condition": { "StringLike": {

"kms:ViaService": [

"kendra.amazonaws.com"

] } } }, {

"s3:GetObject"

],

"Resource": [

]


} ] }

There are two optional policies that you might use with a database data source.

If you have encrypted the Amazon S3 bucket that contains the SSL certificate used to communicate with the database, provide a policy to give Amazon Kendra access to the key.

{

"kms:Decrypt"

],

"Resource": [

] } ] }

If you are using a VPC, provide a policy that gives Amazon Kendra access to the required resources. See Virtual private cloud (VPC) IAM role (p. 24) for the required policy.
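As a rough illustration of how the required certificate-bucket permission and the optional key permission fit together, the sketch below returns only the statements that apply. The helper name `database_cert_statements` and its parameters are hypothetical, and the bucket ARN pattern `arn:aws:s3:::bucket/*` is an assumption about how the certificate object is addressed.

```python
def database_cert_statements(cert_bucket, cert_key_arn=None):
    """Return the statements for reading the SSL certificate from Amazon S3.

    cert_key_arn is passed only when the certificate bucket is encrypted
    with an AWS KMS key; otherwise the kms:Decrypt statement is omitted.
    """
    statements = [
        {
            # Required: read the SSL certificate object from the bucket.
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": [f"arn:aws:s3:::{cert_bucket}/*"],
        }
    ]
    if cert_key_arn is not None:
        # Optional: decrypt the certificate when the bucket is encrypted.
        statements.append(
            {
                "Effect": "Allow",
                "Action": ["kms:Decrypt"],
                "Resource": [cert_key_arn],
            }
        )
    return statements
```

Calling `database_cert_statements("example-bucket")` yields one statement. Passing a key ARN as the second argument adds the decrypt statement.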

IAM roles for Google Workspace Drive data sources

When you use a Google Workspace Drive data source, you provide Amazon Kendra with a role that has the permissions necessary for connecting to the site. These include:

• Permission to get and decrypt the AWS Secrets Manager secret that contains the client account email, admin account email, and private key necessary to connect to the Google Drive site. For more information about the contents of the secret, see Using a Google Workspace Drive data source (p. 112).

• Permission to use the BatchPutDocument and BatchDeleteDocument APIs.

The following IAM policy provides the necessary permissions:

{

],

"Resource": [

] },

],

"Resource": [


],

] } } },

],

}]}

IAM roles for Microsoft OneDrive data sources

When you use a Microsoft OneDrive data source, you provide Amazon Kendra with a role that has the permissions necessary for connecting to the site. These include:

• Permission to get and decrypt the AWS Secrets Manager secret that contains the application ID and secret key necessary to connect to the OneDrive site. For more information about the contents of the secret, see Using a Microsoft OneDrive data source (p. 114).

• Permission to use the BatchPutDocument and BatchDeleteDocument APIs.

The following IAM policy provides the necessary permissions:

],

"Resource": [

] }, {

"Effect": "Allow", "Action": [ "kms:Decrypt"

],

"Resource": [

],

] } } },

{ "Effect": "Allow",


"Action": [

],

}]

}

If you are storing the list of users to index in an Amazon S3 bucket, you must also provide permission to use the S3 GetObject operation. The following IAM policy provides the necessary permissions:

{

],

"Resource": [

] }, {

"Action": [ "s3:GetObject"

],

"Resource": [

"arn:aws:s3:::input_bucket_name/*"

],

"Effect": "Allow"

},

],

"Resource": [

"arn:aws:kms:region:account ID:key/[[key IDs]]"

],

"secretsmanager.*.amazonaws.com", "s3.*.amazonaws.com"

] } } }, {

],

}]

}
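One way to picture the difference between the two OneDrive policies is as a single extra statement appended to the base policy. The following sketch assumes you already hold the base policy as a Python dictionary. The helper name `with_user_list_access` is hypothetical, and the resource pattern mirrors the `input_bucket_name/*` placeholder shown above.

```python
import copy


def with_user_list_access(policy, input_bucket_name):
    """Return a copy of policy that also allows s3:GetObject on the user list bucket."""
    extended = copy.deepcopy(policy)  # leave the caller's policy untouched
    extended["Statement"].append(
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": [f"arn:aws:s3:::{input_bucket_name}/*"],
        }
    )
    return extended
```

Returning a copy rather than modifying the input makes it safe to keep one base policy and derive the extended version only for data sources that read their user list from Amazon S3.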

IAM roles for Salesforce data sources

When you use Salesforce as a data source, you provide a role with the following policies:


• Permission to access the AWS Secrets Manager secret that contains the user name and password for the Salesforce site. For more information about the contents of the secret, see Using a Salesforce data source (p. 116).

{

],

"Resource": [

] }, {

"Effect": "Allow", "Action": [ "kms:Decrypt"

],

"Resource": [

],

] } } }, {

],

}]

}

IAM roles for ServiceNow data sources

When you use ServiceNow as a data source, you provide a role with the following policies:

• Permission to access the Secrets Manager secret that contains the user name and password for the ServiceNow site. For more information about the contents of the secret, see Using a ServiceNow data source (p. 119).


{

],

"Resource": [

] },

],

"Resource": [

],

] } } }, {

],

}]}

IAM roles for Microsoft SharePoint data sources

For a Microsoft SharePoint data source, you provide a role with the following policies:

• Permission to access the AWS Secrets Manager secret that contains the user name and password for the SharePoint site. For more information about the contents of the secret, see Using a Microsoft SharePoint data source (p. 124).

• Permission to use the AWS KMS customer master key (CMK) to decrypt the user name and password secret stored by AWS Secrets Manager.

• Permission to access the Amazon S3 bucket that contains the SSL certificate used to communicate with the SharePoint site.

You must also attach a trust policy that allows Amazon Kendra to assume the role.

{


],

"Resource": [

] }, {

"kms:Decrypt"

],

"Resource": [

] }, {

],

"Resource": [

],

"Condition": { "StringLike": {

"kms:ViaService": [

"kendra.amazonaws.com"

] } } }, {

"s3:GetObject"

],

"Resource": [

] } ] }

You must apply the following trust policy to the role.

"Effect": "Allow", "Principal": {

},

} }
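For orientation, a trust policy that lets a service assume a role generally has the shape sketched below. The extracted listing above has lost its principal and action, so the `kendra.amazonaws.com` service principal and the `sts:AssumeRole` action used here are assumptions, and the function name `kendra_trust_policy` is hypothetical.

```python
import json


def kendra_trust_policy():
    """Build a trust policy that lets the Amazon Kendra service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumption: the Amazon Kendra service principal.
                "Principal": {"Service": "kendra.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(kendra_trust_policy(), indent=2))
```

The trust policy is attached to the role itself and is separate from the permission policies, which describe what the role may do once assumed.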

If you have encrypted the Amazon S3 bucket that contains the SSL certificate used to communicate with the SharePoint site, provide a policy to give Amazon Kendra access to the key.

{

Table of Contents

What is Amazon Kendra?

Benefits of Amazon Kendra

Amazon Kendra Developer Edition

Amazon Kendra Enterprise Edition

Pricing for Amazon Kendra

Are you a first-time Amazon Kendra user?

How Amazon Kendra works

Index

Index fields

Searching indexes

Documents

Types of documents

HTML

Plain text

Microsoft Word document

Microsoft PowerPoint document

Portable document format (PDF)

Frequently asked questions and answers

Document attributes

Data sources

Queries

Tags

Tagging resources

Tag restrictions

Setting up Amazon Kendra

Sign up for AWS

Regions and endpoints

Setting up the AWS CLI

Setting up the AWS SDKs

IAM access roles for Amazon Kendra

IAM roles for indexes

IAM roles for the BatchPutDocument API

IAM roles for data sources

IAM roles for Amazon S3 data sources

IAM roles for Confluence server data sources

IAM roles for database data sources

IAM roles for Google Workspace Drive data sources

IAM roles for Microsoft OneDrive data sources

IAM roles for Salesforce data sources

IAM roles for ServiceNow data sources

IAM roles for Microsoft SharePoint data sources