Amazon Kendra
Developer Guide
Copyright © Amazon Web Services, Inc. and/or its affiliates. All rights reserved.
Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be affiliated with, connected to, or sponsored by Amazon.
Table of Contents
What is Amazon Kendra? ... 1
Benefits of Amazon Kendra ... 1
Amazon Kendra Developer Edition ... 1
Amazon Kendra Enterprise Edition ... 2
Pricing for Amazon Kendra ... 2
Are you a first-time Amazon Kendra user? ... 2
How Amazon Kendra works ... 3
Index ... 3
Index fields ... 4
Searching indexes ... 5
Documents ... 5
Types of documents ... 5
Document attributes ... 6
Data sources ... 6
Queries ... 7
Tags ... 8
Tagging resources ... 8
Tag restrictions ... 8
Setting up Amazon Kendra ... 9
Sign up for AWS ... 9
Regions and endpoints ... 9
Setting up the AWS CLI ... 9
Setting up the AWS SDKs ... 10
IAM access roles for Amazon Kendra ... 11
IAM roles for indexes ... 11
IAM roles for the BatchPutDocument API ... 13
IAM roles for data sources ... 14
IAM roles for Amazon S3 data sources ... 15
IAM roles for Confluence server data sources ... 16
IAM roles for database data sources ... 17
IAM roles for Google Drive data sources ... 18
IAM roles for OneDrive data sources ... 19
IAM roles for Salesforce data sources ... 20
IAM roles for ServiceNow data sources ... 21
IAM roles for SharePoint data sources ... 22
Virtual private cloud (VPC) IAM role ... 24
IAM roles for web crawler data sources ... 25
IAM roles for Amazon WorkDocs data sources ... 25
IAM roles for Amazon FSx data sources ... 27
IAM roles for frequently asked questions ... 29
IAM roles for query suggestions ... 30
IAM roles for principal mapping of users and groups ... 31
IAM roles for AWS Single Sign-On ... 32
IAM roles for Amazon Kendra experiences ... 32
IAM roles for Amazon Kendra search experience ... 32
IAM roles for Custom Document Enrichment ... 34
Deploying Amazon Kendra [SEE NEW] ... 36
Overview ... 36
Prerequisites ... 36
Setting up the example ... 37
Main search page ... 37
Search component ... 37
Results component ... 37
Facets component ... 37
Pagination component ... 38
Deploying a search application with no code [NEW] ... 38
How the search Experience Builder works ... 38
Design and tune your search experience ... 38
Providing access to your search page ... 39
Configuring a search experience ... 40
Adjusting capacity ... 43
Viewing capacity ... 43
Adding and removing capacity ... 43
Query suggestions capacity ... 44
Amazon Kendra experience capacity ... 44
Search experience capacity ... 44
Adaptive query bursting ... 44
Getting started [SEE NEW] ... 46
Prerequisites ... 46
Amazon Kendra resources: AWS CLI, SDK, console ... 47
Getting started with an S3 bucket (console) ... 50
Getting started (AWS CLI) ... 51
Getting started (SDK for Python (Boto3)) ... 52
Getting started (SDK for Java) ... 54
Getting started with Confluence (console) ... 57
Getting started with Google Drive (console) ... 58
Getting started with OneDrive (console) ... 60
Getting started with S3 (console) ... 62
Getting started with MySQL (console) ... 63
Getting started with Salesforce (console) ... 64
Getting started with ServiceNow (console) ... 65
Getting started with SharePoint (console) ... 66
Getting started with Amazon Kendra Web Crawler (console) ... 68
Getting started with Amazon WorkDocs (console) ... 70
Getting started with an AWS SSO identity source (console) ... 71
Changing your AWS SSO identity source ... 72
Getting started with Amazon FSx (console) [NEW] ... 73
Creating an index [SEE NEW] ... 75
Controlling access to documents in an index ... 78
Using OpenID ... 78
Using a JSON Web Token (JWT) with a shared secret ... 80
Using a JSON Web Token (JWT) with a public key ... 82
Using JSON ... 84
Adding documents directly to an index ... 85
Adding documents with the API ... 86
Adding documents from an Amazon S3 bucket ... 87
Adding questions and answers directly to an index ... 89
Basic .csv file ... 90
Custom .csv file ... 90
JSON file ... 92
Using your FAQ file ... 93
Adding documents from a data source [SEE NEW] ... 94
Setting an update schedule ... 95
Setting a language ... 95
Using an Amazon S3 data source ... 95
Using a Confluence data source ... 100
Using a custom data source ... 104
Using a database data source ... 109
Using a Google Drive data source ... 112
Using a OneDrive data source ... 114
Using a Salesforce data source ... 116
Using a ServiceNow data source ... 119
Using a SharePoint data source ... 124
Using a web crawler data source ... 126
Using an Amazon WorkDocs data source ... 128
Using an Amazon FSx data source [NEW] ... 129
Adding documents in languages other than English ... 130
Creating custom document attributes ... 131
Adding custom attributes with the BatchPutDocument API ... 133
Adding custom attributes to an Amazon S3 data source ... 133
Mapping data source fields ... 133
Customizing document metadata during ingestion [NEW] ... 135
How Custom Document Enrichment works ... 136
Basic data manipulation ... 136
Advanced data manipulation ... 142
Data contracts for Lambda functions ... 147
Configuring Amazon Kendra to use a VPC ... 150
Connecting to a database in a VPC ... 151
Deleting an index ... 154
Deleting data sources ... 154
Searching indexes [SEE NEW] ... 157
Querying an index ... 157
Prerequisites ... 158
Searching an index (console) ... 158
Searching an index (SDK) ... 158
Searching with advanced query syntax ... 160
Searching in languages ... 163
Browsing an index [NEW] ... 165
Filtering queries ... 167
Facets ... 168
Using document attributes to filter search results ... 169
Filtering a document's attributes ... 170
Filtering on user context ... 170
Filtering by user token ... 170
Filtering by user ID ... 170
Filtering by user attribute ... 171
User context filtering for documents added directly to an index ... 173
User context filtering for frequently asked questions ... 173
User context filtering for database data sources ... 173
User context filtering for Confluence data sources ... 173
User context filtering for Google Drive data sources ... 174
User context filtering for Microsoft OneDrive data sources ... 174
User context filtering for Amazon S3 data sources ... 175
User context filtering for Salesforce data sources ... 175
User context filtering for ServiceNow data sources ... 176
User context filtering for Microsoft SharePoint data sources ... 176
User context filtering for Amazon WorkDocs data sources ... 176
User context filtering for Amazon FSx data sources ... 177
Query responses ... 177
Query suggestions ... 179
Query spell checker [NEW] ... 179
Using the query spell checker with default limits ... 180
Tuning responses ... 180
Sorting responses ... 181
Response types ... 182
Answer ... 182
Document ... 183
Question and answer ... 184
Tuning search relevance ... 186
Relevance tuning at the index level ... 187
Relevance tuning at the query level ... 187
Gaining insights with search analytics [NEW] ... 189
Metrics for search ... 189
Click-through rate ... 190
Zero click rate ... 190
Zero search results rate ... 190
Instant answer rate ... 190
Top queries ... 190
Top queries with zero clicks ... 191
Top queries with zero search results ... 191
Top clicked on documents ... 191
Total queries ... 191
Total documents ... 191
Example of retrieving metric data ... 192
From metrics to actionable insights ... 193
Visualizing and reporting search analytics ... 193
Total queries graph ... 193
Click-through rate graph ... 194
Zero click rate graph ... 194
Zero search results rate graph ... 194
Instant answer rate graph ... 194
Suggesting popular search queries ... 195
Query suggestions settings ... 195
Block certain queries from suggestions ... 199
Clear suggestions ... 203
No suggestions available ... 203
Submitting feedback for incremental learning ... 204
Using the Amazon Kendra JavaScript library to submit feedback ... 205
Step 1: Insert a script tag into your Amazon Kendra search application ... 205
Step 2: Add the feedback token to search results ... 206
Step 3: Test the feedback script ... 207
Using the Amazon Kendra API to submit feedback ... 207
Adding synonyms to an index ... 209
Creating a thesaurus file ... 210
Adding a thesaurus to an index ... 211
Updating a thesaurus ... 214
Deleting a thesaurus ... 217
Highlights in search results ... 218
Tutorial: Building an intelligent search solution ... 219
Prerequisites ... 220
Step 1: Adding documents ... 220
Downloading the sample dataset ... 221
Creating an Amazon S3 bucket ... 222
Creating data and metadata folders in your S3 bucket ... 224
Uploading the input data ... 226
Step 2: Detecting entities ... 227
Running an Amazon Comprehend entities analysis job ... 227
Step 3: Formatting the metadata ... 233
Downloading and extracting the Amazon Comprehend output ... 234
Uploading the output into the S3 bucket ... 236
Converting the output to Amazon Kendra metadata format ... 237
Cleaning up your Amazon S3 bucket ... 240
Step 4: Creating an index and ingesting the metadata ... 242
Creating an Amazon Kendra index ... 242
Updating the IAM role for Amazon S3 access ... 248
Creating Amazon Kendra custom search index fields ... 250
Adding the Amazon S3 bucket as a data source for the index ... 254
Syncing the Amazon Kendra index ... 256
Step 5: Querying the index ... 258
Querying your Amazon Kendra index ... 259
Filtering your search results ... 263
Step 6: Cleaning up ... 265
Cleaning up your files ... 265
Monitoring and Logging ... 267
Monitoring indexes ... 267
Monitoring Amazon Kendra API calls with CloudTrail ... 270
Amazon Kendra Information in CloudTrail ... 271
Example: Amazon Kendra log file Entries ... 271
Monitoring Amazon Kendra with CloudWatch ... 272
Viewing Amazon Kendra metrics ... 272
Creating an alarm ... 272
CloudWatch Metrics for index synchronization Jobs ... 273
Metrics for Amazon Kendra data sources ... 274
Metrics for indexed documents ... 276
Monitoring Amazon Kendra with CloudWatch Logs ... 276
Data source log streams ... 277
Document log streams ... 278
Security ... 279
Data protection ... 279
Encryption at rest ... 280
Encryption in transit ... 280
Key management ... 280
VPC endpoints (AWS PrivateLink) ... 281
Considerations for Amazon Kendra VPC endpoints ... 281
Creating an interface VPC endpoint for Amazon Kendra ... 281
Creating a VPC endpoint policy for Amazon Kendra ... 282
Identity and access management ... 282
Audience ... 283
Authenticating with identities ... 283
Managing access using policies ... 285
How Amazon Kendra works with IAM ... 287
Identity-based policy examples ... 290
AWS managed policies ... 294
Troubleshooting ... 297
Logging and monitoring in Amazon Kendra ... 299
Compliance validation ... 299
Resilience ... 300
Infrastructure security ... 300
Quotas ... 301
Supported regions ... 301
Quotas ... 301
Troubleshooting ... 304
Troubleshooting data sources ... 304
My documents were not indexed ... 304
My synchronization job failed ... 304
My synchronization job is incomplete ... 305
My synchronization job succeeded but there are no indexed documents ... 305
Troubleshooting document search results ... 305
Why do I only see 100 results? ... 305
Why are documents I expect to see missing? ... 306
Why do I see documents that have an ACL policy? ... 306
Troubleshooting general issues ... 306
Document history ... 307
API Reference ... 311
Actions ... 311
AssociateEntitiesToExperience ... 313
AssociatePersonasToEntities ... 316
BatchDeleteDocument ... 319
BatchGetDocumentStatus ... 322
BatchPutDocument ... 325
ClearQuerySuggestions ... 330
CreateDataSource ... 332
CreateExperience ... 342
CreateFaq ... 345
CreateIndex ... 349
CreateQuerySuggestionsBlockList ... 354
CreateThesaurus ... 358
DeleteDataSource ... 362
DeleteExperience ... 364
DeleteFaq ... 366
DeleteIndex ... 368
DeletePrincipalMapping ... 370
DeleteQuerySuggestionsBlockList ... 373
DeleteThesaurus ... 375
DescribeDataSource ... 377
DescribeExperience ... 387
DescribeFaq ... 391
DescribeIndex ... 395
DescribePrincipalMapping ... 400
DescribeQuerySuggestionsBlockList ... 403
DescribeQuerySuggestionsConfig ... 407
DescribeThesaurus ... 410
DisassociateEntitiesFromExperience ... 414
DisassociatePersonasFromEntities ... 417
GetQuerySuggestions ... 420
GetSnapshots ... 423
ListDataSources ... 427
ListDataSourceSyncJobs ... 430
ListEntityPersonas ... 434
ListExperienceEntities ... 437
ListExperiences ... 440
ListFaqs ... 443
ListGroupsOlderThanOrderingId ... 446
ListIndices ... 449
ListQuerySuggestionsBlockLists ... 451
ListTagsForResource ... 454
ListThesauri ... 456
PutPrincipalMapping ... 459
Query ... 463
StartDataSourceSyncJob ... 472
StopDataSourceSyncJob ... 474
SubmitFeedback ... 476
TagResource ... 479
UntagResource ... 481
UpdateDataSource ... 483
UpdateExperience ... 492
UpdateIndex ... 495
UpdateQuerySuggestionsBlockList ... 499
UpdateQuerySuggestionsConfig ... 502
UpdateThesaurus ... 505
Data Types ... 507
AccessControlListConfiguration ... 511
AclConfiguration ... 512
AdditionalResultAttribute ... 513
AdditionalResultAttributeValue ... 514
AttributeFilter ... 515
AuthenticationConfiguration ... 517
BasicAuthenticationConfiguration ... 518
BatchDeleteDocumentResponseFailedDocument ... 519
BatchGetDocumentStatusResponseError ... 520
BatchPutDocumentResponseFailedDocument ... 521
CapacityUnitsConfiguration ... 522
ClickFeedback ... 523
ColumnConfiguration ... 524
ConfluenceAttachmentConfiguration ... 526
ConfluenceAttachmentToIndexFieldMapping ... 527
ConfluenceBlogConfiguration ... 528
ConfluenceBlogToIndexFieldMapping ... 529
ConfluenceConfiguration ... 530
ConfluencePageConfiguration ... 533
ConfluencePageToIndexFieldMapping ... 534
ConfluenceSpaceConfiguration ... 535
ConfluenceSpaceToIndexFieldMapping ... 537
ConnectionConfiguration ... 538
ContentSourceConfiguration ... 540
Correction ... 541
CustomDocumentEnrichmentConfiguration ... 542
DatabaseConfiguration ... 544
DataSourceConfiguration ... 546
DataSourceGroup ... 548
DataSourceSummary ... 549
DataSourceSyncJob ... 551
DataSourceSyncJobMetrics ... 553
DataSourceSyncJobMetricTarget ... 555
DataSourceToIndexFieldMapping ... 556
DataSourceVpcConfiguration ... 557
Document ... 558
DocumentAttribute ... 560
DocumentAttributeCondition ... 561
DocumentAttributeTarget ... 563
DocumentAttributeValue ... 565
DocumentAttributeValueCountPair ... 566
DocumentInfo ... 567
DocumentMetadataConfiguration ... 568
DocumentRelevanceConfiguration ... 569
DocumentsMetadataConfiguration ... 570
EntityConfiguration ... 571
EntityDisplayData ... 572
EntityPersonaConfiguration ... 574
ExperienceConfiguration ... 575
ExperienceEndpoint ... 576
ExperienceEntitiesSummary ... 577
ExperiencesSummary ... 578
Facet ... 580
FacetResult ... 581
FailedEntity ... 582
FaqStatistics ... 583
FaqSummary ... 584
FsxConfiguration ... 586
GoogleDriveConfiguration ... 588
GroupMembers ... 590
GroupOrderingIdSummary ... 591
GroupSummary ... 593
HierarchicalPrincipal ... 594
Highlight ... 595
HookConfiguration ... 596
IndexConfigurationSummary ... 598
IndexStatistics ... 600
InlineCustomDocumentEnrichmentConfiguration ... 601
JsonTokenTypeConfiguration ... 602
JwtTokenTypeConfiguration ... 603
MemberGroup ... 605
MemberUser ... 606
OneDriveConfiguration ... 607
OneDriveUsers ... 609
PersonasSummary ... 610
Principal ... 612
ProxyConfiguration ... 613
QueryResultItem ... 614
QuerySuggestionsBlockListSummary ... 616
Relevance ... 618
RelevanceFeedback ... 620
S3DataSourceConfiguration ... 621
S3Path ... 623
SalesforceChatterFeedConfiguration ... 624
SalesforceConfiguration ... 626
SalesforceCustomKnowledgeArticleTypeConfiguration ... 629
SalesforceKnowledgeArticleConfiguration ... 631
SalesforceStandardKnowledgeArticleTypeConfiguration ... 632
SalesforceStandardObjectAttachmentConfiguration ... 633
SalesforceStandardObjectConfiguration ... 634
ScoreAttributes ... 636
Search ... 637
SeedUrlConfiguration ... 638
ServerSideEncryptionConfiguration ... 639
ServiceNowConfiguration ... 640
ServiceNowKnowledgeArticleConfiguration ... 642
ServiceNowServiceCatalogConfiguration ... 644
SharePointConfiguration ... 646
SiteMapsConfiguration ... 649
SortingConfiguration ... 650
SpellCorrectedQuery ... 652
SpellCorrectionConfiguration ... 653
SqlConfiguration ... 654
Status ... 655
Suggestion ... 656
SuggestionHighlight ... 657
SuggestionTextWithHighlights ... 658
SuggestionValue ... 659
Tag ... 660
TextDocumentStatistics ... 661
TextWithHighlights ... 662
ThesaurusSummary ... 663
TimeRange ... 665
Urls ... 666
UserContext ... 667
UserGroupResolutionConfiguration ... 669
UserIdentityConfiguration ... 670
UserTokenConfiguration ... 671
Warning ... 672
WebCrawlerConfiguration ... 673
WorkDocsConfiguration ... 676
Common Errors ... 677
Common Parameters ... 679
AWS glossary ... 682
What is Amazon Kendra?
Amazon Kendra is a highly accurate and intelligent search service that enables your users to search unstructured and structured data using natural language processing and advanced search algorithms. It returns specific answers to questions, giving users an experience that's close to interacting with a human expert. It is highly scalable and capable of meeting performance demands, tightly integrated with other AWS services such as Amazon S3 and Amazon Lex, and offers enterprise-grade security.
Amazon Kendra users can ask the following types of questions, or queries:
• Factoid questions – Simple who, what, when, or where questions, such as Who is on duty today? or Where is the nearest service center to Seattle? Factoid questions have fact-based answers that can be returned in the form of a single word or phrase. The answer is retrieved from a FAQ or from your indexed documents.
• Descriptive questions – Questions whose answer could be a sentence, passage, or an entire document. For example, How do I connect my Echo Plus to my network? or How do I get tax benefits for lower income families?
• Keyword searches – Questions where the intent and scope are not clear. For example, keynote address. Because 'address' can have several meanings, Amazon Kendra infers the user's intent behind the search query and returns information that matches the intended meaning. Amazon Kendra uses deep learning models to handle this kind of query.
Benefits of Amazon Kendra
Amazon Kendra has the following benefits:
• Accuracy — Unlike traditional search services that use keyword searches where results are based on basic keyword matching and ranking, Amazon Kendra attempts to understand the context of the question. Amazon Kendra searches across your data and goes beyond traditional search to return the most relevant word, snippet, or document for your query. Amazon Kendra uses machine learning to improve search results over time.
• Simplicity — Amazon Kendra provides a console and API for managing your documents that you want to search. You can use a simple search API to integrate Amazon Kendra into your client applications, such as websites or mobile applications.
• Connectivity — Amazon Kendra can connect to third-party data repositories or data sources such as Microsoft SharePoint. You can easily index and search your documents using your data source.
• User Access Control — Amazon Kendra delivers highly secure enterprise search for your search applications. Your search results reflect the security model of your organization. Customers are responsible for authenticating and authorizing users to gain access to their search application.
Amazon Kendra Developer Edition
The Amazon Kendra Developer Edition provides all of the features of Amazon Kendra at a lower cost. It includes a free tier that provides 750 hours of use. The Developer Edition is ideal to explore how Amazon Kendra indexes your documents, to try out features, and to develop applications that use Amazon Kendra.
The developer edition provides the following:
• Up to 5 indexes with up to 5 data sources each.
• 10,000 documents or 3 GB of extracted text.
• Approximately 4,000 queries per day or 0.05 queries per second.
• Runs in 1 availability zone (AZ) – see Availability Zones (data centers in AWS regions)
You should not use the Developer Edition for a production application. The Developer Edition doesn't provide any guarantees of latency or availability.
Amazon Kendra Enterprise Edition
Use Amazon Kendra Enterprise Edition when you want to index your entire enterprise document library or for when your application is ready for use in a production environment.
The enterprise edition provides the following:
• Up to 5 indexes with up to 50 data sources each.
• 100,000 documents or 30 GB of extracted text.
• Approximately 8,000 queries per day or 0.1 queries per second.
• Runs in 3 availability zones (AZ) – see Availability Zones (data centers in AWS regions)
You can increase this quota using the Service Quotas console.
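The per-day and per-second query figures quoted for the two editions are related by the number of seconds in a day. The following minimal sketch only does that arithmetic; the function name is illustrative and is not part of any Amazon Kendra API.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def queries_per_day(queries_per_second: float) -> int:
    """Convert a sustained queries-per-second rate into queries per day."""
    return round(queries_per_second * SECONDS_PER_DAY)


# Rates quoted above for the two editions.
developer_per_day = queries_per_day(0.05)
enterprise_per_day = queries_per_day(0.1)

print(developer_per_day, enterprise_per_day)
```

The results (4,320 and 8,640) are slightly higher than the rounded "approximately 4,000" and "approximately 8,000" figures, which is consistent with those figures being approximations.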
Pricing for Amazon Kendra
You can get started for free with the Amazon Kendra Developer Edition that provides usage of up to 750 hours for the first 30 days. After your trial expires, you are charged for all provisioned Amazon Kendra indexes, even if they are empty and no queries are executed. After the trial expires, there are additional charges for scanning and syncing documents using the Amazon Kendra data sources.
For a complete list of charges and prices, see Amazon Kendra pricing.
Are you a first-time Amazon Kendra user?
If you are a first-time user of Amazon Kendra, we recommend that you read the following sections in order:
1. How Amazon Kendra works (p. 3) – Introduces the Amazon Kendra components and describes how you use them to create a search solution.
2. Getting started (p. 46) – Explains how to set up your account and test the Amazon Kendra search API.
3. Creating an index (p. 75) – Explains how to use Amazon Kendra to create a search index and to add data sources to sync your documents.
4. Adding documents directly to an index (p. 85) – Explains how to add documents directly to an Amazon Kendra index.
5. Searching indexes (p. 157) – Explains how to use the Amazon Kendra search API to search an index.
6. Deploying Amazon Kendra (p. 36) – Provides a sample application you can use to deploy Amazon Kendra to your website.
How Amazon Kendra works
Amazon Kendra provides the search functionality for your application. It indexes your documents directly or from your third-party document repository and intelligently serves relevant information to your users.
You can use Amazon Kendra to create an updatable index of documents of a variety of types, including plain text, HTML files, Microsoft Word documents, Microsoft PowerPoint presentations, and PDF files.
Amazon Kendra integrates with other services. For example, you can power Amazon Lex chatbots with Amazon Kendra search to provide answers to users' questions. You can use an Amazon S3 bucket as a data source for your Amazon Kendra index. And you can set up AWS Identity and Access Management to control access to Amazon Kendra resources.
Amazon Kendra has the following components:
• The index, which allows your documents to be searched. You create the index from a source or repository of documents.
• A source repository, which contains the documents to index.
• A data source that syncs the documents in your source repository to an Amazon Kendra index. You can automatically synchronize a data source with an Amazon Kendra index so that new, updated, and deleted files in the source repository are updated in the index.
• A document addition API that adds documents directly to the index.
You can use Amazon Kendra through the console or the API. You can create, update, and delete indexes.
Deleting an index deletes all data sources and permanently deletes all of your document information from Amazon Kendra.
Topics
• Index (p. 3)
• Documents (p. 5)
• Data sources (p. 6)
• Queries (p. 7)
• Tags (p. 8)
Index
An index holds the contents of your documents and is structured in a way to make the documents searchable. The way you add documents to the index depends on how you store your documents.
• If you store your documents in some kind of repository, such as an Amazon S3 bucket or a Microsoft SharePoint site, you use a data source connector to index your documents from your repository.
• If you don't store your documents in a repository, you use the BatchPutDocument API to directly index your documents.
• For FAQ questions and answers, which must be stored in an Amazon S3 bucket, you upload them from the bucket.
You can create indexes with the Amazon Kendra console, the AWS CLI, or an AWS SDK. For information about the types of documents that can be indexed, see Types of documents (p. 5).
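As a minimal sketch of creating an index with an AWS SDK, the following example uses the SDK for Python (Boto3). The helper names, the index name, and the role ARN are placeholders chosen for illustration; the request is built separately from the call so that it can be inspected without credentials.

```python
def build_create_index_request(name: str, role_arn: str, developer: bool = True) -> dict:
    """Build the keyword arguments for a CreateIndex call."""
    edition = "DEVELOPER_EDITION" if developer else "ENTERPRISE_EDITION"
    return {"Name": name, "Edition": edition, "RoleArn": role_arn}


def create_index(request: dict) -> str:
    """Send the request to Amazon Kendra and return the new index ID."""
    import boto3  # imported here so building a request does not need the SDK

    kendra = boto3.client("kendra")
    response = kendra.create_index(**request)
    return response["Id"]


# Placeholder values: replace them with your own index name and IAM role.
request = build_create_index_request(
    name="example-index",
    role_arn="arn:aws:iam::123456789012:role/ExampleKendraIndexRole",
)
print(request)
```

Calling create_index(request) requires configured AWS credentials and an IAM role that Amazon Kendra can assume. Index creation takes some time, so an application would typically check the index status before adding documents.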
Index fields
An index contains fields that you map to the attributes of your document. Attributes could include, for example, the document title, main body text, last updated date, and other attributes contained within the structure of your documents. You can also create custom attributes such as the figure description, or the business department the document is associated with. Index fields, which you map to your document attributes, provide the schema for your index. Amazon Kendra uses the fields to search your documents.
After you map your fields to your document attributes, you can search on the information in those fields.
Amazon Kendra has 15 reserved fields, which you can map to your document attributes:
• _authors – A list of one or more authors responsible for the content of the document.
• _category – A category that places a document in a specific group.
• _created_at – The date and time in ISO 8601 format that the document was created. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.
• _data_source_id – The identifier of the data source that contains the document.
• _document_body – The content of the document.
• _document_id – A unique identifier for the document.
• _document_title – The title of the document.
• _excerpt_page_number – The page number in a PDF file where the document excerpt appears. If your index was created before September 8, 2020, you must re-index your documents before you can use this attribute.
• _faq_id – If this is an FAQ question and answer, a unique identifier for them.
• _file_type – The file type of the document, such as pdf or doc.
• _last_updated_at – The date and time in ISO 8601 format that the document was last updated. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.
• _source_uri – The URI where the document is available. For example, the URI of the document on a company website.
• _version – An identifier for the specific version of a document.
• _view_count – The number of times that the document has been viewed.
• _language_code (String) – The code for a language that applies to the document. This defaults to English if you do not specify a language. For more information on supported languages, including their codes, see Adding documents in languages other than English.
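The ISO 8601 example used for the _created_at and _last_updated_at fields can be checked with the Python standard library. This is only an illustration of the timestamp format; it does not call Amazon Kendra.

```python
from datetime import datetime, timedelta, timezone

# The example timestamp from the field descriptions above.
created_at = datetime.fromisoformat("2012-03-25T12:30:10+01:00")

# The "+01:00" suffix is the offset from UTC.
offset = created_at.utcoffset()

# The same instant expressed in UTC.
created_at_utc = created_at.astimezone(timezone.utc)

# Formatting the value again yields the original ISO 8601 string.
round_trip = created_at.isoformat()

print(offset, created_at_utc.isoformat(), round_trip)
```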
You can also create custom fields, which you can use like the reserved fields for search and display, and to create facets.
There are four types of custom fields:
• Date
• Number
• String
• String list
You create a custom field using the console or by using the UpdateIndex API. After you create a custom field, you map it to a document attribute, just as you do with a reserved field. If you added a document to the index with the BatchPutDocument API, you map the attributes with the API. For documents indexed from an Amazon S3 data source, you map the attributes using a metadata file that contains a JSON structure that describes the document attributes. For documents indexed with a database or a data source that allows field mapping, you map attributes with the console or the data source configuration.
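The following sketch shows one way to define a custom field with the UpdateIndex API through the SDK for Python (Boto3). The helper names, the department field, and the index ID are hypothetical; the mapping from the four field types to the API type strings reflects the UpdateIndex request shape as understood here, so check the API reference before relying on it.

```python
# Maps the four custom field types to the type strings used in the request.
FIELD_TYPES = {
    "date": "DATE_VALUE",
    "number": "LONG_VALUE",
    "string": "STRING_VALUE",
    "string list": "STRING_LIST_VALUE",
}


def build_custom_field(name, kind, facetable=False, searchable=False, displayable=False):
    """Describe one custom index field and how it can be used in search."""
    if name.startswith("_"):
        # This sketch treats a leading underscore as marking a reserved field.
        raise ValueError("custom field names should not start with an underscore")
    return {
        "Name": name,
        "Type": FIELD_TYPES[kind],
        "Search": {
            "Facetable": facetable,
            "Searchable": searchable,
            "Displayable": displayable,
        },
    }


def build_update_index_request(index_id, fields):
    """Build the keyword arguments for an UpdateIndex call."""
    return {"Id": index_id, "DocumentMetadataConfigurationUpdates": list(fields)}


def update_index(request):
    """Send the request to Amazon Kendra (requires AWS credentials)."""
    import boto3

    boto3.client("kendra").update_index(**request)


# Hypothetical custom field for the business department of a document.
department = build_custom_field(
    "department", "string", facetable=True, searchable=True, displayable=True
)
request = build_update_index_request("index-id-placeholder", [department])
print(request)
```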
For more information, see Searching indexes.
Searching indexes
After you create an index, you can start searching your documents. For more information, see Searching indexes.
Documents
Amazon Kendra can index many types of documents.
Topics
• Types of documents (p. 5)
• Document attributes (p. 6)
Types of documents
An index can include both structured and unstructured text:
• Structured text
    • Frequently asked questions and answers
• Unstructured text
    • HTML files
    • Microsoft PowerPoint presentations
    • Microsoft Word documents
    • Plain text documents
    • PDFs
You can add documents directly to an index by calling the BatchPutDocument API. You can also add documents from a data source. For information about adding files to a data source, see Adding documents from a data source. For an example that shows how to add Microsoft Word documents directly to an index from an Amazon S3 bucket, see Adding documents from an Amazon S3 bucket.
An index can contain multiple documents and multiple types of documents.
HTML
HTML format files. You add an HTML file to an index the same way that you add a plain text file.
Plain text
You can add plain text files to an index using the BatchPutDocument API or from a data source. For an example of adding a plain text document directly to an index, see Adding documents with the API.
Microsoft Word document
Microsoft Word format files can be added to an index as binary data, from an Amazon S3 bucket, or from an Amazon Kendra data source.
Microsoft PowerPoint document
Microsoft PowerPoint format files can be added to an index as binary data, from an Amazon S3 bucket, or from an Amazon Kendra data source.
Portable document format (PDF)
PDF format files can be added to an index either as binary data, from an Amazon S3 bucket, or from an Amazon Kendra data source.
Frequently asked questions and answers
Frequently asked question and answer format documents are used to answer questions such as How tall is the Space Needle? You can specify multiple questions that return the same answer. You specify the questions and answers in a comma-separated values (CSV) file stored in an Amazon S3 bucket.
For an example, see Adding questions and answers directly to an index.
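As a rough illustration of a question-and-answer CSV file, the sketch below writes one row per question and repeats the answer when several questions share it. The three-column layout (question, answer, source URI) is an assumption made for this example, and the URL is a placeholder; see the FAQ file topic for the exact format that Amazon Kendra expects.

```python
import csv
import io


def build_faq_csv(entries):
    """Render FAQ entries as CSV text.

    entries is an iterable of (questions, answer, source_uri) tuples, where
    questions is a list of question strings that share the same answer.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for questions, answer, source_uri in entries:
        for question in questions:
            writer.writerow([question, answer, source_uri])
    return buffer.getvalue()


faq_csv = build_faq_csv(
    [
        (
            ["How tall is the Space Needle?", "What is the height of the Space Needle?"],
            "605 feet",
            "https://example.com/space-needle",  # placeholder URL
        ),
    ]
)
print(faq_csv)
```

The resulting text would then be saved as a file in an Amazon S3 bucket before it is added to the index.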
Document attributes
A document has attributes associated with it. Attributes are the properties of a document or what is contained within the structure of a document. For example, each of your documents might contain a title, body text, and an author. You can also add your own custom attributes to your documents.
Custom attributes are attributes that you specify for your own needs. For example, if your index searches tax documents, you might specify a custom attribute for the type of tax document such as W-2, 1099, and so on.
Before you can use a document attribute in a query, it must be mapped to an index field. For example, the title attribute can be mapped to the field _document_title. For more information, see Mapping fields. To add a new attribute, you must create an index field to map the attribute to. You create index fields using the console or by using the UpdateIndex API.
You can use document attributes to filter responses and to make faceted search suggestions. For example, you can filter a response to only return a specific version of a document, or you can filter searches to only return 1099 type of tax documents that match the search term. For more information, see Filtering queries.
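The tax document example above could be expressed as an attribute filter on a query. This is a minimal sketch using the SDK for Python (Boto3); the tax_document_type field, the helper names, and the index ID are hypothetical, and the custom field would have to exist in the index already.

```python
def build_filtered_query(index_id, query_text, field, value):
    """Build Query arguments that only match documents where field equals value."""
    return {
        "IndexId": index_id,
        "QueryText": query_text,
        "AttributeFilter": {
            "EqualsTo": {"Key": field, "Value": {"StringValue": value}},
        },
    }


def run_query(request):
    """Send the query to Amazon Kendra (requires AWS credentials)."""
    import boto3

    return boto3.client("kendra").query(**request)


# Hypothetical custom field that holds the type of tax document.
request = build_filtered_query(
    "index-id-placeholder", "filing deadline", "tax_document_type", "1099"
)
print(request)
```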
You can also use document attributes to manually tune the query response. For example, you can choose to increase the importance of the title field to increase the weight that Amazon Kendra assigns to the field when determining which documents to return in the response. For more information, see Tuning search relevance.
If you are adding a document directly to an index, you specify the attributes in the Document input parameter to the BatchPutDocument (p. 325) API. You specify the custom attribute values in a DocumentAttribute (p. 560) object array. If you are using a data source, the method that you use to add the document attributes depends on the data source. For more information, see Creating custom document attributes (p. 131).
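As a minimal sketch of the direct-indexing case, the following Python builds a Document structure with a custom attribute array. The document ID, title, text, and the tax_type attribute are hypothetical, the field names follow the Document and DocumentAttribute shapes as understood here, and the sketch assumes that an index field for tax_type already exists. It only constructs the request data and makes no AWS call.

```python
def make_document(doc_id, title, text, custom_attributes):
    """Build one Document entry for a BatchPutDocument request.

    custom_attributes maps attribute keys to string values. Each pair
    becomes one DocumentAttribute in the Attributes array.
    """
    attributes = [
        {"Key": key, "Value": {"StringValue": value}}
        for key, value in custom_attributes.items()
    ]
    return {
        "Id": doc_id,
        "Title": title,
        "Blob": text.encode("utf-8"),  # document body as binary data
        "ContentType": "PLAIN_TEXT",
        "Attributes": attributes,
    }


# Hypothetical tax document with a custom "tax_type" attribute.
document = make_document(
    doc_id="doc-0001",
    title="2021 wage statement",
    text="Wages, tips, and other compensation...",
    custom_attributes={"tax_type": "W-2"},
)
print(document["Attributes"])
```

With an SDK, a list of such entries would typically be passed as the Documents parameter of the BatchPutDocument call, together with the index ID.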
Data sources
A data source is a location, such as an Amazon Simple Storage Service (Amazon S3) bucket, where you store the documents for indexing. You can automatically synchronize data sources with an Amazon Kendra index so that new, updated, or deleted documents in the data source are also added, updated, or deleted in the index for searching on.
Supported data sources are:
• Amazon S3 buckets
• Confluence instances
• Google Workspace Drives
• Amazon RDS for MySQL and Amazon RDS for PostgreSQL databases
• Confluence cloud and Confluence server
• Custom data sources
• Microsoft OneDrive for Business
• Microsoft SharePoint online and SharePoint server (versions 2013 and 2016)
• Salesforce sites
• ServiceNow instances
• Amazon Kendra Web Crawler
• Amazon WorkDocs
• Amazon FSx
Supported document formats are: plain text, Microsoft Word, Microsoft PowerPoint, HTML, and PDF. For more information, see Types of documents (p. 5).
Note
To create an index, you don't need a data source. You can add documents directly to an index.
For more information, see Adding documents directly to an index (p. 85).
To index documents using a data source
1. Create an index (p. 75).
2. Create a data source (p. 94).
For a walkthrough with the Amazon Kendra console or with the AWS CLI, see Getting started (p. 46).
Queries
To get answers, users query an index. Users can use natural language in their queries. The response contains information, such as the title, a text excerpt, and the location of documents in the index that provide the best answer.
Amazon Kendra uses all of the information that you provide about your documents, not just the contents of the documents, to determine whether a document is relevant to the query. For example, if your index contains information about when documents were last updated, you can tell Amazon Kendra to assign a higher relevance to documents that were updated more recently.
A query can also contain criteria for how to filter the response so that Amazon Kendra returns only documents that satisfy the filter criteria. For example, if you created an index field called department, you can filter the response so that only documents with the department field set to legal are returned.
For more information, see Filtering queries (p. 167).
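The department example above can be sketched as request parameters in Python. This is an illustration under assumptions: the index ID and query text are placeholders, the helper names are hypothetical, and the EqualsTo filter shape reflects the query attribute filter structure as understood here. No request is sent.

```python
def equals_filter(field, value):
    """Match documents whose index field equals the given string value."""
    return {"EqualsTo": {"Key": field, "Value": {"StringValue": value}}}


def build_query(index_id, text, attribute_filter=None):
    """Assemble the parameters for a query, adding the filter only if given."""
    params = {"IndexId": index_id, "QueryText": text}
    if attribute_filter is not None:
        params["AttributeFilter"] = attribute_filter
    return params


query = build_query(
    index_id="example-index-id",  # placeholder, not a real index ID
    text="What is the retention period for contracts?",
    attribute_filter=equals_filter("department", "legal"),
)
print(query["AttributeFilter"])
```

Keeping the filter optional means the same helper serves both filtered and unfiltered queries.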
You can influence the results of a query by tuning the relevance of individual fields in the index. Tuning changes the importance of a field on the results. For example, if you raise the importance of documents with the category new, documents with this category are more likely to be included in the response. For more information, see Tuning search relevance (p. 186).
For more information about using queries, see Searching indexes (p. 157).
Tags
Manage your indexes, data sources, and FAQs by assigning metadata to them with tags. Use tags to categorize your Amazon Kendra resources in various ways, for example, by purpose, owner, or application, or any combination. Each tag consists of a key and a value, both of which you define.
Tags help you to:
• Identify and organize your AWS resources. Many AWS services support tagging, so you can assign the same tag to resources in different services to indicate that the resources are related. For example, you can tag an index and the Amazon Lex bot that uses it with the same tag.
• Allocate costs. You activate tags on the AWS Billing and Cost Management dashboard. AWS uses tags to categorize your costs and deliver a monthly cost allocation report to you. For more information, see Cost Allocation and Tagging in About AWS Billing and Cost Management.
• Control access to your resources. You can use tags in AWS Identity and Access Management (IAM) policies that control access to Amazon Kendra resources. You can attach these policies to an IAM role or user to enable tag-based access control. For more information, see Authorization based on Amazon Kendra tags (p. 289).
You can create and manage tags using the AWS Management Console, the AWS Command Line Interface (AWS CLI), or the Amazon Kendra API.
Tagging resources
If you're using the Amazon Kendra console, you can tag resources when you create them or add them later. You can also use the console to update or remove tags.
If you're using the AWS Command Line Interface (AWS CLI) or the Amazon Kendra API, use the following operations to manage tags for your resources:
• CreateDataSource (p. 332) – Apply tags when you create a data source.
• CreateFaq (p. 345) – Apply tags when you create an FAQ.
• CreateIndex (p. 349) – Apply tags when you create an index.
• ListTagsForResource (p. 454) – View the tags associated with a resource.
• TagResource (p. 479) – Add and modify tags for a resource.
• UntagResource (p. 481) – Remove tags from a resource.
Tag restrictions
The following restrictions apply to tags on Amazon Kendra resources:
• Maximum number of tags – 50
• Maximum key length – 128 characters
• Maximum value length – 256 characters
• Valid characters for key and value – a–z, A–Z, space, and the following characters: _ . : / = + - and @
• Keys and values are case sensitive
• Don't use aws: as a prefix for keys; it's reserved for AWS use
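The restrictions above can be checked before a tagging call is made. The following Python is a sketch, not service behavior: the function name is hypothetical, the character pattern mirrors the list exactly as written here (widen it if the service accepts characters the list does not mention), and rejecting empty keys is an assumption, since only maximum lengths are given.

```python
import re

MAX_TAGS = 50
MAX_KEY_LENGTH = 128
MAX_VALUE_LENGTH = 256
# Mirrors the character list above: a-z, A-Z, space, and _ . : / = + - @
ALLOWED_CHARS = re.compile(r"[A-Za-z _.:/=+\-@]*")


def build_tags(tags):
    """Validate a dict of tags and return them as a list of Key/Value pairs."""
    if len(tags) > MAX_TAGS:
        raise ValueError(f"at most {MAX_TAGS} tags are allowed")
    result = []
    for key, value in tags.items():
        # Assumption: an empty key is rejected; the list gives only maximums.
        if not 1 <= len(key) <= MAX_KEY_LENGTH:
            raise ValueError(f"key length must be 1-{MAX_KEY_LENGTH}: {key!r}")
        if len(value) > MAX_VALUE_LENGTH:
            raise ValueError(f"value too long for key {key!r}")
        # Conservative choice: reject the reserved prefix in any capitalization.
        if key.lower().startswith("aws:"):
            raise ValueError(f"reserved prefix in key {key!r}")
        if not ALLOWED_CHARS.fullmatch(key) or not ALLOWED_CHARS.fullmatch(value):
            raise ValueError(f"invalid character in tag {key!r}")
        result.append({"Key": key, "Value": value})
    return result


print(build_tags({"Owner": "search-team", "Purpose": "internal docs"}))
```

The returned list of Key/Value pairs is the shape in which tags are usually supplied to operations such as CreateIndex or TagResource.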
Setting up Amazon Kendra
Before using Amazon Kendra, you must have an Amazon Web Services (AWS) account. After you have an AWS account, you can access Amazon Kendra through the Amazon Kendra console, the AWS Command Line Interface (AWS CLI), or the AWS SDKs.
This guide includes examples for AWS CLI, Java, and Python.
Topics
• Sign up for AWS (p. 9)
• Regions and endpoints (p. 9)
• Setting up the AWS CLI (p. 9)
• Setting up the AWS SDKs (p. 10)
Sign up for AWS
When you sign up for Amazon Web Services (AWS), your account is automatically signed up for all services in AWS, including Amazon Kendra. You are charged only for the services that you use.
If you have an AWS account already, skip to the next task. If you don't have an AWS account, use the following procedure to create one.
To sign up for AWS
1. Open https://aws.amazon.com, and then choose Create an AWS Account.
2. Follow the on-screen instructions to complete the account creation. Note your 12-digit AWS account number. Part of the sign-up procedure involves receiving a phone call and entering a PIN using the phone keypad.
3. Create an AWS Identity and Access Management (IAM) admin user. See Creating Your First IAM User and Group in the AWS Identity and Access Management User Guide for instructions.
Regions and endpoints
An endpoint is a URL that is the entry point for a web service. Each endpoint is associated with a specific AWS region. If you use a combination of the Amazon Kendra console, the AWS CLI, and the Amazon Kendra SDKs, pay attention to their default regions, because all Amazon Kendra components (index, query, and so on) must be created in the same region. For the regions and endpoints supported by Amazon Kendra, see Regions and Endpoints.
Setting up the AWS CLI
The AWS Command Line Interface (AWS CLI) is a unified developer tool for managing AWS services, including Amazon Kendra. We recommend that you install it.
1. To install the AWS CLI, follow the instructions in Installing the AWS Command Line Interface in the AWS Command Line Interface User Guide.
2. To configure the AWS CLI and set up a profile to call the AWS CLI, follow the instructions in Configuring the AWS CLI in the AWS Command Line Interface User Guide.
3. To confirm that the AWS CLI profile is configured properly, run the following command:
aws configure --profile default
If your profile has been configured correctly, you will see output similar to the following:
AWS Access Key ID [****************52FQ]:
AWS Secret Access Key [****************xgyZ]:
Default region name [us-west-2]:
Default output format [json]:
4. To verify that the AWS CLI is configured for use with Amazon Kendra, run the following command:
aws kendra help
If the AWS CLI is configured correctly, you will see a list of the supported AWS CLI commands for Amazon Kendra, Amazon Kendra runtime, and Amazon Kendra events.
Setting up the AWS SDKs
Download and install the AWS SDKs that you want to use. This guide provides examples for Python. For information about other AWS SDKs, see Tools for Amazon Web Services.
IAM access roles for Amazon Kendra.
When you create an index, data source, or an FAQ, Amazon Kendra needs access to the AWS resources required to create the Amazon Kendra resource. You must create an AWS Identity and Access Management (IAM) policy before you create the Amazon Kendra resource. When you call the operation, you provide the Amazon Resource Name (ARN) of the role with the policy attached. For example, if you are calling the BatchPutDocument API to add documents from an Amazon S3 bucket, you provide Amazon Kendra with a role with a policy that has access to the bucket.
You can create a new IAM role in the Amazon Kendra console or choose an existing IAM role to use. The console displays roles that have the string "kendra" or "Kendra" in the role name.
The following topics provide details for the required policies. If you create IAM roles using the Amazon Kendra console, these policies are created for you.
Topics
• IAM roles for indexes (p. 11)
• IAM roles for the BatchPutDocument API (p. 13)
• IAM roles for data sources (p. 14)
• IAM roles for frequently asked questions (p. 29)
• IAM roles for query suggestions (p. 30)
• IAM roles for principal mapping of users and groups (p. 31)
• IAM roles for AWS Single Sign-On (p. 32)
• IAM roles for Amazon Kendra experiences (p. 32)
• IAM roles for Custom Document Enrichment (p. 34)
IAM roles for indexes
When you create an index, you must provide an IAM role with permission to write to an Amazon CloudWatch log. You must also provide a trust policy that allows Amazon Kendra to assume the role. The following are the policies that must be provided.
A role policy to allow Amazon Kendra to access a CloudWatch log.
{
"Version": "2012-10-17", "Statement": [
{
"Effect": "Allow",
"Action": "cloudwatch:PutMetricData", "Resource": "*",
"Condition": { "StringEquals": {
"cloudwatch:namespace": "Kendra"
} } }, {
"Effect": "Allow",
"Action": "logs:DescribeLogGroups", "Resource": "*"
},
{
"Effect": "Allow",
"Action": "logs:CreateLogGroup",
"Resource": "arn:aws:logs:region:account ID:log-group:/aws/kendra/*"
}, {
"Effect": "Allow", "Action": [
"logs:DescribeLogStreams", "logs:CreateLogStream", "logs:PutLogEvents"
],
"Resource": "arn:aws:logs:region:account ID:log-group:/aws/kendra/*:log-stream:*"
} ] }
A role policy to allow Amazon Kendra to access AWS Secrets Manager. If you are using user context with Secrets Manager as a key location, you can use the following policy.
{ "Version":"2012-10-17", "Statement":[
{
"Effect":"Allow",
"Action":"cloudwatch:PutMetricData", "Resource":"*",
"Condition":{
"StringEquals":{
"cloudwatch:namespace":"Kendra"
} } }, {
"Effect":"Allow",
"Action":"logs:DescribeLogGroups", "Resource":"*"
}, {
"Effect":"Allow",
"Action":"logs:CreateLogGroup",
"Resource":"arn:aws:logs:region:account ID:log-group:/aws/kendra/*"
}, {
"Effect":"Allow", "Action":[
"logs:DescribeLogStreams", "logs:CreateLogStream", "logs:PutLogEvents"
],
"Resource":"arn:aws:logs:region:account ID:log-group:/aws/kendra/*:log-stream:*"
}, {
"Effect":"Allow", "Action":[
"secretsmanager:GetSecretValue"
],
"Resource":[
"arn:aws:secretsmanager:region:account ID:secret:secret_id"
] }, {
"Effect":"Allow",
"Action":[
"kms:Decrypt"
],
"Resource":[
"arn:aws:kms:region:account ID:key/key_id"
],
"Condition":{
"StringLike":{
"kms:ViaService":[
"secretsmanager.*.amazonaws.com"
] } } } ]}
A trust policy to allow Amazon Kendra to assume a role.
{ "Version": "2012-10-17", "Statement": {
"Effect": "Allow", "Principal": {
"Service": "kendra.amazonaws.com"
},
"Action": "sts:AssumeRole"
} }
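The index role policies in this section can also be generated rather than hand-edited. The sketch below builds the CloudWatch role policy and the trust policy as Python dictionaries and fills in the region and account placeholders. The function names and the sample Region and account number are hypothetical, and nothing is sent to AWS.

```python
import json


def kendra_trust_policy():
    """Trust policy that lets the Amazon Kendra service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": {
            "Effect": "Allow",
            "Principal": {"Service": "kendra.amazonaws.com"},
            "Action": "sts:AssumeRole",
        },
    }


def index_logging_policy(region, account_id):
    """Role policy for CloudWatch metrics and logs used by an index."""
    log_group = f"arn:aws:logs:{region}:{account_id}:log-group:/aws/kendra/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "cloudwatch:PutMetricData",
                "Resource": "*",
                "Condition": {"StringEquals": {"cloudwatch:namespace": "Kendra"}},
            },
            {"Effect": "Allow", "Action": "logs:DescribeLogGroups", "Resource": "*"},
            {"Effect": "Allow", "Action": "logs:CreateLogGroup", "Resource": log_group},
            {
                "Effect": "Allow",
                "Action": [
                    "logs:DescribeLogStreams",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": f"{log_group}:log-stream:*",
            },
        ],
    }


# Placeholder Region and account number, for illustration only.
policy = index_logging_policy("us-west-2", "123456789012")
print(json.dumps(policy, indent=2))
print(json.dumps(kendra_trust_policy(), indent=2))
```

The JSON strings produced by json.dumps are what you would paste into the IAM console or pass to an IAM API call when creating the role.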
IAM roles for the BatchPutDocument API
Warning
Amazon Kendra doesn't use a bucket policy that grants permissions to an Amazon Kendra principal to interact with an Amazon S3 bucket. Instead, it uses IAM roles. Make sure that Amazon Kendra is not included as a trusted member in your bucket policy, so that you don't accidentally grant permissions to arbitrary principals rather than to bucket owners.
When you use the BatchPutDocument API to index documents in an Amazon S3 bucket, you must provide Amazon Kendra with an IAM role with access to the bucket. You must also provide a trust policy that allows Amazon Kendra to assume the role. If the documents in the bucket are encrypted, you must provide permission to use the AWS KMS customer master key (CMK) to decrypt the documents.
A required role policy to allow Amazon Kendra to access an Amazon S3 bucket.
{
"Version": "2012-10-17", "Statement": [
{
"Effect": "Allow", "Action": [
"s3:GetObject"
],
"Resource": [
"arn:aws:s3:::bucket name/*"
] } ] }
A required trust policy to allow Amazon Kendra to assume a role.
{ "Version": "2012-10-17", "Statement": {
"Sid": "AllowKendraToAssumeAttachedRole", "Effect": "Allow",
"Principal": {
"Service": "kendra.amazonaws.com"
},
"Action": "sts:AssumeRole"
} }
An optional role policy to allow Amazon Kendra to use an AWS KMS customer master key (CMK) to decrypt documents in an Amazon S3 bucket.
{ "Version": "2012-10-17", "Statement": [
{
"Effect": "Allow", "Action": [
"kms:Decrypt"
],
"Resource": [
"arn:aws:kms:region:account ID:key/key ID"
] } ] }
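Because the AWS KMS statement is needed only for encrypted documents, one way to manage the two role policies is to build them from a single helper. The following Python is a sketch under assumptions: the function name, bucket name, and key ARN are hypothetical, and merging both statements into one policy document is a choice made here for illustration, whereas this section presents them as separate policies.

```python
import json


def s3_document_policy(bucket_name, kms_key_arn=None):
    """Build a role policy for reading documents from an S3 bucket.

    The kms:Decrypt statement is added only when a key ARN is supplied,
    that is, when the documents in the bucket are encrypted.
    """
    statements = [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
        }
    ]
    if kms_key_arn is not None:
        statements.append(
            {
                "Effect": "Allow",
                "Action": ["kms:Decrypt"],
                "Resource": [kms_key_arn],
            }
        )
    return {"Version": "2012-10-17", "Statement": statements}


# Placeholder bucket and key, for illustration only.
plain = s3_document_policy("example-bucket")
encrypted = s3_document_policy(
    "example-bucket",
    kms_key_arn="arn:aws:kms:us-west-2:123456789012:key/example-key-id",
)
print(json.dumps(encrypted, indent=2))
```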
IAM roles for data sources
When you use the CreateDataSource API, you must give Amazon Kendra an IAM role that has permission to access the database resources. The specific permissions required depend on the data source.
Topics
• IAM roles for Amazon S3 data sources (p. 15)
• IAM roles for Confluence server data sources (p. 16)
• IAM roles for database data sources (p. 17)
• IAM roles for Google Workspace Drive data sources (p. 18)
• IAM roles for Microsoft OneDrive data sources (p. 19)
• IAM roles for Salesforce data sources (p. 20)
• IAM roles for ServiceNow data sources (p. 21)
• IAM roles for Microsoft SharePoint data sources (p. 22)
• Virtual private cloud (VPC) IAM role (p. 24)
• IAM roles for web crawler data sources (p. 25)
• IAM roles for Amazon WorkDocs data sources (p. 25)
• IAM roles for Amazon FSx data sources (p. 27)
IAM roles for Amazon S3 data sources
Warning
Amazon Kendra doesn't use a bucket policy that grants permissions to an Amazon Kendra principal to interact with an Amazon S3 bucket. Instead, it uses IAM roles. Make sure that Amazon Kendra is not included as a trusted member in your bucket policy, so that you don't accidentally grant permissions to arbitrary principals rather than to bucket owners.
When you use an Amazon S3 bucket as a data source, you supply a role that has permission to access the bucket, and to use the BatchPutDocument and BatchDeleteDocument operations. If the documents in the Amazon S3 bucket are encrypted, you must provide permission to use the AWS KMS customer master key (CMK) to decrypt the documents.
A required role policy to allow Amazon Kendra to use an Amazon S3 bucket as a data source.
{ "Version": "2012-10-17", "Statement": [
{
"Action": [
"s3:GetObject"
],
"Resource": [
"arn:aws:s3:::bucket name/*"
],
"Effect": "Allow"
}, {
"Action": [
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::bucket name"
],
"Effect": "Allow"
}, {
"Effect": "Allow", "Action": [
"kendra:BatchPutDocument", "kendra:BatchDeleteDocument"
],
"Resource": [
"arn:aws:kendra:region:account ID:index/index ID"
] } ] }
An optional role policy to allow Amazon Kendra to use an AWS KMS customer master key (CMK) to decrypt documents in an Amazon S3 bucket.
{ "Version": "2012-10-17", "Statement": [
{
"Effect": "Allow", "Action": [
"kms:Decrypt"
],
"Resource": [
"arn:aws:kms:region:account ID:key/key ID"
] } ] }
IAM roles for Confluence server data sources
When you use a Confluence server as a data source, you provide a role with the following policies:
• Permission to access the AWS Secrets Manager secret that contains the credentials necessary to connect to the Confluence server. For more information about the contents of the secret, see Using an Atlassian Confluence data source (p. 100).
• Permission to use the AWS KMS customer master key (CMK) to decrypt the user name and password secret stored by Secrets Manager.
• Permission to use the BatchPutDocument and BatchDeleteDocument operations to update the index.
{ "Version": "2012-10-17", "Statement": [
{ "Effect": "Allow", "Action": [
"secretsmanager:GetSecretValue"
],
"Resource": [
"arn:aws:secretsmanager:region:account ID:secret:secret ID"
] },
{ "Effect": "Allow", "Action": [ "kms:Decrypt"
],
"Resource": [
"arn:aws:kms:region:account ID:key/key ID"
],
"Condition": { "StringLike": { "kms:ViaService": [
"secretsmanager.*.amazonaws.com"
] } } }, {
"Effect": "Allow", "Action": [
"kendra:BatchPutDocument", "kendra:BatchDeleteDocument"
],
"Resource": "arn:aws:kendra:region:account ID:index/index ID"
}]}
If you are using a VPC, provide a policy that gives Amazon Kendra access to the required resources. See Virtual private cloud (VPC) IAM role (p. 24) for the required policy.
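The Confluence policy has three parts: reading the secret, decrypting it through Secrets Manager, and updating the index. A sketch of that structure in Python follows. The function name and all sample identifiers are hypothetical, and the code only builds the policy document without creating any IAM resources.

```python
import json


def connector_policy(region, account_id, secret_id, key_id, index_id):
    """Build a role policy with secret, KMS, and index permissions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read the connection credentials from Secrets Manager.
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                "Resource": [
                    f"arn:aws:secretsmanager:{region}:{account_id}:secret:{secret_id}"
                ],
            },
            {
                # Decrypt the secret, but only via Secrets Manager.
                "Effect": "Allow",
                "Action": ["kms:Decrypt"],
                "Resource": [f"arn:aws:kms:{region}:{account_id}:key/{key_id}"],
                "Condition": {
                    "StringLike": {
                        "kms:ViaService": ["secretsmanager.*.amazonaws.com"]
                    }
                },
            },
            {
                # Add and remove documents in the target index.
                "Effect": "Allow",
                "Action": ["kendra:BatchPutDocument", "kendra:BatchDeleteDocument"],
                "Resource": f"arn:aws:kendra:{region}:{account_id}:index/{index_id}",
            },
        ],
    }


# Placeholder identifiers, for illustration only.
policy = connector_policy(
    "us-west-2", "123456789012", "example-secret", "example-key-id", "example-index-id"
)
print(json.dumps(policy, indent=2))
```

Parameterizing the identifiers keeps the region and account number consistent across the three ARNs, which is easy to get wrong when editing the JSON by hand.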
IAM roles for database data sources
When you use a database as a data source, you provide Amazon Kendra with a role that has the permissions necessary for connecting to the database. These include:
• Permission to access the AWS Secrets Manager secret that contains the user name and password for the database site. For more information about the contents of the secret, see Using a database data source (p. 109).
• Permission to use the AWS KMS customer master key (CMK) to decrypt the user name and password secret stored by Secrets Manager.
• Permission to use the BatchPutDocument and BatchDeleteDocument operations to update the index.
• Permission to access the Amazon S3 bucket that contains the SSL certificate used to communicate with the database site.
{ "Version": "2012-10-17", "Statement": [
{
"Effect": "Allow", "Action": [
"secretsmanager:GetSecretValue"
],
"Resource": [
"arn:aws:secretsmanager:region:account ID:secret:secret ID"
] }, {
"Effect": "Allow", "Action": [
"kms:Decrypt"
],
"Resource": [
"arn:aws:kms:region:account ID:key/key ID"
] }, {
"Effect": "Allow", "Action": [
"kendra:BatchPutDocument", "kendra:BatchDeleteDocument"
],
"Resource": [
"arn:aws:kendra:region:account ID:index/index ID"
],
"Condition": { "StringLike": {
"kms:ViaService": [
"kendra.amazonaws.com"
] } } }, {
"Effect": "Allow", "Action": [
"s3:GetObject"
],
"Resource": [
"arn:aws:s3:::bucket name/*"
]
} ] }
There are two optional policies that you might use with a database data source.
If you have encrypted the Amazon S3 bucket that contains the SSL certificate used to communicate with the database, provide a policy to give Amazon Kendra access to the key.
{ "Version": "2012-10-17", "Statement": [
{
"Effect": "Allow", "Action": [
"kms:Decrypt"
],
"Resource": [
"arn:aws:kms:region:account ID:key/key ID"
] } ] }
If you are using a VPC, provide a policy that gives Amazon Kendra access to the required resources. See Virtual private cloud (VPC) IAM role (p. 24) for the required policy.
IAM roles for Google Workspace Drive data sources
When you use a Google Workspace Drive data source, you provide Amazon Kendra with a role that has the permissions necessary for connecting to the site. These include:
• Permission to get and decrypt the AWS Secrets Manager secret that contains the client account email, admin account email, and private key necessary to connect to the Google Drive site. For more information about the contents of the secret, see Using a Google Workspace Drive data source (p. 112).
• Permission to use the BatchPutDocument and BatchDeleteDocument APIs.
The following IAM policy provides the necessary permissions:
{ "Version": "2012-10-17", "Statement": [
{
"Effect": "Allow", "Action": [
"secretsmanager:GetSecretValue"
],
"Resource": [
"arn:aws:secretsmanager:region:account ID:secret:secret ID"
] },
{ "Effect": "Allow", "Action": [ "kms:Decrypt"
],
"Resource": [
"arn:aws:kms:region:account ID:key/key ID"
],
"Condition": { "StringLike": { "kms:ViaService": [
"secretsmanager.*.amazonaws.com"
] } } },
{ "Effect": "Allow", "Action": [
"kendra:BatchPutDocument", "kendra:BatchDeleteDocument"
],
"Resource": "arn:aws:kendra:region:account ID:index/index ID"
}]}
IAM roles for Microsoft OneDrive data sources
When you use a Microsoft OneDrive data source, you provide Amazon Kendra with a role that has the permissions necessary for connecting to the site. These include:
• Permission to get and decrypt the AWS Secrets Manager secret that contains the application ID and secret key necessary to connect to the OneDrive site. For more information about the contents of the secret, see Using a Microsoft OneDrive data source (p. 114).
• Permission to use the BatchPutDocument and BatchDeleteDocument APIs.
The following IAM policy provides the necessary permissions:
{ "Version": "2012-10-17", "Statement": [
{ "Effect": "Allow", "Action": [
"secretsmanager:GetSecretValue"
],
"Resource": [
"arn:aws:secretsmanager:region:account ID:secret:secret ID"
] }, {
"Effect": "Allow", "Action": [ "kms:Decrypt"
],
"Resource": [
"arn:aws:kms:region:account ID:key/key ID"
],
"Condition": { "StringLike": { "kms:ViaService": [
"secretsmanager.*.amazonaws.com"
] } } },
{ "Effect": "Allow",
"Action": [
"kendra:BatchPutDocument", "kendra:BatchDeleteDocument"
],
"Resource": "arn:aws:kendra:region:account ID:index/index ID"
}]
}
If you are storing the list of users to index in an Amazon S3 bucket, you must also provide permission to use the S3 GetObject operation. The following IAM policy provides the necessary permissions:
{
"Version": "2012-10-17", "Statement": [
{
"Effect": "Allow", "Action": [
"secretsmanager:GetSecretValue"
],
"Resource": [
"arn:aws:secretsmanager:region:account ID:secret:secret ID"
] }, {
"Action": [ "s3:GetObject"
],
"Resource": [
"arn:aws:s3:::input_bucket_name/*"
],
"Effect": "Allow"
},
{ "Effect": "Allow", "Action": [ "kms:Decrypt"
],
"Resource": [
"arn:aws:kms:region:account ID:key/[[key IDs]]"
],
"Condition": { "StringLike": { "kms:ViaService": [
"secretsmanager.*.amazonaws.com", "s3.*.amazonaws.com"
] } } }, {
"Effect": "Allow", "Action": [
"kendra:BatchPutDocument", "kendra:BatchDeleteDocument"
],
"Resource": "arn:aws:kendra:region:account ID:index/index ID"
}]
}
IAM roles for Salesforce data sources
When you use Salesforce as a data source, you provide a role with the following policies:
• Permission to access the AWS Secrets Manager secret that contains the user name and password for the Salesforce site. For more information about the contents of the secret, see Using a Salesforce data source (p. 116).
• Permission to use the AWS KMS customer master key (CMK) to decrypt the user name and password secret stored by Secrets Manager.
• Permission to use the BatchPutDocument and BatchDeleteDocument operations to update the index.
{
"Version": "2012-10-17", "Statement": [
{
"Effect": "Allow", "Action": [
"secretsmanager:GetSecretValue"
],
"Resource": [
"arn:aws:secretsmanager:region:account ID:secret:secret ID"
] }, {
"Effect": "Allow", "Action": [ "kms:Decrypt"
],
"Resource": [
"arn:aws:kms:region:account ID:key/key ID"
],
"Condition": { "StringLike": { "kms:ViaService": [
"secretsmanager.*.amazonaws.com"
] } } }, {
"Effect": "Allow", "Action": [
"kendra:BatchPutDocument", "kendra:BatchDeleteDocument"
],
"Resource": "arn:aws:kendra:region:account ID:index/index ID"
}]
}
IAM roles for ServiceNow data sources
When you use ServiceNow as a data source, you provide a role with the following policies:
• Permission to access the Secrets Manager secret that contains the user name and password for the ServiceNow site. For more information about the contents of the secret, see Using a ServiceNow data source (p. 119).
• Permission to use the AWS KMS customer master key (CMK) to decrypt the user name and password secret stored by Secrets Manager.
• Permission to use the BatchPutDocument and BatchDeleteDocument operations to update the index.
{ "Version": "2012-10-17", "Statement": [
{
"Effect": "Allow", "Action": [
"secretsmanager:GetSecretValue"
],
"Resource": [
"arn:aws:secretsmanager:region:account ID:secret:secret ID"
] },
{ "Effect": "Allow", "Action": [ "kms:Decrypt"
],
"Resource": [
"arn:aws:kms:region:account ID:key/key ID"
],
"Condition": { "StringLike": { "kms:ViaService": [
"secretsmanager.*.amazonaws.com"
] } } }, {
"Effect": "Allow", "Action": [
"kendra:BatchPutDocument", "kendra:BatchDeleteDocument"
],
"Resource": "arn:aws:kendra:region:account ID:index/index ID"
}]}
IAM roles for Microsoft SharePoint data sources
For a Microsoft SharePoint data source, you provide a role with the following policies.
• Permission to access the AWS Secrets Manager secret that contains the user name and password for the SharePoint site. For more information about the contents of the secret, see Using a Microsoft SharePoint data source (p. 124).
• Permission to use the AWS KMS customer master key (CMK) to decrypt the user name and password secret stored by AWS Secrets Manager.
• Permission to use the BatchPutDocument and BatchDeleteDocument operations to update the index.
• Permission to access the Amazon S3 bucket that contains the SSL certificate used to communicate with the SharePoint site.
You must also attach a trust policy that allows Amazon Kendra to assume the role.
{
"Version": "2012-10-17", "Statement": [
{
"Effect": "Allow", "Action": [
"secretsmanager:GetSecretValue"
],
"Resource": [
"arn:aws:secretsmanager:region:account ID:secret:secret ID"
] }, {
"Effect": "Allow", "Action": [
"kms:Decrypt"
],
"Resource": [
"arn:aws:kms:region:account ID:key/key ID"
] }, {
"Effect": "Allow", "Action": [
"kendra:BatchPutDocument", "kendra:BatchDeleteDocument"
],
"Resource": [
"arn:aws:kendra:region:account ID:index/index ID"
],
"Condition": { "StringLike": {
"kms:ViaService": [
"kendra.amazonaws.com"
] } } }, {
"Effect": "Allow", "Action": [
"s3:GetObject"
],
"Resource": [
"arn:aws:s3:::bucket name/*"
] } ] }
You must apply the following trust policy to the role.
{ "Version": "2012-10-17", "Statement": {
"Effect": "Allow", "Principal": {
"Service": "kendra.amazonaws.com"
},
"Action": "sts:AssumeRole"
} }
If you have encrypted the Amazon S3 bucket that contains the SSL certificate used to communicate with the SharePoint site, provide a policy to give Amazon Kendra access to the key.
{
"Version": "2012-10-17", "Statement": [