Basic data manipulation - Amazon Kendra

You can manipulate your document metadata fields or attributes and content using basic logic. This includes removing values in a field, modifying values in a field using a condition, or creating a field. For advanced manipulations that go beyond what you can do with basic logic, you need to invoke a Lambda function; see the section called “Advanced data manipulation”.

To apply basic logic, you specify the target field that you want to manipulate by using the DocumentAttributeTarget object, in which you provide the attribute key. For example, the key 'Department' is a field or attribute that holds all the department names associated with the documents. You can also specify a value to use in the target field if a certain condition is met. You set the condition using the DocumentAttributeCondition object. For example, if the 'Source_URI' field contains 'financial' in its URI value, then prefill the target field 'Department' with the target value 'Finance' for the document. You can also delete the values of the target document attribute.
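As a minimal sketch of how the two objects fit together, the following Python dictionary expresses the 'Finance' rule described above as one inline configuration. It uses the field names of the Amazon Kendra `InlineCustomDocumentEnrichmentConfiguration`, `DocumentAttributeCondition`, and `DocumentAttributeTarget` types. The attribute keys 'Source_URI' and 'Department' come from this section's example. In your index, the source URI may be held in a reserved field such as `_source_uri`, so treat the key names as assumptions.

```python
import json

# One inline configuration: if the document's 'Source_URI' attribute contains
# "financial", set its 'Department' attribute to "Finance".
# Key names 'Source_URI' and 'Department' are assumptions taken from the example.
finance_rule = {
    "Condition": {  # DocumentAttributeCondition
        "ConditionDocumentAttributeKey": "Source_URI",
        "Operator": "Contains",
        "ConditionOnValue": {"StringValue": "financial"},
    },
    "Target": {  # DocumentAttributeTarget
        "TargetDocumentAttributeKey": "Department",
        "TargetDocumentAttributeValueDeletion": False,
        "TargetDocumentAttributeValue": {"StringValue": "Finance"},
    },
    "DocumentContentDeletion": False,  # keep the document body as-is
}

custom_document_enrichment_configuration = {"InlineConfigurations": [finance_rule]}

# JSON string suitable for the AWS CLI --custom-document-enrichment-configuration option
print(json.dumps(custom_document_enrichment_configuration))
```

The same dictionary can be passed as `CustomDocumentEnrichmentConfiguration` to `create_data_source` in the Python example later in this section.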

To apply basic logic using the console, select your index and then select Document enrichments in the navigation menu. Go to Configure basic operations to apply basic manipulations to your document metadata fields or attributes and content.

The following is an example of using basic logic to remove all customer identification numbers in the document metadata field called 'Customer_ID'.

Example 1: Removing customer identification numbers associated with the documents

Data before basic manipulation applied.


| Document_ID | Body_Text    | Customer_ID |
|-------------|--------------|-------------|
| 1           | Lorem Ipsum. | CID1234     |
| 2           | Lorem Ipsum. | CID1235     |
| 3           | Lorem Ipsum. | CID1236     |

Data after basic manipulation applied.

| Document_ID | Body_Text    | Customer_ID |
|-------------|--------------|-------------|
| 1           | Lorem Ipsum. |             |
| 2           | Lorem Ipsum. |             |
| 3           | Lorem Ipsum. |             |

The following is an example of using basic logic to create a metadata field called 'Department' and prefill this field with the department names based on information from the 'Source_URI' field. This uses the condition that if the 'Source_URI' field contains 'financial' in its URI value, then prefill the target field 'Department' with the target value 'Finance' for the document.

Example 2: Creating 'Department' field and prefilling it with department names associated with the documents using a condition.

Data before basic manipulation applied.

| Document_ID | Body_Text    | Source_URI  |
|-------------|--------------|-------------|
| 1           | Lorem Ipsum. | financial/1 |
| 2           | Lorem Ipsum. | financial/2 |
| 3           | Lorem Ipsum. | financial/3 |

Data after basic manipulation applied.

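| Document_ID | Body_Text    | Source_URI  | Department |
|-------------|--------------|-------------|------------|
| 1           | Lorem Ipsum. | financial/1 | Finance    |
| 2           | Lorem Ipsum. | financial/2 | Finance    |
| 3           | Lorem Ipsum. | financial/3 | Finance    |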
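To make the before-and-after tables concrete, the following sketch replays both examples locally on plain Python dictionaries. It is an illustrative model, not Amazon Kendra's implementation. `condition_matches` and `apply_rule` are hypothetical helpers that handle only the `Contains` operator and the target fields used in this section. A deleted value is represented as `None`, and `Body_Text` is left out because neither rule touches it.

```python
# Illustrative local model of Examples 1 and 2. This is NOT Amazon Kendra code;
# the helpers below are hypothetical and only model what this section describes.

def condition_matches(attributes, condition):
    """Return True if the rule has no condition or its condition holds."""
    if condition is None:
        return True
    if condition["Operator"] != "Contains":
        raise NotImplementedError("this sketch only models the 'Contains' operator")
    value = attributes.get(condition["ConditionDocumentAttributeKey"], "")
    return condition["ConditionOnValue"]["StringValue"] in value


def apply_rule(attributes, rule):
    """Return a copy of the document attributes with the rule's Target applied."""
    result = dict(attributes)
    if not condition_matches(attributes, rule.get("Condition")):
        return result
    target = rule["Target"]
    key = target["TargetDocumentAttributeKey"]
    if target.get("TargetDocumentAttributeValueDeletion"):
        result[key] = None  # value removed, as in Example 1
    elif "TargetDocumentAttributeValue" in target:
        result[key] = target["TargetDocumentAttributeValue"]["StringValue"]
    return result


# Example 1: delete every 'Customer_ID' value (no condition)
delete_customer_id = {
    "Target": {"TargetDocumentAttributeKey": "Customer_ID",
               "TargetDocumentAttributeValueDeletion": True}
}
# Example 2: set 'Department' to "Finance" when 'Source_URI' contains "financial"
fill_department = {
    "Condition": {"ConditionDocumentAttributeKey": "Source_URI",
                  "Operator": "Contains",
                  "ConditionOnValue": {"StringValue": "financial"}},
    "Target": {"TargetDocumentAttributeKey": "Department",
               "TargetDocumentAttributeValue": {"StringValue": "Finance"}},
}

example_1 = [{"Document_ID": i, "Customer_ID": f"CID123{3 + i}"} for i in (1, 2, 3)]
example_2 = [{"Document_ID": i, "Source_URI": f"financial/{i}"} for i in (1, 2, 3)]

for doc in example_1:
    print(apply_rule(doc, delete_customer_id))
for doc in example_2:
    print(apply_rule(doc, fill_department))
```

A document whose 'Source_URI' does not contain "financial" passes through Example 2's rule unchanged, which is the behavior the condition is meant to express.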

Note: Amazon Kendra cannot create a target document metadata field unless it already exists as an index field. After you create your index field, you can create a document metadata field using DocumentAttributeTarget. Amazon Kendra then maps your newly created metadata field to your index field.
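Because the index field has to exist first, one way to create the 'Department' field is the Amazon Kendra UpdateIndex API and its `DocumentMetadataConfigurationUpdates` parameter. The sketch below only builds the request as a Python dictionary so it runs without AWS credentials. `department_index_field_update` is a hypothetical helper, and the `Search` settings are illustrative choices rather than requirements.

```python
def department_index_field_update(index_id, field_name="Department"):
    """Build an UpdateIndex request that defines a STRING_VALUE index field.

    Hypothetical helper for illustration; the Search flags are example choices.
    """
    return {
        "Id": index_id,
        "DocumentMetadataConfigurationUpdates": [
            {
                "Name": field_name,
                "Type": "STRING_VALUE",
                "Search": {
                    "Facetable": True,
                    "Searchable": True,
                    "Displayable": True,
                },
            }
        ],
    }


request = department_index_field_update("index-id")
# With boto3 installed and AWS credentials configured, you could send it with:
#     boto3.client("kendra").update_index(**request)
print(request)
```

Keeping the request builder separate from the API call makes it easy to review or test the field definition before changing a live index.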

The following examples show how to configure basic data manipulation to remove customer identification numbers associated with the documents.


Console

To configure basic data manipulation to remove customer identification numbers

1. In the left navigation pane, under Indexes, select Document enrichments and then select Add document enrichment.

2. On the Configure basic operations page, choose from the dropdown the data source whose document metadata fields and content you want to alter. Then choose the document field name 'Customer_ID' from the dropdown, choose the index field name 'Customer_ID' from the dropdown, and choose the target action Delete from the dropdown. Then select Add basic operation.

CLI

To configure basic data manipulation to remove customer identification numbers

```shell
aws kendra create-data-source \
    --name data-source-name \
    --index-id index-id \
    --role-arn arn:aws:iam::account-id:role/role-name \
    --type S3 \
    --configuration '{"S3Configuration":{"BucketName":"S3-bucket-name"}}' \
    --custom-document-enrichment-configuration '{"InlineConfigurations":[{"Target":{"TargetDocumentAttributeKey":"Customer_ID","TargetDocumentAttributeValueDeletion":true}}]}'
```

Python

To configure basic data manipulation to remove customer identification numbers

```python
import pprint
import time

import boto3
from botocore.exceptions import ClientError

kendra = boto3.client("kendra")

print("Create a data source with customizations")
name = "data-source-name"
index_id = "index-id"
role_arn = "arn:aws:iam::account-id:role/role-name"
data_source_type = "S3"

configuration = {
    "S3Configuration": {
        "BucketName": "S3-bucket-name"
    }
}

custom_document_enrichment_configuration = {
    "InlineConfigurations": [
        {
            # Delete the values of the 'Customer_ID' attribute in every document
            "Target": {
                "TargetDocumentAttributeKey": "Customer_ID",
                "TargetDocumentAttributeValueDeletion": True,
            }
        }
    ]
}

try:
    data_source_response = kendra.create_data_source(
        Name=name,
        IndexId=index_id,
        RoleArn=role_arn,
        Type=data_source_type,
        Configuration=configuration,
        CustomDocumentEnrichmentConfiguration=custom_document_enrichment_configuration,
    )
    pprint.pprint(data_source_response)
    data_source_id = data_source_response["Id"]

    print("Wait for Kendra to create the data source with your customizations.")
    while True:
        # Get the data source description
        data_source_description = kendra.describe_data_source(
            Id=data_source_id, IndexId=index_id
        )
        status = data_source_description["Status"]
        print(" Creating data source. Status: " + status)
        time.sleep(60)
        if status != "CREATING":
            break

    print("Synchronize the data source.")
    sync_response = kendra.start_data_source_sync_job(
        Id=data_source_id, IndexId=index_id
    )
    pprint.pprint(sync_response)

    print("Wait for the data source to sync with the index.")
    while True:
        jobs = kendra.list_data_source_sync_jobs(
            Id=data_source_id, IndexId=index_id
        )
        # For this example, there should be one job
        status = jobs["History"][0]["Status"]
        print(" Syncing data source. Status: " + status)
        time.sleep(60)
        if status != "SYNCING":
            break

except ClientError as e:
    print("%s" % e)

print("Program ends.")
```

Java

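To configure basic data manipulation to remove customer identification numbers

```java
package com.amazonaws.kendra;

import java.util.concurrent.TimeUnit;

import software.amazon.awssdk.services.kendra.KendraClient;
import software.amazon.awssdk.services.kendra.model.CreateDataSourceRequest;
import software.amazon.awssdk.services.kendra.model.CreateDataSourceResponse;
import software.amazon.awssdk.services.kendra.model.CustomDocumentEnrichmentConfiguration;
import software.amazon.awssdk.services.kendra.model.DataSourceConfiguration;
import software.amazon.awssdk.services.kendra.model.DataSourceStatus;
import software.amazon.awssdk.services.kendra.model.DataSourceSyncJob;
import software.amazon.awssdk.services.kendra.model.DataSourceSyncJobStatus;
import software.amazon.awssdk.services.kendra.model.DataSourceType;
import software.amazon.awssdk.services.kendra.model.DescribeDataSourceRequest;
import software.amazon.awssdk.services.kendra.model.DescribeDataSourceResponse;
import software.amazon.awssdk.services.kendra.model.DocumentAttributeTarget;
import software.amazon.awssdk.services.kendra.model.InlineCustomDocumentEnrichmentConfiguration;
import software.amazon.awssdk.services.kendra.model.ListDataSourceSyncJobsRequest;
import software.amazon.awssdk.services.kendra.model.ListDataSourceSyncJobsResponse;
import software.amazon.awssdk.services.kendra.model.S3DataSourceConfiguration;
import software.amazon.awssdk.services.kendra.model.StartDataSourceSyncJobRequest;
import software.amazon.awssdk.services.kendra.model.StartDataSourceSyncJobResponse;

public class CreateDataSourceWithCustomizationsExample {

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Create a data source with customizations");

        String dataSourceName = "data-source-name";
        String indexId = "index-id";
        String dataSourceRoleArn = "arn:aws:iam::account-id:role/role-name";
        String s3BucketName = "S3-bucket-name";

        KendraClient kendra = KendraClient.builder().build();

        // S3 data source whose inline configuration deletes every 'Customer_ID' value
        CreateDataSourceRequest createDataSourceRequest = CreateDataSourceRequest.builder()
            .name(dataSourceName)
            .indexId(indexId)
            .roleArn(dataSourceRoleArn)
            .type(DataSourceType.S3)
            .configuration(DataSourceConfiguration.builder()
                .s3Configuration(S3DataSourceConfiguration.builder().bucketName(s3BucketName).build())
                .build())
            .customDocumentEnrichmentConfiguration(CustomDocumentEnrichmentConfiguration.builder()
                .inlineConfigurations(InlineCustomDocumentEnrichmentConfiguration.builder()
                    .target(DocumentAttributeTarget.builder()
                        .targetDocumentAttributeKey("Customer_ID")
                        .targetDocumentAttributeValueDeletion(true)
                        .build())
                    .build())
                .build())
            .build();

        CreateDataSourceResponse createDataSourceResponse = kendra.createDataSource(createDataSourceRequest);
        System.out.println(String.format("Response of creating data source: %s", createDataSourceResponse));

        String dataSourceId = createDataSourceResponse.id();
        System.out.println(String.format("Waiting for Kendra to create the data source %s", dataSourceId));
        DescribeDataSourceRequest describeDataSourceRequest = DescribeDataSourceRequest.builder()
            .indexId(indexId).id(dataSourceId).build();

        while (true) {
            DescribeDataSourceResponse describeDataSourceResponse = kendra.describeDataSource(describeDataSourceRequest);
            DataSourceStatus status = describeDataSourceResponse.status();
            System.out.println(String.format("Creating data source. Status: %s", status));
            TimeUnit.SECONDS.sleep(60);
            if (status != DataSourceStatus.CREATING) {
                break;
            }
        }

        System.out.println(String.format("Synchronize the data source %s", dataSourceId));
        StartDataSourceSyncJobRequest startDataSourceSyncJobRequest = StartDataSourceSyncJobRequest.builder()
            .indexId(indexId).id(dataSourceId).build();
        StartDataSourceSyncJobResponse startDataSourceSyncJobResponse = kendra.startDataSourceSyncJob(startDataSourceSyncJobRequest);
        System.out.println(String.format("Waiting for the data source to sync with the index %s for execution ID %s",
            indexId, startDataSourceSyncJobResponse.executionId()));

        // For this example, there should be one job
        ListDataSourceSyncJobsRequest listDataSourceSyncJobsRequest = ListDataSourceSyncJobsRequest.builder()
            .indexId(indexId).id(dataSourceId).build();

        while (true) {
            ListDataSourceSyncJobsResponse listDataSourceSyncJobsResponse = kendra.listDataSourceSyncJobs(listDataSourceSyncJobsRequest);
            DataSourceSyncJob job = listDataSourceSyncJobsResponse.history().get(0);
            System.out.println(String.format("Syncing data source. Status: %s", job.status()));
            TimeUnit.SECONDS.sleep(60);
            if (job.status() != DataSourceSyncJobStatus.SYNCING) {
                break;
            }
        }

        System.out.println("Data source creation with customizations is complete");
    }
}
```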
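The Python and Java examples poll every 60 seconds with no upper bound, and they read the first entry of the sync job history on the assumption that only one job exists. The following is a minimal Python sketch of a bounded polling loop plus a lookup of the sync job by the `ExecutionId` that `start_data_source_sync_job` returns. It is not part of the Amazon Kendra SDK: `wait_while`, `sync_job_status`, and `WaitTimeout` are hypothetical names. The set of in-progress statuses is a parameter because which statuses to wait on is an assumption you should check against the API reference for your SDK version. For example, sync jobs may also report `SYNCING_INDEXING`.

```python
import time


class WaitTimeout(Exception):
    """Raised when a status is still in progress after the time limit (hypothetical)."""


def wait_while(get_status, in_progress, interval=60.0, timeout=3600.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns a value outside in_progress.

    get_status: zero-argument callable returning the current status.
    in_progress: set of statuses that mean "keep waiting".
    Raises WaitTimeout if `timeout` seconds pass before the status changes.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status not in in_progress:
            return status
        if clock() >= deadline:
            raise WaitTimeout(f"still {status} after {timeout} seconds")
        sleep(interval)


def sync_job_status(list_response, execution_id):
    """Return the Status of the sync job with this ExecutionId, or None if not listed yet."""
    for job in list_response["History"]:
        if job["ExecutionId"] == execution_id:
            return job["Status"]
    return None


# Hypothetical usage with the variables from the Python example:
#     final_status = wait_while(
#         lambda: sync_job_status(
#             kendra.list_data_source_sync_jobs(Id=data_source_id, IndexId=index_id),
#             sync_response["ExecutionId"]),
#         in_progress={None, "SYNCING", "SYNCING_INDEXING"})
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting, and matching on `ExecutionId` avoids depending on the order of the job history.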
