
AWS DataSync

User Guide


AWS DataSync: User Guide

Copyright © Amazon Web Services, Inc. and/or its affiliates. All rights reserved.

Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be affiliated with, connected to, or sponsored by Amazon.


Table of Contents

What is AWS DataSync? ... 1

Use cases ... 1

Benefits ... 1

Additional AWS DataSync resources ... 2

How AWS DataSync works ... 3

AWS DataSync architecture ... 3

Data transfer between self-managed storage and AWS ... 3

Data transfer between AWS storage services ... 4

Data transfer using a DataSync EC2 agent deployed in a Region ... 5

Components and terminology ... 5

Agent ... 6

Location ... 6

Task ... 6

Task execution ... 6

How DataSync transfers files ... 7

How AWS DataSync verifies data integrity ... 8

How DataSync handles open and locked files ... 8

Setting up ... 9

Sign up for AWS ... 9

AWS Regions and endpoints ... 9

How to access AWS DataSync ... 9

DataSync pricing ... 9

Requirements ... 10

Agent requirements ... 10

Supported hypervisors ... 10

Virtual machine requirements ... 11

Amazon EC2 instance requirements ... 11

Network requirements ... 11

Network requirements to connect to your self-managed storage ... 11

Network requirements when using VPC endpoints ... 12

Network requirements when using public service endpoints or FIPS endpoints ... 15

Required network interfaces for data transfers ... 19

Getting started ... 21

Create an agent ... 21

Deploy your agent ... 22

Choose a service endpoint ... 26

Activate your agent ... 28

Configure a source location ... 29

Configure a destination location ... 30

Configure task settings ... 31

Data verification options ... 31

Ownership and permissions-related options ... 32

File metadata options and file management ... 32

Bandwidth options ... 33

Filtering options ... 33

Scheduling and queueing options ... 33

Tags and logging options ... 34

Review and create your task ... 34

Start your task ... 34

Clean up resources ... 35

Using the AWS CLI ... 36

Step 1: Create an agent ... 36

Step 2: Create locations ... 39

Create an NFS location ... 39


Create an SMB location ... 40

Create an HDFS location ... 41

Create an object storage location ... 42

Create an Amazon EFS location ... 42

Create an FSx for Windows File Server location ... 44

Create an Amazon FSx for Lustre location ... 44

Create an Amazon S3 location ... 45

Step 3: Create a task ... 49

Step 4: Start a task execution ... 50

Step 5: Monitor your task execution ... 51

Monitor your task execution in real time ... 52

API filters ... 52

Parameters for API filtering ... 52

API filtering ListLocations ... 53

API filtering ListTasks ... 53

Working with agents ... 55

Creating and activating an agent ... 55

Using DataSync in a VPC ... 56

How DataSync works with VPC endpoints ... 56

Configuring DataSync to use private IP addresses for data transfer ... 56

Deploying your agent in AWS Regions ... 59

Data transfer from in-cloud file system to in-cloud file system or Amazon S3 ... 59

Data transfer from S3 to in-cloud file systems ... 60

Editing your agent's properties ... 61

Using multiple agents for a location ... 62

Agent statuses ... 62

Deleting an agent ... 62

Configuring your agent for multiple NICs ... 63

Working with your agent's local console ... 63

Logging in to the agent local console ... 64

Obtaining an activation key using the local console ... 64

Configuring your agent network settings ... 65

Testing your agent connectivity to the internet ... 66

Testing connectivity to self-managed storage ... 67

Viewing your agent system resource status ... 67

Synchronizing your VMware agent time ... 68

Running AWS DataSync commands on the local console ... 69

Enabling AWS Support to help troubleshoot DataSync ... 70

Working with locations ... 72

Creating a location for NFS ... 73

NFS location settings ... 74

NFS server on AWS Snowcone and AWS Snowball Edge ... 75

Creating a location for SMB ... 75

SMB location settings ... 76

Creating a location for HDFS ... 77

Unsupported HDFS features ... 78

Creating a location for object storage ... 78

Creating a location for Amazon EFS ... 79

Considerations when creating a location for Amazon EFS ... 80

Creating a location for FSx for Windows File Server ... 81

Creating a location for FSx for Lustre ... 83

Creating a location for Amazon S3 ... 83

Amazon S3 location settings ... 85

Considerations when working with Amazon S3 storage classes in DataSync ... 86

Manually configuring an IAM role to access your Amazon S3 bucket ... 88

How DataSync handles metadata and special files ... 90

Metadata copied by DataSync ... 90


Default POSIX metadata applied by DataSync ... 92

Links and directories copied by DataSync ... 93

Deleting a location ... 93

Working with tasks ... 94

Creating your task ... 94

Creating a task for DataSync ... 94

Creating a task to transfer data between self-managed storage and AWS ... 95

Creating a task to transfer between in-cloud locations ... 95

Configuring task settings ... 99

Filtering data ... 100

Filtering terms, definitions, and syntax ... 100

Excluding data from a transfer ... 101

Including data in a transfer ... 102

Sample filters for common uses ... 102

Scheduling your task ... 103

Configuring a task schedule ... 104

Editing a task schedule ... 104

Task creation statuses ... 105

Starting your task ... 105

Queueing task executions ... 106

Working with task executions ... 106

Adjust bandwidth throttling ... 106

Task execution statuses ... 107

Cancel a task execution ... 107

Deleting your task ... 108

Monitoring ... 109

Accessing CloudWatch metrics ... 109

DataSync CloudWatch metrics ... 109

CloudWatch events for DataSync ... 110

DataSync dimensions ... 111

Uploading logs to Amazon CloudWatch log groups ... 111

Security ... 113

Data protection ... 113

Data encryption ... 113

Identity and access management ... 114

Overview of managing access ... 114

Using identity-based policies (IAM policies) ... 119

Cross-service confused deputy prevention ... 122

DataSync API permissions reference ... 123

Logging ... 128

Working with AWS DataSync information in CloudTrail ... 128

Understanding AWS DataSync log file entries ... 129

Compliance validation ... 130

Resilience ... 131

Infrastructure security ... 131

Quotas and limits ... 132

Quotas for tasks ... 132

Quotas for task executions ... 134

Limits for DataSync file systems ... 134

Limits for DataSync filters ... 134

Troubleshooting ... 135

I need DataSync to use a specific NFS or SMB version to mount my share ... 135

What does the "Failed to retrieve agent activation key" error mean? ... 136

I can't activate an agent I created using a VPC endpoint ... 136

My task status is unavailable and indicates a mount error ... 136

My task failed with an input/output error message ... 137

My task is stuck in launching status ... 137


My task failed with a permissions denied error message ... 137

My task has had a preparing status for a long time ... 138

How long does it take to verify a task I've run? ... 138

My storage cost is higher than I expected ... 138

I don't know what's going on with my agent. Can someone help me? ... 139

How do I connect to an Amazon EC2 agent's local console? ... 139

Tutorial: Transferring across accounts to S3 ... 140

Overview ... 140

Prerequisites ... 140

Step 1: Create an IAM role for DataSync in Account A ... 141

Create the IAM role ... 141

Attach a custom policy to the IAM role ... 141

Step 2: Disable ACLs for your S3 bucket in Account B ... 142

Step 3: Update the S3 bucket policy in Account B ... 142

Step 4: Create a DataSync destination location for the S3 bucket ... 143

Step 5: Create and start your DataSync task ... 144

Related ... 144

Additional resources ... 146

Transferring data from a self-managed storage array ... 146

Other use cases ... 146

Transferring files in opposite directions ... 146

Using multiple tasks to write to the same Amazon S3 bucket ... 147

Allowing Amazon S3 access from a private VPC endpoint ... 147

API reference ... 149

Actions ... 149

CancelTaskExecution ... 151

CreateAgent ... 153

CreateLocationEfs ... 157

CreateLocationFsxLustre ... 161

CreateLocationFsxWindows ... 164

CreateLocationHdfs ... 167

CreateLocationNfs ... 172

CreateLocationObjectStorage ... 176

CreateLocationS3 ... 180

CreateLocationSmb ... 185

CreateTask ... 189

DeleteAgent ... 194

DeleteLocation ... 196

DeleteTask ... 198

DescribeAgent ... 200

DescribeLocationEfs ... 203

DescribeLocationFsxLustre ... 206

DescribeLocationFsxWindows ... 209

DescribeLocationHdfs ... 212

DescribeLocationNfs ... 216

DescribeLocationObjectStorage ... 219

DescribeLocationS3 ... 222

DescribeLocationSmb ... 225

DescribeTask ... 228

DescribeTaskExecution ... 234

ListAgents ... 239

ListLocations ... 241

ListTagsForResource ... 244

ListTaskExecutions ... 247

ListTasks ... 250

StartTaskExecution ... 253

TagResource ... 257


UntagResource ... 259

UpdateAgent ... 261

UpdateLocationHdfs ... 263

UpdateLocationNfs ... 267

UpdateLocationObjectStorage ... 270

UpdateLocationSmb ... 273

UpdateTask ... 276

UpdateTaskExecution ... 279

Data Types ... 280

AgentListEntry ... 282

Ec2Config ... 283

FilterRule ... 284

HdfsNameNode ... 285

LocationFilter ... 286

LocationListEntry ... 287

NfsMountOptions ... 289

OnPremConfig ... 290

Options ... 291

PrivateLinkConfig ... 296

QopConfiguration ... 298

S3Config ... 299

SmbMountOptions ... 300

TagListEntry ... 301

TaskExecutionListEntry ... 302

TaskExecutionResultDetail ... 303

TaskFilter ... 305

TaskListEntry ... 306

TaskSchedule ... 307

Common Errors ... 307

Common Parameters ... 309

Document history ... 311

AWS glossary ... 314


What is AWS DataSync?

AWS DataSync is an online data transfer service that simplifies, automates, and accelerates moving data between on-premises storage systems and AWS storage services, and also between AWS storage services. DataSync can copy data between:

• Network File System (NFS) file servers

• Server Message Block (SMB) file servers

• Hadoop Distributed File System (HDFS)

• On-premises (self-managed) object storage

• Snow Family devices

• Amazon Simple Storage Service (Amazon S3) buckets

• Amazon EFS file systems

• Amazon FSx for Windows File Server file systems

• Amazon FSx for Lustre file systems

In this guide, you can find a description of the components of DataSync, detailed instructions on how to get started, and the API Reference.

Topics

• Use cases (p. 1)

• Benefits (p. 1)

• Additional AWS DataSync resources (p. 2)

Use cases

These are some of the main use cases for AWS DataSync:

Data migration – Move active datasets rapidly over the network into Amazon S3, Amazon EFS, FSx for Windows File Server, or FSx for Lustre. DataSync includes automatic encryption and data integrity validation to help make sure that your data arrives securely, intact, and ready to use.

Archiving cold data – Move cold data stored in on-premises storage directly to durable and secure long-term storage classes such as S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive. Doing so can free up on-premises storage capacity and let you shut down legacy systems.

Data protection – Move data into any Amazon S3 storage class, choosing the most cost-effective storage class for your needs. You can also send data to Amazon EFS, FSx for Windows File Server, or FSx for Lustre for a standby file system.

Data movement for timely in-cloud processing – Move data into or out of AWS for processing when working with systems that generate data on-premises. This approach can speed up critical hybrid cloud workflows across many industries. These include machine learning in the life-sciences industry, video production in media and entertainment, big-data analytics in financial services, and seismic research in the oil and gas industry.

Benefits

By using AWS DataSync, you can get the following benefits:


Simplify and automate data movement – AWS DataSync makes it easier to move data over the network between on-premises storage and AWS storage services, and also between AWS storage services. DataSync automates both the management of data-transfer processes and the infrastructure required for high-performance, secure data transfer.

Transfer data securely – DataSync provides end-to-end security, including encryption and integrity validation, to help ensure that your data arrives securely, intact, and ready to use. DataSync accesses your AWS storage through built-in AWS security mechanisms, such as AWS Identity and Access Management (IAM) roles. It also supports virtual private cloud (VPC) endpoints, so you can transfer data without traversing the public internet, which further increases the security of data copied online.

Move data faster – With DataSync, you can transfer data rapidly over the network into AWS. It uses a purpose-built network protocol and a parallel, multi-threaded architecture to accelerate your transfers. This approach speeds up migrations, recurring data-processing workflows for analytics and machine learning, and data-protection processes.

Reduce operational costs – You can move data cost-effectively with the flat, per-gigabyte pricing of DataSync. You can save on script development, deployment, and maintenance costs, and avoid the need for costly commercial transfer tools.

Additional AWS DataSync resources

We recommend that you read the following:

• DataSync resources – The resources page includes blogs, videos, and other training materials.

• AWS DataSync developer forum – The AWS DataSync developer forum.

• AWS DataSync pricing – AWS DataSync pricing information.

AWS DataSync also supports Terraform. To learn more about DataSync deployment automation with Terraform, see the Terraform documentation.


How AWS DataSync works

In this section, you can find information about components, terms, and how DataSync works.

Topics

• AWS DataSync architecture (p. 3)

• Components and terminology (p. 5)

• How DataSync transfers files (p. 7)

AWS DataSync architecture

Topics

• Data transfer between self-managed storage and AWS (p. 3)

• Data transfer between AWS storage services (p. 4)

• Data transfer using a DataSync EC2 agent deployed in a Region (p. 5)

The architectural diagrams show how DataSync transfers data between on-premises (self-managed) storage systems and AWS storage services, and between in-cloud storage systems and AWS storage services.

For a list of all DataSync supported source and destination endpoints, see Working with locations (p. 72).

Data transfer between self-managed storage and AWS

The following diagram shows a high-level view of the DataSync architecture for transferring files between self-managed storage and AWS services.


Data transfer between AWS storage services

The following diagram provides a high-level view of the DataSync architecture for transferring files between AWS services within the same AWS account. This architecture applies to both in-Region and cross-Region transfers.


Important

When you use DataSync to copy files or objects between AWS Regions, you pay for data transfer between Regions. This is billed as data transfer OUT from your source Region to your destination Region. For more information, see Data transfer pricing.

Data transfer using a DataSync EC2 agent deployed in a Region

You can use DataSync to transfer data between AWS services in different AWS accounts, or between self-managed file systems in AWS and Amazon S3, by deploying the DataSync Amazon EC2 agent in an AWS Region. For more information, see Deploying your DataSync agent in AWS Regions (p. 59).

Components and terminology

The components of DataSync include the following:

• Agent – A virtual machine (VM) that's used to read data from or write data to a self-managed location.

An agent isn't required when transferring between AWS storage services in the same AWS account.

• Location – Any source or destination location that's used in the data transfer, such as Amazon S3, Amazon EFS, Amazon FSx for Windows File Server, Amazon FSx for Lustre, Network File System (NFS), Server Message Block (SMB), Hadoop Distributed File System (HDFS), or self-managed object storage.

• Task – A source location and a destination location, and a configuration that defines how data is transferred. A task always transfers data from the source to the destination. The configuration can include options such as task schedule, bandwidth limit, and so on. A task is the complete definition of a data transfer.


• Task execution – An individual run of a task, which includes information such as the start time, end time, bytes written, and status.

Agent

An agent is a VM that you own that's used to read data from or write data to self-managed storage systems. The agent can be deployed on VMware ESXi, KVM, or Microsoft Hyper-V hypervisors, or it can be launched as an Amazon EC2 instance. You use the AWS DataSync console or the API to set up and activate your agent.

The activation process associates your agent VM with your AWS account. For information about agents, see Working with agents (p. 55).

An agent that's functioning properly has the status ONLINE. If an agent is unable to communicate with AWS, it transitions to OFFLINE status. This transition can result from issues with a network partition, firewall misconfiguration, and other events that make the agent VM unable to connect to AWS. The status of an agent that's powered off also shows as OFFLINE.

Location

A location is an endpoint of a task. Each task has two locations—a source location and a destination location. AWS DataSync supports the following location types:

• Network File System (NFS)

• Server Message Block (SMB)

• Hadoop Distributed File System (HDFS)

• On-premises (self-managed) object storage

• Amazon EFS

• Amazon FSx for Windows File Server

• Amazon FSx for Lustre

• Amazon S3

For more information, see Working with locations (p. 72).

Task

A task includes two locations (source and destination), and the configuration of how to transfer the data from one location to the other. The configuration settings can include options such as how to treat metadata, deleted files, and permissions. A task is the complete definition of a data transfer.

Task execution

A task execution is an individual run of a task, which shows information such as the start time, end time, number of transferred files, and status.

A task execution has five transition phases and two terminal statuses, as shown in the following diagram.

These phases and statuses are:

QUEUEING – During this phase, the task execution waits in a queue behind other task executions that use the same agent.

LAUNCHING – During this phase, the task execution is initialized.

PREPARING – During this phase, DataSync computes which files need to be transferred.

TRANSFERRING – During this phase, DataSync transfers data to AWS.


VERIFYING – During this optional phase, DataSync performs a full data and metadata integrity verification. This phase occurs only if the VerifyMode option is enabled during configuration.

SUCCESS or ERROR – When the task is finished, DataSync sets the task to one of these terminal statuses, depending on whether it was successful.

If the VerifyMode option isn't enabled in the task configuration, the terminal status is set after the TRANSFERRING phase. Otherwise, it is set after the VERIFYING phase. The two terminal statuses are these:

SUCCESS

ERROR

For more information, see Task execution statuses (p. 107).
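To follow an execution through these phases from a script, you can poll its status until it reaches one of the two terminal statuses. The sketch below assumes a `describe` callable that behaves like `describe_task_execution` on a boto3 DataSync client, that is, one that accepts a `TaskExecutionArn` keyword and returns a dict with a `Status` key. The function name, poll interval, and poll limit are illustrative choices and aren't part of DataSync.

```python
import time
from typing import Callable, Dict, List

# Terminal statuses named in this guide.
TERMINAL_STATUSES = {"SUCCESS", "ERROR"}


def wait_for_task_execution(
    describe: Callable[..., Dict],
    task_execution_arn: str,
    poll_seconds: float = 30.0,
    max_polls: int = 1000,
) -> List[str]:
    """Poll a task execution until it reports SUCCESS or ERROR.

    `describe` is assumed to behave like boto3's DataSync
    `describe_task_execution`. Returns the statuses observed, in order,
    with consecutive repeats collapsed; the last entry is terminal.
    """
    seen: List[str] = []
    for _ in range(max_polls):
        status = describe(TaskExecutionArn=task_execution_arn)["Status"]
        if not seen or seen[-1] != status:
            seen.append(status)
        if status in TERMINAL_STATUSES:
            return seen
        time.sleep(poll_seconds)
    raise TimeoutError(
        f"{task_execution_arn} did not reach a terminal status after {max_polls} polls"
    )
```

Passing a fake `describe` function lets you exercise the loop without AWS credentials. In real use you might pass `boto3.client("datasync").describe_task_execution`.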

How DataSync transfers files

Topics

• How AWS DataSync verifies data integrity (p. 8)

• How DataSync handles open and locked files (p. 8)

When a task starts, it goes through different phases: LAUNCHING, PREPARING, TRANSFERRING, and VERIFYING. In the LAUNCHING phase, DataSync initializes the task execution. In the PREPARING phase, DataSync examines the source and destination file systems to determine which files to sync. It does so by recursively scanning the contents and metadata of files on the source and destination file systems for differences.

The time that DataSync spends in the PREPARING phase depends on the number of files in both the source and destination file systems and on the performance of these file systems. It usually takes between a few minutes and a few hours. For more information, see Starting your DataSync task (p. 105).

After the scanning is done and the differences are calculated, DataSync transitions to the TRANSFERRING phase. At this point, DataSync starts transferring files and metadata from the source file system to the destination. DataSync copies changes to files with contents or metadata that are different between the source and the destination. You can narrow down the copied files by filtering the data or by configuring DataSync to not overwrite files that are already present in the destination.

Note

By default, any changes to metadata on the source storage result in this metadata being copied to the destination storage.

After the TRANSFERRING phase is done, DataSync verifies consistency between the source and destination file systems. This is the VERIFYING phase.


When DataSync transfers data, it always performs data integrity checks during the transfer. You can enable additional verification to compare the source and destination at the end of a transfer. This additional check can verify the entire dataset or only the files that were transferred as part of the task execution. For most use cases, we recommend verifying only the files transferred.
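The filtering, overwrite, and verification choices described above correspond to parameters of the CreateTask API. The following sketch only assembles the request as a Python dict. It assumes the boto3 parameter names `SourceLocationArn`, `DestinationLocationArn`, `Name`, `Options` (with `VerifyMode` and `OverwriteMode`), and `Excludes`. The helper function is hypothetical, and any ARNs you pass to it would be your own.

```python
from typing import Dict, List, Optional


def build_create_task_request(
    source_location_arn: str,
    destination_location_arn: str,
    name: str,
    exclude_patterns: Optional[List[str]] = None,
    overwrite_existing: bool = True,
) -> Dict:
    """Assemble keyword arguments for a DataSync CreateTask call (hypothetical helper)."""
    options = {
        # Verify only the files transferred in each execution, as recommended above.
        "VerifyMode": "ONLY_FILES_TRANSFERRED",
        # NEVER leaves files that already exist in the destination untouched.
        "OverwriteMode": "ALWAYS" if overwrite_existing else "NEVER",
    }
    request: Dict = {
        "SourceLocationArn": source_location_arn,
        "DestinationLocationArn": destination_location_arn,
        "Name": name,
        "Options": options,
    }
    if exclude_patterns:
        # A SIMPLE_PATTERN filter holds several patterns separated by "|".
        request["Excludes"] = [
            {"FilterType": "SIMPLE_PATTERN", "Value": "|".join(exclude_patterns)}
        ]
    return request
```

You might then start the task definition with `boto3.client("datasync").create_task(**request)`. Keeping the request as plain data makes it easy to review or store alongside other configuration before anything is sent to AWS.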

How AWS DataSync verifies data integrity

AWS DataSync locally calculates the checksum of every file in the source and destination file systems and compares them. DataSync also compares the metadata of every file in the source and destination. If there are differences in either, verification fails with an error code that specifies precisely what failed. For example, you might see error codes such as Checksum failure, Metadata failure, Files were added, Files were removed, and so on.

For more information, see DataSync task creation statuses (p. 105) and Enable verification in the Configuring task settings (p. 99) section.
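The following sketch illustrates checksum-based verification on two local directory trees. It is not how DataSync implements verification. The guide doesn't name the checksum algorithm, so SHA-256 here is an assumption. The sketch also compares file contents only, not metadata. Its messages loosely mirror the error categories above (files removed, files added, checksum failure).

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def checksum_tree(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to a SHA-256 hex digest."""
    digests: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large files don't need to fit in memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def compare_trees(source: Path, destination: Path) -> List[str]:
    """Return content differences between two trees; an empty list means they match."""
    src, dst = checksum_tree(source), checksum_tree(destination)
    problems: List[str] = []
    for rel in sorted(src.keys() - dst.keys()):
        problems.append(f"missing at destination: {rel}")
    for rel in sorted(dst.keys() - src.keys()):
        problems.append(f"extra at destination: {rel}")
    for rel in sorted(src.keys() & dst.keys()):
        if src[rel] != dst[rel]:
            problems.append(f"checksum mismatch: {rel}")
    return problems
```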

How DataSync handles open and locked files

In general, DataSync can transfer open files without any limitations.

If a file is open and it's being written to during the transfer, DataSync detects data inconsistency during the VERIFYING phase. This phase is when DataSync detects whether the file on the source is different from the file on the destination.

If a file is locked and the server prevents DataSync from opening it, DataSync skips transferring it, logs an error during the TRANSFERRING phase, and reports a verification error.


Setting up

To get started, you first sign up for AWS. If you are a first-time user, we recommend that you read the Regions and requirements section.

Topics

• Sign up for AWS (p. 9)

• AWS Regions and endpoints (p. 9)

• How to access AWS DataSync (p. 9)

• DataSync pricing (p. 9)

Sign up for AWS

To use AWS DataSync, you need an AWS account that gives you access to all AWS resources, forums, support, and usage reports. You aren't charged for any of the services unless you use them. If you already have an AWS account, you can skip this step.

To sign up for an AWS account

1. Open https://portal.aws.amazon.com/billing/signup.

2. Follow the online instructions.

Part of the sign-up procedure involves receiving a phone call and entering a verification code on the phone keypad.

AWS Regions and endpoints

AWS DataSync is available in the following AWS Regions.

How to access AWS DataSync

You can use the DataSync management console to perform various sync configuration and management tasks.

Additionally, you can use the AWS DataSync API or the AWS CLI to programmatically configure and manage DataSync. For more information about the API, see API reference (p. 149).

You can also use the AWS SDKs to develop applications that interact with DataSync. The AWS SDKs for Java, .NET, and PHP wrap the underlying DataSync API to simplify your programming tasks. For information about downloading the SDK libraries, see Sample code libraries.

DataSync pricing

For information about AWS DataSync pricing, see the AWS DataSync pricing page.


Requirements for AWS DataSync

AWS DataSync agent and network requirements vary based on where and how you plan to transfer data.

Topics

• Agent requirements (p. 10)

• Network requirements (p. 11)

Agent requirements

Your AWS DataSync agent must adhere to the requirements that apply to your scenario.

Topics

• Supported hypervisors (p. 10)

• Virtual machine requirements (p. 11)

• Amazon EC2 instance requirements (p. 11)

Supported hypervisors

DataSync supports the following hypervisor versions and hosts:

VMware ESXi Hypervisor (version 6.5, 6.7, or 7.0): A free version of VMware is available on the VMware website. You also need a VMware vSphere client to connect to the host.

Note

When VMware ends general support for an ESXi hypervisor version, DataSync also ends support for that version. For information about VMware's supported hypervisor versions, see VMware lifecycle policy on the VMware website.

Microsoft Hyper-V Hypervisor (version 2012 R2, 2016, or 2019): A free, standalone version of Hyper-V is available at the Microsoft Download Center. For this setup, you need Microsoft Hyper-V Manager on a Microsoft Windows client computer to connect to the host.

Note

The DataSync VM is a generation 1 virtual machine. For more information about the differences between generation 1 and generation 2 VMs, see Should I create a generation 1 or 2 virtual machine in Hyper-V?

Linux Kernel-based Virtual Machine (KVM): A free, open-source virtualization technology. KVM is included in Linux versions 2.6.20 and newer. AWS DataSync is tested and supported for the CentOS/RHEL 7.8, Ubuntu 16.04 LTS, and Ubuntu 18.04 LTS distributions. Any other modern Linux distribution might work, but function or performance is not guaranteed. We recommend this option if you already have a KVM environment up and running and you're already familiar with how KVM works.

Note

Running KVM on Amazon EC2 isn't supported and can't be used for DataSync agents. To run the agent on Amazon EC2, deploy an agent Amazon Machine Image (AMI). For more information about deploying an agent AMI on Amazon EC2, see Deploy your agent as an Amazon EC2 instance (p. 24).

Amazon EC2 instance: DataSync provides an Amazon Machine Image (AMI) that contains the DataSync VM image. For the recommended instance types, see Amazon EC2 instance requirements (p. 11).


Virtual machine requirements

When deploying AWS DataSync on-premises, make sure that the underlying hardware where you deploy the DataSync VM can dedicate the following minimum resources:

Virtual processors: Four virtual processors assigned to the VM.

Disk space: 80 GB of disk space for installation of VM image and system data.

RAM: Depending on your configuration, one of the following:

• 32 GB of RAM assigned to the VM, for tasks that transfer up to 20 million files.

• 64 GB of RAM assigned to the VM, for tasks that transfer more than 20 million files.

Amazon EC2 instance requirements

When deploying a DataSync agent with Amazon EC2, the instance size must be at least 2xlarge.

We recommend using one of the following instance sizes:

m5.2xlarge: For tasks to transfer up to 20 million files.

m5.4xlarge: For tasks to transfer more than 20 million files.

Note

An exception to this is if you're running DataSync on an AWS Snowcone device. In that case, use the default instance snc1.medium, which provides 2 CPU cores and 4 GiB of memory.
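The sizing rules in the virtual machine and Amazon EC2 requirements can be encoded in a small helper, for example inside a provisioning script. This hypothetical sketch uses only the thresholds stated above: 4 virtual processors, 80 GB of disk, and 32 GB or 64 GB of RAM for a VM; m5.2xlarge or m5.4xlarge for EC2, split at 20 million files per task; and snc1.medium on Snowcone. It checks the stated minimums and doesn't model any other workload factors.

```python
# Threshold from the requirements above: "up to" vs. "more than" 20 million files.
FILE_COUNT_THRESHOLD = 20_000_000

# Minimum resources for an on-premises agent VM.
MIN_VCPUS = 4
MIN_DISK_GB = 80


def required_vm_ram_gb(files_per_task: int) -> int:
    """RAM (GB) to assign to an on-premises agent VM for a given task size."""
    return 32 if files_per_task <= FILE_COUNT_THRESHOLD else 64


def vm_meets_minimums(vcpus: int, disk_gb: int, ram_gb: int, files_per_task: int) -> bool:
    """True if a proposed VM meets the minimum CPU, disk, and RAM requirements."""
    return (
        vcpus >= MIN_VCPUS
        and disk_gb >= MIN_DISK_GB
        and ram_gb >= required_vm_ram_gb(files_per_task)
    )


def recommended_ec2_instance_type(files_per_task: int, on_snowcone: bool = False) -> str:
    """Recommended instance type for an agent deployed on Amazon EC2."""
    if on_snowcone:
        # Snowcone exception: use the default snc1.medium instance.
        return "snc1.medium"
    return "m5.2xlarge" if files_per_task <= FILE_COUNT_THRESHOLD else "m5.4xlarge"
```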

To connect to an Amazon EC2 agent using SSH, you must use the following cryptographic algorithms:

SSH cipher: aes128-ctr

Key exchange: diffie-hellman-group14-sha1
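If you connect with the OpenSSH client, you can pin the required cipher and key exchange algorithm with the standard `-o Ciphers=...` and `-o KexAlgorithms=...` client options. The following sketch builds such a command line. The host address, user name, and key file are placeholders, not values defined by DataSync.

```python
import shlex
from typing import List


def agent_ssh_command(host: str, user: str, key_file: str) -> List[str]:
    """Build an OpenSSH command restricted to the algorithms listed above."""
    return [
        "ssh",
        "-i", key_file,
        # Cipher required for SSH connections to an Amazon EC2 agent.
        "-o", "Ciphers=aes128-ctr",
        # Key exchange algorithm required for SSH connections to an Amazon EC2 agent.
        "-o", "KexAlgorithms=diffie-hellman-group14-sha1",
        f"{user}@{host}",
    ]


if __name__ == "__main__":
    # Placeholder values; substitute your own agent address, login user, and key.
    print(shlex.join(agent_ssh_command("203.0.113.10", "example-user", "agent-key.pem")))
```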

Network requirements

DataSync network requirements depend on how you plan to transfer data (for example, over the public internet or using a more private connection).

Use the following tables to help you configure network access for DataSync agents that transfer data from your self-managed storage system and through virtual private cloud (VPC), public service, or Federal Information Processing Standard (FIPS) endpoints.

Topics

• Network requirements to connect to your self-managed storage (p. 11)

• Network requirements when using VPC endpoints (p. 12)

• Network requirements when using public service endpoints or FIPS endpoints (p. 15)

• Required network interfaces for data transfers (p. 19)

Network requirements to connect to your self-managed storage

To minimize network latency, deploy the DataSync agent close to your self-managed storage. Doing this ensures that files travel over the network between the DataSync agent and the DataSync service using our purpose-built, accelerated protocol, which significantly speeds up transfers.


The following ports are required for communication between the DataSync agent and your Network File System (NFS) server, Hadoop Distributed File System (HDFS) cluster, Server Message Block (SMB) server, or Amazon S3 API compatible storage.

• From the agent to your NFS server – TCP/UDP port 2049 (NFS). Used by the DataSync agent to mount a source NFS file system. Supports NFS v3.x, NFS v4.0, and NFS v4.1.

• From the agent to your SMB server – TCP/UDP port 139 (SMB) or 445 (SMB). Used by the DataSync agent to mount a source SMB file share. Supports SMB 2.1 and SMB 3 versions.

• From the agent to your self-managed object storage – TCP port 443 (HTTPS) or 80 (HTTP). Used by the DataSync agent to access your self-managed object storage.

• From the agent to your Hadoop cluster – TCP NameNode port (default 8020). Used by the DataSync agent to access the NameNodes in your Hadoop cluster. Specify the port used when creating an HDFS location.

• From the agent to your Hadoop cluster – TCP DataNode port (default 50010). Used by the DataSync agent to access the DataNodes in your Hadoop cluster. The DataSync agent automatically determines the port to use.

• From the agent to your Hadoop Key Management Server (KMS) – TCP KMS port (default 9600). Used by the DataSync agent to access the KMS for your Hadoop cluster.

• From the agent to your Kerberos Key Distribution Center (KDC) server – TCP KDC port (default 88). Used by the DataSync agent when authenticating to the Kerberos realm. This port is used only with HDFS.
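Before you create a location, you might confirm that the TCP ports above are reachable by running a simple connection test from a host on the same network segment as the agent. This sketch uses Python's standard `socket.create_connection`. It can't test UDP (for example, the UDP side of NFS and SMB), and a success from another host doesn't prove that the agent VM has the same network path. Any host names you pass to it are your own.

```python
import socket
from typing import Dict, Iterable, Tuple


def check_tcp_ports(
    targets: Iterable[Tuple[str, int]], timeout: float = 3.0
) -> Dict[Tuple[str, int], bool]:
    """Try a TCP connection to each (host, port); True means the handshake succeeded."""
    results: Dict[Tuple[str, int], bool] = {}
    for host, port in targets:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = True
        except OSError:
            # Connection refused, host unreachable, DNS failure, or timeout
            # (socket.timeout is a subclass of OSError).
            results[(host, port)] = False
    return results
```

For example, with hypothetical host names, `check_tcp_ports([("nfs.example.internal", 2049), ("smb.example.internal", 445)])` returns a dict that maps each pair to `True` or `False`.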

Network requirements when using VPC endpoints

If you use only private IP addresses, you can ensure that your VPC can't be reached over the internet, and you can prevent any packets from entering or exiting the network. By using private IP addresses, you can eliminate all internet access from your self-managed systems, and still use DataSync for data transfers to and from AWS.

DataSync requires the following ports for its operation when your agent is using private endpoints.


• From your web browser to your DataSync agent – TCP port 80 (HTTP). Used by your computer to obtain the agent activation key. After successful activation, DataSync closes the agent's port 80. The DataSync agent doesn't require port 80 to be publicly accessible. The required level of access to port 80 depends on your network configuration.

Note

Alternatively, you can obtain the activation key from the agent's local console. This method does not require connectivity between the browser and your agent. For more information about using the local console to get the activation key, see Obtaining an activation key using the local console (p. 64).

• From the agent to your DataSync VPC endpoint – TCP ports 1024–1064. For control traffic between the DataSync agent and the AWS service. To find the correct IP address, open the Amazon VPC console, and choose Endpoints from the left navigation pane. Choose the DataSync endpoint, and check the Subnets list to find the private IP address that corresponds to the subnet that you chose for your VPC endpoint setup. For more information, see step 5 in Configuring DataSync to use private IP addresses for data transfer (p. 56).

• From the agent to your task's elastic network interfaces – TCP port 443 (HTTPS). For data transfer from the DataSync VM to the AWS service. To find the related IP addresses, open the Amazon EC2 console and choose Network Interfaces from the left navigation pane. To see the four network interfaces for the task, enter your task ID in the search filter. For more information, see step 9 in Configuring DataSync to use private IP addresses for data transfer (p. 56).

• From the agent to your DataSync VPC endpoint – TCP port 22 (support channel). Allows AWS Support to access your DataSync agent to help you troubleshoot DataSync issues. You don't need this port open for normal operation, but it's required for troubleshooting.

Following is an illustration of the ports required by DataSync when using private endpoints.
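As a rough pre-flight check for the private endpoint case, the following Python sketch compares the TCP ports your firewall allows outbound from the agent against the agent-initiated rows of the table above. The `missing_vpc_ports` function is a hypothetical name, and it deliberately simplifies by not tracking destinations (the VPC endpoint versus the task's network interfaces); inbound port 80 for activation is out of scope.

```python
# Agent-initiated TCP ports from the VPC endpoint table above.
CONTROL_PORTS = range(1024, 1065)  # 1024-1064 inclusive, to the VPC endpoint
DATA_PORT = 443                    # to the task's elastic network interfaces
SUPPORT_PORT = 22                  # only needed for AWS Support sessions


def missing_vpc_ports(allowed: set[int], include_support: bool = False) -> list[int]:
    """Return the required ports that are absent from the allowed set."""
    required = set(CONTROL_PORTS) | {DATA_PORT}
    if include_support:
        required.add(SUPPORT_PORT)
    return sorted(required - allowed)


allowed = set(range(1024, 1065)) | {443}
print(missing_vpc_ports(allowed))                        # []
print(missing_vpc_ports(allowed, include_support=True))  # [22]
```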


Network requirements when using public service endpoints or FIPS endpoints

Your agent VM requires access to the following endpoints to communicate with AWS when using public service endpoints, or when using FIPS endpoints. Enabling this access is not necessary when using DataSync with VPC endpoints.

If you use a firewall or router to filter or limit network traffic, configure your firewall or router to allow these service endpoints. They're required to enable outbound communication between your network and AWS.

| From | To | Protocol | Port | How used | Endpoints accessed by the agent |
| --- | --- | --- | --- | --- | --- |
| Your web browser | DataSync agent | TCP | 80 (HTTP) | Used by your computer to obtain the agent activation key. After successful activation, DataSync closes the agent's port 80. | N/A |
| Agent | AWS | TCP | 443 (HTTPS) | Used by the DataSync agent to activate with your AWS account. You can block the public endpoints after activation. | For public endpoint activation: activation.datasync.us-east-2.amazonaws.com. For FIPS endpoint activation: activation.datasync-fips.us-east-2.amazonaws.com |
| Agent | AWS | TCP | 443 (HTTPS) | For communication between the DataSync agent and the AWS service endpoint. For information about Regions and service endpoints, see Choose a service endpoint (p. 26). | API endpoints: datasync.us-east-2.amazonaws.com. Data transfer endpoints: yourTaskId.datasync-dp.us-east-2.amazonaws.com, cp.datasync.us-east-2.amazonaws.com. Data transfer endpoints for FIPS: cp.datasync-fips.us-east-2.amazonaws.com |
| Agent | AWS | TCP | 80 (HTTP) | Allows the DataSync agent to get updates from AWS. | repo.default.amazonaws.com, packages.us-west-1.amazonaws.com, packages.sa-east-1.amazonaws.com, repo.$activation_region.amazonaws.com, packages.$activation_region.amazonaws.com, *.s3.$activation_region.amazonaws.com |
| Agent | AWS | TCP | 443 (HTTPS) | Allows the DataSync agent to get updates from AWS. | amazonlinux.default.amazonaws.com, cdn.amazonlinux.com, amazonlinux-2-repos-us-east-1.s3.dualstack.$activation_region.amazonaws.com, amazonlinux-2-repos-us-east-1.s3.$activation_region.amazonaws.com |
| Agent | Domain Name System (DNS) server | TCP/UDP | 53 (DNS) | For communication between the DataSync agent and the DNS server. | N/A |
| Agent | AWS | TCP | 22 (Support channel) | Allows AWS Support to access your DataSync agent to help you troubleshoot DataSync issues. You don't need this port open for normal operation, but it's required for troubleshooting. | AWS support channel: 54.201.223.107 |
| Agent | Network Time Protocol (NTP) server | UDP | 123 (NTP) | Used by local systems to synchronize the VM time to the host time. | NTP: 0.amazon.pool.ntp.org, 1.amazon.pool.ntp.org, 2.amazon.pool.ntp.org, 3.amazon.pool.ntp.org |

The activation_region variable is the AWS Region you used to activate your DataSync agent.

The DataSync agent doesn't require port 80 to be publicly accessible. The required level of access to port 80 depends on your network configuration.

Note

Alternatively, you can obtain the activation key from the agent's local console. This method does not require connectivity between the browser and your agent. For more information about using the local console to get the activation key, see Obtaining an activation key using the local console (p. 64).

Note

If you want to configure your VM agent to use an NTP server other than the default, you can change it from the local console. For more information, see Configuring a Network Time Protocol (NTP) server for VMware agents (p. 68).

The following diagram shows the ports required by DataSync when using public service endpoints or FIPS endpoints.
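If your firewall filters by hostname, the following Python sketch expands the hostnames from the table above for a given Region and task ID. It assumes that you substitute your own Region wherever the table shows the example Region us-east-2, that your activation Region is that same Region (so one value fills both), and that you trim the public or FIPS entries you don't use. The template list and the `allowlist` function are hypothetical; the DNS, NTP, and support-channel rows are left out.

```python
# (hostname template, TCP port), transcribed from the table above.
# {region} stands in for us-east-2 and $activation_region (assumed equal);
# {task_id} stands in for yourTaskId.
ENDPOINT_TEMPLATES = [
    ("activation.datasync.{region}.amazonaws.com", 443),
    ("activation.datasync-fips.{region}.amazonaws.com", 443),
    ("datasync.{region}.amazonaws.com", 443),
    ("{task_id}.datasync-dp.{region}.amazonaws.com", 443),
    ("cp.datasync.{region}.amazonaws.com", 443),
    ("cp.datasync-fips.{region}.amazonaws.com", 443),
    ("repo.default.amazonaws.com", 80),
    ("packages.us-west-1.amazonaws.com", 80),
    ("packages.sa-east-1.amazonaws.com", 80),
    ("repo.{region}.amazonaws.com", 80),
    ("packages.{region}.amazonaws.com", 80),
    ("*.s3.{region}.amazonaws.com", 80),
    ("amazonlinux.default.amazonaws.com", 443),
    ("cdn.amazonlinux.com", 443),
    ("amazonlinux-2-repos-us-east-1.s3.dualstack.{region}.amazonaws.com", 443),
    ("amazonlinux-2-repos-us-east-1.s3.{region}.amazonaws.com", 443),
]


def allowlist(region: str, task_id: str) -> list[tuple[str, int]]:
    """Expand every template into a concrete (hostname, port) pair."""
    return [(template.format(region=region, task_id=task_id), port)
            for template, port in ENDPOINT_TEMPLATES]


# task-0123456789abcdef0 is a placeholder task ID.
for host, port in allowlist("us-west-2", "task-0123456789abcdef0")[:4]:
    print(f"tcp/{port} {host}")
```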


Required network interfaces for data transfers

For every task you run, DataSync automatically creates elastic network interfaces (ENIs) to manage data transfer traffic. How many ENIs DataSync creates and where they’re created depend on the following details about your task:

• Whether your task requires a DataSync agent.

• Your source and destination locations (where you’re copying data from and to).

• The type of endpoint used to activate your agent.

Each ENI uses a single IP address in your subnet (the more ENIs there are, the more IP addresses you need). Use the following tables to make sure your subnet has enough IP addresses for your task.

Transfers with agents

You need a DataSync agent when copying data between a self-managed storage system and an AWS storage service.

| Location | ENIs created by default | Where ENIs are created |
| --- | --- | --- |
| Amazon S3 | 4 | When using a public or FIPS endpoint: N/A (ENIs aren’t needed since DataSync communicates directly with the S3 bucket). When using a private (VPC) endpoint: the subnet you specified when activating your DataSync agent. |
| Amazon EFS | 4 | The subnet you specify when creating the Amazon EFS location. |
| Amazon FSx for Windows File Server | 4 | The same subnet as the preferred file server for the file system. |
| Amazon FSx for Lustre | 4 | The same subnet as the file system. |

Transfers without agents

You don’t need a DataSync agent when copying data between AWS storage services.

Note

The total number of ENIs depends on your DataSync task locations. For example, transferring from an Amazon EFS location to an FSx for Lustre location requires four ENIs, while transferring from an FSx for Windows File Server location to an Amazon S3 bucket requires two ENIs.

| Location | ENIs created by default | Where ENIs are created |
| --- | --- | --- |
| Amazon S3 | N/A | N/A (ENIs aren’t needed since DataSync communicates directly with the S3 bucket) |
| Amazon EFS | 2 | The subnet you specify when creating the Amazon EFS location. |
| Amazon FSx for Windows File Server | 2 | The same subnet as the preferred file server for the file system. |
| Amazon FSx for Lustre | 2 | The same subnet as the file system. |

To see the ENIs allocated for your DataSync task, use the DescribeTask operation.
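The ENI counts in the tables above add up per location, as the note's examples show (Amazon EFS to FSx for Lustre without an agent is 2 + 2; FSx for Windows File Server to Amazon S3 is 2 + 0). The following Python sketch encodes that arithmetic and compares the result with a subnet's capacity. The function names are hypothetical; it assumes that self-managed locations (NFS, SMB, HDFS, object storage) add no ENIs, and it relies on AWS reserving five IP addresses in every subnet, so the capacity it computes is an upper bound.

```python
import ipaddress

# ENIs per AWS location, from the tables above. Self-managed locations are
# assumed to add none (assumption, not stated in the tables).
ENIS_WITH_AGENT = {"efs": 4, "fsx_windows": 4, "fsx_lustre": 4}
ENIS_WITHOUT_AGENT = {"s3": 0, "efs": 2, "fsx_windows": 2, "fsx_lustre": 2}


def task_enis(source: str, destination: str, agent: bool,
              vpc_endpoint: bool = False) -> int:
    """Estimate the ENIs DataSync creates for one task."""
    total = 0
    for location in (source, destination):
        if not agent:
            total += ENIS_WITHOUT_AGENT[location]
        elif location == "s3":
            # With an agent, S3 needs ENIs only behind a private (VPC) endpoint.
            total += 4 if vpc_endpoint else 0
        else:
            total += ENIS_WITH_AGENT.get(location, 0)
    return total


def subnet_has_room(cidr: str, addresses_in_use: int, enis: int) -> bool:
    """AWS reserves 5 addresses per subnet; compare what's left with the ENIs."""
    usable = ipaddress.ip_network(cidr).num_addresses - 5
    return usable - addresses_in_use >= enis


print(task_enis("efs", "fsx_lustre", agent=False))                 # 4
print(task_enis("fsx_windows", "s3", agent=False))                 # 2
print(task_enis("nfs", "s3", agent=True, vpc_endpoint=True))       # 4
print(subnet_has_room("10.0.1.0/28", addresses_in_use=8, enis=4))  # False
```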


Getting started with AWS DataSync

In this topic, you can find step-by-step instructions on how to get started using AWS DataSync on the AWS Management Console.

Before you begin, we recommend reading How AWS DataSync works (p. 3) to understand the components and terms used in DataSync and how DataSync works. We also recommend reading the Using identity-based policies (IAM policies) for DataSync (p. 119) section to understand the AWS Identity and Access Management (IAM) permissions that DataSync requires.

To use AWS DataSync

1. Open the AWS DataSync console at https://console.aws.amazon.com/datasync/.

2. In the upper-right corner, choose the AWS Region where you want to run DataSync. We recommend choosing the AWS Region where you plan to locate your Amazon S3 bucket, Amazon EFS file system, Amazon FSx for Windows File Server file system, or Amazon FSx for Lustre file system.

If you haven't created DataSync resources in this AWS Region, the DataSync home page appears.

3. On the DataSync home page, select whether to create the data transfer task Between on-premises storage and AWS or Between AWS Storage services.

4. Choose Get started to begin using DataSync.

If this is your first time using DataSync in this AWS Region, the Create agent page appears. From this page, you can download your virtual machine (VM) or create an Amazon EC2 instance.

If you have used DataSync in this AWS Region, the Agents page appears and you can see your agents listed.

Next, take the following steps.

Topics

• Create an agent (p. 21)

• Configure a source location (p. 29)

• Configure a destination location (p. 30)

• Configure task settings (p. 31)

• Review your settings and create your task (p. 34)

• Start your task (p. 34)

• Clean up resources (p. 35)

Create an agent

For AWS DataSync to access your self-managed storage (whether on-premises or in the cloud), you need a DataSync agent associated with your AWS account.

Tip

An agent isn't required when transferring between AWS storage services in the same AWS account. To set up a data transfer between two AWS services, see Configure a source location (p. 29).

Topics

• Deploy your DataSync agent (p. 22)


• Choose a service endpoint (p. 26)

• Activate your agent (p. 28)

Deploy your DataSync agent

Where you deploy your AWS DataSync agent depends on where you're copying data to and from and whether you're working with on-premises or in-cloud storage systems.

Topics

• Deploy your agent on VMware (p. 22)

• Deploy your agent on KVM (p. 22)

• Deploy your agent on Hyper-V (p. 23)

• Deploy your agent as an Amazon EC2 instance (p. 24)

• Deploy your agent on Snow Family devices (p. 26)

• Deploy your agent on AWS Outposts (p. 26)

Deploy your agent on VMware

You can download and deploy an AWS DataSync agent in your VMware environment and then activate it. You can also use an existing agent instead of deploying a new one. You can use a previously created agent if it can access your self-managed storage and if it's activated in the same AWS Region.

To deploy an agent on VMware

1. Open the AWS DataSync console at https://console.aws.amazon.com/datasync/.

2. If you don't have an agent, on the Create agent page in the console, choose Download image in the Deploy agent section. This downloads the agent as an .ova file, which you then deploy in your VMware ESXi hypervisor. The agent is available as a VM. If you want to deploy the agent as an Amazon EC2 instance, see Deploy your agent as an Amazon EC2 instance (p. 24).

AWS DataSync currently supports the VMware ESXi hypervisor. For information about hardware requirements for the VM, see Virtual machine requirements (p. 11). For information about how to deploy an .ova file in a VMware host, see the documentation for your hypervisor.

If you have previously activated an agent in this AWS Region and want to use that agent, choose that agent and choose Create agent. The Configure a source location (p. 29) page appears.

3. Power on your hypervisor, log in to your VM, and get the IP address of the agent. You need this IP address to activate the agent.

Note

The VM's default credentials are the login admin and the password password.

You can change the password on the local console. You don't need to log in to the VM for DataSync functionality. Login is mainly required for troubleshooting, such as running a connectivity test or opening a support channel with AWS. It's also required for network-specific settings, such as setting up a static IP address.

After you have deployed an agent, you choose a service endpoint (p. 26).

Deploy your agent on KVM

You can download and deploy an AWS DataSync agent in your KVM environment and then activate it.

You can also use an existing agent instead of deploying a new one. You can use a previously created agent if it can access your self-managed storage and if it's activated in the same AWS Region.


To deploy an agent on KVM

1. Open the AWS DataSync console at https://console.aws.amazon.com/datasync/.

2. If you don't have an agent, on the Create agent page in the console, choose Download image in the Deploy agent section. Doing this downloads the agent in a .zip file that contains a .qcow2 image file that you can deploy in your KVM hypervisor.

The agent is available as a VM. If you want to deploy the agent as an Amazon EC2 instance, see Deploy your agent as an Amazon EC2 instance (p. 24).

AWS DataSync currently supports the KVM hypervisor. For information about hardware requirements for the VM, see Virtual machine requirements (p. 11).

To get started installing your .qcow2 image for use in KVM, use the following command.

virt-install \
  --name "datasync" \
  --description "AWS DataSync agent" \
  --os-type=generic \
  --ram=32768 \
  --vcpus=4 \
  --disk path=datasync-yyyymmdd-x86_64.qcow2,bus=virtio,size=80 \
  --network default,model=virtio \
  --graphics none \
  --import

For information about how to manage this VM, and your KVM host, see the documentation for your hypervisor.

If you previously activated an agent in this AWS Region and want to use that agent, choose that agent, and then choose Create agent. The Configure a source location (p. 29) page appears.

3. Power on your hypervisor, log in to your VM, and get the IP address of the agent. You need this IP address to activate the agent.

Note

The VM's default credentials are the login admin and the password password.

You can change the password on the local console. You don't need to log in to the VM for DataSync functionality. Login is mainly required for troubleshooting, such as running a connectivity test or opening a support channel with AWS. It's also required for network-specific settings, such as setting up a static IP address.

After you deploy an agent, you choose a service endpoint (p. 26).
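If you script KVM deployments, the virt-install invocation from step 2 can be assembled as an argument list. The following Python sketch mirrors the documented flags and values; the `virt_install_args` function and its parameters are hypothetical, and datasync-yyyymmdd-x86_64.qcow2 remains a placeholder for the image file name in your download.

```python
import shlex


def virt_install_args(image: str, name: str = "datasync", ram_mib: int = 32768,
                      vcpus: int = 4, disk_gb: int = 80) -> list[str]:
    """Build the virt-install command from step 2 as an argv list."""
    return [
        "virt-install",
        "--name", name,
        "--description", "AWS DataSync agent",
        "--os-type=generic",
        f"--ram={ram_mib}",
        f"--vcpus={vcpus}",
        "--disk", f"path={image},bus=virtio,size={disk_gb}",
        "--network", "default,model=virtio",
        "--graphics", "none",
        "--import",
    ]


args = virt_install_args("datasync-yyyymmdd-x86_64.qcow2")
print(shlex.join(args))
# On the KVM host you could then run: subprocess.run(args, check=True)
```

Passing a list rather than a single shell string avoids quoting problems with the spaces in the description.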

Deploy your agent on Hyper-V

You can download and deploy an AWS DataSync agent in your Hyper-V environment and then activate it. You can also use an existing agent instead of deploying a new one. You can use a previously created agent if it can access your self-managed storage and if it's activated in the same AWS Region.

To deploy an agent on Hyper-V

1. Open the AWS DataSync console at https://console.aws.amazon.com/datasync/.

2. If you don't have an agent, on the Create agent page in the console, choose Download image in the Deploy agent section. Doing this downloads the agent in a .zip file that contains a .vhdx image file that you can deploy in your Hyper-V hypervisor.


The agent is available as a VM. If you want to deploy the agent as an Amazon EC2 instance, see Deploy your agent as an Amazon EC2 instance (p. 24).

AWS DataSync currently supports the Hyper-V hypervisor. For information about hardware requirements for the VM, see Virtual machine requirements (p. 11). For information about how to deploy a .vhdx file in a Hyper-V host, see the documentation for your hypervisor.

If you previously activated an agent in this AWS Region and want to use that agent, choose that agent, and then choose Create agent. The Configure a source location (p. 29) page appears.

3. Power on your hypervisor, log in to your VM, and get the IP address of the agent. You need this IP address to activate the agent.

Note

The VM's default credentials are the login admin and the password password.

You can change the password on the local console. You don't need to log in to the VM for DataSync functionality. Login is mainly required for troubleshooting, such as running a connectivity test or opening a support channel with AWS. It's also required for network-specific settings, such as setting up a static IP address.

After you deploy an agent, you choose a service endpoint (p. 26).

Deploy your agent as an Amazon EC2 instance

You deploy a DataSync agent as an Amazon EC2 instance when copying data between:

• A self-managed, in-cloud storage system and an AWS storage service.

For more information about these use cases, including high-level architecture diagrams, see Deploying your DataSync agent in AWS Regions (p. 59).

• Amazon S3 on AWS Outposts (p. 26) and an AWS storage service.

Warning

We don't recommend using an Amazon EC2 agent to access your on-premises storage because of increased network latency. Instead, deploy the agent as a VMware, KVM, or Hyper-V virtual machine in your data center as close to your on-premises storage as possible.

To choose the agent AMI for your AWS Region

• Use the following CLI command to get the latest DataSync Amazon Machine Image (AMI) ID for the specified AWS Region.

aws ssm get-parameter --name /aws/service/datasync/ami --region $region

Example command and output

aws ssm get-parameter --name /aws/service/datasync/ami --region us-east-1

{
    "Parameter": {
        "Name": "/aws/service/datasync/ami",
        "Type": "String",
        "Value": "ami-id",
        "Version": 6,
        "LastModifiedDate": 1569946277.996,
        "ARN": "arn:aws:ssm:us-east-1::parameter/aws/service/datasync/ami"
    }
}

For the recommended instance types, see Amazon EC2 instance requirements (p. 11).
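When scripting the launch, you need only the Value field from the get-parameter output shown above. The following Python sketch extracts it. The `ami_id_from_ssm_output` helper is hypothetical, and the sample string reuses the example output, where ami-id is a placeholder.

```python
import json


def ami_id_from_ssm_output(output: str) -> str:
    """Return Parameter.Value from `aws ssm get-parameter` JSON output."""
    return json.loads(output)["Parameter"]["Value"]


sample = """
{"Parameter": {"Name": "/aws/service/datasync/ami", "Type": "String",
               "Value": "ami-id", "Version": 6,
               "LastModifiedDate": 1569946277.996,
               "ARN": "arn:aws:ssm:us-east-1::parameter/aws/service/datasync/ami"}}
"""
print(ami_id_from_ssm_output(sample))  # ami-id
```

Alternatively, the AWS CLI's global --query and --output options can return the value directly, for example by adding --query Parameter.Value --output text to the command.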

If you previously activated an agent in this AWS Region that can access your file system through a mount target in the same Availability Zone, and you want to use that agent, choose the agent and then choose Create agent. The Configure a source location (p. 29) page appears.

To deploy your DataSync agent as an Amazon EC2 instance

Important

To avoid charges, deploy your agent in a way that doesn't require network traffic between Availability Zones. For example, deploy your agent in the Availability Zone where your self-managed file system resides.

To learn more about data transfer prices for all AWS Regions, see Amazon EC2 On-Demand pricing.

1. From the AWS account where the source file system resides, launch the agent using your AMI from the Amazon EC2 launch wizard. Use the following URL to launch the AMI.

https://console.aws.amazon.com/ec2/v2/home?region=source-file-system-region#LaunchInstanceWizard:ami=ami-id

In the URL, replace source-file-system-region and ami-id with your own source AWS Region and AMI ID. The Choose an Instance Type page appears on the Amazon EC2 console. To find the DataSync AMI ID for a specified AWS Region, use the aws ssm get-parameter command described in the preceding section.

2. Choose one of the recommended instance types for your use case, and choose Next: Configure Instance Details. For the recommended instance types, see Amazon EC2 instance requirements (p. 11).

3. On the Configure Instance Details page, do the following:

a. For Network, choose the virtual private cloud (VPC) where your source Amazon EFS or NFS file system is located.

b. For Auto-assign Public IP, choose a value. For your instance to be accessible from the public internet, set Auto-assign Public IP to Enable. Otherwise, set Auto-assign Public IP to Disable.

If a public IP address isn't assigned, activate the agent in your VPC using its private IP address.

When you transfer files from an in-cloud file system, we recommend choosing a Placement Group value where your NFS server resides to increase performance.

4. Choose Next: Add Storage. The agent doesn't require additional storage, so you can skip this step and choose Next: Add tags.

5. (Optional) On the Add Tags page, you can add tags to your Amazon EC2 instance. When you're finished on the page, choose Next: Configure Security Group.

6. On the Configure Security Group page, do the following:

a. Make sure that the selected security group allows inbound access to HTTP port 80 from the web browser that you plan to use to activate the agent.

b. Make sure that the security group of the source file system allows inbound traffic from the agent. In addition, make sure that the agent allows outbound traffic to the source file system.

If you deploy your agent using a VPC endpoint, you need to allow additional ports. For more information, see How DataSync works with VPC endpoints (p. 56).


For the complete set of network requirements for DataSync, see Network requirements (p. 11).

7. Choose Review and Launch to review your configuration, then choose Launch to launch your instance. Remember to use a key pair that's accessible to you. A confirmation page appears and indicates that your instance is launching.

8. Choose View Instances to close the confirmation page and return to the Amazon EC2 instances screen. When you launch an instance, its initial state is pending. After the instance starts, its state changes to running. At this point, it's assigned a public Domain Name System (DNS) name and IP address, which you can find on the Description tab.

9. If you set Auto-assign Public IP to Enable, choose your instance and note the public IP address on the Description tab. You use this IP address later to connect to your DataSync agent.

If you set Auto-assign Public IP to Disable, launch or use an existing instance in your VPC to activate the agent. In this case, you use the private IP address of the DataSync agent to activate the agent from this instance in the VPC.
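Two details from the steps above lend themselves to scripting: the launch wizard URL from step 1 and the inbound port 80 rule from step 6a. The following Python sketch builds both. The function names are hypothetical, the rule dictionary follows the shape of one entry in the Amazon EC2 IpPermissions structure, and 203.0.113.10/32 is a documentation address standing in for the IP address of the machine running your browser.

```python
import ipaddress


def launch_wizard_url(region: str, ami_id: str) -> str:
    """Fill in the launch URL from step 1."""
    return (f"https://console.aws.amazon.com/ec2/v2/home?region={region}"
            f"#LaunchInstanceWizard:ami={ami_id}")


def activation_ingress_rule(browser_cidr: str) -> dict:
    """Inbound HTTP (port 80) from the activating browser, per step 6a."""
    network = ipaddress.ip_network(browser_cidr)  # raises ValueError if malformed
    if network.version != 4:
        raise ValueError("IpRanges takes IPv4 CIDRs; use Ipv6Ranges for IPv6")
    return {
        "IpProtocol": "tcp",
        "FromPort": 80,
        "ToPort": 80,
        "IpRanges": [{"CidrIp": str(network),
                      "Description": "DataSync agent activation"}],
    }


print(launch_wizard_url("us-west-2", "ami-0123456789abcdef0"))
print(activation_ingress_rule("203.0.113.10/32"))
```

With boto3, a rule like this could be passed as IpPermissions=[rule] to the EC2 client's authorize_security_group_ingress method.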

Deploy your agent on Snow Family devices

The DataSync agent AMI is pre-installed on your Snow Family device. You can use AWS OpsHub for Snow Family or the AWS Snowball Edge CLI to launch the agent and attach a virtual interface to it. Then, use the virtual interface's IP address to activate the agent.

For instructions on launching the agent using AWS OpsHub, see Using DataSync to transfer files to AWS.

For instructions on launching the agent using the Snowball CLI, see Launching AWS DataSync AMI.

For information about using the AWS Snowcone client, see Using the Snowcone client.

Deploy your agent on AWS Outposts

You can launch a DataSync Amazon EC2 instance on your AWS Outpost. To learn more about launching an AMI on AWS Outposts, see Launch an instance on your Outpost in the AWS Outposts User Guide.

When using DataSync to access Amazon S3 on Outposts, you must launch the agent in a VPC that's allowed to access your Amazon S3 access point, and activate the agent in the Outpost's parent Region.

The agent must also be able to route to the Amazon S3 on Outposts endpoint for the bucket. To learn more about working with Amazon S3 on Outposts endpoints, see Working with Amazon S3 on Outposts in the Amazon S3 User Guide.

Choose a service endpoint

You must specify an endpoint that your AWS DataSync agent uses to communicate with AWS. The agent can connect to the following types of endpoints:

Public endpoints: If you use public endpoints, all communication from your DataSync agent to AWS occurs over the public internet. For instructions, see Choose a public service endpoint (p. 27).

Federal Information Processing Standard (FIPS) endpoints: If you need FIPS 140-2 validated cryptographic modules when accessing the AWS GovCloud (US-East) or AWS GovCloud (US-West) Region, use this endpoint to activate your agent. You use the AWS CLI or API to access this endpoint.

For more information, see Federal Information Processing Standard (FIPS) 140-2.

Virtual private cloud (VPC) endpoints: If you use a VPC endpoint, all communication from DataSync to AWS occurs through the endpoint in your AWS VPC. This establishes a private connection between your self-managed storage system, your VPC, and AWS services, providing extra security as your data is copied over the network. For instructions, see Using AWS DataSync in a virtual private cloud (p. 56).


Note

After you choose a service endpoint type and activate your agent, you can't change the agent to use a different service endpoint type. If you need to transfer data using multiple endpoint types, create a separate DataSync agent for each endpoint type that you use.

For more information about service endpoints, see AWS DataSync in the AWS General Reference.

Topics

• Choose a public service endpoint (p. 27)

• Choose a FIPS service endpoint (p. 27)

• Choose a VPC endpoint (p. 27)

Choose a public service endpoint

If you use a public endpoint, all communication from your DataSync agent to AWS occurs over the public internet.

To choose a public service endpoint

1. Open the AWS DataSync console at https://console.aws.amazon.com/datasync/.

2. Go to the Agents page and choose Create agent.

3. In the Service endpoint section, choose Public service endpoints in AWS Region name. For a list of supported AWS Regions, see AWS DataSync in the AWS General Reference.

Next Step: the section called “Activate your agent” (p. 28)

Choose a FIPS service endpoint

If you use a FIPS service endpoint, DataSync communicates with the AWS GovCloud (US) or Canada (Central) Region.

To choose a FIPS service endpoint

1. Open the AWS DataSync console at https://console.aws.amazon.com/datasync/.

2. Go to the Agents page and choose Create agent.

3. In the Service endpoint section, choose the FIPS endpoint that you want. For information about supported FIPS endpoints, see AWS DataSync in the AWS General Reference.

Next step: the section called “Activate your agent” (p. 28)

Choose a VPC endpoint

If you use a VPC endpoint, all communication from DataSync to AWS services occurs through the VPC endpoint in your VPC in AWS. This approach provides a private connection between your self-managed data center, your VPC, and AWS services.

You can also use a VPC endpoint outside your VPC to connect your data center directly to AWS resources.

In this case, you use a virtual private network (VPN) or AWS Direct Connect. You set up a VPC route table to use the endpoint to access the service. For detailed information, see Routing for gateway endpoints.

To choose a VPC endpoint

1. Create a VPC endpoint. For instructions, see Creating an interface endpoint. If you already have a VPC endpoint in the AWS Region, you can use it.

Important

In step 4 of the preceding instructions, choose com.amazonaws.region.datasync for Service Name in the table of endpoints.

For information about supported AWS Regions, see AWS DataSync in the AWS General Reference.

2. Open the AWS DataSync console at https://console.aws.amazon.com/datasync/.

3. Go to the Agents page and choose Create agent.

4. In the Service endpoint section, choose VPC endpoints using AWS PrivateLink. This is the VPC endpoint that the agent has access to.

5. For VPC Endpoint, choose the private VPC endpoint that you want your agent to connect to. You noted the endpoint ID when you created the VPC endpoint.

6. For Subnet, choose the subnet in which you want to run your task. This is the subnet where the elastic network interface is created.

7. For Security Group, choose a security group for your task. This is the security group that protects your network interface for tasks that run on your agent.

For additional information about using DataSync in a VPC, see Using AWS DataSync in a virtual private cloud (p. 56).
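The service name from step 1 follows the pattern com.amazonaws.region.datasync. The following Python sketch fills it in and assembles the parameters of an Amazon EC2 CreateVpcEndpoint request for an interface endpoint. The function names and the loose Region-name check are hypothetical, and the VPC, subnet, and security group IDs are placeholders.

```python
import re

# Loose Region-name check (assumption), e.g. us-east-2 or us-gov-west-1.
REGION_PATTERN = re.compile(r"^[a-z]{2}(-gov)?-[a-z]+-\d+$")


def datasync_vpc_service_name(region: str) -> str:
    """Return com.amazonaws.<region>.datasync for a plausible Region name."""
    if not REGION_PATTERN.match(region):
        raise ValueError(f"unexpected Region name: {region!r}")
    return f"com.amazonaws.{region}.datasync"


def interface_endpoint_request(region: str, vpc_id: str, subnet_id: str,
                               security_group_id: str) -> dict:
    """Parameters for an EC2 CreateVpcEndpoint call (interface type)."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": datasync_vpc_service_name(region),
        "SubnetIds": [subnet_id],
        "SecurityGroupIds": [security_group_id],
    }


print(interface_endpoint_request("us-east-2", "vpc-0example",
                                 "subnet-0example", "sg-0example"))
```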

Next step: the section called “Activate your agent” (p. 28)

Activate your agent

After you deploy your AWS DataSync agent and specify a service endpoint, you must activate the agent to associate it with your AWS account.

Note

An agent can be associated with only one AWS account at a time.

To activate your agent

1. Open the AWS DataSync console at https://console.aws.amazon.com/datasync/.

2. Go to the Agents page and choose Create agent.

3. In the Activation key section, select Automatically get the activation key from your agent.

This option requires that your browser access the agent using port 80. Once activated, the agent closes the port. For more information, see Network requirements (p. 11).

Alternatively, select Manually enter your agent's activation key if you don't want a connection between your browser and agent. For more information, see Obtaining an activation key using the local console (p. 64).


4. For the Agent address, enter the agent's IP address or domain name and select Get key. Your browser connects to the IP address and gets a unique activation key from your agent.

If activation succeeds, the activation key is displayed. If the activation fails, make sure that your security group is configured properly and verify that your firewall allows the required ports.

5. (Optional) For Agent name, enter a name for your agent.

6. (Optional) For Tags, enter a key and value to add a tag to your agent. A tag is a key-value pair that helps you manage, filter, and search for your agents.

7. Choose Create agent. Your agent is listed on the Agents page. In the Service endpoint column, verify that your service endpoint is correct.

8. In the Tasks section of the page, choose Create task. The Configure source location page appears.
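If Get key fails in step 4, a quick first check is whether the machine running your browser can open a TCP connection to the agent on port 80. The following Python sketch does only that. The `port_reachable` helper is hypothetical; a successful connection doesn't prove that activation will succeed, and because the agent closes port 80 after activation, a failed check afterward is expected.

```python
import socket


def port_reachable(host: str, port: int = 80, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name lookup failure
        return False


# Example (192.0.2.10 is a documentation address; use your agent's IP):
# print(port_reachable("192.0.2.10", 80, timeout=2.0))
```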

Configure a source location

A task consists of a pair of locations that data will be transferred between. The source location defines the storage system or service that you want to read data from. The destination location defines the storage system or service that you want to write data to.

For a list of all DataSync-supported source and destination endpoints, see Working with locations (p. 72).

In the following walkthrough, we give an example of configuring a Network File System (NFS) file system as the source location.

To configure a different location type as your source location, see the following topics:

• Creating a location for NFS (p. 73)

• Creating a location for SMB (p. 75)

• Creating a location for HDFS (p. 77)

• Creating a location for object storage (p. 78)

• Creating a location for Amazon EFS (p. 79)

• Creating a location for FSx for Windows File Server (p. 81)

• Creating a location for FSx for Lustre (p. 83)

• Creating a location for Amazon S3 (p. 83)

To create an NFS location

1. On the Configure source location page, choose Create a new location or Choose existing location.

Create a new location enables you to define a new location and Choose existing location enables you to choose from locations that you have previously created in this AWS Region.


2. For Location type in the Configuration section, choose your NFS server from the list.

3. For Agents, choose your agent from the list. You can add more than one agent. For this walkthrough, we add only one agent.

Note

In many cases, you might be transferring from an in-cloud NFS file system or an Amazon EFS file system. In such cases, make sure that you choose an agent that you deployed as an Amazon EC2 instance that can access this file system.

You can't use agents that are created with different endpoint types for the same task.

4. For NFS server, enter the IP address or domain name of your NFS server. An agent that's installed on-premises uses this host name to mount the NFS server in a network. The NFS server should allow full access to all files.

5. For Mount path, enter a path that's exported by the NFS server, or a subdirectory that can be mounted by other NFS clients in your network. The path is used to read data from or write data to your NFS server.

6. Choose Next to open the Configure destination location page.
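The console fields in the steps above correspond to parameters of the DataSync `CreateLocationNfs` API operation: NFS server maps to `ServerHostname`, Mount path maps to `Subdirectory`, and Agents maps to `OnPremConfig.AgentArns`. The following is a minimal sketch that builds that request for the AWS SDK for Python (Boto3) without calling AWS. It also checks the rule that agents created with different endpoint types can't be used for the same task. The helper name `build_nfs_location_request`, the caller-supplied `endpoint_types` mapping, and the example ARN are hypothetical. The absolute-path check on the mount path is an assumption about what an exported path looks like.

```python
from typing import Dict, List


def build_nfs_location_request(
    server_hostname: str,
    mount_path: str,
    agent_arns: List[str],
    endpoint_types: Dict[str, str],
) -> dict:
    """Build keyword arguments for the DataSync CreateLocationNfs operation.

    endpoint_types is a hypothetical, caller-supplied mapping from each agent
    ARN to the endpoint type that agent was activated with (for example,
    "PUBLIC", "FIPS", or "PRIVATE_LINK"). Nothing is fetched from AWS here.
    """
    if not agent_arns:
        raise ValueError("choose at least one agent")

    # Agents created with different endpoint types can't be used for the same task.
    types = {endpoint_types[arn] for arn in agent_arns}
    if len(types) > 1:
        raise ValueError(f"agents use different endpoint types: {sorted(types)}")

    # Assumption: the mount path is an absolute path exported by the NFS server
    # (or a subdirectory of one), such as /exports/data.
    if not mount_path.startswith("/"):
        raise ValueError("mount path must be an absolute exported path")

    return {
        "ServerHostname": server_hostname,                # "NFS server" field
        "Subdirectory": mount_path,                       # "Mount path" field
        "OnPremConfig": {"AgentArns": list(agent_arns)},  # "Agents" field
    }


if __name__ == "__main__":
    # Hypothetical agent ARN, used for illustration only.
    agent = "arn:aws:datasync:us-east-1:111122223333:agent/agent-0example"
    params = build_nfs_location_request(
        "nfs.example.com", "/exports/data", [agent], {agent: "PUBLIC"}
    )
    print(params)
```

With Boto3 installed and credentials configured, you could pass the dictionary as `boto3.client("datasync").create_location_nfs(**params)`. Building the request separately from the API call lets you test the validation without an AWS account.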

Configure a destination location

A task consists of a pair of locations between which data is transferred. The source location defines the storage system or service that you want to read data from. The destination location defines the storage system or service that you want to write data to.

For a list of all DataSync supported source and destination endpoints, see Working with locations (p. 72).

To configure a different location type, see the following topics:

• Creating a location for NFS (p. 73)

• Creating a location for SMB (p. 75)

• Creating a location for HDFS (p. 77)

• Creating a location for object storage (p. 78)

• Creating a location for Amazon EFS (p. 79)

• Creating a location for FSx for Windows File Server (p. 81)

• Creating a location for FSx for Lustre (p. 83)

• Creating a location for Amazon S3 (p. 83)

Configure task settings

After you have created an AWS DataSync agent and configured the source and destination locations, you can configure the settings for a new task. A task is a set of two locations (source and destination) and a set of options that you use to control the behavior of the task.

You configure task settings when creating a new task in the AWS DataSync console. You can also edit task settings by opening the AWS DataSync console at https://console.aws.amazon.com/datasync/, selecting the task you want to edit, and choosing Edit.

On the Configure settings page, for Task name - optional, you can enter a name for your task.
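Because a task is a pair of locations plus a set of options, the same settings can be expressed as a request to the DataSync `CreateTask` API operation. The following is a minimal sketch that builds that request as a dictionary, leaving out the name when it's empty because the task name is optional. The helper `build_create_task_request` and the example ARNs are hypothetical.

```python
from typing import Optional


def build_create_task_request(
    source_location_arn: str,
    destination_location_arn: str,
    name: Optional[str] = None,
    options: Optional[dict] = None,
) -> dict:
    """Build keyword arguments for the DataSync CreateTask operation."""
    request = {
        # DataSync reads data from the source location...
        "SourceLocationArn": source_location_arn,
        # ...and writes it to the destination location.
        "DestinationLocationArn": destination_location_arn,
    }
    if name:  # Task name is optional, so leave it out when it's empty.
        request["Name"] = name
    if options:  # Settings from the Options section, such as VerifyMode.
        request["Options"] = dict(options)
    return request


if __name__ == "__main__":
    # Hypothetical location ARNs, used for illustration only.
    print(
        build_create_task_request(
            "arn:aws:datasync:us-east-1:111122223333:location/loc-0source",
            "arn:aws:datasync:us-east-1:111122223333:location/loc-0dest",
            name="nightly-nfs-copy",
            options={"VerifyMode": "ONLY_FILES_TRANSFERRED"},
        )
    )
```

You could pass the result as `boto3.client("datasync").create_task(**request)`. Editing an existing task goes through the `UpdateTask` operation instead, which identifies the task by its `TaskArn` rather than by its locations.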

The Options section contains configuration options for running your task. The following sections provide more details about these options.

Topics

• Data verification options (p. 31)

• Ownership and permissions-related options (p. 32)

• File metadata options and file management (p. 32)

• Bandwidth options (p. 33)

• Filtering options (p. 33)

• Scheduling and queueing options (p. 33)

• Tags and logging options (p. 34)

Data verification options

DataSync always performs data integrity checks while it transfers data. You can enable additional verification to compare the source and destination at the end of a transfer. This additional check can verify the entire dataset or only the files that were transferred as part of the task execution. For most use cases, we recommend verifying only the files transferred.

Task data verification options specify how to verify data that's transferred by the task.

Data verification options are as follows:

Verify only the data transferred (recommended) – This option calculates the checksum of transferred files and metadata on the source. It then compares this checksum to the checksum calculated on those files at the destination at the end of the transfer. We recommend this option when transferring to S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive storage classes. For more information, see Considerations when working with Amazon S3 storage classes in DataSync (p. 86).

Verify all data in the destination – At the end of the transfer, this option scans the entire source and the entire destination to verify that they are fully synchronized. You can't use this option when transferring to S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive storage classes.
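In the DataSync API, these two options correspond to the `VerifyMode` values `ONLY_FILES_TRANSFERRED` and `POINT_IN_TIME_CONSISTENT` in the task's `Options` structure. The API also accepts `NONE`, which skips the end-of-transfer check while keeping the integrity checks during transfer; that value isn't covered here. The following is a minimal sketch that picks a value and rejects full verification for archive storage classes. It assumes the destination's S3 storage class is known and uses the DataSync names `GLACIER` (S3 Glacier Flexible Retrieval) and `DEEP_ARCHIVE` (S3 Glacier Deep Archive). The helper `choose_verify_mode` is hypothetical.

```python
from typing import Optional

# DataSync S3 storage class names for S3 Glacier Flexible Retrieval and
# S3 Glacier Deep Archive, where full-destination verification isn't allowed.
ARCHIVE_STORAGE_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def choose_verify_mode(verify_all: bool, s3_storage_class: Optional[str] = None) -> str:
    """Return the VerifyMode value for the task's Options.

    verify_all=False -> "Verify only the data transferred" (recommended)
    verify_all=True  -> "Verify all data in the destination"
    s3_storage_class is the destination's S3 storage class, or None if the
    destination isn't Amazon S3.
    """
    if not verify_all:
        return "ONLY_FILES_TRANSFERRED"
    if s3_storage_class in ARCHIVE_STORAGE_CLASSES:
        raise ValueError(
            f"can't verify all data in the destination for storage class {s3_storage_class}"
        )
    return "POINT_IN_TIME_CONSISTENT"


if __name__ == "__main__":
    # Recommended setting for a Deep Archive destination.
    print({"VerifyMode": choose_verify_mode(False, "DEEP_ARCHIVE")})
```

Raising an error, rather than silently falling back to the recommended mode, surfaces the configuration conflict at the point where the task options are built.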
