FSx for Lustre


FSx for Lustre: Lustre User Guide

Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be affiliated with, connected to, or sponsored by Amazon.


What is Amazon FSx for Lustre?

FSx for Lustre makes it easy and cost-effective to launch and run the popular, high-performance Lustre file system. You use Lustre for workloads where speed matters, such as machine learning, high performance computing (HPC), video processing, and financial modeling.

The open-source Lustre file system is designed for applications that require fast storage, where you want your storage to keep up with your compute. Lustre was built to solve the problem of quickly and cheaply processing the world's ever-growing datasets. It's a widely used file system designed for the fastest computers in the world. It provides sub-millisecond latencies, up to hundreds of GBps of throughput, and up to millions of IOPS. For more information on Lustre, see the Lustre website.

As a fully managed service, Amazon FSx makes it easier for you to use Lustre for workloads where storage speed matters. FSx for Lustre eliminates the traditional complexity of setting up and managing Lustre file systems, enabling you to spin up and run a battle-tested high-performance file system in minutes. It also provides multiple deployment options so you can optimize cost for your needs.

FSx for Lustre is POSIX-compliant, so you can use your current Linux-based applications without having to make any changes. FSx for Lustre provides a native file system interface and works as any file system does with your Linux operating system. It also provides read-after-write consistency and supports file locking.

Topics

• Multiple deployment options

• Multiple storage options

• FSx for Lustre and data repositories

• Accessing FSx for Lustre file systems

• Integrations with AWS services

• Security and compliance

• Assumptions

• Pricing for Amazon FSx for Lustre

• Amazon FSx for Lustre forums

• Are you a first-time user of Amazon FSx for Lustre?

Multiple deployment options

Amazon FSx for Lustre offers a choice of scratch and persistent file systems to accommodate different data processing needs. Scratch file systems are ideal for temporary storage and shorter-term processing of data. Data is not replicated and does not persist if a file server fails. Persistent file systems are ideal for longer-term storage and throughput-focused workloads. In persistent file systems, data is replicated, and file servers are replaced if they fail. For more information, see Deployment options for FSx for Lustre file systems (p. 17).

Multiple storage options

Amazon FSx for Lustre offers a choice of solid state drive (SSD) and hard disk drive (HDD) storage types that are optimized for different data processing requirements:

• SSD storage options – For low-latency, IOPS-intensive workloads that typically have small, random file operations, choose one of the SSD storage options.

• HDD storage options – For throughput-intensive workloads that typically have large, sequential file operations, choose one of the HDD storage options.

If you are provisioning a file system with the HDD storage option, you can optionally provision a read-only SSD cache that is sized to 20 percent of your HDD storage capacity. This provides sub-millisecond latencies and higher IOPS for frequently accessed files. Both SSD-based and HDD-based file systems are provisioned with SSD-based metadata servers. As a result, all metadata operations, which represent the majority of file system operations, are delivered with sub-millisecond latencies.
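As a rough illustration of the 20 percent sizing rule, the following shell sketch computes the SSD cache size for a given HDD capacity. The helper name `ssd_cache_gib` is hypothetical, and the sketch assumes capacity is expressed in GiB, as the Amazon FSx API does (for example, 6000 GiB for the console's 6.0 TiB).

```shell
# Hypothetical helper: size of the optional read-only SSD cache for an
# HDD-based file system, using the 20 percent ratio described above.
# Input and output are in GiB; integer arithmetic drops any remainder.
ssd_cache_gib() {
  local hdd_gib=$1
  echo $(( hdd_gib * 20 / 100 ))
}

ssd_cache_gib 6000    # 1200
ssd_cache_gib 18000   # 3600
```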

For more information about performance of these storage options, see Amazon FSx for Lustre performance (p. 61).

FSx for Lustre and data repositories

You can link FSx for Lustre file systems to data repositories on Amazon S3 or to on-premises data stores.

FSx for Lustre S3 data repository integration

FSx for Lustre integrates with Amazon S3, making it easier for you to process cloud datasets using the Lustre high-performance file system. When linked to an Amazon S3 bucket, an FSx for Lustre file system transparently presents S3 objects as files. Amazon FSx imports listings of all existing files in your S3 bucket at file system creation. Amazon FSx can also import listings of files added to the data repository after the file system is created. You can set the import preferences to match your workflow needs. The file system also makes it possible for you to write file system data back to S3. Data repository tasks simplify the transfer of data and metadata between your FSx for Lustre file system and its durable data repository on Amazon S3. For more information, see Using data repositories with Amazon FSx for Lustre (p. 21) and Data repository tasks (p. 39).

FSx for Lustre and on-premises data repositories

With Amazon FSx for Lustre, you can burst your data processing workloads from on-premises into the AWS Cloud by importing data using AWS Direct Connect or AWS VPN. For more information, see Using Amazon FSx with your on-premises data repository (p. 51).

Accessing FSx for Lustre file systems

You can mix and match the compute instance types and Linux Amazon Machine Images (AMIs) that are connected to a single FSx for Lustre file system.

Amazon FSx for Lustre file systems are accessible from compute workloads running on Amazon Elastic Compute Cloud (Amazon EC2) instances, on Amazon Elastic Container Service (Amazon ECS) Docker containers, and containers running on Amazon Elastic Kubernetes Service (Amazon EKS).

• Amazon EC2 – You access your file system from your Amazon EC2 compute instances using the open-source Lustre client. Amazon EC2 instances can access your file system from other Availability Zones within the same Amazon Virtual Private Cloud (Amazon VPC), provided your networking configuration provides for access across subnets within the VPC. After your Amazon FSx for Lustre file system is mounted, you can work with its files and directories just as you do using a local file system.

• Amazon EKS – You access Amazon FSx for Lustre from containers running on Amazon EKS using the open-source FSx for Lustre CSI driver, as described in the Amazon EKS User Guide. Your containers running on Amazon EKS can use high-performance persistent volumes (PVs) backed by Amazon FSx for Lustre.


• Amazon ECS – You access Amazon FSx for Lustre from Amazon ECS Docker containers on Amazon EC2 instances. For more information, see Mounting from Amazon Elastic Container Service (p. 86).

Amazon FSx for Lustre is compatible with the most popular Linux-based AMIs, including Amazon Linux 2 and Amazon Linux, Red Hat Enterprise Linux (RHEL), CentOS, Ubuntu, and SUSE Linux. The Lustre client is included with Amazon Linux 2 and Amazon Linux. For RHEL, CentOS, and Ubuntu, an AWS Lustre client repository provides clients that are compatible with these operating systems.

Using FSx for Lustre, you can burst your compute-intensive workloads from on-premises into the AWS Cloud by importing data over AWS Direct Connect or AWS Virtual Private Network. You can access your Amazon FSx file system from on-premises, copy data into your file system as needed, and run compute-intensive workloads on in-cloud instances.

For more information on the clients, compute instances, and environments from which you can access FSx for Lustre file systems, see Accessing file systems (p. 71).

Integrations with AWS services

Amazon FSx for Lustre integrates with SageMaker as an input data source. When using SageMaker with FSx for Lustre, your machine learning training jobs are accelerated by eliminating the initial download step from Amazon S3. Additionally, your total cost of ownership (TCO) is reduced by avoiding the repetitive download of common objects for iterative jobs on the same dataset, as you save on S3 request costs. For more information, see What Is SageMaker? in the Amazon SageMaker Developer Guide.

FSx for Lustre integrates with AWS Batch using EC2 Launch Templates. AWS Batch enables you to run batch computing workloads on the AWS Cloud, including high performance computing (HPC), machine learning (ML), and other asynchronous workloads. AWS Batch automatically and dynamically sizes instances based on job resource requirements. For more information, see What Is AWS Batch? in the AWS Batch User Guide.

FSx for Lustre integrates with AWS ParallelCluster. AWS ParallelCluster is an AWS-supported open-source cluster management tool used to deploy and manage HPC clusters. It can automatically create FSx for Lustre file systems or use existing file systems during the cluster creation process.

Security and compliance

FSx for Lustre file systems support encryption at rest and in transit. Amazon FSx automatically encrypts file system data at rest using keys managed in AWS Key Management Service (AWS KMS). Data in transit is also automatically encrypted on file systems in certain AWS Regions when accessed from supported Amazon EC2 instances. For more information about data encryption in FSx for Lustre, including AWS Regions where encryption of data in transit is supported, see Data encryption in Amazon FSx for Lustre (p. 129). Amazon FSx has been assessed to comply with ISO, PCI-DSS, and SOC certifications, and is HIPAA eligible. For more information, see Security in FSx for Lustre (p. 128).

Assumptions

In this guide, we make the following assumptions:

• If you use Amazon Elastic Compute Cloud (Amazon EC2), we assume that you're familiar with that service. For more information on how to use Amazon EC2, see the Amazon EC2 documentation.

• We assume that you are familiar with using Amazon Virtual Private Cloud (Amazon VPC). For more information on how to use Amazon VPC, see the Amazon VPC User Guide.


• We assume that you haven't changed the rules on the default security group for your VPC based on the Amazon VPC service. If you have, make sure that you add the necessary rules to allow network traffic from your Amazon EC2 instance to your Amazon FSx for Lustre file system. For more details, see File System Access Control with Amazon VPC (p. 132).
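If you manage the security group from the command line, the following is a minimal sketch of adding a self-referencing inbound rule with the AWS CLI. It mirrors the all-TCP inbound rule used later in the getting started exercise; the group ID is a placeholder, and the `DRY_RUN` switch is an assumption of this sketch, not an AWS CLI feature. Check File System Access Control with Amazon VPC (p. 132) for the rules that fit your environment.

```shell
# Sketch: allow inbound TCP traffic between members of one security group.
# Set DRY_RUN=1 to print the AWS CLI command instead of running it.
allow_lustre_traffic() {
  local sg_id=$1
  set -- aws ec2 authorize-security-group-ingress \
    --group-id "$sg_id" \
    --protocol tcp \
    --port 0-65535 \
    --source-group "$sg_id"
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Example (sg-0123456789abcdef0 is a placeholder):
#   allow_lustre_traffic sg-0123456789abcdef0
```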

Pricing for Amazon FSx for Lustre

With Amazon FSx for Lustre, there are no upfront hardware or software costs. You pay for only the resources used, with no minimum commitments, setup costs, or additional fees. For information about the pricing and fees associated with the service, see Amazon FSx for Lustre Pricing.

Amazon FSx for Lustre forums

If you encounter issues while using Amazon FSx for Lustre, check the forums.

Are you a first-time user of Amazon FSx for Lustre?

If you are a first-time user of Amazon FSx for Lustre, we recommend that you read the following sections in order:

1. If you're ready to create your first Amazon FSx for Lustre file system, try Getting started with Amazon FSx for Lustre (p. 10).

2. For information on performance, see Amazon FSx for Lustre performance (p. 61).

3. For information on linking your file system to an Amazon S3 bucket data repository, see Using data repositories with Amazon FSx for Lustre (p. 21).

4. For Amazon FSx for Lustre security details, see Security in FSx for Lustre (p. 128).

5. For information on the scalability limits of Amazon FSx for Lustre, including throughput and file system size, see Quotas (p. 158).

6. For information on the Amazon FSx for Lustre API, see the Amazon FSx for Lustre API Reference.


Setting up

Before you use Amazon FSx for Lustre for the first time, complete the following tasks:

1. Sign up for AWS

2. Create an IAM user

Sign up for AWS

When you sign up for Amazon Web Services, your AWS account is automatically signed up for all services in AWS, including Amazon FSx for Lustre.

If you have an AWS account already, skip to the next task. If you don't have an AWS account, use the following procedure to create one.

To create an AWS account

1. Open https://portal.aws.amazon.com/billing/signup.

2. Follow the online instructions.

Part of the sign-up procedure involves receiving a phone call and entering a verification code on the phone keypad.

Note your AWS account number, because you need it for the next task.

Create an IAM user

Services in AWS, such as Amazon FSx for Lustre, require that you provide credentials when you access them, so that the service can determine whether you have permissions to access its resources. AWS recommends that you don't use the root credentials of your AWS account to make requests. Instead, create an AWS Identity and Access Management (IAM) user and grant that user full access. We call these users administrator users.

You can use the administrator user credentials, instead of the root credentials of your account, to interact with AWS and perform tasks, such as creating users and granting them permissions. For more information, see Root Account Credentials vs. IAM User Credentials in the AWS General Reference and IAM Best Practices in the IAM User Guide.

If you signed up for AWS but have not created an IAM user for yourself, you can create one using the IAM Management Console.

To create an administrator user for yourself and add the user to an administrators group (console)

1. Sign in to the IAM console as the account owner by choosing Root user and entering your AWS account email address. On the next page, enter your password.


Note

We strongly recommend that you adhere to the best practice of using the Administrator IAM user that you create in the following steps and securely lock away the root user credentials. Sign in as the root user only to perform a few account and service management tasks.

2. In the navigation pane, choose Users and then choose Add user.

3. For User name, enter Administrator.

4. Select the check box next to AWS Management Console access. Then select Custom password, and then enter your new password in the text box.

5. (Optional) By default, AWS requires the new user to create a new password when first signing in. You can clear the check box next to User must create a new password at next sign-in to allow the new user to reset their password after they sign in.

6. Choose Next: Permissions.

7. Under Set permissions, choose Add user to group.

8. Choose Create group.

9. In the Create group dialog box, for Group name enter Administrators.

10. Choose Filter policies, and then select AWS managed - job function to filter the table contents.

11. In the policy list, select the check box for AdministratorAccess. Then choose Create group.

Note

You must activate IAM user and role access to Billing before you can use the AdministratorAccess permissions to access the AWS Billing and Cost Management console. To do this, follow the instructions in step 1 of the tutorial about delegating access to the billing console.

12. Back in the list of groups, select the check box for your new group. Choose Refresh if necessary to see the group in the list.

13. Choose Next: Tags.

14. (Optional) Add metadata to the user by attaching tags as key-value pairs. For more information about using tags in IAM, see Tagging IAM entities in the IAM User Guide.

15. Choose Next: Review to see the list of group memberships to be added to the new user. When you are ready to proceed, choose Create user.
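The console procedure above can also be scripted. The following sketch uses the AWS CLI IAM commands create-group, attach-group-policy, create-user, and add-user-to-group; the function names and the `DRY_RUN` switch are hypothetical conveniences. It doesn't set a console password; `aws iam create-login-profile` is the command for that step.

```shell
# Run a command, or only print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Sketch: create an administrators group with the AWS managed
# AdministratorAccess policy, then create a user and add it to the group.
create_admin_user() {
  local user=${1:-Administrator}
  local group=${2:-Administrators}
  run aws iam create-group --group-name "$group" || return 1
  run aws iam attach-group-policy --group-name "$group" \
    --policy-arn arn:aws:iam::aws:policy/AdministratorAccess || return 1
  run aws iam create-user --user-name "$user" || return 1
  run aws iam add-user-to-group --user-name "$user" --group-name "$group"
}

# Preview the commands without calling AWS:
#   DRY_RUN=1 create_admin_user
```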

You can use this same process to create more groups and users and to give your users access to your AWS account resources. To learn about using policies that restrict user permissions to specific AWS resources, see Access management and Example policies.

To sign in as this new IAM user, first sign out of the AWS Management Console. Then use the following URL, where your_aws_account_id is your AWS account number without the hyphens (for example, if your AWS account number is 1234-5678-9012, your AWS account ID is 123456789012).

https://your_aws_account_id.signin.aws.amazon.com/console/

Enter the IAM user name and password that you just created. When you're signed in, the navigation bar displays your_user_name@your_aws_account_id.
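The URL rule above (remove the hyphens from the account number) is simple to script. In this sketch, `signin_url` is a hypothetical helper that accepts an account number only; account aliases can contain hyphens, so don't pass an alias to it.

```shell
# Hypothetical helper: build the IAM user sign-in URL from an AWS account
# number such as 1234-5678-9012 (hyphens are removed to form the account ID).
signin_url() {
  local account_id
  account_id=$(printf '%s' "$1" | tr -d '-')
  printf 'https://%s.signin.aws.amazon.com/console/\n' "$account_id"
}

signin_url 1234-5678-9012
# https://123456789012.signin.aws.amazon.com/console/

# With a configured AWS CLI, the account ID can also be read directly:
#   aws sts get-caller-identity --query Account --output text
```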

If you don't want the URL for your sign-in page to contain your AWS account ID, you can create an account alias. To do so, from the IAM dashboard, choose Create Account Alias and enter an alias, such as your company name. To sign in after you create an account alias, use the following URL.

https://your_account_alias.signin.aws.amazon.com/console/

To verify the sign-in link for IAM users for your account, open the IAM console and check under AWS Account Alias on the dashboard.


Adding permissions to use data repositories in Amazon S3

Amazon FSx for Lustre is deeply integrated with Amazon S3. This integration means that applications that access your FSx for Lustre file system can also seamlessly access the objects stored in your linked Amazon S3 bucket. For more information, see Using data repositories with Amazon FSx for Lustre (p. 21).

To use data repositories, you must first grant certain IAM permissions for Amazon FSx for Lustre in a role associated with the account for your administrator user.

To embed an inline policy for a role using the console

1. Sign in to the AWS Management Console and open the IAM console at https://console.aws.amazon.com/iam/.

2. In the navigation pane, choose Roles.

3. In the list, choose the name of the role to embed a policy in.

4. Choose the Permissions tab.

5. Scroll to the bottom of the page and choose Add inline policy.

Note

You can't embed an inline policy in a service-linked role in IAM. Because the linked service defines whether you can modify the permissions of the role, you might be able to add additional policies from the service console, API, or AWS CLI. To view the service-linked role documentation for a service, see AWS Services That Work with IAM and choose Yes in the Service-Linked Role column for your service.

6. Create the policy, as described in Creating Policies with the Visual Editor in the IAM User Guide.

7. Add the following permissions policy statement.

{
    "Version": "2012-10-17",
    "Statement": {
        "Effect": "Allow",
        "Action": [
            "iam:CreateServiceLinkedRole",
            "iam:AttachRolePolicy",
            "iam:PutRolePolicy"
        ],
        "Resource": "arn:aws:iam::*:role/aws-service-role/s3.data-source.lustre.fsx.amazonaws.com/*"
    }
}

After you create an inline policy, it is automatically embedded in your role. For more information about service-linked roles, see Using service-linked roles for Amazon FSx for Lustre (p. 145).
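If you prefer the AWS CLI to the console, the same inline policy can be embedded with `aws iam put-role-policy`. In this sketch, the policy name `FSxLustreS3DataRepository` and the role name are placeholders you choose; the policy body is the statement shown above.

```shell
# Write the permissions policy shown above to a temporary file.
policy_file=$(mktemp)
cat > "$policy_file" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": {
    "Effect": "Allow",
    "Action": [
      "iam:CreateServiceLinkedRole",
      "iam:AttachRolePolicy",
      "iam:PutRolePolicy"
    ],
    "Resource": "arn:aws:iam::*:role/aws-service-role/s3.data-source.lustre.fsx.amazonaws.com/*"
  }
}
EOF

# Embed the policy in an existing role as an inline policy.
embed_fsx_policy() {
  local role_name=$1
  aws iam put-role-policy \
    --role-name "$role_name" \
    --policy-name FSxLustreS3DataRepository \
    --policy-document "file://$policy_file"
}

# Example (MyAdminRole is a placeholder):
#   embed_fsx_policy MyAdminRole
```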

How FSx for Lustre checks for access to linked S3 buckets

If the IAM role that you use to create the FSx for Lustre file system does not have the iam:AttachRolePolicy and iam:PutRolePolicy permissions, then Amazon FSx checks whether it can update your S3 bucket policy. Amazon FSx can update your bucket policy if the s3:PutBucketPolicy permission is included in your IAM role to allow the Amazon FSx file system to import or export data to your S3 bucket. If allowed to modify the bucket policy, Amazon FSx adds the following permissions to the bucket policy:

• s3:AbortMultipartUpload

• s3:DeleteObject

• s3:PutObject

• s3:Get*

• s3:List*

• s3:PutBucketNotification

• s3:PutBucketPolicy

• s3:DeleteBucketPolicy

If Amazon FSx can't modify the bucket policy, it then checks if the existing bucket policy grants Amazon FSx access to the bucket.

If all of these options fail, then the request to create the file system fails. The following diagram illustrates the checks that Amazon FSx follows when determining whether a file system can access the S3 bucket to which it will be linked.


Next step

Getting started with Amazon FSx for Lustre (p. 10)


Getting started with Amazon FSx for Lustre

Following, you can learn how to get started using Amazon FSx for Lustre. These steps walk you through creating an Amazon FSx for Lustre file system and accessing it from your compute instances. Optionally, they show how to use your Amazon FSx for Lustre file system to process the data in your Amazon S3 bucket with your file-based applications.

This getting started exercise includes the following steps.

Topics

• Prerequisites

• Step 1: Create your Amazon FSx for Lustre file system

• Step 2: Install and configure the Lustre client on your instance before mounting your file system

• Step 3: Run your analysis

• (Optional) Step 4: Check Amazon FSx file system status

• Step 5: Clean up resources

Prerequisites

To perform this getting started exercise, you need the following:

• An AWS account with the permissions necessary to create an Amazon FSx for Lustre file system and an Amazon EC2 instance. For more information, see Setting up (p. 5).

• An Amazon EC2 instance running a supported Linux release in your virtual private cloud (VPC) based on the Amazon VPC service. You will install the Lustre client on this EC2 instance, and then mount your FSx for Lustre file system on the EC2 instance. The Lustre client supports Amazon Linux, Amazon Linux 2, CentOS and Red Hat Enterprise Linux 7.5, 7.6, 7.7, 7.8, 7.9, 8.2, and 8.3, SUSE Linux Enterprise Server 12 SP3, SP4, and SP5, and Ubuntu 16.04, 18.04, and 20.04. For this getting started exercise, we recommend using Amazon Linux 2.

When creating your Amazon EC2 instance for this getting started exercise, keep the following in mind:

• We recommend that you create your instance in your default VPC.

• We recommend that you use the default security group when creating your EC2 instance.

• An Amazon S3 bucket storing the data for your workload to process. The S3 bucket will be the linked durable data repository for your FSx for Lustre file system.

• Determine which type of Amazon FSx for Lustre file system you want to create, scratch or persistent. For more information, see File system deployment options for FSx for Lustre (p. 17).

Step 1: Create your Amazon FSx for Lustre file system

Next, you create your file system in the console.


To create your file system

1. Open the Amazon FSx console at https://console.aws.amazon.com/fsx/.

2. From the dashboard, choose Create file system to start the file system creation wizard.

3. Choose FSx for Lustre and then choose Next to display the Create File System page.

4. Provide the information in the File system details section:

• For File system name - optional, provide a name for your file system. You can use up to 256 Unicode letters, white space, and numbers plus the special characters + - = . _ : /.

• For Deployment and storage type, choose one of the options:

SSD storage is designed for low-latency, IOPS-intensive workloads that typically have small, random file operations. HDD storage is designed for throughput-intensive workloads that typically have large, sequential file operations.

For more information about storage types, see Multiple storage options (p. 1).

For more information about deployment types, see Deployment options for FSx for Lustre file systems (p. 17).

For more information about the AWS Regions where encrypting data in transit is available, see Encrypting data in transit (p. 130).

• Choose the Persistent, SSD deployment type for longer-term storage and for latency-sensitive workloads requiring the highest levels of IOPS/throughput. The file servers are highly available, data is automatically replicated within the file system's Availability Zone, and this type supports encrypting data in transit. Persistent, SSD uses Persistent 2, the latest generation of persistent file systems.

• Choose the Persistent, HDD deployment type for longer-term storage and for throughput-focused workloads that aren't latency-sensitive. The file servers are highly available, data is automatically replicated within the file system's Availability Zone, and this type supports encrypting data in transit. Persistent, HDD uses the Persistent 1 deployment type.

Choose with SSD cache to create an SSD cache that is sized to 20 percent of your HDD storage capacity to provide sub-millisecond latencies and higher IOPS for frequently accessed files.

• Choose the Scratch, SSD deployment type for temporary storage and shorter-term processing of data. Scratch, SSD uses Scratch 2 file systems, and offers in-transit encryption of data.

• Choose the amount of Throughput per unit of storage that you want for your file system. This option is only valid for Persistent deployment types.

Throughput per unit of storage is the amount of read and write throughput for each 1 tebibyte (TiB) of storage provisioned, in MB/s/TiB. You pay for the amount of throughput that you provision:

• For Persistent SSD storage, choose a value of 125, 250, 500, or 1,000 MB/s/TiB.

• For Persistent HDD storage, choose a value of 12 or 40 MB/s/TiB.

For more information about throughput per unit storage and file system performance, see Aggregate file system performance (p. 62).

• For Storage capacity, set the amount of storage capacity for your file system, in TiB:

• For a Persistent, SSD deployment type, set this to a value of 1.2 TiB, 2.4 TiB, or increments of 2.4 TiB.

• For a Persistent, HDD deployment type, this value can be increments of 6.0 TiB for 12 MB/s/TiB file systems and increments of 1.8 TiB for 40 MB/s/TiB file systems.

You can increase the amount of storage capacity as needed after you create the file system.

• For Data compression type, choose NONE to turn off data compression or choose LZ4 to turn on data compression with the LZ4 algorithm. For more information, see Lustre data compression (p. 110).

All FSx for Lustre file systems are built on Lustre version 2.12 when created using the Amazon FSx console.

5. In the Network & security section, provide the following networking and security group information:

• Choose the VPC that you want to associate with your file system. For this getting started exercise, choose the same VPC that you chose for your Amazon EC2 instance.

• For VPC security groups, the ID for the default security group for your VPC should be already added. If you're not using the default security group, make sure that the following inbound rule is added to the security group you're using for this getting started exercise.

Type       Protocol   Port range   Source                                   Description
All TCP    TCP        0-65535      Custom (the_ID_of_this_security_group)   Inbound Lustre traffic rule

The following screen capture shows an example of editing inbound rules.


• For Subnet, choose any value from the list of available subnets.

6. For the Encryption section, the options available vary depending upon which file system type you're creating:

• For a persistent file system, you can choose an AWS Key Management Service (AWS KMS) encryption key to encrypt the data on your file system at rest.

• For a scratch file system, data at rest is encrypted using the default Amazon FSx–managed key for your account.

• For scratch 2 and persistent file systems, data in transit is encrypted automatically when the file system is accessed from a supported Amazon EC2 instance type. For more information, see Encrypting data in transit (p. 130).

7. For the Data Repository Import/Export - optional section, linking your file system to Amazon S3 data repositories is disabled by default. For information about enabling this option and creating a data repository association to an existing S3 bucket, see To link an S3 bucket while creating a file system (console) (p. 26).

Important

• Selecting this option also disables backups, and you won't be able to enable backups while creating the file system.

• If you link one or more Amazon FSx for Lustre file systems to an Amazon S3 bucket, don't delete the Amazon S3 bucket until all linked file systems have been deleted.

8. For Logging - optional, logging is enabled by default. When enabled, failures and warnings for data repository activity on your file system are logged to Amazon CloudWatch Logs. For information about configuring logging, see Managing logging (p. 124).

9. In Backup and maintenance - optional, you can do the following.

For daily automatic backups:

• Disable the Daily automatic backup. This option is enabled by default, unless you enabled Data Repository Import/Export.

• Set the start time for the Daily automatic backup window.

• Set the Automatic backup retention period, from 1 to 35 days.

For more information, see Working with backups (p. 95).

10. Set the Weekly maintenance window start time, or keep it set to the default No preference.

11. Create any tags that you want to apply to your file system.

12. Choose Next to display the Create file system summary page.

13. Review the settings for your Amazon FSx for Lustre file system, and choose Create file system.
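Before entering the capacity and throughput values in step 4, you can sanity-check a Persistent configuration with a small script. This sketch encodes the throughput values and capacity increments listed in that step, with capacity in GiB as the Amazon FSx API expects (the console's 1.2 TiB is 1200 GiB). Baseline throughput is estimated as capacity times throughput per unit of storage, treating 1 TiB as 1000 GiB to match the console's labels; both function names are hypothetical.

```shell
# Hypothetical check for Persistent file systems.
# Usage: check_persistent_config SSD|HDD THROUGHPUT_MBPS_PER_TIB CAPACITY_GIB
check_persistent_config() {
  local storage=$1 per_unit=$2 gib=$3 step
  case "$storage" in
    SSD)
      case "$per_unit" in
        125|250|500|1000) ;;
        *) echo "invalid SSD throughput: $per_unit"; return 1 ;;
      esac
      # Valid SSD sizes: 1200 GiB, or any positive multiple of 2400 GiB.
      if [ "$gib" -eq 1200 ] || { [ "$gib" -gt 0 ] && [ $((gib % 2400)) -eq 0 ]; }; then
        echo ok; return 0
      fi
      ;;
    HDD)
      # HDD increments depend on the throughput per unit of storage.
      case "$per_unit" in
        12) step=6000 ;;
        40) step=1800 ;;
        *) echo "invalid HDD throughput: $per_unit"; return 1 ;;
      esac
      if [ "$gib" -gt 0 ] && [ $((gib % step)) -eq 0 ]; then
        echo ok; return 0
      fi
      ;;
    *) echo "unknown storage type: $storage"; return 1 ;;
  esac
  echo "invalid capacity: $gib GiB"
  return 1
}

# Estimated baseline throughput in MB/s (capacity in TiB x MB/s/TiB).
baseline_throughput() {
  echo $(( $1 * $2 / 1000 ))
}

check_persistent_config SSD 250 2400   # ok
baseline_throughput 2400 250           # 600
```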


Now that you've created your file system, note its fully qualified domain name and mount name for a later step. You can find the fully qualified domain name and mount name for a file system by choosing the name of the file system in the File Systems dashboard, and then choosing Attach.

Step 2: Install and configure the Lustre client on your instance before mounting your file system

To mount your Amazon FSx for Lustre file system from your Amazon EC2 instance, first install the Lustre 2.10 client. The 2.10 versions of the Lustre client support Amazon FSx for Lustre versions 2.10 and 2.12.

To download the Lustre client onto your Amazon EC2 instance

1. Open a terminal on your client.

2. Determine which kernel is currently running on your compute instance by running the following command.

uname -r

3. Do one of the following:

• If the command returns 4.14.104-95.84.amzn2.x86_64 or higher for x86-based EC2 instances, or 4.14.181-142.260.amzn2.aarch64 or higher for Graviton2-based EC2 instances, download and install the Lustre client with the following command.

sudo amazon-linux-extras install -y lustre2.10

• If the command returns a result less than 4.14.104-95.84.amzn2.x86_64 for x86-based EC2 instances, or less than 4.14.181-142.260.amzn2.aarch64 for Graviton2-based EC2 instances, update the kernel and reboot your Amazon EC2 instance by running the following command.

sudo yum -y update kernel && sudo reboot

Confirm that the kernel has been updated using the uname -r command. Then download and install the Lustre client as described above.
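The kernel check in step 3 can be scripted. This sketch compares `uname -r` against the minimum versions listed above using GNU `sort -V` (version sort). The function names are hypothetical, the sketch assumes Amazon Linux 2 kernel naming, and it only prints the command to run rather than running `sudo` itself.

```shell
# Succeeds if kernel release $1 is at least $2 (GNU sort -V ordering).
kernel_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Print the step 3 command that applies to the given kernel release.
lustre_client_step() {
  local running=$1 minimum
  case "$running" in
    *.aarch64) minimum=4.14.181-142.260.amzn2.aarch64 ;;
    *)         minimum=4.14.104-95.84.amzn2.x86_64 ;;
  esac
  if kernel_at_least "$running" "$minimum"; then
    echo "sudo amazon-linux-extras install -y lustre2.10"
  else
    echo "sudo yum -y update kernel && sudo reboot"
  fi
}

lustre_client_step "$(uname -r)"
```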

For information about installing the Lustre client on other Linux distributions, see Installing the Lustre client (p. 71).

To mount your file system

1. Make a directory for the mount point with the following command.

sudo mkdir -p /mnt/fsx

2. Mount the Amazon FSx for Lustre file system to the directory that you created. Use the following command and replace the following items:

• Replace file_system_dns_name with the actual file system's Domain Name System (DNS) name.

• Replace mountname with the file system's mount name, which you can get by running the describe-file-systems AWS CLI command or the DescribeFileSystems API operation.


sudo mount -t lustre -o noatime,flock file_system_dns_name@tcp:/mountname /mnt/fsx

This command mounts your file system with two options, -o noatime and flock:

• noatime – Turns off updates to inode access times. To update inode access times, use the mount command without noatime.

• flock – Enables file locking for your file system. If you don't want file locking enabled, use the mount command without flock.

3. Verify that the mount command was successful by listing the contents of the mount directory, /mnt/fsx, with the following command.

ls /mnt/fsx

import-path lustre

$

You can also use the df command, as follows.

df
Filesystem                     1K-blocks     Used   Available  Use%  Mounted on
devtmpfs                         1001808        0     1001808    0%  /dev
tmpfs                            1019760        0     1019760    0%  /dev/shm
tmpfs                            1019760      392     1019368    1%  /run
tmpfs                            1019760        0     1019760    0%  /sys/fs/cgroup
/dev/xvda1                       8376300  1263180     7113120   16%  /
123.456.789.0@tcp:/mountname  3547698816    13824  3547678848    1%  /mnt/fsx
tmpfs                             203956        0      203956    0%  /run/user/1000

The results show the Amazon FSx ﬁle system mounted on /mnt/fsx.
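In a script, an alternative to reading df output is to look for the mount point in /proc/mounts, where a Lustre client mount is listed with the file system type lustre. The following Python sketch assumes a Linux host; the function names and the sample line in the usage note are hypothetical.

```python
def lustre_mounts(mounts_text: str) -> dict:
    """Map each Lustre mount point to its source, using /proc/mounts-style text."""
    mounts = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: source, mount point, file system type, options, dump, pass.
        if len(fields) >= 3 and fields[2] == "lustre":
            mounts[fields[1]] = fields[0]
    return mounts


def is_lustre_mounted(mount_point: str = "/mnt/fsx",
                      mounts_path: str = "/proc/mounts") -> bool:
    """True if mount_point appears in mounts_path as a Lustre mount."""
    with open(mounts_path, encoding="utf-8") as f:
        return mount_point in lustre_mounts(f.read())
```

For example, a line such as `172.31.0.10@tcp:/mountname /mnt/fsx lustre rw,noatime,flock 0 0` yields `{"/mnt/fsx": "172.31.0.10@tcp:/mountname"}`.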

Step 3: Run your analysis

Now that your file system has been created and mounted to a compute instance, you can use it to run your high-performance compute workload.

You can create a data repository association to link your file system to an Amazon S3 data repository. For more information, see Linking your file system to an S3 bucket (p. 25).

After you've linked your file system to an Amazon S3 data repository, you can export data that you've written to your file system back to your Amazon S3 bucket at any time. From a terminal on one of your compute instances, run the following command to export a file to your Amazon S3 bucket.

sudo lfs hsm_archive file_name

For more information on how to run this command quickly on a folder or a large collection of files, see Exporting files using HSM commands (p. 51).
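Running `lfs hsm_archive` once per file is slow for large trees. Because `lfs hsm_archive` accepts several file names in a single invocation, one option is to walk the directory and archive files in batches. The following Python sketch does that; the batch size of 100 and the helper names are assumptions made for illustration, not values from this guide.

```python
import os
import subprocess


def iter_files(root: str):
    """Yield every regular file under root, like `find root -type f`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files; `find -type f` skips them too.
            if os.path.isfile(path) and not os.path.islink(path):
                yield path


def archive_commands(paths, batch_size: int = 100):
    """Group paths into `sudo lfs hsm_archive` argument lists of at most batch_size files."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    batch = []
    for path in paths:
        batch.append(path)
        if len(batch) == batch_size:
            yield ["sudo", "lfs", "hsm_archive", *batch]
            batch = []
    if batch:
        yield ["sudo", "lfs", "hsm_archive", *batch]


def archive_tree(root: str = "/mnt/fsx", batch_size: int = 100) -> None:
    """Run each batched command, stopping at the first failure."""
    for argv in archive_commands(iter_files(root), batch_size):
        subprocess.run(argv, check=True)
```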


(Optional) Step 4: Check Amazon FSx file system status

You can view the status of an Amazon FSx file system by using the Amazon FSx console, the AWS CLI command describe-file-systems, or the API operation DescribeFileSystems.

File system status    Description

AVAILABLE             The file system is in a healthy state, and is reachable and available for use.
CREATING              Amazon FSx is creating a new file system.
DELETING              Amazon FSx is deleting an existing file system.
UPDATING              The file system is undergoing a customer-initiated update.
MISCONFIGURED         The file system is in a failed but recoverable state.
FAILED                This status can mean either of the following:
                      • The file system has failed and Amazon FSx can't recover it.
                      • When creating a new file system, Amazon FSx couldn't create the file system.
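In the DescribeFileSystems response, the status shown in this table is reported in the `Lifecycle` field of each file system. The following Python sketch polls until a file system is no longer CREATING or UPDATING and then returns the status. It takes the describe call as a parameter so that it can be exercised without AWS access; the function name, polling interval, and timeout are assumptions.

```python
import time
from typing import Callable

# Lifecycle values from the table above that Amazon FSx moves out of on its own.
IN_PROGRESS = {"CREATING", "UPDATING"}


def wait_for_lifecycle(describe: Callable[..., dict], file_system_id: str,
                       interval: float = 30.0, timeout: float = 3600.0) -> str:
    """Poll until the file system's Lifecycle is no longer CREATING or UPDATING,
    then return it (for example "AVAILABLE", "MISCONFIGURED", or "FAILED")."""
    deadline = time.monotonic() + timeout
    while True:
        response = describe(FileSystemIds=[file_system_id])
        state = response["FileSystems"][0]["Lifecycle"]
        if state not in IN_PROGRESS:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{file_system_id} is still {state} after {timeout} seconds")
        time.sleep(interval)
```

With boto3, you could call `wait_for_lifecycle(boto3.client("fsx").describe_file_systems, "fs-0123456789abcdef0")`, where the file system ID is a placeholder.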

Step 5: Clean up resources

After you have finished this exercise, you should follow these steps to clean up your resources and protect your AWS account.

To clean up resources

1. If you want to do a final export, run the following command.

nohup sh -c 'find /mnt/fsx -type f -print0 | xargs -0 -n 1 sudo lfs hsm_archive' &

2. On the Amazon EC2 console, terminate your instance. For more information, see Terminate Your Instance in the Amazon EC2 User Guide for Linux Instances.

3. On the Amazon FSx for Lustre console, delete your file system with the following procedure:

a. In the navigation pane, choose File systems.

b. Choose the file system that you want to delete from the list of file systems on the dashboard.

c. For Actions, choose Delete file system.

d. In the dialog box that appears, choose whether you want to take a final backup of the file system. Then provide the file system ID to confirm the deletion. Choose Delete file system.

4. If you created an Amazon S3 bucket for this exercise, and if you don't want to preserve the data you exported, you can now delete it. For more information, see Deleting a bucket in the Amazon Simple Storage Service User Guide.


Deployment options for FSx for Lustre file systems

FSx for Lustre provides a high-performance, parallel file system that stores data across multiple network file servers to maximize performance and reduce bottlenecks. Each of these servers has multiple disks. To spread load, Amazon FSx shards file system data into smaller chunks and spreads them across disks and servers using a process called striping. For more information about FSx for Lustre data striping, see Striping data in your file system (p. 66).

It's a best practice to link a highly durable, long-term data repository residing on Amazon S3 with your FSx for Lustre high-performance file system.

In this scenario, you store your datasets on the linked Amazon S3 data repository. When you create your FSx for Lustre file system, you link it to your S3 data repository. At this point, the objects in your S3 bucket are listed as files and directories on your FSx file system. Amazon FSx then automatically copies the file contents from S3 to your Lustre file system when a file is accessed for the first time on the Amazon FSx file system. After your compute workload runs, or at any time, you can use a data repository task to export changes back to S3. For more information, see Using data repositories with Amazon FSx for Lustre (p. 21) and Using data repository tasks to export data and metadata changes (p. 49).

File system deployment options for FSx for Lustre

Amazon FSx for Lustre provides two file system deployment options: scratch and persistent.

Note

Both deployment options support solid state drive (SSD) storage. However, hard disk drive (HDD) storage is supported only in one of the persistent deployment types.

You choose the file system deployment type when you create a new file system, using the AWS Management Console, the AWS Command Line Interface (AWS CLI), or the Amazon FSx for Lustre API. For more information, see Step 1: Create your Amazon FSx for Lustre file system (p. 10) and CreateFileSystem in the Amazon FSx API Reference.

Encryption of data at rest is automatically enabled when you create an Amazon FSx for Lustre file system, regardless of the deployment type you use. Scratch 2 and persistent file systems automatically encrypt data in transit when they are accessed from Amazon EC2 instances that support encryption in transit. For more information on encryption, see Data encryption in Amazon FSx for Lustre (p. 129).

Scratch file systems

Scratch file systems are designed for temporary storage and shorter-term processing of data. Data isn't replicated and doesn't persist if a file server fails. Scratch file systems provide high burst throughput of up to six times the baseline throughput of 200 MBps per TiB of storage capacity. For more information, see Aggregate file system performance (p. 62).
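As a worked example of these figures, baseline throughput scales with storage capacity at 200 MBps per TiB, and burst throughput can reach up to six times that baseline. The following Python sketch only multiplies the numbers quoted in this paragraph; the function name is hypothetical, and real throughput depends on the workload, as described in Aggregate file system performance.

```python
BASELINE_MBPS_PER_TIB = 200  # baseline quoted above for scratch file systems
BURST_MULTIPLIER = 6         # "up to six times the baseline"


def scratch_throughput(capacity_tib: float) -> tuple:
    """Return (baseline, maximum burst) throughput in MBps for the given capacity."""
    baseline = capacity_tib * BASELINE_MBPS_PER_TIB
    return baseline, baseline * BURST_MULTIPLIER
```

For a 1.2 TiB scratch file system, this gives a baseline of 240 MBps and a burst ceiling of 1,440 MBps.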

Scratch deployment types are built on Lustre 2.10. Use scratch file systems when you need cost-optimized storage for short-term, processing-heavy workloads.

The following diagram shows the architecture for an Amazon FSx for Lustre scratch file system.


On a scratch file system, file servers aren't replaced if they fail and data isn't replicated. If a file server or a storage disk becomes unavailable on a scratch file system, files stored on other servers are still accessible. If clients try to access data that is on the unavailable server or disk, clients experience an immediate I/O error.

The following table illustrates the availability or durability that scratch file systems of example sizes are designed for, over the course of a day and a week. Because larger file systems have more file servers and more disks, the probability of failure increases.

File system size (TiB)   Number of file servers   Availability/durability over one day   Availability/durability over one week
1.2                      2                        99.9%                                  99.4%
2.4                      2                        99.9%                                  99.4%
4.8                      3                        99.8%                                  99.2%
9.6                      5                        99.8%                                  98.6%
50.4                     22                       99.1%                                  93.9%

Persistent file systems

Persistent file systems are designed for longer-term storage and workloads. The file servers are highly available, and data is automatically replicated within the same Availability Zone in which the file system is located. The data volumes attached to the file servers are replicated independently from the file servers to which they are attached.

Amazon FSx continuously monitors persistent file systems for hardware failures, and automatically replaces infrastructure components in the event of a failure. On a persistent file system, if a file server becomes unavailable, it's replaced automatically within minutes of failure. During that time, client requests for data on that server transparently retry and eventually succeed after the file server is replaced. Data on persistent file systems is replicated on disks, and any failed disks are automatically replaced transparently.

Use persistent file systems for longer-term storage and for throughput-focused workloads that run for extended periods or indefinitely, and that might be sensitive to disruptions in availability.

The following diagram shows the architecture for an Amazon FSx for Lustre persistent file system, with replicated, highly available file servers and data volumes within a single Availability Zone.

Persistent deployment types automatically encrypt data in transit when they are accessed from Amazon EC2 instances that support encryption in transit.

Amazon FSx for Lustre supports two persistent deployment types, Persistent_1 and Persistent_2.

Persistent 1 deployment type

Persistent_1 deployment types can be built on Lustre 2.10 or 2.12, and support SSD (solid state drive) and HDD (hard disk drive) storage types. The Persistent_1 deployment type is well-suited for use cases that require longer-term storage, and have throughput-focused workloads that aren't latency-sensitive.


For a Persistent_1 file system with SSD storage, the throughput per unit of storage is either 50, 100, or 200 MB/s per tebibyte (TiB). For HDD storage, Persistent_1 throughput per unit of storage is 12 or 40 MB/s per TiB.

You can create Persistent_1 deployment types only using the AWS CLI and the Amazon FSx API.

Persistent_1 deployment types are available in all AWS Regions.

Persistent 2 deployment type

Persistent_2 is the latest generation of Persistent deployment type, and is best-suited for use cases that require longer-term storage, and have latency-sensitive workloads that require the highest levels of IOPS and throughput. Persistent_2 deployment types are built on Lustre v2.12 and support SSD storage.

They support higher levels of throughput per unit of storage than Persistent_1 file systems, with options of 125, 250, 500, and 1000 MB/s/TiB.
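Aggregate throughput for a persistent file system is its storage capacity multiplied by the per-unit throughput you choose. The following Python sketch encodes the per-unit options listed for Persistent_1 and Persistent_2 and computes that product, rejecting combinations this section does not list. The table layout and function name are hypothetical; in the CreateFileSystem API the per-unit value is set through the Lustre configuration's PerUnitStorageThroughput setting, so check the Amazon FSx API Reference for the authoritative list of valid values.

```python
# Per-unit storage throughput options (MB/s per TiB) listed in this section.
PER_UNIT_OPTIONS = {
    ("PERSISTENT_1", "SSD"): {50, 100, 200},
    ("PERSISTENT_1", "HDD"): {12, 40},
    ("PERSISTENT_2", "SSD"): {125, 250, 500, 1000},
}


def aggregate_throughput(deployment: str, storage: str,
                         per_unit: int, capacity_tib: float) -> float:
    """Return aggregate throughput in MB/s for a listed deployment/storage/per-unit combination."""
    allowed = PER_UNIT_OPTIONS.get((deployment, storage))
    if allowed is None:
        raise ValueError(f"{deployment} does not support {storage} storage")
    if per_unit not in allowed:
        raise ValueError(f"{per_unit} MB/s/TiB is not an option for {deployment} {storage}; "
                         f"choose one of {sorted(allowed)}")
    return per_unit * capacity_tib
```

For example, a 6 TiB Persistent_1 HDD file system at 40 MB/s/TiB provides 240 MB/s.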

You can create Persistent_2 deployment types using the Amazon FSx console, AWS Command Line Interface, and API. Persistent_2 deployment types are available in the following AWS Regions.

• US East (N. Virginia)

• US East (Ohio)

• US West (Oregon)

• Canada (Central)

• Europe (Ireland)

• Europe (Frankfurt)

• Asia Pacific (Tokyo)

For more information on FSx for Lustre performance, see Aggregate file system performance (p. 62).


Using data repositories with Amazon FSx for Lustre

FSx for Lustre provides high-performance file systems optimized for fast workload processing. It can support workloads such as machine learning, high performance computing (HPC), video processing, financial modeling, and electronic design automation (EDA). These workloads commonly require data to be presented using a scalable, high-speed file system interface for data access. They typically have datasets stored on long-term durable data repositories such as Amazon S3 or on-premises storage. FSx for Lustre is natively integrated with data repositories such as Amazon S3, making it easier to process datasets with the Lustre file system.

Note

File system backups are not supported on file systems that are linked to a data repository. For more information, see Working with backups (p. 95).

Topics

• Overview of data repositories (p. 21)

• Walkthrough: Attaching POSIX permissions when uploading objects into an Amazon S3 bucket (p. 23)

• Linking your file system to an S3 bucket (p. 25)

• Importing files from your data repository (p. 34)

• Data repository tasks (p. 39)

• Exporting changes to the data repository (p. 47)

• Releasing data from your file system (p. 51)

• Using Amazon FSx with your on-premises data repository (p. 51)

• Working with older deployment types (p. 52)

Overview of data repositories

When you use Amazon FSx with multiple durable storage repositories, you can ingest and process large volumes of file data in a high-performance file system by using automatic import and import data repository tasks. At the same time, you can write results to your data repositories by using automatic export or export data repository tasks. With these features, you can restart your workload at any time using the latest data stored in your data repository.

Note

Automatic export and multiple data repositories are supported only on Persistent 2 file systems. If you're using a file system with an older FSx for Lustre deployment type, see Working with older deployment types (p. 52).

Amazon FSx is deeply integrated with Amazon S3. This integration means that you can seamlessly access the objects stored in your Amazon S3 buckets from applications that mount your Amazon FSx file system. You can also run your compute-intensive workloads on Amazon EC2 instances in the AWS Cloud and export the results to your data repository after your workload is complete.

In Amazon FSx for Lustre, you can import file and directory listings from your linked data repositories to the file system using automatic import or using an import data repository task. When you turn on automatic import on a data repository association, your file system imports file metadata as files are created, modified, and/or deleted in the S3 data repository. Alternatively, you can import metadata for new or changed files and directories using an import data repository task. Both automatic import and import data repository tasks include POSIX metadata.

Note

Automatic import and import data repository tasks can be used simultaneously on a file system.

In order to access objects on the Amazon S3 data repository as files and directories on the file system, file and directory metadata must be loaded into the file system. You can load metadata from a linked data repository when you create a data repository association or load metadata for batches of files and directories that you want to access using the FSx for Lustre file system at a later time using an import data repository task.

You can also export files and their associated metadata in your file system to your durable data repository using automatic export or using an export data repository task. When you turn on automatic export on a data repository association, your file system exports file data and metadata as they are created, modified, or deleted. Alternatively, you can export a file or directory using an export data repository task. When you use an export data repository task, file data and metadata that were created or modified since the last such task are exported. Both automatic export and export data repository tasks include POSIX metadata.

Note

Automatic export and export data repository tasks can't be used simultaneously on a file system.

Amazon FSx also supports cloud bursting workloads with on-premises file systems by enabling you to copy data from on-premises clients using AWS Direct Connect or VPN.

Important

If you have linked one or more Amazon FSx file systems to a durable data repository on Amazon S3, don't delete the Amazon S3 bucket until you have deleted all linked file systems.

POSIX metadata support for data repositories

Amazon FSx for Lustre automatically transfers Portable Operating System Interface (POSIX) metadata for files, directories, and symbolic links (symlinks) when importing and exporting data to and from a linked durable data repository on Amazon S3. When you export changes in your file system to its linked data repository, Amazon FSx also exports POSIX metadata changes along with data changes. Because of this metadata export, you can implement and maintain access controls between your FSx for Lustre file system and its data repository on S3.

Amazon FSx imports only S3 objects that have POSIX-compliant object keys, such as the following.

test/mydir/

test/

Amazon FSx stores directories and symlinks as separate objects in the linked data repository on S3. For directories, Amazon FSx creates an S3 object with a key name that ends with a slash ("/"), as follows:

• The S3 object key test/mydir/ maps to the Amazon FSx directory test/mydir.

• The S3 object key test/ maps to the Amazon FSx directory test.

For symlinks, FSx for Lustre uses the following Amazon S3 schema:

• S3 object key – The path to the link, relative to the Amazon FSx mount directory

• S3 object data – The target path of this symlink

• S3 object metadata – The metadata for the symlink
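The directory and symlink rules above can be expressed as a small mapping. The following Python sketch returns the S3 object key for a path relative to the mount directory (with a trailing slash for directories) and the key and body for a symlink object. It is an illustration of the stated rules only; the function names are hypothetical and are not part of any AWS SDK, and encoding the symlink target as UTF-8 is an assumption.

```python
import posixpath


def s3_key_for(rel_path: str, is_dir: bool) -> str:
    """S3 object key for a path relative to the FSx mount directory.
    Directory keys end with "/"; file and symlink keys do not."""
    key = posixpath.normpath(rel_path).lstrip("/")
    return key + "/" if is_dir else key


def symlink_object(rel_link_path: str, target: str) -> tuple:
    """(key, body) for a symlink: the key is the link's own path and the
    object data is the link target (UTF-8 encoding is an assumption)."""
    return s3_key_for(rel_link_path, is_dir=False), target.encode("utf-8")
```

For example, `s3_key_for("test/mydir", True)` returns `"test/mydir/"`, matching the first mapping listed earlier in this section.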


Amazon FSx stores POSIX metadata, including ownership, permissions, and timestamps for Amazon FSx files, directories, and symbolic links, in S3 objects as follows:

• Content-Type – The HTTP entity header used to indicate the media type of the resource for web browsers.

• x-amz-meta-file-permissions – The file type and permissions in the format <octal file type><octal permission mask>, consistent with st_mode in the Linux stat(2) man page.

Note

FSx for Lustre does not import or retain setuid information.

• x-amz-meta-file-owner – The owner user ID (UID) expressed as an integer.

• x-amz-meta-file-group – The group ID (GID) expressed as an integer.

• x-amz-meta-file-atime – The last-accessed time in nanoseconds. Terminate the time value with ns; otherwise Amazon FSx interprets the value as milliseconds.

• x-amz-meta-file-mtime – The last-modified time in nanoseconds. Terminate the time value with ns; otherwise, Amazon FSx interprets the value as milliseconds.

• x-amz-meta-user-agent – The user agent, ignored during Amazon FSx import. During export, Amazon FSx sets this value to aws-fsx-lustre.
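If you upload objects yourself and want Amazon FSx to import their POSIX attributes, you need to produce these metadata values. The following Python sketch builds them in the formats listed above: the permissions value is st_mode written in octal with a leading 0, and times are nanoseconds with the ns suffix. The keys omit the x-amz-meta- prefix on the assumption that the AWS CLI or SDK adds it, Content-Type is left out because it's a standard header rather than user metadata, and the helper names are hypothetical.

```python
import os


def posix_metadata(mode: int, uid: int, gid: int, atime_ns: int, mtime_ns: int) -> dict:
    """Return S3 user metadata (without the x-amz-meta- prefix) in the formats above."""
    return {
        "user-agent": "aws-fsx-lustre",                # ignored on import, set as in the walkthrough
        "file-permissions": "0" + format(mode, "o"),  # e.g. 0o100664 -> "0100664"
        "file-owner": str(uid),
        "file-group": str(gid),
        "file-atime": f"{atime_ns}ns",  # the ns suffix marks the value as nanoseconds
        "file-mtime": f"{mtime_ns}ns",
    }


def posix_metadata_for_path(path: str) -> dict:
    """Build the metadata from a local file without following symlinks."""
    st = os.lstat(path)
    return posix_metadata(st.st_mode, st.st_uid, st.st_gid, st.st_atime_ns, st.st_mtime_ns)
```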

The default POSIX permission that FSx for Lustre assigns to a file is 755. This permission allows read and execute access for all users and write access for the owner of the file.
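To read a file-permissions value back, you can parse it as octal and split it into the file type and the permission bits. The following Python sketch does this with the standard stat module and shows that 755 on a regular file corresponds to rwxr-xr-x. The helper name is hypothetical.

```python
import stat


def describe_permissions(value: str) -> tuple:
    """Split a file-permissions value such as '0100664' into
    (file type, permission bits in octal, ls-style mode string)."""
    mode = int(value, 8)
    if stat.S_ISDIR(mode):
        kind = "directory"
    elif stat.S_ISLNK(mode):
        kind = "symlink"
    elif stat.S_ISREG(mode):
        kind = "regular file"
    else:
        kind = "other"
    return kind, format(stat.S_IMODE(mode), "03o"), stat.filemode(mode)
```

For example, `describe_permissions("0100755")` returns `("regular file", "755", "-rwxr-xr-x")`.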

Note

Amazon FSx doesn't retain any user-defined custom metadata on S3 objects.

Walkthrough: Attaching POSIX permissions when uploading objects into an Amazon S3 bucket

The following procedure walks you through the process of uploading objects into Amazon S3 with POSIX permissions. Doing so allows you to import the POSIX permissions when you create an Amazon FSx file system that is linked to that S3 bucket.

To upload objects with POSIX permissions to Amazon S3

1. From your local computer or machine, use the following example commands to create a test directory (s3cptestdir) and file (s3cptest.txt) that will be uploaded to the S3 bucket.

$ mkdir s3cptestdir

$ echo "S3cp metadata import test" >> s3cptestdir/s3cptest.txt

$ ls -ld s3cptestdir/ s3cptestdir/s3cptest.txt
drwxr-xr-x 3 500 500 96 Jan 8 11:29 s3cptestdir/
-rw-r--r-- 1 500 500 26 Jan 8 11:29 s3cptestdir/s3cptest.txt

The newly created file and directory have a file owner user ID (UID) and group ID (GID) of 500 and permissions as shown in the preceding example.

2. Call the Amazon S3 API to create the directory s3cptestdir with metadata permissions. You must specify the directory name with a trailing slash (/). For information about supported POSIX metadata, see POSIX metadata support for data repositories (p. 22).

Replace bucket_name with the actual name of your S3 bucket.

$ aws s3api put-object --bucket bucket_name --key s3cptestdir/ --metadata '{"user-agent":"aws-fsx-lustre",
    "file-atime":"1595002920000000000ns", "file-owner":"500", "file-permissions":"0100664", "file-group":"500",
    "file-mtime":"1595002920000000000ns"}'

3. Verify the POSIX permissions are tagged to S3 object metadata.

$ aws s3api head-object --bucket bucket_name --key s3cptestdir/

{
    "AcceptRanges": "bytes",
    "LastModified": "Fri, 08 Jan 2021 17:32:27 GMT",
    "ContentLength": 0,
    "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"",
    "VersionId": "bAlhCoWq7aIEjc3R6Myc6UOb8sHHtJkR",
    "ContentType": "binary/octet-stream",
    "Metadata": {
        "user-agent": "aws-fsx-lustre",
        "file-atime": "1595002920000000000ns",
        "file-owner": "500",
        "file-permissions": "0100664",
        "file-group": "500",
        "file-mtime": "1595002920000000000ns"
    }
}

4. Upload the test file (created in step 1) from your computer to the S3 bucket with metadata permissions.

$ aws s3 cp s3cptestdir/s3cptest.txt s3://bucket_name/s3cptestdir/s3cptest.txt \
    --metadata '{"user-agent":"aws-fsx-lustre", "file-atime":"1595002920000000000ns",
    "file-owner":"500", "file-permissions":"0100664", "file-group":"500",
    "file-mtime":"1595002920000000000ns"}'

5. Verify the POSIX permissions are tagged to S3 object metadata.

$ aws s3api head-object --bucket bucket_name --key s3cptestdir/s3cptest.txt
{
    "AcceptRanges": "bytes",
    "LastModified": "Fri, 08 Jan 2021 17:33:35 GMT",
    "ContentLength": 26,
    "ETag": "\"eb33f7e1f44a14a8e2f9475ae3fc45d3\"",
    "VersionId": "w9ztRoEhB832m8NC3a_JTlTyIx7Uzql6",
    "ContentType": "text/plain",
    "Metadata": {
        "user-agent": "aws-fsx-lustre",
        "file-atime": "1595002920000000000ns",
        "file-owner": "500",
        "file-permissions": "0100664",
        "file-group": "500",
        "file-mtime": "1595002920000000000ns"
    }
}

6. Verify permissions on the Amazon FSx file system linked to the S3 bucket.

$ sudo lfs df -h /fsx
UUID                       bytes    Used  Available Use% Mounted on
3rnxfbmv-MDT0000_UUID      34.4G    6.1M      34.4G   0% /fsx[MDT:0]
3rnxfbmv-OST0000_UUID       1.1T    4.5M       1.1T   0% /fsx[OST:0]

filesystem_summary:         1.1T    4.5M       1.1T   0% /fsx

$ cd /fsx/


$ ls -ld s3cptestdir/

drw-rw-r-- 2 500 500 25600 Jan 8 17:33 s3cptestdir/

$ ls -ld s3cptestdir/s3cptest.txt

-rw-rw-r-- 1 500 500 26 Jan 8 17:33 s3cptestdir/s3cptest.txt

Both the s3cptestdir directory and the s3cptest.txt file have POSIX permissions imported.
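The uploads in steps 2 and 4 can also be made from Python with boto3, whose `put_object` and `upload_file` calls accept user metadata without the x-amz-meta- prefix. The following sketch reuses the metadata values from the walkthrough and assumes that AWS credentials are configured. The function name is hypothetical, and the S3 client is passed in so that the sketch can be exercised without AWS access.

```python
from typing import Optional

# Metadata values used in the walkthrough above (S3 adds the x-amz-meta- prefix).
WALKTHROUGH_METADATA = {
    "user-agent": "aws-fsx-lustre",
    "file-atime": "1595002920000000000ns",
    "file-owner": "500",
    "file-permissions": "0100664",
    "file-group": "500",
    "file-mtime": "1595002920000000000ns",
}


def upload_with_posix_metadata(s3, bucket: str, local_file: str,
                               prefix: str = "s3cptestdir/",
                               metadata: Optional[dict] = None) -> None:
    """Create the directory object for `prefix`, then upload `local_file` under it."""
    metadata = WALKTHROUGH_METADATA if metadata is None else metadata
    # As in step 2: a directory is an object whose key ends with "/".
    s3.put_object(Bucket=bucket, Key=prefix, Metadata=metadata)
    # As in step 4: upload the file with the same metadata.
    key = prefix + local_file.rsplit("/", 1)[-1]
    s3.upload_file(local_file, bucket, key, ExtraArgs={"Metadata": metadata})
```

A typical call would be `upload_with_posix_metadata(boto3.client("s3"), "bucket_name", "s3cptestdir/s3cptest.txt")`, with bucket_name replaced as in the walkthrough.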

Linking your file system to an S3 bucket

You can link your Amazon FSx for Lustre file system to data repositories in Amazon S3. You can create the link when creating the file system or at any time after the file system has been created.

A link between a directory on the file system and an S3 bucket or prefix is called a data repository association (DRA). You can configure a maximum of 8 data repository associations on an Amazon FSx file system. A maximum of 8 DRA requests can be queued, but only one request can be worked on at a time for the file system. Each DRA must have a unique Amazon FSx file system directory and a unique S3 bucket or prefix associated with it.

Note

Data repository associations, automatic export, and multiple data repositories are only supported on Persistent 2 file systems. If you're using a file system with an older FSx for Lustre deployment type, see Working with older deployment types (p. 52).

In order to access objects on the S3 data repository as files and directories on the file system, file and directory metadata must be loaded into the file system. You can load metadata from a linked data repository when you create the DRA or load metadata for batches of files and directories that you want to access using the FSx for Lustre file system at a later time using an import data repository task.

You can configure a DRA for automatic import only, for automatic export only, or for both. A data repository association configured with both automatic import and automatic export propagates data in both directions between the file system and the linked S3 bucket. As you make changes to data in your S3 bucket, Amazon FSx detects the changes and then automatically imports the changes to your file system. As you create, modify, or delete files, Amazon FSx automatically exports the changes to Amazon S3 asynchronously once your application finishes modifying the file.

Note

Don't modify the same file in the S3 bucket and on the Lustre file system at the same time; otherwise, the behavior is undefined.

When you create a data repository association, you can configure the following properties:

• File system path – Enter a local path on the file system that points to a directory (such as /ns1/) or subdirectory (such as /ns1/subdir/) that will be mapped one-to-one with the specified data repository path below. The leading forward slash in the name is required. Two data repository associations cannot have overlapping file system paths. For example, if a data repository is associated with file system path /ns1, then you cannot link another data repository with file system path /ns1/ns2.

Note

If you specify only a forward slash (/) as the file system path, you can link only one data repository to the file system. You can only specify "/" as the file system path for the first data repository associated with a file system.

• Data repository path – Enter a path in the S3 data repository. The path can be an S3 bucket or prefix in the format s3://myBucket/myPrefix/. This property specifies where in the S3 data repository files will be imported from or exported to. FSx for Lustre will append a trailing "/" to your data repository path if you don't provide one. For example, if you provide a data repository path of s3://myBucket/myPrefix, FSx for Lustre will interpret it as s3://myBucket/myPrefix/.

Two data repository associations cannot have overlapping data repository paths. For example, if a data repository with path s3://myBucket/myPrefix/ is linked to the file system, then you cannot create another data repository association with data repository path s3://myBucket/myPrefix/mySubPrefix.

• Import metadata from repository – You can select this option to import metadata from the entire data repository immediately after creating the data repository association. Alternatively, you can run an import data repository task to load all or a subset of the metadata from the linked data repository into the ﬁle system at any time after the data repository association is created.

• Import settings – Choose an import policy that specifies the type of updated objects (any combination of new, changed, and deleted) that will be automatically imported from the linked S3 bucket to your file system. Automatic import (new, changed, deleted) is turned on by default when you add a data repository from the console, but is disabled by default when using the AWS CLI or Amazon FSx API.

• Export settings – Choose an export policy that specifies the type of updated objects (any combination of new, changed, and deleted) that will be automatically exported to the S3 bucket. Automatic export (new, changed, deleted) is turned on by default when you add a data repository from the console, but is disabled by default when using the AWS CLI or Amazon FSx API.

The File system path and Data repository path settings provide a 1:1 mapping where Amazon FSx exports data from your FSx for Lustre file system back to the corresponding prefixes on the S3 bucket that it was imported from.
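The limit of 8 associations and the no-overlap rules for file system paths and data repository paths can be checked before you submit a request. The following Python sketch adds a trailing slash to both kinds of path (as FSx for Lustre does for data repository paths) and treats two paths as overlapping when one equals or contains the other. It's a hypothetical client-side check, not an Amazon FSx API; the service performs its own validation.

```python
MAX_ASSOCIATIONS = 8  # limit stated earlier in this section


def _with_slash(path: str) -> str:
    # Compare paths with a trailing "/" so that /ns1 and /ns10 aren't treated as nested.
    return path if path.endswith("/") else path + "/"


def paths_overlap(a: str, b: str) -> bool:
    """True if the two paths are equal or one contains the other."""
    a, b = _with_slash(a), _with_slash(b)
    return a.startswith(b) or b.startswith(a)


def find_conflicts(existing: list, fs_path: str, repo_path: str) -> list:
    """Check a proposed association against existing (fs_path, repo_path) pairs."""
    problems = []
    if len(existing) >= MAX_ASSOCIATIONS:
        problems.append(f"the file system already has {MAX_ASSOCIATIONS} associations")
    for old_fs, old_repo in existing:
        if paths_overlap(old_fs, fs_path):
            problems.append(f"file system path {fs_path} overlaps {old_fs}")
        if paths_overlap(old_repo, repo_path):
            problems.append(f"data repository path {repo_path} overlaps {old_repo}")
    return problems
```

Because every path contains "/", this check also reflects the rule that "/" can only be used when it is the sole association.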

Region and account support for linked S3 buckets

When you create links to S3 buckets, keep in mind the following Region and account support limitations:

• Automatic export supports cross-Region configurations. The Amazon FSx file system and the linked S3 bucket can be located in the same AWS Region or in different AWS Regions.

• Automatic import does not support cross-Region configurations. Both the Amazon FSx file system and the linked S3 bucket must be located in the same AWS Region.

• Both automatic export and automatic import support cross-account configurations. The Amazon FSx file system and the linked S3 bucket can belong to the same AWS account or to different AWS accounts.
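When you create an association with the AWS CLI or API, automatic import and automatic export are off unless you request them, and automatic import requires the bucket to be in the file system's Region. The following Python sketch builds keyword arguments for boto3's `create_data_repository_association` with both policies set to new, changed, and deleted events, and it refuses a cross-Region automatic import. The parameter names (FileSystemPath, DataRepositoryPath, BatchImportMetaDataOnCreate, and the S3 AutoImportPolicy and AutoExportPolicy events) are given to the best of our understanding of the CreateDataRepositoryAssociation API, so confirm them against the Amazon FSx API Reference. The helper name and the Region arguments are hypothetical.

```python
EVENTS = ["NEW", "CHANGED", "DELETED"]


def dra_request(file_system_id: str, fs_path: str, repo_path: str,
                fs_region: str, bucket_region: str,
                auto_import: bool = True, auto_export: bool = True,
                import_metadata_now: bool = True) -> dict:
    """Build CreateDataRepositoryAssociation arguments (boto3 naming)."""
    if auto_import and fs_region != bucket_region:
        raise ValueError("automatic import needs the bucket in the file system's Region")
    s3_config = {}
    if auto_import:
        s3_config["AutoImportPolicy"] = {"Events": list(EVENTS)}
    if auto_export:
        s3_config["AutoExportPolicy"] = {"Events": list(EVENTS)}
    request = {
        "FileSystemId": file_system_id,
        "FileSystemPath": fs_path,
        "DataRepositoryPath": repo_path,
        # Corresponds to "Import metadata from repository" described above.
        "BatchImportMetaDataOnCreate": import_metadata_now,
    }
    if s3_config:
        request["S3"] = s3_config
    return request
```

You would then pass the result to `boto3.client("fsx").create_data_repository_association(**request)`.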

Creating a link to an S3 bucket

The following procedures walk you through the process of creating a data repository association for a Persistent 2 file system to an existing S3 bucket, using the AWS Management Console and AWS Command Line Interface (AWS CLI).

Note

Data repositories cannot be linked to file systems that have file system backups enabled. Disable backups before linking to a data repository.

To link an S3 bucket while creating a file system (console)

1. Open the Amazon FSx console at https://console.aws.amazon.com/fsx/.

2. Follow the procedure for creating a new file system described in Step 1: Create your Amazon FSx for Lustre file system (p. 10) in the Getting Started section.

3. Open the Data Repository Import/Export - optional section. The feature is disabled by default.

