FSx for Lustre
Lustre User Guide
Copyright © Amazon Web Services, Inc. and/or its affiliates. All rights reserved.
Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be affiliated with, connected to, or sponsored by Amazon.
Table of Contents
What is Amazon FSx for Lustre? ... 1
Multiple deployment options ... 1
Multiple storage options ... 1
FSx for Lustre and data repositories ... 2
FSx for Lustre S3 data repository integration ... 2
FSx for Lustre and on-premises data repositories ... 2
Accessing file systems ... 2
Integrations with AWS services ... 3
Security and compliance ... 3
Assumptions ... 3
Pricing for Amazon FSx for Lustre ... 4
Amazon FSx for Lustre forums ... 4
Are you a first-time user of Amazon FSx for Lustre? ... 4
Setting up ... 5
Sign up for AWS ... 5
Create an IAM user ... 5
Adding permissions to use data repositories in Amazon S3 ... 7
How FSx for Lustre checks access to S3 buckets ... 7
Next step ... 9
Getting started ... 10
Prerequisites ... 10
Step 1: Create your Amazon FSx for Lustre file system ... 10
Step 2: Install the Lustre client ... 14
Step 3: Run your analysis ... 15
(Optional) Step 4: Check Amazon FSx file system status ... 16
Step 5: Clean up resources ... 16
File system deployment options ... 17
Deployment options ... 17
Scratch file systems ... 17
Persistent file systems ... 18
Persistent 1 deployment type ... 19
Persistent 2 deployment type ... 20
Using data repositories ... 21
Overview of data repositories ... 21
POSIX Metadata Support ... 22
Attaching POSIX permissions to an S3 bucket ... 23
Linking your file system to an S3 bucket ... 25
Region and account support for linked S3 buckets ... 26
Creating a link to an S3 bucket ... 26
Working with server-side encrypted Amazon S3 buckets ... 32
Importing files from your data repository ... 34
Automatically import updates from your S3 bucket ... 35
Using data repository tasks to import metadata changes ... 37
Preloading files into your file system ... 38
Data repository tasks ... 39
Types of data repository tasks ... 39
Task status and details ... 39
Using data repository tasks ... 40
Working with task completion reports ... 45
Troubleshooting import and export failures ... 45
Exporting changes to the data repository ... 47
Automatically export updates to your S3 bucket ... 48
Using data repository tasks to export data and metadata changes ... 49
Exporting files using HSM commands ... 51
Releasing data from your file system ... 51
Using Amazon FSx with your on-premises data repository ... 51
Working with older deployment types ... 52
Link your file system to an Amazon S3 bucket ... 52
Automatically import updates from your S3 bucket ... 57
Performance ... 61
How FSx for Lustre file systems work ... 61
Aggregate file system performance ... 62
Example: Aggregate baseline and burst throughput ... 66
File system storage layout ... 66
Striping data in your file system ... 66
Modifying your striping configuration ... 67
Progressive file layouts ... 68
Monitoring performance and usage ... 69
Performance tips ... 69
Accessing file systems ... 71
Installing the Lustre client ... 71
Amazon Linux 2 and Amazon Linux ... 71
CentOS and Red Hat ... 72
Ubuntu ... 78
SUSE Linux ... 83
Mount from Amazon EC2 ... 85
Mounting from Amazon ECS ... 86
Mounting from an Amazon EC2 instance hosting Amazon ECS tasks ... 86
Mounting from a Docker container ... 87
Mounting from on-premises or another VPC ... 88
Mounting Amazon FSx automatically ... 89
Automount using /etc/fstab ... 89
Mounting specific filesets ... 91
Unmounting file systems ... 91
Using EC2 Spot Instances ... 92
Handling Amazon EC2 Spot Instance interruptions ... 92
Administering file systems ... 95
Backups ... 95
Backup support in FSx for Lustre ... 96
Working with automatic daily backups ... 96
Working with user-initiated backups ... 96
Using AWS Backup with Amazon FSx ... 97
Copying backups ... 98
Restoring backups ... 100
Deleting backups ... 100
Storage quotas ... 101
Quota enforcement ... 101
Types of quotas ... 101
Quota limits and grace periods ... 102
Setting and viewing quotas ... 102
Quotas and Amazon S3 linked buckets ... 104
Quotas and restoring backups ... 105
Storage and throughput capacity ... 105
Important points to know when increasing storage capacity ... 105
When to increase storage and throughput capacity ... 106
How concurrent storage scaling and backup requests are handled ... 106
How to increase storage capacity ... 106
Monitoring storage capacity increases ... 108
Data compression ... 110
Managing data compression ... 111
Compressing previously written files ... 113
Viewing file sizes ... 113
Using CloudWatch metrics ... 113
Tag your resources ... 114
Tag basics ... 114
Tagging your resources ... 115
Tag restrictions ... 115
Permissions and tag ... 115
Maintenance ... 116
Monitoring file systems ... 117
Monitoring with CloudWatch ... 117
Amazon FSx for Lustre dimensions ... 120
How to use Amazon FSx for Lustre metrics ... 120
Accessing CloudWatch metrics ... 121
Creating alarms ... 122
Logging with CloudWatch Logs ... 123
Logging overview ... 123
Log destinations ... 123
Managing logging ... 124
Viewing logs ... 125
Logging with AWS CloudTrail ... 125
Amazon FSx for Lustre information in CloudTrail ... 126
Understanding Amazon FSx for Lustre log file entries ... 126
Security ... 128
Data Protection ... 128
Data encryption ... 129
Internetwork traffic privacy ... 132
File System Access Control with Amazon VPC ... 132
Amazon VPC Security Groups ... 133
Lustre Client VPC Security Group Rules ... 135
Amazon VPC Network ACLs ... 136
IAM-based access control ... 137
Amazon FSx for Lustre resources and operations ... 137
Understanding resource ownership ... 137
Tag resources during creation ... 138
Managing access to Amazon FSx resources ... 139
Using service-linked roles ... 145
AWS managed policies ... 146
AmazonFSxDeleteServiceLinkedRoleAccess ... 147
AmazonFSxServiceRolePolicy ... 147
AmazonFSxFullAccess ... 149
AmazonFSxConsoleFullAccess ... 151
AmazonFSxConsoleReadOnlyAccess ... 152
AmazonFSxReadOnlyAccess ... 153
Policy updates ... 154
Compliance Validation ... 156
Quotas ... 158
Quotas that you can increase ... 158
Resource quotas for each file system ... 159
Additional considerations ... 160
Troubleshooting ... 161
File system mount fails right away ... 161
File system mount hangs and then fails with timeout error ... 161
Automatic mounting fails and the instance is unresponsive ... 162
File system mount fails during system boot ... 162
File system mount using DNS name fails ... 162
You can't access your file system ... 163
The Elastic IP address attached to the file system elastic network interface was deleted ... 163
The file system elastic network interface was modified or deleted ... 163
Unable to validate access to an S3 bucket when creating a data repository association ... 163
Cannot create a file system that is linked to an S3 bucket ... 164
Troubleshooting a misconfigured linked S3 bucket ... 165
Troubleshooting storage issues ... 166
Write error due to no space on storage target ... 166
Unbalanced storage on OSTs ... 166
Additional information ... 168
Setting up a custom backup schedule ... 168
Architecture overview ... 168
AWS CloudFormation template ... 169
Automated deployment ... 169
Additional options ... 170
Document history ... 171
What is Amazon FSx for Lustre?
FSx for Lustre makes it easy and cost-effective to launch and run the popular, high-performance Lustre file system. You use Lustre for workloads where speed matters, such as machine learning, high performance computing (HPC), video processing, and financial modeling.
The open-source Lustre file system is designed for applications that require fast storage—where you want your storage to keep up with your compute. Lustre was built to solve the problem of quickly and cheaply processing the world's ever-growing datasets. It's a widely used file system designed for the fastest computers in the world. It provides sub-millisecond latencies, up to hundreds of GBps of throughput, and up to millions of IOPS. For more information on Lustre, see the Lustre website.
As a fully managed service, Amazon FSx makes it easier for you to use Lustre for workloads where storage speed matters. FSx for Lustre eliminates the traditional complexity of setting up and managing Lustre file systems, enabling you to spin up and run a battle-tested high-performance file system in minutes. It also provides multiple deployment options so you can optimize cost for your needs.
FSx for Lustre is POSIX-compliant, so you can use your current Linux-based applications without having to make any changes. FSx for Lustre provides a native file system interface and works as any file system does with your Linux operating system. It also provides read-after-write consistency and supports file locking.
Topics
• Multiple deployment options (p. 1)
• Multiple storage options (p. 1)
• FSx for Lustre and data repositories (p. 2)
• Accessing FSx for Lustre file systems (p. 2)
• Integrations with AWS services (p. 3)
• Security and compliance (p. 3)
• Assumptions (p. 3)
• Pricing for Amazon FSx for Lustre (p. 4)
• Amazon FSx for Lustre forums (p. 4)
• Are you a first-time user of Amazon FSx for Lustre? (p. 4)
Multiple deployment options
Amazon FSx for Lustre offers a choice of scratch and persistent file systems to accommodate different data processing needs. Scratch file systems are ideal for temporary storage and shorter-term processing of data. Data is not replicated and does not persist if a file server fails. Persistent file systems are ideal for longer-term storage and throughput-focused workloads. In persistent file systems, data is replicated, and file servers are replaced if they fail. For more information, see Deployment options for FSx for Lustre file systems (p. 17).
Multiple storage options
Amazon FSx for Lustre offers a choice of solid state drive (SSD) and hard disk drive (HDD) storage types that are optimized for different data processing requirements:
• SSD storage options – For low-latency, IOPS-intensive workloads that typically have small, random file operations, choose one of the SSD storage options.
• HDD storage options – For throughput-intensive workloads that typically have large, sequential file operations, choose one of the HDD storage options.
If you are provisioning a file system with the HDD storage option, you can optionally provision a read-only SSD cache that is sized to 20 percent of your HDD storage capacity. This provides sub-millisecond latencies and higher IOPS for frequently accessed files. Both SSD-based and HDD-based file systems are provisioned with SSD-based metadata servers. As a result, all metadata operations, which represent the majority of file system operations, are delivered with sub-millisecond latencies.
For more information about performance of these storage options, see Amazon FSx for Lustre performance (p. 61).
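As a quick check of the sizing rule above, the following minimal sketch computes the optional read-only SSD cache size as 20 percent of the provisioned HDD capacity. The function name ssd_cache_size_tib is hypothetical and is not part of any AWS SDK or CLI; the only fact taken from this guide is the 20 percent ratio.

```python
def ssd_cache_size_tib(hdd_capacity_tib: float) -> float:
    """Return the read-only SSD cache size, in TiB, for an HDD file system.

    Assumption: the cache is sized to exactly 20 percent of the HDD
    storage capacity, as stated in this guide. The helper name is
    illustrative only.
    """
    if hdd_capacity_tib <= 0:
        raise ValueError("HDD storage capacity must be positive")
    # Round to avoid floating-point noise such as 1.2000000000000002.
    return round(hdd_capacity_tib * 0.20, 3)


if __name__ == "__main__":
    for capacity in (6.0, 12.0, 60.0):
        cache = ssd_cache_size_tib(capacity)
        print(f"{capacity} TiB HDD -> {cache} TiB SSD cache")
```

The cache size is derived from the HDD capacity rather than chosen separately, so a helper like this is useful only for planning and cost estimates.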
FSx for Lustre and data repositories
You can link FSx for Lustre file systems to data repositories on Amazon S3 or to on-premises data stores.
FSx for Lustre S3 data repository integration
FSx for Lustre integrates with Amazon S3, making it easier for you to process cloud datasets using the Lustre high-performance file system. When linked to an Amazon S3 bucket, an FSx for Lustre file system transparently presents S3 objects as files. Amazon FSx imports listings of all existing files in your S3 bucket at file system creation. Amazon FSx can also import listings of files added to the data repository after the file system is created. You can set the import preferences to match your workflow needs. The file system also makes it possible for you to write file system data back to S3. Data repository tasks simplify the transfer of data and metadata between your FSx for Lustre file system and its durable data repository on Amazon S3. For more information, see Using data repositories with Amazon FSx for Lustre (p. 21) and Data repository tasks (p. 39).
FSx for Lustre and on-premises data repositories
With Amazon FSx for Lustre, you can burst your data processing workloads from on-premises into the AWS Cloud by importing data using AWS Direct Connect or AWS VPN. For more information, see Using Amazon FSx with your on-premises data repository (p. 51).
Accessing FSx for Lustre file systems
You can mix and match the compute instance types and Linux Amazon Machine Images (AMIs) that are connected to a single FSx for Lustre file system.
Amazon FSx for Lustre file systems are accessible from compute workloads running on Amazon Elastic Compute Cloud (Amazon EC2) instances, on Amazon Elastic Container Service (Amazon ECS) Docker containers, and containers running on Amazon Elastic Kubernetes Service (Amazon EKS).
• Amazon EC2 – You access your file system from your Amazon EC2 compute instances using the open-source Lustre client. Amazon EC2 instances can access your file system from other Availability Zones within the same Amazon Virtual Private Cloud (Amazon VPC), provided your networking configuration provides for access across subnets within the VPC. After your Amazon FSx for Lustre file system is mounted, you can work with its files and directories just as you do using a local file system.
• Amazon EKS – You access Amazon FSx for Lustre from containers running on Amazon EKS using the open-source FSx for Lustre CSI driver, as described in Amazon EKS User Guide. Your containers running on Amazon EKS can use high-performance persistent volumes (PVs) backed by Amazon FSx for Lustre.
• Amazon ECS – You access Amazon FSx for Lustre from Amazon ECS Docker containers on Amazon EC2 instances. For more information, see Mounting from Amazon Elastic Container Service (p. 86).
Amazon FSx for Lustre is compatible with the most popular Linux-based AMIs, including Amazon Linux 2 and Amazon Linux, Red Hat Enterprise Linux (RHEL), CentOS, Ubuntu, and SUSE Linux. The Lustre client is included with Amazon Linux 2 and Amazon Linux. For RHEL, CentOS, and Ubuntu, an AWS Lustre client repository provides clients that are compatible with these operating systems.
Using FSx for Lustre, you can burst your compute-intensive workloads from on-premises into the AWS Cloud by importing data over AWS Direct Connect or AWS Virtual Private Network. You can access your Amazon FSx file system from on-premises, copy data into your file system as needed, and run compute-intensive workloads on in-cloud instances.
For more information on the clients, compute instances, and environments from which you can access FSx for Lustre file systems, see Accessing file systems (p. 71).
Integrations with AWS services
Amazon FSx for Lustre integrates with SageMaker as an input data source. When using SageMaker with FSx for Lustre, your machine learning training jobs are accelerated by eliminating the initial download step from Amazon S3. Additionally, your total cost of ownership (TCO) is reduced because iterative jobs on the same dataset avoid repeated downloads of common objects, which saves on S3 request costs. For more information, see What Is SageMaker? in the Amazon SageMaker Developer Guide.
FSx for Lustre integrates with AWS Batch using EC2 Launch Templates. AWS Batch enables you to run batch computing workloads on the AWS Cloud, including high performance computing (HPC), machine learning (ML), and other asynchronous workloads. AWS Batch automatically and dynamically sizes instances based on job resource requirements. For more information, see What Is AWS Batch? in the AWS Batch User Guide.
FSx for Lustre integrates with AWS ParallelCluster. AWS ParallelCluster is an AWS-supported open-source cluster management tool used to deploy and manage HPC clusters. It can automatically create FSx for Lustre file systems or use existing file systems during the cluster creation process.
Security and compliance
FSx for Lustre file systems support encryption at rest and in transit. Amazon FSx automatically encrypts file system data at rest using keys managed in AWS Key Management Service (AWS KMS). Data in transit is also automatically encrypted on file systems in certain AWS Regions when accessed from supported Amazon EC2 instances. For more information about data encryption in FSx for Lustre, including AWS Regions where encryption of data in transit is supported, see Data encryption in Amazon FSx for Lustre (p. 129). Amazon FSx has been assessed to comply with ISO, PCI-DSS, and SOC certifications, and is HIPAA eligible. For more information, see Security in FSx for Lustre (p. 128).
Assumptions
In this guide, we make the following assumptions:
• If you use Amazon Elastic Compute Cloud (Amazon EC2), we assume that you're familiar with that service. For more information on how to use Amazon EC2, see the Amazon EC2 documentation.
• We assume that you are familiar with using Amazon Virtual Private Cloud (Amazon VPC). For more information on how to use Amazon VPC, see the Amazon VPC User Guide.
• We assume that you haven't changed the rules on the default security group for your VPC based on the Amazon VPC service. If you have, make sure that you add the necessary rules to allow network traffic from your Amazon EC2 instance to your Amazon FSx for Lustre file system. For more details, see File System Access Control with Amazon VPC (p. 132).
Pricing for Amazon FSx for Lustre
With Amazon FSx for Lustre, there are no upfront hardware or software costs. You pay for only the resources used, with no minimum commitments, setup costs, or additional fees. For information about the pricing and fees associated with the service, see Amazon FSx for Lustre Pricing.
Amazon FSx for Lustre forums
If you encounter issues while using Amazon FSx for Lustre, check the forums.
Are you a first-time user of Amazon FSx for Lustre?
If you are a first-time user of Amazon FSx for Lustre, we recommend that you read the following sections in order:
1. If you're ready to create your first Amazon FSx for Lustre file system, try Getting started with Amazon FSx for Lustre (p. 10).
2. For information on performance, see Amazon FSx for Lustre performance (p. 61).
3. For information on linking your file system to an Amazon S3 bucket data repository, see Using data repositories with Amazon FSx for Lustre (p. 21).
4. For Amazon FSx for Lustre security details, see Security in FSx for Lustre (p. 128).
5. For information on the scalability limits of Amazon FSx for Lustre, including throughput and file system size, see Quotas (p. 158).
6. For information on the Amazon FSx for Lustre API, see the Amazon FSx for Lustre API Reference.
Setting up
Before you use Amazon FSx for Lustre for the first time, complete the following tasks:
1. Sign up for AWS (p. 5)
2. Create an IAM user (p. 5)
Sign up for AWS
When you sign up for Amazon Web Services, your AWS account is automatically signed up for all services in AWS, including Amazon FSx for Lustre.
If you have an AWS account already, skip to the next task. If you don't have an AWS account, use the following procedure to create one.
To create an AWS account
1. Open https://portal.aws.amazon.com/billing/signup.
2. Follow the online instructions.
Part of the sign-up procedure involves receiving a phone call and entering a verification code on the phone keypad.
Note your AWS account number, because you need it for the next task.
Create an IAM user
Services in AWS, such as Amazon FSx for Lustre, require that you provide credentials when you access them, so that the service can determine whether you have permissions to access its resources. AWS recommends that you don't use the root credentials of your AWS account to make requests. Instead, create an AWS Identity and Access Management (IAM) user and grant that user full access. We call these users administrator users.
You can use the administrator user credentials, instead of the root credentials of your account, to interact with AWS and perform tasks, such as creating users and granting them permissions. For more information, see Root Account Credentials vs. IAM User Credentials in the AWS General Reference and IAM Best Practices in the IAM User Guide.
If you signed up for AWS but have not created an IAM user for yourself, you can create one using the IAM Management Console.
To create an administrator user for yourself and add the user to an administrators group (console)
1. Sign in to the IAM console as the account owner by choosing Root user and entering your AWS account email address. On the next page, enter your password.
Note
We strongly recommend that you adhere to the best practice of using the Administrator IAM user that follows and securely lock away the root user credentials. Sign in as the root user only to perform a few account and service management tasks.
2. In the navigation pane, choose Users and then choose Add user.
3. For User name, enter Administrator.
4. Select the check box next to AWS Management Console access. Then select Custom password, and then enter your new password in the text box.
5. (Optional) By default, AWS requires the new user to create a new password when first signing in. You can clear the check box next to User must create a new password at next sign-in to allow the new user to reset their password after they sign in.
6. Choose Next: Permissions.
7. Under Set permissions, choose Add user to group.
8. Choose Create group.
9. In the Create group dialog box, for Group name enter Administrators.
10. Choose Filter policies, and then select AWS managed - job function to filter the table contents.
11. In the policy list, select the check box for AdministratorAccess. Then choose Create group.
Note
You must activate IAM user and role access to Billing before you can use the AdministratorAccess permissions to access the AWS Billing and Cost Management console. To do this, follow the instructions in step 1 of the tutorial about delegating access to the billing console.
12. Back in the list of groups, select the check box for your new group. Choose Refresh if necessary to see the group in the list.
13. Choose Next: Tags.
14. (Optional) Add metadata to the user by attaching tags as key-value pairs. For more information about using tags in IAM, see Tagging IAM entities in the IAM User Guide.
15. Choose Next: Review to see the list of group memberships to be added to the new user. When you are ready to proceed, choose Create user.
You can use this same process to create more groups and users and to give your users access to your AWS account resources. To learn about using policies that restrict user permissions to specific AWS resources, see Access management and Example policies.
To sign in as this new IAM user, first sign out of the AWS Management Console. Then use the following URL, where your_aws_account_id is your AWS account number without the hyphens (for example, if your AWS account number is 1234-5678-9012, your AWS account ID is 123456789012).
https://your_aws_account_id.signin.aws.amazon.com/console/
Enter the IAM user name and password that you just created. When you're signed in, the navigation bar displays your_user_name@your_aws_account_id.
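The conversion from account number to sign-in URL described above can be sketched in a few lines. This is a minimal illustration, not an AWS-provided utility: the function name iam_sign_in_url is hypothetical, and the check assumes that an AWS account ID is always 12 digits once the hyphens are removed.

```python
def iam_sign_in_url(account_number: str) -> str:
    """Build the IAM user sign-in URL from an AWS account number.

    The account number may contain hyphens (for example,
    1234-5678-9012); the URL requires the 12-digit ID without them.
    The function name is illustrative only.
    """
    account_id = account_number.replace("-", "").strip()
    if len(account_id) != 12 or not account_id.isdigit():
        raise ValueError("expected a 12-digit AWS account number")
    return f"https://{account_id}.signin.aws.amazon.com/console/"


if __name__ == "__main__":
    # Uses the example account number from this guide.
    print(iam_sign_in_url("1234-5678-9012"))
```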
If you don't want the URL for your sign-in page to contain your AWS account ID, you can create an account alias. To do so, from the IAM dashboard, choose Create Account Alias and enter an alias, such as your company name. To sign in after you create an account alias, use the following URL.
https://your_account_alias.signin.aws.amazon.com/console/
To verify the sign-in link for IAM users for your account, open the IAM console and check under AWS Account Alias on the dashboard.
Adding permissions to use data repositories in Amazon S3
Amazon FSx for Lustre is deeply integrated with Amazon S3. This integration means that applications that access your FSx for Lustre file system can also seamlessly access the objects stored in your linked Amazon S3 bucket. For more information, see Using data repositories with Amazon FSx for Lustre (p. 21).
To use data repositories, you must first allow Amazon FSx for Lustre certain IAM permissions in a role associated with the account for your administrator user.
To embed an inline policy for a role using the console
1. Sign in to the AWS Management Console and open the IAM console at https://console.aws.amazon.com/iam/.
2. In the navigation pane, choose Roles.
3. In the list, choose the name of the role to embed a policy in.
4. Choose the Permissions tab.
5. Scroll to the bottom of the page and choose Add inline policy.
Note
You can't embed an inline policy in a service-linked role in IAM. Because the linked service defines whether you can modify the permissions of the role, you might be able to add additional policies from the service console, API, or AWS CLI. To view the service-linked role documentation for a service, see AWS Services That Work with IAM and choose Yes in the Service-Linked Role column for your service.
6. Choose Creating Policies with the Visual Editor.
7. Add the following permissions policy statement.
{
  "Version": "2012-10-17",
  "Statement": {
    "Effect": "Allow",
    "Action": [
      "iam:CreateServiceLinkedRole",
      "iam:AttachRolePolicy",
      "iam:PutRolePolicy"
    ],
    "Resource": "arn:aws:iam::*:role/aws-service-role/s3.data-source.lustre.fsx.amazonaws.com/*"
  }
}
After you create an inline policy, it is automatically embedded in your role. For more information about service-linked roles, see Using service-linked roles for Amazon FSx for Lustre (p. 145).
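If you prefer to generate the policy document rather than paste it into the console, the statement shown above can be assembled and serialized with the Python standard library. This is a minimal sketch: the function name build_data_repository_policy and the constant name are hypothetical, and the actions and resource ARN are copied from the policy statement in this section.

```python
import json

# Resource ARN taken from the policy statement in this section.
SERVICE_LINKED_ROLE_ARN = (
    "arn:aws:iam::*:role/aws-service-role/"
    "s3.data-source.lustre.fsx.amazonaws.com/*"
)


def build_data_repository_policy() -> dict:
    """Return the inline policy document as a Python dictionary.

    The function name is illustrative; the contents mirror the
    statement shown in this guide.
    """
    return {
        "Version": "2012-10-17",
        "Statement": {
            "Effect": "Allow",
            "Action": [
                "iam:CreateServiceLinkedRole",
                "iam:AttachRolePolicy",
                "iam:PutRolePolicy",
            ],
            "Resource": SERVICE_LINKED_ROLE_ARN,
        },
    }


# Serialize to JSON text that can be pasted into the policy editor.
policy_json = json.dumps(build_data_repository_policy(), indent=2)

if __name__ == "__main__":
    print(policy_json)
```

The resulting JSON string is the same document you would paste into the console. It could also be supplied as the policy document to an SDK call such as the IAM PutRolePolicy operation, assuming your tooling is set up for that.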
How FSx for Lustre checks for access to linked S3 buckets
If the IAM role that you use to create the FSx for Lustre file system does not have the iam:AttachRolePolicy and iam:PutRolePolicy permissions, then Amazon FSx checks whether it can update your S3 bucket policy. Amazon FSx can update your bucket policy if the s3:PutBucketPolicy permission is included in your IAM role to allow the Amazon FSx file system to import or export data to your S3 bucket. If allowed to modify the bucket policy, Amazon FSx adds the following permissions to the bucket policy:
• s3:AbortMultipartUpload
• s3:DeleteObject
• s3:PutObject
• s3:Get*
• s3:List*
• s3:PutBucketNotification
• s3:PutBucketPolicy
• s3:DeleteBucketPolicy
If Amazon FSx can't modify the bucket policy, it then checks if the existing bucket policy grants Amazon FSx access to the bucket.
If all of these options fail, then the request to create the file system fails. The following diagram illustrates the checks that Amazon FSx follows when determining whether a file system can access the S3 bucket to which it will be linked.
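The sequence of checks described in this section can be sketched as a small decision function. This is an illustration of the order of the checks only, not how Amazon FSx is implemented. The names AccessPath and resolve_s3_access are hypothetical, and the first branch assumes that when the role holds both iam:AttachRolePolicy and iam:PutRolePolicy, access is granted through the role and no bucket policy change is needed.

```python
from enum import Enum
from typing import AbstractSet


class AccessPath(Enum):
    """Possible outcomes of the access checks (names are illustrative)."""
    ROLE_PERMISSIONS = "role holds iam:AttachRolePolicy and iam:PutRolePolicy"
    UPDATED_BUCKET_POLICY = "Amazon FSx updates the S3 bucket policy"
    EXISTING_BUCKET_POLICY = "existing bucket policy already grants access"
    CREATION_FAILS = "request to create the file system fails"


def resolve_s3_access(role_actions: AbstractSet[str],
                      bucket_policy_grants_fsx: bool) -> AccessPath:
    """Walk the checks in the order this section describes them."""
    # Check 1 (assumption): both IAM permissions are present on the role.
    if {"iam:AttachRolePolicy", "iam:PutRolePolicy"} <= role_actions:
        return AccessPath.ROLE_PERMISSIONS
    # Check 2: can Amazon FSx modify the bucket policy?
    if "s3:PutBucketPolicy" in role_actions:
        return AccessPath.UPDATED_BUCKET_POLICY
    # Check 3: does the existing bucket policy already grant access?
    if bucket_policy_grants_fsx:
        return AccessPath.EXISTING_BUCKET_POLICY
    # All options failed.
    return AccessPath.CREATION_FAILS


if __name__ == "__main__":
    print(resolve_s3_access({"s3:PutBucketPolicy"}, False).value)
```

Writing the checks as an ordered chain makes it easier to see which permission to add when file system creation fails because of S3 access.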
Next step
Getting started with Amazon FSx for Lustre (p. 10)
Getting started with Amazon FSx for Lustre
This section shows you how to get started using Amazon FSx for Lustre. The steps walk you through creating an Amazon FSx for Lustre file system and accessing it from your compute instances. Optionally, they show how to use your Amazon FSx for Lustre file system to process the data in your Amazon S3 bucket with your file-based applications.
This getting started exercise includes the following steps.
Topics
• Prerequisites (p. 10)
• Step 1: Create your Amazon FSx for Lustre file system (p. 10)
• Step 2: Install and configure the Lustre client on your instance before mounting your file system (p. 14)
• Step 3: Run your analysis (p. 15)
• (Optional) Step 4: Check Amazon FSx file system status (p. 16)
• Step 5: Clean up resources (p. 16)
Prerequisites
To perform this getting started exercise, you need the following:
• An AWS account with the permissions necessary to create an Amazon FSx for Lustre file system and an Amazon EC2 instance. For more information, see Setting up (p. 5).
• An Amazon EC2 instance running a supported Linux release in your virtual private cloud (VPC) based on the Amazon VPC service. You will install the Lustre client on this EC2 instance, and then mount your FSx for Lustre file system on the EC2 instance. The Lustre client supports Amazon Linux, Amazon Linux 2, CentOS and Red Hat Enterprise Linux 7.5, 7.6, 7.7, 7.8, 7.9, 8.2, and 8.3, SUSE Linux Enterprise Server 12 SP3, SP4, and SP5, and Ubuntu 16.04, 18.04, and 20.04. For this getting started exercise, we recommend using Amazon Linux 2.
When creating your Amazon EC2 instance for this getting started exercise, keep the following in mind:
• We recommend that you create your instance in your default VPC.
• We recommend that you use the default security group when creating your EC2 instance.
• An Amazon S3 bucket storing the data for your workload to process. The S3 bucket will be the linked durable data repository for your FSx for Lustre file system.
• Determine which type of Amazon FSx for Lustre file system you want to create, scratch or persistent. For more information, see File system deployment options for FSx for Lustre (p. 17).
Step 1: Create your Amazon FSx for Lustre file system
Next, you create your file system in the console.
To create your file system
1. Open the Amazon FSx console at https://console.aws.amazon.com/fsx/.
2. From the dashboard, choose Create file system to start the file system creation wizard.
3. Choose FSx for Lustre and then choose Next to display the Create File System page.
4. Provide the information in the File system details section:
• For File system name - optional, provide a name for your file system. You can use up to 256 Unicode letters, white space, and numbers plus the special characters + - = . _ : /.
• For Deployment and storage type, choose one of the options:
SSD storage is suited to low-latency, IOPS-intensive workloads that typically have small, random file operations. HDD storage is suited to throughput-intensive workloads that typically have large, sequential file operations.
For more information about storage types, see Multiple storage options (p. 1).
For more information about deployment types, see Deployment options for FSx for Lustre file systems (p. 17).
For more information about the AWS Regions where encrypting data in transit is available, see Encrypting data in transit (p. 130).
• Choose the Persistent, SSD deployment type for longer-term storage and for latency-sensitive workloads requiring the highest levels of IOPS/throughput. The file servers are highly available, data is automatically replicated within the file system's Availability Zone, and this type supports encrypting data in transit. Persistent, SSD uses Persistent 2, the latest generation of persistent file systems.
• Choose the Persistent, HDD deployment type for longer-term storage and for throughput-focused workloads that aren't latency-sensitive. The file servers are highly available, data is automatically replicated within the file system's Availability Zone, and this type supports encrypting data in transit. Persistent, HDD uses the Persistent 1 deployment type.
Choose with SSD cache to create an SSD cache that is sized to 20 percent of your HDD storage capacity to provide sub-millisecond latencies and higher IOPS for frequently accessed files.
• Choose the Scratch, SSD deployment type for temporary storage and shorter-term processing of data. Scratch, SSD uses Scratch 2 file systems, and offers in-transit encryption of data.
• Choose the amount of Throughput per unit of storage that you want for your file system. This option is only valid for Persistent deployment types.
Throughput per unit of storage is the amount of read and write throughput for each 1 tebibyte (TiB) of storage provisioned, in MB/s/TiB. You pay for the amount of throughput that you provision:
• For Persistent SSD storage, choose a value of either 125, 250, 500, or 1,000 MB/s/TiB.
• For Persistent HDD storage, choose a value of 12 or 40 MB/s/TiB.
For more information about throughput per unit storage and file system performance, see Aggregate file system performance (p. 62).
• For Storage capacity, set the amount of storage capacity for your file system, in TiB:
• For a Persistent, SSD deployment type, set this to a value of 1.2 TiB, 2.4 TiB, or increments of 2.4 TiB.
• For a Persistent, HDD deployment type, this value can be increments of 6.0 TiB for 12 MB/s/TiB file systems and increments of 1.8 TiB for 40 MB/s/TiB file systems.
You can increase the amount of storage capacity as needed after you create the file system. For
• For Data compression type, choose NONE to turn off data compression or choose LZ4 to turn on data compression with the LZ4 algorithm. For more information, see Lustre data compression (p. 110).
All FSx for Lustre file systems are built on Lustre version 2.12 when created using the Amazon FSx console.
5. In the Network & security section, provide the following networking and security group information:
• Choose the VPC that you want to associate with your file system. For this getting started exercise, choose the same VPC that you chose for your Amazon EC2 instance.
• For VPC security groups, the ID for the default security group for your VPC should be already added. If you're not using the default security group, make sure that the following inbound rule is added to the security group you're using for this getting started exercise.
Type      Protocol   Port range   Source                                 Description
All TCP   TCP        0-65535      Custom the_ID_of_this_security_group   Inbound Lustre traffic rule
The following screen capture shows an example of editing inbound rules.
• For Subnet, choose any value from the list of available subnets.
6. For the Encryption section, the options available vary depending upon which file system type you're creating:
• For a persistent file system, you can choose an AWS Key Management Service (AWS KMS) encryption key to encrypt the data on your file system at rest.
• For a scratch file system, data at rest is encrypted using the default Amazon FSx–managed key for your account.
• For scratch 2 and persistent file systems, data in transit is encrypted automatically when the file system is accessed from a supported Amazon EC2 instance type. For more information, see Encrypting data in transit (p. 130).
7. For the Data Repository Import/Export - optional section, linking your file system to Amazon S3 data repositories is disabled by default. For information about enabling this option and creating a data repository association to an existing S3 bucket, see To link an S3 bucket while creating a file system (console) (p. 26).
Important
• Selecting this option also disables backups and you won't be able to enable backups while creating the file system.
• If you link one or more Amazon FSx for Lustre file systems to an Amazon S3 bucket, don't delete the Amazon S3 bucket until all linked file systems have been deleted.
8. For Logging - optional, logging is enabled by default. When enabled, failures and warnings for data repository activity on your file system are logged to Amazon CloudWatch Logs. For information about configuring logging, see Managing logging (p. 124).
9. In Backup and maintenance - optional, you can do the following.
For daily automatic backups:
• Disable the Daily automatic backup. This option is enabled by default, unless you enabled Data Repository Import/Export.
• Set the start time for Daily automatic backup window.
• Set the Automatic backup retention period, from 1 to 35 days.
For more information, see Working with backups (p. 95).
10. Set the Weekly maintenance window start time, or keep it set to the default No preference.
11. Create any tags that you want to apply to your file system.
12. Choose Next to display the Create file system summary page.
13. Review the settings for your Amazon FSx for Lustre file system, and choose Create file system.
Now that you've created your file system, note its fully qualified domain name and mount name for a later step. You can find the fully qualified domain name and mount name for a file system by choosing the name of the file system in the File Systems dashboard, and then choosing Attach.
Step 2: Install and configure the Lustre client on your instance before mounting your file system
To mount your Amazon FSx for Lustre file system from your Amazon EC2 instance, first install the Lustre 2.10 client. The 2.10 versions of the Lustre client support Amazon FSx for Lustre versions 2.10 and 2.12.
To download the Lustre client onto your Amazon EC2 instance
1. Open a terminal on your client.
2. Determine which kernel is currently running on your compute instance by running the following command.
uname -r
3. Do one of the following:
• If the command returns 4.14.104-95.84.amzn2.x86_64 or higher for x86-based EC2 instances, or 4.14.181-142.260.amzn2.aarch64 or higher for Graviton2-based EC2 instances, download and install the Lustre client with the following command.
sudo amazon-linux-extras install -y lustre2.10
• If the command returns a result less than 4.14.104-95.84.amzn2.x86_64 for x86-based EC2 instances, or less than 4.14.181-142.260.amzn2.aarch64 for Graviton2-based EC2 instances, update the kernel and reboot your Amazon EC2 instance by running the following command.
sudo yum -y update kernel && sudo reboot
Confirm that the kernel has been updated using the uname -r command. Then download and install the Lustre client as described above.
For information about installing the Lustre client on other Linux distributions, see Installing the Lustre client (p. 71).
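The kernel comparison in step 3 can be scripted. The following is a minimal sketch, not part of the official procedure. The function name kernel_at_least is hypothetical, and the sketch assumes that comparing the first five numeric fields of the release string is enough to order Amazon Linux 2 kernel releases. The minimum shown is the x86 value from step 3; substitute the aarch64 value on Graviton2-based instances.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when kernel release $1 is at least release $2.
# Assumption: only the first five numeric fields (for example 4.14.104-95.84)
# matter when ordering Amazon Linux 2 kernel releases.
kernel_at_least() {
    awk -v have="$1" -v need="$2" 'BEGIN {
        split(have, h, /[.-]/)
        split(need, n, /[.-]/)
        for (i = 1; i <= 5; i++) {
            if (h[i] + 0 > n[i] + 0) exit 0
            if (h[i] + 0 < n[i] + 0) exit 1
        }
        exit 0
    }'
}

# Minimum x86 release taken from step 3 of the procedure.
min_x86="4.14.104-95.84.amzn2.x86_64"
if kernel_at_least "$(uname -r)" "$min_x86"; then
    echo "kernel is new enough: install the Lustre client"
else
    echo "kernel is older than $min_x86: update the kernel and reboot first"
fi
```

The script only reports which branch of step 3 applies; it doesn't install or update anything.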
To mount your file system
1. Make a directory for the mount point with the following command.
sudo mkdir -p /mnt/fsx
2. Mount the Amazon FSx for Lustre file system to the directory that you created. Use the following command and replace the following items:
• Replace file_system_dns_name with the actual file system's Domain Name System (DNS) name.
• Replace mountname with the file system's mount name, which you can get by running the describe-file-systems AWS CLI command or the DescribeFileSystems API operation.
sudo mount -t lustre -o noatime,flock file_system_dns_name@tcp:/mountname /mnt/fsx
This command mounts your file system with two options, -o noatime and flock:
• noatime – Turns off updates to inode access times. To update inode access times, use the mount command without noatime.
• flock – Enables file locking for your file system. If you don't want file locking enabled, use the mount command without flock.
3. Verify that the mount command was successful by listing the contents of the mount directory, /mnt/fsx, with the following command.
$ ls /mnt/fsx
import-path lustre
You can also use the df command, as follows.
$ df
Filesystem                    1K-blocks     Used  Available Use% Mounted on
devtmpfs                        1001808        0    1001808   0% /dev
tmpfs                           1019760        0    1019760   0% /dev/shm
tmpfs                           1019760      392    1019368   1% /run
tmpfs                           1019760        0    1019760   0% /sys/fs/cgroup
/dev/xvda1                      8376300  1263180    7113120  16% /
123.456.789.0@tcp:/mountname 3547698816    13824 3547678848   1% /mnt/fsx
tmpfs                            203956        0     203956   0% /run/user/1000
The results show the Amazon FSx file system mounted on /mnt/fsx.
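The same check can be automated. The following is a minimal sketch under stated assumptions: the function name lustre_mounted_at is hypothetical, and it assumes a mount table in the /proc/mounts layout (device, mount point, file system type, options) in which the FSx for Lustre mount is reported with the type lustre. The sample table in the example is illustrative, not captured output.

```shell
#!/bin/sh
# Hypothetical helper: reads a mount table in /proc/mounts format on stdin
# and succeeds if a file system of type "lustre" is mounted at path $1.
lustre_mounted_at() {
    awk -v mp="$1" '$2 == mp && $3 == "lustre" { found = 1 } END { exit !found }'
}

# Example with an illustrative sample table.
if printf '%s\n' \
    '/dev/xvda1 / xfs rw,noatime 0 0' \
    'file_system_dns_name@tcp:/mountname /mnt/fsx lustre rw,noatime,flock 0 0' |
    lustre_mounted_at /mnt/fsx; then
    echo "/mnt/fsx is a Lustre mount"
fi
```

On a client, you would redirect /proc/mounts into the function, for example lustre_mounted_at /mnt/fsx < /proc/mounts.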
Step 3: Run your analysis
Now that your file system has been created and mounted to a compute instance, you can use it to run your high-performance compute workload.
You can create a data repository association to link your file system to an Amazon S3 data repository. For more information, see Linking your file system to an S3 bucket (p. 25).
After you've linked your file system to an Amazon S3 data repository, you can export data that you've written to your file system back to your Amazon S3 bucket at any time. From a terminal on one of your compute instances, run the following command to export a file to your Amazon S3 bucket.
sudo lfs hsm_archive file_name
For more information on how to run this command on a folder or large collection of files quickly, see Exporting files using HSM commands (p. 51).
(Optional) Step 4: Check Amazon FSx file system status
You can view the status of an Amazon FSx file system by using the Amazon FSx console, the AWS CLI command describe-file-systems, or the API operation DescribeFileSystems.
File system status   Description
AVAILABLE            The file system is in a healthy state, and is reachable and available for use.
CREATING             Amazon FSx is creating a new file system.
DELETING             Amazon FSx is deleting an existing file system.
UPDATING             The file system is undergoing a customer-initiated update.
MISCONFIGURED        The file system is in a failed but recoverable state.
FAILED               This status can mean either of the following:
                     • The file system has failed and Amazon FSx can't recover it.
                     • When creating a new file system, Amazon FSx couldn't create the file system.
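A script that waits for a file system can branch on these status values. The following is a minimal sketch, not an official tool. The function names and the action names (ready, wait, investigate, stop) are hypothetical, and the fsx_status function assumes that the status is returned in the Lifecycle field of the first entry that describe-file-systems returns.

```shell
#!/bin/sh
# Hypothetical helper: maps a status from the preceding table to a suggested
# next step. The action names are illustrative, not AWS terms.
fsx_status_action() {
    case "$1" in
        AVAILABLE)          echo "ready" ;;
        CREATING|UPDATING)  echo "wait" ;;
        MISCONFIGURED)      echo "investigate" ;;
        DELETING|FAILED)    echo "stop" ;;
        *)                  echo "unknown" ;;
    esac
}

# Defined but not called here: fetches the status with the AWS CLI.
# Assumption: the status is in the Lifecycle field of the first result.
fsx_status() {
    aws fsx describe-file-systems --file-system-ids "$1" \
        --query 'FileSystems[0].Lifecycle' --output text
}

fsx_status_action AVAILABLE
```

With a configured AWS CLI, the two functions can be combined as fsx_status_action "$(fsx_status file_system_id)", where file_system_id is a placeholder for your file system ID.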
Step 5: Clean up resources
After you have finished this exercise, you should follow these steps to clean up your resources and protect your AWS account.
To clean up resources
1. If you want to do a final export, run the following command.
nohup find /mnt/fsx -type f -print0 | xargs -0 -n 1 sudo lfs hsm_archive &
2. On the Amazon EC2 console, terminate your instance. For more information, see Terminate Your Instance in the Amazon EC2 User Guide for Linux Instances.
3. On the Amazon FSx for Lustre console, delete your file system with the following procedure:
a. In the navigation pane, choose File systems.
b. Choose the file system that you want to delete from the list of file systems on the dashboard.
c. For Actions, choose Delete file system.
d. In the dialog box that appears, choose whether you want to take a final backup of the file system. Then provide the file system ID to confirm the deletion. Choose Delete file system.
4. If you created an Amazon S3 bucket for this exercise, and if you don't want to preserve the data you exported, you can now delete it. For more information, see Deleting a bucket in the Amazon Simple Storage Service User Guide.
Deployment options for FSx for Lustre file systems
FSx for Lustre provides a high performance, parallel file system that stores data across multiple network file servers to maximize performance and reduce bottlenecks. These servers have multiple disks. To spread load, Amazon FSx shards file system data into smaller chunks and spreads them across disks and servers using a process called striping. For more information about FSx for Lustre data striping, see Striping data in your file system (p. 66).
It's a best practice to link a highly durable long-term data repository residing on Amazon S3 with your FSx for Lustre high-performance file system.
In this scenario, you store your datasets on the linked Amazon S3 data repository. When you create your FSx for Lustre file system, you link it to your S3 data repository. At this point, the objects in your S3 bucket are listed as files and directories on your FSx file system. Amazon FSx then automatically copies the file contents from S3 to your Lustre file system when a file is accessed for the first time on the Amazon FSx file system. After your compute workload runs, or at any time, you can use a data repository task to export changes back to S3. For more information, see Using data repositories with Amazon FSx for Lustre (p. 21) and Using data repository tasks to export data and metadata changes (p. 49).
File system deployment options for FSx for Lustre
Amazon FSx for Lustre provides two file system deployment options: scratch and persistent.
Note
Both deployment options support solid state drive (SSD) storage. However, hard disk drive (HDD) storage is supported only in one of the persistent deployment types.
You choose the file system deployment type when you create a new file system, using the AWS Management Console, the AWS Command Line Interface (AWS CLI), or the Amazon FSx for Lustre API. For more information, see Step 1: Create your Amazon FSx for Lustre file system (p. 10) and CreateFileSystem in the Amazon FSx API Reference.
Encryption of data at rest is automatically enabled when you create an Amazon FSx for Lustre file system, regardless of the deployment type you use. Scratch 2 and persistent file systems automatically encrypt data in transit when they are accessed from Amazon EC2 instances that support encryption in transit. For more information on encryption, see Data encryption in Amazon FSx for Lustre (p. 129).
Scratch file systems
Scratch file systems are designed for temporary storage and shorter-term processing of data. Data isn't replicated and doesn't persist if a file server fails. Scratch file systems provide high burst throughput of up to six times the baseline throughput of 200 MBps per TiB of storage capacity. For more information, see Aggregate file system performance (p. 62).
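As a worked example of these figures, the following sketch computes the baseline throughput and the upper bound on burst throughput for a scratch file system of a given size. It is an illustration only: the function name scratch_throughput is hypothetical, and it simply multiplies the capacity by the 200 MB/s per TiB baseline and the six-times factor quoted above, without modeling any other performance factor.

```shell
#!/bin/sh
# Hypothetical helper: prints "<baseline> <burst ceiling>" in MB/s for a
# scratch file system of $1 TiB, using the 200 MB/s per TiB baseline and
# the six-times burst factor quoted in the text.
scratch_throughput() {
    awk -v tib="$1" 'BEGIN {
        base = tib * 200
        printf "%.0f %.0f\n", base, base * 6
    }'
}

scratch_throughput 1.2
scratch_throughput 4.8
```

For a 1.2 TiB file system this prints 240 1440, and for 4.8 TiB it prints 960 5760.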
Scratch deployment types are built on Lustre 2.10. Use scratch file systems when you need cost-optimized storage for short-term, processing-heavy workloads.
The following diagram shows the architecture for an Amazon FSx for Lustre scratch file system.
On a scratch file system, file servers aren't replaced if they fail and data isn't replicated. If a file server or a storage disk becomes unavailable on a scratch file system, files stored on other servers are still accessible. If clients try to access data that is on the unavailable server or disk, clients experience an immediate I/O error.
The following table illustrates the availability or durability that scratch file systems of example sizes are designed for, over the course of a day and a week. Because larger file systems have more file servers and more disks, the probabilities of failure are increased.
File system size (TiB)   Number of file servers   Availability/durability over one day   Availability/durability over one week
1.2                      2                        99.9%                                  99.4%
2.4                      2                        99.9%                                  99.4%
4.8                      3                        99.8%                                  99.2%
9.6                      5                        99.8%                                  98.6%
50.4                     22                       99.1%                                  93.9%
Persistent file systems
Persistent file systems are designed for longer-term storage and workloads. The file servers are highly available, and data is automatically replicated within the same Availability Zone in which the file system is located. The data volumes attached to the file servers are replicated independently from the file servers to which they are attached.
Amazon FSx continuously monitors persistent file systems for hardware failures, and automatically replaces infrastructure components in the event of a failure. On a persistent file system, if a file server becomes unavailable, it's replaced automatically within minutes of failure. During that time, client requests for data on that server transparently retry and eventually succeed after the file server is replaced. Data on persistent file systems is replicated on disks, and any failed disks are automatically replaced transparently.
Use persistent file systems for longer-term storage and for throughput-focused workloads that run for extended periods or indefinitely, and that might be sensitive to disruptions in availability.
The following diagram shows the architecture for an Amazon FSx for Lustre persistent file system, with replicated, highly available file servers and data volumes within a single Availability Zone.
Persistent deployment types automatically encrypt data in transit when they are accessed from Amazon EC2 instances that support encryption in transit.
Amazon FSx for Lustre supports two persistent deployment types, Persistent_1 and Persistent_2.
Persistent 1 deployment type
Persistent_1 deployment types can be built on Lustre 2.10 or 2.12, and support SSD (solid state drive) and HDD (hard disk drive) storage types. The Persistent_1 deployment type is well-suited for use cases that require longer-term storage, and have throughput-focused workloads that aren't latency-sensitive.
For a Persistent_1 file system with SSD storage, the throughput per unit of storage is either 50, 100, or 200 MB/s per tebibyte (TiB). For HDD storage, Persistent_1 throughput per unit of storage is 12 or 40 MB/s per TiB.
You can create Persistent_1 deployment types only using the AWS CLI and the Amazon FSx API.
Persistent_1 deployment types are available in all AWS Regions.
Persistent 2 deployment type
Persistent_2 is the latest generation of Persistent deployment type, and is best-suited for use cases that require longer-term storage, and have latency-sensitive workloads that require the highest levels of IOPS and throughput. Persistent_2 deployment types are built on Lustre v2.12 and support SSD storage.
They support higher levels of throughput per unit storage as compared to Persistent_1 file systems, with options of 125, 250, 500, and 1000 MB/s/TiB.
You can create Persistent_2 deployment types using the Amazon FSx console, AWS Command Line Interface, and API. Persistent_2 deployment types are available in the following AWS Regions.
• US East (N. Virginia)
• US East (Ohio)
• US West (Oregon)
• Canada (Central)
• Europe (Ireland)
• Europe (Frankfurt)
• Asia Pacific (Tokyo)
For more information on FSx for Lustre performance, see Aggregate file system performance (p. 62).
Using data repositories with Amazon FSx for Lustre
FSx for Lustre provides high-performance file systems optimized for fast workload processing. It can support workloads such as machine learning, high performance computing (HPC), video processing, financial modeling, and electronic design automation (EDA). These workloads commonly require data to be presented using a scalable, high-speed file system interface for data access. They typically have datasets stored on long-term durable data repositories such as Amazon S3 or on-premises storage. FSx for Lustre is natively integrated with data repositories such as Amazon S3, making it easier to process datasets with the Lustre file system.
Note
File system backups are not supported on file systems that are linked to a data repository. For more information, see Working with backups (p. 95).
Topics
• Overview of data repositories (p. 21)
• Walkthrough: Attaching POSIX permissions when uploading objects into an Amazon S3 bucket (p. 23)
• Linking your file system to an S3 bucket (p. 25)
• Importing files from your data repository (p. 34)
• Data repository tasks (p. 39)
• Exporting changes to the data repository (p. 47)
• Releasing data from your file system (p. 51)
• Using Amazon FSx with your on-premises data repository (p. 51)
• Working with older deployment types (p. 52)
Overview of data repositories
When you use Amazon FSx with multiple durable storage repositories, you can ingest and process large volumes of file data in a high-performance file system by using automatic import and import data repository tasks. At the same time, you can write results to your data repositories by using automatic export or export data repository tasks. With these features, you can restart your workload at any time using the latest data stored in your data repository.
Note
Automatic export and multiple data repositories are supported only on Persistent 2 file systems. If you're using a file system with an older FSx for Lustre deployment type, see Working with older deployment types (p. 52).
Amazon FSx is deeply integrated with Amazon S3. This integration means that you can seamlessly access the objects stored in your Amazon S3 buckets from applications that mount your Amazon FSx file system. You can also run your compute-intensive workloads on Amazon EC2 instances in the AWS Cloud and export the results to your data repository after your workload is complete.
In Amazon FSx for Lustre, you can import file and directory listings from your linked data repositories to the file system using automatic import or using an import data repository task. When you turn on automatic import on a data repository association, your file system imports file metadata as files are created, modified, and/or deleted in the S3 data repository. Alternatively, you can import metadata for new or changed files and directories using an import data repository task. Both automatic import and import data repository tasks include POSIX metadata.
Note
Automatic import and import data repository tasks can be used simultaneously on a file system.
In order to access objects on the Amazon S3 data repository as files and directories on the file system, file and directory metadata must be loaded into the file system. You can load metadata from a linked data repository when you create a data repository association or load metadata for batches of files and directories that you want to access using the FSx for Lustre file system at a later time using an import data repository task.
You can also export files and their associated metadata in your file system to your durable data repository using automatic export or using an export data repository task. When you turn on automatic export on a data repository association, your file system exports file data and metadata as they are created, modified, or deleted. Alternatively, you can export a file or directory using an export data repository task. When you use an export data repository task, file data and metadata that were created or modified since the last such task are exported. Both automatic export and export data repository tasks include POSIX metadata.
Note
Automatic export and export data repository tasks can't be used simultaneously on a file system.
Amazon FSx also supports cloud bursting workloads with on-premises file systems by enabling you to copy data from on-premises clients using AWS Direct Connect or VPN.
Important
If you have linked one or more Amazon FSx file systems to a durable data repository on Amazon S3, don't delete the Amazon S3 bucket until you have deleted all linked file systems.
POSIX metadata support for data repositories
Amazon FSx for Lustre automatically transfers Portable Operating System Interface (POSIX) metadata for files, directories, and symbolic links (symlinks) when importing and exporting data to and from a linked durable data repository on Amazon S3. When you export changes in your file system to its linked data repository, Amazon FSx also exports POSIX metadata changes along with data changes. Because of this metadata export, you can implement and maintain access controls between your FSx for Lustre file system and its data repository on S3.
Amazon FSx imports only S3 objects that have POSIX-compliant object keys, such as the following.
test/mydir/
test/
Amazon FSx stores directories and symlinks as separate objects in the linked data repository on S3. For directories, Amazon FSx creates an S3 object with a key name that ends with a slash ("/"), as follows:
• The S3 object key test/mydir/ maps to the Amazon FSx directory test/mydir.
• The S3 object key test/ maps to the Amazon FSx directory test.
For symlinks, FSx for Lustre uses the following Amazon S3 schema:
• S3 object key – The path to the link, relative to the Amazon FSx mount directory
• S3 object data – The target path of this symlink
• S3 object metadata – The metadata for the symlink
Amazon FSx stores POSIX metadata, including ownership, permissions, and timestamps for Amazon FSx files, directories, and symbolic links, in S3 objects as follows:
• Content-Type – The HTTP entity header used to indicate the media type of the resource for web browsers.
• x-amz-meta-file-permissions – The file type and permissions in the format <octal file type><octal permission mask>, consistent with st_mode in the Linux stat(2) man page.
Note
FSx for Lustre does not import or retain setuid information.
• x-amz-meta-file-owner – The owner user ID (UID) expressed as an integer.
• x-amz-meta-file-group – The group ID (GID) expressed as an integer.
• x-amz-meta-file-atime – The last-accessed time in nanoseconds. Terminate the time value with ns; otherwise Amazon FSx interprets the value as milliseconds.
• x-amz-meta-file-mtime – The last-modified time in nanoseconds. Terminate the time value with ns; otherwise, Amazon FSx interprets the value as milliseconds.
• x-amz-meta-user-agent – The user agent, ignored during Amazon FSx import. During export, Amazon FSx sets this value to aws-fsx-lustre.
The default POSIX permission that FSx for Lustre assigns to a file is 755. This permission allows read and execute access for all users and write access for the owner of the file.
Note
Amazon FSx doesn't retain any user-defined custom metadata on S3 objects.
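To illustrate the x-amz-meta-file-permissions format described above, the following sketch splits a value such as 0100664 into a file type and a permission mask. It is an illustration under stated assumptions: the function name decode_file_permissions is hypothetical, the file type constants are the Linux st_mode values for regular files, directories, and symbolic links, and only the lower nine permission bits are reported.

```shell
#!/bin/sh
# Hypothetical helper: splits a file-permissions value such as 0100664 into
# a file type and a three-digit octal permission mask.
decode_file_permissions() {
    case "$1" in
        ''|*[!0-7]*) echo "invalid"; return 1 ;;   # accept octal digits only
    esac
    mode=$(( 0$1 ))                    # leading 0 forces octal interpretation
    type_bits=$(( mode & 0170000 ))    # file type field of st_mode
    perm_bits=$(( mode & 0777 ))       # owner, group, and other permissions
    case "$type_bits" in
        32768) kind="file" ;;          # octal 0100000, regular file
        16384) kind="directory" ;;     # octal 0040000, directory
        40960) kind="symlink" ;;       # octal 0120000, symbolic link
        *)     kind="other" ;;
    esac
    printf '%s %03o\n' "$kind" "$perm_bits"
}

decode_file_permissions 0100664
decode_file_permissions 040755
```

The two calls print file 664 and directory 755, respectively.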
Walkthrough: Attaching POSIX permissions when uploading objects into an Amazon S3 bucket
The following procedure walks you through the process of uploading objects into Amazon S3 with POSIX permissions. Doing so allows you to import the POSIX permissions when you create an Amazon FSx file system that is linked to that S3 bucket.
To upload objects with POSIX permissions to Amazon S3
1. From your local computer or machine, use the following example commands to create a test directory (s3cptestdir) and file (s3cptest.txt) that will be uploaded to the S3 bucket.
$ mkdir s3cptestdir
$ echo "S3cp metadata import test" >> s3cptestdir/s3cptest.txt
$ ls -ld s3cptestdir/ s3cptestdir/s3cptest.txt
drwxr-xr-x 3 500 500 96 Jan 8 11:29 s3cptestdir/
-rw-r--r-- 1 500 500 26 Jan 8 11:29 s3cptestdir/s3cptest.txt
The newly created file and directory have a file owner user ID (UID) and group ID (GID) of 500 and permissions as shown in the preceding example.
2. Call the Amazon S3 API to create the directory s3cptestdir with metadata permissions. You must specify the directory name with a trailing slash (/). For information about supported POSIX metadata, see POSIX metadata support for data repositories (p. 22).
Replace bucket_name with the actual name of your S3 bucket.
$ aws s3api put-object --bucket bucket_name --key s3cptestdir/ \
    --metadata '{"user-agent":"aws-fsx-lustre",
        "file-atime":"1595002920000000000ns", "file-owner":"500",
        "file-permissions":"0100664", "file-group":"500",
        "file-mtime":"1595002920000000000ns"}'
3. Verify the POSIX permissions are tagged to S3 object metadata.
$ aws s3api head-object --bucket bucket_name --key s3cptestdir/
{
    "AcceptRanges": "bytes",
    "LastModified": "Fri, 08 Jan 2021 17:32:27 GMT",
    "ContentLength": 0,
    "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"",
    "VersionId": "bAlhCoWq7aIEjc3R6Myc6UOb8sHHtJkR",
    "ContentType": "binary/octet-stream",
    "Metadata": {
        "user-agent": "aws-fsx-lustre",
        "file-atime": "1595002920000000000ns",
        "file-owner": "500",
        "file-permissions": "0100664",
        "file-group": "500",
        "file-mtime": "1595002920000000000ns"
    }
}
4. Upload the test file (created in step 1) from your computer to the S3 bucket with metadata permissions.
$ aws s3 cp s3cptestdir/s3cptest.txt s3://bucket_name/s3cptestdir/s3cptest.txt \
    --metadata '{"user-agent":"aws-fsx-lustre",
        "file-atime":"1595002920000000000ns", "file-owner":"500",
        "file-permissions":"0100664", "file-group":"500",
        "file-mtime":"1595002920000000000ns"}'
5. Verify the POSIX permissions are tagged to S3 object metadata.
$ aws s3api head-object --bucket bucket_name --key s3cptestdir/s3cptest.txt
{
    "AcceptRanges": "bytes",
    "LastModified": "Fri, 08 Jan 2021 17:33:35 GMT",
    "ContentLength": 26,
    "ETag": "\"eb33f7e1f44a14a8e2f9475ae3fc45d3\"",
    "VersionId": "w9ztRoEhB832m8NC3a_JTlTyIx7Uzql6",
    "ContentType": "text/plain",
    "Metadata": {
        "user-agent": "aws-fsx-lustre",
        "file-atime": "1595002920000000000ns",
        "file-owner": "500",
        "file-permissions": "0100664",
        "file-group": "500",
        "file-mtime": "1595002920000000000ns"
    }
}
6. Verify permissions on the Amazon FSx file system linked to the S3 bucket.
$ sudo lfs df -h /fsx
UUID                     bytes    Used   Available   Use%   Mounted on
3rnxfbmv-MDT0000_UUID    34.4G    6.1M   34.4G       0%     /fsx[MDT:0]
3rnxfbmv-OST0000_UUID    1.1T     4.5M   1.1T        0%     /fsx[OST:0]
filesystem_summary:      1.1T     4.5M   1.1T        0%     /fsx
$ cd /fsx/
$ ls -ld s3cptestdir/
drw-rw-r-- 2 500 500 25600 Jan 8 17:33 s3cptestdir/
$ ls -ld s3cptestdir/s3cptest.txt
-rw-rw-r-- 1 500 500 26 Jan 8 17:33 s3cptestdir/s3cptest.txt
Both the s3cptestdir directory and the s3cptest.txt file have POSIX permissions imported.
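When you upload many objects, it can help to generate the metadata string used in this walkthrough instead of typing it. The following is a minimal sketch, not an AWS-provided tool. The function name fsx_posix_metadata is hypothetical, and the sketch assumes whole-second timestamps, which it converts to nanoseconds by appending nine zeros and the ns suffix.

```shell
#!/bin/sh
# Hypothetical helper: prints the JSON for the --metadata option.
# Arguments: $1 = owner UID, $2 = group GID, $3 = file-permissions value,
# $4 = access and modification time in whole seconds since the epoch.
fsx_posix_metadata() {
    printf '{"user-agent":"aws-fsx-lustre","file-atime":"%s000000000ns","file-owner":"%s","file-permissions":"%s","file-group":"%s","file-mtime":"%s000000000ns"}\n' \
        "$4" "$1" "$3" "$2" "$4"
}

# Reproduces the metadata values used in the walkthrough.
fsx_posix_metadata 500 500 0100664 1595002920
```

The output can be passed to the upload commands shown earlier, for example --metadata "$(fsx_posix_metadata 500 500 0100664 1595002920)".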
Linking your file system to an S3 bucket
You can link your Amazon FSx for Lustre file system to data repositories in Amazon S3. You can create the link when creating the file system or at any time after the file system has been created.
A link between a directory on the file system and an S3 bucket or prefix is called a data repository association (DRA). You can configure a maximum of 8 data repository associations on an Amazon FSx file system. A maximum of 8 DRA requests can be queued, but only one request can be worked on at a time for the file system. Each DRA must have a unique Amazon FSx file system directory and a unique S3 bucket or prefix associated with it.
Note
Data repository associations, automatic export, and multiple data repositories are only supported on Persistent 2 file systems. If you're using a file system with an older FSx for Lustre deployment type, see Working with older deployment types (p. 52).
In order to access objects on the S3 data repository as files and directories on the file system, file and directory metadata must be loaded into the file system. You can load metadata from a linked data repository when you create the DRA or load metadata for batches of files and directories that you want to access using the FSx for Lustre file system at a later time using an import data repository task.
You can configure a DRA for automatic import only, for automatic export only, or for both. A data repository association configured with both automatic import and automatic export propagates data in both directions between the file system and the linked S3 bucket. As you make changes to data in your S3 bucket, Amazon FSx detects the changes and then automatically imports the changes to your file system. As you create, modify, or delete files, Amazon FSx automatically exports the changes to Amazon S3 asynchronously once your application finishes modifying the file.
Note
Don't modify the same file on both the S3 bucket and the Lustre file system at the same time; otherwise, the behavior is undefined.
When you create a data repository association, you can configure the following properties:
• File system path – Enter a local path on the file system that points to a directory (such as /ns1/) or subdirectory (such as /ns1/subdir/) that will be mapped one-to-one with the specified data repository path below. The leading forward slash in the name is required. Two data repository associations cannot have overlapping file system paths. For example, if a data repository is associated with file system path /ns1, then you cannot link another data repository with file system path /ns1/ns2.
Note
If you specify only a forward slash (/) as the file system path, you can link only one data repository to the file system. You can only specify "/" as the file system path for the first data repository associated with a file system.
• Data repository path – Enter a path in the S3 data repository. The path can be an S3 bucket or prefix in the format s3://myBucket/myPrefix/. This property specifies where in the S3 data repository files will be imported from or exported to. FSx for Lustre will append a trailing "/" to your data repository path if you don't provide one. For example, if you provide a data repository path of s3://myBucket/myPrefix, FSx for Lustre will interpret it as s3://myBucket/myPrefix/.
Two data repository associations cannot have overlapping data repository paths. For example, if a data repository with path s3://myBucket/myPrefix/ is linked to the file system, then you cannot create another data repository association with data repository path s3://myBucket/myPrefix/mySubPrefix.
• Import metadata from repository – You can select this option to import metadata from the entire data repository immediately after creating the data repository association. Alternatively, you can run an import data repository task to load all or a subset of the metadata from the linked data repository into the file system at any time after the data repository association is created.
• Import settings – Choose an import policy that specifies the type of updated objects (any combination of new, changed, and deleted) that will be automatically imported from the linked S3 bucket to your file system. Automatic import (new, changed, deleted) is turned on by default when you add a data repository from the console, but is disabled by default when using the AWS CLI or Amazon FSx API.
• Export settings – Choose an export policy that specifies the type of updated objects (any combination of new, changed, and deleted) that will be automatically exported to the S3 bucket. Automatic export (new, changed, deleted) is turned on by default when you add a data repository from the console, but is disabled by default when using the AWS CLI or Amazon FSx API.
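The properties above map onto the parameters of the Amazon FSx CreateDataRepositoryAssociation API operation. The following is a minimal sketch, not a definitive implementation, of how the request parameters might be assembled in Python. The helper name build_dra_params and the file system ID are hypothetical; the event names NEW, CHANGED, and DELETED correspond to the new, changed, and deleted options described above.

```python
from typing import Any, Dict, Iterable

# Event types accepted by the automatic import and export policies.
VALID_EVENTS = {"NEW", "CHANGED", "DELETED"}


def build_dra_params(
    file_system_id: str,
    file_system_path: str,
    data_repository_path: str,
    import_metadata_on_create: bool = False,
    auto_import_events: Iterable[str] = (),
    auto_export_events: Iterable[str] = (),
) -> Dict[str, Any]:
    """Assemble keyword arguments for a data repository association request.

    Hypothetical helper: it only builds a dictionary and makes no AWS calls.
    """
    if not file_system_path.startswith("/"):
        raise ValueError("file system path must start with a forward slash")
    if not data_repository_path.startswith("s3://"):
        raise ValueError("data repository path must look like s3://bucket/prefix/")

    params: Dict[str, Any] = {
        "FileSystemId": file_system_id,
        "FileSystemPath": file_system_path,
        "DataRepositoryPath": data_repository_path,
        "BatchImportMetaDataOnCreate": import_metadata_on_create,
    }

    # An empty event list leaves the policy out, which matches the
    # "disabled by default" behavior of the AWS CLI and API.
    s3_config: Dict[str, Any] = {}
    for key, events in (
        ("AutoImportPolicy", auto_import_events),
        ("AutoExportPolicy", auto_export_events),
    ):
        unique = sorted(set(events))
        unknown = set(unique) - VALID_EVENTS
        if unknown:
            raise ValueError(f"unsupported event types: {sorted(unknown)}")
        if unique:
            s3_config[key] = {"Events": unique}
    if s3_config:
        params["S3"] = s3_config
    return params


# Example values only; "fs-0123456789abcdef0" is a placeholder ID.
example_params = build_dra_params(
    "fs-0123456789abcdef0",
    "/ns1/",
    "s3://myBucket/myPrefix/",
    import_metadata_on_create=True,
    auto_import_events=["NEW", "CHANGED", "DELETED"],
    auto_export_events=["NEW", "CHANGED", "DELETED"],
)
```

Assuming the AWS SDK for Python (Boto3) is installed and credentials are configured, a dictionary like example_params could be passed as boto3.client("fsx").create_data_repository_association(**example_params). Building the parameters separately keeps the validation easy to test without calling AWS.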
The File system path and Data repository path settings provide a 1:1 mapping: Amazon FSx exports data from your FSx for Lustre file system back to the same S3 prefixes that the data was imported from.
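The path rules described in this section (trailing slash handling, the no-overlap restriction, and the 1:1 mapping) can be illustrated with a short Python sketch. The function names are hypothetical, and the overlap check is an approximation of the rule as stated here; the validation that the service performs may differ in detail.

```python
def normalize_path(path: str) -> str:
    """Append a trailing "/" if the path does not already end with one."""
    return path if path.endswith("/") else path + "/"


def paths_overlap(a: str, b: str) -> bool:
    """Return True if one path equals, or is nested under, the other.

    Normalizing first ensures /ns1 and /ns10 are not treated as overlapping.
    """
    a, b = normalize_path(a), normalize_path(b)
    return a.startswith(b) or b.startswith(a)


def to_s3_uri(file_path: str, fs_path: str, repo_path: str) -> str:
    """Map a file under fs_path to the matching object URI under repo_path."""
    fs_root = normalize_path(fs_path)
    if not file_path.startswith(fs_root):
        raise ValueError(f"{file_path} is not under {fs_root}")
    return normalize_path(repo_path) + file_path[len(fs_root):]


# Examples using the paths from this section.
print(normalize_path("s3://myBucket/myPrefix"))
print(paths_overlap("/ns1", "/ns1/ns2"))
print(paths_overlap("/ns1", "/ns10"))
print(to_s3_uri("/ns1/subdir/file.txt", "/ns1", "s3://myBucket/myPrefix"))
```

Because a file system path of "/" normalizes to a prefix of every other path, paths_overlap reports an overlap with any second association, which is consistent with the restriction that only one data repository can be linked when "/" is used.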
Region and account support for linked S3 buckets
When you create links to S3 buckets, keep in mind the following Region and account support limitations:
• Automatic export supports cross-Region configurations. The Amazon FSx file system and the linked S3 bucket can be located in the same AWS Region or in different AWS Regions.
• Automatic import does not support cross-Region configurations. Both the Amazon FSx file system and the linked S3 bucket must be located in the same AWS Region.
• Both automatic export and automatic import support cross-account configurations. The Amazon FSx file system and the linked S3 bucket can belong to the same AWS account or to different AWS accounts.
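The limitations above can be expressed as a small pre-flight check. This is a sketch under the assumption that you already know the Region of the file system and of the bucket; the function name check_link_support is hypothetical. Account ownership is not checked because both features support cross-account configurations.

```python
from typing import List


def check_link_support(
    fs_region: str,
    bucket_region: str,
    auto_import: bool,
    auto_export: bool,
) -> List[str]:
    """Return a list of problems with the planned configuration (empty if none)."""
    problems: List[str] = []
    # Automatic import requires the file system and bucket in the same Region.
    if auto_import and fs_region != bucket_region:
        problems.append(
            "automatic import requires the file system and the S3 bucket "
            "to be in the same AWS Region"
        )
    # Automatic export supports cross-Region links, so auto_export adds no check.
    return problems


# Example: a cross-Region link with both features requested.
for problem in check_link_support("us-east-1", "us-west-2", True, True):
    print(problem)
```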
Creating a link to an S3 bucket
The following procedures walk you through the process of creating a data repository association for a Persistent 2 file system to an existing S3 bucket, using the AWS Management Console and AWS Command Line Interface (AWS CLI).
Note: Data repositories cannot be linked to file systems that have file system backups enabled. Disable backups before linking to a data repository.
To link an S3 bucket while creating a file system (console)
1. Open the Amazon FSx console at https://console.aws.amazon.com/fsx/.
2. Follow the procedure for creating a new file system described in Step 1: Create your Amazon FSx for Lustre file system (p. 10) in the Getting Started section.
3. Open the Data Repository Import/Export - optional section. The feature is disabled by default.