AWS ParallelCluster


AWS ParallelCluster User Guide


Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be affiliated with, connected to, or sponsored by Amazon.


What is AWS ParallelCluster

AWS ParallelCluster is an AWS supported open source cluster management tool that helps you to deploy and manage high performance computing (HPC) clusters in the AWS Cloud. Built on the open source CfnCluster project, AWS ParallelCluster enables you to quickly build an HPC compute environment in AWS. It automatically sets up the required compute resources and shared filesystem. You can use AWS ParallelCluster with batch schedulers, such as AWS Batch and Slurm. AWS ParallelCluster facilitates quick start proof of concept deployments and production deployments. You can also build higher level workflows, such as a genomics portal that automates an entire DNA sequencing workflow, on top of AWS ParallelCluster.


AWS ParallelCluster version 3

Starting with AWS ParallelCluster version 3.1.1, you can configure clusters to use an Active Directory (AD) domain. This AD domain is managed by one of the AWS Directory Service products. You can set up this configuration to share clusters among multiple users in a way that simplifies collaboration while also reducing your costs and administrative overhead. For more information, see Multiple user access to clusters (p. 5).

Topics

• Setting up AWS ParallelCluster

• Using AWS ParallelCluster

• Reference for AWS ParallelCluster

• Tutorials

• AWS ParallelCluster Troubleshooting

Setting up AWS ParallelCluster

Topics

• Installing AWS ParallelCluster

• Steps to take after installation

• Multiple user access to clusters

• Configuring AWS ParallelCluster

• Best practices

• Moving from AWS ParallelCluster 2.x to 3.x

• Supported Regions for AWS ParallelCluster version 3

Installing AWS ParallelCluster

AWS ParallelCluster is distributed as a Python package and is installed using the Python pip package manager. For instructions on how to install Python packages, see Installing packages in the Python Packaging User Guide.

Ways to install AWS ParallelCluster:

• Install AWS ParallelCluster in a virtual environment (recommended)

• Installing AWS ParallelCluster in a non-virtual environment using pip

You can find the version number of the most recent CLI on the releases page on GitHub. In this guide, the command examples assume that you have installed Python 3.6 or later. The pip command examples use the pip3 version.

Manage both AWS ParallelCluster 2 and AWS ParallelCluster 3


For customers who use both AWS ParallelCluster 2 and AWS ParallelCluster 3 and want to manage the CLIs for both packages, we recommend that you install AWS ParallelCluster 2 and AWS ParallelCluster 3 in different virtual environments (p. 3). This ensures that you can continue using each version of AWS ParallelCluster and any associated cluster resources.

Install AWS ParallelCluster in a virtual environment (recommended)

We recommend that you install AWS ParallelCluster in a virtual environment to avoid requirement version conflicts with other pip packages.

Prerequisites

• AWS ParallelCluster requires Python 3.6 or later. If you don't already have it installed, download a compatible version for your platform at python.org.
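The version prerequisite above can be checked from a shell before you install anything. The following is a minimal sketch, not part of AWS ParallelCluster: the version_at_least function name is hypothetical, and the comparison assumes GNU sort, whose -V option orders version strings.

```shell
#!/bin/bash
# Hypothetical helper: succeed when version $1 is greater than or equal to version $2.
# Assumes GNU sort, whose -V option orders version strings numerically.
version_at_least() {
  local have="$1" need="$2"
  # After a version sort, the smaller of the two versions is listed first.
  [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}

# Example use (requires python3 on the PATH):
#   have="$(python3 -c 'import platform; print(platform.python_version())')"
#   version_at_least "$have" 3.6 && echo "Python ${have} meets the requirement"
```

Comparing version strings with sort -V avoids the common mistake of treating 3.10 as smaller than 3.6 in a plain text comparison.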

To install AWS ParallelCluster in a virtual environment

1. If virtualenv isn't installed, install virtualenv using pip3. If python3 -m virtualenv help displays help information, go to step 2.

$ python3 -m pip install --upgrade pip

$ python3 -m pip install --user --upgrade virtualenv

Run exit to leave the current terminal window and open a new terminal window to pick up changes to the environment.

2. Create a virtual environment and name it.

$ python3 -m virtualenv ~/apc-ve

Alternatively, you can use the -p option to select a specific version of Python.

$ python3 -m virtualenv -p $(which python3) ~/apc-ve

3. Activate your new virtual environment.

$ source ~/apc-ve/bin/activate

4. Install AWS ParallelCluster into your virtual environment.

(apc-ve)~$ python3 -m pip install --upgrade "aws-parallelcluster"

5. Install Node Version Manager and Node.js. They're required because AWS ParallelCluster uses the AWS Cloud Development Kit (CDK) to generate templates.

$ curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.38.0/install.sh | bash

$ chmod ug+x ~/.nvm/nvm.sh

$ source ~/.nvm/nvm.sh

$ nvm install --lts

$ node --version

6. Verify that AWS ParallelCluster is installed correctly.

$ pcluster version
{
  "version": "3.1.2"
}

You can use the deactivate command to exit the virtual environment. Each time you start a session, you must reactivate the environment (p. 3).
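Because the environment must be reactivated in every new session, a quick check before running pcluster commands can save confusion. The following is a small sketch under one assumption: the activate script exports the VIRTUAL_ENV variable and deactivate unsets it. The require_virtualenv function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: report whether a Python virtual environment is active.
# The activate script exports VIRTUAL_ENV, and deactivate unsets it.
require_virtualenv() {
  if [ -n "${VIRTUAL_ENV:-}" ]; then
    echo "active: ${VIRTUAL_ENV}"
  else
    echo "inactive: run 'source ~/apc-ve/bin/activate' first"
    return 1
  fi
}
```

A wrapper script could call require_virtualenv and stop early when it returns a nonzero status.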

To upgrade to the latest version of AWS ParallelCluster, run the installation command again.

(apc-ve)~$ python3 -m pip install --upgrade "aws-parallelcluster"

Installing AWS ParallelCluster in a non-virtual environment using pip

Prerequisites

• AWS ParallelCluster requires Python 3.6 or later. If you don't already have it installed, download a compatible version for your platform at python.org.

Install AWS ParallelCluster

1. Use pip to install AWS ParallelCluster.

$ python3 -m pip install "aws-parallelcluster" --upgrade --user

When you use the --user switch, pip installs AWS ParallelCluster to ~/.local/bin.

2. Install Node Version Manager and Node.js. They're required because AWS ParallelCluster uses the AWS Cloud Development Kit (CDK) to generate templates.

$ curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.38.0/install.sh | bash

$ chmod ug+x ~/.nvm/nvm.sh

$ source ~/.nvm/nvm.sh

$ nvm install --lts

$ node --version

3. Verify that AWS ParallelCluster installed correctly.

$ pcluster version
{
  "version": "3.1.2"
}

4. To upgrade to the latest version, run the installation command again.

$ python3 -m pip install "aws-parallelcluster" --upgrade --user

Steps to take after installation

You can verify that AWS ParallelCluster installed correctly by running pcluster version (p. 100).

$ pcluster version
{
  "version": "3.1.2"
}

AWS ParallelCluster is updated regularly. To update to the latest version of AWS ParallelCluster, run the installation command again. For more information about the latest version of AWS ParallelCluster, see the AWS ParallelCluster release notes.

$ pip3 install aws-parallelcluster --upgrade --user

To uninstall AWS ParallelCluster, use pip3 uninstall.

$ pip3 uninstall aws-parallelcluster

If you don't have Python and pip3, use the procedure for your environment.

Multiple user access to clusters

Learn to implement and manage multiple user access to a single cluster.

In this topic, an AWS ParallelCluster user refers to a system user for compute instances. An example is ec2-user on an Amazon EC2 instance.

AWS ParallelCluster multi-user access support is available in all the AWS Regions where AWS ParallelCluster is currently available. It works with other AWS services, including Amazon FSx for Lustre and Amazon Elastic File System.

You can use an AWS Directory Service for Microsoft Active Directory or Simple AD to manage cluster access. To set up a cluster, specify an AWS ParallelCluster DirectoryService (p. 138) configuration.

AWS Directory Service directories can be connected to multiple clusters. This allows for centralized management of identities across multiple environments and a unified login experience.

When you use AWS Directory Service for AWS ParallelCluster multiple user access, you can log in to the cluster with user credentials that have been defined in the directory. These credentials consist of a user name and password. After you log in to the cluster for the first time, a user SSH key is automatically generated. You can use it to log in without a password.

You can create, delete, and modify a cluster’s users or groups after your directory service has been deployed. With AWS Directory Service, you can do this in the AWS Management Console or by using the Active Directory Users and Computers tool. This tool is accessible from any EC2 instance that's joined to your Active Directory. For more information, see Installing the Active Directory administration tools.

If you plan to use AWS ParallelCluster in a single subnet with no internet access, see AWS ParallelCluster in a single subnet with no internet access (p. 53) for additional requirements.

Create an Active Directory

Make sure that you create an Active Directory (AD) before you create your cluster. For information about how to choose the type of Active Directory for your cluster, see Which to choose in the AWS Directory Service Administration Guide.

If the directory is empty, add users with user names and passwords. For more information, see the documentation that's specific to AWS Directory Service for Microsoft Active Directory or Simple AD.

Create a cluster with an AD domain

Configure your cluster to integrate with a directory by specifying the relevant information in the DirectoryService section of the cluster configuration file. For more information, see the DirectoryService (p. 138) configuration section.

You can use the following example to integrate your cluster with an AWS Managed Microsoft AD over LDAP.

Specific definitions that are required for an AWS Managed Microsoft AD over LDAP configuration:

• You must set the ldap_auth_disable_tls_never_use_in_production parameter to True under AdditionalSssdConfigs.

• You can specify either domain controller hostnames or IP addresses for DomainAddr.

• DomainReadOnlyUser syntax must be as follows:

cn=ReadOnly,ou=Users,ou=CORP,dc=corp,dc=pcluster,dc=com
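The distinguished name syntax above follows mechanically from the directory's DNS name. The following sketch shows the conversion; both function names are hypothetical, and the layout of the user DN (a Users OU inside an OU named CORP) is taken from the example above rather than from any general rule, so check it against your own directory.

```shell
#!/bin/bash
# Hypothetical helper: convert a DNS domain name into LDAP domain components.
domain_to_dn() {
  local domain="$1"
  # Replace every "." with ",dc=" and prefix the first label with "dc=".
  printf 'dc=%s\n' "${domain//./,dc=}"
}

# Hypothetical helper: build a read-only user DN in the layout shown above.
# $1 = user name, $2 = OU that contains the Users OU, $3 = DNS domain name.
managed_ad_user_dn() {
  printf 'cn=%s,ou=Users,ou=%s,%s\n' "$1" "$2" "$(domain_to_dn "$3")"
}
```

For example, managed_ad_user_dn ReadOnly CORP corp.pcluster.com prints the same value that the example uses for DomainReadOnlyUser.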

Get your AWS Managed Microsoft AD configuration data:

$ aws ds describe-directories --directory-id "d-abcdef01234567890"
{
    "DirectoryDescriptions": [
        {
            "DirectoryId": "d-abcdef01234567890",
            "Name": "corp.pcluster.com",
            "DnsIpAddrs": [
                "203.0.113.225",
                "192.0.2.254"
            ],
            "VpcSettings": {
                "VpcId": "vpc-021345abcdef6789",
                "SubnetIds": [
                    "subnet-1234567890abcdef0",
                    "subnet-abcdef01234567890"
                ],
                "AvailabilityZones": [
                    "region-idb",
                    "region-idd"
                ]
            }
        }
    ]
}

Cluster configuration for an AWS Managed Microsoft AD:

Region: region-id
Image:
  Os: alinux2
HeadNode:
  InstanceType: t2.micro
  Networking:
    SubnetId: subnet-1234567890abcdef0
  Ssh:
    KeyName: pcluster
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: queue1
      ComputeResources:
        - Name: t2micro
          InstanceType: t2.micro
          MinCount: 1
          MaxCount: 10
      Networking:
        SubnetIds:
          - subnet-abcdef01234567890
DirectoryService:
  DomainName: dc=corp,dc=pcluster,dc=com
  DomainAddr: ldap://203.0.113.225,ldap://192.0.2.254
  PasswordSecretArn: arn:aws:secretsmanager:region-id:123456789012:secret:MicrosoftAD.Admin.Password-1234
  DomainReadOnlyUser: cn=ReadOnly,ou=Users,ou=CORP,dc=corp,dc=pcluster,dc=com
  AdditionalSssdConfigs:
    ldap_auth_disable_tls_never_use_in_production: True

To use this configuration for a Simple AD, change the DomainReadOnlyUser property value in the DirectoryService section:

DirectoryService:
  DomainAddr: ldap://203.0.113.225,ldap://192.0.2.254
  PasswordSecretArn: arn:aws:secretsmanager:region-id:123456789012:secret:SimpleAD.Admin.Password-1234
  DomainReadOnlyUser: cn=ReadOnlyUser,cn=Users,dc=corp,dc=pcluster,dc=com
  AdditionalSssdConfigs:
    ldap_auth_disable_tls_never_use_in_production: True

Considerations:

• We recommend that you use LDAP over TLS/SSL (LDAPS) rather than LDAP alone. TLS/SSL ensures that the connection is encrypted.

• The DomainAddr property value matches the entries in the DnsIpAddrs list from the describe-directories output.

• We recommend that your cluster use subnets that are located in the same Availability Zone that the DomainAddr points to. If you use the custom Dynamic Host Configuration Protocol (DHCP) configuration that's recommended for directory VPCs and your subnets aren't located in the DomainAddr Availability Zone, cross traffic among Availability Zones is possible. The multi-user AD integration feature doesn't require custom DHCP configurations.

• The DomainReadOnlyUser property value specifies a user that must be created in the directory. This user isn't created by default. We recommend that you don't give this user permission to modify directory data.

• The PasswordSecretArn property value points to an AWS Secrets Manager secret that contains the password of the user that you specified for the DomainReadOnlyUser property. If this user's password changes, update the secret value and update the configuration on the cluster instances. Before you update the instances, make sure to stop the cluster's compute fleet. Alternatively, you can run the following command on any active cluster nodes, starting first with the head node.

sudo cinc-client \
  --local-mode \
  --config /etc/chef/client.rb \
  --log_level auto \
  --force-formatter \
  --no-color \
  --chef-zero-port 8889 \
  --json-attributes /etc/chef/dna.json \
  --override-runlist aws-parallelcluster-config::directory_service

For another example, see also Integrate Active Directory over LDAP (p. 153).
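As the considerations above note, the DomainAddr value is the DnsIpAddrs list rewritten as a comma-separated list of LDAP URLs. The following sketch builds that string; the build_domain_addr function name is hypothetical, and the commented usage assumes that the AWS CLI is installed and configured.

```shell
#!/bin/bash
# Hypothetical helper: join hosts into a DomainAddr value.
# $1 = URL scheme (ldap or ldaps); remaining arguments = IP addresses or hostnames.
build_domain_addr() {
  local scheme="$1"; shift
  local addr="" host
  for host in "$@"; do
    # Add a comma before every entry except the first one.
    addr="${addr:+${addr},}${scheme}://${host}"
  done
  printf '%s\n' "$addr"
}

# Example use (requires the AWS CLI and access to the directory):
#   ips="$(aws ds describe-directories --directory-id "d-abcdef01234567890" \
#            --query 'DirectoryDescriptions[0].DnsIpAddrs' --output text)"
#   build_domain_addr ldap ${ips}
```

In the commented usage, ${ips} is left unquoted on purpose so that the shell splits the whitespace-separated addresses into separate arguments.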


Log in to a cluster integrated with an AD domain

If you enabled the AD domain integration feature, authentication by password is enabled on the cluster head node. The home directory of an AD user is created at the first user login into the head node or the first time a sudo-user switches to the AD user on the head node.

Password authentication isn't enabled for cluster compute nodes. AD users must log in to compute nodes with SSH keys.

By default, SSH keys are set up in the AD user's ${HOME}/.ssh directory at the first SSH login to the head node. You can disable this behavior by setting the GenerateSshKeysForUsers Boolean property to false in the cluster configuration. By default, GenerateSshKeysForUsers is set to true.

If an AWS ParallelCluster application requires password-less SSH between cluster nodes, make sure that the SSH keys have been correctly set up in the user's home directory.

Note: If the AD integration feature doesn't work as expected, the SSSD logs can provide useful diagnostic information for troubleshooting the issue. These logs are located in the /var/log/sssd directory on cluster nodes. By default, they're also stored in a cluster's Amazon CloudWatch log group.

For more information, see Troubleshooting multi-user integration with Active Directory (p. 175).

Running MPI jobs

As SchedMD recommends, bootstrap MPI jobs by using Slurm as the MPI bootstrapping method. For more information, refer to the official Slurm documentation or the official documentation for your MPI library.

For example, according to the official Intel MPI documentation, when you run a StarCCM job, you must set Slurm as the process orchestrator by exporting the environment variable I_MPI_HYDRA_BOOTSTRAP=slurm.

Note: Known issue

If your MPI application relies on SSH as the mechanism to spawn MPI jobs, you might encounter a known bug in Slurm that causes the directory user name to be resolved incorrectly as "nobody".

Either configure your application to use Slurm as the MPI bootstrapping method or refer to Known issues with username resolution (p. 180) in the Troubleshooting section for further details and possible workarounds.
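To illustrate the Intel MPI setting described in this section, the following sketch writes a Slurm batch script that exports I_MPI_HYDRA_BOOTSTRAP=slurm before it starts the program. The write_mpi_job_script function, the job name, and the ./my_mpi_app program are placeholders for illustration only, and the sketch assumes that Intel MPI's mpirun is on the PATH of the compute nodes.

```shell
#!/bin/bash
# Hypothetical helper: write a Slurm batch script that starts an Intel MPI
# program with Slurm as the bootstrap mechanism.
# $1 = output file, $2 = number of MPI tasks, $3 = program to run (placeholder).
write_mpi_job_script() {
  local out="$1" ntasks="$2" program="$3"
  cat > "$out" <<EOF
#!/bin/bash
#SBATCH --job-name=mpi-example
#SBATCH --ntasks=${ntasks}

# Ask the Intel MPI Hydra process manager to start ranks through Slurm, not SSH.
export I_MPI_HYDRA_BOOTSTRAP=slurm

mpirun -n ${ntasks} ${program}
EOF
}
```

After you generate the file, for example with write_mpi_job_script job.sh 4 ./my_mpi_app, you can submit it with sbatch job.sh.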

Example AWS Managed Microsoft AD over LDAP(S) cluster configurations

AWS ParallelCluster supports multiple user access by integrating with an AWS Directory Service over Lightweight Directory Access Protocol (LDAP), or LDAP over TLS/SSL (LDAPS).

The following examples show how to create cluster configurations to integrate with an AWS Managed Microsoft AD over LDAP(S).

AWS Managed Microsoft AD over LDAPS with certificate verification

You can use this example to integrate your cluster with an AWS Managed Microsoft AD over LDAPS, with certificate verification.

Specific definitions for an AWS Managed Microsoft AD over LDAPS with certificates configuration:

• LdapTlsReqCert must be set to hard (default) for LDAPS with certificate verification.

• LdapTlsCaCert must specify the path to your certificate authority (CA) certificate.

The CA certificate is a certificate bundle that contains the certificates of the entire CA chain that issued certificates for the AD domain controllers.

Your CA certificate and certificates must be installed on the cluster nodes.

• Domain controller hostnames must be specified for DomainAddr, not IP addresses.

Example cluster configuration file for using AD over LDAPS:

    KeyName: pcluster
  Iam:
    AdditionalIamPolicies:
      - Policy: arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
  CustomActions:
    OnNodeConfigured:
      Script: s3://aws-parallelcluster/scripts/pcluster-dub-msad-ldaps.post.sh
Scheduling:
        - Name: t2micro
          InstanceType: t2.micro
          MinCount: 1
          MaxCount: 10
      Networking:
        SubnetIds:
          - subnet-abcdef01234567890
      Iam:
          - Policy: arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
      CustomActions:
        OnNodeConfigured:
          Script: s3://aws-parallelcluster-pcluster/scripts/pcluster-dub-msad-ldaps.post.sh
DirectoryService:
  DomainAddr: ldaps://win-abcdef01234567890.corp.pcluster.com,ldaps://win-abcdef01234567890.corp.pcluster.com
  PasswordSecretArn: arn:aws:secretsmanager:region-id:123456789012:secret:MicrosoftAD.Admin.Password-1234
  DomainReadOnlyUser: cn=ReadOnly,ou=Users,ou=CORP,dc=corp,dc=pcluster,dc=com
  LdapTlsCaCert: /etc/openldap/cacerts/corp.pcluster.com.bundleca.cer
  LdapTlsReqCert: hard

Add certificates and configure domain controllers in a post-install script:

#!/bin/bash
set -e

AD_CERTIFICATE_S3_URI="s3://corp.pcluster.com/bundle/corp.pcluster.com.bundleca.cer"
AD_CERTIFICATE_LOCAL="/etc/openldap/cacerts/corp.pcluster.com.bundleca.cer"
AD_HOSTNAME_1="win-abcdef01234567890.corp.pcluster.com"
AD_IP_1="192.0.2.254"
AD_HOSTNAME_2="win-abcdef01234567890.corp.pcluster.com"
AD_IP_2="203.0.113.225"

# Download CA certificate
mkdir -p "$(dirname "${AD_CERTIFICATE_LOCAL}")"
aws s3 cp "${AD_CERTIFICATE_S3_URI}" "${AD_CERTIFICATE_LOCAL}"
chmod 644 "${AD_CERTIFICATE_LOCAL}"

# Configure domain controllers reachability
echo "${AD_IP_1} ${AD_HOSTNAME_1}" >> /etc/hosts
echo "${AD_IP_2} ${AD_HOSTNAME_2}" >> /etc/hosts

You can retrieve the domain controller hostnames from instances joined to the domain, as shown in the following examples.

From Windows instance

$ nslookup 192.0.2.254
Server: corp.pcluster.com
Address: 192.0.2.254

Name: win-abcdef01234567890.corp.pcluster.com
Address: 192.0.2.254

From Linux instance

$ nslookup 192.0.2.254

192.0.2.254.in-addr.arpa name = corp.pcluster.com

192.0.2.254.in-addr.arpa name = win-abcdef01234567890.corp.pcluster.com
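The Linux lookup output above lists both the bare domain name and the controller's hostname, and only the hostname is useful for DomainAddr. The following sketch filters that output. The controller_hostnames function name is hypothetical, and the parsing assumes that each answer line contains the text "name = " followed by the name, as in the example above.

```shell
#!/bin/bash
# Hypothetical helper: read nslookup reverse-lookup output on standard input and
# print each host name that belongs to the given domain but is not the domain itself.
# $1 = directory domain name, for example corp.pcluster.com.
controller_hostnames() {
  local domain="$1"
  awk -v domain="$domain" '
    / name = / {
      host = $NF
      sub(/\.$/, "", host)                      # drop a trailing dot, if any
      suffix = "." domain
      # Keep names that end with ".<domain>"; this skips the bare domain name.
      if (length(host) > length(suffix) &&
          substr(host, length(host) - length(suffix) + 1) == suffix) {
        print host
      }
    }'
}

# Example use:
#   nslookup 192.0.2.254 | controller_hostnames corp.pcluster.com
```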

AWS Managed Microsoft AD over LDAPS without certificate verification

You can use this example to integrate your cluster with an AWS Managed Microsoft AD over LDAPS, without certificate verification.

Specific definitions for an AWS Managed Microsoft AD over LDAPS without certificate verification configuration:

• LdapTlsReqCert must be set to never.

• Either domain controller hostnames or IP addresses can be specified for DomainAddr.

Example cluster configuration file for using AWS Managed Microsoft AD over LDAPS without certificate verification:

    KeyName: pcluster
Scheduling:
        - Name: t2micro
          InstanceType: t2.micro
          MinCount: 1
          MaxCount: 10
      Networking:
        SubnetIds:
          - subnet-abcdef01234567890
DirectoryService:
  DomainAddr: ldaps://203.0.113.225,ldaps://192.0.2.254
  PasswordSecretArn: arn:aws:secretsmanager:region-id:123456789012:secret:MicrosoftAD.Admin.Password-1234
  DomainReadOnlyUser: cn=ReadOnly,ou=Users,ou=CORP,dc=corp,dc=pcluster,dc=com
  LdapTlsReqCert: never

Configuring AWS ParallelCluster

After you install AWS ParallelCluster, complete the following configuration steps.

First, set up your AWS credentials. For more information, see Configuring the AWS CLI in the AWS CLI user guide.

$ aws configure

AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE

AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

Default region name [us-east-1]: us-east-1

Default output format [None]:

The AWS Region where the cluster is launched must have at least one Amazon EC2 key pair. For more information, see Amazon EC2 key pairs in the Amazon EC2 User Guide for Linux Instances.

$ pcluster configure --config cluster-config.yaml

The configure wizard prompts you for all of the information that's required to create your cluster. The details of the sequence differ when using AWS Batch as the scheduler compared to using Slurm.

Slurm

From the list of valid AWS Region identifiers, choose the Region where you want your cluster to run.

Note: The list of Regions shown is based on the partition of your account, and only includes Regions that are enabled for your account. For more information about enabling Regions for your account, see Managing AWS Regions in the AWS General Reference. The example shown is from the AWS Global partition. If your account is in the AWS GovCloud (US) partition, only Regions in that partition are listed (us-gov-east-1 and us-gov-west-1). Similarly, if your account is in the AWS China partition, only cn-north-1 and cn-northwest-1 are shown. For the complete list of Regions supported by AWS ParallelCluster, see Supported Regions for AWS ParallelCluster version 3 (p. 25).

Allowed values for AWS Region ID:
1. af-south-1
2. ap-east-1
3. ap-northeast-1
4. ap-northeast-2
5. ap-south-1
6. ap-southeast-1
7. ap-southeast-2
8. ca-central-1
9. eu-central-1
10. eu-north-1
11. eu-south-1
12. eu-west-1
13. eu-west-2
14. eu-west-3
15. me-south-1
16. sa-east-1
17. us-east-1
18. us-east-2
19. us-west-1
20. us-west-2
AWS Region ID [ap-northeast-1]:
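The partition rule in the preceding note can be expressed as a lookup on the Region ID prefix. The following sketch is an illustration, not the logic that the wizard uses: the region_partition function name is hypothetical, and it covers only the three partitions that the note names. It relies on AWS GovCloud (US) Region IDs beginning with us-gov- and AWS China Region IDs beginning with cn-.

```shell
#!/bin/bash
# Hypothetical helper: print the AWS partition that a Region ID belongs to.
# Covers only the three partitions named in the note above.
region_partition() {
  case "$1" in
    us-gov-*) echo "aws-us-gov" ;;  # AWS GovCloud (US)
    cn-*)     echo "aws-cn" ;;      # AWS China
    *)        echo "aws" ;;         # every other Region ID is treated as AWS Global
  esac
}
```

For example, region_partition cn-northwest-1 prints aws-cn, which is also the partition segment that appears in ARNs for resources in that Region.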

The key pair is selected from the key pairs that are registered with Amazon EC2 in the selected Region. Choose the key pair:

Allowed values for EC2 Key Pair Name:

1. your-key-1
2. your-key-2

EC2 Key Pair Name [your-key-1]:

Choose the scheduler to use with your cluster.

Allowed values for Scheduler:

1. slurm
2. awsbatch
Scheduler [slurm]:

Choose the operating system.

Allowed values for Operating System:

1. alinux2
2. centos7
3. ubuntu1804
4. ubuntu2004

Operating System [alinux2]:

Choose head node instance type:

Head node instance type [t2.micro]:

Choose the queue configuration. Note: An instance type can't be specified for more than one compute resource in the same queue.

Number of queues [1]:

Name of queue 1 [queue1]:

Number of compute resources for queue1 [1]: 2

Compute instance type for compute resource 1 in queue1 [t2.micro]:

Maximum instance count [10]:

Compute instance type for compute resource 2 in queue1 [t2.micro]: t3.micro

Maximum instance count [10]:

After the previous steps are completed, decide whether to use an existing VPC or let AWS ParallelCluster create a VPC for you. If you don't have a properly configured VPC, AWS ParallelCluster can create a new one. The new VPC either places the head node and compute nodes in the same public subnet, or places only the head node in a public subnet and all compute nodes in a private subnet. It's possible to reach your quota for the number of VPCs allowed in a Region. The default quota is five VPCs for a Region. For more information about this quota and how to request an increase, see VPC and subnets in the Amazon VPC User Guide.

If you let AWS ParallelCluster create a VPC, you must decide if all nodes should be in a public subnet.

Important

VPCs created by AWS ParallelCluster do not enable VPC Flow Logs by default. VPC Flow Logs enable you to capture information about the IP traffic going to and from network interfaces in your VPCs. For more information, see VPC Flow Logs in the Amazon VPC User Guide.

Automate VPC creation? (y/n) [n]: y
Allowed values for Availability Zone:
1. us-east-1a
2. us-east-1b
3. us-east-1c
4. us-east-1d
5. us-east-1e
6. us-east-1f
Availability Zone [us-east-1a]:
Allowed values for Network Configuration:
1. Head node in a public subnet and compute fleet in a private subnet
2. Head node and compute fleet in the same public subnet
Network Configuration [Head node in a public subnet and compute fleet in a private subnet]: 1
Beginning VPC creation. Please do not leave the terminal until the creation is finalized

If you don't create a new VPC, you must select an existing VPC.

If you choose to have AWS ParallelCluster create the VPC, make a note of the VPC ID so you can use the AWS CLI to delete it later.

Automate VPC creation? (y/n) [n]: n
Allowed values for VPC ID:
  #  id                     name                               number_of_subnets
---  ---------------------  ---------------------------------  -----------------
  1  vpc-0b4ad9c4678d3c7ad  ParallelClusterVPC-20200118031893  2
  2  vpc-0e87c753286f37eef  ParallelClusterVPC-20191118233938  5
VPC ID [vpc-0b4ad9c4678d3c7ad]: 1

After the VPC has been selected, decide whether to use existing subnets or create new ones.

Automate Subnet creation? (y/n) [y]: y

Creating CloudFormation stack...
Do not leave the terminal until the process has finished

AWS Batch

From the list of valid AWS Region identifiers, choose the Region where you want your cluster to run.

Note: The list of Regions shown is based on the partition of your account. It only includes Regions that are enabled for your account. For more information about enabling Regions for your account, see Managing AWS Regions in the AWS General Reference. The example shown is from the AWS Global partition. If your account is in the AWS GovCloud (US) partition, only Regions in that partition are listed (us-gov-east-1 and us-gov-west-1). Similarly, if your account is in the AWS China partition, only cn-north-1 and cn-northwest-1 are shown. For the complete list of Regions supported by AWS ParallelCluster, see Supported Regions for AWS ParallelCluster version 3 (p. 25).

Allowed values for AWS Region ID:
1. af-south-1
2. ap-east-1
3. ap-northeast-1
4. ap-northeast-2
5. ap-south-1
6. ap-southeast-1
7. ap-southeast-2
8. ca-central-1
9. eu-central-1
10. eu-north-1
11. eu-south-1
12. eu-west-1
13. eu-west-2
14. eu-west-3
15. me-south-1
16. sa-east-1
17. us-east-1
18. us-east-2
19. us-west-1
20. us-west-2
AWS Region ID [us-east-1]:

The key pair is selected from the key pairs registered with Amazon EC2 in the selected Region.

Choose the key pair:

Allowed values for EC2 Key Pair Name:

1. your-key-1
2. your-key-2

EC2 Key Pair Name [your-key-1]:

Choose the scheduler to use with your cluster.

Allowed values for Scheduler:

1. slurm 2. awsbatch

Scheduler [slurm]: 2

When awsbatch is selected as the scheduler, alinux2 is used as the operating system. Enter the head node instance type:

Head node instance type [t2.micro]:

Choose the queue configuration. The AWS Batch scheduler only contains a single queue. Enter the maximum size of the cluster of compute nodes, measured in vCPUs.

Number of queues [1]:

Name of queue 1 [queue1]:

Maximum vCPU [10]:

Decide whether to use an existing VPC or let AWS ParallelCluster create a VPC for you. If you don't have a properly configured VPC, AWS ParallelCluster can create a new one. The new VPC either places the head node and compute nodes in the same public subnet, or places only the head node in a public subnet and all compute nodes in a private subnet. It's possible to reach your quota on the number of VPCs allowed in a Region. The default quota is five VPCs for a Region. For more information about this quota and how to request an increase, see VPC and subnets in the Amazon VPC User Guide.

Important

VPCs created by AWS ParallelCluster do not enable VPC Flow Logs by default. VPC Flow Logs enable you to capture information about the IP traffic going to and from network interfaces in your VPCs. For more information, see VPC Flow Logs in the Amazon VPC User Guide.

If you let AWS ParallelCluster create a VPC, you must decide whether all nodes should be in a public subnet.

Automate VPC creation? (y/n) [n]: y
Allowed values for Availability Zone:
1. us-east-1a
2. us-east-1b
3. us-east-1c
4. us-east-1d
5. us-east-1e
6. us-east-1f
Availability Zone [us-east-1a]:
Allowed values for Network Configuration:
1. Head node in a public subnet and compute fleet in a private subnet
2. Head node and compute fleet in the same public subnet
Network Configuration [Head node in a public subnet and compute fleet in a private subnet]: 1
Beginning VPC creation. Please do not leave the terminal until the creation is finalized

If you don't create a new VPC, you must select an existing VPC.

If you choose to have AWS ParallelCluster create the VPC, make a note of the VPC ID so you can use the AWS CLI or AWS Management Console to delete it later.

Automate VPC creation? (y/n) [n]: n
Allowed values for VPC ID:
  #  id                     name                               number_of_subnets
---  ---------------------  ---------------------------------  -----------------
  1  vpc-0b4ad9c4678d3c7ad  ParallelClusterVPC-20200118031893  2
  2  vpc-0e87c753286f37eef  ParallelClusterVPC-20191118233938  5
VPC ID [vpc-0b4ad9c4678d3c7ad]: 1

After the VPC has been selected, decide whether to use existing subnets or create new ones.

Automate Subnet creation? (y/n) [y]: y

Creating CloudFormation stack...
Do not leave the terminal until the process has finished

When you have completed the preceding steps, a simple cluster launches into a VPC. The VPC uses an existing subnet that supports public IP addresses. The route table for the subnet is 0.0.0.0/0 => igw-xxxxxx. Note the following conditions:

• The VPC must have DNS Resolution = yes and DNS Hostnames = yes.

• The VPC must also have DHCP options with the correct domain-name for the Region. The default DHCP Option Set already specifies the required AmazonProvidedDNS. If specifying more than one domain name server, see DHCP options sets in the Amazon VPC User Guide. When using private subnets, use a NAT gateway or an internal proxy to enable web access for compute nodes. For more information, see Network configurations (p. 48).

When all settings contain valid values, you can launch the cluster by running the create command.

$ pcluster create-cluster --cluster-name test-cluster --cluster-configuration cluster-config.yaml
{
  "cluster": {
    "clusterName": "test-cluster",
    "cloudformationStackStatus": "CREATE_IN_PROGRESS",
    "cloudformationStackArn": "arn:aws:cloudformation:eu-west-1:xxx:stack/test-cluster/abcdef0-f678-890a-5abc-021345abcdef",
    "region": "eu-west-1",
    "version": "3.1.2",
    "clusterStatus": "CREATE_IN_PROGRESS"
  },
  "validationMessages": []
}

Follow cluster progress:

$ pcluster describe-cluster --cluster-name test-cluster

or

$ pcluster list-clusters --query 'items[?clusterName==`test-cluster`]'

After the cluster reaches the "clusterStatus": "CREATE_COMPLETE" status, you can connect to it by using your normal SSH client settings. For more information about connecting to Amazon EC2 instances, see the Amazon EC2 User Guide for Linux Instances. Alternatively, you can connect to the cluster by using the pcluster ssh command.

$ pcluster ssh --cluster-name test-cluster -i ~/path/to/keyfile.pem
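The progress check described above can be scripted so that you connect only after creation finishes. The following sketch is one possible approach, with these assumptions: both function names are hypothetical, the status is extracted with sed rather than a JSON parser, and any status other than CREATE_IN_PROGRESS or CREATE_COMPLETE is treated as a failure.

```shell
#!/bin/bash
# Hypothetical helper: read describe-cluster JSON on standard input and print
# the value of the "clusterStatus" field.
cluster_status_from_json() {
  sed -n 's/.*"clusterStatus"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: poll until the cluster leaves CREATE_IN_PROGRESS.
# $1 = cluster name, $2 = seconds between checks (defaults to 30).
wait_for_cluster() {
  local name="$1" delay="${2:-30}" status
  while true; do
    status="$(pcluster describe-cluster --cluster-name "$name" | cluster_status_from_json)"
    case "$status" in
      CREATE_COMPLETE)    echo "$status"; return 0 ;;
      CREATE_IN_PROGRESS) sleep "$delay" ;;
      *)                  echo "unexpected status: ${status:-unknown}" >&2; return 1 ;;
    esac
  done
}

# Example use:
#   wait_for_cluster test-cluster && pcluster ssh --cluster-name test-cluster -i ~/path/to/keyfile.pem
```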

To delete the cluster, run the following command.

$ pcluster delete-cluster --region us-east-1 --cluster-name test-cluster

After the cluster is deleted, you can delete the network resources in the VPC by deleting the CloudFormation networking stack. The stack's name starts with "parallelclusternetworking-" and contains the creation time in "YYYYMMDDHHMMSS" format. You can list the stacks using the list-stacks command.

$ aws --region us-east-1 cloudformation list-stacks \
    --stack-status-filter "CREATE_COMPLETE" \
    --query "StackSummaries[].StackName" | \
    grep -e "parallelclusternetworking-"
"parallelclusternetworking-pubpriv-20191029205804"

The stack can be deleted using the delete-stack command.

$ aws --region us-east-1 cloudformation delete-stack \

--stack-name parallelclusternetworking-pubpriv-20191029205804
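When several networking stacks exist, the naming convention described above can help you pick the right one. The following Python sketch filters stack names by the "parallelclusternetworking-" prefix and reads the embedded creation time. It assumes the timestamp is the last hyphen-separated token, as in the example name; the helper names are hypothetical.

```python
from datetime import datetime

PREFIX = "parallelclusternetworking-"


def networking_stack_created_at(stack_name):
    """Return the creation time embedded in a networking stack name, or None."""
    if not stack_name.startswith(PREFIX):
        return None
    # Assumption: the YYYYMMDDHHMMSS stamp is the last hyphen-separated token.
    timestamp = stack_name.rsplit("-", 1)[-1]
    try:
        return datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    except ValueError:
        return None


def networking_stacks(stack_names):
    """Keep only networking stacks, ordered from oldest to newest."""
    dated = [(networking_stack_created_at(name), name) for name in stack_names]
    return [name for created, name in sorted(d for d in dated if d[0] is not None)]
```

The list of names could come from the list-stacks output shown earlier.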

The VPC that pcluster configure (p. 84) creates for you isn't created in the CloudFormation networking stack. You can delete that VPC manually in the console or by using the AWS CLI.

$ aws --region us-east-1 ec2 delete-vpc --vpc-id vpc-0b4ad9c4678d3c7ad

Best practices

Best practices: head node instance type selection

Although the head node doesn't run any jobs, its functions and its sizing are crucial to the overall performance of the cluster. When choosing the instance type to use for your head node, evaluate the following items:

Cluster size: The head node orchestrates the scaling logic of the cluster and is responsible for attaching new nodes to the scheduler. If you need to scale the cluster up and down by a considerable number of nodes, give the head node some extra compute capacity.

Shared file systems: When using shared file systems to share artifacts between compute nodes and the head node, take into account that the head node is the node exposing the NFS server. For this reason, you want to choose an instance type with enough network bandwidth and enough dedicated Amazon EBS bandwidth to handle your workflows.

Best practices: network performance

Network performance is critical to ensuring high performance computing (HPC) applications perform as expected. We recommend these three best practices to optimize your network performance.

• Placement group: a cluster placement group is a logical grouping of instances within a single Availability Zone. For more information on placement groups, see placement groups in the Amazon EC2 User Guide for Linux Instances. If you are using Slurm, you can conﬁgure each Slurm queue to use a cluster placement group by specifying a PlacementGroup in the queue's Networking (p. 117) settings.

Networking:
  PlacementGroup:
    Enabled: true
    Id: your-placement-group-name

Or let AWS ParallelCluster create a placement group with:

Networking:
  PlacementGroup:
    Enabled: true

For more information, see Networking (p. 117).


• Enhanced networking: consider choosing an instance type that supports enhanced networking. This applies to all current generation instances. For more information, see enhanced networking on Linux in the Amazon EC2 User Guide for Linux Instances.

• Instance bandwidth: bandwidth scales with instance size, so choose the instance type that best suits your needs. For more information, see Amazon EBS–optimized instances and Amazon EBS volume types in the Amazon EC2 User Guide for Linux Instances.

Moving from AWS ParallelCluster 2.x to 3.x

Custom Bootstrap Actions

With AWS ParallelCluster 3, you can specify different custom bootstrap action scripts for the head node and compute nodes using the OnNodeStart (pre_install in AWS ParallelCluster version 2) and OnNodeConfigured (post_install in AWS ParallelCluster version 2) parameters in the HeadNode (p. 103) and Scheduling/SlurmQueues (p. 112) sections. For more information, see Custom Bootstrap Actions (p. 57).

Custom bootstrap actions scripts that are developed for AWS ParallelCluster 2 must be adapted to be used in AWS ParallelCluster 3:

• We don't recommend using /etc/parallelcluster/cfnconfig and cfn_node_type to differentiate between head and compute nodes. Instead, we recommend that you specify two different scripts in the HeadNode and Scheduling/SlurmQueues sections.

• If you prefer to continue loading /etc/parallelcluster/cfnconfig for use in your bootstrap actions script, note the value of cfn_node_type is changed from "MasterServer" to "HeadNode" (see: Inclusive language (p. 22)).

• On AWS ParallelCluster 2, the first input argument to bootstrap action scripts was the S3 URL to the script and was reserved. In AWS ParallelCluster 3, only the arguments configured in the configuration are passed to the scripts.

Warning

Using internal variables provided through the /etc/parallelcluster/cfnconfig file isn't officially supported. This file might be removed as part of a future release.
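The change in argument layout described above can be illustrated with a small Python sketch for a bootstrap script written in Python. The function name and the legacy_v2_layout flag are hypothetical; the only behavior taken from this guide is that version 2 reserved the first argument for the script's S3 URL, while version 3 passes only the configured arguments.

```python
def configured_args(argv, legacy_v2_layout=False):
    """Return the user-configured arguments of a bootstrap script.

    argv[0] is the script itself. Under the version 2 layout, argv[1] is
    the reserved S3 URL of the script, so user arguments start at argv[2].
    Under the version 3 layout, user arguments start at argv[1].
    """
    first_user_arg = 2 if legacy_v2_layout else 1
    return list(argv[first_user_arg:])

# In a script, call configured_args(sys.argv) after importing sys.
```

A script that indexed its arguments starting from the second position in version 2 needs that offset removed when it moves to version 3.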

AWS ParallelCluster 2.x and 3.x use different configuration file syntax

AWS ParallelCluster 3.x configuration uses YAML syntax. The full reference can be found at Configuration files (p. 101).

In addition to requiring a YAML file format, a number of configuration sections, settings, and parameter values have been updated in AWS ParallelCluster 3.x. In this section, we note key changes to the AWS ParallelCluster configuration along with side-by-side examples illustrating these differences across each version of AWS ParallelCluster.

Example of multiple scheduler queues configuration with hyperthreading enabled and disabled

AWS ParallelCluster 2:

[cluster default]
queue_settings = ht-enabled, ht-disabled
...

[queue ht-enabled]
compute_resource_settings = ht-enabled-i1
disable_hyperthreading = false

[queue ht-disabled]
compute_resource_settings = ht-disabled-i1
disable_hyperthreading = true

[compute_resource ht-enabled-i1]
instance_type = c5n.18xlarge

[compute_resource ht-disabled-i1]
instance_type = c5.xlarge

AWS ParallelCluster 3:

...
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: ht-enabled
      Networking:
        SubnetIds:
          - compute_subnet_id
      ComputeResources:
        - Name: ht-enabled-i1
          DisableSimultaneousMultithreading: false
          InstanceType: c5n.18xlarge
    - Name: ht-disabled
      Networking:
        SubnetIds:
          - compute_subnet_id
      ComputeResources:
        - Name: ht-disabled-i1
          DisableSimultaneousMultithreading: true
          InstanceType: c5.xlarge
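To make the correspondence between the two formats concrete, the following Python sketch reads the version 2 queue sections with the standard configparser module and builds the equivalent version 3 structure as a dictionary. It is an illustration only, not a migration tool: the function name is hypothetical, the subnet ID is a placeholder, and it assumes queues belong under the Scheduling/SlurmQueues section with disable_hyperthreading mapping directly to DisableSimultaneousMultithreading.

```python
import configparser

V2_CONFIG = """
[cluster default]
queue_settings = ht-enabled, ht-disabled

[queue ht-enabled]
compute_resource_settings = ht-enabled-i1
disable_hyperthreading = false

[queue ht-disabled]
compute_resource_settings = ht-disabled-i1
disable_hyperthreading = true

[compute_resource ht-enabled-i1]
instance_type = c5n.18xlarge

[compute_resource ht-disabled-i1]
instance_type = c5.xlarge
"""


def v2_queues_to_v3(v2_text, subnet_id="compute_subnet_id"):
    """Build a version 3 style Scheduling dict from version 2 queue sections."""
    parser = configparser.ConfigParser()
    parser.read_string(v2_text)
    queue_names = parser["cluster default"]["queue_settings"].split(",")
    queues = []
    for queue_name in (name.strip() for name in queue_names):
        queue = parser[f"queue {queue_name}"]
        # The v2 flag is set per queue; v3 sets it per compute resource.
        disable_smt = queue.getboolean("disable_hyperthreading", fallback=False)
        resources = []
        for resource_name in queue["compute_resource_settings"].split(","):
            resource_name = resource_name.strip()
            resource = parser[f"compute_resource {resource_name}"]
            resources.append({
                "Name": resource_name,
                "DisableSimultaneousMultithreading": disable_smt,
                "InstanceType": resource["instance_type"],
            })
        queues.append({
            "Name": queue_name,
            "Networking": {"SubnetIds": [subnet_id]},
            "ComputeResources": resources,
        })
    return {"Scheduling": {"Scheduler": "slurm", "SlurmQueues": queues}}
```

The resulting dictionary can be serialized to YAML with a library of your choice before it is placed in a cluster configuration file.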

Example of new FSx for Lustre file-system configuration

AWS ParallelCluster 2:

[cluster default]
fsx_settings = fsx
...

[fsx fsx]
shared_dir = /shared-fsx
storage_capacity = 1200
imported_file_chunk_size = 1024
import_path = s3://bucket
export_path = s3://bucket/export_dir
weekly_maintenance_start_time = 3:02:30
deployment_type = PERSISTENT_1
data_compression_type = LZ4

AWS ParallelCluster 3:

...
SharedStorage:
  - Name: fsx
    MountDir: /shared-fsx
    StorageType: FsxLustre
    FsxLustreSettings:
      StorageCapacity: 1200
      ImportedFileChunkSize: 1024
      ImportPath: s3://bucket
      ExportPath: s3://bucket/export_dir
      WeeklyMaintenanceStartTime: "3:02:30"
      DeploymentType: PERSISTENT_1
      DataCompressionType: LZ4

Example of a cluster configuration mounting an existing FSx for Lustre file-system

AWS ParallelCluster 2:

[cluster default]
fsx_settings = fsx
...

[fsx fsx]
shared_dir = /shared-fsx
fsx_fs_id = fsx_fs_id

AWS ParallelCluster 3:

...
SharedStorage:
  - Name: fsx
    MountDir: /shared-fsx
    StorageType: FsxLustre
    FsxLustreSettings:
      FileSystemId: fsx_fs_id

Example of a cluster with the Intel HPC Platform Specification software stack

AWS ParallelCluster 2:

[cluster default]
enable_intel_hpc_platform = true
...

AWS ParallelCluster 3:

...
AdditionalPackages:
  IntelSoftware:
    IntelHpcPlatform: true

Notes:

• The installation of Intel HPC Platform Specification software is subject to the terms and conditions of the applicable Intel End User License Agreement.

Example of custom IAM configurations including: instance profile, instance role, additional policies for instances and the role for the lambda functions associated to the cluster

AWS ParallelCluster 2:

[cluster default]
additional_iam_policies = arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess,arn:aws:iam::aws:policy/AmazonDynamoDBReadOnlyAccess
ec2_iam_role = ec2_iam_role
iam_lambda_role = lambda_iam_role
...

AWS ParallelCluster 3:

...
Iam:
  Roles:
    CustomLambdaResources: lambda_iam_role
HeadNode:
  ...
  Iam:
    InstanceRole: ec2_iam_role
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: queue1
      ...
      Iam:
        InstanceProfile: iam_instance_profile
    - Name: queue2
      ...
      Iam:
        AdditionalIamPolicies:
          - Policy: arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
          - Policy: arn:aws:iam::aws:policy/AmazonDynamoDBReadOnlyAccess

Notes:

• For AWS ParallelCluster 2, the IAM settings are applied to all the instances of a cluster, and additional_iam_policies can't be used in conjunction with ec2_iam_role.

• For AWS ParallelCluster 3, you can have different IAM settings for head and compute nodes and even specify different IAM settings for each compute queue.

• For AWS ParallelCluster 3, you can use an IAM instance profile as an alternative to an IAM role. InstanceProfile, InstanceRole or AdditionalIamPolicies can't be configured together.
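The last note can be checked before a configuration is submitted. The following Python sketch takes one Iam section, already parsed into a dictionary, and reports which of the three mutually exclusive settings are set together. The function name is hypothetical, and the check is only an illustration of the rule stated above, not the validation that AWS ParallelCluster itself performs.

```python
EXCLUSIVE_IAM_KEYS = ("InstanceProfile", "InstanceRole", "AdditionalIamPolicies")


def conflicting_iam_keys(iam_section):
    """Return the mutually exclusive keys that are set together, if any."""
    present = [key for key in EXCLUSIVE_IAM_KEYS if iam_section.get(key)]
    # One key on its own is fine; two or more conflict with each other.
    return present if len(present) > 1 else []
```

For example, a queue that sets only InstanceProfile yields an empty list, while a queue that sets both InstanceRole and AdditionalIamPolicies yields both names.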

Example of custom bootstrap actions

AWS ParallelCluster 2:

[cluster default]
s3_read_resource = arn:aws:s3:::bucket_name/*
pre_install = s3://bucket_name/scripts/pre_install.sh
pre_install_args = 'R curl wget'
post_install = s3://bucket_name/scripts/post_install.sh
post_install_args = "R curl wget"
...

AWS ParallelCluster 3:

...
HeadNode:
  ...
  CustomActions:
    OnNodeStart:
      Script: s3://bucket_name/scripts/pre_install.sh
      Args:
        - R
        - curl
        - wget
    OnNodeConfigured:
      Script: s3://bucket_name/scripts/post_install.sh
      Args: ['R', 'curl', 'wget']
  Iam:
    S3Access:
      - BucketName: bucket_name
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: queue1
      ...
      CustomActions:
        OnNodeStart:
          Script: s3://bucket_name/scripts/pre_install.sh
          Args: ['R', 'curl', 'wget']
        OnNodeConfigured:
          Script: s3://bucket_name/scripts/post_install.sh
          Args: ['R', 'curl', 'wget']
      Iam:
        S3Access:
          - BucketName: bucket_name
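In the example above, the single quoted string used for pre_install_args and post_install_args in version 2 becomes a list under Args in version 3. A minimal Python sketch of that conversion, using the standard shlex module, might look as follows. The function name is hypothetical, and the sketch assumes the version 2 value is a whitespace-separated string optionally wrapped in one pair of quotes.

```python
import shlex


def v2_args_to_v3(value):
    """Split a version 2 *_install_args string into a version 3 Args list."""
    value = value.strip()
    # Drop one pair of matching outer quotes, as written in the v2 file.
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        value = value[1:-1]
    return shlex.split(value)
```

Both quoting styles shown for version 2 produce the same three-element list.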

Example of a cluster with read and write access to the S3 bucket resources

AWS ParallelCluster 2:

[cluster default]
s3_read_resource = arn:aws:s3:::bucket/read_only/*
s3_read_write_resource = arn:aws:s3:::bucket/read_and_write/*
...

AWS ParallelCluster 3:

...
HeadNode:
  ...
  Iam:
    S3Access:
      - BucketName: bucket_name
        KeyName: read_only/
        EnableWriteAccess: False
      - BucketName: bucket_name
        KeyName: read_and_write/
        EnableWriteAccess: True
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: queue1
      ...
      Iam:
        S3Access:
          - BucketName: bucket_name
            KeyName: read_only/
            EnableWriteAccess: False
          - BucketName: bucket_name
            KeyName: read_and_write/
            EnableWriteAccess: True
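The mapping from a version 2 S3 resource ARN to a version 3 S3Access entry can be sketched in Python as follows. The function name is hypothetical. The sketch assumes the standard "arn:aws" partition and mirrors the example above by dropping the trailing wildcard from the key prefix; whether KeyName needs a wildcard in your version is not stated in this section, so check the configuration reference before relying on it.

```python
S3_ARN_PREFIX = "arn:aws:s3:::"


def s3_access_entry(resource_arn, write_access):
    """Build one S3Access entry from a version 2 s3_*_resource ARN."""
    if not resource_arn.startswith(S3_ARN_PREFIX):
        raise ValueError(f"not an S3 ARN in the aws partition: {resource_arn}")
    bucket, _, key = resource_arn[len(S3_ARN_PREFIX):].partition("/")
    entry = {"BucketName": bucket}
    # Assumption: follow the example above and keep only the key prefix.
    key = key.rstrip("*")
    if key:
        entry["KeyName"] = key
    entry["EnableWriteAccess"] = write_access
    return entry
```

Use write_access=False for values taken from s3_read_resource and write_access=True for values taken from s3_read_write_resource.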

Inclusive language

AWS ParallelCluster 3 uses the words "head node" in places where "master" was used in AWS ParallelCluster 2. This includes the following:


• Variable exported in the AWS Batch job environment changed: from MASTER_IP to PCLUSTER_HEAD_NODE_IP.

• All AWS CloudFormation outputs changed from Master* to HeadNode*

• All NodeType and tags changed from Master to HeadNode.
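Job code that must run under both versions during a migration can look up the new variable name first and fall back to the old one. The following Python sketch shows one way to do that; the function name is hypothetical, and only the two variable names come from this guide.

```python
import os


def head_node_ip(environ=os.environ):
    """Return the head node IP from the AWS Batch job environment.

    Prefers the version 3 name and falls back to the version 2 name.
    """
    for name in ("PCLUSTER_HEAD_NODE_IP", "MASTER_IP"):
        value = environ.get(name)
        if value:
            return value
    raise KeyError("neither PCLUSTER_HEAD_NODE_IP nor MASTER_IP is set")
```

Once all clusters run version 3, the fallback can be removed.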

Scheduler Support

AWS ParallelCluster 3.x doesn't support Son of Grid Engine (SGE) and Torque schedulers.

The AWS Batch commands awsbhosts, awsbkill, awsbout, awsbqueues, awsbstat, and awsbsub are distributed as a separate aws-parallelcluster-awsbatch-cli PyPI package. This package is installed by AWS ParallelCluster on the head node. You can still use these AWS Batch commands from the cluster's head node. However, if you wish to use AWS Batch commands from a location other than the head node, you must ﬁrst install the aws-parallelcluster-awsbatch-cli PyPI package.

AWS ParallelCluster CLI

The AWS ParallelCluster command line interface (CLI) has been changed. The new syntax is described in AWS ParallelCluster CLI commands (p. 81). The output format for the CLI is a JSON string.

Conﬁguring a new cluster

The pcluster configure command includes diﬀerent parameters in AWS ParallelCluster 3 as compared to AWS ParallelCluster 2. For more information, see pcluster configure (p. 84).

Note also that the configuration file syntax has changed from AWS ParallelCluster 2. For a full reference of the cluster configuration settings, see Cluster configuration file (p. 101).

Creating a new cluster

AWS ParallelCluster 2's pcluster create command has been replaced by the pcluster create-cluster (p. 84) command.

Note that the default behavior in AWS ParallelCluster 2.x, without the -nw option, is to wait on cluster creation events, whereas the AWS ParallelCluster 3.x command returns immediately. You can monitor the progress of the cluster creation by using pcluster describe-cluster (p. 96).

An AWS ParallelCluster 3 configuration file contains a single cluster definition, so the -t parameter is no longer needed.

The following example compares the two commands.

# AWS ParallelCluster v2
$ pcluster create \
    -r REGION \
    -c V2_CONFIG_FILE \
    -nw \
    -t CLUSTER_TEMPLATE \
    CLUSTER_NAME

# AWS ParallelCluster v3
$ pcluster create-cluster \
    --region REGION \
    --cluster-configuration V3_CONFIG_FILE \
    --cluster-name CLUSTER_NAME
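If you have wrapper scripts that still assemble version 2 style arguments, the flag mapping shown above can be sketched in Python with argparse. The function name is hypothetical. The sketch handles only the flags that appear in this example, drops -nw and -t because they have no version 3 counterpart here, and expects the version 3 configuration file to be supplied separately because the version 2 file can't be reused.

```python
import argparse


def v2_create_to_v3(v2_args, v3_config_file):
    """Translate `pcluster create` arguments into a create-cluster command."""
    parser = argparse.ArgumentParser(prog="pcluster create")
    parser.add_argument("-r", dest="region")
    parser.add_argument("-c", dest="v2_config_file")   # not reusable in v3
    parser.add_argument("-nw", action="store_true")    # v3 never waits
    parser.add_argument("-t", dest="cluster_template")  # no longer needed
    parser.add_argument("cluster_name")
    parsed = parser.parse_args(v2_args)

    command = ["pcluster", "create-cluster"]
    if parsed.region:
        command += ["--region", parsed.region]
    command += [
        "--cluster-configuration", v3_config_file,
        "--cluster-name", parsed.cluster_name,
    ]
    return command
```

The returned list can be passed to subprocess.run when you are ready to create the cluster.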

Listing clusters

The AWS ParallelCluster 2.x pcluster list command is replaced by the pcluster list-clusters (p. 89) command.

Note: You need the AWS ParallelCluster v2 CLI to list clusters created with 2.x versions of AWS ParallelCluster. See Install AWS ParallelCluster in a virtual environment (recommended) (p. 3) for how to install multiple versions of AWS ParallelCluster using virtual environments.

# AWS ParallelCluster v2
$ pcluster list -r REGION

# AWS ParallelCluster v3
$ pcluster list-clusters --region REGION

Starting and Stopping a cluster

The AWS ParallelCluster 2.x pcluster start and pcluster stop commands are replaced by the pcluster update-compute-fleet (p. 92) command.

Starting a compute fleet:

# AWS ParallelCluster v2
$ pcluster start \
    -r REGION \
    CLUSTER_NAME

# AWS ParallelCluster v3 - Slurm fleets
$ pcluster update-compute-fleet \
    --region REGION \
    --cluster-name CLUSTER_NAME \
    --status START_REQUESTED

# AWS ParallelCluster v3 - AWS Batch fleets
$ pcluster update-compute-fleet \
    --region REGION \
    --cluster-name CLUSTER_NAME \
    --status ENABLED

Stopping a compute fleet:

# AWS ParallelCluster v2
$ pcluster stop \
    -r REGION \
    CLUSTER_NAME

# AWS ParallelCluster v3 - Slurm fleets
$ pcluster update-compute-fleet \
    --region REGION \
    --cluster-name CLUSTER_NAME \
    --status STOP_REQUESTED

# AWS ParallelCluster v3 - AWS Batch fleets
$ pcluster update-compute-fleet \
    --region REGION \
    --cluster-name CLUSTER_NAME \
    --status DISABLED
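Because the --status value depends on both the scheduler and the action, a small lookup table can keep scripts readable. The following Python sketch builds the update-compute-fleet command from the four status values shown above. The function name and the "slurm", "awsbatch", "start", and "stop" keys are illustrative labels chosen for this sketch.

```python
# Status values taken from the examples above; the keys are labels
# invented for this sketch.
FLEET_STATUS = {
    ("slurm", "start"): "START_REQUESTED",
    ("slurm", "stop"): "STOP_REQUESTED",
    ("awsbatch", "start"): "ENABLED",
    ("awsbatch", "stop"): "DISABLED",
}


def update_compute_fleet_command(scheduler, action, region, cluster_name):
    """Return the pcluster command that starts or stops a compute fleet."""
    status = FLEET_STATUS[(scheduler, action)]
    return [
        "pcluster", "update-compute-fleet",
        "--region", region,
        "--cluster-name", cluster_name,
        "--status", status,
    ]
```

An unknown scheduler or action raises KeyError, which makes a typo visible before any command is run.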

Connecting to a cluster

The pcluster ssh command has different parameter names in AWS ParallelCluster 3.x than in AWS ParallelCluster 2.x. See pcluster ssh (p. 91).

Connecting to a cluster:

# AWS ParallelCluster v2
$ pcluster ssh \
    -r REGION \
    CLUSTER_NAME \
    -i ~/.ssh/id_rsa

# AWS ParallelCluster v3
$ pcluster ssh \
    --region REGION \
    --cluster-name CLUSTER_NAME \
    -i ~/.ssh/id_rsa

IMDS conﬁguration update

Starting with version 3.0.0, AWS ParallelCluster introduced support for restricting access to the head node's IMDS (and the instance profile credentials) to a subset of superusers, by default. For more information, see Imds Properties (p. 111).

Supported Regions for AWS ParallelCluster version 3

AWS ParallelCluster version 3 is available in the following AWS Regions:

Region Name                  Region
US East (Ohio)               us-east-2
US East (N. Virginia)        us-east-1
US West (N. California)      us-west-1
US West (Oregon)             us-west-2
Africa (Cape Town)           af-south-1
Asia Pacific (Hong Kong)     ap-east-1
Asia Pacific (Mumbai)        ap-south-1
Asia Pacific (Seoul)         ap-northeast-2
Asia Pacific (Singapore)     ap-southeast-1
Asia Pacific (Sydney)        ap-southeast-2
Asia Pacific (Tokyo)         ap-northeast-1
Canada (Central)             ca-central-1
China (Beijing)              cn-north-1
China (Ningxia)              cn-northwest-1
Europe (Frankfurt)           eu-central-1
Europe (Ireland)             eu-west-1
Europe (London)              eu-west-2
Europe (Milan)               eu-south-1
Europe (Paris)               eu-west-3
Europe (Stockholm)           eu-north-1
Middle East (Bahrain)        me-south-1
South America (São Paulo)    sa-east-1
AWS GovCloud (US-East)       us-gov-east-1
AWS GovCloud (US-West)       us-gov-west-1

Using AWS ParallelCluster

Topics

• AWS Identity and Access Management roles in AWS ParallelCluster 3.x (p. 26)

• Network conﬁgurations (p. 48)

• Custom Bootstrap Actions (p. 57)

• Schedulers supported by AWS ParallelCluster (p. 59)

• Conﬁguration of Multiple Queues (p. 71)

• AWS ParallelCluster API (p. 72)

• Connect to the head node through NICE DCV (p. 77)

• Using pcluster update-cluster (p. 78)

AWS Identity and Access Management roles in AWS ParallelCluster 3.x

AWS ParallelCluster uses AWS Identity and Access Management (IAM) roles to control permissions that are associated with the AWS resources deployed to the AWS account. In AWS ParallelCluster we can identify two types of IAM roles: the one that is assumed by the user that invokes the CLI commands and the ones that are associated with AWS ParallelCluster resources, such as the EC2 instances launched in a cluster.

By default, AWS ParallelCluster takes care of creating all needed IAM roles that are conﬁgured with the minimal set of policies required by AWS ParallelCluster resources. However, the user that invokes the various AWS ParallelCluster operations must have the right level of permissions to create or modify all of the necessary resources.

Topics

• Using existing IAM roles with AWS ParallelCluster (p. 26)

• AWS ParallelCluster example user policies (p. 27)

• AWS ParallelCluster parameters to control IAM permissions (p. 39)

Using existing IAM roles with AWS ParallelCluster

You can use existing IAM roles when creating a cluster or building a custom EC2 image. Typically, you choose existing IAM roles to fully control the permissions that are granted to AWS ParallelCluster resources and to the users of the cluster. The following examples show the IAM policies and roles that are required to both invoke AWS ParallelCluster features and customize permissions associated with cluster EC2 instances.

In the policies, replace <REGION>, <AWS ACCOUNT ID>, and similar strings with the appropriate values.

AWS ParallelCluster example user policies

The AWS ParallelCluster user role refers to the IAM role assumed by the user of the AWS ParallelCluster CLI.

The following example policies include Amazon Resource Names (ARNs) for the resources. If you're working in the AWS GovCloud (US) or AWS China partitions, the ARNs must be changed. Specifically, they must be changed from "arn:aws" to "arn:aws-us-gov" for the AWS GovCloud (US) partition or "arn:aws-cn" for the AWS China partition. For more information, see Amazon Resource Names (ARNs) in AWS GovCloud (US) Regions in the AWS GovCloud (US) User Guide and ARNs for AWS services in China in Getting Started with AWS services in China.
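The placeholder and partition substitutions described above can be applied to a policy document with a short Python sketch. The function names are hypothetical, and the sketch infers the partition from the Region name prefix ("us-gov-" and "cn-"), which matches the Regions listed in this guide but is an assumption rather than an AWS API.

```python
PARTITION_BY_REGION_PREFIX = {
    "us-gov-": "aws-us-gov",
    "cn-": "aws-cn",
}


def partition_for(region):
    """Return the ARN partition for a Region name (assumed from its prefix)."""
    for prefix, partition in PARTITION_BY_REGION_PREFIX.items():
        if region.startswith(prefix):
            return partition
    return "aws"


def localize_policy(policy_text, region, account_id):
    """Fill in the placeholders and switch the ARN partition if needed."""
    text = policy_text.replace("<AWS ACCOUNT ID>", account_id)
    text = text.replace("<REGION>", region)
    # Only the leading "arn:aws:" changes; "::aws:policy/" stays as it is.
    return text.replace("arn:aws:", f"arn:{partition_for(region)}:")
```

Review the resulting policy before attaching it, because the sketch performs plain text substitution and doesn't validate the document.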

Topics

• Base user policy required to invoke AWS ParallelCluster features (p. 27)

• Additional user policy when using AWS Batch scheduler (p. 31)

• User Policy to use AWS ParallelCluster image build features (p. 32)

• User Policy to manage IAM resources (p. 35)

Base user policy required to invoke AWS ParallelCluster features

The following policy shows the permissions required to execute AWS ParallelCluster commands.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "ec2:Describe*"
      ],
      "Resource": "*",
      "Effect": "Allow",
      "Sid": "EC2Read"
    },
    {
      "Action": [
        "ec2:AllocateAddress",
        "ec2:AssociateAddress",
        "ec2:AttachNetworkInterface",
        "ec2:AuthorizeSecurityGroupEgress",
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:CreateLaunchTemplate",
        "ec2:CreateLaunchTemplateVersion",
        "ec2:CreateNetworkInterface",
        "ec2:CreatePlacementGroup",
        "ec2:CreateSecurityGroup",
        "ec2:CreateSnapshot",
        "ec2:CreateTags",
        "ec2:CreateVolume",
        "ec2:DeleteLaunchTemplate",
        "ec2:DeleteNetworkInterface",
        "ec2:DeletePlacementGroup",
        "ec2:DeleteSecurityGroup",
        "ec2:DeleteVolume",
        "ec2:DisassociateAddress",
        "ec2:ModifyLaunchTemplate",
        "ec2:ModifyNetworkInterfaceAttribute",
        "ec2:ModifyVolume",
        "ec2:ModifyVolumeAttribute",
        "ec2:ReleaseAddress",
        "ec2:RevokeSecurityGroupEgress",
        "ec2:RevokeSecurityGroupIngress",
        "ec2:RunInstances",
        "ec2:TerminateInstances"
      ],
      "Resource": "*",
      "Effect": "Allow",
      "Sid": "EC2Write"
    },
    {
      "Action": [
        "dynamodb:DescribeTable",
        "dynamodb:ListTagsOfResource",
        "dynamodb:CreateTable",
        "dynamodb:DeleteTable",
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:Query",
        "dynamodb:TagResource"
      ],
      "Resource": "arn:aws:dynamodb:*:<AWS ACCOUNT ID>:table/parallelcluster-*",
      "Effect": "Allow",
      "Sid": "DynamoDB"
    },
    {
      "Action": [
        "route53:ChangeResourceRecordSets",
        "route53:ChangeTagsForResource",
        "route53:CreateHostedZone",
        "route53:DeleteHostedZone",
        "route53:GetChange",
        "route53:GetHostedZone",
        "route53:ListResourceRecordSets",
        "route53:ListQueryLoggingConfigs"
      ],
      "Resource": "*",
      "Effect": "Allow",
      "Sid": "Route53HostedZones"
    },
    {
      "Action": [
        "cloudformation:*"
      ],
      "Resource": "*",
      "Effect": "Allow",
      "Sid": "CloudFormation"
    },
    {
      "Action": [
        "cloudwatch:PutDashboard",
        "cloudwatch:ListDashboards",
        "cloudwatch:DeleteDashboards",
        "cloudwatch:GetDashboard"
      ],
      "Resource": "*",
      "Effect": "Allow",
      "Sid": "CloudWatch"
    },
    {
      "Action": [
        "iam:GetRole",
        "iam:GetRolePolicy",
        "iam:GetPolicy",
        "iam:SimulatePrincipalPolicy",
        "iam:GetInstanceProfile"
      ],
      "Resource": [
        "arn:aws:iam::<AWS ACCOUNT ID>:role/*",
        "arn:aws:iam::<AWS ACCOUNT ID>:policy/*",
        "arn:aws:iam::aws:policy/*",
        "arn:aws:iam::<AWS ACCOUNT ID>:instance-profile/*"
      ],
      "Effect": "Allow",
      "Sid": "IamRead"
    },
    {
      "Action": [
        "iam:CreateInstanceProfile",
        "iam:DeleteInstanceProfile",
        "iam:AddRoleToInstanceProfile",
        "iam:RemoveRoleFromInstanceProfile"
      ],
      "Resource": [
        "arn:aws:iam::<AWS ACCOUNT ID>:instance-profile/parallelcluster/*"
      ],
      "Effect": "Allow",
      "Sid": "IamInstanceProfile"
    },
    {
      "Condition": {
        "StringEqualsIfExists": {
          "iam:PassedToService": [
            "lambda.amazonaws.com",
            "ec2.amazonaws.com",
            "spotfleet.amazonaws.com"
          ]
        }
      },
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "arn:aws:iam::<AWS ACCOUNT ID>:role/parallelcluster/*"
      ],
      "Effect": "Allow",
      "Sid": "IamPassRole"
    },
    {
      "Condition": {
        "StringEquals": {
          "iam:AWSServiceName": [
            "fsx.amazonaws.com",
            "s3.data-source.lustre.fsx.amazonaws.com"
          ]
        }
      },
      "Action": [
        "iam:CreateServiceLinkedRole",
        "iam:DeleteServiceLinkedRole"
      ],
      "Resource": "*",
      "Effect": "Allow"
    },
    {
      "Action": [
        "lambda:CreateFunction",
        "lambda:DeleteFunction",
        "lambda:GetFunctionConfiguration",
        "lambda:GetFunction",
        "lambda:InvokeFunction",
        "lambda:AddPermission",
        "lambda:RemovePermission",
        "lambda:UpdateFunctionConfiguration"
      ],
      "Resource": [
        "arn:aws:lambda:*:<AWS ACCOUNT ID>:function:parallelcluster-*",
        "arn:aws:lambda:*:<AWS ACCOUNT ID>:function:pcluster-*"
      ],
      "Effect": "Allow",
      "Sid": "Lambda"
    },
    {
      "Action": [
        "s3:*"
      ],
      "Resource": [
        "arn:aws:s3:::parallelcluster-*",
        "arn:aws:s3:::aws-parallelcluster-*"
      ],
      "Effect": "Allow",
      "Sid": "S3ResourcesBucket"
    },
    {
      "Action": [
        "s3:Get*",
        "s3:List*"
      ],
      "Resource": "arn:aws:s3:::*-aws-parallelcluster*",
      "Effect": "Allow",
      "Sid": "S3ParallelClusterReadOnly"
    },
    {
      "Action": [
        "fsx:*"
      ],
      "Resource": [
        "arn:aws:fsx:*:<AWS ACCOUNT ID>:*"
      ],
      "Effect": "Allow",
      "Sid": "FSx"
    },
    {
      "Action": [
        "elasticfilesystem:*"
      ],
      "Resource": [
        "arn:aws:elasticfilesystem:*:<AWS ACCOUNT ID>:*"
      ],
      "Effect": "Allow",
      "Sid": "EFS"
    },
    {
      "Action": [
        "logs:DeleteLogGroup",
        "logs:PutRetentionPolicy",
        "logs:DescribeLogGroups",
        "logs:CreateLogGroup",
        "logs:FilterLogEvents",
        "logs:GetLogEvents",
        "logs:CreateExportTask",
        "logs:DescribeLogStreams",
        "logs:DescribeExportTasks"
      ],
      "Resource": "*",
