
AWS ParallelCluster

AWS ParallelCluster User Guide

AWS ParallelCluster: AWS ParallelCluster User Guide

Copyright © Amazon Web Services, Inc. and/or its affiliates. All rights reserved.


Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be affiliated with, connected to, or sponsored by Amazon.


Table of Contents

What is AWS ParallelCluster
AWS ParallelCluster version 3
Setting up AWS ParallelCluster
Installing AWS ParallelCluster
Steps to take after installation
Multiple user access to clusters
Configuring AWS ParallelCluster
Best practices
Moving from AWS ParallelCluster 2.x to 3.x
Supported Regions for AWS ParallelCluster version 3
Using AWS ParallelCluster
AWS Identity and Access Management roles in AWS ParallelCluster 3.x
Network configurations
Custom Bootstrap Actions
Schedulers supported by AWS ParallelCluster
Configuration of Multiple Queues
AWS ParallelCluster API
Connect to the head node through NICE DCV
Using pcluster update-cluster
Reference for AWS ParallelCluster
AWS ParallelCluster version 3 CLI commands
Configuration files
Tutorials
Running your first job on AWS ParallelCluster
Building a Custom AWS ParallelCluster AMI
Integrate Active Directory over LDAP
AWS ParallelCluster Troubleshooting
Retrieving and preserving logs
Troubleshooting cluster deployment issues
Troubleshooting scaling issues
Placement groups and instance launch issues
Directories that cannot be replaced
Troubleshooting issues in NICE DCV
Troubleshooting issues in clusters with AWS Batch integration
Troubleshooting multi-user integration with Active Directory
Additional support
AWS ParallelCluster version 2
Setting up AWS ParallelCluster
Installing AWS ParallelCluster
Configuring AWS ParallelCluster
Best practices
Moving from CfnCluster to AWS ParallelCluster
Supported Regions
Using AWS ParallelCluster
Network configurations
Custom Bootstrap Actions
Working with Amazon S3
Working with Spot Instances
AWS Identity and Access Management roles in AWS ParallelCluster
Schedulers supported by AWS ParallelCluster
Multiple queue mode
Amazon CloudWatch dashboard
Integration with Amazon CloudWatch Logs
Elastic Fabric Adapter
Intel Select Solutions
Enable Intel MPI
Intel HPC Platform Specification
Arm Performance Libraries
Connect to the head node through NICE DCV
Using pcluster update
AWS ParallelCluster CLI commands
pcluster
pcluster-config
Configuration
Layout
[global] section
[aws] section
[aliases] section
[cluster] section
[compute_resource] section
[cw_log] section
[dashboard] section
[dcv] section
[ebs] section
[efs] section
[fsx] section
[queue] section
[raid] section
[scaling] section
[vpc] section
Examples
How AWS ParallelCluster works
AWS ParallelCluster processes
AWS services used by AWS ParallelCluster
AWS ParallelCluster Auto Scaling
Tutorials
Running your first job on AWS ParallelCluster
Building a Custom AWS ParallelCluster AMI
Running an MPI job with AWS ParallelCluster and awsbatch scheduler
Disk encryption with a custom KMS Key
Multiple queue mode tutorial
Development
Setting up a custom AWS ParallelCluster cookbook
Setting up a custom AWS ParallelCluster node package
Troubleshooting
Retrieving and preserving logs
Troubleshooting stack deployment issues
Troubleshooting issues in multiple queue mode clusters
Troubleshooting issues in single queue mode clusters
Placement groups and instance launch issues
Directories that cannot be replaced
Troubleshooting issues in NICE DCV
Troubleshooting issues in clusters with AWS Batch integration
Troubleshooting when a resource fails to create
Additional support
AWS ParallelCluster support policy
Security
Security information for services used by AWS ParallelCluster
Data protection
Data encryption
See also
Identity and Access Management
Compliance validation
Enforcing TLS 1.2
Determine Your Currently Supported Protocols
Compile OpenSSL and Python
Document history


What is AWS ParallelCluster

AWS ParallelCluster is an AWS supported open source cluster management tool that helps you to deploy and manage high performance computing (HPC) clusters in the AWS Cloud. Built on the open source CfnCluster project, AWS ParallelCluster enables you to quickly build an HPC compute environment in AWS. It automatically sets up the required compute resources and shared filesystem. You can use AWS ParallelCluster with batch schedulers, such as AWS Batch and Slurm. AWS ParallelCluster facilitates quick start proof of concept deployments and production deployments. You can also build higher level workflows, such as a genomics portal that automates an entire DNA sequencing workflow, on top of AWS ParallelCluster.


AWS ParallelCluster version 3

Starting with AWS ParallelCluster version 3.1.1, you can configure clusters to use an Active Directory (AD) domain. This AD domain is managed by one of the AWS Directory Service products. You can set up this configuration to share clusters among multiple users in a way that simplifies collaboration while also reducing your costs and administrative overhead. For more information, see Multiple user access to clusters.

Topics

• Setting up AWS ParallelCluster

• Using AWS ParallelCluster

• Reference for AWS ParallelCluster

• Tutorials

• AWS ParallelCluster Troubleshooting

Setting up AWS ParallelCluster

Topics

• Installing AWS ParallelCluster

• Steps to take after installation

• Multiple user access to clusters

• Configuring AWS ParallelCluster

• Best practices

• Moving from AWS ParallelCluster 2.x to 3.x

• Supported Regions for AWS ParallelCluster version 3

Installing AWS ParallelCluster

AWS ParallelCluster is distributed as a Python package and is installed using the Python pip package manager. For instructions on how to install Python packages, see Installing packages in the Python Packaging User Guide.

Ways to install AWS ParallelCluster:

• Install AWS ParallelCluster in a virtual environment (recommended)

• Installing AWS ParallelCluster in a non-virtual environment using pip

You can find the version number of the most recent CLI on the releases page on GitHub. In this guide, the command examples assume that you have installed Python 3.6 or later. The pip command examples use pip3.

Manage both AWS ParallelCluster 2 and AWS ParallelCluster 3


For customers who use both AWS ParallelCluster 2 and AWS ParallelCluster 3 and want to manage the CLIs for both packages, we recommend that you install AWS ParallelCluster 2 and AWS ParallelCluster 3 in different virtual environments. This ensures that you can continue using each version of AWS ParallelCluster and any associated cluster resources.
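A minimal sketch of that recommendation follows. It assumes the directory names ~/apc-ve2 and ~/apc-ve3 and the pip version specifiers shown. Both are illustrative choices, and the guide does not prescribe them. Set DRY_RUN=1 to print the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: one virtual environment per AWS ParallelCluster major version.
# The directory names (~/apc-ve2, ~/apc-ve3) are hypothetical examples.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Map a major version to a pip requirement specifier.
pcluster_requirement() {
  case "$1" in
    2) printf '%s\n' "aws-parallelcluster<3" ;;
    3) printf '%s\n' "aws-parallelcluster>=3,<4" ;;
    *) printf 'unsupported major version: %s\n' "$1" >&2; return 1 ;;
  esac
}

# Create the virtual environment for one major version and install the
# matching CLI into it.
setup_pcluster_venv() {
  local major="$1" venv="${HOME}/apc-ve$1" req
  req="$(pcluster_requirement "$major")" || return 1
  run python3 -m virtualenv "$venv"
  run "$venv/bin/python" -m pip install --upgrade "$req"
}
```

For example, run `setup_pcluster_venv 2` and `setup_pcluster_venv 3` once, then `source ~/apc-ve3/bin/activate` (or `~/apc-ve2/bin/activate`) to choose which CLI is on your PATH.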

Install AWS ParallelCluster in a virtual environment (recommended)

We recommend that you install AWS ParallelCluster in a virtual environment to avoid requirement version conflicts with other pip packages.

Prerequisites

• AWS ParallelCluster requires Python 3.6 or later. If you don't already have it installed, download a compatible version for your platform at python.org.

To install AWS ParallelCluster in a virtual environment

1. If virtualenv isn't installed, install virtualenv using pip3. If python3 -m virtualenv --help displays help information, go to step 2.

$ python3 -m pip install --upgrade pip

$ python3 -m pip install --user --upgrade virtualenv

Run exit to leave the current terminal window and open a new terminal window to pick up changes to the environment.

2. Create a virtual environment and name it.

$ python3 -m virtualenv ~/apc-ve

Alternatively, you can use the -p option to specify a specific version of Python.

$ python3 -m virtualenv -p $(which python3) ~/apc-ve

3. Activate your new virtual environment.

$ source ~/apc-ve/bin/activate

4. Install AWS ParallelCluster into your virtual environment.

(apc-ve)~$ python3 -m pip install --upgrade "aws-parallelcluster"

5. Install Node Version Manager and Node.js. Node.js is required because AWS ParallelCluster uses the AWS Cloud Development Kit (CDK) to generate templates.

$ curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.38.0/install.sh | bash

$ chmod ug+x ~/.nvm/nvm.sh

$ source ~/.nvm/nvm.sh

$ nvm install --lts

$ node --version

6. Verify that AWS ParallelCluster is installed correctly.

$ pcluster version
{
  "version": "3.1.2"
}
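If you script the installation, you can check the result automatically instead of reading the output yourself. The following is a minimal sketch. It assumes only that pcluster version prints JSON with a "version" field, as shown above. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: confirm that the active environment provides a 3.x pcluster CLI.

# Read `pcluster version` JSON on stdin and print the "version" value.
pcluster_version_from_json() {
  sed -n 's/.*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed only if `pcluster version` reports major version 3.
check_pcluster_v3() {
  local version
  version="$(pcluster version | pcluster_version_from_json)"
  if [ "${version%%.*}" != "3" ]; then
    printf 'expected a 3.x pcluster, found: %s\n' "${version:-none}" >&2
    return 1
  fi
  printf 'pcluster %s is installed\n' "$version"
}
```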

You can use the deactivate command to exit the virtual environment. Each time you start a session, you must reactivate the environment.

To upgrade to the latest version of AWS ParallelCluster, run the installation command again.

(apc-ve)~$ python3 -m pip install --upgrade "aws-parallelcluster"

Installing AWS ParallelCluster in a non-virtual environment using pip

Prerequisites

• AWS ParallelCluster requires Python 3.6 or later. If you don't already have it installed, download a compatible version for your platform at python.org.

Install AWS ParallelCluster

1. Use pip to install AWS ParallelCluster.

$ python3 -m pip install "aws-parallelcluster" --upgrade --user

When you use the --user switch, pip installs AWS ParallelCluster to ~/.local/bin.

2. Install Node Version Manager and Node.js. Node.js is required because AWS ParallelCluster uses the AWS Cloud Development Kit (CDK) to generate templates.

$ curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.38.0/install.sh | bash

$ chmod ug+x ~/.nvm/nvm.sh

$ source ~/.nvm/nvm.sh

$ nvm install --lts

$ node --version

3. Verify that AWS ParallelCluster installed correctly.

$ pcluster version
{
  "version": "3.1.2"
}

4. To upgrade to the latest version, run the installation command again.

$ python3 -m pip install "aws-parallelcluster" --upgrade --user
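The --user install places the pcluster executable in ~/.local/bin, so the shell can find the command only if that directory is on your PATH. The following is a minimal check. It assumes the Linux-style ~/.local/bin location, because pip can use a different user scripts directory on other platforms. The function names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: make sure the --user install location is on PATH.
# Assumption: user scripts go to ~/.local/bin (typical on Linux).

# Succeed if DIR is one of the entries in the colon-separated PATH.
path_contains() {
  case ":${PATH}:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Report whether ~/.local/bin is on PATH, with a hint if it is not.
check_user_bin_on_path() {
  local user_bin="${HOME}/.local/bin"
  if path_contains "$user_bin"; then
    printf '%s is on PATH\n' "$user_bin"
  else
    printf 'add this line to your shell profile: export PATH="%s:$PATH"\n' "$user_bin" >&2
    return 1
  fi
}
```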

Steps to take after installation

You can verify that AWS ParallelCluster installed correctly by running pcluster version.

$ pcluster version
{
  "version": "3.1.2"
}

AWS ParallelCluster is updated regularly. To update to the latest version of AWS ParallelCluster, run the installation command again. For more information about the latest version of AWS ParallelCluster, see the AWS ParallelCluster release notes.

$ pip3 install aws-parallelcluster --upgrade --user

To uninstall AWS ParallelCluster, use pip3 uninstall.

$ pip3 uninstall aws-parallelcluster

If you don't have Python and pip3 installed, follow the installation procedure for your operating system.

Multiple user access to clusters

Learn to implement and manage multiple user access to a single cluster.

In this topic, an AWS ParallelCluster user refers to a system user on the cluster's compute instances, for example ec2-user on an Amazon EC2 instance.

AWS ParallelCluster multi-user access support is available in all the AWS Regions where AWS ParallelCluster is currently available. It works with other AWS services, including Amazon FSx for Lustre and Amazon Elastic File System.

You can use AWS Directory Service for Microsoft Active Directory or Simple AD to manage cluster access. To enable this for a cluster, specify an AWS ParallelCluster DirectoryService configuration.

AWS Directory Service directories can be connected to multiple clusters. This allows for centralized management of identities across multiple environments and a unified login experience.

When you use AWS Directory Service for AWS ParallelCluster multi-user access, you can log in to the cluster with user credentials that are defined in the directory. These credentials consist of a user name and password. After you log in to the cluster for the first time, a user SSH key is automatically generated, which you can use to log in without a password.

You can create, delete, and modify a cluster’s users or groups after your directory service has been deployed. With AWS Directory Service, you can do this in the AWS Management Console or by using the Active Directory Users and Computers tool. This tool is accessible from any EC2 instance that's joined to your Active Directory. For more information, see Installing the Active Directory administration tools.

If you plan to use AWS ParallelCluster in a single subnet with no internet access, see AWS ParallelCluster in a single subnet with no internet access for additional requirements.

Create an Active Directory

Make sure that you create an Active Directory (AD) before you create your cluster. For information about how to choose the type of directory for your cluster, see Which to choose in the AWS Directory Service Administration Guide.

If the directory is empty, add users with user names and passwords. For more information, see the documentation that's specific to AWS Directory Service for Microsoft Active Directory or Simple AD.

Create a cluster with an AD domain

Configure your cluster to integrate with a directory by specifying the relevant information in the DirectoryService section of the cluster configuration file. For more information, see the DirectoryService configuration section.


You can use the following example to integrate your cluster with an AWS Managed Microsoft AD over LDAP.

Specific definitions that are required for an AWS Managed Microsoft AD over LDAP configuration:

• You must set the ldap_auth_disable_tls_never_use_in_production parameter to True under AdditionalSssdConfigs.

• You can specify either domain controller hostnames or IP addresses for DomainAddr.

• DomainReadOnlyUser syntax must be as follows:

cn=ReadOnly,ou=Users,ou=CORP,dc=corp,dc=pcluster,dc=com

Get your AWS Managed Microsoft AD configuration data:

$ aws ds describe-directories --directory-ids "d-abcdef01234567890"

{
    "DirectoryDescriptions": [
        {
            "DirectoryId": "d-abcdef01234567890",
            "Name": "corp.pcluster.com",
            "DnsIpAddrs": [
                "203.0.113.225",
                "192.0.2.254"
            ],
            "VpcSettings": {
                "VpcId": "vpc-021345abcdef6789",
                "SubnetIds": [
                    "subnet-1234567890abcdef0",
                    "subnet-abcdef01234567890"
                ],
                "AvailabilityZones": [
                    "region-idb",
                    "region-idd"
                ]
            }
        }
    ]
}
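The DirectoryService values in the following configuration come from this output. DomainName is the Name field written as a base DN, and DomainAddr lists the DnsIpAddrs entries, each with an ldap:// (or ldaps://) scheme. The following sketch shows that mapping. It assumes the AWS CLI is configured, and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: derive DirectoryService values from describe-directories output.
# Helper names are hypothetical, not part of AWS ParallelCluster.

# Convert a DNS domain name (the "Name" field) into an LDAP base DN.
dns_to_base_dn() {
  local IFS=. part dn=""
  for part in $1; do            # unquoted on purpose: split on "."
    dn="${dn:+${dn},}dc=${part}"
  done
  printf '%s\n' "$dn"
}

# Join domain controller addresses into a DomainAddr value.
# Usage: build_domain_addr SCHEME ADDR...
build_domain_addr() {
  local scheme="$1" out="" addr
  shift
  for addr in "$@"; do
    out="${out:+${out},}${scheme}://${addr}"
  done
  printf '%s\n' "$out"
}

# Print the DNS IP addresses of a directory (needs AWS credentials).
directory_dns_ips() {
  aws ds describe-directories --directory-ids "$1" \
    --query 'DirectoryDescriptions[0].DnsIpAddrs' --output text
}
```

For example, `build_domain_addr ldap $(directory_dns_ips d-abcdef01234567890)` leaves the command substitution unquoted so that the tab-separated addresses become separate arguments.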

Cluster configuration for an AWS Managed Microsoft AD:

Region: region-id
Image:
  Os: alinux2
HeadNode:
  InstanceType: t2.micro
  Networking:
    SubnetId: subnet-1234567890abcdef0
  Ssh:
    KeyName: pcluster
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: queue1
      ComputeResources:
        - Name: t2micro
          InstanceType: t2.micro
          MinCount: 1
          MaxCount: 10
      Networking:
        SubnetIds:
          - subnet-abcdef01234567890
DirectoryService:
  DomainName: dc=corp,dc=pcluster,dc=com
  DomainAddr: ldap://203.0.113.225,ldap://192.0.2.254
  PasswordSecretArn: arn:aws:secretsmanager:region-id:123456789012:secret:MicrosoftAD.Admin.Password-1234
  DomainReadOnlyUser: cn=ReadOnly,ou=Users,ou=CORP,dc=corp,dc=pcluster,dc=com
  AdditionalSssdConfigs:
    ldap_auth_disable_tls_never_use_in_production: True

To use this configuration for a Simple AD, change the DomainReadOnlyUser property value in the DirectoryService section:

DirectoryService:
  DomainName: dc=corp,dc=pcluster,dc=com
  DomainAddr: ldap://203.0.113.225,ldap://192.0.2.254
  PasswordSecretArn: arn:aws:secretsmanager:region-id:123456789012:secret:SimpleAD.Admin.Password-1234
  DomainReadOnlyUser: cn=ReadOnlyUser,cn=Users,dc=corp,dc=pcluster,dc=com
  AdditionalSssdConfigs:
    ldap_auth_disable_tls_never_use_in_production: True
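The two DomainReadOnlyUser values differ only in where the user lives in the directory tree. The following sketch builds either form. It assumes that the ou=CORP component in the AWS Managed Microsoft AD example is the directory's short (NetBIOS) name. The type labels "microsoft-ad" and "simple-ad" are chosen for this sketch only.

```shell
#!/usr/bin/env bash
# Sketch: build the DomainReadOnlyUser DN for either directory type.
# Assumption: for AWS Managed Microsoft AD, ou=CORP is the directory's
# short (NetBIOS) name, passed as the fourth argument.

# Usage: read_only_user_dn TYPE USER BASE_DN [SHORT_NAME]
#   TYPE is "microsoft-ad" or "simple-ad" (labels used by this sketch).
read_only_user_dn() {
  local type="$1" user="$2" base_dn="$3" short_name="${4:-}"
  case "$type" in
    microsoft-ad)
      if [ -z "$short_name" ]; then
        printf 'microsoft-ad needs the directory short name\n' >&2
        return 1
      fi
      printf 'cn=%s,ou=Users,ou=%s,%s\n' "$user" "$short_name" "$base_dn"
      ;;
    simple-ad)
      printf 'cn=%s,cn=Users,%s\n' "$user" "$base_dn"
      ;;
    *)
      printf 'unknown directory type: %s\n' "$type" >&2
      return 1
      ;;
  esac
}
```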

Considerations:

• We recommend that you use LDAP over TLS/SSL (LDAPS) rather than plain LDAP. TLS/SSL encrypts the connection.

• The DomainAddr property value matches the entries in the DnsIpAddrs list from the describe-directories output.

• We recommend that your cluster use subnets that are located in the same Availability Zone that the DomainAddr points to. If you use the custom Dynamic Host Configuration Protocol (DHCP) configuration that's recommended for directory VPCs and your subnets aren't located in the DomainAddr Availability Zone, traffic can cross Availability Zones. A custom DHCP configuration isn't required to use the multi-user AD integration feature.

• The DomainReadOnlyUser property value specifies a user that must be created in the directory. This user isn't created by default. We recommend that you don't give this user permission to modify directory data.

• The PasswordSecretArn property value points to an AWS Secrets Manager secret that contains the password of the user that you specified for the DomainReadOnlyUser property. If this user’s password changes, update the secret value and update the configuration on the cluster instances.

Before you update the instances, make sure to stop the cluster’s compute fleet. Alternatively, you can run the following command on any active cluster nodes, starting with the head node.

sudo cinc-client \
  --local-mode \
  --config /etc/chef/client.rb \
  --log_level auto \
  --force-formatter \
  --no-color \
  --chef-zero-port 8889 \
  --json-attributes /etc/chef/dna.json \
  --override-runlist aws-parallelcluster-config::directory_service
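The following sketch covers the first two steps of a password change: storing the new password and stopping the compute fleet. It assumes that the AWS CLI and the pcluster 3 CLI are configured. The password is read from a file through the AWS CLI's file:// parameter syntax, so it doesn't appear on the command line. The file path and cluster name are placeholders. Set DRY_RUN=1 to print the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: rotate the DomainReadOnlyUser password stored in Secrets Manager.
# The secret ARN, password file, and cluster name are placeholders.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Store the new password. PASSWORD_FILE must contain only the password,
# written without a trailing newline (for example with printf '%s').
update_read_only_secret() {
  local secret_arn="$1" password_file="$2"
  run aws secretsmanager put-secret-value \
    --secret-id "$secret_arn" \
    --secret-string "file://${password_file}"
}

# Ask AWS ParallelCluster to stop the compute fleet of a Slurm cluster.
stop_compute_fleet() {
  run pcluster update-compute-fleet \
    --cluster-name "$1" --status STOP_REQUESTED
}
```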

For another example, see Integrate Active Directory over LDAP.


Log in to a cluster integrated with an AD domain

If you enabled the AD domain integration feature, password authentication is enabled on the cluster head node. An AD user's home directory is created the first time that user logs in to the head node, or the first time a user with sudo privileges switches to that AD user on the head node.

Password authentication isn't enabled for cluster compute nodes. AD users must log in to compute nodes with SSH keys.

By default, SSH keys are set up in the AD user's ${HOME}/.ssh directory at the user's first SSH login to the head node. You can disable this behavior by setting the GenerateSshKeysForUsers boolean property to false in the cluster configuration. By default, GenerateSshKeysForUsers is set to true.

If an application that runs on AWS ParallelCluster requires password-less SSH between cluster nodes, make sure that the SSH keys are correctly set up in the user's home directory.
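One way to check this is to confirm that the key pair exists and that its public key is authorized. The following is a minimal sketch. It assumes the key pair is named id_rsa (pass another name if yours differs), and ssh_keys_ready is a hypothetical helper. It doesn't check file permissions, which sshd can also enforce.

```shell
#!/usr/bin/env bash
# Sketch: check that a home directory is ready for password-less SSH.
# Assumption: the key pair is named id_rsa unless another name is given.

# Usage: ssh_keys_ready [HOME_DIR] [KEY_NAME]
ssh_keys_ready() {
  local home_dir="${1:-$HOME}" key="${2:-id_rsa}"
  local ssh_dir="${home_dir}/.ssh" f
  for f in "$key" "${key}.pub" authorized_keys; do
    if [ ! -r "${ssh_dir}/${f}" ]; then
      printf 'missing: %s\n' "${ssh_dir}/${f}" >&2
      return 1
    fi
  done
  # The public key must appear as a whole line in authorized_keys.
  grep -qxF -- "$(cat "${ssh_dir}/${key}.pub")" "${ssh_dir}/authorized_keys"
}
```

As the AD user on the head node, run `ssh_keys_ready "$HOME"` and look for a zero exit status.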

Note: If the AD integration feature doesn't work as expected, the SSSD logs can provide useful diagnostic information for troubleshooting the issue. These logs are located in the /var/log/sssd directory on cluster nodes. By default, they're also stored in the cluster’s Amazon CloudWatch log group.

For more information, see Troubleshooting multi-user integration with Active Directory.

Running MPI jobs

As recommended by SchedMD, MPI jobs should be bootstrapped by using Slurm as the MPI bootstrapping method. For more information, see the official Slurm documentation or the official documentation for your MPI library.

For example, the Intel MPI documentation explains that to run a STAR-CCM+ job, you must use Slurm as the bootstrap mechanism by exporting the environment variable I_MPI_HYDRA_BOOTSTRAP=slurm.
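In a Slurm batch script, this means setting the variable for the MPI launcher. The following is a minimal sketch. The #SBATCH values and ./my_mpi_app are placeholders, and launch_with_slurm_bootstrap is a hypothetical convenience wrapper, not part of Intel MPI or Slurm.

```shell
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2

# Start an MPI launcher with Intel MPI's Hydra bootstrap set to Slurm, so
# ranks are spawned through Slurm instead of over SSH. The variable is set
# only for the launched command, not for the rest of the script.
launch_with_slurm_bootstrap() {
  I_MPI_HYDRA_BOOTSTRAP=slurm "$@"
}

# In the job script, launch the application through the wrapper, e.g.:
#   launch_with_slurm_bootstrap mpirun ./my_mpi_app
```

Exporting I_MPI_HYDRA_BOOTSTRAP=slurm once near the top of the script has the same effect for every launcher that the script starts.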

Note: Known issue

If your MPI application relies on SSH to spawn MPI processes, you might encounter a known Slurm bug that causes the directory user name to be resolved incorrectly as "nobody".

Either configure your application to use Slurm as the MPI bootstrapping method, or see Known issues with username resolution in the Troubleshooting section for details and possible workarounds.

Example AWS Managed Microsoft AD over LDAP(S) cluster configurations

AWS ParallelCluster supports multiple user access by integrating with an AWS Directory Service over Lightweight Directory Access Protocol (LDAP), or LDAP over TLS/SSL (LDAPS).

The following examples show how to create cluster configurations to integrate with an AWS Managed Microsoft AD over LDAP(S).

AWS Managed Microsoft AD over LDAPS with certificate verification

You can use this example to integrate your cluster with an AWS Managed Microsoft AD over LDAPS, with certificate verification.


Specific definitions for an AWS Managed Microsoft AD over LDAPS with certificates configuration:

• LdapTlsReqCert must be set to hard (default) for LDAPS with certificate verification.

• LdapTlsCaCert must specify the path to your certificate authority (CA) certificate.

The CA certificate is a certificate bundle that contains the certificates of the entire CA chain that issued certificates for the AD domain controllers.

The CA certificate bundle must be installed on the cluster nodes.

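• Domain controller hostnames, not IP addresses, must be specified for DomainAddr, because certificate verification checks the hostname against the domain controller's certificate.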

• DomainReadOnlyUser syntax must be as follows:

cn=ReadOnly,ou=Users,ou=CORP,dc=corp,dc=pcluster,dc=com

Example cluster configuration file for using AD over LDAPS:

Region: region-id
Image:
  Os: alinux2
HeadNode:
  InstanceType: t2.micro
  Networking:
    SubnetId: subnet-1234567890abcdef0
  Ssh:
    KeyName: pcluster
  Iam:
    AdditionalIamPolicies:
      - Policy: arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
  CustomActions:
    OnNodeConfigured:
      Script: s3://aws-parallelcluster/scripts/pcluster-dub-msad-ldaps.post.sh
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: queue1
      ComputeResources:
        - Name: t2micro
          InstanceType: t2.micro
          MinCount: 1
          MaxCount: 10
      Networking:
        SubnetIds:
          - subnet-abcdef01234567890
      Iam:
        AdditionalIamPolicies:
          - Policy: arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
      CustomActions:
        OnNodeConfigured:
          Script: s3://aws-parallelcluster-pcluster/scripts/pcluster-dub-msad-ldaps.post.sh
DirectoryService:
  DomainName: dc=corp,dc=pcluster,dc=com
  DomainAddr: ldaps://win-abcdef01234567890.corp.pcluster.com,ldaps://win-abcdef01234567890.corp.pcluster.com
  PasswordSecretArn: arn:aws:secretsmanager:region-id:123456789012:secret:MicrosoftAD.Admin.Password-1234
  DomainReadOnlyUser: cn=ReadOnly,ou=Users,ou=CORP,dc=corp,dc=pcluster,dc=com
  LdapTlsCaCert: /etc/openldap/cacerts/corp.pcluster.com.bundleca.cer
  LdapTlsReqCert: hard


Add the certificates and configure domain controller name resolution in the post-install script:

#!/bin/bash
set -e

AD_CERTIFICATE_S3_URI="s3://corp.pcluster.com/bundle/corp.pcluster.com.bundleca.cer"
AD_CERTIFICATE_LOCAL="/etc/openldap/cacerts/corp.pcluster.com.bundleca.cer"

AD_HOSTNAME_1="win-abcdef01234567890.corp.pcluster.com"
AD_IP_1="192.0.2.254"
AD_HOSTNAME_2="win-abcdef01234567890.corp.pcluster.com"
AD_IP_2="203.0.113.225"

# Download CA certificate
mkdir -p "$(dirname "${AD_CERTIFICATE_LOCAL}")"
aws s3 cp "${AD_CERTIFICATE_S3_URI}" "${AD_CERTIFICATE_LOCAL}"
chmod 644 "${AD_CERTIFICATE_LOCAL}"

# Configure domain controllers reachability
echo "${AD_IP_1} ${AD_HOSTNAME_1}" >> /etc/hosts
echo "${AD_IP_2} ${AD_HOSTNAME_2}" >> /etc/hosts
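The script appends the two /etc/hosts lines every time it runs. If the script can run more than once on a node, a guard like the following avoids duplicate entries. add_hosts_entry is a hypothetical helper. The hosts file path is a parameter, so you can try the helper on a scratch file first.

```shell
#!/usr/bin/env bash
# Sketch: add hosts-file entries only when they are not already present.
# add_hosts_entry is a hypothetical helper, not part of AWS ParallelCluster.

# Usage: add_hosts_entry HOSTS_FILE IP HOSTNAME
add_hosts_entry() {
  local hosts_file="$1" entry="$2 $3"
  # -x: whole-line match, -F: fixed string (dots are not regex wildcards).
  if ! grep -qxF -- "$entry" "$hosts_file" 2>/dev/null; then
    printf '%s\n' "$entry" >> "$hosts_file"
  fi
}

# In the post-install script, for example:
#   add_hosts_entry /etc/hosts "${AD_IP_1}" "${AD_HOSTNAME_1}"
#   add_hosts_entry /etc/hosts "${AD_IP_2}" "${AD_HOSTNAME_2}"
```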

You can retrieve the domain controller hostnames from instances joined to the domain, as shown in the following examples.

From a Windows instance

$ nslookup 192.0.2.254
Server: corp.pcluster.com
Address: 192.0.2.254

Name: win-abcdef01234567890.corp.pcluster.com
Address: 192.0.2.254

From a Linux instance

$ nslookup 192.0.2.254
254.2.0.192.in-addr.arpa name = corp.pcluster.com
254.2.0.192.in-addr.arpa name = win-abcdef01234567890.corp.pcluster.com
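To fill in the AD_HOSTNAME_* variables of the post-install script from a script, you can filter the Linux output. The following sketch assumes the "name = ..." line format shown above, and the function name is hypothetical. It prints every result except the bare domain name and removes any trailing dot that nslookup adds.

```shell
#!/usr/bin/env bash
# Sketch: extract domain controller host names from Linux nslookup output.
# Assumes reverse-lookup lines of the form "<ptr-name> name = <host>".

# Usage: nslookup IP | dc_hostnames_from_nslookup DOMAIN
dc_hostnames_from_nslookup() {
  awk -v domain="$1" '
    /name = / {
      host = $NF                  # last field is the resolved name
      sub(/\.$/, "", host)        # drop a trailing dot, if present
      if (host != domain) print host
    }'
}
```

For example, `nslookup 192.0.2.254 | dc_hostnames_from_nslookup corp.pcluster.com` should print only the win-... host name.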

AWS Managed Microsoft AD over LDAPS without certificate verification

You can use this example to integrate your cluster with an AWS Managed Microsoft AD over LDAPS, without certificate verification.

Specific definitions for an AWS Managed Microsoft AD over LDAPS without certificate verification configuration:

• LdapTlsReqCert must be set to never.

• Either domain controller hostnames or IP addresses can be specified for DomainAddr.

• DomainReadOnlyUser syntax must be as follows:

cn=ReadOnly,ou=Users,ou=CORP,dc=corp,dc=pcluster,dc=com


Example cluster configuration file for using AWS Managed Microsoft AD over LDAPS without certificate verification:

Region: region-id
Image:
  Os: alinux2
HeadNode:
  InstanceType: t2.micro
  Networking:
    SubnetId: subnet-1234567890abcdef0
  Ssh:
    KeyName: pcluster
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: queue1
      ComputeResources:
        - Name: t2micro
          InstanceType: t2.micro
          MinCount: 1
          MaxCount: 10
      Networking:
        SubnetIds:
          - subnet-abcdef01234567890
DirectoryService:
  DomainName: dc=corp,dc=pcluster,dc=com
  DomainAddr: ldaps://203.0.113.225,ldaps://192.0.2.254
  PasswordSecretArn: arn:aws:secretsmanager:region-id:123456789012:secret:MicrosoftAD.Admin.Password-1234
  DomainReadOnlyUser: cn=ReadOnly,ou=Users,ou=CORP,dc=corp,dc=pcluster,dc=com
  LdapTlsReqCert: never

Configuring AWS ParallelCluster

After you install AWS ParallelCluster, complete the following configuration steps.

First, set up your AWS credentials. For more information, see Configuring the AWS CLI in the AWS CLI user guide.

$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [us-east-1]: us-east-1
Default output format [None]:

The AWS Region where the cluster is launched must have at least one Amazon EC2 key pair. For more information, see Amazon EC2 key pairs in the Amazon EC2 User Guide for Linux Instances.
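Before you run the wizard, you can check for a key pair and create one if it's missing. The following sketch assumes that the AWS CLI is configured. The helper names and the key name "pcluster" are illustrative. The private key is written with owner-only permissions.

```shell
#!/usr/bin/env bash
# Sketch: make sure an EC2 key pair exists in the Region you plan to use.
# Helper names and the example key name are hypothetical.

# List key pair names in a Region (tab-separated AWS CLI text output).
list_key_pairs() {
  aws ec2 describe-key-pairs --region "$1" \
    --query 'KeyPairs[].KeyName' --output text
}

# Succeed if NAME appears in the whitespace-separated list on stdin.
has_key_pair() {
  tr -s '[:space:]' '\n' | grep -qxF -- "$1"
}

# Create a key pair and save the private key as NAME.pem (mode 600).
create_key_pair() {
  local region="$1" name="$2"
  ( umask 077
    aws ec2 create-key-pair --region "$region" --key-name "$name" \
      --query 'KeyMaterial' --output text > "${name}.pem" )
}

# Example (requires AWS credentials):
#   list_key_pairs us-east-1 | has_key_pair pcluster || create_key_pair us-east-1 pcluster
```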

$ pcluster configure --config cluster-config.yaml

The configure wizard prompts you for all of the information that's required to create your cluster. The details of the sequence differ when using AWS Batch as the scheduler compared to using Slurm.

Slurm

From the list of valid AWS Region identifiers, choose the Region where you want your cluster to run.

Note: The list of Regions shown is based on the partition of your account, and only includes Regions that are enabled for your account. For more information about enabling Regions for your account, see Managing AWS Regions in the AWS General Reference. The example shown is from the AWS Global partition. If your account is in the AWS GovCloud (US) partition, only Regions in that partition are listed (us-gov-east-1 and us-gov-west-1). Similarly, if your account is in the AWS China partition, only cn-north-1 and cn-northwest-1 are shown. For the complete list of Regions supported by AWS ParallelCluster, see Supported Regions for AWS ParallelCluster version 3.
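The partition also appears as the second field of every ARN, so values such as the arn:aws:... policy ARNs in the earlier examples change with it. The following simplified sketch derives the partition from a Region code. It handles only the three partitions named in the note and assumes that any other Region code belongs to the standard aws partition.

```shell
#!/usr/bin/env bash
# Sketch: map a Region code to its AWS partition (simplified).
# Only the Global, China, and GovCloud (US) partitions are handled;
# any other code is assumed to be in the standard "aws" partition.
partition_for_region() {
  case "$1" in
    cn-*)     printf 'aws-cn\n' ;;
    us-gov-*) printf 'aws-us-gov\n' ;;
    *)        printf 'aws\n' ;;
  esac
}
```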

Allowed values for AWS Region ID:
1. af-south-1
2. ap-east-1
3. ap-northeast-1
4. ap-northeast-2
5. ap-south-1
6. ap-southeast-1
7. ap-southeast-2
8. ca-central-1
9. eu-central-1
10. eu-north-1
11. eu-south-1
12. eu-west-1
13. eu-west-2
14. eu-west-3
15. me-south-1
16. sa-east-1
17. us-east-1
18. us-east-2
19. us-west-1
20. us-west-2
AWS Region ID [ap-northeast-1]:

The key pair is selected from the key pairs that are registered with Amazon EC2 in the selected Region. Choose the key pair:

Allowed values for EC2 Key Pair Name:
1. your-key-1
2. your-key-2
EC2 Key Pair Name [your-key-1]:

Choose the scheduler to use with your cluster.

Allowed values for Scheduler:
1. slurm
2. awsbatch
Scheduler [slurm]:

Choose the operating system.

Allowed values for Operating System:
1. alinux2
2. centos7
3. ubuntu1804
4. ubuntu2004
Operating System [alinux2]:

Choose the head node instance type:

Head node instance type [t2.micro]:

Choose the queue configuration. Note: The same instance type can't be used by more than one compute resource in the same queue.


Number of queues [1]:
Name of queue 1 [queue1]:
Number of compute resources for queue1 [1]: 2
Compute instance type for compute resource 1 in queue1 [t2.micro]:
Maximum instance count [10]:
Compute instance type for compute resource 2 in queue1 [t2.micro]: t3.micro
Maximum instance count [10]:

After the previous steps are completed, decide whether to use an existing VPC or let AWS ParallelCluster create a VPC for you. If you don't have a properly configured VPC, AWS ParallelCluster can create a new one. It either places the head node and the compute nodes in the same public subnet, or places only the head node in a public subnet and the compute nodes in a private subnet. You might reach your quota for the number of VPCs allowed in a Region. The default quota is five VPCs per Region. For more information about this quota and how to request an increase, see VPC and subnets in the Amazon VPC User Guide.
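Before you let the wizard create a VPC, you can check how close the Region is to the quota. The following sketch assumes that the AWS CLI is configured. It uses the default quota of five from the text above, and you can pass a higher value if your quota was raised. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check VPC headroom before letting the wizard create a VPC.
# Function names are hypothetical; 5 is the default quota per Region.

# Count the VPCs that already exist in a Region (needs AWS credentials).
vpc_count() {
  aws ec2 describe-vpcs --region "$1" --query 'length(Vpcs)' --output text
}

# Succeed if one more VPC fits under QUOTA (default 5).
can_create_vpc() {
  local count="$1" quota="${2:-5}"
  [ "$count" -lt "$quota" ]
}

# Example (requires AWS credentials):
#   can_create_vpc "$(vpc_count us-east-1)" || echo "VPC quota reached"
```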

If you let AWS ParallelCluster create a VPC, you must decide if all nodes should be in a public subnet.

Important

VPCs created by AWS ParallelCluster do not enable VPC Flow Logs by default. VPC Flow Logs enable you to capture information about the IP traffic going to and from network interfaces in your VPCs. For more information, see VPC Flow Logs in the Amazon VPC User Guide.
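If you need flow logs for a VPC that AWS ParallelCluster created, you can enable them afterward. The following sketch assumes delivery to an existing Amazon S3 bucket that allows log delivery. The VPC ID and bucket ARN are placeholders. For other destination types and the required permissions, see VPC Flow Logs. Set DRY_RUN=1 to print the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: enable VPC Flow Logs on a VPC, delivering records to S3.
# The VPC ID and bucket ARN are placeholders.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: enable_vpc_flow_logs VPC_ID BUCKET_ARN
enable_vpc_flow_logs() {
  local vpc_id="$1" bucket_arn="$2"
  run aws ec2 create-flow-logs \
    --resource-type VPC \
    --resource-ids "$vpc_id" \
    --traffic-type ALL \
    --log-destination-type s3 \
    --log-destination "$bucket_arn"
}
```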

Automate VPC creation? (y/n) [n]: y
Allowed values for Availability Zone:
1. us-east-1a
2. us-east-1b
3. us-east-1c
4. us-east-1d
5. us-east-1e
6. us-east-1f
Availability Zone [us-east-1a]:
Allowed values for Network Configuration:
1. Head node in a public subnet and compute fleet in a private subnet
2. Head node and compute fleet in the same public subnet
Network Configuration [Head node in a public subnet and compute fleet in a private subnet]: 1
Beginning VPC creation. Please do not leave the terminal until the creation is finalized

If you don't create a new VPC, you must select an existing VPC.

If you choose to have AWS ParallelCluster create the VPC, make a note of the VPC ID so you can use the AWS CLI to delete it later.

Automate VPC creation? (y/n) [n]: n
Allowed values for VPC ID:
  #  id                     name                               number_of_subnets
---  ---------------------  ---------------------------------  -----------------
  1  vpc-0b4ad9c4678d3c7ad  ParallelClusterVPC-20200118031893                  2
  2  vpc-0e87c753286f37eef  ParallelClusterVPC-20191118233938                  5
VPC ID [vpc-0b4ad9c4678d3c7ad]: 1

After the VPC has been selected, decide whether to use existing subnets or create new ones.

Automate Subnet creation? (y/n) [y]: y

Creating CloudFormation stack...
Do not leave the terminal until the process has finished

AWS Batch

From the list of valid AWS Region identifiers, choose the Region where you want your cluster to run.

Note: The list of Regions shown is based on the partition of your account. It only includes Regions that are enabled for your account. For more information about enabling Regions for your account, see Managing AWS Regions in the AWS General Reference. The example shown is from the AWS Global partition. If your account is in the AWS GovCloud (US) partition, only Regions in that partition are listed (us-gov-east-1 and us-gov-west-1). Similarly, if your account is in the AWS China partition, only cn-north-1 and cn-northwest-1 are shown. For the complete list of Regions supported by AWS ParallelCluster, see Supported Regions for AWS ParallelCluster version 3.

Allowed values for AWS Region ID:

1. af-south-1
2. ap-east-1
3. ap-northeast-1
4. ap-northeast-2
5. ap-south-1
6. ap-southeast-1
7. ap-southeast-2
8. ca-central-1
9. eu-central-1
10. eu-north-1
11. eu-south-1
12. eu-west-1
13. eu-west-2
14. eu-west-3
15. me-south-1
16. sa-east-1
17. us-east-1
18. us-east-2
19. us-west-1
20. us-west-2

AWS Region ID [us-east-1]:

The key pair is selected from the key pairs registered with Amazon EC2 in the selected Region.

Choose the key pair:

Allowed values for EC2 Key Pair Name:

1. your-key-1
2. your-key-2

EC2 Key Pair Name [your-key-1]:

Choose the scheduler to use with your cluster.

Allowed values for Scheduler:

1. slurm
2. awsbatch

Scheduler [slurm]: 2

When awsbatch is selected as the scheduler, alinux2 is used as the operating system. Next, enter the head node instance type:

Head node instance type [t2.micro]:


Choose the queue configuration. The AWS Batch scheduler supports only a single queue. Then enter the maximum size of the compute fleet, measured in vCPUs.

Number of queues [1]:

Name of queue 1 [queue1]:

Maximum vCPU [10]:

Decide whether to use existing VPCs or let AWS ParallelCluster create VPCs for you. If you don't have a properly configured VPC, AWS ParallelCluster can create a new one. It either places the head node and compute nodes in the same public subnet, or places only the head node in a public subnet and the compute nodes in a private subnet. It's possible to reach your quota on the number of VPCs allowed in a Region. The default quota is five VPCs per Region. For more information about this quota and how to request an increase, see VPC and subnets in the Amazon VPC User Guide.

Important

VPCs created by AWS ParallelCluster do not enable VPC Flow Logs by default. VPC Flow Logs enable you to capture information about the IP traffic going to and from network interfaces in your VPCs. For more information, see VPC Flow Logs in the Amazon VPC User Guide.

If you let AWS ParallelCluster create a VPC, make sure that you decide whether all nodes are to be in a public subnet.

Automate VPC creation? (y/n) [n]: y
Allowed values for Availability Zone:
1. us-east-1a
2. us-east-1b
3. us-east-1c
4. us-east-1d
5. us-east-1e
6. us-east-1f
Availability Zone [us-east-1a]:
Allowed values for Network Configuration:
1. Head node in a public subnet and compute fleet in a private subnet
2. Head node and compute fleet in the same public subnet
Network Configuration [Head node in a public subnet and compute fleet in a private subnet]: 1

Beginning VPC creation. Please do not leave the terminal until the creation is finalized

If you don't create a new VPC, you must select an existing VPC.

If you choose to have AWS ParallelCluster create the VPC, make a note of the VPC ID so you can use the AWS CLI or AWS Management Console to delete it later.

Automate VPC creation? (y/n) [n]: n
Allowed values for VPC ID:
  #  id                     name                                 number_of_subnets
---  ---------------------  ---------------------------------  -------------------
  1  vpc-0b4ad9c4678d3c7ad  ParallelClusterVPC-20200118031893                    2
  2  vpc-0e87c753286f37eef  ParallelClusterVPC-20191118233938                    5
VPC ID [vpc-0b4ad9c4678d3c7ad]: 1

After the VPC has been selected, make sure that you decide whether to use existing subnets or create new ones.

Automate Subnet creation? (y/n) [y]: y

Creating CloudFormation stack...


Do not leave the terminal until the process has finished

When you have completed the preceding steps, a simple cluster launches into a VPC. The VPC uses an existing subnet that supports public IP addresses, and the subnet's route table contains the route 0.0.0.0/0 => igw-xxxxxx. Note the following conditions:

• The VPC must have DNS Resolution = yes and DNS Hostnames = yes.

• The VPC must also have DHCP options with the correct domain-name for the Region. The default DHCP Option Set already specifies the required AmazonProvidedDNS. If specifying more than one domain name server, see DHCP options sets in the Amazon VPC User Guide. When using private subnets, use a NAT gateway or an internal proxy to enable web access for compute nodes. For more information, see Network configurations (p. 48).
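You can check the two DNS settings and the default route above before you launch. The following minimal Python sketch assumes you have already fetched the VPC's DNS attributes and the subnet's route entries, for example with the EC2 DescribeVpcAttribute and DescribeRouteTables APIs. The function name vpc_problems is hypothetical and isn't part of AWS ParallelCluster. The check only covers the public-subnet case described here.

```python
from typing import Iterable, List, Mapping


def vpc_problems(dns_support: bool, dns_hostnames: bool,
                 routes: Iterable[Mapping[str, str]]) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed.

    `routes` is assumed to use the DescribeRouteTables field names
    DestinationCidrBlock and GatewayId.
    """
    problems = []
    if not dns_support:
        problems.append("DNS resolution (enableDnsSupport) must be enabled")
    if not dns_hostnames:
        problems.append("DNS hostnames (enableDnsHostnames) must be enabled")
    # The subnet needs a default route to an internet gateway (igw-...).
    has_igw_default = any(
        route.get("DestinationCidrBlock") == "0.0.0.0/0"
        and str(route.get("GatewayId", "")).startswith("igw-")
        for route in routes
    )
    if not has_igw_default:
        problems.append("route table needs 0.0.0.0/0 => igw-xxxxxx")
    return problems
```

With boto3, the inputs could come from `describe_vpc_attribute` and from the `Routes` list that `describe_route_tables` returns. Keeping the check as a pure function makes it easy to test without AWS access.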

When all settings contain valid values, you can launch the cluster by running the create command.

$ pcluster create-cluster --cluster-name test-cluster --cluster-configuration cluster-config.yaml

{
  "cluster": {
    "clusterName": "test-cluster",
    "cloudformationStackStatus": "CREATE_IN_PROGRESS",
    "cloudformationStackArn": "arn:aws:cloudformation:eu-west-1:xxx:stack/test-cluster/abcdef0-f678-890a-5abc-021345abcdef",
    "region": "eu-west-1",
    "version": "3.1.2",
    "clusterStatus": "CREATE_IN_PROGRESS"
  },
  "validationMessages": []
}

Follow cluster progress:

$ pcluster describe-cluster --cluster-name test-cluster

or

$ pcluster list-clusters --query 'items[?clusterName==`test-cluster`]'

After the cluster reaches the "clusterStatus": "CREATE_COMPLETE" status, you can connect to it by using your normal SSH client settings. For more information about connecting to Amazon EC2 instances, see the Amazon EC2 User Guide for Linux Instances. Alternatively, you can connect to the cluster by running the following command.

$ pcluster ssh --cluster-name test-cluster -i ~/path/to/keyfile.pem
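pcluster create-cluster returns before the cluster is ready. A script that must wait for "CREATE_COMPLETE" can poll pcluster describe-cluster and read the clusterStatus field of its JSON output. The following Python sketch assumes the pcluster CLI is on your PATH. The helper names and the interval and timeout values are hypothetical choices, not ParallelCluster defaults. The status source is passed in as a function so that the loop can be tested without AWS.

```python
import json
import subprocess
import time
from typing import Callable, Dict


def describe_cluster(cluster_name: str, region: str) -> Dict:
    """Run `pcluster describe-cluster` and parse its JSON output."""
    result = subprocess.run(
        ["pcluster", "describe-cluster", "--cluster-name", cluster_name, "--region", region],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def wait_for_cluster(fetch: Callable[[], Dict], target: str = "CREATE_COMPLETE",
                     failure: str = "CREATE_FAILED", interval: float = 30.0,
                     timeout: float = 3600.0,
                     sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Poll `fetch` until clusterStatus equals `target`; raise on failure or timeout."""
    waited = 0.0
    while True:
        info = fetch()
        status = info.get("clusterStatus")
        if status == target:
            return info
        if status == failure:
            raise RuntimeError(f"cluster reached {status}")
        if waited >= timeout:
            raise TimeoutError(f"cluster still {status} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

For example: `wait_for_cluster(lambda: describe_cluster("test-cluster", "eu-west-1"))`, then connect with pcluster ssh.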

To delete the cluster, run the following command.

$ pcluster delete-cluster --region us-east-1 --cluster-name test-cluster

After the cluster is deleted, you can delete the network resources in the VPC by deleting the CloudFormation networking stack. The stack's name starts with "parallelclusternetworking-" and contains the creation time in "YYYYMMDDHHMMSS" format. You can list the stacks using the list-stacks command.

$ aws --region us-east-1 cloudformation list-stacks \
    --stack-status-filter "CREATE_COMPLETE" \
    --query "StackSummaries[].StackName" | \
    grep -e "parallelclusternetworking-"

"parallelclusternetworking-pubpriv-20191029205804"

The stack can be deleted using the delete-stack command.

$ aws --region us-east-1 cloudformation delete-stack \

--stack-name parallelclusternetworking-pubpriv-20191029205804
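If several networking stacks exist, the YYYYMMDDHHMMSS suffix described above tells you which one was created when. The following Python sketch filters a list of stack names, such as the list-stacks output, and sorts the matches by that timestamp. It assumes names follow the "parallelclusternetworking-" prefix with an optional label (like pubpriv) before the timestamp. The function name is hypothetical.

```python
import re
from datetime import datetime
from typing import Iterable, List, Tuple

# Assumed shape: prefix, optional "label-", then a 14-digit YYYYMMDDHHMMSS timestamp.
_NETWORKING_STACK = re.compile(r"^parallelclusternetworking-(?:.*-)?(\d{14})$")


def networking_stacks(stack_names: Iterable[str]) -> List[Tuple[str, datetime]]:
    """Return (stack name, creation time) pairs for networking stacks, oldest first."""
    found = []
    for name in stack_names:
        match = _NETWORKING_STACK.match(name)
        if not match:
            continue
        try:
            created = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
        except ValueError:
            continue  # 14 digits that aren't a valid timestamp: skip the name
        found.append((name, created))
    return sorted(found, key=lambda item: item[1])
```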

The VPC that pcluster configure (p. 84) creates for you isn't created in the CloudFormation networking stack. You can delete that VPC manually in the console or by using the AWS CLI.

$ aws --region us-east-1 ec2 delete-vpc --vpc-id vpc-0b4ad9c4678d3c7ad

Best practices

Best practices: head node instance type selection

Although the head node doesn't run any jobs, its functions and its sizing are crucial to the overall performance of the cluster. When choosing the instance type for your head node, evaluate the following items:

Cluster size: The head node orchestrates the scaling logic of the cluster and is responsible for attaching new nodes to the scheduler. If you need to scale the cluster up and down by a considerable number of nodes, give the head node some extra compute capacity.

Shared file systems: When using shared file systems to share artifacts between compute nodes and the head node, take into account that the head node is the node exposing the NFS server. For this reason, you want to choose an instance type with enough network bandwidth and enough dedicated Amazon EBS bandwidth to handle your workflows.

Best practices: network performance

Network performance is critical to ensuring high performance computing (HPC) applications perform as expected. We recommend these three best practices to optimize your network performance.

Placement group: a cluster placement group is a logical grouping of instances within a single Availability Zone. For more information on placement groups, see placement groups in the Amazon EC2 User Guide for Linux Instances. If you are using Slurm, you can configure each Slurm queue to use a cluster placement group by specifying a PlacementGroup in the queue's Networking (p. 117) settings.

Networking:
  PlacementGroup:
    Enabled: true
    Id: your-placement-group-name

Or let AWS ParallelCluster create a placement group with:

Networking:
  PlacementGroup:
    Enabled: true

For more information, see Networking (p. 117).


Enhanced networking: consider choosing an instance type that supports enhanced networking. This applies to all current generation instances. For more information, see enhanced networking on Linux in the Amazon EC2 User Guide for Linux Instances.

Instance bandwidth: bandwidth scales with instance size, so choose the instance type that best suits your needs. For more information, see Amazon EBS–optimized instances and Amazon EBS volume types in the Amazon EC2 User Guide for Linux Instances.

Moving from AWS ParallelCluster 2.x to 3.x

Custom Bootstrap Actions

With AWS ParallelCluster 3, you can specify different custom bootstrap actions scripts for the head node and compute nodes using OnNodeStart (pre_install in AWS ParallelCluster version 2) and OnNodeConfigured (post_install in AWS ParallelCluster version 2) parameters in the HeadNode (p. 103) and Scheduling/SlurmQueues (p. 112) sections. For more information, see Custom Bootstrap Actions (p. 57).

Custom bootstrap actions scripts that are developed for AWS ParallelCluster 2 must be adapted to be used in AWS ParallelCluster 3:

• We don't recommend using /etc/parallelcluster/cfnconfig and cfn_node_type to differentiate between head and compute nodes. Instead, we recommend that you specify two different scripts in the HeadNode and Scheduling/SlurmQueues sections.

• If you prefer to continue loading /etc/parallelcluster/cfnconfig for use in your bootstrap actions script, note that the value of cfn_node_type has changed from "MasterServer" to "HeadNode" (see Inclusive language (p. 22)).

• On AWS ParallelCluster 2, the first input argument to bootstrap action scripts was the S3 URL to the script and was reserved. In AWS ParallelCluster 3, only the arguments configured in the configuration are passed to the scripts.

Warning

Using internal variables provided through the /etc/parallelcluster/cfnconfig file isn't officially supported. This file might be removed as part of a future release.
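A single shared script, instead of the two separate scripts recommended above, has to handle both changes. The following Python sketch shows one way to do that, under stated assumptions:

- cfnconfig contains shell-style KEY=value lines.
- Both the 2.x and 3.x spellings of cfn_node_type count as the head node.
- A leading argument that starts with s3:// is the 2.x-style script URL and is dropped. A 3.x Args entry that starts with s3:// would also be dropped.

All helper names are hypothetical.

```python
import shlex
from typing import Dict, List

# Head node spellings: "HeadNode" in 3.x, "MasterServer" in 2.x.
HEAD_NODE_VALUES = {"HeadNode", "MasterServer"}


def parse_cfnconfig(text: str) -> Dict[str, str]:
    """Parse assumed shell-style KEY=value lines into a dict, ignoring comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex removes shell quoting such as cfn_region='us-east-1'.
        values[key.strip()] = " ".join(shlex.split(value))
    return values


def is_head_node(config: Dict[str, str]) -> bool:
    """True when cfn_node_type names the head node in either version."""
    return config.get("cfn_node_type") in HEAD_NODE_VALUES


def script_args(argv: List[str]) -> List[str]:
    """Drop a 2.x-style leading S3 URL so both versions see the same arguments."""
    if argv and argv[0].startswith("s3://"):
        return argv[1:]
    return list(argv)
```

Inside a script, you might call `is_head_node(parse_cfnconfig(Path("/etc/parallelcluster/cfnconfig").read_text()))` together with `script_args(sys.argv[1:])`.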

AWS ParallelCluster 2.x and 3.x use different configuration file syntax

AWS ParallelCluster 3.x configuration files use YAML syntax. For the full reference, see Configuration files (p. 101).

In addition to requiring a YAML file format, a number of configuration sections, settings, and parameter values have been updated in AWS ParallelCluster 3.x. In this section, we note key changes to the AWS ParallelCluster configuration along with side-by-side examples illustrating these differences across each version of AWS ParallelCluster.

Example of multiple scheduler queues configuration with hyperthreading enabled and disabled

AWS ParallelCluster 2:

[cluster default]
queue_settings = ht-enabled, ht-disabled
...

[queue ht-enabled]
compute_resource_settings = ht-enabled-i1
disable_hyperthreading = false

[queue ht-disabled]
compute_resource_settings = ht-disabled-i1
disable_hyperthreading = true

[compute_resource ht-enabled-i1]
instance_type = c5n.18xlarge

[compute_resource ht-disabled-i1]
instance_type = c5.xlarge

AWS ParallelCluster 3:

...
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: ht-enabled
      Networking:
        SubnetIds:
          - compute_subnet_id
      ComputeResources:
        - Name: ht-enabled-i1
          DisableSimultaneousMultithreading: false
          InstanceType: c5n.18xlarge
    - Name: ht-disabled
      Networking:
        SubnetIds:
          - compute_subnet_id
      ComputeResources:
        - Name: ht-disabled-i1
          DisableSimultaneousMultithreading: true
          InstanceType: c5.xlarge
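In 2.x, disable_hyperthreading is set per queue. In 3.x, DisableSimultaneousMultithreading is set on each compute resource. A conversion therefore copies the queue-level flag onto every compute resource in that queue. The following Python sketch reads the 2.x INI example with the standard configparser module and builds the equivalent 3.x structure as a dict. Because YAML is a superset of JSON, the printed JSON can be pasted into a 3.x configuration file. The function name and the compute_subnet_id placeholder are assumptions, and the sketch handles only the keys shown in this example.

```python
import configparser
import json

V2_CONFIG = """
[cluster default]
queue_settings = ht-enabled, ht-disabled

[queue ht-enabled]
compute_resource_settings = ht-enabled-i1
disable_hyperthreading = false

[queue ht-disabled]
compute_resource_settings = ht-disabled-i1
disable_hyperthreading = true

[compute_resource ht-enabled-i1]
instance_type = c5n.18xlarge

[compute_resource ht-disabled-i1]
instance_type = c5.xlarge
"""


def _split(value):
    """Split a 2.x comma-separated list such as 'a, b'."""
    return [item.strip() for item in value.split(",") if item.strip()]


def v2_queues_to_v3(text, subnet_id="compute_subnet_id"):
    parser = configparser.ConfigParser()
    parser.read_string(text)
    queues = []
    for queue_name in _split(parser["cluster default"]["queue_settings"]):
        queue = parser[f"queue {queue_name}"]
        # Queue-level in 2.x; copied to each compute resource in 3.x.
        disable_ht = queue.getboolean("disable_hyperthreading", fallback=False)
        resources = []
        for cr_name in _split(queue["compute_resource_settings"]):
            resources.append({
                "Name": cr_name,
                "InstanceType": parser[f"compute_resource {cr_name}"]["instance_type"],
                "DisableSimultaneousMultithreading": disable_ht,
            })
        queues.append({
            "Name": queue_name,
            "Networking": {"SubnetIds": [subnet_id]},
            "ComputeResources": resources,
        })
    return {"Scheduling": {"Scheduler": "slurm", "SlurmQueues": queues}}


if __name__ == "__main__":
    print(json.dumps(v2_queues_to_v3(V2_CONFIG), indent=2))
```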

Example of a new FSx for Lustre file-system configuration

AWS ParallelCluster 2:

[cluster default]
fsx_settings = fsx
...

[fsx fsx]
shared_dir = /shared-fsx
storage_capacity = 1200
imported_file_chunk_size = 1024
import_path = s3://bucket
export_path = s3://bucket/export_dir
weekly_maintenance_start_time = 3:02:30
deployment_type = PERSISTENT_1
data_compression_type = LZ4

AWS ParallelCluster 3:

...
SharedStorage:
  - Name: fsx
    MountDir: /shared-fsx
    StorageType: FsxLustre
    FsxLustreSettings:
      StorageCapacity: 1200
      ImportedFileChunkSize: 1024
      ImportPath: s3://bucket
      ExportPath: s3://bucket/export_dir
      WeeklyMaintenanceStartTime: "3:02:30"
      DeploymentType: PERSISTENT_1
      DataCompressionType: LZ4

Example of a cluster configuration mounting an existing FSx for Lustre file-system

AWS ParallelCluster 2:

[cluster default]
fsx_settings = fsx
...

[fsx fsx]
shared_dir = /shared-fsx
fsx_fs_id = fsx_fs_id

AWS ParallelCluster 3:

...
SharedStorage:
  - Name: fsx
    MountDir: /shared-fsx
    StorageType: FsxLustre
    FsxLustreSettings:
      FileSystemId: fsx_fs_id
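The two FSx for Lustre examples above follow a mostly one-to-one key mapping. shared_dir becomes MountDir, and the other keys move under FsxLustreSettings in CamelCase. The following Python sketch is a table-driven version of that mapping. It covers only the keys shown in these examples and raises an error on anything else instead of dropping it silently. WeeklyMaintenanceStartTime is kept as a string, which matches the quoted "3:02:30" above. YAML 1.1 parsers such as PyYAML can read an unquoted 3:02:30 as a base-60 integer. The function name is hypothetical.

```python
from typing import Callable, Dict, Mapping, Tuple

# v2 [fsx] key -> (v3 FsxLustreSettings key, value conversion); only keys shown above.
FSX_KEY_MAP: Dict[str, Tuple[str, Callable[[str], object]]] = {
    "storage_capacity": ("StorageCapacity", int),
    "imported_file_chunk_size": ("ImportedFileChunkSize", int),
    "import_path": ("ImportPath", str),
    "export_path": ("ExportPath", str),
    "weekly_maintenance_start_time": ("WeeklyMaintenanceStartTime", str),
    "deployment_type": ("DeploymentType", str),
    "data_compression_type": ("DataCompressionType", str),
    "fsx_fs_id": ("FileSystemId", str),
}


def v2_fsx_to_v3(section: Mapping[str, str], name: str = "fsx") -> Dict:
    """Convert one 2.x [fsx ...] section into a 3.x SharedStorage entry."""
    settings = {}
    for key, value in section.items():
        if key == "shared_dir":
            continue  # becomes MountDir below
        if key not in FSX_KEY_MAP:
            raise ValueError(f"no mapping for 2.x fsx setting {key!r}")
        new_key, convert = FSX_KEY_MAP[key]
        settings[new_key] = convert(value)
    return {
        "Name": name,
        "MountDir": section["shared_dir"],
        "StorageType": "FsxLustre",
        "FsxLustreSettings": settings,
    }
```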

Example of a cluster with the Intel HPC Platform Specification software stack

AWS ParallelCluster 2:

[cluster default]
enable_intel_hpc_platform = true
...

AWS ParallelCluster 3:

...
AdditionalPackages:
  IntelSoftware:
    IntelHpcPlatform: true

Notes:

• The installation of Intel HPC Platform Specification software is subject to the terms and conditions of the applicable Intel End User License Agreement.

Example of custom IAM configurations, including an instance profile, an instance role, additional policies for instances, and the role for the Lambda functions associated with the cluster

AWS ParallelCluster 2:

[cluster default]
additional_iam_policies = arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess,arn:aws:iam::aws:policy/AmazonDynamoDBReadOnlyAccess
ec2_iam_role = ec2_iam_role
iam_lambda_role = lambda_iam_role
...

AWS ParallelCluster 3:

...
Iam:
  Roles:
    CustomLambdaResources: lambda_iam_role
HeadNode:
  ...
  Iam:
    InstanceRole: ec2_iam_role
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: queue1
      ...
      Iam:
        InstanceProfile: iam_instance_profile
    - Name: queue2
      ...
      Iam:
        AdditionalIamPolicies:
          - Policy: arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
          - Policy: arn:aws:iam::aws:policy/AmazonDynamoDBReadOnlyAccess

Notes:

• For AWS ParallelCluster 2, the IAM settings are applied to all the instances of a cluster, and additional_iam_policies can't be used in conjunction with ec2_iam_role.

• For AWS ParallelCluster 3, you can have different IAM settings for head and compute nodes, and even specify different IAM settings for each compute queue.

• For AWS ParallelCluster 3, you can use an IAM instance profile as an alternative to an IAM role. InstanceProfile, InstanceRole, and AdditionalIamPolicies can't be configured together.
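Two parts of this migration lend themselves to a quick script. The first converts the comma-separated 2.x additional_iam_policies value into the 3.x list of Policy entries. The second flags any HeadNode or SlurmQueues Iam section that mixes the settings the notes above say can't be configured together. The following Python sketch works on the configuration as an already-loaded dict. Parsing the YAML is left to you. The function names are hypothetical, and this isn't a replacement for the validation that pcluster runs itself.

```python
from typing import Dict, List

# Per the notes above, at most one of these may appear in a single Iam section.
EXCLUSIVE_IAM_KEYS = ("InstanceProfile", "InstanceRole", "AdditionalIamPolicies")


def v2_policies_to_v3(additional_iam_policies: str) -> List[Dict[str, str]]:
    """Convert the comma-separated 2.x value into the 3.x list of {"Policy": arn}."""
    return [{"Policy": arn.strip()} for arn in additional_iam_policies.split(",") if arn.strip()]


def iam_conflicts(config: Dict) -> List[str]:
    """Report each HeadNode or SlurmQueues Iam section that mixes exclusive keys."""
    sections = [("HeadNode", (config.get("HeadNode") or {}).get("Iam") or {})]
    scheduling = config.get("Scheduling") or {}
    for queue in scheduling.get("SlurmQueues") or []:
        sections.append((f"SlurmQueues/{queue.get('Name')}", queue.get("Iam") or {}))
    problems = []
    for where, iam in sections:
        present = [key for key in EXCLUSIVE_IAM_KEYS if key in iam]
        if len(present) > 1:
            problems.append(f"{where}: {', '.join(present)} can't be configured together")
    return problems
```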

Example of custom bootstrap actions

AWS ParallelCluster 2:

[cluster default]
s3_read_resource = arn:aws:s3:::bucket_name/*
pre_install = s3://bucket_name/scripts/pre_install.sh
pre_install_args = 'R curl wget'
post_install = s3://bucket_name/scripts/post_install.sh
post_install_args = "R curl wget"
...

AWS ParallelCluster 3:

...
HeadNode:
  ...
  CustomActions:
    OnNodeStart:
      Script: s3://bucket_name/scripts/pre_install.sh
      Args:
        - R
        - curl
        - wget
    OnNodeConfigured:
      Script: s3://bucket_name/scripts/post_install.sh
      Args: ['R', 'curl', 'wget']
  Iam:
    S3Access:
      - BucketName: bucket_name
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: queue1
      ...
      CustomActions:
        OnNodeStart:
          Script: s3://bucket_name/scripts/pre_install.sh
          Args: ['R', 'curl', 'wget']
        OnNodeConfigured:
          Script: s3://bucket_name/scripts/post_install.sh
          Args: ['R', 'curl', 'wget']
      Iam:
        S3Access:
          - BucketName: bucket_name
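The main change in this example is that the 2.x pre_install_args and post_install_args values are a single quoted string, such as 'R curl wget'. In 3.x, Args is a YAML list. The following Python sketch converts the 2.x keys into a 3.x CustomActions mapping. As the example shows, the result would then be copied into HeadNode and into each queue that needs it. Splitting with shlex, and re-splitting when the whole value is one quoted token, is an assumption about how the 2.x string should be tokenized. The function names are hypothetical.

```python
import shlex
from typing import Dict, List, Mapping


def parse_v2_args(raw: str) -> List[str]:
    """Split a 2.x *_install_args value such as 'R curl wget' into a list."""
    tokens = shlex.split(raw)
    # The examples wrap the whole list in one pair of quotes, which shlex returns
    # as a single token; split that token again in that case.
    if len(tokens) == 1 and raw.strip()[:1] in ("'", '"'):
        tokens = shlex.split(tokens[0])
    return tokens


def v2_custom_actions(cluster: Mapping[str, str]) -> Dict:
    """Map pre_install/post_install (plus their *_args) to 3.x CustomActions."""
    actions = {}
    for v2_key, v3_key in (("pre_install", "OnNodeStart"), ("post_install", "OnNodeConfigured")):
        if v2_key not in cluster:
            continue
        entry = {"Script": cluster[v2_key]}
        args = parse_v2_args(cluster.get(f"{v2_key}_args", ""))
        if args:
            entry["Args"] = args
        actions[v3_key] = entry
    return actions
```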

Example of a cluster with read and write access to the S3 bucket resources

AWS ParallelCluster 2:

[cluster default]

s3_read_resource = arn:aws:s3:::bucket/read_only/*

s3_read_write_resource = arn:aws:s3:::bucket/read_and_write/*

...

AWS ParallelCluster 3:

...
HeadNode:
  ...
  Iam:
    S3Access:
      - BucketName: bucket_name
        KeyName: read_only/
        EnableWriteAccess: False
      - BucketName: bucket_name
        KeyName: read_and_write/
        EnableWriteAccess: True
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: queue1
      ...
      Iam:
        S3Access:
          - BucketName: bucket_name
            KeyName: read_only/
            EnableWriteAccess: False
          - BucketName: bucket_name
            KeyName: read_and_write/
            EnableWriteAccess: True

Inclusive language

AWS ParallelCluster 3 uses the words "head node" in places where "master" was used in AWS ParallelCluster 2. This includes the following:


• The variable exported in the AWS Batch job environment changed from MASTER_IP to PCLUSTER_HEAD_NODE_IP.

• All AWS CloudFormation outputs changed from Master* to HeadNode*.

• All NodeType values and tags changed from Master to HeadNode.
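An AWS Batch job script that must run on clusters created by either version can read the new variable and fall back to the old one. The following Python sketch uses only the two variable names listed above. The function name is hypothetical, and the environment mapping is a parameter so the lookup can be tested.

```python
import os
from typing import Mapping, Optional


def head_node_ip(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the head node IP exported to an AWS Batch job.

    Prefers the 3.x name and falls back to the 2.x name.
    """
    env = os.environ if env is None else env
    for name in ("PCLUSTER_HEAD_NODE_IP", "MASTER_IP"):
        value = env.get(name)
        if value:
            return value
    raise KeyError("neither PCLUSTER_HEAD_NODE_IP nor MASTER_IP is set")
```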

Scheduler Support

AWS ParallelCluster 3.x doesn't support Son of Grid Engine (SGE) and Torque schedulers.

The AWS Batch commands awsbhosts, awsbkill, awsbout, awsbqueues, awsbstat, and awsbsub are distributed as a separate aws-parallelcluster-awsbatch-cli PyPI package. This package is installed by AWS ParallelCluster on the head node. You can still use these AWS Batch commands from the cluster's head node. However, if you wish to use AWS Batch commands from a location other than the head node, you must first install the aws-parallelcluster-awsbatch-cli PyPI package.

AWS ParallelCluster CLI

The AWS ParallelCluster command line interface (CLI) has been changed. The new syntax is described in AWS ParallelCluster CLI commands (p. 81). The output format for the CLI is a JSON string.

Configuring a new cluster

The pcluster configure command includes different parameters in AWS ParallelCluster 3 as compared to AWS ParallelCluster 2. For more information, see pcluster configure (p. 84).

Note also that the configuration file syntax has changed from AWS ParallelCluster 2. For a full reference of the cluster configuration settings, see Cluster configuration file (p. 101).

Creating a new cluster

AWS ParallelCluster 2's pcluster create command has been replaced by the pcluster create-cluster (p. 84) command.

Note that by default, without the -nw option, the AWS ParallelCluster 2.x command waits for cluster creation events, whereas the AWS ParallelCluster 3.x command returns immediately. You can monitor the progress of cluster creation by using pcluster describe-cluster (p. 96).

An AWS ParallelCluster 3 configuration file contains a single cluster definition, so the -t parameter is no longer needed.

The following example compares the cluster creation commands in the two versions.

# AWS ParallelCluster v2
$ pcluster create \
    -r REGION \
    -c V2_CONFIG_FILE \
    -nw \
    -t CLUSTER_TEMPLATE \
    CLUSTER_NAME

# AWS ParallelCluster v3
$ pcluster create-cluster \
    --region REGION \
    --cluster-configuration V3_CONFIG_FILE \
    --cluster-name CLUSTER_NAME

Listing clusters

The pcluster list AWS ParallelCluster 2.x command must be replaced with the pcluster list-clusters (p. 89) command.


Note: You need AWS ParallelCluster v2 CLI to list clusters created with 2.x versions of AWS ParallelCluster. See Install AWS ParallelCluster in a virtual environment (recommended) (p. 3) for how to install multiple versions of AWS ParallelCluster using virtual environments.

# AWS ParallelCluster v2

$ pcluster list -r REGION

# AWS ParallelCluster v3

$ pcluster list-clusters --region REGION

Starting and Stopping a cluster

The pcluster start and pcluster stop AWS ParallelCluster 2.x commands must be replaced with pcluster update-compute-fleet (p. 92) commands.

Starting a compute fleet:

# AWS ParallelCluster v2
$ pcluster start \
    -r REGION \
    CLUSTER_NAME

# AWS ParallelCluster v3 - Slurm fleets
$ pcluster update-compute-fleet \
    --region REGION \
    --cluster-name CLUSTER_NAME \
    --status START_REQUESTED

# AWS ParallelCluster v3 - AWS Batch fleets
$ pcluster update-compute-fleet \
    --region REGION \
    --cluster-name CLUSTER_NAME \
    --status ENABLED

Stopping a compute fleet:

# AWS ParallelCluster v2
$ pcluster stop \
    -r REGION \
    CLUSTER_NAME

# AWS ParallelCluster v3 - Slurm fleets
$ pcluster update-compute-fleet \
    --region REGION \
    --cluster-name CLUSTER_NAME \
    --status STOP_REQUESTED

# AWS ParallelCluster v3 - AWS Batch fleets
$ pcluster update-compute-fleet \
    --region REGION \
    --cluster-name CLUSTER_NAME \
    --status DISABLED
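The examples above show that the replacement for pcluster start and pcluster stop depends on both the action and the scheduler. A wrapper script that used to call the 2.x commands can look up the --status value from those four cases. The following Python sketch only builds the argument list, using the status values shown above. Running it, for example with subprocess.run(argv, check=True), is up to the caller. The function name is hypothetical.

```python
from typing import List

# 3.x --status values from the examples above, keyed by (action, scheduler).
FLEET_STATUS = {
    ("start", "slurm"): "START_REQUESTED",
    ("stop", "slurm"): "STOP_REQUESTED",
    ("start", "awsbatch"): "ENABLED",
    ("stop", "awsbatch"): "DISABLED",
}


def update_compute_fleet_argv(action: str, scheduler: str,
                              cluster_name: str, region: str) -> List[str]:
    """Build the 3.x command that replaces `pcluster start` or `pcluster stop`."""
    try:
        status = FLEET_STATUS[(action, scheduler)]
    except KeyError:
        raise ValueError(f"unsupported action/scheduler: {action!r}, {scheduler!r}") from None
    return [
        "pcluster", "update-compute-fleet",
        "--region", region,
        "--cluster-name", cluster_name,
        "--status", status,
    ]
```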

Connecting to a cluster

The pcluster ssh AWS ParallelCluster 2.x command has different parameter names in AWS ParallelCluster 3.x. See pcluster ssh (p. 91).

Connecting to a cluster:

# AWS ParallelCluster v2
$ pcluster ssh \
    -r REGION \
    CLUSTER_NAME \
    -i ~/.ssh/id_rsa

# AWS ParallelCluster v3
$ pcluster ssh \
    --region REGION \
    --cluster-name CLUSTER_NAME \
    -i ~/.ssh/id_rsa

IMDS configuration update

Starting with version 3.0.0, AWS ParallelCluster introduced support for restricting access to the head node’s IMDS (and the instance profile credentials) to a subset of superusers, by default. For more information, see Imds Properties (p. 111).

Supported Regions for AWS ParallelCluster version 3

AWS ParallelCluster version 3 is available in the following AWS Regions:

Region Name                  Region
US East (Ohio)               us-east-2
US East (N. Virginia)        us-east-1
US West (N. California)      us-west-1
US West (Oregon)             us-west-2
Africa (Cape Town)           af-south-1
Asia Pacific (Hong Kong)     ap-east-1
Asia Pacific (Mumbai)        ap-south-1
Asia Pacific (Seoul)         ap-northeast-2
Asia Pacific (Singapore)     ap-southeast-1
Asia Pacific (Sydney)        ap-southeast-2
Asia Pacific (Tokyo)         ap-northeast-1
Canada (Central)             ca-central-1
China (Beijing)              cn-north-1
China (Ningxia)              cn-northwest-1
Europe (Frankfurt)           eu-central-1
Europe (Ireland)             eu-west-1
Europe (London)              eu-west-2
Europe (Milan)               eu-south-1
Europe (Paris)               eu-west-3
Europe (Stockholm)           eu-north-1
Middle East (Bahrain)        me-south-1
South America (São Paulo)    sa-east-1
AWS GovCloud (US-East)       us-gov-east-1
AWS GovCloud (US-West)       us-gov-west-1
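Deployment scripts can check a target Region against this table before calling pcluster, and can derive the ARN partition the Region belongs to. China Regions are in aws-cn, GovCloud Regions are in aws-us-gov, and the rest of this list is in aws. The following Python sketch hardcodes the table above, so it reflects this version of the guide only. The function names are hypothetical.

```python
# Region codes from the table above (AWS ParallelCluster version 3).
SUPPORTED_REGIONS = frozenset({
    "us-east-2", "us-east-1", "us-west-1", "us-west-2",
    "af-south-1", "ap-east-1", "ap-south-1", "ap-northeast-2",
    "ap-southeast-1", "ap-southeast-2", "ap-northeast-1", "ca-central-1",
    "cn-north-1", "cn-northwest-1", "eu-central-1", "eu-west-1",
    "eu-west-2", "eu-south-1", "eu-west-3", "eu-north-1",
    "me-south-1", "sa-east-1", "us-gov-east-1", "us-gov-west-1",
})


def partition_for_region(region: str) -> str:
    """Return the ARN partition for a Region code."""
    if region.startswith("cn-"):
        return "aws-cn"
    if region.startswith("us-gov-"):
        return "aws-us-gov"
    return "aws"


def check_region(region: str) -> str:
    """Raise if the Region isn't in the table; otherwise return its partition."""
    if region not in SUPPORTED_REGIONS:
        raise ValueError(f"{region} is not listed for AWS ParallelCluster version 3")
    return partition_for_region(region)
```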

Using AWS ParallelCluster

Topics

• AWS Identity and Access Management roles in AWS ParallelCluster 3.x (p. 26)

• Network configurations (p. 48)

• Custom Bootstrap Actions (p. 57)

• Schedulers supported by AWS ParallelCluster (p. 59)

• Configuration of Multiple Queues (p. 71)

• AWS ParallelCluster API (p. 72)

• Connect to the head node through NICE DCV (p. 77)

• Using pcluster update-cluster (p. 78)

AWS Identity and Access Management roles in AWS ParallelCluster 3.x

AWS ParallelCluster uses AWS Identity and Access Management (IAM) roles to control permissions that are associated with the AWS resources deployed to the AWS account. AWS ParallelCluster uses two types of IAM roles: the role that is assumed by the user who invokes the CLI commands, and the roles that are associated with AWS ParallelCluster resources, such as the EC2 instances launched in a cluster.

By default, AWS ParallelCluster creates all of the IAM roles it needs and configures them with the minimal set of policies required by AWS ParallelCluster resources. However, the user who invokes the various AWS ParallelCluster operations must have the right level of permissions to create or modify all of the necessary resources.

Topics

• Using existing IAM roles with AWS ParallelCluster (p. 26)

• AWS ParallelCluster example user policies (p. 27)

• AWS ParallelCluster parameters to control IAM permissions (p. 39)

Using existing IAM roles with AWS ParallelCluster

You can use existing IAM roles when creating a cluster or building a custom EC2 image. Typically, you choose existing IAM roles to fully control the permissions that are granted to AWS ParallelCluster resources and to the users of the cluster. The following examples show the IAM policies and roles that are required to both invoke AWS ParallelCluster features and customize permissions associated with cluster EC2 instances.

In the policies, replace <REGION>, <AWS ACCOUNT ID>, and similar strings with the appropriate values.

AWS ParallelCluster example user policies

The AWS ParallelCluster user role refers to the IAM role assumed by the user of the AWS ParallelCluster CLI.

The following example policies include Amazon Resource Names (ARNs) for the resources. If you're working in the AWS GovCloud (US) or AWS China partitions, the ARNs must be changed. Specifically, they must be changed from "arn:aws" to "arn:aws-us-gov" for the AWS GovCloud (US) partition or "arn:aws-cn" for the AWS China partition. For more information, see Amazon Resource Names (ARNs) in AWS GovCloud (US) Regions in the AWS GovCloud (US) User Guide and ARNs for AWS services in China in Getting Started with AWS services in China.
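You can script both adjustments to the example policies: filling in placeholders such as <AWS ACCOUNT ID> and <REGION>, and switching the ARN partition. The following Python sketch does this for a policy given as JSON text. It rewrites only strings that start with "arn:aws:". Other strings, such as the service principal names in Condition blocks, are left unchanged, so check whether those also differ in your partition. The function names are hypothetical.

```python
import json
import re
from typing import Any, Optional

_ACCOUNT_ID = re.compile(r"[0-9]{12}")


def _rewrite_partition(node: Any, partition: str) -> Any:
    """Recursively replace the leading "arn:aws:" of every ARN string."""
    if isinstance(node, dict):
        return {key: _rewrite_partition(value, partition) for key, value in node.items()}
    if isinstance(node, list):
        return [_rewrite_partition(value, partition) for value in node]
    if isinstance(node, str) and node.startswith("arn:aws:"):
        return f"arn:{partition}:" + node[len("arn:aws:"):]
    return node


def localize_policy(policy_text: str, account_id: str, partition: str = "aws",
                    region: Optional[str] = None) -> Any:
    """Fill the <AWS ACCOUNT ID> and <REGION> placeholders, then switch partitions."""
    if not _ACCOUNT_ID.fullmatch(account_id):
        raise ValueError("an AWS account ID is 12 digits")
    text = policy_text.replace("<AWS ACCOUNT ID>", account_id)
    if region is not None:
        text = text.replace("<REGION>", region)
    return _rewrite_partition(json.loads(text), partition)
```

The returned dict can be written back with json.dumps(policy, indent=2) before you create the policy.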

Topics

• Base user policy required to invoke AWS ParallelCluster features (p. 27)

• Additional user policy when using AWS Batch scheduler (p. 31)

• User Policy to use AWS ParallelCluster image build features (p. 32)

• User Policy to manage IAM resources (p. 35)

Base user policy required to invoke AWS ParallelCluster features

The following policy shows the permissions required to execute AWS ParallelCluster commands.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "ec2:Describe*"
      ],
      "Resource": "*",
      "Effect": "Allow",
      "Sid": "EC2Read"
    },
    {
      "Action": [
        "ec2:AllocateAddress",
        "ec2:AssociateAddress",
        "ec2:AttachNetworkInterface",
        "ec2:AuthorizeSecurityGroupEgress",
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:CreateLaunchTemplate",
        "ec2:CreateLaunchTemplateVersion",
        "ec2:CreateNetworkInterface",
        "ec2:CreatePlacementGroup",
        "ec2:CreateSecurityGroup",
        "ec2:CreateSnapshot",
        "ec2:CreateTags",
        "ec2:CreateVolume",
        "ec2:DeleteLaunchTemplate",
        "ec2:DeleteNetworkInterface",
        "ec2:DeletePlacementGroup",
        "ec2:DeleteSecurityGroup",
        "ec2:DeleteVolume",
        "ec2:DisassociateAddress",
        "ec2:ModifyLaunchTemplate",
        "ec2:ModifyNetworkInterfaceAttribute",
        "ec2:ModifyVolume",
        "ec2:ModifyVolumeAttribute",
        "ec2:ReleaseAddress",
        "ec2:RevokeSecurityGroupEgress",
        "ec2:RevokeSecurityGroupIngress",
        "ec2:RunInstances",
        "ec2:TerminateInstances"
      ],
      "Resource": "*",
      "Effect": "Allow",
      "Sid": "EC2Write"
    },
    {
      "Action": [
        "dynamodb:DescribeTable",
        "dynamodb:ListTagsOfResource",
        "dynamodb:CreateTable",
        "dynamodb:DeleteTable",
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:Query",
        "dynamodb:TagResource"
      ],
      "Resource": "arn:aws:dynamodb:*:<AWS ACCOUNT ID>:table/parallelcluster-*",
      "Effect": "Allow",
      "Sid": "DynamoDB"
    },
    {
      "Action": [
        "route53:ChangeResourceRecordSets",
        "route53:ChangeTagsForResource",
        "route53:CreateHostedZone",
        "route53:DeleteHostedZone",
        "route53:GetChange",
        "route53:GetHostedZone",
        "route53:ListResourceRecordSets",
        "route53:ListQueryLoggingConfigs"
      ],
      "Resource": "*",
      "Effect": "Allow",
      "Sid": "Route53HostedZones"
    },
    {
      "Action": [
        "cloudformation:*"
      ],
      "Resource": "*",
      "Effect": "Allow",
      "Sid": "CloudFormation"
    },
    {
      "Action": [
        "cloudwatch:PutDashboard",
        "cloudwatch:ListDashboards",
        "cloudwatch:DeleteDashboards",
        "cloudwatch:GetDashboard"
      ],
      "Resource": "*",
      "Effect": "Allow",
      "Sid": "CloudWatch"
    },
    {
      "Action": [
        "iam:GetRole",
        "iam:GetRolePolicy",
        "iam:GetPolicy",
        "iam:SimulatePrincipalPolicy",
        "iam:GetInstanceProfile"
      ],
      "Resource": [
        "arn:aws:iam::<AWS ACCOUNT ID>:role/*",
        "arn:aws:iam::<AWS ACCOUNT ID>:policy/*",
        "arn:aws:iam::aws:policy/*",
        "arn:aws:iam::<AWS ACCOUNT ID>:instance-profile/*"
      ],
      "Effect": "Allow",
      "Sid": "IamRead"
    },
    {
      "Action": [
        "iam:CreateInstanceProfile",
        "iam:DeleteInstanceProfile",
        "iam:AddRoleToInstanceProfile",
        "iam:RemoveRoleFromInstanceProfile"
      ],
      "Resource": [
        "arn:aws:iam::<AWS ACCOUNT ID>:instance-profile/parallelcluster/*"
      ],
      "Effect": "Allow",
      "Sid": "IamInstanceProfile"
    },
    {
      "Condition": {
        "StringEqualsIfExists": {
          "iam:PassedToService": [
            "lambda.amazonaws.com",
            "ec2.amazonaws.com",
            "spotfleet.amazonaws.com"
          ]
        }
      },
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "arn:aws:iam::<AWS ACCOUNT ID>:role/parallelcluster/*"
      ],
      "Effect": "Allow",
      "Sid": "IamPassRole"
    },
    {
      "Condition": {
        "StringEquals": {
          "iam:AWSServiceName": [
            "fsx.amazonaws.com",
            "s3.data-source.lustre.fsx.amazonaws.com"
          ]
        }
      },
      "Action": [
        "iam:CreateServiceLinkedRole",
        "iam:DeleteServiceLinkedRole"
      ],
      "Resource": "*",
      "Effect": "Allow"
    },
    {
      "Action": [
        "lambda:CreateFunction",
        "lambda:DeleteFunction",
        "lambda:GetFunctionConfiguration",
        "lambda:GetFunction",
        "lambda:InvokeFunction",
        "lambda:AddPermission",
        "lambda:RemovePermission",
        "lambda:UpdateFunctionConfiguration"
      ],
      "Resource": [
        "arn:aws:lambda:*:<AWS ACCOUNT ID>:function:parallelcluster-*",
        "arn:aws:lambda:*:<AWS ACCOUNT ID>:function:pcluster-*"
      ],
      "Effect": "Allow",
      "Sid": "Lambda"
    },
    {
      "Action": [
        "s3:*"
      ],
      "Resource": [
        "arn:aws:s3:::parallelcluster-*",
        "arn:aws:s3:::aws-parallelcluster-*"
      ],
      "Effect": "Allow",
      "Sid": "S3ResourcesBucket"
    },
    {
      "Action": [
        "s3:Get*",
        "s3:List*"
      ],
      "Resource": "arn:aws:s3:::*-aws-parallelcluster*",
      "Effect": "Allow",
      "Sid": "S3ParallelClusterReadOnly"
    },
    {
      "Action": [
        "fsx:*"
      ],
      "Resource": [
        "arn:aws:fsx:*:<AWS ACCOUNT ID>:*"
      ],
      "Effect": "Allow",
      "Sid": "FSx"
    },
    {
      "Action": [
        "elasticfilesystem:*"
      ],
      "Resource": [
        "arn:aws:elasticfilesystem:*:<AWS ACCOUNT ID>:*"
      ],
      "Effect": "Allow",
      "Sid": "EFS"
    },
    {
      "Action": [
        "logs:DeleteLogGroup",
        "logs:PutRetentionPolicy",
        "logs:DescribeLogGroups",
        "logs:CreateLogGroup",
        "logs:FilterLogEvents",
        "logs:GetLogEvents",
        "logs:CreateExportTask",
        "logs:DescribeLogStreams",
        "logs:DescribeExportTasks"
      ],
      "Resource": "*",
