Deep Learning AMI

Developer Guide

Deep Learning AMI: Developer Guide

Copyright © Amazon Web Services, Inc. and/or its affiliates. All rights reserved.

Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be affiliated with, connected to, or sponsored by Amazon.

Table of Contents

What Is the AWS Deep Learning AMI? ... 1

About This Guide ... 1

Prerequisites ... 1

Example Uses ... 1

Features ... 2

Preinstalled Frameworks ... 2

Preinstalled GPU Software ... 2

Elastic Inference Support ... 3

Model Serving and Visualization ... 3

Getting Started ... 4

How to Get Started with the DLAMI ... 4

DLAMI Selection ... 4

CUDA ... 5

Base ... 6

Conda ... 6

Architecture ... 8

OS ... 8

AMI Options ... 8

Conda ... 9

Base ... 10

CUDA 10.2 ... 10

CUDA 10.1 ... 11

CUDA 10 ... 11

Ubuntu 18.04 ... 12

Ubuntu 16.04 ... 12

Amazon Linux ... 13

Amazon Linux 2 ... 14

Instance Selection ... 15

Pricing ... 16

Region Availability ... 16

GPU ... 16

CPU ... 17

Inferentia ... 17

Habana ... 18

Launching a DLAMI ... 19

Step 1: Launch a DLAMI ... 19

DLAMI ID ... 20

EC2 Console ... 20

Marketplace Search ... 21

Step 2: Connect to the DLAMI ... 21

Step 3: Secure Your DLAMI Instance ... 21

Step 4: Test Your DLAMI ... 22

Clean Up ... 22

Jupyter Setup ... 22

Secure Jupyter ... 23

Start Server ... 23

Configure Client ... 24

Log in to the Jupyter notebook server ... 25

Using a DLAMI ... 29

Conda DLAMI ... 29

Introduction to the Deep Learning AMI with Conda ... 29

Log in to Your DLAMI ... 29

Start the TensorFlow Environment ... 30

Switch to the PyTorch Python 3 Environment ... 31

Switch to the MXNet Python 3 Environment ... 31

Removing Environments ... 32

Base DLAMI ... 32

Using the Deep Learning Base AMI ... 32

Configuring CUDA Versions ... 33

Jupyter Notebooks ... 33

Navigating the Installed Tutorials ... 34

Switching Environments with Jupyter ... 34

Tutorials ... 34

10 Minute Tutorials ... 35

Activating Frameworks ... 35

Debugging and Visualization ... 49

Distributed Training ... 52

Elastic Fabric Adapter ... 69

GPU Monitoring and Optimization ... 81

AWS Inferentia ... 87

Graviton DLAMI ... 102

Habana DLAMI ... 109

Inference ... 110

Using Frameworks with ONNX ... 114

Model Serving ... 122

Upgrading Your DLAMI ... 129

DLAMI Upgrade ... 129

Software Updates ... 129

Security ... 131

Data Protection ... 131

Identity and Access Management ... 132

Authenticating With Identities ... 132

Managing Access Using Policies ... 134

IAM with Amazon EMR ... 136

Logging and Monitoring ... 136

Usage Tracking ... 136

Compliance Validation ... 136

Resilience ... 137

Infrastructure Security ... 137

Related Information ... 138

Forums ... 138

Blogs ... 138

FAQ ... 138

Release Notes for DLAMI ... 141

Single-framework DLAMI ... 141

Multi-framework DLAMI ... 141

GPU DLAMI ... 142

Habana DLAMI ... 142

Base DLAMI ... 142

Deprecations for DLAMI ... 144

Document History ... 146

AWS glossary ... 149

What Is the AWS Deep Learning AMI?

Welcome to the Developer Guide for the AWS Deep Learning AMI.

The AWS Deep Learning AMI (DLAMI) is your one-stop shop for deep learning in the cloud. This customized machine image is available in most Amazon EC2 Regions for a variety of instance types, from a small CPU-only instance to the latest high-powered multi-GPU instances. It comes preconfigured with NVIDIA CUDA and NVIDIA cuDNN, as well as the latest releases of the most popular deep learning frameworks.

About This Guide

This guide helps you launch and use the DLAMI. It covers several common deep learning use cases, for both training and inference, and explains how to choose the right AMI and instance type for your purpose. The DLAMI comes with several tutorials for each of the frameworks, as well as tutorials on distributed training, debugging, using AWS Inferentia, and other key concepts. You will also find instructions on how to configure Jupyter to run the tutorials in your browser.

Prerequisites

You should be familiar with command line tools and basic Python to successfully run the DLAMI.

Tutorials on how to use each framework are provided by the frameworks themselves; however, this guide shows you how to activate each framework and find the appropriate tutorials to get started.

Example DLAMI Uses

Learning about deep learning: The DLAMI is a great choice for learning or teaching machine learning and deep learning frameworks. It removes the headache of troubleshooting each framework's installation and getting the frameworks to work together on the same computer. The DLAMI comes with Jupyter notebooks, which make it easy for people new to machine learning and deep learning to run the tutorials provided by the frameworks.

App development: If you're an app developer and are interested in using deep learning to make your apps utilize the latest advances in AI, the DLAMI is the perfect test bed for you. Each framework comes with tutorials on how to get started with deep learning, and many of them have model zoos that make it easy to try out deep learning without having to create the neural networks yourself or to do any of the model training. Some examples show you how to build an image detection application in just a few minutes, or how to build a speech recognition app for your own chatbot.

Machine learning and data analytics: If you're a data scientist or interested in processing your data with deep learning, you'll find that many of the frameworks have support for R and Spark. You will find tutorials ranging from simple regressions all the way up to building scalable data processing systems for personalization and prediction.

Research: If you're a researcher and want to try out a new framework, test out a new model, or train new models, the DLAMI and AWS capabilities for scale can alleviate the pain of tedious installations and management of multiple training nodes. You can use Amazon EMR and AWS CloudFormation templates to easily launch a full cluster of instances that are ready to go for scalable training.

Note

While your first instinct might be to upgrade to a larger instance type with more GPUs (up to 8), you can also scale horizontally by creating a cluster of DLAMI instances. To quickly set up a cluster, you can use the predefined AWS CloudFormation template. See Related Information (p. 138) for more information on cluster builds.

Features of the DLAMI

Preinstalled Frameworks

There are currently two primary flavors of the DLAMI with other variations related to the operating system (OS) and software versions:

• Deep Learning AMI with Conda (p. 6) - frameworks installed separately using conda packages and separate Python environments

• Deep Learning Base AMI (p. 6) - no frameworks installed; only NVIDIA CUDA and other dependencies

The Deep Learning AMI with Conda uses Anaconda environments to isolate each framework, so you can switch between them at will and not worry about their dependencies conflicting.
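To see which framework environments a given DLAMI release actually provides, you can ask conda itself. The following is a minimal sketch, assuming the `conda` executable is on your `PATH`; it relies on `conda env list --json`, which prints a JSON object whose `envs` key lists environment directories. The helper names `parse_env_names` and `list_conda_envs` are hypothetical, and the environment paths in the usage example are illustrative only, since names differ between releases.

```python
import json
import os
import shutil
import subprocess
from typing import List


def parse_env_names(conda_json: str) -> List[str]:
    """Turn the JSON printed by `conda env list --json` into environment names."""
    envs = json.loads(conda_json).get("envs", [])
    # Each entry is an environment prefix (a directory). The last path component
    # is the environment name; the base install shows up as its own directory name.
    return sorted(os.path.basename(path.rstrip("/")) for path in envs)


def list_conda_envs() -> List[str]:
    """Query the local conda installation; return [] if conda is not installed."""
    conda = shutil.which("conda")
    if conda is None:
        return []
    result = subprocess.run(
        [conda, "env", "list", "--json"],
        capture_output=True, text=True, check=True,
    )
    return parse_env_names(result.stdout)


if __name__ == "__main__":
    print(list_conda_envs())
```

Parsing the JSON output rather than the human-readable table keeps the sketch independent of how conda formats its console output.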

For more information on selecting the best DLAMI for you, take a look at Getting Started (p. 4).

The Deep Learning AMI with Conda supports the following frameworks:

• Apache MXNet (Incubating)

• Chainer

• Keras

• PyTorch

• TensorFlow

• TensorFlow 2

Note

Starting with the v28 release, we no longer include the CNTK, Caffe, Caffe2, and Theano Conda environments in the AWS Deep Learning AMI. Previous releases of the AWS Deep Learning AMI that contain these environments continue to be available. However, we only provide updates to these environments if there are security fixes published by the open-source community for these frameworks.

Preinstalled GPU Software

Even if you use a CPU-only instance, the DLAMI includes NVIDIA CUDA and NVIDIA cuDNN. The installed software is the same regardless of the instance type. Keep in mind that GPU-specific tools work only on an instance that has at least one GPU. For more information, see Selecting the Instance Type for DLAMI (p. 15).

• Latest version of NVIDIA CUDA

• Latest version of NVIDIA cuDNN

• Older versions of CUDA are available as well. See the CUDA Installations and Framework Bindings (p. 5) for more information.
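Because the same software is installed on every instance type, it can help to check whether the current instance actually has a GPU before running GPU-specific tools. The following is a minimal sketch that assumes the NVIDIA driver utility `nvidia-smi` is installed (it may be absent or fail on CPU-only instances); it uses the real `--query-gpu=name --format=csv,noheader` options. The function name `list_gpus` is hypothetical.

```python
import shutil
import subprocess
from typing import List


def list_gpus() -> List[str]:
    """Return GPU names reported by nvidia-smi, or an empty list if none are visible."""
    # On a CPU-only instance nvidia-smi may be missing, or may exit with an
    # error because no GPU is attached; both cases are treated as "no GPU".
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=30, check=True,
        )
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError):
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = list_gpus()
    if gpus:
        print(f"{len(gpus)} GPU(s) found: {', '.join(gpus)}")
    else:
        print("No GPU detected; GPU-specific tools will not work on this instance.")
```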

Elastic Inference Support

The Deep Learning AMI with Conda comes with environments that support Elastic Inference for both AWS Deep Learning AMI with Ubuntu 16.04 Options (p. 12) and AWS Deep Learning AMI Amazon Linux Options (p. 13). Elastic Inference environments are not currently supported for AWS Deep Learning AMI with Ubuntu 18.04 Options (p. 12) and AWS Deep Learning AMI Amazon Linux 2 Options (p. 14). For tutorials and more information on Elastic Inference, see the Elastic Inference Documentation.

Model Serving and Visualization

Deep Learning AMI with Conda comes preinstalled with two kinds of model servers, one for MXNet and one for TensorFlow, as well as TensorBoard, for model visualizations.

• Model Server for Apache MXNet (MMS) (p. 123)

• TensorFlow Serving (p. 125)

• TensorBoard (p. 51)

Getting Started

How to Get Started with the DLAMI

This guide includes tips about picking the DLAMI that's right for you, selecting an instance type that fits your use case and budget, and Related Information (p. 138) that describes custom setups that may be of interest.

If you're new to using AWS or using Amazon EC2, start with the Deep Learning AMI with Conda (p. 6).

If you're familiar with Amazon EC2 and other AWS services like Amazon EMR, Amazon EFS, or Amazon S3, and are interested in integrating those services for projects that need distributed training or inference, then check out Related Information (p. 138) to see if one fits your use case.

We recommend that you check out Choosing Your DLAMI (p. 4) to get an idea of which instance type might be best for your application.

Another option is this quick tutorial: Launch an AWS Deep Learning AMI (in 10 minutes).

Next Step

Choosing Your DLAMI (p. 4)

Choosing Your DLAMI

We offer a range of DLAMI options. To help you select the correct DLAMI for your use case, we group images by the hardware type or functionality for which they were developed. Our top-level groupings are:

• DLAMI Type: CUDA versus Base versus Single-Framework versus Multi-Framework (Conda DLAMI)

• Compute Architecture: x86-based versus Arm-based AWS Graviton

• Processor Type: GPU versus CPU versus Inferentia versus Habana

• SDK: CUDA versus AWS Neuron versus SynapseAI

• OS: Amazon Linux versus Ubuntu

The following topics go into more detail about each of these options.

Topics

• CUDA Installations and Framework Bindings (p. 5)

• Deep Learning Base AMI (p. 6)

• Deep Learning AMI with Conda (p. 6)

• DLAMI CPU Architecture Options (p. 8)

• DLAMI Operating System Options (p. 8)

Next Up

Deep Learning AMI with Conda (p. 6)

CUDA Installations and Framework Bindings

Deep learning moves quickly, but each framework offers "stable" versions. These stable versions may not work with the latest CUDA or cuDNN implementations and features. Your use case and the features you require can help you choose a framework. If you are not sure, use the latest Deep Learning AMI with Conda. It has official pip binaries for all frameworks with CUDA 10, using the most recent version that each framework supports. If you want the latest versions and want to customize your deep learning environment, use the Deep Learning Base AMI.

Look at our guide on Stable Versus Release Candidates (p. 6) for further guidance.

Choose a DLAMI with CUDA

The Deep Learning Base AMI (p. 6) has CUDA 10, 10.1, and 10.2.

The Deep Learning AMI with Conda (p. 6) has CUDA 10, 10.1, and 10.2.

• CUDA 10.1 with cuDNN 7: Apache MXNet (Incubating), PyTorch, and TensorFlow 2

• CUDA 10 with cuDNN 7: TensorFlow and Chainer

Note

We no longer include the CNTK, Caffe, Caffe2, and Theano Conda environments in the AWS Deep Learning AMI starting with the v28 release. Previous releases of the AWS Deep Learning AMI that contain these environments continue to be available. However, we only provide updates to these environments if there are security fixes published by the open-source community for these frameworks.

For installation options for DLAMI types and operating systems, refer to each of the CUDA version and DLAMI options pages:

• Deep Learning AMI with CUDA 10.2 Options (p. 10)

• Deep Learning AMI with CUDA 10.1 Options (p. 11)

• Deep Learning AMI with CUDA 10 Options (p. 11)

• Deep Learning AMI with Conda Options (p. 9)

• Deep Learning Base AMI Options (p. 10)

For specific framework version numbers, see the Release Notes for DLAMI (p. 141).

Choose one of the CUDA versions and review the full list of DLAMIs that have that version in the Appendix, or learn more about the different DLAMIs with the Next Up option.

Next Up

Deep Learning Base AMI (p. 6)

Related Topics

• For instructions on switching between CUDA versions, refer to the Using the Deep Learning Base AMI (p. 32) tutorial.

Deep Learning Base AMI

The Deep Learning Base AMI is like an empty canvas for deep learning. It comes with everything you need up to the point of installing a particular framework, and it lets you choose among several CUDA versions.

Why Choose the Base DLAMI

This AMI group is useful for project contributors who want to fork a deep learning project and build the latest version. It's for anyone who wants to roll their own environment with confidence that the latest NVIDIA software is installed and working, so they can focus on picking which frameworks and versions to install.

Choose this DLAMI type or learn more about the different DLAMIs with the Next Up option.

• Deep Learning Base AMI Options (p. 10)

Next Up

Deep Learning AMI with Conda (p. 6)

Related Topics

• Using the Deep Learning Base AMI (p. 32)

Deep Learning AMI with Conda

The Conda DLAMI uses Anaconda virtual environments. These environments are configured to keep the different framework installations separate and to streamline switching between frameworks. This setup is great for learning and experimenting with all of the frameworks the DLAMI has to offer. Most users find that the Deep Learning AMI with Conda is perfect for them.

These AMIs are the primary DLAMIs. They are updated often with the latest versions from the frameworks, and have the latest GPU drivers and software. They are generally referred to as the AWS Deep Learning AMI in most documents.

• The Ubuntu 18.04 DLAMI has the following frameworks: Apache MXNet (Incubating), Chainer, PyTorch, TensorFlow, and TensorFlow 2.

• The Ubuntu 16.04 and Amazon Linux DLAMIs have the following frameworks: Apache MXNet (Incubating), Chainer, Keras, PyTorch, TensorFlow, and TensorFlow 2.

• The Amazon Linux 2 DLAMI has the following frameworks: Apache MXNet (Incubating), Chainer, PyTorch, TensorFlow, TensorFlow 2, and Keras.

Note

We no longer include the CNTK, Caffe, Caffe2, and Theano Conda environments in the AWS Deep Learning AMI starting with the v28 release. Previous releases of the AWS Deep Learning AMI that contain these environments continue to be available. However, we only provide updates to these environments if there are security fixes published by the open-source community for these frameworks.

Stable Versus Release Candidates

The Conda AMIs use optimized binaries of the most recent formal releases from each framework.

Release candidates and experimental features are not included. The optimizations depend on the framework's support for acceleration technologies like Intel's MKL DNN, which speeds up training and inference on C5 and C4 CPU instance types. The binaries are also compiled to support advanced Intel instruction sets, including but not limited to AVX, AVX2, SSE4.1, and SSE4.2. These instruction sets accelerate vector and floating-point operations on Intel CPU architectures. Additionally, for GPU instance types, CUDA and cuDNN are updated to whichever versions the latest official release supports.
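To confirm that an x86 instance exposes the instruction sets these binaries are compiled for, you can inspect the CPU flags that Linux reports. The following is a minimal sketch, assuming a Linux host with `/proc/cpuinfo`; there, SSE4.1 and SSE4.2 appear as `sse4_1` and `sse4_2`. On Arm (Graviton) CPUs the equivalent line is named `Features` and lists different extensions, so this check applies only to x86 instances. The function names are hypothetical.

```python
from typing import Dict, Set

# /proc/cpuinfo spellings of the instruction sets named in the text above.
WANTED = ("avx", "avx2", "sse4_1", "sse4_2")


def parse_cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Collect the flags from the first 'flags' line of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def check_instruction_sets(cpuinfo_text: str) -> Dict[str, bool]:
    """Map each wanted instruction set to whether the CPU advertises it."""
    flags = parse_cpu_flags(cpuinfo_text)
    return {name: name in flags for name in WANTED}


if __name__ == "__main__":
    # Linux-only: /proc/cpuinfo does not exist on macOS or Windows.
    with open("/proc/cpuinfo") as f:
        print(check_instruction_sets(f.read()))
```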

The Deep Learning AMI with Conda automatically installs the most optimized version of the framework for your Amazon EC2 instance upon the framework's first activation. For more information, refer to Using the Deep Learning AMI with Conda (p. 29).

If you want to install from source, using custom or optimized build options, the Deep Learning Base AMI (p. 6) might be a better option for you.

Python 2 Deprecation

The Python open-source community officially ended support for Python 2 on January 1, 2020. The TensorFlow and PyTorch communities have announced that the TensorFlow 2.1 and PyTorch 1.4 releases are the last ones supporting Python 2. Previous releases of the DLAMI (v26, v25, etc.) that contain Python 2 Conda environments continue to be available. However, we provide updates to the Python 2 Conda environments on previously published DLAMI versions only if there are security fixes published by the open-source community for those versions. DLAMI releases with the latest versions of the TensorFlow and PyTorch frameworks do not contain the Python 2 Conda environments.

Elastic Inference Support

The Deep Learning AMI with Conda comes with environments that support Elastic Inference for both AWS Deep Learning AMI with Ubuntu 16.04 Options (p. 12) and AWS Deep Learning AMI Amazon Linux Options (p. 13). Elastic Inference environments are not currently supported for AWS Deep Learning AMI with Ubuntu 18.04 Options (p. 12) and AWS Deep Learning AMI Amazon Linux 2 Options (p. 14). For tutorials and more information about Elastic Inference, see the Elastic Inference Documentation.

CUDA Support

The following are the CUDA versions in the Deep Learning AMI with Conda and the frameworks supported for each:

• CUDA 10.1 with cuDNN 7: Apache MXNet

• CUDA 10 with cuDNN 7: PyTorch, TensorFlow, TensorFlow 2, Apache MXNet, Chainer
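Because each framework uses the newest CUDA version it supports, the quickest way to see which CUDA build a framework is using is to ask the framework from inside its activated Conda environment. The following is a minimal sketch that covers only PyTorch and TensorFlow: it reads `torch.version.cuda` and the `cuda_version` entry of `tf.sysconfig.get_build_info()` (available in TensorFlow 2.3 and later). The function name is hypothetical, and MXNet and Chainer are omitted.

```python
from typing import Dict, Optional


def framework_cuda_versions() -> Dict[str, Optional[str]]:
    """Report the CUDA version that PyTorch and TensorFlow were built against.

    None means the framework is not installed in the active environment,
    or it is a CPU-only build.
    """
    versions: Dict[str, Optional[str]] = {}

    try:
        import torch
        versions["torch"] = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        versions["torch"] = None

    try:
        import tensorflow as tf
        # get_build_info() exists in TensorFlow 2.3 and later; older releases
        # raise AttributeError here and are reported as None.
        versions["tensorflow"] = tf.sysconfig.get_build_info().get("cuda_version")
    except (ImportError, AttributeError):
        versions["tensorflow"] = None

    return versions


if __name__ == "__main__":
    # Run this after activating a framework's Conda environment.
    print(framework_cuda_versions())
```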

Specific framework version numbers can be found in the Release Notes for DLAMI (p. 141).

Choose this DLAMI type or learn more about the different DLAMIs with the Next Up option.

• Deep Learning AMI with Conda Options (p. 9)

Next Up

DLAMI CPU Architecture Options (p. 8)

Related Topics

• For a tutorial on using a Deep Learning AMI with Conda, see the Using the Deep Learning AMI with Conda (p. 29) tutorial.

DLAMI CPU Architecture Options

AWS Deep Learning AMIs are offered with either x86-based or Arm-based AWS Graviton2 CPU architectures.

Choose one of the Graviton GPU DLAMIs to work with an Arm-based CPU architecture. All other GPU DLAMIs are currently x86-based.

• AWS Deep Learning AMI Graviton GPU CUDA 11.4 (Ubuntu 20.04)

• AWS Deep Learning AMI Graviton GPU TensorFlow 2.6 (Ubuntu 20.04)

• AWS Deep Learning AMI Graviton GPU PyTorch 1.10 (Ubuntu 20.04)

For information about getting started with the Graviton GPU DLAMI, see The Graviton DLAMI (p. 102).

For more details on available instance types, see Selecting the Instance Type for DLAMI (p. 15).
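If you are not sure which architecture an instance or AMI you are already working on uses, Python's standard `platform.machine()` reports the kernel's machine name (`aarch64` on Arm-based Graviton instances, `x86_64` on Intel and AMD instances). The following is a minimal sketch; the helper name `dlami_architecture` is hypothetical, and the accepted aliases (`arm64`, `amd64`) are included only so that the function also behaves sensibly on non-Linux hosts.

```python
import platform


def dlami_architecture(machine: str = "") -> str:
    """Classify a machine name as 'arm64' (Graviton DLAMIs) or 'x86_64'.

    If no machine name is given, the current host's platform.machine() is used.
    """
    name = (machine or platform.machine()).lower()
    if name in ("aarch64", "arm64"):
        return "arm64"
    if name in ("x86_64", "amd64"):
        return "x86_64"
    return "unknown"


if __name__ == "__main__":
    print(f"This host is {dlami_architecture()}")
```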

Next Up

DLAMI Operating System Options (p. 8)

DLAMI Operating System Options

DLAMIs are offered in the following operating systems. If you're more familiar with CentOS or Red Hat, see AWS Deep Learning AMI Amazon Linux Options (p. 13) or AWS Deep Learning AMI Amazon Linux 2 Options (p. 14). Otherwise, see AWS Deep Learning AMI with Ubuntu 18.04 Options (p. 12) or AWS Deep Learning AMI with Ubuntu 16.04 Options (p. 12).

Choose one of the operating systems and review their full list in the Appendix, or see the next steps for picking your AMI and instance type.

• AWS Deep Learning AMI Amazon Linux Options (p. 13)

• AWS Deep Learning AMI Amazon Linux 2 Options (p. 14)

• AWS Deep Learning AMI with Ubuntu 18.04 Options (p. 12)

• AWS Deep Learning AMI with Ubuntu 16.04 Options (p. 12)
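On a running instance, you can confirm which of these operating systems you are on by reading `/etc/os-release`, the standard systemd file that both Ubuntu and Amazon Linux ship. The following is a minimal sketch; the parser handles only the simple `KEY=value` lines that this file uses, and the helper name `parse_os_release` is hypothetical.

```python
from typing import Dict


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse the KEY=value lines of /etc/os-release into a dict."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in double quotes, for example VERSION_ID="18.04".
        info[key] = value.strip().strip('"')
    return info


if __name__ == "__main__":
    with open("/etc/os-release") as f:
        info = parse_os_release(f.read())
    # For example, ID=ubuntu with VERSION_ID=18.04, or ID=amzn with VERSION_ID=2.
    print(info.get("ID"), info.get("VERSION_ID"))
```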

As mentioned in the Getting Started overview, you have multiple options for accessing a DLAMI. Before choosing a DLAMI, assess what instance type you need and identify your AWS Region.

Next Up

Selecting the Instance Type for DLAMI (p. 15)

AMI Options

The following topics describe the categories of AWS Deep Learning AMIs.

Topics

• Deep Learning AMI with Conda Options (p. 9)

• Deep Learning Base AMI Options (p. 10)

• Deep Learning AMI with CUDA 10.2 Options (p. 10)

• Deep Learning AMI with CUDA 10.1 Options (p. 11)

• Deep Learning AMI with CUDA 10 Options (p. 11)

• AWS Deep Learning AMI with Ubuntu 18.04 Options (p. 12)

• AWS Deep Learning AMI with Ubuntu 16.04 Options (p. 12)

• AWS Deep Learning AMI Amazon Linux Options (p. 13)

• AWS Deep Learning AMI Amazon Linux 2 Options (p. 14)

Deep Learning AMI with Conda Options

Use the Launching and Configuring a DLAMI (p. 19) guide to continue with one of these DLAMIs.

• Deep Learning AMI (Ubuntu 18.04) on the AWS Marketplace

• Deep Learning AMI (Ubuntu 16.04) on the AWS Marketplace

• Deep Learning AMI (Amazon Linux) on the AWS Marketplace

• Deep Learning AMI (Amazon Linux 2) on the AWS Marketplace

These DLAMIs are available in these Regions:

Region Code

US East (Ohio) us-east-2

US East (N. Virginia) us-east-1

GovCloud us-gov-west-1

US West (N. California) us-west-1

US West (Oregon) us-west-2

Beijing (China) cn-north-1

Ningxia (China) cn-northwest-1

Asia Pacific (Mumbai) ap-south-1

Asia Pacific (Seoul) ap-northeast-2

Asia Pacific (Singapore) ap-southeast-1

Asia Pacific (Sydney) ap-southeast-2

Asia Pacific (Tokyo) ap-northeast-1

Canada (Central) ca-central-1

EU (Frankfurt) eu-central-1

EU (Ireland) eu-west-1

EU (London) eu-west-2

EU (Paris) eu-west-3

SA (Sao Paulo) sa-east-1
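Once you have picked a Region from the table, you can look up the most recent DLAMI image ID there programmatically. The following is a minimal sketch using the AWS SDK for Python (boto3) `describe_images` call with an `Owners` filter and a wildcard `name` filter; it assumes boto3 is installed and that credentials for the Region are configured (the China and GovCloud Regions require separate accounts). The name pattern in the usage example is a hypothetical illustration, so check the exact AMI names in the EC2 console. The helper names are also hypothetical.

```python
from typing import Any, Dict, List, Optional


def newest_image(images: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Pick the most recently created image from a describe_images response."""
    if not images:
        return None
    # CreationDate is an ISO 8601 timestamp string in a fixed format, so
    # lexicographic order matches chronological order.
    return max(images, key=lambda image: image["CreationDate"])


def find_latest_dlami(region: str, name_pattern: str) -> Optional[Dict[str, Any]]:
    """Return the newest Amazon-owned AMI in `region` whose name matches the pattern."""
    import boto3  # imported lazily so newest_image() works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_images(
        Owners=["amazon"],
        Filters=[{"Name": "name", "Values": [name_pattern]}],
    )
    return newest_image(response["Images"])


if __name__ == "__main__":
    # Hypothetical name pattern; the trailing * is an EC2 filter wildcard.
    image = find_latest_dlami("us-east-1", "Deep Learning AMI (Ubuntu 18.04)*")
    if image:
        print(image["ImageId"], image["Name"])
```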

Deep Learning Base AMI Options

Use the Launching and Configuring a DLAMI (p. 19) guide to continue with one of these DLAMIs.

• Deep Learning Base AMI (Ubuntu 18.04) on the AWS Marketplace

• Deep Learning Base AMI (Ubuntu 16.04) on the AWS Marketplace

• Deep Learning Base AMI (Amazon Linux) on the AWS Marketplace

• Deep Learning Base AMI (Amazon Linux 2) on the AWS Marketplace

These DLAMIs are available in these Regions:

Region Code

US East (Ohio) us-east-2

US East (N. Virginia) us-east-1

GovCloud us-gov-west-1

US West (N. California) us-west-1

US West (Oregon) us-west-2

Beijing (China) cn-north-1

Ningxia (China) cn-northwest-1

Asia Pacific (Mumbai) ap-south-1

Asia Pacific (Seoul) ap-northeast-2

Asia Pacific (Singapore) ap-southeast-1

Asia Pacific (Sydney) ap-southeast-2

Asia Pacific (Tokyo) ap-northeast-1

Canada (Central) ca-central-1

EU (Frankfurt) eu-central-1

EU (Ireland) eu-west-1

EU (London) eu-west-2

EU (Paris) eu-west-3

SA (Sao Paulo) sa-east-1

Deep Learning AMI with CUDA 10.2 Options

Use the Launching and Configuring a DLAMI (p. 19) guide to continue with one of these DLAMIs.

• Deep Learning AMI (Ubuntu 18.04) on the AWS Marketplace

• Deep Learning Base AMI (Ubuntu 18.04) on the AWS Marketplace

• Deep Learning AMI (Ubuntu 16.04) on the AWS Marketplace

• Deep Learning Base AMI (Ubuntu 16.04) on the AWS Marketplace

• Deep Learning AMI (Amazon Linux) on the AWS Marketplace

• Deep Learning Base AMI (Amazon Linux) on the AWS Marketplace

• Deep Learning AMI (Amazon Linux 2) on the AWS Marketplace

• Deep Learning Base AMI (Amazon Linux 2) on the AWS Marketplace

Note

The Deep Learning AMI with Conda has CUDA 10, CUDA 10.1, and CUDA 10.2. The frameworks use the latest CUDA that they support.

The Deep Learning Base AMI has CUDA 10, CUDA 10.1, and CUDA 10.2. To switch between them, follow the directions on Using the Deep Learning Base AMI (p. 32).
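Before switching, it can help to see which CUDA toolkits are installed and which one is currently active. The following is a minimal sketch that assumes the usual NVIDIA layout, with each toolkit in `/usr/local/cuda-<version>` and `/usr/local/cuda` as a symlink to the active one; verify that layout on your AMI before relying on it. The function name is hypothetical, and the switching procedure itself is the one described in Using the Deep Learning Base AMI (p. 32).

```python
import os
import re
from typing import List, Optional, Tuple

# Matches directory names such as "cuda-10.0", "cuda-10.1", "cuda-10.2".
CUDA_DIR = re.compile(r"^cuda-(\d+(?:\.\d+)*)$")


def installed_cuda_versions(root: str = "/usr/local") -> Tuple[List[str], Optional[str]]:
    """Return (installed CUDA versions, version that the `cuda` symlink points to)."""
    versions = []
    if os.path.isdir(root):
        for entry in os.listdir(root):
            match = CUDA_DIR.match(entry)
            if match and os.path.isdir(os.path.join(root, entry)):
                versions.append(match.group(1))
    # Sort numerically so that, for example, 10.10 would sort after 10.2.
    versions.sort(key=lambda v: tuple(int(part) for part in v.split(".")))

    active = None
    link = os.path.join(root, "cuda")
    if os.path.islink(link):
        target = os.path.basename(os.path.realpath(link))
        match = CUDA_DIR.match(target)
        active = match.group(1) if match else None
    return versions, active


if __name__ == "__main__":
    installed, active = installed_cuda_versions()
    print(f"Installed: {installed}; active: {active}")
```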

Deep Learning AMI with CUDA 10.1 Options

Use the Launching and Configuring a DLAMI (p. 19) guide to continue with one of these DLAMIs.

• Deep Learning AMI (Ubuntu 18.04) on the AWS Marketplace

• Deep Learning Base AMI (Ubuntu 18.04) on the AWS Marketplace

• Deep Learning AMI (Ubuntu 16.04) on the AWS Marketplace

• Deep Learning Base AMI (Ubuntu 16.04) on the AWS Marketplace

• Deep Learning AMI (Amazon Linux) on the AWS Marketplace

• Deep Learning Base AMI (Amazon Linux) on the AWS Marketplace

• Deep Learning AMI (Amazon Linux 2) on the AWS Marketplace

• Deep Learning Base AMI (Amazon Linux 2) on the AWS Marketplace

Note

The Deep Learning AMI with Conda has CUDA 10, CUDA 10.1, and CUDA 10.2. The frameworks use the latest CUDA that they support.

The Deep Learning Base AMI has CUDA 10, CUDA 10.1, and CUDA 10.2. To switch between them, follow the directions on Using the Deep Learning Base AMI (p. 32).

Deep Learning AMI with CUDA 10 Options

Use the Launching and Configuring a DLAMI (p. 19) guide to continue with one of these DLAMIs.

• Deep Learning AMI (Ubuntu 18.04) on the AWS Marketplace

• Deep Learning AMI (Ubuntu 16.04) on the AWS Marketplace

• Deep Learning AMI (Amazon Linux) on the AWS Marketplace

• Deep Learning AMI (Amazon Linux 2) on the AWS Marketplace

• Deep Learning Base AMI (Ubuntu 18.04) on the AWS Marketplace

• Deep Learning Base AMI (Ubuntu 16.04) on the AWS Marketplace

• Deep Learning Base AMI (Amazon Linux) on the AWS Marketplace

• Deep Learning Base AMI (Amazon Linux 2) on the AWS Marketplace

Note

The Deep Learning AMI with Conda has CUDA 10, CUDA 10.1, and CUDA 10.2. The frameworks use the latest CUDA that they support.

The Deep Learning Base AMI has CUDA 10, CUDA 10.1, and CUDA 10.2. To switch between them, follow the directions on Using the Deep Learning Base AMI (p. 32).

AWS Deep Learning AMI with Ubuntu 18.04 Options

Use the Launching and Configuring a DLAMI (p. 19) guide to continue with one of these DLAMIs. The Deep Learning AMI with Conda does not support Elastic Inference for Ubuntu 18.04.

• Deep Learning AMI (Ubuntu 18.04) on the AWS Marketplace

• Deep Learning Base AMI (Ubuntu 18.04) on the AWS Marketplace

These DLAMIs are available in these Regions:

Region Code

US East (Ohio) us-east-2

US East (N. Virginia) us-east-1

GovCloud us-gov-west-1

US West (N. California) us-west-1

US West (Oregon) us-west-2

Beijing (China) cn-north-1

Ningxia (China) cn-northwest-1

Asia Pacific (Mumbai) ap-south-1

Asia Pacific (Seoul) ap-northeast-2

Asia Pacific (Singapore) ap-southeast-1

Asia Pacific (Sydney) ap-southeast-2

Asia Pacific (Tokyo) ap-northeast-1

Canada (Central) ca-central-1

EU (Frankfurt) eu-central-1

EU (Ireland) eu-west-1

EU (London) eu-west-2

EU (Paris) eu-west-3

SA (Sao Paulo) sa-east-1

AWS Deep Learning AMI with Ubuntu 16.04 Options

Note

Ubuntu Linux 16.04 LTS reached the end of its five-year LTS window on April 30, 2021, and is no longer supported by its vendor. As of October 2021, new releases no longer include updates to the Deep Learning Base AMI (Ubuntu 16.04). Previous releases continue to be available.

Use the Launching and Configuring a DLAMI (p. 19) guide to continue with one of these DLAMIs.

The Deep Learning AMI with Conda comes with environments that support Elastic Inference for Ubuntu 16.04. For tutorials and more information about Elastic Inference, see the Elastic Inference Documentation.

• Deep Learning AMI (Ubuntu 16.04) on the AWS Marketplace

• Deep Learning Base AMI (Ubuntu 16.04) on the AWS Marketplace

These DLAMIs are available in these Regions:

Region Code

US East (Ohio) us-east-2

US East (N. Virginia) us-east-1

GovCloud us-gov-west-1

US West (N. California) us-west-1

US West (Oregon) us-west-2

Beijing (China) cn-north-1

Ningxia (China) cn-northwest-1

Asia Pacific (Mumbai) ap-south-1

Asia Pacific (Seoul) ap-northeast-2

Asia Pacific (Singapore) ap-southeast-1

Asia Pacific (Sydney) ap-southeast-2

Asia Pacific (Tokyo) ap-northeast-1

Canada (Central) ca-central-1

EU (Frankfurt) eu-central-1

EU (Ireland) eu-west-1

EU (London) eu-west-2

EU (Paris) eu-west-3

SA (Sao Paulo) sa-east-1

AWS Deep Learning AMI Amazon Linux Options

Note

Amazon Linux is end-of-life as of December 2020. As of October 2021, new releases no longer include updates to the Deep Learning AMI (Amazon Linux). Previous releases of the Deep Learning AMI (Amazon Linux) continue to be available.

Use the Launching and Configuring a DLAMI (p. 19) guide to continue with one of these DLAMIs.

The Deep Learning AMI with Conda comes with environments that support Elastic Inference for Amazon Linux. For tutorials and more information about Elastic Inference, see the Elastic Inference Documentation.

• Deep Learning AMI (Amazon Linux) on the AWS Marketplace

• Deep Learning Base AMI (Amazon Linux) on the AWS Marketplace

These DLAMIs are available in these Regions:

Region Code

US East (Ohio) us-east-2

US East (N. Virginia) us-east-1

GovCloud us-gov-west-1

US West (N. California) us-west-1

US West (Oregon) us-west-2

Beijing (China) cn-north-1

Ningxia (China) cn-northwest-1

Asia Pacific (Mumbai) ap-south-1

Asia Pacific (Seoul) ap-northeast-2

Asia Pacific (Singapore) ap-southeast-1

Asia Pacific (Sydney) ap-southeast-2

Asia Pacific (Tokyo) ap-northeast-1

Canada (Central) ca-central-1

EU (Frankfurt) eu-central-1

EU (Ireland) eu-west-1

EU (London) eu-west-2

EU (Paris) eu-west-3

SA (Sao Paulo) sa-east-1

AWS Deep Learning AMI Amazon Linux 2 Options

Use the Launching and Configuring a DLAMI (p. 19) guide to continue with one of these DLAMIs. The Deep Learning AMI with Conda does not support Elastic Inference for Amazon Linux 2.

• Deep Learning AMI (Amazon Linux 2) on the AWS Marketplace

• Deep Learning Base AMI (Amazon Linux 2) on the AWS Marketplace

These DLAMIs are available in these Regions:

Region Code

US East (Ohio) us-east-2

US East (N. Virginia) us-east-1

US West (N. California) us-west-1

US West (Oregon) us-west-2

Asia Pacific (Mumbai) ap-south-1

Asia Pacific (Seoul) ap-northeast-2

Asia Pacific (Osaka) ap-northeast-3

Asia Pacific (Singapore) ap-southeast-1

Asia Pacific (Sydney) ap-southeast-2

Asia Pacific (Tokyo) ap-northeast-1

Canada (Central) ca-central-1

EU (Frankfurt) eu-central-1

EU (Ireland) eu-west-1

EU (London) eu-west-2

EU (Paris) eu-west-3

SA (Sao Paulo) sa-east-1

Amazon Linux 2 Release Notes

Selecting the Instance Type for DLAMI

Consider the following when selecting an instance type for DLAMI.

• If you're new to deep learning, then an instance with a single GPU might suit your needs.

• If you're budget conscious, then you can use CPU-only instances.

• If you're looking to optimize high performance and cost efficiency for deep learning model inference, then you can use instances with AWS Inferentia chips.

• If you're looking to optimize high performance and cost efficiency for deep learning model training, then you can use instances with Habana accelerators.

• If you're looking for a high performance GPU instance with an Arm-based CPU architecture, then you can use the G5g instance type.

• If you're interested in running a pretrained model for inference and predictions, then you can attach an Amazon Elastic Inference accelerator to your Amazon EC2 instance. Amazon Elastic Inference gives you access to an accelerator with a fraction of a GPU.

• For high-volume inference services, a single CPU instance with a lot of memory, or a cluster of such instances, might be a better solution.

• If you're using a large model with a lot of data or a large batch size, then you need a larger instance with more memory, or you can distribute your model across a cluster of GPUs. Alternatively, decreasing your batch size may let you use an instance with less memory, although this can affect your accuracy and training speed.


• If you’re interested in running machine learning applications using NVIDIA Collective Communications Library (NCCL) requiring high levels of inter-node communications at scale, you might want to use Elastic Fabric Adapter (EFA).
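To see why model size and batch size drive the memory requirement, here is a rough back-of-the-envelope estimate. It is a minimal sketch under stated assumptions, not a measurement: fp32 values (4 bytes each), one gradient per weight, and an Adam-style optimizer that keeps two extra values per weight. Activation memory, which grows with batch size, is passed in as a hypothetical per-sample figure rather than computed.

```python
# Back-of-the-envelope training memory estimate (a sketch, not a profiler).
# Assumptions: fp32 values (4 bytes each), one gradient per weight, and an
# Adam-style optimizer that keeps two extra values per weight.
BYTES_PER_VALUE = 4


def training_memory_gib(num_params: int,
                        activation_bytes_per_sample: int,
                        batch_size: int,
                        optimizer_states_per_param: int = 2) -> float:
    """Approximate memory needed to train a model, in GiB."""
    weights = num_params * BYTES_PER_VALUE
    gradients = num_params * BYTES_PER_VALUE
    optimizer = num_params * BYTES_PER_VALUE * optimizer_states_per_param
    # Activation memory grows linearly with the batch size; the per-sample
    # figure is something you would measure for your own model.
    activations = activation_bytes_per_sample * batch_size
    return (weights + gradients + optimizer + activations) / 2**30


if __name__ == "__main__":
    # Hypothetical model: 1 billion parameters, 50 MiB of activations per sample.
    for batch in (8, 32, 128):
        gib = training_memory_gib(1_000_000_000, 50 * 2**20, batch)
        print(f"batch size {batch:>3}: about {gib:.1f} GiB")
```

Only the activation term depends on the batch size, which is why lowering the batch size can let a model fit in less memory while the fixed cost of weights, gradients, and optimizer state stays the same.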

For more detail on instances, see EC2 Instance Types.

The following topics provide information about instance type considerations.

Important

The Deep Learning AMIs include drivers, software, or toolkits developed, owned, or provided by NVIDIA Corporation. You agree to use these NVIDIA drivers, software, or toolkits only on Amazon EC2 instances that include NVIDIA hardware.

Topics

• Pricing for the DLAMI (p. 16)

• DLAMI Region Availability (p. 16)

• Recommended GPU Instances (p. 16)

• Recommended CPU Instances (p. 17)

• Recommended Inferentia Instances (p. 17)

• Recommended Habana Instances (p. 18)

Pricing for the DLAMI

The deep learning frameworks included in the DLAMI are free, and each has its own open-source licenses.

Although the software included in the DLAMI is free, you still have to pay for the underlying Amazon EC2 instance hardware.

Some Amazon EC2 instance types are labeled as free. It is possible to run the DLAMI on one of these free instances. This means that using the DLAMI is entirely free when you only use that instance's capacity.

If you need a more powerful instance with more CPU cores, more disk space, more RAM, or one or more GPUs, then you need an instance that is not in the free-tier instance class.

For more information about instance selection and pricing, see Amazon EC2 pricing.

DLAMI Region Availability

Each Region supports a different range of instance types, and an instance type often has a slightly different cost in different Regions. DLAMIs are not available in every Region, but you can copy DLAMIs to the Region of your choice. See Copying an AMI for more information. Review the Region list and pick a Region that's close to you or your customers. If you plan to use more than one DLAMI and potentially create a cluster, be sure to use the same Region for all of the nodes in the cluster.

For more information on Regions, see EC2 Regions.
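If the DLAMI you want is not offered in your Region, the copy mentioned above can be done with the AWS CLI `aws ec2 copy-image` command. The following Python sketch only assembles that command; the AMI ID, Regions, and image name are placeholder values, and actually running it assumes a configured AWS CLI.

```python
import shlex


def copy_image_command(source_ami_id: str, source_region: str,
                       dest_region: str, name: str) -> list:
    """Build an `aws ec2 copy-image` command that copies an AMI into dest_region."""
    return [
        "aws", "ec2", "copy-image",
        "--source-image-id", source_ami_id,  # AMI to copy
        "--source-region", source_region,    # Region the AMI lives in now
        "--region", dest_region,             # Region the copy is created in
        "--name", name,                      # name for the new AMI
    ]


if __name__ == "__main__":
    # Placeholder values; substitute your own AMI ID and Regions.
    cmd = copy_image_command("ami-0123456789abcdef0", "us-east-1",
                             "us-west-2", "my-dlami-copy")
    print(shlex.join(cmd))
    # To run it: subprocess.run(cmd, check=True)
```

The copy-image call returns the ID of the new AMI in the destination Region; the copy may take some time before it becomes available to launch.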

Next Up

Recommended GPU Instances (p. 16)

Recommended GPU Instances

We recommend a GPU instance for most deep learning purposes. Training new models is faster on a GPU instance than a CPU instance. You can scale sub-linearly when you have multi-GPU instances or if you use distributed training across many instances with GPUs. To set up distributed training, see Distributed Training (p. 52).

The following instance types support the DLAMI. For information about GPU instance type options and their uses, see EC2 Instance Types and select Accelerated Computing.

Note

The size of your model should be a factor in selecting an instance. If your model exceeds an instance's available RAM, select a different instance type with enough memory for your application.

• Amazon EC2 P3 Instances have up to 8 NVIDIA Tesla V100 GPUs.

• Amazon EC2 P4 Instances have up to 8 NVIDIA Tesla A100 GPUs.

• Amazon EC2 G3 Instances have up to 4 NVIDIA Tesla M60 GPUs.

• Amazon EC2 G4 Instances have up to 4 NVIDIA T4 GPUs.

• Amazon EC2 G5 Instances have up to 8 NVIDIA A10G GPUs.

• Amazon EC2 G5g Instances have Arm-based AWS Graviton2 processors.

DLAMI instances provide tooling to monitor and optimize your GPU processes. For more information about monitoring your GPU processes, see GPU Monitoring and Optimization (p. 81).
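As a small example of this kind of monitoring, the following Python sketch reads per-GPU memory usage with `nvidia-smi`'s CSV query mode (`--query-gpu` together with `--format=csv,noheader,nounits`). It assumes `nvidia-smi` is on the PATH, as it is on instances with the NVIDIA driver installed; the parsing helper is our own, not part of any NVIDIA tool.

```python
import subprocess

# Fields requested from nvidia-smi; with "nounits", memory values are in MiB.
QUERY_FIELDS = "index,name,memory.used,memory.total"


def parse_gpu_csv(text: str) -> list:
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts."""
    gpus = []
    for line in text.strip().splitlines():
        index, name, used, total = (field.strip() for field in line.split(","))
        gpus.append({
            "index": int(index),
            "name": name,
            "memory_used_mib": int(used),
            "memory_total_mib": int(total),
        })
    return gpus


def query_gpus() -> list:
    """Run nvidia-smi on this instance and return one dict per GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True)
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    try:
        for gpu in query_gpus():
            print(f"GPU {gpu['index']} ({gpu['name']}): "
                  f"{gpu['memory_used_mib']} / {gpu['memory_total_mib']} MiB used")
    except FileNotFoundError:
        print("nvidia-smi not found; this may be a CPU-only instance.")
```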

For specific tutorials on working with G5g instances, see The Graviton DLAMI (p. 102).

Next Up

Recommended CPU Instances (p. 17)

Recommended CPU Instances

Whether you're on a budget, learning about deep learning, or just want to run a prediction service, you have many affordable options in the CPU category. Some frameworks take advantage of Intel's MKL DNN, which speeds up training and inference on C5 (not available in all Regions), C4, and C3 CPU instance types. For information about CPU instance types, see EC2 Instance Types and select Compute Optimized.

Note

The size of your model should be a factor in selecting an instance. If your model exceeds an instance's available RAM, select a different instance type with enough memory for your application.

• Amazon EC2 C5 Instances have up to 72 Intel vCPUs. C5 instances excel at scientific modeling, batch processing, distributed analytics, high-performance computing (HPC), and machine and deep learning inference.

• Amazon EC2 C4 Instances have up to 36 Intel vCPUs.

Next Up

Recommended Inferentia Instances (p. 17)

Recommended Inferentia Instances

AWS Inferentia instances are designed to provide high performance and cost efficiency for deep learning model inference workloads. Specifically, Inf1 instance types use AWS Inferentia chips and the AWS Neuron SDK, which is integrated with popular machine learning frameworks such as TensorFlow, PyTorch, and MXNet.


Customers can use Inf1 instances to run large scale machine learning inference applications such as search, recommendation engines, computer vision, speech recognition, natural language processing, personalization, and fraud detection, at the lowest cost in the cloud.

Note

The size of your model should be a factor in selecting an instance. If your model exceeds an instance's available RAM, select a different instance type with enough memory for your application.

• Amazon EC2 Inf1 Instances have up to 16 AWS Inferentia chips and 100 Gbps of networking throughput.

For more information about getting started with AWS Inferentia DLAMIs, see The AWS Inferentia Chip With DLAMI (p. 87).

Next Up

Recommended Habana Instances (p. 18)

Recommended Habana Instances

Instances with Habana accelerators are designed to provide high performance and cost efficiency for deep learning model training workloads. Specifically, DL1 instance types use Habana Gaudi accelerators from Habana Labs, an Intel company. DL1 instances are ideal for training machine learning models used in applications such as natural language processing, object detection and classification, recommendation engines, and autonomous vehicle perception.

Instances with Habana accelerators are configured with Habana SynapseAI software and pre-integrated with popular machine learning frameworks such as TensorFlow and PyTorch. If you are looking for an optimal combination of performance and price for training deep learning models, consider instances with Habana accelerators for the lowest cost to train.

Note

The size of your model should be a factor in selecting an instance. If your model exceeds an instance's available RAM, select a different instance type with enough memory for your application.

• Amazon EC2 DL1 Instances have up to eight Habana Gaudi accelerators, 256 GB of accelerator memory, 4 TB of local NVMe storage, and 400 Gbps of networking throughput.

For more information about getting started with Habana DLAMIs, see The Habana DLAMI (p. 109).


Launching and Configuring a DLAMI

If you're here, you should already have a good idea of which AMI you want to launch. If not, choose a DLAMI using the AMI selection guidelines found throughout Getting Started (p. 4) or use the full listing of AMIs in the Appendix section, AMI Options (p. 8).

You should also know which instance type and Region you're going to choose. If not, browse Selecting the Instance Type for DLAMI (p. 15).

Note

The examples use p2.xlarge as the default instance type. Replace it with whichever instance type you have in mind.

Important

If you plan to use Elastic Inference, you must complete the Elastic Inference Setup before launching your DLAMI.

Topics

• Step 1: Launch a DLAMI (p. 19)

• DLAMI ID (p. 20)

• EC2 Console (p. 20)

• Marketplace Search (p. 21)

• Step 2: Connect to the DLAMI (p. 21)

• Step 3: Secure Your DLAMI Instance (p. 21)

• Step 4: Test Your DLAMI (p. 22)

• Clean Up (p. 22)

• Set up a Jupyter Notebook Server (p. 22)

Step 1: Launch a DLAMI

Note

For this walkthrough, we might make references specific to the Deep Learning AMI (Ubuntu 16.04). Even if you select a different DLAMI, you should be able to follow this guide.

Launch the instance

1. You have a couple of routes for launching a DLAMI. Choose one:

• EC2 Console (p. 20)

• Marketplace Search (p. 21)

Tip

CLI Option: If you choose to spin up a DLAMI using the AWS CLI, you will need the AMI's ID, the Region and instance type, and your security token information. Be sure you have your AMI ID and instance type ready. If you haven't set up the AWS CLI yet, do that first using the guide for Installing the AWS Command Line Interface.

2. After you have completed the steps of one of those options, wait for the instance to be ready. This usually takes only a few minutes. You can verify the status of the instance in the EC2 Console.
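For the CLI route mentioned in the tip, the following Python sketch assembles an `aws ec2 run-instances` call and then uses `aws ec2 wait instance-running` to wait for the instance as in step 2. The AMI ID and key pair name are placeholders, security group and subnet options are left out (so the defaults for your account apply), and calling `launch_and_wait` assumes a configured AWS CLI.

```python
import shlex
import subprocess


def run_instances_command(ami_id: str, instance_type: str,
                          key_name: str, region: str) -> list:
    """Build an `aws ec2 run-instances` command that launches one instance."""
    return [
        "aws", "ec2", "run-instances",
        "--region", region,
        "--image-id", ami_id,
        "--instance-type", instance_type,
        "--key-name", key_name,
        "--count", "1",
        # Print only the new instance ID instead of the full JSON response.
        "--query", "Instances[0].InstanceId",
        "--output", "text",
    ]


def wait_running_command(instance_id: str, region: str) -> list:
    """Build an `aws ec2 wait instance-running` command for one instance."""
    return ["aws", "ec2", "wait", "instance-running",
            "--region", region, "--instance-ids", instance_id]


def launch_and_wait(ami_id: str, instance_type: str,
                    key_name: str, region: str) -> str:
    """Launch an instance, block until it is running, and return its ID."""
    launch = subprocess.run(
        run_instances_command(ami_id, instance_type, key_name, region),
        check=True, capture_output=True, text=True)
    instance_id = launch.stdout.strip()
    subprocess.run(wait_running_command(instance_id, region), check=True)
    return instance_id


if __name__ == "__main__":
    # Placeholder AMI ID and key pair name; replace with your own.
    print(shlex.join(run_instances_command(
        "ami-0123456789abcdef0", "p2.xlarge", "MyKeyPair", "us-east-1")))
```

The `instance-running` waiter only waits for the running state; the instance may need a little longer before it accepts SSH connections.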


DLAMI ID

Find the ID for the DLAMI of your choice with the AWS Command Line Interface (AWS CLI). If you do not already have the AWS CLI installed, see Getting started with the AWS CLI.

1. Make sure that your AWS credentials are configured.

aws configure

2. Choose a DLAMI and check the details in the release notes. Use the following command to get the ID for the DLAMI of your choice:

aws ec2 describe-images --region us-east-1 --owners amazon \
    --filters 'Name=name,Values=Deep Learning AMI (Ubuntu 18.04) Version ??.?' 'Name=state,Values=available' \
    --query 'reverse(sort_by(Images, &CreationDate))[:1].ImageId' --output text

Note

You can specify a release version for a given framework or get the latest release by replacing the version number with a question mark.

3. The output should look similar to the following:

ami-094c089c38ed069f2

Copy this DLAMI ID and press q to exit the prompt.
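The `--query` expression in step 2 is JMESPath: it sorts the matching images by `CreationDate`, reverses the order, and keeps the first `ImageId`, which is the newest image. The following Python sketch applies the same selection to JSON you could get by running the command with `--output json` and without `--query`; the sample response is made up and trimmed to the two fields the sketch uses.

```python
import json
from typing import Optional


def newest_image_id(describe_images_json: str) -> Optional[str]:
    """Mimic `reverse(sort_by(Images, &CreationDate))[:1].ImageId`.

    Returns the ImageId of the most recently created image, or None if the
    response contains no images.
    """
    images = json.loads(describe_images_json).get("Images", [])
    if not images:
        return None
    # CreationDate values are ISO 8601 timestamps in a fixed format, so
    # comparing the strings orders them chronologically.
    newest = max(images, key=lambda image: image["CreationDate"])
    return newest["ImageId"]


if __name__ == "__main__":
    # Hypothetical, trimmed-down response with two matching images.
    sample = json.dumps({"Images": [
        {"ImageId": "ami-00000000000000001", "CreationDate": "2020-11-02T21:10:05.000Z"},
        {"ImageId": "ami-00000000000000002", "CreationDate": "2021-02-18T04:33:41.000Z"},
    ]})
    print(newest_image_id(sample))
```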

Next Step

EC2 Console (p. 20)

EC2 Console

Note

To launch an instance with Elastic Fabric Adapter (EFA), refer to these steps.

1. Open the EC2 Console.

2. Note your current Region in the top navigation bar. If this isn't your desired AWS Region, change this option before proceeding. For more information, see EC2 Regions.

3. Choose Launch Instance.

4. Search for the desired instance by name:

a. Select the DLAMI that is right for you. Find the DLAMI name as listed in the release notes or find the DLAMI ID using the AWS CLI.

b. Choose Community AMIs.

i. To view a selection of the latest DLAMIs, choose Quick Start.

ii. Choose AWS Marketplace to browse additional DLAMIs. Only a subset of available DLAMIs will be listed here.

c. Enter the DLAMI name or search the DLAMI ID. Browse the options and then click Select on your choice.

5. Review the details, and then choose Continue.

6. Choose an instance type. For recommendations on DLAMI instance types, see Instance Selection.


Note

If you want to use Elastic Inference (EI), click Configure Instance Details, select Add an Amazon EI accelerator, then select the size of the Amazon EI accelerator.

7. Choose Review and Launch.

8. Review the details and pricing. Choose Launch.

Tip

Check out Get Started with Deep Learning Using the AWS Deep Learning AMI for a walk-through with screenshots!

Next Step

Step 2: Connect to the DLAMI (p. 21)

Marketplace Search

1. Browse the AWS Marketplace and search for AWS Deep Learning AMI.

2. Browse the options, and then click Select on your choice.

3. Review the details, and then choose Continue.

4. Review the details and make note of the Region. If this isn't your desired AWS Region, change this option before proceeding. For more information, see EC2 Regions.

5. Choose an instance type.

6. Choose a key pair, use your default one, or create a new one.

7. Review the details and pricing.

8. Choose Launch with 1-Click.

Next Step

Step 2: Connect to the DLAMI (p. 21)

Step 2: Connect to the DLAMI

Connect to the DLAMI that you launched from a client (Windows, macOS, or Linux). For more information, see Connect to Your Linux Instance in the Amazon EC2 User Guide for Linux Instances.

Keep a copy of the SSH login command handy if you want to do the Jupyter setup after logging in. You will use a variation of it to connect to the Jupyter webpage.
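The Jupyter variation of the login command adds SSH port forwarding. The following Python sketch builds both commands from the same inputs so you can see exactly what changes. The `os_family` keys are our own labels, the user names (`ubuntu` for Ubuntu, `ec2-user` for Amazon Linux) follow the client instructions later in this guide, and the key path and DNS name are placeholders.

```python
import os
import shlex

# Default login user names: "ubuntu" on Ubuntu DLAMIs, "ec2-user" on Amazon Linux.
DEFAULT_USERS = {"ubuntu": "ubuntu", "amazon-linux": "ec2-user"}


def ssh_login_command(key_path: str, public_dns: str,
                      os_family: str = "ubuntu") -> list:
    """Build the ssh command used to log in to the instance."""
    user = DEFAULT_USERS[os_family]
    # subprocess does not expand "~", so expand it here.
    return ["ssh", "-i", os.path.expanduser(key_path), f"{user}@{public_dns}"]


def jupyter_tunnel_command(key_path: str, public_dns: str,
                           os_family: str = "ubuntu", port: int = 8888) -> list:
    """Turn the login command into a background SSH tunnel for Jupyter.

    -N: run no remote command; -f: go to the background;
    -L: forward local `port` to the same port on the instance's localhost.
    """
    login = ssh_login_command(key_path, public_dns, os_family)
    return login[:-1] + ["-N", "-f", "-L", f"{port}:localhost:{port}"] + login[-1:]


if __name__ == "__main__":
    # Placeholder key path and DNS name; use your own values.
    dns = "ec2-203-0-113-25.compute-1.amazonaws.com"
    print(shlex.join(ssh_login_command("~/mykeypair.pem", dns)))
    print(shlex.join(jupyter_tunnel_command("~/mykeypair.pem", dns)))
```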

Next Step

Step 3: Secure Your DLAMI Instance (p. 21)

Step 3: Secure Your DLAMI Instance

Always keep your operating system and other installed software up to date by applying patches and updates as soon as they become available.

If you are using Amazon Linux or Ubuntu, you are notified when you log in to your DLAMI if updates are available, and you see instructions for updating. For more information on Amazon Linux maintenance, see Updating Instance Software. For Ubuntu instances, refer to the official Ubuntu documentation.


On Windows, check Windows Update regularly for software and security updates. If you prefer, have updates applied automatically.

Important

For information about the Meltdown and Spectre vulnerabilities and how to patch your operating system to address them, see Security Bulletin AWS-2018-013.

Step 4: Test Your DLAMI

Depending on your DLAMI version, you have different testing options:

• Deep Learning AMI with Conda (p. 6) – go to Using the Deep Learning AMI with Conda (p. 29).

• Deep Learning Base AMI (p. 6) – refer to your desired framework's installation documentation.

You can also create a Jupyter notebook, try out tutorials, or start coding in Python. For more information, see Set up a Jupyter Notebook Server (p. 22).

Clean Up

When you no longer need the DLAMI, you can stop it or terminate it to avoid incurring continuing charges. Stopping an instance keeps it around so you can resume it later. Your configurations, files, and other non-volatile information are stored on an Amazon EBS volume. You are charged a small storage fee to retain the volume while the instance is stopped, but you are no longer charged for compute resources while it is in the stopped state. When you start the instance again, it mounts that volume and your data will be there. If you terminate an instance, it is gone, and you cannot start it again. Any EBS volumes that are not set to be deleted on termination continue to exist and incur storage charges, so delete those volumes as well to prevent further charges. For more instructions, see Terminate Your Instance in the Amazon EC2 User Guide for Linux Instances.
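After terminating an instance, you can check for leftover volumes. The following Python sketch filters the JSON from `aws ec2 describe-volumes` for volumes in the `available` state, meaning they are not attached to any instance. The sample data is made up, and you should confirm that a volume holds nothing you need before removing it with `aws ec2 delete-volume`.

```python
import json


def unattached_volumes(describe_volumes_json: str) -> list:
    """Return (VolumeId, size in GiB) for volumes in the "available" state.

    "available" means the volume is not attached to any instance, but it
    still incurs storage charges until you delete it.
    """
    volumes = json.loads(describe_volumes_json).get("Volumes", [])
    return [(v["VolumeId"], v["Size"]) for v in volumes
            if v.get("State") == "available"]


if __name__ == "__main__":
    # Hypothetical, trimmed-down output of `aws ec2 describe-volumes --output json`.
    sample = json.dumps({"Volumes": [
        {"VolumeId": "vol-0aaaaaaaaaaaaaaa1", "Size": 100, "State": "in-use"},
        {"VolumeId": "vol-0bbbbbbbbbbbbbbb2", "Size": 75, "State": "available"},
    ]})
    for volume_id, size in unattached_volumes(sample):
        # Review each one before deleting it with:
        #   aws ec2 delete-volume --volume-id <id>
        print(f"{volume_id}: {size} GiB, not attached")
```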

Set up a Jupyter Notebook Server

A Jupyter notebook server enables you to create and run Jupyter notebooks from your DLAMI instance.

With Jupyter notebooks, you can conduct machine learning (ML) experiments for training and inference while using the AWS infrastructure and accessing packages built into the DLAMI. For more information about Jupyter notebooks, see the Jupyter Notebook documentation.

To set up a Jupyter notebook server, you must:

• Configure the Jupyter notebook server on your Amazon EC2 DLAMI instance.

• Configure your client so that you can connect to the Jupyter notebook server. We provide configuration instructions for Windows, macOS, and Linux clients.

• Test the setup by logging in to the Jupyter notebook server.

To set up a Jupyter notebook server, follow the instructions in the following topics. Once you've set up a Jupyter notebook server, see Running Jupyter Notebook Tutorials (p. 33) for information on running the example notebooks that ship in the DLAMI.

Topics

• Secure Your Jupyter Server (p. 23)

• Start the Jupyter notebook server (p. 23)

• Configure the Client to Connect to the Jupyter Server (p. 24)


• Test by Logging in to the Jupyter notebook server (p. 25)

Secure Your Jupyter Server

Here we set up Jupyter with SSL and a custom password.

Connect to the Amazon EC2 instance, and then complete the following procedure.

Configure the Jupyter server

1. Jupyter provides a password utility. Run the following command and enter your preferred password at the prompt.

$ jupyter notebook password

The output will look something like this:

Enter password:

Verify password:

[NotebookPasswordApp] Wrote hashed password to /home/ubuntu/.jupyter/jupyter_notebook_config.json

2. Create a self-signed SSL certificate. Follow the prompts to fill out your locality as you see fit. You must enter . if you wish to leave a prompt blank. Your answers will not impact the functionality of the certificate.

$ cd ~

$ mkdir ssl

$ cd ssl

$ openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout mykey.key -out mycert.pem

Note

You might be interested in creating a regular SSL certificate that is third-party signed and does not cause the browser to give you a security warning. This process is much more involved. Visit Jupyter's documentation for more information.
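Before starting the server, you can confirm that the two files from step 2 belong together. The following Python sketch uses the standard library's `ssl.SSLContext.load_cert_chain`, which fails if a file is missing or the private key does not match the certificate. The `~/ssl/mycert.pem` and `~/ssl/mykey.key` paths are the ones created above; the check does not look at the certificate's expiry date.

```python
import ssl
from pathlib import Path


def cert_pair_loads(certfile: str, keyfile: str) -> bool:
    """Return True if the certificate and private key load as a matching pair.

    load_cert_chain raises FileNotFoundError for a missing file and
    ssl.SSLError for an unreadable or mismatched pair; both are OSError
    subclasses. It does not check the certificate's expiry date.
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    try:
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    ssl_dir = Path.home() / "ssl"
    ok = cert_pair_loads(str(ssl_dir / "mycert.pem"), str(ssl_dir / "mykey.key"))
    print("certificate and key look usable" if ok else "certificate or key problem")
```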

Next Step

Start the Jupyter notebook server (p. 23)

Start the Jupyter notebook server

Now you can start the Jupyter server by logging in to the instance and running the following command, which uses the SSL certificate you created in the previous step.

$ jupyter notebook --certfile=$HOME/ssl/mycert.pem --keyfile=$HOME/ssl/mykey.key

With the server started, you can now connect to it via an SSH tunnel from your client computer. When the server runs, you will see some output from Jupyter confirming that the server is running. At this point, ignore the callout that you can access the server via a localhost URL, because that won't work until you create the tunnel.

Note

Jupyter will handle switching environments for you when you switch frameworks using the Jupyter web interface. More info on this can be found in Switching Environments with Jupyter (p. 34).


Next Step

Configure the Client to Connect to the Jupyter Server (p. 24)

Configure the Client to Connect to the Jupyter Server

After configuring your client to connect to the Jupyter notebook server, you can create and access notebooks on the server in your workspace and run your deep learning code on the server.

For configuration information, choose one of the following links.

Topics

• Configure a Windows Client (p. 24)

• Configure a Linux or macOS Client (p. 24)

Configure a Windows Client

Prepare

Be sure you have the following information, which you need to set up the SSH tunnel:

• The public DNS name of your Amazon EC2 instance. You can find the public DNS name in the EC2 console.

• The key pair for the private key file. For more information about accessing your key pair, see Amazon EC2 Key Pairs in the Amazon EC2 User Guide for Linux Instances.

Using Jupyter Notebooks from a Windows Client

Refer to these guides on connecting to your Amazon EC2 instance from a Windows client.

1. Troubleshooting Connecting to Your Instance

2. Connecting to Your Linux Instance from Windows Using PuTTY

To create a tunnel to a running Jupyter server, a recommended approach is to install Git Bash on your Windows client, then follow the Linux/macOS client instructions. However, you may use whatever approach you want for opening an SSH tunnel with port mapping. Refer to Jupyter's documentation for further information.

Next Step

Configure a Linux or macOS Client (p. 24)

Configure a Linux or macOS Client

1. Open a terminal.

2. Run the following command to forward all requests on local port 8888 to port 8888 on your remote Amazon EC2 instance. Update the command by replacing the location of your key to access the Amazon EC2 instance and the public DNS name of your Amazon EC2 instance. Note, for an Amazon Linux AMI, the user name is ec2-user instead of ubuntu.

$ ssh -i ~/mykeypair.pem -N -f -L 8888:localhost:8888 ubuntu@ec2-###-##-##-###.compute-1.amazonaws.com


This command opens a tunnel between your client and the remote Amazon EC2 instance that is running the Jupyter notebook server.
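To check from the client side that the tunnel is listening, the following Python sketch tries a TCP connection to local port 8888. Because `ssh -L` accepts local connections as long as the tunnel process is running, a successful check shows that the tunnel is up, not necessarily that the Jupyter server behind it is running; the browser test in the next step confirms that.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 8888,
                 timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Local port 8888 is accepting connections; the tunnel looks up.")
    else:
        print("Nothing is listening on local port 8888; re-run the ssh command.")
```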

Next Step

Test by Logging in to the Jupyter notebook server (p. 25)

Test by Logging in to the Jupyter notebook server

Now you are ready to log in to the Jupyter notebook server.

Your next step is to test the connection to the server through your browser.

1. In the address bar of your browser, type the following URL, or click this link: https://localhost:8888

2. Because the SSL certificate is self-signed, your browser will warn you and advise you not to continue to the website.


Since you set this up yourself, it is safe to continue. Depending on your browser, you will see an "Advanced", "Show details", or similar button.


Click it, then click the "proceed to localhost" link. If the connection is successful, you see the Jupyter notebook server webpage. At this point, you will be asked for the password you set up previously.

Now you have access to the Jupyter notebook server that is running on the DLAMI. You can create new notebooks or run the provided Tutorials (p. 34).


Using a DLAMI

Topics

• Using the Deep Learning AMI with Conda (p. 29)

• Using the Deep Learning Base AMI (p. 32)

• Running Jupyter Notebook Tutorials (p. 33)

• Tutorials (p. 34)

The following sections describe how the Deep Learning AMI with Conda can be used to switch environments, run sample code from each of the frameworks, and run Jupyter so you can try out different notebook tutorials.

Using the Deep Learning AMI with Conda

Topics

• Introduction to the Deep Learning AMI with Conda (p. 29)

• Log in to Your DLAMI (p. 29)

• Start the TensorFlow Environment (p. 30)

• Switch to the PyTorch Python 3 Environment (p. 31)

• Switch to the MXNet Python 3 Environment (p. 31)

• Removing Environments (p. 32)

Introduction to the Deep Learning AMI with Conda

Conda is an open source package management system and environment management system that runs on Windows, macOS, and Linux. Conda quickly installs, runs, and updates packages and their dependencies. Conda easily creates, saves, loads and switches between environments on your local computer.

The Deep Learning AMI with Conda has been configured so that you can easily switch between deep learning environments. The following instructions guide you through some basic commands with conda. They also help you verify that the basic import of the framework is functioning and that you can run a couple of simple operations with the framework. You can then move on to more thorough tutorials provided with the DLAMI or the frameworks' examples found on each framework's project site.

Log in to Your DLAMI

After you log in to your server, you will see a server "message of the day" (MOTD) describing various Conda commands that you can use to switch between the different deep learning frameworks. Below is an example MOTD. Your specific MOTD may vary as new versions of the DLAMI are released.

Note

We no longer include the CNTK, Caffe, Caffe2 and Theano Conda environments in the AWS Deep Learning AMI starting with the v28 release. Previous releases of the AWS Deep Learning AMI that contain these environments will continue to be available. However, we will only provide updates to these environments if there are security fixes published by the open source community for these frameworks.

=============================================================================
       __|  __|_  )
       _|  (     /   Deep Learning AMI (Ubuntu 18.04) Version 40.0
      ___|\___|___|
=============================================================================

Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 5.4.0-1037-aws x86_64)

Please use one of the following commands to start the required environment with the framework of your choice:

for AWS MX 1.7 (+Keras2) with Python3 (CUDA 10.1 and Intel MKL-DNN) _______________________________ source activate mxnet_p36

for AWS MX 1.8 (+Keras2) with Python3 (CUDA + and Intel MKL-DNN) ___________________________ source activate mxnet_latest_p37

for AWS MX(+AWS Neuron) with Python3 ___________________________________________________ source activate aws_neuron_mxnet_p36

for AWS MX(+Amazon Elastic Inference) with Python3 _______________________________________ source activate amazonei_mxnet_p36

for TensorFlow(+Keras2) with Python3 (CUDA + and Intel MKL-DNN) _____________________________ source activate tensorflow_p37

for Tensorflow(+AWS Neuron) with Python3 _________________________________________ source activate aws_neuron_tensorflow_p36

for TensorFlow 2(+Keras2) with Python3 (CUDA 10.1 and Intel MKL-DNN) _______________________ source activate tensorflow2_p36

for TensorFlow 2.3 with Python3.7 (CUDA + and Intel MKL-DNN) ________________________ source activate tensorflow2_latest_p37

for PyTorch 1.4 with Python3 (CUDA 10.1 and Intel MKL) _________________________________________ source activate pytorch_p36

for PyTorch 1.7.1 with Python3.7 (CUDA 11.0 and Intel MKL) ________________________________ source activate pytorch_latest_p37

for PyTorch (+AWS Neuron) with Python3 ______________________________________________ source activate aws_neuron_pytorch_p36

for base Python3 (CUDA 10.0) _______________________________________________________________________ source activate python3

Each Conda command has the following pattern:

source activate framework_python-version

For example, you may see for AWS MX 1.7 (+Keras2) with Python3 (CUDA 10.1 and Intel MKL-DNN) _______________________________ source activate mxnet_p36, which signifies that the environment has MXNet, Keras 2, Python 3, and CUDA 10.1. To activate this environment, use the following command:

$ source activate mxnet_p36
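The naming pattern above can also be read programmatically. The following Python sketch lists environments with `conda env list --json` and splits each name into a framework part and a Python version, assuming the `<framework>_p<major><minor>` convention holds. Names that don't follow it, such as `python3` or the base environment's install directory, are reported without a version.

```python
import json
import os
import re
import subprocess
from typing import Optional, Tuple

# Assumed naming convention from the MOTD: <framework>_p<major><minor>, e.g. mxnet_p36.
_ENV_NAME = re.compile(r"^(?P<framework>.+)_p(?P<major>\d)(?P<minor>\d+)$")


def describe_env(name: str) -> Tuple[str, Optional[str]]:
    """Split an environment name into (framework, Python version)."""
    match = _ENV_NAME.match(name)
    if match is None:
        # Names such as "python3" do not follow the pattern.
        return name, None
    return match["framework"], f"{match['major']}.{match['minor']}"


def env_names(conda_env_list_json: str) -> list:
    """Extract environment names from `conda env list --json` output."""
    paths = json.loads(conda_env_list_json)["envs"]
    return [os.path.basename(path.rstrip("/")) for path in paths]


if __name__ == "__main__":
    out = subprocess.run(["conda", "env", "list", "--json"],
                         check=True, capture_output=True, text=True).stdout
    for name in env_names(out):
        framework, python = describe_env(name)
        print(f"{name:35} framework={framework} python={python or 'n/a'}")
```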

Start the TensorFlow Environment

Note

When you launch your first Conda environment, please be patient while it loads. The Deep Learning AMI with Conda automatically installs the most optimized version of the framework for your EC2 instance upon the framework's first activation. You should not expect subsequent delays.

1. Activate the TensorFlow virtual environment for Python 3.

$ source activate tensorflow_p37

2. Start the iPython terminal.

(tensorflow_p37)$ ipython


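3. Run a quick TensorFlow program.

import tensorflow as tf

# Build a constant string tensor, then evaluate it with the TensorFlow 1.x Session API.
hello = tf.constant('Hello, TensorFlow!')

sess = tf.Session()

print(sess.run(hello))

You should see b'Hello, TensorFlow!' (Python 3 prints the evaluated string tensor as bytes).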

Next Up

Running Jupyter Notebook Tutorials (p. 33)

Switch to the PyTorch Python 3 Environment

If you're still in the iPython console, use quit(), then get ready to switch environments.

• Activate the PyTorch virtual environment for Python 3.

$ source activate pytorch_p36

Test Some PyTorch Code

To test your installation, use Python to write PyTorch code that creates and prints an array.

1. Start the iPython terminal.

(pytorch_p36)$ ipython

2. Import PyTorch.

import torch

You might see a warning message about a third-party package. You can ignore it.

3. Create a 5x3 matrix with the elements initialized randomly. Print the array.

x = torch.rand(5, 3)

print(x)

Verify the result. Because the elements are random, your values will differ from the following example.

tensor([[0.3105, 0.5983, 0.5410],
        [0.0234, 0.0934, 0.0371],
        [0.9740, 0.1439, 0.3107],
        [0.6461, 0.9035, 0.5715],
        [0.4401, 0.7990, 0.8913]])

Switch to the MXNet Python 3 Environment

If you're still in the iPython console, use quit(), then get ready to switch environments.


• Activate the MXNet virtual environment for Python 3.

$ source activate mxnet_p36

Test Some MXNet Code

To test your installation, use Python to write MXNet code that creates and prints an array using the NDArray API. For more information, see NDArray API.

1. Start the iPython terminal.

(mxnet_p36)$ ipython

2. Import MXNet.

import mxnet as mx

You might see a warning message about a third-party package. You can ignore it.

3. Create a 5x5 matrix, an instance of the NDArray, with elements initialized to 0. Print the array.

mx.ndarray.zeros((5,5)).asnumpy()

Verify the result.

array([[ 0., 0., 0., 0., 0.],
       [ 0., 0., 0., 0., 0.],
       [ 0., 0., 0., 0., 0.],
       [ 0., 0., 0., 0., 0.],
       [ 0., 0., 0., 0., 0.]], dtype=float32)

You can find more examples of MXNet in the MXNet tutorials section.

Removing Environments

If you run out of space on the DLAMI, you can remove Conda environments that you are not using:

conda env list

conda env remove --name <env_name>
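To decide which environments to remove, it helps to see how much space each one uses. The following Python sketch totals the file sizes in each environment directory and lists the largest first. It assumes the environments live under `~/anaconda3/envs`. Conda may hard-link files shared with its package cache, so removing an environment can free less space than the listed size.

```python
import os
import shutil


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for filename in files:
            file_path = os.path.join(root, filename)
            if os.path.islink(file_path):
                continue
            try:
                total += os.path.getsize(file_path)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def rank_envs(envs_dir: str) -> list:
    """Return (env name, size in bytes) pairs, largest first."""
    sizes = []
    for name in os.listdir(envs_dir):
        env_path = os.path.join(envs_dir, name)
        if os.path.isdir(env_path):
            sizes.append((name, dir_size_bytes(env_path)))
    return sorted(sizes, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Assumed location of the DLAMI's Conda environments; adjust if yours differs.
    envs_dir = os.path.expanduser("~/anaconda3/envs")
    free_gib = shutil.disk_usage("/").free / 2**30
    print(f"Free space on /: {free_gib:.1f} GiB")
    for name, size in rank_envs(envs_dir):
        print(f"{size / 2**30:6.2f} GiB  {name}")
```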


Using the Deep Learning Base AMI

The Base AMI comes with a foundational platform of GPU drivers and acceleration libraries to deploy your own customized deep learning environment. By default the AMI is configured with the NVIDIA CUDA 10.0 environment. You can also switch between different versions of CUDA. Refer to the following instructions for how to do this.


Configuring CUDA Versions

You can select and verify a particular CUDA version with the following bash commands.

Tip

You can verify the CUDA version by running NVIDIA's nvcc program.

$ nvcc --version

• CUDA 11.0:

$ sudo rm /usr/local/cuda

$ sudo ln -s /usr/local/cuda-11.0 /usr/local/cuda

• CUDA 10.2:

$ sudo rm /usr/local/cuda

$ sudo ln -s /usr/local/cuda-10.2 /usr/local/cuda

• CUDA 10.1:

$ sudo rm /usr/local/cuda

$ sudo ln -s /usr/local/cuda-10.1 /usr/local/cuda

• CUDA 10.0:

$ sudo rm /usr/local/cuda

$ sudo ln -s /usr/local/cuda-10.0 /usr/local/cuda
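The same switch can be scripted. The following Python sketch reports which CUDA version `/usr/local/cuda` currently points to, lists the installed `cuda-X.Y` directories, and repoints the link. It assumes the `/usr/local/cuda-X.Y` layout shown above and must run with root privileges to modify `/usr/local`. Swapping the link with an atomic rename instead of `rm` followed by `ln -s` is our own design choice, not part of the DLAMI instructions.

```python
import os
import re
from typing import List, Optional


def active_cuda_version(link: str = "/usr/local/cuda") -> Optional[str]:
    """Return the version the cuda symlink points to, e.g. "10.0", or None."""
    target = os.path.realpath(link)
    match = re.search(r"cuda-(\d+\.\d+)$", target)
    return match.group(1) if match else None


def installed_cuda_versions(prefix: str = "/usr/local") -> List[str]:
    """List the cuda-X.Y directories under prefix, oldest first."""
    versions = []
    for name in os.listdir(prefix):
        match = re.fullmatch(r"cuda-(\d+\.\d+)", name)
        if match:
            versions.append(match.group(1))
    return sorted(versions, key=lambda v: tuple(int(part) for part in v.split(".")))


def switch_cuda(version: str, prefix: str = "/usr/local") -> None:
    """Point prefix/cuda at prefix/cuda-<version> (needs root under /usr/local).

    The new link is created under a temporary name and renamed over the old
    one, so the cuda path is never missing in between.
    """
    target = os.path.join(prefix, f"cuda-{version}")
    if not os.path.isdir(target):
        raise FileNotFoundError(target)
    link = os.path.join(prefix, "cuda")
    tmp_link = link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target, tmp_link)
    os.replace(tmp_link, link)  # rename is atomic on POSIX file systems


if __name__ == "__main__":
    print("active:", active_cuda_version())
    print("installed:", ", ".join(installed_cuda_versions()))
```

After switching, `nvcc --version` should report the new version, assuming your PATH uses `/usr/local/cuda/bin`.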

Running Jupyter Notebook Tutorials

Tutorials and examples ship with each of the deep learning projects' source and in most cases they will run on any DLAMI. If you chose the Deep Learning AMI with Conda (p. 6), you get the added benefit of a few hand-picked tutorials already set up and ready to try out.

Important

To run the Jupyter notebook tutorials installed on the DLAMI, you will need to Set up a Jupyter Notebook Server (p. 22).

Once the Jupyter server is running, you can run the tutorials through your web browser. If you are running the Deep Learning AMI with Conda or if you have set up Python environments, you can switch Python kernels from the Jupyter notebook interface. Select the appropriate kernel before trying to run a framework-specific tutorial. Further examples of this are provided for users of the Deep Learning AMI with Conda.

Note

Many tutorials require additional Python modules that may not be set up on your DLAMI. If you get an error like "xyz module not found", log in to the DLAMI, activate the environment as described above, then install the necessary modules.

Tip

Deep learning tutorials and examples often rely on one or more GPUs. If your instance type doesn't have a GPU, you may need to change some of the example's code to get it to run.


Navigating the Installed Tutorials

Once you're logged in to the Jupyter server and can see the tutorials directory (on Deep Learning AMI with Conda only), you will be presented with folders of tutorials by each framework name. If you don't see a framework listed, then tutorials are not available for that framework on your current DLAMI. Click on the name of the framework to see the listed tutorials, then click a tutorial to launch it.

The first time you run a notebook on the Deep Learning AMI with Conda, it will want to know which environment you would like to use. It will prompt you to select from a list. Each environment is named according to this pattern:

Environment (conda_framework_python-version)

For example, you might see Environment (conda_mxnet_p36), which signifies that the environment has MXNet and Python 3. The other variation of this would be Environment (conda_mxnet_p27), which signifies that the environment has MXNet and Python 2.

Tip

If you're concerned about which version of CUDA is active, one way to see it is in the MOTD when you first log in to the DLAMI.

Switching Environments with Jupyter

If you decide to try a tutorial for a different framework, be sure to verify the currently running kernel.

This info can be seen in the upper right of the Jupyter interface, just below the logout button. You can change the kernel on any open notebook by clicking the Jupyter menu item Kernel, then Change Kernel, and then clicking the environment that suits the notebook you're running.

At this point you'll need to rerun any cells, because changing the kernel erases the state of anything you've run previously.

Tip: Switching between frameworks can be fun and educational; however, you can run out of memory. If you start running into errors, look at the terminal window where the Jupyter server is running. It shows helpful messages and error logs, and you may see an out-of-memory error there. To fix this, go to the home page of your Jupyter server, click the Running tab, and then click Shutdown for each tutorial that is probably still running in the background and using up your memory.
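To see how much memory is left before starting another tutorial, you can read the kernel's estimate from `/proc/meminfo`. This sketch is Linux-specific and assumes the `MemAvailable` field is present (it is on modern kernels); values in that file are reported in kB, which the code converts to GiB. Both function names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            # The first field is the number; a trailing "kB" unit is ignored.
            info[key.strip()] = int(fields[0])
    return info


def available_memory_gib(path="/proc/meminfo"):
    """Return MemAvailable in GiB, or None if it cannot be read."""
    try:
        with open(path) as f:
            info = parse_meminfo(f.read())
    except OSError:
        return None
    kib = info.get("MemAvailable")
    return None if kib is None else kib / (1024 * 1024)


if __name__ == "__main__":
    gib = available_memory_gib()
    print("MemAvailable:", "unknown" if gib is None else f"{gib:.1f} GiB")
```

If the number drops sharply after you open a few notebooks, shutting down idle ones from the Running tab should free it again.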

Next Up

For more examples and sample code from each framework, click Next or continue to Apache MXNet (Incubating) (p. 35).

Tutorials

The following are tutorials on how to use the Deep Learning AMI with Conda's software.

Topics

• 10 Minute Tutorials (p. 35)

• Activating Frameworks (p. 35)

• Debugging and Visualization (p. 49)

• Distributed Training (p. 52)

• Elastic Fabric Adapter (p. 69)

• GPU Monitoring and Optimization (p. 81)

• The AWS Inferentia Chip With DLAMI (p. 87)

• The Graviton DLAMI (p. 102)

• The Habana DLAMI (p. 109)

• Inference (p. 110)

• Using Frameworks with ONNX (p. 114)

• Model Serving (p. 122)

10 Minute Tutorials

• Launch an AWS Deep Learning AMI (in 10 minutes)

• Train a Deep Learning model with AWS Deep Learning Containers on Amazon EC2 (in 10 minutes)

Activating Frameworks

The following are the deep learning frameworks installed on the Deep Learning AMI with Conda. Click on a framework to learn how to activate it.

Topics

• Apache MXNet (Incubating) (p. 35)

• Caffe2 (p. 37)

• Chainer (p. 38)

• CNTK (p. 38)

• Keras (p. 40)

• PyTorch (p. 41)

• TensorFlow (p. 43)

• TensorFlow 2 (p. 45)

• TensorFlow with Horovod (p. 46)

• TensorFlow 2 with Horovod (p. 47)

• Theano (p. 48)

Apache MXNet (Incubating)

Activating Apache MXNet (Incubating)

This tutorial shows how to activate MXNet on an instance running the Deep Learning AMI with Conda (DLAMI on Conda) and run an MXNet program.

When a stable Conda package of a framework is released, it's tested and pre-installed on the DLAMI. If you want to run the latest, untested nightly build, you can install it manually; see Installing MXNet's Nightly Build (experimental) (p. 36).
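Whether you use the pre-installed stable build or a nightly build you installed yourself, it helps to confirm which version the active environment actually imports. This is a minimal sketch; `installed_version` is a hypothetical helper that relies only on the module's conventional `__version__` attribute.

```python
import importlib


def installed_version(module_name):
    """Return module.__version__, "unknown" if the module has none,
    or None if the module cannot be imported in this environment."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    # Run inside the activated MXNet environment to see which build is loaded.
    print("mxnet:", installed_version("mxnet"))
```

A result of `None` usually means the notebook kernel or shell is not using the MXNet environment.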

To run MXNet on the DLAMI with Conda

1. To activate the framework, open an Amazon Elastic Compute Cloud (Amazon EC2) instance of the DLAMI with Conda.
