Deep Learning AMI

Deep Learning AMI: Developer Guide

Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be affiliated with, connected to, or sponsored by Amazon.


What Is the AWS Deep Learning AMI?

Welcome to the User Guide for the AWS Deep Learning AMI.

The AWS Deep Learning AMI (DLAMI) is your one-stop shop for deep learning in the cloud. This customized machine instance is available in most Amazon EC2 regions for a variety of instance types, from a small CPU-only instance to the latest high-powered multi-GPU instances. It comes preconfigured with NVIDIA CUDA and NVIDIA cuDNN, as well as the latest releases of the most popular deep learning frameworks.

About This Guide

This guide helps you launch and use the DLAMI. It covers several use cases that are common for deep learning, for both training and inference. It also covers how to choose the right AMI for your purpose and the kind of instances you may prefer. The DLAMI comes with several tutorials for each of the frameworks. It also has tutorials on distributed training, debugging, using AWS Inferentia, and other key concepts. You will find instructions on how to configure Jupyter to run the tutorials in your browser.

Prerequisites

You should be familiar with command line tools and basic Python to successfully run the DLAMI.

Tutorials on how to use each framework are provided by the frameworks themselves. However, this guide shows you how to activate each framework and find the appropriate tutorials to get started.

Example DLAMI Uses

Learning about deep learning: The DLAMI is a great choice for learning or teaching machine learning and deep learning frameworks. It takes the headache away from troubleshooting the installations of each framework and getting them to play along on the same computer. The DLAMI comes with a Jupyter notebook and makes it easy to run the tutorials provided by the frameworks for people new to machine learning and deep learning.

App development: If you're an app developer and are interested in using deep learning to make your apps utilize the latest advances in AI, the DLAMI is the perfect test bed for you. Each framework comes with tutorials on how to get started with deep learning, and many of them have model zoos that make it easy to try out deep learning without having to create the neural networks yourself or to do any of the model training. Some examples show you how to build an image detection application in just a few minutes, or how to build a speech recognition app for your own chatbot.

Machine learning and data analytics: If you're a data scientist or interested in processing your data with deep learning, you'll find that many of the frameworks have support for R and Spark. You will find tutorials that range from simple regressions to building scalable data processing systems for personalization and prediction systems.

Research: If you're a researcher and want to try out a new framework, test out a new model, or train new models, the DLAMI and AWS capabilities for scale can alleviate the pain of tedious installations and management of multiple training nodes. You can use EMR and AWS CloudFormation templates to easily launch a full cluster of instances that are ready to go for scalable training.


Note

While your initial choice might be to upgrade to a larger instance type with more GPUs (up to 8), you can also scale horizontally by creating a cluster of DLAMI instances. To quickly set up a cluster, you can use the predefined AWS CloudFormation template. Check out Related Information (p. 138) for more information on cluster builds.

Features of the DLAMI

Preinstalled Frameworks

There are currently two primary flavors of the DLAMI with other variations related to the operating system (OS) and software versions:

• Deep Learning AMI with Conda (p. 6) - frameworks installed separately using conda packages and separate Python environments

• Deep Learning Base AMI (p. 6) - no frameworks installed; only NVIDIA CUDA and other dependencies

The Deep Learning AMI with Conda uses Anaconda environments to isolate each framework, so you can switch between them at will and not worry about their dependencies conflicting.
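As a minimal sketch of how you might inspect those isolated environments, the following shell helper extracts the environment names from the output of `conda env list`. The helper name `conda_env_names` and the sample output (including the environment names `pytorch_p36` and `tensorflow2_p36` and the paths) are illustrative assumptions, not a listing taken from a specific DLAMI release.

```shell
#!/usr/bin/env bash
# Print only the environment names from `conda env list` output read on stdin.
# Comment lines (starting with '#') and blank lines are skipped.
conda_env_names() {
    awk '!/^#/ && NF > 0 { print $1 }'
}

# Hypothetical sample of `conda env list` output, for illustration only.
sample_output='# conda environments:
#
base                  *  /home/ubuntu/anaconda3
pytorch_p36              /home/ubuntu/anaconda3/envs/pytorch_p36
tensorflow2_p36          /home/ubuntu/anaconda3/envs/tensorflow2_p36'

printf '%s\n' "$sample_output" | conda_env_names
```

On a running instance, `conda env list | conda_env_names` would print the names that are actually installed, and `conda activate <name>` (or `source activate <name>` on older Conda releases) switches to one of them.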

For more information on selecting the best DLAMI for you, take a look at Getting Started (p. 4).

The Deep Learning AMI with Conda supports the following frameworks:

• Apache MXNet (Incubating)

• Chainer

• Keras

• PyTorch

• TensorFlow

• TensorFlow 2

Note

We no longer include the CNTK, Caffe, Caffe2, and Theano Conda environments in the AWS Deep Learning AMI starting with the v28 release. Previous releases of the AWS Deep Learning AMI that contain these environments will continue to be available. However, we will only provide updates to these environments if there are security fixes published by the open source community for these frameworks.

Preinstalled GPU Software

Even if you use a CPU-only instance, the DLAMI will have NVIDIA CUDA and NVIDIA cuDNN. The installed software is the same regardless of the instance type. Keep in mind that GPU-specific tools only work on an instance that has at least one GPU. For more information, see Selecting the Instance Type for DLAMI (p. 15).

• Latest version of NVIDIA CUDA

• Latest version of NVIDIA cuDNN

• Older versions of CUDA are available as well. See the CUDA Installations and Framework Bindings (p. 5) for more information.
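Because the same software is installed on CPU-only and GPU instances, it can help to check for a GPU before running GPU-specific tools. The following is a minimal sketch, assuming the NVIDIA driver utility `nvidia-smi` is on the `PATH` when a GPU is usable; the function name `gpu_count` is hypothetical.

```shell
#!/usr/bin/env bash
# Print the number of NVIDIA GPUs visible on this instance.
# Prints 0 when nvidia-smi is missing or cannot reach a GPU,
# which is the expected result on a CPU-only instance.
gpu_count() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo 0
        return
    fi
    # `nvidia-smi -L` prints one line per GPU, each starting with "GPU <index>:".
    # grep -c exits non-zero when it counts 0 lines, so ignore its exit status.
    nvidia-smi -L 2>/dev/null | grep -c '^GPU ' || true
}

if [ "$(gpu_count)" -gt 0 ]; then
    echo "GPU detected: GPU-specific tools should work."
else
    echo "No GPU detected: use CPU-only workflows."
fi
```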


Elastic Inference Support

The Deep Learning AMI with Conda comes with environments that support Elastic Inference for both AWS Deep Learning AMI with Ubuntu 16.04 Options (p. 12) and AWS Deep Learning AMI Amazon Linux Options (p. 13). Elastic Inference environments are not currently supported for AWS Deep Learning AMI with Ubuntu 18.04 Options (p. 12) and AWS Deep Learning AMI Amazon Linux 2 Options (p. 14). For tutorials and more information on Elastic Inference, see the Elastic Inference Documentation.

Model Serving and Visualization

Deep Learning AMI with Conda comes preinstalled with two kinds of model servers, one for MXNet and one for TensorFlow, as well as TensorBoard, for model visualizations.

• Model Server for Apache MXNet (MMS) (p. 123)

• TensorFlow Serving (p. 125)

• TensorBoard (p. 51)


Getting Started

How to Get Started with the DLAMI

This guide includes tips about picking the DLAMI that's right for you, selecting an instance type that fits your use case and budget, and Related Information (p. 138) that describes custom setups that may be of interest.

If you're new to using AWS or using Amazon EC2, start with the Deep Learning AMI with Conda (p. 6).

If you're familiar with Amazon EC2 and other AWS services like Amazon EMR, Amazon EFS, or Amazon S3, and are interested in integrating those services for projects that need distributed training or inference, then check out Related Information (p. 138) to see whether one of the setups described there fits your use case.

We recommend that you check out Choosing Your DLAMI (p. 4) to get an idea of which instance type might be best for your application.

Another option is this quick tutorial: Launch an AWS Deep Learning AMI (in 10 minutes).

Next Step

Choosing Your DLAMI (p. 4)

Choosing Your DLAMI

We offer a range of DLAMI options. To help you select the correct DLAMI for your use case, we group images by the hardware type or functionality for which they were developed. Our top-level groupings are:

• DLAMI Type: CUDA versus Base versus Single-Framework versus Multi-Framework (Conda DLAMI)

• Compute Architecture: x86-based versus Arm-based AWS Graviton

• Processor Type: GPU versus CPU versus Inferentia versus Habana

• SDK: CUDA versus AWS Neuron versus SynapseAI

• OS: Amazon Linux versus Ubuntu

The remaining topics in this guide describe these options in more detail.

Topics

• CUDA Installations and Framework Bindings (p. 5)

• Deep Learning Base AMI (p. 6)

• Deep Learning AMI with Conda (p. 6)

• DLAMI CPU Architecture Options (p. 8)

• DLAMI Operating System Options (p. 8)

Next Up


Deep Learning AMI with Conda (p. 6)

CUDA Installations and Framework Bindings

Deep learning moves quickly, but each framework offers "stable" versions. These stable versions may not work with the latest CUDA or cuDNN implementation and features. Your use case and the features you require can help you choose a framework. If you are not sure, then use the latest Deep Learning AMI with Conda. It has official pip binaries for all frameworks with CUDA 10, using whichever most recent version is supported by each framework. If you want the latest versions, and to customize your deep learning environment, use the Deep Learning Base AMI.

Look at our guide on Stable Versus Release Candidates (p. 6) for further guidance.

Choose a DLAMI with CUDA

The Deep Learning Base AMI (p. 6) has CUDA 10, 10.1, and 10.2.

The Deep Learning AMI with Conda (p. 6) has CUDA 10, 10.1, and 10.2.

• CUDA 10.1 with cuDNN 7: Apache MXNet (Incubating), PyTorch, and TensorFlow 2

• CUDA 10 with cuDNN 7: TensorFlow and Chainer

Note

We no longer include the CNTK, Caffe, Caffe2, and Theano Conda environments in the AWS Deep Learning AMI starting with the v28 release. Previous releases of the AWS Deep Learning AMI that contain these environments continue to be available. However, we only provide updates to these environments if there are security fixes published by the open-source community for these frameworks.

For installation options for DLAMI types and operating systems, refer to each of the CUDA version and DLAMI options pages:

• Deep Learning AMI with CUDA 10.2 Options (p. 10)

• Deep Learning AMI with CUDA 10 Options (p. 11)

• Deep Learning AMI with Conda Options (p. 9)

• Deep Learning Base AMI Options (p. 10)

For specific framework version numbers, see the Release Notes for DLAMI (p. 141).

Choose this DLAMI type or learn more about the different DLAMIs with the Next Up option.

Choose one of the CUDA versions and review the full list of DLAMIs that have that version in the Appendix, or learn more about the different DLAMIs with the Next Up option.

Next Up

Deep Learning Base AMI (p. 6)

Deep Learning Base AMI

The Deep Learning Base AMI is like an empty canvas for deep learning. It comes with everything you need up until the point of the installation of a particular framework, and has your choice of CUDA versions.

Why Choose the Base DLAMI

This AMI group is useful for project contributors who want to fork a deep learning project and build the latest version. It's for someone who wants to roll their own environment with the confidence that the latest NVIDIA software is installed and working so they can focus on picking which frameworks and versions they want to install.

Choose this DLAMI type or learn more about the different DLAMIs with the Next Up option.

Next Up

Deep Learning AMI with Conda (p. 6)

Deep Learning AMI with Conda

The Conda DLAMI uses Anaconda virtual environments. These environments are configured to keep the different framework installations separate and streamline switching between frameworks. This is great for learning and experimenting with all of the frameworks the DLAMI has to offer. Most users find that the new Deep Learning AMI with Conda is perfect for them.

These AMIs are the primary DLAMIs. They are updated often with the latest versions from the frameworks, and have the latest GPU drivers and software. They are generally referred to as the AWS Deep Learning AMI in most documents.

• The Ubuntu 18.04 DLAMI has the following frameworks: Apache MXNet (Incubating), Chainer, PyTorch, TensorFlow, and TensorFlow 2.

• The Ubuntu 16.04 and Amazon Linux DLAMIs have the following frameworks: Apache MXNet (Incubating), Chainer, Keras, PyTorch, TensorFlow, and TensorFlow 2.

• The Amazon Linux 2 DLAMI has the following frameworks: Apache MXNet (Incubating), Chainer, PyTorch, TensorFlow, TensorFlow 2, and Keras.

Note

We no longer include the CNTK, Caffe, Caffe2, and Theano Conda environments in the AWS Deep Learning AMI starting with the v28 release. Previous releases of the AWS Deep Learning AMI that contain these environments continue to be available. However, we only provide updates to these environments if there are security fixes published by the open-source community for these frameworks.

Stable Versus Release Candidates

The Conda AMIs use optimized binaries of the most recent formal releases from each framework. Do not expect release candidates or experimental features. The optimizations depend on the framework's support for acceleration technologies like Intel's MKL DNN, which speeds up training and inference on C5 and C4 CPU instance types. The binaries are also compiled to support advanced Intel instruction sets including but not limited to AVX, AVX-2, SSE4.1, and SSE4.2. These accelerate vector and floating point operations on Intel CPU architectures. Additionally, for GPU instance types, CUDA and cuDNN are updated to whichever version the latest official release supports.
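To see which of these instruction sets an instance's CPU reports, you can inspect the CPU flags that Linux exposes. The following is a minimal sketch, assuming a Linux host where `/proc/cpuinfo` lists the flags as `avx`, `avx2`, `sse4_1`, and `sse4_2`; the function name `cpu_simd_flags` is hypothetical.

```shell
#!/usr/bin/env bash
# List which of the instruction sets named above appear in a cpuinfo file.
# Argument 1: path to a cpuinfo-style file (defaults to /proc/cpuinfo).
cpu_simd_flags() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    # Put each whitespace-separated token on its own line, keep exact
    # matches only, and remove duplicates (one "flags" line exists per core).
    tr -s ' \t' '\n' < "$cpuinfo" | grep -x -E 'avx|avx2|sse4_1|sse4_2' | sort -u
}

# Only query the live system when the file is readable (Linux).
if [ -r /proc/cpuinfo ]; then
    cpu_simd_flags
fi
```

An empty result is expected on Arm-based instances, which do not report these x86 flags.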

The Deep Learning AMI with Conda automatically installs the most optimized version of the framework for your Amazon EC2 instance upon the framework's first activation. For more information, refer to Using the Deep Learning AMI with Conda (p. 29).

If you want to install from source, using custom or optimized build options, the Deep Learning Base AMI (p. 6) might be a better option for you.

Python 2 Deprecation

The Python open source community officially ended support for Python 2 on January 1, 2020. The TensorFlow and PyTorch communities have announced that the TensorFlow 2.1 and PyTorch 1.4 releases are the last ones supporting Python 2. Previous releases of the DLAMI (v26, v25, and so on) that contain Python 2 Conda environments continue to be available. However, we provide updates to the Python 2 Conda environments on previously published DLAMI versions only if there are security fixes published by the open-source community for those versions. DLAMI releases with the latest versions of the TensorFlow and PyTorch frameworks do not contain the Python 2 Conda environments.

Elastic Inference Support

The Deep Learning AMI with Conda comes with environments that support Elastic Inference for both AWS Deep Learning AMI with Ubuntu 16.04 Options (p. 12) and AWS Deep Learning AMI Amazon Linux Options (p. 13). Elastic Inference environments are not currently supported for AWS Deep Learning AMI with Ubuntu 18.04 Options (p. 12) and AWS Deep Learning AMI Amazon Linux 2 Options (p. 14). For tutorials and more information about Elastic Inference, see the Elastic Inference Documentation.

CUDA Support

The Deep Learning AMI with Conda provides the following CUDA versions, listed with the frameworks supported for each:

• CUDA 10.1 with cuDNN 7: Apache MXNet

• CUDA 10 with cuDNN 7: PyTorch, TensorFlow, TensorFlow 2, Apache MXNet, Chainer

Specific framework version numbers can be found in the Release Notes for DLAMI (p. 141).

Choose this DLAMI type or learn more about the different DLAMIs with the Next Up option.

Next Up

DLAMI CPU Architecture Options (p. 8)

DLAMI CPU Architecture Options

AWS Deep Learning AMIs are offered with either x86-based or Arm-based AWS Graviton2 CPU architectures.

Choose one of the Graviton GPU DLAMIs to work with an Arm-based CPU architecture. All other GPU DLAMIs are currently x86-based.

• AWS Deep Learning AMI Graviton GPU CUDA 11.4 (Ubuntu 20.04)

• AWS Deep Learning AMI Graviton GPU TensorFlow 2.6 (Ubuntu 20.04)

• AWS Deep Learning AMI Graviton GPU PyTorch 1.10 (Ubuntu 20.04)

For information about getting started with the Graviton GPU DLAMI, see The Graviton DLAMI (p. 102).

For more details on available instance types, see Selecting the Instance Type for DLAMI (p. 15).

Next Up

DLAMI Operating System Options (p. 8)

DLAMI Operating System Options

DLAMIs are offered in the following operating systems. If you're more familiar with CentOS or Red Hat, see AWS Deep Learning AMI Amazon Linux Options (p. 13) or AWS Deep Learning AMI Amazon Linux 2 Options (p. 14). Otherwise, see AWS Deep Learning AMI with Ubuntu 18.04 Options (p. 12) or AWS Deep Learning AMI with Ubuntu 16.04 Options (p. 12).

Choose one of the operating systems and review their full list in the Appendix, or see the next steps for picking your AMI and instance type.

• AWS Deep Learning AMI Amazon Linux Options (p. 13)

• AWS Deep Learning AMI Amazon Linux 2 Options (p. 14)

• AWS Deep Learning AMI with Ubuntu 18.04 Options (p. 12)

As mentioned in the Getting Started overview, you have multiple options for accessing a DLAMI. Before choosing a DLAMI, assess what instance type you need and identify your AWS Region.

Next Up

Selecting the Instance Type for DLAMI (p. 15)

AMI Options

The following topics describe the categories of AWS Deep Learning AMIs.

Topics


• Deep Learning AMI with CUDA 10 Options (p. 11)

• AWS Deep Learning AMI Amazon Linux Options (p. 13)

• AWS Deep Learning AMI Amazon Linux 2 Options (p. 14)

Deep Learning AMI with Conda Options

Use the Launching and Configuring a DLAMI (p. 19) guide to continue with one of these DLAMIs.

• Deep Learning AMI (Ubuntu 18.04) on the AWS Marketplace

• Deep Learning AMI (Amazon Linux) on the AWS Marketplace

• Deep Learning AMI (Amazon Linux 2) on the AWS Marketplace

These DLAMIs are available in these Regions:

Region                      Code
US East (Ohio)              us-east-2
US East (N. Virginia)       us-east-1
GovCloud                    us-gov-west-1
US West (N. California)     us-west-1
US West (Oregon)            us-west-2
Beijing (China)             cn-north-1
Ningxia (China)             cn-northwest-1
Asia Pacific (Mumbai)       ap-south-1
Asia Pacific (Seoul)        ap-northeast-2
Asia Pacific (Singapore)    ap-southeast-1
Asia Pacific (Sydney)       ap-southeast-2
Asia Pacific (Tokyo)        ap-northeast-1
Canada (Central)            ca-central-1
EU (Frankfurt)              eu-central-1
EU (Ireland)                eu-west-1
EU (London)                 eu-west-2
EU (Paris)                  eu-west-3
SA (Sao Paulo)              sa-east-1


Deep Learning Base AMI Options

• Deep Learning Base AMI (Ubuntu 18.04) on the AWS Marketplace

• Deep Learning Base AMI (Amazon Linux) on the AWS Marketplace

• Deep Learning Base AMI (Amazon Linux 2) on the AWS Marketplace


Deep Learning AMI with CUDA 10.2 Options

Use the Launching and Configuring a DLAMI (p. 19) guide to continue with one of these DLAMIs.


Note

The Deep Learning AMI with Conda has CUDA 10, CUDA 10.1, and CUDA 10.2. The frameworks use the latest CUDA that they support.

The Deep Learning Base AMI has CUDA 10, CUDA 10.1, and CUDA 10.2. To switch between them, follow the directions on Using the Deep Learning Base AMI (p. 32).

Deep Learning AMI with CUDA 10.1 Options

Deep Learning AMI with CUDA 10 Options


AWS Deep Learning AMI with Ubuntu 18.04 Options

Use the Launching and Configuring a DLAMI (p. 19) guide to continue with one of these DLAMIs. The Deep Learning AMI with Conda does not support Elastic Inference for Ubuntu 18.04.


AWS Deep Learning AMI with Ubuntu 16.04 Options

Note

Ubuntu Linux 16.04 LTS reached the end of its five-year LTS window on April 30, 2021 and is no longer supported by its vendor. There are no longer updates to the Deep Learning Base AMI (Ubuntu 16.04) in new releases as of October 2021. Previous releases continue to be available.


The Deep Learning AMI with Conda comes with environments that support Elastic Inference for Ubuntu 16.04. For tutorials and more information about Elastic Inference, see the Elastic Inference Documentation.


AWS Deep Learning AMI Amazon Linux Options

Note

Amazon Linux is end-of-life as of December 2020. There are no longer updates to the Deep Learning AMI (Amazon Linux) in new releases as of October 2021. Previous releases of the Deep Learning AMI (Amazon Linux) continue to be available.

The Deep Learning AMI with Conda comes with environments that support Elastic Inference for Amazon Linux. For tutorials and more information about Elastic Inference, see the Elastic Inference Documentation.


AWS Deep Learning AMI Amazon Linux 2 Options

Use the Launching and Configuring a DLAMI (p. 19) guide to continue with one of these DLAMIs. The Deep Learning AMI with Conda does not support Elastic Inference for Amazon Linux 2.

Region                      Code
US East (N. Virginia)       us-east-1
US West (N. California)     us-west-1
Asia Pacific (Mumbai)       ap-south-1
Asia Pacific (Seoul)        ap-northeast-2
Asia Pacific (Osaka)        ap-northeast-3
Asia Pacific (Singapore)    ap-southeast-1
Asia Pacific (Sydney)       ap-southeast-2
Asia Pacific (Tokyo)        ap-northeast-1

Amazon Linux 2 Release Notes

Selecting the Instance Type for DLAMI

Consider the following when selecting an instance type for DLAMI.

• If you're new to deep learning, then an instance with a single GPU might suit your needs.

• If you're budget conscious, then you can use CPU-only instances.

• If you're looking to optimize for high performance and cost efficiency for deep learning model inference, then you can use instances with AWS Inferentia chips.

• If you're looking to optimize for high performance and cost efficiency for deep learning model training, then you can use instances with Habana accelerators.

• If you're looking for a high performance GPU instance with an Arm-based CPU architecture, then you can use the G5g instance type.

• If you're interested in running a pretrained model for inference and predictions, then you can attach an Amazon Elastic Inference accelerator to your Amazon EC2 instance. Amazon Elastic Inference gives you access to an accelerator with a fraction of a GPU.

• For high-volume inference services, a single CPU instance with a lot of memory, or a cluster of such instances, might be a better solution.

• If you're using a large model with a lot of data or a high batch size, then you need a larger instance with more memory. You can also distribute your model to a cluster of GPUs. Alternatively, if you decrease your batch size, you may find that an instance with less memory is a better solution for you, although this may impact your accuracy and training speed.


• If you’re interested in running machine learning applications using NVIDIA Collective Communications Library (NCCL) requiring high levels of inter-node communications at scale, you might want to use Elastic Fabric Adapter (EFA).

For more detail on instances, see EC2 Instance Types.

The following topics provide information about instance type considerations.

Important

The Deep Learning AMIs include drivers, software, or toolkits developed, owned, or provided by NVIDIA Corporation. You agree to use these NVIDIA drivers, software, or toolkits only on Amazon EC2 instances that include NVIDIA hardware.

Topics

• Pricing for the DLAMI (p. 16)

• DLAMI Region Availability (p. 16)

• Recommended GPU Instances (p. 16)

• Recommended CPU Instances (p. 17)

• Recommended Inferentia Instances (p. 17)

• Recommended Habana Instances (p. 18)

Pricing for the DLAMI

The deep learning frameworks included in the DLAMI are free, and each has its own open-source licenses.

Although the software included in the DLAMI is free, you still have to pay for the underlying Amazon EC2 instance hardware.

Some Amazon EC2 instance types are labeled as free. It is possible to run the DLAMI on one of these free instances. This means that using the DLAMI is entirely free when you only use that instance's capacity.

If you need a more powerful instance with more CPU cores, more disk space, more RAM, or one or more GPUs, then you need an instance that is not in the free-tier instance class.

For more information about instance selection and pricing, see Amazon EC2 pricing.

DLAMI Region Availability

Each Region supports a different range of instance types and often an instance type has a slightly different cost in different Regions. DLAMIs are not available in every Region, but it is possible to copy DLAMIs to the Region of your choice. See Copying an AMI for more information. Note the Region selection list and be sure you pick a Region that's close to you or your customers. If you plan to use more than one DLAMI and potentially create a cluster, be sure to use the same Region for all of the nodes in the cluster.

For more information about Regions, see EC2 Regions.
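Because instance type availability varies by Region, you may want to confirm that a Region offers your intended instance type before you launch. The following is a minimal sketch using the AWS CLI `describe-instance-type-offerings` command; the function names are hypothetical, and `p3.2xlarge` and `us-east-1` are example values only.

```shell
#!/usr/bin/env bash
# Build the describe-instance-type-offerings filter for one instance type.
instance_type_filter() {
    printf 'Name=instance-type,Values=%s' "$1"
}

# Print the instance type if the Region offers it, or nothing otherwise.
# Arguments: Region, instance type.
# Requires the AWS CLI and configured credentials.
instance_type_offered() {
    local region="$1" instance_type="$2"
    aws ec2 describe-instance-type-offerings \
        --region "$region" \
        --location-type region \
        --filters "$(instance_type_filter "$instance_type")" \
        --query 'InstanceTypeOfferings[].InstanceType' \
        --output text
}

# Show the filter that would be sent for a p3.2xlarge lookup (no AWS call).
instance_type_filter p3.2xlarge; echo
```

For example, `instance_type_offered us-east-1 p3.2xlarge` prints the instance type name when the Region offers it. If a DLAMI is missing from your Region, the AWS CLI also provides `aws ec2 copy-image` (with `--source-region`, `--source-image-id`, `--name`, and `--region`) for copying an AMI that you are permitted to copy.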

Next Up

Recommended GPU Instances (p. 16)

Recommended GPU Instances

We recommend a GPU instance for most deep learning purposes. Training new models is faster on a GPU instance than a CPU instance. You can scale sub-linearly when you have multi-GPU instances or if you use distributed training across many instances with GPUs. To set up distributed training, see Distributed Training (p. 52).

The following instance types support the DLAMI. For information about GPU instance type options and their uses, see EC2 Instance Types and select Accelerated Computing.

Note

The size of your model should be a factor in selecting an instance. If your model exceeds an instance's available RAM, select a different instance type with enough memory for your application.

• Amazon EC2 P3 Instances have up to 8 NVIDIA Tesla V100 GPUs.

• Amazon EC2 P4 Instances have up to 8 NVIDIA Tesla A100 GPUs.

• Amazon EC2 G3 Instances have up to 4 NVIDIA Tesla M60 GPUs.

• Amazon EC2 G4 Instances have up to 4 NVIDIA T4 GPUs.

• Amazon EC2 G5 Instances have up to 8 NVIDIA A10G GPUs.

• Amazon EC2 G5g Instances have Arm-based AWS Graviton2 processors.

DLAMI instances provide tooling to monitor and optimize your GPU processes. For more information about monitoring your GPU processes, see GPU Monitoring and Optimization (p. 81).

For specific tutorials on working with G5g instances, see The Graviton DLAMI (p. 102).

Next Up

Recommended CPU Instances (p. 17)

Recommended CPU Instances

Whether you're on a budget, learning about deep learning, or just want to run a prediction service, you have many affordable options in the CPU category. Some frameworks take advantage of Intel's MKL DNN, which speeds up training and inference on C5 (not available in all Regions), C4, and C3 CPU instance types. For information about CPU instance types, see EC2 Instance Types and select Compute Optimized.

• Amazon EC2 C5 Instances have up to 72 Intel vCPUs. C5 instances excel at scientific modeling, batch processing, distributed analytics, high-performance computing (HPC), and machine and deep learning inference.

• Amazon EC2 C4 Instances have up to 36 Intel vCPUs.

Next Up

Recommended Inferentia Instances (p. 17)

Recommended Inferentia Instances

AWS Inferentia instances are designed to provide high performance and cost efficiency for deep learning model inference workloads. Specifically, Inf1 instance types use AWS Inferentia chips and the AWS Neuron SDK, which is integrated with popular machine learning frameworks such as TensorFlow, PyTorch, and MXNet.


Customers can use Inf1 instances to run large scale machine learning inference applications such as search, recommendation engines, computer vision, speech recognition, natural language processing, personalization, and fraud detection, at the lowest cost in the cloud.

• Amazon EC2 Inf1 Instances have up to 16 AWS Inferentia chips and 100 Gbps of networking throughput.

For more information about getting started with AWS Inferentia DLAMIs, see The AWS Inferentia Chip With DLAMI (p. 87).

Next Up

Recommended Habana Instances (p. 18)

Recommended Habana Instances

Instances with Habana accelerators are designed to provide high performance and cost efficiency for deep learning model training workloads. Specifically, DL1 instance types use Habana Gaudi accelerators from Habana Labs, an Intel company. DL1 instances are ideal for training machine learning models used in applications such as natural language processing, object detection and classification, recommendation engines, and autonomous vehicle perception.

Instances with Habana accelerators are configured with Habana SynapseAI software and pre-integrated with popular machine learning frameworks such as TensorFlow and PyTorch. If you are looking for an optimal combination of performance and price for training deep learning models, consider instances with Habana accelerators for the lowest cost to train.

Note

The size of your model should be a factor in selecting an instance. If your model exceeds an instance's available RAM, select a different instance type with enough memory for your application.

• Amazon EC2 DL1 Instances have up to eight Habana Gaudi accelerators, 256 GB of accelerator memory, 4 TB of local NVMe storage, and 400 Gbps of networking throughput.

For more information about getting started with Habana DLAMIs, see The Habana DLAMI (p. 109).


Launching and Configuring a DLAMI

If you're here you should already have a good idea of which AMI you want to launch. If not, choose a DLAMI using the AMI selection guidelines found throughout Getting Started (p. 4) or use the full listing of AMIs in the Appendix section, AMI Options (p. 8).

You should also know which instance type and region you're going to choose. If not, browse Selecting the Instance Type for DLAMI (p. 15).

Note

We will use p2.xlarge as the default instance type in the examples. Just replace this with whichever instance type you have in mind.

Important

If you plan to use Elastic Inference, you must complete Elastic Inference Setup before launching your DLAMI.

Topics

• Step 1: Launch a DLAMI (p. 19)

• DLAMI ID (p. 20)

• EC2 Console (p. 20)

• Marketplace Search (p. 21)

• Step 2: Connect to the DLAMI (p. 21)

• Step 3: Secure Your DLAMI Instance (p. 21)

• Step 4: Test Your DLAMI (p. 22)

• Clean Up (p. 22)

• Set up a Jupyter Notebook Server (p. 22)

Step 1: Launch a DLAMI

Note

For this walkthrough, we might make references specific to the Deep Learning AMI (Ubuntu 16.04). Even if you select a different DLAMI, you should be able to follow this guide.

Launch the instance

1. You have a couple of routes for launching a DLAMI. Choose one:

• EC2 Console (p. 20)

• Marketplace Search (p. 21)

Tip

CLI Option: If you choose to spin up a DLAMI using the AWS CLI, you will need the AMI's ID, the region and instance type, and your security token information. Be sure you have your AMI and instance IDs ready. If you haven't set up the AWS CLI yet, do that first using the guide for Installing the AWS Command Line Interface.

2. After you have completed the steps of one of those options, wait for the instance to be ready. This usually takes only a few minutes. You can verify the status of the instance in the EC2 Console.
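For the CLI option mentioned in the tip, the launch and the wait for readiness can be sketched as follows. This is a minimal sketch rather than a complete launch procedure: the function name `launch_dlami` and the key pair name in the usage note are hypothetical, and the function assumes your default VPC and security group settings are acceptable.

```shell
#!/usr/bin/env bash
# Launch one instance from a DLAMI and wait until it is running.
# Arguments: AMI ID, instance type, key pair name, Region.
# Requires the AWS CLI and configured credentials. Running instances incur charges.
launch_dlami() {
    local ami_id="$1" instance_type="$2" key_name="$3" region="$4"
    local instance_id
    instance_id="$(aws ec2 run-instances \
        --region "$region" \
        --image-id "$ami_id" \
        --instance-type "$instance_type" \
        --key-name "$key_name" \
        --count 1 \
        --query 'Instances[0].InstanceId' \
        --output text)" || return 1
    # Block until EC2 reports the instance as running.
    aws ec2 wait instance-running --region "$region" --instance-ids "$instance_id" || return 1
    echo "$instance_id"
}
```

A call such as `launch_dlami ami-094c089c38ed069f2 p2.xlarge my-key-pair us-east-1` prints the new instance ID once the instance is running. Substitute the AMI ID, instance type, key pair, and Region that apply to you.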


DLAMI ID

Find the ID for the DLAMI of your choice with the AWS Command Line Interface (AWS CLI). If you do not already have the AWS CLI installed, see Getting started with the AWS CLI.

1. Make sure that your AWS credentials are configured.

aws configure

2. Choose a DLAMI and check the details in the release notes. Use the following command to get the ID for the DLAMI of your choice:

aws ec2 describe-images --region us-east-1 --owners amazon \
    --filters 'Name=name,Values=Deep Learning AMI (Ubuntu 18.04) Version ??.?' 'Name=state,Values=available' \
    --query 'reverse(sort_by(Images, &CreationDate))[:1].ImageId' --output text

Note

You can specify a release version for a given framework or get the latest release by replacing the version number with a question mark.

3. The output should look similar to the following:

ami-094c089c38ed069f2

Copy this DLAMI ID and press q to exit the prompt.
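If you look up DLAMI IDs often, the command above can be wrapped so that the Region, DLAMI name, and version are parameters. The following is a minimal sketch; the function names `dlami_name_filter` and `latest_dlami_id` are hypothetical, and the query itself is the one shown in step 2.

```shell
#!/usr/bin/env bash
# Build the name filter used by describe-images.
# Arguments: DLAMI name, optional version (defaults to the ??.? wildcard,
# where each "?" matches exactly one character).
dlami_name_filter() {
    printf 'Name=name,Values=%s Version %s' "$1" "${2:-??.?}"
}

# Print the ID of the newest available image whose name matches.
# Arguments: Region, DLAMI name, optional version.
# Requires the AWS CLI and configured credentials.
latest_dlami_id() {
    local region="$1" name="$2" version="${3:-??.?}"
    aws ec2 describe-images \
        --region "$region" \
        --owners amazon \
        --filters "$(dlami_name_filter "$name" "$version")" 'Name=state,Values=available' \
        --query 'reverse(sort_by(Images, &CreationDate))[:1].ImageId' \
        --output text
}

# Show the filter for the example above (no AWS call is made here).
dlami_name_filter 'Deep Learning AMI (Ubuntu 18.04)'; echo
```

For example, `latest_dlami_id us-east-1 'Deep Learning AMI (Ubuntu 18.04)'` runs the same lookup as step 2, and passing a third argument pins the lookup to a specific release version.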

Next Step

EC2 Console (p. 20)

EC2 Console

Note

To launch an instance with Elastic Fabric Adapter (EFA), refer to these steps.

1. Open the EC2 Console.

2. Note your current region in the top-most navigation. If this isn't your desired AWS Region, change this option before proceeding. For more information, see EC2 Regions.

3. Choose Launch Instance.

4. Search for the desired instance by name:

a. Select the DLAMI that is right for you. Find the DLAMI name as listed in the release notes or ﬁnd the DLAMI ID using the AWS CLI.

b. Choose Community AMIs.

i. To view a selection of the latest DLAMIs, choose Quick Start.

ii. Choose AWS Marketplace to browse additional DLAMIs. Only a subset of available DLAMIs will be listed here.

c. Enter the DLAMI name or search the DLAMI ID. Browse the options and then click Select on your choice.

5. Review the details, and then choose Continue.

6. Choose an instance type. For recommendations on DLAMI instance types, see Instance Selection.

Note

If you want to use Elastic Inference (EI), choose Configure Instance Details, select Add an Amazon EI accelerator, and then select the size of the Amazon EI accelerator.

7. Choose Review and Launch.

8. Review the details and pricing. Choose Launch.

Tip

Check out Get Started with Deep Learning Using the AWS Deep Learning AMI for a walkthrough with screenshots.

Next Step

Step 2: Connect to the DLAMI (p. 21)

Marketplace Search

1. Browse the AWS Marketplace and search for AWS Deep Learning AMI.

2. Browse the options, and then click Select on your choice.

3. Review the details, and then choose Continue.

4. Review the details and make note of the Region. If this isn't your desired AWS Region, change this option before proceeding. For more information, see EC2 Regions.

5. Choose an instance type.

6. Choose a key pair, use your default one, or create a new one.

7. Review the details and pricing.

8. Choose Launch with 1-Click.

Next Step

Step 2: Connect to the DLAMI (p. 21)

Step 2: Connect to the DLAMI

Connect to the DLAMI that you launched from a client (Windows, macOS, or Linux). For more information, see Connect to Your Linux Instance in the Amazon EC2 User Guide for Linux Instances.

Keep a copy of the SSH login command handy if you want to do the Jupyter setup after logging in. You will use a variation of it to connect to the Jupyter webpage.

Next Step

Step 3: Secure Your DLAMI Instance (p. 21)

Step 3: Secure Your DLAMI Instance

Always keep your operating system and other installed software up to date by applying patches and updates as soon as they become available.

If you are using Amazon Linux or Ubuntu, when you log in to your DLAMI, you are notified if updates are available and see instructions for updating. For further information on Amazon Linux maintenance, see Updating Instance Software. For Ubuntu instances, refer to the official Ubuntu documentation.

On Windows, check Windows Update regularly for software and security updates. If you prefer, have updates applied automatically.

Important

For information about the Meltdown and Spectre vulnerabilities and how to patch your operating system to address them, see Security Bulletin AWS-2018-013.

Step 4: Test Your DLAMI

Depending on your DLAMI version, you have different testing options:

• Deep Learning AMI with Conda (p. 6) – go to Using the Deep Learning AMI with Conda (p. 29).

• Deep Learning Base AMI (p. 6) – refer to your desired framework's installation documentation.

You can also create a Jupyter notebook, try out tutorials, or start coding in Python. For more information, see Set up a Jupyter Notebook Server (p. 22).

Clean Up

When you no longer need the DLAMI, you can stop it or terminate it to avoid incurring continuing charges. Stopping an instance keeps it available so that you can resume it later. Your configurations, files, and other non-volatile information are stored in an Amazon EBS volume. You are charged a small storage fee to retain the volume while the instance is stopped, but you are no longer charged for the compute resources while it is in the stopped state. When you start the instance again, it mounts that volume and your data is there. If you terminate an instance, it is gone, and you cannot start it again. Any volume that is not deleted on termination continues to incur storage charges, so to prevent further charges you need to delete the volume as well. For more instructions, see Terminate Your Instance in the Amazon EC2 User Guide for Linux Instances.
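The choice between stopping and terminating can also be made from the AWS CLI. The Python sketch below only prints the corresponding commands (`stop-instances`, `terminate-instances`, and `delete-volume`); the instance and volume IDs are placeholders, and the helper names are hypothetical.

```python
import shlex


def cleanup_command(instance_id, keep_data):
    """Pick the AWS CLI call that matches the chosen clean-up path."""
    # Stop keeps the volume for a later restart; terminate is permanent.
    action = "stop-instances" if keep_data else "terminate-instances"
    return ["aws", "ec2", action, "--instance-ids", instance_id]


def delete_volume_command(volume_id):
    """Build the call that removes a leftover volume after termination."""
    return ["aws", "ec2", "delete-volume", "--volume-id", volume_id]


# Placeholder IDs for illustration only.
instance_id = "i-0123456789abcdef0"
volume_id = "vol-0123456789abcdef0"

print(shlex.join(cleanup_command(instance_id, keep_data=True)))
print(shlex.join(cleanup_command(instance_id, keep_data=False)))
print(shlex.join(delete_volume_command(volume_id)))
```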

Set up a Jupyter Notebook Server

A Jupyter notebook server enables you to create and run Jupyter notebooks from your DLAMI instance.

With Jupyter notebooks, you can conduct machine learning (ML) experiments for training and inference while using the AWS infrastructure and accessing packages built into the DLAMI. For more information about Jupyter notebooks, see the Jupyter Notebook documentation.

To set up a Jupyter notebook server, you must:

• Configure the Jupyter notebook server on your Amazon EC2 DLAMI instance.

• Configure your client so that you can connect to the Jupyter notebook server. We provide configuration instructions for Windows, macOS, and Linux clients.

• Test the setup by logging in to the Jupyter notebook server.

To set up the Jupyter notebook server, follow the instructions in the following topics. Once you've set up a Jupyter notebook server, see Running Jupyter Notebook Tutorials (p. 33) for information on running the example notebooks that ship in the DLAMI.

Topics

• Secure Your Jupyter Server (p. 23)

• Start the Jupyter notebook server (p. 23)

• Configure the Client to Connect to the Jupyter Server (p. 24)

• Test by Logging in to the Jupyter notebook server (p. 25)

Secure Your Jupyter Server

Here we set up Jupyter with SSL and a custom password.

Connect to the Amazon EC2 instance, and then complete the following procedure.

Configure the Jupyter server

1. Jupyter provides a password utility. Run the following command and enter your preferred password at the prompt.

$ jupyter notebook password

The output will look something like this:

Enter password:
Verify password:
[NotebookPasswordApp] Wrote hashed password to /home/ubuntu/.jupyter/jupyter_notebook_config.json

2. Create a self-signed SSL certificate. Follow the prompts to fill out your locality as you see fit. You must enter . if you wish to leave a prompt blank. Your answers will not impact the functionality of the certificate.

$ cd ~
$ mkdir ssl
$ cd ssl
$ openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout mykey.key -out mycert.pem

Note

You might be interested in creating a regular SSL certificate that is signed by a third party and does not cause the browser to give you a security warning. This process is much more involved. Visit Jupyter's documentation for more information.

Next Step

Start the Jupyter notebook server (p. 23)

Start the Jupyter notebook server

Now you can start the Jupyter server by logging in to the instance and running the following command, which uses the SSL certificate you created in the previous step.

$ jupyter notebook --certfile=~/ssl/mycert.pem --keyfile ~/ssl/mykey.key

With the server started, you can now connect to it via an SSH tunnel from your client computer. When the server runs, you will see some output from Jupyter confirming that the server is running. At this point, ignore the callout that you can access the server via a localhost URL, because that won't work until you create the tunnel.

Note

Jupyter will handle switching environments for you when you switch frameworks using the Jupyter web interface. More info on this can be found in Switching Environments with Jupyter (p. 34).
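Depending on your shell, a `~` that follows `=` inside an argument may not be expanded to your home directory. If Jupyter reports that it cannot find the certificate, one option is to pass absolute paths. The following Python sketch builds the same start command with the paths expanded; it assumes the `~/ssl` directory and file names from the previous step, and the helper name is hypothetical.

```python
import os
import shlex


def jupyter_start_command(cert_dir="~/ssl"):
    """Build the Jupyter start command with absolute certificate paths."""
    cert_dir = os.path.expanduser(cert_dir)  # resolve ~ explicitly
    return [
        "jupyter", "notebook",
        "--certfile=" + os.path.join(cert_dir, "mycert.pem"),
        "--keyfile", os.path.join(cert_dir, "mykey.key"),
    ]


# Print the command; run it on the DLAMI instance, not on your client.
print(shlex.join(jupyter_start_command()))
```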

Next Step

Configure the Client to Connect to the Jupyter Server (p. 24)

Configure the Client to Connect to the Jupyter Server

After configuring your client to connect to the Jupyter notebook server, you can create and access notebooks on the server in your workspace and run your deep learning code on the server.

For configuration information, choose one of the following links.

Topics

• Configure a Windows Client (p. 24)

• Configure a Linux or macOS Client (p. 24)

Configure a Windows Client

Prepare

Be sure you have the following information, which you need to set up the SSH tunnel:

• The public DNS name of your Amazon EC2 instance. You can find the public DNS name in the EC2 console.

• The key pair for the private key file. For more information about accessing your key pair, see Amazon EC2 Key Pairs in the Amazon EC2 User Guide for Linux Instances.

Using Jupyter Notebooks from a Windows Client

Refer to these guides on connecting to your Amazon EC2 instance from a Windows client.

1. Troubleshooting Connecting to Your Instance

2. Connecting to Your Linux Instance from Windows Using PuTTY

To create a tunnel to a running Jupyter server, a recommended approach is to install Git Bash on your Windows client, then follow the Linux/macOS client instructions. However, you may use whatever approach you want for opening an SSH tunnel with port mapping. Refer to Jupyter's documentation for further information.

Next Step

Configure a Linux or macOS Client (p. 24)

Configure a Linux or macOS Client

1. Open a terminal.

2. Run the following command to forward all requests on local port 8888 to port 8888 on your remote Amazon EC2 instance. Update the command by replacing the location of your key to access the Amazon EC2 instance and the public DNS name of your Amazon EC2 instance. Note, for an Amazon Linux AMI, the user name is ec2-user instead of ubuntu.

$ ssh -i ~/mykeypair.pem -N -f -L 8888:localhost:8888 ubuntu@ec2-###-##-##-###.compute-1.amazonaws.com

This command opens a tunnel between your client and the remote Amazon EC2 instance that is running the Jupyter notebook server.
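The tunnel command has three parts that change from one setup to the next: the key file, the login user, and the public DNS name. The following Python sketch assembles the command from those parts. The DNS name and key path are placeholders, and the `os_family` labels are hypothetical names used only to pick between the `ubuntu` and `ec2-user` logins described above.

```python
import os
import shlex

# Login user per AMI family, as described in the step above.
DEFAULT_USERS = {"ubuntu": "ubuntu", "amazon-linux": "ec2-user"}


def ssh_tunnel_command(key_path, public_dns, os_family="ubuntu",
                       local_port=8888, remote_port=8888):
    """Build the ssh call that forwards a local port to the instance."""
    user = DEFAULT_USERS[os_family]
    return [
        "ssh",
        "-i", os.path.expanduser(key_path),  # private key for the instance
        "-N",                                # no remote command, tunnel only
        "-f",                                # go to the background
        "-L", f"{local_port}:localhost:{remote_port}",
        f"{user}@{public_dns}",
    ]


# Placeholder values for illustration only.
command = ssh_tunnel_command(
    key_path="~/mykeypair.pem",
    public_dns="ec2-203-0-113-10.compute-1.amazonaws.com",
    os_family="amazon-linux",
)
print(shlex.join(command))
```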

Next Step

Test by Logging in to the Jupyter notebook server (p. 25)

Test by Logging in to the Jupyter notebook server

Now you are ready to log in to the Jupyter notebook server.

Your next step is to test the connection to the server through your browser.

1. In the address bar of your browser, type the following URL, or click on this link: https://localhost:8888

2. With a self-signed SSL certificate, your browser will warn you and prompt you to avoid continuing to visit the website.

Since you set this up yourself, it is safe to continue. Depending on your browser, you will see an "advanced", "show details", or similar button.

Click this button, and then click the "proceed to localhost" link. If the connection is successful, you see the Jupyter notebook server webpage. At this point, you are asked for the password you set up previously.

Now you have access to the Jupyter notebook server that is running on the DLAMI. You can create new notebooks or run the provided Tutorials (p. 34).
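If the browser cannot reach the page at all, a quick way to narrow down the problem is to check whether anything is listening on the local end of the tunnel. The following Python sketch is one way to do that from the client; the function name is hypothetical, and port 8888 is the port used throughout this section.

```python
import socket


def port_is_open(host="localhost", port=8888, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: nothing usable is listening.
        return False


if port_is_open():
    print("Port 8888 is reachable; open https://localhost:8888 in a browser.")
else:
    print("Nothing is listening on port 8888; check the SSH tunnel.")
```

A successful check only shows that the port accepts connections. It does not confirm that the Jupyter server on the instance is running.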

Using a DLAMI

Topics

• Using the Deep Learning AMI with Conda (p. 29)

• Using the Deep Learning Base AMI (p. 32)

• Running Jupyter Notebook Tutorials (p. 33)

• Tutorials (p. 34)

The following sections describe how the Deep Learning AMI with Conda can be used to switch environments, run sample code from each of the frameworks, and run Jupyter so you can try out different notebook tutorials.

Using the Deep Learning AMI with Conda

Topics

• Introduction to the Deep Learning AMI with Conda (p. 29)

• Log in to Your DLAMI (p. 29)

• Start the TensorFlow Environment (p. 30)

• Switch to the PyTorch Python 3 Environment (p. 31)

• Switch to the MXNet Python 3 Environment (p. 31)

• Removing Environments (p. 32)

Introduction to the Deep Learning AMI with Conda

Conda is an open source package management system and environment management system that runs on Windows, macOS, and Linux. Conda quickly installs, runs, and updates packages and their dependencies. Conda easily creates, saves, loads and switches between environments on your local computer.

The Deep Learning AMI with Conda has been configured for you to easily switch between deep learning environments. The following instructions guide you through some basic commands with conda. They also help you verify that the basic import of the framework is functioning, and that you can run a couple of simple operations with the framework. You can then move on to more thorough tutorials provided with the DLAMI or the frameworks' examples found on each framework's project site.

Log in to Your DLAMI

After you log in to your server, you will see a server "message of the day" (MOTD) describing various Conda commands that you can use to switch between the different deep learning frameworks. Below is an example MOTD. Your specific MOTD may vary as new versions of the DLAMI are released.

Note

We no longer include the CNTK, Caffe, Caffe2 and Theano Conda environments in the AWS Deep Learning AMI starting with the v28 release. Previous releases of the AWS Deep Learning AMI that contain these environments will continue to be available. However, we will only provide updates to these environments if there are security fixes published by the open source community for these frameworks.

=============================================================================
       __|  __|_  )
       _|  (     /   Deep Learning AMI (Ubuntu 18.04) Version 40.0
      ___|\___|___|
=============================================================================

Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 5.4.0-1037-aws x86_64v)

Please use one of the following commands to start the required environment with the framework of your choice:
for AWS MX 1.7 (+Keras2) with Python3 (CUDA 10.1 and Intel MKL-DNN) _______________________________ source activate mxnet_p36
for AWS MX 1.8 (+Keras2) with Python3 (CUDA + and Intel MKL-DNN) ___________________________ source activate mxnet_latest_p37
for AWS MX(+AWS Neuron) with Python3 ___________________________________________________ source activate aws_neuron_mxnet_p36
for AWS MX(+Amazon Elastic Inference) with Python3 _______________________________________ source activate amazonei_mxnet_p36
for TensorFlow(+Keras2) with Python3 (CUDA + and Intel MKL-DNN) _____________________________ source activate tensorflow_p37
for Tensorflow(+AWS Neuron) with Python3 _________________________________________ source activate aws_neuron_tensorflow_p36
for TensorFlow 2(+Keras2) with Python3 (CUDA 10.1 and Intel MKL-DNN) _______________________ source activate tensorflow2_p36
for TensorFlow 2.3 with Python3.7 (CUDA + and Intel MKL-DNN) ________________________ source activate tensorflow2_latest_p37
for PyTorch 1.4 with Python3 (CUDA 10.1 and Intel MKL) _________________________________________ source activate pytorch_p36
for PyTorch 1.7.1 with Python3.7 (CUDA 11.0 and Intel MKL) ________________________________ source activate pytorch_latest_p37
for PyTorch (+AWS Neuron) with Python3 ______________________________________________ source activate aws_neuron_pytorch_p36
for base Python3 (CUDA 10.0) _______________________________________________________________________ source activate python3

Each Conda command has the following pattern:

source activate framework_python-version

For example, you may see for MXNet(+Keras1) with Python3 (CUDA 10.1) _____________________ source activate mxnet_p36, which signifies that the environment has MXNet, Keras 1, Python 3, and CUDA 10.1. To activate this environment, use the following command:

$ source activate mxnet_p36
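The framework_python-version pattern can be read mechanically. The Python sketch below splits an environment name into its framework part and its Python version, reading a suffix such as p36 as Python 3.6, in line with the MOTD entries that pair Python3.7 with p37. The function name is hypothetical, and names that do not follow the pattern, such as python3, return None.

```python
import re

# <framework>_p<major><minor>, for example mxnet_p36 or pytorch_latest_p37.
ENV_PATTERN = re.compile(r"^(?P<framework>.+)_p(?P<major>\d)(?P<minor>\d+)$")


def parse_env_name(env_name):
    """Split a Conda environment name into (framework, python_version)."""
    match = ENV_PATTERN.match(env_name)
    if match is None:
        return None  # the name does not follow the pattern
    return match["framework"], f"{match['major']}.{match['minor']}"


for name in ["mxnet_p36", "tensorflow2_latest_p37",
             "aws_neuron_pytorch_p36", "python3"]:
    print(name, "->", parse_env_name(name))
```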

Start the TensorFlow Environment

Note

When you launch your first Conda environment, please be patient while it loads. The Deep Learning AMI with Conda automatically installs the most optimized version of the framework for your EC2 instance upon the framework's first activation. You should not expect subsequent delays.

1. Activate the TensorFlow virtual environment for Python 3.

$ source activate tensorflow_p37

2. Start the iPython terminal.

(tensorflow_p37)$ ipython

3. Run a quick TensorFlow program.

import tensorflow as tf
# tf.Session is the TensorFlow 1.x graph-mode API.
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))

You should see "Hello, TensorFlow!"

Next Up

Running Jupyter Notebook Tutorials (p. 33)

Switch to the PyTorch Python 3 Environment

If you're still in the iPython console, use quit(), then get ready to switch environments.

• Activate the PyTorch virtual environment for Python 3.

$ source activate pytorch_p36

Test Some PyTorch Code

To test your installation, use Python to write PyTorch code that creates and prints an array.

1. Start the iPython terminal.

(pytorch_p36)$ ipython

2. Import PyTorch.

import torch

You might see a warning message about a third-party package. You can ignore it.

3. Create a 5x3 matrix with the elements initialized randomly. Print the array.

x = torch.rand(5, 3)
print(x)

Verify the result. Because the elements are random, your values will differ.

tensor([[0.3105, 0.5983, 0.5410],
        [0.0234, 0.0934, 0.0371],
        [0.9740, 0.1439, 0.3107],
        [0.6461, 0.9035, 0.5715],
        [0.4401, 0.7990, 0.8913]])

Switch to the MXNet Python 3 Environment

If you're still in the iPython console, use quit(), then get ready to switch environments.

• Activate the MXNet virtual environment for Python 3.

$ source activate mxnet_p36

Test Some MXNet Code

To test your installation, use Python to write MXNet code that creates and prints an array using the NDArray API. For more information, see NDArray API.

1. Start the iPython terminal.

(mxnet_p36)$ ipython

2. Import MXNet.

import mxnet as mx

You might see a warning message about a third-party package. You can ignore it.

3. Create a 5x5 matrix, an instance of the NDArray, with elements initialized to 0. Print the array.

mx.ndarray.zeros((5,5)).asnumpy()

Verify the result.

array([[ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]], dtype=float32)

You can ﬁnd more examples of MXNet in the MXNet tutorials section.

Removing Environments

If you run out of space on the DLAMI, you can remove Conda environments that you are not using:

conda env list
conda env remove --name <env_name>
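When several environments need to go, it can help to generate the removal commands from the `conda env list` output. The Python sketch below works on a sample of that output; the environment paths shown are illustrative rather than taken from a real instance, and the helper names are hypothetical. It assumes every environment is named; entries listed by path only would need extra handling.

```python
import shlex

# Illustrative output in the layout printed by `conda env list`.
SAMPLE_ENV_LIST = """\
# conda environments:
#
base                  *  /home/ubuntu/anaconda3
mxnet_p36                /home/ubuntu/anaconda3/envs/mxnet_p36
pytorch_p36              /home/ubuntu/anaconda3/envs/pytorch_p36
tensorflow_p37           /home/ubuntu/anaconda3/envs/tensorflow_p37
"""


def env_names(env_list_output):
    """Extract environment names, skipping comments and blank lines."""
    names = []
    for line in env_list_output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        names.append(line.split()[0])
    return names


def removal_commands(env_list_output, keep):
    """Build one `conda env remove` command per environment not kept."""
    return [
        shlex.join(["conda", "env", "remove", "--name", name])
        for name in env_names(env_list_output)
        if name != "base" and name not in keep
    ]


for command in removal_commands(SAMPLE_ENV_LIST, keep={"tensorflow_p37"}):
    print(command)
```

The base environment is always skipped, because removing it would break the Conda installation itself.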

Using the Deep Learning Base AMI

The Base AMI comes with a foundational platform of GPU drivers and acceleration libraries to deploy your own customized deep learning environment. By default, the AMI is configured with the NVIDIA CUDA 10.0 environment. You can also switch between different versions of CUDA. Refer to the following instructions for how to do this.

Conﬁguring CUDA Versions

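You can select and verify a particular CUDA version with the following bash commands.

Tip

You can verify the CUDA version by running NVIDIA's nvcc program.

$ nvcc --version

• CUDA 11.0:

$ sudo rm /usr/local/cuda
$ sudo ln -s /usr/local/cuda-11.0 /usr/local/cuda

• CUDA 10.2:

$ sudo rm /usr/local/cuda
$ sudo ln -s /usr/local/cuda-10.2 /usr/local/cuda

• CUDA 10.1:

$ sudo rm /usr/local/cuda
$ sudo ln -s /usr/local/cuda-10.1 /usr/local/cuda

• CUDA 10.0:

$ sudo rm /usr/local/cuda
$ sudo ln -s /usr/local/cuda-10.0 /usr/local/cuda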
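Switching versions follows one pattern: remove the /usr/local/cuda symbolic link and point it at the versioned directory. The Python sketch below generates the two commands for a given version and extracts the release number from nvcc output. It assumes that each version lives in /usr/local/cuda-<version>, as the CUDA 11.0 commands show, and the sample nvcc line is illustrative.

```python
import re


def cuda_switch_commands(version):
    """Return the two shell commands that repoint /usr/local/cuda."""
    target = f"/usr/local/cuda-{version}"
    return [
        "sudo rm /usr/local/cuda",
        f"sudo ln -s {target} /usr/local/cuda",
    ]


def parse_nvcc_release(nvcc_output):
    """Extract the release number from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


for command in cuda_switch_commands("10.2"):
    print(command)

# Illustrative last line of `nvcc --version` output.
sample_output = "Cuda compilation tools, release 11.0, V11.0.194"
print(parse_nvcc_release(sample_output))  # 11.0
```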

Running Jupyter Notebook Tutorials

Tutorials and examples ship with each of the deep learning projects' source and in most cases they will run on any DLAMI. If you chose the Deep Learning AMI with Conda (p. 6), you get the added benefit of a few hand-picked tutorials already set up and ready to try out.

Important

To run the Jupyter notebook tutorials installed on the DLAMI, you will need to Set up a Jupyter Notebook Server (p. 22).

Once the Jupyter server is running, you can run the tutorials through your web browser. If you are running the Deep Learning AMI with Conda or if you have set up Python environments, you can switch Python kernels from the Jupyter notebook interface. Select the appropriate kernel before trying to run a framework-specific tutorial. Further examples of this are provided for users of the Deep Learning AMI with Conda.

Note

Many tutorials require additional Python modules that may not be set up on your DLAMI. If you get an error like "xyz module not found", log in to the DLAMI, activate the environment as described above, then install the necessary modules.

Tip

Deep learning tutorials and examples often rely on one or more GPUs. If your instance type doesn't have a GPU, you may need to change some of the example's code to get it to run.

Navigating the Installed Tutorials

Once you're logged in to the Jupyter server and can see the tutorials directory (on Deep Learning AMI with Conda only), you will be presented with folders of tutorials by each framework name. If you don't see a framework listed, then tutorials are not available for that framework on your current DLAMI. Click on the name of the framework to see the listed tutorials, then click a tutorial to launch it.

The first time you run a notebook on the Deep Learning AMI with Conda, it prompts you to select the environment you would like to use from a list. Each environment is named according to this pattern:

Environment (conda_framework_python-version)

For example, you might see Environment (conda_mxnet_p36), which signifies that the environment has MXNet and Python 3. The other variation of this would be Environment (conda_mxnet_p27), which signifies that the environment has MXNet and Python 2.

Tip

If you're concerned about which version of CUDA is active, one way to see it is in the MOTD when you first log in to the DLAMI.

Switching Environments with Jupyter

If you decide to try a tutorial for a different framework, be sure to verify the currently running kernel.

This info can be seen in the upper right of the Jupyter interface, just below the logout button. You can change the kernel on any open notebook by clicking the Jupyter menu item Kernel, then Change Kernel, and then clicking the environment that suits the notebook you're running.

At this point you'll need to rerun any cells because a change in the kernel will erase the state of anything you've run previously.

Tip

Switching between frameworks can be fun and educational, but you can run out of memory. If you start running into errors, look at the terminal window that has the Jupyter server running. There are helpful messages and error logging here, and you may see an out-of-memory error. To fix this, go to the home page of your Jupyter server, click the Running tab, and then click Shutdown for each of the tutorials that are probably still running in the background and using up your memory.

Next Up

For more examples and sample code from each framework, click Next or continue to Apache MXNet (Incubating) (p. 35).

Tutorials

The following are tutorials on how to use the Deep Learning AMI with Conda's software.

Topics

• 10 Minute Tutorials (p. 35)

• Activating Frameworks (p. 35)

• Debugging and Visualization (p. 49)

• Distributed Training (p. 52)

• Elastic Fabric Adapter (p. 69)

• GPU Monitoring and Optimization (p. 81)

• The AWS Inferentia Chip With DLAMI (p. 87)

• The Graviton DLAMI (p. 102)

• The Habana DLAMI (p. 109)

• Inference (p. 110)

• Using Frameworks with ONNX (p. 114)

• Model Serving (p. 122)

10 Minute Tutorials

• Launch an AWS Deep Learning AMI (in 10 minutes)

• Train a Deep Learning model with AWS Deep Learning Containers on Amazon EC2 (in 10 minutes)

Activating Frameworks

The following are the deep learning frameworks installed on the Deep Learning AMI with Conda. Click on a framework to learn how to activate it.

Topics

• Apache MXNet (Incubating) (p. 35)

• Caffe2 (p. 37)

• Chainer (p. 38)

• CNTK (p. 38)

• Keras (p. 40)

• PyTorch (p. 41)

• TensorFlow (p. 43)

• TensorFlow 2 (p. 45)

• TensorFlow with Horovod (p. 46)

• TensorFlow 2 with Horovod (p. 47)

• Theano (p. 48)

Apache MXNet (Incubating)

Activating Apache MXNet (Incubating)

This tutorial shows how to activate MXNet on an instance running the Deep Learning AMI with Conda (DLAMI on Conda) and run an MXNet program.

When a stable Conda package of a framework is released, it's tested and pre-installed on the DLAMI. If you want to run the latest, untested nightly build, you can install it manually. For more information, see Installing MXNet's Nightly Build (experimental) (p. 36).

To run MXNet on the DLAMI with Conda

1. To activate the framework, open an Amazon Elastic Compute Cloud (Amazon EC2) instance of the DLAMI with Conda.

Table of Contents

What Is the AWS Deep Learning AMI?

About This Guide

Prerequisites

Example DLAMI Uses

Features of the DLAMI

Preinstalled Frameworks

Preinstalled GPU Software

Elastic Inference Support

Model Serving and Visualization

Getting Started

How to Get Started with the DLAMI

Choosing Your DLAMI

CUDA Installations and Framework Bindings

Choose a DLAMI with CUDA

Related Topics

Deep Learning Base AMI

Why to Choose the Base DLAMI

Related Topics

Deep Learning AMI with Conda

Stable Versus Release Candidates

Python 2 Deprecation

Elastic Inference Support

CUDA Support

Related Topics

DLAMI CPU Architecture Options

DLAMI Operating System Options

AMI Options

Deep Learning AMI with Conda Options

Deep Learning Base AMI Options

Deep Learning AMI with CUDA 10.2 Options

Deep Learning AMI with CUDA 10.1 Options

Deep Learning AMI with CUDA 10 Options

AWS Deep Learning AMI with Ubuntu 18.04 Options

AWS Deep Learning AMI with Ubuntu 16.04 Options

AWS Deep Learning AMI Amazon Linux Options

AWS Deep Learning AMI Amazon Linux 2 Options

Selecting the Instance Type for DLAMI

Pricing for the DLAMI

DLAMI Region Availability

Recommended GPU Instances

Recommended CPU Instances

Recommended Inferentia Instances

Recommended Habana Instances

Launching and Configuring a DLAMI

Step 1: Launch a DLAMI

DLAMI ID

EC2 Console

Marketplace Search

Step 2: Connect to the DLAMI

Step 3: Secure Your DLAMI Instance

Step 4: Test Your DLAMI

Clean Up

Set up a Jupyter Notebook Server

Secure Your Jupyter Server

Start the Jupyter notebook server

Configure the Client to Connect to the Jupyter Server

Configure a Windows Client

Prepare

Using Jupyter Notebooks from a Windows Client

Configure a Linux or macOS Client

Test by Logging in to the Jupyter notebook server

Using a DLAMI

Using the Deep Learning AMI with Conda

Introduction to the Deep Learning AMI with Conda

Log in to Your DLAMI

Start the TensorFlow Environment

Switch to the PyTorch Python 3 Environment

Test Some PyTorch Code

Switch to the MXNet Python 3 Environment

Test Some MXNet Code

Removing Environments

Using the Deep Learning Base AMI

Using the Deep Learning Base AMI

Configuring CUDA Versions

Running Jupyter Notebook Tutorials