
Amazon Elastic Inference


Developer Guide


Amazon Elastic Inference: Developer Guide

Copyright © Amazon Web Services, Inc. and/or its affiliates. All rights reserved.

Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be affiliated with, connected to, or sponsored by Amazon.


Table of Contents

What Is Amazon Elastic Inference? ... 1

Prerequisites ... 1

Pricing for Amazon Elastic Inference ... 1

Elastic Inference Uses ... 1

Elastic Inference Basics ... 2


Getting Started ... 3

Amazon Elastic Inference Service Limits ... 3

Choosing an Instance and Accelerator Type for Your Model ... 5

Using Amazon Elastic Inference with EC2 Auto Scaling ... 5

Working with Amazon Elastic Inference ... 6

Setting Up ... 6

Configuring Your Security Groups for Elastic Inference ... 6

Configuring AWS PrivateLink Endpoint Services ... 7

Configuring an Instance Role with an Elastic Inference Policy ... 8

Launching an Instance with Elastic Inference ... 9

TensorFlow Models ... 11

Elastic Inference Enabled TensorFlow ... 11

Additional Requirements and Considerations ... 12

TensorFlow Elastic Inference with Python ... 12

TensorFlow 2 Elastic Inference with Python ... 21

MXNet Models ... 30

More Models and Resources ... 31

MXNet Elastic Inference with Python ... 31

MXNet Elastic Inference with Deep Java Library (DJL) ... 51

PyTorch Models ... 62

Compile Elastic Inference-enabled PyTorch models ... 63

Additional Requirements and Considerations ... 64

PyTorch Elastic Inference with Python ... 65

Monitoring Elastic Inference Accelerators ... 70

EI_VISIBLE_DEVICES ... 70

EI Tool ... 71

Health Check ... 74

MXNet Elastic Inference with SageMaker ... 75

Using Amazon Deep Learning Containers With Elastic Inference ... 76

Using Amazon Deep Learning Containers with Amazon Elastic Inference on Amazon EC2 ... 76

Prerequisites ... 76

Using TensorFlow Elastic Inference accelerators on EC2 ... 77

Using MXNet Elastic Inference accelerators on Amazon EC2 ... 78

Using PyTorch Elastic Inference accelerators on Amazon EC2 ... 79

Using Amazon Deep Learning Containers with Amazon Elastic Inference on Amazon ECS ... 80

Prerequisites ... 80

Using TensorFlow Elastic Inference accelerators on Amazon ECS ... 81

Using MXNet Elastic Inference accelerators on Amazon ECS ... 83

Using PyTorch Elastic Inference accelerators on Amazon ECS ... 86

Using Amazon Deep Learning Containers with Elastic Inference on Amazon SageMaker ... 89

Security ... 90

Identity and Access Management ... 90

Authenticating With Identities ... 91

Managing Access Using Policies ... 92

Logging and Monitoring ... 94

Compliance Validation ... 94

Resilience ... 95

Infrastructure Security ... 95


Configuration and Vulnerability Analysis ... 95

Using CloudWatch Metrics to Monitor Elastic Inference ... 96

Elastic Inference Metrics and Dimensions ... 96

Creating CloudWatch Alarms to Monitor Elastic Inference ... 98

Troubleshooting ... 99

Issues Launching Accelerators ... 99

Resolving Configuration Issues ... 99

Issues Running AWS Batch ... 99

Resolving Permission Issues ... 100

Stop and Start the Instance ... 100

Troubleshooting Model Performance ... 100

Submitting Feedback ... 100

Amazon Elastic Inference Error Codes ... 101

Document History ... 105

AWS glossary ... 106


What Is Amazon Elastic Inference?

Amazon Elastic Inference (Elastic Inference) is a resource you can attach to your Amazon Elastic Compute Cloud CPU instances, Amazon Deep Learning Containers, and SageMaker instances. Elastic Inference helps you accelerate your deep learning (DL) inference workloads. Elastic Inference accelerators come in multiple sizes and help you build intelligent capabilities into your applications.

Elastic Inference distributes model operations defined by TensorFlow, Apache MXNet (MXNet), and PyTorch between low-cost DL inference accelerators and the CPU of the instance. Elastic Inference also supports the Open Neural Network Exchange (ONNX) format through MXNet.

Prerequisites

To run Amazon Elastic Inference, you need an Amazon Web Services account, and you should be familiar with launching Amazon EC2 instances, Amazon Deep Learning Containers, or SageMaker instances.

To launch an Amazon EC2 instance, complete the steps in Setting up with Amazon EC2. Amazon S3 resources are required for installing packages via pip. For more information about setting up Amazon S3 resources, see the Amazon Simple Storage Service User Guide.

Pricing for Amazon Elastic Inference

You are charged for each second that an Elastic Inference accelerator is attached to an instance in the running state. You are not charged for an accelerator attached to an instance that is in the pending, stopping, stopped, shutting-down, or terminated state. You are also not charged when an Elastic Inference accelerator is in the unknown or impaired state.
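The billing rule above can be expressed as a small calculation: a second counts only when the instance is running and the accelerator is in neither the unknown nor the impaired state. The sketch below assumes you already have a timeline of state intervals (how you collect it is out of scope), and the `Interval` type, the "healthy" label, and the 0.12 hourly rate are hypothetical placeholders, not published prices or API values.

```python
from dataclasses import dataclass

# Accelerator states that are never billed, per the pricing rules above.
NON_BILLABLE_ACCELERATOR_STATES = {"unknown", "impaired"}


@dataclass
class Interval:
    """Hypothetical record of how long an instance/accelerator pair stayed in one state."""
    seconds: int
    instance_state: str     # e.g. "pending", "running", "stopped"
    accelerator_state: str  # "unknown", "impaired", or any other (placeholder) label


def billable_seconds(intervals):
    """Sum only the seconds where the instance runs and the accelerator is billable."""
    return sum(
        iv.seconds
        for iv in intervals
        if iv.instance_state == "running"
        and iv.accelerator_state not in NON_BILLABLE_ACCELERATOR_STATES
    )


def estimated_charge(intervals, hourly_rate):
    """Per-second billing at a hypothetical hourly On-Demand rate."""
    return billable_seconds(intervals) * hourly_rate / 3600


timeline = [
    Interval(60, "pending", "healthy"),     # instance not running: not billed
    Interval(3600, "running", "healthy"),   # billed
    Interval(120, "running", "impaired"),   # accelerator impaired: not billed
    Interval(600, "stopped", "healthy"),    # instance stopped: not billed
]
print(billable_seconds(timeline))
```

Only the one-hour running interval is counted here, so the estimate equals one hour at whatever rate applies in your Region.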

You do not incur AWS PrivateLink charges for VPC endpoints to the Elastic Inference service when you have accelerators provisioned in the subnet.

For more information about pricing by Region for Elastic Inference, see Elastic Inference Pricing.

Elastic Inference Uses

You can use Elastic Inference in the following use cases:

• For Elastic Inference-enabled TensorFlow and TensorFlow 2 with Python, see Using TensorFlow Models with Elastic Inference (p. 11)

• For Elastic Inference-enabled MXNet with Python, Java, and Scala, see Using MXNet Models with Elastic Inference (p. 30)

• For Elastic Inference-enabled PyTorch with Python, see Using PyTorch Models with Elastic Inference (p. 62)

• For Elastic Inference with SageMaker, see MXNet Elastic Inference with SageMaker (p. 75)

• For Amazon Deep Learning Containers with Elastic Inference on Amazon EC2, Amazon ECS, and SageMaker, see Using Amazon Deep Learning Containers With Elastic Inference (p. 76)


• For security information on Elastic Inference, see Security in Amazon Elastic Inference (p. 90)

• To troubleshoot your Elastic Inference workflow, see Troubleshooting (p. 99)


Amazon Elastic Inference Basics

When you configure an Amazon EC2 instance to launch with an Elastic Inference accelerator, AWS finds available accelerator capacity. It then establishes a network connection between your instance and the accelerator.

The following Elastic Inference accelerator types are available. You can attach any Elastic Inference accelerator type to any Amazon EC2 instance type.

Accelerator Type   FP32 Throughput (TFLOPS)   FP16 Throughput (TFLOPS)   Memory (GB)
eia2.medium        1                          8                          2
eia2.large         2                          16                         4
eia2.xlarge        4                          32                         8
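One way to use the table above is to encode it as data and pick the smallest accelerator that covers a model's memory and FP32 needs. This is a minimal sketch: the spec values come from the table, but the `smallest_accelerator` helper is hypothetical, and estimating your model's memory footprint and compute needs is up to you.

```python
# eia2 specifications from the table above.
EIA2_SPECS = {
    "eia2.medium": {"fp32_tflops": 1, "fp16_tflops": 8, "memory_gb": 2},
    "eia2.large": {"fp32_tflops": 2, "fp16_tflops": 16, "memory_gb": 4},
    "eia2.xlarge": {"fp32_tflops": 4, "fp16_tflops": 32, "memory_gb": 8},
}


def smallest_accelerator(min_memory_gb, min_fp32_tflops=0):
    """Return the smallest eia2 type meeting both requirements, or None."""
    candidates = [
        (spec["memory_gb"], name)
        for name, spec in EIA2_SPECS.items()
        if spec["memory_gb"] >= min_memory_gb
        and spec["fp32_tflops"] >= min_fp32_tflops
    ]
    # Sizes grow together, so the smallest memory is also the smallest type.
    return min(candidates)[1] if candidates else None


print(smallest_accelerator(3))
print(smallest_accelerator(1, min_fp32_tflops=4))
```

A result of None means no single eia2 accelerator fits, which is a signal to split models across multiple accelerators.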

You can attach multiple Elastic Inference accelerators of various sizes to a single Amazon EC2 instance when launching the instance. With multiple accelerators, you can run inference for multiple models on a single fleet of Amazon EC2 instances. If your models require different amounts of GPU memory and compute capacity, you can choose the appropriate accelerator size to attach to your CPU instance. For faster response times, load your models onto the Elastic Inference accelerators once, then keep making inference calls on those accelerators without unloading the models for each call. By attaching multiple accelerators to a single instance, you avoid deploying multiple fleets of CPU or GPU instances and the associated cost. For more information on attaching multiple accelerators to a single instance, see Using TensorFlow Models with Elastic Inference, Using MXNet Models with Elastic Inference, and Using PyTorch Models with Elastic Inference.

Note

Attaching multiple Elastic Inference accelerators to a single Amazon EC2 instance requires that the instance has AWS Deep Learning AMI (DLAMI) version 25 or later. For more information on the AWS Deep Learning AMI, see What Is the AWS Deep Learning AMI?

An Elastic Inference accelerator is not part of the hardware that makes up your instance. Instead, the accelerator is attached through the network using an AWS PrivateLink endpoint service. The endpoint service routes traffic from your instance to the Elastic Inference accelerator configured with your instance.

Note

An Elastic Inference accelerator cannot be modified through the management console of your instance.

Before you launch an instance with an Elastic Inference accelerator, you must create an AWS PrivateLink endpoint service. Only a single endpoint service is needed in every Availability Zone to connect instances with Elastic Inference accelerators. A single endpoint service can span multiple Availability Zones. For more information, see VPC Endpoint Services (AWS PrivateLink).


You can use Amazon Elastic Inference enabled TensorFlow, TensorFlow Serving, Apache MXNet, or PyTorch libraries to load models and make inference calls. The modified versions of these frameworks automatically detect the presence of Elastic Inference accelerators. They then optimally distribute the model operations between the Elastic Inference accelerator and the CPU of the instance. The AWS Deep Learning AMIs include the latest releases of Amazon Elastic Inference enabled TensorFlow, TensorFlow Serving, MXNet, and PyTorch. If you are using custom AMIs or container images, you can download and install the required TensorFlow, Apache MXNet, and PyTorch libraries from Amazon S3.


Before you get started with Amazon Elastic Inference

Amazon Elastic Inference Service Limits

Before you start using Elastic Inference accelerators, be aware of the following limitations:


• Elastic Inference accelerator instance limit: You can attach up to five Elastic Inference accelerators by default to each instance at a time, and only during instance launch. This is adjustable. We recommend testing the optimal setup before deploying to production.

• Elastic Inference Sharing: You cannot share an Elastic Inference accelerator between instances.

• Elastic Inference Transfer: You cannot detach an Elastic Inference accelerator from an instance or transfer it to another instance. If you no longer need an Elastic Inference accelerator, you must terminate your instance. You cannot change the Elastic Inference accelerator type. Terminate the instance and launch a new instance with a different Elastic Inference accelerator specification.

• Supported Libraries: Only the Amazon Elastic Inference enhanced MXNet, TensorFlow, and PyTorch libraries can make inference calls to Elastic Inference accelerators.

• Elastic Inference Attachment: Elastic Inference accelerators can only be attached to instances in a VPC.

• Reserving accelerator capacity: Pricing for Elastic Inference accelerators is available at On-Demand Instance rates only. You can attach an accelerator to a Reserved Instance, Scheduled Reserved Instance, or Spot Instance. However, the On-Demand Instance price for the Elastic Inference accelerator applies. You cannot reserve or schedule Elastic Inference accelerator capacity.
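Because accelerators can be attached only at launch and cannot be changed afterward, it can pay to check a request before sending it. The following is a minimal sketch under stated assumptions: the request format of `(type, count)` pairs and the `validate_accelerator_request` helper are hypothetical, the type names are the ones listed in this guide, and the default of five is a parameter because the limit is adjustable.

```python
# Accelerator types named in this guide (eia2 current, eia1 still supported).
KNOWN_TYPES = {
    "eia1.medium", "eia1.large", "eia1.xlarge",
    "eia2.medium", "eia2.large", "eia2.xlarge",
}
# Default per-instance limit; adjustable, so callers can override it.
DEFAULT_MAX_ACCELERATORS = 5


def validate_accelerator_request(specs, max_accelerators=DEFAULT_MAX_ACCELERATORS):
    """Validate a launch-time request given as [(type, count), ...].

    Returns the total number of accelerators, or raises ValueError.
    """
    total = 0
    for accel_type, count in specs:
        if accel_type not in KNOWN_TYPES:
            raise ValueError(f"unknown accelerator type: {accel_type}")
        if count < 1:
            raise ValueError(f"{accel_type}: count must be at least 1")
        total += count
    if total > max_accelerators:
        raise ValueError(
            f"requested {total} accelerators, limit is {max_accelerators} per instance"
        )
    return total


print(validate_accelerator_request([("eia2.large", 2), ("eia2.xlarge", 1)]))
```

If your account's limit has been raised, pass the new value as `max_accelerators` instead of editing the default.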

Choosing an Instance and Accelerator Type for Your Model

Demands on CPU compute resources, CPU memory, GPU-based acceleration, and GPU memory vary significantly between different types of deep learning models. The latency and throughput requirements of the application also determine the amount of instance compute and Elastic Inference acceleration you need. Consider the following when you choose an instance and accelerator type combination for your model:

• Before you evaluate the right combination of resources for your model or application stack, determine the target latency, throughput needs, and constraints. For example, assume your application must respond within 300 milliseconds (ms). If data retrieval (including any authentication) and preprocessing take 200 ms, you have a 100 ms window for the inference request.

Using this analysis, you can determine the lowest cost infrastructure combination that meets these targets.

• Start with a reasonably small combination of resources, for example a budget-friendly c5.xlarge CPU instance type with an eia2.medium accelerator type. This combination has been tested to work well for various computer vision workloads (including a large version of ResNet: ResNet-200), and it gives comparable or better performance than a more costly p2.xlarge GPU instance.

You can then resize the instance or accelerator type depending on your latency targets.

• I/O data transfer between instance and accelerator adds to inference latency because Elastic Inference accelerators are attached over the network.

• If you use multiple models with your accelerator, you might need a larger accelerator size to better support both compute and memory needs. This also applies if you use the same model from multiple application processes on the instance.

• You can convert your model to mixed precision, which uses the higher FP16 TFLOPS of the accelerator, to provide lower latency and higher performance.
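The latency analysis in the first consideration above is simple arithmetic, and comparing the remaining budget against measured latencies makes resizing decisions concrete. This is a sketch only: the helper names are hypothetical, the nearest-rank percentile is one of several reasonable choices, and the latency samples are made-up placeholders standing in for your own load-test results.

```python
import math


def inference_budget_ms(target_ms, other_stages_ms):
    """Time left for the inference request after the other stages."""
    remaining = target_ms - sum(other_stages_ms)
    if remaining <= 0:
        raise ValueError("non-inference stages already exceed the latency target")
    return remaining


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def meets_budget(samples_ms, budget_ms, pct=99):
    """True if the given percentile of measured latencies fits the budget."""
    return percentile(samples_ms, pct) <= budget_ms


# 300 ms target, 200 ms for retrieval (including authentication) and preprocessing.
budget = inference_budget_ms(300, [200])
measured = [40, 55, 60, 70, 95]  # hypothetical latencies, including network I/O
print(budget, meets_budget(measured, budget))
```

Measure end to end from the instance so that the network hop to the accelerator is included in the samples.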

Using Amazon Elastic Inference with EC2 Auto Scaling

When you create an Auto Scaling group, you can specify the information required to configure the Amazon EC2 instances. This includes Elastic Inference accelerators. To do this, specify a launch template with your instance configuration and the Elastic Inference accelerator type.
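A launch template that carries the accelerator type can be generated as JSON. The sketch below builds a `LaunchTemplateData` document with an `ElasticInferenceAccelerators` entry; the helper name and all IDs are hypothetical placeholders. You could save the output to a file and pass it to `aws ec2 create-launch-template --launch-template-name <name> --launch-template-data file://template-data.json`, then reference that template from the Auto Scaling group.

```python
import json


def launch_template_data(image_id, instance_type, instance_profile_name,
                         security_group_ids, accelerators):
    """Build LaunchTemplateData that includes Elastic Inference accelerators.

    accelerators: list of (type, count) pairs, e.g. [("eia2.medium", 1)].
    """
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "IamInstanceProfile": {"Name": instance_profile_name},
        "SecurityGroupIds": list(security_group_ids),
        "ElasticInferenceAccelerators": [
            {"Type": accel_type, "Count": count} for accel_type, count in accelerators
        ],
    }


data = launch_template_data(
    "ami-EXAMPLE", "c5.xlarge", "ei-instance-profile", ["sg-EXAMPLE"], [("eia2.medium", 1)]
)
print(json.dumps(data, indent=2))
```

Subnets are not set here because accelerators require a VPC and the Auto Scaling group, not the template, usually chooses the subnets.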


Working with Amazon Elastic Inference

To work with Amazon Elastic Inference, set up and launch your Amazon Elastic Compute Cloud instance with Elastic Inference. After that, use Elastic Inference accelerators that are powered by the Elastic Inference enabled versions of TensorFlow, TensorFlow Serving, Apache MXNet (MXNet), and PyTorch. You can do this with few changes to your code.

Topics

• Setting Up to Launch Amazon EC2 with Elastic Inference (p. 6)

• Using TensorFlow Models with Elastic Inference (p. 11)

• Using MXNet Models with Elastic Inference (p. 30)

• Using PyTorch Models with Elastic Inference (p. 62)

• Monitoring Elastic Inference Accelerators (p. 70)

• MXNet Elastic Inference with SageMaker (p. 75)

Setting Up to Launch Amazon EC2 with Elastic Inference

The most convenient way to set up Amazon EC2 with Elastic Inference uses the Elastic Inference setup script described in https://aws.amazon.com/blogs/machine-learning/launch-ei-accelerators-in-minutes-with-the-amazon-elastic-inference-setup-tool-for-ec2/. To manually launch an instance and associate it with an Elastic Inference accelerator, first configure your security groups and AWS PrivateLink endpoint services. Then, configure an instance role with the Elastic Inference policy.

Topics

• Configuring Your Security Groups for Elastic Inference (p. 6)

• Configuring AWS PrivateLink Endpoint Services (p. 7)

• Configuring an Instance Role with an Elastic Inference Policy (p. 8)

• Launching an Instance with Elastic Inference (p. 9)

Configuring Your Security Groups for Elastic Inference

You need two security groups: one for inbound and outbound traffic for the new Elastic Inference VPC endpoint, and a second for outbound traffic for the associated Amazon EC2 instances that you launch.

Configure Your Security Groups for Elastic Inference

To configure a security group for an Elastic Inference accelerator (console)

1. Open the Amazon VPC console at https://console.aws.amazon.com/vpc/.

2. In the left navigation pane, choose Security, Security Groups.


3. Choose Create Security Group.

4. Under Create Security Group, specify a name and description for the security group and choose the ID of the VPC. Choose Create and then choose Close.

5. Select the check box next to your security group and choose Actions, Edit inbound rules. Add a rule to allow HTTPS traffic on port 443 as follows:

a. Choose Add Rule.

b. For Type, select HTTPS.

c. For Source, specify a CIDR block (for example, 0.0.0.0/0) or the security group for your instance.

d. To allow traffic for port 22 to the EC2 instance, repeat the procedure. For Type, select SSH.

e. Choose Save rules and then choose Close.

6. Choose Edit outbound rules. Choose Add rule. To allow traffic for all ports, for Type, select All Traffic.

7. Choose Save rules.

To configure a security group for an Elastic Inference accelerator (AWS CLI)

1. Create a security group using the create-security-group command:

aws ec2 create-security-group --description insert a description for the security group --group-name assign a name for the security group [--vpc-id enter the VPC ID]

2. Create inbound rules using the authorize-security-group-ingress command:

aws ec2 authorize-security-group-ingress --group-id insert the security group ID --protocol tcp --port 443 --cidr 0.0.0.0/0

aws ec2 authorize-security-group-ingress --group-id insert the security group ID --protocol tcp --port 22 --cidr 0.0.0.0/0

3. The default setting for outbound rules allows all traffic from all ports for this instance.
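If you script this setup, building the commands as argument lists avoids shell-quoting problems with descriptions that contain spaces. This is a minimal sketch: the helper names and the example IDs are hypothetical, and the CIDR block is a parameter so that you can restrict it instead of using 0.0.0.0/0. The only output field relied on is `GroupId` from create-security-group.

```python
import json
import shlex


def create_group_command(name, description, vpc_id=None):
    """argv for aws ec2 create-security-group."""
    argv = ["aws", "ec2", "create-security-group",
            "--group-name", name, "--description", description]
    if vpc_id:
        argv += ["--vpc-id", vpc_id]
    return argv


def group_id_from_output(stdout):
    """create-security-group prints JSON such as {"GroupId": "sg-..."}."""
    return json.loads(stdout)["GroupId"]


def ingress_commands(group_id, cidr="0.0.0.0/0", ports=(443, 22)):
    """argv lists opening each TCP port (HTTPS, then SSH) to a CIDR block."""
    return [
        ["aws", "ec2", "authorize-security-group-ingress",
         "--group-id", group_id, "--protocol", "tcp",
         "--port", str(port), "--cidr", cidr]
        for port in ports
    ]


# Print the commands instead of running them; pass each argv to
# subprocess.run(argv, check=True, capture_output=True, text=True) to execute.
print(shlex.join(create_group_command("ei-endpoint-sg", "Elastic Inference endpoint", "vpc-EXAMPLE")))
for argv in ingress_commands("sg-EXAMPLE", cidr="10.0.0.0/16"):
    print(shlex.join(argv))
```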

Configuring AWS PrivateLink Endpoint Services

Elastic Inference uses VPC endpoints to privately connect the instance in your VPC with their associated Elastic Inference accelerator. Create a VPC endpoint for Elastic Inference before you launch instances with accelerators. This needs to be done just one time per VPC. For more information, see Interface VPC Endpoints (AWS PrivateLink).

To configure an AWS PrivateLink endpoint service (console)

1. Open the Amazon VPC console at https://console.aws.amazon.com/vpc/.

2. In the left navigation pane, choose Endpoints, Create Endpoint.

3. For Service category, choose Find service by name.

4. For Service Name, select com.amazonaws.<your-region>.elastic-inference.runtime.

For example, for the us-west-2 Region, select com.amazonaws.us-west-2.elastic-inference.runtime.

5. For Subnets, select one or more Availability Zones where the endpoint should be created. You must select subnets in each Availability Zone where you plan to launch instances with accelerators.


6. Enable the private DNS name and enter the security group for your endpoint. Choose Create endpoint. Note the VPC endpoint ID for later.

7. The security group that we configured for the endpoint in previous steps must allow inbound traffic to port 443.

To configure an AWS PrivateLink endpoint service (AWS CLI)

• Use the create-vpc-endpoint command (https://docs.aws.amazon.com/cli/latest/reference/ec2/create-vpc-endpoint.html) and specify the following: VPC ID, type of VPC endpoint (interface), service name, subnets to use the endpoint, and security groups to associate with the endpoint network interfaces. For information about how to set up a security group for your VPC endpoint, see the section called “Configuring Your Security Groups for Elastic Inference” (p. 6).

aws ec2 create-vpc-endpoint --vpc-id vpc-insert VPC ID --vpc-endpoint-type Interface --service-name com.amazonaws.us-west-2.elastic-inference.runtime --subnet-ids subnet-insert subnet --security-group-ids sg-insert security group ID
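The service name follows the pattern com.amazonaws.<your-region>.elastic-inference.runtime, so a script can derive it from the Region. The sketch below assembles the create-vpc-endpoint command with one subnet per Availability Zone and private DNS enabled, mirroring the console steps; the helper names and IDs are hypothetical, and only the `VpcEndpoint.VpcEndpointId` output field is relied on.

```python
import json


def ei_service_name(region):
    """Elastic Inference runtime endpoint service name for a Region."""
    return f"com.amazonaws.{region}.elastic-inference.runtime"


def create_vpc_endpoint_command(vpc_id, region, subnet_ids, security_group_ids):
    """argv for aws ec2 create-vpc-endpoint (interface endpoint)."""
    return [
        "aws", "ec2", "create-vpc-endpoint",
        "--region", region,
        "--vpc-id", vpc_id,
        "--vpc-endpoint-type", "Interface",
        "--service-name", ei_service_name(region),
        "--subnet-ids", *subnet_ids,          # one per Availability Zone you launch into
        "--security-group-ids", *security_group_ids,
        "--private-dns-enabled",              # matches the console's private DNS step
    ]


def endpoint_id_from_output(stdout):
    """Extract the endpoint ID to note for later."""
    return json.loads(stdout)["VpcEndpoint"]["VpcEndpointId"]


cmd = create_vpc_endpoint_command(
    "vpc-EXAMPLE", "us-west-2", ["subnet-AZ-A", "subnet-AZ-B"], ["sg-EXAMPLE"]
)
print(" ".join(cmd))
```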

Configuring an Instance Role with an Elastic Inference Policy

To launch an instance with an Elastic Inference accelerator, you must provide an IAM role that allows actions on Elastic Inference accelerators.

To configure an instance role with an Elastic Inference policy (console)

1. Open the IAM console at https://console.aws.amazon.com/iam/.

2. In the left navigation pane, choose Policies, Create Policy.

3. Choose JSON and paste the following policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "elastic-inference:Connect",
                "iam:List*",
                "iam:Get*",
                "ec2:Describe*",
                "ec2:Get*"
            ],
            "Resource": "*"
        }
    ]
}

Note

You may get a warning message about the Elastic Inference service not being recognizable.

This is a known issue and does not block creation of the policy.

4. Choose Review policy and enter a name for the policy, such as ec2-role-trust-policy.json, and a description.

5. Choose Create policy.

6. In the left navigation pane, choose Roles, Create role.

7. Choose AWS service, EC2, Next: Permissions.


8. Select the name of the policy that you just created (ec2-role-trust-policy.json). Choose Next: Tags.

9. Provide a role name and choose Create Role.

When you create your instance, select the role under Configure Instance Details in the launch wizard.

To configure an instance role with an Elastic Inference policy (AWS CLI)

• To configure an instance role with an Elastic Inference policy, follow the steps in Creating an IAM Role. Add the following policy to your instance:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "elastic-inference:Connect",
                "iam:List*",
                "iam:Get*",
                "ec2:Describe*",
                "ec2:Get*"
            ],
            "Resource": "*"
        }
    ]
}

Note

You may get a warning message about the Elastic Inference service not being recognizable.

This is a known issue and does not block creation of the policy.
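For the CLI path, an instance role needs both a trust policy that lets EC2 assume it and the permissions policy above, plus an instance profile that run-instances can reference. The sketch below generates the four `aws iam` commands for that flow. It swaps the console's managed policy for an inline policy (`put-role-policy`) to keep the example short, and the role, profile, and policy names are hypothetical.

```python
import json

# Permissions policy from the procedure above.
EI_PERMISSIONS_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "elastic-inference:Connect",
                "iam:List*",
                "iam:Get*",
                "ec2:Describe*",
                "ec2:Get*",
            ],
            "Resource": "*",
        }
    ],
}

# Standard trust policy that lets EC2 instances assume the role.
EC2_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def iam_setup_commands(role_name, profile_name, policy_name="ei-connect-policy"):
    """argv lists: create the role, attach the policy inline, and wrap the
    role in an instance profile for --iam-instance-profile."""
    return [
        ["aws", "iam", "create-role", "--role-name", role_name,
         "--assume-role-policy-document", json.dumps(EC2_TRUST_POLICY)],
        ["aws", "iam", "put-role-policy", "--role-name", role_name,
         "--policy-name", policy_name,
         "--policy-document", json.dumps(EI_PERMISSIONS_POLICY)],
        ["aws", "iam", "create-instance-profile",
         "--instance-profile-name", profile_name],
        ["aws", "iam", "add-role-to-instance-profile",
         "--instance-profile-name", profile_name, "--role-name", role_name],
    ]


for argv in iam_setup_commands("ei-role", "ei-profile"):
    print(argv[:3])
```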

Launching an Instance with Elastic Inference

You can now configure Amazon EC2 instances with accelerators to launch within your subnet. Choose any supported Amazon EC2 instance type and Elastic Inference accelerator size. Elastic Inference accelerators are available to all current generation instance types. There are two accelerator types.

EIA2 is the second generation of Elastic Inference accelerators. It offers improved performance and increased memory. With up to 8 GB of GPU memory, EIA2 is a cost-effective resource for deploying machine learning (ML) models. Use it for applications such as image classification, object detection, automated speech recognition, and language translation. Your accelerator memory choices depend on the size of your input and models. You can choose from the following Elastic Inference accelerators:

• eia2.medium with 2 GB of accelerator memory

• eia2.large with 4 GB of accelerator memory

• eia2.xlarge with 8 GB of accelerator memory

Note: We continue to support EIA1 in three sizes: eia1.medium, eia1.large, and eia1.xlarge.

You can launch an instance with Elastic Inference automatically by using the Amazon Elastic Inference setup tool for EC2, or manually using the console or AWS Command Line Interface.

To launch an instance with Elastic Inference (console)

1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.


2. Choose Launch Instance.

3. Under Choose an Amazon Machine Image, select an Amazon Linux or Ubuntu AMI. We recommend one of the Deep Learning AMIs.

Note

Attaching multiple Elastic Inference accelerators to a single Amazon EC2 instance requires that the instance has AWS Deep Learning AMI (DLAMI) Version 25 or later.

4. Under Choose an Instance Type, select the hardware configuration of your instance.

5. Choose Next: Configure Instance Details.

6. Under Configure Instance Details, check the configuration settings. Ensure that you are using the VPC with the security groups for the instance and the Elastic Inference accelerator that you set up earlier. For more information, see Configuring Your Security Groups for Elastic Inference (p. 6).

7. For IAM role, select the role that you created in the Configuring an Instance Role with an Elastic Inference Policy (p. 8) procedure.

8. Select Add an Elastic Inference accelerator.

9. Select the size and number of Elastic Inference accelerators. Your options are: eia2.medium, eia2.large, and eia2.xlarge.

10. To add another Elastic Inference accelerator, choose Add. Then select the size and number of accelerators to add.

11. (Optional) You can choose to add storage and tags by choosing Next at the bottom of the page. Or, you can let the instance wizard complete the remaining configuration steps for you.

12. In the Add Security Group step, choose the security group created previously.

13. Review the configuration of your instance and choose Launch.

14. You are prompted to choose an existing key pair for your instance or to create a new key pair. For more information, see Amazon EC2 Key Pairs.

Warning

Don’t select the Proceed without a key pair option. If you launch your instance without a key pair, then you can’t connect to it.

15. After making your key pair selection, choose Launch Instances.

16. A confirmation page lets you know that your instance is launching. To close the confirmation page and return to the console, choose View Instances.

17. Under Instances, you can view the status of the launch. It takes a short time for an instance to launch. When you launch an instance, its initial state is pending. After the instance starts, its state changes to running.

18. It can take a few minutes for the instance to be ready so that you can connect to it. Check that your instance has passed its status checks. You can view this information in the Status Checks column.

To launch an instance with Elastic Inference (AWS CLI)

To launch an instance with Elastic Inference at the command line, you need your key pair name, subnet ID, security group ID, AMI ID, and the name of the instance profile that you created in the section Configuring an Instance Role with an Elastic Inference Policy (p. 8). For the security group ID, use the one you created for your instance that contains the AWS PrivateLink endpoint. For more information, see Configuring Your Security Groups for Elastic Inference (p. 6). For more information about the AMI ID, see Finding a Linux AMI.
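The inputs listed above map directly onto run-instances options. As a sketch, the hypothetical helper below turns them into an argument list, emitting one `Type=...,Count=...` value per accelerator type after a single `--elastic-inference-accelerator` option, which is how the multiple-accelerator form works. All IDs and names in the usage are placeholders.

```python
def run_instances_command(image_id, instance_type, subnet_id, key_name,
                          security_group_ids, instance_profile_name, accelerators):
    """argv for aws ec2 run-instances with Elastic Inference accelerators.

    accelerators: list of (type, count) pairs, e.g. [("eia2.large", 2)].
    """
    accel_args = [f"Type={accel_type},Count={count}" for accel_type, count in accelerators]
    return [
        "aws", "ec2", "run-instances",
        "--image-id", image_id,
        "--instance-type", instance_type,
        "--subnet-id", subnet_id,
        "--elastic-inference-accelerator", *accel_args,
        "--key-name", key_name,
        "--security-group-ids", *security_group_ids,
        "--iam-instance-profile", f"Name={instance_profile_name}",
    ]


argv = run_instances_command(
    "ami-EXAMPLE", "m5.large", "subnet-EXAMPLE", "my-key-pair",
    ["sg-EXAMPLE"], "ei-profile", [("eia2.large", 2), ("eia2.xlarge", 1)],
)
print(" ".join(argv))
```

Passing the list to `subprocess.run` (rather than a joined string) keeps each `Type=...` value a separate argument.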

1. Use the run-instances command to launch your instance and accelerator:

aws ec2 run-instances --image-id ami-image ID --instance-type m5.large --subnet-id subnet-subnet ID --elastic-inference-accelerator Type=eia2.large --key-name key pair name --security-group-ids sg-security group ID --iam-instance-profile Name="accelerator profile name"


To launch an instance with multiple accelerators, you can add multiple Type parameters to --elastic-inference-accelerator.

aws ec2 run-instances --image-id ami-image ID --instance-type m5.large --subnet-id subnet-subnet ID --elastic-inference-accelerator Type=eia2.large,Count=2 Type=eia2.xlarge --key-name key pair name --region region name --security-group-ids sg-security group ID --iam-instance-profile Name="accelerator profile name"

2. When the run-instances operation succeeds, your output is similar to the following. The ElasticInferenceAcceleratorArn identifies the Elastic Inference accelerator.

"ElasticInferenceAcceleratorAssociations": [
    {
        "ElasticInferenceAcceleratorArn": "arn:aws:elastic-inference:us-west-2:204044812891:elastic-inference-accelerator/eia-3e1de7c2f64a4de8b970c205e838af6b",
        "ElasticInferenceAcceleratorAssociationId": "eia-assoc-031f6f53ddcd5f260",
        "ElasticInferenceAcceleratorAssociationState": "associating",
        "ElasticInferenceAcceleratorAssociationTime": "2018-10-05T17:22:20.000Z"
    }
],
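To confirm programmatically that the accelerators were requested, you can read the associations out of the run-instances JSON. The fragment above is nested inside each entry of the response's `Instances` list; the sketch below assumes that layout, reuses the values shown above as sample data, and uses hypothetical helper names.

```python
def accelerator_associations(run_instances_output):
    """Return (arn, state) pairs for every accelerator in a run-instances response."""
    pairs = []
    for instance in run_instances_output.get("Instances", []):
        for assoc in instance.get("ElasticInferenceAcceleratorAssociations", []):
            pairs.append((
                assoc["ElasticInferenceAcceleratorArn"],
                assoc["ElasticInferenceAcceleratorAssociationState"],
            ))
    return pairs


def accelerator_id(arn):
    """The accelerator ID is the part of the ARN after the last slash."""
    return arn.rsplit("/", 1)[1]


sample = {
    "Instances": [
        {
            "ElasticInferenceAcceleratorAssociations": [
                {
                    "ElasticInferenceAcceleratorArn": "arn:aws:elastic-inference:us-west-2:204044812891:elastic-inference-accelerator/eia-3e1de7c2f64a4de8b970c205e838af6b",
                    "ElasticInferenceAcceleratorAssociationId": "eia-assoc-031f6f53ddcd5f260",
                    "ElasticInferenceAcceleratorAssociationState": "associating",
                    "ElasticInferenceAcceleratorAssociationTime": "2018-10-05T17:22:20.000Z",
                }
            ]
        }
    ]
}

for arn, state in accelerator_associations(sample):
    print(accelerator_id(arn), state)
```

A state of "associating" is expected immediately after launch; the association completes as the instance starts.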

You are now ready to run your models using TensorFlow, MXNet, or PyTorch on the provided AMI.

Once your Elastic Inference accelerator is running, you can inspect it with the describe-accelerators AWS CLI command.

This command returns information about the accelerator, such as the Region it is in and the name of the accelerator. For more information about the usage of this command, see the Elastic Inference AWS CLI Command Reference.

Using TensorFlow Models with Elastic Inference

Amazon Elastic Inference (Elastic Inference) is available only on instances that were launched with an Elastic Inference accelerator.

The Elastic Inference enabled version of TensorFlow allows you to use Elastic Inference accelerators with minimal changes to your TensorFlow code.

Topics

• Elastic Inference Enabled TensorFlow (p. 11)

• Additional Requirements and Considerations (p. 12)

• TensorFlow Elastic Inference with Python (p. 12)

• TensorFlow 2 Elastic Inference with Python (p. 21)

Elastic Inference Enabled TensorFlow

Preinstalled EI Enabled TensorFlow

The Elastic Inference enabled packages are available in the AWS Deep Learning AMI. AWS Deep Learning AMIs come with a supported TensorFlow version and ei_for_tf preinstalled. Elastic Inference enabled TensorFlow 2 requires AWS Deep Learning AMI v28 or later. Docker container options are also available; see Using Amazon Deep Learning Containers With Elastic Inference (p. 76).


Installing EI Enabled TensorFlow

If you're not using an AWS Deep Learning AMI instance, you can download the packages from the Amazon S3 bucket and build them into your own Amazon Linux or Ubuntu AMIs.

Install ei_for_tf.

pip install -U ei_for_tf*.whl

If the TensorFlow version is lower than the required version, pip upgrades TensorFlow to the appropriate version. If the TensorFlow version is higher than the required version, there will be a warning about the incompatibility. Your program fails at run-time if the TensorFlow version incompatibility isn’t fixed.
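Before installing, you can classify the installed TensorFlow version against the version required by your ei_for_tf wheel, matching the three outcomes described above. This sketch makes two assumptions: the required version is a string you read yourself from ei_for_tf_compatibility.txt (its format is not parsed here), and an exact match is what counts as compatible. The helper names are hypothetical.

```python
def parse_version(text):
    """Turn '2.3.1' or '2.3.0rc1' into a 3-tuple of ints such as (2, 3, 1)."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:  # pad so '2.3' compares equal to '2.3.0'
        parts.append(0)
    return tuple(parts)


def compatibility_status(installed, required):
    """'upgrade': installed is older, pip upgrades TensorFlow with ei_for_tf.
    'too_new': installed is newer, expect a warning and a run-time failure.
    'ok': the versions match."""
    have, need = parse_version(installed), parse_version(required)
    if have < need:
        return "upgrade"
    if have > need:
        return "too_new"
    return "ok"


print(compatibility_status("2.4.1", "2.3.0"))
```

On the instance, `importlib.metadata.version("tensorflow")` (Python 3.8 or later) returns the installed version string to pass as `installed`.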

Additional Requirements and Considerations

TensorFlow 2.0 Differences

Starting with TensorFlow 2.0, the Elastic Inference package is a separate pip wheel, instead of an enhanced TensorFlow pip wheel. The prefix for import statements for the Elastic Inference specific API has changed from tensorflow.contrib.ei to ei_for_tf.
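Code that must run on both package layouts can choose the import path from the TensorFlow major version. In this sketch the two module paths for the `EIPredictor` class are assumptions built from the prefix change described above, so verify them against your installed package; `load_ei_predictor` returns None when the package is absent instead of failing at import time.

```python
import importlib

# Assumed module paths for EIPredictor before and after the prefix change.
PREDICTOR_MODULES = {
    1: "tensorflow.contrib.ei.python.predictor.ei_predictor",
    2: "ei_for_tf.python.predictor.ei_predictor",
}


def predictor_module_for(tf_version):
    """Pick the import path based on the TensorFlow major version."""
    major = int(tf_version.split(".")[0])
    return PREDICTOR_MODULES[2] if major >= 2 else PREDICTOR_MODULES[1]


def load_ei_predictor(tf_version):
    """Return the EIPredictor class, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(predictor_module_for(tf_version))
    except ImportError:
        return None
    return getattr(module, "EIPredictor", None)


print(predictor_module_for("2.0.0"))
```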

To see the compatible TensorFlow version for a specific ei_for_tf version, see the ei_for_tf_compatibility.txt file in the Amazon S3 bucket.

Model Formats Supported

Elastic Inference supports the TensorFlow saved_model format via TensorFlow Serving.

Warmup

Elastic Inference TensorFlow Serving provides a warmup feature to preload models and reduce the delay that is typical of the first inference request. Amazon Elastic Inference TensorFlow Serving only supports warming up the "serving_default" signature definition.

Amazon Elastic Inference supports SageMaker Neo compiled TensorFlow models

Amazon Elastic Inference supports TensorFlow 2 models optimized by SageMaker Neo. A pre-trained TensorFlow model can be compiled in SageMaker Neo with EIA as the target device. The resulting model artifacts can be used for inference on Elastic Inference accelerators. This functionality only works for ei_for_tf version 1.6 and later. For more information, see Use Elastic Inference with SageMaker Neo compiled models (p. 30).

TensorFlow Elastic Inference with Python

With Elastic Inference TensorFlow Serving, the standard TensorFlow Serving interface remains unchanged. The only difference is that the entry point is a different binary named amazonei_tensorflow_model_server.
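Since only the binary name changes, the launch command and client requests can follow standard TensorFlow Serving conventions. This sketch assumes amazonei_tensorflow_model_server accepts the stock flags (`--model_name`, `--model_base_path`, `--port`, `--rest_api_port`) and builds a request for the standard REST predict API. The model name, path, and input values are hypothetical placeholders, and `serving_default` is used as the signature.

```python
import json


def model_server_command(model_name, model_base_path, grpc_port=9000, rest_port=None):
    """argv for the EI-enabled TensorFlow Serving binary.

    model_base_path must be an absolute directory containing numbered
    version subdirectories, as with standard TensorFlow Serving.
    """
    argv = [
        "amazonei_tensorflow_model_server",
        f"--model_name={model_name}",
        f"--model_base_path={model_base_path}",
        f"--port={grpc_port}",
    ]
    if rest_port is not None:
        argv.append(f"--rest_api_port={rest_port}")
    return argv


def predict_request(model_name, instances, host="localhost", rest_port=8501,
                    signature_name="serving_default"):
    """URL and JSON body for the TensorFlow Serving REST predict API."""
    url = f"http://{host}:{rest_port}/v1/models/{model_name}:predict"
    body = json.dumps({"signature_name": signature_name, "instances": instances})
    return url, body


argv = model_server_command("ssdresnet", "/tmp/ssd_resnet", rest_port=8501)
url, body = predict_request("ssdresnet", [[0.0, 0.0, 0.0]])
print(" ".join(argv))
print(url)
```

POST `body` to `url` with a Content-Type of application/json once the server reports that the model is loaded.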

TensorFlow Serving and Predictor are the only inference modes that Elastic Inference supports. If you haven't tried TensorFlow Serving before, we recommend that you try the TensorFlow Serving tutorial first.

This release of Elastic Inference TensorFlow Serving has been tested to perform well and provide cost-saving benefits with the following deep learning use cases and network architectures (and similar variants):


Use Case | Example Network Topology
Image Recognition | Inception, ResNet, MVCNN
Object Detection | SSD, RCNN
Neural Machine Translation | GNMT

Note

These tutorials assume that you are using a DLAMI of v26 or later with Elastic Inference enabled TensorFlow.

Topics

• Activate the Tensorflow Elastic Inference Environment

• Use Elastic Inference with TensorFlow Serving

• Use Elastic Inference with the TensorFlow EIPredictor API

• Use Elastic Inference with TensorFlow Predictor Example

• Use Elastic Inference with the TensorFlow Keras API

Activate the Tensorflow Elastic Inference Environment

1. Activate the TensorFlow Elastic Inference environment for your Python version:

• (Option for Python 3) Activate the Python 3 TensorFlow Elastic Inference environment:

$ source activate amazonei_tensorflow_p36

• (Option for Python 2) Activate the Python 2.7 TensorFlow Elastic Inference environment:

$ source activate amazonei_tensorflow_p27

2. The remaining parts of this guide assume you are using the amazonei_tensorflow_p27 environment.

If you are switching between Elastic Inference enabled MXNet, TensorFlow, or PyTorch environments, you must stop and then start your instance to reattach the Elastic Inference accelerator. Rebooting is not sufficient because the process requires a complete shutdown.

Use Elastic Inference with TensorFlow Serving

The following is an example of serving a Single Shot Detector (SSD) with a ResNet backbone.

Serve and Test Inference with an SSD Model

1. Download the model.

curl -O https://s3-us-west-2.amazonaws.com/aws-tf-serving-ei-example/ssd_resnet.zip

2. Unzip the model.

unzip ssd_resnet.zip -d /tmp

3. Download a picture of three dogs to your current directory.

curl -O https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/3dogs.jpg

4. Use the built-in EI Tool to get the device ordinal number of all attached Elastic Inference accelerators. For more information on EI Tool, see Monitoring Elastic Inference Accelerators.

/opt/amazon/ei/ei_tools/bin/ei describe-accelerators --json

Your output should look like the following:

{
  "ei_client_version": "1.5.0",
  "time": "Fri Nov 1 03:09:38 2019",
  "attached_accelerators": 2,
  "devices": [
    {
      "ordinal": 0,
      "type": "eia1.xlarge",
      "id": "eia-679e4c622d584803aed5b42ab6a97706",
      "status": "healthy"
    },
    {
      "ordinal": 1,
      "type": "eia1.xlarge",
      "id": "eia-6c414c6ee37a4d93874afc00825c2f28",
      "status": "healthy"
    }
  ]
}

5. Navigate to the folder where AmazonEI_TensorFlow_Serving is installed and run the following command to launch the server. Set EI_VISIBLE_DEVICES to the device ordinal or device ID of the attached Elastic Inference accelerator that you want to use. The server then sees this accelerator as device 0. For more information on EI_VISIBLE_DEVICES, see Monitoring Elastic Inference Accelerators. Note that model_base_path must be an absolute path.

EI_VISIBLE_DEVICES=<ordinal number> amazonei_tensorflow_model_server --model_name=ssdresnet --model_base_path=/tmp/ssd_resnet50_v1_coco --port=9000

6. While the server is running in the foreground, open a new terminal session and activate the TensorFlow Elastic Inference environment.

source activate amazonei_tensorflow_p27

7. Use your preferred text editor to create a script named ssd_resnet_client.py with the following content. This script takes an image filename as a parameter and gets a prediction result from the pretrained model.

from __future__ import print_function
import grpc
import tensorflow as tf
from PIL import Image
import numpy as np
import time
import os
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

tf.app.flags.DEFINE_string('server', 'localhost:9000', 'PredictionService host:port')
tf.app.flags.DEFINE_string('image', '', 'path to image in JPEG format')
FLAGS = tf.app.flags.FLAGS

coco_classes_txt = "https://raw.githubusercontent.com/amikelive/coco-labels/master/coco-labels-paper.txt"
local_coco_classes_txt = "/tmp/coco-labels-paper.txt"
# Download the COCO class labels to a local file
os.system("curl -o %s -O %s" % (local_coco_classes_txt, coco_classes_txt))
NUM_PREDICTIONS = 5
# Index 0 is reserved for "No Class", so class ID n maps to line n of the labels file
with open(local_coco_classes_txt) as f:
    classes = ["No Class"] + [line.strip() for line in f.readlines()]


def main(_):
    channel = grpc.insecure_channel(FLAGS.server)
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
    # Send request
    with Image.open(FLAGS.image) as f:
        f.load()
        # See prediction_service.proto for gRPC request/response details.
        data = np.asarray(f)
        data = np.expand_dims(data, axis=0)
        request = predict_pb2.PredictRequest()
        request.model_spec.name = 'ssdresnet'
        request.inputs['inputs'].CopyFrom(
            tf.contrib.util.make_tensor_proto(data, shape=data.shape))
        result = stub.Predict(request, 60.0)  # 60-second timeout
        outputs = result.outputs
        detection_classes = outputs["detection_classes"]
        detection_classes = tf.make_ndarray(detection_classes)
        num_detections = int(tf.make_ndarray(outputs["num_detections"])[0])
        print("%d detection[s]" % (num_detections))
        class_label = [classes[int(x)]
                       for x in detection_classes[0][:num_detections]]
        print("SSD Prediction is ", class_label)


if __name__ == '__main__':
    tf.app.run()

8. Run the script, passing the server location and port and the dog photo's filename as parameters.

python ssd_resnet_client.py --server=localhost:9000 --image 3dogs.jpg
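Steps 4 and 5 above can also be scripted. The following is a minimal sketch that assumes the describe-accelerators JSON layout shown in step 4 and the server flags shown in step 5. The helper names (describe_accelerators, healthy_ordinals, build_server_command) are hypothetical, and the sketch only builds the server command instead of starting the server.

```python
import json
import os
import subprocess

EI_TOOL = "/opt/amazon/ei/ei_tools/bin/ei"


def describe_accelerators():
    """Run the EI Tool and return its JSON output as a string."""
    return subprocess.check_output([EI_TOOL, "describe-accelerators", "--json"],
                                   universal_newlines=True)


def healthy_ordinals(describe_json):
    """Hypothetical helper: ordinals of accelerators whose status is "healthy"."""
    info = json.loads(describe_json)
    return [d["ordinal"] for d in info.get("devices", [])
            if d.get("status") == "healthy"]


def build_server_command(ordinal, model_name, model_base_path, port=9000):
    """Hypothetical helper: environment and argv for the EI model server.

    EI_VISIBLE_DEVICES selects one accelerator; the server then sees it as device 0.
    """
    if not os.path.isabs(model_base_path):
        raise ValueError("model_base_path must be an absolute path")
    env = dict(os.environ, EI_VISIBLE_DEVICES=str(ordinal))
    argv = [
        "amazonei_tensorflow_model_server",
        "--model_name=%s" % model_name,
        "--model_base_path=%s" % model_base_path,
        "--port=%d" % port,
    ]
    return env, argv
```

On an instance with an attached accelerator, you could pass the returned env and argv to subprocess.Popen to start the server from Python.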

Use Elastic Inference with the TensorFlow EIPredictor API

Elastic Inference TensorFlow packages for Python 2 and 3 provide an EIPredictor API. This API function provides you with a flexible way to run models on Elastic Inference accelerators as an alternative to using TensorFlow Serving. The EIPredictor API provides a simple interface to perform repeated inference on a pretrained model. The following code sample shows the available parameters.

Note

accelerator_id should be set to the device's ordinal number, not its ID.

ei_predictor = EIPredictor(model_dir,
                           signature_def_key=None,
                           signature_def=None,
                           input_names=None,
                           output_names=None,
                           tags=None,
                           graph=None,
                           config=None,
                           use_ei=True,
                           accelerator_id=<device ordinal number>)

output_dict = ei_predictor(feed_dict)
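Because accelerator_id takes the device ordinal rather than the eia-... ID, you may need to translate an ID you already know. The following is a minimal sketch that assumes the describe-accelerators JSON layout shown earlier in this guide; ordinal_for_id is a hypothetical helper, not part of the EIPredictor API.

```python
import json


def ordinal_for_id(describe_json, accelerator_id):
    """Hypothetical helper: map an accelerator ID (eia-...) to its device ordinal.

    describe_json is the output of `ei describe-accelerators --json`.
    Raises KeyError if no attached accelerator has that ID.
    """
    for device in json.loads(describe_json).get("devices", []):
        if device.get("id") == accelerator_id:
            return int(device["ordinal"])
    raise KeyError("no attached accelerator with id %r" % accelerator_id)
```

The result would then be passed as accelerator_id=ordinal_for_id(describe_json, "eia-...") when constructing the EIPredictor.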

EIPredictor can be used in the following ways:

# EIPredictor picks inputs and outputs from the default serving signature def
# with tag "serve" (similar to TF predictor)
ei_predictor = EIPredictor(model_dir)

# EIPredictor picks inputs and outputs from the signature def selected by
# signature_def_key (similar to TF predictor)
ei_predictor = EIPredictor(model_dir, signature_def_key='predict')

# signature_def can be provided directly (similar to TF predictor)
ei_predictor = EIPredictor(model_dir, signature_def=sig_def)

# You provide the input_names and output_names dicts (similar to TF predictor)
ei_predictor = EIPredictor(model_dir, input_names, output_names)

# tags is used to get the correct signature def (similar to TF predictor)
ei_predictor = EIPredictor(model_dir, tags='serve')

Additional EI Predictor functionality includes:

• Support for frozen models.

# For frozen graphs, model_dir takes a file name, and
# input_names and output_names must be provided.
ei_predictor = EIPredictor(model_dir,
                           input_names=input_names,
                           output_names=output_names)

• Ability to disable use of Elastic Inference by using the use_ei flag, which defaults to True. This is useful for testing EIPredictor against TensorFlow Predictor.

• EIPredictor can also be created from a TensorFlow Estimator. Given a trained Estimator, you can first export a SavedModel. See the SavedModel documentation for more details. The following shows example usage:

saved_model_dir = estimator.export_savedmodel(my_export_dir, serving_input_fn)
ei_predictor = EIPredictor(saved_model_dir)

# Once the EIPredictor is created, run inference as follows:
output_dict = ei_predictor(feed_dict)
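Because use_ei defaults to True, you can build two predictors from the same model, one of them with use_ei=False, and compare their outputs on the same feed_dict. The following is a minimal sketch that assumes each output dict maps names to nested lists or NumPy arrays. outputs_match is a hypothetical helper, and the tolerance is an example value: results computed on an accelerator can differ from CPU results in the last digits, as the Keras example output later in this guide shows.

```python
import math


def _flatten(value):
    """Yield scalars from nested lists, tuples, or NumPy arrays."""
    if hasattr(value, "tolist"):  # NumPy arrays and scalars
        value = value.tolist()
    if isinstance(value, (list, tuple)):
        for item in value:
            for scalar in _flatten(item):
                yield scalar
    else:
        yield value


def outputs_match(out_a, out_b, abs_tol=1e-4):
    """Hypothetical helper: True if both dicts have the same keys and close values."""
    if set(out_a) != set(out_b):
        return False
    for key in out_a:
        a = list(_flatten(out_a[key]))
        b = list(_flatten(out_b[key]))
        if len(a) != len(b):
            return False
        if not all(math.isclose(x, y, rel_tol=0.0, abs_tol=abs_tol)
                   for x, y in zip(a, b)):
            return False
    return True
```

For example, outputs_match(ei_out, cpu_out), where ei_out comes from EIPredictor(model_dir)(feed_dict) and cpu_out comes from EIPredictor(model_dir, use_ei=False)(feed_dict).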


Use Elastic Inference with TensorFlow Predictor Example

Installing Elastic Inference TensorFlow

Elastic Inference enabled TensorFlow comes bundled in the AWS Deep Learning AMI. You can also download pip wheels for Python 2 and 3 from the Elastic Inference S3 bucket. Follow these instructions to download and install the pip package:

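1. Choose the tar file for the Python version and operating system of your choice from the S3 bucket. Copy the path to the tar file and run the following command:

curl -O [URL of the tar file of your choice]

2. To extract the tar file, run the following command:

tar -xvzf [name of tar file]

3. Install the wheel using pip:

pip install -U [name of untarred folder]/[name of tensorflow whl]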

Try the following example to run inference with a Single Shot Detector (SSD) model that uses a ResNet backbone.

Serve and Test Inference with an SSD Model

1. Download the model. If you already downloaded the model in the Serving example, skip this step.

curl -O https://s3-us-west-2.amazonaws.com/aws-tf-serving-ei-example/ssd_resnet.zip

2. Unzip the model. Again, you can skip this step if you already have the model.

unzip ssd_resnet.zip -d /tmp

3. Download a picture of three dogs to your current directory.

curl -O https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/3dogs.jpg

4. Use the built-in EI Tool to get the device ordinal number of all attached Elastic Inference accelerators. For more information on EI Tool, see Monitoring Elastic Inference Accelerators.

/opt/amazon/ei/ei_tools/bin/ei describe-accelerators --json

Your output should look like the following:

{
  "ei_client_version": "1.5.0",
  "time": "Fri Nov 1 03:09:38 2019",
  "attached_accelerators": 2,
  "devices": [
    {
      "ordinal": 0,
      "type": "eia1.xlarge",
      "id": "eia-679e4c622d584803aed5b42ab6a97706",
      "status": "healthy"
    },
    {
      "ordinal": 1,
      "type": "eia1.xlarge",
      "id": "eia-6c414c6ee37a4d93874afc00825c2f28",
      "status": "healthy"
    }
  ]
}

You use the device ordinal of your desired Elastic Inference accelerator to create a Predictor.

5. Open a text editor, such as vim, and paste the following inference script. Replace the accelerator_id value with the device ordinal of the desired Elastic Inference accelerator. This value must be an integer. Save the file as ssd_resnet_predictor.py.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import sys
import numpy as np
import tensorflow as tf
import matplotlib.image as mpimg
from tensorflow.contrib.ei.python.predictor.ei_predictor import EIPredictor

tf.app.flags.DEFINE_string('image', '', 'path to image in JPEG format')
FLAGS = tf.app.flags.FLAGS

coco_classes_txt = "https://raw.githubusercontent.com/amikelive/coco-labels/master/coco-labels-paper.txt"
local_coco_classes_txt = "/tmp/coco-labels-paper.txt"
# Download the COCO class labels to a local file
os.system("curl -o %s -O %s" % (local_coco_classes_txt, coco_classes_txt))
NUM_PREDICTIONS = 5
# Index 0 is reserved for "No Class", so class ID n maps to line n of the labels file
with open(local_coco_classes_txt) as f:
    classes = ["No Class"] + [line.strip() for line in f.readlines()]


def get_output(eia_predictor, test_input):
    pred = None
    # Run inference NUM_PREDICTIONS times; only the last result is printed
    for curpred in range(NUM_PREDICTIONS):
        pred = eia_predictor(test_input)
    num_detections = int(pred["num_detections"])
    print("%d detection[s]" % (num_detections))
    detection_classes = pred["detection_classes"][0][:num_detections]
    print([classes[int(i)] for i in detection_classes])


def main(_):
    img = mpimg.imread(FLAGS.image)
    img = np.expand_dims(img, axis=0)
    ssd_resnet_input = {'inputs': img}

    print('Running SSD Resnet on EIPredictor using specified input and outputs')
    eia_predictor = EIPredictor(
        model_dir='/tmp/ssd_resnet50_v1_coco/1/',
        input_names={"inputs": "image_tensor:0"},
        output_names={"detection_classes": "detection_classes:0",
                      "num_detections": "num_detections:0",
                      "detection_boxes": "detection_boxes:0"},
        accelerator_id=<device ordinal>
    )
    get_output(eia_predictor, ssd_resnet_input)

    print('Running SSD Resnet on EIPredictor using default Signature Def')
    eia_predictor = EIPredictor(
        model_dir='/tmp/ssd_resnet50_v1_coco/1/',
    )
    get_output(eia_predictor, ssd_resnet_input)


if __name__ == "__main__":
    tf.app.run()

6. Run the inference script.

python ssd_resnet_predictor.py --image 3dogs.jpg
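In the script above, input_names and output_names map the keys used in feed_dict and in the returned dict to graph tensor names such as image_tensor:0. TensorFlow tensor names follow the "<operation name>:<output index>" convention. The following is a minimal sketch of hypothetical helpers that check such a mapping before you construct the predictor; they only validate the string format and do not look inside the model.

```python
def split_tensor_name(tensor_name):
    """Hypothetical helper: split "image_tensor:0" into ("image_tensor", 0)."""
    op_name, sep, index = tensor_name.rpartition(":")
    if not sep or not op_name or not index.isdigit():
        raise ValueError("expected '<op_name>:<index>', got %r" % tensor_name)
    return op_name, int(index)


def check_name_map(name_map):
    """Validate an input_names/output_names dict; return {key: (op_name, index)}."""
    return {key: split_tensor_name(value) for key, value in name_map.items()}
```

Using rpartition keeps operation names that contain scope separators, such as a/b/c:1, intact.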

For more tutorials and examples, see the TensorFlow Python API.

Use Elastic Inference with the TensorFlow Keras API

The Keras API has become an integral part of the machine learning development cycle because of its simplicity and ease of use. Keras enables rapid prototyping and development of machine learning constructs. Elastic Inference provides an API that offers native support for Keras. Using this API, you can directly use your Keras model, h5 file, and weights to instantiate a Keras-like object. This object supports the native Keras prediction APIs while using Elastic Inference in the backend. The following code sample shows the available parameters:

EIKerasModel(model, weights=None, export_dir=None):
    """Constructs an `EIKerasModel` instance.

    Args:
        model: A model object that either has its weights already set, or will be
            set with the weights argument. Can also be a model file that can be loaded.
        weights (Optional): A weights object, or a weights file that can be loaded,
            that will be set on the model object.
        export_dir: A folder location to save your model as a SavedModelBundle.

    Raises:
        RuntimeError: If eager execution is enabled.
    """

EIKerasModel can be used as follows:

# Loading from a Keras Model object
from tensorflow.contrib.ei.python.keras.ei_keras import EIKerasModel

model = Model()  # Build the Keras model in the normal fashion
x = ...  # input data
ei_model = EIKerasModel(model)  # Only additional step to use EI
res = ei_model.predict(x)

# Loading from a Keras h5 file
from tensorflow.contrib.ei.python.keras.ei_keras import EIKerasModel

x = ...  # input data
ei_model = EIKerasModel("keras_model.h5")  # Only additional step to use EI
res = ei_model.predict(x)

# Loading from a Keras model file and a weights file
from tensorflow.contrib.ei.python.keras.ei_keras import EIKerasModel

x = ...  # input data
ei_model = EIKerasModel("keras_model.json", weights="keras_weights.h5")  # Only additional step to use EI
res = ei_model.predict(x)

Additionally, Elastic Inference enabled Keras includes Predict API Support:

tf.keras

def predict(x,
            batch_size=None,
            verbose=0,
            steps=None,
            max_queue_size=10,         # Not supported
            workers=1,                 # Not supported
            use_multiprocessing=False  # Not supported
            )

Native Keras

def predict(x,
            batch_size=None,
            verbose=0,
            steps=None,
            callbacks=None  # Not supported
            )
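Because the arguments marked "Not supported" are not supported by Elastic Inference enabled Keras, code shared with plain Keras may want to reject non-default values explicitly instead of relying on how EIKerasModel handles them. The following is a minimal sketch; ei_safe_predict is a hypothetical wrapper, and raising an error is a design choice of this sketch, not documented Elastic Inference behavior.

```python
# Defaults of the predict arguments marked "Not supported" above.
UNSUPPORTED_PREDICT_DEFAULTS = {
    "max_queue_size": 10,
    "workers": 1,
    "use_multiprocessing": False,
    "callbacks": None,
}
SUPPORTED_PREDICT_ARGS = ("batch_size", "verbose", "steps")


def ei_safe_predict(ei_model, x, **kwargs):
    """Hypothetical wrapper: call ei_model.predict, rejecting unsupported arguments."""
    for name, default in UNSUPPORTED_PREDICT_DEFAULTS.items():
        if name in kwargs and kwargs.pop(name) != default:
            raise ValueError("%s is not supported by EIKerasModel.predict" % name)
    unknown = sorted(set(kwargs) - set(SUPPORTED_PREDICT_ARGS))
    if unknown:
        raise TypeError("unexpected predict arguments: %s" % unknown)
    # Only batch_size, verbose, and steps are forwarded.
    return ei_model.predict(x, **kwargs)
```

Passing an unsupported argument with its default value (for example, workers=1) is accepted and silently dropped, so existing Keras call sites keep working.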

TensorFlow Keras API Example

In this example, you use a trained ResNet-50 model to classify an image of an African Elephant from ImageNet.

Test Inference with a Keras Model

1. Activate the Elastic Inference TensorFlow Conda Environment

source activate amazonei_tensorflow_p27

2. Download an image of an African Elephant to your current directory.

curl -O https://upload.wikimedia.org/wikipedia/commons/5/59/Serengeti_Elefantenbulle.jpg

3. Open a text editor, such as vim, and paste the following inference script. Save the file as test_keras.py.

# Resnet Example
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions
from tensorflow.contrib.ei.python.keras.ei_keras import EIKerasModel
import numpy as np
import time
import os

ITERATIONS = 20

model = ResNet50(weights='imagenet')
ei_model = EIKerasModel(model)

folder_name = os.path.dirname(os.path.abspath(__file__))
img_path = folder_name + '/Serengeti_Elefantenbulle.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

# Warm up both models
_ = model.predict(x)
_ = ei_model.predict(x)

# Benchmark both models
for each in range(ITERATIONS):
    start = time.time()
    preds = model.predict(x)
    print("Vanilla iteration %d took %f" % (each, time.time() - start))
for each in range(ITERATIONS):
    start = time.time()
    ei_preds = ei_model.predict(x)
    print("EI iteration %d took %f" % (each, time.time() - start))

# decode the results into a list of tuples (class, description, probability)
# (one such list for each sample in the batch)
print('Predicted:', decode_predictions(preds, top=3)[0])
print('EI Predicted:', decode_predictions(ei_preds, top=3)[0])

4. Run the inference script.

python test_keras.py

5. Your output should be a list of predictions and their respective confidence scores.

('Predicted:', [(u'n02504458', u'African_elephant', 0.9081173), (u'n01871265', u'tusker', 0.07836755), (u'n02504013', u'Indian_elephant', 0.011482777)])

('EI Predicted:', [(u'n02504458', u'African_elephant', 0.90811676), (u'n01871265', u'tusker', 0.07836751), (u'n02504013', u'Indian_elephant', 0.011482781)])
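The two result lists above name the same classes in the same order and differ only in the later digits of the probabilities. The following is a minimal sketch of a hypothetical check over decode_predictions-style output, that is, lists of (class_id, description, probability) tuples; the tolerance is an example value, not a documented threshold.

```python
import math


def top_k_agree(preds_a, preds_b, abs_tol=1e-4):
    """Hypothetical helper: True if two decoded top-k lists name the same classes
    in the same order and their probabilities differ by at most abs_tol."""
    if len(preds_a) != len(preds_b):
        return False
    for (id_a, _, p_a), (id_b, _, p_b) in zip(preds_a, preds_b):
        if id_a != id_b or not math.isclose(p_a, p_b, rel_tol=0.0, abs_tol=abs_tol):
            return False
    return True


# Values copied from the sample output above
vanilla = [("n02504458", "African_elephant", 0.9081173),
           ("n01871265", "tusker", 0.07836755),
           ("n02504013", "Indian_elephant", 0.011482777)]
ei = [("n02504458", "African_elephant", 0.90811676),
      ("n01871265", "tusker", 0.07836751),
      ("n02504013", "Indian_elephant", 0.011482781)]
print(top_k_agree(vanilla, ei))  # True: the largest difference is about 5.4e-7
```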

For more tutorials and examples, see the TensorFlow Python API.

TensorFlow 2 Elastic Inference with Python

With Elastic Inference TensorFlow 2 Serving, the standard TensorFlow 2 Serving interface remains unchanged. The only difference is that the entry point is a different binary named amazonei_tensorflow2_model_server.

TensorFlow 2 Serving and Predictor are the only inference modes that Elastic Inference supports. If you haven't tried TensorFlow 2 Serving before, we recommend that you try the TensorFlow Serving tutorial first.

This release of Elastic Inference TensorFlow Serving has been tested to perform well and provide cost-saving benefits with the following deep learning use cases and network architectures (and similar variants):

Use Case | Example Network Topology
Image Recognition | Inception, ResNet, MVCNN
Object Detection | SSD, RCNN
Neural Machine Translation | GNMT

Note

These tutorials assume that you are using a DLAMI of v42 or later with Elastic Inference enabled TensorFlow 2.

Topics

• Activate the Tensorflow 2 Elastic Inference Environment

• Use Elastic Inference with TensorFlow 2 Serving

• Use Elastic Inference with the TensorFlow 2 EIPredictor API

• Use Elastic Inference with TensorFlow 2 Predictor Example

• Use Elastic Inference with the TensorFlow 2 Keras API

• Use Elastic Inference with SageMaker Neo compiled models

Activate the Tensorflow 2 Elastic Inference Environment

Activate the Python 3 TensorFlow 2 Elastic Inference environment:

$ source activate amazonei_tensorflow2_p36

Use Elastic Inference with TensorFlow 2 Serving

The following is an example of serving a Single Shot Detector (SSD) with a ResNet backbone.

To serve and test inference with an SSD model

1. Download the model.

curl -O https://s3-us-west-2.amazonaws.com/aws-tf-serving-ei-example/ssd_resnet.zip

2. Unzip the model.

unzip ssd_resnet.zip -d /tmp

3. Download a picture of three dogs to your current directory.

curl -O https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/3dogs.jpg

4. Use the built-in EI Tool to get the device ordinal number of all attached Elastic Inference accelerators. For more information on EI Tool, see Monitoring Elastic Inference Accelerators.

/opt/amazon/ei/ei_tools/bin/ei describe-accelerators --json

Your output should look like the following:

{
  "ei_client_version": "1.5.0",
  "time": "Fri Nov 1 03:09:38 2019",
  "attached_accelerators": 2,
  "devices": [
    {
      "ordinal": 0,
      "type": "eia1.xlarge",
      "id": "eia-679e4c622d584803aed5b42ab6a97706",
      "status": "healthy"
    },
    {
      "ordinal": 1,
      "type": "eia1.xlarge",
      "id": "eia-6c414c6ee37a4d93874afc00825c2f28",
      "status": "healthy"
    }
  ]
}

5. Navigate to the folder where AmazonEI_TensorFlow_Serving is installed and run the following command to launch the server. Set EI_VISIBLE_DEVICES to the device ordinal or device ID of the attached Elastic Inference accelerator that you want to use. The server then sees this accelerator as device 0. Note that model_base_path must be an absolute path. For more information on EI_VISIBLE_DEVICES, see Monitoring Elastic Inference Accelerators.

EI_VISIBLE_DEVICES=<ordinal number> amazonei_tensorflow2_model_server --model_name=ssdresnet --model_base_path=/tmp/ssd_resnet50_v1_coco --port=9000

6. While the server is running in the foreground, open a new terminal session and activate the TensorFlow 2 Elastic Inference environment.

source activate amazonei_tensorflow2_p36

7. Use your preferred text editor to create a script named ssd_resnet_client.py with the following content. This script takes an image filename as a parameter and gets a prediction result from the pretrained model.

from __future__ import print_function
import grpc
import tensorflow as tf
from PIL import Image
import numpy as np
import time
import os
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

tf.compat.v1.app.flags.DEFINE_string('server', 'localhost:9000', 'PredictionService host:port')
tf.compat.v1.app.flags.DEFINE_string('image', '', 'path to image in JPEG format')
FLAGS = tf.compat.v1.app.flags.FLAGS

coco_classes_txt = "https://raw.githubusercontent.com/amikelive/coco-labels/master/coco-labels-paper.txt"
local_coco_classes_txt = "/tmp/coco-labels-paper.txt"
# Download the COCO class labels to a local file
os.system("curl -o %s -O %s" % (local_coco_classes_txt, coco_classes_txt))
NUM_PREDICTIONS = 5
# Index 0 is reserved for "No Class", so class ID n maps to line n of the labels file
with open(local_coco_classes_txt) as f:
    classes = ["No Class"] + [line.strip() for line in f.readlines()]


def main(_):
    channel = grpc.insecure_channel(FLAGS.server)
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
    # Send request
    with Image.open(FLAGS.image) as f:
        f.load()
        # See prediction_service.proto for gRPC request/response details.
        data = np.asarray(f)
        data = np.expand_dims(data, axis=0)
        request = predict_pb2.PredictRequest()
        request.model_spec.name = 'ssdresnet'
        request.inputs['inputs'].CopyFrom(
            tf.make_tensor_proto(data, shape=data.shape))
        result = stub.Predict(request, 60.0)  # 60-second timeout
        outputs = result.outputs
        detection_classes = outputs["detection_classes"]
        detection_classes = tf.make_ndarray(detection_classes)
        num_detections = int(tf.make_ndarray(outputs["num_detections"])[0])
        print("%d detection[s]" % (num_detections))
        class_label = [classes[int(x)]
                       for x in detection_classes[0][:num_detections]]
        print("SSD Prediction is ", class_label)


if __name__ == '__main__':
    tf.compat.v1.app.run()

8. Run the script, passing the server location and port and the dog photo's filename as parameters.

python ssd_resnet_client.py --server=localhost:9000 --image 3dogs.jpg
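The client in step 8 fails with a gRPC error if the model server from step 5 is not yet listening. Before running the client, you can wait for the port to accept TCP connections. The following is a minimal sketch using only the Python standard library; wait_for_port is a hypothetical helper, and an open port does not guarantee that the model has finished loading.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Hypothetical helper: return True once host:port accepts a TCP connection,
    or False if that does not happen within timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For the example above, wait_for_port("localhost", 9000) would be called before sending the first request.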

Use Elastic Inference with the TensorFlow 2 EIPredictor API

Elastic Inference TensorFlow packages for Python 3 provide an EIPredictor API. This API function provides you with a flexible way to run models on Elastic Inference accelerators as an alternative to using TensorFlow 2 Serving. The EIPredictor API provides a simple interface to perform repeated inference on a pretrained model. The following code sample shows the available parameters.

Note

accelerator_id should be set to the device's ordinal number, not its ID.

ei_predictor = EIPredictor(model_dir,
                           signature_def_key=None,
                           signature_def=None,
                           input_names=None,
                           output_names=None,
                           tags=None,
                           graph=None,
                           config=None,
                           use_ei=True,
                           accelerator_id=<device ordinal number>)

output_dict = ei_predictor(feed_dict)

You can use EIPredictor in the following ways:

# EIPredictor picks inputs and outputs from the default serving signature def
# with tag "serve" (similar to TF predictor)
ei_predictor = EIPredictor(model_dir)

# EIPredictor picks inputs and outputs from the signature def selected by
# signature_def_key (similar to TF predictor)
ei_predictor = EIPredictor(model_dir, signature_def_key='predict')

# signature_def can be provided directly (similar to TF predictor)
ei_predictor = EIPredictor(model_dir, signature_def=sig_def)

# You provide the input_names and output_names dicts (similar to TF predictor)
ei_predictor = EIPredictor(model_dir, input_names, output_names)

# tags is used to get the correct signature def (similar to TF predictor)
ei_predictor = EIPredictor(model_dir, tags='serve')

Additional EI Predictor functionality includes the following:

• Support for frozen models.

# For frozen graphs, model_dir takes a file name, and
# input_names and output_names must be provided.
ei_predictor = EIPredictor(model_dir,
                           input_names=input_names,
                           output_names=output_names)

• Ability to disable use of Elastic Inference by using the use_ei flag, which defaults to True. This is useful for testing EIPredictor against TensorFlow 2 Predictor.

• EIPredictor can also be created from a TensorFlow 2 Estimator. Given a trained Estimator, you can first export a SavedModel. See the SavedModel documentation for more details. The following shows example usage:

saved_model_dir = estimator.export_saved_model(my_export_dir, serving_input_fn)
ei_predictor = EIPredictor(saved_model_dir)

# Once the EIPredictor is created, run inference as follows:
output_dict = ei_predictor(feed_dict)

Use Elastic Inference with TensorFlow 2 Predictor Example

Installing Elastic Inference TensorFlow 2

Elastic Inference enabled TensorFlow 2 comes bundled in the AWS Deep Learning AMI. You can also download the pip wheels for Python 3 from the Elastic Inference S3 bucket. Follow these instructions to download and install the pip package:

1. Choose the tar file for the Python version and operating system of your choice from the S3 bucket.

Copy the path to the tar file and run the following command:

curl -O [URL of the tar file of your choice]

2. To extract the tar file, run the following command:

tar -xvzf [name of tar file]

3. Install the wheel using pip as shown in the following:

pip install -U [name of untarred folder]/[name of tensorflow whl]
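Step 3 needs the exact wheel file name from the extracted folder. The following is a minimal sketch that assumes the untarred folder contains exactly one .whl file; find_wheel and pip_install_command are hypothetical helpers, and the sketch only builds the pip command without running it.

```python
import glob
import os
import sys


def find_wheel(folder, pattern="*.whl"):
    """Hypothetical helper: return the single wheel in folder that matches pattern."""
    matches = sorted(glob.glob(os.path.join(folder, pattern)))
    if len(matches) != 1:
        raise FileNotFoundError("expected one wheel matching %r in %s, found %d"
                                % (pattern, folder, len(matches)))
    return matches[0]


def pip_install_command(wheel_path):
    """Build `pip install -U <wheel>` for the current Python interpreter."""
    return [sys.executable, "-m", "pip", "install", "-U", wheel_path]
```

Using sys.executable installs into whichever environment runs the script, so run it from the activated amazonei_tensorflow2_p36 environment; pass the result to subprocess.check_call to perform the install.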

To run inference with a Single Shot Detector (SSD) model that uses a ResNet backbone, try the following example.

To serve and test inference with an SSD model

1. Download and unzip the model. If you already have the model, skip this step.

curl -O https://s3-us-west-2.amazonaws.com/aws-tf-serving-ei-example/ssd_resnet.zip

unzip ssd_resnet.zip -d /tmp

2. Download a picture of three dogs to your current directory.

curl -O https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/3dogs.jpg

3. Use the built-in EI Tool to get the device ordinal number of all attached Elastic Inference accelerators. For more information on EI Tool, see Monitoring Elastic Inference Accelerators.

/opt/amazon/ei/ei_tools/bin/ei describe-accelerators --json

Your output should look like the following:

{
  "ei_client_version": "1.5.0",
  "time": "Fri Nov 1 03:09:38 2019",
  "attached_accelerators": 2,
  "devices": [
    {
      "ordinal": 0,
      "type": "eia1.xlarge",
      "id": "eia-679e4c622d584803aed5b42ab6a97706",
      "status": "healthy"
    },
    {
      "ordinal": 1,
      "type": "eia1.xlarge",
      "id": "eia-6c414c6ee37a4d93874afc00825c2f28",
      "status": "healthy"
    }
  ]
}

You use the device ordinal of your desired Elastic Inference accelerator to create a Predictor.

4. Open a text editor, such as vim, and paste the following inference script. Replace the accelerator_id value with the device ordinal of the desired Elastic Inference accelerator. This value must be an integer. Save the file as ssd_resnet_predictor.py.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import sys
import numpy as np
import tensorflow as tf
import matplotlib.image as mpimg
from ei_for_tf.python.predictor.ei_predictor import EIPredictor

tf.compat.v1.app.flags.DEFINE_string('image', '', 'path to image in JPEG format')
FLAGS = tf.compat.v1.app.flags.FLAGS

coco_classes_txt = "https://raw.githubusercontent.com/amikelive/coco-labels/master/coco-labels-paper.txt"
local_coco_classes_txt = "/tmp/coco-labels-paper.txt"
# Download the COCO class labels to a local file
os.system("curl -o %s -O %s" % (local_coco_classes_txt, coco_classes_txt))
NUM_PREDICTIONS = 5
# Index 0 is reserved for "No Class", so class ID n maps to line n of the labels file
with open(local_coco_classes_txt) as f:
    classes = ["No Class"] + [line.strip() for line in f.readlines()]


def get_output(eia_predictor, test_input):
    pred = None
    # Run inference NUM_PREDICTIONS times; only the last result is printed
    for curpred in range(NUM_PREDICTIONS):
        pred = eia_predictor(test_input)
    num_detections = int(pred["num_detections"])
    print("%d detection[s]" % (num_detections))
    detection_classes = pred["detection_classes"][0][:num_detections]
    print([classes[int(i)] for i in detection_classes])


def main(_):
    img = mpimg.imread(FLAGS.image)
    img = np.expand_dims(img, axis=0)
    ssd_resnet_input = {'inputs': img}

    print('Running SSD Resnet on EIPredictor using specified input and outputs')
    eia_predictor = EIPredictor(
        model_dir='/tmp/ssd_resnet50_v1_coco/1/',
        input_names={"inputs": "image_tensor:0"},
        output_names={"detection_classes": "detection_classes:0",
                      "num_detections": "num_detections:0",
                      "detection_boxes": "detection_boxes:0"},
        accelerator_id=0  # Replace 0 with the ordinal of your accelerator
    )
    get_output(eia_predictor, ssd_resnet_input)

    print('Running SSD Resnet on EIPredictor using default Signature Def')
    eia_predictor = EIPredictor(
        model_dir='/tmp/ssd_resnet50_v1_coco/1/',
    )
    get_output(eia_predictor, ssd_resnet_input)


if __name__ == "__main__":
    tf.compat.v1.app.run()

5. Run the inference script.

python ssd_resnet_predictor.py --image 3dogs.jpg

For more tutorials and examples, see the TensorFlow Python API.
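get_output in the script above calls the predictor NUM_PREDICTIONS times but prints only the final result. To see the first-request delay that the Warmup section describes, you can time each call. The following is a minimal sketch; time_calls and summarize are hypothetical helpers that work with any callable, such as an EIPredictor instance and its feed dict.

```python
import statistics
import time


def time_calls(predict, feed, n=5):
    """Hypothetical helper: call predict(feed) n times.

    Returns (last_result, list_of_per_call_seconds).
    """
    timings = []
    result = None
    for _ in range(n):
        start = time.perf_counter()
        result = predict(feed)
        timings.append(time.perf_counter() - start)
    return result, timings


def summarize(timings):
    """Report the first call separately from the median of the remaining calls."""
    rest = timings[1:] or timings
    return {"first": timings[0], "median_rest": statistics.median(rest)}
```

For example, summarize(time_calls(eia_predictor, ssd_resnet_input)[1]) inside main would show whether the first call is noticeably slower than the rest.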

Use Elastic Inference with the TensorFlow 2 Keras API

The Keras API has become an integral part of the machine learning development cycle because of its simplicity and ease of use. Keras enables rapid prototyping and development of machine learning constructs. Elastic Inference provides an API that offers native support for Keras. Using this API, you can directly use your Keras model, h5 file, and weights to instantiate a Keras-like object. This object supports the native Keras prediction APIs while using Elastic Inference in the backend. Currently, EIKerasModel is supported only in graph mode, so eager execution must be disabled. The following code sample shows the available parameters:

EIKerasModel(model, weights=None, export_dir=None):
    """Constructs an `EIKerasModel` instance.

    Args:
        model: A model object that either has its weights already set, or will be
            set with the weights argument. Can also be a model file that can be loaded.
        weights (Optional): A weights object, or a weights file that can be loaded,
            that will be set on the model object.
        export_dir: A folder location to save your model as a SavedModelBundle.

    Raises:
        RuntimeError: If eager execution is enabled.
    """

EIKerasModel can be used as follows:

# Loading from a Keras Model object
from ei_for_tf.python.keras.ei_keras import EIKerasModel

model = Model()  # Build the Keras model in the normal fashion
x = ...  # input data
ei_model = EIKerasModel(model)  # Only additional step to use EI
res = ei_model.predict(x)

# Loading from a Keras h5 file
from ei_for_tf.python.keras.ei_keras import EIKerasModel

x = ...  # input data
ei_model = EIKerasModel("keras_model.h5")  # Only additional step to use EI
res = ei_model.predict(x)

# Loading from a Keras model file and a weights file
from ei_for_tf.python.keras.ei_keras import EIKerasModel

x = ...  # input data
ei_model = EIKerasModel("keras_model.json", weights="keras_weights.h5")  # Only additional step to use EI
res = ei_model.predict(x)

Additionally, Elastic Inference enabled Keras includes Predict API Support as follows:

tf.keras

def predict(x,
            batch_size=None,
            verbose=0,
            steps=None,
            max_queue_size=10,         # Not supported
            workers=1,                 # Not supported
            use_multiprocessing=False  # Not supported
            )

Native Keras

def predict(x,
            batch_size=None,
            verbose=0,
            steps=None,
            callbacks=None  # Not supported
            )

TensorFlow 2 Keras API Example

In this example, you use a trained ResNet-50 model to classify an image of an African Elephant from ImageNet.

To test inference with a Keras model

1. Activate the Elastic Inference TensorFlow Conda Environment

source activate amazonei_tensorflow2_p36

2. Download an image of an African elephant to your current directory.

curl -O https://upload.wikimedia.org/wikipedia/commons/5/59/Serengeti_Elefantenbulle.jpg

3. Open a text editor, such as vim, and paste the following inference script. Save the file as test_keras.py.

# ResNet-50 example
import os
import time

import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image
from ei_for_tf.python.keras.ei_keras import EIKerasModel

# EIKerasModel is only supported in graph mode, so disable eager execution
tf.compat.v1.disable_eager_execution()

ITERATIONS = 20

model = ResNet50(weights='imagenet')
ei_model = EIKerasModel(model)

folder_name = os.path.dirname(os.path.abspath(__file__))
img_path = os.path.join(folder_name, 'Serengeti_Elefantenbulle.jpg')
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

# Warm up both models
_ = model.predict(x)
_ = ei_model.predict(x)

# Benchmark both models
for each in range(ITERATIONS):
    start = time.time()
    preds = model.predict(x)
    print("Vanilla iteration %d took %f" % (each, time.time() - start))

for each in range(ITERATIONS):
    start = time.time()
    ei_preds = ei_model.predict(x)
    print("EI iteration %d took %f" % (each, time.time() - start))

# Decode the results into a list of tuples (class, description, probability)
# (one such list for each sample in the batch)
print('Predicted:', decode_predictions(preds, top=3)[0])
print('EI Predicted:', decode_predictions(ei_preds, top=3)[0])

4. Run the inference script as follows:

python test_keras.py
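The script prints one latency per iteration for each model. To compare the two runs at a glance, you could append each `time.time() - start` value to a list inside each benchmark loop and summarize the lists afterwards. The plain-Python sketch below does that with the standard `statistics` module; `summarize_latencies` is a hypothetical helper name, and the timings in the usage line are made-up placeholders, not measured results.

```python
import statistics
from typing import Dict, Sequence


def summarize_latencies(vanilla: Sequence[float], ei: Sequence[float]) -> Dict[str, float]:
    """Summarize per-iteration latencies (in seconds) from the two benchmark loops."""
    if not vanilla or not ei:
        raise ValueError("both latency lists must be non-empty")
    summary = {
        "vanilla_mean": statistics.mean(vanilla),
        "vanilla_median": statistics.median(vanilla),
        "ei_mean": statistics.mean(ei),
        "ei_median": statistics.median(ei),
    }
    # A speedup above 1.0 means the EI path had the lower median latency.
    summary["median_speedup"] = summary["vanilla_median"] / summary["ei_median"]
    return summary


# Placeholder timings for illustration only (not measurements).
print(summarize_latencies([0.20, 0.22, 0.21], [0.05, 0.06, 0.05]))
```

The median is reported alongside the mean because a single slow iteration can skew the mean of a short run such as `ITERATIONS = 20`.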
