Amazon Elastic Inference

Developer Guide

Amazon Elastic Inference: Developer Guide

Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be aﬃliated with, connected to, or sponsored by Amazon.

What Is Amazon Elastic Inference?

Amazon Elastic Inference (Elastic Inference) is a resource you can attach to your Amazon Elastic Compute Cloud CPU instances, Amazon Deep Learning Containers, and SageMaker instances. Elastic Inference helps you accelerate your deep learning (DL) inference workloads. Elastic Inference accelerators come in multiple sizes and help you build intelligent capabilities into your applications.

Elastic Inference distributes model operations deﬁned by TensorFlow, Apache MXNet (MXNet), and PyTorch between low-cost, DL inference accelerators and the CPU of the instance. Elastic Inference also supports the open neural network exchange (ONNX) format through MXNet.

Prerequisites

You need an Amazon Web Services account and should be familiar with launching Amazon EC2 instances, Amazon Deep Learning Containers, or SageMaker instances to run Amazon Elastic Inference successfully.

To launch an Amazon EC2 instance, complete the steps in Setting up with Amazon EC2. Amazon S3 resources are required for installing packages via pip. For more information about setting up Amazon S3 resources, see the Amazon Simple Storage Service User Guide.

Pricing for Amazon Elastic Inference

You are charged for each second that an Elastic Inference accelerator is attached to an instance in the running state. You are not charged for an accelerator attached to an instance that is in the pending, stopping, stopped, shutting-down, or terminated state. You are also not charged when an Elastic Inference accelerator is in the unknown or impaired state.
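The billing rule above can be sketched in a few lines of Python. This is only an illustration of the rule as stated: the function names, the interval format, and the hourly rate passed in are hypothetical, and actual rates come from the Elastic Inference pricing page.

```python
# Accelerator states that are never charged, per the pricing rule above.
NON_BILLABLE_ACCELERATOR_STATES = {"unknown", "impaired"}


def billable_seconds(intervals):
    """Sum the seconds that accrue charges.

    intervals: iterable of (instance_state, accelerator_state, seconds).
    Only time spent with the instance in the running state counts, and
    only while the accelerator is not in the unknown or impaired state.
    """
    total = 0
    for instance_state, accelerator_state, seconds in intervals:
        if instance_state != "running":
            continue
        if accelerator_state in NON_BILLABLE_ACCELERATOR_STATES:
            continue
        total += seconds
    return total


def estimated_charge(intervals, hourly_rate_usd):
    """Convert billable seconds to a charge; the rate is an assumed input."""
    return billable_seconds(intervals) * hourly_rate_usd / 3600.0


if __name__ == "__main__":
    # Hypothetical timeline for one instance with one accelerator.
    timeline = [
        ("pending", "healthy", 40),
        ("running", "healthy", 3600),
        ("running", "impaired", 120),
        ("stopped", "healthy", 7200),
    ]
    print(billable_seconds(timeline))
```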

You do not incur AWS PrivateLink charges for VPC endpoints to the Elastic Inference service when you have accelerators provisioned in the subnet.

For more information about pricing by Region for Elastic Inference, see Elastic Inference Pricing.

Elastic Inference Uses

You can use Elastic Inference in the following use cases:

• For Elastic Inference-enabled TensorFlow and TensorFlow 2 with Python, see Using TensorFlow Models with Elastic Inference

• For Elastic Inference-enabled MXNet with Python, Java, and Scala, see Using MXNet Models with Elastic Inference

• For Elastic Inference-enabled PyTorch with Python, see Using PyTorch Models with Elastic Inference

• For Elastic Inference with SageMaker, see MXNet Elastic Inference with SageMaker

• For Amazon Deep Learning Containers with Elastic Inference on Amazon EC2, Amazon ECS, and SageMaker, see Using Amazon Deep Learning Containers With Elastic Inference

• For security information on Elastic Inference, see Security in Amazon Elastic Inference

• To troubleshoot your Elastic Inference workflow, see Troubleshooting

Amazon Elastic Inference Basics

When you conﬁgure an Amazon EC2 instance to launch with an Elastic Inference accelerator, AWS ﬁnds available accelerator capacity. It then establishes a network connection between your instance and the accelerator.

The following Elastic Inference accelerator types are available. You can attach any Elastic Inference accelerator type to any Amazon EC2 instance type.

Accelerator Type    FP32 Throughput (TFLOPS)    FP16 Throughput (TFLOPS)    Memory (GB)
eia2.medium         1                           8                           2
eia2.large          2                           16                          4
eia2.xlarge         4                           32                          8

You can attach multiple Elastic Inference accelerators of various sizes to a single Amazon EC2 instance when launching the instance. With multiple accelerators, you can run inference for multiple models on a single fleet of Amazon EC2 instances. If your models require different amounts of GPU memory and compute capacity, you can choose the appropriate accelerator size to attach to your CPU instance. For faster response times, load each model to an Elastic Inference accelerator once, and then keep making inference calls across the accelerators without unloading any model between calls. By attaching multiple accelerators to a single instance, you avoid deploying multiple fleets of CPU or GPU instances and the associated cost. For more information on attaching multiple accelerators to a single instance, see Using TensorFlow Models with Elastic Inference, Using MXNet Models with Elastic Inference, and Using PyTorch Models with Elastic Inference.

Note: Attaching multiple Elastic Inference accelerators to a single Amazon EC2 instance requires that the instance has AWS Deep Learning AMI (DLAMI) version 25 or later. For more information on the AWS Deep Learning AMI, see What Is the AWS Deep Learning AMI?.

An Elastic Inference accelerator is not part of the hardware that makes up your instance. Instead, the accelerator is attached through the network using an AWS PrivateLink endpoint service. The endpoint service routes traﬃc from your instance to the Elastic Inference accelerator conﬁgured with your instance.

Note: An Elastic Inference accelerator cannot be modified through the management console of your instance.

Before you launch an instance with an Elastic Inference accelerator, you must create an AWS PrivateLink endpoint service. Only a single endpoint service is needed in every Availability Zone to connect instances with Elastic Inference accelerators. A single endpoint service can span multiple Availability Zones. For more information, see VPC Endpoint Services (AWS PrivateLink).

You can use Amazon Elastic Inference enabled TensorFlow, TensorFlow Serving, Apache MXNet, or PyTorch libraries to load models and make inference calls. The modiﬁed versions of these frameworks automatically detect the presence of Elastic Inference accelerators. They then optimally distribute the model operations between the Elastic Inference accelerator and the CPU of the instance. The AWS Deep Learning AMIs include the latest releases of Amazon Elastic Inference enabled TensorFlow, TensorFlow Serving, MXNet, and PyTorch. If you are using custom AMIs or container images, you can download and install the required TensorFlow, Apache MXNet, and PyTorch libraries from Amazon S3.

Before you get started with Amazon Elastic Inference

Amazon Elastic Inference Service Limits

Before you start using Elastic Inference accelerators, be aware of the following limitations:

Limit: Elastic Inference accelerator instance limit
Description: You can attach up to five Elastic Inference accelerators to each instance by default, and only during instance launch. This limit is adjustable. We recommend testing the optimal setup before deploying to production.

Limit: Elastic Inference Sharing
Description: You cannot share an Elastic Inference accelerator between instances.

Limit: Elastic Inference Transfer
Description: You cannot detach an Elastic Inference accelerator from an instance or transfer it to another instance. If you no longer need an Elastic Inference accelerator, you must terminate your instance. You cannot change the Elastic Inference accelerator type. Terminate the instance and launch a new instance with a different Elastic Inference accelerator specification.

Limit: Supported Libraries
Description: Only the Amazon Elastic Inference enhanced MXNet, TensorFlow, and PyTorch libraries can make inference calls to Elastic Inference accelerators.

Limit: Elastic Inference Attachment
Description: Elastic Inference accelerators can only be attached to instances in a VPC.

Limit: Reserving accelerator capacity
Description: Pricing for Elastic Inference accelerators is available at On-Demand Instance rates only. You can attach an accelerator to a Reserved Instance, Scheduled Reserved Instance, or Spot Instance. However, the On-Demand Instance price for the Elastic Inference accelerator applies. You cannot reserve or schedule Elastic Inference accelerator capacity.

Choosing an Instance and Accelerator Type for Your Model

Demands on CPU compute resources, CPU memory, GPU-based acceleration, and GPU memory vary signiﬁcantly between diﬀerent types of deep learning models. The latency and throughput requirements of the application also determine the amount of instance compute and Elastic Inference acceleration you need. Consider the following when you choose an instance and accelerator type combination for your model:

• Before you evaluate the right combination of resources for your model or application stack, you should determine the target latency, throughput needs, and constraints. For example, let's assume your application must respond within 300 milliseconds (ms). If data retrieval (including any authentication) and preprocessing takes 200ms, you have a 100-ms window to work with for the inference request.

Using this analysis, you can determine the lowest cost infrastructure combination that meets these targets.

• Start with a reasonably small combination of resources, for example a budget-friendly c5.xlarge CPU instance type with an eia2.medium accelerator type. This combination has been tested to work well for various computer vision workloads (including a large version of ResNet: ResNet-200). The combination gives comparable or better performance than a more costly p2.xlarge GPU instance.

You can then resize the instance or accelerator type depending on your latency targets.

• I/O data transfer between instance and accelerator adds to inference latency because Elastic Inference accelerators are attached over the network.

• If you use multiple models with your accelerator, you might need a larger accelerator size to better support both compute and memory needs. This also applies if you use the same model from multiple application processes on the instance.

• You can convert your model to mixed precision, which uses the higher FP16 TFLOPS of the accelerator, to provide lower latency and higher performance.
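The latency reasoning in the first consideration above can be written out as a small calculation. The following is a minimal sketch, assuming you have already measured inference latency for each candidate combination yourself. The function names and the candidate dictionary keys are hypothetical, and the figures in the example are placeholders rather than benchmark results.

```python
def inference_budget_ms(target_latency_ms, fixed_overhead_ms):
    """Time left for the inference request after retrieval and preprocessing."""
    budget = target_latency_ms - fixed_overhead_ms
    if budget <= 0:
        raise ValueError("no time left for inference within the target latency")
    return budget


def cheapest_combination(candidates, budget_ms):
    """Return the lowest-cost candidate whose measured latency fits the budget.

    candidates: list of dicts with the (hypothetical) keys
    'name', 'hourly_cost_usd', and 'inference_latency_ms'.
    Returns None when no candidate meets the budget.
    """
    viable = [c for c in candidates if c["inference_latency_ms"] <= budget_ms]
    if not viable:
        return None
    return min(viable, key=lambda c: c["hourly_cost_usd"])


if __name__ == "__main__":
    # 300 ms target, 200 ms spent on data retrieval and preprocessing.
    budget = inference_budget_ms(300, 200)
    # Placeholder measurements and prices, for illustration only.
    candidates = [
        {"name": "c5.xlarge + eia2.medium", "hourly_cost_usd": 0.30, "inference_latency_ms": 90},
        {"name": "c5.xlarge + eia2.large", "hourly_cost_usd": 0.43, "inference_latency_ms": 60},
        {"name": "c5.large + eia2.medium", "hourly_cost_usd": 0.22, "inference_latency_ms": 130},
    ]
    print(budget, cheapest_combination(candidates, budget)["name"])
```

Starting from the smallest candidate and only moving up when the budget is missed mirrors the resizing approach described in the list.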

Using Amazon Elastic Inference with EC2 Auto Scaling

When you create an Auto Scaling group, you can specify the information required to conﬁgure the Amazon EC2 instances. This includes Elastic Inference accelerators. To do this, specify a launch template with your instance conﬁguration and the Elastic Inference accelerator type.
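As a rough illustration, a launch template that carries an accelerator specification might be assembled with the AWS SDK for Python (Boto3) as follows. This is a sketch under assumptions: the helper names are hypothetical, the AMI ID and template name are placeholders you supply, and `ec2_client` is expected to be a Boto3 EC2 client created elsewhere.

```python
def build_launch_template_data(image_id, instance_type, accelerator_type,
                               accelerator_count=1):
    """Build the LaunchTemplateData payload, including the accelerator type."""
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "ElasticInferenceAccelerators": [
            {"Type": accelerator_type, "Count": accelerator_count},
        ],
    }


def create_launch_template(ec2_client, name, template_data):
    """Create the template; ec2_client is assumed to be boto3.client('ec2')."""
    return ec2_client.create_launch_template(
        LaunchTemplateName=name,
        LaunchTemplateData=template_data,
    )
```

The resulting template can then be referenced when you create the Auto Scaling group.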

Working with Amazon Elastic Inference

To work with Amazon Elastic Inference, set up and launch your Amazon Elastic Compute Cloud instance with Elastic Inference. After that, use Elastic Inference accelerators that are powered by the Elastic Inference enabled versions of TensorFlow, TensorFlow Serving, Apache MXNet (MXNet), and PyTorch. You can do this with few changes to your code.

Topics

• Setting Up to Launch Amazon EC2 with Elastic Inference

• Using TensorFlow Models with Elastic Inference

• Using MXNet Models with Elastic Inference

• Using PyTorch Models with Elastic Inference

• Monitoring Elastic Inference Accelerators

• MXNet Elastic Inference with SageMaker

Setting Up to Launch Amazon EC2 with Elastic Inference

The most convenient way to set up Amazon EC2 with Elastic Inference uses the Elastic Inference setup script described in https://aws.amazon.com/blogs/machine-learning/launch-ei-accelerators-in-minutes-with-the-amazon-elastic-inference-setup-tool-for-ec2/. To manually launch an instance and associate it with an Elastic Inference accelerator, first configure your security groups and AWS PrivateLink endpoint services. Then, configure an instance role with the Elastic Inference policy.

Topics

• Configuring Your Security Groups for Elastic Inference

• Configuring AWS PrivateLink Endpoint Services

• Configuring an Instance Role with an Elastic Inference Policy

• Launching an Instance with Elastic Inference

Conﬁguring Your Security Groups for Elastic Inference

You need two security groups: one for inbound and outbound traffic for the new Elastic Inference VPC endpoint, and a second for outbound traffic for the associated Amazon EC2 instances that you launch.

Conﬁgure Your Security Groups for Elastic Inference

To configure a security group for an Elastic Inference accelerator (console)

1. Open the Amazon VPC console at https://console.aws.amazon.com/vpc/.

2. In the left navigation pane, choose Security, Security Groups.

3. Choose Create Security Group.

4. Under Create Security Group, specify a name and description for the security group and choose the ID of the VPC. Choose Create and then choose Close.

5. Select the check box next to your security group and choose Actions, Edit inbound rules. Add a rule to allow HTTPS traﬃc on port 443 as follows:

a. Choose Add Rule.

b. For Type, select HTTPS.

c. For Source, specify a CIDR block (for example, 0.0.0.0/0) or the security group for your instance.

d. To allow traﬃc for port 22 to the EC2 instance, repeat the procedure. For Type, select SSH.

e. Choose Save rules and then choose Close.

6. Choose Edit outbound rules. Choose Add rule. To allow traﬃc for all ports, for Type, select All Traﬃc.

7. Choose Save rules.

To configure a security group for an Elastic Inference accelerator (AWS CLI)

1. Create a security group using the create-security-group command:

aws ec2 create-security-group --description insert a description for the security group --group-name assign a name for the security group [--vpc-id enter the VPC ID]

2. Create inbound rules using the authorize-security-group-ingress command:

aws ec2 authorize-security-group-ingress --group-id insert the security group ID --protocol tcp --port 443 --cidr 0.0.0.0/0

aws ec2 authorize-security-group-ingress --group-id insert the security group ID --protocol tcp --port 22 --cidr 0.0.0.0/0

3. The default setting for outbound rules allows all traﬃc from all ports for this instance.
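The same security group setup can be scripted with the AWS SDK for Python (Boto3). The following is a minimal sketch, not the guide's own tooling: the helper names are hypothetical, `ec2_client` is assumed to be a Boto3 EC2 client you create yourself, and the CIDR block is whatever range you decide to allow.

```python
def https_and_ssh_permissions(cidr):
    """Inbound rules for HTTPS (443) and SSH (22) from the given CIDR block."""
    def tcp_rule(port):
        return {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr}],
        }
    return [tcp_rule(443), tcp_rule(22)]


def create_endpoint_security_group(ec2_client, name, description, vpc_id, cidr):
    """Create the group and add the inbound rules; returns the group ID.

    ec2_client is assumed to be boto3.client('ec2'). Outbound rules are
    left at the default, which allows all traffic.
    """
    group = ec2_client.create_security_group(
        GroupName=name, Description=description, VpcId=vpc_id)
    ec2_client.authorize_security_group_ingress(
        GroupId=group["GroupId"],
        IpPermissions=https_and_ssh_permissions(cidr))
    return group["GroupId"]
```

Passing a narrower CIDR block than 0.0.0.0/0, such as the CIDR of your VPC, limits which hosts can reach ports 443 and 22.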

Conﬁguring AWS PrivateLink Endpoint Services

Elastic Inference uses VPC endpoints to privately connect the instances in your VPC with their associated Elastic Inference accelerators. Create a VPC endpoint for Elastic Inference before you launch instances with accelerators. You need to do this only once per VPC. For more information, see Interface VPC Endpoints (AWS PrivateLink).

To conﬁgure an AWS PrivateLink endpoint service (console)

1. Open the Amazon VPC console at https://console.aws.amazon.com/vpc/.

2. In the left navigation pane, choose Endpoints, Create Endpoint.

3. For Service category, choose Find service by name.

4. For Service Name, select com.amazonaws.<your-region>.elastic-inference.runtime.

For example, for the us-west-2 Region, select com.amazonaws.us-west-2.elastic-inference.runtime.

5. For Subnets, select one or more Availability Zones where the endpoint should be created. You must select subnets for each Availability Zone where you plan to launch instances with accelerators.

6. Enable the private DNS name and enter the security group for your endpoint. Choose Create endpoint. Note the VPC endpoint ID for later.

7. The security group that we conﬁgured for the endpoint in previous steps must allow inbound traﬃc to port 443.

To conﬁgure an AWS PrivateLink endpoint service (AWS CLI)

• Use the create-vpc-endpoint command (https://docs.aws.amazon.com/cli/latest/reference/ec2/create-vpc-endpoint.html) and specify the following: VPC ID, type of VPC endpoint (interface), service name, subnets to use the endpoint, and security groups to associate with the endpoint network interfaces. For information about how to set up a security group for your VPC endpoint, see the section called “Configuring Your Security Groups for Elastic Inference” (p. 6).

aws ec2 create-vpc-endpoint --vpc-id vpc-insert VPC ID --vpc-endpoint-type Interface --service-name com.amazonaws.us-west-2.elastic-inference.runtime --subnet-ids subnet-insert subnet --security-group-ids sg-insert security group ID
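For scripted setups, the endpoint can also be created through the AWS SDK for Python (Boto3). The sketch below is an illustration under assumptions: the helper names are hypothetical, `ec2_client` is assumed to be a Boto3 EC2 client, and the VPC, subnet, and security group IDs are values from your own account.

```python
def elastic_inference_service_name(region):
    """Service name pattern from the console procedure above."""
    return "com.amazonaws.{}.elastic-inference.runtime".format(region)


def create_elastic_inference_endpoint(ec2_client, region, vpc_id,
                                      subnet_ids, security_group_ids):
    """Create an interface endpoint with private DNS enabled.

    ec2_client is assumed to be boto3.client('ec2', region_name=region).
    Returns the VPC endpoint ID so that you can note it for later.
    """
    response = ec2_client.create_vpc_endpoint(
        VpcEndpointType="Interface",
        VpcId=vpc_id,
        ServiceName=elastic_inference_service_name(region),
        SubnetIds=list(subnet_ids),
        SecurityGroupIds=list(security_group_ids),
        PrivateDnsEnabled=True,
    )
    return response["VpcEndpoint"]["VpcEndpointId"]
```

Include one subnet for each Availability Zone where you plan to launch instances with accelerators.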

Conﬁguring an Instance Role with an Elastic Inference Policy

To launch an instance with an Elastic Inference accelerator, you must provide an IAM role that allows actions on Elastic Inference accelerators.

To configure an instance role with an Elastic Inference policy (console)

1. Open the IAM console at https://console.aws.amazon.com/iam/.

2. In the left navigation pane, choose Policies, Create Policy.

3. Choose JSON and paste the following policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "elastic-inference:Connect",
                "iam:List*",
                "iam:Get*",
                "ec2:Describe*",
                "ec2:Get*"
            ],
            "Resource": "*"
        }
    ]
}

Note: You may get a warning message about the Elastic Inference service not being recognizable. This is a known issue and does not block creation of the policy.

4. Choose Review policy and enter a name for the policy, such as ec2-role-trust-policy.json, and a description.

5. Choose Create policy.

6. In the left navigation pane, choose Roles, Create role.

7. Choose AWS service, EC2, Next: Permissions.

8. Select the name of the policy that you just created (ec2-role-trust-policy.json). Choose Next: Tags.

9. Provide a role name and choose Create Role.

When you create your instance, select the role under Conﬁgure Instance Details in the launch wizard.

To conﬁgure an instance role with an Elastic Inference policy (AWS CLI)

• To configure an instance role with an Elastic Inference policy, follow the steps in Creating an IAM Role. Add the following policy to your instance role:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "elastic-inference:Connect",
                "iam:List*",
                "iam:Get*",
                "ec2:Describe*",
                "ec2:Get*"
            ],
            "Resource": "*"
        }
    ]
}

Note: You may get a warning message about the Elastic Inference service not being recognizable. This is a known issue and does not block creation of the policy.
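If you prefer to script the role setup, the steps can be approximated with the AWS SDK for Python (Boto3). This is a sketch under assumptions rather than a prescribed procedure: the function name is hypothetical, `iam_client` is assumed to be a Boto3 IAM client, and the trust policy shown is the standard one that lets Amazon EC2 assume a role.

```python
import json

# Permissions policy from the procedure above.
EI_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "elastic-inference:Connect",
            "iam:List*",
            "iam:Get*",
            "ec2:Describe*",
            "ec2:Get*",
        ],
        "Resource": "*",
    }],
}

# Trust policy that allows EC2 instances to assume the role.
EC2_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}


def create_ei_instance_profile(iam_client, policy_name, role_name, profile_name):
    """Create the policy, role, and instance profile, and link them together.

    iam_client is assumed to be boto3.client('iam'). All names are
    placeholders chosen by the caller.
    """
    policy = iam_client.create_policy(
        PolicyName=policy_name, PolicyDocument=json.dumps(EI_POLICY))
    iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(EC2_TRUST_POLICY))
    iam_client.attach_role_policy(
        RoleName=role_name, PolicyArn=policy["Policy"]["Arn"])
    iam_client.create_instance_profile(InstanceProfileName=profile_name)
    iam_client.add_role_to_instance_profile(
        InstanceProfileName=profile_name, RoleName=role_name)
    return profile_name
```

The returned instance profile name is the value to pass when you launch the instance.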

Launching an Instance with Elastic Inference

You can now conﬁgure Amazon EC2 instances with accelerators to launch within your subnet. Choose any supported Amazon EC2 instance type and Elastic Inference accelerator size. Elastic Inference accelerators are available to all current generation instance types. There are two accelerator types.

EIA2 is the second generation of Elastic Inference accelerators. It offers improved performance and increased memory. With up to 8 GB of GPU memory, EIA2 is a cost-effective resource for deploying machine learning (ML) models. Use it for applications such as image classification, object detection, automated speech recognition, and language translation. Your accelerator memory choices depend on the size of your input and models. You can choose from the following Elastic Inference accelerators:

• eia2.medium with 2 GB of accelerator memory

• eia2.large with 4 GB of accelerator memory

• eia2.xlarge with 8 GB of accelerator memory

Note: We continue to support EIA1 in three sizes: eia1.medium, eia1.large, and eia1.xlarge

You can launch an instance with Elastic Inference automatically by using the Amazon Elastic Inference setup tool for EC2, or manually using the console or AWS Command Line Interface.

To launch an instance with Elastic Inference (console)

1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.

2. Choose Launch Instance.

3. Under Choose an Amazon Machine Image, select an Amazon Linux or Ubuntu AMI. We recommend one of the Deep Learning AMIs.

Note: Attaching multiple Elastic Inference accelerators to a single Amazon EC2 instance requires that the instance has AWS Deep Learning AMI (DLAMI) Version 25 or later.

4. Under Choose an Instance Type, select the hardware conﬁguration of your instance.

5. Choose Next: Conﬁgure Instance Details.

6. Under Configure Instance Details, check the configuration settings. Ensure that you are using the VPC with the security groups for the instance and the Elastic Inference accelerator that you set up earlier. For more information, see Configuring Your Security Groups for Elastic Inference (p. 6).

7. For IAM role, select the role that you created in the Conﬁguring an Instance Role with an Elastic Inference Policy (p. 8) procedure.

8. Select Add an Elastic Inference accelerator.

9. Select the size and number of Elastic Inference accelerators. Your options are: eia2.medium, eia2.large, and eia2.xlarge.

10. To add another Elastic Inference accelerator, choose Add. Then select the size and number of accelerators to add.

11. (Optional) You can choose to add storage and tags by choosing Next at the bottom of the page. Or, you can let the instance wizard complete the remaining conﬁguration steps for you.

12. In the Add Security Group step, choose the security group created previously.

13. Review the conﬁguration of your instance and choose Launch.

14. You are prompted to choose an existing key pair for your instance or to create a new key pair. For more information, see Amazon EC2 Key Pairs.

Warning

Don’t select the Proceed without a key pair option. If you launch your instance without a key pair, then you can’t connect to it.

15. After making your key pair selection, choose Launch Instances.

16. A conﬁrmation page lets you know that your instance is launching. To close the conﬁrmation page and return to the console, choose View Instances.

17. Under Instances, you can view the status of the launch. It takes a short time for an instance to launch. When you launch an instance, its initial state is pending. After the instance starts, its state changes to running.

18. It can take a few minutes for the instance to be ready so that you can connect to it. Check that your instance has passed its status checks. You can view this information in the Status Checks column.

To launch an instance with Elastic Inference (AWS CLI)

To launch an instance with Elastic Inference at the command line, you need your key pair name, subnet ID, security group ID, AMI ID, and the name of the instance profile that you created in the section Configuring an Instance Role with an Elastic Inference Policy (p. 8). For the security group ID, use the one you created for your instance that contains the AWS PrivateLink endpoint. For more information, see Configuring Your Security Groups for Elastic Inference (p. 6). For more information about the AMI ID, see Finding a Linux AMI.

1. Use the run-instances command to launch your instance and accelerator:

aws ec2 run-instances --image-id ami-image ID --instance-type m5.large --subnet-id subnet-subnet ID --elastic-inference-accelerator Type=eia2.large --key-name key pair name --security-group-ids sg-security group ID --iam-instance-profile Name="accelerator profile name"

To launch an instance with multiple accelerators, you can add multiple Type parameters to --elastic-inference-accelerator.

aws ec2 run-instances --image-id ami-image ID --instance-type m5.large --subnet-id subnet-subnet ID --elastic-inference-accelerator Type=eia2.large,Count=2 Type=eia2.xlarge --key-name key pair name --region region name --security-group-ids sg-security group ID

2. When the run-instances operation succeeds, your output is similar to the following. The ElasticInferenceAcceleratorArn identifies the Elastic Inference accelerator.

"ElasticInferenceAcceleratorAssociations": [
    {
        "ElasticInferenceAcceleratorArn": "arn:aws:elastic-inference:us-west-2:204044812891:elastic-inference-accelerator/eia-3e1de7c2f64a4de8b970c205e838af6b",
        "ElasticInferenceAcceleratorAssociationId": "eia-assoc-031f6f53ddcd5f260",
        "ElasticInferenceAcceleratorAssociationState": "associating",
        "ElasticInferenceAcceleratorAssociationTime": "2018-10-05T17:22:20.000Z"
    }
],

You are now ready to run your models using TensorFlow, MXNet, or PyTorch on the provided AMI.

Once your Elastic Inference accelerator is running, you can use the describe-accelerators AWS CLI command. This command returns information about the accelerator, such as the Region it is in and the name of the accelerator. For more information about the usage of this command, see the Elastic Inference AWS CLI Command Reference.
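The run-instances call shown earlier in this section has a close equivalent in the AWS SDK for Python (Boto3). The following sketch only assembles the request parameters. The helper name is hypothetical, every ID is a placeholder from your own account, and the resulting dictionary is meant to be passed to the `run_instances` method of a Boto3 EC2 client.

```python
def build_run_instances_kwargs(image_id, instance_type, subnet_id,
                               security_group_id, key_name,
                               instance_profile_name, accelerators):
    """Assemble keyword arguments for ec2_client.run_instances(**kwargs).

    accelerators: list of (accelerator_type, count) tuples, for example
    [("eia2.large", 2), ("eia2.xlarge", 1)].
    """
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "SubnetId": subnet_id,
        "SecurityGroupIds": [security_group_id],
        "KeyName": key_name,
        "IamInstanceProfile": {"Name": instance_profile_name},
        "ElasticInferenceAccelerators": [
            {"Type": accelerator_type, "Count": count}
            for accelerator_type, count in accelerators
        ],
        # Launch exactly one instance.
        "MinCount": 1,
        "MaxCount": 1,
    }
```

With a client created as `boto3.client("ec2")`, the launch would then be `ec2_client.run_instances(**kwargs)`, and the response for each instance should carry the accelerator associations shown in the sample output.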

Using TensorFlow Models with Elastic Inference

Amazon Elastic Inference (Elastic Inference) is available only on instances that were launched with an Elastic Inference accelerator.

The Elastic Inference enabled version of TensorFlow allows you to use Elastic Inference accelerators with minimal changes to your TensorFlow code.

Topics

• Elastic Inference Enabled TensorFlow

• Additional Requirements and Considerations

• TensorFlow Elastic Inference with Python

• TensorFlow 2 Elastic Inference with Python

Elastic Inference Enabled TensorFlow

Preinstalled EI Enabled TensorFlow

The Elastic Inference enabled packages are available in the AWS Deep Learning AMI. AWS Deep Learning AMIs come with supported TensorFlow version and ei_for_tf pre-installed. Elastic Inference enabled TensorFlow 2 requires AWS Deep Learning AMI v28 or higher. You also have Docker container options with Using Amazon Deep Learning Containers With Elastic Inference (p. 76).

Installing EI Enabled TensorFlow

If you're not using an AWS Deep Learning AMI instance, you can download the packages from the Amazon S3 bucket and build them into your own Amazon Linux or Ubuntu AMIs.

Install ei_for_tf.

pip install -U ei_for_tf*.whl

If the TensorFlow version is lower than the required version, pip upgrades TensorFlow to the appropriate version. If the TensorFlow version is higher than the required version, there will be a warning about the incompatibility. Your program fails at run-time if the TensorFlow version incompatibility isn’t ﬁxed.

Additional Requirements and Considerations

TensorFlow 2.0 Diﬀerences

Starting with TensorFlow 2.0, the Elastic Inference package is a separate pip wheel, instead of an enhanced TensorFlow pip wheel. The prefix for import statements for the Elastic Inference specific API has changed from tensorflow.contrib.ei to ei_for_tf.

To see the compatible TensorFlow version for a speciﬁc ei_for_tf version, see the ei_for_tf_compatibility.txt ﬁle in the Amazon S3 bucket.

Model Formats Supported

Elastic Inference supports the TensorFlow saved_model format via TensorFlow Serving.

Warmup

Elastic Inference TensorFlow Serving provides a warmup feature to preload models and reduce the delay that is typical of the first inference request. Amazon Elastic Inference TensorFlow Serving only supports warming up the "serving_default" signature definition.

Amazon Elastic Inference supports SageMaker Neo compiled TensorFlow models

Amazon Elastic Inference supports TensorFlow 2 models optimized by SageMaker Neo. A pre-trained TensorFlow model can be compiled in SageMaker Neo with EIA as the target device. The resulting model artifacts can be used for inference in Elastic Inference Accelerators. This functionality only works for ei_for_tf version 1.6 and greater. For more information, see Use Elastic Inference with SageMaker Neo compiled models (p. 30).

TensorFlow Elastic Inference with Python

With Elastic Inference TensorFlow Serving, the standard TensorFlow Serving interface remains unchanged. The only diﬀerence is that the entry point is a diﬀerent binary named amazonei_tensorflow_model_server.

TensorFlow Serving and Predictor are the only inference modes that Elastic Inference supports. If you haven't tried TensorFlow Serving before, we recommend that you try the TensorFlow Serving tutorial ﬁrst.

This release of Elastic Inference TensorFlow Serving has been tested to perform well and provide cost-saving benefits with the following deep learning use cases and network architectures (and similar variants):

Use Case                      Example Network Topology
Image Recognition             Inception, ResNet, MVCNN
Object Detection              SSD, RCNN
Neural Machine Translation    GNMT

Note: These tutorials assume usage of a DLAMI with v26 or later, and Elastic Inference enabled TensorFlow.

Topics

• Activate the TensorFlow Elastic Inference Environment

• Use Elastic Inference with TensorFlow Serving

• Use Elastic Inference with the TensorFlow EIPredictor API

• Use Elastic Inference with TensorFlow Predictor Example

• Use Elastic Inference with the TensorFlow Keras API

Activate the TensorFlow Elastic Inference Environment

1. Activate the environment that matches your Python version:

• (Option for Python 3) Activate the Python 3 TensorFlow Elastic Inference environment:

$ source activate amazonei_tensorflow_p36

• (Option for Python 2) Activate the Python 2.7 TensorFlow Elastic Inference environment:

$ source activate amazonei_tensorflow_p27

2. The remaining parts of this guide assume you are using the amazonei_tensorflow_p27 environment.

If you are switching between Elastic Inference enabled MXNet, TensorFlow, or PyTorch environments, you must stop and then start your instance in order to reattach the Elastic Inference accelerator. Rebooting is not sufficient because the process requires a complete shutdown.

Use Elastic Inference with TensorFlow Serving

The following is an example of serving a Single Shot Detector (SSD) with a ResNet backbone.

Serve and Test Inference with an SSD ResNet Model

1. Download the model.

curl -O https://s3-us-west-2.amazonaws.com/aws-tf-serving-ei-example/ssd_resnet.zip

2. Unzip the model.

unzip ssd_resnet.zip -d /tmp

3. Download a picture of three dogs to your home directory.

curl -O https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/3dogs.jpg

4. Use the built-in EI Tool to get the device ordinal number of all attached Elastic Inference accelerators. For more information on EI Tool, see Monitoring Elastic Inference Accelerators.

/opt/amazon/ei/ei_tools/bin/ei describe-accelerators --json

Your output should look like the following:

{
    "ei_client_version": "1.5.0",
    "time": "Fri Nov 1 03:09:38 2019",
    "attached_accelerators": 2,
    "devices": [
        {
            "ordinal": 0,
            "type": "eia1.xlarge",
            "id": "eia-679e4c622d584803aed5b42ab6a97706",
            "status": "healthy"
        },
        {
            "ordinal": 1,
            "id": "eia-6c414c6ee37a4d93874afc00825c2f28",
            "status": "healthy"
        }
    ]
}

5. Navigate to the folder where AmazonEI_TensorFlow_Serving is installed and run the following command to launch the server. Set EI_VISIBLE_DEVICES to the device ordinal or device ID of the attached Elastic Inference accelerator that you want to use. This device will then be accessible using id 0. For more information on EI_VISIBLE_DEVICES, see Monitoring Elastic Inference Accelerators. Note, model_base_path must be an absolute path.

EI_VISIBLE_DEVICES=<ordinal number> amazonei_tensorflow_model_server --model_name=ssdresnet --model_base_path=/tmp/ssd_resnet50_v1_coco --port=9000

6. While the server is running in the foreground, open a new terminal session and activate the TensorFlow environment.

source activate amazonei_tensorflow_p27

7. Use your preferred text editor to create a script that has the following content. Name it ssd_resnet_client.py. This script will take an image ﬁlename as a parameter and get a prediction result from the pretrained model.

from __future__ import print_function

import grpc
import tensorflow as tf
from PIL import Image
import numpy as np
import time
import os

from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

tf.app.flags.DEFINE_string('server', 'localhost:9000',
                           'PredictionService host:port')
tf.app.flags.DEFINE_string('image', '', 'path to image in JPEG format')
FLAGS = tf.app.flags.FLAGS

coco_classes_txt = "https://raw.githubusercontent.com/amikelive/coco-labels/master/coco-labels-paper.txt"
local_coco_classes_txt = "/tmp/coco-labels-paper.txt"

# Download the COCO class labels to a local file.
os.system("curl -o %s %s" % (local_coco_classes_txt, coco_classes_txt))
NUM_PREDICTIONS = 5

# Index 0 is reserved for "No Class"; the labels start at index 1.
with open(local_coco_classes_txt) as f:
    classes = ["No Class"] + [line.strip() for line in f.readlines()]


def main(_):
    channel = grpc.insecure_channel(FLAGS.server)
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    # Send request
    with Image.open(FLAGS.image) as f:
        f.load()
        # See prediction_service.proto for gRPC request/response details.
        data = np.asarray(f)
        data = np.expand_dims(data, axis=0)

        request = predict_pb2.PredictRequest()
        request.model_spec.name = 'ssdresnet'
        request.inputs['inputs'].CopyFrom(
            tf.contrib.util.make_tensor_proto(data, shape=data.shape))
        result = stub.Predict(request, 60.0)  # 60 secs timeout
        outputs = result.outputs
        detection_classes = outputs["detection_classes"]
        detection_classes = tf.make_ndarray(detection_classes)
        num_detections = int(tf.make_ndarray(outputs["num_detections"])[0])
        print("%d detection[s]" % (num_detections))
        class_label = [classes[int(x)]
                       for x in detection_classes[0][:num_detections]]
        print("SSD Prediction is ", class_label)


if __name__ == '__main__':
    tf.app.run()

8. Now run the script passing the server location, port, and the dog photo's ﬁlename as the parameters.

python ssd_resnet_client.py --server=localhost:9000 --image 3dogs.jpg
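Step 4 of this procedure can also be scripted. The following is a minimal sketch, assuming the JSON layout shown in the sample EI Tool output above. The helper name `healthy_ordinals` is hypothetical and is not part of EI Tool.

```python
import json

# Sample text copied from the `ei describe-accelerators --json` output above.
SAMPLE_OUTPUT = """
{
  "ei_client_version": "1.5.0",
  "time": "Fri Nov 1 03:09:38 2019",
  "attached_accelerators": 2,
  "devices": [
    {"ordinal": 0, "type": "eia1.xlarge",
     "id": "eia-679e4c622d584803aed5b42ab6a97706", "status": "healthy"},
    {"ordinal": 1,
     "id": "eia-6c414c6ee37a4d93874afc00825c2f28", "status": "healthy"}
  ]
}
"""


def healthy_ordinals(describe_json):
    """Return the ordinals of all devices whose status is "healthy"."""
    report = json.loads(describe_json)
    return [device["ordinal"]
            for device in report.get("devices", [])
            if device.get("status") == "healthy"]


ordinals = healthy_ordinals(SAMPLE_OUTPUT)
print(ordinals)
```

On an instance, the JSON text would come from running the EI Tool command in step 4 rather than from a constant. One of the returned ordinals is the value to assign to EI_VISIBLE_DEVICES in step 5.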

Use Elastic Inference with the TensorFlow EIPredictor API

Elastic Inference TensorFlow packages for Python 2 and 3 provide an EIPredictor API. This API function provides you with a ﬂexible way to run models on Elastic Inference accelerators as an alternative to using TensorFlow Serving. The EIPredictor API provides a simple interface to perform repeated inference on a pretrained model. The following code sample shows the available parameters.

Note

accelerator_id should be set to the device's ordinal number, not its ID.

ei_predictor = EIPredictor(model_dir,
                           signature_def_key=None,
                           signature_def=None,
                           input_names=None,
                           output_names=None,
                           tags=None,
                           graph=None,
                           config=None,
                           use_ei=True,
                           accelerator_id=<device ordinal number>)

output_dict = ei_predictor(feed_dict)

EIPredictor can be used in the following ways:

# EIPredictor picks inputs and outputs from the default serving signature def
# with the tag "serve" (similar to TF Predictor).
ei_predictor = EIPredictor(model_dir)

# EIPredictor picks inputs and outputs from the signature def selected by
# signature_def_key (similar to TF Predictor).
ei_predictor = EIPredictor(model_dir, signature_def_key='predict')

# The signature def can be provided directly (similar to TF Predictor).
ei_predictor = EIPredictor(model_dir, signature_def=sig_def)

# You provide the input_names and output_names dicts (similar to TF Predictor).
ei_predictor = EIPredictor(model_dir, input_names=input_names, output_names=output_names)

# tags is used to get the correct signature def (similar to TF Predictor).
ei_predictor = EIPredictor(model_dir, tags='serve')

Additional EI Predictor functionality includes:

• Support for frozen models.

# For frozen graphs, model_dir takes a file name.
# input_names and output_names must be provided in this case.
ei_predictor = EIPredictor(model_dir,
                           input_names=input_names,
                           output_names=output_names)

• Ability to disable use of Elastic Inference by using the use_ei ﬂag, which defaults to True. This is useful for testing EIPredictor against TensorFlow Predictor.

• EIPredictor can also be created from a TensorFlow Estimator. Given a trained Estimator, you can ﬁrst export a SavedModel. See the SavedModel documentation for more details. The following shows example usage:

saved_model_dir = estimator.export_savedmodel(my_export_dir, serving_input_fn)
ei_predictor = EIPredictor(export_dir=saved_model_dir)

# Once the EIPredictor is created, run inference by calling it with a feed dictionary:
output_dict = ei_predictor(feed_dict)
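Because the use_ei flag toggles acceleration without changing how the predictor is called, the two configurations can be checked against each other. The following is a minimal sketch in plain Python, assuming both predictors are callables that take a feed dictionary and return a dictionary that maps each output name to a flat sequence of numbers. The helper `outputs_match` and the two stand-in functions are hypothetical and are not part of the EIPredictor API.

```python
import math


def outputs_match(predictor_a, predictor_b, feed_dict,
                  rel_tol=1e-4, abs_tol=1e-6):
    """Return True if both predictors yield numerically close outputs."""
    out_a = predictor_a(feed_dict)
    out_b = predictor_b(feed_dict)
    if set(out_a) != set(out_b):
        return False
    for name in out_a:
        values_a, values_b = list(out_a[name]), list(out_b[name])
        if len(values_a) != len(values_b):
            return False
        if not all(math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
                   for a, b in zip(values_a, values_b)):
            return False
    return True


# Stand-ins for a predictor created with use_ei=True and one created with
# use_ei=False. Real predictors would return arrays rather than lists.
def fake_accelerated(feed_dict):
    return {"scores": [x * 2.0 for x in feed_dict["inputs"]]}


def fake_cpu(feed_dict):
    return {"scores": [x * 2.0 + 1e-8 for x in feed_dict["inputs"]]}


print(outputs_match(fake_accelerated, fake_cpu, {"inputs": [0.5, 1.5]}))
```

The tolerances are illustrative. Small floating-point differences between CPU and accelerator results are expected, so an exact equality check would be too strict.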

Use Elastic Inference with TensorFlow Predictor Example

Installing Elastic Inference TensorFlow

Elastic Inference enabled TensorFlow comes bundled in the AWS Deep Learning AMI. You can also download pip wheels for Python 2 and 3 from the Elastic Inference S3 bucket. Follow these instructions to download and install the pip package:

Choose the tar ﬁle for the Python version and operating system of your choice from the S3 bucket. Copy the path to the tar ﬁle and run the following command:

curl -O [URL of the tar file of your choice]

To extract the tar file:

tar -xvzf [name of tar file]

Try the following example to serve diﬀerent models, such as ResNet, using a Single Shot Detector (SSD).

Serve and Test Inference with an SSD Model

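1. Download the model. If you already downloaded the model in the Serving example, skip this step.

curl -O https://s3-us-west-2.amazonaws.com/aws-tf-serving-ei-example/ssd_resnet.zip

2. Unzip the model. Again, you may skip this step if you already have the model.

unzip ssd_resnet.zip -d /tmp

3. Download a picture of three dogs to your current directory.

curl -O https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/3dogs.jpg

4. Use the built-in EI Tool to get the device ordinal number of all attached Elastic Inference accelerators. For more information on EI Tool, see Monitoring Elastic Inference Accelerators.

/opt/amazon/ei/ei_tools/bin/ei describe-accelerators --json

Your output should look like the following:

{
  "ei_client_version": "1.5.0",
  "time": "Fri Nov 1 03:09:38 2019",
  "attached_accelerators": 2,
  "devices": [
    {
      "ordinal": 0,
      "type": "eia1.xlarge",
      "id": "eia-679e4c622d584803aed5b42ab6a97706",
      "status": "healthy"
    },
    {
      "ordinal": 1,
      "id": "eia-6c414c6ee37a4d93874afc00825c2f28",
      "status": "healthy"
    }
  ]
}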

You use the device ordinal of your desired Elastic Inference accelerator to create a Predictor.

5. Open a text editor, such as vim, and paste the following inference script. Replace the accelerator_id value with the device ordinal of the desired Elastic Inference accelerator. This value must be an integer. Save the file as ssd_resnet_predictor.py.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import sys

import numpy as np
import tensorflow as tf
import matplotlib.image as mpimg
from tensorflow.contrib.ei.python.predictor.ei_predictor import EIPredictor

tf.app.flags.DEFINE_string('image', '', 'path to image in JPEG format')
FLAGS = tf.app.flags.FLAGS

# Load the COCO class labels, as in the TensorFlow Serving client script.
coco_classes_txt = "https://raw.githubusercontent.com/amikelive/coco-labels/master/coco-labels-paper.txt"
local_coco_classes_txt = "/tmp/coco-labels-paper.txt"
os.system("curl -o %s %s" % (local_coco_classes_txt, coco_classes_txt))
NUM_PREDICTIONS = 5

with open(local_coco_classes_txt) as f:
    classes = ["No Class"] + [line.strip() for line in f.readlines()]


def get_output(eia_predictor, test_input):
    pred = None
    for curpred in range(NUM_PREDICTIONS):
        pred = eia_predictor(test_input)

    num_detections = int(pred["num_detections"])
    print("%d detection[s]" % (num_detections))
    detection_classes = pred["detection_classes"][0][:num_detections]
    print([classes[int(i)] for i in detection_classes])


def main(_):
    img = mpimg.imread(FLAGS.image)
    img = np.expand_dims(img, axis=0)
    ssd_resnet_input = {'inputs': img}

    print('Running SSD Resnet on EIPredictor using specified input and outputs')
    eia_predictor = EIPredictor(
        model_dir='/tmp/ssd_resnet50_v1_coco/1/',
        input_names={"inputs": "image_tensor:0"},
        output_names={"detection_classes": "detection_classes:0",
                      "num_detections": "num_detections:0",
                      "detection_boxes": "detection_boxes:0"},
        accelerator_id=<device ordinal>
    )
    get_output(eia_predictor, ssd_resnet_input)

    print('Running SSD Resnet on EIPredictor using default Signature Def')
    eia_predictor = EIPredictor(
        model_dir='/tmp/ssd_resnet50_v1_coco/1/',
    )
    get_output(eia_predictor, ssd_resnet_input)


if __name__ == "__main__":
    tf.app.run()

6. Run the inference script.

python ssd_resnet_predictor.py --image 3dogs.jpg
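The label lookup in the script can be tried without a model. The following is a minimal sketch with made-up values, assuming that the class ID list is longer than the number of real detections and that the trailing entries are padding. The function name `labels_for_detections` is hypothetical.

```python
def labels_for_detections(detection_classes, num_detections, classes):
    """Map the first `num_detections` class IDs of one image to label strings."""
    return [classes[int(class_id)]
            for class_id in detection_classes[:num_detections]]


# Index 0 is reserved for "No Class", so label IDs start at 1, as in the script.
classes = ["No Class", "person", "bicycle", "car"]

# Illustrative values only: three real detections followed by padding.
detection_classes = [3.0, 1.0, 1.0, 0.0, 0.0]
num_detections = 3

labels = labels_for_detections(detection_classes, num_detections, classes)
print(labels)
```

Slicing by num_detections before the lookup keeps the padding entries out of the printed result. The class IDs arrive as floats, which is why the script converts them with int() before indexing.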

For more tutorials and examples, see the TensorFlow Python API.

Use Elastic Inference with the TensorFlow Keras API

The Keras API has become an integral part of the machine learning development cycle because of its simplicity and ease of use. Keras enables rapid prototyping and development of machine learning constructs. Elastic Inference provides an API that oﬀers native support for Keras. Using this API, you can directly use your Keras model, h5 ﬁle, and weights to instantiate a Keras-like Object. This object supports the native Keras prediction APIs, while fully utilizing Elastic Inference in the backend. The following code sample shows the available parameters:

EIKerasModel(model, weights=None, export_dir=None):
    """Constructs an `EIKerasModel` instance.

    Args:
      model: A model object that either has its weights already set, or will
        be set with the weights argument; or a model file that can be loaded.
      weights (Optional): A weights object, or weights file that can be
        loaded, and will be set to the model object.
      export_dir: A folder location to save your model as a SavedModelBundle.

    Raises:
      RuntimeError: If eager execution is enabled.
    """

EIKerasModel can be used as follows:

# Loading from a Keras Model object
from tensorflow.contrib.ei.python.keras.ei_keras import EIKerasModel
model = Model()  # Build the Keras model in the normal fashion
x = ...  # input data
ei_model = EIKerasModel(model)  # Only additional step to use EI
res = ei_model.predict(x)

# Loading from a Keras h5 file
from tensorflow.contrib.ei.python.keras.ei_keras import EIKerasModel
x = ...  # input data
ei_model = EIKerasModel("keras_model.h5")  # Only additional step to use EI
res = ei_model.predict(x)

# Loading from a Keras JSON file and a weights file
from tensorflow.contrib.ei.python.keras.ei_keras import EIKerasModel
x = ...  # input data
ei_model = EIKerasModel("keras_model.json", weights="keras_weights.h5")  # Only additional step to use EI
res = ei_model.predict(x)

Additionally, Elastic Inference enabled Keras includes Predict API support:

tf.keras

def predict(
    x,
    batch_size=None,
    verbose=0,
    steps=None,
    max_queue_size=10,           # Not supported
    workers=1,                   # Not supported
    use_multiprocessing=False):  # Not supported

Native Keras

def predict(
    x,
    batch_size=None,
    verbose=0,
    steps=None,
    callbacks=None):  # Not supported

TensorFlow Keras API Example

In this example, you use a trained ResNet-50 model to classify an image of an African Elephant from ImageNet.

Test Inference with a Keras Model

1. Activate the Elastic Inference TensorFlow Conda Environment

source activate amazonei_tensorflow_p27

2. Download an image of an African Elephant to your current directory.

curl -O https://upload.wikimedia.org/wikipedia/commons/5/59/Serengeti_Elefantenbulle.jpg

3. Open a text editor, such as vim, and paste the following inference script. Save the ﬁle as test_keras.py.

# Resnet Example
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions
from tensorflow.contrib.ei.python.keras.ei_keras import EIKerasModel
import numpy as np
import time
import os

ITERATIONS = 20

model = ResNet50(weights='imagenet')
ei_model = EIKerasModel(model)

folder_name = os.path.dirname(os.path.abspath(__file__))
img_path = folder_name + '/Serengeti_Elefantenbulle.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

# Warm up both models
_ = model.predict(x)
_ = ei_model.predict(x)

# Benchmark both models
for each in range(ITERATIONS):
    start = time.time()
    preds = model.predict(x)
    print("Vanilla iteration %d took %f" % (each, time.time() - start))

for each in range(ITERATIONS):
    start = time.time()
    ei_preds = ei_model.predict(x)
    print("EI iteration %d took %f" % (each, time.time() - start))

# decode the results into a list of tuples (class, description, probability)
# (one such list for each sample in the batch)
print('Predicted:', decode_predictions(preds, top=3)[0])
print('EI Predicted:', decode_predictions(ei_preds, top=3)[0])

4. Run the inference script.

python test_keras.py

5. Your output should be a list of predictions and their respective confidence scores.

('Predicted:', [(u'n02504458', u'African_elephant', 0.9081173), (u'n01871265', u'tusker', 0.07836755), (u'n02504013', u'Indian_elephant', 0.011482777)])

('EI Predicted:', [(u'n02504458', u'African_elephant', 0.90811676), (u'n01871265', u'tusker', 0.07836751), (u'n02504013', u'Indian_elephant', 0.011482781)])
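The two result lists above agree closely. The following is a minimal sketch that quantifies the difference, using the values copied from the sample output. The helper name `max_confidence_gap` is hypothetical.

```python
# Top-3 predictions copied from the sample output above.
cpu_preds = [
    ("n02504458", "African_elephant", 0.9081173),
    ("n01871265", "tusker", 0.07836755),
    ("n02504013", "Indian_elephant", 0.011482777),
]
ei_preds = [
    ("n02504458", "African_elephant", 0.90811676),
    ("n01871265", "tusker", 0.07836751),
    ("n02504013", "Indian_elephant", 0.011482781),
]


def max_confidence_gap(preds_a, preds_b):
    """Return the largest absolute confidence difference per matching class."""
    scores_b = {class_id: score for class_id, _, score in preds_b}
    return max(abs(score - scores_b[class_id])
               for class_id, _, score in preds_a)


gap = max_confidence_gap(cpu_preds, ei_preds)
same_ranking = [p[0] for p in cpu_preds] == [p[0] for p in ei_preds]
print("largest confidence gap: %.1e, same ranking: %s" % (gap, same_ranking))
```

For this sample, both runs rank the classes identically and the confidence scores differ by less than one millionth.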

TensorFlow 2 Elastic Inference with Python

With Elastic Inference TensorFlow 2 Serving, the standard TensorFlow 2 Serving interface remains unchanged. The only diﬀerence is that the entry point is a diﬀerent binary named amazonei_tensorflow2_model_server.

TensorFlow 2 Serving and Predictor are the only inference modes that Elastic Inference supports. If you haven't tried TensorFlow 2 Serving before, we recommend that you try the TensorFlow Serving tutorial ﬁrst.

This release of Elastic Inference TensorFlow Serving has been tested to perform well and provide cost-saving benefits with the following deep learning use cases and network architectures (and similar variants):

Use Case                      Example Network Topology

Image Recognition             Inception, ResNet, MVCNN

Object Detection              SSD, RCNN

Neural Machine Translation    GNMT

Note

These tutorials assume that you use DLAMI v42 or later with Elastic Inference enabled TensorFlow 2.

Topics

• Activate the TensorFlow 2 Elastic Inference Environment

• Use Elastic Inference with TensorFlow 2 Serving

• Use Elastic Inference with the TensorFlow 2 EIPredictor API

• Use Elastic Inference with TensorFlow 2 Predictor Example

• Use Elastic Inference with the TensorFlow 2 Keras API

• Use Elastic Inference with SageMaker Neo compiled models

Activate the TensorFlow 2 Elastic Inference Environment

Activate the Python 3 TensorFlow 2 Elastic Inference environment:

source activate amazonei_tensorflow2_p36

Use Elastic Inference with TensorFlow 2 Serving

The following is an example of serving a Single Shot Detector (SSD) with a ResNet backbone.

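To serve and test inference with an SSD model

1. Download the model.

curl -O https://s3-us-west-2.amazonaws.com/aws-tf-serving-ei-example/ssd_resnet.zip

2. Unzip the model.

unzip ssd_resnet.zip -d /tmp

3. Download a picture of three dogs to your home directory.

curl -O https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/3dogs.jpg

4. Use the built-in EI Tool to get the device ordinal number of all attached Elastic Inference accelerators. For more information on EI Tool, see Monitoring Elastic Inference Accelerators.

/opt/amazon/ei/ei_tools/bin/ei describe-accelerators --json

Your output should look like the following:

{
  "ei_client_version": "1.5.0",
  "time": "Fri Nov 1 03:09:38 2019",
  "attached_accelerators": 2,
  "devices": [
    {
      "ordinal": 0,
      "type": "eia1.xlarge",
      "id": "eia-679e4c622d584803aed5b42ab6a97706",
      "status": "healthy"
    },
    {
      "ordinal": 1,
      "id": "eia-6c414c6ee37a4d93874afc00825c2f28",
      "status": "healthy"
    }
  ]
}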

5. Navigate to the folder where AmazonEI_TensorFlow_Serving is installed and run the following command to launch the server. Set EI_VISIBLE_DEVICES to the device ordinal or device ID of the attached Elastic Inference accelerator that you want to use. This device will then be accessible using id 0. model_base_path must be an absolute path. For more information on EI_VISIBLE_DEVICES, see Monitoring Elastic Inference Accelerators.

EI_VISIBLE_DEVICES=<ordinal number> amazonei_tensorflow2_model_server --model_name=ssdresnet --model_base_path=/tmp/ssd_resnet50_v1_coco --port=9000

6. While the server is running in the foreground, open another terminal session and activate the TensorFlow environment.

source activate amazonei_tensorflow2_p36

7. Use your preferred text editor to create a script that has the following content. Name it ssd_resnet_client.py. This script will take an image ﬁlename as a parameter and get a prediction result from the pretrained model.

from __future__ import print_function

import grpc
import tensorflow as tf
from PIL import Image
import numpy as np
import time
import os

from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

tf.compat.v1.app.flags.DEFINE_string('server', 'localhost:9000',
                                     'PredictionService host:port')
tf.compat.v1.app.flags.DEFINE_string('image', '', 'path to image in JPEG format')
FLAGS = tf.compat.v1.app.flags.FLAGS

# Load the COCO class labels, as in the TensorFlow Serving client script for TensorFlow 1.
coco_classes_txt = "https://raw.githubusercontent.com/amikelive/coco-labels/master/coco-labels-paper.txt"
local_coco_classes_txt = "/tmp/coco-labels-paper.txt"
os.system("curl -o %s %s" % (local_coco_classes_txt, coco_classes_txt))
NUM_PREDICTIONS = 5

with open(local_coco_classes_txt) as f:
    classes = ["No Class"] + [line.strip() for line in f.readlines()]


def main(_):
    channel = grpc.insecure_channel(FLAGS.server)
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    # Send request
    with Image.open(FLAGS.image) as f:
        f.load()
        # See prediction_service.proto for gRPC request/response details.
        data = np.asarray(f)
        data = np.expand_dims(data, axis=0)

        request = predict_pb2.PredictRequest()
        request.model_spec.name = 'ssdresnet'
        request.inputs['inputs'].CopyFrom(
            tf.make_tensor_proto(data, shape=data.shape))
        result = stub.Predict(request, 60.0)  # 60 secs timeout
        outputs = result.outputs
        detection_classes = outputs["detection_classes"]
        detection_classes = tf.make_ndarray(detection_classes)
        num_detections = int(tf.make_ndarray(outputs["num_detections"])[0])
        print("%d detection[s]" % (num_detections))
        class_label = [classes[int(x)]
                       for x in detection_classes[0][:num_detections]]
        print("SSD Prediction is ", class_label)


if __name__ == '__main__':
    tf.compat.v1.app.run()

8. Now run the script passing the server location, port, and the dog photo's ﬁlename as the parameters.

python ssd_resnet_client.py --server=localhost:9000 --image 3dogs.jpg
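The server launch in step 5 can also be assembled programmatically. The following is a minimal sketch that builds the argument list and environment without starting the server. The function name `server_launch` is hypothetical; the binary name, flags, and EI_VISIBLE_DEVICES variable are the ones shown in step 5.

```python
import os


def server_launch(ordinal, model_name, model_base_path, port=9000):
    """Return (argv, env) for launching the model server on one accelerator."""
    # The guide requires model_base_path to be an absolute path.
    if not os.path.isabs(model_base_path):
        raise ValueError("model_base_path must be an absolute path")
    # Copy the current environment and select the accelerator by ordinal.
    env = dict(os.environ, EI_VISIBLE_DEVICES=str(ordinal))
    argv = [
        "amazonei_tensorflow2_model_server",
        "--model_name=%s" % model_name,
        "--model_base_path=%s" % model_base_path,
        "--port=%d" % port,
    ]
    return argv, env


argv, env = server_launch(0, "ssdresnet", "/tmp/ssd_resnet50_v1_coco")
print(" ".join(argv))
```

The returned pair could be passed to `subprocess.Popen(argv, env=env)` from the folder where the server binary is installed. Checking the path up front surfaces a relative model_base_path before the server starts.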

Use Elastic Inference with the TensorFlow 2 EIPredictor API

Elastic Inference TensorFlow packages for Python 3 provide an EIPredictor API. This API function provides you with a flexible way to run models on Elastic Inference accelerators as an alternative to using TensorFlow 2 Serving. The EIPredictor API provides a simple interface to perform repeated inference on a pretrained model. The following code sample shows the available parameters.

Note

accelerator_id should be set to the device's ordinal number, not its ID.

ei_predictor = EIPredictor(model_dir,
                           signature_def_key=None,
                           signature_def=None,
                           input_names=None,
                           output_names=None,
                           tags=None,
                           graph=None,
                           config=None,
                           use_ei=True,
                           accelerator_id=<device ordinal number>)
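Because accelerator_id expects an ordinal and not a device ID, it can help to translate one into the other before creating the predictor. The following is a minimal sketch, assuming a device list with the "ordinal" and "id" fields that EI Tool reports. The helper name `resolve_ordinal` is hypothetical and is not part of the EIPredictor API.

```python
def resolve_ordinal(devices, selector):
    """Return the integer ordinal for a device ordinal or a device ID."""
    for device in devices:
        if selector == device["ordinal"] or selector == device["id"]:
            return device["ordinal"]
    raise LookupError("no attached accelerator matches %r" % (selector,))


# Device entries as reported by `ei describe-accelerators --json`.
devices = [
    {"ordinal": 0, "id": "eia-679e4c622d584803aed5b42ab6a97706"},
    {"ordinal": 1, "id": "eia-6c414c6ee37a4d93874afc00825c2f28"},
]

accelerator_id = resolve_ordinal(devices, "eia-6c414c6ee37a4d93874afc00825c2f28")
print(accelerator_id)
```

The resulting integer is the value to pass as accelerator_id.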

You can use EIPredictor in the following ways:

# EIPredictor picks inputs and outputs from the default serving signature def
# with the tag "serve" (similar to TF Predictor).
ei_predictor = EIPredictor(model_dir)

# EIPredictor picks inputs and outputs from the signature def selected by
# signature_def_key (similar to TF Predictor).
ei_predictor = EIPredictor(model_dir, signature_def_key='predict')

# The signature def can be provided directly (similar to TF Predictor).
ei_predictor = EIPredictor(model_dir, signature_def=sig_def)

# You provide the input_names and output_names dicts (similar to TF Predictor).
ei_predictor = EIPredictor(model_dir, input_names=input_names, output_names=output_names)

# tags is used to get the correct signature def (similar to TF Predictor).
ei_predictor = EIPredictor(model_dir, tags='serve')

Additional EI Predictor functionality includes the following:

• Support for frozen models.

# For frozen graphs, model_dir takes a file name.
# input_names and output_names must be provided in this case.
ei_predictor = EIPredictor(model_dir,
                           input_names=input_names,
                           output_names=output_names)

• Ability to disable use of Elastic Inference by using the use_ei ﬂag, which defaults to True. This is useful for testing EIPredictor against TensorFlow 2 Predictor.

• EIPredictor can also be created from a TensorFlow 2 Estimator. Given a trained Estimator, you can ﬁrst export a SavedModel. See the SavedModel documentation for more details. The following shows example usage:

saved_model_dir = estimator.export_saved_model(my_export_dir, serving_input_fn)
ei_predictor = EIPredictor(export_dir=saved_model_dir)

# Once the EIPredictor is created, run inference by calling it with a feed dictionary:
output_dict = ei_predictor(feed_dict)

Use Elastic Inference with TensorFlow 2 Predictor Example

Installing Elastic Inference TensorFlow 2

Elastic Inference enabled TensorFlow 2 comes bundled in the AWS Deep Learning AMI. You can also download the pip wheels for Python 3 from the Elastic Inference S3 bucket. Follow these instructions to download and install the pip package:

1. Choose the tar ﬁle for the Python version and operating system of your choice from the S3 bucket.

Copy the path to the tar ﬁle and run the following command:

curl -O [URL of the tar file of your choice]

2. To extract the tar file, run the following command:

tar -xvzf [name of tar file]

3. Install the wheel using pip as shown in the following:

pip install -U [name of untarred folder]/[name of tensorflow whl]

To serve diﬀerent models, such as ResNet, using a Single Shot Detector (SSD), try the following example.


To serve and test inference with an SSD model

1. Download and unzip the model. If you already have the model, skip this step.

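curl -O https://s3-us-west-2.amazonaws.com/aws-tf-serving-ei-example/ssd_resnet.zip
unzip ssd_resnet.zip -d /tmp

2. Download a picture of three dogs to your current directory.

curl -O https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/3dogs.jpg

3. Use the built-in EI Tool to get the device ordinal number of all attached Elastic Inference accelerators. For more information on EI Tool, see Monitoring Elastic Inference Accelerators.

/opt/amazon/ei/ei_tools/bin/ei describe-accelerators --json

Your output should look like the following:

{
  "ei_client_version": "1.5.0",
  "time": "Fri Nov 1 03:09:38 2019",
  "attached_accelerators": 2,
  "devices": [
    {
      "ordinal": 0,
      "type": "eia1.xlarge",
      "id": "eia-679e4c622d584803aed5b42ab6a97706",
      "status": "healthy"
    },
    {
      "ordinal": 1,
      "id": "eia-6c414c6ee37a4d93874afc00825c2f28",
      "status": "healthy"
    }
  ]
}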

You use the device ordinal of your desired Elastic Inference accelerator to create a Predictor.

4. Open a text editor, such as vim, and paste the following inference script. Replace the accelerator_id value with the device ordinal of the desired Elastic Inference accelerator. This value must be an integer. Save the file as ssd_resnet_predictor.py.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import sys

import numpy as np
import tensorflow as tf
import matplotlib.image as mpimg
from ei_for_tf.python.predictor.ei_predictor import EIPredictor

tf.compat.v1.app.flags.DEFINE_string('image', '', 'path to image in JPEG format')
FLAGS = tf.compat.v1.app.flags.FLAGS

# Load the COCO class labels, as in the TensorFlow Serving client script.
coco_classes_txt = "https://raw.githubusercontent.com/amikelive/coco-labels/master/coco-labels-paper.txt"
local_coco_classes_txt = "/tmp/coco-labels-paper.txt"
os.system("curl -o %s %s" % (local_coco_classes_txt, coco_classes_txt))
NUM_PREDICTIONS = 5

with open(local_coco_classes_txt) as f:
    classes = ["No Class"] + [line.strip() for line in f.readlines()]


def get_output(eia_predictor, test_input):
    pred = None
    for curpred in range(NUM_PREDICTIONS):
        pred = eia_predictor(test_input)

    num_detections = int(pred["num_detections"])
    print("%d detection[s]" % (num_detections))
    detection_classes = pred["detection_classes"][0][:num_detections]
    print([classes[int(i)] for i in detection_classes])


def main(_):
    img = mpimg.imread(FLAGS.image)
    img = np.expand_dims(img, axis=0)
    ssd_resnet_input = {'inputs': img}

    print('Running SSD Resnet on EIPredictor using specified input and outputs')
    eia_predictor = EIPredictor(
        model_dir='/tmp/ssd_resnet50_v1_coco/1/',
        input_names={"inputs": "image_tensor:0"},
        output_names={"detection_classes": "detection_classes:0",
                      "num_detections": "num_detections:0",
                      "detection_boxes": "detection_boxes:0"},
        accelerator_id=0
    )
    get_output(eia_predictor, ssd_resnet_input)

    print('Running SSD Resnet on EIPredictor using default Signature Def')
    eia_predictor = EIPredictor(
        model_dir='/tmp/ssd_resnet50_v1_coco/1/',
    )
    get_output(eia_predictor, ssd_resnet_input)


if __name__ == "__main__":
    tf.compat.v1.app.run()

5. Run the inference script.

python ssd_resnet_predictor.py --image 3dogs.jpg
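The script calls the predictor NUM_PREDICTIONS times but does not report how long each call takes. The following is a minimal sketch of a timing loop, assuming only that the predictor is a callable that takes a feed dictionary. The names `time_predictions` and `fake_predictor` are hypothetical, and the stand-in does no real inference.

```python
import time


def time_predictions(predictor, feed_dict, num_predictions=5):
    """Call `predictor` repeatedly and return per-call latencies in seconds."""
    latencies = []
    for _ in range(num_predictions):
        start = time.perf_counter()
        predictor(feed_dict)
        latencies.append(time.perf_counter() - start)
    return latencies


def fake_predictor(feed_dict):
    # Stand-in for an EIPredictor instance; it only reports the batch size.
    return {"num_detections": [len(feed_dict["inputs"])]}


latencies = time_predictions(fake_predictor, {"inputs": [[0, 0, 0]]})
print("first call: %.6f s, best of rest: %.6f s"
      % (latencies[0], min(latencies[1:])))
```

Reporting the first call separately is a design choice: with a real predictor the first call may include one-time setup, which is also why the Keras example in this guide warms up both models before benchmarking.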

Use Elastic Inference with the TensorFlow 2 Keras API

The Keras API has become an integral part of the machine learning development cycle because of its simplicity and ease of use. Keras enables rapid prototyping and development of machine learning constructs. Elastic Inference provides an API that oﬀers native support for Keras. Using this API, you can directly use your Keras model, h5 ﬁle, and weights to instantiate a Keras-like Object. This object supports the native Keras prediction APIs, while fully utilizing Elastic Inference in the backend. Currently, EIKerasModel is only supported in Graph Mode. The following code sample shows the available parameters:


EIKerasModel(model, weights=None, export_dir=None):
    """Constructs an `EIKerasModel` instance.

    Args:
      model: A model object that either has its weights already set, or will
        be set with the weights argument; or a model file that can be loaded.
      weights (Optional): A weights object, or weights file that can be
        loaded, and will be set to the model object.
      export_dir: A folder location to save your model as a SavedModelBundle.

    Raises:
      RuntimeError: If eager execution is enabled.
    """

EIKerasModel can be used as follows:

# Loading from a Keras Model object
from ei_for_tf.python.keras.ei_keras import EIKerasModel
model = Model()  # Build the Keras model in the normal fashion
x = ...  # input data
ei_model = EIKerasModel(model)  # Only additional step to use EI
res = ei_model.predict(x)

# Loading from a Keras h5 file
from ei_for_tf.python.keras.ei_keras import EIKerasModel
x = ...  # input data
ei_model = EIKerasModel("keras_model.h5")  # Only additional step to use EI
res = ei_model.predict(x)

# Loading from a Keras JSON file and a weights file
from ei_for_tf.python.keras.ei_keras import EIKerasModel
x = ...  # input data
ei_model = EIKerasModel("keras_model.json", weights="keras_weights.h5")  # Only additional step to use EI
res = ei_model.predict(x)

Additionally, Elastic Inference enabled Keras includes Predict API support as follows:

tf.keras

def predict(
    x,
    batch_size=None,
    verbose=0,
    steps=None,
    max_queue_size=10,           # Not supported
    workers=1,                   # Not supported
    use_multiprocessing=False):  # Not supported

Native Keras

def predict(
    x,
    batch_size=None,
    verbose=0,
    steps=None,
    callbacks=None):  # Not supported

TensorFlow 2 Keras API Example

In this example, you use a trained ResNet-50 model to classify an image of an African Elephant from ImageNet.

To test inference with a Keras model

1. Activate the Elastic Inference TensorFlow Conda Environment

source activate amazonei_tensorflow2_p36

2. Download an image of an African Elephant to your current directory.

curl -O https://upload.wikimedia.org/wikipedia/commons/5/59/Serengeti_Elefantenbulle.jpg

3. Open a text editor, such as vim, and paste the following inference script. Save the ﬁle as test_keras.py.

# Resnet Example
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions
from ei_for_tf.python.keras.ei_keras import EIKerasModel
import numpy as np
import time
import os
import tensorflow as tf

# EIKerasModel is only supported in graph mode.
tf.compat.v1.disable_eager_execution()

ITERATIONS = 20

model = ResNet50(weights='imagenet')
ei_model = EIKerasModel(model)

folder_name = os.path.dirname(os.path.abspath(__file__))
img_path = folder_name + '/Serengeti_Elefantenbulle.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

# Warm up both models
_ = model.predict(x)
_ = ei_model.predict(x)

# Benchmark both models
for each in range(ITERATIONS):
    start = time.time()
    preds = model.predict(x)
    print("Vanilla iteration %d took %f" % (each, time.time() - start))

for each in range(ITERATIONS):
    start = time.time()
    ei_preds = ei_model.predict(x)
    print("EI iteration %d took %f" % (each, time.time() - start))

# decode the results into a list of tuples (class, description, probability)
# (one such list for each sample in the batch)
print('Predicted:', decode_predictions(preds, top=3)[0])
print('EI Predicted:', decode_predictions(ei_preds, top=3)[0])

4. Run the inference script as follows:

python test_keras.py
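The script ends by asking decode_predictions for the three most likely classes. The following is a minimal sketch of that ranking step in plain Python. It illustrates the idea only and is not the Keras implementation; the function name `top_k`, the label list, and the probabilities are made up for the example.

```python
def top_k(probabilities, labels, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Illustrative values only; a real ImageNet model scores 1,000 classes.
labels = ["tusker", "Indian_elephant", "African_elephant", "zebra"]
probabilities = [0.078, 0.011, 0.908, 0.003]

best = top_k(probabilities, labels)
print(best)
```

Running the same ranking on the standard and the Elastic Inference predictions is a quick way to confirm that both models agree on the top classes.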
