Loading From Model Zoo - Amazon Elastic Inference

You can also use pre-trained models from the Gluon model zoo, as shown in the following example:

Note: All pre-trained models expect inputs to be normalized in the same way as described in the model zoo documentation.

import mxnet as mx
import eimx  # makes the 'EIA' backend available (see Troubleshooting)
from mxnet.gluon.model_zoo import vision

ctx = mx.cpu()  # assumption: MXNet 1.7.0 or later, where mx.cpu() replaces mx.eia()

with open('synset.txt', 'r') as f:
    labels = [l.rstrip() for l in f]

fname = mx.test_utils.download('https://github.com/dmlc/web-data/blob/master/mxnet/doc/tutorials/python/predict_image/cat.jpg?raw=true')
img = mx.image.imread(fname)

# convert into format (batch, RGB, width, height)
img = img.as_in_context(ctx)  # image must be with EIA context
img = mx.image.imresize(img, 224, 224)  # resize
img = mx.image.color_normalize(img.astype(dtype='float32')/255,
                               mean=mx.nd.array([0.485, 0.456, 0.406]),
                               std=mx.nd.array([0.229, 0.224, 0.225]))  # normalize
img = img.transpose((2, 0, 1))  # channel first
img = img.expand_dims(axis=0)  # batchify

resnet50 = vision.resnet50_v1(pretrained=True, ctx=ctx)
resnet50.hybridize(backend="EIA", static_alloc=True, static_shape=True)  # hybridize with EIA as backend

prob = resnet50(img).softmax()  # predict and normalize output
idx = prob.topk(k=5)[0]  # get top 5 results
for i in idx:
    i = int(i.asscalar())
    print('With prob = %.5f, it contains %s' % (prob[0,i].asscalar(), labels[i]))
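The normalization step in the example above is simple per-channel arithmetic: scale each 8-bit value to the range 0 to 1, subtract the channel mean, and divide by the channel standard deviation. The following sketch reproduces that arithmetic in plain Python for a single RGB pixel, using the mean and standard deviation values from the example. The helper name normalize_pixel is hypothetical and is not part of MXNet.

```python
# Per-channel statistics taken from the example above.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Scale 8-bit RGB values to [0, 1], then subtract the mean and divide by std."""
    return tuple((value / 255.0 - m) / s for value, m, s in zip(rgb, MEAN, STD))


# A pixel close to the channel means ends up close to zero in every channel.
print(normalize_pixel((124, 116, 104)))
```

Applying the same mean and standard deviation that the model saw during training keeps inference inputs in the value range the pre-trained weights expect.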

Troubleshooting

• When you call sym.optimize_for('EIA'), if you get the following error message:

[22:00:31] src/c_api/c_api_symbolic.cc:1498: Error optimizing for backend 'EIA' cannot be found

You might have forgotten to import the eimx package.
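One way to rule out this cause early is to check that the eimx package can be found before you call sym.optimize_for('EIA'). The following is a minimal sketch that uses only the Python standard library. The helper name eimx_installed is hypothetical, and the sketch assumes that importing eimx is what makes the 'EIA' backend available, as the error message above suggests.

```python
import importlib.util


def eimx_installed():
    """Return True if the eimx package can be found on the import path."""
    return importlib.util.find_spec("eimx") is not None


if eimx_installed():
    # Assumption: the import itself makes the 'EIA' backend available to MXNet.
    import eimx  # noqa: F401
else:
    print("eimx is not installed; optimizing for the 'EIA' backend will fail")
```

This check only confirms that the package is present. It does not confirm that the installed eimx version matches your MXNet version.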

• When you run inference, if you do not see the following EIA initialization message:

Using Amazon Elastic Inference Client Library Version: 1.8.0
Number of Elastic Inference Accelerators Available: 1
Elastic Inference Accelerator ID: eia-22cb7576547447dbb5718cbfe4e3f0ce
Elastic Inference Accelerator Type: eia2.xlarge
Elastic Inference Accelerator Ordinal: 0

You might have forgotten to call sym.optimize_for('EIA') or block.hybridize(backend='EIA') to prepare your model for running on EIA. If neither is called, inference runs on the CPU instead of on the Elastic Inference accelerator.
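If you capture the output of your inference script (for example, by redirecting it to a file), you can check for the initialization message programmatically. The following sketch assumes the message has the field labels shown above and that you already have the captured output as a string. The helper name parse_eia_banner is hypothetical.

```python
import re

# Field labels as they appear in the initialization message shown above.
_FIELD = re.compile(
    r"^(Elastic Inference Accelerator (?:ID|Type|Ordinal)):\s*(\S+)",
    re.MULTILINE,
)


def parse_eia_banner(log_text):
    """Return the accelerator fields found in log_text, or an empty dict."""
    return {
        label.rsplit(" ", 1)[-1].lower(): value
        for label, value in _FIELD.findall(log_text)
    }


sample = """Using Amazon Elastic Inference Client Library Version: 1.8.0
Number of Elastic Inference Accelerators Available: 1
Elastic Inference Accelerator ID: eia-22cb7576547447dbb5718cbfe4e3f0ce
Elastic Inference Accelerator Type: eia2.xlarge
Elastic Inference Accelerator Ordinal: 0
"""

info = parse_eia_banner(sample)
if not info:
    print("No EIA initialization message found; inference probably ran on the CPU")
else:
    print(info["type"], info["ordinal"])
```

An empty result is only a hint that the model was not prepared for EIA. It can also mean that the output was not captured.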

• If you upgrade from an earlier version and you get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'mxnet' has no attribute 'eia'

You might still have the legacy mx.eia() in your code. Replace instances of mx.eia() with mx.cpu() if you are using version 1.7.0 or later.

• Elastic Inference is only for production inference use cases and does not support any model training.

When you use either the Symbol API or the Module API, do not call the backward() method or call bind() with for_training=True. Because for_training defaults to True, set for_training=False explicitly, as in the example in Use Elastic Inference with the MXNet Module API.

• For Gluon, do not call training-specific functions, or you will receive the following error:

Using Amazon Elastic Inference Client Library Version: 1.8.0
Number of Elastic Inference Accelerators Available: #
Elastic Inference Accelerator ID: eia-####################
Elastic Inference Accelerator Type: eia#.#####
Elastic Inference Accelerator Ordinal: #
Error! Operator does not support backward
Traceback (most recent call last):
  File "gluon_train.py", line 130, in <module>
    train(opt.epochs, ctx)
  File "gluon_train.py", line 110, in train
    metric.update([label], [output])
  File "/home/ubuntu/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages/mxnet/metric.py", line 493, in update
    pred_label = pred_label.asnumpy().astype('int32')
  File "/home/ubuntu/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py", line 2566, in asnumpy
    ctypes.c_size_t(data.size)))
  File "/home/ubuntu/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages/mxnet/base.py", line 246, in check_call
    raise get_last_ffi_error()
mxnet.base.MXNetError: Traceback (most recent call last):
  File "src/c_api/c_api.cc", line 318
MXNetError: Check failed: callFStatefulComp(stateful_forward_flag, state_op_inst, in_shapes.data(), in_dims.data(), in_data.data(), in_types.data(), in_verIDs.data(), in_dev_type.data(), in_dev_id.data(), in_data.size(), out_shapes.data(), out_dims.data(), out_data.data(), out_types.data(), out_verIDs.data(), out_dev_type.data(), out_dev_id.data(), out_data.size(), cpu_malloc, &cpu_alloc, gpu_malloc, &gpu_alloc, cuda_stream, sparse_malloc, &sparse_alloc, in_stypes.data(), out_stypes.data(), in_indices.data(), out_indices.data(), in_indptr.data(), out_indptr.data(), in_indices_shapes.data(), out_indices_shapes.data(), in_indptr_shapes.data(), out_indptr_shapes.data(), rng_cpu_states, rng_gpu_states): Error calling FStatefulCompute for custom operator '_eia_subgraph_op'

• Because training is not supported, there is no reason to initialize an optimizer for inference.

• A model trained on an earlier version of MXNet works on a later version of MXNet Elastic Inference because it is backward compatible (for example, train a model on MXNet 1.3 and run it on MXNet Elastic Inference 1.4). However, you may run into undefined behavior if you train on a later version of MXNet (for example, train a model on MXNet master and run it on MXNet Elastic Inference 1.4).

• Different sizes of Elastic Inference accelerators have different amounts of GPU memory. If your model requires more GPU memory than is available in your accelerator, you get a message that looks like the following log. If you see this message, stop your instance and restart it with a larger accelerator that has more memory.

mxnet.base.MXNetError: [06:16:17] src/operator/subgraph/eia/eia_subgraph_op.cc:206: Last Error:

EI Error Code: [51, 8, 31]

EI Error Description: Accelerator out of memory. Consider using a larger accelerator.

EI Request ID: MX-A19B0DE6-7999-4580-8C49-8EA 7ADSFFCB -- EI Accelerator ID: eia-cb0aasdfdfsdf2a acab7

EI Client Version: 1.2.12
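If you want your application to recognize this condition, you can inspect the text of the error. The following sketch assumes the error text contains the "EI Error Code" and "EI Error Description" lines in the format shown above. The helper names parse_ei_error and is_accelerator_oom are hypothetical.

```python
import re

_EI_CODE = re.compile(r"EI Error Code:\s*\[([\d,\s]+)\]")
_EI_DESC = re.compile(r"EI Error Description:\s*(.+)")


def parse_ei_error(message):
    """Extract the EI error code list and description from an error message."""
    code = _EI_CODE.search(message)
    desc = _EI_DESC.search(message)
    return {
        "code": [int(n) for n in code.group(1).split(",")] if code else None,
        "description": desc.group(1).strip() if desc else None,
    }


def is_accelerator_oom(message):
    """Return True if the message reports that the accelerator is out of memory."""
    description = parse_ei_error(message)["description"] or ""
    return "out of memory" in description.lower()


sample = (
    "EI Error Code: [51, 8, 31]\n"
    "EI Error Description: Accelerator out of memory. "
    "Consider using a larger accelerator.\n"
)
print(parse_ei_error(sample)["code"], is_accelerator_oom(sample))
```

In an application, you could pass the string form of a caught exception to is_accelerator_oom and log a clear instruction to switch to a larger accelerator.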

• For Gluon, make sure you hybridize the model and pass the static_alloc=True and static_shape=True options. Otherwise, the model is loaded again for each inference call, which can degrade performance and cause out-of-memory (OOM) errors. For more information about OOM errors, see the preceding item.

• Calling reshape explicitly by using either the Module or the Symbol API, or implicitly by using different shapes for input NDArrays in different forward passes, can lead to OOM errors. The model that existed before the reshape is not cleaned up on the accelerator until the session is destroyed. In Gluon, inferring with inputs of differing shapes results in the model re-allocating memory. For Elastic Inference, this means that the model is reloaded on the accelerator, which leads to performance degradation and potential OOM errors. You can either pad your data so that all shapes are the same, or bind the model with different shapes to use multiple executors. The latter option may itself result in OOM errors because the model is duplicated on the accelerator. An OOM error in this situation looks like the following:

[Fri Feb 19 01:47:49 2021, 397658us] [Execution Engine][MXNet][3] Failed - Last Error:
EI Error Code: [51, 8, 31]
EI Error Description: Accelerator out of memory. Consider using a larger accelerator.
EI Request ID: MX-78E568D8-9105-468A-8E1C-7D1FFDF9934E -- EI Accelerator ID: eia-09803cc86d4044e6b4e8d4a8ecd0267e
EI Client Version: 1.8.0
src/eia_lib.cc:88 Error: Last Error:
EI Error Code: [51, 8, 31]
EI Error Description: Accelerator out of memory. Consider using a larger accelerator.
EI Request ID: MX-78E568D8-9105-468A-8E1C-7D1FFDF9934E -- EI Accelerator ID: eia-09803cc86d4044e6b4e8d4a8ecd0267e
EI Client Version: 1.8.0
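Padding inputs to one fixed shape, as suggested in the reshape item above, can be sketched in plain Python for variable-length sequences. The helper name pad_to_length is hypothetical, and the choice to truncate sequences that are longer than the target length is an assumption of this sketch rather than a recommendation from this guide.

```python
def pad_to_length(sequence, length, pad_value=0):
    """Return a copy of sequence padded (or truncated) to exactly `length` items."""
    padded = list(sequence[:length])
    padded.extend([pad_value] * (length - len(padded)))
    return padded


batches = [[3, 1, 4], [1, 5], [9, 2, 6, 5, 3]]
fixed = [pad_to_length(seq, 4) for seq in batches]
print(fixed)  # [[3, 1, 4, 0], [1, 5, 0, 0], [9, 2, 6, 5]]
```

Because every padded input has the same length, each forward pass presents the same input shape, so the model does not need to be reloaded on the accelerator.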

• If you get an error importing the eimx package similar to the following:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/eimx/__init__.py", line 20, in <module>
    mxnet.library.load(path_lib, debug)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/library.py", line 56, in load
    check_call(_LIB.MXLoadLib(chararr, mx_uint(verbose_val)))
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/base.py", line 246, in check_call
    raise get_last_ffi_error()
mxnet.base.MXNetError: Traceback (most recent call last):
  File "src/c_api/c_api.cc", line 1521
MXNetError: Library version (7) does not match MXNet version (10)

You might be using the wrong version of MXNet. MXNet release 1.7.0 uses version 7 and MXNet release 1.8.0 uses version 10. The eimx-1.0 package must be used with MXNet release 1.7.0 only.
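The two numbers in the mismatch message can be translated back into MXNet releases by using the mapping stated above (version 7 for release 1.7.0, version 10 for release 1.8.0). The following sketch does this for an error string. The helper name explain_mismatch is hypothetical, and the mapping deliberately contains only the two releases named in this guide.

```python
import re

# Library versions stated above; other releases are deliberately not listed.
LIBRARY_VERSION_TO_RELEASE = {7: "1.7.0", 10: "1.8.0"}

_MISMATCH = re.compile(
    r"Library version \((\d+)\) does not match MXNet version \((\d+)\)"
)


def explain_mismatch(message):
    """Describe a library/MXNet version mismatch, or return None if not present."""
    match = _MISMATCH.search(message)
    if match is None:
        return None
    library, installed = (int(group) for group in match.groups())
    wanted = LIBRARY_VERSION_TO_RELEASE.get(library, "unknown")
    found = LIBRARY_VERSION_TO_RELEASE.get(installed, "unknown")
    return f"eimx was built for MXNet {wanted}, but MXNet {found} is installed"


print(explain_mismatch(
    "MXNetError: Library version (7) does not match MXNet version (10)"
))
```

For the message shown above, the sketch reports that eimx was built for MXNet 1.7.0 while MXNet 1.8.0 is installed.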

• If you get an error importing the eimx package similar to either of the following:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/.local/lib/python3.6/site-packages/eimx/__init__.py", line 20, in <module>
    mxnet.library.load(path_lib, debug)
AttributeError: module 'mxnet' has no attribute 'library'

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/.local/lib/python3.6/site-packages/eimx/__init__.py", line 20, in <module>
    mxnet.library.load(path_lib, debug)
TypeError: load() takes 1 positional argument but 2 were given

You might be using an older version of MXNet. Check that you are using MXNet release 1.7.0 with the eimx-1.0 package. After you install the correct version of MXNet, you should see the following message when the eimx package is imported successfully:

src/eia_lib.cc:264 MXNet version 10700 supported
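The number 10700 in this message is consistent with release 1.7.0 if the version is encoded as major * 10000 + minor * 100 + patch. The following sketch decodes a number under that assumption. The encoding is inferred from the message rather than stated in this guide, and the helper name decode_mxnet_version is hypothetical.

```python
def decode_mxnet_version(number):
    """Split an integer such as 10700 into (major, minor, patch).

    Assumes the encoding major * 10000 + minor * 100 + patch.
    """
    major, remainder = divmod(number, 10000)
    minor, patch = divmod(remainder, 100)
    return major, minor, patch


print(decode_mxnet_version(10700))  # (1, 7, 0)
```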

• If you get an error similar to the following:

[22:26:23] src/executor/graph_executor.cc:1981: Subgraph backend MKLDNN is activated.

python: /root/deps/aws-sdk-cpp/aws-cpp-sdk-core/source/utils/UUID.cpp:83: static Aws::Utils::UUID Aws::Utils::UUID::RandomUUID(): Assertion `secureRandom' failed.

Aborted (core dumped)

You might have tried to save the model after running sym.optimize_for('EIA') and then reload that model later. Currently, models optimized for EIA cannot be saved and reloaded. You must call sym.optimize_for('EIA') at the beginning of your script every time you reload your model from disk. The time it takes to partition your model and optimize it for EIA is relatively small, so there is no benefit in saving and reloading the optimized model anyway.
