
IEEE Proof

DNNOff: Offloading DNN-Based Intelligent IoT Applications in Mobile Edge Computing

Xing Chen, Member, IEEE, Ming Li, Hao Zhong, Member, IEEE, Yun Ma, Member, IEEE, and Ching-Hsien Hsu, Senior Member, IEEE

Abstract—Deep neural networks (DNNs) have become increasingly popular in industrial Internet of Things scenarios. Due to their high demands on computational capability, it is hard for DNN-based applications to run directly on intelligent end devices with limited resources. Computation offloading offers a feasible solution by offloading computation-intensive tasks to the cloud or edges. Supporting such a capability is not easy for two reasons: 1) adaptability: offloading should occur dynamically among computation nodes; and 2) effectiveness: it must be determined which parts are worth offloading. This article proposes a novel approach called DNNOff. For a given DNN-based application, DNNOff first rewrites the source code to implement a special program structure that supports on-demand offloading and then, at runtime, automatically determines the offloading scheme. We evaluated DNNOff on a real-world intelligent application with three DNN models. Our results show that, compared with other approaches, DNNOff reduces response time by 12.4–66.6% on average.

Index Terms—Computation offloading, deep neural networks (DNNs), intelligent Internet of Things (IoT) application, mobile edge computing (MEC), software adaption.

I. INTRODUCTION

Recent years have witnessed remarkable improvements in deep neural networks (DNNs). As a core machine learning technique [1], DNNs have been applied in industrial Internet of Things (IoT) scenarios such as computer vision [2] and self-driving cars [3]. Meanwhile, more and more trained deep learning models have been deployed on intelligent end devices, such as wearable devices [4], vehicles [5], and unmanned aerial vehicles [6]. In this article, we refer to such trained models as DNN-based intelligent IoT applications.

Manuscript received February 22, 2021; revised March 17, 2021 and March 29, 2021; accepted April 18, 2021. This work was supported in part by the National Natural Science Foundation of China under Grant 62072108 and in part by the Natural Science Foundation of Fujian Province for Distinguished Young Scholars under Grant 2020J06014. Paper no. TII-21-0800. (Corresponding author: Hao Zhong.)

Xing Chen and Ming Li are with the College of Mathematics and Computer Science and the Fujian Provincial Key Laboratory of Network Computing and Intelligent Information Processing, Fuzhou University, Fuzhou 350118, China (e-mail: [email protected]; [email protected]).

Hao Zhong is with the Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China (e-mail: [email protected]).

Yun Ma is with the Institute for Artificial Intelligence, Peking University, Beijing 100871, China, and also with the School of Software, Tsinghua University, Beijing 100084, China (e-mail: [email protected]).

Ching-Hsien Hsu is with the Department of Computer Science and Information Engineering, Asia University, Taichung 41354, Taiwan, and also with the Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi 621301, Taiwan (e-mail: [email protected]).

Color versions of one or more figures in this article are available at https://doi.org/10.1109/TII.2021.3075464.

Digital Object Identifier 10.1109/TII.2021.3075464

Due to limited computation and storage resources, complex DNN-based applications cannot run directly on intelligent end devices. One feasible solution is to offload all or part of the computational tasks to the cloud, which has sufficient resources [7], [8]. More specifically, DNNs are partitioned at the granularity of neural network layers [9]. Thus, computation-intensive layers can be offloaded to the cloud for execution, while simpler layers are processed locally.

However, network communication between end devices and the cloud is likely to cause significant execution delay, which seriously affects the user experience. To address this delay problem, mobile edge computing (MEC) has been introduced [10]. Mobile edges provide computing capabilities in close proximity to end devices and enable highly demanding applications to run on end devices with significantly lower latency. Although MEC provides new opportunities to offload DNN-based applications among end devices, the cloud, and nearby edges, prior approaches do not consider how to offload them in this new environment. On the one hand, because the environment is constantly changing, the offloading scheme of a DNN model must be flexible enough to adapt. On the other hand, when an offloading scheme determines which layers to offload and where to offload them, it must trade off the reduced execution time against the network delay as the environment changes.

To fully release the potential of offloading, an offloading mechanism shall support on-demand changes for DNN-based applications and shall enable different parts of the DNN model to execute on different computing nodes (including end devices, the cloud, and edge servers). In addition, an efficient estimation model is needed to determine which layers shall be offloaded. In summary, our main research questions are: 1) How to design a mechanism that supports the automatic offloading of DNN-based applications in the MEC environment? 2) How to build an estimation model that determines the optimal offloading scheme? After the above questions are carefully handled, the problem of offloading can be reduced to

See https://www.ieee.org/publications/rights/index.html for more information.


To address the aforementioned questions, we present a novel approach called DNNOff, which supports offloading DNN-based applications in the MEC environment. This article makes the following major contributions:

1) an offloading mechanism that enables DNN-based applications to be offloaded automatically and dynamically in the MEC environment. To achieve this, DNNOff translates a DNN-based application to a target program that is easier to offload;

2) an effective model to predict the latency of offloading schemes. DNNOff first extracts the structure and parameters of the DNN model and then uses a random forest regression model to predict the execution cost of each layer. Based on the prediction model, DNNOff can determine which parts shall be moved to MEC servers; and

3) an evaluation on a real-world DNN-based application with AlexNet, VGG, and ResNet models. Our results show that DNNOff reduces the response time by 12.4–66.6% for complex DNN-based applications.

The rest of this article is organized as follows. Section II reviews the related work. Section III presents our approach, and Section IV evaluates it on a real-world application. Section V discusses some issues about applicability. Section VI introduces industrial applications. Finally, Section VII concludes this article.

II. RELATED WORK

Mobile devices are generally limited in storage space, battery life, and computing power [12]. To improve the performance of mobile applications, computation offloading has become the most widely used technology. Mobile cloud computing (MCC) improves application performance by sending computing-intensive components from end devices to the cloud. These applications are partitioned at different granularities, such as method, thread, and class. For example, MAUI [13] supports offloading at the granularity of methods. It lets developers annotate which parts of a program can be offloaded to the cloud and makes offloading decisions at runtime. CloneCloud [14] is a thread-based computation offloading framework that modifies virtual machines to support seamless offloading of threads to the cloud. DPartner [15] offloads classes and uses a proxy mechanism to access class instances. Furthermore, it calculates the coupling between classes and divides them into two sets, which are deployed on the end device and the cloud server, respectively. However, MCC has an inherent limitation, namely, the long latency between end devices and remote clouds. Hence, MEC has been proposed, in which cloud services move toward nearby edges [16]. AndroidOff [17] equips mobile applications with offloading capability at the granularity of objects for MEC. It provides a mechanism to offload an object-oriented application and to determine which parts shall be offloaded. However, the approaches above cannot be applied to DNN-based applications.

Computation offloading for DNN-based applications has advanced further in recent years. Neurosurgeon [9] showed that uploading the large amounts of data produced by DNN models to the cloud over a wireless network leads to high latency and energy consumption. To improve the performance and energy efficiency of modern DNN-based applications, Neurosurgeon uses a lightweight scheduler to partition DNN-based applications automatically between end devices and the cloud at the granularity of neural network layers. Edgent [18] is a framework that automatically and intelligently selects the best partition point of a DNN model to satisfy the requirement on execution latency. Compared with Neurosurgeon, Edgent can offload computation-intensive DNN layers to the remote server with a low transmission overhead, namely, the nearest computation node. Liu et al. [19] proposed a DNN-based image recognition framework for the MEC environment and realized a food image recognition system on an edge-computing-based service infrastructure. This allows the system to overcome some inherent limitations of the traditional MCC paradigm, such as high latency and energy consumption. Zhou et al. [20] proposed a robust mobile crowd sensing framework for the MEC environment, which reduces the service delay through edge-computing-based local processing. The above approaches assume that end devices use a single remote server for computation offloading, so they cannot make efficient use of the dispersed and changing computing resources in the MEC environment.

Fig. 1. Overview of DNNOff.

III. APPROACH

Fig. 1 presents the overview of DNNOff. In the figure, rectangles denote components and circles denote internal data; red edges denote data flows, and blue edges denote requests. DNNOff has three main components, namely, extraction, the offloading mechanism, and the estimation model. First, the extraction component extracts the structure and parameters of a DNN model (see Section III-A). Second, the offloading mechanism translates a DNN model to a target program that enables offloading (see Section III-B) and deploys it on the end devices and remote servers where offloading may occur. Finally, the estimation model component, deployed on the end device, synthesizes an optimized offloading scheme that executes different parts of the target program at proper locations, based on the DNN structure information and the surrounding MEC environment (see Section III-C). Moreover, the estimation model updates the offloading decision when the surrounding MEC environment changes. In Fig. 1, a DNN-based application and its MEC environment are presented on the right.

Fig. 2. Example of the DNN model.

A. Extracting Structure for the DNN Model

Fig. 2 shows an example of a DNN model. A DNN model consists of layers. In Fig. 2, layers are represented as squares in different colors. The yellow square represents a convolution layer, which translates an image to a feature map with learned filters. The blue one represents an activation layer, which applies a nonlinear function that accepts a feature map and generates an output with the same dimensions. The purple one represents a pooling layer, which can be defined as a general pooling, an average pooling, or a max pooling. The green one represents a fully connected layer, which calculates the weighted sum of its inputs using learned weights. The top of each square shows the name of the layer, such as “conv1” and “relu1,” and the bottom shows the parameters of the layer. For example, “channel:3” denotes that the value of the parameter “channel” is 3. The black arrows represent the data flow. DNNOff first extracts the structure of a DNN model, which includes the parameters of each layer and the data flow between layers. The structure is defined as follows.

Definition 1 (DNN model structure): A DNN model structure is a directed graph $G_D = (L, R)$ representing data transmissions between layers of a DNN $D$, where $L = \{l_1, l_2, \ldots, l_n\}$ is the set of layers of $D$ and $R$ is the set of data flow edges. Each edge $r_{ij} \in R$ represents a data flow from $l_i$ to $l_j$.

Definition 2 (DNN layer information): A layer consists of a type and a set of features, $l_i = \langle type, feature \rangle$, where $type$ is the type of the layer and $feature$ denotes the set of features (parameters) of the layer.

In general, a DNN-based application stores its trained model in a configuration file, such as the prototxt of Caffe2.¹ Our approach takes this file as input and obtains the DNN model graph $G_D = (L, R)$ through code analysis.
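Definitions 1 and 2 can be made concrete with a small data structure. The following Python sketch builds $G_D = (L, R)$ from an already-parsed list of layer records. The record fields (`name`, `type`, `params`, `inputs`) and the helper `build_model_graph` are hypothetical names chosen for illustration; the sketch does not parse a real Caffe2 configuration file, and the parameter values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """A layer l_i = <type, feature> (Definition 2)."""
    name: str
    type: str
    feature: dict = field(default_factory=dict)


def build_model_graph(spec):
    """Build G_D = (L, R) from a list of layer records.

    `spec` is a hypothetical, already-parsed stand-in for a model
    configuration file: each record names a layer, its type, its
    parameters, and the layers ("inputs") whose output it consumes.
    """
    layers = {}
    edges = set()
    for record in spec:
        name = record["name"]
        layers[name] = Layer(name, record["type"], dict(record.get("params", {})))
        for parent in record.get("inputs", []):
            edges.add((parent, name))  # data flow r_ij from l_i to l_j
    # Every edge must connect two known layers.
    for src, dst in edges:
        if src not in layers or dst not in layers:
            raise ValueError(f"edge ({src}, {dst}) references an unknown layer")
    return layers, edges


# Illustrative three-layer chain: conv1 -> relu1 -> pool1.
spec = [
    {"name": "conv1", "type": "conv", "params": {"channel": 3, "ksize": 11}, "inputs": []},
    {"name": "relu1", "type": "relu", "inputs": ["conv1"]},
    {"name": "pool1", "type": "pooling", "params": {"ksize": 3, "stride": 2}, "inputs": ["relu1"]},
]
L, R = build_model_graph(spec)
```

Here `L` maps layer names to `Layer` objects and `R` is a set of `(source, target)` pairs, so branching models are represented by a layer that appears as the source or target of several edges.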

B. Offloading Mechanism for the DNN Model

First, we translate an original application to a target program that follows the pipe-and-filter style. In this style, DNN layers are modeled as filters that receive and send data, and data flows between two layers are modeled as pipes that transmit the intermediate results. Second, we propose a “Pipe” engine to determine which neural network layer shall be offloaded.

¹[Online]. Available: https://caffe2.ai/

Fig. 3. Translation of a DNN program.

1) Target Program: We abstract a DNN program using the pipe-and-filter architecture style, based on which we propose a design pattern to support adaptive offloading in MEC.

A DNN program is essentially a data flow software architecture [21]. Each layer can be regarded as a filter, and the data transmission between layers can be regarded as a pipe. In a typical DNN program, each filter performs the calculation of a layer, whereas the pipe passes the result of the preceding layer as the input data of the succeeding layer.

To support adaptive offloading of DNN applications, the pipe should decide whether to transfer the data to the local filter or to a remote filter for the subsequent computation tasks. The filter should decide whether to perform the current computation task (the calculation of the current layer) or directly return the results.

The left-hand side of Fig. 3 shows the source program of a DNN-based application. It starts from the first layer and receives the initial data (i.e., an image). The result of each layer, namely the intermediate result, is hidden, and the output of the last layer is the return value. The statement $l_i = l_i.type(l_{i-1}, l_i.feature)$ indicates that the layer $l_i$ takes the result of $l_{i-1}$ as its input. The right-hand side of Fig. 3 presents the translated target program. It uses “Pipe” functions to connect the layers, such that the DNN model can be offloaded at the granularity of layers.

2) Code Translation: Our translation has three steps.

Step 1 (Adding the parameters “InitL” and “EndL” after “InitData”): For a given program, DNNOff automatically adds “InitL” and “EndL” to the parameter list. The two parameters represent the labels of the initial and the last layers, respectively. In addition, DNNOff adds “CurrentL,” which denotes the label of the layer to be executed. Meanwhile, “InitData” is assigned to the result of $l_{InitL-1}$ and used as the input to $l_{InitL}$. Thus, when the translated DNN program “DP” runs, the layers from $l_{InitL}$ to $l_{EndL}$ are executed.

Algorithm 1: Pipe.
Input: m, the label of the “Pipe” function
Output: CurrentL, the label of the layer to be executed
Declare:
  config: the offloading scheme that records the execution location of each layer;
  EndL: the label of the last layer that is executed at Local;
  $l_i$: the result of the $i$th layer
1: if m < CurrentL then
2:   return CurrentL
3: end if
4: if m == CurrentL and config[m] == Local then
5:   return CurrentL
6: end if
7: if m == CurrentL and config[m] != Local then
8:   k ← the label of the next layer that shall be executed at Local
9:   if k == Null then
10:     k ← EndL + 1
11:   end if
12:   $l_{k-1}$ ← remote($l_{CurrentL-1}$, CurrentL, k − 1)
13:   CurrentL ← k
14:   return CurrentL
15: end if

Step 2 (Adding a “Pipe(i)” function before each layer $l_i$): This function determines whether the layer $l_i$ shall be offloaded (see Section III-B3 for details).

Step 3 (Adding two if statements to check each layer $l_i$): The first statement is “if CurrentL == i,” where $l_i$ represents the $i$th layer. It checks whether the layer $l_i$ is the one to be executed currently. The second statement is “if CurrentL − 1 == EndL.” If the layer $l_{EndL}$ has been executed, its result is returned, and the layers after $l_{EndL}$ are skipped.

3) Computation Offloading at Runtime: At runtime, the “Pipe” functions connect the layers, each of which can be executed locally or remotely according to the offloading scheme. Algorithm 1 shows how the “Pipe” function works. Pipe(m) denotes the pipe between the layers $l_{m-1}$ and $l_m$, config is the offloading scheme that records the execution location of each layer, and CurrentL denotes the label of the layer to be executed.

When m < CurrentL, the layer $l_m$ has already been executed and does not need to be repeated (lines 1–3). Therefore, the layer $l_m$ is skipped.

When m == CurrentL and config[m] == Local (Local is a keyword representing the local node), the layer $l_m$ is to be executed locally (lines 4–6). Therefore, $l_m$ is executed and the value of CurrentL is increased by 1.

When m == CurrentL and config[m] != Local, the layer $l_m$ is to be executed remotely (lines 7–15). We then calculate the label k of the next layer that shall be executed locally (line 8); if no such layer exists, we assign EndL + 1 to k (lines 9–11). Finally, we run the program DP($l_{CurrentL-1}$, CurrentL, k − 1) on the remote node given by config[m] and assign its result to $l_{k-1}$ (line 12).

Fig. 4. Proposed design pattern of DNN programs.

Fig. 4 shows an example of adaptive offloading of a five-layer DNN that is executed on three computation nodes. Layers $l_1$ and $l_5$ are to be executed on the end device, layers $l_2$ and $l_3$ on Node A, and layer $l_4$ on Node B. First, the DNN program DP(InputData, 1, 5) runs on the end device, with CurrentL set to 1, EndL set to 5, and $l_0$ set to InputData; Pipe(1) and $l_1$ are executed, as config[1] is the end device; Pipe(2) is executed and the remote service is invoked, as config[2] is Node A. Second, the DNN program DP($l_1$, 2, 4) runs on Node A; Pipe(1) and $l_1$ are skipped, as CurrentL is 2; Pipe(2), $l_2$, Pipe(3), and $l_3$ are executed in sequence, as config[2] and config[3] are both Node A; Pipe(4) is executed and the remote service is invoked, as config[4] is Node B. Third, the DNN program DP($l_3$, 4, 4) runs on Node B; Pipe(1), $l_1$, Pipe(2), $l_2$, Pipe(3), and $l_3$ are all skipped, as CurrentL is 4; Pipe(4) and $l_4$ are executed, as config[4] is Node B; CurrentL then becomes 5, and the calculation result is returned to the DNN program on the end device. Finally, $l_5$ is executed on the end device and the output is produced.
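The walkthrough above can be reproduced with a minimal Python sketch of the translated program and its “Pipe” function for a linear chain of layers. It is an illustration under assumptions, not DNNOff's implementation: `make_dp`, the `trace` list, and the in-process `remote` stand-in (a plain recursive call instead of a network request) are hypothetical, and the search for k is limited to layers up to EndL.

```python
def make_dp(layers, config, trace):
    """Return dp(init_data, init_l, end_l, local), a sketch of the target program.

    layers: dict label -> function computing one layer (hypothetical stand-ins)
    config: dict label -> node name (the offloading scheme)
    trace:  list collecting (node, label) pairs, for illustration only
    """
    n = len(layers)

    def dp(init_data, init_l, end_l, local):
        results = {init_l - 1: init_data}  # InitData plays the role of l_{InitL-1}
        current = init_l                   # CurrentL

        def remote(data, first, last):
            # Stand-in for a remote call: run DP on the node that owns `first`.
            return dp(data, first, last, config[first])

        def pipe(m, current):
            if m < current:                              # lines 1-3: already done
                return current
            if m == current and config[m] == local:      # lines 4-6: run locally
                return current
            if m == current and config[m] != local:      # lines 7-15: offload
                # k: next layer assigned to the local node, else EndL + 1
                k = next((j for j in range(m + 1, end_l + 1)
                          if config[j] == local), end_l + 1)
                results[k - 1] = remote(results[current - 1], current, k - 1)
                return k
            return current

        for i in range(1, n + 1):
            current = pipe(i, current)
            if current == i:                 # first if statement of Step 3
                results[i] = layers[i](results[i - 1])
                trace.append((local, i))
                current += 1
            if current - 1 == end_l:         # second if statement of Step 3
                return results[end_l]
        return results.get(end_l)

    return dp


# Five-layer example: l1 and l5 on the end device, l2 and l3 on Node A, l4 on Node B.
layers = {i: (lambda x, i=i: x + [f"l{i}"]) for i in range(1, 6)}
config = {1: "ED", 2: "A", 3: "A", 4: "B", 5: "ED"}
trace = []
dp = make_dp(layers, config, trace)
output = dp([], 1, 5, "ED")
```

Each toy layer appends its own name to a list, so `output` records the order in which layers were applied, and `trace` records which node ran each layer.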

C. Estimation Model for the Offloading Scheme

1) Predicting Cost With Random Forest Regression: The execution time of each layer is an essential factor in the estimation model. If the layer $l_i$ is executed on the node $n_k$, we define its execution cost as follows.

Definition 3 (Execution cost): $Cost_{n_k}^{l_i} = \langle time, datasize \rangle$, where $time$ denotes the execution time from setting the input data to generating the output data, which depends on the performance of the execution node, and $datasize$ denotes the amount of data transmitted, which is a fixed value obtained by the extraction component.

Given the number of layers and the diversity of computing nodes, it is difficult to measure the execution time of each layer on each computing node. Thus, we use random forest (RF) regression to build prediction models for different layer types and computing nodes, which predict $Cost_{n_k}^{l_i}.time$. The RF regression model was proposed by Breiman [22] and has been shown to capture nonlinear relations between variables. It is a nonlinear model-building tool that is widely used in classification [23] and prediction [24].

TABLE I
FACTORS THAT CAN INFLUENCE THE OFFLOADING DECISION

Definition of the prediction model:

$$Y = predict(X) \tag{1}$$

$$\begin{aligned}
X_{conv} &= (channel, ksize, knumber, stride, padding) \\
X_{pooling} &= (channel, ksize, stride) \\
X_{relu} &= (in\_number, out\_number) \\
X_{fc} &= (in\_number, out\_number).
\end{aligned} \tag{2}$$

We use a dataset of historical data to train the prediction model; the data are collected from DNN applications, including AlexNet [25], VGG16, VGG19 [26], ResNet-50, and ResNet-152 [27]. The RF regression prediction model is represented as Equation (1). The input $X$ depends on the type of layer, as shown in Equation (2); the layer types include convolution layer, pooling layer, activation layer, and fully connected layer.
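Equation (2) amounts to a per-type feature schema. The sketch below shows one way to turn a layer's parameters into the input vector $X$ in Python; the table `FEATURES_BY_TYPE`, the function `feature_vector`, and the type keys (`conv`, `pooling`, `relu`, `fc`) are hypothetical names, and the parameter values are illustrative. The source does not name an implementation of the regressor; one option would be to train a separate model, such as scikit-learn's `RandomForestRegressor`, for each layer type and computing node on vectors of this form.

```python
# Feature names per layer type, ordered as in Equation (2).
FEATURES_BY_TYPE = {
    "conv": ("channel", "ksize", "knumber", "stride", "padding"),
    "pooling": ("channel", "ksize", "stride"),
    "relu": ("in_number", "out_number"),
    "fc": ("in_number", "out_number"),
}


def feature_vector(layer_type, params):
    """Return the input vector X for one layer, ordered as in Equation (2)."""
    try:
        names = FEATURES_BY_TYPE[layer_type]
    except KeyError:
        raise ValueError(f"unsupported layer type: {layer_type}") from None
    missing = [name for name in names if name not in params]
    if missing:
        raise ValueError(f"missing parameters for {layer_type}: {missing}")
    # Extra keys in `params` are ignored; only the schema's features are used.
    return tuple(params[name] for name in names)


# Illustrative convolution layer parameters (made-up values).
x_conv = feature_vector(
    "conv", {"channel": 3, "ksize": 11, "knumber": 96, "stride": 4, "padding": 0}
)
x_fc = feature_vector("fc", {"in_number": 4096, "out_number": 1000})
```

Keeping the schema in one table makes it explicit that layers of different types yield vectors of different lengths, which is why a separate prediction model per layer type is a natural fit.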

2) Contributory Factor: In this subsection, we introduce a context model that describes the environment (e.g., computation nodes) and the factors that affect the offloading decision.

The context architecture consists of an end device (ED), several nearby edges (NE), and a remote cloud (RC). We represent this network as a graph $G_C = (N, E)$, where $N$ denotes the set of compute nodes, including the end device and remote servers, and $E$ represents the set of communication links among nodes $n_i \in N$. Each edge $(n_i, n_j) \in E$ is associated with a data transmission rate $v_{n_i n_j}$ and a round-trip time $rtt_{n_i n_j}$ between $n_i$ and $n_j$. A typical offloading scenario is as follows: the data are generated on the end device (the only $n_{ED}$), and the layers can be offloaded to nearby edges (some nodes of $n_{NE}$) or the remote cloud ($n_{RC}$).

Table I shows our factors for estimating an offloading scheme. Among them, $n_k \in N$, $v_{n_i n_j}$, and $rtt_{n_i n_j}$ are defined above. We next introduce $DEP = (dep(l_1), dep(l_2), \ldots, dep(l_n))$, which denotes the offloading scheme. Each $l_i \in L$ is executed on a computation node $dep(l_i) \in N$. Let $T_e(l_i)$ denote the execution time of $l_i$, and let $T_d(l_k, l_m)$ denote the data transmission time between layer $l_k$ and layer $l_m$. The response time of the application is represented by $T_{response}$, which equals the moment after the execution of the last layer ($t_n$). In addition, an objective function is constructed to calculate $T_{response}$ and evaluate the offloading scheme.

3) Objective Function: Our objective function makes predictions based on the contributory factors. In particular, based on the factors in Section III-C2, we construct the objective function shown in Equation (3). We consider a scheme optimal if its objective value is the smallest. As Table I shows, $t_i$ is the moment after the execution of layer $l_i$, so the total response time is obtained when the last layer $l_n$ has been executed:

$$T_{response} = t_n \tag{3}$$

$$t_i = \max\left\{ t_{p_j^{l_i}} + T_d(p_j^{l_i}, l_i) \right\} + T_e(l_i), \quad \forall p_j^{l_i} \in P^{l_i} \tag{4}$$

$$T_e(l_i) = Cost_{dep(l_i)}^{l_i}.time \tag{5}$$

$$T_d(p_j^{l_i}, l_i) = \frac{Cost^{p_j^{l_i}}.datasize}{v_{dep(p_j^{l_i})\,dep(l_i)}} + rtt_{dep(p_j^{l_i})\,dep(l_i)}. \tag{6}$$

Equation (4) can be explained as follows. First, the moment before the execution of the current layer $l_i$ is calculated as the moment after the execution of a previous layer ($t_{p_j^{l_i}}$) plus the transmission time between the two layers ($T_d(p_j^{l_i}, l_i)$). Second, owing to the structure of a DNN, the current layer can be executed only after all branches from its previous layers have been executed. Hence, $t_i$ is the execution time of layer $l_i$ plus the maximum, over all parent layers, of the moment after the parent's execution plus the transmission time between the two layers. The execution time of layer $l_i$ is given by Equation (5) ($Cost_{dep(l_i)}^{l_i}.time$ is introduced in Definition 3), and the transmission time from a previous layer is given by Equation (6).

Algorithm 2: Calculation of Response Time.
Input: $P^{l_i}$, the set of parent nodes of the layer $l_i$
Output: $t_n$, the response time of an offloading scheme
Declare:
  $l_i$: the $i$th layer;
  $t_i$: the moment after the execution of the layer $l_i$;
  $t_{max}$: the maximum, over all parent layers, of the moment after the execution of the parent layer plus the transmission time between the two layers;
  $T_d(l_k, l_m)$: the data transmission time between the layer $l_k$ and the layer $l_m$;
  $T_e(l_i)$: the execution time of the layer $l_i$
1: function currentTime($P^{l_i}$, $l_i$)
2:   for each $p_j^{l_i} \in P^{l_i}$ do
3:     if $t_{p_j^{l_i}}$ not calculated then
4:       $t_{p_j^{l_i}}$ ← currentTime($P^{p_j^{l_i}}$, $p_j^{l_i}$)
5:     end if
6:     $t_{max}$ ← max{$t_{max}$, $t_{p_j^{l_i}} + T_d(p_j^{l_i}, l_i)$}
7:   end for
8:   $t_i$ ← $t_{max} + T_e(l_i)$
9:   return $t_i$
10: end function
11: $t_0$ ← 0
12: $t_n$ ← currentTime($P^{l_n}$, $l_n$)
13: return $t_n$

TABLE II
DEVICE CONTEXTS IN DIFFERENT LOCATIONS

As a result, given an offloading scheme, the response time is calculated as shown in Algorithm 2. In line 11 of Algorithm 2, we first initialize the value of $t_0$. Then, in line 12, we use the “currentTime” function to calculate $t_n$ recursively. The calculation in the “currentTime” function corresponds to Equation (4).
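Algorithm 2 and Equations (3) to (6) translate almost directly into a memoized recursion. The Python sketch below is one possible reading, not the authors' code: the function and argument names are hypothetical, the input is modeled as a zero-cost pseudo-layer `l0` on the end device so that $t_0 = 0$, and transfers between layers on the same node are assumed to cost nothing (the source does not state this case). Units are illustrative (megabytes, megabytes per second, seconds).

```python
def response_time(parents, dep, exec_time, datasize, rate, rtt, last):
    """Compute T_response = t_n for one offloading scheme (Equations (3)-(6)).

    parents:   dict layer -> list of parent layers (P^{l_i})
    dep:       dict layer -> node executing it (the scheme DEP)
    exec_time: dict (layer, node) -> predicted execution time (Cost.time)
    datasize:  dict layer -> size of the layer's output (Cost.datasize)
    rate, rtt: dicts (node_a, node_b) -> transmission rate / round-trip time
    last:      the final layer l_n
    """
    finish = {}  # memoized t_i values, so each layer is evaluated once

    def transfer(parent, layer):
        src, dst = dep[parent], dep[layer]
        if src == dst:
            return 0.0  # assumption: no transfer cost within one node
        return datasize[parent] / rate[(src, dst)] + rtt[(src, dst)]  # Eq. (6)

    def current_time(layer):
        if layer not in finish:
            t_max = 0.0  # layers without parents start at time zero
            for parent in parents.get(layer, []):
                t_max = max(t_max, current_time(parent) + transfer(parent, layer))
            finish[layer] = t_max + exec_time[(layer, dep[layer])]  # Eq. (4), (5)
        return finish[layer]

    return current_time(last)  # Eq. (3)


# Illustrative branching model: l0 feeds l1 and l2, which both feed l3.
parents = {"l1": ["l0"], "l2": ["l0"], "l3": ["l1", "l2"]}
dep = {"l0": "ED", "l1": "ED", "l2": "E1", "l3": "ED"}
exec_time = {("l0", "ED"): 0.0, ("l1", "ED"): 0.6, ("l2", "E1"): 0.1, ("l3", "ED"): 0.2}
datasize = {"l0": 2.0, "l1": 1.0, "l2": 1.0}
rate = {("ED", "E1"): 10.0, ("E1", "ED"): 10.0}
rtt = {("ED", "E1"): 0.05, ("E1", "ED"): 0.05}

t_n = response_time(parents, dep, exec_time, datasize, rate, rtt, "l3")
```

With these illustrative numbers, the branch through `l1` finishes at 0.6 s, later than the branch through `l2` (0.35 s plus 0.15 s of transfer back to the end device), so it determines when `l3` can start, and the function returns about 0.8 s.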

IV. EVALUATION

We implemented DNNOff and conducted evaluations to explore the following research questions.

• (RQ1) To what degree does DNNOff improve the performance of DNN-based applications (see Section IV-A)?

• (RQ2) How does DNNOff perform in predicting the cost of each neural network layer (see Section IV-B)?

• (RQ3) How much extra overhead does DNNOff introduce (see Section IV-C)?

For RQ1, our results show that DNNOff saved 12.4–66.6% of the response time compared with other approaches. For RQ2, DNNOff achieved high accuracy in predicting execution time for different layer types and computing nodes. For RQ3, the overhead of our offloading mechanism is acceptable.

A. RQ1: Improvement Over the State of the Art

1) Experimental Settings:

a) Network environment: The network context consists of four computation nodes: one end device and three remote servers. We simulate four locations, named community, traffic road, parking lot, and store. Table II lists the connections among our computation nodes. Each cell gives the round-trip time ($rtt$) and the data transmission rate ($v$) between the computation nodes of its row and column. We use the network simulation tool Dummynet² to control the available bandwidth. A smaller $rtt$ and a higher $v$ denote better signal strength.

b) Devices: We use three desktop computers to emulate the Elastic Compute Service (ECS) and edge servers E1 and E2. The ECS is equipped with a 3.6-GHz 16-core CPU and 16-GB RAM, server E1 with a 2.5-GHz eight-core CPU and 8-GB RAM, and server E2 with a 3.0-GHz eight-core CPU and 8-GB RAM. We further use a smartphone as the end device, which is equipped with a 2.2-GHz CPU and 4-GB RAM.

c) Application: We use a real-world DNN-based image recognition application in the evaluation. It is written in Python and powered by the Caffe2 deep learning framework. We focus on three models, which are the core of the DNN-based application: AlexNet, VGG16, and ResNet-50. The most complex model is ResNet-50, while AlexNet is the simplest one. Both inference latency and recognition accuracy increase with model complexity.

²[Online]. Available: http://info.iet.unipi.it/luigi/dummynet/

d) Compared approaches: In our evaluation, we compared DNNOff with four other approaches.

The original application is executed on the end device without any offloading.

Neurosurgeon [9] selects the best DNN partition point and sends the remaining DNN layers from the end device to the cloud.

Edgent [18] is similar to Neurosurgeon [9] but offloads computation-intensive DNN layers to the remote server with a low transmission overhead, namely, the nearest computation node.

The ideal plan obtains the execution time of each layer on each computing node and chooses the fastest scheme after actually executing all the schemes. It is infeasible in practice because it needs the execution times in advance and must try all the possibilities. We introduce the ideal plan to show how close DNNOff is to the ideal.
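The cost of the ideal plan is easy to see in code: with $|N|$ nodes and $n$ layers it must evaluate $|N|^n$ schemes. The Python sketch below enumerates every assignment of layers to nodes and keeps the fastest one. It only illustrates exhaustive search under assumptions: `ideal_plan` is a hypothetical helper, and `toy_measure` (with its `COST` table and fixed `HOP` penalty) is a made-up stand-in for actually running and timing each scheme.

```python
from itertools import product


def ideal_plan(layers, nodes, measure):
    """Try every assignment of layers to nodes and keep the fastest one.

    measure(scheme) is a hypothetical callback that returns the observed
    response time of a scheme (dict layer -> node).
    """
    best_scheme, best_time = None, float("inf")
    for assignment in product(nodes, repeat=len(layers)):  # |N|^n schemes
        scheme = dict(zip(layers, assignment))
        elapsed = measure(scheme)
        if elapsed < best_time:
            best_scheme, best_time = scheme, elapsed
    return best_scheme, best_time


# Toy stand-in for real measurements: per-layer cost on each node,
# plus a fixed penalty whenever execution moves to another node.
COST = {("l1", "ED"): 0.5, ("l1", "E1"): 0.1, ("l2", "ED"): 0.4, ("l2", "E1"): 0.1}
HOP = 0.3


def toy_measure(scheme):
    total, here = 0.0, "ED"  # the input image starts on the end device
    for layer in ("l1", "l2"):
        if scheme[layer] != here:
            total += HOP
            here = scheme[layer]
        total += COST[(layer, here)]
    return total


best_scheme, best_time = ideal_plan(["l1", "l2"], ["ED", "E1"], toy_measure)
```

In this toy setting, running both layers on `E1` wins with a total of about 0.5. Two layers and two nodes yield only four schemes, but the count grows exponentially with the number of layers, which is why exhaustive search serves only as a reference point.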

e) Measurement: To show the effectiveness of DNNOff, we define the following metrics.

1) Total response time: We use the total response time as the performance metric. To make a fair comparison, we pick ten different images from the video in each location and average their response times. The start time is recorded when the image is input, and the end time is recorded when the recognition result is output. The total response time includes local inference, data transmission, and remote inference. A shorter response time indicates a better result.

2) Local inference: This is the time spent on inference on the end device. Inference on remote servers is usually more efficient than on the end device.

3) Remote inference: This is the time spent on inference on the remote servers.

4) Data transmission: This is the time to transmit the feature vectors produced by the partitioned layers of the DNN model; it is often long under a poor network connection.
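The relation between these metrics can be sketched as follows. The function names and the sample values are assumptions made for illustration, not part of DNNOff; the sketch only shows how the three components add up to the total response time, how the per-image values are averaged, and how a relative reduction is derived from two averages.

```python
from statistics import mean


def total_response_time(local_ms, transmission_ms, remote_ms):
    """Total response time as the sum of its three components (ms)."""
    return local_ms + transmission_ms + remote_ms


def average_response_time(samples):
    """Average over per-image (local, transmission, remote) tuples."""
    return mean([total_response_time(*s) for s in samples])


def reduction_percent(baseline_ms, optimized_ms):
    """Relative saving of the optimized time against a baseline, in percent."""
    return 100.0 * (baseline_ms - optimized_ms) / baseline_ms


# Hypothetical per-image measurements (ms), not taken from the evaluation.
samples = [(100, 20, 30), (120, 10, 20)]
avg = average_response_time(samples)
print(avg, reduction_percent(200, avg))
```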

2) Results: The total response time consists of the inference time and the transmission time. Fig. 5 shows the time of the compared approaches in the four locations. For each approach, the blue bar denotes the local inference time, the orange bar denotes the data transmission time, and the gray bar denotes the remote inference time.

Compared with the original application, DNNOff reduces the total response time by 30.4–66.6%. The results also show that the more complex the model, the larger the improvement achieved by DNNOff. In general, the improvement at the community is larger than that at the store, because the community is closer to the better-performing edge server, which significantly reduces the inference time. At the traffic road, DNNOff reduces the response time of ResNet-50 by 66.6%, since the volume of data transferred between the layers of ResNet-50 is small and the location is connected to all remote servers, so that offloading alleviates the bottleneck of local inference time while keeping the data transmission time low. Note that the parking lot is only connected to the cloud server, so the performance improvement is not as pronounced as in the other locations, but DNNOff still reduces the time by 30.4–47.1%. Hence, DNNOff remains effective even if there are no edge servers.

Fig. 5. Process of a DNN-based image recognition application. (a) Image recognition with the AlexNet model. (b) Image recognition with the VGG16 model. (c) Image recognition with the ResNet-50 model.

Compared with Neurosurgeon, DNNOff reduces the total response time by 26.5–53.2%. DNNOff significantly outperforms Neurosurgeon at the traffic road with the VGG16 model, because the traffic road has the best network connection to the remote servers, which gives DNNOff more choices for offloading. In the parking lot, by contrast, DNNOff keeps the same performance as Neurosurgeon: because of the poor network connection, partitioning at multiple points would only increase the data transmission time. In this case, DNNOff produces the same offloading scheme as Neurosurgeon does.

TABLE III
SAMPLE ITEMS

Compared with Edgent, DNNOff reduces the total response time by 12.4–39.3%. Although Edgent considers the use of the nearest computation node, DNNOff can additionally cut the DNN at multiple points and execute the different parts over the end device, the edges, and the cloud.

Compared with the ideal plan, DNNOff achieves comparable performance in the different cases, and the performance gap between them is about 5%.

In summary, DNNOff saves 12.4–66.6% of the total response time compared with the other approaches. Meanwhile, the results show that DNNOff achieves optimal or near-optimal offloading performance.

B. RQ2 Accuracy for Cost Prediction of DNN Layers

1) Experimental Settings:

a) Model training: We train the random forest regression prediction model on a dataset of history data collected from DNN-based applications running on different computing nodes. In total, we collected layer information for 425 convolution layer items, 320 pooling layer items, 582 activation layer items, and 96 fully connected layer items. Table III shows, as an example, some convolution layer items collected on the end device. Column “Channel” lists the number of channels of the convolution kernel. Columns “k_size” and “k_number” list the size and the number of the filters, respectively. Columns “Stride” and “Padding” list the stride and the padding with which the filters are applied. The inputs (X) include the channel, k_size, k_number, stride, and padding. The output (time) is the predicted value of the layer latency. We randomly split the data items into two sets: 70% for training the prediction model and 30% for testing its quality.
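The random 70/30 split can be sketched as below. This is a minimal, standard-library illustration under stated assumptions: the helper `split_dataset`, its default seed, and the use of integers as stand-ins for layer records are all hypothetical, and the actual tooling used for the split is not specified here.

```python
import random


def split_dataset(items, train_percent=70, seed=0):
    """Randomly split items into (train, test) lists.

    train_percent and seed are illustrative defaults; the seed only
    makes the shuffle reproducible.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic, so the training share is rounded down.
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]


# Stand-ins for the 425 convolution layer items; a real record would hold
# (channel, k_size, k_number, stride, padding, time).
conv_items = list(range(425))
train_items, test_items = split_dataset(conv_items)
print(len(train_items), len(test_items))
```

With 425 items and the training share rounded down, the split yields 297 training items and 128 test items.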

b) Measurement: We use the root-mean-square error (RMSE) and R-squared (R²) as the evaluation measures of the prediction model:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(\mathrm{observed}_t - \mathrm{predicted}_t\right)^2} \tag{7}$$

$$R^2 = 1 - \frac{\sum_{t=1}^{N}\left(\mathrm{observed}_t - \mathrm{predicted}_t\right)^2}{\sum_{t=1}^{N}\left(\mathrm{observed}_t - \mathrm{mean}_t\right)^2}. \tag{8}$$

RMSE is the sample standard deviation of the differences between predicted and observed values. R² is commonly used to evaluate the quality of regression models. They are calculated according to Equations (7) and (8).
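As a worked illustration of Equations (7) and (8), the following sketch computes both measures in plain Python. The function names and the sample values are assumptions chosen for illustration; the mean term is taken to be the average of the observed values, which is the usual definition of R².

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical layer latencies (ms), not taken from Table IV.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 2.0]
print(rmse(observed, predicted), r_squared(observed, predicted))
```

In this toy example, the single error of 2 ms gives an RMSE of 1.0 ms and an R² of about 0.2, which would fall below the 0.5 threshold used for an acceptable model.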

2) Results: Table IV shows the accuracy of the random forest regression prediction model. It lists the RMSE and R² results for the different layer types and computation nodes. A smaller RMSE indicates a better fit of the model [28], [29]. For R², a value greater than 0.5 is considered acceptable [30], and the closer it is to 1, the better the model. Table IV shows that the RMSE of the model is small and R² is greater than 0.5, indicating that the prediction model is acceptable. The high accuracy of the prediction model lays the foundation for scheme estimation.

TABLE IV
RMSE AND R-SQUARED OF THE PREDICTION MODEL ON THE TEST SET

Fig. 6. Optimization of random forest parameters using RMSE.

In addition, random forest has two parameters: N_tree, the number of regression trees grown based on a bootstrap sample of the collected layers, and M_try, the number of different predictors tested at each node. Both parameters are optimized based on the RMSE of calibration. Take the training on convolution layers on the end device as an example. N_tree values from 500 to 4000 in steps of 50 were tested, and M_try was tested from 1 to 5. The results for the random forest parameters are shown in Fig. 6, which clearly indicates that the parameters affect the prediction error. The optimization was done using the calibration dataset (n = 297) and RMSE. The setting N_tree = 2000 and M_try = 3 yielded the lowest RMSE (0.289 ms). We therefore chose N_tree = 2000 and M_try = 3 as the best parameters.
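The parameter sweep described above amounts to a grid search, which can be sketched as follows. The helper `grid_search` and the callback `evaluate_rmse` are hypothetical names; in a real run, the callback would train a random forest with the given parameters and return its calibration RMSE. The function `toy_rmse` is a synthetic surface constructed so that its minimum sits at the reported setting; it is not the measured error surface of Fig. 6.

```python
def grid_search(evaluate_rmse,
                n_tree_values=range(500, 4001, 50),
                m_try_values=range(1, 6)):
    """Return (best_rmse, n_tree, m_try) over the full parameter grid."""
    best = None
    for n_tree in n_tree_values:
        for m_try in m_try_values:
            score = evaluate_rmse(n_tree, m_try)
            # Keep the first setting that achieves the lowest error.
            if best is None or score < best[0]:
                best = (score, n_tree, m_try)
    return best


def toy_rmse(n_tree, m_try):
    """Synthetic stand-in for 'train a forest and measure RMSE'."""
    return (n_tree - 2000) ** 2 / 1e6 + (m_try - 3) ** 2


grid_size = len(range(500, 4001, 50)) * len(range(1, 6))
print(grid_size, grid_search(toy_rmse))
```

The grid contains 71 values of N_tree and 5 values of M_try, that is, 355 candidate settings, each of which requires training one model.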

C. RQ3 Extra Overhead

1) Experimental Settings:

a) Setting: We use a simple AlexNet [25] application with 24 layers, which is a state-of-the-art DNN for image classification, and simulate five typical offloading schemes, which represent device-cloud, device-edge, and device-edge-cloud offloading, as shown in Table V.

TABLE V
FIVE OFFLOADING SCHEMES FOR ALEXNET

Fig. 7. Overhead of DNNOff and manual-modified one.

b) Compared approaches: We evaluate the overhead of DNNOff by comparing the performance of the adaptive offloaded application with that of the manually modified offloaded application for the five typical offloading schemes. The adaptive offloaded application is dynamically offloaded according to the offloading scheme, which is supported by our framework. The manually modified one is implemented by separating the code according to the offloading scheme, case by case.

2) Results: We run the application in the five typical offloading schemes and record the average response time of each, as shown in Fig. 7. The response time of DNNOff is similar to that of the manually modified application, but with an overhead of 120–150 ms. This slight increase in response time (under 10%) is due to the condition statements of the pipes that each layer's execution has to go through in our framework. For instance, the overheads in cases 1–3 are all about 120 ms because the cutoff points of the three offloading schemes are the same. The overheads in cases 4 and 5 are both 150 ms because each of these offloading schemes has two cutoff points, for which more condition statements need to be executed. Overall, the overhead is acceptable.

V. DISCUSSION

Some issues about applicability need to be further discussed.

A. Online Decision

For online decision, DNNOff uses the estimation model to calculate the response time of a given offloading scheme, based on which the problem of online decision can be reduced to a traditional optimization problem. Some algorithms can be used to reduce overhead. For instance, it takes about minutes to determine the offloading decision for the genetic algorithm,