Chapter 2. Background
2.1 Related Works
Many research efforts have been dedicated to offloading computation-intensive programs from resource-poor mobile devices [2, 3, 5-18]. Some of them focused on energy saving [3, 5, 6, 8-13], while others targeted performance improvement. Only a few considered both energy saving and performance improvement [19, 20]. For ease of reference, these works are summarized in Table 1 and classified into three categories: energy saving, time saving, and both energy and time saving. Each work is further characterized by four attributes: adaptability, portability, accuracy and offload target.
Adaptability indicates how efficiently the proposed method adapts to a dynamic workload, that is, a workload that varies with the input data at run-time. A method that can only handle a deterministic workload has poor adaptability.
Portability represents the ability of the proposed method to be ported from one execution environment to another, such as from Linux to Windows. The labels Language, Framework and Kernel indicate the level at which a method must be modified when it is ported to another platform. For example, our work has Language-level portability, so it only needs modifications at the programming-language level when it is moved to another platform. Methods with Language-level portability are desirable because most of the existing code can be reused. Accuracy refers to the approximation error of the energy model or execution time model used for offloading: the higher the accuracy, the fewer incorrect offloading decisions are made. For this attribute, works that did not develop any execution time model or energy model are labeled “n/a”. The offload target is the set of targets that can execute offloaded programs. The more targets are available, the more flexible the execution environment becomes. According to Table 1, our work is the only one that aims at saving both time and energy while maintaining high adaptability, high portability, high accuracy and multiple offload targets. In the following, we compare our work with each of the related works in detail.
Table 1. Comparison of current offloading works
Work [Reference #]                                  Adaptability  Portability  Accuracy  Offload Target
On Energy Saving
Partition Scheme [3]                                No            Framework    Low       Cloud
Study Energy Tradeoffs [5]                          No            Framework    n/a       Cloud
Component Migration & Can Offload Save Energy [6]   Yes           Language     Medium    Cloud
Face-Recognize with GPU [12]                        No            Language     n/a       GPU
On Time Saving
Adaptive Offloading [2]                             No            Framework    Low       Cloud
Effective Offload Service [17]                      No            Framework    n/a       Cloud
Calling the Cloud [14]                              No            Framework    n/a       Cloud
eyeDentify Cyber Foraging [15]                      No            Framework    n/a       Cloud
Heterogeneous Auto-Offload
On Energy and Time Saving
Computation Offload Scheme [20]                     No            Framework    Medium    Cloud
Energy Efficiency of Mobile [19]                    Yes           Language     n/a       Cloud
Our Work                                            Yes           Language     High      GPU, Cloud
Works On Energy Saving
Maximizing battery lifetime is one of the most crucial design objectives of smartphones because they are usually equipped with limited battery capacity. Z. Li [3], S. Han [9], B. Seshasayee [11], and E. Cuervo et al. [8] adopted a profiling-partitioning technique to identify the parts of an application to offload for energy saving. They first profiled the energy consumption of each function of the application. From the profiling result, they then generated a cost graph, in which each node represents a function to be performed and each edge represents the data to be transmitted. The maximum-flow/minimum-cut algorithm was then used to partition the cost graph into client parts and server parts. Finally, the server parts were executed at remote servers to reduce the energy consumption of the mobile device. G. Chen et al. [5] designed a similar method to determine whether Java methods and bytecode-to-native-code compilation should be executed at remote servers for energy saving. However, these methods assumed that the workload was deterministic, that is, that it does not vary at run-time. As a result, they cannot be applied to dynamic workloads that result from variations in the input data at run-time. In contrast, we profile the energy consumption and execution time of only the frequently used modules, such as FFT, IFFT, convolution and matrix multiplication, in order to reduce the profiling overhead. In addition, we take into account the impact of data size on execution time and energy consumption in order to handle dynamic workloads at run-time.
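The cost-graph partitioning described above can be illustrated with a small sketch. It is not the implementation used in [3, 5, 8, 9, 11]; it only shows the general idea under stated assumptions. The function name min_cut_partition, the module names and all cost values are hypothetical. Each function is connected to a virtual device node and a virtual server node, the edge weights are profiled energy costs, and a minimum cut (computed here with a plain Edmonds-Karp maximum flow) separates the functions kept on the device from those offloaded.

```python
from collections import deque


def min_cut_partition(local_cost, remote_cost, transfer_cost):
    """Split functions between device and server with a minimum cut.

    local_cost[f]        : device energy if f runs locally (hypothetical values)
    remote_cost[f]       : device energy if f runs on the server
    transfer_cost[(u,v)] : device energy to move data between u and v
    Returns (set of functions kept on the device, total cost of the cut).
    """
    S, T = "__device__", "__server__"
    cap = {}  # cap[u][v] is the residual capacity of edge u -> v

    def add_edge(u, v, c):
        cap.setdefault(u, {})
        cap.setdefault(v, {})
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)  # reverse edge for the residual graph

    for f in local_cost:
        add_edge(S, f, remote_cost[f])  # cut when f is placed on the server
        add_edge(f, T, local_cost[f])   # cut when f is placed on the device
    for (u, v), c in transfer_cost.items():
        add_edge(u, v, c)               # cut when u and v are separated
        add_edge(v, u, c)

    def reachable(stop_at=None):
        """Breadth-first search over edges with remaining capacity."""
        parent = {S: None}
        queue = deque([S])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    if v == stop_at:
                        return parent
                    queue.append(v)
        return parent

    total = 0.0
    while True:
        parent = reachable(stop_at=T)
        if T not in parent:
            break
        # Find the bottleneck along the augmenting path, then push flow.
        bottleneck, v = float("inf"), T
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = T
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        total += bottleneck

    # Nodes still reachable from the device node stay on the device.
    device_side = set(reachable()) - {S}
    return device_side, total


# Hypothetical profile: "ui" is pinned to the device by a very large remote cost.
local = {"ui": 1.0, "fft": 10.0, "conv": 8.0, "render": 2.0}
remote = {"ui": 1e9, "fft": 1.0, "conv": 1.0, "render": 6.0}
transfer = {("ui", "fft"): 2.0, ("fft", "conv"): 1.0, ("conv", "render"): 3.0}

device, cost = min_cut_partition(local, remote, transfer)
print(sorted(device), cost)  # ['render', 'ui'] 10.0
```

In this toy profile the two compute-heavy modules are offloaded, while the pinned user-interface module and the cheap rendering step stay on the device. Note that the edge weights are fixed numbers, which is exactly the deterministic-workload assumption discussed above.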
X. Zhao [13], Y. J. Hong [10], and K. Kumar [6] built energy models to approximate the energy consumption of offloading. These energy models can be used to construct the above-mentioned cost graph or to make offloading decisions. However, several key parameters, such as workload dynamics, bandwidth variability, and idle-mode energy consumption, are not included in their models, which may lead to inappropriate partitions or incorrect offloading decisions. According to our experimental results, our energy model achieves higher accuracy than previous works by considering these key parameters. Y. C. Wang [12] demonstrated the possibility of using the GPU for offloading. They first identified the bottlenecks of programs, and then rewrote the bottleneck code with OpenGL ES to remove them. However, the CPU and the GPU are usually integrated on the same chip and cannot be switched off individually. If the idle energy consumption of the chip is not considered, offloading programs to the GPU may increase the total energy consumption. Our work, on the other hand, achieves higher accuracy by modeling the idle energy consumption. We also provide the ability to offload programs to either the GPU or the cloud.
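The effect of idle energy on the offloading decision can be made concrete with a minimal sketch. This is not the energy model developed in this work; the function names, parameter names and all power and timing values below are hypothetical and chosen only for illustration. The sketch estimates the device-side energy of running a module locally, on the GPU, or in the cloud, once with and once without the idle power of the components that cannot be switched off.

```python
def estimate_energy(t_local_s, data_bits, bandwidth_bps, p, include_idle=True):
    """Return the estimated device-side energy (joules) per target.

    All parameters in p are hypothetical illustration values, not measurements.
    """
    idle = 1.0 if include_idle else 0.0
    t_gpu = t_local_s / p["gpu_speedup"]      # run time on the GPU
    t_cloud = t_local_s / p["cloud_speedup"]  # time the device waits for the cloud
    t_tx = data_bits / bandwidth_bps          # time to transmit the input data
    return {
        # CPU is busy; the GPU on the same chip still draws idle power.
        "local": (p["cpu_active_w"] + idle * p["gpu_idle_w"]) * t_local_s,
        # GPU is busy; the CPU cannot be switched off and draws idle power.
        "gpu": (p["gpu_active_w"] + idle * p["cpu_idle_w"]) * t_gpu,
        # Radio transmits the data; the whole chip idles while waiting.
        "cloud": p["radio_w"] * t_tx
        + idle * (p["cpu_idle_w"] + p["gpu_idle_w"]) * t_cloud,
    }


def best_target(energy):
    """Pick the target with the lowest estimated energy."""
    return min(energy, key=energy.get)


PARAMS = {
    "cpu_active_w": 1.0, "cpu_idle_w": 0.8,
    "gpu_active_w": 1.5, "gpu_idle_w": 0.1,
    "radio_w": 1.2, "gpu_speedup": 2.0, "cloud_speedup": 10.0,
}

naive = estimate_energy(2.0, 8e6, 4e6, PARAMS, include_idle=False)
full = estimate_energy(2.0, 8e6, 4e6, PARAMS, include_idle=True)
print(best_target(naive))  # gpu
print(best_target(full))   # local
```

With these illustrative numbers, the model without idle power selects the GPU, whereas the model with idle power shows that local execution consumes less energy. This is the kind of incorrect offloading decision that an idle-aware model is meant to avoid.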
Works On Time Saving
Responsiveness is important for mobile applications because they are usually real-time and user-interactive. Many research efforts have been devoted to offloading part of a program to remote servers in order to reduce execution time [2, 7, 14, 17]. Most of them adopted a profiling-partitioning technique similar to the one described above to identify the parts of an application to offload. Gu et al. [2] designed an offloading engine that dynamically partitions an application when the required resources, such as memory and CPU, approach the maximum capacity of the mobile device. Yang [17] developed an offloading service that dynamically partitions Java applications and transforms the offloaded Java classes into a form that can be executed at remote servers. Giurgiu et al. [14] developed an exhaustive search algorithm, called ALL, that examines all possible partitions in order to find an optimal one. They also proposed a heuristic algorithm that partitions a program in reasonable time. All these methods perform well on small applications, but may induce significant overhead when partitioning large applications. In contrast, we only profile the energy consumption and execution time of frequently used modules in order to reduce the overhead of profiling and partitioning. Unlike [2, 14, 17], R. Wolski [7] dynamically predicted the offloading cost at run-time according to the feedback of a resource monitor. However, some important parameters, such as workload dynamics and bandwidth variability, are not included, which may lead to inappropriate predictions and incorrect offloading decisions. Our work, on the other hand, achieves higher accuracy by modeling these important parameters. According to our experimental results, fewer incorrect offloading decisions are made.
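To show why examining all possible partitions becomes expensive for large applications, the following sketch enumerates every assignment of modules to the device or the server and keeps the cheapest one. It is only an illustration of exhaustive search in general, not the ALL algorithm of [14]; the function name exhaustive_partition, the module names and the timing values are hypothetical.

```python
from itertools import product


def exhaustive_partition(local_time, remote_time, transfer_time, pinned=()):
    """Try every device/server assignment and return the fastest one.

    Returns (set of offloaded modules, total time, number of partitions examined).
    All timing inputs are hypothetical illustration values.
    """
    modules = list(local_time)
    pinned = set(pinned)  # modules that must stay on the device
    best_remote, best_time, examined = None, float("inf"), 0
    for choice in product((False, True), repeat=len(modules)):
        remote = {m for m, offload in zip(modules, choice) if offload}
        if remote & pinned:
            continue  # infeasible: a pinned module was offloaded
        examined += 1
        total = sum(remote_time[m] if m in remote else local_time[m]
                    for m in modules)
        # Pay the transfer time whenever two connected modules are separated.
        total += sum(t for (u, v), t in transfer_time.items()
                     if (u in remote) != (v in remote))
        if total < best_time:
            best_remote, best_time = remote, total
    return best_remote, best_time, examined


local = {"camera": 1.0, "detect": 10.0, "match": 8.0, "display": 2.0}
remote = {"camera": 1.0, "detect": 1.0, "match": 1.0, "display": 6.0}
transfer = {("camera", "detect"): 2.0, ("detect", "match"): 1.0,
            ("match", "display"): 3.0}

offloaded, total, examined = exhaustive_partition(
    local, remote, transfer, pinned=["camera"])
print(sorted(offloaded), total, examined)  # ['detect', 'match'] 10.0 8
```

With n modules that are free to move, the loop examines 2^n partitions, so the search is practical only for small applications. This growth is the overhead that motivates heuristic partitioning and module-level profiling.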
Several works developed offloading mechanisms by integrating existing software packages rather than starting from scratch [15, 16, 18]. R. Kemp et al. [15] used the Ibis middleware to offload computation-intensive Java programs to remote servers. Y. Zhang [18] adopted the Firefox plug-in framework to transparently offload computations to remote servers. Since these works are closely coupled with specific software packages, it is difficult to extend their methods to other execution environments. Y. N. Lin [16] explored the possibility of offloading programs to network processors in order to reduce execution time. They first profiled the IPSec module to identify bottlenecks, and then rewrote the IPSec-related kernel and driver code. Although network throughput can improve by as much as 350%, the energy consumption of the network processors may increase significantly. In addition, modifications of the OS kernel and drivers are required, which reduces the portability of the proposed method. In this work, we realize offloading as a Linux user-space program in order to increase portability. We do not rely on any specific software package, and we do not require any modification of the OS or drivers. Our method can therefore be easily ported to other execution environments, such as Windows Embedded Compact 7.
Works On Energy and Time Saving
Both energy saving and time saving are crucial design objectives for smartphones. However, few research efforts have been devoted to optimizing the two objectives simultaneously [19, 20]. C. Wang [20] used a similar profiling-partitioning technique to identify the parts to offload while considering energy and time saving at the same time. A similar method was developed by A. P. Miettinen [19] to offload the most power-hungry parts in order to reduce energy consumption. However, both of them use the execution time of a program to approximate its energy consumption. An energy estimate that does not consider the parameters of the CPU may be incorrect.
In this work, we provide more accurate energy and execution time models by considering important parameters of the CPU and the offloading targets. Our experimental results indicate that the proposed method achieves better performance in saving both energy and time.
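One way to see why execution time alone can be a poor proxy for energy is sketched below. The sketch is an assumption made for illustration, not the model of this work or of [19, 20]: it takes CPU frequency and supply voltage as the relevant CPU parameters and uses the standard CMOS dynamic power relation P = C V^2 f. The function names and all numeric values are hypothetical.

```python
def energy_with_cpu_params(cycles, freq_hz, volt, c_eff, p_static_w):
    """Energy (joules) from dynamic power C * V^2 * f plus static power."""
    run_time = cycles / freq_hz
    p_dynamic = c_eff * volt ** 2 * freq_hz
    return (p_dynamic + p_static_w) * run_time


def energy_from_time_only(cycles, freq_hz, k_w):
    """Energy (joules) approximated as a fixed average power times run time."""
    return k_w * (cycles / freq_hz)


C_EFF, P_STATIC = 1e-9, 0.1  # hypothetical effective capacitance and static power
CYCLES = 1e9                 # hypothetical workload size in CPU cycles

# Calibrate the time-only model at a low operating point (0.5 GHz, 0.9 V).
k = C_EFF * 0.9 ** 2 * 0.5e9 + P_STATIC

# Compare both models at a high operating point (1 GHz, 1.2 V).
detailed = energy_with_cpu_params(CYCLES, 1e9, 1.2, C_EFF, P_STATIC)
time_only = energy_from_time_only(CYCLES, 1e9, k)
print(round(detailed, 3), round(time_only, 3))  # 1.54 0.505
```

Under these assumptions, the time-only estimate is roughly one third of the estimate that accounts for frequency and voltage, even though both models agree at the operating point used for calibration.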