As smart mobile embedded devices become more prevalent, the demand for higher performance from them also rises. Limited computing resources and energy constraints have always been major design factors for these devices. To maximize performance, optimizing application and system software is often the most effective and economical approach. Because it is not always obvious how to optimize the software, profiling is an essential methodology for pinpointing performance bottlenecks without introducing excessive overhead that could skew the measured results. In addition to guiding software optimization, profiling can be used to determine hardware requirements once the target system software has been chosen, and to assist the hardware-software co-design process.

Modern central processing units (CPUs) provide hardware-based performance monitors [1, 2, 3]. These hardware counters are registers that are updated when specific CPU events occur.

Although each CPU architecture offers its own combination of hardware performance monitors, they generally all provide clock-cycle-based and cache-performance-related functionality. Profiling tools such as Oprofile [4], Gprof [5], and Google Performance Tools [6] make use of these hardware performance monitors to further reduce profiling overhead.

The majority of popular mobile operating systems today are fine-tuned for and tied to a few selected hardware platforms in order to optimize performance. Such hardware is often locked to prevent users from altering the mobile operating system, out of concern that changes would diminish performance. However, there are still a variety of mobile operating systems that aim to support multiple architectures, such as Maemo [7] and MeeGo [8]. Currently the most widespread is Android [9].

Android is a fast-evolving mobile platform that differentiates itself from the rest by delivering frequent version upgrades, supporting a variety of hardware devices, and running applications on top of a runtime virtual machine. This Dalvik Virtual Machine (DVM) is based on the traditional Java Virtual Machine (JVM) [10], but modified to accommodate low memory requirements and to allow multiple VM instances to run concurrently [11]. It relies on the underlying operating system for process isolation, memory management, and threading support. The DVM eliminates the need to recompile Android applications for each architecture that Android supports, and it is integral to Android's architecture neutrality. However, these features make optimizing Android hardware and software a more challenging task, as it becomes difficult to determine whether a performance bottleneck occurs at the virtual machine application level, in the user-space libraries, or deeper in the Linux kernel.

The need for system-wide performance analysis to accelerate the Android hardware-software co-design process has already produced many Java-level trace tools, such as Android's own Dalvik Debug Monitor Server [12], logcat [13], and Traceview [14]. Most of these tools focus on tracing information at the DVM level. However, some of them approximate the time spent in Linux user-space libraries by adding instrumentation to the beginning and end of each function of a test application. Doing so creates large operational and runtime overheads, which introduce inaccuracies into the profiling results.

To make up for the lack of Linux user-space and kernel-space information, Android developers use Linux profiling and tracing tools alongside the DVM-level tools. These traditional Linux tools, such as LTTng [15], strace [16], ltrace [17], and Oprofile, are unable to determine the relationship between Linux libraries and Java applications running inside Dalvik virtual machines. The reason is that the runtime-interpreted Dalvik opcode [18] segments of a DVM application are first loaded into the heap of its Dalvik virtual machine and then executed from the heap itself. Traditional Linux profiling and tracing tools can only see that the heap is being executed and cannot distinguish which method is currently running. Furthermore, Android applications running inside Dalvik virtual machines are forked and controlled by the Zygote parent process. These issues create an information barrier, making it difficult for Linux user-space tools to retrieve useful information past the Dalvik virtual machine layer.

The information gap between the host machine and the guest virtual machine exists on most Java Virtual Machines. Various vertical profiling methods have been proposed to bridge this gap. Many of these methods are implemented only on Jikes RVM [19], a JVM that is itself implemented in Java, which avoids some of the limitations faced by JVM implementations on physical hosts. Other vertical profiling approaches collect the appropriate information, save it to the file system, and then modify the post-profiling analysis tool to combine the collected virtual machine information with the profiling samples. This creates temporal and spatial overheads on top of those already incurred by native Linux profiling.

To enable vertical profiling on a mobile platform with limited space and computing power, such as Android, overhead must be minimized so that interference with the system during execution is reduced and the sampled data remain valuable. To achieve this goal, the following issues must be addressed.

Bridging the information barrier. The bridge over the information gap between the DVM and kernel space should integrate into the existing profiling flow. It should also be modular and avoid being overly intrusive to the host system or the DVM. Since the profiler samples frequently, the information should be available in kernel space promptly after a method has been invoked by the virtual machine.

Sending useful DVM application runtime information to the profiler. Vertical profiling on the Android system requires an architecture-neutral method that retrieves only the minimum relevant application information from the virtual machine and sends it to the kernel and profiler with minimal runtime overhead. The method must also avoid modifying each target Dalvik application for the purpose of profiling.

Correlating Dalvik application methods to profiled samples. Currently the profiler can only map a DVM application sample to the thread's Dalvik runtime interpreter heap. A mapping mechanism and an address-correlation algorithm are needed so that the existing profiling flow can correlate samples at the application method level.
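
The correlation issue can be pictured as an address-range lookup. The following is a minimal sketch, not the mechanism developed in this thesis: it assumes each method is known by a hypothetical half-open address range, and the names `MethodRange`, `build_index`, `resolve`, and `attribute` are illustrative only. A binary search over the sorted start addresses maps each sampled address to the method that contains it.

```python
import bisect
from collections import Counter
from typing import NamedTuple, Optional

class MethodRange(NamedTuple):
    """Hypothetical record: where one method's code sits in memory."""
    start: int   # first address of the method (inclusive)
    end: int     # one past the last address (exclusive)
    name: str    # method name reported to the analysis tool

def build_index(ranges):
    """Sort ranges by start address so lookups can use binary search."""
    ordered = sorted(ranges, key=lambda r: r.start)
    starts = [r.start for r in ordered]
    return starts, ordered

def resolve(addr: int, starts, ordered) -> Optional[str]:
    """Return the method containing addr, or None if no range covers it."""
    i = bisect.bisect_right(starts, addr) - 1   # last range starting at or before addr
    if i < 0:
        return None
    r = ordered[i]
    return r.name if r.start <= addr < r.end else None

def attribute(samples, starts, ordered) -> Counter:
    """Count samples per method; unresolved samples are grouped under '<unknown>'."""
    return Counter(resolve(a, starts, ordered) or "<unknown>" for a in samples)
```

With such an index, a sample that would otherwise be reported only as "interpreter heap" can be credited to a named method, provided the ranges are kept current as methods are invoked.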

This work proposes the Vertical Virtual Address Remapping Integrated (VARI) profiler for the Android runtime system, in which a low-overhead direct memory map address replacement tunnel, linking the virtual machine to the kernel, is devised to bridge the information barrier. This thesis focuses on techniques for sending the virtual memory addresses of Java application methods and other relevant information from the Dalvik Instrumentation Module to the Memory Tunnel Virtual device. Additionally, this thesis proposes the mechanism and algorithm that allow the profiler to correlate samples to actual Java application methods, thus providing a way for performance analysis tools to tie the usage of Linux native libraries to the Android applications that use them. It is also a goal of this thesis to minimize inaccuracies related to the probe effect. To that end, this thesis adopts a modified virtual machine event-based instrumentation approach that suits the unique process execution and stack management of the Android DVM.
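
The general idea of passing method addresses through a memory-mapped region can be sketched as follows. This is only an illustration under stated assumptions and not the VARI implementation: an anonymous mapping inside a single process stands in for the device-backed region shared between the virtual machine and the kernel, the 16-byte record layout is hypothetical, and no synchronization between writer and reader is shown.

```python
import mmap
import struct

# Hypothetical record layout (an assumption, not taken from the thesis):
# sequence number (u32), thread id (u32), method virtual address (u64).
RECORD = struct.Struct("<IIQ")

def open_tunnel() -> mmap.mmap:
    """Create an anonymous one-page mapping that stands in for the shared region."""
    return mmap.mmap(-1, mmap.PAGESIZE)

def publish_method_entry(tunnel: mmap.mmap, seq: int, tid: int, method_addr: int) -> None:
    """Writer side: overwrite the slot at offset 0 with the current method record."""
    RECORD.pack_into(tunnel, 0, seq, tid, method_addr)

def read_current_method(tunnel: mmap.mmap):
    """Reader side: decode the record as (seq, tid, method_addr)."""
    return RECORD.unpack_from(tunnel, 0)
```

Writing a small fixed-size record in place avoids file-system traffic on every method invocation, which is the kind of overhead a tunnel of this sort is meant to reduce.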

Furthermore, this profiler enables reconfigurable profiling by configuring the Dalvik Instrumentation Module to send only information related to the events of interest. The granularity is classified into three levels: DVM application method-level, DVM application-level, and Android service-specific profiling. The proposed reconfigurable vertical profiling framework streamlines the difficult task of identifying system bottlenecks and accelerates the Android hardware-software co-design process [20].
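
One way to picture the three granularity levels is as a filter applied before any information is sent. The sketch below is illustrative only: the types `Granularity`, `MethodEvent`, and `ProfileConfig` and the function `should_send` are hypothetical names, and the actual configuration interface of the Dalvik Instrumentation Module is not shown here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Granularity(Enum):
    METHOD = "method"            # DVM application method-level
    APPLICATION = "application"  # DVM application-level
    SERVICE = "service"          # Android service-specific

@dataclass(frozen=True)
class MethodEvent:
    """Hypothetical description of one method-invocation event."""
    app: str                # application (package) that owns the method
    method: str             # name of the invoked method
    service: Optional[str]  # Android service the call belongs to, if any

@dataclass(frozen=True)
class ProfileConfig:
    level: Granularity
    targets: frozenset      # method, application, or service names of interest

def should_send(event: MethodEvent, cfg: ProfileConfig) -> bool:
    """Return True only for events that match the configured granularity."""
    if cfg.level is Granularity.METHOD:
        return event.method in cfg.targets
    if cfg.level is Granularity.APPLICATION:
        return event.app in cfg.targets
    return event.service is not None and event.service in cfg.targets
```

Dropping uninteresting events at the source, rather than in post-processing, is what keeps the amount of transferred information, and therefore the overhead, proportional to the chosen granularity.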

This thesis is organized as follows. Chapter 2 surveys background and related work. Chapter 3 presents the VARI profiler architecture and the reconfigurable vertical profiling flow. Chapter 4 gives an overview of the memory map tunnel from the virtual machine to Linux kernel space and describes how the Dalvik virtual machine is instrumented with the Dalvik Instrumentation Module. In addition, it introduces methodologies that ensure accurate correlation between user-space samples and virtual machine applications. Chapter 5 focuses on techniques for lowering profiling overhead at the different granularity levels. The experimental results and overhead analysis are presented in Chapter 6. Chapter 7 concludes this thesis.
