
6.3 Comparison to other systems

6.3.3 Harbor

Harbor guarantees safety by adding checks on the host. A verifier on the node then verifies all checks are in place before executing the programme. While checks are added by the host with its ample resources, Harbor is still bound by the limited resources on the node since the node needs to be able to verify correctness.

Control flow safety In a Harbor application, run-time checks are added to protect writes.

To guarantee applications cannot jump past these run-time checks, Harbor disallows computed branches so the verifier can check the target address of each branch.

Function returns could also be used to jump to an arbitrary location if the return address can be corrupted. To prevent this, Harbor uses function entry and exit stubs that store the return address in a ’safe stack’ in a reserved section of memory not accessible to the application. This adds an overhead of 76 cycles per function call.
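The effect of the entry and exit stubs can be pictured with a minimal Python model. This is a sketch of the idea only, not Harbor's implementation: the real stubs are AVR machine code, and the names `SafeStack`, `on_entry`, `on_exit`, and `call_with_stubs` are hypothetical. The model also keeps a copy of the return address on the ordinary run-time stack, purely to show that corrupting that copy has no effect.

```python
class SafeStack:
    """Toy model of a protected return-address stack (hypothetical names)."""

    def __init__(self):
        # In Harbor this lives in a reserved memory region that the
        # application cannot write to; here it is just a private list.
        self._return_addresses = []

    def on_entry(self, return_address):
        # Entry stub: save a trusted copy of the return address.
        self._return_addresses.append(return_address)

    def on_exit(self):
        # Exit stub: return to the trusted copy, ignoring whatever the
        # (possibly corrupted) run-time stack holds.
        return self._return_addresses.pop()


def call_with_stubs(safe_stack, runtime_stack, return_address, body):
    """Simulate one function call wrapped in entry and exit stubs."""
    runtime_stack.append(return_address)  # a normal call pushes the address
    safe_stack.on_entry(return_address)
    body(runtime_stack)                   # application code may corrupt it
    runtime_stack.pop()
    return safe_stack.on_exit()


def corrupting_body(runtime_stack):
    # A buggy or malicious function overwrites its own return address.
    runtime_stack[-1] = 0xBAD


safe = SafeStack()
stack = []
target = call_with_stubs(safe, stack, 0x1234, corrupting_body)
```

Here `target` is the address saved by the entry stub, even though `corrupting_body` overwrote the copy on the run-time stack.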

Memory safety Memory safety in Harbor differs significantly from CapeVM. The entire address space is split into fixed-size blocks, each of which can be assigned to a module.

A memory map keeps track of the ownership of each block.

Harbor supports multiple modules, and both the block size and the maximum number of modules are parameters that can be changed. The overhead for storing the memory map is 256 bytes for an 8-byte block size and maximum of 8 modules, which are the defaults used in the paper. Using only 2 modules, which is sufficient to isolate the OS from the application, this is reduced to 128 bytes. Increasing the block size will also reduce the size of the memory map, but at the cost of greater fragmentation.
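The memory map sizes quoted above can be reproduced with a short calculation. The sketch below rests on two assumptions that are not stated in this section: a 4 KB data address space (the size of the ATmega128's internal SRAM), and one extra flag bit per block in addition to the bits needed to encode the owning module. The function name `memory_map_bytes` is hypothetical.

```python
def memory_map_bytes(address_space_bytes, block_size_bytes, max_modules,
                     extra_bits_per_block=1):
    """Size in bytes of a per-block ownership map.

    extra_bits_per_block is an assumption: one flag bit stored next to
    the owner ID of each block.
    """
    blocks = address_space_bytes // block_size_bytes
    # Bits needed to encode a module ID in the range 0 .. max_modules - 1.
    owner_bits = max(1, (max_modules - 1).bit_length())
    total_bits = blocks * (owner_bits + extra_bits_per_block)
    return (total_bits + 7) // 8  # round up to whole bytes


ADDRESS_SPACE = 4096  # assumption: 4 KB of data memory

default_size = memory_map_bytes(ADDRESS_SPACE, block_size_bytes=8, max_modules=8)
two_module_size = memory_map_bytes(ADDRESS_SPACE, block_size_bytes=8, max_modules=2)
larger_blocks = memory_map_bytes(ADDRESS_SPACE, block_size_bytes=16, max_modules=8)
```

Under these assumptions the default configuration gives 256 bytes and the two-module configuration gives 128 bytes, matching the figures above. Doubling the block size to 16 bytes halves the map again, at the cost of coarser allocation.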

All memory writes are preceded by a call to the write_access_check function, which checks that the current module has permission to write to the target address. This imposes a run-time overhead of 65 cycles per write.
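The check can be illustrated with a small Python model of a block-ownership map. Only the name `write_access_check` comes from the text; the class `MemoryMap`, the helpers `assign` and `checked_write`, the `FREE` marker, and the function body are assumptions made for illustration. The 65-cycle cost of the real AVR routine is not modelled.

```python
BLOCK_SIZE = 8  # bytes per block
FREE = None     # marker for a block that no module owns (assumption)


class MemoryMap:
    """Toy ownership map: one owner entry per fixed-size block."""

    def __init__(self, address_space_bytes):
        self.owners = [FREE] * (address_space_bytes // BLOCK_SIZE)

    def assign(self, start, length, module_id):
        # Mark every block touched by [start, start + length) as owned.
        first = start // BLOCK_SIZE
        last = (start + length - 1) // BLOCK_SIZE
        for block in range(first, last + 1):
            self.owners[block] = module_id

    def write_access_check(self, module_id, address):
        # A write is allowed only if the target block belongs to the module.
        return self.owners[address // BLOCK_SIZE] == module_id


def checked_write(memory, memory_map, module_id, address, value):
    """Perform a write only after the ownership check succeeds."""
    if not memory_map.write_access_check(module_id, address):
        raise PermissionError(
            f"module {module_id} may not write to {address:#06x}")
    memory[address] = value


memory = bytearray(4096)
mm = MemoryMap(len(memory))
mm.assign(0x100, 64, module_id=1)        # module 1 owns 0x100 .. 0x13F
checked_write(memory, mm, 1, 0x110, 42)  # allowed: inside module 1's blocks
```

A write by any other module to `0x110`, or a write by module 1 to `0x140`, fails the check in this model and raises `PermissionError`.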

CapeVM only needs a run-time check for writes to the heap, while Harbor checks all writes, including local variables. This, combined with Harbor’s more expensive write_access_check function, suggests its overhead will be significantly larger than CapeVM’s.

Verifier Although Harbor’s safety checks are added on the host, the correctness of the system only depends on the verifier running on the node. This verifier is a relatively small and simple component.

This is a significant advantage of Harbor’s approach. Malicious attacks often exploit bugs in the system they are trying to corrupt. Compared to more complex systems like t-kernel and CapeVM, the simplicity and small size of Harbor’s verifier reduce the chance of exploitable bugs, but this comes at the cost of an increased run-time overhead.

Chapter 7 Evaluation

This dissertation presents a number of techniques to improve the performance of sensor node virtual machines and make them safe, while staying within the constraints set out in Section 4.1. This chapter evaluates to what extent CapeVM meets these goals by measuring its performance and code size overhead for a number of different benchmarks.

First, Section 7.1 describes our experimental setup, the benchmarks used, and how the source code for these benchmarks was obtained.

Next, Section 7.2 uses the largest benchmark to examine the effect of the lack of optimisations done by the standard javac compiler, and the manual optimisations performed on the Java source.

Sections 7.3, 7.4 and 7.5 evaluate the result of the optimisations to the AOT translation process on performance and code size.

Sections 7.6 and 7.7 focus on two specific optimisations: adding support for constant arrays and lightweight method calls.

The cost of adding safety checks is examined in Section 7.8, which also compares CapeVM’s overhead to existing native code systems that provide safety.

Platform independence is one of the main reasons to use a VM. While CapeVM was only implemented for the ATmega128, Section 7.9 presents measurements that give an indication of the expected performance on other common sensor node platforms.

Finally, in Section 7.10 we discuss the limitations and cost of using a VM, and describe some known hard cases which CapeVM currently does not handle as well.

7.1 Benchmarks and experimental setup

This section describes the experimental setup, the benchmarks used, how the source code for each benchmark was obtained, and any relevant details in their implementation.

A set of twelve different benchmarks, shown in Table 7.1, is used to measure the effect of the optimisations and the overhead of safety checks. Table 7.2 shows some key characteristics of these benchmarks: their code size, stack depth, and mix of executed instructions. This mix of benchmarks was chosen for several reasons. Some benchmarks (bubble sort, binary search, MD5, FFT, and outlier detection) were chosen because they are used in related work, allowing a comparison of CapeVM to these results.

Several benchmarks are small and process arrays of data. While the actual processing done may not be typical for sensor networks (although the MoteTrack application does do a bubble sort), the small size of these benchmarks makes them useful for highlighting specific behaviours that would be lost in the averages of a larger benchmark.

The CoreMark benchmark is an industry standard benchmark to measure CPU performance. It is a larger benchmark, mixing several kinds of processing: besides array processing in the form of matrix operations, it also contains linked list processing and a state machine. Since CoreMark mixes different kinds of processing, it is a good example of the expected average behaviour. Its many different methods enable us to evaluate the effect of method calls and show that CapeVM can efficiently handle larger, more complex applications.

Finally, Outlier detection, LEC, MoteTrack and heat detection are all code that was specifically developed for sensor nodes, and FFT is a typical signal processing operation, which is a common and potentially expensive task for sensor nodes.

Sensor nodes spend their time and energy on three main tasks: accessing sensors and actuators, communication, and data processing. The first two require interaction with the hardware, so they must be implemented in native code in the VM’s standard library. Since native code is not influenced by the performance of the VM, the benchmarks used in this evaluation focus on data processing.

Table 7.1: Benchmarks used in the evaluation

| Benchmark | Source and input data | Size | Typical sensor node code | Used as a benchmark in |
|---|---|---|---|---|
| Bubble sort | Darjeeling sources [12]. Input: 256 16-bit numbers sorted in reverse order, as in the original source. | single method | no | [13, 24] |
| Heap sort | Standard heap sort taken from [3]. Input: 256 16-bit numbers sorted in reverse order. | two methods | no | |
| Binary search | TakaTuka sources [6]. Input: worst case (not found) search in 100 16-bit values, as in [24]. | single method | no | [24] |
| XXTEA | Wheeler and Needham [102]. Input: 32 32-bit numbers. Contents do not affect performance. | single method | no | |
| MD5 | Darjeeling sources [12]. Input: the string ’message digest’ as in the original source. | single method | no | [13, 24] |
| RC5 | LibTomCrypt [86]. Input: first test case in libtomcrypt sources (64 bit). | single method | no | |
| FFT | Fixed point FFT using the widespread fix_fft.c [94]. Input: 64 8-bit or 16-bit numbers. Contents do not affect performance. | single method | yes | [49] |
| Outlier detection | Our implementation of the algorithm described in [49]. Input: 20 16-bit values increasing from 0 to 19, with outliers of 1000 and -1000 at index 2 and 11. | single method | yes | [49] |
| LEC | Our implementation of the compression algorithm described in [63]. Input: 256 16-bit ECG measurements downloaded from PhysioNet [71]. | three methods | yes | |
| CoreMark 1.0 | EEMBC [89]. Input: defined in CoreMark source. | full application | no | |
| MoteTrack | Lorincz [60, 59]. Input: defined in MoteTrack source. | full application | yes | |
| Heat detection | Adapted from code used in our group to track objects using an 8x8 pixel heat sensor. Input: 101 frames of 8x8 16-bit values for calibration, 25 frames for detection. | full application | yes | |

Table 7.2: Benchmark characteristics, using optimised source code

| | B.sort | H.sort | Bin.Search | XXTEA | MD5 | RC5 | FFT | Outlier | LEC | CoreMark | MoteTrack | HeatCalib | HeatDetect | average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CODE SIZE (BYTES) | | | | | | | | | | | | | | |
| Bytecode | 74 | 134 | 83 | 379 | 2983 | 453 | 441 | 287 | 334 | 2788 | 2552 | 310 | 2661 | |
| Native C | 118 | 298 | 146 | 1442 | 9458 | 910 | 1292 | 380 | 560 | 6128 | 3906 | 1944 | 5294 | |
| AOT original | 418 | 1012 | 412 | 3792 | 29502 | 4090 | 2576 | 1402 | 1628 | 13982 | 12784 | 2454 | 17248 | |
| AOT optimised | 258 | 596 | 310 | 2236 | 14654 | 2018 | 1324 | 800 | 1056 | 8990 | 8478 | 1610 | 10346 | |
| EXECUTED BYTECODE INSTRUCTIONS (% of total executed bytecode instructions before optimisation) | | | | | | | | | | | | | | |
| Load/Store | 79.8 | 71.7 | 58.1 | 44.9 | 43.3 | 41.1 | 61.1 | 69.0 | 59.5 | 54.1 | 70.3 | 51.8 | 48.0 | 57.9 |
| Constant load | 0.2 | 8.1 | 11.0 | 12.5 | 19.1 | 17.6 | 6.4 | 0.6 | 7.9 | 10.0 | 5.4 | 10.1 | 16.6 | 9.7 |
| Processing | 8.0 | 7.8 | 14.8 | 32.4 | 28.9 | 36.6 | 18.0 | 13.0 | 12.7 | 14.0 | 5.9 | 17.9 | 10.3 | 16.9 |
| Processing: math | 8.0 | 5.5 | 10.3 | 10.1 | 12.5 | 10.7 | 11.6 | 13.0 | 7.1 | 8.2 | 5.9 | 3.7 | 9.4 | 8.9 |
| Processing: bit shift | 0.0 | 2.2 | 4.5 | 8.1 | 5.4 | 8.0 | 6.1 | 0.0 | 3.8 | 2.2 | 0.0 | 8.5 | 0.9 | 3.8 |
| Processing: bit logic | 0.0 | 0.0 | 0.0 | 14.2 | 11.0 | 17.9 | 0.3 | 0.0 | 1.9 | 3.6 | 0.0 | 5.7 | 0.0 | 4.2 |
| Branches | 12.0 | 10.9 | 15.5 | 4.0 | 5.8 | 2.3 | 5.1 | 17.4 | 10.5 | 16.0 | 13.6 | 14.7 | 19.2 | 11.3 |
| Invoke | 0.0 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.2 | 0.9 | 0.3 | 0.0 | 0.1 |
| Others | 0.0 | 1.0 | 0.6 | 0.2 | 2.5 | 2.4 | 9.4 | 0.0 | 7.1 | 4.7 | 2.2 | 4.2 | 5.9 | 3.1 |
| Total | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| STACK (bytes) | | | | | | | | | | | | | | |
| Max. stack | 6 | 8 | 4 | 24 | 20 | 14 | 10 | 6 | 18 | 16 | 12 | 22 | 16 | 13.5 |
| Avg. stack | 2.08 | 2.37 | 2.14 | 11.76 | 6.30 | 6.77 | 3.36 | 1.89 | 2.73 | 3.15 | 2.19 | 4.83 | 3.08 | 4.1 |
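The code size rows of Table 7.2 are easier to compare as ratios relative to native C. The following Python sketch derives them from the table's figures. The ratios themselves are not part of the table, and the variable names are chosen here for illustration.

```python
BENCHMARKS = ["B.sort", "H.sort", "Bin.Search", "XXTEA", "MD5", "RC5", "FFT",
              "Outlier", "LEC", "CoreMark", "MoteTrack", "HeatCalib",
              "HeatDetect"]

# Code sizes in bytes, copied from Table 7.2.
NATIVE_C = [118, 298, 146, 1442, 9458, 910, 1292, 380, 560, 6128, 3906,
            1944, 5294]
AOT_ORIGINAL = [418, 1012, 412, 3792, 29502, 4090, 2576, 1402, 1628, 13982,
                12784, 2454, 17248]
AOT_OPTIMISED = [258, 596, 310, 2236, 14654, 2018, 1324, 800, 1056, 8990,
                 8478, 1610, 10346]


def size_ratios(aot_sizes, native_sizes):
    """Map each benchmark name to its AOT code size divided by native C size."""
    return {name: aot / native
            for name, aot, native in zip(BENCHMARKS, aot_sizes, native_sizes)}


ratio_original = size_ratios(AOT_ORIGINAL, NATIVE_C)
ratio_optimised = size_ratios(AOT_OPTIMISED, NATIVE_C)

for name in BENCHMARKS:
    print(f"{name:<11} original {ratio_original[name]:.2f}x   "
          f"optimised {ratio_optimised[name]:.2f}x")
```

A ratio above 1 means the AOT-compiled code is larger than the native C version of the same benchmark.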

As noted before, the extent to which an application is affected by the VM’s slowdown is highly application dependent. The Amulet smart watch system discussed in Chapter 1 notes that the energy consumed when the CPU is in active mode is significant, and their breakdown of the CPU active time shows that most is spent in application code rather than the OS. Similarly, the Mercury motion analysis platform shows that the energy spent on feature extraction or FFT becomes significant or dominant in the total energy consumption when multiplied by the typical slowdown seen in interpreting VMs.

We argue that some form of array processing will be common in many sensor node applications, and especially so in applications that are significantly affected by the VM’s slowdown. First, arrays of data appear in many sensor node applications, both in the form of sensor data, and as sent or received radio messages that need to be constructed or parsed.

Second, processing such arrays is likely to be a significant part of the total processing time, for the simple reason that looping over an array of elements quickly takes more time than processing a single value.

Finally, we note that compared to complex high performance CPUs, performance on sensor node CPUs is less affected by the exact workload. The simple CPUs found on sensor nodes typically have no caches, and no (ATmega and MSP430) or very short (Cortex-M0) pipelines. Thus, factors like branch prediction and cache line alignment, which can have a very large impact on more complex CPUs, have no impact on the results presented here.

The largest performance difference found in all benchmarks is between a 1.18x slowdown for FFT, and 2.56x slowdown for MoteTrack.