Another interesting question is, “Is our mechanism better combined with a deferred coalescing, never coalescing, or immediate coalescing strategy?”
Recall that in an environment with automatic garbage collection, an allocator affects two parts of program execution time: the time for object allocation and the time for garbage collection. An allocator that uses a deferred coalescing or never coalescing strategy can provide very fast allocation but introduces longer garbage collection time. In contrast, an allocator that uses an immediate coalescing strategy allocates more slowly but results in shorter garbage collection time.
To our knowledge, no previous research has determined whether fast allocation or low fragmentation is more important in a garbage-collected environment. Although deferred coalescing is generally considered the best of the three strategies, the extent to which it wins over never coalescing or immediate coalescing is not clear. Moreover, with our proposed Next Hit Table mechanism, many allocation requests can be satisfied by a table lookup followed by taking a chunk directly from the right free list, even if the immediate coalescing strategy is used. This leads us to pose an ambitious hypothesis: immediate coalescing is the best coalescing strategy when the Next Hit Table is present. We will test this by experiment.
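To make the table-lookup idea concrete, the following C sketch shows one way such a lookup could work. It is not the implementation evaluated in this thesis. The names `next_hit`, `free_lists`, `nht_insert`, and `nht_alloc`, the size-class mapping, and the rule that entry i holds the smallest non-empty class at or above i are all assumptions made for illustration. The sketch also assumes that every free chunk's size is an exact multiple of the class granule, and it rebuilds the whole table after each change for simplicity.

```c
#include <stddef.h>

/* Hypothetical sketch of a segregated free-list allocator with a lookup table. */
#define NUM_CLASSES   8    /* assumed number of size classes */
#define CLASS_GRANULE 16   /* assumed bytes per size-class step */
#define NO_HIT        (-1)

typedef struct Chunk {
    struct Chunk *next;
    size_t size;           /* assumed to be a multiple of CLASS_GRANULE */
} Chunk;

static Chunk *free_lists[NUM_CLASSES];  /* one free list per size class */
static int next_hit[NUM_CLASSES];       /* smallest non-empty class >= i, or NO_HIT */

/* Map a size to a class index (assumed mapping); NO_HIT if too large. */
static int size_to_class(size_t size) {
    size_t c = (size + CLASS_GRANULE - 1) / CLASS_GRANULE;
    if (c == 0) c = 1;
    return c > NUM_CLASSES ? NO_HIT : (int)(c - 1);
}

/* Recompute the table by scanning from the largest class downward. */
static void rebuild_next_hit(void) {
    int hit = NO_HIT;
    for (int i = NUM_CLASSES - 1; i >= 0; i--) {
        if (free_lists[i] != NULL) hit = i;
        next_hit[i] = hit;
    }
}

/* Put a free chunk on the list for its class and refresh the table. */
static void nht_insert(Chunk *c) {
    int cls = size_to_class(c->size);
    if (cls == NO_HIT) return;          /* oversized chunks are outside this sketch */
    c->next = free_lists[cls];
    free_lists[cls] = c;
    rebuild_next_hit();
}

/* Satisfy a request with one table lookup instead of scanning the lists. */
static Chunk *nht_alloc(size_t size) {
    int cls = size_to_class(size);
    if (cls == NO_HIT) return NULL;
    int hit = next_hit[cls];
    /* The NULL check also covers a table that has never been rebuilt. */
    if (hit == NO_HIT || free_lists[hit] == NULL) return NULL;
    Chunk *c = free_lists[hit];
    free_lists[hit] = c->next;
    c->next = NULL;
    if (free_lists[hit] == NULL) rebuild_next_hit();
    return c;
}
```

In this sketch a `NULL` result stands for the case in which no free chunk is large enough, which is the point where a real allocator would fall back to garbage collection. A practical design would presumably update the table incrementally and split oversized chunks, neither of which is shown here.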
Chapter 4 Experiment
The goal of this chapter is to set up experiments that examine the following questions. First, we compare the different coalescing strategies used in size-ordered segregated list allocators. Second, we measure how much performance is gained by adding the NHT. Third, we investigate whether adding the Next Hit Table really changes which coalescing strategy is most suitable.
4.1 Methodology
Several methodologies have been proposed for evaluating the performance of allocators.
To study the fragmentation incurred by an allocator, memory-trace-driven simulation is traditionally used. To study the allocation speed of an allocator, some studies simulate allocation and use the number of nodes searched during object allocation to judge the allocator, while others implement their allocators directly and use execution time to evaluate speed.
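The node-count metric mentioned above can be illustrated with a small C sketch. It shows a first-fit search over a singly linked free list that counts every node it visits. The type `FreeNode`, the counter `nodes_searched`, and the choice of first-fit are assumptions for illustration and are not taken from any of the studies referred to.

```c
#include <stddef.h>

typedef struct FreeNode {
    struct FreeNode *next;
    size_t size;
} FreeNode;

/* Hypothetical counter: total free-list nodes visited across all searches. */
static unsigned long nodes_searched = 0;

/* First-fit search: return the first node large enough, or NULL if none. */
static FreeNode *first_fit(FreeNode *head, size_t request) {
    for (FreeNode *n = head; n != NULL; n = n->next) {
        nodes_searched++;               /* one more node examined */
        if (n->size >= request) return n;
    }
    return NULL;
}
```

Dividing `nodes_searched` by the number of requests gives an average search length, which is independent of the machine but says nothing about the cost of garbage collection.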
In our research, we do not examine fragmentation and speed separately, as many previous studies do, because we are interested in the allocator’s speed and in the interaction between the allocator and the garbage collector. In a programming language with automatic garbage collection, the cost of fragmentation is converted into execution time: higher fragmentation causes garbage collection to be invoked more frequently, so more time is spent on garbage collection. Whether an allocator is good should therefore be judged by the time spent on object allocation plus garbage collection. Because garbage collection is a very sophisticated process, its speed is difficult to simulate. Therefore, we implement our allocator and measure the execution time spent on object allocation and garbage collection.
Since we use execution time to judge the performance of an allocator, we need a way to collect timing information. A traditional technique for measuring time is to use operating system calls, but system calls are expensive and object allocation occurs very frequently, so the overhead of the system calls would distort the measurements. We therefore use an alternative approach to collect execution time information.
Some processors have a built-in hardware event counter that counts elapsed clock cycles, together with special registers that record the cycle count. The content of such a register can be read with a single assembly instruction. Thus, we use cycles as our measure of time.
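As an illustration of this kind of measurement, the following C sketch reads a cycle counter and accumulates elapsed counts into separate totals. The text does not name the instruction used. On x86 processors such as the Pentium 4, the time-stamp counter read by the RDTSC instruction is one such register, and the sketch assumes it is accessed through the `__rdtsc()` intrinsic that GCC and Clang provide in `<x86intrin.h>`. The names `read_cycles`, `add_elapsed`, `alloc_cycles`, and `gc_cycles` are hypothetical. On other architectures the sketch falls back to the standard `clock()` function, which returns processor time ticks rather than cycles.

```c
#include <stdint.h>
#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>   /* GCC/Clang header that declares __rdtsc() */
#else
#include <time.h>
#endif

/* Read the current tick count with as little overhead as possible. */
static inline uint64_t read_cycles(void) {
#if defined(__i386__) || defined(__x86_64__)
    return __rdtsc();            /* x86 time-stamp counter */
#else
    return (uint64_t)clock();    /* fallback: clock ticks, not cycles */
#endif
}

/* Hypothetical accumulators for the two costs of interest. */
static uint64_t alloc_cycles = 0;
static uint64_t gc_cycles = 0;

/* Add the ticks elapsed since `start` to the given accumulator. */
static void add_elapsed(uint64_t *bucket, uint64_t start) {
    *bucket += read_cycles() - start;
}
```

A measurement would call `read_cycles()` on entry to the allocation routine and `add_elapsed(&alloc_cycles, start)` on exit, and likewise with `gc_cycles` around a collection. The sum of the two totals then corresponds to the cost by which an allocator is judged in this chapter.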
4.2 Environment
Sun’s CVM is chosen as the base Java virtual machine for our implementation because it is open-source and widely known. CVM is a pure interpreter-based virtual machine intended for use in embedded systems [8]. The underlying hardware is a Pentium 4, and the operating system is Debian Linux.
We have gathered several benchmarks, including SPECjvm98 and Embedded CaffeineMark. Embedded CaffeineMark is aimed at testing the performance of Java virtual machines in embedded environments. It is a synthetic benchmark suite that performs only very simple operations, such as looping, calling methods, and allocating String objects. Hence it is not representative of the object allocation behavior of real Java programs.
SPECjvm98 is an industry-standard benchmark suite for evaluating the performance of Java virtual machines [9]. The benchmarks in SPECjvm98 solve real-world problems, and several of them are commercial applications. Since we are interested in the performance of dynamic object allocation, SPECjvm98 is chosen as our test benchmark suite.
Among the benchmarks, _222_mpegaudio and _228_jack are not tested because of porting problems.
Table 4-1: Overview of SPECjvm98 benchmarks
Benchmark        Short Description
_201_compress    Modified Lempel-Ziv method (LZW). Basically finds common substrings and replaces them with a variable size code.
_202_jess        Java Expert Shell System is based on NASA's CLIPS expert shell system.
_205_raytrace    A raytracer that works on a scene depicting a dinosaur.
_209_db          Performs multiple database functions on memory resident database.
_213_javac       This is the Java compiler from the JDK 1.0.2.
_227_mtrt        A raytracer that works on a scene depicting a dinosaur, where two threads each render the scene in the input file time-test model.
_228_jack        A Java parser generator that is based on the Purdue Compiler Construction Tool Set (PCCTS).