
CHAPTER 2 BACKGROUND AND RELATED WORK

2.2 Architecture of Java Virtual Machine

2.2.2 Heap

Whenever a class instance or array is created in a running Java application, the memory for the new object is allocated from a single heap. Because there is only one heap inside a Java virtual machine instance, all threads share it. Since a Java application runs inside its own exclusive Java virtual machine instance, there is a separate heap for every running application: two different Java applications cannot access each other's heap data, but two different threads of the same application can. For this reason, we must be concerned about proper synchronization of multi-threaded access to objects (heap data) in Java programs.

The Java virtual machine has an instruction that allocates memory on the heap for a new object but no instruction for freeing that memory. Just as you cannot explicitly free an object in Java source code, you cannot explicitly free one in Java bytecode. The virtual machine itself is responsible for deciding whether and when to free the memory occupied by objects that are no longer referenced by the running application. Usually, a Java virtual machine implementation uses a garbage collector to manage the heap.

The Java virtual machine specification says nothing about how objects should be represented on the heap. Object representation, an integral aspect of the overall design of the heap and garbage collector, is a decision left to implementation designers. The instance variables declared in the object's class and all of its superclasses make up the primary data that must be represented for each object. Given an object reference, the virtual machine must be able to locate the object's instance data quickly. In addition, there must be some way to access the object's class data (stored in the method area) given a reference to the object. For this reason, the memory allocated for an object usually includes some kind of pointer into the method area.

One possible heap design divides the heap into two parts: a handle pool and an object pool. An object reference is then a native pointer to a handle pool entry. A handle pool entry has two components: a pointer to instance data in the object pool and a pointer to class data in the method area. The advantage of this scheme is that the virtual machine can easily combat heap fragmentation: when it moves an object in the object pool, it only needs to update one pointer with the object's new address, the related pointer in the handle pool. The disadvantage of this approach is that every access to an object's instance data requires dereferencing two pointers. This approach to object representation is shown in Figure 2-5.


Fig. 2-5: Splitting an object across a handle pool and an object pool
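The two-pointer indirection can be sketched as a small Java model (for illustration only; the class names, the integer "addresses", and the flat arrays are our own simplifying assumptions, not an actual virtual machine implementation):

```java
import java.util.ArrayList;
import java.util.List;

// A minimal model of the handle-pool heap design: an object reference is
// an index into the handle pool, and each handle entry holds two pointers.
class HandlePoolHeap {
    static class ClassData { final String name; ClassData(String n) { name = n; } }

    static class Handle {
        int instanceDataAddr;   // pointer into the object pool
        final ClassData klass;  // pointer into the method area
        Handle(int addr, ClassData k) { instanceDataAddr = addr; klass = k; }
    }

    final List<Handle> handlePool = new ArrayList<>();
    int[] objectPool = new int[1024];   // instance data, in words
    int allocTop = 0;

    // Allocate an object with 'words' instance-data slots; returns a reference.
    int newObject(ClassData k, int words) {
        handlePool.add(new Handle(allocTop, k));
        allocTop += words;
        return handlePool.size() - 1;
    }

    // Every field access dereferences TWO pointers: handle, then instance data.
    int getField(int ref, int offset) {
        Handle h = handlePool.get(ref);
        return objectPool[h.instanceDataAddr + offset];
    }

    void putField(int ref, int offset, int value) {
        Handle h = handlePool.get(ref);
        objectPool[h.instanceDataAddr + offset] = value;
    }

    // Compacting the object pool only needs to update ONE pointer per object.
    void moveObject(int ref, int newAddr, int words) {
        Handle h = handlePool.get(ref);
        System.arraycopy(objectPool, h.instanceDataAddr, objectPool, newAddr, words);
        h.instanceDataAddr = newAddr;   // the reference (handle index) stays valid
    }
}
```

Note how moveObject updates only the single pointer inside the handle entry; the references held by the running program never change.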

Another heap design makes an object reference a native pointer to a bundle of data that contains the object's instance data and a pointer to the object's class data. This approach requires dereferencing only one pointer to access an object's instance data but makes moving objects more complicated. When the virtual machine moves an object to combat fragmentation in this kind of heap, it must update every reference to that object anywhere in the runtime data areas. This approach to object representation is shown in Figure 2-6.


Fig. 2-6: Keeping object data in one place
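The one-pointer scheme, and the extra work it implies when objects move, can be sketched the same way (again a simplified model with illustrative names; a header word stands in for the class-data pointer):

```java
import java.util.ArrayList;
import java.util.List;

// A minimal model of the one-pointer heap design: an object reference is
// the address of the object's data itself; word 0 of each object is a
// header holding the pointer to class data in the method area.
class DirectHeap {
    int[] heap = new int[1024];
    int allocTop = 0;
    // Runtime data areas (stacks, local variables, ...) that hold references.
    List<int[]> runtimeDataAreas = new ArrayList<>();

    int newObject(int classId, int fieldWords) {
        int ref = allocTop;
        heap[ref] = classId;            // header word: class data pointer
        allocTop += 1 + fieldWords;
        return ref;
    }

    // Only ONE dereference is needed to reach instance data.
    int getField(int ref, int offset)         { return heap[ref + 1 + offset]; }
    void putField(int ref, int offset, int v) { heap[ref + 1 + offset] = v; }

    // Moving an object requires finding and rewriting EVERY reference to it
    // anywhere in the runtime data areas.
    void moveObject(int oldRef, int newRef, int words) {
        System.arraycopy(heap, oldRef, heap, newRef, words);
        for (int[] area : runtimeDataAreas)
            for (int i = 0; i < area.length; i++)
                if (area[i] == oldRef) area[i] = newRef;
    }
}
```

Here moveObject must scan every reference-holding area, which is exactly why compaction is more complicated under this design.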

Chapter 3

Design and Simulation of Object Cache

In this chapter, the object-field access behavior is analyzed. First, an example and the execution flow of object-field access are presented. Then, an optimization method proposed by Sun, rewriting bytecode, is described. Next, the benchmark behavior analysis is given. Based on its results, a new acceleration mechanism for object-field access is proposed. The issues that affect the proposed mechanism and its operations are then presented. Finally, the performance of the proposed acceleration mechanism is evaluated.

3.1 Object-Field Access Behavior

Java is an object-oriented language. One of its important features is data encapsulation: the data and methods of a structure are encapsulated into a class, so object data and methods must be accessed through object manipulation instructions.

Traditionally, these instructions are implemented by traps and always cost many cycles to execute. Figure 3-1 shows the dynamic instruction mix of the SPECjvm98 benchmark [6]. We find that class object manipulation (COM) instructions constitute 19% of the total instruction count, and the opcodes "getfield" and "putfield" constitute most of them. In this section, we explain the detailed execution flow of the object-field access instructions, especially getfield and putfield, which are the two instructions we want to accelerate.

LS: load and store; OC: object creation; A: arithmetic; AOM: array object manipulation; OSM: operand stack management; MI: method invocation; TC: type conversion; COM: class object manipulation; CT: control transfer

Fig. 3-1: The dynamic instruction mix of SPECjvm98 benchmark

3.1.1 An Example of Object-Field Access

Figure 3-2 shows the formats of the bytecodes "getfield" and "putfield" and the changes of the operand stack before and after their execution. These two instructions are used to access object-field data. Two index bytes follow the opcode and are used to index into the constant pool. Bytecode "getfield" fetches a field from an object: before "getfield" is executed, the object reference of the target field must be on the top of stack (TOS); after execution, the value of the target field is on the top of stack. Bytecode "putfield" sets a field value in an object: before "putfield" is executed, the object reference of the target field and the value must be on the top of stack; after execution, the value is set in the target field.

Fig. 3-2: The formats of opcodes "getfield" and "putfield" and the changes of the operand stack before and after execution

An example of object-field access is shown in Figure 3-3. Figure 3-3(a) is the example source code. We declare two classes (A and B), each containing one field (aInt and bInt respectively). We use the keyword "new" to create an object instance from a class; in this example, A1 is an instance created from class A and B1 is an instance created from class B. Figure 3-3(b) shows the Java bytecode compiled from the source code. The opcode "new" creates the object instance; the index bytes following the opcode are used to index into the constant pool. Figure 3-3(c) shows the state of the constant pool and local variables. When the bytecode "new #1" is executed, it goes to constant pool entry #1 to get the necessary information. After execution, the created object reference is on the top of stack, and the opcode "astore" saves it to a local variable. When we want to access object-field data, the opcode "getfield" or "putfield" is called and needs to load some information from the constant pool or local variables.

Fig. 3-3: An example of object-field access
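The example of Figure 3-3 can be reconstructed roughly as follows (class and field names follow the figure; the bytecode in the comments is typical javac output, and the constant-pool indices #1..#4 are illustrative):

```java
class A { int aInt; }
class B { int bInt; }

class Example {
    static int run() {
        A a1 = new A();      // new #1 (class A); dup; invokespecial A.<init>; astore_0
        B b1 = new B();      // new #2 (class B); dup; invokespecial B.<init>; astore_1
        a1.aInt = 5;         // aload_0; iconst_5; putfield #3 (A.aInt)
        b1.bInt = a1.aInt;   // aload_1; aload_0; getfield #3; putfield #4 (B.bInt)
        return b1.bInt;      // aload_1; getfield #4; ireturn
    }
}
```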

3.1.2 Execution Flow of Object-Field Access

In the Java virtual machine, memory is allocated on the garbage-collected heap only as objects. You cannot allocate memory for a primitive type on the heap, except as part of an object. On the other hand, only object references and primitive types can reside on the Java stack as local variables; objects can never reside on the Java stack.

The architectural separation of objects and primitive types in the Java virtual machine is reflected in the Java programming language, in which objects cannot be declared as local variables; only object references and primitive types can. Upon declaration, an object reference refers to nothing. Only after the reference has been explicitly initialized, either with a reference to an existing object or with a call to new, does it refer to an actual object.

When we want to access an object's method or field, a sequence of actions is executed. Opcodes "getfield" and "putfield" are used to get and put object fields. A 2-byte operand called the "index bytes" follows each opcode and is used to index into the constant pool. Constant pool resolution is then executed to find the physical memory location of the referenced field or method; it is the process of dynamically determining concrete values from the symbolic references in the constant pool. It may involve loading one or more classes or interfaces, binding several types, and initializing types, and therefore always costs many execution cycles. Afterwards, the object reference on the top of stack and the offset must be translated into the physical memory address used to access the data. This translation may need another memory access of the handle table to get the object's base memory address. The flow of object-field access for getfield is shown in Figure 3-4.


Fig. 3-4: Execution flow of object field access for getfield
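The flow above can be modeled in a few lines (a simplified sketch assuming a handle-based heap; the names CpEntry, fieldLayouts, and handleTable are illustrative, and real constant pool resolution does far more work than a map lookup):

```java
import java.util.HashMap;
import java.util.Map;

// A simplified model of the unoptimized getfield flow: resolve the
// constant-pool entry to a field offset, then translate the object
// reference through the handle table to a physical address.
class GetfieldFlow {
    static class CpEntry {
        final String className, fieldName;  // symbolic reference
        Integer resolvedOffset;             // filled in by resolution
        CpEntry(String c, String f) { className = c; fieldName = f; }
    }

    Map<String, Integer> fieldLayouts = new HashMap<>(); // "A.aInt" -> offset
    Map<Integer, Integer> handleTable = new HashMap<>(); // ref -> base address
    int[] memory = new int[1024];

    // Constant pool resolution: the expensive, first-time-only step; in a
    // real VM it may trigger class loading, linking, and initialization.
    int resolve(CpEntry e) {
        if (e.resolvedOffset == null)
            e.resolvedOffset = fieldLayouts.get(e.className + "." + e.fieldName);
        return e.resolvedOffset;
    }

    int getfield(int objectRef, CpEntry e) {
        int offset = resolve(e);                 // 1. constant pool -> offset
        int base = handleTable.get(objectRef);   // 2. handle table -> base address
        return memory[base + offset];            // 3. load the field
    }
}
```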

Constant Pool Resolution

Java classes and interfaces are dynamically loaded, linked, and initialized. Loading is the process of finding the binary form of a class or interface type with a particular name and constructing a class object to represent it. Linking is the process of taking a binary form of a class or interface type and combining it into the runtime state of the Java virtual machine so that it can be executed. Initialization of a class consists of executing its static initializers and the initializers for the static fields declared in the class.

A Java compiler does not presume to know the way in which a Java virtual machine lays out classes, interfaces, class instances, or arrays. References in the constant pool are always initially symbolic. At run time, the symbolic representation of the reference in the constant pool is used to work out the actual location of the referenced entity. The process of dynamically determining actual locations from symbolic references in the constant pool is known as constant pool resolution or dynamic linking. Constant pool resolution may involve loading one or more classes or interfaces, binding several types, and initializing types, and therefore always costs many cycles. After resolution, the useful information, such as the offset and type of the referenced target, is placed in the corresponding constant pool entry, so it is available on later references to that entry.

3.1.3 Sun's Mechanism of Rewriting Java Bytecode

In the optimization implemented in Sun's version of the Java virtual machine, compiled Java code is modified at run time for better performance. The optimization works by dynamically replacing certain instructions with more efficient variants the first time they are executed. The new instructions take advantage of the loading and linking work done the first time the associated normal instruction is executed. For instructions that are rewritten, each instance of the instruction is replaced on its first execution by a _quick pseudo-instruction; every subsequent execution of that instruction instance uses the _quick variant.

In all cases, the instructions with _quick variants reference the constant pool. The _quick pseudo-instructions save time by exploiting the fact that, while the first execution of an instruction referencing the constant pool must dynamically resolve the constant pool entry, subsequent executions of that same instruction reference the same object and need not resolve the entry again. The rewriting process is as follows:

1. Resolve the specified item in the constant pool.

2. Throw an exception if the item in the constant pool cannot be resolved.

3. Overwrite the instruction with the _quick pseudo-instruction and any new operands it requires.

4. Execute the new _quick pseudo-instruction.
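The four steps can be sketched as a tiny interpreter fragment (the opcode encodings, the flat bytecode array, and the pre-resolved fieldOffsets table are illustrative assumptions, not Sun's actual interpreter):

```java
// A minimal sketch of the _quick rewriting idea: on first execution the
// symbolic reference is resolved and the bytecode stream is overwritten in
// place, so that later executions of the same instruction instance skip
// resolution entirely.
class QuickRewrite {
    static final int GETFIELD = 0xB4, GETFIELD_QUICK = 0xCE;

    int[] code;                      // bytecode stream: opcode, operand, operand
    int[] fieldOffsets;              // stand-in for constant pool resolution
    int[] objectData = new int[64];  // stand-in for the heap

    int execute(int pc, int objectBase) {
        switch (code[pc]) {
            case GETFIELD: {
                int cpIndex = (code[pc + 1] << 8) | code[pc + 2];
                int offset = fieldOffsets[cpIndex];  // slow: resolution happens here
                // Overwrite opcode and operands in place with the quick variant.
                code[pc] = GETFIELD_QUICK;
                code[pc + 1] = offset;
                code[pc + 2] = 0;
                return objectData[objectBase + offset];
            }
            case GETFIELD_QUICK:     // fast path: the offset is inline now
                return objectData[objectBase + code[pc + 1]];
            default:
                throw new IllegalStateException("unhandled opcode");
        }
    }
}
```

The first call through execute pays for resolution and rewrites the stream; every later call on the same program counter dispatches straight to the quick case.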

For instance, if we execute the bytecode getfield to access object-field data, only the first execution goes through the process shown in Figure 3-4. Subsequent executions become much faster because the Java virtual machine substitutes getfield_quick (or getfield2_quick, depending on the field's type) in place of the getfield bytecode. The index bytes after this _quick pseudo-instruction have already become the offset of the target field, as shown in Figure 3-5.

The benefit of dynamic linking via rewriting is "rewrite once, profit forever." However, the Java virtual machine must provide mechanisms, such as keeping coherency between the instruction cache and data cache and flushing the contents of the instruction buffer or instruction pipeline, to ensure correct functionality.


Fig. 3-5: Execution flow of object field access for getfield_quick

3.2 Design of Object Cache

In this thesis, we intend to accelerate the execution of the object-field access instructions getfield and putfield, whose detailed execution flow was given in Section 3.1.2. To achieve this goal, we divide the design process into two parts: benchmark behavior analysis, and design issue consideration and simulation. In the first part, we analyze the actual execution of the SPECjvm98 benchmark and investigate in detail the execution features of the suite, including temporal locality and reuse probability. Based on these results, we propose our acceleration technique: the use of an object cache. In the second part, the design issues that may affect our acceleration mechanism are discussed and simulated, including the pipeline stage, indexing policies, cache line size, and cache size.

3.2.1 Benchmark Behavior Analysis

In this subsection, we analyze the execution of the SPECjvm98 benchmark. First, we describe our simulation approach, including the benchmark and the simulation environment. Then, some important features of the benchmark behavior are presented. Based on these features, we propose our acceleration approach.

Simulation Approach

We choose the SPECjvm98 benchmark as our testing program. A brief explanation of the SPECjvm98 benchmark suite is given below:

_200_check is a simple program to test various features of the JVM to ensure that it provides a suitable environment for Java programs.

_201_compress implements file compression and decompression. It performs five iterations over a set of five tar files, each between 0.9 and 3 Mbytes in size. Each file is read in and compressed, the result is written to memory, then read again and uncompressed, and finally the new file size is checked.

_202_jess is an expert system that reads a list of facts about several word games from an input file and attempts to solve the riddles.

_209_db simulates a simple database management system with a file of persistent records and a list of transactions as inputs. The task is to first build up the database by parsing the records file and then to apply the transactions to this set.

_213_javac is the JDK 1.0.2 compiler iterating four times over several thousand lines of Java code; the source code of jess serves as input for javac.

_222_mpegaudio is an application that decompresses 4 Mbytes of audio data that conform to the ISO MPEG Layer-3 audio specification.

_228_jack is a Java parser generator that is based on the Purdue Compiler Construction Tool Set (PCCTS). This is an early version of what is now called JavaCC. The workload consists of a file named jack.jack, which contains instructions for the generation of jack itself. This file is fed to jack so that the parser generates itself multiple times.

The class files of the benchmarks were executed on a modified JDK 1.0.2 interpreter to obtain traces of the instrumented execution characteristics. These traces were then analyzed to identify the behavior of object-field access. Moreover, architectural components to support object-field access were proposed and simulated, and the benchmark traces were also used to evaluate the proposed components. Figure 3-6 shows this approach.

Fig. 3-6: Simulation approach

Temporal Locality and Reusing Probability

To accelerate object-field access, we first analyzed the actual execution of these instructions in SPECjvm98. We find that most of them repeatedly access the same fields. Figure 3-7 shows the probability that an object field will be reused within the next n object-field accesses; over 70% of object fields are reused within the next 100 accesses. We then performed a simple simulation using an LRU buffer to store these object fields. Figure 3-8 shows the hit rate of the LRU buffer with m entries. Over 80% of object-field accesses hit in a 256-entry LRU buffer for every benchmark except _209_db; because _209_db simulates a large number of database records, only 60% of its object-field accesses hit in a 256-entry buffer. Based on this analysis, we conclude that object fields have good temporal locality and are good candidates for reuse.

Fig. 3-7: Probability that an object field is reused within the next n object-field accesses

Fig. 3-8: Hit rate of the LRU buffer with m entries
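One way to reproduce the LRU-buffer measurement is sketched below (the encoding of an access as a single long key is our own assumption; any unique encoding of the (object reference, offset) pair would do):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A sketch of the LRU-buffer experiment: replay a trace of object-field
// accesses through an m-entry LRU buffer and count hits. A LinkedHashMap
// constructed with accessOrder = true maintains least-recently-used order,
// and overriding removeEldestEntry caps the buffer at m entries.
class LruExperiment {
    static double hitRate(long[] trace, final int entries) {
        Map<Long, Boolean> lru =
            new LinkedHashMap<Long, Boolean>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Long, Boolean> e) {
                    return size() > entries;   // evict the least recently used
                }
            };
        int hits = 0;
        for (long key : trace) {               // key encodes (objectRef, offset)
            if (lru.containsKey(key)) hits++;
            lru.put(key, Boolean.TRUE);        // insert, or refresh recency on a hit
        }
        return (double) hits / trace.length;
    }
}
```

Feeding the benchmark traces through hitRate with entries = 256 yields the curves of Figure 3-8.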

Acceleration Approach — Object Cache

The analysis in the previous subsection shows that object fields have good temporal locality and are good candidates for reuse. Because of this feature, there is clearly a good chance to improve performance by using a dedicated cache to store the data of referenced object fields. Therefore, we attach a cache, called the object cache, to the Java processor so that _quick code can execute directly without accessing the constant pool or the handle table. See Figure 3-9.

Fig. 3-9: Using an object cache to access object field data directly

3.2.2 Design Issues

The issues that affect the performance and the operations of our proposed mechanism for optimizing object field accesses are discussed in this subsection. The design issues of the object cache include indexing policy, pipeline stage design, cache line size, and cache size.

Issue I: Indexing Policy

We first have to choose how to index an object cache entry, that is, how to identify a referenced object field. In the original bytecodes "getfield" and "putfield", the value of the constant pool base register is added to the index bytes of the bytecode to access the constant pool, so the pair of constant pool base register and index bytes identifies a referenced object field. However, when garbage collection happens, the constant pool base address may change; that is, the physical memory address of an object field is not invariant. In other words, we cannot use the physical memory address to index the object cache because of garbage collection.

Therefore, we consider another choice: using the object reference and the offset supplied by the _quick code to index the object cache. The object reference is the unique id of an object and never changes. The offset is the location of the field inside the class instance and never changes either. Hence, each (object reference, offset) pair maps to exactly one object field, and we use the object reference and offset to index the object cache.
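A direct-mapped object cache indexed this way might look as follows (a behavioral sketch only; the set count, the hash that folds reference and offset into a set index, and the single-word lines are illustrative choices, and the actual parameters are the subject of the design issues below):

```java
// A sketch of a direct-mapped object cache indexed by the
// (object reference, offset) pair.
class ObjectCache {
    static final int SETS = 64;   // must be a power of two
    long[] tags = new long[SETS];
    int[] data = new int[SETS];
    boolean[] valid = new boolean[SETS];

    // The full (reference, offset) pair is the tag, so entries stay valid
    // even when garbage collection moves objects in physical memory.
    static long key(int objectRef, int offset) {
        return ((long) objectRef << 32) | (offset & 0xFFFFFFFFL);
    }

    private static int set(long k) {
        return (int) ((k ^ (k >>> 32)) & (SETS - 1));   // fold ref and offset
    }

    // Returns the cached field value, or null on a miss.
    Integer lookup(int objectRef, int offset) {
        long k = key(objectRef, offset);
        int s = set(k);
        return (valid[s] && tags[s] == k) ? (Integer) data[s] : null;
    }

    void fill(int objectRef, int offset, int value) {
        long k = key(objectRef, offset);
        int s = set(k);
        tags[s] = k;
        data[s] = value;
        valid[s] = true;
    }
}
```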

To reduce the overhead of the tag field in the object cache, we analyzed the range

