
Chapter 1 Introduction

1.5 Synopsis

The remainder of this thesis is organized as follows. Chapter 2 discusses the Java just-in-time compiler and the Andes 32-bit/16-bit instruction set architectures. Chapter 3 introduces the Multiple Fixed-width ISA Emitter. Chapter 4 presents and analyzes the experiments and their results. Chapter 5 gives the conclusion and future work.

Chapter 2

Java Just-In-Time compiler and Andes 32bit-16bit Instruction Set Architectures

As the Java language becomes more and more important for programming embedded systems, translation at the bytecode level has been proposed to increase program performance. The Java VM provides a just-in-time compiler architecture, which compiles bytecode into target-machine code and executes that code to improve performance.

The ANDES ISA uses mixed-mode instructions without the need for mode-switching instructions. Its 16-bit ISA closely mirrors the 32-bit ISA, but it has an alignment restriction: the memory address referenced by a 32-bit memory instruction must be word-aligned.

In our research, the target is the ANDES processor. We port an existing JVM JIT to the ANDES platform and then modify the code emitter so that it can generate 16-bit as well as 32-bit instructions. We use several benchmarks to measure the performance of the code emitter.

2.1 CVM Internals

The virtual machine we use is the Connected Device Configuration HotSpot Implementation (CVM) of the Java VM, which is highly optimized for resource-constrained devices such as consumer electronic products and embedded devices. Portability is one of the most important benefits of the Java system design. CVM includes a dynamic compiler, also called a just-in-time (JIT) compiler. Once a method in the Java program has been used frequently enough, the JIT converts the method’s bytecodes to native code at execution time to improve future performance. This operation has two passes: first, the front end converts Java bytecode to an intermediate representation (IR); second, the back end converts the IR to native code. The architecture is shown in Figure 2.1.

Figure 2.1 Execution of a Java program

2.1.1 JIT Front End

The front end is portable for different execution environments. It converts the bytecode to an intermediate representation (IR). Figure 2.3 is an example of IR.

Figure 2.2 Frontend

2.1.2 JIT Back End

The back end converts IR to native instructions. An IR tree is parsed by a parser. The parser, which is produced by the Java Code Select (JCS) tool at build time, performs pattern matching on tree-based data structures; the patterns are specified as a set of JCS rules. These rules are translated into C source code and initialized data structures. Code generation is done with rule-based pattern matching on trees. When there are multiple possibilities, JCS chooses the rule with the least static cost. Figure 2.5(A) is an example of a JCS rule.

[Figure 2.2 diagram: Method bytecode -> IR Generator (Frontend) -> Method IR]

[Figure 2.3 diagram: the statement x = y + 1000; is compiled to the bytecodes iload y, sipush 1000, iadd, istore x, and translated to the IR tree ASSIGN(LOCAL(X), ADD(LOCAL(Y), CONSTANT(1000)))]

Figure 2.3 An example of IR.

[Figure 2.4 diagram: Method IR -> JCS Parser, Register Manager, Emitter (Backend) -> Method machine code]

Figure 2.4 Backend

Figure 2.5 An example of a JCS rule

The first part of this rule is the result and the second part is a pattern. They are used for pattern matching. For instance, the subtree in Figure 2.5 (B) will be matched by the JCS rule in Figure 2.5 (C). If a subtree can be matched in multiple ways, the rule with the lowest static cost will be selected. The static cost is specified as the third part of a rule. After a match is found, the fourth and the fifth parts of the rule will be used for setting up a register set. This is shown in Figure 2.6(A). First, a bottom-up traversal of the matched tree passes the use register set, shown in Figure 2.6(B). Second, a top-down traversal passes the accept register set, shown in Figure 2.6(C). After these two passes, the register manager knows which registers are provided. Finally, the last part is the semantic actions which will call the code emitter to emit native instructions. It is shown in Figure 2.7(A) and Figure 2.7(B).
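The cost-based choice among competing rules can be illustrated with a small C sketch. The Rule structure and the select_rule function below are hypothetical simplifications introduced only for illustration; they are not the data structures that JCS actually generates. The sketch keeps just three of the rule parts described above (result, pattern, static cost) and returns the cheapest matching rule.

```c
#include <stddef.h>

/* Hypothetical, simplified view of a JCS rule. Only the fields needed to
 * illustrate cost-based selection are kept; this is not CVM's real layout. */
typedef struct {
    int result;       /* part 1: nonterminal produced by the rule */
    int pattern;      /* part 2: identifier of the tree pattern it matches */
    int static_cost;  /* part 3: static cost of using the rule */
} Rule;

/* Return the index of the cheapest rule whose pattern equals `pattern`,
 * or -1 if no rule matches. Ties are resolved in favor of the earliest rule. */
int select_rule(const Rule *rules, size_t n, int pattern)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (rules[i].pattern != pattern)
            continue;
        if (best < 0 || rules[i].static_cost < rules[best].static_cost)
            best = (int)i;
    }
    return best;
}
```

In this simplified model, a subtree that can be matched by two rules with costs 3 and 1 is translated with the second rule.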

Figure 2.6 Set Register Set

2.2 ANDES Instruction Set Architectures

In our system, we use the ANDES instruction set, which is a RISC-style register-based instruction set. In the Andes ISA, we may freely mix 16-bit and 32-bit instructions without the need for mode-switching instructions. The 16-bit ISA closely mirrors the 32-bit ISA, but there is an alignment restriction: when a 32-bit instruction is written to the code buffer, the address of the memory cell in the code buffer that will hold the instruction must be word-aligned. Otherwise, the 32-bit instruction must be broken into two 16-bit halves, and each half is written to the code buffer separately. A word-alignment exception will be raised if we attempt to write a 32-bit instruction at a half-word-aligned address with a single 32-bit store.

Figure 2.7 JCS rule calls emitter to emit native code

2.2.1 General Purpose Register

Andes 32-bit instructions can access thirty-two 32-bit general-purpose registers (GPRs). The register index of a 16-bit instruction can be 5 bits, 4 bits, or 3 bits wide, depending on the instruction format. A 3-bit or 4-bit index can access only a subset of the GPRs. The 3-bit and 4-bit register indices are mapped to real registers according to Table 2.1.

2.2.2 The Andes Instruction Set

In this section, we introduce the part of the Andes instruction set that is related to our research. In Andes, the memory address accessed by a 32-bit memory instruction has to be word-aligned. Otherwise, a Data Alignment Check exception will be generated. Tables 2.2 through 2.8 list examples of 32-bit instructions and the 16-bit instructions they map to.

Table 2.1 Andes General Purpose Registers

Register  32/16-bit (5 bits)  16-bit (4 bits)  16-bit (3 bits)  Comments
R0        A0                  H0               O0
R15       Ta                  -                -                Temporary register for assembler; implied register for slt(s|i)45, b[eq|ne]zs8
R26       P0                  -                -                Reserved for Privileged-mode use.
R27       P1                  -                -                Reserved for Privileged-mode use.
R28       S9/Fp               -                -                Frame pointer / Saved by callee
R29       Gp                  -                -                Global pointer
R30       Lp                  -                -                Link pointer
R31       Sp                  -                -                Stack pointer

Table 2.2 Add/Sub Instruction

32-bit instruction  16-bit instruction  Special case
ADD                 ADD333

Table 2.3 Move Instruction

32-bit instruction  16-bit instruction  Special case
MOVI                MOVI55
ADDI/ORI            MOV55               ADDI R# R# 0

Table 2.4 Shift Instruction

32-bit instruction  16-bit instruction  Special case
SRAI                SRAI45
SRLI                SRLI45
SLLI                SLLI333

Table 2.5 Bit Field Mask Instruction

32-bit instruction  16-bit instruction  Special case
ZEB                 ZEB333

Table 2.6 Branch and Jump Instruction

32-bit instruction  16-bit instruction  Special case
BEQ                 BEQS38              Branch on Equal, implied R5
BNE                 BNES38              Branch on Not Equal, implied R5
BEQZ                BEQZ38
BNEZ                BNEZ38
J                   J8
JR                  JR5
JRAL                JRAL5

Table 2.7 Load/Store Instruction

32-bit instruction  16-bit instruction  Special case
LWI                 LWI450
LWI                 LWI333
LWI                 LWI37               Load Word with Implied FP
LWI.bi              LWI333.bi
LHI                 LHI333
LBI                 LBI333
SWI                 SWI450
SWI                 SWI333
SWI                 SWI37               Store Word with Implied FP
SWI.bi              SWI333.bi
SHI                 SHI333
SBI                 SBI333

Table 2.8 Compare and Branch Instruction

32-bit instruction  16-bit instruction  Special case
SLTI                SLTI45
SLTSI               SLTSI45
SLT                 SLT45
SLTS                SLTS45
BEQZ                BEQZS8              Branch on Equal Zero, implied R15
BNEZ                BNEZS8              Branch on Not Equal Zero, implied R15

Chapter 3

The Multiple Fixed-width ISA Emitter

The Multiple Fixed-width ISA Emitter can emit 32-bit and 16-bit instructions in any desired mixture. The register manager assigns registers to a particular instruction, and the emitter then determines whether a 16-bit instruction can be used. If not, a 32-bit instruction is generated instead. When a 16-bit instruction is generated, the register numbers must be converted according to Table 2.1. Because the memory address accessed by a 32-bit memory instruction must be word-aligned in Andes, when the emitter writes a 32-bit instruction to the code buffer, it breaks that instruction into two 16-bit half-words and writes the two half-words separately in order to avoid a Data Alignment Check exception. It is essential for the register manager to choose an appropriate register if the emitter is to generate 16-bit instructions. The JIT writer can set up four register sets (CVMCPU_PHI_REG_SET, CVMCPU_BUSY_SET, CVMCPU_NON_VOLATILE_SET, and CVMCPU_VOLATILE_SET) from which the register manager chooses registers.

We may tune the four register sets to emit as many 16-bit instructions as possible. For certain patch points, we must make sure that the patched instruction has the same size as the original instruction.

3.1 Multiple Fixed-width ISA Emitter Introduction

When a JCS rule selects an instruction, the emitter is called to emit the instruction to the code buffer. The Multiple Fixed-width ISA Emitter adds a test (“16-bitable” in Figure 3.1(b)) to determine whether the emitter can emit a 16-bit instruction. If so, it translates the 32-bit instruction to the corresponding 16-bit instruction.

Figure 3.1 (a) Original emitter. (b) Adding the “16-bitable” test.

3.1.1 Determine Instruction

In the Andes ISA, there are six formats for 16-bit instructions: the 333-form, 45-form, 37-form, 38-form, 8-form, and 55-form. (The 333-form and 45-form are the two most common.) Some 32-bit instructions have no 16-bit counterpart at all. The emitter first needs to determine whether a 16-bit instruction can be issued. Figure 3.2 (a) is the flow chart for testing the 333-form and Figure 3.3 (a) is the flow chart for testing the 45-form. For example, in Figure 3.2 (b), an addi instruction has registers R0 and R1 and the immediate value imm. R0 and R1 fall in the range of registers allowed in an addi333 instruction (R0 through R7). Furthermore, if the immediate value is no more than 7 (binary 111), this instruction will be translated into a 16-bit instruction in the 333-form.

Figure 3.2 (a) Flow chart for testing the 333-form: the form applies when ((R0 | R1 | imm) >> 3) is zero. (b) An example of Addi333: Addi R0 R1 imm is translated to Addi333 R0 R1 imm.
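The 333-form test of Figure 3.2 can be written as a single expression. The following C sketch is for illustration only; the function name fits_333 is hypothetical and does not come from the CVM sources. It treats the two register numbers and the immediate as unsigned values, so a negative immediate (which has high bits set) is rejected.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when both register numbers and the immediate fit in 3 bits each,
 * i.e. all three values lie in the range 0..7. OR-ing the values first
 * lets one shift check all three fields at once. */
bool fits_333(uint32_t rt, uint32_t ra, uint32_t imm)
{
    return ((rt | ra | imm) >> 3) == 0;
}
```

For instance, fits_333(0, 1, 7) holds, while an immediate of 8 or a register number of 8 makes the test fail, and the emitter would then try another form.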

When the immediate value is larger than 7, the emitter will try other forms, say the 45-form (4 bits for specifying a register and 5 bits for specifying the immediate value). Figure 3.3 (a) shows the flow chart for testing whether the 45-form can be used. There are other forms for 16-bit instructions, and the emitter will try each form in turn. When no 16-bit form is applicable, a 32-bit instruction will be issued instead.

3.1.2 Translating Registers

A register may be encoded in 3, 4, or 5 bits according to the selected instruction formats. The encoding is shown in Table 3.1. For example, R17 is encoded as 10001 (T1) in 5 bits and as 1101 (H13) in 4 bits.

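Figure 3.3 (a) Flow chart for testing the 45-form: the two register operands are the same (R16 == R16), the immediate fits in 5 bits ((31 >> 5) == 0), and the register is reachable through a 4-bit index (for example, 15 < R16 < 20). (b) An example of ADDI45: Addi R16 R16 31 is translated to Addi45 H12 H12 31.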

When the emitter wants to emit a 16-bit instruction, it tests whether the registers assigned by the register manager can be used in a 16-bit instruction. For example, the 333-form is restricted to registers R0 through R7, while the 4-bit field of the 45-form can address only registers R0-R11 and R16-R19. (There is no restriction for the 5-bit field, since 5 bits are enough to address any of the 32 general-purpose registers.) If the assigned register fits in a 16-bit instruction form, the emitter translates the encoding of the register according to Table 3.1. This means that R0-R11 keep their numbers (H0-H11), while R16-R19 are translated into H12-H15. The flow chart for the translation is shown in Figure 3.4 (a). The registers usable in the different modes are shown in Figure 3.5.

Table 3.1 Differences between the two register encodings

Register  32/16-bit (5 bits)  16-bit (4 bits)
R0        A0                  H0
R1        A1                  H1
R2        A2                  H2
R3        A3                  H3
R4        A4                  H4
R5        A5                  H5
R6        S0                  H6
R7        S1                  H7
R8        S2                  H8
R9        S3                  H9
R10       S4                  H10
R11       S5                  H11
R16       T0                  H12
R17       T1                  H13
R18       T2                  H14
R19       T3                  H15

Figure 3.4 (a) Flow chart for translating register encoding: R16 - 4 = H12. (b) An example of register translation: Addi R16 R16 31 is translated to Addi45 H12 H12 31.

Figure 3.5 Register range of 333 mode and 45 mode
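The register translation of Table 3.1 and the 45-form test of Figure 3.3 can be sketched in C as follows. This is an illustrative sketch only: the function names reg_to_4bit and fits_addi45 are hypothetical, and the real emitter may organize these checks differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Map a 5-bit register number to its 4-bit index according to Table 3.1.
 * Returns -1 when the register cannot be addressed with 4 bits. */
int reg_to_4bit(uint32_t reg)
{
    if (reg <= 11)
        return (int)reg;        /* R0-R11  -> H0-H11  */
    if (reg >= 16 && reg <= 19)
        return (int)reg - 4;    /* R16-R19 -> H12-H15 */
    return -1;
}

/* 45-form test for an add-immediate: the destination and source must be
 * the same register, that register must have a 4-bit index, and the
 * immediate must fit in 5 unsigned bits. */
bool fits_addi45(uint32_t rt, uint32_t ra, uint32_t imm)
{
    return rt == ra && (imm >> 5) == 0 && reg_to_4bit(rt) >= 0;
}
```

With these definitions, the example of Figure 3.4 (b) passes the test, and R16 is translated to index 12, that is, H12.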

3.1.3 Instruction Alignment

In Andes, the memory address accessed by a 32-bit memory instruction (load/store) must be word-aligned; that is, the two least significant bits of the address must be 0. Because the code buffer may end on a half-word boundary after a 16-bit instruction has been emitted, a single 32-bit store of the next instruction could violate this restriction. Therefore, when the emitter places a 32-bit instruction into the code buffer, it breaks the instruction into two half-words and writes each half-word into the code buffer separately. This is explained in Figure 3.6.
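A minimal C sketch of this split write is given below. It is not the emitter's actual code: the function name emit32_as_halves is hypothetical, the code buffer is modeled as an array of half-words, and the sketch assumes that the upper half-word of an instruction is stored first, which is an assumption about the memory layout rather than something stated in this thesis.

```c
#include <stddef.h>
#include <stdint.h>

/* Append a 32-bit instruction to the code buffer as two 16-bit stores, so
 * that the write only needs half-word (2-byte) alignment.
 * `pos` is the byte offset of the next free slot and must be even.
 * Assumption: the upper half-word is placed first in memory.
 * Returns the byte offset following the instruction. */
size_t emit32_as_halves(uint16_t *buf, size_t pos, uint32_t insn)
{
    buf[pos / 2]     = (uint16_t)(insn >> 16);      /* first half-word  */
    buf[pos / 2 + 1] = (uint16_t)(insn & 0xFFFFu);  /* second half-word */
    return pos + 4;
}
```

Because each store is only 16 bits wide, the write succeeds whether or not the byte offset is a multiple of 4.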

[Figure 3.6 diagram: a 32-bit instruction is split into two half-words, Ins32-1 and Ins32-2, which are written to the code buffer separately.]

Figure 3.6 Avoiding Data Alignment Check exceptions when writing a 32-bit instruction into the code buffer.

3.2 Register Setting

A JIT writer may adjust the register setting to emit more 16-bit instructions. There are two places in the JIT that can be adjusted: the VM register set and the four code generator register sets.

3.2.1 The VM Register Set

The VM register set contains four special registers: JSP_REG, JFP_REG, CHUNKEND_REG, and CVMCPU_EE_REG. They must be mapped to Andes registers properly. In our emitter, we use register FP for JFP_REG because it can use the special 37-form instructions.

Table 3.2 VM Register Setting

VM Register  Andes Register
JSP_REG      R11

3.2.2 Code Generator Register Set

There are four code generator register sets, defined in the header file jitrisc_cpu.h: CVMCPU_PHI_REG_SET, CVMCPU_BUSY_SET, CVMCPU_NON_VOLATILE_SET, and CVMCPU_VOLATILE_SET. The register manager uses these four sets to set up CVMRM_ANY_REG_SET, CVMRM_SAFE_SET, and CVMRM_UNSAFE_SET. (CVMRM_EMPTY_SET is always an empty set.) When a JCS rule requests a register, the register manager selects a register out of one of these register sets. We wish to distribute the registers that can be used in 16-bit instructions among the four code generator sets so that such a register is available whenever a JCS rule requests one. The best distribution should be determined by extensive benchmarking. The current distribution is shown in Table 3.3.

The register manager sets up the four sets CVMRM_ANY_REG_SET, CVMRM_SAFE_SET, CVMRM_UNSAFE_SET, and CVMRM_EMPTY_SET as follows. CVMRM_EMPTY_SET is always an empty set. CVMRM_ANY_REG_SET includes all registers except those in CVMCPU_BUSY_SET. CVMRM_SAFE_SET includes all the registers that are in both CVMCPU_NON_VOLATILE_SET and CVMRM_ANY_SET; equivalently, it includes all the registers that are in CVMCPU_NON_VOLATILE_SET but not in CVMCPU_BUSY_SET. CVMRM_UNSAFE_SET includes all the registers that are in both CVMCPU_VOLATILE_SET and CVMRM_ANY_SET; equivalently, it includes all the registers that are in CVMCPU_VOLATILE_SET but not in CVMCPU_BUSY_SET. Table 3.4 summarizes this specification of the register manager.

Table 3.3 RISC_CPU Register Setting

RISC_CPU Register Set     Registers
CVMCPU_PHI_REG_SET        S1, S4, S5, S6, S7, S8, GP
CVMCPU_BUSY_SET           TA, P0, P1, FP
CVMCPU_NON_VOLATILE_SET   S0-S8, FP, GP
CVMCPU_VOLATILE_SET       ALL & ~CVMCPU_NON_VOLATILE_SET

Table 3.4 Register Manager register setting

JIT RegMan Register Set   Definition
CVMRM_BUSY_SET            CVMCPU_BUSY_SET | 1U<<CVMCPU_SP_REG | 1U<<CVMCPU_JSP_REG | 1U<<CVMCPU_JFP_REG | CVMRM_CHUNKEND_BUSY_BIT | CVMRM_CVMGLOBALS_BUSY_BIT | CVMRM_EE_BUSY_BIT | CVMRM_CP_BUSY_BIT | CVMRM_GC_BUSY_BIT
CVMRM_ANY_REG_SET         ALL & ~(BUSY_SET)
CVMRM_SAFE_SET            (CVMCPU_NON_VOLATILE_SET & CVMRM_ANY_SET)
CVMRM_UNSAFE_SET          (CVMCPU_VOLATILE_SET & CVMRM_ANY_SET)
CVMRM_EMPTY_SET           Always empty set
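The set definitions of Table 3.4 are plain bit-mask operations, with bit n of a 32-bit word standing for register Rn. The C sketch below restates them; the RmSets structure and the derive_rm_sets function are hypothetical names used only for illustration, and the sketch takes the busy set and the non-volatile set as inputs rather than reproducing the macros of jitrisc_cpu.h.

```c
#include <stdint.h>

/* Hypothetical container for the four register manager sets. */
typedef struct {
    uint32_t any_reg;  /* every register that is not busy         */
    uint32_t safe;     /* non-volatile registers that are not busy */
    uint32_t unsafe;   /* volatile registers that are not busy     */
    uint32_t empty;    /* always the empty set                     */
} RmSets;

/* Derive the register manager sets from a busy set and a non-volatile
 * set, following the formulas of Table 3.4. */
RmSets derive_rm_sets(uint32_t busy, uint32_t non_volatile)
{
    const uint32_t all = 0xFFFFFFFFu;
    uint32_t volatile_set = all & ~non_volatile;
    RmSets s;

    s.any_reg = all & ~busy;
    s.safe    = non_volatile & s.any_reg;
    s.unsafe  = volatile_set & s.any_reg;
    s.empty   = 0;
    return s;
}
```

One consequence of these formulas is that the safe and unsafe sets are disjoint and together make up the any-register set. With the setting of Table 3.3, FP belongs to both the busy set and the non-volatile set, so it appears in neither the safe nor the unsafe set.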

3.3 Instruction Patch and Adjust

When the emitter emits a forward branch or a jump to glue code, the address field in the instruction is patched later. Since the size of the actual offset is not yet known when the instruction is emitted, to be on the safe side we always use 32-bit instructions for forward branches and jumps to glue code.

Furthermore, the instructions for null checks may also need additional patches. This is discussed in Section 3.3.3.

3.3.1 Forward Branch

When the emitter emits a branch instruction with an unknown offset, it always issues a 32-bit instruction. The address field in this instruction is patched later, when the address of the branch target is known. Figure 3.7 shows how a forward branch instruction is patched.
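The patch step can be sketched in C as follows. The instruction layout used here is purely hypothetical: the sketch assumes that the low 16 bits of a 32-bit branch hold a signed offset counted in half-words, whereas the real Andes branch encodings differ from instruction to instruction. The function name patch_forward_branch is likewise an illustrative assumption.

```c
#include <stdint.h>

/* Hypothetical layout, for illustration only: the low 16 bits of a 32-bit
 * branch hold a signed offset counted in half-words. */
#define BRANCH_OFFSET_MASK 0x0000FFFFu

/* Return `insn` with its offset field rewritten so that a branch located
 * at byte address `branch_pc` reaches `target_pc`. All other bits of the
 * instruction (opcode, registers) are preserved. */
uint32_t patch_forward_branch(uint32_t insn, uint32_t branch_pc,
                              uint32_t target_pc)
{
    int32_t byte_offset = (int32_t)target_pc - (int32_t)branch_pc;
    uint32_t field = (uint32_t)(byte_offset / 2) & BRANCH_OFFSET_MASK;

    return (insn & ~BRANCH_OFFSET_MASK) | field;
}
```

Because the placeholder and the patched instruction are both 32 bits wide, patching never changes the size of the code that follows the branch.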

3.3.2 Glue Code

Sometimes the program has to calculate certain special values when it reaches a particular instruction for the first time (e.g., ResolveMethodTableOffsetGlue). The emitter issues a “Jarl .glue” instruction to force the program to jump to the glue code. The special value is calculated in the glue code. At the end of the glue code, the calculated value is written to the word immediately following the “Jarl” instruction, and the “Jarl” instruction is changed to a “J .skip” instruction. Having done that, the program continues execution following the “Jarl” instruction. The glue code is executed only once during program execution, because it would be a waste of time to calculate the same special value more than once; changing the “Jarl” instruction to a “J .skip” instruction prevents the glue code from being executed again. Figure 3.8 shows the execution of glue code. A variation of glue code does not compute a special value; however, it is also executed only once, the first time it is encountered. This variation also needs patching as described above.

Because the existing implementation of glue code was written in assembly language for the 32-bit platform and always patches instructions at word-aligned addresses, whenever a “Jarl” instruction may be patched by glue code, that “Jarl” instruction must be word-aligned. On our target platform (Andes), instructions may be half-word aligned, so a two-byte “nop16” instruction may be inserted before the “Jarl” instruction in order to satisfy the word-alignment requirement. In the future, we plan to rewrite the glue code; the two-byte “nop16” instructions will then become unnecessary. On the other hand, if the “Jarl” instruction will not be patched by the glue code, we can choose either a 16-bit “Jarl” instruction (when the current position is half-word aligned) or a 32-bit one (when it is word-aligned). Note that the four reserved bytes (i.e., “.word ____”) following the “Jarl” instruction must always be word-aligned. The flow chart is shown in Figure 3.9. The list is shown in Table 3.5.
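The alignment decision described above can be summarized in a short C sketch. The JarlPlan structure and the plan_jarl function are hypothetical names introduced for illustration; the sketch only decides how many padding bytes and which instruction size to use, and it assumes that the current code-buffer offset is always a multiple of 2.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type: what to emit at the current position. */
typedef struct {
    size_t pad;        /* bytes of "nop16" padding before the call: 0 or 2 */
    size_t jarl_size;  /* size of the call instruction in bytes: 2 or 4    */
} JarlPlan;

/* Decide padding and instruction size for a call at byte offset `pc`
 * (assumed even). In every case the reserved word that follows the call
 * ends up word-aligned. */
JarlPlan plan_jarl(size_t pc, bool patched_by_glue)
{
    JarlPlan plan = { 0, 4 };
    bool word_aligned = (pc & 3u) == 0;

    if (patched_by_glue) {
        if (!word_aligned)
            plan.pad = 2;        /* insert nop16 so the 32-bit call is word-aligned */
    } else if (!word_aligned) {
        plan.jarl_size = 2;      /* a 16-bit call brings the position back to a word boundary */
    }
    return plan;
}
```

In all four combinations, pc + pad + jarl_size is a multiple of 4, which is the condition required for the reserved four bytes.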

3.3.3 Trap-based Null Checks

Every time the VM accesses an object through a reference, the reference must be checked for null. The JIT implements null checks with traps: when a null-pointer trap occurs, the return address (which is the address of the instruction immediately following the trapping instruction) is saved in the link-pointer register (LP). If the trapping instruction is a 16-bit instruction, the return address is 2 plus the address of the trapping instruction. On the other hand, if the trapping instruction is a 32-bit instruction, the return address is 4 plus the address of the trapping instruction. In Andes, an instruction is 16-bit if and only if the first (leftmost) bit of the instruction is 1. The flow chart is shown in Figure 3.10.
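The return-address rule can be sketched in C as follows. The function names are hypothetical, and the sketch assumes that the caller has already fetched the first half-word of the trapping instruction, with the leftmost instruction bit in bit 15 of that half-word.

```c
#include <stdbool.h>
#include <stdint.h>

/* An instruction is 16-bit if and only if its first (leftmost) bit is 1.
 * `first_halfword` is assumed to hold the first 16 bits of the instruction. */
bool is_16bit_insn(uint16_t first_halfword)
{
    return (first_halfword & 0x8000u) != 0;
}

/* Address of the instruction that follows the trapping instruction:
 * trap address plus 2 for a 16-bit instruction, plus 4 for a 32-bit one. */
uintptr_t trap_return_address(uintptr_t trap_pc, uint16_t first_halfword)
{
    return trap_pc + (is_16bit_insn(first_halfword) ? 2u : 4u);
}
```

For a trap at address 0x1000, the sketch yields 0x1002 when the leftmost bit is set and 0x1004 otherwise.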

[Figure 3.8 diagram: code-buffer listings (Address, Instruction) showing a “Branch endPC” instruction emitted at StartPc, the call CVMJITcbufPushFixup StartPc that adds the patch point, and the patched “Branch offset” instruction at StartPc once EndPc is known.]

Figure 3.8 Patch a forward branch instruction

3.4 Summary

Our emitter issues mixed 16-bit and 32-bit instructions in an attempt to reduce the resulting code size. Because of the alignment requirement in the existing JIT implementation, the emitter has to take care of the alignment of the issued instructions, adding “Nop” instructions when necessary. Because only some of the registers can be used in 16-bit instructions, register allocation must be done carefully in order to generate more 16-bit instructions. We propose a simple heuristic for instruction translation in this thesis. In the next chapter, we will use benchmarks to verify the ...
