
CHAPTER 3. THE DESIGN OF PREDECODE LOOP BUFFER

3.2 Predecode Loop Buffer

3.2.4 Procedure of Predecode Loop Buffer

The preceding sections presented every component of the Predecode Loop Buffer. This section describes its operating procedure, which consists of four steps:

Step 1: While the Execution Count of the branch instruction is smaller than the threshold, the Mode Controller does not change the value in the mode registers.

Step 2: When the Execution Count of the branch instruction equals the threshold, the Mode Controller sets the S-reg and the Frequent Flag to one. The control signals of the next executed instruction are then written into the Predecode Loop Buffer.

Step 3: When the Frequent Flag of the branch instruction is one, the next instruction already exists in the Predecode Loop Buffer. The Mode Controller therefore sets the F-reg to one, and the next instruction is accessed from the Predecode Loop Buffer.

Step 4: When a miss occurs, the next instruction must be fetched from Instruction Memory. The next section presents the details of the miss recovery mechanism of the Predecode Loop Buffer.
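The four steps can be read as a small decision function. The following is a minimal sketch under stated assumptions, not the actual controller logic: the names `Mode` and `select_mode` are hypothetical, the function takes the already updated Execution Count as input, and a miss is checked first because it can only occur while instructions are being read from the buffer.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"  # Step 1: mode registers are left unchanged
    STORE = "store"    # Step 2: S-reg = 1, control signals are written to the buffer
    FETCH = "fetch"    # Step 3: F-reg = 1, instructions are read from the buffer
    MISS = "miss"      # Step 4: next instruction comes from Instruction Memory


def select_mode(exec_count, frequent_flag, threshold, buffer_miss=False):
    """Map the state seen by the Mode Controller to one of the four steps."""
    if buffer_miss:
        return Mode.MISS
    if frequent_flag == 1:
        return Mode.FETCH
    if exec_count == threshold:
        return Mode.STORE
    return Mode.NORMAL
```

For example, with a hypothetical threshold of 4, `select_mode(1, 0, 4)` corresponds to Step 1 and `select_mode(4, 0, 4)` to Step 2.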

[Fig 3-16: a frequent loop in a program; of the instruction listing, only ADD R3,R2,R1 and ADD R6,R4,R5 are recoverable]

Fig 3-16 shows a frequent loop in a program. When the BEQ instruction is decoded for the first time, no information about it is available in the BIT, so all fields read for the BEQ are zero. As soon as the decode stage completes, this information is passed to the EX1 stage. No branch prediction mechanism is implemented in our processor, which reduces the design overhead considerably, so the Mode Controller only needs to wait for the result of the BEQ, that is, whether it is taken or not. If the BEQ is taken, the Mode Controller increments the Execution Counter by one and decides which mode should be written into the mode registers. Otherwise, the Mode Controller skips the increment of the Execution Counter. Assume the BEQ is taken. Because this is the first execution of the BEQ, the mode is not changed. At the same time, the SUB instruction is flushed by the branch handling mechanism. These actions are repeated each time the BEQ is executed until the Execution Counter reaches the threshold. When the Execution Counter equals the threshold, the Mode Controller sets the S-reg to one in the EX1 stage, so the control signals of the next instruction are written into the Predecode Loop Buffer. In addition to the control signals of the ADD instruction, some of the control signals of the Instruction Prefetch and Instruction Dispatch stages also need to be stored.
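The counter behaviour described above can be traced with a short sketch. It is illustrative only and rests on assumptions: the BIT is modelled as a dictionary keyed by the branch PC, `BitEntry` and `resolve_branch` are hypothetical names, the PC values and the threshold of 3 are arbitrary, and only the phase up to reaching the threshold is covered.

```python
from dataclasses import dataclass


@dataclass
class BitEntry:
    exec_count: int = 0     # Execution Counter
    frequent_flag: int = 0  # Frequent Flag


def resolve_branch(bit, pc, taken, threshold):
    """Update the BIT entry of the branch at `pc` and return the S-reg value."""
    # On the first decode there is no entry yet, so all fields are zero.
    entry = bit.setdefault(pc, BitEntry())
    if not taken:
        return 0  # the increment of the Execution Counter is skipped
    entry.exec_count += 1
    if entry.exec_count == threshold:
        entry.frequent_flag = 1
        return 1  # S-reg = 1: start storing control signals
    return 0      # mode is not changed


bit = {}
s_reg_trace = [resolve_branch(bit, pc=0x40, taken=True, threshold=3) for _ in range(3)]
```

In this trace the S-reg value stays at zero for the first two taken executions of the branch and becomes one on the third, when the counter reaches the assumed threshold.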

Storing these signals raises a problem at this point. Because instructions are stored in compressed form, an instruction in our VLIW processor may be dispatched twice. The dispatch cases are shown in Fig 3-17.

[Fig 3-17 The Cases of the Instruction: (a) a VLIW instruction (Instruction 1, Instruction 2) dispatched twice, as (Instruction 1, NOP) and (NOP, Instruction 2); (b) a VLIW instruction dispatched once, as (Instruction 1, Instruction 2)]

[Fig 3-18 The Hardware Solves the Instruction with Same Address: block diagram with the labels Instruction 1 and Instruction 2 (each with its p-bit), change, Index, Pipeline Latch, Dir and Taken]

The last bit of an instruction is called the p-bit. If Instruction 1 and Instruction 2 can be executed in parallel, the p-bit of Instruction 1 is one; if it is zero, the two instructions cannot be executed in parallel. In Fig 3-17 (b), storing the control signals causes no trouble because the two instructions can be executed in parallel. In Fig 3-17 (a), storing the control signals causes a problem because the two instructions cannot be executed in parallel: the VLIW instruction must be dispatched twice, and both halves have the same PC value. Thus, when this instruction is fetched from the Predecode Loop Buffer, we cannot decide which half is needed.
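The two dispatch cases of Fig 3-17 can be sketched as follows. This is a hypothetical illustration: the function name `dispatch`, the string `"NOP"` and the assumption that the left half is issued before the right half are not taken from the design itself.

```python
NOP = "NOP"


def dispatch(instr1, instr2, p_bit):
    """Return the (left, right) issue slots produced for one VLIW instruction."""
    if p_bit == 1:
        # Fig 3-17 (b): both instructions are issued together, dispatched once.
        return [(instr1, instr2)]
    # Fig 3-17 (a): dispatched twice; both halves share the same PC value.
    return [(instr1, NOP), (NOP, instr2)]
```

In the second case the two returned entries come from the same address, which is why the PC value alone cannot identify the half that is stored in the buffer.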

To solve this problem, we add an XOR gate, an OR gate and an Index register. Fig 3-18 shows the hardware that handles instructions with the same address. First, the p-bits determine whether a complete VLIW instruction or only half of it passes through the Dispatch stage. The p-bits of the two instructions are connected to the OR gate, whose output is called the change bit. The Index register stores the direction of the VLIW instruction in the EX1 stage. The change bit and the output of the Index register are then connected to the XOR gate. Finally, the output of the XOR gate is the index of the next VLIW instruction passing through the Dispatch stage. This output is computed in the Decode stage and written into the Index register in the EX1 stage. If the value of the Index register is 1, the direction of the instruction is left; otherwise, it is right. Table 3-2 shows the relationship between the p-bits and the change bit. Table 3-3 shows the relationship between the change bit and the Index.

p-bit of Instruction 1    p-bit of Instruction 2    Change
0                         0                         0
0                         1                         1
1                         0                         1
1                         1                         Not Used

Table 3-2 The Relationship between P-Bit and Change Bit

Change    Index of this instruction    Index of next instruction
0         0                            1
0         1                            0
1         0                            0
1         1                            1

Table 3-3 The Relationship between Change Bit and Index
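Tables 3-2 and 3-3 can be checked with a few lines of code. The sketch below is an illustration with hypothetical function names and follows the table rows as printed. Read row by row, Table 3-3 toggles the Index when Change is 0 and holds it when Change is 1, which is the complement of a plain XOR of the two inputs; whether the hardware inverts one signal around the XOR gate is an assumption here.

```python
def change_bit(p_bit_1, p_bit_2):
    """Table 3-2: OR of the two p-bits; the combination (1, 1) is not used."""
    if p_bit_1 == 1 and p_bit_2 == 1:
        raise ValueError("p-bit combination (1, 1) is not used")
    return p_bit_1 | p_bit_2


def next_index(change, index):
    """Table 3-3: the Index toggles when Change is 0 and holds when Change is 1."""
    return 1 - (change ^ index)
```

With Change equal to 0, repeated calls alternate the Index between the two halves of a VLIW instruction that is dispatched twice; with Change equal to 1, the Index is carried over unchanged.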

Once the problem of writing data to the Predecode Loop Buffer is solved, every instruction can be written to and read from the correct place. On subsequent executions of the BEQ, the Mode Controller receives its data from the BIT. Because the Frequent Flag is one, the Mode Controller sets the F-reg to one in the EX1 stage. The F-reg has a timing constraint, which is the same as that of the PC register, because it may be written by different stages. When the ADD instruction is executed again after the F-reg has been set to one, it can be fetched from the Predecode Loop Buffer in the Instruction Dispatch stage. The Instruction Prefetch and Dispatch stages can also be bypassed. The bypass mechanism can likewise be implemented with a DeMux and Merge pair.

When the S-reg is set to one, the Predecode Loop Buffer does not yet hold the information we need. When the F-reg is set to one, the Predecode Loop Buffer already holds the data we need. Finally, three cases reset the F-reg to zero. First, a miss in the Predecode Loop Buffer means that the information of the loop instruction we need has been lost. Second, the branch instruction is a hot branch but is not taken this time. Third, another branch instruction changes the mode.
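The three reset conditions can be condensed into a single predicate. This is a minimal sketch with hypothetical names; how each condition is detected in hardware is not modelled.

```python
def next_f_reg(f_reg, buffer_miss, hot_branch_untaken, other_branch_changes_mode):
    """Return the next F-reg value given the three reset conditions."""
    if buffer_miss or hot_branch_untaken or other_branch_changes_mode:
        return 0  # any one of the three cases clears the F-reg
    return f_reg  # otherwise the F-reg keeps its current value
```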

Source document: Design of a Predecode Loop Buffer for an Asynchronous Two-Way VLIW Processor (pages 50-56)