Example explaining the predictive instruction address

Chapter 2 Design of Proposed Architecture

2.3 Design of Processor

2.3.2 Example explaining the predictive instruction address

We now walk through some examples to explain the mechanism more clearly. First, consider the timing diagram of initialization.

Figure 2.9 Pipeline diagram at initialization, with 1 cycle of circuit delay and 2 cycles of wakeup time

In Fig. 2.9, the left blocks represent the instruction address being processed. The middle blocks represent the pipeline stages. The right blocks represent the addresses held in PCL_FIFO, with the bottom of the FIFO on the left and the top of the FIFO on the right.

In the labels S1, S2, … and SX, the ‘S’ denotes a sequential instruction. In W1 and W2, the ‘W’ denotes wakeup time; thus “W1” means that the cache line referred to by the address has been preactivated for 1 cycle. “C1” denotes one cycle of circuit delay.

The circuit delay is the latency of the extra mechanism on the instruction cache side; it is explained in detail in the next chapter.

The walkthrough proceeds as follows:

1. First cycle (first row) of Fig. 2.9. “S1” is transmitted from the CPU side through the Instruction Address Bus to the bottom of “PCL_FIFO”. The top of “PCL_FIFO” is NULL, so the pipeline is idle.

2. Second cycle (second row) of Fig. 2.9. “S2” is transmitted from the CPU side through the Instruction Address Bus to the bottom of “PCL_FIFO”, so “S1” moves up one level to “W2”; that is, the cache line referred to by “S1” has been preactivated for one clock. The top of “PCL_FIFO” is still NULL, so the pipeline is idle.

3. Third cycle (third row) of Fig. 2.9. Similar to step 2.

4. Fourth cycle (fourth row) of Fig. 2.9. “S4” is transmitted from the CPU side through the Instruction Address Bus to the bottom of “PCL_FIFO”. At this moment, “S1” reaches the top of “PCL_FIFO”; that is, the cache line referred to by “S1” has been preactivated for 3 clocks and is now active. The top of “PCL_FIFO” is no longer NULL, so the pipeline is not idle and “S1” is fetched into the “IF” stage.

This example shows that initialization incurs three cycles of latency.
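
The initialization steps can be reproduced with a small software model. The sketch below is only an illustration, not the hardware design: it assumes that PCL_FIFO behaves as a shift register with one slot per cycle of circuit delay and wakeup time plus one top slot, and the names `DEPTH` and `simulate_init` are hypothetical.

```python
from collections import deque

CIRCUIT_DELAY = 1  # cycles, as in Fig. 2.9
WAKEUP_TIME = 2    # cycles, as in Fig. 2.9
# Assumption: one FIFO slot per cycle of latency, plus the top slot.
DEPTH = CIRCUIT_DELAY + WAKEUP_TIME + 1


def simulate_init(addresses):
    """Return (cycle, fetched address or None) for each cycle."""
    # Index 0 is the bottom of the FIFO, index -1 is the top.
    fifo = deque([None] * DEPTH, maxlen=DEPTH)
    trace = []
    for cycle, addr in enumerate(addresses, start=1):
        # A new address enters at the bottom; every entry moves up one
        # level and the old top entry leaves the FIFO.
        fifo.appendleft(addr)
        # The pipeline is idle while the top of the FIFO is NULL (None).
        trace.append((cycle, fifo[-1]))
    return trace


trace = simulate_init(["S1", "S2", "S3", "S4", "S5"])
for cycle, fetched in trace:
    print(cycle, "idle" if fetched is None else f"IF <- {fetched}")
```

In this model the first three cycles are idle and “S1” enters “IF” in the fourth cycle, which matches the three cycles of latency described above.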

Next, we introduce the timing of a predicted target address (see Fig. 2.10).

Figure 2.10 Timing diagram of predictive target, with 1 cycle of circuit delay and 2 cycles of wakeup time

In Fig. 2.10, the ‘B’ in B2 denotes a branch instruction, and the ‘T’ in T2 denotes a target instruction.

The walkthrough starts from the fourth cycle:

1. Fourth cycle (fourth row) of Fig. 2.10. “B2” is transmitted from the CPU side through the Instruction Address Bus to the bottom of “PCL_FIFO”. Assume that the top of “PCL_FIFO” is not NULL, so the pipeline is not idle.

2. Fifth cycle (fifth row) of Fig. 2.10. Because “B2” is encountered, the “Dynamic Branch Predictor” predicts the next address as “T2”. “T2” is therefore transmitted from the CPU side through the Instruction Address Bus to the bottom of “PCL_FIFO”, and “B2” moves up one level to “W2”. Assume again that the top of “PCL_FIFO” is not NULL, so the pipeline is not idle.

3. Assume that no branch instruction is encountered in the sixth and seventh cycles. “PCL_FIFO” therefore simply pops one element per clock.

4. Eighth cycle (eighth row) of Fig. 2.10. At this moment, “T2” reaches the top of “PCL_FIFO”; that is, the cache line referred to by “T2” has been preactivated for 3 clocks and is now active. The pipeline is therefore not idle, and “T2” is fetched into the “IF” stage.

This example shows that no performance loss occurs when the prediction of the “Dynamic Branch Predictor” is correct.
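
The correct-prediction case can be sketched in the same spirit. This is a simplified model under stated assumptions: PCL_FIFO is treated as a four-slot shift register (1 circuit delay, 2 wakeup cycles, and the top slot), the entries “P0” to “P3” already in the FIFO and the addresses “X1” to “X3” that follow “T2” are hypothetical placeholders, and `run_stream` is an illustrative name.

```python
from collections import deque

DEPTH = 4  # assumed: 1 circuit delay + 2 wakeup cycles + the top slot


def run_stream(prefill, stream, start_cycle):
    """Return {cycle: address fetched into IF, or None if idle}."""
    # prefill lists the FIFO contents from bottom (index 0) to top (index -1).
    fifo = deque(prefill, maxlen=DEPTH)
    fetched = {}
    for cycle, addr in enumerate(stream, start=start_cycle):
        fifo.appendleft(addr)      # new address enters at the bottom
        fetched[cycle] = fifo[-1]  # the top entry goes to the IF stage
    return fetched


# Hypothetical earlier addresses, so that the top of the FIFO is not NULL.
prefill = ["P3", "P2", "P1", "P0"]
# The predictor supplies T2 in the cycle right after B2 is sent.
fetched = run_stream(prefill, ["B2", "T2", "X1", "X2", "X3"], start_cycle=4)
for cycle in sorted(fetched):
    print(cycle, fetched[cycle])
```

In this model “B2” is fetched in the seventh cycle and “T2” in the eighth, with no idle cycle in between, because the predicted target entered the FIFO immediately after the branch.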

Next, we introduce the timing of a wrong prediction (see Fig. 2.11).

Figure 2.11 Timing diagram of wrong prediction

The walkthrough starts from the first cycle:

1. First cycle (first row) of Fig. 2.11. Because “B1” is encountered, the “Dynamic Branch Predictor” predicts the next address as “S”. “S” is therefore transmitted from the CPU side through the Instruction Address Bus to the bottom of “PCL_FIFO”. Assume that the top of “PCL_FIFO” is not NULL, so the pipeline is not idle.

2. Assume that no branch instruction is encountered in the second and third cycles. “PCL_FIFO” therefore simply pops one element per clock.

3. Fourth cycle (fourth row) of Fig. 2.11. At this moment, “S” reaches the top of “PCL_FIFO”; that is, the cache line referred to by “S” has been preactivated for 3 clocks and is now active, so “S” is fetched into the “IF” stage. When “B1” is resolved in the “ID” stage, the calculated address differs from the address of “S”, so the “IF” stage is flushed. This penalty already exists in the original architecture. In addition, the whole “PCL_FIFO” is cleared whenever a “Wrong Prediction” occurs.

4. Fifth cycle (fifth row) of Fig. 2.11. “T1” is transmitted from the CPU side to the I-Cache side through the Instruction Address Bus, and it is placed at the bottom of “PCL_FIFO” when the I-Cache receives the address from the Instruction Address Bus. The top of “PCL_FIFO” is NULL, so the pipeline is idle.

5. The sixth and seventh cycles are similar. Assume that no branch instruction is encountered, so “PCL_FIFO” simply pops one element per clock. The top of “PCL_FIFO” is still NULL, so the pipeline is idle.

6. Eighth cycle (eighth row) of Fig. 2.11. At this moment, “T1” reaches the top of “PCL_FIFO”; that is, the cache line referred to by “T1” has been preactivated for 3 clocks and is now active. The pipeline is therefore not idle, and “T1” is fetched into the “IF” stage.

This example shows that performance is lost when the prediction of the “Dynamic Branch Predictor” is wrong: there is one “Wrong Prediction” penalty, plus additional latency determined by the wakeup time and the circuit delay.
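
The wrong-prediction case can also be traced with a short model. The sketch rests on several assumptions: PCL_FIFO is a four-slot shift register, “B1” is already at the bottom of the FIFO before the first cycle, the entries “P0” to “P2” and the follow-on addresses such as “S+1” and “T1+1” are hypothetical placeholders, and the misprediction is detected in the fourth cycle as in the walkthrough. `simulate_mispredict` is an illustrative name.

```python
from collections import deque

DEPTH = 4  # assumed: 1 circuit delay + 2 wakeup cycles + the top slot


def simulate_mispredict(prefill, issued, flush_cycle, last_cycle):
    """Return (cycle, address at the top of the FIFO, status) per cycle."""
    fifo = deque(prefill, maxlen=DEPTH)  # index 0 = bottom, index -1 = top
    trace = []
    for cycle in range(1, last_cycle + 1):
        fifo.appendleft(issued.get(cycle))  # new address enters at the bottom
        top = fifo[-1]
        if cycle == flush_cycle:
            # The branch resolves in ID and disagrees with the prediction:
            # the fetched entry is squashed and the whole FIFO is cleared.
            trace.append((cycle, top, "squashed"))
            fifo = deque([None] * DEPTH, maxlen=DEPTH)
        elif top is None:
            trace.append((cycle, None, "idle"))
        else:
            trace.append((cycle, top, "fetched"))
    return trace


prefill = ["B1", "P2", "P1", "P0"]  # hypothetical contents before cycle 1
issued = {1: "S", 2: "S+1", 3: "S+2", 4: "S+3",
          5: "T1", 6: "T1+1", 7: "T1+2", 8: "T1+3"}
trace = simulate_mispredict(prefill, issued, flush_cycle=4, last_cycle=8)
for row in trace:
    print(row)
```

Under these assumptions the model shows one squashed fetch in the fourth cycle, three idle cycles (fifth to seventh) while “T1” moves through the circuit-delay and wakeup slots, and “T1” entering “IF” in the eighth cycle.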

From the document: Power Management of Instruction Cache (Program-Flow-Aware Drowsy Instruction Memory) (pages 33-37)