Chapter 2 Design of Proposed Architecture

2.2 Key Ideas in Design

In a conventional architecture, the BTB updates the program counter in the CPU when a branch predicted as taken is found. In the proposed design, however, the BTB resides inside iAIM and cannot update the CPU's program counter. It is therefore necessary to add at least one control signal line that iAIM uses to inform the CPU of its branch prediction. Similarly, when the branch prediction of the BTB in iAIM is wrong, or when a situation such as a procedure return occurs, the CPU needs at least one control signal line to inform iAIM of the actual branch result and to provide the correct PC value, so that iAIM can supply the correct instruction to the CPU and perform BTB maintenance. Figure 2.1 and Figure 2.2 show block diagrams of the conventional architecture and the iAIM design, respectively.

Figure 2.1 Block diagram of conventional architecture

Figure 2.2 Block diagram of iAIM design

Key ideas to implement iAIM design are discussed as follows.

Firstly (Idea 1), iAIM must have an automatic instruction address generation mechanism.

Because the philosophy of iAIM is to reduce instruction address traffic between the CPU and instruction memory to a minimum, iAIM always tries to generate the address of the next instruction to fetch by itself.

With the help of the BTB inside iAIM, the branch target address is supplied by the BTB when a branch entry is found in the BTB and its prediction is taken; otherwise, the currently used program counter value plus one word size is used in the next clock cycle.

Therefore, a PC incrementer that adds one word size to the current PC value is necessary. Figure 2.3 shows the automatic instruction address generator inside iAIM.

Figure 2.3 Automatic instruction address generator inside iAIM
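The selection between the BTB target and the incremented PC described in Idea 1 can be sketched in software as follows. This is only an illustrative behavioral model, not the hardware design itself: the function name `next_fetch_address`, the dictionary-based BTB, and the 4-byte word size (32-bit MIPS instructions) are assumptions made for this example.

```python
WORD_SIZE = 4            # assumption: 32-bit instructions, byte-addressed memory
ADDR_MASK = 0xFFFFFFFF   # 32-bit instruction addresses


def next_fetch_address(pc, btb):
    """Return (next_pc, predict_taken) for the instruction fetched at `pc`.

    `btb` is a hypothetical model of the BTB: a dict that maps a branch
    instruction address to a (target_address, predicted_taken) pair.
    """
    entry = btb.get(pc)
    if entry is not None:
        target, predicted_taken = entry
        if predicted_taken:
            # Branch entry found and predicted taken: the BTB supplies the target.
            return target & ADDR_MASK, True
    # No entry, or predicted not taken: use the PC incrementer.
    return (pc + WORD_SIZE) & ADDR_MASK, False
```

For example, with a BTB that holds a taken entry for address 0x100 pointing to 0x200, `next_fetch_address(0x100, btb)` selects 0x200, while any address without a taken entry simply advances by one word.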

Secondly (Idea 2), iAIM needs to inform the CPU of the branch prediction result.

In the MIPS pipeline, when a branch entry is found in the BTB of iAIM and its prediction is taken, iAIM needs to assert a signal in the next clock cycle to inform the CPU that the instruction address in use has already been replaced by the branch target address rather than the next sequential PC. At the end of that clock cycle, the CPU resolves the branch and knows whether the prediction is correct. If the prediction is not correct, the CPU needs to take action to force iAIM to use the correct instruction address.

The proposed signal that iAIM uses to inform the CPU is a single control line called “Predict Taken” (“P-Taken” control line for brevity hereafter). When a branch entry is found in the current clock cycle and its prediction is taken, this signal is set to 1 in the next clock cycle; otherwise, it is set to 0. Figure 2.4 shows the control line that iAIM uses to inform the CPU.

Figure 2.4 Control line that iAIM uses to inform CPU

Thirdly (Idea 3), the CPU needs to force iAIM to use the correct instruction address when iAIM’s branch prediction is wrong.

In the MIPS pipeline, when a branch instruction is resolved at the ID stage in the CPU, the CPU checks whether iAIM has set the “P-Taken” line to 1 in the current clock cycle and compares this with the actual branch result. If the prediction is wrong, in the next clock cycle the CPU places the correct instruction address on the instruction address bus and informs iAIM of the “Wrong Prediction” situation, indicating that the branch prediction made 2 clock cycles earlier was wrong and that the instruction address on the instruction address bus should be used.
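The check performed by the CPU in Idea 3 can be illustrated with a small behavioral sketch. The function name `check_prediction` and its parameters are hypothetical, and the fall-through address is assumed to be the branch address plus one 4-byte word, which ignores any branch delay slot; the text does not specify how the CPU derives the corrected address.

```python
WORD_SIZE = 4            # assumption: 32-bit instructions, byte-addressed memory
ADDR_MASK = 0xFFFFFFFF   # 32-bit instruction addresses


def check_prediction(p_taken, actually_taken, branch_pc, branch_target):
    """Return (wrong_prediction, corrected_address).

    `p_taken` models the P-Taken line sampled while the branch is resolved
    in the ID stage. `corrected_address` is None when the prediction was right.
    """
    if bool(p_taken) == bool(actually_taken):
        return False, None
    if actually_taken:
        # Predicted not taken but the branch is taken: fetch from the target.
        corrected = branch_target & ADDR_MASK
    else:
        # Predicted taken but the branch falls through (assumption: no delay slot).
        corrected = (branch_pc + WORD_SIZE) & ADDR_MASK
    return True, corrected
```

When the first element of the result is true, the CPU would drive the corrected address on the instruction address bus and signal the “Wrong Prediction” situation in the following clock cycle.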

Fourthly (Idea 4), the CPU always forces iAIM to use the correct instruction address after a changing-target branch is resolved.

After a changing-target branch (a branch whose target address can differ between executions, such as a procedure return) is resolved, the CPU places the correct instruction address on the instruction address bus and informs iAIM of the “Compulsory” situation, indicating that iAIM should use the instruction address on the instruction address bus in the current clock cycle.

Fifthly (Idea 5), the CPU needs to inform iAIM of a pipeline stall.

When a pipeline stall happens in the conventional architecture, the same instruction address as the one used in the previous clock cycle is sent to instruction memory. The CPU needs to inform iAIM of the “Pipeline Stall” situation because iAIM has its own automatic instruction address generation mechanism, which must stop advancing while the pipeline is stalled.

Sixthly (Idea 6), the CPU needs to tell iAIM when it may use its own generated address. Idea 3, Idea 4 and Idea 5 deal with the situations in which iAIM cannot use the instruction address generated by its automatic instruction address generation mechanism. In all other cases, the CPU informs iAIM of the “Autonomous” situation, indicating that iAIM should use the instruction address generated by that mechanism. This situation also supports BTB maintenance when the CPU finds that the branch prediction in iAIM is correct.

To summarize Idea 3 to Idea 6, there are 4 kinds of situations of which the CPU informs iAIM. In the situations of Idea 3 and Idea 4, the CPU places the instruction address on the instruction address bus, and iAIM is forced to use that address. In the situation of Idea 5, the CPU freezes the instruction address bus and iAIM uses the same instruction address as the one used in the previous clock cycle. In the situation of Idea 6, the CPU freezes the instruction address bus and iAIM uses the instruction address generated by its automatic instruction address generation mechanism. Two control lines (called “Situation Indication” or “S-Indicate” control lines for brevity hereafter) can be used by the CPU to inform iAIM of one of the 4 kinds of situations at the beginning of every clock cycle:

00 for “Autonomous” situation, 01 for “Pipeline Stall” situation, 10 for “Wrong Prediction” situation, 11 for “Compulsory” situation.

Figure 2.5 shows S-Indicate control lines that CPU uses to inform iAIM.

Figure 2.5 S-Indicate control lines that CPU uses to inform iAIM
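The way iAIM could choose its fetch address from the two S-Indicate lines can be sketched as a simple multiplexer model. The encoding values are those given in the text; the class name `SIndicate`, the function `select_fetch_address`, and its parameter names are assumptions made for illustration.

```python
from enum import IntEnum


class SIndicate(IntEnum):
    """2-bit S-Indicate encoding as listed in the text."""
    AUTONOMOUS = 0b00
    PIPELINE_STALL = 0b01
    WRONG_PREDICTION = 0b10
    COMPULSORY = 0b11


def select_fetch_address(s_indicate, auto_address, last_address, bus_address):
    """Pick the instruction address iAIM uses in the current clock cycle."""
    situation = SIndicate(s_indicate)
    if situation is SIndicate.AUTONOMOUS:
        # Use the address produced by the automatic address generator.
        return auto_address
    if situation is SIndicate.PIPELINE_STALL:
        # Repeat the address used in the previous clock cycle.
        return last_address
    # Wrong Prediction or Compulsory: the CPU drives the address bus.
    return bus_address
```

In this sketch only the “Wrong Prediction” and “Compulsory” situations read the instruction address bus, which reflects the goal of keeping address traffic between the CPU and instruction memory to a minimum.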

Seventhly (Idea 7), in order to maintain the original BTB operation, two additional 34-bit registers organized as a FIFO are necessary:

1. The first 34-bit register, which stores information from iAIM 1 clock cycle ago, consists of the following fields:

32-bit field that stores the PC used 1 clock cycle ago (called “PCt-1” for brevity),

1-bit field that stores whether a branch entry was found in the BTB 1 clock cycle ago (called “InBTBt-1” for brevity),

1-bit field that stores whether the BTB predicted a taken branch 1 clock cycle ago (called “PTakent-1” for brevity).

2. The second 34-bit register, which stores information from iAIM 2 clock cycles ago, consists of the following fields:

32-bit field that stores the PC used 2 clock cycles ago (called “PCt-2” for brevity),

1-bit field that stores whether a branch entry was found in the BTB 2 clock cycles ago (called “InBTBt-2” for brevity),

1-bit field that stores whether the BTB predicted a taken branch 2 clock cycles ago (called “PTakent-2” for brevity).
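The two history registers can be modeled as a two-stage shift register of packed 34-bit words. The sketch below is illustrative only: the bit layout (PC in the upper 32 bits, then InBTB, then PTaken), the names `History`, `pack`, `unpack` and `HistoryFifo`, and the choice to shift on every call are assumptions; behavior during a pipeline stall is not modeled.

```python
from collections import namedtuple

# One history record: 32-bit PC, InBTB flag, PTaken flag.
History = namedtuple("History", ["pc", "in_btb", "p_taken"])


def pack(record):
    """Pack a record into a 34-bit word (assumed layout: PC | InBTB | PTaken)."""
    return ((record.pc & 0xFFFFFFFF) << 2) | (int(record.in_btb) << 1) | int(record.p_taken)


def unpack(word):
    """Recover the three fields from a packed 34-bit word."""
    return History(pc=(word >> 2) & 0xFFFFFFFF,
                   in_btb=bool((word >> 1) & 1),
                   p_taken=bool(word & 1))


class HistoryFifo:
    """Two 34-bit registers holding the state from 1 and 2 clock cycles ago."""

    def __init__(self):
        self.t1 = 0  # PCt-1, InBTBt-1, PTakent-1
        self.t2 = 0  # PCt-2, InBTBt-2, PTakent-2

    def clock(self, current):
        """Advance one clock cycle, recording the state of the current fetch."""
        self.t2 = self.t1
        self.t1 = pack(current)
```

After two further calls to `clock`, the record of a given fetch appears in `t2`, which is where the BTB maintenance logic reads PCt-2, InBTBt-2 and PTakent-2.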

If a branch instruction enters the IF stage in the first clock cycle, the CPU informs iAIM of either the “Autonomous” or the “Wrong Prediction” situation in the third clock cycle.

BTB operation in iAIM is the same as described in Section 1.1.3 of Chapter 1. When the CPU informs iAIM of the “Wrong Prediction” situation in the third clock cycle, there are 2 cases:

Case 1: InBTBt-2 is 1.

Use PCt-2 as the index to search the BTB and update the “predictor” field of the matching entry according to PTakent-2:

If PTakent-2 is 1, update this field toward the not-taken direction.

If PTakent-2 is 0, update this field toward the taken direction.

Case 2: InBTBt-2 is 0.

No such entry exists in the BTB. Enter a new entry into the BTB with the initial values listed below:

“valid bit” field is set to 1,

“branch instruction address” field is set to PCt-2,

“branch target address” field is set to the value on the instruction address bus,

“predictor” field is set to the initial value defined by the adopted n-bit prediction scheme (for example, weakly taken in a 2-bit prediction scheme).

When the CPU informs iAIM of the “Autonomous” situation in the third clock cycle, there are 2 cases:

Case 1: InBTBt-2 is 1.

Use PCt-2 as the index to search the BTB and update the “predictor” field of the matching entry according to PTakent-2:

If PTakent-2 is 1, update this field toward the taken direction.

If PTakent-2 is 0, update this field toward the not-taken direction.

Case 2: InBTBt-2 is 0.

Do nothing in the BTB, because a not-taken branch is not entered into the BTB if it does not already exist there.
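The four maintenance cases above can be summarized in one behavioral sketch. It assumes a 2-bit saturating counter as the predictor (values 0 and 1 predict not taken, 2 and 3 predict taken) with weakly taken as the initial value, and it models the BTB as a dictionary keyed by branch address; the function name `maintain_btb` and the field names are hypothetical.

```python
WEAKLY_TAKEN = 2     # assumed initial value of a 2-bit saturating counter
COUNTER_MAX = 3      # strongly taken
COUNTER_MIN = 0      # strongly not taken


def maintain_btb(btb, wrong_prediction, pc_t2, in_btb_t2, p_taken_t2, bus_address):
    """Update `btb` for the branch fetched 2 clock cycles ago.

    `wrong_prediction` is True for the "Wrong Prediction" situation and
    False for the "Autonomous" situation.
    """
    if in_btb_t2:
        entry = btb.get(pc_t2)
        if entry is None:
            return  # dict model only: nothing to update if the entry is gone
        # The actual outcome equals the prediction unless it was wrong.
        actually_taken = bool(p_taken_t2) != bool(wrong_prediction)
        if actually_taken:
            entry["predictor"] = min(entry["predictor"] + 1, COUNTER_MAX)
        else:
            entry["predictor"] = max(entry["predictor"] - 1, COUNTER_MIN)
    elif wrong_prediction:
        # No entry existed, so the implicit not-taken prediction was wrong.
        btb[pc_t2] = {
            "valid": 1,
            "branch_address": pc_t2,
            "target": bus_address,
            "predictor": WEAKLY_TAKEN,
        }
    # Autonomous with no entry: a not-taken branch is not entered into the BTB.
```

Deriving the actual outcome from PTakent-2 and the reported situation lets one counter update cover both Case 1 variants: a wrong taken prediction and a correct not-taken prediction both move the counter toward not taken, and the other two combinations move it toward taken.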
