Chapter 2 Design of Proposed Architecture

2.6 Design Restriction and Execution Examples

Top-level instruction memory in the proposed designs is assumed to run at the same clock rate as the CPU. Although not every kind of memory is clock-aware, all self-managed multi-power-mode memories are now equipped with clock signals; DRAMs, for example, are always clocked. The only restriction in the iAIM design is therefore how to synchronize the memory clock with the CPU’s.

To illustrate the validity of the iAIM design, a representative instruction execution scenario is taken as an example :

Part of a program comprises 2 branch instructions, B1 and B2, and other instructions S1, S2, S3, … In this execution scenario, B1 is not taken and B2 is taken.

B1 (address : 0x80000400, branch target : 0x80000a00, not-taken in this scenario)

B2 (address : 0x80000404, branch target : 0x80000800, taken in this scenario)

There are 4 possible cases of instruction execution flow in iAIM design depending on BTB’s prediction :

1. B1 is predicted not-taken, B2 is predicted taken :

In this case, both B1 and B2 are correctly predicted by BTB in iAIM.

2. B1 is predicted not-taken, B2 is predicted not-taken :

In this case, B1 is correctly predicted but B2 is incorrectly predicted by BTB in iAIM. A penalty of 1 clock cycle is incurred.

3. B1 is predicted taken, B2 is predicted taken :

In this case, B1 is incorrectly predicted but B2 is correctly predicted by BTB in iAIM. A penalty of 1 clock cycle is incurred.

4. B1 is predicted taken, B2 is predicted not-taken :

In this case, both B1 and B2 are incorrectly predicted by BTB in iAIM. A penalty of 2 clock cycles is incurred.
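The four cases can be checked with a few lines of Python. This is only a sketch: it assumes, as the penalties listed above suggest, that each mispredicted branch costs exactly one clock cycle. The names `ACTUAL`, `MISPREDICT_PENALTY`, `penalty` and `all_cases` are hypothetical and do not come from the iAIM design.

```python
# Actual outcomes in the scenario (from the text): B1 not taken, B2 taken.
ACTUAL = {"B1": False, "B2": True}

# Assumption: one clock cycle is lost per mispredicted branch.
MISPREDICT_PENALTY = 1


def penalty(predicted):
    """Return the total penalty in clock cycles for one set of BTB predictions."""
    return sum(
        MISPREDICT_PENALTY
        for branch, taken in ACTUAL.items()
        if predicted[branch] != taken
    )


def all_cases():
    """Enumerate the four prediction combinations in the order of cases 1 to 4."""
    combos = [(False, True), (False, False), (True, True), (True, False)]
    return [((b1, b2), penalty({"B1": b1, "B2": b2})) for b1, b2 in combos]


if __name__ == "__main__":
    for (b1, b2), cycles in all_cases():
        print(
            f"B1 predicted {'taken' if b1 else 'not-taken'}, "
            f"B2 predicted {'taken' if b2 else 'not-taken'}: "
            f"{cycles} penalty cycle(s)"
        )
```

Under this assumption the four cases yield penalties of 0, 1, 1 and 2 clock cycles, in agreement with the list above.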

Execution detail of iAIM design in the first case (B1 is predicted not-taken, B2 is predicted taken) is shown below :

Actions taken in CPU

1) Before the end of clock cycle, CPU resolves that the instruction (address 0x800003FC) is not a branch instruction. CPU will set S-Indicate to 0b00 at next clock cycle.

2) Next PC used in CPU will be updated to current PC plus 4.

X 0x80000400 0x80000400 0 0 0

1) Because S-Indicate is set to 0b00, iAIM uses self-generated address as its PC.

2) Before the end of clock cycle, BTB predicts current PC is a not-taken branch. iAIM will set P-Taken to 0 at next clock cycle.

3) Next self-generated PC in iAIM will be current PC plus 4.

1) Before the end of clock cycle, CPU resolves the instruction(address 0x80000400) is a not-taken branch. Because iAIM doesn’t set P-Taken to 1 at current clock cycle, CPU finds BTB prediction in iAIM at last clock cycle was correct. CPU will set S-Indicate to 0b00 at next clock cycle.

2) Next PC used in CPU will be updated to current PC plus 4.

X+1 0x80000404 0x80000404 0 0 0

1) Because S-Indicate is set to 0b00, iAIM uses self-generated address as its PC.

2) Before the end of clock cycle, BTB predicts current PC is a taken branch. iAIM will set P-Taken to 1 at next clock cycle.

3) Next self-generated PC in iAIM will be branch target address.

1) Before the end of clock cycle, CPU resolves the instruction(address 0x80000404) is a taken branch. Because iAIM sets P-Taken to 1 at current clock cycle, CPU finds BTB prediction in iAIM at last clock cycle was correct. CPU will set S-Indicate to 0b00 at next clock cycle.

2) Next PC used in CPU will be updated to branch target address plus 4.

X+2 0x80000408 0x80000800 0 0 1

1) Because S-Indicate is set to 0b00, iAIM uses self-generated address as its PC.

2) iAIM updates an entry in BTB when a not-taken branch was found in BTB 2 clock cycles ago : source is PC used in iAIM 2 clock cycles ago, target is address on instruction address bus, direction is not-taken.

3) Before the end of clock cycle, BTB predicts current PC is not a branch. iAIM will set P-Taken to 0 at next clock cycle.

4) Next self-generated PC in iAIM will be current PC plus 4.

1) Before the end of clock cycle, CPU resolves the instruction(address 0x80000800) is not a branch. CPU will set S-Indicate to 0b00 at next clock cycle.

2) Next PC used in CPU will be updated to current PC plus 4.

X+3 0x80000804 0x80000804 0 0 0

1) Because S-Indicate is set to 0b00, iAIM uses self-generated address as its PC.

2) iAIM updates an entry in BTB : source is PC used in iAIM 2 clock cycles ago, target is address on instruction address bus, direction is taken.

3) Before the end of clock cycle, BTB finds current PC is not a branch. iAIM will set P-Taken to 0 at next clock cycle.

4) Next self-generated PC in iAIM will be current PC plus 4.
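The iAIM-side steps in this walkthrough can be summarized in a small Python sketch. It covers only what the text states: S-Indicate 0b00 selects the self-generated address, S-Indicate 0b10 selects the address on the instruction address bus, and the next self-generated PC is the branch target when BTB predicts taken and current PC plus 4 otherwise. The function names and the dictionary layout of the BTB are assumptions for illustration, and the behaviour for other S-Indicate values is not described in this section, so the sketch rejects them.

```python
S_SELF = 0b00     # iAIM uses its self-generated address (from the text)
S_USE_BUS = 0b10  # iAIM uses the address on the instruction address bus (from the text)


def select_pc(s_indicate, self_generated_pc, bus_address):
    """Choose the PC used in iAIM for the current clock cycle."""
    if s_indicate == S_SELF:
        return self_generated_pc
    if s_indicate == S_USE_BUS:
        return bus_address
    # Other encodings are not covered by this section.
    raise ValueError(f"S-Indicate value not modelled: {s_indicate:#04b}")


def next_self_generated_pc(pc, btb):
    """Return (next self-generated PC, P-Taken for the next clock cycle).

    Assumption: `btb` maps a branch address to (target, predicted_taken).
    """
    entry = btb.get(pc)
    if entry is not None and entry[1]:
        return entry[0], 1  # predicted taken: follow the stored target
    return pc + 4, 0        # not a branch, or predicted not-taken


if __name__ == "__main__":
    # BTB contents corresponding to the first case.
    btb = {
        0x80000400: (0x80000A00, False),  # B1 predicted not-taken
        0x80000404: (0x80000800, True),   # B2 predicted taken
    }
    pc = 0x80000400
    for cycle in range(4):
        nxt, p_taken = next_self_generated_pc(pc, btb)
        print(f"X+{cycle}: iAIM PC {pc:#010x}, P-Taken next cycle = {p_taken}")
        pc = nxt
```

Run on the BTB contents of the first case, the sketch visits 0x80000400, 0x80000404, 0x80000800 and 0x80000804, the same sequence as the walkthrough above.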

Execution detail of iAIM design in the second case (B1 is predicted not-taken, B2 is predicted not-taken) is shown below :

Actions taken in CPU

1) Before the end of clock cycle, CPU resolves that the instruction (address 0x800003FC) is not a branch instruction. CPU will set S-Indicate to 0b00 at next clock cycle.

2) Next PC used in CPU will be updated to current PC plus 4.

X 0x80000400 0x80000400 0 0 0

1) Because S-Indicate is set to 0b00, iAIM uses self-generated address as its PC.

2) Before the end of clock cycle, BTB predicts current PC is a not-taken branch. iAIM will set P-Taken to 0 at next clock cycle.

3) Next self-generated PC in iAIM will be current PC plus 4.

1) Before the end of clock cycle, CPU resolves the instruction(address 0x80000400) is a not-taken branch. Because iAIM doesn’t set P-Taken to 1 at current clock cycle, CPU finds BTB prediction in iAIM at last clock cycle was correct. CPU will set S-Indicate to 0b00 at next clock cycle.

2) Next PC used in CPU will be updated to current PC plus 4.

X+1 0x80000404 0x80000404 0 0 0

1) Because S-Indicate is set to 0b00, iAIM uses self-generated address as its PC.

2) Before the end of clock cycle, BTB predicts current PC is a not-taken branch. iAIM will set P-Taken to 0 at next clock cycle.

3) Next self-generated PC in iAIM will be current PC plus 4.

1) Before the end of clock cycle, CPU resolves the instruction(address 0x80000404) is a taken branch. Because iAIM doesn’t set P-Taken to 1 at current clock cycle, CPU finds BTB prediction in iAIM at last clock cycle was incorrect. CPU will set S-Indicate to 0b10 at next clock cycle.

2) Next PC used in CPU will be updated to branch target address.

X+2 0x80000408 0x80000408 0 0 0

1) Because S-Indicate is set to 0b00, iAIM uses self-generated address as its PC.

2) iAIM updates an entry in BTB when a not-taken branch was found in BTB 2 clock cycles ago : source is PC used in iAIM 2 clock cycles ago, target is address on instruction address bus, direction is not-taken.

3) Before the end of clock cycle, BTB predicts current PC is not a branch. iAIM will set P-Taken to 0 at next clock cycle.

4) Next self-generated PC in iAIM will be current PC plus 4.

1) iAIM starts fetching the correct address at current clock cycle, so there is no instruction to be decoded in ID stage of CPU. CPU will set S-Indicate to 0b00 at next clock cycle.

2) Next PC used in CPU will be updated to current PC plus 4.

X+3 0x80000800 0x80000800 1 0 0

1) Because S-Indicate is set to 0b10, iAIM uses address on instruction address bus as its PC.

2) iAIM inserts an entry into BTB if the PC used in iAIM 2 clock cycles ago was not found in BTB, and updates the existing entry otherwise : source is PC used in iAIM 2 clock cycles ago, target is address on instruction address bus, direction is taken.

3) Before the end of clock cycle, BTB predicts current PC is not a branch. iAIM will set P-Taken to 0 at next clock cycle.

4) Next self-generated PC in iAIM will be current PC plus 4.

1) Before the end of clock cycle, CPU resolves the instruction(address 0x80000800) is not a branch. CPU will set S-Indicate to 0b00 at next clock cycle.

2) Next PC used in CPU will be updated to current PC plus 4.

X+4 0x80000804 0x80000804 0 0 0

1) Because S-Indicate is set to 0b00, iAIM uses self-generated address as its PC.

2) Before the end of clock cycle, BTB finds current PC is not a branch. iAIM will set P-Taken to 0 at next clock cycle.

3) Next self-generated PC in iAIM will be current PC plus 4.
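The BTB maintenance step used in this walkthrough (insert an entry when the source PC is not yet in BTB, update it otherwise) can be sketched as follows. The dictionary representation and the name `record_branch` are assumptions for illustration only; the section does not specify the BTB organization, its capacity or its replacement policy.

```python
def record_branch(btb, source_pc, target, taken):
    """Insert or update one BTB entry and report which action was performed.

    source_pc : PC used in iAIM 2 clock cycles ago
    target    : address on the instruction address bus
    taken     : resolved direction of the branch
    Assumption: `btb` is a plain dict keyed by source PC.
    """
    action = "update" if source_pc in btb else "insert"
    btb[source_pc] = {"target": target, "taken": taken}
    return action


if __name__ == "__main__":
    # Second case, cycle X+3: B2 is already in BTB as not-taken,
    # and is corrected to taken with target 0x80000800.
    btb = {0x80000404: {"target": 0x80000800, "taken": False}}
    print(record_branch(btb, 0x80000404, 0x80000800, True))
    print(btb[0x80000404])
```

A dictionary is used here only to make the insert-versus-update decision explicit; a hardware BTB would index a fixed-size table instead.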

Execution detail of iAIM design in the third case (B1 is predicted taken, B2 is predicted taken) is shown below :

Actions taken in CPU

1) Before the end of clock cycle, CPU resolves that the instruction (address 0x800003FC) is not a branch instruction. CPU will set S-Indicate to 0b00 at next clock cycle.

2) Next PC used in CPU will be updated to current PC plus 4.

X 0x80000400 0x80000400 0 0 0

1) Because S-Indicate is set to 0b00, iAIM uses self-generated address as its PC.

2) Before the end of clock cycle, BTB predicts current PC is a taken branch. iAIM will set P-Taken to 1 at next clock cycle.

3) Next self-generated PC in iAIM will be branch target address.

1) Before the end of clock cycle, CPU resolves the instruction(address 0x80000400) is a not-taken branch. Because iAIM sets P-Taken to 1 at current clock cycle, CPU finds BTB prediction in iAIM at last clock cycle was incorrect. CPU will set S-Indicate to 0b10 at next clock cycle.

2) Next PC used in CPU will be the same as current PC.

X+1 0x80000404 0x80000a00 0 0 1

1) Because S-Indicate is set to 0b00, iAIM uses self-generated address as its PC.

2) Before the end of clock cycle, BTB predicts current PC is not a branch. iAIM will set P-Taken to 0 at next clock cycle.

3) Next self-generated PC in iAIM will be current PC plus 4.

1) iAIM starts fetching the correct address at current clock cycle, so there is no instruction to be decoded in ID stage of CPU. CPU will set S-Indicate to 0b00 at next clock cycle.

2) Next PC used in CPU will be updated to current PC plus 4.

X+2 0x80000404 0x80000404 1 0 0

1) Because S-Indicate is set to 0b10, iAIM uses address on instruction address bus as its PC.

2) iAIM updates an entry in BTB : source is PC used in iAIM 2 clock cycles ago, target is address on instruction address bus, direction is not-taken.

3) Before the end of clock cycle, BTB predicts current PC is a taken branch. iAIM will set P-Taken to 1 at next clock cycle.

4) Next self-generated PC in iAIM will be branch target address.

1) Before the end of clock cycle, CPU resolves the instruction(address 0x80000404) is a taken branch. Because iAIM sets P-Taken to 1 at current clock cycle, CPU finds BTB prediction in iAIM at last clock cycle was correct. CPU will set S-Indicate to 0b00 at next clock cycle.

2) Next PC used in CPU will be updated to branch target address plus 4.

X+3 0x80000408 0x80000800 0 0 1

1) Because S-Indicate is set to 0b00, iAIM uses self-generated address as its PC.

2) Before the end of clock cycle, BTB predicts current PC is not a branch. iAIM will set P-Taken to 0 at next clock cycle.

3) Next self-generated PC in iAIM will be current PC plus 4.

1) Before the end of clock cycle, CPU resolves the instruction(address 0x80000800) is not a branch. CPU will set S-Indicate to 0b00 at next clock cycle.

2) Next PC used in CPU will be updated to current PC plus 4.

X+4 0x80000804 0x80000804 0 0 0

1) Because S-Indicate is set to 0b00, iAIM uses self-generated address as its PC.

2) iAIM updates an entry in BTB : source is PC used in iAIM 2 clock cycles ago, target is address on instruction address bus, direction is taken.

3) Before the end of clock cycle, BTB finds current PC is not a branch. iAIM will set P-Taken to 0 at next clock cycle.

4) Next self-generated PC in iAIM will be current PC plus 4.
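The CPU-side check that recurs in these walkthroughs compares the resolved branch direction with the P-Taken signal and raises S-Indicate to 0b10 on a mismatch. A minimal sketch is given below, assuming that the corrected address is the branch target when the branch is actually taken and the fall-through address (branch address plus 4) when it is not. The function name and return shape are hypothetical.

```python
def cpu_check(branch_pc, target, resolved_taken, p_taken):
    """Return (S-Indicate for the next clock cycle, corrected address or None).

    branch_pc      : address of the branch being resolved
    target         : branch target address
    resolved_taken : direction resolved by the CPU
    p_taken        : P-Taken signal driven by iAIM in the current clock cycle
    """
    if bool(p_taken) == resolved_taken:
        return 0b00, None  # prediction was correct, iAIM keeps its own PC
    # Misprediction: iAIM must use the address supplied by the CPU.
    corrected = target if resolved_taken else branch_pc + 4
    return 0b10, corrected


if __name__ == "__main__":
    # Third case: B1 predicted taken but resolved not-taken.
    s_indicate, addr = cpu_check(0x80000400, 0x80000A00, False, 1)
    print(f"S-Indicate = {s_indicate:#04b}, corrected address = {addr:#010x}")
```

For B1 in the third case the sketch yields S-Indicate 0b10 with corrected address 0x80000404, which matches the X+2 row above.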

Execution detail of iAIM design in the fourth case (B1 is predicted taken, B2 is predicted not-taken) is shown below :

Actions taken in CPU

1) Before the end of clock cycle, CPU resolves that the instruction (address 0x800003FC) is not a branch instruction. CPU will set S-Indicate to 0b00 at next clock cycle.

2) Next PC used in CPU will be updated to current PC plus 4.

X 0x80000400 0x80000400 0 0 0

1) Because S-Indicate is set to 0b00, iAIM uses self-generated address as its PC.

2) Before the end of clock cycle, BTB predicts current PC is a taken branch. iAIM will set P-Taken to 1 at next clock cycle.

3) Next self-generated PC in iAIM will be branch target address.

1) Before the end of clock cycle, CPU resolves the instruction(address 0x80000400) is a not-taken branch. Because iAIM sets P-Taken to 1 at current clock cycle, CPU finds BTB prediction in iAIM at last clock cycle was incorrect. CPU will set S-Indicate to 0b10 at next clock cycle.

2) Next PC used in CPU will be the same as current PC.

X+1 0x80000404 0x80000a00 0 0 1

1) Because S-Indicate is set to 0b00, iAIM uses self-generated address as its PC.

2) Before the end of clock cycle, BTB predicts current PC is not a branch. iAIM will set P-Taken to 0 at next clock cycle.

3) Next self-generated PC in iAIM will be current PC plus 4.

1) iAIM starts fetching the correct address at current clock cycle, so there is no instruction to be decoded in ID stage of CPU. CPU will set S-Indicate to 0b00 at next clock cycle.

2) Next PC used in CPU will be updated to current PC plus 4.

X+2 0x80000404 0x80000404 1 0 0

1) Because S-Indicate is set to 0b10, iAIM uses address on instruction address bus as its PC.

2) iAIM updates an entry in BTB : source is PC used in iAIM 2 clock cycles ago, target is address on instruction address bus, direction is not-taken.

3) Before the end of clock cycle, BTB predicts current PC is a not-taken branch. iAIM will set P-Taken to 0 at next clock cycle.

4) Next self-generated PC in iAIM will be current PC plus 4.

1) Before the end of clock cycle, CPU resolves the instruction(address 0x80000404) is a taken branch. Because iAIM doesn’t set P-Taken to 1 at current clock cycle, CPU finds BTB prediction in iAIM at last clock cycle was incorrect. CPU will set S-Indicate to 0b10 at next clock cycle.

2) Next PC used in CPU will be updated to branch target address.

X+3 0x80000408 0x80000408 0 0 0

1) Because S-Indicate is set to 0b00, iAIM uses self-generated address as its PC.

2) Before the end of clock cycle, BTB predicts current PC is not a branch. iAIM will set P-Taken to 0 at next clock cycle.

3) Next self-generated PC in iAIM will be current PC plus 4.

1) iAIM starts fetching the correct address at current clock cycle, so there is no instruction to be decoded in ID stage of CPU. CPU will set S-Indicate to 0b00 at next clock cycle.

2) Next PC used in CPU will be updated to current PC plus 4.

X+4 0x80000800 0x80000800 1 0 0

1) Because S-Indicate is set to 0b10, iAIM uses address on instruction address bus as its PC.

2) iAIM inserts an entry into BTB if the PC used in iAIM 2 clock cycles ago was not found in BTB, and updates the existing entry otherwise : source is PC used in iAIM 2 clock cycles ago, target is address on instruction address bus, direction is taken.

3) Before the end of clock cycle, BTB predicts current PC is not a branch. iAIM will set P-Taken to 0 at next clock cycle.

4) Next self-generated PC in iAIM will be current PC plus 4.

1) Before the end of clock cycle, CPU resolves the instruction(address 0x80000800) is not a branch. CPU will set S-Indicate to 0b00 at next clock cycle.

2) Next PC used in CPU will be updated to current PC plus 4.

X+5 0x80000804 0x80000804 0 0 0

1) Because S-Indicate is set to 0b00, iAIM uses self-generated address as its PC.

2) Before the end of clock cycle, BTB finds current PC is not a branch. iAIM will set P-Taken to 0 at next clock cycle.

3) Next self-generated PC in iAIM will be current PC plus 4.
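As a cross-check of the four walkthroughs, the sequence of PCs used in iAIM can be reproduced with a simplified model in which every mispredicted branch causes exactly one wrong-path fetch before the corrected address is used. This is only a sketch under that assumption; it does not model S-Indicate, P-Taken or BTB updates, and all names in it are hypothetical.

```python
# Branch addresses, targets and actual outcomes are taken from the scenario.
BRANCHES = {
    0x80000400: (0x80000A00, False),  # B1: actually not taken
    0x80000404: (0x80000800, True),   # B2: actually taken
}
START_PC = 0x80000400
END_PC = 0x80000804  # last address shown in the walkthroughs


def iaim_pc_trace(predicted_taken):
    """Return the list of PCs used in iAIM, one per clock cycle from X on.

    predicted_taken maps each branch address to the BTB prediction.
    Assumption: a misprediction costs one wrong-path fetch.
    """
    pcs = []
    pc = START_PC
    while pc != END_PC:
        pcs.append(pc)
        if pc in BRANCHES:
            target, taken = BRANCHES[pc]
            actual_next = target if taken else pc + 4
            predicted_next = target if predicted_taken[pc] else pc + 4
            if predicted_next != actual_next:
                pcs.append(predicted_next)  # wrong-path fetch (penalty cycle)
            pc = actual_next
        else:
            pc += 4
    pcs.append(END_PC)
    return pcs


if __name__ == "__main__":
    cases = [(False, True), (False, False), (True, True), (True, False)]
    for number, (b1, b2) in enumerate(cases, start=1):
        trace = iaim_pc_trace({0x80000400: b1, 0x80000404: b2})
        print(f"case {number}:", " ".join(f"{a:#010x}" for a in trace))
```

In this model the traces for cases 1 to 4 contain 4, 5, 5 and 6 addresses, so the last address 0x80000804 is reached at X+3, X+4, X+4 and X+5 respectively, as in the walkthroughs.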
