As a result, we focus on how the unit’s latency affects the critical path of bne:

a. This unit is not on the critical path, so changes to its latency do not affect the clock cycle time unless the latency becomes so large that it creates a new critical path through this unit, the branch add, and the PC Mux. The latency of this path is 230ps and it needs to be above 780ps, so the latency of the Add-4 unit needs to be more than 650ps for it to be on the critical path.

b. This unit is not used by BNE nor by ADD, so it cannot affect the critical path for either instruction.

Solution 4.7

4.7.1 The longest-latency path for ALU operations is through I-Mem, Regs, Mux (to select ALU operand), ALU, and Mux (to select value for register write). Note that the only other path of interest is the PC-increment path through Add (PC + 4) and Mux, which is much shorter. So for the I-Mem, Regs, Mux, ALU, Mux path we have:

a. 400ps + 200ps + 30ps + 120ps + 30ps = 780ps

b. 500ps + 220ps + 100ps + 180ps + 100ps = 1100ps

4.7.2 The longest-latency path for lw is through I-Mem, Regs, Mux (to select ALU input), ALU, D-Mem, and Mux (to select what is written to register). The only other interesting paths are the PC-increment path (which is much shorter) and the path through the Sign-extend unit in address computation instead of through Registers.

However, Regs has a longer latency than Sign-extend, so for I-Mem, Regs, Mux, ALU, D-Mem, and Mux path we have:

a. 400ps + 200ps + 30ps + 120ps + 350ps + 30ps = 1130ps

b. 500ps + 220ps + 100ps + 180ps + 1000ps + 100ps = 2100ps
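The sums in 4.7.1 and 4.7.2 can be checked with a short Python sketch. The per-unit latencies are read off the sums above; the names `LATENCY`, `ALU_PATH`, `LW_PATH`, and `path_latency` are illustrative and not part of the exercise.

```python
# Component latencies in picoseconds, read off the sums in 4.7.1 and 4.7.2.
LATENCY = {
    "a": {"I-Mem": 400, "Regs": 200, "Mux": 30, "ALU": 120, "D-Mem": 350},
    "b": {"I-Mem": 500, "Regs": 220, "Mux": 100, "ALU": 180, "D-Mem": 1000},
}

# Units traversed, in order, by an ALU operation and by lw.
ALU_PATH = ["I-Mem", "Regs", "Mux", "ALU", "Mux"]
LW_PATH = ["I-Mem", "Regs", "Mux", "ALU", "D-Mem", "Mux"]


def path_latency(latencies, path):
    """Sum the latency of every unit on the path (a unit may appear twice)."""
    return sum(latencies[unit] for unit in path)


for case, lat in LATENCY.items():
    print(case, path_latency(lat, ALU_PATH), path_latency(lat, LW_PATH))
```

For each case the lw path is longer than the ALU path by exactly one D-Mem latency.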

4.7.3 The answer is the same as in 4.7.2 because the lw instruction has the longest critical path. The longest path for sw is shorter by one Mux latency (no write to register), and the longest path for add or bne is shorter by one D-Mem latency.

4.7.4 The data memory is used by lw and sw instructions, so the answer is:

a. 20% + 10% = 30%

b. 35% + 15% = 50%

4.7.5 The sign-extend circuit is actually computing a result in every cycle, but its output is ignored for add and not instructions. The input of the sign-extend circuit is needed for addi (to provide the immediate ALU operand), beq (to provide the PC-relative offset), and lw and sw (to provide the offset used in addressing memory), so the answer is:

a. 15% + 20% + 20% + 10% = 65%

b. 5% + 15% + 35% + 15% = 70%
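The percentages in 4.7.4 and 4.7.5 can be reproduced with the sketch below. It assumes the terms of each sum appear in the order the instructions are listed in the text (addi, beq, lw, sw); the names `MIX`, `USES_DMEM`, `USES_SIGN_EXTEND`, and `utilization` are illustrative.

```python
# Instruction mix in percent. The mapping of each term to an instruction is an
# assumption based on the order in which the text lists the instructions.
MIX = {
    "a": {"addi": 15, "beq": 20, "lw": 20, "sw": 10},
    "b": {"addi": 5, "beq": 15, "lw": 35, "sw": 15},
}

USES_DMEM = {"lw", "sw"}
USES_SIGN_EXTEND = {"addi", "beq", "lw", "sw"}


def utilization(mix, users):
    """Percentage of cycles in which a unit's result is actually needed."""
    return sum(pct for instr, pct in mix.items() if instr in users)


for case, mix in MIX.items():
    print(case, utilization(mix, USES_DMEM), utilization(mix, USES_SIGN_EXTEND))
```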

4.7.6 The clock cycle time is determined by the critical path for the instruction that has the longest critical path. This is the lw instruction, and its critical path goes through I-Mem, Regs, Mux, ALU, D-Mem, and Mux so we have:

a. I-Mem has the longest latency, so we reduce its latency from 400ps to 360ps, making the clock cycle 40ps shorter. The speed-up achieved by reducing the clock cycle time is then 1130ps/1090ps = 1.037

b. D-Mem has the longest latency, so we reduce its latency from 1000ps to 900ps, making the clock cycle 100ps shorter. The speed-up achieved by reducing the clock cycle time is then 2100ps/2000ps = 1.050
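As a quick check of the two speed-up figures, the sketch below divides the old cycle time by the new one. The function name `speedup` is illustrative; the cycle times and reductions are the ones stated above.

```python
def speedup(old_cycle_ps, reduction_ps):
    """Speed-up from shortening the clock cycle by reduction_ps."""
    return old_cycle_ps / (old_cycle_ps - reduction_ps)


print(round(speedup(1130, 40), 3))   # case a: 1130ps / 1090ps
print(round(speedup(2100, 100), 3))  # case b: 2100ps / 2000ps
```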

Solution 4.8

4.8.1 To test for a stuck-at-0 fault on a wire, we need an instruction that puts that wire to a value of 1 and has a different result if the value on the wire is stuck at zero:

a. Bit 7 of the instruction word is only used as part of an immediate/offset part of the instruction, so one way to test would be to execute ADDI $1, zero, 128 which is supposed to place a value of 128 into $1. If instruction bit 7 is stuck at zero, $1 will be zero because value 128 has all bits at zero except bit 7.

b. The only instructions that set this signal to 1 are loads. We can test by filling the data memory with zeros and executing a load instruction from a non-zero address, e.g., LW $1, 1024(zero).

After this instruction, the value in $1 is supposed to be zero. If the MemtoReg signal is stuck at 0, the value in the register will be 1024 (the Mux selects the ALU output (1024) instead of the value from memory).

4.8.2 The test for stuck-at-zero requires an instruction that sets the signal to 1 and the test for stuck-at-1 requires an instruction that sets the signal to 0. Because the signal cannot be both 0 and 1 in the same cycle, we cannot test the same signal simultaneously for stuck-at-0 and stuck-at-1 using only one instruction. The test for stuck-at-1 is analogous to the stuck-at-0 test:

a. We can use ADDI $1, zero, 0 which is supposed to put a value of 0 in $1. If Bit 7 of the instruction word is stuck at 1, the immediate operand becomes 128 and $1 becomes 128 instead of 0.

b. We cannot reliably test for this fault, because all instructions that set the MemtoReg signal to zero also set the MemRead signal to zero. If one of these instructions is used as a test for MemtoReg stuck-at-1, the value written to the destination register is “random” (whatever noise is there at the data output of Data Memory). This value could be the same as the value already in the register, so if the fault exists the test may not detect it.
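The bit-7 tests from part a of 4.8.1 and 4.8.2 can be illustrated with a small model. This is only a sketch: it models just the 16-bit immediate field of the ADDI (bit 7 of the instruction word is bit 7 of the immediate) and handles non-negative immediates only. The function names are hypothetical.

```python
def apply_stuck_at(word, bit, value):
    """Force one bit of word to value (0 or 1), modeling a stuck-at fault."""
    mask = 1 << bit
    return (word | mask) if value else (word & ~mask)


def addi_from_zero(imm16, stuck=None):
    """Value written to $1 by ADDI $1, zero, imm16.

    stuck is None for a fault-free wire, or 0/1 when instruction bit 7
    is stuck at that value.
    """
    if stuck is not None:
        imm16 = apply_stuck_at(imm16, 7, stuck)
    return imm16  # zero + immediate


# Stuck-at-0 test: immediate 128 has only bit 7 set.
print(addi_from_zero(128), addi_from_zero(128, stuck=0))
# Stuck-at-1 test: immediate 0 has bit 7 clear.
print(addi_from_zero(0), addi_from_zero(0, stuck=1))
```

In both tests the faulty result differs from the fault-free one, which is what makes the instruction a valid test. Each test also gives the correct result under the opposite fault, which is why one instruction cannot cover both.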

4.8.3

a. It is possible to work around this fault, but it is very difficult. We must find all instructions that have zero in this bit of the offset or immediate operand and replace them with a sequence of “safe” instructions. For example, a load with such an offset must be replaced with an instruction that subtracts 128 from the address register, then the load (with the offset larger by 128 to set bit 7 of the offset to 1), then an instruction that adds 128 back to the address register.

b. We cannot work around this problem, because it prevents all instructions from storing their result in registers, except for load instructions. Load instructions only move data from memory to registers, so they cannot be used to emulate ALU operations “broken” by the fault.

4.8.4

a. If MemRead is stuck at 1, data memory is read for every instruction. However, for non-load instructions the value from memory is discarded by the Mux that selects the value to be written to the Register unit. As a result, we cannot design this kind of test for this fault, because the processor still operates correctly (although inefficiently).

b. To test for this fault, we need an instruction whose opcode is zero and MemRead is 1. However, instructions with a zero opcode are ALU operations (not loads), so their MemRead is 0. As a result, we cannot design this kind of test for this fault, because the processor operates correctly.

4.8.5

a. If Jump is stuck-at-1, every instruction updates the PC as if it were a jump instruction. To test for this fault, we can execute an ADDI with a non-zero immediate operand. If the Jump signal is stuck-at-1, the PC after the ADDI executes will not be pointing to the instruction that follows the ADDI.

b. To test for this fault, we need an instruction whose opcode is zero and Jump is 1. However, the opcode for the jump instruction is non-zero. As a result, we cannot design this kind of test for this fault, because the processor operates correctly.

4.8.6 Each single-instruction test “covers” all faults that, if present, result in different behavior for the test instruction. To test for as many of these faults as possible in a single instruction, we need an instruction that sets as many of these signals as possible to a value that would be changed by a fault. Some signals cannot be tested using this single-instruction method, because the fault on a signal could still result in completely correct execution of all instructions that trigger the fault.

Solution 4.9

4.9.1

a. Binary: 100011 00110 00001 0000000000101000; Hexadecimal: 8CC10028

b. Binary: 000101 00001 00010 1111111111111111; Hexadecimal: 1422FFFF
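The hexadecimal values in 4.9.1 follow from packing the four binary fields into a 32-bit word. The sketch below assumes the usual I-type layout (6-bit opcode, 5-bit rs, 5-bit rt, 16-bit immediate), which matches the field widths shown above. The helper `encode_itype` is illustrative.

```python
def encode_itype(opcode, rs, rt, imm):
    """Pack opcode[31:26], rs[25:21], rt[20:16], and imm[15:0] into one word."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


# a. opcode 100011, rs = 6, rt = 1, immediate = 40
print(f"{encode_itype(0b100011, 6, 1, 40):08X}")
# b. opcode 000101, rs = 1, rt = 2, immediate = -1 (all ones in 16 bits)
print(f"{encode_itype(0b000101, 1, 2, -1):08X}")
```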

4.9.2

a. Read register 1: 6 (binary 00110), actually read: yes. Read register 2: 1 (binary 00001), actually read: yes (but not used).

b. Read register 1: 1 (binary 00001), actually read: yes. Read register 2: 2 (binary 00010), actually read: yes.

4.9.3

a. Write register: 1 (binary 00001); register actually written: yes.

b. Write register: either 2 (binary 00010) or 31 (binary 11111) (don’t know because RegDst is X); register actually written: no.

4.9.4

Control signal 1 Control signal 2

a. RegDst = 0 MemRead = 1

b. RegWrite = 0 MemRead = 0

4.9.5 We use I31 through I26 to denote individual bits of Instruction[31:26], which is the input to the Control unit:

a. RegDst = NOT I31

b. RegWrite = (NOT I28 AND NOT I27) OR (I31 AND NOT I29)

4.9.6 If possible, we try to reuse some or all of the logic needed for one signal to help us compute the other signal at a lower cost:

a. RegDst = NOT I31

MemRead = I31 AND NOT I29

b. MemRead = I31 AND NOT I29

RegWrite = (NOT I28 AND NOT I27) OR MemRead
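The expressions in 4.9.5 and 4.9.6 can be evaluated against a few opcodes with the sketch below. The lw (100011) and bne (000101) opcodes appear in 4.9.1; the R-type (000000) and sw (101011) opcodes are the standard MIPS values and are an assumption here, as is the helper `bit`.

```python
def bit(opcode, n):
    """Return instruction bit In, where opcode holds Instruction[31:26]."""
    return (opcode >> (n - 26)) & 1


def reg_dst(op):
    return 1 - bit(op, 31)                     # NOT I31


def mem_read(op):
    return bit(op, 31) & (1 - bit(op, 29))     # I31 AND NOT I29


def reg_write(op):
    # (NOT I28 AND NOT I27) OR MemRead, reusing the MemRead logic
    return ((1 - bit(op, 28)) & (1 - bit(op, 27))) | mem_read(op)


OPCODES = {"R-type": 0b000000, "lw": 0b100011, "sw": 0b101011, "bne": 0b000101}
for name, op in OPCODES.items():
    print(name, reg_dst(op), mem_read(op), reg_write(op))
```

RegDst is a don't-care for sw and bne, so the value NOT I31 produces for them is unconstrained.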

Solution 4.10

To solve problems in this exercise, it helps to first determine the latencies of different paths inside the processor. Assuming zero latency for the Control unit, the critical path is the path to get the data for a load instruction, so we have I-Mem, Mux, Regs, Mux, ALU, D-Mem and Mux on this path.

4.10.1 The Control unit can begin generating MemWrite only after I-Mem is read. It must finish generating this signal before the end of the clock cycle. Note that MemWrite is actually a write-enable signal for D-Mem flip-flops, and the actual write is triggered by the edge of the clock signal, so MemWrite need not arrive before that time. So the Control unit must generate the MemWrite in one clock cycle, minus the I-Mem access time:

a. Critical path: 400ps + 30ps + 200ps + 30ps + 120ps + 350ps + 30ps = 1160ps. Maximum time to generate MemWrite: 1160ps – 400ps = 760ps

b. Critical path: 500ps + 100ps + 220ps + 100ps + 180ps + 1000ps + 100ps = 2200ps. Maximum time to generate MemWrite: 2200ps – 500ps = 1700ps
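The MemWrite budget in 4.10.1 is the cycle time minus the I-Mem access time. A minimal sketch, with the per-unit latencies listed in critical-path order as in the sums above (the names `CRITICAL_PATH` and `memwrite_budget` are illustrative):

```python
# Latencies in ps along I-Mem, Mux, Regs, Mux, ALU, D-Mem, Mux.
CRITICAL_PATH = {
    "a": [400, 30, 200, 30, 120, 350, 30],
    "b": [500, 100, 220, 100, 180, 1000, 100],
}


def memwrite_budget(path):
    """Return (cycle time, time available to generate MemWrite)."""
    cycle = sum(path)
    i_mem = path[0]  # control signals can start only after I-Mem is read
    return cycle, cycle - i_mem


for case, path in CRITICAL_PATH.items():
    print(case, memwrite_budget(path))
```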

4.10.2 All control signals start to be generated after I-Mem read is complete. The most slack a signal can have is until the end of the cycle, and MemWrite and RegWrite are both needed only at the end of the cycle, so they have the most slack.

The time to generate both signals without increasing the critical path is the one
