
Academic year: 2022


(1)

Lecture 7: Midterm Review

•Quantitative principle of Computer Design

•ISA Design

•Pipeline

(2)

Quantitative Principle of Computer Design

(3)

Performance Terminology

"X is n% faster than Y" means:

$$\frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)} = 1 + \frac{n}{100}$$

$$n = \frac{100\,\bigl(\text{Performance}(X) - \text{Performance}(Y)\bigr)}{\text{Performance}(Y)}$$

Example: Y takes 15 seconds to complete a task, X takes 10 seconds. What % faster is X?

(4)

Example

$$\frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{15}{10} = \frac{1.5}{1.0} = \frac{\text{Performance}(X)}{\text{Performance}(Y)}$$

$$n = \frac{100\,(1.5 - 1.0)}{1.0} = 50\%$$
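The same calculation can be checked with a short script. This is only a sketch; the function name `percent_faster` is made up for illustration and is not part of the lecture.

```python
def percent_faster(extime_x: float, extime_y: float) -> float:
    """Return n such that X is n% faster than Y.

    Uses ExTime(Y) / ExTime(X) = 1 + n / 100 from the slide above.
    """
    return 100.0 * (extime_y / extime_x - 1.0)


# Slide example: Y takes 15 seconds, X takes 10 seconds.
print(percent_faster(extime_x=10.0, extime_y=15.0))
```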

(5)

Amdahl's Law

Speedup due to enhancement E:

$$\text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$$

Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected, then:

ExTime(E) =

Speedup(E) =

(6)

Amdahl’s Law

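$$\text{ExTime}_{new} = \text{ExTime}_{old} \times \left[ (1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} \right]$$

$$\text{Speedup}_{overall} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$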

(7)

Amdahl’s Law

• Floating point instructions improved to run 2X; but only 10% of actual instructions are FP

$$\text{Speedup}_{overall} = \;?$$

$$\text{ExTime}_{new} = \;?$$

(8)

Amdahl’s Law

• Floating point instructions improved to run 2X; but only 10% of actual instructions are FP

$$\text{ExTime}_{new} = \text{ExTime}_{old} \times (0.9 + 0.1/2) = 0.95 \times \text{ExTime}_{old}$$

$$\text{Speedup}_{overall} = \frac{1}{0.95} = 1.053$$
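A minimal sketch of Amdahl's Law as a function, reproducing the floating-point example. The name `amdahl_speedup` and its parameter names are illustrative choices, not from the slides.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when a fraction of the task is accelerated.

    Speedup_overall = 1 / ((1 - F) + F / S)
    """
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Slide example: FP instructions run 2X faster, but are only 10% of instructions.
print(round(amdahl_speedup(0.10, 2.0), 3))
```

Even an unboundedly large `speedup_enhanced` cannot push the result past 1 / (1 - 0.10), which is why the unaffected fraction dominates.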

(9)

Aspects of CPU Performance

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

|              | Inst Count | CPI | Clock Rate |
|--------------|------------|-----|------------|
| Program      | X          |     |            |
| Compiler     | X          |     |            |
| Inst. Set    | X          | X   |            |
| Organization |            | X   | X          |
| Technology   |            |     | X          |

(10)

Example

Base Machine (Reg / Reg), typical mix:

| Op     | Freq | Cycles |
|--------|------|--------|
| ALU    | 50%  | 1      |
| Load   | 20%  | 2      |
| Store  | 10%  | 2      |
| Branch | 20%  | 2      |

Add register / memory operations:

– One source operand in memory

– One source operand in register

– Cycle count of 2

Branch cycle count to increase to 3.

What fraction of the loads must be eliminated for this to pay off?

(11)

Example Solution

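Exec Time = Instr Cnt x CPI x Clock

| Op      | Old Freq | Old Cycles | Old Freq x Cycles | New Freq | New Cycles | New Freq x Cycles |
|---------|----------|------------|-------------------|----------|------------|-------------------|
| ALU     | .50      | 1          | .5                | .5 - X   | 1          | .5 - X            |
| Load    | .20      | 2          | .4                | .2 - X   | 2          | .4 - 2X           |
| Store   | .10      | 2          | .2                | .1       | 2          | .2                |
| Branch  | .20      | 2          | .4                | .2       | 3          | .6                |
| Reg/Mem |          |            |                   | X        | 2          | 2X                |
| Total   | 1.00     |            | CPI = 1.5         | 1 - X    |            | CPI = (1.7 - X)/(1 - X) |

$$\text{Instr Cnt}_{Old} \times \text{CPI}_{Old} \times \text{Clock}_{Old} = \text{Instr Cnt}_{New} \times \text{CPI}_{New} \times \text{Clock}_{New}$$

$$1.00 \times 1.5 = (1 - X) \times \frac{1.7 - X}{1 - X}$$

$$1.5 = 1.7 - X \quad\Rightarrow\quad X = 0.2$$

Loads are only 20% of the instruction mix, so ALL loads must be eliminated for this to be a win!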
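The break-even point can also be found numerically. The sketch below assumes, as the solution does, that the clock is unchanged and that each new register/memory instruction replaces exactly one ALU instruction and one load. The names `OLD_MIX`, `old_cycles`, `new_cycles`, and `break_even_x` are illustrative.

```python
# (frequency, cycles) per instruction class on the base machine.
OLD_MIX = {"ALU": (0.50, 1), "Load": (0.20, 2), "Store": (0.10, 2), "Branch": (0.20, 2)}


def old_cycles() -> float:
    """Cycles per original instruction on the base machine (the old CPI)."""
    return sum(freq * cycles for freq, cycles in OLD_MIX.values())


def new_cycles(x: float) -> float:
    """Total cycles, per original instruction, after the change.

    x is the fraction of the original instruction count converted to
    reg/mem operations (2 cycles each); branches now take 3 cycles.
    """
    return (0.50 - x) * 1 + (0.20 - x) * 2 + 0.10 * 2 + 0.20 * 3 + x * 2


def break_even_x() -> float:
    """Solve new_cycles(x) == old_cycles(); new_cycles is linear in x."""
    slope = new_cycles(1.0) - new_cycles(0.0)
    return (old_cycles() - new_cycles(0.0)) / slope


print(old_cycles(), break_even_x())
```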

(12)

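Integrated Circuits Costs

$$\text{IC cost} = \frac{\text{Die cost} + \text{Testing cost} + \text{Packaging cost}}{\text{Final test yield}}$$

$$\text{Die cost} = \frac{\text{Wafer cost}}{\text{Dies per wafer} \times \text{Die yield}}$$

$$\text{Dies per wafer} = \frac{\pi \times (\text{Wafer\_diam}/2)^2}{\text{Die\_Area}} - \frac{\pi \times \text{Wafer\_diam}}{\sqrt{2 \times \text{Die\_Area}}}$$

$$\text{Die yield} = \text{Wafer yield} \times \left( 1 + \frac{\text{Defects\_per\_unit\_area} \times \text{Die\_Area}}{\alpha} \right)^{-\alpha}$$

Die cost goes roughly with (die area)^4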
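The cost formulas chain together naturally as functions. This is a sketch under stated assumptions: the function names are illustrative, and the numbers in the usage lines (20 cm wafer, 2.25 cm² die, 0.8 defects per cm², α = 3) are sample inputs chosen for the demonstration, not values from these slides.

```python
import math


def dies_per_wafer(wafer_diam: float, die_area: float) -> float:
    """Wafer area over die area, minus the dies lost along the round edge."""
    return (math.pi * (wafer_diam / 2) ** 2 / die_area
            - math.pi * wafer_diam / math.sqrt(2 * die_area))


def die_yield(wafer_yield: float, defects_per_unit_area: float,
              die_area: float, alpha: float) -> float:
    """Wafer yield * (1 + defects * area / alpha) ** (-alpha)."""
    return wafer_yield * (1 + defects_per_unit_area * die_area / alpha) ** (-alpha)


def die_cost(wafer_cost: float, wafer_diam: float, die_area: float,
             wafer_yield: float, defects_per_unit_area: float, alpha: float) -> float:
    """Wafer cost divided by the number of good dies per wafer."""
    good_dies = (dies_per_wafer(wafer_diam, die_area)
                 * die_yield(wafer_yield, defects_per_unit_area, die_area, alpha))
    return wafer_cost / good_dies


# Sample inputs (assumed for illustration only).
print(dies_per_wafer(20.0, 2.25))
print(die_yield(1.0, 0.8, 2.25, 3.0))
```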

(13)

Instruction Set Design

[Figure: the instruction set sits as the interface between software and hardware.]

(14)

Basic ISA Classes

Accumulator:

add A : acc ← acc + mem[A]

Stack:

add : top ← top + next

General Purpose Register (register-memory):

add R1, A : R1 ← R1 + mem[A]

Load/Store (register to register):

add Ra Rb Rc : Ra ← Rb + Rc

load Ra Rb : Ra ← mem[Rb]

store Ra Rb : mem[Rb] ← Ra

(15)

Complex instruction set computer:

• Design goal

– simplify compilation of high-level languages

– optimize code size

• Variable format, 2 and 3 operand instruction

• Rich set of addressing modes (apply to any operand)

• Rich set of operations

• Rich set of data types (B, W, L, Q, O, F, D, G, H)

• Condition codes

• Examples: VAX, Intel

Problem: increases hardware design complexity!

(16)

Reduced Instruction Set Architecture

• Instruction set simplicity leads to a faster machine

– efficient pipelining

• 32-bit fixed format instruction (3 formats)

• 32 32-bit GPR

• 3-operand, reg-reg arithmetic instruction

• Supporting very few addressing modes for load/store

– displacement

– immediate

• Simple branch conditions

• Delayed branch

see: SPARC, MIPS, MC88100, AMD2900, i960, i860, PARisc, DEC Alpha, Clipper, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3

(17)

Example: DLX

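I-type instruction

| Op | rs | rd | Immediate |
|----|----|----|-----------|
| 6  | 5  | 5  | 16        |

Load, store, conditional branch

example: load rd, mem(rs+immediate)

R-type instruction

| Op | rs1 | rs2 | rd | func |
|----|-----|-----|----|------|
| 6  | 5   | 5   | 5  | 11   |

Register-register ALU operations: rd <- rs1 func rs2

J-type instruction

| Op | Offset added to PC |
|----|--------------------|
| 6  | 26                 |

Jump and jump-link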
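The field widths above are enough to sketch how a 32-bit instruction word could be packed. The code assumes the fields are laid out from the most significant bit in the order listed, which the slide does not state explicitly. The opcode and function values in the usage lines are placeholders, not real DLX encodings.

```python
def encode_i_type(op: int, rs: int, rd: int, imm: int) -> int:
    """Pack Op(6) | rs(5) | rd(5) | Immediate(16); imm may be negative."""
    return ((op & 0x3F) << 26) | ((rs & 0x1F) << 21) | ((rd & 0x1F) << 16) | (imm & 0xFFFF)


def decode_i_type(word: int) -> tuple:
    """Unpack an I-type word into (op, rs, rd, unsigned 16-bit immediate)."""
    return (word >> 26) & 0x3F, (word >> 21) & 0x1F, (word >> 16) & 0x1F, word & 0xFFFF


def encode_r_type(op: int, rs1: int, rs2: int, rd: int, func: int) -> int:
    """Pack Op(6) | rs1(5) | rs2(5) | rd(5) | func(11)."""
    return (((op & 0x3F) << 26) | ((rs1 & 0x1F) << 21) | ((rs2 & 0x1F) << 16)
            | ((rd & 0x1F) << 11) | (func & 0x7FF))


def encode_j_type(op: int, offset: int) -> int:
    """Pack Op(6) | 26-bit offset added to PC; offset may be negative."""
    return ((op & 0x3F) << 26) | (offset & 0x3FFFFFF)


# Placeholder opcode 0x23: load rd=2 from mem(rs=1 + 100).
word = encode_i_type(0x23, rs=1, rd=2, imm=100)
print(hex(word), decode_i_type(word))
```

All three formats keep the opcode in the same six bits, so a decoder can pick the format before it looks at any other field.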

(18)

5 Steps of MIPS Datapath

[Figure 3.4, page 134, CA:AQA 2e. Five-stage datapath: Instruction Fetch | Instr. Decode / Reg. Fetch | Execute / Addr. Calc | Memory Access | Write Back. Pipeline registers: IF/ID, ID/EX, EX/MEM, MEM/WB. Labeled components: Next PC, Next SEQ PC, Adder (+4), instruction Memory (Address), Reg File (RS1, RS2, RD), Sign Extend (Imm), MUXes, ALU, Zero? test, Data Memory, WB Data.]

(19)

Visualizing Pipelining

[Figure 3.3, page 133, CA:AQA 2e. Horizontal axis: time (clock cycles), Cycle 1 to Cycle 7. Vertical axis: instruction order. Each of four instructions passes through Ifetch, Reg, ALU, DMem, Reg, starting one cycle after the previous instruction.]

(20)

Ideal speedup from pipelining

• Ideal case: an instruction is issued every cycle

– ideal pipelined CPI = 1

(21)

What is the reality?

• Pipeline overhead

– pipeline register delay and clock skew

– $\text{Clock Cycle}_{pipelined}$ could be larger than $\text{Clock Cycle}_{unpipelined}$

• Pipeline hazard

– prevent the CPU from issuing one instruction every cycle

(22)

Hazards

• Limits to pipelining: Hazards prevent next instruction from executing during its designated clock cycle

– Structural hazards: HW cannot support this combination of instructions

– Data hazards: Instruction depends on result of prior instruction still in the pipeline

– Control hazards: Pipelining of branches & other instructions

• Common solution is to stall the pipeline until the hazard is resolved.

(23)

How does hazard affect performance?

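$$\text{CPI}_{pipelined} = \text{Ideal CPI} + \text{Pipeline stall clock cycles per instr}$$

$$\text{Speedup} = \frac{\text{CPI}_{unpipelined} \times \text{Clock Cycle}_{unpipelined}}{\text{CPI}_{pipelined} \times \text{Clock Cycle}_{pipelined}} = \frac{\text{Ideal CPI} \times \text{Pipeline depth}}{\text{Ideal CPI} + \text{Pipeline stall CPI}} \times \frac{\text{Clock Cycle}_{unpipelined}}{\text{Clock Cycle}_{pipelined}}$$

Taking Ideal CPI = 1:

$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall CPI}} \times \frac{\text{Clock Cycle}_{unpipelined}}{\text{Clock Cycle}_{pipelined}}$$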
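A small sketch of the final speedup formula (the Ideal CPI = 1 case). The function name and the sample numbers in the usage lines (depth 5, 0.25 stall cycles per instruction) are assumptions for illustration.

```python
def pipeline_speedup(pipeline_depth: int, stall_cpi: float,
                     clock_ratio: float = 1.0) -> float:
    """Speedup = depth / (1 + stall CPI) * (unpipelined clock / pipelined clock).

    clock_ratio is Clock Cycle(unpipelined) / Clock Cycle(pipelined); 1.0 means
    pipelining adds no cycle-time overhead.
    """
    return pipeline_depth / (1.0 + stall_cpi) * clock_ratio


# No stalls: speedup equals the pipeline depth.
print(pipeline_speedup(5, 0.0))
# Sample case: 0.25 stall cycles per instruction on average.
print(pipeline_speedup(5, 0.25))
```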

(24)

Three Generic Data Hazards

• Read After Write (RAW)

– Add r1, r2, r3

– Add r1, r1, r2

• Write After Read (WAR)

– Add r2, r1, r3

– Add r1, r4, r5

• Write After Write (WAW)

– Add r1, r2, r3

– Add r1, r4, r5
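The three hazard types reduce to simple register comparisons between an earlier and a later instruction. The sketch below uses a made-up representation, a tuple `(dest, src1, src2)`, and a hypothetical helper `hazards`. The RAW pair on the slide also shares its destination register, so the check reports a WAW dependence for it as well.

```python
def hazards(first: tuple, second: tuple) -> set:
    """Classify data dependences between two 3-operand instructions.

    Each instruction is (dest, src1, src2); `first` precedes `second`.
    """
    dest1, *srcs1 = first
    dest2, *srcs2 = second
    found = set()
    if dest1 in srcs2:      # second reads what first writes
        found.add("RAW")
    if dest2 in srcs1:      # second writes what first reads
        found.add("WAR")
    if dest1 == dest2:      # both write the same register
        found.add("WAW")
    return found


print(hazards(("r1", "r2", "r3"), ("r1", "r1", "r2")))  # RAW pair from the slide
print(hazards(("r2", "r1", "r3"), ("r1", "r4", "r5")))  # WAR pair
print(hazards(("r1", "r2", "r3"), ("r1", "r4", "r5")))  # WAW pair
```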

(25)

Forwarding to Avoid Data Hazard

[Figure 3.10, page 149, CA:AQA 2e. Horizontal axis: time (clock cycles). Vertical axis: instruction order. Each instruction below passes through Ifetch, Reg, ALU, DMem, Reg, one cycle behind the previous one; the result r1 of the add is forwarded to the instructions that follow.]

add r1,r2,r3

sub r4,r1,r3

and r6,r1,r7

or r8,r1,r9

xor r10,r1,r11

(26)

Software Scheduling to Avoid Load Hazards

Try producing fast code for

a = b + c;

d = e - f;

assuming a, b, c, d, e, and f in memory.

Slow code:

LW Rb,b

LW Rc,c

ADD Ra,Rb,Rc

SW a,Ra

LW Re,e

LW Rf,f

SUB Rd,Re,Rf

SW d,Rd

Fast code:

LW Rb,b

LW Rc,c

LW Re,e

ADD Ra,Rb,Rc

LW Rf,f

SW a,Ra

SUB Rd,Re,Rf

SW d,Rd
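To see why the second ordering is faster, one can count load-use stalls in each sequence. This sketch assumes a pipeline with forwarding in which only an instruction that uses a loaded register immediately after the load stalls, for one cycle. The helpers `parse` and `load_use_stalls` are illustrative and handle only the four opcodes used here.

```python
SLOW = ["LW Rb,b", "LW Rc,c", "ADD Ra,Rb,Rc", "SW a,Ra",
        "LW Re,e", "LW Rf,f", "SUB Rd,Re,Rf", "SW d,Rd"]
FAST = ["LW Rb,b", "LW Rc,c", "LW Re,e", "ADD Ra,Rb,Rc",
        "LW Rf,f", "SW a,Ra", "SUB Rd,Re,Rf", "SW d,Rd"]


def parse(line: str) -> tuple:
    """Return (opcode, destination register or None, source registers)."""
    op, operands = line.split(None, 1)
    fields = [f.strip() for f in operands.split(",")]
    if op == "LW":                       # LW reg,addr: writes reg
        return op, fields[0], []
    if op == "SW":                       # SW addr,reg: reads reg
        return op, None, [fields[1]]
    return op, fields[0], fields[1:]     # ADD/SUB dest,src1,src2


def load_use_stalls(code: list) -> int:
    """Count instructions that read a register loaded by the previous one."""
    stalls = 0
    previous_load_dest = None
    for line in code:
        op, dest, sources = parse(line)
        if previous_load_dest is not None and previous_load_dest in sources:
            stalls += 1
        previous_load_dest = dest if op == "LW" else None
    return stalls


print(load_use_stalls(SLOW), load_use_stalls(FAST))
```

Under this model the slow code stalls at ADD and at SUB, while the fast code places an independent instruction after each critical load.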

(27)

Branch Operation

[Figure 3.4, page 134, CA:AQA 2e. The five-stage datapath (Instruction Fetch | Instr. Decode / Reg. Fetch | Execute / Addr. Calc | Memory Access | Write Back) with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB, shown here to illustrate branch operation: the Zero? test and the Next PC MUX select between Next SEQ PC and the branch target.]

(28)

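Control Hazard on Branches: Three Stage Stall

| Instruction    | 1  | 2  | 3  | 4   | 5  | 6  | 7  | 8   | 9  |
|----------------|----|----|----|-----|----|----|----|-----|----|
| Branch Inst    | IF | ID | EX | MEM | WB |    |    |     |    |
| Br successor   |    | IF | S  | S   | IF | ID | EX | MEM | WB |
| Br successor+1 |    |    |    |     |    | IF | ID | EX  |    |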

(29)

New Pipelined DLX Datapath

[Figure 3.22, page 163, CA:AQA 2/e. Stages: Instruction Fetch | Instr. Decode / Reg. Fetch | Execute / Addr. Calc | Memory Access | Write Back. Pipeline registers: IF/ID, ID/EX, EX/MEM, MEM/WB. Labeled components: Next PC, Next SEQ PC, two Adders (one adding 4), instruction Memory (Address), Reg File (RS1, RS2, RD), Sign Extend (Imm), Zero? test, MUXes, ALU, Data Memory, WB Data.]

(30)

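Control Hazard on Branches: One Stage Stall

| Instruction    | 1  | 2  | 3  | 4   | 5  | 6  | 7   | 8  |
|----------------|----|----|----|-----|----|----|-----|----|
| Branch Inst    | IF | ID | EX | MEM | WB |    |     |    |
| Br successor   |    | IF | IF | ID  | EX | MEM | WB |    |
| Br successor+1 |    |    |    | IF  | ID | EX |     |    |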

(31)

Four Branch Hazard Alternatives

#1: Stall until branch direction is clear

#2: Predict Branch Not Taken

– Execute successor instructions in sequence

– squash instructions in pipeline if branch actually taken

#3: Predict Branch Taken

- can’t implement in the DLX pipeline

(32)

Four Branch Hazard Alternatives

#4: Delayed Branch

– schedule instructions into branch-delay slots (branch delay of length n):

branch instruction

sequential successor 1

sequential successor 2

...

sequential successor n

branch target if taken

– 1 slot delay allows proper decision and branch target address in 5 stage pipeline

– DLX uses this

(33)

Multi-Cycle DLX Pipe

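[Figure: multi-cycle DLX pipeline. After IF and ID, an instruction enters one of four execution paths: EX; M1 through M7; A1 through A4; or DIV. All paths rejoin at MEM and WB.]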

(34)

Instruction scheduling

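| Instruction | 1  | 2  | 3  | 4  | 5  | 6   | 7  | 8   | 9  | 10  | 11 |
|-------------|----|----|----|----|----|-----|----|-----|----|-----|----|
| MULTD       | IF | ID | M1 | M2 | M3 | M4  | M5 | M6  | M7 | MEM | WB |
| ADDD        |    | IF | ID | A1 | A2 | A3  | A4 | MEM | WB |     |    |
| LD          |    |    | IF | ID | EX | MEM | WB |     |    |     |    |
| SD          |    |    |    | IF | ID | EX  | MEM | WB |    |     |    |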

Multi-cycle vs. single cycle DLX pipeline

• Introducing new hazards

• Instructions complete out of order
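The out-of-order completion can be reproduced by computing each instruction's write-back cycle. This sketch assumes one instruction is fetched per cycle with no stalls, and takes the execute latencies from the stage labels in the sequence above (M1 to M7, A1 to A4, one EX stage). The names `EXECUTE_STAGES` and `write_back_cycles` are illustrative.

```python
# Number of execute stages per opcode, read off the stage labels (assumed).
EXECUTE_STAGES = {"MULTD": 7, "ADDD": 4, "LD": 1, "SD": 1}


def write_back_cycles(program: list) -> list:
    """Return (opcode, WB cycle) pairs for instructions issued back to back.

    Instruction i (0-based) is fetched in cycle i + 1, decoded in the next
    cycle, spends its execute stages, then takes one cycle each for MEM and WB.
    """
    return [(op, i + 1 + 1 + EXECUTE_STAGES[op] + 2) for i, op in enumerate(program)]


program = ["MULTD", "ADDD", "LD", "SD"]
cycles = write_back_cycles(program)
print(cycles)
# Sorting by WB cycle gives the completion order, which differs from issue order.
print([op for op, _ in sorted(cycles, key=lambda pair: pair[1])])
```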

(35)

Precise Exceptions

inst 1

inst 2

::::

inst i-1

(inst 1 through inst i-1: complete)

inst i <- faulting instruction

inst i+1

inst i+2

::::

(inst i+1 onward: should not affect the machine state)

Problem & Solution ?
