EI 338: Computer Systems Engineering

(1)

EI 338: Computer Systems Engineering

(Operating Systems & Computer Architecture)

Dept. of Computer Science & Engineering Chentao Wu

[email protected]

(2)

Download lectures

• ftp://public.sjtu.edu.cn

• User: wuct

• Password: wuct123456

• http://www.cs.sjtu.edu.cn/~wuct/cse/

(3)


Appendix A

Instruction Set Principles

Computer Architecture

A Quantitative Approach, Fifth Edition

(4)


Outline



Instruction Set Architecture



5 stage pipelining



Structural and Data Hazards



Forwarding



Branch Schemes



Exceptions and Interrupts



Conclusion

(5)

Instruction Set Architecture



Instruction set architecture is the structure of a computer that a machine language programmer must understand to write a correct (timing independent) program for that machine.

The instruction set architecture is also the machine description that a hardware designer must understand to design a correct implementation of the computer.

(6)

Evolution of Instruction Sets

Single Accumulator (EDSAC 1950)
  -> Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)
  -> Separation of Programming Model from Implementation
       - High-level Language Based (B5000 1963)
       - Concept of a Family (IBM 360 1964)
  -> General Purpose Register Machines
       - Complex Instruction Sets (Vax, Intel 432 1977-80)
       - Load/Store Architecture (CDC 6600, Cray 1 1963-76)
           -> RISC (Mips, Sparc, HP-PA, IBM RS6000, PowerPC . . . 1987)
           -> LIW/"EPIC"? (IA-64 . . . 1999)

(7)

Evolution of Instruction Sets



Major advances in computer architecture are typically associated with landmark instruction set designs

 Ex: Stack vs GPR (System 360)



Design decisions must take into account:

 technology

 machine organization

 programming languages

 compiler technology

 operating systems



And they in turn influence these

(8)

Instructions Can Be Divided into 3 Classes (I)



Data movement instructions

 Move data from a memory location or register to another memory location or register without changing its form

 Load—source is memory and destination is register

 Store—source is register and destination is memory



Arithmetic and logic (ALU) instructions

 Change the form of one or more operands to produce a result stored in another location

 Add, Sub, Shift, etc.



Branch instructions (control flow instructions)

 Alter the normal flow of control from executing the next instruction in sequence

 Br Loc, Brz Loc2,—unconditional or conditional branches

(9)

Classifying ISAs

Accumulator (before 1960):
  1 address    add A             acc <- acc + mem[A]

Stack (1960s to 1970s):
  0 address    add               tos <- tos + next

Memory-Memory (1970s to 1980s):
  2 address    add A, B          mem[A] <- mem[A] + mem[B]
  3 address    add A, B, C       mem[A] <- mem[B] + mem[C]

Register-Memory (1970s to present):
  2 address    add R1, A         R1 <- R1 + mem[A]
               load R1, A        R1 <- mem[A]

Register-Register (Load/Store) (1960s to present):
  3 address    add R1, R2, R3    R1 <- R2 + R3
               load R1, R2       R1 <- mem[R2]
               store R1, R2      mem[R1] <- R2

(10)

Classifying ISAs

(11)

Stack Architectures



Instruction set:

add, sub, mult, div, . . . push A, pop A



Example: A*B - (A+C*B)

Instruction    Stack after instruction (bottom ... top)
push A         A
push B         A, B
mul            A*B
push A         A*B, A
push C         A*B, A, C
push B         A*B, A, C, B
mul            A*B, A, B*C
add            A*B, A+B*C
sub            result
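The stack trace above can be reproduced with a small interpreter. This is a minimal sketch of a hypothetical toy stack machine, not any real ISA; the function name `run_stack_program`, the sample values for A, B, and C, and the operand order of `sub` (next minus top of stack) are assumptions made for illustration.

```python
# Toy stack machine: a hypothetical model for illustration, not a real ISA.
def run_stack_program(program, memory):
    """Run (opcode, operand) pairs; return the final stack and a per-step trace."""
    stack, trace = [], []
    for opcode, operand in program:
        if opcode == "push":
            stack.append(memory[operand])   # push mem[operand]
        elif opcode == "pop":
            memory[operand] = stack.pop()   # mem[operand] <- tos
        elif opcode in ("add", "sub", "mul"):
            top = stack.pop()               # tos
            nxt = stack.pop()               # next
            if opcode == "add":
                stack.append(nxt + top)
            elif opcode == "sub":
                stack.append(nxt - top)     # assumed order: next - tos
            else:
                stack.append(nxt * top)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        trace.append(list(stack))           # snapshot, bottom ... top
    return stack, trace


# A*B - (A+C*B), in the same instruction order as the slide.
PROGRAM = [
    ("push", "A"), ("push", "B"), ("mul", None),
    ("push", "A"), ("push", "C"), ("push", "B"),
    ("mul", None), ("add", None), ("sub", None),
]
memory = {"A": 2, "B": 3, "C": 4}           # sample values (assumed)
final_stack, trace = run_stack_program(PROGRAM, memory)
for (opcode, operand), snapshot in zip(PROGRAM, trace):
    print(f"{opcode} {operand or ''}".strip().ljust(8), snapshot)
```

No instruction other than push and pop names an operand, which is where the code density of a stack machine comes from.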

(12)

Stacks: Pros and Cons



Pros

 Good code density (implicit operand addressing top of stack)

 Low hardware requirements

 Easy to write a simpler compiler for stack architectures



Cons

 Stack becomes the bottleneck

 Little ability for parallelism or pipelining

 Data is not always at the top of the stack when needed, so additional instructions like TOP and SWAP are needed

 Difficult to write an optimizing compiler for stack architectures

(13)

Accumulator Architectures

• Instruction set:

add A, sub A, mult A, div A, . . . load A, store A

• Example: A*B - (A+C*B)

Instruction    Accumulator after instruction
load B         B
mul C          B*C
add A          A+B*C
store D        A+B*C
load A         A
mul B          A*B
sub D          result
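The accumulator sequence can be checked the same way. The sketch below models a hypothetical one-address machine; `run_accumulator_program` and the sample values are assumptions. It also counts data memory references, since every instruction names exactly one memory operand.

```python
# Toy accumulator machine: a hypothetical model for illustration.
def run_accumulator_program(program, memory):
    """Run (opcode, address) pairs; return (accumulator, data memory references)."""
    acc = 0
    data_refs = 0
    for opcode, address in program:
        data_refs += 1                      # one memory operand per instruction
        if opcode == "load":
            acc = memory[address]           # acc <- mem[A]
        elif opcode == "store":
            memory[address] = acc           # mem[A] <- acc
        elif opcode == "add":
            acc = acc + memory[address]
        elif opcode == "sub":
            acc = acc - memory[address]
        elif opcode == "mul":
            acc = acc * memory[address]
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return acc, data_refs


# A*B - (A+C*B), in the same instruction order as the slide.
PROGRAM = [
    ("load", "B"), ("mul", "C"), ("add", "A"), ("store", "D"),
    ("load", "A"), ("mul", "B"), ("sub", "D"),
]
memory = {"A": 2, "B": 3, "C": 4, "D": 0}   # sample values (assumed)
result, data_refs = run_accumulator_program(PROGRAM, memory)
print(result, data_refs)
```

Seven instructions produce seven data references, including a store and reload of the temporary D.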

(14)

Accumulators: Pros and Cons

• Pros

– Very low hardware requirements

– Easy to design and understand

• Cons

– Accumulator becomes the bottleneck

– Little ability for parallelism or pipelining

– High memory traffic

(15)

Memory-Memory Architectures

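Instruction set:
(3 operands) add A, B, C    sub A, B, C    mul A, B, C

Example: A*B - (A+C*B)
– 3 operands
mul D, A, B      /* D = A*B */
mul E, C, B      /* E = C*B */
add E, A, E      /* E = A + C*B */
sub E, D, E      /* E = A*B - (A + C*B) */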

(16)

Memory-Memory: Pros and Cons

• Pros

– Requires fewer instructions (especially if 3 operands)

– Easy to write compilers for (especially if 3 operands)

• Cons

– Very high memory traffic (especially if 3 operands)

– Variable number of clocks per instruction (especially if 2 operands)

– With two operands, more data movements are required

(17)

Register-Memory Architectures

Instruction set:
add R1, A    sub R1, A    mul R1, B    load R1, A    store R1, A

Example: A*B - (A+C*B)
load R1, C
mul R1, B        /* C*B */
add R1, A        /* A + C*B */
store R1, D
load R2, A
mul R2, B        /* A*B */
sub R2, D        /* A*B - (A + C*B) */

(18)

Register-Memory: Pros and Cons

• Pros

– Some data can be accessed without loading first

– Instruction format easy to encode

– Good code density

• Cons

– Operands are not equivalent (poor orthogonality)

– Variable number of clocks per instruction

– May limit number of registers

(19)

Load-Store Architectures

Instruction set:
add R1, R2, R3    sub R1, R2, R3    mul R1, R2, R3    load R1, R4    store R1, R4

Example: A*B - (A+C*B)
load R1, &A
load R2, &B
load R3, &C
load R4, R1
load R5, R2
load R6, R3
mul R7, R6, R5       /* C*B */
add R8, R7, R4       /* A + C*B */
mul R9, R4, R5       /* A*B */
sub R10, R9, R8      /* A*B - (A+C*B) */

(20)

Load-Store: Pros and Cons

• Pros

– Simple, fixed length instruction encoding

– Instructions take similar number of cycles

– Relatively easy to pipeline

• Cons

– Higher instruction count

– Not all instructions need three operands

– Dependent on good compiler

(21)

Registers: Advantages and Disadvantages

• Advantages

– Faster than cache (no addressing mode or tags)

– Deterministic (no misses)

– Can replicate (multiple read ports)

– Short identifier (typically 3 to 8 bits)

– Reduce memory traffic

• Disadvantages

– Need to save and restore on procedure calls and context switch

– Can’t take the address of a register (for pointers)

– Fixed size (can’t store strings or structures efficiently)

– Compiler must manage

(22)

General Register Machine and Instruction Formats

[Figure: a CPU with a program counter and registers R2, R4, R6, R8, connected to a memory that holds operand Op1 at address Op1Addr and the next instruction (Nexti).]

Instruction formats:
load R8, Op1 (R8 <- Op1)          fields: load | R8 | Op1Addr
add R2, R4, R6 (R2 <- R4 + R6)    fields: add | R2 | R4 | R6

(23)

General Register Machine and Instruction Formats



It is the most common choice in today’s general-purpose computers



Which register is specified by small “address”

(3 to 6 bits for 8 to 64 registers)



Load and store have one long and one short address: one and a half addresses



Arithmetic instruction has 3 “half” addresses
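The "3 to 6 bits for 8 to 64 registers" figure follows from needing enough bits to number every register. A minimal sketch, with the helper name `register_field_bits` chosen here for illustration:

```python
def register_field_bits(num_registers):
    """Bits needed for a register specifier that can name every register."""
    if num_registers < 1:
        raise ValueError("need at least one register")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without float rounding.
    return (num_registers - 1).bit_length()


for count in (8, 16, 32, 64):
    print(count, "registers ->", register_field_bits(count), "bits")

# Three "half" addresses of a register-register ALU instruction with 32 registers.
alu_operand_bits = 3 * register_field_bits(32)
print(alu_operand_bits)
```

With 32 registers, the three register specifiers together take 15 bits, which is why they fit comfortably in a 32-bit instruction while a full memory address would not.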

(24)

Real Machines Are Not So Simple



Most real machines have a mixture of 3-, 2-, 1-, 0-, and 1½-address instructions



A distinction can be made on whether

arithmetic instructions use data from memory



If ALU instructions only use registers for operands and result, machine type is load-store

 Only load and store instructions reference memory



Other machines have a mix of register-memory and memory-memory instructions

(25)

Alignment Issues

• If the architecture does not restrict memory accesses to be aligned then

– Software is simple

– Hardware must detect misalignment and make 2 memory accesses

– Expensive detection logic is required

– All references can be made slower

• Sometimes unrestricted alignment is required for backwards compatibility

• If the architecture restricts memory accesses to be aligned then

– Software must guarantee alignment

– Hardware detects misaligned accesses and traps

– No extra time is spent when data is aligned

• Since we want to make the common case fast, having restricted alignment is often a better choice, unless compatibility is an issue
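The alignment check and the cost of a misaligned access can be sketched as follows. The function names and the 8-byte memory width are assumptions for illustration; real hardware widths vary.

```python
def is_aligned(address, size):
    """An access of `size` bytes is aligned when its address is a multiple of `size`."""
    return address % size == 0


def memory_accesses(address, size, memory_width=8):
    """Count the memory_width-byte blocks an access touches (memory_width is assumed)."""
    first_block = address // memory_width
    last_block = (address + size - 1) // memory_width
    return last_block - first_block + 1


print(is_aligned(8, 4), memory_accesses(8, 4))   # aligned word: one access
print(is_aligned(6, 4), memory_accesses(6, 4))   # straddles two blocks: two accesses
```

An aligned access whose size divides the memory width never crosses a block boundary, which is why restricting alignment removes the two-access case.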

(26)

Types of Addressing Modes (VAX)

1. Register direct         Ri
2. Immediate (literal)     #n
3. Displacement            M[Ri + #n]
4. Register indirect       M[Ri]
5. Indexed                 M[Ri + Rj]
6. Direct (absolute)       M[#n]
7. Memory indirect         M[M[Ri]]
8. Autoincrement           M[Ri++]
9. Autodecrement           M[Ri--]
10. Scaled                 M[Ri + Rj*d + #n]

[Figure: operands drawn from the register file and from memory]
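For the modes that reference memory, the effective address is the expression inside the outermost M[...]. The sketch below is an illustrative model, not VAX semantics in detail: the function name, the dictionary-based register file and memory, and the sample values are assumptions, and the autoincrement and autodecrement modes are left out because they also modify Ri.

```python
def effective_address(mode, regs, mem, ri=None, rj=None, n=0, d=1):
    """Return the memory address that the given addressing mode refers to."""
    if mode == "displacement":
        return regs[ri] + n                  # M[Ri + #n]
    if mode == "register_indirect":
        return regs[ri]                      # M[Ri]
    if mode == "indexed":
        return regs[ri] + regs[rj]           # M[Ri + Rj]
    if mode == "direct":
        return n                             # M[#n]
    if mode == "memory_indirect":
        return mem[regs[ri]]                 # M[M[Ri]]: the address is read from memory
    if mode == "scaled":
        return regs[ri] + regs[rj] * d + n   # M[Ri + Rj*d + #n]
    raise ValueError(f"unsupported mode: {mode}")


regs = {"R1": 100, "R2": 3}                  # sample register contents (assumed)
mem = {100: 500}                             # sample memory contents (assumed)
print(effective_address("displacement", regs, mem, ri="R1", n=8))
print(effective_address("memory_indirect", regs, mem, ri="R1"))
print(effective_address("scaled", regs, mem, ri="R1", rj="R2", n=8, d=4))
```

Register indirect is displacement with n = 0, and direct is displacement with a base of zero, which is how a machine with few modes can still cover several entries of this list.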

(27)

Summary of Use of Addressing Modes

(28)

Distribution of Displacement Values

(29)

Frequency of Immediate Operands

(30)

Types of Operations



Arithmetic and Logic: AND, ADD

Data Transfer: MOVE, LOAD, STORE

Control: BRANCH, JUMP, CALL

System: OS CALL, VM

Floating Point: ADDF, MULF, DIVF

Decimal: ADDD, CONVERT

String: MOVE, COMPARE

Graphics: (DE)COMPRESS

(31)

Distribution of Data Accesses by Size

(32)

Relative Frequency of Control Instructions

(33)

Control instructions (contd.)



Addressing modes



PC-relative addressing (independent of where the program is loaded, and branch targets are usually close by)

 Requires displacement (how many bits?)

 Determined via empirical study. [8-16 works!]



For procedure returns/indirect jumps/kernel traps, target may not be known at compile time.

 Jump based on contents of register

 Useful for switch/(virtual) functions/function ptrs/dynamically linked libraries etc.
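The two questions on this slide, whether a displacement fits in the field and why PC-relative code is independent of load address, can be sketched as below. The helper names are assumptions, and the target is computed as simply PC plus displacement; real machines often add the displacement to the address of the following instruction and scale it by the instruction size.

```python
def fits_signed(value, bits):
    """True if `value` is representable as a two's-complement field of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def branch_target(pc, displacement):
    """Simplified PC-relative target: PC + displacement (scaling omitted)."""
    return pc + displacement


# A branch 40 bytes forward reaches the same relative spot at any load address.
for load_base in (0x1000, 0x8000):
    branch_pc = load_base + 0x20
    print(hex(branch_target(branch_pc, 40) - load_base))

print(fits_signed(127, 8), fits_signed(128, 8), fits_signed(-30000, 16))
```

Because only the distance is encoded, the same instruction bits work wherever the program is loaded, and the field only needs enough bits to cover typical branch distances.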

(34)

Branch Distances (in terms of number of instructions)

(35)

Frequency of Different Types of Compares in Conditional Branches

(36)

Encoding an Instruction set



A desire to have as many registers and addressing modes as possible

The impact of the size of the register and addressing mode fields on the average instruction size, and hence on the average program size

A desire to have instructions encoded into lengths that will be easy to handle in the implementation

(37)

Three choices for encoding the instruction set

(38)

Compilers and ISA



Compiler Goals



All correct programs compile correctly



Most compiled programs execute quickly



Most programs compile quickly



Achieve small code size



Provide debugging support



Multiple Source Compilers



Same compiler can compile different languages



Multiple Target Compilers



Same compiler can generate code for different machines

(39)

Compiler Phases

(40)

Compiler Based Register Optimization



Assume small number of registers (16-32)



Optimizing use is up to compiler



HLL programs have no explicit references to registers

 usually – is this always true?



Assign symbolic or virtual register to each candidate variable



Map (unlimited) symbolic registers to real registers



Symbolic registers that do not overlap can share real registers



If you run out of real registers some variables use memory
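The mapping from symbolic to real registers can be sketched with live ranges. This is a simplified linear-scan style pass written for illustration, not the method of any particular compiler (production compilers often use graph coloring); the function name, the inclusive `(start, end)` live-range format, and the policy of sending the newcomer to memory are all assumptions.

```python
def allocate_registers(live_ranges, num_real):
    """Map each symbolic register to a real register name or to 'mem'.

    live_ranges: dict of name -> (start, end), both inclusive instruction indices.
    """
    assignment = {}
    active = []                              # (end, real register index) still in use
    free = list(range(num_real))
    for name, (start, end) in sorted(live_ranges.items(), key=lambda kv: kv[1][0]):
        # Release real registers whose live range ended before this one starts.
        still_active = []
        for active_end, reg in active:
            if active_end < start:
                free.append(reg)
            else:
                still_active.append((active_end, reg))
        active = still_active
        if free:
            reg = min(free)
            free.remove(reg)
            assignment[name] = f"R{reg}"
            active.append((end, reg))
        else:
            assignment[name] = "mem"         # out of real registers: use memory
    return assignment


ranges = {"a": (0, 3), "b": (1, 5), "c": (4, 8), "d": (2, 6)}   # sample ranges (assumed)
print(allocate_registers(ranges, num_real=2))
```

Here `a` and `c` do not overlap, so they share one real register, while `d` overlaps both occupied registers and falls back to memory.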

(41)

Allocation of Variables

 Stack

 used to allocate local variables

 grown and shrunk on procedure calls and returns

 register allocation works best for stack-allocated objects

 Global data area

 used to allocate global variables and constants

 many of these objects are arrays or large data structures

 impossible to allocate to registers if they are aliased

 Heap

 used to allocate dynamic objects

 heap objects are accessed with pointers

 never allocated to registers

(42)

Designing ISA to Improve Compilation



Provide enough general purpose registers to ease register allocation (more than 16).



Provide regular instruction sets by keeping the operations, data types, and addressing modes orthogonal.



Provide primitive constructs rather than trying to map to a high-level language.



Simplify trade-off among alternatives.



Allow compilers to help make the common case fast.

(43)

ISA Metrics

 Orthogonality

 No special registers, few special cases, all operand modes available with any data type or instruction type

 Completeness

 Support for a wide range of operations and target applications

 Regularity

 No overloading for the meanings of instruction fields

 Streamlined Design

 Resource needs easily determined. Simplify tradeoffs.

 Ease of compilation (programming?), Ease of implementation, Scalability

(44)

Quick Review of Design Space of ISA

Five Primary Dimensions

 Number of explicit operands ( 0, 1, 2, 3 )

 Operand Storage: Where besides memory?

 Effective Address: How is memory location specified?

 Type & Size of Operands: byte, int, float, vector, . . . How is it specified?

 Operations: add, sub, mul, . . . How is it specified?

Other Aspects

 Successor: How is it specified?

 Conditions: How are they determined?

 Encodings: Fixed or variable? Wide?

 Parallelism

(45)

ISA Metrics

Aesthetics:

 Orthogonality

 No special registers, few special cases, all operand modes available with any data type or instruction type

 Completeness

 Support for a wide range of operations and target applications

 Regularity

 No overloading for the meanings of instruction fields

 Streamlined

 Resource needs easily determined

Ease of compilation (programming?) Ease of implementation

Scalability

(46)

A "Typical" RISC



32-bit fixed format instruction (3 formats)



32 32-bit GPR (R0 contains zero, Double Precision takes a register pair)



3-address, reg-reg arithmetic instruction



Single address mode for load/store: base + displacement



no indirection



Simple branch conditions



Delayed branch

see: SPARC, MIPS, MC88100, AMD2900, i960, i860, PA-RISC, DEC Alpha, Clipper, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3

(47)

MIPS data types



Bytes



characters



Half-words



Short ints, OS related data-structures



Words



Single FP, Integers



Doublewords



Double FP, Long Integers (in some implementations)

(48)

Instruction Layout for MIPS

(49)

MIPS (32 bit instructions)

1. Register-Register
   Op [31-26] | Rs1 [25-21] | Rs2 [20-16] | Rd [15-11] | Opx [10-0, with a boundary marked at bits 6/5]

2a. Register-Immediate
   Op [31-26] | Rs1 [25-21] | Rd [20-16] | Immediate [15-0]

2b. Branch (displacement)
   Op [31-26] | Rs1 [25-21] | Rs2/Opx [20-16] | Displacement [15-0]

3. Jump / Call
   Op [31-26] | target [25-0]
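Packing these fields into a 32-bit word is a matter of shifts and masks. The sketch below uses the slide's field names (Op, Rs1, Rs2, Rd); splitting the low 11 bits of the register-register format into a 5-bit shift amount and a 6-bit function code follows the standard MIPS32 convention and is an assumption relative to the slide. The function names are hypothetical.

```python
def encode_register_register(op, rs1, rs2, rd, shamt, funct):
    """Format 1: Op(6) | Rs1(5) | Rs2(5) | Rd(5) | shamt(5) | funct(6)."""
    return (op << 26) | (rs1 << 21) | (rs2 << 16) | (rd << 11) | (shamt << 6) | funct


def encode_register_immediate(op, rs1, rd, immediate):
    """Format 2a: Op(6) | Rs1(5) | Rd(5) | Immediate(16); immediate kept to 16 bits."""
    return (op << 26) | (rs1 << 21) | (rd << 16) | (immediate & 0xFFFF)


def encode_jump(op, target):
    """Format 3: Op(6) | target(26)."""
    return (op << 26) | (target & 0x3FFFFFF)


def decode_fields(word):
    """Pull the fixed-position fields back out of a 32-bit instruction word."""
    return {
        "op": (word >> 26) & 0x3F,
        "rs1": (word >> 21) & 0x1F,
        "rs2": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "immediate": word & 0xFFFF,
    }


add_word = encode_register_register(0, 1, 2, 3, 0, 0x20)   # MIPS add $3, $1, $2
addi_word = encode_register_immediate(8, 1, 2, 100)        # MIPS addi $2, $1, 100
print(f"{add_word:08x} {addi_word:08x}")
print(decode_fields(add_word))
```

Because Op and the source register fields sit at the same bit positions in every format, the decoder can read the register file before it knows which format it is looking at.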

(50)

MIPS (addressing modes)



Register direct



Displacement



Immediate



Byte addressable & 64 bit address



R0 -> always contains value 0

Displacement = 0 -> register indirect

R0 + Displacement (R0 = 0) -> absolute addressing

(51)

Types of Operations



Loads and Stores



ALU operations



Floating point operations



Branches and Jumps (control-related)

(52)

Load/Store Instructions

(53)

Sample ALU Instructions

(54)

Control Flow Instructions

(55)

(56)


Datapath vs Control

 Datapath: Storage, Functional Units, Interconnections sufficient to perform the desired functions

 Inputs are Control Points

 Outputs are signals

 Controller: State machine to orchestrate operation on the data path

 Based on desired function and signals

[Figure: the Controller drives the Datapath through control points; the Datapath returns signals to the Controller]

(57)


Approaching an ISA

 Instruction Set Architecture

 Defines set of operations, instruction format, hardware supported data types, named storage, addressing modes, sequencing

 Meaning of each instruction is described by RTL (register transfer language) on architected registers and memory

 Given technology constraints, assemble adequate datapath

 Architected storage mapped to actual storage

 Function Units (FUs) to do all the required operations

 Possible additional storage (e.g., internal registers: MAR, MDR, IR, … {Memory Address Register, Memory Data Register, Instruction Register})

 Interconnect to move information among registers and function units

 Map each instruction to a sequence of RTL operations

 Collate sequences into symbolic controller state transition diagram (STD)

 Lower symbolic STD to control points

 Implement controller
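The step "map each instruction to a sequence of RTL operations" can be illustrated with instruction fetch. This is a minimal sketch on a hypothetical word-addressed toy machine: the step functions, the dictionary holding PC, MAR, MDR, and IR, and the PC increment of 1 are assumptions for illustration, with the controller reduced to stepping through a list.

```python
# Each function is one RTL operation on the internal registers.
def mar_from_pc(regs, memory):
    regs["MAR"] = regs["PC"]                 # MAR <- PC


def mdr_from_memory(regs, memory):
    regs["MDR"] = memory[regs["MAR"]]        # MDR <- M[MAR]


def ir_from_mdr(regs, memory):
    regs["IR"] = regs["MDR"]                 # IR <- MDR


def increment_pc(regs, memory):
    regs["PC"] = regs["PC"] + 1              # PC <- PC + 1 (word-addressed toy memory)


FETCH = [mar_from_pc, mdr_from_memory, ir_from_mdr, increment_pc]


def run_sequence(sequence, regs, memory):
    """Toy controller: perform one RTL operation per state, in order."""
    for step in sequence:
        step(regs, memory)
    return regs


regs = {"PC": 4, "MAR": 0, "MDR": 0, "IR": None}
memory = {4: "add R2, R4, R6"}               # sample instruction memory (assumed)
run_sequence(FETCH, regs, memory)
print(regs)
```

Each list entry corresponds to one state of the symbolic state transition diagram; lowering it to control points would replace each function with the enable signals that cause that transfer.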

(58)


Homework

 A.1, A.5, A.7
