
EI 338: Computer Systems Engineering


Academic year: 2022


(1)

EI 338: Computer Systems Engineering

(Operating Systems & Computer Architecture)

Dept. of Computer Science & Engineering Chentao Wu

[email protected]

(2)

Download lectures

• ftp://public.sjtu.edu.cn

• User: wuct

• Password: wuct123456

• http://www.cs.sjtu.edu.cn/~wuct/cse/

(3)


Appendix A

Instruction Set Principles

Computer Architecture

A Quantitative Approach, Fifth Edition

(4)


Outline

Instruction Set Architecture

5 stage pipelining

Structural and Data Hazards

Forwarding

Branch Schemes

Exceptions and Interrupts

Conclusion

(5)

Instruction Set Architecture

Instruction set architecture is the structure of a computer that a machine language programmer must understand to write a correct (timing independent) program for that machine.

The instruction set architecture is also the machine description that a hardware designer must understand to design a correct implementation of the computer.

(6)

Evolution of Instruction Sets

Single Accumulator (EDSAC 1950)

Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)

Separation of Programming Model from Implementation

    High-level Language Based (B5000 1963)
    Concept of a Family (IBM 360 1964)

General Purpose Register Machines

    Complex Instruction Sets (Vax, Intel 432 1977-80)
    Load/Store Architecture (CDC 6600, Cray 1 1963-76)

RISC (Mips, Sparc, HP-PA, IBM RS6000, PowerPC . . . 1987)

LIW/"EPIC"? (IA-64 . . . 1999)

(7)

Evolution of Instruction Sets

Major advances in computer architecture are typically associated with landmark instruction set designs

Ex: Stack vs GPR (System 360)

Design decisions must take into account:

technology

machine organization

programming languages

compiler technology

operating systems

And they in turn influence these

(8)

Instructions Can Be Divided into 3 Classes (I)

Data movement instructions

Move data from a memory location or register to another memory location or register without changing its form

Load—source is memory and destination is register

Store—source is register and destination is memory

Arithmetic and logic (ALU) instructions

Change the form of one or more operands to produce a result stored in another location

Add, Sub, Shift, etc.

Branch instructions (control flow instructions)

Alter the normal flow of control from executing the next instruction in sequence

Br Loc, Brz Loc2: unconditional or conditional branches

(9)

Classifying ISAs

Accumulator (before 1960):
  1 address    add A             acc <- acc + mem[A]

Stack (1960s to 1970s):
  0 address    add               tos <- tos + next

Memory-Memory (1970s to 1980s):
  2 address    add A, B          mem[A] <- mem[A] + mem[B]
  3 address    add A, B, C       mem[A] <- mem[B] + mem[C]

Register-Memory (1970s to present):
  2 address    add R1, A         R1 <- R1 + mem[A]
               load R1, A        R1 <- mem[A]

Register-Register (Load/Store) (1960s to present):
  3 address    add R1, R2, R3    R1 <- R2 + R3
               load R1, R2       R1 <- mem[R2]
               store R1, R2      mem[R1] <- R2

(10)

Classifying ISAs

(11)

Stack Architectures

Instruction set:

add, sub, mult, div, . . .

push A, pop A

Example: A*B - (A+C*B)

Instruction   Stack after execution (top of stack on the right)
push A        A
push B        A, B
mul           A*B
push A        A*B, A
push C        A*B, A, C
push B        A*B, A, C, B
mul           A*B, A, B*C
add           A*B, A+B*C
sub           result
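The stack example can be traced with a small simulator. This is only a sketch: the tuple-based program encoding, the function name `run_stack_program`, and the sample values for A, B, and C are assumptions made for illustration, not part of the slides.

```python
def run_stack_program(program, memory):
    """Run (opcode, operand) pairs on a toy 0-address stack machine."""
    stack = []
    for opcode, operand in program:
        if opcode == "push":
            stack.append(memory[operand])   # copy a memory value onto the stack
        elif opcode == "pop":
            memory[operand] = stack.pop()   # store the top of stack to memory
        else:
            tos = stack.pop()               # top of stack
            nxt = stack.pop()               # element just below the top
            if opcode == "add":
                stack.append(nxt + tos)
            elif opcode == "sub":
                stack.append(nxt - tos)
            elif opcode == "mul":
                stack.append(nxt * tos)
            else:
                raise ValueError(f"unknown opcode: {opcode}")
    return stack


# Hypothetical operand values, chosen only for the demonstration.
memory = {"A": 2, "B": 3, "C": 4}

# A*B - (A + C*B), in the same order as the slide.
program = [
    ("push", "A"), ("push", "B"), ("mul", None),
    ("push", "A"), ("push", "C"), ("push", "B"),
    ("mul", None), ("add", None), ("sub", None),
]

result = run_stack_program(program, memory)
print(result)  # a single value is left on the stack
```

Only push and pop name a memory location; the arithmetic instructions carry no operand at all, which is where the good code density comes from.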

(12)

Stacks: Pros and Cons

Pros

Good code density (implicit operand addressing top of stack)

Low hardware requirements

Easy to write a simpler compiler for stack architectures

Cons

Stack becomes the bottleneck

Little ability for parallelism or pipelining

Data is not always at the top of stack when needed, so additional instructions like TOP and SWAP are needed

Difficult to write an optimizing compiler for stack architectures

(13)

Accumulator Architectures

• Instruction set:

add A, sub A, mult A, div A, . . .

load A, store A

• Example: A*B - (A+C*B)

Instruction   Accumulator after execution
load B        B
mul C         B*C
add A         A+B*C
store D       A+B*C
load A        A
mul B         A*B
sub D         result
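A similar sketch for the accumulator machine also counts data memory accesses, since every instruction in this style names one memory operand. The program encoding, the function name `run_accumulator_program`, and the sample values are assumptions for illustration.

```python
def run_accumulator_program(program, memory):
    """Run (opcode, address) pairs on a toy 1-address accumulator machine.

    Returns the final accumulator value and the number of data memory accesses.
    """
    acc = 0
    data_accesses = 0
    for opcode, address in program:
        data_accesses += 1                  # every instruction touches memory once
        if opcode == "load":
            acc = memory[address]
        elif opcode == "store":
            memory[address] = acc
        elif opcode == "add":
            acc = acc + memory[address]
        elif opcode == "sub":
            acc = acc - memory[address]
        elif opcode == "mul":
            acc = acc * memory[address]
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return acc, data_accesses


# Hypothetical operand values; D is the temporary used by the slide's code.
acc_memory = {"A": 2, "B": 3, "C": 4, "D": 0}

acc_program = [
    ("load", "B"), ("mul", "C"), ("add", "A"), ("store", "D"),
    ("load", "A"), ("mul", "B"), ("sub", "D"),
]

acc_result, acc_accesses = run_accumulator_program(acc_program, acc_memory)
print(acc_result, acc_accesses)
```

Seven instructions produce seven data accesses, including the store and reload of the temporary D that a second register would have avoided.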

(14)

Accumulators: Pros and Cons

• Pros

– Very low hardware requirements

– Easy to design and understand

• Cons

– Accumulator becomes the bottleneck

– Little ability for parallelism or pipelining

– High memory traffic

(15)

Memory-Memory Architectures

• Instruction set:

(3 operands)

add A, B, C

sub A, B, C

mul A, B, C

• Example: A*B - (A+C*B)

– 3 operands

mul D, A, B

mul E, C, B

add E, A, E

sub E, D, E

(16)

Memory-Memory: Pros and Cons

• Pros

– Requires fewer instructions (especially if 3 operands)

– Easy to write compilers for (especially if 3 operands)

• Cons

– Very high memory traffic (especially if 3 operands)

– Variable number of clocks per instruction (especially if 2 operands)

– With two operands, more data movements are required

(17)

Register-Memory Architectures

• Instruction set:

add R1, A

sub R1, A

mul R1, B

load R1, A

store R1, A

• Example: A*B - (A+C*B)

load R1, C
mul R1, B      /* C*B */
add R1, A      /* A + C*B */
store R1, D    /* D <- A + C*B */
load R2, A
mul R2, B      /* A*B */
sub R2, D      /* A*B - (A + C*B) */

(18)

Register-Memory: Pros and Cons

• Pros

– Some data can be accessed without loading first

– Instruction format easy to encode

– Good code density

• Cons

– Operands are not equivalent (poor orthogonality)

– Variable number of clocks per instruction

– May limit number of registers

(19)

Load-Store Architectures

• Instruction set:

add R1, R2, R3

sub R1, R2, R3

mul R1, R2, R3

load R1, R4

store R1, R4

• Example: A*B - (A+C*B)

load R1, &A
load R2, &B
load R3, &C
load R4, R1
load R5, R2
load R6, R3
mul R7, R6, R5     /* C*B */
add R8, R7, R4     /* A + C*B */
mul R9, R4, R5     /* A*B */
sub R10, R9, R8    /* A*B - (A+C*B) */

(20)

Load-Store: Pros and Cons

• Pros

– Simple, fixed length instruction encoding

– Instructions take similar number of cycles

– Relatively easy to pipeline

• Cons

– Higher instruction count

– Not all instructions need three operands

– Dependent on good compiler

(21)

Registers: Advantages and Disadvantages

• Advantages

– Faster than cache (no addressing mode or tags)

– Deterministic (no misses)

– Can replicate (multiple read ports)

– Short identifier (typically 3 to 8 bits)

– Reduce memory traffic

• Disadvantages

– Need to save and restore on procedure calls and context switch

– Can’t take the address of a register (for pointers)

– Fixed size (can’t store strings or structures efficiently)

– Compiler must manage

(22)

General Register Machine and Instruction Formats

[Figure: general register machine. Memory holds the operand Op1 at address Op1Addr; the CPU holds the program counter (pointing at the next instruction, Nexti) and registers R2, R4, R6, R8.]

load R8, Op1 (R8 <- Op1)

add R2, R4, R6 (R2 <- R4 + R6)

Instruction formats

load | R8 | Op1Addr

add | R2 | R4 | R6

(23)

General Register Machine and Instruction Formats

It is the most common choice in today’s general-purpose computers

Which register is specified by small “address”

(3 to 6 bits for 8 to 64 registers)

Load and store have one long & one short address: one and a half addresses

Arithmetic instruction has 3 “half” addresses
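The "3 to 6 bits for 8 to 64 registers" figure is just the base-2 logarithm of the register count, rounded up. A minimal sketch (the function name `register_field_bits` is an assumption for illustration):

```python
def register_field_bits(num_registers):
    """Bits needed to name one of num_registers registers in an instruction."""
    if num_registers < 1:
        raise ValueError("need at least one register")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2, without floating point.
    return max(1, (num_registers - 1).bit_length())


for count in (8, 16, 32, 64):
    print(count, "registers ->", register_field_bits(count), "bits")
```

An arithmetic instruction with three register operands therefore spends three such fields on operands, for example 15 bits with 32 registers.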

(24)

Real Machines Are Not So Simple

Most real machines have a mixture of 3, 2, 1, 0, and 1½-address instructions

A distinction can be made on whether arithmetic instructions use data from memory

If ALU instructions only use registers for operands and result, machine type is load-store

Only load and store instructions reference memory

Other machines have a mix of register-memory and memory-memory instructions

(25)

Alignment Issues

• If the architecture does not restrict memory accesses to be aligned then

– Software is simple

– Hardware must detect misalignment and make 2 memory accesses

– Expensive detection logic is required

– All references can be made slower

• Sometimes unrestricted alignment is required for backwards compatibility

• If the architecture restricts memory accesses to be aligned then

– Software must guarantee alignment

– Hardware detects misaligned access and traps

– No extra time is spent when data is aligned

• Since we want to make the common case fast, having restricted alignment is often a better choice, unless compatibility is an issue
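The alignment rule and the cost of ignoring it can be sketched in a few lines. The 8-byte memory width is an assumption for illustration, as are the function names; the slides do not fix a particular bus width.

```python
def is_aligned(address, size):
    """An access of `size` bytes is aligned when its address is a multiple of size."""
    return address % size == 0


def memory_accesses_needed(address, size, bus_width=8):
    """Number of bus-width blocks touched by an access (bus_width is assumed)."""
    first_block = address // bus_width
    last_block = (address + size - 1) // bus_width
    return last_block - first_block + 1


# A 4-byte word at 0x1004 is aligned and fits in one 8-byte block.
print(is_aligned(0x1004, 4), memory_accesses_needed(0x1004, 4))
# A 4-byte word at 0x1006 is misaligned and straddles two blocks.
print(is_aligned(0x1006, 4), memory_accesses_needed(0x1006, 4))
```

With restricted alignment the hardware only needs the cheap `is_aligned` check and a trap; with unrestricted alignment it must also handle the two-access case.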

(26)

Types of Addressing Modes (VAX)

 1. Register direct        Ri
 2. Immediate (literal)    #n
 3. Displacement           M[Ri + #n]
 4. Register indirect      M[Ri]
 5. Indexed                M[Ri + Rj]
 6. Direct (absolute)      M[#n]
 7. Memory indirect        M[M[Ri]]
 8. Autoincrement          M[Ri++]
 9. Autodecrement          M[Ri--]
10. Scaled                 M[Ri + Rj*d + #n]
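The memory-referencing modes differ only in how the effective address is computed. The sketch below covers the modes without side effects (autoincrement and autodecrement, which also update Ri, are left out); the function name, the mode strings, and the dictionary-based register file and memory are assumptions for illustration.

```python
def effective_address(mode, regs, mem, ri=None, rj=None, n=0, d=1):
    """Return the memory address an operand refers to under the given mode."""
    if mode == "displacement":
        return regs[ri] + n                 # M[Ri + #n]
    if mode == "register_indirect":
        return regs[ri]                     # M[Ri]
    if mode == "indexed":
        return regs[ri] + regs[rj]          # M[Ri + Rj]
    if mode == "direct":
        return n                            # M[#n]
    if mode == "memory_indirect":
        return mem[regs[ri]]                # M[M[Ri]]
    if mode == "scaled":
        return regs[ri] + regs[rj] * d + n  # M[Ri + Rj*d + #n]
    raise ValueError(f"unsupported mode: {mode}")


# Hypothetical register and memory contents.
regs = {"R1": 100, "R2": 3}
mem = {100: 500}

print(effective_address("displacement", regs, mem, ri="R1", n=8))
print(effective_address("scaled", regs, mem, ri="R1", rj="R2", n=8, d=4))
```

Register direct and immediate are absent because they name a register or a constant rather than a memory address.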

(27)

Summary of Use of Addressing Modes

(28)

Distribution of Displacement Values

(29)

Frequency of Immediate Operands

(30)

Types of Operations

Arithmetic and Logic: AND, ADD

Data Transfer: MOVE, LOAD, STORE

Control: BRANCH, JUMP, CALL

System: OS CALL, VM

Floating Point: ADDF, MULF, DIVF

Decimal: ADDD, CONVERT

String: MOVE, COMPARE

Graphics: (DE)COMPRESS

(31)

Distribution of Data Accesses by Size

(32)

Relative Frequency of Control Instructions

(33)

Control instructions (contd.)

Addressing modes

PC-relative addressing (independent of where the program is loaded; branch targets are usually close by)

Requires displacement (how many bits?)

Determined via empirical study: 8 to 16 bits works well

For procedure returns, indirect jumps, and kernel traps, the target may not be known at compile time

Jump based on contents of register

Useful for switch statements, (virtual) functions, function pointers, dynamically linked libraries, etc.
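The "how many bits?" question can be made concrete: a signed displacement of k bits reaches targets from -2^(k-1) to 2^(k-1)-1 instructions away. The sketch below assumes a MIPS-style convention (4-byte instructions, displacement counted in instructions from the instruction after the branch); that convention and the function names are assumptions for illustration.

```python
def signed_bits_needed(displacement):
    """Smallest two's-complement field width that can hold the displacement."""
    if displacement >= 0:
        return displacement.bit_length() + 1
    return (-displacement - 1).bit_length() + 1


def branch_target(pc, displacement, instruction_bytes=4):
    """Target of a PC-relative branch, relative to the next instruction (assumed)."""
    return pc + instruction_bytes + displacement * instruction_bytes


print(signed_bits_needed(127), signed_bits_needed(-128), signed_bits_needed(128))
print(hex(branch_target(0x1000, -3)))
```

Because the target is computed from the PC, the same encoded branch works wherever the program is loaded.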

(34)

Branch Distances (in terms of number of instructions)

(35)

Frequency of Different Types of Compares in Conditional Branches

(36)

Encoding an Instruction set

Competing forces to balance:

A desire to have as many registers and addressing modes as possible

The impact of the size of the register and addressing mode fields on the average instruction size and hence on the average program size

A desire to have instructions encoded into lengths that will be easy to handle in the implementation

(37)

Three choices for encoding the instruction set

(38)

Compilers and ISA

Compiler Goals

All correct programs compile correctly

Most compiled programs execute quickly

Most programs compile quickly

Achieve small code size

Provide debugging support

Multiple Source Compilers

Same compiler can compile different languages

Multiple Target Compilers

Same compiler can generate code for different machines

(39)

Compiler Phases

(40)

Compiler Based Register Optimization

Assume small number of registers (16-32)

Optimizing use is up to compiler

HLL programs have no explicit references to registers

usually – is this always true?

Assign symbolic or virtual register to each candidate variable

Map (unlimited) symbolic registers to real registers

Symbolic registers that do not overlap can share real registers

If you run out of real registers, some variables use memory
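The mapping from symbolic to real registers can be sketched with live ranges: two symbolic registers may share a real register when their live ranges do not overlap, and a symbolic register that finds no free real register is spilled to memory. This is a deliberately simplified scan, not a production allocator (real compilers often use graph coloring or a more careful linear scan); the function name and the sample live ranges are assumptions.

```python
def allocate_registers(live_ranges, num_registers):
    """Assign real registers to symbolic registers.

    live_ranges maps a symbolic register name to an inclusive (start, end) range.
    Returns (assignment, spilled): name -> real register index, and spilled names.
    """
    assignment = {}
    spilled = []
    active = []                          # (end, name) pairs currently in a register
    free = list(range(num_registers))    # real registers not in use

    for name, (start, end) in sorted(live_ranges.items(), key=lambda kv: kv[1][0]):
        # Release real registers whose symbolic register is no longer live.
        still_active = []
        for active_end, active_name in active:
            if active_end < start:
                free.append(assignment[active_name])
            else:
                still_active.append((active_end, active_name))
        active = still_active

        if free:
            free.sort()
            assignment[name] = free.pop(0)
            active.append((end, name))
        else:
            spilled.append(name)         # out of real registers: keep it in memory

    return assignment, spilled


# Hypothetical live ranges for four symbolic registers, with two real registers.
live_ranges = {"v1": (0, 3), "v2": (1, 5), "v3": (4, 8), "v4": (2, 6)}
assignment, spilled = allocate_registers(live_ranges, num_registers=2)
print(assignment, spilled)
```

Here v1 and v3 never overlap, so they share a real register, while v4 overlaps both registers in use when it becomes live and is kept in memory.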

(41)

Allocation of Variables

Stack

used to allocate local variables

grown and shrunk on procedure calls and returns

register allocation works best for stack-allocated objects

Global data area

used to allocate global variables and constants

many of these objects are arrays or large data structures

impossible to allocate to registers if they are aliased

Heap

used to allocate dynamic objects

heap objects are accessed with pointers

never allocated to registers

(42)

Designing ISA to Improve Compilation

Provide enough general purpose registers to ease register allocation (more than 16).

Provide regular instruction sets by keeping the operations, data types, and addressing modes orthogonal.

Provide primitive constructs rather than trying to map to a high-level language.

Simplify trade-off among alternatives.

Allow compilers to help make the common case fast.

(43)

ISA Metrics

Orthogonality

No special registers, few special cases, all operand modes available with any data type or instruction type

Completeness

Support for a wide range of operations and target applications

Regularity

No overloading for the meanings of instruction fields

Streamlined Design

Resource needs easily determined. Simplify tradeoffs.

Ease of compilation (programming?), Ease of implementation, Scalability

(44)

Quick Review of Design Space of ISA

Five Primary Dimensions

Number of explicit operands: 0, 1, 2, 3

Operand Storage: Where besides memory?

Effective Address: How is memory location specified?

Type & Size of Operands: byte, int, float, vector, . . . How is it specified?

Operations: add, sub, mul, . . . How is it specified?

Other Aspects

Successor: How is it specified?

Conditions: How are they determined?

Encodings: Fixed or variable? Wide?

Parallelism

(45)

ISA Metrics

Aesthetics:

Orthogonality

No special registers, few special cases, all operand modes available with any data type or instruction type

Completeness

Support for a wide range of operations and target applications

Regularity

No overloading for the meanings of instruction fields

Streamlined

Resource needs easily determined

Ease of compilation (programming?) Ease of implementation

Scalability

(46)

A "Typical" RISC

32-bit fixed format instruction (3 formats)

32 32-bit GPR (R0 contains zero, Double Precision takes a register pair)

3-address, reg-reg arithmetic instruction

Single address mode for load/store:

base + displacement

no indirection

Simple branch conditions

Delayed branch

see: SPARC, MIPS, MC88100, AMD2900, i960, i860, PA-RISC, DEC Alpha, Clipper, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3

(47)

MIPS data types

Bytes

characters

Half-words

Short ints, OS related data-structures

Words

Single FP, Integers

Doublewords

Double FP, Long Integers (in some implementations)

(48)

Instruction Layout for MIPS

(49)

MIPS (32 bit instructions)

1. Register-Register

   31    26 25   21 20   16 15   11 10    6 5     0
   Op       Rs1     Rs2     Rd              Opx

2a. Register-Immediate

   31    26 25   21 20   16 15                    0
   Op       Rs1     Rd      Immediate

2b. Branch (displacement)

   31    26 25   21 20   16 15                    0
   Op       Rs1     Rs2/Opx Displacement

3. Jump / Call

   31    26 25                                    0
   Op       target
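Because every format places its fields at fixed bit positions, decoding is a matter of shifts and masks. The sketch below extracts all fields from a 32-bit word using the bit positions on the slide; which fields are meaningful depends on the format selected by Op. The function names, the dictionary keys, and the sample words are assumptions for illustration.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def decode_fields(word):
    """Split a 32-bit instruction word into the fields shown on the slide."""
    return {
        "op": (word >> 26) & 0x3F,                    # bits 31..26, all formats
        "rs1": (word >> 21) & 0x1F,                   # bits 25..21
        "rs2_or_rd": (word >> 16) & 0x1F,             # bits 20..16 (Rs2, or Rd in format 2a)
        "rd": (word >> 11) & 0x1F,                    # bits 15..11, register-register only
        "opx": word & 0x3F,                           # bits 5..0, register-register only
        "immediate": sign_extend(word & 0xFFFF, 16),  # bits 15..0, formats 2a and 2b
        "target": word & 0x03FFFFFF,                  # bits 25..0, jump / call
    }


# Sample register-register word: op=0, rs1=2, rs2=3, rd=1, opx=0x20.
rr = decode_fields(0x00430820)
# Sample register-immediate word: op=8, rs1=1, rd=2, immediate=-1.
ri = decode_fields(0x2022FFFF)
print(rr["rs1"], rr["rs2_or_rd"], rr["rd"], hex(rr["opx"]))
print(ri["op"], ri["rs1"], ri["rs2_or_rd"], ri["immediate"])
```

The fixed positions of Op and the register fields are what let a pipelined implementation read registers before it has finished decoding the instruction.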

(50)

MIPS (addressing modes)

Register direct

Displacement

Immediate

Byte addressable & 64 bit address

R0 always contains value 0

Displacement = 0: register indirect

R0 as base register + displacement: absolute addressing

(51)

Types of Operations

Loads and Stores

ALU operations

Floating point operations

Branches and Jumps (control-related)

(52)

Load/Store Instructions

(53)

Sample ALU Instructions

(54)

Control Flow Instructions

(55)
(56)


Datapath vs Control

Datapath: Storage, Functional Units, Interconnections sufficient to perform the desired functions

Inputs are Control Points

Outputs are signals

Controller: State machine to orchestrate operation on the data path

Based on desired function and signals

[Figure: the Controller drives the Datapath through control points; the Datapath sends signals back to the Controller.]

(57)


Approaching an ISA

Instruction Set Architecture

Defines set of operations, instruction format, hardware supported data types, named storage, addressing modes, sequencing

Meaning of each instruction is described by RTL (register transfer language) on architected registers and memory

Given technology constraints, assemble adequate datapath

Architected storage mapped to actual storage

Function Units (FUs) to do all the required operations

Possible additional storage (e.g., internal registers MAR, MDR, IR: Memory Address Register, Memory Data Register, Instruction Register)

Interconnect to move information among registers and function units

Map each instruction to a sequence of RTL operations

Collate sequences into symbolic controller state transition diagram (STD)

Lower symbolic STD to control points

Implement controller
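The step "map each instruction to a sequence of RTL operations" can be illustrated for a single load instruction. This is a sketch under loud assumptions: memory is word-addressed, an instruction is stored as a Python tuple rather than a bit pattern, and the function name `execute_load` and the exact step breakdown are illustrative, not taken from the slides.

```python
def execute_load(state, memory):
    """Fetch and execute one ("load", rd, rs, disp) instruction as RTL steps.

    state holds the architected PC and register file R, plus the internal
    registers MAR, MDR, and IR. Returns the list of RTL steps performed.
    """
    trace = []

    state["MAR"] = state["PC"]
    trace.append("MAR <- PC")
    state["MDR"] = memory[state["MAR"]]
    trace.append("MDR <- M[MAR]")
    state["IR"] = state["MDR"]
    state["PC"] += 1                      # word-addressed memory (assumption)
    trace.append("IR <- MDR; PC <- PC + 1")

    opcode, rd, rs, disp = state["IR"]
    if opcode != "load":
        raise ValueError("this sketch only handles load")

    state["MAR"] = state["R"][rs] + disp
    trace.append("MAR <- R[rs] + disp")
    state["MDR"] = memory[state["MAR"]]
    trace.append("MDR <- M[MAR]")
    state["R"][rd] = state["MDR"]
    trace.append("R[rd] <- MDR")
    return trace


# Hypothetical machine state: R2 holds a base address, memory[104] holds the data.
rtl_memory = {0: ("load", 1, 2, 4), 104: 99}
rtl_state = {"PC": 0, "R": [0, 0, 100, 0], "MAR": 0, "MDR": 0, "IR": None}

for step in execute_load(rtl_state, rtl_memory):
    print(step)
```

Each printed step corresponds to one state in the controller's state transition diagram, and each assignment corresponds to a set of control points that the controller must assert.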

(58)


Homework

A.1, A.5, A.7
