
IA-32 Architecture

Computer Organization and Assembly Languages Yung-Yu Chuang

2005/10/6

with slides by Kip Irvine and Keith Van Rhein

Virtual machines

Level 5: High-Level Language
Level 4: Assembly Language
Level 3: Operating System
Level 2: Instruction Set Architecture
Level 1: Microarchitecture
Level 0: Digital Logic

Abstractions for computers

Truth tables

• Example: (Y ∧ S) ∨ (X ∧ ¬S) mux

[Figure: two-input multiplexer with data inputs X and Y, select input S, and output Z]

Combinational logic
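The multiplexer expression above can be checked by enumerating its truth table. The following is a minimal sketch; the function names `mux` and `truth_table` are illustrative and not part of the slides.

```python
from itertools import product


def mux(x: int, y: int, s: int) -> int:
    """Two-input multiplexer: Z = (Y AND S) OR (X AND NOT S)."""
    return (y & s) | (x & (1 - s))


def truth_table():
    """Return one (X, Y, S, Z) row for every combination of input bits."""
    return [(x, y, s, mux(x, y, s)) for x, y, s in product((0, 1), repeat=3)]


if __name__ == "__main__":
    print("X Y S | Z")
    for x, y, s, z in truth_table():
        print(f"{x} {y} {s} | {z}")
```

When S is 0 the output follows X, and when S is 1 it follows Y, which is what the truth table shows.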

Sequential logic

[Figure: a register (REG) with IN and OUT ports and EN(RD) and WR control lines, and a counter (COUNTER) with IN and OUT ports and EN(RD), WR, and INC control lines]

Memory

8K 8-bit memory


Instruction set

OPCODE  MNEMONIC      OPCODE  MNEMONIC
0       NOP           A       CMP addr
1       LDA addr      B       JG addr
2       STA addr      C       JE addr
3       ADD addr      D       JL addr
4       SUB addr
5       IN port
6       OUT port
7       JMP addr
8       JN addr
9       HLT

Instruction format: OPCODE (4 bits), OPERAND (12 bits)
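The instruction format can be illustrated with a small encoder and decoder. This is a sketch that assumes the 4-bit opcode occupies the upper bits of a 16-bit word and the 12-bit operand the lower bits, which agrees with machine words such as 1401 for LDA 401 that appear later in the slides. The names `OPCODES`, `encode`, and `decode` are illustrative.

```python
# Opcode values copied from the instruction-set table.
OPCODES = {
    "NOP": 0x0, "LDA": 0x1, "STA": 0x2, "ADD": 0x3, "SUB": 0x4,
    "IN": 0x5, "OUT": 0x6, "JMP": 0x7, "JN": 0x8, "HLT": 0x9,
    "CMP": 0xA, "JG": 0xB, "JE": 0xC, "JL": 0xD,
}
MNEMONICS = {code: name for name, code in OPCODES.items()}


def encode(mnemonic: str, operand: int = 0) -> int:
    """Pack a mnemonic and a 12-bit operand into one 16-bit word."""
    if not 0 <= operand < (1 << 12):
        raise ValueError("operand must fit in 12 bits")
    return (OPCODES[mnemonic] << 12) | operand


def decode(word: int) -> tuple[str, int]:
    """Split a 16-bit word into (mnemonic, operand)."""
    return MNEMONICS[(word >> 12) & 0xF], word & 0xFFF


if __name__ == "__main__":
    print(f"{encode('LDA', 0x401):04X}")
    print(decode(0xA402))
```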


Basic microcomputer design

• clock synchronizes CPU operations

• control unit (CU) coordinates sequence of execution steps

• ALU performs arithmetic and logic operations

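[Figure: block diagram of a microcomputer. The Central Processor Unit (CPU), containing the registers, ALU, CU, and clock, connects to the Memory Storage Unit and to I/O devices #1 and #2 over the data bus, control bus, and address bus.]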

Basic microcomputer design

• The memory storage unit holds instructions and data for a running program

• A bus is a group of wires that transfer data from one part to another (data, address, control)


Clock

• synchronizes all CPU and bus operations

• machine (clock) cycle measures the time of a single operation

• clock is used to trigger events

[Figure: square-wave clock signal alternating between 1 and 0, with one cycle marked]

• Basic unit of time: at 1 GHz, one clock cycle lasts 1 ns

• An instruction can take multiple cycles to complete, e.g. a multiply on the 8088 takes 50 cycles
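The relation between clock frequency, cycle time, and instruction time can be written out as a short calculation. This is a sketch; the 5 MHz clock used for the multiply example is an assumed value for illustration, not a figure from the slides.

```python
def cycle_time_ns(freq_hz: float) -> float:
    """Duration of one clock cycle in nanoseconds."""
    return 1e9 / freq_hz


def instruction_time_ns(cycles: int, freq_hz: float) -> float:
    """Time taken by an instruction that needs the given number of cycles."""
    return cycles * cycle_time_ns(freq_hz)


if __name__ == "__main__":
    print(cycle_time_ns(1e9))            # one cycle at 1 GHz
    print(instruction_time_ns(50, 5e6))  # 50-cycle multiply at an assumed 5 MHz
```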

Instruction execution cycle

• Fetch

• Decode

• Fetch operands

• Execute

• Store output

[Figure: the program counter (PC) selects the next instruction of the program (I-1, I-2, I-3, I-4) in memory. The instruction is fetched into the instruction queue and instruction register and decoded; operands op1 and op2 are read; the ALU executes; the output is written to registers, flags, or memory.]

A simple microcomputer

[Figure: a simple microcomputer built from IR, DECODE, CONTROL AND SEQUENCING, PC, ACC, B, ALU, FLAG, and CLOCK, connected to MEMORY and to an I/O PORT with two I/O devices through the DATA BUS, CONTROL BUS, and ADDRESS BUS]

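ALU

[Figure: ALU and Flag. The 16-bit inputs X and Y feed a 16-bit adder (outputs Z+ and Cout), a 16-bit subtractor (output Z-), and a 16-bit comparator (outputs X>Y, X=Y, X<Y). A 2-MUX controlled by ALUOP selects the 16-bit result Z. Z15 together with the carry and comparator outputs sets the flags N, C, G, E, L.]

ALUOP: 00 NOP, 01 CMP, 10 ADD, 11 SUB

Flags

[Figure: the flags N, C, G, E, L and a 4-MUX (inputs 0 to 3) next to the PC, which drives the ADDRESS BUS. Control signals shown: FLAGRD, FLAGOP, PCWR, PCRD, PCINC.]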

Control signals (20 in total)

[Figure: the simple microcomputer with the control lines of each component marked: RD and WR on the registers, memory, and I/O port, INC on the PC, and OP on the ALU and FLAG]

LDA (execution cycle 1): IRRD

LDA (execution cycle 2): MEMRD

LDA (execution cycle 3): ACCWR

[Figure: for each cycle, the simple-microcomputer diagram with the named control signal highlighted]


ADD (execution cycle 1): IRRD

ADD (execution cycle 2): MEMRD

ADD (execution cycle 3): BWR

ADD (execution cycle 4): ALUOP=10, ACCWR

[Figure: for each cycle, the simple-microcomputer diagram with the named control signals highlighted]


JMP (execution cycle 1): IRRD

JMP (execution cycle 2): PCWR

[Figure: for each cycle, the simple-microcomputer diagram with the named control signal highlighted]

JG (execution cycle 1): IRRD, FLAGRD

JG (execution cycle 2): FLAGOP=01

[Figure: for each cycle, the simple-microcomputer diagram with the named control signals highlighted]

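Microcode sequence

LDA 510: PCRD, MEMRD, IRWR, PCINC, IRRD, DECODERRD, μPCWR, IRRD, MEMRD, ACCWR

JMP 10: PCRD, MEMRD, IRWR, PCINC, IRRD, DECODERRD, μPCWR, IRRD, PCWR

Decoder

[Figure: the decoder maps the 4-bit opcode to the address where that instruction's microcode begins: opcode 0 (NOP) → 0000, opcode 1 (LDA) → 0006 (μcode for LDA), opcode 2 (STA) → 000F, …, opcode 7 (JMP) → μcode for JMP]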

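Control and sequencing unit

[Figure: the μPC, loaded from the decoder and stepped by the CLOCK, addresses the CONTROL store, whose output bits drive the control lines (PCRD, MEMRD, …, ACCWR)]

Control and sequencing unit

[Figure: contents of the control store. Each row is a μPC address (0000, 0001, 0002, …) and each column a control signal (PCRD, MEMRD, MEMWR, …), with a 1 where the signal is asserted. The fetch steps (PCRD, MEMRD, IRWR, PCINC) and decode steps (IRRD, DECODERRD, μPCWR) come first; the exec steps for LDA (IRRD, MEMRD, ACCWR) start at 0006.]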


X=min of X,Y,Z

    int X=7; Y=2; Z=9;
    if (X>Y) then
        if (Y>Z) then X=Z;
        else X=Y;
        end
    else
        if (X>Z) then X=Z;
        end
    end

compiler output:

    .DATA
    X     007
    Y     002
    Z     009
    .CODE
          LDA X
          CMP Y
          JG  L1
          CMP Z
          JG  L0
          JMP END
    L0    LDA Z
          STA X
          JMP END
    L1    LDA Y
          CMP Z
          JG  L2
          STA X
          JMP END
    L2    LDA Z
          STA X
    END   HLT


Memory layout

[Figure: memory map with a 1K code segment followed by a 3K data segment]

X=min of X,Y,Z

A shorter version:

    .DATA
    X     007
    Y     002
    Z     009
    .CODE
          LDA X
          CMP Y
          JL  L1
          LDA Y
    L1    CMP Z
          JL  L2
          LDA Z
    L2    STA X
          HLT

Another version:

    .DATA
    X     007
    Y     002
    Z     009
    .CODE
          LDA Y
          CMP Z
          JL  L1
          LDA Z
    L1    CMP X
          JG  END
          STA X
    END   HLT

X=min of X,Y,Z

Assembling the last version (symbols: X 400, Y 401, Z 402, L1 4, END 7):

    Address  Machine code  Assembly
    0        1401                LDA Y
    1        A402                CMP Z
    2        D004                JL  L1
    3        1402                LDA Z
    4        A400          L1    CMP X
    5        B007                JG  END
    6        2400                STA X
    7        9000          END   HLT

[Figure: the simple microcomputer (IR, DECODE, CONTROL AND SEQUENCING, PC, ACC, B, ALU, FLAG, MEMORY, and the data, control, and address buses) with the program loaded into memory]

    Address  Contents  Instruction
    000      1401      LDA 401
    001      A402      CMP 402
    002      D004      JL  004
    003      1402      LDA 402
    004      A400      CMP 400
    005      B007      JG  007
    006      2400      STA 400
    007      9000      HLT
    400      0007
    401      0002
    402      0009
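To see the fetch, decode, and execute steps act on this memory image, the machine can be simulated in a few lines. This is a rough sketch under stated assumptions: CMP compares ACC with the memory operand as unsigned values and sets only G, E, and L; the N and C flags and the IN, OUT, and JN instructions are left out; the names `run` and `PROGRAM` are illustrative.

```python
# Memory image from the slide: code at 000-007, data X, Y, Z at 400-402.
PROGRAM = {
    0x000: 0x1401, 0x001: 0xA402, 0x002: 0xD004, 0x003: 0x1402,
    0x004: 0xA400, 0x005: 0xB007, 0x006: 0x2400, 0x007: 0x9000,
    0x400: 0x0007, 0x401: 0x0002, 0x402: 0x0009,
}


def run(memory: dict[int, int], max_steps: int = 1000) -> dict[int, int]:
    """Run the toy machine from address 0 until HLT; return the final memory."""
    mem = dict(memory)
    acc, pc = 0, 0
    flags = {"G": False, "E": False, "L": False}
    for _ in range(max_steps):
        word = mem.get(pc, 0)                    # fetch
        pc += 1
        opcode, addr = word >> 12, word & 0xFFF  # decode
        if opcode == 0x0:                        # NOP
            pass
        elif opcode == 0x1:                      # LDA addr
            acc = mem.get(addr, 0)
        elif opcode == 0x2:                      # STA addr
            mem[addr] = acc
        elif opcode == 0x3:                      # ADD addr (16-bit wraparound)
            acc = (acc + mem.get(addr, 0)) & 0xFFFF
        elif opcode == 0x4:                      # SUB addr (16-bit wraparound)
            acc = (acc - mem.get(addr, 0)) & 0xFFFF
        elif opcode == 0x7:                      # JMP addr
            pc = addr
        elif opcode == 0x9:                      # HLT
            return mem
        elif opcode == 0xA:                      # CMP addr (assumed unsigned)
            value = mem.get(addr, 0)
            flags = {"G": acc > value, "E": acc == value, "L": acc < value}
        elif opcode in (0xB, 0xC, 0xD):          # JG / JE / JL addr
            if flags[{0xB: "G", 0xC: "E", 0xD: "L"}[opcode]]:
                pc = addr
        else:
            raise NotImplementedError(f"opcode {opcode:X} is not modelled")
    raise RuntimeError("program did not reach HLT")


if __name__ == "__main__":
    final = run(PROGRAM)
    print(f"X = {final[0x400]:04X}")
```

With the data values 7, 2, and 9 the program stores 2 in X, the minimum of the three.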

Advanced architecture

Multi-stage pipeline

• Pipelining makes it possible for processor to execute instructions in parallel

• Instruction execution divided into discrete stages

[Figure: stages S1 to S6 against cycles 1 to 12. I-1 passes through the six stages in cycles 1 to 6, and I-2 starts only afterwards, in cycles 7 to 12.]

Example of a non-pipelined processor, for example the 80386. Many wasted cycles.

Pipelined execution

• More efficient use of cycles, greater throughput of instructions: (80486 started to use pipelining)

[Figure: stages S1 to S6 against cycles 1 to 7. I-2 enters each stage one cycle after I-1, so both instructions finish by cycle 7.]

For k stages and n instructions, the number of required cycles is k + (n – 1), compared to k*n without pipelining.
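The two cycle counts can be compared directly. This is a small sketch of the formulas above; the function names are illustrative.

```python
def cycles_non_pipelined(k: int, n: int) -> int:
    """Each of the n instructions occupies all k stages before the next starts."""
    return k * n


def cycles_pipelined(k: int, n: int) -> int:
    """The first instruction takes k cycles; each later one finishes one cycle after."""
    return k + (n - 1)


if __name__ == "__main__":
    for n in (2, 10, 100):
        print(n, cycles_non_pipelined(6, n), cycles_pipelined(6, n))
```

For the six-stage, two-instruction case drawn in the slides this gives 12 cycles without pipelining and 7 with it.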


Wasted cycles (pipelined)

• When one of the stages requires two or more clock cycles, clock cycles are again wasted.

[Figure: stages S1 to S6 against cycles 1 to 11 for instructions I-1, I-2, and I-3. The execution stage (exe) takes two cycles, so each instruction waits for the one ahead of it.]

For k stages and n instructions, the number of required cycles is:

k + (2n – 1)

Superscalar

A superscalar processor has multiple execution pipelines. In the following, note that Stage S4 has left and right pipelines (u and v).

[Figure: stages S1, S2, S3, S4 (split into pipelines u and v), S5, and S6 against cycles 1 to 10 for instructions I-1 to I-4. Instructions alternate between the u and v pipelines in stage S4.]

For k stages and n instructions, the number of required cycles is:

k + n

Pentium: 2 pipelines
Pentium Pro: 3
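The cycle counts for a pipeline with a two-cycle stage and for the superscalar case can be checked against the figures. This is a sketch of the two formulas from the slides, which hold for the configurations drawn there (one stage that needs two cycles, and two pipelines for that stage); the function names are illustrative.

```python
def cycles_with_two_cycle_stage(k: int, n: int) -> int:
    """Single pipeline in which one stage needs two clock cycles."""
    return k + (2 * n - 1)


def cycles_superscalar(k: int, n: int) -> int:
    """Two pipelines (u and v) for the two-cycle stage."""
    return k + n


if __name__ == "__main__":
    print(cycles_with_two_cycle_stage(6, 3))  # three instructions, six stages
    print(cycles_superscalar(6, 4))           # four instructions, six stages
```

These match the 11 cycles and 10 cycles shown in the two diagrams.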

Reading from memory

Multiple machine cycles are required when reading from memory, because it responds much more slowly than the CPU. The four steps are:

• address placed on address bus

• Read Line (RD) set low

• CPU waits one cycle for memory to respond

• Read Line (RD) goes to 1, indicating that the data is on the data bus

[Figure: timing diagram over Cycle 1 to Cycle 4 showing the CLK, ADDR (address), RD, and DATA (data) signals]

Cache memory

• High-speed expensive static RAM both inside and outside the CPU.

Level-1 cache: inside the CPU
Level-2 cache: outside the CPU

• Cache hit: when data to be read is already in cache memory

• Cache miss: when data to be read is not in cache memory. Misses are classified as compulsory, capacity, or conflict misses.

• Cache design: cache size, n-way, block size, replacement policy
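Hits and misses can be made concrete with a tiny simulation. This is a minimal sketch of a direct-mapped (1-way) cache, chosen here only because it is the simplest design; the default sizes and the name `simulate_direct_mapped` are assumptions for illustration.

```python
def simulate_direct_mapped(addresses, num_lines=4, block_size=16):
    """Return (hits, misses) for a direct-mapped cache.

    Each cache line remembers the tag of the one memory block it holds.
    """
    lines = [None] * num_lines
    hits = misses = 0
    for addr in addresses:
        block = addr // block_size   # which memory block the byte belongs to
        index = block % num_lines    # the only line this block may occupy
        tag = block // num_lines     # distinguishes blocks sharing that line
        if lines[index] == tag:
            hits += 1
        else:
            misses += 1
            lines[index] = tag       # replace whatever was there
    return hits, misses


if __name__ == "__main__":
    print(simulate_direct_mapped([0, 4, 8, 64, 0]))
```

In this trace the first access to address 0 is a compulsory miss, addresses 4 and 8 hit in the same block, and the final access to 0 is a conflict miss because address 64 evicted it from the shared line.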

How a program runs

Multitasking

• OS can run multiple programs at the same time.

• Multiple threads of execution within the same program.

• Scheduler utility assigns a given amount of CPU time to each running program.

• Rapid switching of tasks

gives illusion that all programs are running at once
the processor must support task switching
scheduling policy: round-robin, priority
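Round-robin scheduling, one of the policies named above, can be sketched with a queue. This is a toy model, not how any particular operating system implements it; the task names, the time units, and the function `round_robin` are hypothetical.

```python
from collections import deque


def round_robin(tasks: dict[str, int], quantum: int) -> list[tuple[str, int]]:
    """Return the sequence of (task, time run) slices until all tasks finish.

    tasks maps a task name to the CPU time it still needs.
    """
    if quantum <= 0:
        raise ValueError("quantum must be positive")
    queue = deque(tasks.items())
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)
        timeline.append((name, run))
        if remaining > run:
            # Not finished: go to the back of the queue with the rest.
            queue.append((name, remaining - run))
    return timeline


if __name__ == "__main__":
    print(round_robin({"A": 3, "B": 5}, quantum=2))
```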

IA-32 Architecture

IA-32 architecture

• From 386 to the latest 32-bit processor, P4

• From programmer’s point of view, IA-32 has not changed substantially except the introduction of a set of high-performance instructions


Modes of operation

• Protected mode

native mode (Windows, Linux), full features, separate memory

• Real-address mode

native MS-DOS

• System management mode

power management, system security, diagnostics

• Virtual-8086 mode

• hybrid of Protected

• each program has its own 8086 computer

Addressable memory

• Protected mode

4 GB

32-bit address

• Real-address and Virtual-8086 modes

1 MB space

20-bit address

General-purpose registers

Named storage locations inside the CPU, optimized for speed.

32-bit General-Purpose Registers: EAX, EBX, ECX, EDX, EBP, ESP, ESI, EDI

16-bit Segment Registers: CS, SS, DS, ES, FS, GS

EIP, EFLAGS

Accessing parts of registers

• Use 8-bit name, 16-bit name, or 32-bit name

• Applies to EAX, EBX, ECX, and EDX

[Figure: EAX (32 bits) contains AX in its lower 16 bits; AX is made up of AH (upper 8 bits) and AL (lower 8 bits)]
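The way the 8-bit and 16-bit names overlap the 32-bit register can be modelled with masks and shifts. This is a sketch that treats EAX as a plain integer; the helper names `ax`, `ah`, and `al` are illustrative.

```python
def ax(eax: int) -> int:
    """Lower 16 bits of EAX."""
    return eax & 0xFFFF


def ah(eax: int) -> int:
    """Upper 8 bits of AX (bits 8 to 15 of EAX)."""
    return (eax >> 8) & 0xFF


def al(eax: int) -> int:
    """Lower 8 bits of AX (bits 0 to 7 of EAX)."""
    return eax & 0xFF


if __name__ == "__main__":
    eax = 0x12345678
    print(f"AX={ax(eax):04X} AH={ah(eax):02X} AL={al(eax):02X}")
```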


Index and base registers

• Some registers have only a 16-bit name for their lower half. The 16-bit registers are usually used only in real-address mode.

Some specialized register uses

(1 of 2)

• General-Purpose

EAX – accumulator (automatically used by division and multiplication)

ECX – loop counter

ESP – stack pointer (should never be used for arithmetic or data transfer)

ESI, EDI – index registers (used for high-speed memory transfer instructions)

EBP – extended frame pointer (stack)

Some specialized register uses

(2 of 2)

• Segment

CS – code segment
DS – data segment
SS – stack segment

ES, FS, GS - additional segments

• EIP – instruction pointer

• EFLAGS

status and control flags

each flag is a single binary bit (set or clear)

Status flags

• Carry

unsigned arithmetic out of range

• Overflow

signed arithmetic out of range

• Sign

result is negative

• Zero

result is zero

• Auxiliary Carry

carry from bit 3 to bit 4

• Parity

sum of 1 bits is an even number
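The flag definitions above can be tried out on an 8-bit addition. This is a sketch that models the flags for 8-bit operands only; the name `add8_flags` and the two-letter flag keys are chosen for illustration.

```python
def add8_flags(a: int, b: int) -> tuple[int, dict[str, bool]]:
    """Add two 8-bit values and return (result, status flags)."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    flags = {
        "CF": total > 0xFF,                               # unsigned out of range
        "OF": ((a ^ result) & (b ^ result) & 0x80) != 0,  # signed out of range
        "SF": (result & 0x80) != 0,                       # result is negative
        "ZF": result == 0,                                # result is zero
        "AF": ((a & 0xF) + (b & 0xF)) > 0xF,              # carry from bit 3 to bit 4
        "PF": bin(result).count("1") % 2 == 0,            # even number of 1 bits
    }
    return result, flags


if __name__ == "__main__":
    print(add8_flags(0x7F, 0x01))
    print(add8_flags(0xFF, 0x01))
```

Adding 7Fh and 01h overflows the signed range without a carry, while adding FFh and 01h produces a carry and a zero result without signed overflow.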


Floating-point, MMX, XMM registers

Eight 80-bit floating-point data registers

ST(0), ST(1), . . . , ST(7) arranged in a stack

used for all floating-point arithmetic

Eight 64-bit MMX registers

Eight 128-bit XMM registers for single-instruction multiple-data (SIMD) operations


IA-32 Memory Management

Real-address mode

• 1 MB RAM maximum addressable (20-bit address)

• Application programs can access any area of memory

• Single tasking

• Supported by MS-DOS operating system

Segmented memory

Segmented memory addressing: the absolute (linear) address is formed by combining a 16-bit segment value with a 16-bit offset

[Figure: the 1 MB linear address space marked from 00000 to F0000 in steps of 10000h. One segment spans 8000:0000 to 8000:FFFF, and the seg:ofs address 8000:0250 lies at offset 0250 within it.]

Calculating linear addresses

• Given a segment address, multiply it by 16 (add a hexadecimal zero), and add it to the offset

• Example: convert 08F1:0100 to a linear address

Adjusted segment value: 0 8 F 1 0
Add the offset:           0 1 0 0
Linear address:         0 9 0 1 0

• A typical program has three segments: code, data and stack. Segment registers CS, DS and SS are used to store them separately.

Example

What linear address corresponds to the segment/offset address 028F:0030?

028F0 + 0030 = 02920

Always use hexadecimal notation for addresses.

Example

What segment addresses correspond to the linear address 28F30h?

Many different segment-offset addresses can produce the linear address 28F30h. For example:

28F0:0030, 28F3:0000, 28B0:0430, . . .
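The segment and offset arithmetic in these examples can be reproduced in code. This is a sketch; wrapping the result to 20 bits is an assumption that reflects the 20-bit real-mode address, and the function name `linear_address` is illustrative.

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-address mode: segment * 16 + offset, kept to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


if __name__ == "__main__":
    print(f"{linear_address(0x08F1, 0x0100):05X}")
    print(f"{linear_address(0x028F, 0x0030):05X}")
    # Several segment:offset pairs that name the same linear address.
    for seg, ofs in [(0x28F0, 0x0030), (0x28F3, 0x0000), (0x28B0, 0x0430)]:
        print(f"{seg:04X}:{ofs:04X} -> {linear_address(seg, ofs):05X}")
```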

Protected mode

(1 of 2)

• 4 GB addressable RAM (32-bit address)

(00000000 to FFFFFFFFh)

• Each program assigned a memory partition which is protected from other programs

• Designed for multitasking

• Supported by Linux & MS-Windows


Protected mode

(2 of 2)

• Segment descriptor tables

• Program structure

code, data, and stack areas
CS, DS, SS segment descriptors
global descriptor table (GDT)

• MASM Programs use the Microsoft flat memory model

Multi-segment model

Each program has a local descriptor table (LDT), which holds a descriptor for each segment used by the program

[Figure: a Local Descriptor Table whose entries (base, limit, access) point to segments in RAM at 3000, 8000, and 26000]

    base      limit
    00026000  0010
    00008000  000A
    00003000  0002

The limit is multiplied by 1000h.

Flat segmentation model

All segments are mapped to the entire 32-bit physical address space. At least two segments are needed, one for data and one for code

global descriptor table (GDT)

Paging

• Virtual memory uses disk as part of the memory, so the total size of all running programs can be larger than physical memory

• Divides each segment into 4096-byte blocks called pages

• Page fault (supported directly by the CPU) – issued by CPU when a page must be loaded from disk

• Virtual memory manager (VMM) – OS utility that manages the loading and unloading of pages
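With 4096-byte pages, an address splits into a page number and an offset within that page. This is a minimal sketch of the arithmetic only; it does not model page tables or page faults, and the names `PAGE_SIZE` and `split_address` are illustrative.

```python
PAGE_SIZE = 4096  # bytes per page, as stated in the slide


def split_address(linear: int) -> tuple[int, int]:
    """Return (page number, offset within the page) for a linear address."""
    return linear // PAGE_SIZE, linear % PAGE_SIZE


if __name__ == "__main__":
    page, offset = split_address(0x12345)
    print(f"page {page:X}h, offset {offset:03X}h")
```

Because 4096 is 1000h, the offset is simply the low three hexadecimal digits of the address.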


Components of an IA-32 microcomputer

Components of an IA-32 Microcomputer

• Motherboard

• Video output

• Memory

• Input-output ports

Motherboard

• CPU socket

• External cache memory slots

• Main memory slots

• BIOS chips

• Sound synthesizer chip (optional)

• Video controller chip (optional)

• IDE, parallel, serial, USB, video, keyboard, joystick, network, and mouse connectors

• PCI bus connectors (expansion cards)

[Figure: Intel D850MD motherboard, with labels: dynamic RAM; Intel 486 socket; speaker; IDE drive connectors; mouse, keyboard, parallel, serial, and USB connectors; AGP slot; battery; video; power connector; memory controller hub; diskette connector; PCI slots; I/O controller; firmware hub; audio chip]

Source: Intel® Desktop Board D850MD/D850MV Technical Product Specification

Video Output

• Video controller

on motherboard, or on expansion card
AGP (accelerated graphics port)

• Video memory (VRAM)

• Video CRT Display

uses raster scanning
horizontal retrace
vertical retrace

• Direct digital LCD monitors

no raster scanning required

Memory

ROM

read-only memory

EPROM

erasable programmable read-only memory

Dynamic RAM (DRAM)

inexpensive; must be refreshed constantly

Static RAM (SRAM)

expensive; used for cache memory; no refresh required

Video RAM (VRAM)

dual ported; optimized for constant video refresh

CMOS RAM

refreshed by a battery
system setup information

Input-output ports

• USB (universal serial bus)

intelligent high-speed connection to devices
up to 12 megabits/second
USB hub connects multiple devices
enumeration: computer queries devices
supports hot connections

• Parallel

short cable, high speed
common for printers
bidirectional, parallel data transfer
Intel 8255 controller chip

Input-output ports

(cont)

• Serial

RS-232 serial port
one bit at a time
used for long cables and modems
16550 UART (universal asynchronous receiver transmitter)
programmable in assembly language

Intel microprocessor history

Early Intel microprocessors

Intel 8080
    64K addressable RAM
    8-bit registers
    CP/M operating system

Intel 8086/8088 (1978)
    IBM-PC used 8088
    1 MB addressable RAM
    16-bit registers
    16-bit data bus (8-bit for 8088)
    separate floating-point unit (8087)
    5, 6, 8, 10 MHz
    29K transistors
    used in low-cost microcontrollers now

The IBM-AT

Intel 80286 (1982)
    16 MB addressable RAM
    Protected memory
    several times faster than 8086
    introduced IDE bus architecture
    80287 floating point unit
    Up to 20MHz
    134K transistors

Intel IA-32 Family

Intel386 (1985)

    4 GB addressable RAM
    32-bit registers
    paging (virtual memory)
    Up to 33MHz

Intel486 (1989)
    instruction pipelining
    Integrated FPU
    8K cache

Pentium (1993)
    Superscalar (two parallel pipelines)

Intel P6 Family

Pentium Pro (1995)

    advanced optimization techniques in microcode
    More pipeline stages
    On-board L2 cache

Pentium II (1997)
    MMX (multimedia) instruction set
    Up to 450MHz

Pentium III (1999)
    SIMD (streaming extensions) instructions (SSE)
    Up to 1+GHz

Pentium 4 (2000)
    NetBurst micro-architecture, tuned for multimedia
    3.8+GHz

Pentium D (Dual core)

CISC and RISC

• CISC – complex instruction set

large instruction set

high-level operations (simpler for compiler?)

requires microcode interpreter (could take a long time)
examples: Intel 80x86 family

• RISC – reduced instruction set

small instruction set
simple, atomic instructions
directly executed by hardware very quickly
easier to incorporate advanced architecture design
examples:
    ARM (Advanced RISC Machines)
    DEC Alpha (now Compaq)
