IA-32 Processor Architecture

(1)

Computer Organization & Assembly Languages

Pu-Jen Cheng

Computer Organization (II)

IA-32 Processor Architecture

(2)

Materials

• Some materials used in this course are adapted from
  ◦ The slides prepared by Kip Irvine for the book Assembly Language for Intel-Based Computers, 5th Ed.
  ◦ Assembly Language & Computer Organization, NTU (http://www.csie.ntu.edu.tw/~cyy/courses/assembly/05fall/news/)
  ◦ (http://www.csie.ntu.edu.tw/~acpang/course/asm_2004)

(3)

IA-32 architecture

• From the 386 to the latest 32-bit processor, the P4
• From the programmer's point of view, IA-32 has not changed substantially, except for the introduction of a set of high-performance instructions

(4)

IA-32 Processor Architecture

• Modes of operation
• Basic execution environment
• Floating-point unit

(5)

Modes of Operation

• Protected mode
  ◦ native mode (Windows, Linux)
• Real-address mode
  ◦ native MS-DOS
• System management mode
  ◦ power management, system security, diagnostics
• Virtual-8086 mode
  ◦ hybrid of protected mode
  ◦ each program has its own 8086 computer

(6)

Multitasking (supported by protected mode)

• The OS can run multiple programs at the same time.
• Multiple threads of execution can run within the same program.
• A scheduler utility assigns a given amount of CPU time to each running program.
• Rapid switching of tasks
  ◦ gives the illusion that all programs are running at once
  ◦ the processor must support task switching
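The time-slicing idea above can be sketched as a small simulation. This is only an illustration: the slides do not say which scheduling policy is used, so round-robin is an assumption here, and `round_robin`, the task names, and the time units are hypothetical.

```python
from collections import deque


def round_robin(tasks, quantum):
    """Simulate round-robin time slicing (an assumed policy).

    tasks:   dict mapping a task name to the CPU time it still needs
    quantum: the fixed slice of CPU time given to a task per turn
    Returns the order in which slices were handed out.
    """
    queue = deque(tasks.items())
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        used = min(quantum, remaining)   # a task may finish early
        timeline.append((name, used))
        if remaining > used:             # unfinished: go to the back of the queue
            queue.append((name, remaining - used))
    return timeline


for name, used in round_robin({"A": 3, "B": 1, "C": 2}, quantum=1):
    print(f"run {name} for {used} unit(s)")
```

Because every task gets a turn before any task gets a second one, short slices make all programs appear to progress at once.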

(7)

Process

• A process is an instance of a running program.
  ◦ Not the same as "program" or "processor"
• A process provides each program with two key abstractions:
  ◦ Logical control flow
    ▪ Each program seems to have exclusive use of the CPU.
  ◦ Private address space
    ▪ Each program seems to have exclusive use of main memory.
• How are these illusions maintained?
  ◦ Process executions are interleaved (multitasking)
  ◦ Address spaces are managed by the virtual memory system

(8)

Process States

(9)

Basic Execution Environment

• Addressable memory
• General-purpose registers
• Index and base registers
• Specialized register uses
• Status flags
• Floating-point, MMX, XMM registers

(10)

Addressable Memory

• Protected mode
  ◦ 4 GB
  ◦ 32-bit address
• Real-address and Virtual-8086 modes
  ◦ 1 MB space
  ◦ 20-bit address

(11)

General-Purpose Registers

Named storage locations inside the CPU, optimized for speed.

32-bit General-Purpose Registers: EAX, EBX, ECX, EDX, EBP, ESP, ESI, EDI
16-bit Segment Registers: CS, SS, DS, ES, FS, GS
Other registers: EIP, EFLAGS

(12)

Accessing Parts of Registers

• Use 8-bit name, 16-bit name, or 32-bit name
• Applies to EAX, EBX, ECX, and EDX

EAX (32 bits)
  AX (16 bits): the lower half of EAX
    AH (8 bits): the upper byte of AX
    AL (8 bits): the lower byte of AX
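The overlap between EAX, AX, AH, and AL can be sketched with masks and shifts. This is an illustrative model only: `register_parts` and `write_al` are hypothetical helper names, not instructions.

```python
def register_parts(eax):
    """Split a 32-bit EAX value into its named sub-registers."""
    eax &= 0xFFFFFFFF        # keep only 32 bits
    ax = eax & 0xFFFF        # AX: bits 0..15
    ah = (ax >> 8) & 0xFF    # AH: bits 8..15
    al = ax & 0xFF           # AL: bits 0..7
    return {"EAX": eax, "AX": ax, "AH": ah, "AL": al}


def write_al(eax, value):
    """Model a write to AL: only the low byte of EAX changes."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


parts = register_parts(0x12345678)
print({name: hex(value) for name, value in parts.items()})
print(hex(write_al(0x12345678, 0xFF)))
```

The same pattern applies to EBX, ECX, and EDX with their BX/BH/BL, CX/CH/CL, and DX/DH/DL names.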

(13)

Index and Base Registers

• Some registers have only a 16-bit name for their lower half (ESI/SI, EDI/DI, EBP/BP, ESP/SP)
• The 16-bit registers are usually used only in real-address mode

(14)

Some specialized register uses

• General-Purpose
  ◦ EAX – accumulator (automatically used by division and multiplication)
  ◦ ECX – loop counter
  ◦ ESP – stack pointer (should never be used for arithmetic or data transfer)
  ◦ ESI, EDI – index registers (used for high-speed memory transfer instructions)
  ◦ EBP – extended frame pointer (stack)

(15)

Some specialized register uses (cont.)

• Segment
  ◦ CS – code segment
  ◦ DS – data segment
  ◦ SS – stack segment
  ◦ ES, FS, GS – additional segments
• EIP – instruction pointer
• EFLAGS
  ◦ control flags (control the CPU's operation, e.g. break, interrupt, enter 8086/protected mode)
  ◦ status flags (reflect the outcome of arithmetic and logical operations)
  ◦ each flag is a single binary bit (set or clear)

(16)

Status Flags

• Carry (CF)
  ◦ unsigned arithmetic out of range
• Overflow (OF)
  ◦ signed arithmetic out of range
• Sign (SF)
  ◦ result is negative
• Zero (ZF)
  ◦ result is zero
• Auxiliary Carry (AC; named AF in Intel's manuals)
  ◦ carry from bit 3 to bit 4 in 8-bit operand
• Parity (PF)
  ◦ the number of 1 bits in the least-significant byte is even
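As a rough illustration of the flag definitions above, the following sketch computes the six status flags for an 8-bit addition. The function name `add8_flags` is hypothetical, and the sketch covers addition only; other instructions set the flags by their own rules.

```python
def add8_flags(a, b):
    """Add two 8-bit values and report the resulting status flags."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    return {
        "result": result,
        # CF: the unsigned sum did not fit in 8 bits
        "CF": int(total > 0xFF),
        # OF: both operands have the same sign but the result's sign differs
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
        # SF: copy of the result's most significant bit
        "SF": (result >> 7) & 1,
        # ZF: the result is zero
        "ZF": int(result == 0),
        # Auxiliary carry: carry out of bit 3 into bit 4
        "AF": int((a & 0x0F) + (b & 0x0F) > 0x0F),
        # PF: even number of 1 bits in the low byte
        "PF": int(bin(result).count("1") % 2 == 0),
    }


print(add8_flags(0xFF, 0x01))   # unsigned wrap-around: CF and ZF are set
print(add8_flags(0x7F, 0x01))   # signed wrap-around: OF and SF are set
```

The two calls show why CF and OF are separate flags: the same adder output is "out of range" in one interpretation and fine in the other.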

(17)

System registers

• Accessed by the operating system kernel at the highest privilege level, not by application programs
  ◦ IDTR (Interrupt Descriptor Table Register)
  ◦ GDTR (Global Descriptor Table Register)
  ◦ LDTR (Local Descriptor Table Register)
  ◦ Task register
  ◦ Debug registers
  ◦ Control registers (e.g. task switching, paging, enabling cache memory)
  ◦ Model-specific registers (e.g. performance monitoring, checking architecture)

(18)

Floating-Point, MMX, XMM Registers

• Eight 80-bit floating-point data registers
  ◦ ST(0), ST(1), . . . , ST(7)
  ◦ arranged in a stack
  ◦ used for all floating-point arithmetic
• Eight 64-bit MMX registers
• Eight 128-bit XMM registers for single-instruction multiple-data (SIMD) operations

SIMD: a single computer instruction performs the same action (retrieve, calculate, or store) simultaneously on two or more pieces of data.
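The SIMD definition can be pictured with a small model in which one function call stands in for one packed-add instruction. This is a sketch under assumptions: a 128-bit XMM register is modeled as four independent 32-bit lanes with wrap-around arithmetic, and `packed_add` is a hypothetical name, not a real instruction mnemonic.

```python
LANES = 4                 # assumed: four 32-bit lanes in one 128-bit register
LANE_MASK = 0xFFFFFFFF    # each lane wraps around independently


def packed_add(a, b):
    """Add two 4-lane registers lane by lane, as one 'instruction'."""
    if len(a) != LANES or len(b) != LANES:
        raise ValueError("each operand must have exactly four lanes")
    return [(x + y) & LANE_MASK for x, y in zip(a, b)]


print(packed_add([1, 2, 3, 0xFFFFFFFF], [10, 20, 30, 1]))
```

A carry out of one lane never reaches its neighbor, which is what distinguishes a packed operation from one wide 128-bit addition.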

(19)

IA-32 Memory Management

• Real-address mode
• Calculating linear addresses
• Protected mode
• Multi-segment model
• Paging

(20)

Real-Address mode

• 1 MB RAM maximum addressable
• Application programs can access any area of memory
• Single tasking
• Supported by MS-DOS operating system

(21)

Segmented Memory

• Segmented memory addressing: the absolute (linear) address is formed by combining a 16-bit segment value with a 16-bit offset
• The 8086 processor has only 16-bit registers
• Each segment spans 64K units

[Figure: the 1 MB address space marked from 00000 to F0000 in steps of 10000h, with one segment highlighted from 8000:0000 to 8000:FFFF; in the example address 8000:0250, the segment (seg) is 8000 and the offset (ofs) is 0250]

(22)

Calculating Linear Addresses

• Given a segment address, multiply it by 16 (append a hexadecimal zero), and add the result to the offset
• Example: convert 08F1:0100 to a linear address

Adjusted segment value: 0 8 F 1 0
Add the offset:           0 1 0 0
Linear address:         0 9 0 1 0

• A typical program has three segments: code, data, and stack. The segment registers CS, DS, and SS hold their respective segment addresses.
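The calculation above is small enough to express directly. The sketch below is illustrative; `linear_address` is a hypothetical helper, and the `wrap` parameter reflects the assumption that an 8086 with a 20-bit address bus discards any carry beyond bit 19.

```python
def linear_address(segment, offset, wrap=True):
    """Real-address mode: linear = segment * 16 + offset."""
    linear = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    if wrap:
        linear &= 0xFFFFF    # keep 20 bits, as on a 20-bit address bus
    return linear


print(f"{linear_address(0x08F1, 0x0100):05X}")   # the example from this slide
print(f"{linear_address(0x028F, 0x0030):05X}")
```

Shifting left by four bits is the same as multiplying by 16, which is why the adjusted segment value is just the segment with a hexadecimal zero appended.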

(23)

Your turn . . .

What linear address corresponds to the segment/offset address 028F:0030?

028F0 + 0030 = 02920

Always use hexadecimal notation for addresses.

(24)

Your turn . . .

What segment addresses correspond to the linear address 28F30h?

Many different segment-offset addresses can produce the linear address 28F30h. For example:

28F0:0030, 28F3:0000, 28B0:0430, . . .
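To see how many such pairs exist, the following sketch enumerates every segment:offset combination that yields a given linear address. `segment_offset_pairs` is a hypothetical helper written for this illustration.

```python
def segment_offset_pairs(linear):
    """List every (segment, offset) with segment * 16 + offset == linear.

    Both values are limited to 16 bits, as in real-address mode.
    """
    if not 0 <= linear <= 0xFFFFF:
        raise ValueError("expected a 20-bit linear address")
    pairs = []
    # Start at the largest usable segment and walk downward; each step
    # down raises the required offset by 16.
    for segment in range(min(linear >> 4, 0xFFFF), -1, -1):
        offset = linear - (segment << 4)
        if offset > 0xFFFF:      # offset no longer fits in 16 bits
            break
        pairs.append((segment, offset))
    return pairs


pairs = segment_offset_pairs(0x28F30)
print(len(pairs))
print([f"{s:04X}:{o:04X}" for s, o in pairs[:3]])
```

Because consecutive segments start only 16 bytes apart while each spans 64K, segments overlap heavily, and a single byte of memory is reachable through many segment values.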

(25)

Protected Mode

• 4 GB addressable RAM
  ◦ (00000000h to FFFFFFFFh)
• Each program is assigned a memory partition that is protected from other programs
• Designed for multitasking
• Supported by Linux & MS-Windows

(26)

Protected Mode (cont.)

• Segments
  ◦ variable-sized areas of memory for code & data
• Segment descriptor
  ◦ 64-bit value identifying and describing a single memory segment
  ◦ contains the segment's base address, access rights, size limit, type, and usage
• Segment descriptor table
  ◦ contains segment descriptors, by which the OS keeps track of the locations of individual program segments
• Segment registers
  ◦ point into segment descriptor tables
• Program structure
  ◦ code, data, and stack areas
  ◦ CS, DS, SS segment descriptors
  ◦ global descriptor table (GDT)
• MASM programs use the Microsoft flat memory model
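To make the 64-bit segment descriptor more concrete, here is a sketch that unpacks one into its fields. The bit positions follow the IA-32 descriptor layout as commonly documented and should be checked against the Intel manuals before being relied on. The function name and the sample value `0x00CF9A000000FFFF` (a typical flat-model code descriptor) are assumptions for illustration, not taken from these slides.

```python
def decode_descriptor(raw):
    """Unpack a 64-bit segment descriptor into its main fields."""
    # The limit and base are each split across two parts of the descriptor.
    limit = (raw & 0xFFFF) | (((raw >> 48) & 0xF) << 16)
    base = ((raw >> 16) & 0xFFFFFF) | (((raw >> 56) & 0xFF) << 24)
    access = (raw >> 40) & 0xFF     # present bit, privilege level, type
    flags = (raw >> 52) & 0xF       # granularity and default-size bits
    granular = bool(flags & 0x8)    # G = 1: limit counts 4 KB units
    size = (limit + 1) << 12 if granular else limit + 1
    return {"base": base, "limit": limit, "access": access,
            "flags": flags, "size_bytes": size}


fields = decode_descriptor(0x00CF9A000000FFFF)
print({name: hex(value) for name, value in fields.items()})
```

With base 0 and a 4 KB-granular limit of FFFFFh, this descriptor covers the whole 4 GB space, which is the situation the flat memory model relies on.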

(27)

Flat Segment Model

• Single global descriptor table (GDT) whose base address is in GDTR
• Created when the OS switches the processor into protected mode during boot-up
• All segments are mapped to the entire 32-bit address space

[Figure: a segment descriptor in the Global Descriptor Table (fields: base address, limit, access) with base address 00000000 and limit 00040, mapping onto physical RAM from 00000000 to 00040000; the remainder of the address space, up to FFFFFFFF (4 GB), is not used]

(28)

Multi-Segment Model

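• Each program has a local descriptor table (LDT)
  ◦ holds a descriptor for each segment used by the program

[Figure: a Local Descriptor Table with columns base, limit, and access; its entries have base addresses 00026000, 00008000, and 00003000 with limits 0010, 000A, and 0002, and each entry points to a segment in RAM at 26000, 8000, and 3000]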

(29)

Translating Addresses (See Ch11.4)

• The IA-32 processor uses a one- or two-step process to convert a variable's logical address into a unique memory location.
• The first step combines a segment value (16-bit, from a segment register) with a variable's offset (32-bit) to create a linear address.
• The second, optional step, called page translation, converts the linear address to a physical address.

(30)

Converting Logical to Linear Address

The segment selector (16-bit) points to a segment descriptor, which contains the base address of a memory segment. The 32-bit offset from the logical address is added to the segment's base address, generating a 32-bit linear address.

[Figure: the logical address consists of a selector and an offset; the selector picks a segment descriptor from the descriptor table, whose base address is held in GDTR/LDTR; the descriptor's base address is added (+) to the offset to produce the linear address]
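The selector-plus-offset step can be sketched as a table lookup followed by an addition. This is a simplified model: the table is keyed by the whole selector value, the limit check is reduced to a single comparison, and the `Descriptor` type, the table contents, and the limits are hypothetical values chosen for illustration.

```python
from typing import NamedTuple


class Descriptor(NamedTuple):
    base: int     # linear address where the segment starts
    limit: int    # highest valid offset within the segment


def logical_to_linear(selector, offset, table):
    """Convert selector:offset to a 32-bit linear address."""
    descriptor = table[selector]
    if offset > descriptor.limit:
        raise ValueError("offset exceeds the segment limit")
    return (descriptor.base + offset) & 0xFFFFFFFF


# Hypothetical descriptor table for the example.
table = {
    0x0008: Descriptor(base=0x0001A000, limit=0xFFFF),
    0x0010: Descriptor(base=0x0002A000, limit=0x0FFF),
}

print(hex(logical_to_linear(0x0008, 0x2CD3, table)))
```

In the flat model every descriptor has base 0, so the linear address simply equals the offset.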

(31)

Indexing into a Descriptor Table

• Each segment selector indexes into the program's local descriptor table (LDT)
• Each table entry is mapped to a linear address:

[Figure: three logical addresses, 0008:00002CD3 (CS:IP), 0010:000001B6 (DS:offset), and 0018:0000003A (SS:ESP); the LDTR register locates the Local Descriptor Table, whose entries at indexes 00, 08, 10, and 18 hold 00003000, 0001A000, 0002A000, and 001A0000; each selected entry points into the linear address space (DRAM), part of which is unused]
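The index labels 00, 08, 10, 18 in the figure step by 8 because each descriptor occupies 8 bytes. As a sketch, a 16-bit selector can be split into its fields as follows. The field layout (13-bit index, one table-indicator bit, two privilege bits) is the commonly documented IA-32 format and should be verified against the Intel manuals; `decode_selector` is a hypothetical helper.

```python
DESCRIPTOR_SIZE = 8   # bytes per descriptor table entry


def decode_selector(selector):
    """Split a 16-bit segment selector into its fields."""
    selector &= 0xFFFF
    index = selector >> 3            # which table entry
    return {
        "index": index,
        "byte_offset": index * DESCRIPTOR_SIZE,   # position within the table
        "table_indicator": (selector >> 2) & 1,   # 0 = GDT, 1 = LDT
        "rpl": selector & 0x3,                    # requested privilege level
    }


for selector in (0x0008, 0x0010, 0x0018):
    print(hex(selector), decode_selector(selector))
```

When the three low bits are zero, the selector value equals the entry's byte offset in the table, which matches the labels used in the figure.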

(32)

Virtual Memory Concepts

• Implements a mapping function
  ◦ Between virtual address space and physical address space
• Examples
  ◦ PowerPC
    ▪ 48-bit virtual address
    ▪ 32-bit physical address
  ◦ Pentium
    ▪ Both are 32-bit addresses
    ▪ But uses segmentation

(33)

Virtual Memory Concepts (cont.)

• Virtual address space is divided into fixed-size chunks
  ◦ These chunks are called virtual pages
  ◦ Virtual address is divided into
    ▪ Virtual page number
    ▪ Byte offset into a virtual page
  ◦ Physical memory is also divided into similar-size chunks
    ▪ These chunks are referred to as physical pages
    ▪ Physical address is divided into
      ▪ Physical page number
      ▪ Byte offset within a page

(34)

Virtual Memory Concepts (cont.)

• Page size plays a role analogous to cache line size (the unit of transfer between levels)
• Typical page size
  ◦ 4 KB
• Example
  ◦ 32-bit virtual address to 24-bit physical address
  ◦ If page size is 4 KB
    ▪ Page offset: 12 bits
    ▪ Virtual page number: 20 bits
    ▪ Physical page number: 12 bits
  ◦ Virtual memory maps 2^20 virtual pages to 2^12 physical pages
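The bit counts in this example follow from the page size alone, as the sketch below shows. The helper names `split_address` and `page_count` are hypothetical.

```python
PAGE_SIZE = 4 * 1024                       # 4 KB pages
OFFSET_BITS = PAGE_SIZE.bit_length() - 1   # log2(4096) = 12


def split_address(address, address_bits):
    """Return (page number, byte offset) for an address of the given width."""
    address &= (1 << address_bits) - 1
    return address >> OFFSET_BITS, address & (PAGE_SIZE - 1)


def page_count(address_bits):
    """Number of pages in an address space of the given width."""
    return 1 << (address_bits - OFFSET_BITS)


page, offset = split_address(0x12345678, 32)
print(f"virtual page {page:#x}, offset {offset:#x}")
print(page_count(32), "virtual pages,", page_count(24), "physical pages")
```

The offset passes through translation unchanged; only the 20-bit virtual page number is mapped to a 12-bit physical page number.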

(35)

Virtual Memory Concepts (cont.)

An example mapping of a 32-bit virtual address to a 24-bit physical address

(36)

Virtual Memory Concepts (cont.)

Virtual-to-physical address mapping

(37)

Virtual Memory Concepts (cont.)

• A virtual page can be
  ◦ In main memory
  ◦ On disk
• A page fault occurs if the page is not in memory
  ◦ Like a cache miss
• The OS takes control and transfers the page

(38)

Paging

• Supported directly by the CPU
• Divides each segment into 4096-byte blocks called pages
• The combined size of all running programs can be larger than physical memory
• Part of a running program is in memory, part is on disk
• Virtual memory manager (VMM): OS utility that manages the loading and unloading of pages
• As the program runs, the processor selectively unloads inactive pages from memory and loads other pages that are immediately required.

(39)

Paging (cont.)

• The OS maintains a page directory and page tables
• Page translation: the CPU converts the linear address into a physical address
• Page fault: issued by the CPU when a needed page is not in memory; the CPU interrupts the program
• The OS copies the page into memory, and the program resumes execution

(40)

Page Translation

A linear address is divided into a page directory field, page table field, and page frame offset. The CPU uses all three to calculate the physical address.

[Figure: a 32-bit linear address split into Directory (10 bits), Table (10 bits), and Offset (12 bits); CR3 locates the page directory, the directory entry locates a page table, the page-table entry locates the page frame, and the offset selects the physical address within that frame]
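The two-level lookup can be sketched as follows. This is a simplified model under stated assumptions: the page directory and page tables are plain dictionaries rather than in-memory arrays located through CR3, entries hold only a frame number (no present or permission bits), and the names `PageFault`, `split_linear`, and `translate` are hypothetical.

```python
class PageFault(Exception):
    """Raised when a directory or page-table entry is missing."""


def split_linear(linear):
    """Split a 32-bit linear address into its 10/10/12-bit fields."""
    linear &= 0xFFFFFFFF
    directory = (linear >> 22) & 0x3FF   # top 10 bits
    table = (linear >> 12) & 0x3FF       # middle 10 bits
    offset = linear & 0xFFF              # low 12 bits
    return directory, table, offset


def translate(linear, page_directory):
    """Walk directory -> page table -> frame, then add the offset."""
    directory, table, offset = split_linear(linear)
    page_table = page_directory.get(directory)
    if page_table is None:
        raise PageFault(f"no page table for directory entry {directory}")
    frame = page_table.get(table)
    if frame is None:
        raise PageFault(f"page {table} of table {directory} is not in memory")
    return (frame << 12) | offset


# Hypothetical mapping: directory entry 1, table entry 3 -> frame 0x2A.
page_directory = {1: {3: 0x0002A}}
print(hex(translate(0x00403004, page_directory)))
```

On a real system the `PageFault` case corresponds to the CPU interrupting the program so that the OS can load the missing page and retry the access.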

(41)

Intel Microprocessor History

• Intel 8086, 80286
• IA-32 processor family
• P6 processor family
• CISC and RISC

(42)

Early Intel Microprocessors

• Intel 8080
  ◦ 64K addressable RAM
  ◦ 8-bit registers
  ◦ CP/M operating system
  ◦ S-100 bus architecture
  ◦ 8-inch floppy disks!
• Intel 8086/8088
  ◦ IBM-PC used the 8088
  ◦ 1 MB addressable RAM
  ◦ 16-bit registers
  ◦ 16-bit data bus (8-bit for 8088)
  ◦ separate floating-point unit (8087)

(43)

The IBM-AT

• Intel 80286
  ◦ 16 MB addressable RAM
  ◦ Protected memory
  ◦ several times faster than 8086
  ◦ introduced IDE bus architecture
  ◦ 80287 floating-point unit

(44)

Intel IA-32 Family

• Intel386
  ◦ 4 GB addressable RAM, 32-bit registers, paging (virtual memory)
• Intel486
  ◦ instruction pipelining
• Pentium
  ◦ superscalar, 32-bit address bus, 64-bit internal data path

(45)

Intel P6 Family

• Pentium Pro
  ◦ advanced optimization techniques in microcode
• Pentium II
  ◦ MMX (multimedia) instruction set
• Pentium III
  ◦ SIMD (streaming extensions) instructions
• Pentium 4 and Xeon
  ◦ Intel NetBurst micro-architecture, tuned for multimedia

(46)

What's Next

• General Concepts
• IA-32 Processor Architecture
• IA-32 Memory Management
• Components of an IA-32 Microcomputer
• Input-Output System

(48)

Components of an IA-32 Microcomputer

• Motherboard
• Video output
• Memory
• Input-output ports

(49)

Motherboard

• CPU socket
• External cache memory slots
• Main memory slots
• BIOS chips
• Sound synthesizer chip (optional)
• Video controller chip (optional)
• IDE, parallel, serial, USB, video, keyboard, joystick, network, and mouse connectors
• PCI bus connectors (expansion cards)

(50)

Intel D850MD Motherboard

[Figure: photograph of the motherboard with labeled components: Pentium 4 socket; mouse, keyboard, parallel, serial, and USB connectors; AGP slot; Video; memory controller hub; PCI slots; Audio chip; dynamic RAM; Speaker; IDE drive connectors; Battery; Power connector; Diskette connector; I/O Controller; Firmware hub]

Source: Intel® Desktop Board D850MD/D850MV Technical Product Specification

(51)

Video Output

• Video controller
  ◦ on motherboard, or on expansion card
  ◦ AGP (accelerated graphics port technology)
• Video memory (VRAM)
• Video CRT display
  ◦ uses raster scanning
  ◦ horizontal retrace
  ◦ vertical retrace
• Direct digital LCD monitors
  ◦ no raster scanning required

(52)

Sample Video Controller (ATI Corp.)

• 128-bit 3D graphics performance powered by RAGE™ 128 PRO
• Intelligent TV-Tuner with Digital VCR
• TV-ON-DEMAND™
• Interactive Program Guide
• Still image and MPEG-2 motion video capture
• Video editing
• Hardware DVD video playback
• Video output to TV or VCR

(53)

Memory

• ROM
  ◦ read-only memory
• EPROM
  ◦ erasable programmable read-only memory
• Dynamic RAM (DRAM)
  ◦ inexpensive; must be refreshed constantly
• Static RAM (SRAM)
  ◦ expensive; used for cache memory; no refresh required
• Video RAM (VRAM)
  ◦ dual ported; optimized for constant video refresh
• CMOS RAM
  ◦ complementary metal-oxide semiconductor
  ◦ system setup information
• See: Intel platform memory (Intel technology brief: link address may change)

(54)

Input-Output Ports

• USB (universal serial bus)
  ◦ intelligent high-speed connection to devices
  ◦ up to 480 megabits/second (USB version 2.0)
  ◦ USB hub connects multiple devices
  ◦ enumeration: computer queries devices
  ◦ supports hot connections
• Parallel
  ◦ short cable, high speed
  ◦ common for printers
  ◦ bidirectional, parallel data transfer
  ◦ Intel 8255 controller chip

(55)

Input-Output Ports (cont)

• Serial
  ◦ RS-232 serial port
  ◦ one bit at a time
  ◦ uses long cables and modems
  ◦ 16550 UART (universal asynchronous receiver transmitter)
  ◦ programmable in assembly language

(56)

What's Next

• General Concepts
• IA-32 Processor Architecture
• IA-32 Memory Management
• Components of an IA-32 Microcomputer
• Input-Output System

(57)

Levels of Input-Output

• Level 3: Call a library function (C++, Java)
  ◦ easy to do; abstracted from hardware; details hidden
  ◦ slowest performance
• Level 2: Call an operating system function
  ◦ specific to one OS; device-independent
  ◦ medium performance
• Level 1: Call a BIOS (basic input-output system) function
  ◦ may produce different results on different systems
  ◦ knowledge of hardware required
  ◦ usually good performance
• Level 0: Communicate directly with the hardware
  ◦ may not be allowed by some operating systems

(58)

Displaying a String of Characters

When a HLL program displays a string of characters, the following steps take place:

[Figure: Application Program → (Level 3) → OS Function (Level 2) → BIOS Function (Level 1) → Hardware (Level 0)]

(59)

ASM Programming levels

ASM programs can perform input-output at each of the following levels:

[Figure: ASM Program → OS Function (Level 2), BIOS Function (Level 1), or Hardware (Level 0)]
