IA-32 Processor Architecture



Computer Organization & Assembly Languages

Pu-Jen Cheng

Computer Organization (II)

IA-32 Processor Architecture



„ Some materials used in this course are adapted from

¾ The slides prepared by Kip Irvine for the book, Assembly Language for Intel-Based Computers, 5th Ed.

¾ Assembly Language & Computer Organization, NTU (http://www.csie.ntu.edu.tw/~cyy/courses/assembly/05fall/news/)



IA-32 architecture


From 386 to the latest 32-bit processor, P4


From the programmer’s point of view, IA-32 has not changed substantially, except for the introduction of a set of high-performance instructions


IA-32 Processor Architecture


Modes of operation


Basic execution environment


Floating-point unit


Modes of Operation


Protected mode

¾ native mode (Windows, Linux)


Real-address mode

¾ native MS-DOS


System management mode

¾ power management, system security, diagnostics

• Virtual-8086 mode

• hybrid of Protected mode

• each program has its own 8086 computer


Multitasking (supported by protected mode)


OS can run multiple programs at the same time.


Multiple threads of execution within the same program.


Scheduler utility assigns a given amount of CPU time to each running program.


Rapid switching of tasks

¾ gives illusion that all programs are running at once

¾ the processor must support task switching.
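The time-slicing idea above can be sketched as a small round-robin simulation. This is only an illustration: the function name `round_robin`, the program names, and the unit of "time" are assumptions made for the example, and a real scheduler also has to save and restore processor state on every switch.

```python
from collections import deque


def round_robin(work, time_slice):
    """Simulate a scheduler that gives each program a fixed time slice.

    work maps a (hypothetical) program name to the CPU time units it needs.
    Returns the order in which programs ran, as (name, units_run) pairs.
    """
    queue = deque(work.items())
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        ran = min(time_slice, remaining)
        timeline.append((name, ran))
        if remaining > ran:
            # Not finished: go to the back of the queue (a task switch).
            queue.append((name, remaining - ran))
    return timeline


for name, ran in round_robin({"A": 3, "B": 5, "C": 2}, time_slice=2):
    print(name, ran)
```

Because the slices are short, each program makes progress in turn, which is what produces the illusion that all of them run at once.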




A process is an instance of a running program.

¾ Not the same as “program” or “processor”


Process provides each program with two key abstractions:


¾ Logical control flow

„ Each program seems to have exclusive use of the CPU.

¾ Private address space

„ Each program seems to have exclusive use of main memory.


How are these illusions maintained?

¾ Process executions interleaved (multitasking)

¾ Address spaces managed by virtual memory system


Process States


Basic Execution Environment


Addressable memory


General-purpose registers


Index and base registers


Specialized register uses


Status flags




Floating-point, MMX, XMM registers


Addressable Memory


Protected mode

¾ 4 GB

¾ 32-bit address


Real-address and Virtual-8086 modes


¾ 1 MB space

¾ 20-bit address
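The two sizes follow directly from the address widths: an n-bit address can name 2^n distinct bytes. A minimal check, with the helper name `addressable_bytes` chosen only for this example:

```python
MB = 1 << 20  # 2^20 bytes
GB = 1 << 30  # 2^30 bytes


def addressable_bytes(address_bits):
    """Number of distinct byte addresses for a given address width."""
    return 1 << address_bits


# Protected mode: 32-bit address
print(addressable_bytes(32) // GB, "GB")
# Real-address and virtual-8086 modes: 20-bit address
print(addressable_bytes(20) // MB, "MB")
```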


General-Purpose Registers


32-bit General-Purpose Registers


Named storage locations inside the CPU, optimized for speed.




16-bit Segment Registers





Accessing Parts of Registers


Use 8-bit name, 16-bit name, or 32-bit name


Applies to EAX, EBX, ECX, and EDX


[Figure: EAX (32 bits) contains AX (16 bits) as its lower half; AX is divided into AH and AL (8 bits + 8 bits)]
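The relationship between the 32-bit, 16-bit, and 8-bit names can be sketched with masks and shifts. The code below models EAX as a plain integer; the function names `split_eax` and `write_al` are assumptions made for the illustration, while AX, AH, and AL are the standard names for the lower half of EAX and its two bytes.

```python
def split_eax(eax):
    """Return the sub-register views of a 32-bit EAX value."""
    eax &= 0xFFFFFFFF
    ax = eax & 0xFFFF            # lower 16 bits
    return {
        "EAX": eax,
        "AX": ax,
        "AH": (ax >> 8) & 0xFF,  # high byte of AX
        "AL": ax & 0xFF,         # low byte of AX
    }


def write_al(eax, value):
    """Writing AL changes only the lowest byte of EAX."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


parts = split_eax(0x12345678)
print({name: hex(v) for name, v in parts.items()})
print(hex(write_al(0x12345678, 0xFF)))
```

The same layout applies to EBX, ECX, and EDX with the names BX/BH/BL, CX/CH/CL, and DX/DH/DL.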


Index and Base Registers

„ Some registers have only a 16-bit name for their lower half

„ The 16-bit registers are usually used only in real-address mode


Some specialized register uses



¾ EAX – accumulator (automatically used by division and multiplication)

¾ ECX – loop counter

¾ ESP – stack pointer (should never be used for arithmetic or data transfer)

¾ ESI, EDI – index registers (used for high-speed memory transfer instructions)

¾ EBP – extended frame pointer (stack)


Some specialized register uses (cont.)



¾ CS – code segment

¾ DS – data segment

¾ SS – stack segment

¾ ES, FS, GS - additional segments


EIP – instruction pointer



EFLAGS

¾ control flags (control the CPU’s operation, e.g. break, interrupt, enter 8086/protected mode)

¾ status flags (reflect the outcome of arithmetic and logical operations)

¾ each flag is a single binary bit (set or clear)


Status Flags

Carry (CF)

¾ unsigned arithmetic out of range

Overflow (OF)

¾ signed arithmetic out of range

Sign (SF)

¾ result is negative


Zero (ZF)

¾ result is zero

Auxiliary Carry (AC)

¾ carry from bit 3 to bit 4 in 8-bit operand

Parity (PF)

¾ sum of 1 bits in least-significant byte is an even number
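As a rough illustration of how these flags are derived, the sketch below computes them for an 8-bit addition. It is a software model written for this example, not a description of the hardware; the function name `add8_flags` is an assumption, and the flag abbreviations follow the list above.

```python
def add8_flags(a, b):
    """Add two 8-bit values and report the resulting status flags."""
    a &= 0xFF
    b &= 0xFF
    full = a + b
    result = full & 0xFF
    return {
        "result": result,
        # Carry: the unsigned sum does not fit in 8 bits.
        "CF": int(full > 0xFF),
        # Overflow: both operands have the same sign, the result differs.
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
        # Sign: bit 7 of the result is set.
        "SF": int((result & 0x80) != 0),
        # Zero: the result is zero.
        "ZF": int(result == 0),
        # Auxiliary carry: carry out of bit 3 into bit 4.
        "AC": int(((a & 0x0F) + (b & 0x0F)) > 0x0F),
        # Parity: even number of 1 bits in the result byte.
        "PF": int(bin(result).count("1") % 2 == 0),
    }


print(add8_flags(0xFF, 0x01))  # unsigned out of range
print(add8_flags(0x7F, 0x01))  # signed out of range
```

The two calls show why Carry and Overflow are separate flags: FFh + 01h is out of range as an unsigned sum but fine as a signed one (-1 + 1), while 7Fh + 01h is the opposite case.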


System registers


Accessed by operating system kernel at highest privilege level, not by application programs

¾ IDTR (Interrupt Descriptor Table Register)

¾ GDTR (Global Descriptor Table Register)

¾ LDTR (Local Descriptor Table Register)


¾ Task register

¾ Debug registers

¾ Control registers (e.g. task switching, paging, enabling cache memory)

¾ Model-specific registers (e.g. performance monitoring, checking architecture)


Floating-Point, MMX, XMM Registers

„ Eight 80-bit floating-point data registers

¾ ST(0), ST(1), . . . , ST(7)

¾ arranged in a stack

¾ used for all floating-point arithmetic

„ Eight 64-bit MMX registers

„ Eight 128-bit XMM registers for single-instruction multiple-data (SIMD) operations

SIMD: a single computer instruction performs the same action (retrieve, calculate, or store) simultaneously on two or more pieces of data
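To make the SIMD idea concrete, the sketch below models a 128-bit register as a Python integer holding four independent 32-bit lanes and adds two such registers lane by lane. The helper names (`pack`, `unpack`, `packed_add`) and the choice of four 32-bit lanes are assumptions made for the illustration; real XMM instructions support several lane widths.

```python
LANES = 4
LANE_BITS = 32
LANE_MASK = (1 << LANE_BITS) - 1


def pack(values):
    """Pack four 32-bit values into one 128-bit integer (lane 0 lowest)."""
    reg = 0
    for i, v in enumerate(values):
        reg |= (v & LANE_MASK) << (i * LANE_BITS)
    return reg


def unpack(reg):
    """Split a 128-bit integer back into its four 32-bit lanes."""
    return [(reg >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def packed_add(x, y):
    """One 'instruction': add all four lanes; each lane wraps on its own."""
    return pack([a + b for a, b in zip(unpack(x), unpack(y))])


x = pack([1, 2, 3, 0xFFFFFFFF])
y = pack([10, 20, 30, 1])
print(unpack(packed_add(x, y)))
```

Note that the overflow in the last lane does not carry into its neighbour; each lane behaves like a separate small register.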


IA-32 Memory Management


Real-address mode


Calculating linear addresses


Protected mode


Multi-segment model




Real-Address mode


1 MB RAM maximum addressable


Application programs can access any area of memory


Single tasking


Supported by MS-DOS operating system


Segmented Memory

„ Segmented memory addressing: absolute (linear) address is a combination of a 16-bit segment value added to a 16-bit offset

„8086 processor only has 16-bit registers

[Figure: the 1 MB address space marked at 64K boundaries 00000, 10000, ..., F0000; the segment-offset address 8000:0250 lies at offset 0250 within the one segment that starts at linear address 80000]

Segment: 64K units


Calculating Linear Addresses


Given a segment address, multiply it by 16 (add a hexadecimal zero), and add it to the offset


Example: convert 08F1:0100 to a linear address

Adjusted Segment value:  0 8 F 1 0
Add the offset:            0 1 0 0
Linear address:          0 9 0 1 0


A typical program has three segments: code, data, and stack. Segment registers CS, DS, and SS are used to store them separately.


Your turn . . .

What linear address corresponds to the segment/offset address 028F:0030?

028F0 + 0030 = 02920

Always use hexadecimal notation for addresses.


Your turn . . .

What segment addresses correspond to the linear address 28F30h?

Many different segment-offset addresses can produce the linear address 28F30h. For example:

28F0:0030, 28F3:0000, 28B0:0430, . . .
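The calculations in this section can be checked with a short script. It is a sketch under simple assumptions: the function names are made up for the example, and address wraparound above 1 MB is not modelled.

```python
def linear_address(segment, offset):
    """Real-address mode: shift the segment left 4 bits (x16), add the offset."""
    return (segment << 4) + offset


def equivalent_pairs(linear):
    """All segment:offset pairs (16-bit each) that name the same linear address."""
    pairs = []
    for seg in range(0x10000):
        off = linear - (seg << 4)
        if 0 <= off <= 0xFFFF:
            pairs.append((seg, off))
    return pairs


print(f"{linear_address(0x08F1, 0x0100):05X}")  # 08F1:0100
print(f"{linear_address(0x028F, 0x0030):05X}")  # 028F:0030

pairs = equivalent_pairs(0x28F30)
print(len(pairs), "pairs, e.g.",
      ", ".join(f"{s:04X}:{o:04X}" for s, o in pairs[-3:]))
```

Since segments start every 16 bytes but span 64K, a given linear address is usually covered by many overlapping segments, which is why the second exercise has many valid answers.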


Protected Mode


4 GB addressable RAM

¾ (00000000 to FFFFFFFFh)


Each program assigned a memory partition which is protected from other programs



Designed for multitasking


Supported by Linux & MS-Windows


Protected Mode (cont.)

„ Segments

¾ variable-sized areas of memory for code & data

„ Segment descriptor

¾ 64-bit value identifying and describing a single memory segment

¾ contains segment’s base address, access rights, size limit, type, and usage

„ Segment descriptor tables

¾ contain segment descriptors by which the OS keeps track of the locations of individual program segments

„ Segment registers

¾ point to segment descriptor tables

„ Program structure

¾ code, data, and stack areas

¾ CS, DS, SS segment descriptors

¾ global descriptor table (GDT)

„ MASM Programs use the Microsoft flat memory model


Flat Segment Model

„ Single global descriptor table (GDT) whose base address is in GDTR

„ Created when OS switches the processor into protected mode during boot up

„ All segments mapped to entire 32-bit address space


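[Figure: flat segment model. A segment descriptor in the Global Descriptor Table (fields: base address, limit, access) maps the segment onto the 32-bit address space; part of that space is physical RAM and the rest is not used.]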




Multi-Segment Model

„ Each program has a local descriptor table (LDT)

¾ holds descriptor for each segment used by the program


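[Figure: Local Descriptor Table with columns base, limit, and access. Entries with base 00003000 (limit 0002), base 00008000 (limit 000A), and base 00026000 (limit 0010) point to segments starting at 3000, 8000, and 26000 in RAM.]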


Translating Addresses (See Ch11.4)


The IA-32 processor uses a one- or two-step process to convert a variable's logical address into a unique memory location.


The first step combines a segment value (16-bit, segment register) with a variable’s offset (32-bit) to create a linear address.


The second optional step, called page translation, converts a linear address to a physical address.


Converting Logical to Linear Address

The segment selector (16-bit) points to a segment descriptor, which contains the base address of a memory segment. The 32-bit offset from the logical address is added to the segment’s base address, generating a 32-bit linear address.

[Figure: a logical address consists of a selector and an offset. The selector picks a segment descriptor from the descriptor table, and the descriptor’s base address is added to the offset to form the linear address. GDTR/LDTR contains the base address of the descriptor table.]


Indexing into a Descriptor Table

„ Each segment descriptor indexes into the program's local descriptor table (LDT)

„ Each table entry is mapped to a linear address:

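[Figure: the LDTR register locates the Local Descriptor Table, whose entries at index 00, 08, 10, and 18 hold the base addresses 00003000, 0001A000, 0002A000, and 001A0000. The example logical addresses 0008:00002CD3, 0010:000001B6, and 0018:0000003A (segment value from a register such as DS, offset from the instruction) map into the linear address space; part of that space is unused.]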
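The lookup can be sketched as a table indexed by the segment value, followed by one addition. The table contents below are read from the figure on this slide, and the pairing of each index with its base address is an assumed reading of that figure. The model is simplified in the same way the slide is: it uses the segment value directly as the table index, ignores the limit and access fields, and the names `LDT` and `logical_to_linear` are made up for the example.

```python
# Index (as shown in the figure) -> segment base address.
# The index/base pairing is an assumption based on the figure's layout.
LDT = {
    0x00: 0x00003000,
    0x08: 0x0001A000,
    0x10: 0x0002A000,
    0x18: 0x001A0000,
}


def logical_to_linear(selector, offset, table=LDT):
    """Look up the segment base for the selector and add the 32-bit offset."""
    if selector not in table:
        raise KeyError(f"no descriptor at index {selector:04X}")
    return (table[selector] + offset) & 0xFFFFFFFF


for sel, off in [(0x0018, 0x0000003A), (0x0010, 0x000001B6), (0x0008, 0x00002CD3)]:
    print(f"{sel:04X}:{off:08X} -> {logical_to_linear(sel, off):08X}")
```

A real descriptor lookup would also compare the offset against the segment limit and check access rights before producing the linear address.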



Virtual Memory Concepts

„ Implements a mapping function

¾ Between virtual address space and physical address space


„ Examples

¾ PowerPC

„ 48-bit virtual address

„ 32-bit physical address

¾ Pentium

„ Both are 32-bit addresses

„ But uses



Virtual Memory Concepts (cont.)


Virtual address space is divided into fixed-size chunks

¾ These chunks are called virtual pages

¾ Virtual address is divided into

„ Virtual page number

„ Byte offset into a virtual page

¾ Physical memory is also divided into similar-size chunks

„ These chunks are referred to as physical pages

„ Physical address is divided into

„ Physical page number

„ Byte offset within a page


Virtual Memory Concepts (cont.)


Page size is similar to cache line size


Typical page size

„ 4 KB




¾ 32-bit virtual address to 24-bit physical address

¾ If page size is 4 KB

„ Page offset: 12 bits

„ Virtual page number: 20 bits

„ Physical page number: 12 bits

¾ Virtual memory maps 2^20 virtual pages to 2^12 physical pages
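The bit counts in this example can be verified with a few shifts and masks. The sketch assumes the 4 KB page size and the 32-bit virtual / 24-bit physical widths given above; the function names and the single page-table entry are hypothetical values chosen for the illustration.

```python
PAGE_SIZE = 4096
OFFSET_BITS = PAGE_SIZE.bit_length() - 1   # 12 bits for a 4 KB page
VIRTUAL_BITS = 32
PHYSICAL_BITS = 24


def split_virtual(va):
    """Split a virtual address into (virtual page number, byte offset)."""
    return va >> OFFSET_BITS, va & (PAGE_SIZE - 1)


def to_physical(va, page_table):
    """Replace the virtual page number with the mapped physical page number."""
    vpn, offset = split_virtual(va)
    return (page_table[vpn] << OFFSET_BITS) | offset


print(OFFSET_BITS, VIRTUAL_BITS - OFFSET_BITS, PHYSICAL_BITS - OFFSET_BITS)

# Hypothetical mapping: virtual page 0x12345 lives in physical page 0xABC.
page_table = {0x12345: 0xABC}
print(hex(to_physical(0x12345678, page_table)))
```

The offset passes through unchanged; only the page number is translated, which is why the virtual and physical pages must be the same size.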


Virtual Memory Concepts (cont.)

An example mapping of 32-bit virtual

address to 24-bit physical address


Virtual Memory Concepts (cont.)

Virtual to physical address mapping


Virtual Memory Concepts (cont.)

„ A virtual page can be

¾ In main memory

¾ On disk

„ Page fault occurs if the page is not in memory

¾ Like a cache miss

„ OS takes control and transfers the page



Paging

„ Supported directly by the CPU

„ Divides each segment into 4096-byte blocks called pages

„ Sum of all programs can be larger than physical memory

„ Part of running program is in memory, part is on disk

„ Virtual memory manager (VMM) – OS utility that manages the loading and unloading of pages

„ As the program runs, the processor selectively unloads inactive pages from memory and loads other pages that are immediately required.


Paging (cont.)


OS maintains page directory and page tables


Page translation: CPU converts the linear address into a physical address


Page fault: issued by the CPU when a needed page is not in memory; the CPU interrupts the program


OS copies the page into memory, and the program resumes execution


Page Translation

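A linear address is divided into a page directory field, page table field, and page frame offset. The CPU uses all three to calculate the physical address.

[Figure: the linear address is split into Directory (10 bits), Table (10 bits), and Offset (12 bits) fields. The directory field selects a directory entry in the page directory, which locates a page table; the table field selects a page-table entry, which locates a page frame; the offset selects the physical address within the page frame.]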
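The two-level walk can be sketched with nested dictionaries standing in for the page directory and page tables. This is a simplified model written for illustration: real entries hold a physical base address plus status bits, whereas here each entry is just the next table or a frame number, and the names `split_linear`, `translate`, and `PageFault` as well as the sample mapping are assumptions.

```python
class PageFault(Exception):
    """Raised when the needed page is not present in memory."""


def split_linear(addr):
    """Split a 32-bit linear address into its 10/10/12-bit fields."""
    directory = (addr >> 22) & 0x3FF
    table = (addr >> 12) & 0x3FF
    offset = addr & 0xFFF
    return directory, table, offset


def translate(addr, page_directory):
    """Walk directory -> page table -> page frame, then add the offset."""
    d, t, offset = split_linear(addr)
    page_table = page_directory.get(d)
    if page_table is None:
        raise PageFault(hex(addr))
    frame = page_table.get(t)
    if frame is None:
        raise PageFault(hex(addr))
    return (frame << 12) | offset


# Hypothetical mapping: directory entry 1, table entry 3 -> frame 0x55.
page_directory = {1: {3: 0x55}}
print(split_linear(0x00403ABC))
print(hex(translate(0x00403ABC, page_directory)))
```

In this model a missing entry raises `PageFault`, which corresponds to the point where the CPU interrupts the program and the OS loads the page before execution resumes.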



Intel Microprocessor History


Intel 8086, 80286


IA-32 processor family


P6 processor family




Early Intel Microprocessors

„ Intel 8080

¾ 64K addressable RAM

¾ 8-bit registers

¾ CP/M operating system

¾ S-100 BUS architecture

¾ 8-inch floppy disks!

„ Intel 8086/8088

¾ IBM-PC used 8088

¾ 1 MB addressable RAM

¾ 16-bit registers

¾ 16-bit data bus (8-bit for 8088)

¾ separate floating-point unit (8087)




Intel 80286

¾ 16 MB addressable RAM

¾ Protected memory

¾ several times faster than 8086

¾ introduced IDE bus architecture


¾ 80287 floating point unit


Intel IA-32 Family




Intel386

¾ 4 GB addressable RAM, 32-bit registers, paging (virtual memory)


Intel486

¾ instruction pipelining


Pentium

¾ superscalar, 32-bit address bus, 64-bit internal data path


Intel P6 Family


Pentium Pro

¾ advanced optimization techniques in microcode


Pentium II

¾ MMX (multimedia) instruction set



Pentium III

¾ SIMD (streaming extensions) instructions


Pentium 4 and Xeon

¾ Intel NetBurst micro-architecture, tuned for multimedia


What's Next


General Concepts


IA-32 Processor Architecture


IA-32 Memory Management


Components of an IA-32 Microcomputer


Input-Output System




Components of an IA-32 Microcomputer




Video output




Input-output ports




CPU socket


External cache memory slots


Main memory slots


BIOS chips


Sound synthesizer chip (optional)


Video controller chip (optional)


IDE, parallel, serial, USB, video, keyboard, joystick, network, and mouse connectors


PCI bus connectors (expansion cards)


Intel D850MD Motherboard

[Figure labels: Pentium 4 socket; mouse, keyboard, parallel, serial, and USB connectors; AGP slot; video; memory controller hub; PCI slots; audio chip; dynamic RAM; IDE drive connectors; battery; power connector; diskette connector; I/O controller; firmware hub]

Source: Intel® Desktop Board D850MD/D850MV Technical Product Specification


Video Output


Video controller

¾ on motherboard, or on expansion card

¾ AGP (accelerated graphics port technology)


Video memory (VRAM)


Video CRT Display

¾ uses raster scanning

¾ horizontal retrace

¾ vertical retrace


Direct digital LCD monitors

¾ no raster scanning required


Sample Video Controller (ATI Corp.)

• 128-bit 3D graphics performance powered by RAGE™ 128 PRO

• 3D graphics performance

• Intelligent TV-Tuner with Digital VCR


• Interactive Program Guide

• Still image and MPEG-2 motion video capture

• Video editing

• Hardware DVD video playback

• Video output to TV or VCR




Memory

„ ROM

¾ read-only memory

„ EPROM

¾ erasable programmable read-only memory

„ Dynamic RAM (DRAM)

¾ inexpensive; must be refreshed constantly

„ Static RAM (SRAM)

¾ expensive; used for cache memory; no refresh required

„ Video RAM (VRAM)

¾ dual ported; optimized for constant video refresh


„ CMOS RAM

¾ complementary metal-oxide semiconductor

¾ system setup information

„ See: Intel platform memory (Intel technology brief: link address may change)


Input-Output Ports


USB (universal serial bus)

¾ intelligent high-speed connection to devices

¾ up to 480 megabits/second (USB version 2.0)

¾ USB hub connects multiple devices

¾ enumeration: computer queries devices

¾ supports hot connections



Parallel

¾ short cable, high speed

¾ common for printers

¾ bidirectional, parallel data transfer

¾ Intel 8255 controller chip


Input-Output Ports (cont.)



Serial

¾ RS-232 serial port

¾ one bit at a time

¾ uses long cables and modems

¾ 16550 UART (universal asynchronous receiver transmitter)

¾ programmable in assembly language




Input-Output System


Levels of Input-Output

„ Level 3: Call a library function (C++, Java)

¾ easy to do; abstracted from hardware; details hidden

¾ slowest performance

„ Level 2: Call an operating system function

¾ specific to one OS; device-independent

¾ medium performance

„ Level 1: Call a BIOS (basic input-output system) function

¾ may produce different results on different systems

¾ knowledge of hardware required

¾ usually good performance

„ Level 0: Communicate directly with the hardware

¾ May not be allowed by some operating systems
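The difference between the two upper levels can be seen from a high-level language. The sketch below, in Python, writes the same text once through a buffered library call (Level 3) and once through `os.write`, a thin wrapper around the operating system's write function (Level 2). The function names are made up for the example, and Levels 1 and 0 are not shown because a protected-mode OS normally does not let an application reach the BIOS or the hardware directly.

```python
import io
import os


def write_via_library(stream, text):
    """Level 3: library call; buffering and encoding details are hidden."""
    print(text, end="", file=stream)


def write_via_os(fd, text):
    """Level 2: hand raw bytes to the OS for a given file descriptor."""
    return os.write(fd, text.encode("ascii"))


# Level 3 into an in-memory stream.
buffer = io.StringIO()
write_via_library(buffer, "hello")
print(repr(buffer.getvalue()))

# Level 2 into a pipe, then read the bytes back.
read_fd, write_fd = os.pipe()
count = write_via_os(write_fd, "hello")
os.close(write_fd)
print(count, os.read(read_fd, 16))
os.close(read_fd)
```

The library call works on text and any stream-like object, while the OS call needs a file descriptor and bytes: more control, but more detail exposed to the caller.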


Displaying a String of Characters

When a HLL program displays a string of characters, the following steps take place:

Application Program (Level 3: library function call)

OS Function (Level 2)

BIOS Function (Level 1)

Hardware (Level 0)


ASM Programming levels

ASM programs can perform input-output at each of the following levels:

ASM Program

OS Function (Level 2)

BIOS Function (Level 1)

Hardware (Level 0)




