Computer Organization &
Assembly Languages
Pu-Jen Cheng
Computer Organization (II)
IA-32 Processor Architecture
Materials
Some materials used in this course are adapted from
¾ The slides prepared by Kip Irvine for the book, Assembly Language for Intel-Based Computers, 5th Ed.
¾ Assembly Language & Computer Organization, NTU (http://www.csie.ntu.edu.tw/~cyy/courses/assembly/05fall/news/)
(http://www.csie.ntu.edu.tw/~acpang/course/asm_2004)
IA-32 architecture
From 386 to the latest 32-bit processor, P4
From programmer’s point of view, IA-32 has not
changed substantially except the introduction of a
set of high-performance instructions
IA-32 Processor Architecture
Modes of operation
Basic execution environment
Floating-point unit
Modes of Operation
Protected mode
¾ native mode (Windows, Linux)
Real-address mode
¾ native MS-DOS
System management mode
¾ power management, system security, diagnostics
Virtual-8086 mode
¾ hybrid of protected mode
¾ each program has its own 8086 computer
Multitasking (supported by protected mode)
OS can run multiple programs at the same time.
Multiple threads of execution within the same program.
Scheduler utility assigns a given amount of CPU time to each running program.
Rapid switching of tasks
¾ gives illusion that all programs are running at once
¾ the processor must support task switching.
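The time-slicing idea can be imitated with a toy round-robin loop. This is only a sketch: the names program and round_robin are hypothetical, and the tasks here give up the CPU voluntarily at each yield, whereas a real scheduler preempts tasks with hardware support.

```python
from collections import deque

def program(name, steps):
    """A toy 'program': each yield marks the end of one time slice."""
    for i in range(1, steps + 1):
        yield f"{name}: step {i}"

def round_robin(programs):
    """Give each runnable program one time slice in turn until all finish."""
    ready = deque(programs)
    trace = []
    while ready:
        task = ready.popleft()
        try:
            trace.append(next(task))   # run one time slice
            ready.append(task)         # task switch: back of the queue
        except StopIteration:
            pass                       # program finished; drop it
    return trace

trace = round_robin([program("A", 2), program("B", 3)])
print(trace)
```

The interleaved trace is what gives the illusion that A and B run at once.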
Process
A process is an instance of a running program.
¾ Not the same as “program” or “processor”
Process provides each program with two key abstractions:
¾ Logical control flow
Each program seems to have exclusive use of the CPU.
¾ Private address space
Each program seems to have exclusive use of main memory.
How are these Illusions maintained?
¾ Process executions interleaved (multitasking)
¾ Address spaces managed by virtual memory system
Process States
Basic Execution Environment
Addressable memory
General-purpose registers
Index and base registers
Specialized register uses
Status flags
Floating-point, MMX, XMM registers
Addressable Memory
Protected mode
¾ 4 GB
¾ 32-bit address
Real-address and Virtual-8086 modes
¾ 1 MB space
¾ 20-bit address
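The sizes above follow from the address widths: n address bits can name 2^n bytes. A quick check, with addressable_bytes as a hypothetical helper name:

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with the given address width."""
    return 1 << address_bits

# Protected mode: 32-bit addresses
print(addressable_bytes(32) // 2**30, "GB")
# Real-address and Virtual-8086 modes: 20-bit addresses
print(addressable_bytes(20) // 2**20, "MB")
```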
General-Purpose Registers
Named storage locations inside the CPU, optimized for speed.
32-bit General-Purpose Registers
¾ EAX, EBX, ECX, EDX
¾ EBP, ESP, ESI, EDI
16-bit Segment Registers
¾ CS, SS, DS, ES, FS, GS
EIP, EFLAGS
Accessing Parts of Registers
Use 8-bit name, 16-bit name, or 32-bit name
Applies to EAX, EBX, ECX, and EDX
EAX (32 bits)
¾ AX (16 bits) is the lower half of EAX
¾ AX = AH (upper 8 bits) + AL (lower 8 bits)
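The overlap between the 32-bit, 16-bit, and 8-bit names can be modeled with shifts and masks. A minimal sketch, where register_parts is a hypothetical helper and the register is just a Python integer:

```python
def register_parts(eax: int) -> dict:
    """Split a 32-bit EAX value into its named sub-registers."""
    eax &= 0xFFFFFFFF
    ax = eax & 0xFFFF          # low 16 bits of EAX
    ah = (ax >> 8) & 0xFF      # high byte of AX
    al = ax & 0xFF             # low byte of AX
    return {"EAX": eax, "AX": ax, "AH": ah, "AL": al}

parts = register_parts(0x12345678)
print({name: hex(value) for name, value in parts.items()})
```

The same layout applies to EBX, ECX, and EDX.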
Index and Base Registers
Some registers have only a 16-bit name for their lower half (ESI/SI, EDI/DI, EBP/BP, ESP/SP)
The 16-bit registers are usually used only in real-address mode
Some specialized register uses
General-Purpose
¾ EAX – accumulator (automatically used by division and multiplication)
¾ ECX – loop counter
¾ ESP – stack pointer (should never be used for arithmetic or data transfer)
¾ ESI, EDI – index registers (used for high-speed memory transfer instructions)
¾ EBP – extended frame pointer (stack)
Some specialized register uses (cont.)
Segment
¾ CS – code segment
¾ DS – data segment
¾ SS – stack segment
¾ ES, FS, GS - additional segments
EIP – instruction pointer
EFLAGS
¾ control flags (control CPU’s operation, e.g. break, interrupt, enter 8086/protected mode)
¾ status flags (reflect the outcome of arithmetic and logical operations)
¾ each flag is a single binary bit (set or clear)
Status Flags
• Carry (CF)
¾ unsigned arithmetic out of range
• Overflow (OF)
¾ signed arithmetic out of range
• Sign (SF)
¾ result is negative
• Zero (ZF)
¾ result is zero
• Auxiliary Carry (AC)
¾ carry from bit 3 to bit 4 in 8-bit operand
• Parity (PF)
¾ sum of 1 bits in least-significant byte is an even number
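The flag definitions above can be checked on an 8-bit addition. The following is a sketch under that assumption only (8-bit ADD); add8_flags is a hypothetical helper, not processor code.

```python
def add8_flags(a: int, b: int) -> dict:
    """Add two 8-bit values and compute the status flags listed above."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    return {
        "result": result,
        # unsigned result does not fit in 8 bits
        "CF": int(total > 0xFF),
        # both operands differ in sign from the result: signed out of range
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
        # most significant bit of the result
        "SF": result >> 7,
        "ZF": int(result == 0),
        # carry from bit 3 to bit 4
        "AC": int(((a & 0x0F) + (b & 0x0F)) > 0x0F),
        # even number of 1 bits in the low byte
        "PF": int(bin(result).count("1") % 2 == 0),
    }

print(add8_flags(0x7F, 0x01))   # signed overflow: 127 + 1
print(add8_flags(0xFF, 0x01))   # unsigned carry: 255 + 1
```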
System registers
Accessed by operating system kernel at highest privilege level, not by application programs
¾ IDTR (Interrupt Descriptor Table Register)
¾ GDTR (Global Descriptor Table Register)
¾ LDTR (Local Descriptor Table Register)
¾ Task register
¾ Debug registers
¾ Control registers (e.g. task switching, paging, enabling cache memory)
¾ Model-specific registers (e.g. performance monitoring, checking architecture)
Floating-Point, MMX, XMM Registers
Eight 80-bit floating-point data registers
¾ ST(0), ST(1), . . . , ST(7)
¾ arranged in a stack
¾ used for all floating-point arithmetic
Eight 64-bit MMX registers
Eight 128-bit XMM registers for single-instruction multiple-data (SIMD) operations
SIMD: a single instruction performs the same action (retrieve, calculate, or store) simultaneously on two or more pieces of data
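To make the SIMD idea concrete, the sketch below models the effect of one packed addition on a 128-bit value that holds four independent 32-bit lanes. The helper names are hypothetical, and the Python loop visits the lanes one after another, whereas the hardware handles them at once.

```python
def pack(lanes):
    """Pack four 32-bit integers (lane 0 first) into one 128-bit integer."""
    return sum((v & 0xFFFFFFFF) << (32 * i) for i, v in enumerate(lanes))

def unpack(value):
    """Split a 128-bit integer back into four 32-bit lanes."""
    return [(value >> (32 * i)) & 0xFFFFFFFF for i in range(4)]

def packed_add_u32(x: int, y: int) -> int:
    """Add lane by lane; each lane wraps on its own, no carry crosses lanes."""
    result = 0
    for lane in range(4):
        shift = 32 * lane
        a = (x >> shift) & 0xFFFFFFFF
        b = (y >> shift) & 0xFFFFFFFF
        result |= ((a + b) & 0xFFFFFFFF) << shift
    return result

total = packed_add_u32(pack([1, 2, 3, 0xFFFFFFFF]), pack([10, 20, 30, 1]))
print(unpack(total))
```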
IA-32 Memory Management
Real-address mode
Calculating linear addresses
Protected mode
Multi-segment model
Paging
Real-Address mode
1 MB RAM maximum addressable
Application programs can access any area of memory
Single tasking
Supported by MS-DOS operating system
Segmented Memory
Segmented memory addressing: absolute (linear) address is a combination of a 16-bit segment value added to a 16-bit offset
8086 processor only has 16-bit registers
[Figure: the 1 MB address space marked at 64 KB intervals from 00000 to F0000. One segment runs from 8000:0000 to 8000:FFFF; within it, the seg:ofs address 8000:0250 refers to offset 0250.]
Each segment is 64 KB long.
Calculating Linear Addresses
Given a segment address, multiply it by 16 (add a hexadecimal zero), and add it to the offset
Example: convert 08F1:0100 to a linear address
Adjusted segment value: 0 8 F 1 0
Add the offset:           0 1 0 0
Linear address:         0 9 0 1 0
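The same calculation as a short sketch (linear_address is a hypothetical helper name). Multiplying by 16 is a left shift by 4 bits, which is why it looks like appending a hexadecimal zero.

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-address mode: shift the 16-bit segment left 4 bits and add the offset."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)

print(f"{linear_address(0x08F1, 0x0100):05X}")
```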
A typical program has three segments: code,
data and stack. Segment registers CS, DS and
SS are used to store them separately.
Your turn . . .
What linear address corresponds to the segment/offset address 028F:0030?
028F0 + 0030 = 02920
Always use hexadecimal notation for addresses.
Your turn . . .
What segment addresses correspond to the linear address 28F30h?
Many different segment-offset addresses can produce the linear address 28F30h. For example:
28F0:0030, 28F3:0000, 28B0:0430, . . .
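A brute-force sketch can list every pair that maps to a given linear address. The name segment_offset_pairs is hypothetical; it simply tries all 16-bit segment values and keeps those that leave a valid 16-bit offset.

```python
def segment_offset_pairs(linear: int):
    """Yield every (segment, offset) pair with segment * 16 + offset == linear."""
    for segment in range(0x10000):
        offset = linear - (segment << 4)
        if 0 <= offset <= 0xFFFF:
            yield segment, offset

pairs = set(segment_offset_pairs(0x28F30))
print(len(pairs))
for seg, ofs in [(0x28F0, 0x0030), (0x28F3, 0x0000), (0x28B0, 0x0430)]:
    print(f"{seg:04X}:{ofs:04X}", (seg, ofs) in pairs)
```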
Protected Mode
4 GB addressable RAM
¾ (00000000h to FFFFFFFFh)
Each program assigned a memory partition which is protected from other programs
Designed for multitasking
Supported by Linux & MS-Windows
Protected Mode (cont.)
Segments
¾ variable-sized areas of memory for code & data
Segment descriptor
¾ 64-bit value identifying and describing a single memory segment
¾ contains segment’s base address, access rights, size limit, type, and usage
Segment descriptor table
¾ contains segment descriptors by which the OS keeps track of the locations of individual program segments
Segment registers
¾ point to entries in the segment descriptor tables
Program structure
¾ code, data, and stack areas
¾ CS, DS, SS segment descriptors
¾ global descriptor table (GDT)
MASM Programs use the Microsoft flat memory model
Flat Segment Model
Single global descriptor table (GDT) whose base address is in GDTR
Created when OS switches the processor into protected mode during boot up
All segments mapped to entire 32-bit address space
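[Figure: a segment descriptor in the Global Descriptor Table, with fields base address (00000000), limit (00040), and access (- - - -), maps the segment onto the address space from 00000000 to FFFFFFFF (4 GB). Physical RAM occupies 00000000 to 00040000; the rest is not used.]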
Multi-Segment Model
Each program has a local descriptor table (LDT)
¾ holds descriptor for each segment used by the program
Local Descriptor Table
base      limit  access
00026000  0010
00008000  000A
00003000  0002
[Figure: each descriptor points to its segment in RAM, at 3000, 8000, and 26000.]
Translating Addresses (See Ch11.4)
The IA-32 processor uses a one- or two-step process to convert a variable's logical address into a unique memory location.
The first step combines a segment value (16-bit, segment register) with a variable’s offset (32-bit) to create a linear address.
The second optional step, called page translation, converts a linear address to a physical address.
Converting Logical to Linear Address
The segment selector (16-bit) points to a segment descriptor, which contains the base address of a memory segment.
The 32-bit offset from the logical address is added to the segment’s base address, generating a 32-bit linear address.
[Figure: a logical address consists of a selector and an offset. The selector picks a segment descriptor from the descriptor table, whose base address is held in GDTR/LDTR; the descriptor's base address plus the offset gives the linear address.]
Indexing into a Descriptor Table
Each segment selector indexes into the program's local descriptor table (LDT)
Each table entry is mapped to a linear address:
[Figure: logical addresses are mapped through the Local Descriptor Table, located by the LDTR register, into the linear address space (DRAM plus an unused region).]
Logical address          LDT index   Base address in LDT entry
SS:ESP     0018:0000003A   18          001A0000
DS:offset  0010:000001B6   10          0002A000
CS:IP      0008:00002CD3   08          0001A000
                           00          00003000
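The lookup can be sketched with the table values from the figure. This is a simplified model, not the hardware algorithm: the dictionary LDT and the function logical_to_linear are hypothetical names, the selector is used directly as the table key, and the limit and access checks a real processor performs are left out.

```python
# Hypothetical descriptor table: selector value -> segment base address.
# Values follow the figure; limit and access fields are omitted.
LDT = {
    0x0000: 0x00003000,
    0x0008: 0x0001A000,
    0x0010: 0x0002A000,
    0x0018: 0x001A0000,
}

def logical_to_linear(selector: int, offset: int, table=LDT) -> int:
    """Look up the segment base for the selector and add the 32-bit offset."""
    base = table[selector]
    return (base + offset) & 0xFFFFFFFF

for name, selector, offset in [("SS:ESP", 0x0018, 0x0000003A),
                               ("DS:offset", 0x0010, 0x000001B6),
                               ("CS:IP", 0x0008, 0x00002CD3)]:
    print(name, f"{logical_to_linear(selector, offset):08X}")
```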
Virtual Memory Concepts
Implements a mapping function
¾ Between virtual address space and physical
address space
Examples
¾ PowerPC
48-bit virtual address
32-bit physical address
¾ Pentium
Both are 32-bit addresses
But uses segmentation
Virtual Memory Concepts (cont.)
Virtual address space is divided into fixed-size chunks
¾ These chunks are called virtual pages
¾ Virtual address is divided into
Virtual page number
Byte offset into a virtual page
¾ Physical memory is also divided into similar-size chunks
These chunks are referred to as physical pages
Physical address is divided into
Physical page number
Byte offset within a page
Virtual Memory Concepts (cont.)
A page plays the same role as a cache line: it is the unit transferred between memory levels
Typical page size
4 KB
Example
¾ 32-bit virtual address to 24-bit physical address
¾ If page size is 4 KB
Page offset: 12 bits
Virtual page number: 20 bits
Physical page number: 12 bits
¾ Virtual memory maps 2^20 virtual pages to 2^12 physical pages
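The bit counts in this example can be reproduced with a few lines of arithmetic. The sketch assumes the 4 KB page size and the 32-bit and 24-bit address widths given above; split_virtual_address is a hypothetical helper name.

```python
PAGE_SIZE = 4 * 1024                        # 4 KB pages
OFFSET_BITS = PAGE_SIZE.bit_length() - 1    # log2(4096) = 12

def split_virtual_address(vaddr: int):
    """Return (virtual page number, byte offset) for a 32-bit virtual address."""
    vaddr &= 0xFFFFFFFF
    return vaddr >> OFFSET_BITS, vaddr & (PAGE_SIZE - 1)

virtual_pages = 2 ** (32 - OFFSET_BITS)     # 20-bit virtual page number
physical_pages = 2 ** (24 - OFFSET_BITS)    # 12-bit physical page number
print(OFFSET_BITS, virtual_pages, physical_pages)
print([hex(x) for x in split_virtual_address(0x12345678)])
```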
Virtual Memory Concepts (cont.)
An example mapping of 32-bit virtual
address to 24-bit physical address
Virtual Memory Concepts (cont.)
Virtual to physical address mapping
Virtual Memory Concepts (cont.)
A virtual page can be
¾ In main memory
¾ On disk
Page fault occurs if the page is not in memory
¾ Like a cache miss
OS takes control and transfers the page
Paging
Supported directly by the CPU
Divides each segment into 4096-byte blocks called pages
Sum of all programs can be larger than physical memory
Part of running program is in memory, part is on disk
Virtual memory manager (VMM) – OS utility that manages the loading and unloading of pages
As the program runs, the processor selectively unloads inactive pages from memory and loads other pages that are immediately required.
Paging (cont.)
OS maintains page directory and page tables
Page translation: CPU converts the linear address into a physical address
Page fault: issued by CPU when a needed page is not in memory, and CPU interrupts the program
OS copies the page into memory, program resumes execution
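The page-fault cycle can be imitated with a toy model. Everything here is an assumption made for illustration: the class name ToyVMM, the tiny number of frames, and in particular the first-in, first-out choice of which page to unload, which the slides do not specify.

```python
class ToyVMM:
    """Toy virtual memory manager: a few physical frames, FIFO replacement."""

    def __init__(self, num_frames: int):
        self.num_frames = num_frames
        self.resident = []        # pages currently in memory, oldest first
        self.page_faults = 0

    def access(self, page: int) -> bool:
        """Touch a page; return True if the access caused a page fault."""
        if page in self.resident:
            return False                      # page already in memory
        self.page_faults += 1                 # CPU raises a page fault
        if len(self.resident) == self.num_frames:
            self.resident.pop(0)              # OS unloads the oldest page
        self.resident.append(page)            # OS loads the needed page
        return True

vmm = ToyVMM(num_frames=2)
faults = [vmm.access(p) for p in [0, 1, 0, 2, 0]]
print(faults, vmm.page_faults)
```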
Page Translation
A linear address is divided into a page directory field, page table field, and page frame offset. The CPU uses all three to calculate the physical address.
[Figure: the 32-bit linear address is split into Directory (10 bits), Table (10 bits), and Offset (12 bits). CR3 locates the page directory; the directory entry locates a page table; the page-table entry locates the page frame; the offset selects the byte within the frame, giving the physical address.]
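The 10/10/12 split and the two table lookups can be sketched as follows. The nested dictionaries stand in for the page directory and page tables, and the mapping itself is made up for the example; in hardware the entries are 32-bit values that also carry flag bits, which this sketch ignores.

```python
def split_linear(linear: int):
    """Split a 32-bit linear address into (directory, table, offset): 10/10/12 bits."""
    return (linear >> 22) & 0x3FF, (linear >> 12) & 0x3FF, linear & 0xFFF

def translate(linear: int, page_directory: dict) -> int:
    """Walk a toy two-level structure: directory index -> page table -> frame base."""
    directory, table, offset = split_linear(linear)
    page_table = page_directory[directory]   # KeyError models a missing page table
    frame_base = page_table[table]           # KeyError models a page fault
    return frame_base + offset

# Hypothetical mapping: directory entry 1, table entry 2 -> page frame at 0009C000.
page_directory = {1: {2: 0x0009C000}}
linear = (1 << 22) | (2 << 12) | 0x345
print(f"{linear:08X} -> {translate(linear, page_directory):08X}")
```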
Intel Microprocessor History
Intel 8086, 80286
IA-32 processor family
P6 processor family
CISC and RISC
Early Intel Microprocessors
Intel 8080
¾ 64K addressable RAM
¾ 8-bit registers
¾ CP/M operating system
¾ S-100 BUS architecture
¾ 8-inch floppy disks!
Intel 8086/8088
¾ IBM-PC Used 8088
¾ 1 MB addressable RAM
¾ 16-bit registers
¾ 16-bit data bus (8-bit for 8088)
¾ separate floating-point unit (8087)
The IBM-AT
Intel 80286
¾ 16 MB addressable RAM
¾ Protected memory
¾ several times faster than 8086
¾ introduced IDE bus architecture
¾ 80287 floating point unit
Intel IA-32 Family
Intel386
¾ 4 GB addressable RAM, 32-bit registers, paging (virtual memory)
Intel486
¾ instruction pipelining
Pentium
¾ superscalar, 32-bit address bus, 64-bit internal data path
Intel P6 Family
Pentium Pro
¾ advanced optimization techniques in microcode
Pentium II
¾ MMX (multimedia) instruction set
Pentium III
¾ SIMD (streaming extensions) instructions
Pentium 4 and Xeon
¾ Intel NetBurst micro-architecture, tuned for multimedia
What's Next
General Concepts
IA-32 Processor Architecture
IA-32 Memory Management
Components of an IA-32 Microcomputer
Input-Output System
Components of an IA-32 Microcomputer
Motherboard
Video output
Memory
Input-output ports
Motherboard
CPU socket
External cache memory slots
Main memory slots
BIOS chips
Sound synthesizer chip (optional)
Video controller chip (optional)
IDE, parallel, serial, USB, video, keyboard, joystick, network, and mouse connectors
PCI bus connectors (expansion cards)
Intel D850MD Motherboard
[Figure: annotated photograph of the board. Labeled parts: Pentium 4 socket; mouse, keyboard, parallel, serial, and USB connectors; AGP slot; video; memory controller hub; PCI slots; audio chip; dynamic RAM; speaker; IDE drive connectors; battery; power connector; diskette connector; I/O controller; firmware hub.]
Source: Intel® Desktop Board D850MD/D850MV Technical Product Specification
Video Output
Video controller
¾ on motherboard, or on expansion card
¾ AGP (accelerated graphics port technology)
Video memory (VRAM)
Video CRT Display
¾ uses raster scanning
¾ horizontal retrace
¾ vertical retrace
Direct digital LCD monitors
¾ no raster scanning required
Sample Video Controller (ATI Corp.)
• 128-bit 3D graphics performance powered by RAGE™ 128 PRO
• Intelligent TV-Tuner with Digital VCR
• TV-ON-DEMAND™
• Interactive Program Guide
• Still image and MPEG-2 motion video capture
• Video editing
• Hardware DVD video playback
• Video output to TV or VCR
Memory
ROM
¾ read-only memory
EPROM
¾ erasable programmable read-only memory
Dynamic RAM (DRAM)
¾ inexpensive; must be refreshed constantly
Static RAM (SRAM)
¾ expensive; used for cache memory; no refresh required
Video RAM (VRAM)
¾ dual ported; optimized for constant video refresh
CMOS RAM
¾ complementary metal-oxide semiconductor
¾ system setup information
Input-Output Ports
USB (universal serial bus)
¾ intelligent high-speed connection to devices
¾ up to 480 megabits/second (USB version 2.0)
¾ USB hub connects multiple devices
¾ enumeration: computer queries devices
¾ supports hot connections
Parallel
¾ short cable, high speed
¾ common for printers
¾ bidirectional, parallel data transfer
¾ Intel 8255 controller chip
Input-Output Ports (cont)
Serial
¾ RS-232 serial port
¾ one bit at a time
¾ uses long cables and modems
¾ 16550 UART (universal asynchronous receiver transmitter)
¾ programmable in assembly language
Input-Output System
Levels of Input-Output
Level 3: Call a library function (C++, Java)
¾ easy to do; abstracted from hardware; details hidden
¾ slowest performance
Level 2: Call an operating system function
¾ specific to one OS; device-independent
¾ medium performance
Level 1: Call a BIOS (basic input-output system) function
¾ may produce different results on different systems
¾ knowledge of hardware required
¾ usually good performance
Level 0: Communicate directly with the hardware
¾ May not be allowed by some operating systems
Displaying a String of Characters
When a HLL program displays a string of characters, the
following steps take place:
Application Program (Level 3)
OS Function (Level 2)
BIOS Function (Level 1)
Hardware (Level 0)
ASM Programming levels
ASM programs can perform input-output at each of the following levels:
ASM Program
OS Function (Level 2)
BIOS Function (Level 1)
Hardware (Level 0)