Lecture 10
Storage Systems
• Types of Storage Devices
I/O Systems
[Figure: processor with cache attached to a memory-I/O bus, which also connects main memory and the I/O controllers for disks, graphics, and network; the I/O controllers signal the processor with interrupts]
Motivation: Who Cares About I/O?
• CPU Performance: 50% to 100% per year
• Multiprocessor supercomputers 150% per year
• I/O system performance limited by mechanical delays
– < 5% per year (I/Os per sec or MB per sec)
• Amdahl's Law: system speed-up limited by the slowest part!
– 10% I/O & 10x CPU => 5x Performance (lose 50%)
– 10% I/O & 100x CPU => 10x Performance (lose 90%)
– I/O bottleneck:
» Diminishing fraction of time in CPU
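The Amdahl's Law figures above can be checked with a few lines of Python. This is a minimal sketch that assumes the I/O portion of the run time is not sped up at all; the function name `amdahl_speedup` is made up for illustration.

```python
def amdahl_speedup(io_fraction, cpu_speedup):
    """Overall speedup when only the non-I/O part of the run time gets faster."""
    cpu_fraction = 1.0 - io_fraction
    # I/O time is unchanged; CPU time shrinks by cpu_speedup.
    new_time = io_fraction + cpu_fraction / cpu_speedup
    return 1.0 / new_time


for cpu_speedup in (10, 100):
    overall = amdahl_speedup(0.10, cpu_speedup)
    print(f"10% I/O, {cpu_speedup}x CPU -> {overall:.2f}x overall")
```

The exact results are about 5.26x and 9.17x, which the slide rounds to 5x and 10x.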
Devices: Magnetic Disks
[Figure: disk geometry showing sector, track, cylinder, head, and platter]
• Purpose:
– Long-term, nonvolatile storage
– Large, inexpensive, slow level in the storage hierarchy
• Read/Write data is three-stage process:
– Seek time (~20 ms avg, 1M cycles at 50 MHz)
» move arm over track
– Rotational latency
» wait for the sector to rotate under head
» Average = 0.5 rotation / 3600 RPM = 8.3 ms
– Transfer rate
» About a sector per ms (1-10 MB/s)
Disk Time Example
• Disk Parameters:
– 512-byte sector
– Advertised average seek time is 9 ms
– Transfer rate is 4 MB/sec
– Disk spins at 7200 RPM
– Controller overhead is 1 ms
– Assume that the disk is idle
• What is the average time to read/write a sector?
– Ave seek time + ave rot time + xfer time + controller overhead
– 9 ms + 0.5 rotation / 7200 RPM + 0.5 KB / (4.0 MB/sec) + 1 ms = 9 + 4.17 + 0.125 + 1 = 14.3 ms
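The sum above can be reproduced in Python. This is a minimal sketch; the function names are made up for illustration, and it assumes 1 MB = 1000 KB so that the transfer term matches the slide's 0.5 KB / (4.0 MB/sec).

```python
def avg_rotational_latency_ms(rpm):
    """Half a rotation, on average, before the sector reaches the head."""
    rotations_per_ms = rpm / 60.0 / 1000.0
    return 0.5 / rotations_per_ms


def sector_access_time_ms(seek_ms, rpm, sector_kb, transfer_mb_per_s, controller_ms):
    # Assumes 1 MB = 1000 KB, matching the slide's 0.5 KB / (4.0 MB/sec) term.
    transfer_ms = sector_kb / (transfer_mb_per_s * 1000.0) * 1000.0
    return seek_ms + avg_rotational_latency_ms(rpm) + transfer_ms + controller_ms


total = sector_access_time_ms(seek_ms=9, rpm=7200, sector_kb=0.5,
                              transfer_mb_per_s=4.0, controller_ms=1)
print(f"{total:.1f} ms")                              # 14.3 ms
print(f"{avg_rotational_latency_ms(3600):.1f} ms")    # 8.3 ms
```

The second line reuses the helper to reproduce the 8.3 ms average rotational latency quoted earlier for a 3600 RPM disk. Seek and rotation dominate; the transfer itself is only 0.125 ms.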
Other Devices
• DRAM + Battery
– Big reduction in seek time and lower latency
– Cost is not attractive
• CD-ROMs
– Cheap and high density
– For archival storage due to their write once nature
• Magnetic Tapes
– Sequential access
– Backup to disks
• Automated tape library
– Robotic tape storage
Processor Interface Issues
• Interconnections
– Busses
• Processor interface
– Interrupts
– Memory mapped I/O
• I/O Control Structures
– Polling
– Interrupts
– DMA
– I/O Controllers/Processors
Bus-Based Interconnect
• Bus: a shared communication link between subsystems
– Low cost: a single set of wires is shared multiple ways
– Versatility: easy to add new devices, and peripherals may even be moved between computers that use a common bus
• Disadvantage
– A communication bottleneck, possibly limiting the maximum I/O throughput
• Bus speed is limited by physical factors
– the bus length
– the number of devices (and, hence, bus loading).
– these physical limits prevent arbitrary bus speedup.
Bus-Based Interconnect
• Two generic types of busses:
– I/O busses: lengthy, many types of devices connected, wide range in data bandwidth, and follow a bus standard (sometimes called a channel)
– CPU–memory busses: high speed, matched to the memory system to maximize memory–CPU bandwidth, single device (sometimes called a backplane)
– To lower costs, low-cost (older) systems combine the two into a single bus
• Bus transaction
– Sending address & receiving or sending data
Bus Protocols
[Figure: a bus master and a bus slave connected by control lines, address lines, and data lines]
• Bus Master: has ability to control the bus, initiates transaction
– need bus arbitration if there are multiple bus masters
• Bus Slave: module activated by the transaction
• Bus Communication Protocol: specification of sequence of events and timing requirements in transferring information
Synchronous Bus Protocols
[Figure: timing diagram with Clock, Address, Data, Read, and Wait signals]
Pipelined/Split transaction Bus Protocol
[Figure: timing diagram in which addr 1, addr 2, and addr 3 are issued in successive slots, with labels "begin read" and "read complete"]
Asynchronous Handshake
Write Transaction
[Figure: timing diagram with Address, Data, Read, Req., and Ack. signals over t0 to t5; master asserts address, master asserts data, then next address]
• t0: Master has obtained control and asserts address, direction, data
– Waits a specified amount of time for slaves to decode target
• t1: Master asserts request line
• 4-cycle handshake
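A toy trace can make the req/ack ordering concrete. The slide only spells out t0 and t1; the steps labeled "assumed" below follow the conventional four-phase (return-to-zero) handshake and are not taken from the slide. All names are hypothetical.

```python
def four_phase_write(address, data, slave_memory):
    """Trace one write over a req/ack handshake; returns (step, req, ack) tuples."""
    trace = []
    bus = {"address": None, "data": None, "req": 0, "ack": 0}

    def record(step):
        trace.append((step, bus["req"], bus["ack"]))

    # t0: master drives address and data, then waits for slaves to decode.
    bus["address"], bus["data"] = address, data
    record("t0")
    # t1: master asserts the request line.
    bus["req"] = 1
    record("t1")
    # t2 (assumed): the addressed slave latches the data and asserts ack.
    slave_memory[bus["address"]] = bus["data"]
    bus["ack"] = 1
    record("t2")
    # t3 (assumed): master sees ack and releases req, address, and data.
    bus["req"] = 0
    bus["address"] = bus["data"] = None
    record("t3")
    # t4 (assumed): slave sees req released and releases ack; the bus is idle.
    bus["ack"] = 0
    record("t4")
    return trace


memory = {}
for step in four_phase_write(0x10, 0xAB, memory):
    print(step)
```

Each side changes its line only after seeing the other side's previous change, which is why no shared clock is needed.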
Read Transaction
[Figure: timing diagram with Address, Data, Read, Req, and Ack signals over t0 to t5; master asserts address, then next address]
• t0: Master has obtained control and asserts address, direction
– Waits a specified amount of time for slaves to decode target
• t1: Master asserts request line
• 4-cycle handshake
Bus Arbitration
[Figure: Parallel (centralized) arbitration: each master (M) has its own bus request (BR) and bus grant (BG) line. Serial arbitration (daisy chaining): the arbitration unit (A.U.) receives BR and issues BG, which passes through each master from BGi to BGo.]
• Parallel arbitration: use multiple request lines; a centralized arbiter chooses from among the requesters.
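The two schemes can be contrasted with a small sketch. The fixed priority order passed to the centralized arbiter is an assumption (the slides do not say how it chooses), and the daisy-chain function assumes the grant enters at master 0. Both function names are hypothetical.

```python
def parallel_arbitrate(requests, priority):
    """Centralized arbiter: sees every request line and picks by a priority order."""
    for master in priority:
        if requests[master]:
            return master
    return None


def daisy_chain_arbitrate(requests):
    """Serial arbitration: the grant enters master 0 and ripples down the chain."""
    grant = any(requests)  # the arbiter asserts BG when the shared BR line is asserted
    for master, wants_bus in enumerate(requests):
        if grant and wants_bus:
            return master  # this master keeps the grant and does not pass it on
        # otherwise the master passes BGi through to BGo
    return None


requests = [False, True, True]
print(parallel_arbitrate(requests, priority=[2, 1, 0]))  # 2
print(daisy_chain_arbitrate(requests))                   # 1
```

In the daisy chain, priority is fixed by position on the chain, whereas the centralized arbiter can apply any policy because it sees all requests at once.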
Bus Arbitration
• Distributed arbitration
– Self-selection: use multiple request lines; the devices requesting bus access determine who will be granted access
– Collision detection: multiple simultaneous requests result in a collision; the collision is detected and a scheme for selecting among the colliding parties is used
Bus Options
Option | High performance | Low cost
Bus width | Separate address & data lines | Multiplex address & data lines
Data width | Wider is faster (e.g., 32 bits) | Narrower is cheaper (e.g., 8 bits)
Transfer size | Multiple words have less bus overhead | Single-word transfer is simpler
Bus masters | Multiple (requires arbitration) | Single master (no arbitration)
Split transaction? | Yes: separate Request and Reply packets get higher bandwidth (needs multiple masters) | No: continuous connection is cheaper and has lower latency
Clocking | Synchronous | Asynchronous
Interfacing Storage Devices to the CPU
• Connect an I/O bus to memory? Or cache?
• How does the CPU address an I/O device?
– Memory-mapped I/O
– Dedicated I/O instructions
• I/O control structures
– Polling
– Interrupts
– DMA
– I/O Processors
Dedicated I/O Instructions
Independent I/O Bus
[Figure: CPU connected to memory over a memory bus and to peripherals, through interfaces, over an independent I/O bus]
• Separate I/O instructions (in, out): the CPU sends a signal that this address is for I/O devices
[Figure: CPU on a common memory & I/O bus]
Memory Mapped I/O
• Single memory & I/O bus
• No separate I/O instructions
[Figure: CPU (with cache, $) on one bus shared with memory and the peripheral interfaces; I/O occupies part of the address space, which runs from 0 to n]
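As a rough illustration of memory-mapped I/O, the toy model below decodes one address space into RAM and a device region, so an ordinary store reaches the device. The class, the `io_base` boundary, and the status-register behavior are all hypothetical.

```python
class ToyAddressSpace:
    """One address space: low addresses hit RAM, high addresses hit a device."""

    def __init__(self, ram_size, io_base):
        self.ram = bytearray(ram_size)
        self.io_base = io_base      # hypothetical start of the I/O region
        self.device_output = []     # bytes "sent" to the device

    def store(self, address, value):
        if address >= self.io_base:
            # Same store operation, but the address decodes to the device.
            self.device_output.append(value & 0xFF)
        else:
            self.ram[address] = value & 0xFF

    def load(self, address):
        if address >= self.io_base:
            return len(self.device_output)  # toy status register: bytes received
        return self.ram[address]


space = ToyAddressSpace(ram_size=256, io_base=0xF0)
space.store(0x10, 42)          # ordinary memory write
space.store(0xF0, ord("A"))    # device write, no special instruction needed
print(space.load(0x10), space.device_output, space.load(0xF0))  # 42 [65] 1
```

With dedicated I/O instructions, the second store would instead be a separate `out`-style operation on a distinct I/O address space.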
Polling
[Figure: CPU, I/O controller (IOC), device, and memory, with a flowchart: Is the data ready? If no, check again; if yes, read data, store data, then done? If no, repeat]
• Busy-wait loop is not an efficient way to use the CPU unless the device is very fast!
• But checks for I/O completion can be dispersed among computationally intensive code
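The cost of busy waiting shows up in a small simulation. `SlowDevice` is a hypothetical device that reports ready only after a fixed number of status checks; the loop counts how many checks the CPU spends waiting.

```python
class SlowDevice:
    """Hypothetical device that becomes ready after a fixed number of status checks."""

    def __init__(self, data, checks_until_ready):
        self.data = data
        self.checks_left = checks_until_ready

    def is_ready(self):
        if self.checks_left > 0:
            self.checks_left -= 1
            return False
        return True

    def read(self):
        return self.data


def polled_read(device):
    wasted_checks = 0
    while not device.is_ready():   # busy wait: the CPU does nothing useful here
        wasted_checks += 1
    return device.read(), wasted_checks


print(polled_read(SlowDevice(data=0x5A, checks_until_ready=3)))  # (90, 3)
```

For a fast device the wasted count stays small, which is the case where polling is reasonable.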
Interrupt Driven Data Transfer
[Figure: CPU, I/O controller (IOC), device, and memory; the user program (add, sub, and, or, nop) and the interrupt service routine (read, store, ..., rti) both reside in memory]
(1) I/O interrupt
(2) save PC
(3) interrupt service addr
(4) interrupt service routine
• User program progress only halted during actual transfer
each xfer – 1000 bytes : 2 µsec per interrupt
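The interrupt sequence can be sketched as a toy instruction trace. The instruction names come from the slide's figure; the function, its parameters, and the choice of where the interrupt arrives are assumptions for illustration.

```python
def run_with_interrupt(user_program, service_routine, interrupt_before):
    """Run user_program; take one interrupt just before index interrupt_before."""
    executed = []
    pc = 0
    pending = True
    while pc < len(user_program):
        if pending and pc == interrupt_before:   # (1) I/O interrupt arrives
            saved_pc = pc                        # (2) save PC
            for op in service_routine:           # (3), (4) jump to and run the service routine
                executed.append(("isr", op))
            pc = saved_pc                        # rti: resume where the user program stopped
            pending = False
        executed.append(("user", user_program[pc]))
        pc += 1
    return executed


trace = run_with_interrupt(["add", "sub", "and", "or", "nop"],
                           ["read", "store", "rti"], interrupt_before=2)
for entry in trace:
    print(entry)
```

The user program loses only the time spent inside the service routine, in contrast to a busy-wait loop.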
Direct Memory Access
[Figure: CPU, I/O controller (IOC), device, memory, and DMA controller (DMAC)]
• CPU sends a starting address, direction, and length count to DMAC. Then issues "start".
• DMAC provides handshake signals for the peripheral
• Time to do 1000 xfers (0.1 second):
– 1 DMA set-up sequence @ 50 µsec
– 1 interrupt @ 2 µsec
– 1 interrupt service sequence @ 48 µsec
– 0.1 ms for interrupt overhead (1/1000 of transfer time)
[Figure: address map starting at 0 in which the peripherals and the DMAC appear as memory-mapped I/O]
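The DMA overhead figures can be checked directly. The constants below are the slide's numbers; the variable names are only illustrative.

```python
SETUP_US = 50              # one DMA set-up sequence
INTERRUPT_US = 2           # one completion interrupt
SERVICE_US = 48            # one interrupt service sequence
TRANSFER_TIME_S = 0.1      # time for the 1000 transfers

cpu_overhead_us = SETUP_US + INTERRUPT_US + SERVICE_US
overhead_fraction = (cpu_overhead_us / 1_000_000) / TRANSFER_TIME_S

print(f"CPU overhead: {cpu_overhead_us / 1000:.1f} ms")          # 0.1 ms
print(f"Fraction of transfer time: {overhead_fraction:.4f}")      # 0.0010
```

The CPU pays the set-up and interrupt cost once per block instead of once per transfer, which is where the saving comes from.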
Input/Output Processors
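[Figure: CPU and IOP share the main memory bus with memory (Mem); devices D1, D2, ..., Dn attach to the IOP over the I/O bus; steps (1) to (4) are marked on the figure]
• CPU issues instruction to IOP
– Instruction fields: OP, Device, Address (target device; where cmnds are)
• IOP looks in memory for commands
– Command fields: OP, Addr, Cnt, Other
• IOP interrupts CPU when done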
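A very rough sketch of the IOP flow follows. The field names (OP, Device, Address and OP, Addr, Cnt, Other) come from the slide, but their exact meaning, the "read"/"write" opcodes, and the dictionary-based memory are all assumptions made for illustration.

```python
from collections import namedtuple

# Field names follow the slide; their exact semantics are assumed here.
IopInstruction = namedtuple("IopInstruction", "op device address")
IopCommand = namedtuple("IopCommand", "op addr cnt other")


def run_iop(instruction, memory, devices):
    """IOP fetches its command list from memory and moves data without the CPU."""
    device = devices[instruction.device]
    for command in memory[instruction.address]:
        if command.op == "read":       # device -> memory
            memory[command.addr] = [device.pop(0) for _ in range(command.cnt)]
        elif command.op == "write":    # memory -> device
            device.extend(memory[command.addr][:command.cnt])
    return "interrupt: done"           # the IOP interrupts the CPU when finished


memory = {100: [IopCommand("read", 200, 2, None)]}   # command list stored at address 100
devices = {"D1": [7, 8, 9]}
status = run_iop(IopInstruction("start", "D1", 100), memory, devices)
print(status, memory[200], devices["D1"])  # interrupt: done [7, 8] [9]
```

The CPU's only involvement is issuing the one instruction and handling the final interrupt; the command list itself lives in memory.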