ARM Architecture
Computer Organization and Assembly Languages
Yung-Yu Chuang
with slides by Peng-Sheng Chen, Ville Pietikainen
ARM history
• 1983: developed by Acorn Computers
– To replace the 6502 in BBC computers
– 4-man VLSI design team
– Its simplicity comes from the inexperienced team
– Matches the needs of generalized SoCs for reasonable power, performance and die size
– The first commercial RISC implementation
• 1990: ARM (Advanced RISC Machine), owned by Acorn, Apple and VLSI
ARM Ltd
Designs and licenses ARM core designs but does not fabricate chips
Why ARM?
• One of the most licensed and thus widespread processor cores in the world
– Used in PDAs, cell phones, multimedia players, handheld game consoles, digital TVs and cameras
– ARM7: GBA, iPod
– ARM9: NDS, PSP, Sony Ericsson, BenQ
– ARM11: Apple iPhone, Nokia N93, N800
– 90% of 32-bit embedded RISC processors until 2009
• Used especially in portable devices due to its low power consumption and reasonable performance
ARM powered products
ARM processors
• A simple but powerful design
• A whole family of designs sharing similar design
principles and a common instruction set
Naming ARM
• ARMxyzTDMIEJFS
– x: series
– y: MMU
– z: cache
– T: Thumb
– D: debugger
– M: Multiplier
– I: EmbeddedICE (built-in debugger hardware)
– E: Enhanced instruction
– J: Jazelle (JVM)
– F: Floating-point
– S: Synthesizable version (source code version for EDA tools)
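As a rough illustration of the naming scheme, the C sketch below scans a core name and sets one flag per suffix letter it finds. The function `arm_name_features` and the `ARM_FEAT_*` constants are hypothetical; the sketch simply skips the digits x, y, z and any '-' separator, and it does not check that a real core with that name exists.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical feature flags, one per suffix letter of ARMxyzTDMIEJFS. */
enum {
    ARM_FEAT_T = 1 << 0, /* Thumb */
    ARM_FEAT_D = 1 << 1, /* debugger */
    ARM_FEAT_M = 1 << 2, /* multiplier */
    ARM_FEAT_I = 1 << 3, /* EmbeddedICE */
    ARM_FEAT_E = 1 << 4, /* enhanced instructions */
    ARM_FEAT_J = 1 << 5, /* Jazelle */
    ARM_FEAT_F = 1 << 6, /* floating point */
    ARM_FEAT_S = 1 << 7  /* synthesizable */
};

/* Returns a bitmask of the suffix letters found in `name`, or 0 if the
   name does not start with "ARM". Digits (x, y, z), '-' and any other
   characters are ignored. */
unsigned arm_name_features(const char *name)
{
    unsigned mask = 0;

    assert(name != NULL);
    if (strncmp(name, "ARM", 3) != 0)
        return 0;
    for (const char *p = name + 3; *p != '\0'; ++p) {
        switch (toupper((unsigned char)*p)) {
        case 'T': mask |= ARM_FEAT_T; break;
        case 'D': mask |= ARM_FEAT_D; break;
        case 'M': mask |= ARM_FEAT_M; break;
        case 'I': mask |= ARM_FEAT_I; break;
        case 'E': mask |= ARM_FEAT_E; break;
        case 'J': mask |= ARM_FEAT_J; break;
        case 'F': mask |= ARM_FEAT_F; break;
        case 'S': mask |= ARM_FEAT_S; break;
        default: break; /* digits, separators, unknown letters */
        }
    }
    return mask;
}
```

For example, `arm_name_features("ARM926EJ-S")` sets only the E, J and S flags, while "ARM7TDMI" yields T, D, M and I.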
Popular ARM architectures
• ARM7TDMI
– 3 pipeline stages (fetch/decode/execute)
– High code density/low power consumption
– One of the most used ARM versions (for low-end systems)
– All ARM cores after ARM7TDMI include TDMI even if they do not include TDMI in their labels
• ARM9TDMI
– Compatible with ARM7
– 5 stages (fetch/decode/execute/memory/write)
– Separate instruction and data caches
• ARM11
ARM family comparison
year 1995 1997 1999 2003
ARM is a RISC
• RISC: simple but powerful instructions that execute within a single cycle at high clock speed.
• Four major design rules:
– Instructions: reduced set/single cycle/fixed length
– Pipeline: decode in one stage/no need for microcode
– Registers: a large set of general-purpose registers
– Load/store architecture: data processing instructions apply to registers only; load/store to transfer data from memory
• Results in simple design and fast clock rate
• The distinction blurs because CISC processors also implement RISC concepts
ARM design philosophy
• Small processor for lower power consumption (for embedded systems)
• High code density for limited memory and physical size restrictions
• The ability to use slow and low-cost memory
• Reduced die size for lower manufacturing cost and room for more peripherals
ARM features
• Different from pure RISC in several ways:
– Variable cycle execution for certain instructions: multiple-register load/store (faster/higher code density)
– Inline barrel shifter leading to more complex instructions: improves performance and code density
– Thumb 16-bit instruction set: 30% code density improvement
– Conditional execution: improves performance and code density by reducing branches
– Enhanced instructions: DSP instructions
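The inline barrel shifter lets the second operand be shifted before the ALU uses it, so `ADD r0, r1, r2, LSL #2` computes r0 = r1 + (r2 << 2) in a single instruction. The C sketch below models that behavior for immediate shift amounts 1 to 31; `barrel_shift`, `add_shifted` and `shift_t` are hypothetical names, and the special encodings for a shift amount of 0 (such as RRX) are deliberately not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Shift types for the second operand (named after the ARM mnemonics). */
typedef enum { SHIFT_LSL, SHIFT_LSR, SHIFT_ASR, SHIFT_ROR } shift_t;

/* Applies an immediate shift to a 32-bit register value, as the barrel
   shifter does before the ALU sees operand 2. Simplification: amount 0
   returns the value unchanged (the RRX and #32 special cases are ignored). */
uint32_t barrel_shift(uint32_t value, shift_t type, unsigned amount)
{
    assert(amount < 32);
    if (amount == 0)
        return value;
    switch (type) {
    case SHIFT_LSL: return value << amount;
    case SHIFT_LSR: return value >> amount;
    case SHIFT_ASR: /* copy the sign bit into the vacated upper bits */
        return (value & 0x80000000u) ? ~(~value >> amount) : value >> amount;
    case SHIFT_ROR: return (value >> amount) | (value << (32 - amount));
    }
    return value;
}

/* Models ADD rd, rn, rm, <shift> #amount: shift and add in one step. */
uint32_t add_shifted(uint32_t rn, uint32_t rm, shift_t type, unsigned amount)
{
    return rn + barrel_shift(rm, type, amount);
}
```

Folding the shift into the data-processing instruction is what saves the separate shift instruction a plain RISC design would need, which is where the performance and code-density gain comes from.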
ARM architecture
• Load/store architecture
• A large array of uniform registers
• Fixed-length 32-bit instructions
• 3-address instructions
Registers
• Only 16 registers are visible to a specific mode.
• A mode can access:
– A particular set of r0-r12
– r13 (sp, stack pointer)
– r14 (lr, link register)
– r15 (pc, program counter)
– Current program status register (cpsr)
– The uses of r0-r13 are orthogonal
General-purpose registers
(Figure: data sizes within a 32-bit register, bits 31 to 0: 8-bit byte, 16-bit half word, 32-bit word)
• 6 data types: signed and unsigned bytes, half words and words
• All ARM operations are 32-bit. Shorter data types are only supported by data transfer operations.
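Since the ALU only operates on 32-bit values, a byte or half-word load has to widen the value into a full register, either with zeros (LDRB, LDRH) or by copying the sign bit (LDRSB, LDRSH). The sketch below assumes little-endian byte order and 2-byte-aligned half-word addresses; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Like LDRB: zero-extend an 8-bit value into a 32-bit register. */
uint32_t load_byte_unsigned(const uint8_t *mem, size_t addr)
{
    return mem[addr];
}

/* Like LDRSB: sign-extend bit 7 into bits 31..8. */
uint32_t load_byte_signed(const uint8_t *mem, size_t addr)
{
    uint32_t b = mem[addr];
    return (b ^ 0x80u) - 0x80u; /* sign extension in unsigned arithmetic */
}

/* Like LDRH: zero-extend a little-endian 16-bit value. */
uint32_t load_half_unsigned(const uint8_t *mem, size_t addr)
{
    assert(addr % 2 == 0); /* assumed aligned half-word access */
    return (uint32_t)mem[addr] | ((uint32_t)mem[addr + 1] << 8);
}

/* Like LDRSH: sign-extend bit 15 into bits 31..16. */
uint32_t load_half_signed(const uint8_t *mem, size_t addr)
{
    uint32_t h = load_half_unsigned(mem, addr);
    return (h ^ 0x8000u) - 0x8000u;
}
```

Once loaded, the value is an ordinary 32-bit register operand, so all subsequent arithmetic is 32-bit.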
Program counter
• Stores the address of the instruction to be executed
• In ARM state, all instructions are 32 bits wide and word-aligned
• Thus, the last two bits of pc are always zero in ARM state.
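Word alignment means every ARM-state instruction address is a multiple of 4, so its two low bits carry no information. A minimal sketch of the corresponding address arithmetic, assuming ARM state (4-byte instructions); the helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of 4, i.e. bits [1:0] are zero. */
bool is_word_aligned(uint32_t addr)
{
    return (addr & 0x3u) == 0;
}

/* Address of the next sequential ARM instruction. */
uint32_t next_instruction(uint32_t addr)
{
    assert(is_word_aligned(addr));
    return addr + 4u;
}

/* Rounds an address down to a word boundary by clearing bits [1:0]. */
uint32_t word_align_down(uint32_t addr)
{
    return addr & ~0x3u;
}
```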
Program status register (CPSR)
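(Figure: CPSR fields)
• Condition flags: N (negative), Z (zero), C (carry/borrow), V (overflow), Q (sticky overflow flag used by saturating DSP instructions)
• Control bits: I (IRQ disable), F (FIQ disable), T (Thumb state), mode bits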
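The N, Z, C and V flags drive conditional execution: every ARM instruction carries a 4-bit condition field, so, for example, `ADDEQ r0, r0, #1` executes only when Z is set. The sketch below evaluates the standard condition codes against a cpsr value, assuming N, Z, C and V occupy bits 31 to 28; `condition_passed` and the `COND_*` names are hypothetical, though the enum order follows the condition-field encoding (EQ = 0 up to AL = 14).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Condition flag positions in the cpsr. */
#define CPSR_N (1u << 31)
#define CPSR_Z (1u << 30)
#define CPSR_C (1u << 29)
#define CPSR_V (1u << 28)

/* Condition field values, in encoding order. */
typedef enum {
    COND_EQ, COND_NE, COND_CS, COND_CC, COND_MI, COND_PL, COND_VS, COND_VC,
    COND_HI, COND_LS, COND_GE, COND_LT, COND_GT, COND_LE, COND_AL
} cond_t;

/* Returns true if an instruction with condition `cond` would execute. */
bool condition_passed(uint32_t cpsr, cond_t cond)
{
    bool n = (cpsr & CPSR_N) != 0, z = (cpsr & CPSR_Z) != 0;
    bool c = (cpsr & CPSR_C) != 0, v = (cpsr & CPSR_V) != 0;

    assert((unsigned)cond <= (unsigned)COND_AL);
    switch (cond) {
    case COND_EQ: return z;                 /* equal */
    case COND_NE: return !z;                /* not equal */
    case COND_CS: return c;                 /* carry set / unsigned >= */
    case COND_CC: return !c;                /* carry clear / unsigned < */
    case COND_MI: return n;                 /* negative */
    case COND_PL: return !n;                /* positive or zero */
    case COND_VS: return v;                 /* overflow */
    case COND_VC: return !v;                /* no overflow */
    case COND_HI: return c && !z;           /* unsigned > */
    case COND_LS: return !c || z;           /* unsigned <= */
    case COND_GE: return n == v;            /* signed >= */
    case COND_LT: return n != v;            /* signed < */
    case COND_GT: return !z && n == v;      /* signed > */
    case COND_LE: return z || n != v;       /* signed <= */
    case COND_AL: return true;              /* always */
    }
    return false;
}
```

Because a failed condition turns the instruction into a no-op instead of a branch, short if/else sequences can run without flushing the pipeline.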
Processor modes
Register organization
Instruction sets
• ARM/Thumb/Jazelle
Pipeline
(Figure: ARM7 and ARM9 pipelines)
During execution, pc always points 8 bytes ahead of the executing instruction
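Because pc reads 8 bytes ahead, pc-relative offsets are measured from the instruction address plus 8. The sketch below applies this to the signed 24-bit word offset of a B instruction; the helper names are hypothetical, and the range check works by round-tripping the encoding (roughly ±32 MB).

```c
#include <assert.h>
#include <stdint.h>

/* Value of pc seen by an ARM-state instruction at `addr`. */
uint32_t pc_when_executing(uint32_t addr)
{
    return addr + 8u;
}

/* Recovers the branch target from the 24-bit offset field. */
uint32_t decode_branch_target(uint32_t addr, uint32_t imm24)
{
    /* sign-extend 24 -> 32 bits using unsigned wraparound */
    uint32_t words = ((imm24 & 0x00FFFFFFu) ^ 0x00800000u) - 0x00800000u;
    return pc_when_executing(addr) + (words << 2);
}

/* Encodes the 24-bit offset field of a B at `addr` jumping to `target`. */
uint32_t encode_branch_offset(uint32_t addr, uint32_t target)
{
    assert((addr & 3u) == 0 && (target & 3u) == 0);
    uint32_t imm24 = ((target - pc_when_executing(addr)) >> 2) & 0x00FFFFFFu;
    assert(decode_branch_target(addr, imm24) == target); /* offset in range */
    return imm24;
}
```

For instance, a branch to itself at 0x8000 gets the offset field 0xFFFFFE (that is, -2 words, since pc is already 8 bytes ahead), which is why the familiar `B .` loop assembles to 0xEAFFFFFE.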
Pipeline
• Executing a branch or directly modifying pc causes the ARM core to flush its pipeline
• ARM10 introduced branch prediction
• An instruction in the execution stage will complete even though an interrupt has been raised. Other instructions in the pipeline are abandoned.
Interrupts
(Figure: memory layout showing the vector table, interrupt handlers and code)
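In the classic layout, each exception owns one 4-byte slot of the vector table, usually holding a branch or a load into pc that reaches the actual handler. The sketch below computes the slot address for each exception, assuming the table starts at a given base (0x00000000 normally; some cores can also place it at the high-vector address 0xFFFF0000). The enum, macros and function name are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Exceptions in vector-table order. */
typedef enum {
    EXC_RESET,          /* base + 0x00 */
    EXC_UNDEFINED,      /* base + 0x04, undefined instruction */
    EXC_SWI,            /* base + 0x08, software interrupt */
    EXC_PREFETCH_ABORT, /* base + 0x0C */
    EXC_DATA_ABORT,     /* base + 0x10 */
    EXC_RESERVED,       /* base + 0x14, unused */
    EXC_IRQ,            /* base + 0x18 */
    EXC_FIQ             /* base + 0x1C */
} exception_t;

#define VECTORS_LOW  0x00000000u
#define VECTORS_HIGH 0xFFFF0000u

/* Address of the 4-byte vector-table slot for `exc`. */
uint32_t vector_address(uint32_t base, exception_t exc)
{
    assert((unsigned)exc <= (unsigned)EXC_FIQ);
    return base + 4u * (uint32_t)exc;
}
```

Since FIQ occupies the last slot, its handler can start directly at that address without an extra branch.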