(1)

ARM Architecture

Computer Organization and Assembly Languages

Yung-Yu Chuang

with slides by Peng-Sheng Chen, Ville Pietikainen

(2)

ARM history

• 1983 developed by Acorn computers

– To replace 6502 in BBC computers
– 4-man VLSI design team
– Its simplicity comes from the inexperienced team

– Match the needs for generalized SoC for reasonable power, performance and die size
– The first commercial RISC implementation

• 1990 ARM (Advanced RISC Machine), owned by Acorn, Apple and VLSI

(3)

ARM Ltd

Designs and licenses ARM cores but does not fabricate them

(4)

Why ARM?

• One of the most licensed and thus widespread processor cores in the world
– Used in PDA, cell phones, multimedia players, handheld game console, digital TV and cameras
– ARM7: GBA, iPod
– ARM9: NDS, PSP, Sony Ericsson, BenQ
– ARM11: Apple iPhone, Nokia N93, N800
– 90% of 32-bit embedded RISC processors till 2009

• Used especially in portable devices due to its low power consumption and reasonable performance

(5)

ARM powered products

(6)

ARM processors

• A simple but powerful design

• A whole family of designs sharing similar design principles and a common instruction set

(7)

Naming ARM

• ARMxyzTDMIEJFS

– x: series
– y: MMU
– z: cache
– T: Thumb
– D: debugger
– M: Multiplier
– I: EmbeddedICE (built-in debugger hardware)
– E: Enhanced instruction
– J: Jazelle (JVM)
– F: Floating-point
– S: Synthesizable version (source code version for EDA tools)
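
The naming scheme above can be sketched as a small decoder. This is only an illustration: the function name `decode_arm_name` and the table `SUFFIX_MEANING` are hypothetical, the digits are returned unsplit, and a hyphen (as in names like ARM926EJ-S) is simply skipped.

```python
import re

# Suffix letters and their meanings, restating the list above.
SUFFIX_MEANING = {
    "T": "Thumb",
    "D": "debugger",
    "M": "Multiplier",
    "I": "EmbeddedICE",
    "E": "Enhanced instruction",
    "J": "Jazelle",
    "F": "Floating-point",
    "S": "Synthesizable version",
}


def decode_arm_name(name):
    """Split a name such as 'ARM7TDMI' into its digits and feature list."""
    match = re.fullmatch(r"ARM(\d+)([TDMIEJFS-]*)", name)
    if match is None:
        raise ValueError(f"not an ARMxyzTDMIEJFS-style name: {name!r}")
    digits, letters = match.groups()
    # Hyphens only separate letters; they carry no meaning of their own.
    features = [SUFFIX_MEANING[c] for c in letters if c != "-"]
    return digits, features


print(decode_arm_name("ARM7TDMI"))
print(decode_arm_name("ARM926EJ-S"))
```

The digits come back as one string ("7", "926") because the slide only states what x, y and z stand for, not how to interpret each value.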

(8)

Popular ARM architectures

• ARM7TDMI

– 3 pipeline stages (fetch/decode/execute)
– High code density/low power consumption
– One of the most used ARM versions (for low-end systems)
– All ARM cores after ARM7TDMI include TDMI even if they do not include TDMI in their labels

• ARM9TDMI
– Compatible with ARM7
– 5 stages (fetch/decode/execute/memory/write)
– Separate instruction and data cache

• ARM11

(9)

ARM family comparison

year 1995 1997 1999 2003

(10)

ARM is a RISC

• RISC: simple but powerful instructions that execute within a single cycle at high clock speed.

• Four major design rules:

– Instructions: reduced set/single cycle/fixed length
– Pipeline: decode in one stage/no need for microcode
– Registers: a large set of general-purpose registers

– Load/store architecture: data processing instructions apply to registers only; load/store to transfer data from memory

• Results in simple design and fast clock rate

• The distinction blurs because CISC implements RISC concepts
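
The load/store rule can be illustrated with a toy interpreter, given here as a minimal sketch rather than a model of any real core. The function `run`, the tuple-based program format and the use of absolute addresses are all assumptions made for brevity. The point is that `ADD` only ever touches registers, while `LDR` and `STR` are the only operations that touch memory.

```python
def run(program, memory):
    """Execute a list of (op, *args) tuples on a 16-register toy machine."""
    regs = [0] * 16
    for op, *args in program:
        if op == "LDR":            # load: memory -> register
            rd, address = args
            regs[rd] = memory[address]
        elif op == "STR":          # store: register -> memory
            rd, address = args
            memory[address] = regs[rd]
        elif op == "ADD":          # data processing: registers only
            rd, rn, rm = args
            regs[rd] = (regs[rn] + regs[rm]) & 0xFFFFFFFF
        else:
            raise ValueError(f"unknown operation: {op}")
    return regs


# Adding two values held in memory takes two loads, one add and one store.
memory = {0x100: 2, 0x104: 3, 0x108: 0}
program = [
    ("LDR", 0, 0x100),
    ("LDR", 1, 0x104),
    ("ADD", 2, 0, 1),
    ("STR", 2, 0x108),
]
run(program, memory)
print(memory[0x108])
```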

(11)

ARM design philosophy

• Small processor for lower power consumption (for embedded system)

• High code density for limited memory and physical size restrictions

• The ability to use slow and low-cost memory

• Reduced die size for reducing manufacturing cost and accommodating more peripherals

(12)

ARM features

• Different from pure RISC in several ways:

– Variable cycle execution for certain instructions: multiple-register load/store (faster/higher code density)
– Inline barrel shifter leading to more complex instructions: improves performance and code density
– Thumb 16-bit instruction set: 30% code density improvement
– Conditional execution: improves performance and code density by reducing branches
– Enhanced instructions: DSP instructions
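
To make the barrel shifter concrete, here is a minimal sketch of the four basic shift types applied to a 32-bit operand before it reaches the ALU. The helper names are hypothetical, and shift amounts are assumed to lie in 0..31; the special encodings of a real core (for example RRX) are left out.

```python
MASK32 = 0xFFFFFFFF


def lsl(value, amount):
    """Logical shift left."""
    return (value << amount) & MASK32


def lsr(value, amount):
    """Logical shift right (zeros shifted in)."""
    return (value & MASK32) >> amount


def asr(value, amount):
    """Arithmetic shift right (sign bit replicated)."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32           # reinterpret as a negative number
    return (value >> amount) & MASK32


def ror(value, amount):
    """Rotate right."""
    amount %= 32
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def add_with_shift(rn, rm, shift, amount):
    """Model of a single instruction: ADD rd, rn, rm, <shift> #amount."""
    return (rn + shift(rm, amount)) & MASK32


# ADD r0, r1, r1, LSL #2 computes r1 * 5 in one instruction.
print(add_with_shift(7, 7, lsl, 2))
```

Folding the shift into the data processing instruction is what saves the separate shift instruction a pure RISC would need.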

(13)

ARM architecture

(14)

ARM architecture

• Load/store architecture

• A large array of uniform registers

• Fixed-length 32-bit instructions

• 3-address instructions
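
As an illustration of fixed-length, 3-address instructions, the sketch below packs a register-operand data processing instruction into one 32-bit word and unpacks it again. The field positions follow the classic ARM data processing encoding (condition in bits 31:28, opcode in 24:21, S in bit 20, Rn in 19:16, Rd in 15:12, Rm in 3:0), which is not shown in these slides. The function names are hypothetical, and the shift fields are left at zero.

```python
AL = 0b1110    # condition "always"
ADD = 0b0100   # opcode for ADD
SUB = 0b0010   # opcode for SUB


def encode_dp_reg(opcode, rd, rn, rm, cond=AL, set_flags=False):
    """Encode <op> rd, rn, rm with an unshifted register operand."""
    for reg in (rd, rn, rm):
        if not 0 <= reg <= 15:
            raise ValueError("register numbers are 0..15")
    return ((cond << 28) | (opcode << 21) | (int(set_flags) << 20)
            | (rn << 16) | (rd << 12) | rm)


def decode_dp_reg(word):
    """Extract the fields packed by encode_dp_reg."""
    return {
        "cond": (word >> 28) & 0xF,
        "opcode": (word >> 21) & 0xF,
        "set_flags": bool((word >> 20) & 1),
        "rn": (word >> 16) & 0xF,
        "rd": (word >> 12) & 0xF,
        "rm": word & 0xF,
    }


# ADD r0, r1, r2: destination and both sources fit in one 32-bit word.
print(hex(encode_dp_reg(ADD, rd=0, rn=1, rm=2)))
```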

(15)

Registers

• Only 16 registers are visible to a specific mode.

A mode could access
– A particular set of r0-r12
– r13 (sp, stack pointer)
– r14 (lr, link register)
– r15 (pc, program counter)
– Current program status register (cpsr)
– The uses of r0-r13 are orthogonal
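
The phrase "a particular set" refers to register banking. The sketch below maps a (mode, register number) pair to the physical register it selects, assuming the classic ARM banking scheme, which is not spelled out in the text above: FIQ mode has its own r8-r14, the IRQ, Supervisor, Abort and Undefined modes each have their own r13 and r14, and User and System share one set. The short mode names and the function name are hypothetical.

```python
MODES = ["usr", "sys", "fiq", "irq", "svc", "abt", "und"]


def physical_register(mode, n):
    """Return a label for the physical register that rN selects in a mode."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if not 0 <= n <= 15:
        raise ValueError("register numbers are 0..15")
    if mode == "fiq" and 8 <= n <= 14:
        return f"r{n}_fiq"                 # FIQ banks r8-r14
    if mode in ("irq", "svc", "abt", "und") and n in (13, 14):
        return f"r{n}_{mode}"              # private sp and lr
    return f"r{n}"                         # shared with User mode


# Every mode sees 16 registers, but they are not all the same ones.
print(physical_register("usr", 13), physical_register("irq", 13))
all_regs = {physical_register(m, n) for m in MODES for n in range(16)}
print(len(all_regs))
```

Under this assumption the distinct general-purpose registers add up to 31, counting pc.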

(16)

General-purpose registers

[Figure: a 32-bit register with bit boundaries at 31, 24/23, 16/15, 8/7 and 0, holding an 8-bit byte, a 16-bit half word or a 32-bit word]

• 6 data types (signed/unsigned)

• All ARM operations are 32-bit. Shorter data types are only supported by data transfer operations.
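
Since operations are always 32-bit, a byte or half word has to be widened when it is loaded. The following sketch shows zero extension for unsigned types and sign extension for signed types. The function `load` is hypothetical, and little-endian byte order is an assumption made for the example.

```python
def load(memory, address, size, signed):
    """Read `size` bytes (1, 2 or 4) and widen the value to 32 bits."""
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2 or 4 bytes")
    raw = int.from_bytes(memory[address:address + size], "little")
    if not signed:
        return raw                          # zero extension
    sign_bit = 1 << (8 * size - 1)
    # Sign extension: replicate the top bit into the upper bits.
    return ((raw ^ sign_bit) - sign_bit) & 0xFFFFFFFF


memory = bytes([0x80, 0xFF, 0x34, 0x12])
print(hex(load(memory, 0, 1, signed=False)))   # unsigned byte
print(hex(load(memory, 0, 1, signed=True)))    # signed byte
print(hex(load(memory, 2, 2, signed=False)))   # unsigned half word
```

The three sizes combined with signed/unsigned give the six data types mentioned above.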

(17)

Program counter

• Store the address of the instruction to be executed

• All instructions are 32-bit wide and word-aligned

• Thus, the last two bits of pc are undefined.
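
A short sketch of why the low two bits carry no information: with 4-byte instructions on 4-byte boundaries, every instruction address is a multiple of 4. The helper names and the `base` parameter are hypothetical.

```python
def is_word_aligned(address):
    """True when the low two bits of the address are zero."""
    return (address & 0b11) == 0


def instruction_index(address, base=0):
    """Number of 32-bit instructions between `base` and `address`."""
    offset = address - base
    if not is_word_aligned(offset):
        raise ValueError("ARM instructions start on 4-byte boundaries")
    return offset >> 2


print(is_word_aligned(0x8004), is_word_aligned(0x8006))
print(instruction_index(0x8010, base=0x8000))
```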

(18)

Program status register (CPSR)

[Figure: CPSR layout. Condition flags: negative, zero, carry/borrow, overflow. Control bits: IRQ disable, FIQ disable, Thumb state, mode bits]
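
The fields of the CPSR can be unpacked as in the sketch below. The bit positions and mode encodings are taken from the classic ARM layout (N, Z, C, V in bits 31 to 28; I, F, T in bits 7, 6, 5; mode in bits 4 to 0) and do not appear in the text of these slides. The function names are hypothetical, and `condition_passed` covers only a handful of condition codes.

```python
MODE_NAMES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}


def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into named fields."""
    return {
        "negative": bool((cpsr >> 31) & 1),
        "zero": bool((cpsr >> 30) & 1),
        "carry": bool((cpsr >> 29) & 1),
        "overflow": bool((cpsr >> 28) & 1),
        "irq_disable": bool((cpsr >> 7) & 1),
        "fiq_disable": bool((cpsr >> 6) & 1),
        "thumb": bool((cpsr >> 5) & 1),
        "mode": MODE_NAMES.get(cpsr & 0x1F, "unknown"),
    }


def condition_passed(cond, cpsr):
    """Decide whether a conditionally executed instruction would run."""
    f = decode_cpsr(cpsr)
    checks = {
        "EQ": f["zero"],
        "NE": not f["zero"],
        "MI": f["negative"],
        "PL": not f["negative"],
        "HI": f["carry"] and not f["zero"],
        "GE": f["negative"] == f["overflow"],
        "LT": f["negative"] != f["overflow"],
        "AL": True,
    }
    return checks[cond]


print(decode_cpsr(0x600000D3))
print(condition_passed("EQ", 0x600000D3))
```

The second function ties the flags to conditional execution: an instruction is skipped, without a branch, when its condition does not hold.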

(19)

Processor modes

(20)

Register organization

(21)

Instruction sets

• ARM/Thumb/Jazelle

(22)

Pipeline

[Figure: pipeline stages of ARM7 and ARM9]

In execution, pc is always 8 bytes ahead of the instruction being executed
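
The 8-byte offset follows from the three-stage pipeline: while one instruction executes, the next is being decoded and the one after that is being fetched. The sketch below shows this, along with the consequence for branch offsets, which are encoded relative to the instruction address plus 8. All function names are hypothetical; the 24-bit signed word offset is the classic ARM branch encoding and is an assumption with respect to these slides.

```python
def pipeline_snapshot(cycle):
    """Addresses in each stage at a given cycle, starting from address 0."""
    return {
        "fetch": 4 * cycle,
        "decode": 4 * (cycle - 1),
        "execute": 4 * (cycle - 2),
    }


def pc_seen_by(instruction_address):
    """Value read from pc by an executing ARM-state instruction."""
    return instruction_address + 8


def branch_target(instruction_address, imm24):
    """Target of a branch whose 24-bit offset field is imm24."""
    imm24 &= 0xFFFFFF
    if imm24 & 0x800000:
        imm24 -= 1 << 24                   # sign-extend the offset
    return (pc_seen_by(instruction_address) + (imm24 << 2)) & 0xFFFFFFFF


snapshot = pipeline_snapshot(5)
print(snapshot["fetch"] - snapshot["execute"])
# An offset field of 0xFFFFFE (-2 words) branches back to the branch itself.
print(hex(branch_target(0x8000, 0xFFFFFE)))
```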

(23)

Pipeline

• Execution of a branch or direct modification of pc causes ARM core to flush its pipeline

• ARM10 starts to use branch prediction

• An instruction in the execution stage will complete even though an interrupt has been raised. Other instructions in the pipeline are abandoned.

(24)

Interrupts

[Figure: memory layout with the vector table, followed by the interrupt handlers and code]
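
As a rough sketch of how a vector table entry reaches its handler, the code below lists the classic ARM exception vectors at their low-memory addresses and builds the branch instruction that would sit in one entry. The table contents come from the classic ARM architecture rather than from the text of these slides. The handler address 0x100 and the function name `encode_branch` are hypothetical.

```python
VECTOR_TABLE = [
    (0x00, "Reset"),
    (0x04, "Undefined instruction"),
    (0x08, "Software interrupt"),
    (0x0C, "Prefetch abort"),
    (0x10, "Data abort"),
    (0x14, "Reserved"),
    (0x18, "IRQ"),
    (0x1C, "FIQ"),
]


def encode_branch(vector_address, handler_address):
    """Encode an unconditional 'B handler' placed at vector_address."""
    # The offset is relative to pc, which reads as vector_address + 8.
    offset = handler_address - (vector_address + 8)
    if offset % 4 != 0:
        raise ValueError("handler must be word-aligned")
    words = offset >> 2
    if not -(1 << 23) <= words < (1 << 23):
        raise ValueError("handler out of branch range")
    return 0xEA000000 | (words & 0xFFFFFF)


# Each entry is one instruction, so it usually just branches to the handler.
irq_vector = dict((name, addr) for addr, name in VECTOR_TABLE)["IRQ"]
print(hex(encode_branch(irq_vector, 0x100)))
```

Each vector holds a single 32-bit instruction, which is why the handlers themselves live elsewhere in memory.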

(25)

Interrupts

(26)