(1)

ARM Architecture

Computer Organization and Assembly Languages

Yung-Yu Chuang

with slides by Peng-Sheng Chen, Ville Pietikainen

(2)

ARM history

• 1983: developed by Acorn Computers
– To replace the 6502 in BBC computers
– 4-man VLSI design team
– Its simplicity comes from the team's inexperience
– Matches the needs of generalized SoCs for reasonable power, performance and die size
– The first commercial RISC implementation
• 1990: ARM (Advanced RISC Machine), owned by Acorn, Apple and VLSI

(3)

ARM Ltd

• Designs and licenses ARM cores but does not fabricate them

(4)

Why ARM?

• One of the most licensed and thus most widespread processor cores in the world
– Used in PDAs, cell phones, multimedia players, handheld game consoles, digital TVs and cameras
– ARM7: GBA, iPod
– ARM9: NDS, PSP, Sony Ericsson, BenQ
– ARM11: Apple iPhone, Nokia N93, N800
– 90% of 32-bit embedded RISC processors until 2009
• Used especially in portable devices due to its low power consumption and reasonable performance

(5)

ARM powered products

(6)

ARM processors

• A simple but powerful design

• A whole family of designs sharing similar design principles and a common instruction set

(7)

Naming ARM

• ARMxyzTDMIEJFS
– x: series
– y: MMU
– z: cache
– T: Thumb
– D: debugger
– M: multiplier
– I: EmbeddedICE (built-in debugger hardware)
– E: enhanced instructions
– J: Jazelle (JVM)
– F: floating-point
– S: synthesizable version (source code version for EDA tools)
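
The naming scheme above can be read mechanically. The following is a minimal sketch, not an official decoder: `decode_core_name` and `SUFFIX_MEANING` are hypothetical names introduced here, and the sketch assumes a name has the form `ARM`, then digits, then feature letters, with an optional dash before the last letters (as in `ARM926EJ-S`). It does not try to split the digits into x, y and z.

```python
import re

# Feature letters taken from the naming list above (hypothetical table name).
SUFFIX_MEANING = {
    "T": "Thumb",
    "D": "debugger",
    "M": "multiplier",
    "I": "EmbeddedICE",
    "E": "enhanced instructions",
    "J": "Jazelle",
    "F": "floating-point",
    "S": "synthesizable",
}

def decode_core_name(name):
    """Split a core name into its digit string and a list of feature names."""
    match = re.fullmatch(
        r"ARM(\d+)([TDMIEJFS]*)(?:-([TDMIEJFS]+))?", name.upper()
    )
    if match is None:
        raise ValueError(f"unrecognized core name: {name}")
    digits, letters, dashed = match.groups()
    # Letters after an optional dash are treated the same as the others.
    features = [SUFFIX_MEANING[c] for c in letters + (dashed or "")]
    return digits, features
```

For example, `decode_core_name("ARM7TDMI")` yields the digit string `"7"` and the four features Thumb, debugger, multiplier and EmbeddedICE.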

(8)

Popular ARM architectures

• ARM7TDMI
– 3 pipeline stages (fetch/decode/execute)
– High code density/low power consumption
– One of the most used ARM versions (for low-end systems)
– All ARM cores after ARM7TDMI include TDMI even if they do not include TDMI in their labels
• ARM9TDMI
– Compatible with ARM7
– 5 stages (fetch/decode/execute/memory/write)
– Separate instruction and data caches
• ARM11

(9)

ARM family comparison

[Table: ARM family comparison; only the year row survives: 1995, 1997, 1999, 2003]

(10)

ARM is a RISC

• RISC: simple but powerful instructions that execute within a single cycle at high clock speed.
• Four major design rules:
– Instructions: reduced set/single cycle/fixed length
– Pipeline: decode in one stage/no need for microcode
– Registers: a large set of general-purpose registers
– Load/store architecture: data processing instructions apply to registers only; load/store instructions transfer data between memory and registers
• Results in simple design and fast clock rate
• The distinction blurs because CISC implements RISC concepts

(11)

ARM design philosophy

• Small processor for lower power consumption (for embedded systems)
• High code density for limited memory and physical size restrictions
• The ability to use slow and low-cost memory
• Reduced die size for reducing manufacturing cost and accommodating more peripherals

(12)

ARM features

• Different from pure RISC in several ways:
– Variable cycle execution for certain instructions: multiple-register load/store (faster/higher code density)
– Inline barrel shifter leading to more complex instructions: improves performance and code density
– Thumb 16-bit instruction set: 30% code density improvement
– Conditional execution: improves performance and code density by reducing branches
– Enhanced instructions: DSP instructions
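
To illustrate the inline barrel shifter, here is a minimal Python model of a data processing instruction whose second operand is shifted before the ALU uses it. The function names `barrel_shift` and `add_shifted` are hypothetical, and the model is a simplification: it accepts only immediate shift amounts from 0 to 31 and ignores the special encodings and flag updates of a real core.

```python
MASK32 = 0xFFFFFFFF

def barrel_shift(value, kind, amount):
    """Shift a 32-bit value; amount 0 means no shift in this simplified model."""
    if not 0 <= amount <= 31:
        raise ValueError("amount must be in 0..31")
    value &= MASK32
    if amount == 0:
        return value
    if kind == "LSL":  # logical shift left
        return (value << amount) & MASK32
    if kind == "LSR":  # logical shift right, zero fill
        return value >> amount
    if kind == "ASR":  # arithmetic shift right, sign fill
        shifted = value >> amount
        if value & 0x80000000:
            shifted |= (MASK32 << (32 - amount)) & MASK32
        return shifted
    if kind == "ROR":  # rotate right
        return ((value >> amount) | (value << (32 - amount))) & MASK32
    raise ValueError(f"unknown shift kind: {kind}")

def add_shifted(rn, rm, kind, amount):
    """Model of ADD rd, rn, rm, <kind> #amount: shift and add in one step."""
    return (rn + barrel_shift(rm, kind, amount)) & MASK32
```

With this model, `add_shifted(r1, r1, "LSL", 2)` computes `r1 + 4*r1`, which is how a single instruction can multiply by 5 without a separate shift instruction.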

(13)

ARM architecture

(14)

ARM architecture

• Load/store architecture
• A large array of uniform registers
• Fixed-length 32-bit instructions
• 3-address instructions
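
The load/store and 3-address properties listed above can be sketched with a toy model. `TinyCore` is a hypothetical class written for illustration only: it has 16 registers, models memory as a dictionary keyed by address, and supports just three operations.

```python
class TinyCore:
    """Toy model: 16 registers and a memory modeled as a dict keyed by address."""

    def __init__(self):
        self.regs = [0] * 16
        self.mem = {}

    def ldr(self, rd, rn):
        # LDR rd, [rn]: bring a memory word into a register
        self.regs[rd] = self.mem.get(self.regs[rn], 0)

    def str_(self, rd, rn):
        # STR rd, [rn]: write a register back to memory
        self.mem[self.regs[rn]] = self.regs[rd]

    def add(self, rd, rn, rm):
        # ADD rd, rn, rm: 3-address form, result and both operands are registers
        self.regs[rd] = (self.regs[rn] + self.regs[rm]) & 0xFFFFFFFF

# Incrementing a word in memory takes a load, a register operation and a store.
core = TinyCore()
core.mem[0x100] = 41
core.regs[1] = 0x100   # r1 holds the address
core.regs[2] = 1       # r2 holds the increment
core.ldr(0, 1)         # r0 = mem[r1]
core.add(0, 0, 2)      # r0 = r0 + r2
core.str_(0, 1)        # mem[r1] = r0
```

The point of the sketch is that `add` never touches `mem`: arithmetic works on registers only, and memory is reached only through `ldr` and `str_`.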

(15)

Registers

• Only 16 registers are visible to a specific mode. A mode can access
– A particular set of r0-r12
– r13 (sp, stack pointer)
– r14 (lr, link register)
– r15 (pc, program counter)
– Current program status register (cpsr)
• The uses of r0-r13 are orthogonal

(16)

General-purpose registers

[Figure: a 32-bit register (bits 31 to 0) holding an 8-bit byte, a 16-bit halfword, or a 32-bit word]

• 6 data types (signed/unsigned)
• All ARM operations are 32-bit. Shorter data types are only supported by data transfer operations.
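
Because all operations are 32-bit, a byte or halfword loaded from memory has to be widened to 32 bits first. The sketch below shows the two ways of doing that; `zero_extend` and `sign_extend` are hypothetical helper names. As an assumption beyond the slides, these correspond to the unsigned loads (LDRB, LDRH) and the signed loads (LDRSB, LDRSH) of the ARM instruction set.

```python
def zero_extend(value, bits):
    """Widen an unsigned field of the given width to 32 bits."""
    return value & ((1 << bits) - 1)

def sign_extend(value, bits):
    """Widen a signed field of the given width to a 32-bit two's-complement value."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    # XOR then subtract copies the sign bit into all higher bits.
    return ((value ^ sign_bit) - sign_bit) & 0xFFFFFFFF
```

The byte `0xFF` becomes `0x000000FF` when zero-extended and `0xFFFFFFFF` when sign-extended, so the same memory content can represent 255 or -1 depending on the load used.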

(17)

Program counter

• Stores the address of the instruction to be executed
• All instructions are 32 bits wide and word-aligned
• Thus, the last two bits of pc are undefined.

(18)

Program status register (CPSR)

[Figure: CPSR layout, with fields negative, zero, carry/borrow, overflow, IRQ disable, FIQ disable, Thumb state, and mode bits]
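
As a minimal sketch of how the CPSR fields can be extracted from a 32-bit value: the bit positions used below are an assumption based on the standard ARM layout (N, Z, C, V in bits 31 to 28; I, F, T in bits 7 to 5; mode in bits 4 to 0), since the original figure did not survive extraction. `decode_cpsr` is a hypothetical helper name.

```python
def decode_cpsr(cpsr):
    """Return the condition flags, control bits and mode field of a CPSR value."""
    return {
        "N": (cpsr >> 31) & 1,   # negative
        "Z": (cpsr >> 30) & 1,   # zero
        "C": (cpsr >> 29) & 1,   # carry/borrow
        "V": (cpsr >> 28) & 1,   # overflow
        "I": (cpsr >> 7) & 1,    # IRQ disable
        "F": (cpsr >> 6) & 1,    # FIQ disable
        "T": (cpsr >> 5) & 1,    # Thumb state
        "mode": cpsr & 0x1F,     # mode bits
    }
```

For instance, `decode_cpsr(0x600000D3)` reports Z and C set, both interrupt types disabled, ARM state, and a mode field of `0x13`.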

(19)

Processor modes

(20)

Register organization

(21)

Instruction sets

• ARM/Thumb/Jazelle

(22)

Pipeline

[Figure: ARM7 and ARM9 pipeline stages]

In execution, pc is always 8 bytes ahead of the instruction being executed.
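
The 8-byte offset matters whenever an instruction reads pc, most visibly in branches. The sketch below assumes ARM state with 4-byte instructions and a branch offset counted in words relative to the pc value; `pc_seen_by` and `branch_target` are hypothetical names.

```python
def pc_seen_by(instruction_address):
    """Value read from pc by an instruction located at instruction_address."""
    # While this instruction executes, the fetch stage is two instructions ahead.
    return (instruction_address + 8) & 0xFFFFFFFF

def branch_target(instruction_address, word_offset):
    """Target of a branch whose offset counts 4-byte words relative to pc."""
    return (pc_seen_by(instruction_address) + 4 * word_offset) & 0xFFFFFFFF
```

A branch at `0x8000` with a word offset of -2 therefore lands on `0x8000` itself, which is why a branch-to-self is encoded with offset -2 rather than 0.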

(23)

Pipeline

• Execution of a branch or direct modification of pc causes the ARM core to flush its pipeline
• ARM10 starts to use branch prediction
• An instruction in the execute stage will complete even though an interrupt has been raised. Other instructions in the pipeline are abandoned.

(24)

Interrupts

[Figure: memory layout showing the vector table, the interrupt handlers, and code]

(25)

Interrupts

(26)

References
