
(1)

ARM Assembly Programming II

Computer Organization and Assembly Languages Yung-Yu Chuang

2007/11/26

with slides by Peng-Sheng Chen

(2)

GNU compiler and binutils

• HAM uses the GNU compiler and binutils

– gcc: GNU C compiler

– as: GNU assembler

– ld: GNU linker

– gdb: GNU project debugger

– insight: a (Tcl/Tk) graphical interface to gdb

(3)

Pipeline

• COFF (common object file format)

• ELF (Executable and Linkable Format)

• Segments in the object file

– Text: code

– Data: initialized global variables

– BSS: uninitialized global variables

(Figure: build pipeline. C source (.c) → gcc → asm source (.s) → as → object file (.coff) → ld → executable (.elf) → simulator / debugger)
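
The three segments listed above can be related to C source with a minimal sketch. The names `boot_count`, `error_count` and `total_events` are hypothetical, and the exact placement is decided by the toolchain, so the comments describe the usual case rather than a guarantee.

```c
/* Hypothetical names, for illustration only. */

int boot_count = 3;   /* initialized global: usually placed in the data segment */
int error_count;      /* uninitialized global: usually placed in BSS, starts at 0 */

/* The machine code generated for this function goes to the text segment. */
int total_events(void)
{
    return boot_count + error_count;
}
```

Because BSS only records a size and is zero-filled at load time, `error_count` takes no space in the object file, while the initial value of `boot_count` must be stored in it.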

(5)

GAS program format

        .file   "test.s"
        .text
        .global main                    @ export variable
        .type   main, %function         @ set the type of a symbol to be either a function or an object
main:
        MOV     R0, #100
        ADD     R0, R0, R0
        SWI     #11                     @ call interrupt to end the program
        .end                            @ signals the end of the program

(6)

ARM assembly program

main:
        LDR     R1, value       @ load value
        STR     R1, result
        SWI     #11
value:  .word   0x0000C123
result: .word   0

Each line has the form: label, operation, operand, comments

(7)

Control structures

• A program implements algorithms to solve problems. Program decomposition and flow of control are important concepts for expressing algorithms.

• Flow of control:

– Sequence.

– Decision: if-then-else, switch

– Iteration: repeat-until, do-while, for

• Decomposition: split a problem into several smaller, manageable ones and solve them independently (subroutines/functions/procedures).

(8)

Decision

• If-then-else

• switch

(10)

If statements

if C then T else E

        C
        BNE     else
        T
        B       endif
else:   E
endif:

// find maximum

if (R0>R1) then R2:=R0 else R2:=R1

        CMP     R0, R1
        BLE     else
        MOV     R2, R0
        B       endif
else:   MOV     R2, R1
endif:

(11)

If statements

// find maximum

if (R0>R1) then R2:=R0 else R2:=R1

Branching version:

        CMP     R0, R1
        BLE     else
        MOV     R2, R0
        B       endif
else:   MOV     R2, R1
endif:

Two other options:

        CMP     R0, R1
        MOVGT   R2, R0
        MOVLE   R2, R1

        MOV     R2, R0
        CMP     R0, R1
        MOVLE   R2, R1
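
As a rough C-level picture of the difference between the branching version and the conditional-move versions, here is a small sketch. The function names are hypothetical, and a compiler is free to emit either form for either function.

```c
/* Branching form: mirrors CMP / BLE else / MOV / B endif / MOV. */
int max_branch(int r0, int r1)
{
    int r2;
    if (r0 > r1)
        r2 = r0;
    else
        r2 = r1;
    return r2;
}

/* Select form: mirrors MOV R2, R0 / CMP R0, R1 / MOVLE R2, R1. */
int max_select(int r0, int r1)
{
    int r2 = r0;                /* MOV   R2, R0 */
    r2 = (r0 <= r1) ? r1 : r2;  /* MOVLE R2, R1 */
    return r2;
}
```

Both functions return the same result; the second one simply has no control-flow split in its source form.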

(12)

If statements

if (R1==1 || R1==5 || R1==12) R0=1;

        TEQ     R1, #1          @ ...
        TEQNE   R1, #5          @ ...
        TEQNE   R1, #12         @ ...
        MOVEQ   R0, #1          @ BNE fail

(13)

If statements

if (R1==0) zero
else if (R1>0) plus
else if (R1<0) neg

        TEQ     R1, #0
        BMI     neg
        BEQ     zero
        BPL     plus
neg:    ...
        B       exit
zero:   ...
        B       exit
        ...

(14)

If statements

R0=abs(R0)

TEQ R0, #0

RSBMI R0, R0, #0
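
The same idea in C, as a minimal sketch with the hypothetical name `abs_rsb`: test the sign, then subtract from zero only when the value is negative.

```c
/* Mirrors TEQ R0, #0 / RSBMI R0, R0, #0: negate only when negative. */
int abs_rsb(int r0)
{
    if (r0 < 0) {
        /* Compute 0 - r0 in unsigned arithmetic so that the most negative
           value wraps around, as the RSB instruction would, instead of
           overflowing a signed int. */
        r0 = (int)(0u - (unsigned)r0);
    }
    return r0;
}
```

Note that the most negative integer has no positive counterpart, so its "absolute value" stays negative on a two's-complement machine, in C as in the two-instruction ARM version.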

(15)

Multi-way branches

        CMP     R0, #'0'
        BCC     other           @ less than '0'
        CMP     R0, #'9'
        BLS     digit           @ between '0' and '9'
        CMP     R0, #'A'
        BCC     other
        CMP     R0, #'Z'
        BLS     letter          @ between 'A' and 'Z'
        CMP     R0, #'a'
        BCC     other
        CMP     R0, #'z'
        BHI     other           @ not between 'a' and 'z'
letter: ...
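
The chain of unsigned comparisons above corresponds to the following C sketch. The enum and function names are hypothetical, and ASCII character codes are assumed.

```c
enum char_class { CLASS_OTHER, CLASS_DIGIT, CLASS_LETTER };

/* Each test relies on the earlier ones having excluded smaller values,
   so only one bound needs checking at each step. Assumes ASCII. */
enum char_class classify(unsigned char c)
{
    if (c < '0')  return CLASS_OTHER;   /* BCC other */
    if (c <= '9') return CLASS_DIGIT;   /* BLS digit */
    if (c < 'A')  return CLASS_OTHER;   /* BCC other */
    if (c <= 'Z') return CLASS_LETTER;  /* BLS letter */
    if (c < 'a')  return CLASS_OTHER;   /* BCC other */
    if (c > 'z')  return CLASS_OTHER;   /* BHI other */
    return CLASS_LETTER;                /* fall through to letter */
}
```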

(16)

Switch statements

switch (exp) {

case c1: S1; break;

case c2: S2; break;

...

case cN: SN; break;

default: SD;

}

e=exp;

if (e==c1) {S1}

else

if (e==c2) {S2}

else ...

(17)

Switch statements

switch (R0) {

case 0: S0; break;

case 1: S1; break;

case 2: S2; break;

case 3: S3; break;

default: err;

}

        CMP     R0, #0
        BEQ     S0
        CMP     R0, #1
        BEQ     S1
        CMP     R0, #2
        BEQ     S2
        CMP     R0, #3
        BEQ     S3
err:    ...
        B       exit
S0:     ...
        B       exit

The range is between 0 and N

Slow if N is large

(18)

Switch statements

        ADR     R1, JMPTBL
        CMP     R0, #3
        LDRLS   PC, [R1, R0, LSL #2]
err:    ...
        B       exit
S0:     ...

JMPTBL:
        .word   S0
        .word   S1
        .word   S2
        .word   S3

(Figure: R1 points to JMPTBL, whose entries S0, S1, S2, S3 are indexed by R0.)

For larger N and sparse values, we could use a hash function.

What if the range is between M and N?
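
One possible answer to the M..N question, sketched in C with function pointers standing in for the table of code addresses: subtract M first, then do a single unsigned bounds check. All names here are hypothetical, and the handlers just return distinct numbers so the dispatch can be observed.

```c
#include <stddef.h>

typedef int (*handler_fn)(void);

static int handle_s0(void)  { return 100; }
static int handle_s1(void)  { return 101; }
static int handle_s2(void)  { return 102; }
static int handle_s3(void)  { return 103; }
static int handle_err(void) { return -1; }

/* Plays the role of JMPTBL: one entry per case value. */
static const handler_fn jump_table[] = {
    handle_s0, handle_s1, handle_s2, handle_s3
};

/* Dispatches case values first_case .. first_case + 3. */
int dispatch(int value, int first_case)
{
    /* Subtracting first_case maps the range M..N onto 0..N-M. The unsigned
       compare then rejects values below M and above N with one test,
       much like CMP followed by a conditional LDRLS. */
    unsigned index = (unsigned)value - (unsigned)first_case;
    size_t count = sizeof jump_table / sizeof jump_table[0];

    if (index < count)
        return jump_table[index]();
    return handle_err();
}
```

For example, `dispatch(12, 10)` selects the third entry, while `dispatch(9, 10)` falls to the error handler because the subtraction wraps to a large unsigned value.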

(19)

Iteration

• repeat-until

• do-while

• for

(20)

repeat loops

do { S } while ( C )

loop:
        S
        C
        BEQ     loop
endw:

(21)

while loops

while ( C ) { S }

loop:   C
        BNE     endw
        S
        B       loop
endw:

        B       test
loop:   S
test:   C
        BEQ     loop
endw:

(22)

while loops

while ( C ) { S }

        B       test
loop:   S
test:   C
        BEQ     loop
endw:

        C
        BNE     endw
loop:   S
test:   C
        BEQ     loop
endw:
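
To check that the rearranged loop behaves like the original while loop, here is a sketch in C using `goto` to mimic the branch structure. The function names are hypothetical; each function counts how many times the body runs for `while (i < n) { ...; i++; }`.

```c
/* Test at the top, unconditional branch at the bottom:
   two branches are executed per iteration. */
int count_top_test(int n)
{
    int i = 0, body_runs = 0;
loop:
    if (!(i < n)) goto endw;    /* C, then BNE endw */
    body_runs++; i++;           /* S */
    goto loop;                  /* B loop */
endw:
    return body_runs;
}

/* Test once before entry, then test at the bottom:
   one branch is executed per iteration. */
int count_inverted(int n)
{
    int i = 0, body_runs = 0;
    if (!(i < n)) goto endw;    /* C, then BNE endw */
loop:
    body_runs++; i++;           /* S */
    if (i < n) goto loop;       /* C, then BEQ loop */
endw:
    return body_runs;
}
```

Both functions run the body the same number of times, including zero times when the condition is false on entry.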

(23)

GCD

int gcd (int i, int j)
{
    while (i != j) {
        if (i > j)
            i -= j;
        else
            j -= i;
    }
    return i;
}

(24)

GCD

loop:   CMP     R1, R2
        SUBGT   R1, R1, R2
        SUBLT   R2, R2, R1
        BNE     loop
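
A minimal C sketch of how this loop works: a single comparison feeds both conditional subtractions and the loop branch. The function name is hypothetical, and both inputs are assumed to be positive.

```c
/* Mirrors the four-instruction loop. Assumes r1 > 0 and r2 > 0. */
int gcd_predicated(int r1, int r2)
{
    int gt, lt;
    do {
        gt = r1 > r2;           /* CMP R1, R2 sets the flags once */
        lt = r1 < r2;
        if (gt) r1 -= r2;       /* SUBGT R1, R1, R2 */
        if (lt) r2 -= r1;       /* SUBLT R2, R2, R1 */
    } while (gt || lt);         /* BNE loop: repeat while R1 != R2 */
    return r1;
}
```

The flags are captured before either subtraction, which matches the assembly: SUBGT and SUBLT do not set flags, so BNE still sees the result of the CMP.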

(26)

for loops

for ( I ; C ; A ) { S }

        I
loop:   C
        BNE     endfor
        S
        A
        B       loop
endfor:

for (i=0; i<10; i++) { a[i]:=0; }

        MOV     R0, #0
        ADR     R2, A
        MOV     R1, #0
loop:   CMP     R1, #10
        BGE     endfor
        STR     R0, [R2, R1, LSL #2]
        ADD     R1, R1, #1
        B       loop
endfor:

(27)

for loops

        MOV     R1, #0
loop:   CMP     R1, #10
        BGE     endfor
        @ do something
        ADD     R1, R1, #1
        B       loop
endfor:

for (i=0; i<10; i++) { do something; }

        MOV     R1, #10
loop:
        @ do something
        SUBS    R1, R1, #1
        BNE     loop
endfor:

Execute a loop a constant number of times.
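
The count-down form works whenever the body does not depend on the direction of the counter. A C sketch of the two forms, with hypothetical function names and an array sum standing in for "do something":

```c
/* Count-up form: the counter is compared against the bound every time. */
int sum_count_up(const int *data)
{
    int sum = 0;
    for (int i = 0; i < 10; i++)
        sum += data[i];
    return sum;
}

/* Count-down form: the decrement itself decides whether to loop again,
   like SUBS followed by BNE. The body runs exactly 10 times. */
int sum_count_down(const int *data)
{
    int sum = 0;
    int remaining = 10;
    do {
        sum += *data++;
    } while (--remaining != 0);
    return sum;
}
```

The second form drops the separate compare and the extra unconditional branch, at the cost of requiring at least one iteration.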

(28)

Procedures

• Arguments: expressions passed into a function

• Parameters: values received by the function

• Caller and callee

void func(int a, int b)     /* a, b: parameters; func is the callee */
{
    ...
}

int main(void)              /* main is the caller */
{
    func(100,200);          /* 100, 200: arguments */
    ...
}

(30)

Procedures

• How to pass arguments? By registers? By stack? By memory? In what order?

• Who should save R5? Caller? Callee?

main:                           @ caller
        @ use R5
        BL      func
        @ use R5
        ...
        ...
        .end

func:                           @ callee
        ...
        @ use R5
        ...
        ...
        .end

(31)

Procedures (caller save)

main:                           @ caller
        @ use R5
        @ save R5
        BL      func
        @ restore R5
        @ use R5
        .end

func:                           @ callee
        ...
        @ use R5
        .end

(32)

Procedures (callee save)

main:                           @ caller
        @ use R5
        BL      func
        @ use R5
        .end

func:                           @ callee
        @ save R5
        ...
        @ use R5
        @ restore R5
        .end

(33)

Procedures

• How arguments are passed, and whether the caller or the callee saves registers such as R5, must be agreed on by both sides. We need a protocol for these.

(34)

ARM Procedure Call Standard (APCS)

• ARM Ltd. defines a set of rules for procedure entry and exit so that

– Object code generated by different compilers can be linked together

– Procedures can be called between high-level languages and assembly

• APCS defines

– Use of registers

– Use of stack

– Format of stack-based data structure

– Mechanism for argument passing

(35)

APCS register usage convention

Register  APCS name  APCS role
0         a1         Argument 1 / integer result / scratch register
1         a2         Argument 2 / scratch register
2         a3         Argument 3 / scratch register
3         a4         Argument 4 / scratch register
4         v1         Register variable 1
5         v2         Register variable 2
6         v3         Register variable 3
7         v4         Register variable 4
8         v5         Register variable 5
9         sb/v6      Static base / register variable 6
10        sl/v7      Stack limit / register variable 7
11        fp         Frame pointer
12        ip         Scratch reg. / new sb in inter-link-unit calls
13        sp         Lower end of current stack frame
14        lr         Link address / scratch register
15        pc         Program counter

(36)

APCS register usage convention

• Registers 0 to 3 (a1 to a4) are used to pass the first 4 parameters

• Caller-saved if necessary

(37)

APCS register usage convention

• The register variables (v1 to v7) must be returned unchanged

• Callee-saved

(38)

APCS register usage convention

• The remaining registers (sb, sl, fp, ip, sp, lr, pc) are registers for special purposes

• They could be used as temporary variables if saved properly.

(39)

Argument passing

• The first four word arguments are passed through R0 to R3.

• Remaining parameters are pushed onto the stack in reverse order.

• Procedures with no more than four parameters are more efficient, since all of their arguments stay in registers.

(40)

Return value

• One word value in R0

• A value of length 2~4 words (R0-R1, R0-R2, R0-R3)

(41)

Function entry/exit

• A simple leaf function with no more than four parameters has the minimal overhead. 50% of calls are to leaf functions.

        BL      leaf1
        ...
leaf1:  ...
        ...
        MOV     PC, LR          @ return

(Figure: call tree with main at the root and leaf functions at the bottom.)

(42)

Function entry/exit

• Save a minimal set of temporary variables

        BL      leaf2
        ...
leaf2:  STMFD   sp!, {regs, lr}         @ save
        ...
        LDMFD   sp!, {regs, pc}         @ restore and return

(43)

Standard ARM C program address space

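(Figure: memory map of an application, listed from the top of memory down to the load address.)

top of memory
stack, with stack pointer (sp) and stack limit (sl)
top of heap
heap
top of application
static data
code
load address

The code and static data together form the application image.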

(44)

Accessing operands

• A procedure often accesses operands in the following ways

– An argument passed in a register: no further work

– An argument passed on the stack: use stack pointer (R13) relative addressing with an immediate offset known at compile time

– A constant: PC-relative addressing, offset known at compile time

– A local variable: allocated on the stack and accessed through stack pointer relative addressing

– A global variable: allocated in the static area and accessed through static base (R9) relative addressing
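
As a rough illustration of these operand kinds, the following C sketch marks each one in a comment. The names are hypothetical, and the comments describe where an APCS-style compiler would typically look for each operand; an optimizing compiler may well keep locals in registers or fold the constant into an instruction.

```c
/* Hypothetical names, for illustration only. */

int scale_factor = 3;       /* global variable: lives in the static area */

int weighted_sum(int a, int b, int c, int d, int e)
{
    /* a, b, c, d: the first four word arguments, passed in registers.  */
    /* e: the fifth argument, passed on the stack.                      */
    int partial[2];         /* local variable: allocated on the stack   */

    partial[0] = a + b;
    partial[1] = c + d;

    /* 1000 is a constant; one too large for an immediate field is
       typically loaded PC-relative from a nearby literal pool. */
    return (partial[0] + partial[1] + e) * scale_factor + 1000;
}
```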

(45)

Procedure

main:
        MOV     R0, #0
        ...
        BL      func
        ...

(Figure: the stack, drawn with low addresses at the top and high addresses at the bottom.)

(46)

Procedure

func:   STMFD   SP!, {R4-R6, LR}
        SUB     SP, SP, #0xC
        ...
        STR     R0, [SP, #0]    @ v1=a1
        ...
        ADD     SP, SP, #0xC
        LDMFD   SP!, {R4-R6, PC}

(Figure: stack frame from low to high addresses: v1, v2, v3, R4, R5, R6, LR.)

(47)

Block copy example

void bcopy(char *to, char *from, int n) {

while (n--)

*to++ = *from++;

}

(48)

Block copy example

@ arguments: R0: to, R1: from, R2: n
bcopy:  TEQ     R2, #0
        BEQ     end
loop:   SUB     R2, R2, #1
        LDRB    R3, [R1], #1
        STRB    R3, [R0], #1
        B       bcopy
end:    MOV     PC, LR

(49)

Block copy example

@ arguments: R0: to, R1: from, R2: n
@ rewrite "n--" as "--n >= 0"
bcopy:  SUBS    R2, R2, #1
        LDRPLB  R3, [R1], #1
        STRPLB  R3, [R0], #1
        BPL     bcopy
        MOV     PC, LR

(50)

Block copy example

@ arguments: R0: to, R1: from, R2: n
@ assume n is a multiple of 4; loop unrolling
bcopy:  SUBS    R2, R2, #4
        LDRPLB  R3, [R1], #1
        STRPLB  R3, [R0], #1
        LDRPLB  R3, [R1], #1
        STRPLB  R3, [R0], #1
        LDRPLB  R3, [R1], #1
        STRPLB  R3, [R0], #1
        LDRPLB  R3, [R1], #1
        STRPLB  R3, [R0], #1
        BPL     bcopy
        MOV     PC, LR

(51)

Block copy example

@ arguments: R0: to, R1: from, R2: n
@ n is a multiple of 16
bcopy:  SUBS    R2, R2, #16
        LDRPL   R3, [R1], #4
        STRPL   R3, [R0], #4
        LDRPL   R3, [R1], #4
        STRPL   R3, [R0], #4
        LDRPL   R3, [R1], #4
        STRPL   R3, [R0], #4
        LDRPL   R3, [R1], #4
        STRPL   R3, [R0], #4
        BPL     bcopy
        MOV     PC, LR

(52)

Block copy example

@ arguments: R0: to, R1: from, R2: n
@ n is a multiple of 16
@ note: R4-R6 are callee-saved under the APCS, so a complete
@ version would save and restore them
bcopy:  SUBS    R2, R2, #16
        LDMPL   R1!, {R3-R6}
        STMPL   R0!, {R3-R6}
        BPL     bcopy
        MOV     PC, LR
@ could be extended to copy 40 bytes at a time
@ if n is not a multiple of 40, add a copy_rest loop
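
The chunked copy plus a copy_rest loop can be sketched in C as follows. The name `block_copy` is hypothetical, and a fixed-size `memcpy` stands in for the LDM/STM pair so that the sketch makes no alignment assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Copies n bytes: 16 bytes at a time while possible, then the
   remaining 0..15 bytes one at a time (the copy_rest loop). */
void block_copy(char *to, const char *from, size_t n)
{
    while (n >= 16) {
        memcpy(to, from, 16);   /* stands in for LDM/STM of four words */
        to += 16;
        from += 16;
        n -= 16;
    }
    while (n--)                 /* copy_rest */
        *to++ = *from++;
}
```

With this structure the fast loop no longer needs n to be a multiple of the chunk size; the byte loop picks up whatever is left.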

(53)

Search example

int main(void) {

int a[10]={7,6,4,5,5,1,3,2,9,8};

int i;

int s=4;

for (i=0; i<10; i++) if (s==a[i]) break;

if (i>=10) return -1;

else return i;

}

(54)

Search

        .section .rodata
.LC0:
        .word   7
        .word   6
        .word   4
        .word   5
        .word   5
        .word   1
        .word   3
        .word   2
        .word   9
        .word   8

(55)

Search

        .text
        .global main
        .type   main, %function
main:   sub     sp, sp, #48
        adr     r4, L9          @ =.LC0
        add     r5, sp, #8
        ldmia   r4!, {r0, r1, r2, r3}
        stmia   r5!, {r0, r1, r2, r3}
        ldmia   r4!, {r0, r1, r2, r3}
        stmia   r5!, {r0, r1, r2, r3}
        ldmia   r4!, {r0, r1}
        stmia   r5!, {r0, r1}

(Figure: stack frame from low to high addresses: s, i, a[0], ..., a[9].)

(56)

Search

        mov     r3, #4
        str     r3, [sp, #0]    @ s=4
        mov     r3, #0
        str     r3, [sp, #4]    @ i=0
loop:   ldr     r0, [sp, #4]    @ r0=i
        cmp     r0, #10         @ i<10?
        bge     end
        ldr     r1, [sp, #0]    @ r1=s
        mov     r2, #4
        mul     r3, r0, r2
        add     r3, r3, #8
        ldr     r4, [sp, r3]    @ r4=a[i]


(57)

Search

        teq     r1, r4          @ test if s==a[i]
        beq     end
        add     r0, r0, #1      @ i++
        str     r0, [sp, #4]    @ update i
        b       loop
end:    str     r0, [sp, #4]
        cmp     r0, #10
        movge   r0, #-1
        add     sp, sp, #48
        mov     pc, lr


(58)

Optimization

• Remove unnecessary load/store

• Remove loop invariant

• Use addressing mode

• Use conditional execution

(59)

Search (remove load/store)

        mov     r1, #4          @ s=4, kept in r1 (store to [sp, #0] removed)
        mov     r0, #0          @ i=0, kept in r0 (store to [sp, #4] removed)
loop:   cmp     r0, #10         @ i<10? (reload of i removed)
        bge     end
        mov     r2, #4          @ (reload of s removed)
        mul     r3, r0, r2
        add     r3, r3, #8
        ldr     r4, [sp, r3]    @ r4=a[i]

(60)

Search (remove load/store)

        teq     r1, r4          @ test if s==a[i]
        beq     end
        add     r0, r0, #1      @ i++ (store of i removed)
        b       loop
end:    cmp     r0, #10         @ (store of i removed)
        movge   r0, #-1
        add     sp, sp, #48
        mov     pc, lr

(61)

Search (loop invariant/addressing mode)

        mov     r1, #4          @ s=4
        mov     r0, #0          @ i=0
        add     r2, sp, #8      @ r2=&a[0], loop invariant computed once
loop:   cmp     r0, #10         @ i<10?
        bge     end
        ldr     r4, [r2, r0, LSL #2]    @ r4=a[i], replaces mov/mul/add/ldr

(62)

Search (conditional execution)

        teq     r1, r4          @ test if s==a[i]
        addne   r0, r0, #1      @ i++ only when s!=a[i]
        bne     loop            @ falls through to end when s==a[i]
end:    cmp     r0, #10
        movge   r0, #-1
        add     sp, sp, #48
        mov     pc, lr

(63)

Optimization

• Remove unnecessary load/store

• Remove loop invariant

• Use addressing mode

• Use conditional execution

• The code shrinks from 22 words to 13 words, and execution time is greatly reduced.
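
For comparison, the search written as a standalone C function, in a form that already reflects these optimizations at the source level. The name `search` and its signature are hypothetical; the original example keeps everything inside `main`.

```c
/* Returns the index of the first element equal to s, or -1 if absent. */
int search(const int *a, int n, int s)
{
    int i = 0;                      /* i and s can stay in registers */

    while (i < n && a[i] != s)      /* a[i]: base plus scaled index  */
        i++;                        /* advance only on a mismatch    */

    return (i >= n) ? -1 : i;       /* cmp r0, #10 / movge r0, #-1   */
}
```

Called with the array from the example, `search(a, 10, 4)` finds the value at index 2.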
