
ARM Assembly Programming II

Computer Organization and Assembly Languages Yung-Yu Chuang

2007/11/26

with slides by Peng-Sheng Chen

GNU compiler and binutils

• HAM uses the GNU compiler and binutils
– gcc: GNU C compiler
– as: GNU assembler
– ld: GNU linker
– gdb: GNU project debugger
– insight: a (Tcl/Tk) graphic interface to gdb

Pipeline

• COFF (Common Object File Format)

• ELF (Executable and Linkable Format)

• Segments in the object file
– Text: code
– Data: initialized global variables
– BSS: uninitialized global variables

C source (.c) → gcc → asm source (.s) → as → object file (.coff) → ld → executable (.elf) → simulator / debugger

GAS program format

        .file   "test.s"
        .text
        .global main              @ export the symbol main
        .type   main, %function   @ set the type of a symbol to be either a function or an object
main:
        MOV     R0, #100
        ADD     R0, R0, R0
        SWI     #11               @ call interrupt to end the program
        .end                      @ signals the end of the program

ARM assembly program

label     operation  operand        comments
main:     LDR        R1, value      @ load value
          STR        R1, result
          SWI        #11
value:    .word      0x0000C123
result:   .word      0

Control structures

• A program implements algorithms to solve problems. Program decomposition and flow of control are the key concepts for expressing algorithms.

• Flow of control:

– Sequence.

– Decision: if-then-else, switch

– Iteration: repeat-until, do-while, for

• Decomposition: split a problem into several smaller, manageable ones and solve them independently (subroutines/functions/procedures).

Decision

• If-then-else

• switch

If statements

if C then T else E

        C
        BNE else
        T
        B endif
else:
        E
endif:

// find maximum
if (R0>R1) then R2:=R0 else R2:=R1

        CMP R0, R1
        BLE else
        MOV R2, R0
        B endif
else:   MOV R2, R1
endif:

If statements

// find maximum
if (R0>R1) then R2:=R0 else R2:=R1

Two other options, both without branches:

        CMP   R0, R1
        MOVGT R2, R0
        MOVLE R2, R1

        MOV   R2, R0
        CMP   R0, R1
        MOVLE R2, R1

If statements

if (R1==1 || R1==5 || R1==12) R0=1;

        TEQ   R1, #1      ...
        TEQNE R1, #5      ...
        TEQNE R1, #12     ...
        MOVEQ R0, #1
        BNE   fail

If statements

if (R1==0) zero else if (R1>0) plus else if (R1<0) neg

        TEQ R1, #0
        BMI neg
        BEQ zero
        BPL plus
neg:    ...
        B exit
zero:   ...
        B exit
        ...

If statements

R0=abs(R0)

        TEQ   R0, #0
        RSBMI R0, R0, #0

Multi-way branches

        CMP R0, #'0'
        BCC other       @ less than '0'
        CMP R0, #'9'
        BLS digit       @ between '0' and '9'
        CMP R0, #'A'
        BCC other
        CMP R0, #'Z'
        BLS letter      @ between 'A' and 'Z'
        CMP R0, #'a'
        BCC other
        CMP R0, #'z'
        BHI other       @ not between 'a' and 'z'
letter: ...
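The chain of unsigned comparisons above can be read as the following C sketch. The enum and the function name classify are hypothetical names chosen for illustration; the comments map each test back to the branch it models.

```c
/* Hypothetical result type for the three branch targets. */
enum char_class { CLASS_OTHER, CLASS_DIGIT, CLASS_LETTER };

/* Walks the range boundaries in ascending order, like the CMP/Bcc chain. */
enum char_class classify(unsigned char c)
{
    if (c < '0')  return CLASS_OTHER;   /* BCC other  */
    if (c <= '9') return CLASS_DIGIT;   /* BLS digit  */
    if (c < 'A')  return CLASS_OTHER;   /* BCC other  */
    if (c <= 'Z') return CLASS_LETTER;  /* BLS letter */
    if (c < 'a')  return CLASS_OTHER;   /* BCC other  */
    if (c > 'z')  return CLASS_OTHER;   /* BHI other  */
    return CLASS_LETTER;                /* falls through to letter */
}
```

Because the boundaries are tested in ascending order, each range needs at most two comparisons.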

Switch statements

switch (exp) {

case c1: S1; break;

case c2: S2; break;

...

case cN: SN; break;

default: SD;

}

e=exp;

if (e==c1) {S1}

else

if (e==c2) {S2}

else ...


Switch statements

switch (R0) {

case 0: S0; break;

case 1: S1; break;

case 2: S2; break;

case 3: S3; break;

default: err;

}

        CMP R0, #0
        BEQ S0
        CMP R0, #1
        BEQ S1
        CMP R0, #2
        BEQ S2
        CMP R0, #3
        BEQ S3
err:    ...
        B exit
S0:     ...
        B exit

The range is between 0 and N. Slow if N is large.

Switch statements

        ADR   R1, JMPTBL
        CMP   R0, #3
        LDRLS PC, [R1, R0, LSL #2]
err:    ...
        B exit
S0:     ...

JMPTBL:
        .word S0
        .word S1
        .word S2
        .word S3

[Figure: R1 points to JMPTBL, whose entries hold the addresses of S0 to S3; R0 selects the entry]

For larger N and sparse values, we could use a hash function.

What if the range is between M and N?
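One possible answer to the range question, sketched in C: subtract M first, then do a single unsigned comparison against N-M before indexing the table. The handlers, the bounds CASE_MIN and CASE_MAX, and the function dispatch are all hypothetical names for illustration.

```c
/* Hypothetical case range M..N. */
#define CASE_MIN 10
#define CASE_MAX 13

typedef int (*handler_fn)(void);

/* Hypothetical case bodies; each returns a distinct value. */
static int handle_10(void) { return 100; }
static int handle_11(void) { return 101; }
static int handle_12(void) { return 102; }
static int handle_13(void) { return 103; }
static int handle_default(void) { return -1; }

int dispatch(int value)
{
    static const handler_fn table[] = {
        handle_10, handle_11, handle_12, handle_13
    };
    /* Rebase to 0..N-M; values below M wrap to large unsigned numbers,
       so one comparison rejects both out-of-range sides. */
    unsigned index = (unsigned)value - (unsigned)CASE_MIN;
    if (index > (unsigned)(CASE_MAX - CASE_MIN))
        return handle_default();
    return table[index]();
}
```

In assembly, the same idea amounts to one extra subtraction before the CMP and the table load.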

Iteration

• repeat-until

• do-while

• for

repeat loops

do { S } while ( C )

loop:
        S
        C
        BEQ loop
endw:

while loops

while ( C ) { S }

Test at the top, with two branches per iteration:

loop:
        C
        BNE endw
        S
        B loop
endw:

Test at the bottom, with one branch per iteration:

        B test
loop:
        S
test:
        C
        BEQ loop
endw:

Test duplicated before the loop, so no jump into the loop is needed:

        C
        BNE endw
loop:
        S
test:
        C
        BEQ loop
endw:

GCD

int gcd (int i, int j)
{
  while (i!=j) {
    if (i>j)
      i -= j;
    else
      j -= i;
  }
  return i;
}

GCD

@ R1: i, R2: j
loop:   CMP   R1, R2
        SUBGT R1, R1, R2
        SUBLT R2, R2, R1
        BNE   loop
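The conditionally executed loop can be modelled in C as below. This is only a sketch of the control flow: the function name gcd_conditional is hypothetical, and both inputs are assumed to be positive.

```c
/* One comparison per iteration, two conditional subtractions that both
   use the result of that single comparison, and a conditional branch. */
int gcd_conditional(int r1, int r2)
{
    int greater, less;
    do {
        greater = r1 > r2;          /* CMP R1, R2 sets the flags once */
        less    = r1 < r2;
        if (greater) r1 -= r2;      /* SUBGT R1, R1, R2 */
        if (less)    r2 -= r1;      /* SUBLT R2, R2, R1 */
    } while (greater || less);      /* BNE loop */
    return r1;
}
```

Both flags are captured before either subtraction, which mirrors the fact that SUBGT and SUBLT do not update the condition codes.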

for loops

for ( I ; C ; A ) { S }

        I
loop:
        C
        BNE endfor
        S
        A
        B loop
endfor:

for (i=0; i<10; i++) { a[i]:=0; }

        MOV R0, #0
        ADR R2, A
        MOV R1, #0
loop:   CMP R1, #10
        BGE endfor
        STR R0, [R2, R1, LSL #2]
        ADD R1, R1, #1
        B loop
endfor:

for loops

for (i=0; i<10; i++) { do something; }

        MOV R1, #0
loop:   CMP R1, #10
        BGE endfor
        @ do something
        ADD R1, R1, #1
        B loop
endfor:

To execute a loop a constant number of times, count down to zero instead:

        MOV R1, #10
loop:
        @ do something
        SUBS R1, R1, #1
        BNE loop
endfor:
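As a rough C picture of the two loop shapes, the sketch below counts how many times each body runs. The function names are hypothetical, and the count-down form assumes n is at least 1, just as the assembly version assumes a non-zero constant.

```c
/* Count-up form: compare against the bound at the top of every iteration. */
int count_up(int n)
{
    int iterations = 0;
    for (int i = 0; i < n; i++)
        iterations++;               /* "do something" */
    return iterations;
}

/* Count-down form: the decrement itself provides the loop test
   (SUBS followed by BNE). Assumes n >= 1. */
int count_down(int n)
{
    int iterations = 0;
    int i = n;
    do {
        iterations++;               /* "do something" */
    } while (--i != 0);
    return iterations;
}
```

The count-down form only fits when the body does not need the ascending index value.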

Procedures

• Arguments: expressions passed into a function

• Parameters: values received by the function

• Caller and callee

void func(int a, int b)    /* callee; a and b are parameters */
{
  ...
}

int main(void)             /* caller */
{
  func(100,200);           /* 100 and 200 are arguments */
  ...
}

Procedures

• How to pass arguments? By registers? By stack? By memory? In what order?

main:
        ...
        BL func
        ...
        .end

func:
        ...
        ...
        .end

Procedures

• Who should save R5? Caller? Callee?

main:                   @ caller
        @ use R5
        BL func
        @ use R5
        ...
        .end

func:                   @ callee
        ...
        @ use R5
        ...
        .end

Procedures (caller save)

main:                   @ caller
        @ use R5
        @ save R5
        BL func
        @ restore R5
        @ use R5
        .end

func:                   @ callee
        ...
        @ use R5
        .end

Procedures (callee save)

main:                   @ caller
        @ use R5
        BL func
        @ use R5
        .end

func:                   @ callee
        @ save R5
        ...
        @ use R5
        @ restore R5
        .end

Procedures

• We need a protocol for these questions: how arguments are passed, and which side saves which registers.

ARM Procedure Call Standard (APCS)

• ARM Ltd. defines a set of rules for procedure entry and exit so that
– Object code generated by different compilers can be linked together
– Procedures can be called between high-level languages and assembly

• APCS defines
– Use of registers
– Use of stack
– Format of stack-based data structure
– Mechanism for argument passing

APCS register usage convention

Register  APCS name  APCS role
0         a1         Argument 1 / integer result / scratch register
1         a2         Argument 2 / scratch register
2         a3         Argument 3 / scratch register
3         a4         Argument 4 / scratch register
4         v1         Register variable 1
5         v2         Register variable 2
6         v3         Register variable 3
7         v4         Register variable 4
8         v5         Register variable 5
9         sb/v6      Static base / register variable 6
10        sl/v7      Stack limit / register variable 7
11        fp         Frame pointer
12        ip         Scratch reg. / new sb in inter-link-unit calls
13        sp         Lower end of current stack frame
14        lr         Link address / scratch register
15        pc         Program counter

• Argument registers a1 to a4 (R0 to R3)
– Used to pass the first 4 parameters
– Caller-saved if necessary

• Register variables (the v registers)
– Must be returned unchanged
– Callee-saved

• Registers for special purposes
– Could be used as temporary variables if saved properly

Argument passing

• The first four word arguments are passed through R0 to R3.

• The remaining arguments are pushed onto the stack in reverse order.

• Procedures with four or fewer parameters are more efficient, since no argument has to go through the stack.
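To make the rule concrete, here is a small C sketch. Both functions and the struct are hypothetical names for illustration, and the register assignments in the comments simply restate the convention above; they are not the output of any particular compiler.

```c
/* Six word arguments: under the convention above, a..d would travel in
   R0-R3 while e and f would be passed on the stack. The result returns in R0. */
int sum6(int a, int b, int c, int d, int e, int f)
{
    return a + b + c + d + e + f;
}

/* A possible alternative that stays within four words: the last three
   values are grouped in a struct and passed by pointer (the fourth word). */
struct tail { int d, e, f; };

int sum6_packed(int a, int b, int c, const struct tail *rest)
{
    return a + b + c + rest->d + rest->e + rest->f;
}
```

Whether the packed form is actually faster depends on how the caller builds the struct, so treat it as a design option rather than a guaranteed win.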

Return value

• One word value in R0

• A value of length 2~4 words (R0-R1, R0-R2, R0-R3)

Function entry/exit

• A simple leaf function with four or fewer parameters has minimal overhead. 50% of calls are to leaf functions.

        BL leaf1
        ...
leaf1:  ...
        ...
        MOV PC, LR      @ return

[Figure: call tree with main at the root and leaf functions at the ends of the branches]

Function entry/exit

• Save a minimal set of temporary variables

        BL leaf2
        ...
leaf2:  STMFD sp!, {regs, lr}   @ save
        ...
        LDMFD sp!, {regs, pc}   @ restore and return

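Standard ARM C program address space

[Figure: memory map. The application image (code, then static data) starts at the application load address and ends at the top of application; the heap sits above it, up to the top of heap; the stack lies below the top of memory, marked by the stack pointer (sp) and the stack limit (sl).]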

Accessing operands

• A procedure often accesses operands in the following ways
– An argument passed in a register: no further work
– An argument passed on the stack: use stack pointer (R13) relative addressing with an immediate offset known at compile time
– A constant: PC-relative addressing, offset known at compile time
– A local variable: allocated on the stack and accessed through stack pointer relative addressing
– A global variable: allocated in the static area and accessed through static base (R9) relative addressing

Procedure

main:
        MOV R0, #0      @ first argument (a1)
        ...
        BL func
        ...

[Figure: the stack, drawn from low to high addresses]

Procedure

func:   STMFD SP!, {R4-R6, LR}
        SUB   SP, SP, #0xC
        ...
        STR   R0, [SP, #0]      @ v1=a1
        ...
        ADD   SP, SP, #0xC
        LDMFD SP!, {R4-R6, PC}

[Figure: stack frame, from low to high addresses: v1, v2, v3, R4, R5, R6, LR]

Block copy example

void bcopy(char *to, char *from, int n) {

while (n--)

*to++ = *from++;

}

Block copy example

@ arguments: R0: to, R1: from, R2: n
bcopy:  TEQ  R2, #0
        BEQ  end
loop:   SUB  R2, R2, #1
        LDRB R3, [R1], #1
        STRB R3, [R0], #1
        B    bcopy
end:    MOV  PC, LR

Block copy example

@ arguments: R0: to, R1: from, R2: n
@ rewrite "n--" as "--n>=0"
bcopy:  SUBS   R2, R2, #1
        LDRPLB R3, [R1], #1
        STRPLB R3, [R0], #1
        BPL    bcopy
        MOV    PC, LR

Block copy example

@ arguments: R0: to, R1: from, R2: n
@ assume n is a multiple of 4; loop unrolling
bcopy:  SUBS   R2, R2, #4
        LDRPLB R3, [R1], #1
        STRPLB R3, [R0], #1
        LDRPLB R3, [R1], #1
        STRPLB R3, [R0], #1
        LDRPLB R3, [R1], #1
        STRPLB R3, [R0], #1
        LDRPLB R3, [R1], #1
        STRPLB R3, [R0], #1
        BPL    bcopy
        MOV    PC, LR

Block copy example

@ arguments: R0: to, R1: from, R2: n
@ n is a multiple of 16; copy a word at a time
bcopy:  SUBS  R2, R2, #16
        LDRPL R3, [R1], #4
        STRPL R3, [R0], #4
        LDRPL R3, [R1], #4
        STRPL R3, [R0], #4
        LDRPL R3, [R1], #4
        STRPL R3, [R0], #4
        LDRPL R3, [R1], #4
        STRPL R3, [R0], #4
        BPL   bcopy
        MOV   PC, LR

Block copy example

@ arguments: R0: to, R1: from, R2: n
@ n is a multiple of 16
@ R4-R6 are register variables under APCS, so a complete routine
@ would save and restore them
bcopy:  SUBS  R2, R2, #16
        LDMPL R1!, {R3-R6}
        STMPL R0!, {R3-R6}
        BPL   bcopy
        MOV   PC, LR

@ could be extended to copy 40 bytes at a time
@ if n is not a multiple of 40, add a copy_rest loop
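The block-plus-remainder structure suggested in the comments can be sketched in C as follows. The function name bcopy_blocks and the 16-byte block size are assumptions for illustration, and memcpy stands in for the LDM/STM pair; it is not how the assembly version is implemented.

```c
#include <stddef.h>
#include <string.h>

/* Copies n bytes: whole 16-byte blocks first, then a byte-by-byte
   copy_rest loop for whatever is left over. */
void bcopy_blocks(char *to, const char *from, size_t n)
{
    while (n >= 16) {
        memcpy(to, from, 16);   /* stands in for LDM/STM of four registers */
        to += 16;
        from += 16;
        n -= 16;
    }
    while (n--)                 /* copy_rest: remaining 0..15 bytes */
        *to++ = *from++;
}
```

With the remainder loop in place, the caller no longer has to guarantee that n is a multiple of the block size.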


Search example

int main(void) {

int a[10]={7,6,4,5,5,1,3,2,9,8};

int i;

int s=4;

for (i=0; i<10; i++) if (s==a[i]) break;

if (i>=10) return -1;

else return i;

}

Search

        .section .rodata
.LC0:
        .word 7, 6, 4, 5, 5, 1, 3, 2, 9, 8

Search

        .text
        .global main
        .type   main, %function
main:   sub   sp, sp, #48
        adr   r4, L9            @ =.LC0
        add   r5, sp, #8
        ldmia r4!, {r0, r1, r2, r3}
        stmia r5!, {r0, r1, r2, r3}
        ldmia r4!, {r0, r1, r2, r3}
        stmia r5!, {r0, r1, r2, r3}
        ldmia r4!, {r0, r1}
        stmia r5!, {r0, r1}

[Figure: stack frame, from low to high addresses: s, i, a[0], ..., a[9]]

Search

        mov r3, #4
        str r3, [sp, #0]        @ s=4
        mov r3, #0
        str r3, [sp, #4]        @ i=0
loop:   ldr r0, [sp, #4]        @ r0=i
        cmp r0, #10             @ i<10?
        bge end
        ldr r1, [sp, #0]        @ r1=s
        mov r2, #4
        mul r3, r0, r2
        add r3, r3, #8
        ldr r4, [sp, r3]        @ r4=a[i]

Search

        teq   r1, r4            @ test if s==a[i]
        beq   end
        add   r0, r0, #1        @ i++
        str   r0, [sp, #4]      @ update i
        b     loop
end:    str   r0, [sp, #4]
        cmp   r0, #10
        movge r0, #-1
        add   sp, sp, #48
        mov   pc, lr

Optimization

• Remove unnecessary load/store

• Remove loop invariant

• Use addressing mode

• Use conditional execution

Search (remove load/store)

Keep s in r1 and i in r0 instead of storing them to the stack and reloading them on every iteration:

        mov   r1, #4            @ s=4
        mov   r0, #0            @ i=0
loop:   cmp   r0, #10           @ i<10?
        bge   end
        mov   r2, #4
        mul   r3, r0, r2
        add   r3, r3, #8
        ldr   r4, [sp, r3]      @ r4=a[i]
        teq   r1, r4            @ test if s==a[i]
        beq   end
        add   r0, r0, #1        @ i++
        b     loop
end:    cmp   r0, #10
        movge r0, #-1
        add   sp, sp, #48
        mov   pc, lr

Search (loop invariant/addressing mode)

Compute the array base once, outside the loop, and let the addressing mode scale the index:

        mov r1, #4                  @ s=4
        mov r0, #0                  @ i=0
        add r2, sp, #8              @ r2=&a[0], loop invariant
loop:   cmp r0, #10                 @ i<10?
        bge end
        ldr r4, [r2, r0, LSL #2]    @ r4=a[i]

Search (conditional execution)

Replace the branch over the increment with conditionally executed instructions:

        teq   r1, r4            @ test if s==a[i]
        addne r0, r0, #1        @ i++ only when s!=a[i]
        bne   loop
end:    cmp   r0, #10
        movge r0, #-1
        add   sp, sp, #48
        mov   pc, lr

Optimization

• Remove unnecessary load/store

• Remove loop invariant

• Use addressing mode

• Use conditional execution

• From 22 words to 13 words, and execution time is greatly reduced.
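At the source level, the optimized search corresponds roughly to the C sketch below, where s and i live in local variables (registers) and the array is indexed directly. The function name search and its parameters are hypothetical; the original example hard-codes the array and the key inside main.

```c
/* Returns the index of the first element equal to s, or -1 if none. */
int search(const int *a, int n, int s)
{
    int i = 0;
    while (i < n && a[i] != s)  /* cmp / ldr / teq / addne / bne */
        i++;
    return (i >= n) ? -1 : i;   /* cmp r0, #10 / movge r0, #-1 */
}
```

Called with the array from the example and s=4, the loop stops at the third element.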
