Compiler II: Code Generation

(1)

www.nand2tetris.org

Building a Modern Computer From First Principles

Compiler II: Code Generation

(2)

Course map

Software hierarchy (each arrow is a translation step between two abstract interfaces):

    Human Thought
      -> [Abstract design, Chapters 9, 12] -> H.L. Language & Operating Sys.
      -> [Compiler, Chapters 10 - 11]      -> Virtual Machine
      -> [VM Translator, Chapters 7 - 8]   -> Assembly Language
      -> [Assembler, Chapter 6]            -> Machine Language

Hardware hierarchy (each arrow is a construction step between two abstract interfaces):

    Physics
      -> [Electrical Engineering]                  -> Chips & Logic Gates
      -> [Gate Logic, Chapters 1 - 3]              -> Hardware Platform
      -> [Computer Architecture, Chapters 4 - 5]   -> Machine Language

(3)

Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 11: Compiler II: Code Generation

The big picture

Syntax Analyzer (Chapter 10, previous chapter):

    Jack Program -> Tokenizer -> Parser -> XML code

Jack Compiler (Chapter 11, this chapter):

    Jack Program -> Tokenizer -> Parser -> Code Generation -> VM code

1. Syntax analysis: extracting the semantics from the source code (previous chapter)

2. Code generation: expressing the semantics using the target language (this chapter)

(4)

Syntax analysis

(review)

    class Bar {
      method Fraction foo(int y) {
        var int temp; // a variable
        let temp = (xxx+12)*-63;
        ...
    ...

Syntax analyzer output:

    <varDec>
      <keyword> var </keyword>
      <keyword> int </keyword>
      <identifier> temp </identifier>
      <symbol> ; </symbol>
    </varDec>
    <statements>
      <letStatement>
        <keyword> let </keyword>
        <identifier> temp </identifier>
        <symbol> = </symbol>
        <expression>
          <term>
            <symbol> ( </symbol>
            <expression>
              <term>
                <identifier> xxx </identifier>
              </term>
              <symbol> + </symbol>
              <term>
                <integerConstant> 12 </integerConstant>
              </term>
            </expression>
            ...

The code generation challenge:

Program = a series of operations that manipulate data

Compiler: converts each “understood” (parsed) source operation and data item into corresponding operations and data items in the target language

(5)

The code generation challenge:

Thus, we have to generate code for

o handling data

o handling operations

Our approach: morph the syntax analyzer (project 10) into a full-blown compiler:

instead of generating XML, we’ll make it generate VM code.

(6)

Memory segments

(review)

VM memory commands:

    pop segment i
    push segment i

Where i is a non-negative integer and segment is one of the following:

static: holds values of global variables, shared by all functions in the same class

argument: holds values of the argument variables of the current function

local: holds values of the local variables of the current function

this: holds values of the private (“object”) variables of the current object

that: holds array values (silly name, sorry)

constant: holds all the constants in the range 0 … 32767 (pseudo memory segment)

pointer: used to anchor this and that to various areas in the heap

temp: fixed 8-entry segment that holds temporary variables for general use; shared by all VM functions in the program.

(8)

VM implementation on the Hack platform (review)

Basic idea: the mapping of the stack and the global segments on the RAM is easy (fixed); the mapping of the function-level segments is dynamic, using pointers.

The stack: mapped on RAM[256 .. 2047]; the stack pointer is kept in RAM address SP.

static: mapped on RAM[16 ... 255]; each segment reference static i appearing in a VM file named f is compiled to the assembly language symbol f.i (recall that the assembler further maps such symbols to the RAM, from address 16 onward).

Host RAM layout:

    RAM[0]             SP
    RAM[1]             LCL
    RAM[2]             ARG
    RAM[3]             THIS
    RAM[4]             THAT
    RAM[5 .. 12]       TEMP
    RAM[13 .. 15]      General purpose
    RAM[16 .. 255]     Statics
    RAM[256 .. 2047]   Stack
    RAM[2048 ...]      Heap

(9)

VM implementation on the Hack platform (review)

local, argument: these method-level segments are stored in the stack. The base addresses of these segments are kept in RAM addresses LCL and ARG. Access to the i-th entry of any of these segments is implemented by accessing RAM[segmentBase + i].

this, that: these dynamically allocated segments are mapped somewhere from address 2048 onward, in an area called the “heap”. The base addresses of these segments are kept in RAM addresses THIS and THAT.

constant: a truly virtual segment: access to constant i is implemented by supplying the constant i.

pointer: a two-entry segment that holds the base addresses of this and that (pointer 0 is RAM address THIS, pointer 1 is RAM address THAT).
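The mapping rules above can be sketched as a small address resolver. This is an illustrative sketch, not the VM translator itself: the function name resolve and the dict-based RAM are assumptions, and static is simplified to a single VM file whose statics start at address 16.

```python
# RAM addresses of the pointer registers, per the Host RAM layout.
SP, LCL, ARG, THIS, THAT = 0, 1, 2, 3, 4
TEMP_BASE, STATIC_BASE = 5, 16

# Segments whose base address is read from a pointer register at run time.
BASE_REGISTER = {"local": LCL, "argument": ARG, "this": THIS, "that": THAT}


def resolve(ram, segment, i):
    """Return the RAM address that `segment i` refers to (hypothetical helper)."""
    if segment in BASE_REGISTER:
        return ram[BASE_REGISTER[segment]] + i   # RAM[segmentBase + i]
    if segment == "pointer" and i in (0, 1):
        return THIS + i                          # pointer 0 -> THIS, pointer 1 -> THAT
    if segment == "temp" and 0 <= i <= 7:
        return TEMP_BASE + i                     # fixed 8-entry segment
    if segment == "static":
        return STATIC_BASE + i                   # simplification: one VM file only
    # constant is a virtual segment: it has a value but no RAM address
    raise ValueError(f"no RAM address for {segment} {i}")


ram = {LCL: 300, ARG: 290, THIS: 3012, THAT: 4315}  # arbitrary example bases
print(resolve(ram, "this", 2))
print(resolve(ram, "pointer", 1))
```

The split between fixed bases (temp, static, pointer) and bases read from RAM (local, argument, this, that) is what makes the function-level segments "dynamic".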

(10)

VM implementation on the Hack platform (review)

Global stack layout, from lower to higher RAM addresses:

           frames of all the functions up the calling chain
    ARG -> argument 0, argument 1, ..., argument nArgs-1
           (arguments pushed by the caller for the current function)
           saved returnAddress, saved LCL, saved ARG, saved THIS, saved THAT
           (saved state of the calling function; used by the VM implementation to restore the segments of the calling function just after the current function returns)
    LCL -> local 0, local 1, ..., local nVars-1
           (local variables of the current function)
           working stack of the current function
    SP  -> next free stack entry

Global stack: the entire RAM area dedicated for holding the stack

Working stack: the stack that the current function sees

(11)

VM implementation on the Hack platform (review)

At any point of time, only one function (the current function) is executing; other functions may be waiting up the calling chain

Shaded areas:

irrelevant to the current function

The current function sees only the working stack, and has access only to its memory segments

The rest of the stack holds the frozen states of all the functions up the calling hierarchy.

(12)

Code generation example

Jack source:

    method int foo() {
      var int x;
      let x = x + 1;
      ...

Syntax analysis produces:

    <letStatement>
      <keyword> let </keyword>
      <identifier> x </identifier>
      <symbol> = </symbol>
      <expression>
        <term>
          <identifier> x </identifier>
        </term>
        <symbol> + </symbol>
        <term>
          <integerConstant> 1 </integerConstant>
        </term>
      </expression>
    </letStatement>

Code generation produces (note that x is the first local variable declared in the method):

    push local 0
    push constant 1
    add
    pop local 0

(13)

Handling variables

When the compiler encounters a variable, say x, in the source code, it has to know:

What is x’s data type?

    Primitive, or ADT (class name)?

    (We need to know in order to properly allocate RAM resources for its representation.)

What kind of variable is x?

    static, field, local, argument?

    (We need to know in order to properly allocate it to the right memory segment; this also implies the variable’s life cycle.)

(14)

Handling variables: mapping them on memory segments

(example)

    class BankAccount {
      // class variables
      static int nAccounts;
      static int bankCommission;

      // account properties
      field int id;
      field String owner;
      field int balance;

      method void transfer(int sum, BankAccount from, Date when){
        var int i, j;   // some local variables
        var Date due;   // Date is a user-defined type
        let balance = (balance + sum) - commission(sum * 5);
        // More code ...
      }

The target language uses 8 memory segments

Each memory segment, e.g. static, is an indexed sequence of 16-bit values that can be referred to as static 0, static 1, static 2, etc.

(15)

When compiling this class, we have to create the following mappings:

The class variables nAccounts, bankCommission are mapped on static 0,1

The object fields id, owner, balance are mapped on this 0,1,2

The argument variables sum, from, when are mapped on argument 1,2,3 (transfer is a method, so argument 0 holds the reference to the object it operates on)

The local variables i, j, due are mapped on local 0,1,2.

(17)

Handling variables: symbol tables

(Running example: the BankAccount class and its transfer method shown above.)

How the compiler uses symbol tables:

The compiler builds and maintains a linked list of hash tables, each reflecting a single scope nested within the next one in the list.

Identifier lookup works from the current symbol table back to the list’s head (a classical implementation).
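A minimal sketch of such a symbol table for the BankAccount example follows. Jack needs only two scopes (class and subroutine), so the "linked list" is reduced here to two dictionaries searched innermost first. The class name SymbolTable and its methods are hypothetical. The sketch reserves argument 0 for the object reference, following the Ball.SetR example in the object-handling slides, so sum, from, when land on argument 1, 2, 3.

```python
class SymbolTable:
    """Class scope (static, field) plus subroutine scope (argument, local)."""

    # Which VM memory segment each kind of variable is mapped on.
    SEGMENT = {"static": "static", "field": "this",
               "argument": "argument", "local": "local"}

    def __init__(self):
        self.class_scope = {}
        self.sub_scope = {}
        self.counts = {kind: 0 for kind in self.SEGMENT}

    def start_subroutine(self):
        """Discard the previous subroutine scope and restart its counters."""
        self.sub_scope = {}
        self.counts["argument"] = 0
        self.counts["local"] = 0

    def define(self, name, type_, kind):
        scope = self.class_scope if kind in ("static", "field") else self.sub_scope
        scope[name] = (type_, kind, self.counts[kind])
        self.counts[kind] += 1

    def lookup(self, name):
        """Search the innermost scope first, then the enclosing class scope."""
        for scope in (self.sub_scope, self.class_scope):
            if name in scope:
                _type, kind, index = scope[name]
                return f"{self.SEGMENT[kind]} {index}"
        raise KeyError(name)


table = SymbolTable()
table.define("nAccounts", "int", "static")
table.define("bankCommission", "int", "static")
table.define("id", "int", "field")
table.define("owner", "String", "field")
table.define("balance", "int", "field")

table.start_subroutine()
table.define("this", "BankAccount", "argument")  # assumption: methods get the object first
for name, type_ in [("sum", "int"), ("from", "BankAccount"), ("when", "Date")]:
    table.define(name, type_, "argument")
for name, type_ in [("i", "int"), ("j", "int"), ("due", "Date")]:
    table.define(name, type_, "local")

print(table.lookup("balance"), "|", table.lookup("sum"), "|", table.lookup("due"))
```

Because lookup returns a ready-made "segment index" string, the code generator can emit push and pop commands for a variable without knowing its kind.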

(18)

Handling variables:

managing their life cycle

Variable life cycle

static variables: single copy must be kept alive throughout the program duration

field variables: different copies must be kept for each object

local variables: created on subroutine entry, killed on exit

argument variables: similar to local variables.

Good news: the VM implementation already handles all these details !

(19)

Handling objects: establishing access to the object’s fields

Background: Suppose we have an object named b of type Ball. A Ball has x, y coordinates, a radius, and a color.

    class Ball {
      field int x, y, radius, color;
      method void SetR(int r) { radius = r; }
    }
    ...
    Ball b;
    b=Ball.new();
    b.SetR(17);

High level program view: the object b has x: 120, y: 80, radius: 50, color: 3.

RAM view, following compilation: the variable b, stored at RAM[412], holds 3012, the base address of the object; RAM[3012 .. 3015] hold 120, 80, 50, 3.

(Actual RAM locations of program variables are run-time dependent, and thus the addresses shown here are arbitrary examples.)

(21)

Handling objects: establishing access to the object’s fields

The method needs to know which instance it is working on, so the object has to be passed into the function:

    b.SetR(17)  =>  Ball.SetR(b, 17)

The compiled code then anchors the this segment on the object before touching its fields:

    // Get b's base address:
    push argument 0
    // Point the this segment to b:
    pop pointer 0
    // Get r's value:
    push argument 1
    // Set b's third field to r:
    pop this 2

Virtual memory segments just before the operation b.radius=17:

    argument 0 = 3012, argument 1 = 17
    pointer 0 does not yet hold b's base address

Virtual memory segments just after the operation b.radius=17:

    argument 0 = 3012, argument 1 = 17
    pointer 0 = 3012
    this 0 = 120, this 1 = 80, this 2 = 17, this 3 = 3
    (this 0 is now aligned with RAM[3012])
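To see the four VM commands for SetR at work, here is a tiny interpreter sketch that supports only push and pop on the argument, pointer, and this segments. The function run, the dict-based RAM, and the ARG base address 400 are assumptions made for illustration; the object addresses follow the arbitrary example values used in the slides.

```python
ARG, THIS = 2, 3  # RAM addresses of the ARG and THIS pointers


def run(commands, ram, sp):
    """Execute push/pop commands on a dict-based RAM (illustrative VM subset)."""
    for line in commands:
        op, segment, index = line.split()
        i = int(index)
        if segment == "argument":
            addr = ram[ARG] + i
        elif segment == "this":
            addr = ram[THIS] + i      # uses whatever pointer 0 currently holds
        elif segment == "pointer":
            addr = THIS + i           # pointer 0 is the THIS register itself
        else:
            raise ValueError(f"unsupported segment: {segment}")
        if op == "push":
            ram[sp] = ram[addr]
            sp += 1
        elif op == "pop":
            sp -= 1
            ram[addr] = ram[sp]
        else:
            raise ValueError(f"unsupported command: {op}")
    return ram


ram = {
    ARG: 400, THIS: 0,
    400: 3012, 401: 17,                          # argument 0 = b, argument 1 = r
    3012: 120, 3013: 80, 3014: 50, 3015: 3,      # b's fields: x, y, radius, color
}
set_r = ["push argument 0", "pop pointer 0", "push argument 1", "pop this 2"]
run(set_r, ram, sp=410)
print(ram[THIS], ram[3014])
```

After the run, the THIS register holds b's base address and the radius field at RAM[3014] has changed from 50 to 17, which is the "just after" state described above.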

(22)

Handling objects: construction / memory allocation

Java code:

    class Complex {
      // Fields (properties):
      int re; // Real part
      int im; // Imaginary part
      ...
      /** Constructs a new Complex number */
      public Complex (int re, int im) {
        this.re = re;
        this.im = im;
      }
      ...
    }

(23)

Handling objects: construction / memory allocation

Java code:

    class Foo {
      public void bla() {
        Complex a, b, c;
        ...
        a = new Complex(5,17);
        b = new Complex(12,192);
        ...
        // Only the reference is copied
        c = a;
        ...
      }

Following execution: a and c hold the same reference, to the object (5, 17); b refers to a separate object (12, 192).

(24)

How to compile: foo = new ClassName(…)

The compiler generates code effecting: foo = Memory.alloc(n)

Where n is the number of words necessary to represent the object in question, and Memory.alloc is an OS method that returns the base address of a free memory block of size n words.
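One way to express this in VM code is sketched below as two small emitters, one for the caller and one for the start of the constructor. The function names are hypothetical. Two assumptions go beyond the slide: the constructor is named new (as in Ball.new() earlier), and the constructor anchors the this segment on the fresh block with pop pointer 0. The sketch also assumes each field occupies one 16-bit word, so n equals the number of fields.

```python
def compile_new(class_name, arg_pushes, target):
    """Caller side: foo = new ClassName(args). Hypothetical helper."""
    code = list(arg_pushes)                                    # push the constructor arguments
    code.append(f"call {class_name}.new {len(arg_pushes)}")    # returns the object's base address
    code.append(f"pop {target}")                               # store the reference in foo
    return code


def compile_constructor_entry(n_fields):
    """Constructor side: allocate n words and align `this` with them."""
    return [
        f"push constant {n_fields}",
        "call Memory.alloc 1",   # OS call: base address of a free block of n words
        "pop pointer 0",         # assumption: anchor the this segment on the new object
    ]


# a = new Complex(5,17), with a assumed to be local 0; Complex has two fields (re, im).
for line in compile_new("Complex", ["push constant 5", "push constant 17"], "local 0"):
    print(line)
for line in compile_constructor_entry(2):
    print(line)
```

Note that only the reference (the base address) is stored in the variable, which is why c = a copies a reference rather than an object.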

(25)

Handling objects:

accessing fields

Java code:

    class Complex {
      // Fields (properties):
      int re; // Real part
      int im; // Imaginary part
      ...
      /** Constructs a new Complex number */
      public Complex (int re, int im) {
        this.re = re;
        this.im = im;
      }
      /** Multiplies this Complex number by the given scalar */
      public void mult (int c) {
        re = re * c;
        im = im * c;
      }
      ...
    }

How to compile: im = im * c ?

1. Look up the two variables in the symbol table

2. Generate the code:

    *(this + 1) = *(this + 1) times (argument 0)

This pseudo-code should be expressed in the target language.
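Expressed in the target language, the pseudo-code could look like the output of the sketch below. The helper compile_scale and the dictionary-based symbol table are hypothetical. The table follows the slide's pseudo-code (c at argument 0); in a compiled Jack method, where argument 0 carries the object reference, c would sit at argument 1. The sketch assumes the this segment is already anchored on the object and uses the Jack OS routine Math.multiply for multiplication.

```python
def compile_scale(symbols, field, factor):
    """Emit VM code for: field = field * factor (hypothetical helper)."""
    return [
        f"push {symbols[field]}",    # *(this + 1)
        f"push {symbols[factor]}",   # the scalar c
        "call Math.multiply 2",      # OS routine: pops two values, pushes the product
        f"pop {symbols[field]}",     # store the product back into the field
    ]


# Step 1: the symbol table lookups (im is the second field, hence this 1).
symbols = {"re": "this 0", "im": "this 1", "c": "argument 0"}

# Step 2: generate the code.
for line in compile_scale(symbols, "im", "c"):
    print(line)
```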

(26)

Handling objects:

method calls

Java code:

    class Complex {
      ...
      public void mult (int c) {
        re = re * c;
        im = im * c;
      }
      ...
    }

    class Foo {
      ...
      public void bla() {
        Complex x;
        ...
        x = new Complex(1,2);
        x.mult(5);
        ...
      }
    }

How to compile: x.mult(5) ?

This method call can also be viewed as: mult(x,5)

Generate the following code:

    push x
    push 5
    call mult

(27)

Handling objects:

method calls

General rule: each method call foo.bar(v1,v2,...) is translated into:

    push foo
    push v1
    push v2
    ...
    call bar

For example, x.mult(5) in the Java code of the previous slide becomes push x, push 5, call mult.
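The general rule can be sketched as a short emitter. The function name and parameters are hypothetical. One detail is added to the slide's shorthand: the VM call command also carries the class-qualified function name and the number of pushed arguments, and the pushed object counts as one of them.

```python
def compile_method_call(receiver_push, class_name, method, arg_pushes):
    """Emit VM code for foo.bar(v1, v2, ...). Hypothetical helper.

    receiver_push: the command that pushes the object reference (foo)
    arg_pushes:    the commands that push v1, v2, ... (one command each here)
    """
    code = [receiver_push]                  # push foo: the hidden first argument
    code.extend(arg_pushes)                 # push v1, push v2, ...
    n_args = len(arg_pushes) + 1            # +1 for the object reference
    code.append(f"call {class_name}.{method} {n_args}")
    return code


# x.mult(5), assuming x is the first local variable of bla().
for line in compile_method_call("push local 0", "Complex", "mult", ["push constant 5"]):
    print(line)
```

Inside the called method, the hidden first argument is what push argument 0, pop pointer 0 uses to anchor the this segment.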

(28)

Handling arrays

    int foo() { // some language, not Jack
      int bar[10];
      ...
      bar[2] = 19;
    }
(30)

Handling arrays: declaration / construction

Java code:

    class Bla {
      ...
      void foo(int k) {
        int x, y;
        int[] bar; // declare an array
        // Construct the array:
        bar = new int[10];
        ...
        bar[k]=19;
      }
      ...
      Main.foo(2); // Call the foo method

How to compile: bar = new int[n] ?

Generate code effecting: bar = Memory.alloc(n)

RAM state, following compilation:

    RAM[275]             x   (local 0)
    RAM[276]             y   (local 1)
    RAM[277]             bar (local 2) = 4315
    RAM[504]             k   (argument 0) = 2
    RAM[4315 .. 4324]    (bar array)

(31)

Handling arrays: accessing an array entry by its index

How to compile: bar[k] = 19 ?

RAM state, just after executing bar[k] = 19: RAM[4317] = 19, since bar = 4315 and k = 2; the rest of the RAM state is unchanged.

(32)

Handling arrays: accessing an array entry by its index

How to compile: bar[k] = 19 ?

VM code (pseudo):

    // bar[k]=19, or *(bar+k)=19
    push bar
    push k
    add
    // Use a pointer to access bar[k]
    pop addr     // addr points to bar[k]
    push 19
    pop *addr    // Set bar[k] to 19

VM code (actual):

    // bar[k]=19, or *(bar+k)=19
    push local 2
    push argument 0
    add
    // Use a pointer to access bar[k]
    pop pointer 1
    push constant 19
    pop that 0
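The actual VM code above follows a fixed pattern, sketched here as an emitter. The function name and its parameters are hypothetical; array_loc and index_loc stand for the segment entries found in the symbol table.

```python
def compile_array_store(array_loc, index_loc, value_push):
    """Emit VM code for array[index] = value (hypothetical helper)."""
    return [
        f"push {array_loc}",   # base address of the array
        f"push {index_loc}",   # index
        "add",                 # base + index = address of the entry
        "pop pointer 1",       # anchor the that segment on the entry
        value_push,            # push the value to store
        "pop that 0",          # *(base + index) = value
    ]


# bar[k] = 19, with bar at local 2 and k at argument 0 as in the slide.
for line in compile_array_store("local 2", "argument 0", "push constant 19"):
    print(line)
```

This ordering assumes that evaluating the right-hand side does not itself change pointer 1; a right-hand side that reads another array entry would need the target address saved elsewhere first.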

(33)

Handling expressions

High-level code:

    ((5+z)/-8)*(4^2)

Syntax analysis turns this into a parse tree; code generation then turns the parse tree into VM code:

    push 5
    push z
    add
    push 8
    neg
    call div
    push 4
    push 2
    call power
    call mult

(34)

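Handling expressions (Jack grammar)

Grammar notation:

    ’x’: x appears verbatim
    x: x is a language construct
    x?: x appears 0 or 1 times
    x*: x appears 0 or more times
    x|y: either x or y appears
    (x,y): x appears, then y.

[Figure: Jack expression grammar. Annotation: an expression is a term, optionally followed by binary operators and further terms.]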

(35)

Handling expressions (Jack grammar)

[Figure: Jack expression grammar. Annotation: a term is a constant, a variable, a function call, or a unary op applied to a term.]

(36)

Handling expressions

To generate VM code from a parse tree exp, use the following logic:

The codeWrite(exp) algorithm:

    if exp is a constant n          then output "push n"

    if exp is a variable v          then output "push v"

    if exp is op(exp1)              then codeWrite(exp1);
                                         output "op";

    if exp is f(exp1, ..., expn)    then codeWrite(exp1);
                                         ...
                                         codeWrite(expn);
                                         output "call f";

    if exp is (exp1 op exp2)        then codeWrite(exp1);
                                         codeWrite(exp2);
                                         output "op";
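The codeWrite algorithm translates directly into a recursive function. The sketch below is one possible rendering: the tuple encoding of the parse tree and the operator tables are assumptions, chosen so that the output uses the same command names as the ((5+z)/-8)*(4^2) example (call div, call power, call mult).

```python
# Hypothetical parse-tree encoding:
#   int or str                  constant or variable
#   ("unary", op, e1)           op(exp1)
#   ("call", f, [e1, ..., en])  f(exp1, ..., expn)
#   ("binary", op, e1, e2)      (exp1 op exp2)
BINARY = {"+": "add", "-": "sub", "*": "call mult", "/": "call div", "^": "call power"}
UNARY = {"-": "neg"}


def code_write(exp, out):
    """Append the VM code for exp to the list out (postfix order)."""
    if isinstance(exp, (int, str)):          # constant n or variable v
        out.append(f"push {exp}")
    elif exp[0] == "unary":
        code_write(exp[2], out)
        out.append(UNARY[exp[1]])
    elif exp[0] == "call":
        for arg in exp[2]:
            code_write(arg, out)
        out.append(f"call {exp[1]}")
    elif exp[0] == "binary":
        code_write(exp[2], out)
        code_write(exp[3], out)
        out.append(BINARY[exp[1]])
    else:
        raise ValueError(f"unknown node: {exp!r}")
    return out


# ((5+z)/-8)*(4^2)
tree = ("binary", "*",
        ("binary", "/", ("binary", "+", 5, "z"), ("unary", "-", 8)),
        ("binary", "^", 4, 2))
print("\n".join(code_write(tree, [])))
```

Operands are always emitted before their operator, which is exactly the order a stack machine needs.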

(37)

The Jack grammar (Expression)

(38)

From parsing to code generation (simplified expression)

    EXP  -> TERM (OP TERM)*
    TERM -> integer | variable
    OP   -> + | - | * | /

(39)

From parsing to code generation

    EXP  -> TERM (OP TERM)*
    TERM -> integer | variable
    OP   -> + | - | * | /

Each grammar rule becomes one parsing routine:

    EXP():
        TERM();
        while (next()==OP) {
            OP();
            TERM();
        }

    TERM():
        switch (next()) {
            case INT: eat(INT); break;
            case VAR: eat(VAR); break;
        }

    OP():
        switch (next()) {
            case +: eat(ADD); break;
            case -: eat(SUB); break;
            case *: eat(MUL); break;
            case /: eat(DIV); break;
        }

(43)

From parsing to code generation

    EXP  -> TERM (OP TERM)*
    TERM -> integer | variable
    OP   -> + | - | * | /

The same routines, extended to write VM code while parsing:

    EXP():
        TERM();
        while (next()==OP) {
            op=OP();
            TERM();
            write(op);
        }

    TERM():
        switch (next()) {
            case INT: write('push constant ' + next());
                      eat(INT); break;
            case VAR: write('push ' + lookup(next()));
                      eat(VAR); break;
        }

    OP():
        switch (next()) {
            case +: eat(ADD); return 'add';
            case -: eat(SUB); return 'sub';
            case *: eat(MUL); return 'call Math.multiply 2';
            case /: eat(DIV); return 'call Math.divide 2';
        }
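A runnable sketch of this parser with code generation follows. The tokenizer, the function names, and the dictionary used as a symbol table are assumptions made for illustration; multiplication and division are written as calls to the Jack OS routines Math.multiply and Math.divide.

```python
import re

OP_CODE = {"+": "add", "-": "sub",
           "*": "call Math.multiply 2", "/": "call Math.divide 2"}


def tokenize(src):
    """Split source text into integers, identifiers, and operator symbols."""
    return re.findall(r"\d+|[A-Za-z_]\w*|[-+*/]", src)


def compile_expression(src, symbols):
    """EXP -> TERM (OP TERM)*, emitting VM code while parsing."""
    tokens = tokenize(src)
    pos = 0
    out = []

    def term():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("expected a term")
        tok = tokens[pos]
        if tok.isdigit():
            out.append(f"push constant {tok}")    # TERM -> integer
        elif tok in symbols:
            out.append(f"push {symbols[tok]}")    # TERM -> variable (symbol table lookup)
        else:
            raise SyntaxError(f"unexpected token {tok!r}")
        pos += 1

    term()
    while pos < len(tokens) and tokens[pos] in OP_CODE:
        op = OP_CODE[tokens[pos]]   # remember the operator ...
        pos += 1
        term()                      # ... compile its right operand ...
        out.append(op)              # ... then write the operator
    return out


symbols = {"x": "local 0", "y": "argument 1"}   # hypothetical symbol table
print("\n".join(compile_expression("x + 12 * y", symbols)))
```

Operators are applied strictly left to right, so x + 12 * y is compiled as (x + 12) * y; this grammar defines no operator priority.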

(45)

The Jack grammar (statement)

    STATEMENTS():
        while (next() in {let, if, while, do, return})
            STATEMENT();

    STATEMENT():
        switch (next()) {
            case LET:    LET_STAT();    break;
            case IF:     IF_STAT();     break;
            case WHILE:  WHILE_STAT();  break;
            case DO:     DO_STAT();     break;
            case RETURN: RETURN_STAT(); break;
        }

(47)

let statement

Parsing:

    LET_STAT():
        eat(LET);
        eat(VAR);
        eat(EQ);
        EXP();
        eat(SEMI);

Parsing with code generation:

    LET_STAT():
        eat(LET);
        variable=lookup(next());
        eat(VAR);
        eat(EQ);
        EXP();
        eat(SEMI);
        write('pop ' + variable)

(48)

Handling program flow

High-level code:

    if (cond)
        s1
    else
        s2
    ...

VM code produced by code generation:

    VM code to compute and push !(cond)
    if-goto L1
    VM code for executing s1
    goto L2
    label L1
    VM code for executing s2
    label L2
    ...

(49)

Handling program flow

High-level code:

    while (cond)
        s
    ...

VM code produced by code generation:

    label L1
    VM code to compute and push !(cond)
    if-goto L2
    VM code for executing s
    goto L1
    label L2
    ...
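Both flow-of-control translation schemes (if/else and while) need fresh labels for every statement, since the same scheme may be used many times in one function. A minimal sketch, assuming a hypothetical FlowWriter class that numbers labels sequentially and receives the already-compiled code of cond and the statements as lists of VM commands; !(cond) is computed with the VM command not:

```python
import itertools


class FlowWriter:
    """Generates if/else and while skeletons with unique labels."""

    def __init__(self):
        self._ids = itertools.count(1)

    def _label(self):
        return f"L{next(self._ids)}"

    def if_else(self, cond, s1, s2):
        l1, l2 = self._label(), self._label()
        return [*cond, "not",            # compute and push !(cond)
                f"if-goto {l1}",         # cond is false: skip to the else part
                *s1, f"goto {l2}",
                f"label {l1}", *s2,
                f"label {l2}"]

    def while_loop(self, cond, body):
        l1, l2 = self._label(), self._label()
        return [f"label {l1}",
                *cond, "not",            # compute and push !(cond)
                f"if-goto {l2}",         # cond is false: leave the loop
                *body, f"goto {l1}",
                f"label {l2}"]


writer = FlowWriter()
# while (x < 10) { let x = x + 1; }  with x assumed to be local 0
loop = writer.while_loop(
    ["push local 0", "push constant 10", "lt"],
    ["push local 0", "push constant 1", "add", "pop local 0"])
print("\n".join(loop))
```

Keeping the counter inside one writer object guarantees that nested or consecutive statements never reuse a label.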

(50)

The Jack grammar (class)

    CLASS():
        eat(CLASS);
        eat(ID);
        eat('{');
        while (next() in {static, field})
            CLASSVARDEC();
        while (next() in {constructor, function, method})
            SUBROUTINEDEC();
        eat('}');

(51)

The Jack grammar (class)

    CLASS():
        eat(CLASS);
        class=registerClass(next());
        eat(ID);
        eat('{');
        while (next() in {static, field})
            CLASSVARDEC(class);
        while (next() in {constructor, function, method})
            SUBROUTINEDEC(class);
        eat('}');

(52)

The Jack grammar (class)

    CLASSVARDEC(class):
        switch (next()) {
            case static: eat(STATIC); kind=STATIC; break;
            case field:  eat(FIELD);  kind=FIELD;  break;
        }
        switch (next()) {
            case int:     type=INT;            eat(INT);     break;
            case char:    type=CHAR;           eat(CHAR);    break;
            case boolean: type=BOOLEAN;        eat(BOOLEAN); break;
            case ID:      type=lookup(next()); eat(ID);      break;
        }
        registerClassVar(class, next(), kind, type);
        eat(ID);
        while (next()==COMMA) {
            eat(COMMA);
            registerClassVar(class, next(), kind, type);
            eat(ID);
        }
        eat(SEMI);

(53)

Put them together

(54)

...

let balance = (balance + sum) - commission(sum * 5)

(55)

Perspective

Jack simplifications that are challenging to extend:

o Limited primitive type system

o No inheritance

o No public class fields, e.g. must use r = c.getRadius() rather than r = c.radius

Jack simplifications that are easy to extend:

o Limited control structures, e.g. no for, switch, …

o Cumbersome handling of char types, e.g. cannot use let x='c'

Optimization

o For example, c=c+1 is translated inefficiently into push c, push 1, add, pop c.

o Parallel processing

o Many other examples of possible improvements …
