
Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 1

www.nand2tetris.org

Building a Modern Computer From First Principles

Compiler I: Syntax Analysis


Course map

[Figure: course map. From top to bottom, each layer is built on the abstract interface of the one below: Human Thought → abstract design (Chapters 9, 12) → High-Level Language & Operating System → Compiler (Chapters 10 - 11) → Virtual Machine → VM Translator (Chapters 7 - 8) → Assembly Language → Assembler (Chapter 6) → Machine Language → Computer Architecture (Chapters 4 - 5) → Hardware Platform → Gate Logic (Chapters 1 - 3) → Chips & Logic Gates → Electrical Engineering → Physics. The upper layers form the software hierarchy and the lower layers the hardware hierarchy.]


Motivation: why study compilers?

One of the earliest compilers was the FORTRAN compiler, developed by an IBM team led by John Backus (Turing Award, 1977) and delivered in 1957. It took roughly 18 person-years to build.

Because compilers …

• Are an essential part of applied computer science
• Are very relevant to computational linguistics
• Are implemented using classical programming techniques
• Employ important software engineering principles
• Train you in developing software for transforming one structure to another (programs, files, transactions, …)
• Train you to think in terms of "description languages"
• Parsing files with complex syntax is common in many applications.


The big picture

[Figure: the big picture. A program written in a high-level language (Jack, or some other language) is translated by that language's compiler into intermediate (VM) code. The intermediate code runs on a VM implementation over CISC platforms, over RISC platforms, or over the Hack platform, or on the VM emulator; each VM implementation produces the machine language of its platform. HW lectures: Projects 1-6. VM lectures: Projects 7-8. Compiler lectures: Projects 10, 11.]

Modern compilers are two-tiered:

• Front-end: from high-level language to some intermediate language
• Back-end: from the intermediate language to binary code.


Compiler architecture (front end)

[Figure: the platform diagram from the previous slide, with the compiler front end highlighted.]

• Syntax analysis: understanding the structure of the source code
  • Tokenizing: creating a stream of "atoms"
  • Parsing: matching the atom stream with the language grammar
  • XML output = one way to demonstrate that the syntax analyzer works
• Code generation: reconstructing the semantics using the syntax of the target code.

[Figure: Jack compiler pipeline. Jack program (source) → Syntax Analyzer, consisting of Tokenizer (scanner) → Parser (Chapter 10, with XML code as output) → Code Generation (Chapter 11) → VM code (target).]


Tokenizing / Lexical analysis / scanning

• Remove white space
• Construct a token list (language atoms)
• Things to worry about:
  • Language-specific rules: e.g. how to treat "++"
  • Language-specific classifications: keyword, symbol, identifier, integerConstant, stringConstant, ...
• While we are at it, we can have the tokenizer record not only the token, but also its lexical classification (as defined by the source language grammar).


C function to split a string into tokens

char* strtok(char* str, const char* delimiters);

str: string to be broken into tokens

delimiters: string containing the delimiter characters
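A minimal sketch of how strtok is typically called: pass the string on the first call and NULL on later calls to continue scanning the same string. The helper name split_tokens and its parameters are made up for this illustration and are not part of the C library.

```c
#include <stdio.h>
#include <string.h>

/* Splits `line` in place on the characters in `delims` and stores up to
   `max` token pointers in `out`. Returns the number of tokens stored.
   split_tokens is a hypothetical helper written for this example. */
int split_tokens(char *line, const char *delims, char *out[], int max) {
    int n = 0;
    /* First call passes the string; later calls pass NULL to continue. */
    for (char *tok = strtok(line, delims); tok != NULL && n < max;
         tok = strtok(NULL, delims)) {
        out[n++] = tok;
    }
    return n;
}
```

Note that strtok overwrites delimiter characters in the input with '\0', so the input must be a modifiable buffer (a char array, not a string literal). It also cannot tell `x<153` apart into three tokens unless the symbols are surrounded by delimiters, which is why a real tokenizer needs more than strtok.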


Jack Tokenizer

Source code: if (x < 153) {let city = "Paris";}

<tokens>

<keyword> if </keyword>

<symbol> ( </symbol>

<identifier> x </identifier>

<symbol> &lt; </symbol>

<integerConstant> 153 </integerConstant>

<symbol> ) </symbol>

<symbol> { </symbol>

<keyword> let </keyword>

<identifier> city </identifier>

<symbol> = </symbol>

<stringConstant> Paris </stringConstant>

<symbol> ; </symbol>

<symbol> } </symbol>

</tokens>

Tokenizer’s output
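The classification shown above can be sketched in C roughly as follows. This is an illustration under assumptions, not the book's tokenizer: the names TokenType and next_token are invented here, only a handful of Jack keywords are listed, and comments are not skipped.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

typedef enum {
    T_KEYWORD, T_SYMBOL, T_IDENTIFIER, T_INT_CONST, T_STRING_CONST, T_END
} TokenType;

/* Only a few Jack keywords, for illustration. */
static const char *KEYWORDS[] = {
    "class", "method", "function", "var", "int",
    "let", "do", "if", "else", "while", "return"
};

static int is_keyword(const char *s) {
    for (size_t i = 0; i < sizeof KEYWORDS / sizeof KEYWORDS[0]; i++)
        if (strcmp(s, KEYWORDS[i]) == 0) return 1;
    return 0;
}

/* Copies the next token starting at *src into buf (capacity cap >= 2),
   advances *src past it, and returns the token's lexical class. */
TokenType next_token(const char **src, char *buf, size_t cap) {
    const char *p = *src;
    size_t n = 0;
    TokenType type;

    while (isspace((unsigned char)*p)) p++;          /* remove white space */
    if (*p == '\0') { buf[0] = '\0'; *src = p; return T_END; }

    if (isdigit((unsigned char)*p)) {                /* integerConstant */
        while (isdigit((unsigned char)*p) && n + 1 < cap) buf[n++] = *p++;
        type = T_INT_CONST;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        while ((isalnum((unsigned char)*p) || *p == '_') && n + 1 < cap)
            buf[n++] = *p++;
        buf[n] = '\0';
        type = is_keyword(buf) ? T_KEYWORD : T_IDENTIFIER;
    } else if (*p == '"') {                          /* stringConstant */
        p++;                                         /* skip opening quote */
        while (*p != '\0' && *p != '"' && n + 1 < cap) buf[n++] = *p++;
        if (*p == '"') p++;                          /* skip closing quote */
        type = T_STRING_CONST;
    } else {                                         /* single-char symbol */
        buf[n++] = *p++;
        type = T_SYMBOL;
    }
    buf[n] = '\0';
    *src = p;
    return type;
}
```

Calling next_token repeatedly on the source line above yields the same sequence of classes as the XML listing: keyword, symbol, identifier, symbol, integerConstant, and so on, until T_END.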


Parsing

• The tokenizer discussed thus far is part of a larger program called the parser
• Each language is characterized by a grammar. The parser is implemented to recognize this grammar in given texts
• The parsing process:
  • A text is given and tokenized
  • The parser determines whether or not the text can be generated from the grammar
  • In the process, the parser performs a complete structural analysis of the text
• The text can be an expression in a:
  • Natural language (English, …)
  • Programming language (Jack, …).


Parsing examples

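English: He ate an apple on the desk.

[Figure: parse of the English sentence, rooted at "ate", with "he", "an apple", and "on" (over "the desk") below it.]

Jack: (5+3)*2 - sqrt(9*4)

[Figure: expression tree rooted at "-". Its left child is "*", over "+" (with children 5 and 3) and 2. Its right child is sqrt, over "*" (with children 9 and 4).]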


Regular expressions

• a|b*
  {ε, "a", "b", "bb", "bbb", …}
• (a|b)*
  {ε, "a", "b", "aa", "ab", "ba", "bb", "aaa", …}
• ab*(c|ε)
  {"a", "ac", "ab", "abc", "abb", "abbc", …}
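To make the third expression concrete, here is a hand-written matcher for ab*(c|ε). It is only a sketch for this one pattern (the function name is invented for the example); tools such as Lex generate equivalent code from the expression automatically.

```c
#include <stdbool.h>

/* Returns true if s belongs to the language of ab*(c|ε):
   exactly one 'a', then zero or more 'b', then an optional 'c'. */
bool matches_ab_star_opt_c(const char *s) {
    if (*s != 'a') return false;   /* the leading a is mandatory */
    s++;
    while (*s == 'b') s++;         /* b* : zero or more b */
    if (*s == 'c') s++;            /* (c|ε) : optional trailing c */
    return *s == '\0';             /* nothing may follow */
}
```

The empty string is rejected because the leading a is required, and "abcb" is rejected because input remains after the optional c.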


Lex

• A computer program that generates lexical analyzers (scanners or lexers)
• Commonly used with the yacc parser generator.
• Structure of a Lex file:

Definition section
%%
Rules section
%%
C code section


Example of a Lex file

/*** Definition section ***/
%{
/* C code to be copied verbatim */
#include <stdio.h>
%}

/* This tells flex to read only one input file */
%option noyywrap

/*** Rules section ***/
%%
[0-9]+  {
            /* yytext is a string containing the matched text. */
            printf("Saw an integer: %s\n", yytext);
        }
.|\n    {   /* Ignore all other characters. */   }

%%
/*** C Code section ***/
int main(void) {
    /* Call the lexer, then quit. */
    yylex();
    return 0;
}


> flex test.lex
(a file lex.yy.c with 1,763 lines is generated)
> gcc lex.yy.c
(an executable file a.out is generated)

test.txt:
abc123z.!&*2gj6

> ./a.out < test.txt
Saw an integer: 123
Saw an integer: 2
Saw an integer: 6


Another Lex example

%{
int num_lines = 0, num_chars = 0;
%}
%option noyywrap
%%
\n      ++num_lines; ++num_chars;
.       ++num_chars;
%%
int main(void) {
    yylex();
    printf( "# of lines = %d, # of chars = %d\n", num_lines, num_chars );
    return 0;
}


A more complex Lex example

%{
/* need this for the calls to atoi() and atof() below */
#include <stdlib.h>
%}
%option noyywrap
DIGIT   [0-9]
ID      [a-z][a-z0-9]*
%%
{DIGIT}+    {
                printf( "An integer: %s (%d)\n", yytext, atoi( yytext ) );
            }
{DIGIT}+"."{DIGIT}*     {
                printf( "A float: %s (%g)\n", yytext, atof( yytext ) );
            }
if|then|begin|end|procedure|function    {
                printf( "A keyword: %s\n", yytext );
            }
{ID}        printf( "An identifier: %s\n", yytext );
"+"|"-"|"="|"("|")"     printf( "Symbol: %s\n", yytext );
[ \t\n]+    /* eat up whitespace */
.           printf( "Unrecognized char: %s\n", yytext );
%%
int main( int argc, char **argv ) {
    if ( argc > 1 ) yyin = fopen( argv[1], "r" );
    else yyin = stdin;
    yylex();
    return 0;
}


pascal.txt:
if (a+b) then foo=3.1416 else
foo=12

output:
A keyword: if
Symbol: (
An identifier: a
Symbol: +
An identifier: b
Symbol: )
A keyword: then
An identifier: foo
Symbol: =
A float: 3.1416 (3.1416)
An identifier: else
An identifier: foo
Symbol: =
An integer: 12 (12)

(else is reported as an identifier because the keyword rule does not list it.)


Context-free grammar

• Terminals: 0, 1, #
• Non-terminals: A, B
• Start symbol: A
• Rules:
  • A -> 0A1
  • A -> B
  • B -> #
• Simple (terminal) forms / complex (non-terminal) forms
• Grammar = set of rules on how to construct complex forms from simpler forms
• Highly recursive.
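The grammar above generates strings such as #, 0#1 and 00#11: some number of 0s, a #, then the same number of 1s. A possible recognizer, with one recursive function mirroring the rules for A, could look like this (the function names are invented for the example):

```c
#include <stdbool.h>

/* Parses one A starting at *p and, on success, advances *p past it.
   Rules:  A -> 0 A 1 | B      B -> #                                */
static bool parse_A(const char **p) {
    if (**p == '0') {                 /* A -> 0 A 1 */
        (*p)++;
        if (!parse_A(p)) return false;
        if (**p != '1') return false;
        (*p)++;
        return true;
    }
    if (**p == '#') {                 /* A -> B, B -> # */
        (*p)++;
        return true;
    }
    return false;                     /* no rule applies */
}

/* True if the whole string s can be derived from the start symbol A. */
bool in_language(const char *s) {
    return parse_A(&s) && *s == '\0';
}
```

The recursion in parse_A is what lets the recognizer match each 0 with its 1; a regular expression cannot express that counting constraint.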


Examples of context-free grammar

• S -> () | (S) | SS
• S -> a | aS | bS
  strings ending with 'a'
• S -> x | y | S+S | S-S | S*S | S/S | (S)
  example: (x+y)*x-x*y/(x+x)


• non-terminals: S, E, Elist
• terminals: ID, NUM, PRINT, +, :=, (, ), ;
• rules:
  S -> S ; S
  S -> ID := E
  S -> PRINT ( Elist )
  E -> ID
  E -> NUM
  E -> E + E
  E -> ( S , Elist )
  Elist -> E
  Elist -> Elist , E

Try to derive: ID := NUM ; PRINT ( NUM )

slide credit: David Walker


Left-most derivation:

S
S ; S
ID := E ; S
ID := NUM ; S
ID := NUM ; PRINT ( Elist )
ID := NUM ; PRINT ( E )
ID := NUM ; PRINT ( NUM )

Right-most derivation:

S
S ; S
S ; PRINT ( Elist )
S ; PRINT ( E )
S ; PRINT ( NUM )
ID := E ; PRINT ( NUM )
ID := NUM ; PRINT ( NUM )


Parse tree

• Two derivations, but one tree: the left-most and right-most derivations of ID := NUM ; PRINT ( NUM ) build the same parse tree.

[Figure: parse tree for ID := NUM ; PRINT ( NUM ). The root S expands to S ; S. The left S expands to ID := E, with E -> NUM. The right S expands to PRINT ( Elist ), with Elist -> E -> NUM.]

slide credit: David Walker


Ambiguous Grammars

• a grammar is ambiguous if the same sequence of tokens can give rise to two or more parse trees
• non-terminals: E
• terminals: ID, NUM, PLUS, MUL
• rules:
  E -> ID
  E -> NUM
  E -> E + E
  E -> E * E

characters: 4 + 5 * 6
tokens: NUM(4) PLUS NUM(5) MUL NUM(6)

slide credit: David Walker


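[Figure: two parse trees for the tokens NUM(4) PLUS NUM(5) MUL NUM(6) under the rules E -> ID, E -> NUM, E -> E + E, E -> E * E. In the first tree the root is E + E, with NUM(4) on the left and the subtree NUM(5) * NUM(6) on the right, i.e. 4 + (5 * 6). In the second tree the root is E * E, with the subtree NUM(4) + NUM(5) on the left and NUM(6) on the right, i.e. (4 + 5) * 6.]

slide credit: David Walker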


• problem: compilers use parse trees to interpret the meaning of parsed expressions
  • different parse trees have different meanings
  • e.g. (4 + 5) * 6 is not 4 + (5 * 6)
  • languages with ambiguous grammars are DISASTROUS: the meaning of programs isn't well-defined! You can't tell what your program might do!
• solution: rewrite the grammar to eliminate ambiguity
  • fold precedence rules into the grammar to disambiguate
  • fold associativity rules into the grammar to disambiguate
  • other tricks as well

slide credit: David Walker
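One common way to fold precedence and associativity into the grammar is to introduce one non-terminal per precedence level. The sketch below assumes the rewritten grammar E -> T ('+' T)*, T -> F ('*' F)*, F -> NUM | '(' E ')'; this particular rewrite and the function names are choices made for this example, not taken from the slides. The code evaluates the expression while parsing, so the effect of the grammar on meaning is visible directly.

```c
#include <ctype.h>

/* Unambiguous grammar with precedence and left associativity folded in:
     E -> T ('+' T)*
     T -> F ('*' F)*
     F -> NUM | '(' E ')'
   Each function parses its non-terminal at *p and returns its value.
   The input is assumed to be well-formed; there is no error handling. */
static long parse_E(const char **p);

static void skip_spaces(const char **p) {
    while (isspace((unsigned char)**p)) (*p)++;
}

static long parse_F(const char **p) {
    skip_spaces(p);
    if (**p == '(') {                        /* F -> '(' E ')' */
        (*p)++;
        long v = parse_E(p);
        skip_spaces(p);
        if (**p == ')') (*p)++;
        return v;
    }
    long v = 0;                              /* F -> NUM */
    while (isdigit((unsigned char)**p)) v = v * 10 + (*(*p)++ - '0');
    return v;
}

static long parse_T(const char **p) {        /* '*' binds tighter than '+' */
    long v = parse_F(p);
    for (skip_spaces(p); **p == '*'; skip_spaces(p)) {
        (*p)++;
        v *= parse_F(p);
    }
    return v;
}

static long parse_E(const char **p) {
    long v = parse_T(p);
    for (skip_spaces(p); **p == '+'; skip_spaces(p)) {
        (*p)++;
        v += parse_T(p);
    }
    return v;
}

long evaluate(const char *s) { return parse_E(&s); }
```

With this grammar, 4 + 5 * 6 has exactly one parse: the multiplication is grouped inside a T before the addition is applied, so evaluate("4 + 5 * 6") gives 34, while parentheses are needed to get (4 + 5) * 6 = 54.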


Recursive descent parser

• Recursive descent parsing
  • aka: predictive parsing; top-down parsing
  • simple, efficient
  • can be coded by hand in ML quickly
  • parses many, but not all CFGs
    • parses LL(1) grammars
    • Left-to-right parse; Leftmost derivation; 1 symbol lookahead
• key ideas:
  • one recursive function for each non-terminal
  • each production becomes one clause in the function

slide credit: David Walker


• Non-terminals: S, E, L
• Terminals: NUM, IF, THEN, ELSE, BEGIN, END, PRINT, EQ(=), SEMI(;)
• Rules:
  1. S -> IF E THEN S ELSE S
  2.    | BEGIN S L
  3.    | PRINT E
  4. L -> END
  5.    | ; S L
  6. E -> NUM = NUM

slide credit: David Walker


S() {
    switch (next()) {
    case IF:
        eat(IF); E(); eat(THEN);
        S(); eat(ELSE); S();
        break;
    case BEGIN:
        eat(BEGIN); S(); L();
        break;
    case PRINT:
        eat(PRINT); E();
        break;
    default:
        error();
    }
}

slide credit: David Walker


L() {
    switch (next()) {
    case END:
        eat(END);
        break;
    case SEMI:
        eat(SEMI); S(); L();
        break;
    default:
        error();
    }
}

slide credit: David Walker


E() {
    eat(NUM);
    eat(EQ);
    eat(NUM);
}

slide credit: David Walker
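The three functions S(), L() and E() can be assembled into a compilable C sketch as shown below. The slides do not define next(), eat() or error(), so the versions here are assumptions: the token stream is an array ending in a made-up END_OF_INPUT marker, and an error simply clears a flag in a hypothetical Parser struct.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    NUM, IF, THEN, ELSE, BEGIN, END, PRINT, EQ, SEMI, END_OF_INPUT
} Token;

typedef struct {
    const Token *toks;  /* token stream, terminated by END_OF_INPUT */
    size_t pos;         /* index of the lookahead token */
    bool ok;            /* cleared on the first syntax error */
} Parser;

static Token next(const Parser *ps) { return ps->toks[ps->pos]; }

/* Consumes the lookahead token if it is the expected one. */
static void eat(Parser *ps, Token expected) {
    if (ps->ok && next(ps) == expected) ps->pos++;
    else ps->ok = false;
}

/* E -> NUM = NUM */
static void parse_E(Parser *ps) { eat(ps, NUM); eat(ps, EQ); eat(ps, NUM); }

static void parse_L(Parser *ps);

/* S -> IF E THEN S ELSE S | BEGIN S L | PRINT E */
static void parse_S(Parser *ps) {
    if (!ps->ok) return;
    switch (next(ps)) {
    case IF:
        eat(ps, IF); parse_E(ps); eat(ps, THEN);
        parse_S(ps); eat(ps, ELSE); parse_S(ps);
        break;
    case BEGIN:
        eat(ps, BEGIN); parse_S(ps); parse_L(ps);
        break;
    case PRINT:
        eat(ps, PRINT); parse_E(ps);
        break;
    default:
        ps->ok = false;
    }
}

/* L -> END | ; S L */
static void parse_L(Parser *ps) {
    if (!ps->ok) return;
    switch (next(ps)) {
    case END:
        eat(ps, END);
        break;
    case SEMI:
        eat(ps, SEMI); parse_S(ps); parse_L(ps);
        break;
    default:
        ps->ok = false;
    }
}

/* True if the whole token stream is a valid S. */
bool parse_program(const Token *toks) {
    Parser ps = { toks, 0, true };
    parse_S(&ps);
    return ps.ok && next(&ps) == END_OF_INPUT;
}
```

In every switch, the lookahead token alone selects the production, which is what makes this grammar suitable for predictive parsing.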


Recursive descent parser

• Non-terminals: S, A, E, L
• Terminals: EOF, ID, NUM, ASSIGN(:=), PRINT, LPAREN((), RPAREN()), COMMA(,)
• Rules:
  1. S -> A EOF
  2. A -> ID := E
  3.    | PRINT ( L )
  4. E -> ID
  5.    | NUM
  6. L -> E
  7.    | L , E

slide credit: David Walker


S() {
    A();
    eat(EOF);
}

slide credit: David Walker


A() {
    switch (next()) {
    case ID:
        eat(ID); eat(ASSIGN);
        E();
        break;
    case PRINT:
        eat(PRINT); eat(LPAREN);
        L(); eat(RPAREN);
        break;
    default:
        error();
    }
}

slide credit: David Walker


E() {
    switch (next()) {
    case ID:
        eat(ID);
        break;
    case NUM:
        eat(NUM);
        break;
    default:
        error();
    }
}

slide credit: David Walker


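L() {
    switch (next()) {
    case ID:
        ???
    case NUM:
        ???
    }
}

Problem: E could be ID, and L could be E, which could be ID. Both productions for L can therefore start with the same token, so one token of lookahead cannot choose between L -> E and L -> L , E. In addition, L -> L , E is left-recursive: L() would call itself without consuming any input.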


Solution: rewrite the rules for L so that they are no longer left-recursive, using a new non-terminal M:

  L -> E M
  M -> , E M
     | ε

slide credit: David Walker
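With the rewritten rules, L and M can be coded like the other non-terminals. The sketch below covers the whole second grammar. The Parser struct, the TOK_EOF name (used instead of EOF to avoid clashing with the macro from <stdio.h>) and the error-flag convention are assumptions made for this example.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { TOK_EOF, ID, NUM, ASSIGN, PRINT, LPAREN, RPAREN, COMMA } Token;

typedef struct {
    const Token *toks;  /* token stream, terminated by TOK_EOF */
    size_t pos;         /* index of the lookahead token */
    bool ok;            /* cleared on the first syntax error */
} Parser;

static Token next(const Parser *ps) { return ps->toks[ps->pos]; }

/* Consumes the lookahead if it matches; TOK_EOF is checked but not passed. */
static void eat(Parser *ps, Token expected) {
    if (!ps->ok) return;
    if (next(ps) != expected) { ps->ok = false; return; }
    if (expected != TOK_EOF) ps->pos++;
}

/* E -> ID | NUM */
static void parse_E(Parser *ps) {
    if (next(ps) == ID) eat(ps, ID);
    else eat(ps, NUM);              /* fails unless the token is NUM */
}

/* M -> , E M | ε   (a comma is the only token that selects the first rule) */
static void parse_M(Parser *ps) {
    if (ps->ok && next(ps) == COMMA) {
        eat(ps, COMMA);
        parse_E(ps);
        parse_M(ps);
    }
    /* otherwise M -> ε: consume nothing */
}

/* L -> E M */
static void parse_L(Parser *ps) { parse_E(ps); parse_M(ps); }

/* A -> ID := E | PRINT ( L ) */
static void parse_A(Parser *ps) {
    if (next(ps) == ID) {
        eat(ps, ID); eat(ps, ASSIGN); parse_E(ps);
    } else {
        eat(ps, PRINT); eat(ps, LPAREN); parse_L(ps); eat(ps, RPAREN);
    }
}

/* S -> A EOF */
bool parse_S(const Token *toks) {
    Parser ps = { toks, 0, true };
    parse_A(&ps);
    eat(&ps, TOK_EOF);
    return ps.ok;
}
```

parse_M only recurses after consuming a comma, so the parser always makes progress; that is exactly what the left-recursive rule L -> L , E could not guarantee.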


A typical grammar of a typical C-like language

Code samples:

while (expression) {
    if (expression)
        statement;
    while (expression) {
        statement;
        if (expression)
            statement;
    }
    while (expression) {
        statement;
        statement;
    }
}

if (expression) {
    statement;
    while (expression)
        statement;
    statement;
}
if (expression)
    if (expression)
        statement;
}


Grammar:

program:            statement;
statement:          whileStatement
                    | ifStatement
                    | // other statement possibilities ...
                    | '{' statementSequence '}'
whileStatement:     'while' '(' expression ')' statement
ifStatement:        simpleIf
                    | ifElse
simpleIf:           'if' '(' expression ')' statement
ifElse:             'if' '(' expression ')' statement
                    'else' statement
statementSequence:  '' // null, i.e. the empty sequence
                    | statement ';' statementSequence
expression:         // definition of an expression comes here


Parse tree

Input text:

while (count<=100) { /** demonstration */
    count++;
    // ...

Tokenized:

while ( count <= 100 ) { count ++ ; ...

[Figure: parse tree over the token stream. The root statement is a whileStatement whose children are 'while', '(', an expression covering count <= 100, ')', and a statement. That statement is '{' statementSequence '}', and the statementSequence expands to statement ';' statementSequence, where the first statement covers count ++.]


Recursive descent parsing

Parser implementation: a set of parsing methods, one for each rule:

• parseStatement()
• parseWhileStatement()
• parseIfStatement()
• parseStatementSequence()
• parseExpression().

• Highly recursive
• LL(1) grammars: the first token determines in which rule we are
• In other grammars you have to look ahead more than one token
• Jack is almost LL(1).

code sample:

while (expression) {
    statement;
    statement;
    while (expression) {
        while (expression)
            statement;
        statement;
    }
}


The Jack grammar

Notation:
'x': x appears verbatim
x: x is a language construct
x?: x appears 0 or 1 times
x*: x appears 0 or more times
x|y: either x or y appears
(x,y): x appears, then y.


Jack syntax analyzer in action

class Bar {
    method Fraction foo(int y) {
        var int temp; // a variable
        let temp = (xxx+12)*-63;
        ...
    ...

Syntax analyzer:

• With the grammar, we can write a syntax analyzer program (parser)
• The syntax analyzer takes a source text file and attempts to match it against the language grammar
• If successful, it can generate a parse tree in some structured format, e.g. XML.

Syntax analyzer output:

<varDec>
  <keyword> var </keyword>
  <keyword> int </keyword>
  <identifier> temp </identifier>
  <symbol> ; </symbol>
</varDec>
<statements>
  <letStatement>
    <keyword> let </keyword>
    <identifier> temp </identifier>
    <symbol> = </symbol>
    <expression>
      <term>
        <symbol> ( </symbol>
        <expression>
          <term>
            <identifier> xxx </identifier>
          </term>
          <symbol> + </symbol>
          <term>
            <integerConstant> 12 </integerConstant>
          </term>
        </expression>
        ...


Syntax analyzer output rules:

• If xxx is non-terminal, output:

<xxx>
    Recursive code for the body of xxx
</xxx>

• If xxx is terminal (keyword, symbol, constant, or identifier), output:

<xxx>
    xxx value
</xxx>


Recursive descent parser (simplified expression)

• EXP -> TERM (OP TERM)*
• TERM -> integer | variable
• OP -> + | - | * | /


From parsing to code generation

• EXP -> TERM (OP TERM)*
• TERM -> integer | variable
• OP -> + | - | * | /

EXP():
    TERM();
    while (next()==OP) {
        OP();
        TERM();
    }

TERM():
    switch (next()) {
    case INT:
        eat(INT);
        break;
    case VAR:
        eat(VAR);
        break;
    }

OP():
    switch (next()) {
    case +: eat(ADD); break;
    case -: eat(SUB); break;
    case *: eat(MUL); break;
    case /: eat(DIV); break;
    }


The same parser, extended to print XML as it goes:

EXP():
    print("<exp>");
    TERM();
    while (next()==OP) {
        OP();
        TERM();
    }
    print("</exp>");

OP():
    print("<op>");
    switch (next()) {
    case +: eat(ADD); print("<sym> + </sym>"); break;
    case -: eat(SUB); print("<sym> - </sym>"); break;
    case *: eat(MUL); print("<sym> * </sym>"); break;
    case /: eat(DIV); print("<sym> / </sym>"); break;
    }
    print("</op>");

TERM():
    print("<term>");
    switch (next()) {
    case INT:
        print("<int> ", next(), " </int>");
        eat(INT);
        break;
    case VAR:
        print("<id> ", next(), " </id>");
        eat(VAR);
        break;
    }
    print("</term>");
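The XML-printing parser above can be written as compilable C along the following lines. This is a sketch under assumptions: it reads characters directly instead of using a separate tokenizer, collects the XML in a fixed-size buffer inside a hypothetical Ctx struct, writes everything on one line, and does no error checking. The function and type names are invented for the example.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper state: input cursor plus a buffer for the XML. */
typedef struct {
    const char *p;      /* next unread input character */
    char out[512];      /* XML text accumulated so far */
} Ctx;

/* Appends s to the output buffer, truncating if the buffer is full. */
static void emit(Ctx *c, const char *s) {
    strncat(c->out, s, sizeof c->out - strlen(c->out) - 1);
}

static void skip_spaces(Ctx *c) {
    while (isspace((unsigned char)*c->p)) c->p++;
}

/* TERM -> integer | variable */
static void term(Ctx *c) {
    char lexeme[32];
    char line[96];
    size_t n = 0;
    skip_spaces(c);
    const char *tag = isdigit((unsigned char)*c->p) ? "int" : "id";
    while (isalnum((unsigned char)*c->p) && n + 1 < sizeof lexeme)
        lexeme[n++] = *c->p++;
    lexeme[n] = '\0';
    snprintf(line, sizeof line, "<term><%s> %s </%s></term>", tag, lexeme, tag);
    emit(c, line);
}

/* OP -> + | - | * | /   Returns 1 if an operator was consumed, else 0. */
static int op(Ctx *c) {
    char line[32];
    skip_spaces(c);
    if (*c->p == '\0' || strchr("+-*/", *c->p) == NULL) return 0;
    snprintf(line, sizeof line, "<op><sym> %c </sym></op>", *c->p++);
    emit(c, line);
    return 1;
}

/* EXP -> TERM (OP TERM)*   Returns the XML for the expression in src. */
const char *exp_to_xml(Ctx *c, const char *src) {
    c->p = src;
    c->out[0] = '\0';
    emit(c, "<exp>");
    term(c);
    while (op(c)) term(c);
    emit(c, "</exp>");
    return c->out;
}
```

For the input x + 12 this produces an <exp> element containing a <term> with <id> x </id>, an <op> with <sym> + </sym>, and a <term> with <int> 12 </int>, in that order.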


Summary and next step

[Figure: Jack compiler pipeline. Jack program → Syntax Analyzer, consisting of Tokenizer → Parser (Chapter 10, producing XML code) → Code Generation (Chapter 11) → VM code.]

• Syntax analysis: understanding syntax
• Code generation: constructing semantics

The code generation challenge:

• Extend the syntax analyzer into a full-blown compiler that, instead of passive XML code, generates executable VM code
• Two challenges: (a) handling data, and (b) handling commands.


Perspective

• The parse tree can be constructed on the fly
• The Jack language is intentionally simple:
  • Statement prefixes: let, do, ...
  • No operator priority
  • No error checking
  • Basic data types, etc.
• The Jack compiler: designed to illustrate the key ideas that underlie modern compilers, leaving advanced features to more advanced courses
• Richer languages require more powerful compilers


• Syntax analyzers can be built using:
  • Lex tool for tokenizing (flex)
  • Yacc tool for parsing (bison)
  • Do everything from scratch (our approach ...)
• Industrial-strength compilers (e.g. LLVM):
  • Have good error diagnostics
  • Generate tight and efficient code
  • Support parallel (multi-core) processors.
