Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 1
www.nand2tetris.org
Building a Modern Computer From First Principles
Compiler I: Syntax Analysis
Course map
[Figure: the course's stack of abstractions, each layer exposing an abstract interface to the layer above it.
Software hierarchy: Human Thought → Abstract design (Chapters 9, 12) → H.L. Language & Operating Sys. → Compiler (Chapters 10 - 11) → Virtual Machine → VM Translator (Chapters 7 - 8) → Assembly Language → Assembler (Chapter 6) → Machine Language.
Hardware hierarchy: Machine Language → Computer Architecture (Chapters 4 - 5) → Hardware Platform → Gate Logic (Chapters 1 - 3) → Chips & Logic Gates → Electrical Engineering → Physics.]
Motivation: why study compilers?
One of the first compilers was the FORTRAN compiler, developed by an IBM team led by John Backus (Turing Award, 1977) and released in 1957. It took about 18 person-years to build.
Because compilers:
Are an essential part of applied computer science
Are very relevant to computational linguistics
Are implemented using classical programming techniques
Employ important software engineering principles
Train you in developing software for transforming one structure to another (programs, files, transactions, …)
Train you to think in terms of "description languages"
Parsing files of some complex syntax is very common in many applications.
The big picture
[Figure: compilers for several high-level languages (the Jack compiler for the Jack language, some compiler for some language, some other compiler for some other language, ...) all produce the same intermediate code (compiler lectures, Projects 10, 11). The intermediate code runs on a VM emulator (written in a high-level language, runs on any computer) or on a VM implementation over CISC platforms, RISC platforms, the Hack platform, or other digital platforms, each equipped with its VM implementation (VM lectures, Projects 7-8). Each VM implementation targets a machine language: CISC machine language, RISC machine language, or Hack machine language on the Hack computer (HW lectures, Projects 1-6).]
Modern compilers are two-tiered:
Front-end: from a high-level language to some intermediate language
Back-end: from the intermediate language to binary code.
Compiler architecture (front end)
[Figure: compilers for several languages targeting shared intermediate code, with the Jack compiler highlighted.]
Jack compiler: Jack program (source) → syntax analyzer (tokenizer, also called scanner → parser) → code generation → VM code (target)
Syntax analysis (Chapter 10): understanding the structure of the source code
Tokenizing: creating a stream of "atoms"
Parsing: matching the atom stream with the language grammar
XML output = one way to demonstrate that the syntax analyzer works
Code generation (Chapter 11): reconstructing the semantics using the syntax of the target code.
Tokenizing / Lexical analysis / scanning
Remove white space
Construct a token list (language atoms)
Things to worry about:
Language specific rules: e.g. how to treat “++”
Language-specific classifications:
keyword, symbol, identifier, integerConstant, stringConstant, ...
While we are at it, we can have the tokenizer record not only the token, but also its lexical classification (as defined by the source language grammar).
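The "++" question is commonly settled with a longest-match rule: at each position the tokenizer takes the longest symbol that fits. The sketch below illustrates that rule only; the symbol table SYMBOLS and the function name split_symbols are assumptions made for this example (Jack itself has no "++" operator).

```python
# Hypothetical symbol set for a C-like language (Jack has no "++" or "+=").
SYMBOLS = ["++", "+=", "+", "=", ";"]


def split_symbols(text):
    """Split a run of symbol characters using the longest-match rule."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # white space separates tokens but is not a token
            continue
        # Try longer symbols first so "++" wins over "+" followed by "+".
        for sym in sorted(SYMBOLS, key=len, reverse=True):
            if text.startswith(sym, pos):
                tokens.append(sym)
                pos += len(sym)
                break
        else:
            raise ValueError(f"unexpected character {text[pos]!r}")
    return tokens


print(split_symbols("+++"))  # longest match first, then the leftover "+"
print(split_symbols("+ +"))  # the space forces two separate "+" tokens
```

With this rule the input "+++" is read as "++" followed by "+", never as three separate "+" symbols.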
C function to split a string into tokens
char* strtok(char* str, const char* delimiters);
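str: string to be broken into tokens on the first call; pass NULL on later calls to continue with the same string (strtok modifies str in place)
delimiters: string containing the delimiter characters
Returns a pointer to the next token, or NULL when no tokens remain.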
Jack Tokenizer
Source code:
if (x < 153) {let city = "Paris";}
Tokenizer's output:
<tokens>
<keyword> if </keyword>
<symbol> ( </symbol>
<identifier> x </identifier>
<symbol> &lt; </symbol>
<integerConstant> 153 </integerConstant>
<symbol> ) </symbol>
<symbol> { </symbol>
<keyword> let </keyword>
<identifier> city </identifier>
<symbol> = </symbol>
<stringConstant> Paris </stringConstant>
<symbol> ; </symbol>
<symbol> } </symbol>
</tokens>
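A tokenizer that produces this kind of output can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the course's reference implementation: the names KEYWORDS, TOKEN_RE, tokenize and to_xml are made up for this example, comments in the source are not handled, and XML special characters such as < are escaped so the output stays well-formed.

```python
import re
from xml.sax.saxutils import escape

# The Jack keywords, used to tell keywords apart from identifiers.
KEYWORDS = {
    "class", "constructor", "function", "method", "field", "static",
    "var", "int", "char", "boolean", "void", "true", "false", "null",
    "this", "let", "do", "if", "else", "while", "return",
}

# One named group per lexical class; "word" is refined into keyword/identifier.
TOKEN_RE = re.compile(r'''
    (?P<integerConstant>\d+)
  | "(?P<stringConstant>[^"\n]*)"
  | (?P<word>[A-Za-z_]\w*)
  | (?P<symbol>[{}()\[\].,;+\-*/&|<>=~])
  | (?P<space>\s+)
''', re.VERBOSE)


def tokenize(source):
    """Yield (classification, text) pairs; white space is dropped."""
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r}")
        pos = m.end()
        kind = m.lastgroup
        if kind == "space":
            continue
        text = m.group(kind)  # for strings this excludes the quotes
        if kind == "word":
            kind = "keyword" if text in KEYWORDS else "identifier"
        yield kind, text


def to_xml(source):
    """Render the token stream in the <tokens> format shown above."""
    lines = ["<tokens>"]
    for kind, text in tokenize(source):
        lines.append(f"<{kind}> {escape(text)} </{kind}>")
    lines.append("</tokens>")
    return "\n".join(lines)


print(to_xml('if (x < 153) {let city = "Paris";}'))
```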
Parsing
The tokenizer discussed thus far is part of a larger program called the parser
Each language is characterized by a grammar.
The parser is implemented to recognize this grammar in given texts
The parsing process:
A text is given and tokenized
The parser determines whether or not the text can be generated from the grammar
In the process, the parser performs a complete structural analysis of the text
The text can be an expression in a:
Natural language (English, …)
Programming language (Jack, …).
Parsing examples
English: He ate an apple on the desk.
Parse tree (figure):
ate
   he
   an apple
   on
      the desk
Jack: (5+3)*2 - sqrt(9*4)
Parse tree (figure):
-
   *
      +
         5
         3
      2
   sqrt
      *
         9
         4
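One way to see why the tree form is useful: once the expression is a tree, evaluating it is a short recursive walk. The sketch below writes the Jack example's tree as nested tuples; the tuple encoding and the names TREE, BINARY and evaluate are assumptions made for this illustration.

```python
import math

# Parse tree for (5+3)*2 - sqrt(9*4) as nested tuples: (operator, operands...).
TREE = ("-", ("*", ("+", 5, 3), 2), ("sqrt", ("*", 9, 4)))

BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def evaluate(node):
    """Evaluate a tree bottom-up: leaves are numbers, inner nodes are operators."""
    if not isinstance(node, tuple):
        return node
    op, *operands = node
    values = [evaluate(child) for child in operands]
    if op == "sqrt":
        return math.sqrt(values[0])
    return BINARY[op](*values)


print(evaluate(TREE))  # (5+3)*2 is 16, sqrt(9*4) is 6.0, so this prints 10.0
```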
Regular expressions
a|b*
{ε, "a", "b", "bb", "bbb", …}
(a|b)*
{ε, "a", "b", "aa", "ab", "ba", "bb", "aaa", …}
ab*(c|ε)
{"a", "ac", "ab", "abc", "abb", "abbc", …}
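These three languages can be checked with Python's re module. The helper name in_language is an assumption for this sketch; re.fullmatch requires the whole string to match, which is what membership in the language means. The empty alternative in the third pattern plays the role of ε.

```python
import re


def in_language(pattern, text):
    """True if the whole string belongs to the language of the pattern."""
    return re.fullmatch(pattern, text) is not None


# a|b* : either a single "a" or any number of "b"s (including none).
print(in_language(r"a|b*", ""), in_language(r"a|b*", "bbb"), in_language(r"a|b*", "ab"))

# (a|b)* : any string over the alphabet {a, b}.
print(in_language(r"(a|b)*", "abba"), in_language(r"(a|b)*", "abc"))

# ab*(c|) : "a", then any number of "b"s, then an optional "c".
print(in_language(r"ab*(c|)", "abbc"), in_language(r"ab*(c|)", "acc"))
```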
Lex
A computer program that generates lexical analyzers (scanners or lexers)
Commonly used with the yacc parser generator.
Structure of a Lex file
Definition section
%%
Rules section
%%
C code section
Example of a Lex file
/*** Definition section ***/
%{
/* C code to be copied verbatim */
#include <stdio.h>
%}
/* This tells flex to read only one input file */
%option noyywrap
/*** Rules section ***/
%%
[0-9]+ {
/* yytext is a string containing the matched text. */
printf("Saw an integer: %s\n", yytext);
}
.|\n { /* Ignore all other characters. */ }
%%
/*** C Code section ***/
int main(void) {
/* Call the lexer, then quit. */
yylex();
return 0;
}
Example of a Lex file
test.txt:
abc123z.!&*2gj6
> flex test.lex
(a file lex.yy.c with 1,763 lines is generated)
> gcc lex.yy.c
(an executable file a.out is generated)
> ./a.out < test.txt
Saw an integer: 123
Saw an integer: 2
Saw an integer: 6
Another Lex example
%{
#include <stdio.h>
int num_lines = 0, num_chars = 0;
%}
%option noyywrap
%%
\n      ++num_lines; ++num_chars;
.       ++num_chars;
%%
int main(void) {
    /* Scan all input, then report the counts. */
    yylex();
    printf("# of lines = %d, # of chars = %d\n", num_lines, num_chars);
    return 0;
}
A more complex Lex example
%{
/* stdlib.h declares atoi() and atof(), which are called below */
#include <stdio.h>
#include <stdlib.h>
%}
%option noyywrap
DIGIT [0-9]
ID [a-z][a-z0-9]*
%%
{DIGIT}+ {
    printf( "An integer: %s (%d)\n", yytext, atoi( yytext ) );
}
{DIGIT}+"."{DIGIT}* {
    printf( "A float: %s (%g)\n", yytext, atof( yytext ) );
}
if|then|begin|end|procedure|function {
    printf( "A keyword: %s\n", yytext );
}
{ID} printf( "An identifier: %s\n", yytext );
"+"|"-"|"="|"("|")" printf( "Symbol: %s\n", yytext );
[ \t\n]+ /* eat up whitespace */
. printf( "Unrecognized char: %s\n", yytext );
%%
int main(int argc, char **argv) {
    /* Read from the file named on the command line, else from stdin. */
    if ( argc > 1 ) yyin = fopen( argv[1], "r" );
    else yyin = stdin;
    yylex();
    return 0;
}
A more complex Lex example
pascal.txt:
if (a+b) then foo=3.1416 else
foo=12
output:
A keyword: if
Symbol: (
An identifier: a
Symbol: +
An identifier: b
Symbol: )
A keyword: then
An identifier: foo
Symbol: =
A float: 3.1416 (3.1416)
An identifier: else
An identifier: foo
Symbol: =
An integer: 12 (12)
(else is reported as an identifier because the keyword rule does not list it.)
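The rule table of this Lex file can be mimicked in Python to see how the scanner picks a rule. The sketch below is an approximation, not what flex generates: RULES and scan are names invented here, and the loop implements Lex's two tie-breakers by hand (the longest match wins; on equal length, the rule listed first wins).

```python
import re

# Same rules, in the same order, as the Lex file above.
RULES = [
    ("integer", re.compile(r"[0-9]+")),
    ("float", re.compile(r"[0-9]+\.[0-9]*")),
    ("keyword", re.compile(r"if|then|begin|end|procedure|function")),
    ("identifier", re.compile(r"[a-z][a-z0-9]*")),
    ("symbol", re.compile(r"[+\-=()]")),
    ("whitespace", re.compile(r"[ \t\n]+")),
    ("unrecognized", re.compile(r".")),
]


def scan(text):
    """Return (rule name, matched text) pairs, skipping white space."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_kind, best_len = None, 0
        for kind, pattern in RULES:
            m = pattern.match(text, pos)
            # Strictly longer matches replace the current best, so on a tie
            # the earlier rule is kept (keyword beats identifier for "if").
            if m and len(m.group()) > best_len:
                best_kind, best_len = kind, len(m.group())
        if best_kind != "whitespace":
            tokens.append((best_kind, text[pos:pos + best_len]))
        pos += best_len
    return tokens


for kind, lexeme in scan("if (a+b) then foo=3.1416 else\nfoo=12"):
    print(kind, lexeme)
```

The longest-match rule is why 3.1416 is reported as one float rather than an integer 3 followed by unrecognized characters.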
Context-free grammar
Terminals: 0, 1, #
Non-terminals: A, B
Start symbol: A
Rules:
A → 0A1
A → B
B → #
Simple (terminal) forms / complex (non-terminal) forms
Grammar = set of rules on how to construct complex forms from simpler forms
Highly recursive.
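The three rules above generate exactly the strings with n zeros, a #, and n ones. A short Python sketch makes the recursion concrete; the function names derive and recognize are assumptions made for this example.

```python
def derive(depth):
    """Apply A -> 0A1 `depth` times, then A -> B, then B -> #.

    Returns every sentential form of the derivation, starting from A.
    """
    form = "A"
    steps = [form]
    for _ in range(depth):
        form = form.replace("A", "0A1")
        steps.append(form)
    form = form.replace("A", "B")
    steps.append(form)
    form = form.replace("B", "#")
    steps.append(form)
    return steps


def recognize(text):
    """Recursive recognizer that mirrors the grammar rules."""
    if text == "#":  # A -> B -> #
        return True
    # A -> 0A1: strip one 0 and one 1, then recognize the middle as A.
    return (len(text) >= 3 and text[0] == "0" and text[-1] == "1"
            and recognize(text[1:-1]))


print(" => ".join(derive(2)))
print(recognize("00#11"), recognize("0#11"))
```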
Examples of context-free grammar
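Balanced parentheses:
S → ()
S → (S)
S → SS
Strings over {a, b} ending with 'a':
S → a | aS | bS
Arithmetic expressions over x and y:
S → x | y | S+S | S-S | S*S | S/S | (S)
Example string: (x+y)*x-x*y/(x+x)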
Examples of context-free grammar
non-terminals: S, E, Elist
terminals: ID, NUM, PRINT, +, :=, (, ), ; and ,
rules:
S → S ; S
S → ID := E
S → PRINT ( Elist )
E → ID
E → NUM
E → E + E
E → ( S , Elist )
Elist → E
Elist → Elist , E
Try to derive: ID := NUM ; PRINT ( NUM )
slide credit: David Walker
Left-most derivation (always expand the left-most non-terminal):
S ⇒ S ; S
  ⇒ ID := E ; S
  ⇒ ID := NUM ; S
  ⇒ ID := NUM ; PRINT ( Elist )
  ⇒ ID := NUM ; PRINT ( E )
  ⇒ ID := NUM ; PRINT ( NUM )
Right-most derivation (always expand the right-most non-terminal):
S ⇒ S ; S
  ⇒ S ; PRINT ( Elist )
  ⇒ S ; PRINT ( E )
  ⇒ S ; PRINT ( NUM )
  ⇒ ID := E ; PRINT ( NUM )
  ⇒ ID := NUM ; PRINT ( NUM )
Parse tree
Two derivations (left-most and right-most), but one parse tree:
S
   S
      ID
      :=
      E
         NUM
   ;
   S
      PRINT
      (
      Elist
         E
            NUM
      )
slide credit: David Walker
Ambiguous Grammars
a grammar is ambiguous if the same sequence of tokens can give rise to two or more parse trees
non-terminals: E
terminals: ID, NUM, PLUS, MUL
rules:
E → ID
E → NUM
E → E + E
E → E * E
characters: 4 + 5 * 6
tokens: NUM(4) PLUS NUM(5) MUL NUM(6)
slide credit: David Walker
Two parse trees for the tokens NUM(4) PLUS NUM(5) MUL NUM(6):
Tree 1, read as 4 + (5 * 6):
E
   E
      NUM(4)
   +
   E
      E
         NUM(5)
      *
      E
         NUM(6)
Tree 2, read as (4 + 5) * 6:
E
   E
      E
         NUM(4)
      +
      E
         NUM(5)
   *
   E
      NUM(6)
slide credit: David Walker
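The two trees can be written down and evaluated to confirm that they carry different meanings even though they cover the same token sequence. The tuple encoding and the names below are assumptions made for this sketch.

```python
# Each tree is (operator, left, right); leaves are the NUM token values.
TREE_PLUS_AT_ROOT = ("+", 4, ("*", 5, 6))  # reads as 4 + (5 * 6)
TREE_MUL_AT_ROOT = ("*", ("+", 4, 5), 6)   # reads as (4 + 5) * 6


def evaluate(tree):
    """Evaluate a tree bottom-up."""
    if not isinstance(tree, tuple):
        return tree
    op, left, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a * b


def flatten(tree):
    """Read the leaves and operators left to right (the token sequence)."""
    if not isinstance(tree, tuple):
        return [tree]
    op, left, right = tree
    return flatten(left) + [op] + flatten(right)


# Same tokens, different structure, different value.
print(flatten(TREE_PLUS_AT_ROOT), evaluate(TREE_PLUS_AT_ROOT))
print(flatten(TREE_MUL_AT_ROOT), evaluate(TREE_MUL_AT_ROOT))
```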
Ambiguous Grammars
problem: compilers use parse trees to interpret the meaning of parsed expressions
different parse trees have different meanings
e.g. (4 + 5) * 6 is not 4 + (5 * 6)
languages with ambiguous grammars are DISASTROUS: the meaning of programs isn't well-defined! You can't tell what your program might do!
solution: rewrite grammar to eliminate ambiguity
fold precedence rules into grammar to disambiguate
fold associativity rules into grammar to disambiguate
other tricks as well
slide credit: David Walker
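Folding precedence into the grammar usually means giving each precedence level its own non-terminal. One common rewrite of the E grammar (an assumption here, since the slide does not spell it out) is E → T ('+' T)*, T → NUM ('*' NUM)*. The Python sketch below parses with that layered grammar; the function names are invented for this example, and tokens are given as a plain list of integers and operator strings.

```python
def parse_expression(tokens):
    """E -> T ('+' T)* : the lowest-precedence level, grouped left to right."""
    value, rest = parse_term(tokens)
    while rest and rest[0] == "+":
        right, rest = parse_term(rest[1:])
        value += right
    return value, rest


def parse_term(tokens):
    """T -> NUM ('*' NUM)* : multiplication binds tighter than addition."""
    value, rest = parse_number(tokens)
    while rest and rest[0] == "*":
        right, rest = parse_number(rest[1:])
        value *= right
    return value, rest


def parse_number(tokens):
    if not tokens or not isinstance(tokens[0], int):
        raise SyntaxError("expected NUM")
    return tokens[0], tokens[1:]


def evaluate(tokens):
    """Parse the whole token list and return the value of the expression."""
    value, rest = parse_expression(tokens)
    if rest:
        raise SyntaxError(f"unexpected token {rest[0]!r}")
    return value


print(evaluate([4, "+", 5, "*", 6]))  # only one tree is possible: 4 + (5 * 6)
```

Because a '+' can never appear below a '*' in this grammar, the input 4 + 5 * 6 has a single parse tree.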
Recursive descent parser
Recursive Descent Parsing
aka: predictive parsing; top-down parsing
simple, efficient
can be coded by hand in ML quickly
parses many, but not all CFGs
parses LL(1) grammars
Left-to-right parse; Leftmost-derivation; 1 symbol lookahead
key ideas:
one recursive function for each non-terminal
each production becomes one clause in the function
slide credit: David Walker
Recursive descent parser
Non-terminals: S, E, L
Terminals: NUM, IF, THEN, ELSE, BEGIN, END, PRINT, =, ;
Rules:
1. S -> IF E THEN S ELSE S
2. | BEGIN S L
3. | PRINT E
4. L -> END
5. | ; S L
6. E -> NUM = NUM
slide credit: David Walker
S() {
   switch (next()) {
      case IF:
         eat(IF); E(); eat(THEN);
         S(); eat(ELSE); S();
         break;
      case BEGIN:
         eat(BEGIN); S(); L();
         break;
      case PRINT:
         eat(PRINT); E();
         break;
      default:
         error();
   }
}
slide credit: David Walker
Terminals, as named in the code: NUM, IF, THEN, ELSE, BEGIN, END, PRINT, EQ(=), SEMI(;)
L() {
   switch (next()) {
      case END:
         eat(END);
         break;
      case SEMI:
         eat(SEMI); S(); L();
         break;
      default:
         error();
   }
}
slide credit: David Walker
E() {
eat(NUM);
eat(EQ);
eat(NUM);
}
slide credit: David Walker
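The three functions S(), L() and E() can be assembled into a runnable recognizer. The Python sketch below follows the slides' structure (one method per non-terminal, next() to peek, eat() to consume). The Parser class, the accepts helper, the "EOF" end marker and the use of plain strings as tokens are assumptions made for this example.

```python
class Parser:
    """Recursive descent recognizer for the S / L / E grammar above."""

    def __init__(self, tokens):
        # "EOF" is a sentinel added here so next() never runs off the end.
        self.tokens = list(tokens) + ["EOF"]
        self.pos = 0

    def next(self):
        """Peek at the current token without consuming it."""
        return self.tokens[self.pos]

    def eat(self, expected):
        """Consume the current token, which must be `expected`."""
        if self.next() != expected:
            raise SyntaxError(f"expected {expected}, found {self.next()}")
        self.pos += 1

    def S(self):
        tok = self.next()
        if tok == "IF":        # rule 1: S -> IF E THEN S ELSE S
            self.eat("IF"); self.E(); self.eat("THEN")
            self.S(); self.eat("ELSE"); self.S()
        elif tok == "BEGIN":   # rule 2: S -> BEGIN S L
            self.eat("BEGIN"); self.S(); self.L()
        elif tok == "PRINT":   # rule 3: S -> PRINT E
            self.eat("PRINT"); self.E()
        else:
            raise SyntaxError(f"unexpected token {tok}")

    def L(self):
        tok = self.next()
        if tok == "END":       # rule 4: L -> END
            self.eat("END")
        elif tok == "SEMI":    # rule 5: L -> ; S L
            self.eat("SEMI"); self.S(); self.L()
        else:
            raise SyntaxError(f"unexpected token {tok}")

    def E(self):               # rule 6: E -> NUM = NUM
        self.eat("NUM"); self.eat("EQ"); self.eat("NUM")


def accepts(tokens):
    """True if the token list is a complete statement S."""
    parser = Parser(tokens)
    try:
        parser.S()
        parser.eat("EOF")
    except SyntaxError:
        return False
    return True


print(accepts("BEGIN PRINT NUM EQ NUM SEMI PRINT NUM EQ NUM END".split()))
print(accepts("PRINT NUM".split()))
```

Each branch is chosen by looking at one token only, which works because every rule of this grammar starts with a different terminal.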
Recursive descent parser
Non-terminals: S, A, E, L
Terminals: EOF, ID, NUM, ASSIGN(:=), PRINT, LPAREN((), RPAREN())
Rules:
1. S -> A EOF
2. A -> ID := E
3. | PRINT(L)
4. E -> ID
5. | NUM
6. L -> E
7. | L, E
slide credit: David Walker
S() {
A();
eat(EOF);
}
slide credit: David Walker
A() {
   switch (next()) {
      case ID:
         eat(ID); eat(ASSIGN);
         E();
         break;
      case PRINT:
         eat(PRINT); eat(LPAREN);
         L(); eat(RPAREN);
         break;
   }
}
slide credit: David Walker
E() {
   switch (next()) {
      case ID:
         eat(ID);
         break;
      case NUM:
         eat(NUM);
         break;
   }
}
slide credit: David Walker
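L() {
   switch (next()) {
      case ID:
         ???
      case NUM:
         ???
   }
}
Problem: E could be ID, and L could be E, which could be ID. Both rules for L can therefore start with the same token, so one token of lookahead cannot choose between L -> E and L -> L, E. Rule 7 is also left-recursive: a function L() whose first action is to call L() would never terminate.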
Problem:
E could be ID
L could be E could be ID
Solution: rewrite the rules for L to remove the left recursion:
L -> E M
M -> , E M
   | ε (empty)
slide credit: David Walker
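With L rewritten as L -> E M and M -> , E M | ε, every choice can again be made from one token: M continues if the next token is a comma and stops otherwise. The Python sketch below parses the rewritten grammar. The function names, the position-passing style and the token name COMMA are assumptions made for this example (the slide's terminal list does not name the comma).

```python
def peek(tokens, pos):
    """Return the token at pos, or None past the end of input."""
    return tokens[pos] if pos < len(tokens) else None


def expect(tokens, pos, kind):
    """Consume one token of the given kind and return the new position."""
    if peek(tokens, pos) != kind:
        raise SyntaxError(f"expected {kind}, found {peek(tokens, pos)}")
    return pos + 1


def parse_S(tokens, pos=0):      # S -> A EOF
    pos = parse_A(tokens, pos)
    return expect(tokens, pos, "EOF")


def parse_A(tokens, pos):
    if peek(tokens, pos) == "ID":        # A -> ID := E
        pos = expect(tokens, pos, "ID")
        pos = expect(tokens, pos, "ASSIGN")
        return parse_E(tokens, pos)
    pos = expect(tokens, pos, "PRINT")   # A -> PRINT ( L )
    pos = expect(tokens, pos, "LPAREN")
    pos = parse_L(tokens, pos)
    return expect(tokens, pos, "RPAREN")


def parse_E(tokens, pos):                # E -> ID | NUM
    if peek(tokens, pos) in ("ID", "NUM"):
        return pos + 1
    raise SyntaxError(f"expected ID or NUM, found {peek(tokens, pos)}")


def parse_L(tokens, pos):                # L -> E M
    return parse_M(tokens, parse_E(tokens, pos))


def parse_M(tokens, pos):
    if peek(tokens, pos) == "COMMA":     # M -> , E M
        pos = expect(tokens, pos, "COMMA")
        return parse_M(tokens, parse_E(tokens, pos))
    return pos                           # M -> ε: consume nothing


tokens = "PRINT LPAREN ID COMMA NUM COMMA ID RPAREN EOF".split()
print(parse_S(tokens) == len(tokens))  # True: every token was consumed
```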
A typical grammar of a typical C-like language
Code samples:
while (expression) {
   if (expression)
      statement;
   while (expression) {
      statement;
      if (expression)
         statement;
   }
   while (expression) {
      statement;
      statement;
   }
}

if (expression) {
   statement;
   while (expression)
      statement;
   statement;
}

if (expression)
   if (expression)
      statement;
A typical grammar of a typical C-like language
program: statement;
statement: whileStatement
         | ifStatement
         | // other statement possibilities ...
         | '{' statementSequence '}'
whileStatement: 'while' '(' expression ')' statement
ifStatement: simpleIf
           | ifElse
simpleIf: 'if' '(' expression ')' statement
ifElse: 'if' '(' expression ')' statement
        'else' statement
statementSequence: '' // null, i.e. the empty sequence
                 | statement ';' statementSequence
expression: // definition of an expression comes here
Parse tree
Input text:
while (count<=100) { /** demonstration */
   count++;
   // ...
Tokenized:
while ( count <= 100 ) { count ++ ; ...
Parse tree (figure):
statement
   whileStatement
      while
      (
      expression: count <= 100
      )
      statement
         {
         statementSequence
            statement: count ++
            ;
            statementSequence: ...
Recursive descent parsing
Parser implementation: a set of parsing methods, one for each rule:
parseStatement()
parseWhileStatement()
parseIfStatement()
parseStatementSequence()
parseExpression().
Highly recursive
LL(1) grammars: the next token alone determines which rule applies
In other grammars you have to look ahead more than one token
Jack is almost LL(1).
Code sample:
while (expression) {
   statement;
   statement;
   while (expression) {
      while (expression)
         statement;
      statement;
   }
}
The Jack grammar
'x': x appears verbatim
x: x is a language construct
x?: x appears 0 or 1 times
x*: x appears 0 or more times
x|y: either x or y appears
(x,y): x appears, then y.
Jack syntax analyzer in action
class Bar {
   method Fraction foo(int y) {
      var int temp; // a variable
      let temp = (xxx+12)*-63;
      ...
   ...
Syntax analyzer
With the grammar, we can write a syntax analyzer program (parser)
The syntax analyzer takes a source text file and attempts to match it against the language grammar
If successful, it can generate a parse tree in some structured format, e.g. XML.
Syntax analyzer output (excerpt):
<varDec>
<keyword> var </keyword>
<keyword> int </keyword>
<identifier> temp </identifier>
<symbol> ; </symbol>
</varDec>
<statements>
<letStatement>
<keyword> let </keyword>
<identifier> temp </identifier>
<symbol> = </symbol>
<expression>
<term>
<symbol> ( </symbol>
<expression>
<term>
<identifier> xxx </identifier>
</term>
<symbol> + </symbol>
<term>
<integerConstant> 12 </integerConstant>
</term>
</expression>
...
If xxx is non-terminal, output:
<xxx>
   Recursive code for the body of xxx
</xxx>
If xxx is terminal (keyword, symbol, constant, or identifier), output:
<xxx>
   xxx value
</xxx>
Recursive descent parser (simplified expression)
EXP → TERM (OP TERM)*
TERM → integer | variable
OP → + | - | * | /
From parsing to code generation
EXP → TERM (OP TERM)*
TERM → integer | variable
OP → + | - | * | /
EXP():
   TERM();
   while (next()==OP) {
      OP();
      TERM();
   }
OP():
   switch (next()) {
      case +: eat(ADD); break;
      case -: eat(SUB); break;
      case *: eat(MUL); break;
      case /: eat(DIV); break;
   }
TERM():
   switch (next()) {
      case INT: eat(INT); break;
      case VAR: eat(VAR); break;
   }
From parsing to code generation
EXP():
   print('<exp>');
   TERM();
   while (next()==OP) {
      OP();
      TERM();
   }
   print('</exp>');
OP():
   print('<op>');
   switch (next()) {
      case +: eat(ADD); print('<sym> + </sym>'); break;
      case -: eat(SUB); print('<sym> - </sym>'); break;
      case *: eat(MUL); print('<sym> * </sym>'); break;
      case /: eat(DIV); print('<sym> / </sym>'); break;
   }
   print('</op>');
TERM():
   print('<term>');
   switch (next()) {
      case INT: print('<int> ', next(), ' </int>'); eat(INT); break;
      case VAR: print('<id> ', next(), ' </id>'); eat(VAR); break;
   }
   print('</term>');
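The pseudocode above translates almost line for line into a runnable sketch. The Python version below collects the XML lines in a list instead of printing them. The names tokenize, parse_exp, parse_op, parse_term and OPS, and the regular expression used to split the input, are assumptions made for this example; the tag names are the ones used on the slide.

```python
import re

OPS = {"+", "-", "*", "/"}


def tokenize(text):
    """Split into integers, variable names and operators.

    Characters that fit none of these are silently skipped by findall.
    """
    return re.findall(r"\d+|[A-Za-z_]\w*|[-+*/]", text)


def parse_exp(tokens):
    """EXP -> TERM (OP TERM)* ; returns the XML lines for the whole input."""
    out = ["<exp>"]
    pos = parse_term(tokens, 0, out)
    while pos < len(tokens) and tokens[pos] in OPS:
        pos = parse_op(tokens, pos, out)
        pos = parse_term(tokens, pos, out)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    out.append("</exp>")
    return out


def parse_op(tokens, pos, out):
    """OP -> + | - | * | / ; the caller has already checked the token."""
    out.extend(["<op>", f"<sym> {tokens[pos]} </sym>", "</op>"])
    return pos + 1


def parse_term(tokens, pos, out):
    """TERM -> integer | variable"""
    if pos >= len(tokens) or tokens[pos] in OPS:
        raise SyntaxError("expected an integer or a variable")
    token = tokens[pos]
    tag = "int" if token.isdigit() else "id"
    out.extend(["<term>", f"<{tag}> {token} </{tag}>", "</term>"])
    return pos + 1


print("\n".join(parse_exp(tokenize("x + 12"))))
```

Replacing the lines that emit XML with lines that emit VM commands is the step from syntax analysis to code generation.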
Summary and next step
[Figure: Jack program → Jack compiler: syntax analyzer (tokenizer → parser; Chapter 10; XML code) → code generation (Chapter 11) → VM code]
Syntax analysis: understanding syntax
Code generation: constructing semantics
The code generation challenge:
Extend the syntax analyzer into a full-blown compiler that, instead of passive XML code, generates executable VM code
Two challenges: (a) handling data, and (b) handling commands.
Perspective
The parse tree can be constructed on the fly
The Jack language is intentionally simple:
Statement prefixes: let, do, ...
No operator priority
No error checking
Basic data types, etc.
The Jack compiler: designed to illustrate the key ideas that underlie modern compilers, leaving advanced features to more advanced courses
Richer languages require more powerful compilers