www.nand2tetris.org
Building a Modern Computer From First Principles
Assembler
Where we are at:
Assembler Chapter 6
H.L. Language
&
Operating Sys.
abstract interface
Compiler
Chapters 10 - 11
VM Translator
Chapters 7 - 8
Computer Architecture
Chapters 4 - 5
Gate Logic
Chapters 1 - 3 Electrical
Engineering
Physics Virtual
Machine
abstract interface
Software hierarchy
Assembly Language
abstract interface
Hardware hierarchy
Machine Language
abstract interface
Hardware Platform
abstract interface
Chips &
Logic Gates
abstract interface
Human Thought
Abstract design
Chapters 9, 12
Why care about assemblers?
Assemblers employ nifty programming tricks
Assemblers are the first rung up the software hierarchy ladder
An assembler is a translator of a simple language
Writing an assembler = low-impact practice for writing compilers.
cross-platform
compiling
0000000000010000 1110111111001000 0000000000010001 1110101010001000 0000000000010000 1111110000010000 0000000000000000 1111010011010000 0000000000010010 1110001100000001 0000000000010000 1111110000010000 0000000000010001
...
Target code (a text file of binary Hack code)
assemble
Assembly example
// Computes 1+...+RAM[0]
// And stored the sum in RAM[1]
@i
M=1 // i = 1
@sum
M=0 // sum = 0 (LOOP)
@i // if i>RAM[0] goto WRITE D=M
@R0 D=D‐M
@WRITE D;JGT
... // Etc.
Source code (example)
The program translation challenge
Extract the program’s semantics from the source program, using the syntax rules of the source language
Re-express the program’s semantics in the target language, using the syntax rules of the target language
Assembler = simple translator
Translates each assembly command into one or more binary machine instructions
Handles symbols (e.g. i , sum , LOOP , …).
execute
Revisiting Hack low-level programming: an example
// Computes 1+...+RAM[0]
// And stores the sum in RAM[1].
@i
M=1 // i = 1
@sum
M=0 // sum = 0 (LOOP)
@i // if i>RAM[0] goto WRITE D=M
@R0 D=D‐M
@WRITE D;JGT
@i // sum += i D=M
@sum M=D+M
@i // i++
M=M+1
@LOOP // goto LOOP 0;JMP
(WRITE)
@sum D=M
@R1
M=D // RAM[1] = the sum (END)
@END 0;JMP
Assembly program
(sum.asm
)CPU emulator screen shot after running this program
The CPU emulator allows loading and executing symbolic Hack code. It resolves all the symbolic
program generated
output user supplied
input
The assembler’s view of an assembly program
// Computes 1+...+RAM[0]
// And stores the sum in RAM[1].
@i
M=1 // i = 1
@sum
M=0 // sum = 0 (LOOP)
@i // if i>RAM[0] goto WRITE D=M
@R0 D=D‐M
@WRITE D;JGT
@i // sum += i D=M
@sum M=D+M
@i // i++
M=M+1
@LOOP // goto LOOP 0;JMP
(WRITE)
@sum D=M
@R1
M=D // RAM[1] = the sum (END)
@END 0;JMP
Assembly program
Assembly program =
a stream of text lines, each being
one of the following:
The assembler’s view of an assembly program
// Computes 1+...+RAM[0]
// And stores the sum in RAM[1].
@i
M=1 // i = 1
@sum
M=0 // sum = 0 (LOOP)
@i // if i>RAM[0] goto WRITE D=M
@R0 D=D‐M
@WRITE D;JGT
@i // sum += i D=M
@sum M=D+M
@i // i++
M=M+1
@LOOP // goto LOOP 0;JMP
(WRITE)
@sum D=M
@R1
M=D // RAM[1] = the sum (END)
@END 0;JMP
Assembly program
Assembly program =
a stream of text lines, each being one of the following:
White space
Empty lines/indentation
Line comments
In-line comments
The assembler’s view of an assembly program
// Computes 1+...+RAM[0]
// And stores the sum in RAM[1].
@i
M=1 // i = 1
@sum
M=0 // sum = 0 (LOOP)
@i // if i>RAM[0] goto WRITE D=M
@R0 D=D‐M
@WRITE D;JGT
@i // sum += i D=M
@sum M=D+M
@i // i++
M=M+1
@LOOP // goto LOOP 0;JMP
(WRITE)
@sum D=M
@R1
M=D // RAM[1] = the sum (END)
@END 0;JMP
Assembly program
Assembly program =
a stream of text lines, each being one of the following:
White space
Empty lines/indentation
Line comments
In-line comments
Instructions
A-instruction
C-instruction
The assembler’s view of an assembly program
// Computes 1+...+RAM[0]
// And stores the sum in RAM[1].
@i
M=1 // i = 1
@sum
M=0 // sum = 0 (LOOP)
@i // if i>RAM[0] goto WRITE D=M
@R0 D=D‐M
@WRITE D;JGT
@i // sum += i D=M
@sum M=D+M
@i // i++
M=M+1
@LOOP // goto LOOP 0;JMP
(WRITE)
@sum D=M
@R1
M=D // RAM[1] = the sum (END)
@END 0;JMP
Assembly program
Assembly program =
a stream of text lines, each being one of the following:
White space
Empty lines/indentation
Line comments
In-line comments
Instructions
A-instruction
C-instruction
Symbols
references
Label declaration (XXX)
The assembler’s view of an assembly program
// Computes 1+...+RAM[0]
// And stores the sum in RAM[1].
@16
M=1 // i = 1
@17
M=0 // sum = 0
@16 // if i>RAM[0] goto WRITE D=M
@0 D=D‐M
@18 D;JGT
@16 // sum += i D=M
@17 M=D+M
@16 // i++
M=M+1
@4 // goto LOOP 0;JMP
@17 D=M
@1
M=D // RAM[1] = the sum
@22 0;JMP
Assembly program
Assembly program =
a stream of text lines, each being one of the following:
White space
Empty lines/indentation
Line comments
In-line comments
Instructions
A-instruction
C-instruction
Symbols
references
Label declaration (XXX) Assume that there is
no symbol for now!
White space
// Computes 1+...+RAM[0]
// And stores the sum in RAM[1].
@16
M=1 // i = 1
@17
M=0 // sum = 0
@16 // if i>RAM[0] goto WRITE D=M
@0 D=D‐M
@18 D;JGT
@16 // sum += i D=M
@17 M=D+M
@16 // i++
M=M+1
@4 // goto LOOP 0;JMP
@17 D=M
@1
M=D // RAM[1] = the sum
@22 0;JMP
Assembly program
Assembly program =
a stream of text lines, each being one of the following:
White space
Empty lines/indentation
Line comments
In-line comments
Instructions
A-instruction
C-instruction
Symbols
references
Label declaration (XXX)
White space → ignore/remove them
@16
M=1
@17 M=0
@16 D=M
@0 D=D‐M
@18 D;JGT
@16 D=M
@17 M=D+M
@16 M=M+1
@4 0;JMP
@17 D=M
@1 M=D
@22 0;JMP
Assembly program
Assembly program =
a stream of text lines, each being one of the following:
White space
Empty lines/indentation
Line comments
In-line comments
Instructions
A-instruction
C-instruction
Symbols
references
Label declaration (XXX)
Instructions → binary encoding
@16
M=1
@17 M=0
@16 D=M
@0 D=D‐M
@18 D;JGT
@16 D=M
@17 M=D+M
@16 M=M+1
@4 0;JMP
@17 D=M
@1 M=D
@22 0;JMP
Assembly program
Assembly program =
a stream of text lines, each being one of the following:
White space
Empty lines/indentation
Line comments
In-line comments
Instructions
A-instruction
C-instruction
Symbols
references
Label declaration (XXX)
Translating / assembling A-instructions
value (v = 0 or 1)
0 v v v v v v v v v v v v v v v
Binary:
@ value // Where value is either a non-negative decimal number // or a symbol referring to such number.
Symbolic:
Translation to binary:
If value is a non-negative decimal number, simple, e.g. @16
If value is a symbol, later.
Translating / assembling C-instructions
jump dest
comp
1 1 1
a c1 c2 c3 c4 c5 c6 d1 d2 d3 j1 j2 j3dest=comp;jump // Either the dest or jump fields may be empty.
// If dest is empty, the "=" is ommitted;
// If jump is empty, the ";" is omitted.
Symbolic:
Binary:
Translating / assembling C-instructions
jump dest
comp
1 1 1
a c1 c2 c3 c4 c5 c6 d1 d2 d3 j1 j2 j3dest=comp;jump // Either the dest or jump fields may be empty.
// If dest is empty, the "=" is ommitted;
// If jump is empty, the ";" is omitted.
Symbolic:
Binary:
Example: MD=D+1
Translating / assembling C-instructions
jump dest
comp
1 1 1
a c1 c2 c3 c4 c5 c6 d1 d2 d3 j1 j2 j3dest=comp;jump // Either the dest or jump fields may be empty.
// If dest is empty, the "=" is ommitted;
// If jump is empty, the ";" is omitted.
Symbolic:
Binary:
Example: D; JGT
The overall assembly logic
For each (real) command
Parse the command,
i.e. break it into its underlying fields
A-instruction: replace the symbolic reference (if any) with the
corresponding memory address, which is a number
(how to do it, later)
C-instruction: for each field in the instruction, generate the
corresponding binary code
Assemble the translated binary codes into a complete 16-bit machine
instruction
Write the 16-bit instruction to the output file.
// Computes 1+...+RAM[0]
// And stores the sum in RAM[1].
@i
M=1 // i = 1
@sum
M=0 // sum = 0 (LOOP)
@i // if i>RAM[0] goto WRITE D=M
@0 D=D‐M
@WRITE D;JGT
@i // sum += i D=M
@sum M=D+M
@i // i++
M=M+1
@LOOP // goto LOOP 0;JMP
(WRITE)
@sum D=M
@1
M=D // RAM[1] = the sum (END)
@END 0;JMP
Assembly program
Assembly programs typically have many symbols:
Labels that mark destinations of goto commands
Labels that mark special memory locations
Variables
These symbols fall into two categories:
User–defined symbols (created by programmers)
Variables
Labels (forward reference could be a problem)
Pre-defined symbols (used by the Hack platform).
Handling symbols (aka symbol resolution)
@R0 D=M
@END D;JLE
@counter M=D
@SCREEN D=A
@x M=D (LOOP)
@x A=M M=‐1
@x D=M
@32 D=D+A
@x M=D
@counter MD=M‐1
@LOOP D;JGT (END)
@END 0;JMP
Typical symbolic Hack assembly code:
Label symbols: Used to label destinations of goto commands. Declared by the pseudo-command (XXX). This directive defines the symbol XXX to refer to the instruction memory location holding the next command in the program. (the assembler needs to maintain instrCtr)
Variable symbols: Any user-defined symbol xxx appearing in an assembly program that is not defined elsewhere using the ( xxx) directive is treated as a variable, and is automatically
assigned a unique RAM address, starting at RAM address 16 (the assembler needs to maintain
nextAddr)
(why start at 16? Later.)
By convention, Hack programmers use lower-case and upper-case to represent variable and label names, respectively
Handling symbols: user-defined symbols
@R0 D=M
@END D;JLE
@counter M=D
@SCREEN D=A
@x M=D (LOOP)
@x A=M M=‐1
@x D=M
@32 D=D+A
@x M=D
@counter MD=M‐1
@LOOP D;JGT (END)
@END 0;JMP
Typical symbolic Hack assembly code:
Virtual registers:
The symbols R0,…, R15 are automatically
predefined to refer to RAM addresses 0,…,15 I/O pointers: The symbols SCREEN and KBD are
automatically predefined to refer to RAM addresses 16384 and 24576, respectively (base addresses of the screen and keyboard memory maps)
VM control pointers: the symbols SP, LCL, ARG, THIS, and THAT (that don’t appear in the code example on the right) are automatically
predefined to refer to RAM addresses 0 to 4, respectively
(The VM control pointers, which overlap R0,…, R4 will come to play in the virtual machine
implementation, covered in the next lecture)
@R0 D=M
@END D;JLE
@counter M=D
@SCREEN D=A
@x M=D (LOOP)
@x A=M M=‐1
@x D=M
@32 D=D+A
@x M=D
@counter MD=M‐1
@LOOP D;JGT (END)
@END 0;JMP
Typical symbolic Hack assembly code:
Handling symbols: pre-defined symbols
// Computes 1+...+RAM[0]
// And stored the sum in RAM[1]
@i
M=1 // i = 1
@sum
M=0 // sum = 0 (LOOP)
@i // if i>RAM[0] goto WRITE D=M
@R0 D=D‐M
@WRITE D;JGT
@i // sum += i D=M
@sum M=D+M
@i // i++
M=M+1 @LOOP // goto LOOP 0;JMP (WRITE) @sum D=M @R1 M=D // RAM[1] = the sum (END) @END 0;JMP
Source code (example) This symbol table is generated by the assembler, and used to translate the symbolic code into binary code. Handling symbols: symbol table
R0 0R1 1
R2 2
... ... R15 15
SCREEN 16384
KBD 24576
SP 0
LCL 1
ARG 2
THIS 3
THAT 4
LOOP 4
WRITE 18
END 22
i 16 sum 17
Symbol table
R0 0
R1 1
R2 2
... R15 15
SCREEN 16384
KBD 24576
SP 0
LCL 1
ARG 2
THIS 3
THAT 4
Symbol table
// Computes 1+...+RAM[0] // And stored the sum in RAM[1] @i M=1 // i = 1 @sum M=0 // sum = 0 (LOOP) @i // if i>RAM[0] goto WRITE D=M @R0 D=D‐M @WRITE D;JGT @i // sum += i D=M @sum M=D+M @i // i++M=M+1
@LOOP // goto LOOP 0;JMP
(WRITE)
@sum D=M
@R1
M=D // RAM[1] = the sum (END)
@END 0;JMP
Source code (example)
Initialization: create an empty symbol table and populate it with all the pre-defined symbols
Handling symbols: constructing the symbol table
R0 0
R1 1
R2 2
... R15 15
SCREEN 16384
KBD 24576
SP 0
LCL 1
ARG 2
THIS 3
THAT 4
LOOP 4
WRITE 18
END 22
Symbol table
// Computes 1+...+RAM[0] // And stored the sum in RAM[1] @i M=1 // i = 1 @sum M=0 // sum = 0 (LOOP) @i // if i>RAM[0] goto WRITE D=M @R0 D=D‐M @WRITE D;JGT @i // sum += i D=M @sum M=D+M @i // i++M=M+1
@LOOP // goto LOOP 0;JMP
(WRITE)
@sum D=M
@R1
M=D // RAM[1] = the sum (END)
@END 0;JMP
Source code (example)
Initialization: create an empty symbol table and populate it with all the pre-defined symbols First pass: go through the entire source code, and add all the user-defined label symbols to the symbol table (without generating any code)
Handling symbols: constructing the symbol table
R0 0
R1 1
R2 2
... R15 15
SCREEN 16384
KBD 24576
SP 0
LCL 1
ARG 2
THIS 3
THAT 4
LOOP 4
WRITE 18
END 22
i 16 sum 17
Symbol table
// Computes 1+...+RAM[0] // And stored the sum in RAM[1] @i M=1 // i = 1 @sum M=0 // sum = 0 (LOOP) @i // if i>RAM[0] goto WRITE D=M @R0 D=D‐M @WRITE D;JGT @i // sum += i D=M @sum M=D+M @i // i++M=M+1
@LOOP // goto LOOP 0;JMP
(WRITE)
@sum D=M
@R1
M=D // RAM[1] = the sum (END)
@END 0;JMP
Source code (example)
Initialization: create an empty symbol table and populate it with all the pre-defined symbols First pass: go through the entire source code, and add all the user-defined label symbols to the symbol table (without generating any code) Second pass: go again through the source code, and use the symbol table to translate all the commands. In the process, handle all the user-
Handling symbols: constructing the symbol table
R0 0
R1 1
R2 2
... R15 15
SCREEN 16384
KBD 24576
SP 0
LCL 1
ARG 2
THIS 3
THAT 4
Symbol table
// Computes 1+...+RAM[0] // And stored the sum in RAM[1] @i M=1 // i = 1 @sum M=0 // sum = 0 (LOOP) @i // if i>RAM[0] goto WRITE D=M @R0 D=D‐M @WRITE D;JGT @i // sum += i D=M @sum M=D+M @i // i++M=M+1
@LOOP // goto LOOP 0;JMP
(WRITE)
@sum D=M
@R1
M=D // RAM[1] = the sum (END)
@END 0;JMP
Source code (example)
Handling symbols: constructing the symbol table (one-pass solution?)
The assembly process (detailed)
Initialization: create the symbol table and initialize it with the pre- defined symbols
First pass: march through the source code without generating any code.
For each label declaration (LABEL) that appears in the source code,
add the pair <LABEL , n > to the symbol table
The assembly process (detailed)
Second pass: march again through the source, and process each line:
If the line is a C-instruction, simple
If the line is @xxx where xxx is a number, simple
If the line is @xxx and xxx is a symbol, look it up in the symbol table and proceed as follows:
If the symbol is found, replace it with its numeric value and complete the command’s translation
If the symbol is not found, then it must represent a new variable:
add the pair <xxx , n > to the symbol table, where n is the next available RAM address, and complete the command’s
translation.
(Platform design decision: the allocated RAM addresses are
running, starting at address 16).
Note that comment lines and pseudo-commands (label declarations) generate no code.
0000000000010000 1110111111001000 0000000000010001 1110101010001000 0000000000010000 1111110000010000 0000000000000000 1111010011010000 0000000000010010 1110001100000001 0000000000010000 1111110000010000 0000000000010001 1111000010001000 0000000000010000 1111110111001000 0000000000000100 1110101010000111 0000000000010001 1111110000010000 0000000000000001 1110001100001000 0000000000010110 1110101010000111
Target code
assemble
The result ...
// Computes 1+...+RAM[0]
// And stored the sum in RAM[1]
@i
M=1 // i = 1
@sum
M=0 // sum = 0 (LOOP)
@i // if i>RAM[0] goto WRITE D=M
@R0 D=D‐M
@WRITE D;JGT
@i // sum += i D=M
@sum M=D+M
@i // i++
M=M+1
@LOOP // goto LOOP 0;JMP
(WRITE)
@sum D=M
@R1
M=D // RAM[1] = the sum (END)
@END 0;JMP