Unit 11.1 Compiler II: Code Generation
Program compilation:
- Each class is compiled separately
- The two compilation tasks are relatively separate and standalone:
  - class-level code
  - subroutine-level code (constructors, methods, functions)
Compilation challenges:
- Handling variables
- Handling expressions
- Handling flow of control
- Handling objects
- Handling arrays
The challenge: expressing the above semantics in the VM language.
Unit 11.2 Handling Variables
In order to generate actual VM code, we must know (among other things):
- whether each variable is a field, static, local or argument
- whether each variable is the first, second, third… variable of its kind
All the possible variables in Jack:
- class-level variables: field, static
- subroutine-level variables: argument, local
Variable properties (in Jack):
- name (identifier)
- type (int, char, boolean, class name)
- kind (field, static, local, argument)
- scope (class level, subroutine level)
🎈 When you compile anything in Jack, you only ever have to maintain two symbol tables: the class-level symbol table and the subroutine-level symbol table. The code writer adds each variable and its properties to the appropriate symbol table.
✨ Languages with arbitrarily nested scopes typically need a whole chain of symbol tables, one per scope. We don’t need that when we write the Jack compiler, because Jack has only two scope levels and therefore only two symbol tables.
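The two-table idea can be sketched in C++ as follows. This is only an illustrative sketch: the names `Kind`, `Symbol`, `lookup` and `varCount` are assumptions made for this example and are not taken from the book's SymbolTable API.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical names throughout; the book's SymbolTable API differs in detail.
enum class Kind { Static, Field, Arg, Var };

struct Symbol {
    std::string type;  // int, char, boolean, or a class name
    Kind kind;
    int index;         // running index within its kind
};

class SymbolTable {
public:
    // Called when compilation of a new subroutine begins:
    // the subroutine-level table is reset, the class-level table is kept.
    void startSubroutine() {
        subroutineScope_.clear();
        counts_[Kind::Arg] = 0;
        counts_[Kind::Var] = 0;
    }

    // Records a variable and assigns it the next index of its kind.
    void define(const std::string& name, const std::string& type, Kind kind) {
        Symbol s{type, kind, counts_[kind]++};
        if (kind == Kind::Static || kind == Kind::Field) classScope_[name] = s;
        else subroutineScope_[name] = s;
    }

    // Looks in the subroutine scope first, then in the class scope.
    const Symbol& lookup(const std::string& name) const {
        auto it = subroutineScope_.find(name);
        if (it != subroutineScope_.end()) return it->second;
        it = classScope_.find(name);
        if (it != classScope_.end()) return it->second;
        throw std::runtime_error("undefined variable: " + name);
    }

    // Number of variables of the given kind defined so far.
    int varCount(Kind kind) const {
        auto it = counts_.find(kind);
        return it == counts_.end() ? 0 : it->second;
    }

private:
    std::map<std::string, Symbol> classScope_;
    std::map<std::string, Symbol> subroutineScope_;
    std::map<Kind, int> counts_;
};
```

Looking up a name in the subroutine table before the class table is what lets a local variable shadow a field of the same name.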
Unit 11.3 Handling Expressions
Generating code for expressions: a two-stage approach
              parse               code generation
source code --------> parse tree ----------------> stack-machine code
When executed, the generated code ends up leaving the value of the expression at the top of the stack.
🎉 What we use:
🎃 Attention:
- The Jack language defines no operator priority (except for parentheses)
- Operator priority is left up to the compiler’s developer
- Parentheses can always be used to enforce operator priority
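The second stage (parse tree to stack-machine code) can be sketched as a post-order walk of the tree. The `Expr` node type and the `node` helper below are hypothetical, and the output is simplified pseudo-VM code (`push x` rather than `push local 0`), so treat this as an illustration of the idea rather than a finished code generator.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parse-tree node; a real compiler would use its own representation.
struct Expr {
    enum class Type { Number, Variable, Binary, Unary, Call } type;
    std::string text;                         // number, variable, operator, or function name
    std::vector<std::shared_ptr<Expr>> kids;  // operands or call arguments
};

using ExprPtr = std::shared_ptr<Expr>;

// Convenience constructor for tree nodes.
ExprPtr node(Expr::Type type, std::string text, std::vector<ExprPtr> kids = {}) {
    return std::make_shared<Expr>(Expr{type, std::move(text), std::move(kids)});
}

// Post-order traversal: operands first, then the operator or call.
// After the emitted code runs, the expression's value is on top of the stack.
void codeWrite(const ExprPtr& e, std::vector<std::string>& out) {
    switch (e->type) {
    case Expr::Type::Number:
    case Expr::Type::Variable:
        out.push_back("push " + e->text);
        break;
    case Expr::Type::Binary:
        codeWrite(e->kids[0], out);
        codeWrite(e->kids[1], out);
        out.push_back(e->text);
        break;
    case Expr::Type::Unary:
        codeWrite(e->kids[0], out);
        out.push_back(e->text);
        break;
    case Expr::Type::Call:
        for (const auto& arg : e->kids) codeWrite(arg, out);
        out.push_back("call " + e->text);
        break;
    }
}
```

For the tree of `x + g(2, -z)` this emits `push x`, `push 2`, `push z`, `neg`, `call g`, `add`, in that order.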
Unit 11.4 Handling Flow of Control
A program typically contains multiple if and while statements.
Solution: the compiler can ensure that the generated labels are unique.
The if and while statements are often nested.
Solution: the compiler employs a recursive compilation strategy.
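Both solutions can be illustrated with a small sketch. The `LabelGenerator` class, the `compileWhile` function and the `WHILE_EXP` / `WHILE_END` label prefixes are assumptions made for this example. The condition and body are passed in as already-generated code so that the block stays self-contained; in a real compiler they would be produced by recursive calls while the statement is being compiled.

```cpp
#include <string>
#include <vector>

using Code = std::vector<std::string>;

// Hands out labels that are unique within one compiled class.
class LabelGenerator {
public:
    std::string next(const std::string& prefix) {
        return prefix + std::to_string(counter_++);
    }
private:
    int counter_ = 0;
};

// Emits VM code for: while (cond) { body }
Code compileWhile(const Code& cond, const Code& body, LabelGenerator& labels) {
    const std::string top = labels.next("WHILE_EXP");
    const std::string end = labels.next("WHILE_END");
    Code out;
    out.push_back("label " + top);
    out.insert(out.end(), cond.begin(), cond.end());
    out.push_back("not");                 // leave the loop when cond is false
    out.push_back("if-goto " + end);
    out.insert(out.end(), body.begin(), body.end());
    out.push_back("goto " + top);
    out.push_back("label " + end);
    return out;
}
```

Because every call draws fresh labels from the same generator, a while loop nested inside another one never reuses the outer loop's labels.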
Unit 11.5 Handling Objects: Low-Level Aspects
A. Handling local and argument variables
local, argument:
- represent local and argument variables
- located on the stack
Implementation:
- Base address: LCL and ARG
- Managed by the VM implementation
B. Handling object and array data
this, that:
- represent object and array data
- located on the heap
Implementation:
- Base address: THIS and THAT
- Set using pointer 0 (this) and pointer 1 (that)
- Managed by VM code
- Object data is accessed via the this segment
- Array data is accessed via the that segment
- Array data is accessed via the that segment
- Before we use these segments, we must first anchor them using pointer
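As an illustration of anchoring, the sketch below emits VM code that reads one array element through the that segment. The function name `readArrayElement` and its string parameters are hypothetical; the emitted VM commands follow the anchoring rule described above.

```cpp
#include <string>
#include <vector>

// Emits VM code that leaves arr[i] on top of the stack.
// arrVar and idxVar are "segment index" strings such as "local 0",
// saying where the array reference and the index variable live.
std::vector<std::string> readArrayElement(const std::string& arrVar,
                                          const std::string& idxVar) {
    return {
        "push " + arrVar,  // base address of the array (on the heap)
        "push " + idxVar,  // offset of the element
        "add",             // address of arr[i]
        "pop pointer 1",   // anchor the that segment at that address
        "push that 0",     // read the element
    };
}
```

Object fields work the same way with `pointer 0` and the this segment: once this is anchored at the object's base address, the i-th field is simply `this i`.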
Unit 11.6 Handling Objects: Construction
A. The caller’s side: compiling new
For declaration: The compiler updates the subroutine’s symbol table, and no code is generated.
For new: The caller assumes that the constructor’s code:
- arranges a memory block to store the new object
- returns its base address to the caller
B. Object construction: the big picture
A constructor typically does two things:
- Arranges the creation of a new object
- Initializes the new object to some initial state
Therefore, the constructor’s code typically needs access to the object’s fields. How it gets that access:
- The constructor’s code can access the object’s data using the this segment
- But first, the constructor’s code must anchor the this segment on the object’s data, using pointer
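A minimal sketch of the code a compiler might emit at the start and end of a constructor. The helper names `constructorPrologue` and `constructorEpilogue` are hypothetical; the sketch assumes the OS routine `Memory.alloc` is used to obtain the memory block, as in the standard Jack setup.

```cpp
#include <string>
#include <vector>

using Code = std::vector<std::string>;

// Code emitted at the start of a constructor of a class with nFields fields.
Code constructorPrologue(const std::string& className, int nFields, int nLocals) {
    return {
        "function " + className + ".new " + std::to_string(nLocals),
        "push constant " + std::to_string(nFields),  // size of the object in words
        "call Memory.alloc 1",                       // returns the block's base address
        "pop pointer 0",                             // anchor this at the new object
    };
}

// Code emitted for "return this" at the end of a constructor.
Code constructorEpilogue() {
    return {
        "push pointer 0",  // the object's base address
        "return",
    };
}
```

Between the prologue and the epilogue, the constructor body can initialize fields with commands such as `pop this 0`. On the caller's side, `let p = Point.new(2, 3)` then reduces to pushing the two arguments, `call Point.new 2`, and popping the returned address into p.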
Unit 11.7 Handling Objects: Manipulation
A. Compiling method calls
The object is always treated as the first, implicit argument.
// OOP language
p1.distance(p2);
p1.getx();
obj.foo(x1, x2);
// |
// V
// Procedural language
distance(p1, p2);
getx(p1);
foo(obj, x1, x2);
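In VM terms, this rewriting means pushing the object first and counting it as an argument. The sketch below assumes a hypothetical helper `compileMethodCall` that receives the already-generated code for each explicit argument.

```cpp
#include <cstddef>
#include <string>
#include <vector>

using Code = std::vector<std::string>;

// Emits VM code for obj.method(arg1, arg2, ...).
// pushObject: the command that pushes the object's base address.
// argCode: already-generated code for each explicit argument expression.
Code compileMethodCall(const std::string& pushObject,
                       const std::string& className,
                       const std::string& methodName,
                       const std::vector<Code>& argCode) {
    Code out;
    out.push_back(pushObject);                     // implicit first argument
    for (const Code& arg : argCode)
        out.insert(out.end(), arg.begin(), arg.end());
    const std::size_t nArgs = argCode.size() + 1;  // +1 for the object itself
    out.push_back("call " + className + "." + methodName + " " +
                  std::to_string(nArgs));
    return out;
}
```

For `p1.distance(p2)` with p1 in `local 0` and p2 in `local 1`, this yields `push local 0`, `push local 1`, `call Point.distance 2`.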
B. Compiling methods
Methods are designed to operate on the current object (this). Therefore, each method’s code needs access to the object’s fields. How it gets that access:
- The method’s code can access the object’s i-th field by accessing this i
- But first, the method’s code must anchor the this segment on the object’s data, using pointer
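A small sketch of the anchoring step at the start of a method. The helper names `methodPrologue` and `pushField` are hypothetical; the emitted commands rely on the fact that the object arrives as argument 0.

```cpp
#include <string>
#include <vector>

using Code = std::vector<std::string>;

// Code emitted at the start of every method.
Code methodPrologue(const std::string& className,
                    const std::string& methodName,
                    int nLocals) {
    return {
        "function " + className + "." + methodName + " " + std::to_string(nLocals),
        "push argument 0",  // the object the method was called on
        "pop pointer 0",    // anchor this at that object
    };
}

// Command that pushes the value of the object's i-th field.
std::string pushField(int i) {
    return "push this " + std::to_string(i);
}
```

After the prologue has run, field accesses inside the method body need no further setup.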
C. Compiling void methods
Unit 11.8 Handling Arrays
Array construction
Code generation:
var Array arr;
: generates no code; only affects the symbol table.
let arr = Array.new(n);
: from the caller’s perspective, handled exactly like object construction.
Array manipulation
Array access
Unit 11.9 Standard Mapping Over the Virtual Machine
A. Files and subroutines mapping
B. Variables mapping
C. Arrays mapping
D. Compiling subroutines
E. Compiling subroutine calls
F. Compiling constants
G. OS classes and subroutines
H. Special OS services
Unit 11.10 Completing the Compiler: Proposed Implementation
A. JackCompiler
B. SymbolTable
C. VMWriter
See the book for API information.
PS: The JackTokenizer and CompilationEngine are very similar to what we did in project 10, so we won’t go over them again.
Unit 11.11 Project 11
Project 11: extend the syntax analyzer into a full-scale compiler.
Here is the code:
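#include <exception>
#include <iostream>
#include <string>
#include "jackcompiler.h"  // assumed header name for the JackCompiler class

using namespace std;

int main(int argc, char** argv) {
    // Expect exactly one argument: a .jack file or a directory of .jack files.
    if (argc != 2) {
        cerr << "Input error!\nUsage: .\\JackCompiler.exe [filename or filepath]" << endl;
        return 1;  // wrong usage: report failure
    }
    try {
        string path(argv[1]);
        JackCompiler jCompiler(path);
        jCompiler.doCompiling();
    } catch (const exception& e) {
        cerr << e.what() << endl;
        return 1;  // compilation failed
    }
    return 0;
}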
✅ main.cpp: very similar to the file of the same name in project 10.
✅ jacktokenizer module: no changes.
✅ compilationengine module: some API changes, but the basic logic is largely the same.
✅ symboltable module: a new module that records the variables and their properties.
✅ vmwriter module: a new module that writes VM code to a file.
Unit 11.12 Perspective
- What would it take to generate code for a more realistically complex language?
  - A type system, inheritance, public fields…
- How difficult would it be to close the gaps between Jack and languages like Java or Python?
  - Supporting switch-case statements, char assignments … is not difficult.
- What is the meaning of compiler optimization?
  - The compiler generates low-level code that is efficient and optimized.