Tidying up some old notes...😪

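Two years ago I worked through almost all of the second half of this course, leaving only a small tail unfinished. I took my notes directly on an iPad back then and meant to clean them up at some point, but I kept putting it off... (my procrastination really is unbeatable 😵). Since I have some free time lately, I might as well finally get those notes in order.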

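Also, because those notes were copied straight from the instructor's slides, they are in English, so this post mixes the slide text with my own commentary, which may be a bit tiring to read (never mind, as long as I'm comfortable writing it 😁). The overall flow still follows the original notes.
Okay, enough rambling.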

Unit 7.1 Program Compilation Preview

A. Two-tier compilation (like Java)

              Java compiler
Java program ---------------> VM code(bytecode in Java)
                                        |
                                        |
                                        V
                                  JVM implementation

The VM code is an abstraction: write once, run anywhere.

B. Jack compilation (Java-like)

              Jack compiler
Jack program ---------------> VM code
                                |
                          ------ ------
                         |             |
                         V             V
                    VM emulator   Hack Computer

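This unit introduces some characteristics of the Jack language. Jack is very similar to Java and is also designed around two-tier compilation: in both, code written in the high-level language is first translated into VM code. In Java the VM code takes the form of bytecode, which executes fairly quickly; in Jack the VM code is text written in a VM language, which is simple and intuitive to read.
In addition, the first two modules of the second half of the course are devoted to building a VM translator.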

Unit 7.2 VM Abstraction: The Stack

A. Stack

⭐Stack operations:

push: add a plate at the stack's top.
pop: remove the top plate.

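The first thing introduced is the abstract data structure used most in the virtual machine: the stack (everyone knows this one inside out 😂). It has two operations, push and pop, which I won't explain here.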

B. Stack arithmetic

Apply a function f on the stack:

Pops the arguments from the stack.
Computes f on the arguments.
Pushes the result onto the stack.

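This part shows how to do arithmetic with a stack, which I also won't go into, and then leads to a point about abstraction and implementation.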

Abstraction/Implementation:

The high-level language is an abstraction.
It can be implemented by a stack machine.
The stack machine is an abstraction.
It can be implemented by … Stay tuned.

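In short, the high-level language abstraction can be implemented by a stack machine, and the stack machine can in turn be implemented by something lower-level (a top-down way of solving the problem).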

C. The Stack machine model

Stack machine, manipulated by:

Arithmetic/Logical commands
Memory segment commands
Branching commands
Function commands

The whole stack machine model consists of these four parts, which correspond to the four kinds of VM commands.

D. Arithmetic/Logical commands

Command	Expression	Return value
add	x + y	integer
sub	x - y	integer
neg	-y	integer
eq	x == y	boolean
gt	x > y	boolean
lt	x < y	boolean
and	x and y	boolean
or	x or y	boolean
not	not y	boolean

Observation: Any arithmetic or logical expression can be expressed and evaluated by applying some sequence of the above on a stack.

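The table lists all the arithmetic and logical commands supported by this course's VM language. Any arithmetic or logical expression can be evaluated on a stack by applying these nine primitive commands in a suitable order.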
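To make the observation concrete, here is a minimal sketch of a stack evaluator in C++. The function name `evalStack` and its token format (a number stands for `push constant n`) are my own assumptions for illustration, not something from the course, and it does no error checking on malformed input.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the course): runs a tiny stack program.
// A numeric token acts like "push constant n"; any other token is one of the
// nine arithmetic/logical commands. true is -1 and false is 0, as on Hack.
int16_t evalStack(const std::string& program) {
    std::vector<int16_t> stack;
    auto pop = [&stack]() {
        int16_t v = stack.back();
        stack.pop_back();
        return v;
    };
    std::istringstream in(program);
    std::string tok;
    while (in >> tok) {
        if (tok == "neg" || tok == "not") {            // unary: pop y, push f(y)
            int16_t y = pop();
            stack.push_back(static_cast<int16_t>(tok == "neg" ? -y : ~y));
        } else if (tok == "add" || tok == "sub" || tok == "eq" || tok == "gt" ||
                   tok == "lt" || tok == "and" || tok == "or") {
            int16_t y = pop();                         // y sits on top of x
            int16_t x = pop();
            int r = 0;
            if (tok == "add") r = x + y;
            else if (tok == "sub") r = x - y;
            else if (tok == "eq") r = (x == y) ? -1 : 0;
            else if (tok == "gt") r = (x > y) ? -1 : 0;
            else if (tok == "lt") r = (x < y) ? -1 : 0;
            else if (tok == "and") r = x & y;
            else r = x | y;
            stack.push_back(static_cast<int16_t>(r));
        } else {
            stack.push_back(static_cast<int16_t>(std::stoi(tok)));
        }
    }
    return stack.back();
}
```

For example, `evalStack("2 3 add 4 sub")` evaluates (2 + 3) - 4 and returns 1.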

Unit 7.3 VM Abstraction: Memory Segments

A. Variable kinds

a. Argument variables <-(mapping)-> argument segment
b. Local variables <-(mapping)-> local segment
c. Static variables <-(mapping)-> static segment

There are three kinds of variables, each mapped onto its own virtual memory segment.

B. Memory segments

8 segments: argument/local/static/constant/this/that/pointer/temp
The VM language syntax: push/pop segment i
❕ Attention: i is a non-negative integer.

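The VM language has two memory access commands, push and pop, with the syntax shown above. Note that all eight memory segments are virtual; they do not physically exist.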

Unit 7.4 VM Implementation: the Stack

A. The VM abstraction

As mentioned earlier, the 8 segments that we have seen are completely imaginary.

B. Pointer manipulation

For example:

D = *p // D becomes 23

can be translated to:

@p
A=M
D=M

This part covers the concept of pointers and the basic operations on them; with some C background it is fairly easy.
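For readers who think more easily in C++ than in Hack assembly, the same dereference can be sketched on a vector that stands in for the RAM. The name `deref` and the RAM model are assumptions made for this illustration only.

```cpp
#include <cstdint>
#include <vector>

// ram models the Hack RAM; p is the address of a pointer variable.
// Mirrors "@p / A=M / D=M": first read the pointer, then read what it points to.
int16_t deref(const std::vector<int16_t>& ram, int p) {
    int16_t address = ram[p];   // A=M : A now holds the value stored at p
    return ram[address];        // D=M : D becomes RAM[A]
}
```

So if RAM[p] holds 257 and RAM[257] holds 23, `deref(ram, p)` yields 23.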

C. Stack machine

Assumptions:

SP stored in RAM[0]
Stack base addr = 256

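Under these assumptions, on the Hack platform SP is the stack pointer (it points at the next free slot just above the top of the stack). It is stored in RAM[0] (physical address 0), and the stack starts at address 256 by default,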

so push constant 17 can be expressed in pseudocode as:

*SP = 17
SP++

Then, translated to Hack assembly:

@17 // D = 17
D=A
@SP // *SP = D
A=M
M=D
@SP // SP++
M=M+1

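This part shows how to translate a push command into the assembly language learned earlier; once this is understood, the same approach extends to all the arithmetic commands.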
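As a sanity check on the pseudocode, here is a small sketch that simulates `push constant` and `add` on a vector standing in for the Hack RAM. The helper names (`makeRam`, `pushConstant`, `addTop`) and the RAM size are assumptions of this sketch; only SP = RAM[0] and the stack base 256 come from the notes.

```cpp
#include <cstdint>
#include <vector>

constexpr int SP = 0;  // RAM[0] holds the stack pointer

// Returns a fresh RAM model with the stack pointer set to the stack base, 256.
std::vector<int16_t> makeRam() {
    std::vector<int16_t> ram(2048, 0);
    ram[SP] = 256;
    return ram;
}

// push constant value:  *SP = value; SP++
void pushConstant(std::vector<int16_t>& ram, int16_t value) {
    ram[ram[SP]] = value;  // @value / D=A / @SP / A=M / M=D
    ram[SP]++;             // @SP / M=M+1
}

// add:  SP--; y = *SP; *(SP-1) = x + y
// (one possible assembly form: @SP / AM=M-1 / D=M / A=A-1 / M=D+M)
void addTop(std::vector<int16_t>& ram) {
    ram[SP]--;
    int16_t y = ram[ram[SP]];
    ram[ram[SP] - 1] = static_cast<int16_t>(ram[ram[SP] - 1] + y);
}
```

After `pushConstant(ram, 17)` on a fresh RAM, RAM[256] is 17 and SP is 257; pushing 5 and calling `addTop` leaves 22 in RAM[256] with SP back at 257.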

D. VM translator perspective

VM code is translated into assembly code by the VM translator.
So, what is the VM translator?
a. A program that translates VM code into machine language.
b. Each VM command generates several assembly commands.

Unit 7.5 VM Implementation: Memory Segments

A. Implementing local

Like SP, LCL is a pointer: it holds the base address of the local segment.
Our Goals:


pop local i  --  VM Translator    -- addr = LCL + i, SP--, *addr = *SP
               |---------------> |
push local i --                   -- addr = LCL + i, *SP = *addr, SP++

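This part asks us to work out how to turn the pop/push commands into assembly code. The block above is the instructor's pseudocode; it can be translated by hand first and then written into the project code. That is the general process. The actual code is all in the project, so I won't expand on it here.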
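Before turning the pseudocode into assembly, it can help to run it on a model of the RAM. The sketch below does that in C++; the function names are mine, and the choice of 300 as the local base in the usage note is arbitrary. It relies on the standard Hack mapping in which SP is RAM[0] and LCL is RAM[1].

```cpp
#include <cstdint>
#include <vector>

constexpr int SP = 0;   // RAM[0]: stack pointer
constexpr int LCL = 1;  // RAM[1]: base address of the local segment

// push local i:  addr = LCL + i; *SP = *addr; SP++
void pushLocal(std::vector<int16_t>& ram, int i) {
    int addr = ram[LCL] + i;
    ram[ram[SP]] = ram[addr];
    ram[SP]++;
}

// pop local i:  addr = LCL + i; SP--; *addr = *SP
void popLocal(std::vector<int16_t>& ram, int i) {
    int addr = ram[LCL] + i;
    ram[SP]--;
    ram[addr] = ram[ram[SP]];
}
```

With SP = 256, LCL = 300 and RAM[302] = 42, `pushLocal(ram, 2)` copies 42 to RAM[256] and moves SP to 257; a following `popLocal(ram, 0)` stores 42 in RAM[300] and moves SP back to 256.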

B. Implementing local, argument, this, that

When translating the high-level code of some method into VM code, the compiler:
a. maps the method’s local and argument variables onto the local and argument segments
b. maps the object fields and the array entries that the method is currently processing onto the this and that segments
So local, argument, this and that are implemented in precisely the same way.

In other words, the argument, this and that segments work exactly like local, so the same translation scheme applies to all four.

C. Memory segment: constant

When translating the high-level code of some method into VM code, the compiler:
c. translates high-level operations involving constants into VM operations involving the constant segment.

push constant i ---> *SP = i, SP++

The constant segment only supports push; there is no pop constant command.

D. Memory segment: static

When translating the high-level code of some method into VM code, the compiler:
d. maps the static variables that the method sees onto the static segment.

The challenge: static variables should be seen by all the methods in a program.
Solution: Store them in some “global space”.

Have the VM translator translate each VM reference static i (in file Foo.vm) into an assembly reference Foo.i.
Following assembly, the Hack assembler will map these references onto RAM[16], RAM[17], …, RAM[255].
Therefore, the entries of the static segment will end up being mapped onto RAM[16], RAM[17], …, RAM[255], in the order in which they appear in the program.

This part explains how the static segment, which stores static variables, is implemented: the virtual static segment ends up mapped onto physical addresses 16-255.
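The naming rule is easy to express in code. Below is a minimal sketch, with hypothetical helper names `staticSymbol` and `pushStatic`, of how a translator might build the `Foo.i` symbol and use it for `push static i`; the assembler, not the translator, later picks the actual RAM address.

```cpp
#include <string>

// Builds the assembly symbol for "static i" inside file <fileName>.vm.
std::string staticSymbol(const std::string& fileName, unsigned index) {
    return fileName + "." + std::to_string(index);
}

// Emits Hack assembly for "push static i"; the assembler later assigns the
// symbol a RAM address starting at 16.
std::string pushStatic(const std::string& fileName, unsigned index) {
    return "@" + staticSymbol(fileName, index) + "\nD=M\n"
           "@SP\nA=M\nM=D\n"   // *SP = D
           "@SP\nM=M+1\n";     // SP++
}
```

Because the file name is part of the symbol, `static 3` in Foo.vm and `static 3` in Bar.vm become the distinct symbols `Foo.3` and `Bar.3`.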

E. Memory segment: temp

When translating the high-level code of some method into VM code, the compiler:
e. Sometimes needs to use some variables for temporary storage.
f. Our VM provides 8 such temporary variables

temp implementation: mapped onto RAM locations 5 to 12.

pop temp i  --  VM Translator     -- addr = 5 + i, SP--, *addr = *SP
              |---------------> |
push temp i --                    -- addr = 5 + i, *SP = *addr, SP++

The temp segment has eight entries, mapped onto addresses 5-12.
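Since the base address is a fixed number, the address of `temp i` can be computed at translation time. A minimal sketch follows; `tempAddress` is a hypothetical helper, and the range check is my own addition rather than something the course requires.

```cpp
#include <stdexcept>

// temp i lives at RAM[5 + i]; only i = 0..7 is valid (RAM[5]..RAM[12]).
int tempAddress(int i) {
    if (i < 0 || i > 7) {
        throw std::out_of_range("temp index must be between 0 and 7");
    }
    return 5 + i;
}
```

One consequence is that a translator could address `temp 3` directly as `@8` instead of adding 5 and 3 in generated assembly, though computing it at run time works just as well.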

F. Memory segment: pointer

When translating the high-level code of some method into VM code, the compiler:
g. generates code that keeps track of the base addresses of the this and that segments using the pointer segment.

A fixed, 2-place segment:

accessing pointer 0 should result in accessing THIS
accessing pointer 1 should result in accessing THAT

push pointer 0/1 --  VM Translator    -- *SP = THIS/THAT, SP++
                   |---------------> |
pop pointer 0/1  --                   -- SP--, THIS/THAT = *SP

THIS holds the base address of the current object, and THAT holds the base address of the current array.
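The pointer segment is really just two aliases, which the following sketch tries to show. `pointerSymbol` and `popPointer` are hypothetical names, and rejecting indexes other than 0 and 1 is a choice made in this sketch.

```cpp
#include <stdexcept>
#include <string>

// Maps a pointer-segment index to the assembly symbol it aliases.
std::string pointerSymbol(int index) {
    if (index == 0) return "THIS";
    if (index == 1) return "THAT";
    throw std::out_of_range("pointer index must be 0 or 1");
}

// pop pointer i:  SP--; THIS/THAT = *SP
std::string popPointer(int index) {
    return "@SP\nAM=M-1\nD=M\n@" + pointerSymbol(index) + "\nM=D\n";
}
```

Note that no base-plus-offset arithmetic is needed here: the command reads or writes the THIS or THAT register itself, not the memory it points to.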

Summary

In this chapter, we need to implement:

Arithmetic/Logical commands: add, sub, neg, eq, gt, lt, and, or, not.
Memory access commands: pop/push segment i.

Unit 7.6 The VM Emulator

Typical uses of the VM Emulator:

Running (compiled) Jack programs
Testing programs (systematically)
Experimenting with VM commands
Observing the VM internals (stack, memory and segments)

This unit mainly introduces how to use the VM Emulator, which works much like the earlier tools.

Unit 7.7 VM Implementation on the Hack Platform

In order to write a VM translator, we must be familiar with:

the source language
the target language
the VM mapping on the target platform

A. Source: VM language
Arithmetic/Logical commands: add, sub, neg, eq, gt, lt, and, or, not
Memory access commands
pop/push segment i

B. Target: symbolic Hack code
A instructions and C instructions.

C. Standard VM mapping on the Hack Platform
VM mapping decisions:

How to map the VM’s data structures using the host hardware platform
How to express the VM’s commands using the host machine language

Standard mapping:

Specifies how to do the mapping in an agreed-upon way
Benefits:
- Compatibility with other software systems
- Standard testing

[Figure: special variables]

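This unit is about what it takes to build a VM Translator. Three things are needed:

familiarity with the source language
familiarity with the target language
familiarity with the address mapping on the target platform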

Unit 7.8 VM Translator: Proposed Implementation

Proposed design:

Parser: parses each VM command into its lexical elements
CodeWriter: writes the assembly code that implements the parsed command
Main: drives the process (VMTranslator)

Main(VMTranslator):
Input: fileName.vm
Output: fileName.asm

Main logic:

Constructs a Parser to handle the input file
Constructs a CodeWriter to handle the output file
Marches through the input file, parsing each line and generating code from it.

This unit mainly presents what the VM Translator does and how it is designed; for the detailed design, see the instructor's lecture or the materials provided.

Unit 7.9 Building the VM Translator, Part I

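This unit covers the tasks for this chapter and how to test them. Once the VM Translator is done, translate the .vm file in each directory of Project 7 into an .asm file, load the translated .asm file and the .tst script in the CPUEmulator, run it, and then compare the generated .out file with the .cmp file.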
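In case you want to double-check an .out file against its .cmp file outside the emulator, a line-by-line comparison is enough. The sketch below is one way to do it; `firstMismatch` is a name I made up, and ignoring a trailing carriage return is my own choice to avoid false alarms from Windows line endings.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Returns the 1-based number of the first line that differs between the two
// streams, or 0 when they match line for line. A trailing '\r' is ignored so
// that Windows and Unix line endings compare equal.
int firstMismatch(std::istream& out, std::istream& cmp) {
    auto readLine = [](std::istream& in, std::string& line) {
        if (!std::getline(in, line)) return false;
        if (!line.empty() && line.back() == '\r') line.pop_back();
        return true;
    };
    std::string a, b;
    for (int lineNo = 1;; ++lineNo) {
        bool hasA = readLine(out, a);
        bool hasB = readLine(cmp, b);
        if (!hasA && !hasB) return 0;               // both ended together
        if (hasA != hasB || a != b) return lineNo;  // length or content differs
    }
}
```

The function takes any `std::istream`, so it can be fed two `std::ifstream` objects for real files or two `std::istringstream` objects when experimenting.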

Unit 7.10 Perspectives

You mentioned that the VM is a rather old idea. How old is it?

1970s - 1980s

How close is our VM to Java’s JVM?

Both languages are stack-based, both use push and pop, and both access memory through virtual memory segments instead of symbolic variables.

About efficiency
In reality, developers of VM translators work very hard to generate low-level code that is as tight and efficient as possible. That is something we have completely ignored so far in nand2tetris part 2.
Also, the stack architecture that we use so carefully in this course is not really a necessary ingredient of a virtual machine.

Unit 7.11 Project

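As mentioned, the task for this chapter is to implement a simplified VM Translator that only needs to support the nine arithmetic commands and the push/pop commands. Following the instructor's design, the VM Translator consists of two parts, Parser and CodeWriter. I won't repeat what each one does; let's go straight to the code.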

main.cpp

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
#include <windows.h>

#include "parser.h"
#include "codewriter.h"

using namespace std;

void parserPath(vector<string>& filenames, string path) {
    bool isDir = true;
    for(int i = path.length() - 1; i >= 0; --i) {
        if(path[i] == '\\') {
            break;
        } else if(path[i] == '.') {
            isDir = false;
        }
    }
    if(isDir) {
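        path.append("\\*");
        WIN32_FIND_DATAA data;
        HANDLE hFind = FindFirstFileA(path.c_str(), &data);
        if(hFind == INVALID_HANDLE_VALUE) {
            return; // directory missing or unreadable: nothing to collect
        }
        do {
            string t(data.cFileName);
            auto it = t.find(".vm");
            if(it != string::npos) {
                filenames.push_back(t.substr(0, it));
            }
        } while(FindNextFileA(hFind, &data));
        FindClose(hFind); // release the search handle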
    } else {
        auto it1 = path.find_last_of("\\");
        auto it2 = path.find_last_of(".");
        filenames.push_back(path.substr(it1 + 1, it2 - it1 - 1));
    }
}

int main(int argc, char** argv) {
    if(argc > 1) {
        string path(argv[1]);
        vector<string> filenames;
        parserPath(filenames, path);
        if(filenames.size() == 1) {
            string newfilepath = path.substr(0, path.find(".vm"));
            newfilepath.append(".asm");
            Parser parser(path);
            CodeWriter codewriter(newfilepath);
            codewriter.setFileName(filenames[0]);
            while(parser.hasMoreCommands()) {
                parser.advance();
                CommandType pcmd = parser.commandType();
                if(pcmd == C_ARITHMETIC) {
                    codewriter.writeArithmetic(parser.arg1());
                } else if(pcmd == C_PUSH || pcmd == C_POP) {
                    codewriter.writePushPop(pcmd, parser.arg2(), parser.arg3());
                }
            }
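        } else {
            // Directory input (several .vm files) is not handled yet.
            cout << "multi-file translation is not supported yet" << endl;
        }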

    } else {
        cout << "please input .vm file path or directory" << endl;
    }

    return 0;
}

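While reading the course book, I found that it requires the VM Translator to handle both a single .vm file and a directory, in which case every .vm file in the directory is translated into one .asm file. That is why the file above already parses file names and directories on Windows. However, after looking at what needs to be tested in this chapter, everything is a single file, so I'll skip the multi-file code for now and be lazy~.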
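For anyone not on Windows, the same file discovery can be sketched portably with `std::filesystem` (this assumes a C++17 compiler). The helper names `isVmFile` and `collectVmFiles` are hypothetical and are not part of the project code above.

```cpp
#include <algorithm>
#include <filesystem>
#include <vector>

namespace fs = std::filesystem;

// True when the path ends in the ".vm" extension.
bool isVmFile(const fs::path& p) {
    return p.extension() == ".vm";
}

// Returns the .vm files to translate: every .vm file inside a directory,
// or the path itself when it names a single .vm file.
std::vector<fs::path> collectVmFiles(const fs::path& input) {
    std::vector<fs::path> files;
    if (fs::is_directory(input)) {
        for (const auto& entry : fs::directory_iterator(input)) {
            if (entry.is_regular_file() && isVmFile(entry.path())) {
                files.push_back(entry.path());
            }
        }
        std::sort(files.begin(), files.end());  // deterministic order
    } else if (isVmFile(input)) {
        files.push_back(input);
    }
    return files;
}
```

Checking the extension with `path::extension()` also avoids matching names that merely contain ".vm" somewhere in the middle, which a plain substring search would accept.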

commandtype.h

#ifndef __COMMANDTYPE_H__
#define __COMMANDTYPE_H__

enum CommandType {
    C_UNKONWN,
    C_ARITHMETIC,
    C_PUSH,
    C_POP,
    C_LABEL,
    C_GOTO,
    C_IF,
    C_FUNCTION,
    C_RETURN,
    C_CALL,
};

#endif

parser.h

#ifndef __PARSER_H__
#define __PARSER_H__

#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

#include "commandtype.h"

using namespace std;

class Parser {
public:
    Parser(const string& filename);
    ~Parser();
    bool hasMoreCommands();
    void advance();
    CommandType commandType();
    string arg1();
    string arg2();
    uint16_t arg3();

private:
    ifstream m_ifs;
    vector<string> m_tokens;
};

#endif

parser.cpp

#include "parser.h"

#include <stdexcept>
#include <iostream>
#include <regex>
#include <sstream>

Parser::Parser(const string& filename) {
    m_ifs.open(filename, ios::in);
    if(!m_ifs) {
        throw runtime_error("Failed to open file: " + filename);
    }
}

Parser::~Parser() {
    m_ifs.close();
}

bool Parser::hasMoreCommands() {
    return !m_ifs.eof();
}

void Parser::advance() {
    if(!hasMoreCommands())
        return;
    
    m_tokens.clear();
    string line;
    while(getline(m_ifs, line)) {
        // Strip a trailing comment, even when no space precedes "//".
        auto pos = line.find("//");
        if(pos != string::npos) line.erase(pos);
        // operator>> skips all whitespace, including leading blanks and '\r'.
        stringstream ss(line);
        string token;
        while(ss >> token) {
            m_tokens.push_back(token);
        }
        if(!m_tokens.empty()) break;
    }
}

CommandType Parser::commandType() {
    CommandType ret = C_UNKONWN;
#define XX(str, type) \
    if(m_tokens[0] == str) \
        ret = type;

    if(m_tokens.size() == 1) {
        XX("add", C_ARITHMETIC);
        XX("sub", C_ARITHMETIC);
        XX("neg", C_ARITHMETIC);
        XX("eq", C_ARITHMETIC);
        XX("gt", C_ARITHMETIC);
        XX("lt", C_ARITHMETIC);
        XX("and", C_ARITHMETIC);
        XX("or", C_ARITHMETIC);
        XX("not", C_ARITHMETIC);

        XX("return", C_RETURN);
    } else if(m_tokens.size() == 2) {
        XX("label", C_LABEL);
        XX("goto", C_GOTO);
        XX("if-goto", C_IF);
    } else if(m_tokens.size() == 3) {
        XX("push", C_PUSH);
        XX("pop", C_POP);
        XX("function", C_FUNCTION);
        XX("call", C_CALL);
    }

#undef XX
    return ret;
}

string Parser::arg1() {
    return m_tokens[0];
}

string Parser::arg2() {
    CommandType cmd = commandType();
    stringstream ss;
    if(cmd == C_PUSH || cmd == C_POP || cmd == C_FUNCTION || cmd == C_CALL) {
        ss << m_tokens[1];
    }
    return ss.str();
}

uint16_t Parser::arg3() {
    CommandType cmd = commandType();
    uint16_t ret = 0;
    if(cmd == C_PUSH || cmd == C_POP || cmd == C_FUNCTION || cmd == C_CALL) {
        ret = stoul(m_tokens[2]);
    }
    return ret;
}

codewriter.h

#ifndef __CODEWRITER_H__
#define __CODEWRITER_H__

#include <cstdint>
#include <fstream>
#include <string>

#include "commandtype.h"

using namespace std;

class CodeWriter {
public:
    CodeWriter(const string& filename);
    ~CodeWriter();
    static void setFileName(const string& filename);
    void writeArithmetic(const string& command);
    void writePushPop(CommandType command, const string& segment, uint16_t index);
    void Close();

private:
    static string transName(const string &segment);

private:
    static string cd_add();
    static string cd_sub();
    static string cd_neg();
    static string cd_eq();
    static string cd_gt();
    static string cd_lt();
    static string cd_and();
    static string cd_or();
    static string cd_not();

    static string cd_push(const string& segment, uint16_t index);
    static string cd_pop(const string& segment, uint16_t index);

private:
    ofstream m_ofs;
    static string m_filename;
};

#endif

codewriter.cpp

#include "codewriter.h"

#include <functional>
#include <iostream>
#include <map>
#include <sstream>
#include <stdexcept>

static uint16_t labelCount = 0;
string CodeWriter::m_filename;

CodeWriter::CodeWriter(const string& filename) {
    m_ofs.open(filename, ios::out);
    if(!m_ofs) {
        throw runtime_error("Failed to open file: " + filename);
    }
}

CodeWriter::~CodeWriter() {
    m_ofs.close();
}

void CodeWriter::setFileName(const string& filename) {
    m_filename = filename;
}

void CodeWriter::writeArithmetic(const string& command) {
    static map<string, function<string()>> s_arithmetics = {
        {"add", cd_add},
        {"sub", cd_sub},
        {"neg", cd_neg},
        {"eq", cd_eq},
        {"gt", cd_gt},
        {"lt", cd_lt},
        {"and", cd_and},
        {"or", cd_or},
        {"not", cd_not},
    };
    m_ofs << s_arithmetics[command]() << '\n';
}

void CodeWriter::writePushPop(CommandType command, const string& segment, uint16_t index) {
    function<string(const string&, int)> cb;
    if(command == C_PUSH) {
        cb = cd_push;
    } else if(command == C_POP) {
        cb = cd_pop;
    }
    m_ofs << cb(segment, index) << '\n';
}

void CodeWriter::Close() {
    m_ofs.close();
}

string CodeWriter::transName(const string &segment) {
    stringstream ss;
    if(segment == "local") ss << "LCL";
    else if(segment == "argument") ss << "ARG";
    else if(segment == "this") ss << "THIS";
    else if(segment == "that") ss << "THAT";

    return ss.str();
}

string CodeWriter::cd_add() {
    return "@SP\nAM=M-1\nD=M\nA=A-1\nM=D+M\n";
}

string CodeWriter::cd_sub() {
    return "@SP\nAM=M-1\nD=M\nA=A-1\nM=M-D\n";
}

string CodeWriter::cd_neg() {
    return "@SP\nA=M-1\nM=-M\n";
}

string CodeWriter::cd_eq() {
    stringstream ss;
    ss << "@SP\nAM=M-1\nD=M\nA=A-1\nD=M-D\nM=-1\n" 
       << "@eqCMPx" << to_string(labelCount) 
       << "\nD;JEQ\n@SP\nA=M-1\nM=0\n(eqCMPx" 
       << to_string(labelCount) + ")\n";
    ++labelCount;
    return ss.str();
}

string CodeWriter::cd_gt() {
    stringstream ss;
    ss << "@SP\nAM=M-1\nD=M\nA=A-1\nD=M-D\nM=-1\n"
       << "@gtCMPx" + to_string(labelCount)
       << "\nD;JGT\n@SP\nA=M-1\nM=0\n(gtCMPx"
       << to_string(labelCount) + ")\n";
    ++labelCount;
    return ss.str();
}

string CodeWriter::cd_lt() {
    stringstream ss;
    ss << "@SP\nAM=M-1\nD=M\nA=A-1\nD=M-D\nM=-1\n"
       << "@ltCMPx" + to_string(labelCount)
       << "\nD;JLT\n@SP\nA=M-1\nM=0\n(ltCMPx"
       << to_string(labelCount) + ")\n";
    ++labelCount;
    return ss.str();
}

string CodeWriter::cd_and() {
    return "@SP\nAM=M-1\nD=M\nA=A-1\nM=D&M\n";
}

string CodeWriter::cd_or() {
    return "@SP\nAM=M-1\nD=M\nA=A-1\nM=D|M\n";
}

string CodeWriter::cd_not() {
    return "@SP\nA=M-1\nM=!M\n";
}

string CodeWriter::cd_push(const string& segment, uint16_t index) {
    stringstream ss;
    if(segment == "local" || segment == "argument" || segment == "this" || segment == "that") {
        ss << '@' << transName(segment) << "\nD=M\n@" << to_string(index)
           << "\nA=D+A\nD=M\n@SP\nA=M\nM=D\n@SP\nM=M+1\n";
    } else if(segment == "constant") {
        ss << '@' << to_string(index) 
           << "\nD=A\n@SP\nA=M\nM=D\n@SP\nM=M+1\n";
    } else if(segment == "static") {
        ss << '@' << m_filename << '.' << to_string(index)
           << "\nD=M\n@SP\nA=M\nM=D\n@SP\nM=M+1\n";
    } else if(segment == "temp") {
        ss << "@5\nD=A\n@" << to_string(index) << "\nA=D+A\n"
           << "D=M\n@SP\nA=M\nM=D\n@SP\nM=M+1\n";
    } else if(segment == "pointer") {
        if(index == 0) ss << "@THIS";
        else ss << "@THAT";
        ss << "\nD=M\n@SP\nA=M\nM=D\n@SP\nM=M+1\n";
    }
    return ss.str();
}

string CodeWriter::cd_pop(const string& segment, uint16_t index) {
    stringstream ss;
    if(segment == "local" || segment == "argument" || segment == "this" || segment == "that") {
        ss << '@' << transName(segment) << "\nD=M\n@" << to_string(index)
           << "\nD=D+A\n@R13\nM=D\n@SP\nAM=M-1\nD=M\n@R13\nA=M\nM=D\n";
    } else if(segment == "static") {
        ss << "@SP\nAM=M-1\nD=M\n@" << m_filename << '.' << to_string(index)
           << "\nM=D\n";
    } else if(segment == "temp") {
        ss << "@5\nD=A\n@" << to_string(index) << "\nD=D+A\n"
           << "@R13\nM=D\n@SP\nAM=M-1\nD=M\n@R13\nA=M\nM=D\n";
    } else if(segment == "pointer") {
        ss << "@SP\nAM=M-1\nD=M\n";
        if(index == 0) ss << "@THIS\n";
        else ss << "@THAT\n";
        ss << "M=D\n";
    }
    return ss.str();
}

Before writing the CodeWriter module, I recommend first writing out by hand the Hack assembly for each VM command, and only then merging it into the actual project code.
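Hand-translating the commands also makes it obvious that eq, gt and lt differ only in the jump mnemonic. One possible refactoring, shown here as a sketch with a hypothetical `compareAsm` helper and its own label scheme, generates all three from a single template:

```cpp
#include <string>

// Emits Hack assembly for eq/gt/lt. `jump` is "JEQ", "JGT" or "JLT";
// `id` must be unique per call so that the generated labels never clash.
std::string compareAsm(const std::string& jump, unsigned id) {
    const std::string label = "CMP" + std::to_string(id);
    // SP--, D = y, D = x - y, then assume true (-1) in the result slot.
    const std::string head = "@SP\nAM=M-1\nD=M\nA=A-1\nD=M-D\nM=-1\n";
    // Skip the overwrite when the comparison holds; otherwise store false (0).
    return head + "@" + label + "\nD;" + jump + "\n"
           "@SP\nA=M-1\nM=0\n"
           "(" + label + ")\n";
}
```

Like the project code, this compares via `x - y`, so it shares the same limitation: the subtraction can overflow 16 bits when the operands have opposite signs and large magnitudes.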

Summary

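To sum up, this chapter implements a VM language translator that turns the Hack platform's VM language into Hack assembly language. For now it only supports the arithmetic commands and push/pop, so it is only half finished; the next chapter fills in the rest.
When doing this project, I suggest starting from the .vm files the instructor provides: first run the VM code in the VMEmulator to see what it actually does, then hand-translate that behavior into the corresponding assembly, and finally merge it into the project.
Also, several commands follow very similar designs, so once one works the others can be modeled on it. That said, if you aren't familiar with this course's assembly language, it will probably still be somewhat hard.
Okay, that's it for now.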

PS: Taking notes in a mix of Chinese and English feels a bit odd; sometimes my train of thought gets interrupted 🤔...