Week Target: use an Assembler to translate Assembly Language into binary code.
Basic Concept:
The “Assembler” is software
First software layer above the hardware
Basic Logic:
Read the next Assembly language command
Break it into the different fields it is composed of
Lookup the binary code for each field
Combine these codes into a single machine language command
Output this machine language command
Example:
1. Load R1, 18    <- a char array, i.e. one string
👇
2. Load | R1 | 18    <- 3 strings
👇
3. Command Table    <- use a table to translate each field into machine language directly
   Load  11001
   R1    01
   18    000010010
   ...   ...
👇
4. Combination (sometimes other bits need to be added)
   Load + R1 + 18 = 1100101000010010
👇
5. Output
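The five steps can be sketched in C++ for the toy `Load R1, 18` command. This is only an illustration: `toyAssemble` and its two tables are hypothetical names, and the bit patterns are the ones from the example, not a real instruction set.

```cpp
#include <bitset>
#include <cassert>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical toy tables that mirror the "Load R1, 18" example.
std::string toyAssemble(const std::string& line) {
    static const std::unordered_map<std::string, std::string> opcodes = {{"Load", "11001"}};
    static const std::unordered_map<std::string, std::string> registers = {{"R1", "01"}};

    // Steps 1-2: break the line into fields (commas are treated as spaces).
    std::string cleaned = line;
    for (char& c : cleaned) {
        if (c == ',') c = ' ';
    }
    std::istringstream fields(cleaned);
    std::string op, reg;
    int value = 0;
    fields >> op >> reg >> value;
    assert(value >= 0 && value < 512);  // the constant must fit in 9 bits

    // Step 3: look up each field. Step 4: concatenate the codes.
    return opcodes.at(op) + registers.at(reg) + std::bitset<9>(value).to_string();
}
```

Calling `toyAssemble("Load R1, 18")` yields the 16-bit string from step 4. An unknown mnemonic makes `at` throw, which is a simple way to surface bad input in a sketch.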
About symbols: the assembler must replace names with addresses, using a symbol-to-address table that works like the Command Table.
Unit 6.2 The Hack Assembly Language: A Translator’s Perspective
Label symbols
Label symbols represent destinations of goto instructions. 3 points:
Used to label destinations of goto commands
Declared by the pseudo-command (xxx), which generates no machine code, so the translator simply skips it.
This directive defines the symbol xxx to refer to the memory location holding the next instruction in the program, like this:
Example:
Symbol  Value
LOOP    4
STOP    18
END     22
So a label symbol is translated as @labelSymbol = @value, where the value is just the number of the next instruction (counting from 0 and not counting label declarations).
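A minimal sketch of how those label values could be computed, assuming the program is already split into clean lines with no comments or blank lines. The function name `collectLabels` is made up for this note.

```cpp
#include <map>
#include <string>
#include <vector>

// A (LABEL) line emits no code, so it maps to the number of the
// instruction that follows it.
std::map<std::string, int> collectLabels(const std::vector<std::string>& lines) {
    std::map<std::string, int> labels;
    int nextAddress = 0;  // number of the next real instruction
    for (const std::string& line : lines) {
        if (line.size() >= 2 && line.front() == '(' && line.back() == ')') {
            labels[line.substr(1, line.size() - 2)] = nextAddress;
        } else {
            ++nextAddress;
        }
    }
    return labels;
}
```

Two labels declared back to back would receive the same value, since neither of them advances the counter.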
Variable symbols
Variable symbols represent memory locations where the programmer wants to maintain values. They are a little harder to translate.
First of all, any symbol xxx appearing in an assembly program which is not pre-defined and is not defined elsewhere using the (xxx) directive is treated as a variable.
Secondly, each variable is assigned a unique memory address, starting at 16 (not an arbitrary number 👀: addresses 0 to 15 already belong to the pre-defined symbols R0 to R15).
So, translate variable symbols in 2 steps:
If you see it for the first time, assign a unique memory address.
Replace the variable symbol with its value. To get this value, we look it up in a Symbol Table.
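The two steps can be sketched as a small allocator. `VariableAllocator` is a hypothetical helper for illustration, not part of the course template.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// The first time an unknown symbol is seen it receives the next free RAM
// address, starting at 16. Later lookups return the same address.
class VariableAllocator {
public:
    std::uint16_t resolve(const std::string& name) {
        auto it = table_.find(name);
        if (it != table_.end()) return it->second;  // already known
        table_[name] = next_;
        return next_++;
    }

private:
    std::unordered_map<std::string, std::uint16_t> table_;
    std::uint16_t next_ = 16;
};
```

In a real assembler the same table would also hold the pre-defined and label symbols, so that a name is only treated as a variable when nothing else has claimed it.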
Now, how do we get the Symbol Table? 3 steps:
Initialization: Add the pre-defined symbols
First pass: Add the label symbols
Second pass: Add the variable symbols
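The initialization step could look like the sketch below. The values are the standard Hack pre-defined symbols; the function name `makePredefinedSymbols` and the use of `std::map` are my own choices for this note.

```cpp
#include <map>
#include <string>

// Builds the initial table with the Hack pre-defined symbols.
std::map<std::string, int> makePredefinedSymbols() {
    std::map<std::string, int> table = {
        {"SP", 0}, {"LCL", 1}, {"ARG", 2}, {"THIS", 3}, {"THAT", 4},
        {"SCREEN", 16384}, {"KBD", 24576},
    };
    for (int i = 0; i <= 15; ++i) {
        table["R" + std::to_string(i)] = i;  // R0..R15 -> 0..15
    }
    return table;
}
```

The first pass then adds labels to this table, and the second pass adds variables from address 16 upward.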
Unit 6.5 Developing a Hack Assembler: Proposed Software Architecture
Reading and parsing commands
Steps:
a. Start reading a file with a given name
b. Move to the next command in the file
c. Get the fields of the current command
Tips: No need to understand the meaning of anything.
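Step c can be sketched with plain `std::string` operations. `cleanLine`, `splitCInstruction` and `CFields` are hypothetical names, and the sketch assumes the C-instruction form dest=comp;jump, where dest and jump are optional.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

struct CFields {
    std::string dest, comp, jump;
};

// Removes a trailing "//" comment and all whitespace from one source line.
std::string cleanLine(std::string line) {
    const auto comment = line.find("//");
    if (comment != std::string::npos) line.erase(comment);
    line.erase(std::remove_if(line.begin(), line.end(),
                              [](unsigned char c) { return std::isspace(c) != 0; }),
               line.end());
    return line;
}

// Splits "dest=comp;jump". Missing dest or jump fields come back empty.
CFields splitCInstruction(const std::string& instruction) {
    CFields f;
    const auto eq = instruction.find('=');
    const auto semi = instruction.find(';');
    const std::size_t compStart = (eq == std::string::npos) ? 0 : eq + 1;
    const std::size_t compEnd = (semi == std::string::npos) ? instruction.size() : semi;
    if (eq != std::string::npos) f.dest = instruction.substr(0, eq);
    f.comp = instruction.substr(compStart, compEnd - compStart);
    if (semi != std::string::npos) f.jump = instruction.substr(semi + 1);
    return f;
}
```

Note that the parser only cuts strings apart. It never needs to know what `D`, `M+1` or `JGT` mean, which is exactly the point of the tip above.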
Converting mnemonics -> code
Use the API provided by the Java/C++/Python string class to parse the instructions.
Tips: No need to worry about how the mnemonic fields were obtained.
Handling symbols
Use the Symbol Table mentioned earlier to translate symbols.
Tips: No need to worry about what these symbols mean.
Overall logic:
Initialization: Parser and Symbol Table
First Pass: Read all commands, only paying attention to labels and updating the symbol table
Second Pass: Restart reading and translating commands
Main Loop:
Get the next Assembly Language Command and parse it
For A-Commands: Translate symbols to binary addresses
For C-Commands: get code for each part and put them together
Output the resulting machine language command
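To see the two passes working together, here is a rough sketch that only resolves symbols and leaves the binary encoding aside. It assumes clean, non-empty lines and leaves out the pre-defined symbols for brevity. `resolveSymbols` is a made-up name.

```cpp
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Returns the program with label declarations removed and every @symbol
// replaced by @address.
std::vector<std::string> resolveSymbols(const std::vector<std::string>& lines) {
    std::map<std::string, int> table;

    // First pass: record label addresses only.
    int address = 0;
    for (const std::string& line : lines) {
        if (line.front() == '(') table[line.substr(1, line.size() - 2)] = address;
        else ++address;
    }

    // Second pass: translate, allocating variables from RAM address 16.
    int nextVariable = 16;
    std::vector<std::string> out;
    for (const std::string& line : lines) {
        if (line.front() == '(') continue;  // labels emit no code
        const bool symbolic = line.front() == '@' && line.size() > 1 &&
                              !std::isdigit(static_cast<unsigned char>(line[1]));
        if (!symbolic) {
            out.push_back(line);
            continue;
        }
        const std::string name = line.substr(1);
        auto it = table.find(name);
        if (it == table.end()) it = table.emplace(name, nextVariable++).first;
        out.push_back("@" + std::to_string(it->second));
    }
    return out;
}
```

The labels have to be collected before any variable is allocated. Otherwise a forward reference such as `@END` would be mistaken for a new variable.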
Unit 6.6 Project 6 Overview (Programming Option)
Proposed design:
Parser
Code
Symbol Table
Main
Proposed Implementation:
Staged development
Develop a basic assembler that translates assembly programs without symbols
Develop an ability to handle symbols
Morph the basic assembler into an assembler that can translate any assembly program
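A stage-one assembler (no symbols) could look roughly like this. To keep it short, the tables contain only the mnemonics needed for a tiny "2 + 3" program, so it is a sketch rather than a complete translator. `assembleBasic` is a hypothetical name.

```cpp
#include <bitset>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Assembles symbol-free Hack code, one output string per instruction.
std::vector<std::string> assembleBasic(const std::vector<std::string>& program) {
    // Partial tables: a full assembler needs every comp/dest/jump mnemonic.
    static const std::map<std::string, std::string> comp = {
        {"0", "0101010"}, {"A", "0110000"}, {"D", "0001100"}, {"D+A", "0000010"}};
    static const std::map<std::string, std::string> dest = {
        {"", "000"}, {"M", "001"}, {"D", "010"}};
    static const std::map<std::string, std::string> jump = {
        {"", "000"}, {"JMP", "111"}};

    std::vector<std::string> out;
    for (const std::string& line : program) {
        if (line[0] == '@') {  // A-instruction: 0 + 15-bit constant
            out.push_back("0" + std::bitset<15>(std::stoul(line.substr(1))).to_string());
            continue;
        }
        // C-instruction: 111 + comp + dest + jump
        const auto eq = line.find('=');
        const auto semi = line.find(';');
        const std::size_t start = (eq == std::string::npos) ? 0 : eq + 1;
        const std::string d = (eq == std::string::npos) ? std::string() : line.substr(0, eq);
        const std::string c = line.substr(
            start, (semi == std::string::npos) ? std::string::npos : semi - start);
        const std::string j = (semi == std::string::npos) ? std::string() : line.substr(semi + 1);
        out.push_back("111" + comp.at(c) + dest.at(d) + jump.at(j));
    }
    return out;
}
```

Once this works on symbol-free programs, symbol handling can be added in front of it without touching the encoding logic.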
Translating *.asm programs manually ✍ is a boring process, so it is better to solve this problem by programming.
Unit 6.7 Project
As mentioned earlier, we can use Java, C++ or Python to implement the assembler. Here I use C++ to complete the project. I was also given some template files for this project, so I only need to complete the methods of these classes. 🎈 Here are the implemented files (they can handle A/C instructions and symbols):
/**
 * Assembler constructor
 */
Assembler::Assembler() {
    // Your code here
}

/**
 * Assembler destructor
 */
Assembler::~Assembler() {
    // Your code here
}
/**
 * Assembler first pass; populates symbol table with label locations.
 * @param symbolTable The symbol table to populate.
 * @param instructions An array of the assembly language instructions.
 * @param numOfInst The number of entries in the instructions array.
 */
void Assembler::buildSymbolTable(SymbolTable* symbolTable, string instructions[], int numOfInst) {
    // Number of the next real instruction; label declarations do not count.
    uint16_t lineNumber = 0;
    for (int i = 0; i < numOfInst; i++) {
        if (instructions[i][0] == '(') {
            // Strip the surrounding parentheses from "(LABEL)".
            string t = instructions[i].substr(1, instructions[i].size() - 2);
            symbolTable->addSymbol(t, lineNumber);
        } else {
            lineNumber++;
        }
    }
}
/**
 * Assembler second pass; Translates a set of instructions to machine code.
 * @param instructions An array of the assembly language instructions to be converted to machine code.
 * @param symbolTable The symbol table to reference/update.
 * @return A string containing the generated machine code as lines of 16-bit binary instructions.
 */
string Assembler::generateMachineCode(SymbolTable* symbolTable, string instructions[], int numOfInst) {
    // Only process A and C instructions
    string ret = "";
    for (int i = 0; i < numOfInst; i++) {
        InstructionType t = parseInstructionType(instructions[i]);
        if (t == A_INSTRUCTION || t == L_INSTRUCTION) {
            ret = ret + '0' + translateSymbol(parseSymbol(instructions[i]), symbolTable);
        } else if (t == C_INSTRUCTION) {
            ret += "111"; // C instructions prefix
            ret = ret + translateComp(parseInstructionComp(instructions[i]));
            ret = ret + translateDest(parseInstructionDest(instructions[i]));
            ret = ret + translateJump(parseInstructionJump(instructions[i]));
        }
    }
    return ret;
}
/**
 * Parses the type of the provided instruction
 * @param instruction The assembly language representation of an instruction.
 * @return The type of the instruction (A_INSTRUCTION, C_INSTRUCTION, L_INSTRUCTION, NULL)
 */
Assembler::InstructionType Assembler::parseInstructionType(string instruction) {
    if (instruction[0] == '@' && ('0' <= instruction[1] && instruction[1] <= '9'))
        return A_INSTRUCTION;
    else if (instruction[0] == '@' && !('0' <= instruction[1] && instruction[1] <= '9'))
        return L_INSTRUCTION;
    else if (instruction[0] == '(')
        return NULL_INSTRUCTION;
    else
        return C_INSTRUCTION;
}
/**
 * Parses the destination of the provided C-instruction
 * @param instruction The assembly language representation of a C-instruction.
 * @return The destination of the instruction (A, D, M, AM, AD, MD, AMD, NULL)
 */
Assembler::InstructionDest Assembler::parseInstructionDest(string instruction) {
    InstructionDest ret = NULL_DEST;
    string::size_type idx = instruction.find("=");
    if (idx == string::npos)
        return ret;
    string dest = instruction.substr(0, idx);
    if (dest == "A") ret = A;
    else if (dest == "D") ret = D;
    else if (dest == "M") ret = M;
    else if (dest == "AM") ret = AM;
    else if (dest == "AD") ret = AD;
    else if (dest == "MD") ret = MD;
    else if (dest == "AMD") ret = AMD;
    return ret;
}
/**
 * Parses the jump condition of the provided C-instruction
 * @param instruction The assembly language representation of a C-instruction.
 * @return The jump condition for the instruction (JLT, JGT, JEQ, JLE, JGE, JNE, JMP, NULL)
 */
Assembler::InstructionJump Assembler::parseInstructionJump(string instruction) {
    // For example, if "JLT" appears in the jump field, return the enum label JLT.
    if (instruction.find("JLT") != string::npos) {
        return JLT;
    } else if (instruction.find("JGT") != string::npos) {
        return JGT;
    } else if (instruction.find("JEQ") != string::npos) {
        return JEQ;
    } else if (instruction.find("JLE") != string::npos) {
        return JLE;
    } else if (instruction.find("JGE") != string::npos) {
        return JGE;
    } else if (instruction.find("JNE") != string::npos) {
        return JNE;
    } else if (instruction.find("JMP") != string::npos) {
        return JMP;
    }
    return NULL_JUMP;
}
/**
 * Parses the computation/op-code of the provided C-instruction
 * @param instruction The assembly language representation of a C-instruction.
 * @return The computation/op-code of the instruction (CONST_0, ... ,D_ADD_A , ... , NULL)
 */
Assembler::InstructionComp Assembler::parseInstructionComp(string instruction) {
    // For example, if "0" appears in the comp field, return CONST_0.
    // Value-initialized so an unrecognized mnemonic does not return an indeterminate value.
    InstructionComp ret{};
    string::size_type idx1 = instruction.find("=");
    string::size_type idx2 = instruction.find(";");
    string comp;
    if (idx1 != string::npos && idx2 != string::npos) {
        // substr takes a length, not an end index.
        comp = instruction.substr(idx1 + 1, idx2 - idx1 - 1);
    } else if (idx1 == string::npos && idx2 != string::npos) {
        comp = instruction.substr(0, idx2);
    } else if (idx1 != string::npos && idx2 == string::npos) {
        comp = instruction.substr(idx1 + 1);
    } else {
        comp = instruction;
    }
    if ("0" == comp) ret = CONST_0;
    else if ("1" == comp) ret = CONST_1;
    else if ("-1" == comp) ret = CONST_NEG_1;
    else if ("A" == comp) ret = VAL_A;
    else if ("M" == comp) ret = VAL_M;
    else if ("D" == comp) ret = VAL_D;
    else if ("!A" == comp) ret = NOT_A;
    else if ("!M" == comp) ret = NOT_M;
    else if ("!D" == comp) ret = NOT_D;
    else if ("-A" == comp) ret = NEG_A;
    else if ("-M" == comp) ret = NEG_M;
    else if ("-D" == comp) ret = NEG_D;
    else if ("A+1" == comp) ret = A_ADD_1;
    else if ("M+1" == comp) ret = M_ADD_1;
    else if ("D+1" == comp) ret = D_ADD_1;
    else if ("A-1" == comp) ret = A_SUB_1;
    else if ("M-1" == comp) ret = M_SUB_1;
    else if ("D-1" == comp) ret = D_SUB_1;
    else if ("D+A" == comp) ret = D_ADD_A;
    else if ("D+M" == comp) ret = D_ADD_M;
    else if ("D-A" == comp) ret = D_SUB_A;
    else if ("D-M" == comp) ret = D_SUB_M;
    else if ("A-D" == comp) ret = A_SUB_D;
    else if ("M-D" == comp) ret = M_SUB_D;
    else if ("D&A" == comp) ret = D_AND_A;
    else if ("D&M" == comp) ret = D_AND_M;
    else if ("D|A" == comp) ret = D_OR_A;
    else if ("D|M" == comp) ret = D_OR_M;
    return ret;
}
/**
 * Parses the symbol of the provided A/L-instruction
 * @param instruction The assembly language representation of a A/L-instruction.
 * @return A string containing either a label name (L-instruction),
 *         a variable name (A-instruction), or a constant integer value (A-instruction)
 */
string Assembler::parseSymbol(string instruction) {
    // Drop the leading '@'.
    return instruction.substr(1);
}
/**
 * Generates the binary bits of the dest part of a C-instruction
 * @param dest The destination of the instruction
 * @return A string containing the 3 binary dest bits that correspond to the given dest value.
 */
string Assembler::translateDest(InstructionDest dest) {
    string ret;
    switch (dest) {
        case A: ret = "100"; break;
        case D: ret = "010"; break;
        case M: ret = "001"; break;
        case AM: ret = "101"; break;
        case AD: ret = "110"; break;
        case MD: ret = "011"; break;
        case AMD: ret = "111"; break;
        case NULL_DEST: ret = "000"; break;
        default: break;
    }
    return ret;
}
/**
 * Generates the binary bits of the jump part of a C-instruction
 * @param jump The jump condition for the instruction
 * @return A string containing the 3 binary jump bits that correspond to the given jump value.
 */
string Assembler::translateJump(InstructionJump jump) {
    string ret;
    switch (jump) {
        case JLT: ret = "100"; break;
        case JGT: ret = "001"; break;
        case JEQ: ret = "010"; break;
        case JLE: ret = "110"; break;
        case JGE: ret = "011"; break;
        case JNE: ret = "101"; break;
        case JMP: ret = "111"; break;
        case NULL_JUMP: ret = "000"; break;
        default: break;
    }
    return ret;
}
/**
 * Generates the binary bits of the computation/op-code part of a C-instruction
 * @param comp The computation/op-code for the instruction
 * @return A string containing the 7 binary computation/op-code bits that correspond to the given comp value.
 */
string Assembler::translateComp(InstructionComp comp) {
    string ret;
    if (CONST_0 == comp) ret = "0101010";
    else if (CONST_1 == comp) ret = "0111111";
    else if (CONST_NEG_1 == comp) ret = "0111010";
    else if (VAL_A == comp) ret = "0110000";
    else if (VAL_M == comp) ret = "1110000";
    else if (VAL_D == comp) ret = "0001100";
    else if (NOT_A == comp) ret = "0110001";
    else if (NOT_M == comp) ret = "1110001";
    else if (NOT_D == comp) ret = "0001101";
    else if (NEG_A == comp) ret = "0110011";
    else if (NEG_M == comp) ret = "1110011";
    else if (NEG_D == comp) ret = "0001111";
    else if (A_ADD_1 == comp) ret = "0110111";
    else if (M_ADD_1 == comp) ret = "1110111";
    else if (D_ADD_1 == comp) ret = "0011111";
    else if (A_SUB_1 == comp) ret = "0110010";
    else if (M_SUB_1 == comp) ret = "1110010";
    else if (D_SUB_1 == comp) ret = "0001110";
    else if (D_ADD_A == comp) ret = "0000010";
    else if (D_ADD_M == comp) ret = "1000010";
    else if (D_SUB_A == comp) ret = "0010011";
    else if (D_SUB_M == comp) ret = "1010011";
    else if (A_SUB_D == comp) ret = "0000111";
    else if (M_SUB_D == comp) ret = "1000111";
    else if (D_AND_A == comp) ret = "0000000";
    else if (D_AND_M == comp) ret = "1000000";
    else if (D_OR_A == comp) ret = "0010101";
    else if (D_OR_M == comp) ret = "1010101";
    return ret;
}
/**
 * Generates the binary bits for an A-instruction, parsing the value, or looking up the symbol name.
 * @param symbol A string containing either a label name, a variable name, or a constant integer value
 * @param symbolTable The symbol table for looking up label/variable names
 * @return A string containing the 15 binary bits that correspond to the given symbol.
 */
string Assembler::translateSymbol(string symbol, SymbolTable* symbolTable) {
    uint16_t n;
    if (!('0' <= symbol[0] && symbol[0] <= '9')) {
        if (symbolTable->getSymbol(symbol) == -1) {
            // First occurrence of a variable: give it the next free address
            // (variableSymbolCount is expected to start at 16).
            n = variableSymbolCount;
            symbolTable->addSymbol(symbol, variableSymbolCount);
            variableSymbolCount++;
        } else {
            n = symbolTable->getSymbol(symbol);
        }
    } else {
        n = stoi(symbol);
    }
    // Collect the low 15 bits, least significant first, then reverse.
    string binNum = "";
    for (int i = 0; i < 15; i++) {
        if (n & 1) binNum += '1';
        else binNum += '0';
        n >>= 1;
    }
    reverse(binNum.begin(), binNum.end());
    return binNum;
}
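As a side note, the shift-and-reverse loop in translateSymbol can also be written with `std::bitset` from the standard library. A small sketch, where `toBinary15` is a made-up helper name:

```cpp
#include <bitset>
#include <cstdint>
#include <string>

// Formats the low 15 bits of a value as a binary string,
// most significant bit first.
std::string toBinary15(std::uint16_t value) {
    return std::bitset<15>(value).to_string();
}
```

Both versions keep only the low 15 bits, so a value above 32767 is silently truncated rather than rejected.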
/**
 * Symbol Table destructor
 */
SymbolTable::~SymbolTable() {}

/**
 * Adds a symbol to the symbol table
 * @param symbol The name of the symbol
 * @param value The address for the symbol
 */
void SymbolTable::addSymbol(string symbol, uint16_t value) {
    hashMap[symbol] = value;
}

/**
 * Gets a symbol from the symbol table
 * @param symbol The name of the symbol
 * @return The address for the symbol or -1 if the symbol isn't in the table
 */
int SymbolTable::getSymbol(string symbol) {
    if (hashMap.find(symbol) != hashMap.end())
        return hashMap[symbol];
    return -1;
}
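The switch statements and if/else chains in the translate functions can also be expressed as lookup tables. Below is a possible sketch for the jump field that maps the mnemonic string straight to its bits. `jumpBits` is a hypothetical free function, not part of the template's class interface.

```cpp
#include <string>
#include <unordered_map>

// Table-driven alternative to a switch or if/else chain: mnemonic -> bits.
// An empty mnemonic stands for "no jump". Unknown mnemonics yield "".
std::string jumpBits(const std::string& mnemonic) {
    static const std::unordered_map<std::string, std::string> table = {
        {"", "000"},    {"JGT", "001"}, {"JEQ", "010"}, {"JGE", "011"},
        {"JLT", "100"}, {"JNE", "101"}, {"JLE", "110"}, {"JMP", "111"},
    };
    const auto it = table.find(mnemonic);
    return it == table.end() ? std::string() : it->second;
}
```

The same pattern works for the dest and comp fields. It keeps the bit patterns in one place and makes a missing entry easy to detect.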
Can you possibly improve the symbolic Hack language without changing the binary code or the machine language which is underlying the symbolic level?
We have 2 layers of expression here: the symbolic level and the binary code. We can take the symbolic level and make it more programmer-friendly. For example, D=M[100] can be translated to @100 followed by D=M.
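Such an extension could live in a small pre-processing step in front of the normal assembler. The sketch below is purely hypothetical: the M[address] syntax is the one from the example above, and `expandMacro` is a made-up name.

```cpp
#include <string>
#include <vector>

// Expands a line containing "M[address]" into two ordinary Hack
// instructions. Any other line is passed through unchanged.
std::vector<std::string> expandMacro(const std::string& line) {
    const auto open = line.find("M[");
    const auto close = line.find(']');
    if (open == std::string::npos || close == std::string::npos || close < open + 2) {
        return {line};
    }
    const std::string address = line.substr(open + 2, close - open - 2);
    // Keep everything around the brackets, leaving a plain "M" behind.
    const std::string rest = line.substr(0, open + 1) + line.substr(close + 1);
    return {"@" + address, rest};
}
```

The expansion overwrites the A register, so the friendlier syntax hides a side effect the programmer still has to keep in mind.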
Will I ever have to use an assembler outside school?
Vvvvvery rarely.
How was the first assembler actually written?
First, someone simply had to compile something by hand. You write an assembler in a high-level language and, for the first time only, translate it by hand into the machine language of your computer. This translation is extremely time-consuming and extremely annoying, but conceptually it only needs to be done once. After that you have a machine language program that runs your compiler, your assembler, or any high-level tool that you want.