Assembly Language Programming Assemble each instruction in a console32 program,
ID: 3863624 • Letter: A
Question
Assembly Language Programming
Assemble each instruction in a console32 program, examine the listing file, and explain how each field of the ModR/M byte and each field in the SIB byte relates to the corresponding part of the assembly language source code.
(a) add 1000 Cebx +4 esi], edx
Explanation / Answer
THE ASSEMBLY PROCESS -------------------- -- a computer understands machine code -- people (and compilers) write assembly language assembly ----------------- machine source --> | assembler | --> code code ----------------- an assembler is a program -- a very deterministic program -- it translates each instruction to its machine code. in the past, there was a one-to-one correspondence between assembly language instructions and machine language instructions. this is no longer the case. Assemblers are now-a-days made more powerful, and can "rework" code. The Pentium (being based on the 8086) has a one-to-one correspondence between assembly language instructions and machine language instructions. ASSEMBLY --------- the assembler's job is to 1. assign addresses 2. generate machine code an assembler will -- assign addresses -- generate machine code -- generate an image of what memory must look like for the program to be executed. a simple assembler will make 2 complete passes over the data (source code) to complete this task. pass 1: create complete SYMBOL TABLE generate machine code for instructions other than branches, jumps, call, lea, etc. (those instructions that rely on an address for their machine code). pass 2: complete machine code for instructions that didn't get finished in pass 1. assembler starts at the top of the source code program, and SCANS. It looks for -- directives (.data .code .stack .486, etc. ) -- instructions IMPORTANT: there are separate memory spaces for data and instructions. the assembler allocates them IN SEQENTIAL ORDER as it scans through the source code program. the starting addresses are fixed -- ANY program will be assembled to have data and instructions that start at the same address. Generating Machine Code for an Instruction ------------------------------------------ This is complex due to the large variety of addressing modes combined with the large number of instructions. Most often, the machine code for an instruction consists of 1) an 8-bit opcode (the choice of opcode will depend somewhat on the addressing modes used for the operands) followed by 2) one or more bytes describing the addressing modes for the operands. EXAMPLE INSTRUCTION: add eax, 24 Find Appendix C. That is where all this machine code stuff is specified. For the add instruction, the table lists: add reg, r/m 03 /r r/m, reg 01 /r r/m, immed 81 /0 id The only one that would match the operand types is the third one in the list: add r/m, immed 81 /0 id So, this is the one we choose. The 81 is the 8-bit opcode. What follows the opcode is information about the addressing mode of the 2 operands (add always has exactly 2 operands and the addressing mode of each must be explicitly specified) Commonly, both operands are described (exactly) by the encoding of a single byte that Intel calls the ModR/M byte. Within the machine code description (81 /0 id), the /0 symbol describes part of this ModR/M byte. /0 is found in the explanations table on page 352. It says the reg field of the ModR/M byte is 000. The ModR/M byte: This byte describes the addressing mode of operands. It is divided up into 3 fields as follow BITS 7 6 5 4 3 2 1 0 mod reg/opcode r/m For this example instruction, bits 5, 4, 3 are set to be 000, giving BITS 7 6 5 4 3 2 1 0 mod reg/opcode r/m 0 0 0 This tells that the second operand is an immediate. The description of the first operand will be done with the mode and r/m fields of the ModR/M byte. Look in the table (page 353) to find register mode, using register EAX (since that is what the example instruction has). Table says that Mod is 11, R/M is 000 giving BITS 7 6 5 4 3 2 1 0 mod reg/opcode r/m 1 1 0 0 0 0 0 0 The last step is to get the id part. From the explanation table (page 352), id is described as 32-bit immediate. Therefore id corresponds to a 32-bit two's complement representation of the value 24. This is 0000 0000 0000 0000 0000 0000 0001 1000 In hex, this is 0x00000018. Putting all this stuff together, we get machine code for the example instruction ( add eax, 24 ) Note that everything is in hexadecimal. AND, immediate values are listed least significant byte first! opcode 81 ModR/M byte c0 immediate 18 00 00 00 Written out left to right: 81 c0 18 00 00 00 One more example of generating machine code. Machine code for the Pentium instruction dec dword ptr [EDX] From page 349, we want the form of the decrement instruction dec r/m ff /1 The opcode is ff, and it describes that there will be one operand, and it is of the general form. The /1 says that the Reg field of the ModR/M byte will be 001. BITS 7 6 5 4 3 2 1 0 Mod Reg/Opcode R/M 0 0 1 The table on page 353 describes the Mod and R/M fields. Find the register direct addressing mode, using register EDX in the table. It gives Mod 00 and R/M 010. BITS 7 6 5 4 3 2 1 0 Mod Reg/Opcode R/M 0 0 0 0 1 0 1 0 In hex, this is 0a. The machine code is now complete: ff 0a. A BIG EXAMPLE: .data a1 dd 4 a2 dd ? a3 dd 5 dup(0) .code main: mov ecx, 20 mov eax, 15 mov edx, 0 jz target_label loop1: add edx, eax imul [ebp + 8] dec ecx jg loop1 target_label: done First step: putting the data section together. Upon scanning the source code, the token .data is read. This is the directive that tells the assembler that what follows gets allocated within the data section of the program. Remember that a directive is a "command" to the assembler about how to assemble the source code. The next token encountered is the label a1. This symbol (label) is not yet in its symbol table, so the assembler assigns an address, and places it in the symbol table. Remember, the assembler assigns the first available address within the data section. Symbol table symbol address --------------------- a1 0040 0000 (I made up this address, 'cuz we need a starting address for the data section.) The next token is dd. It lets the assembler know to allocate one doubleword of space at the current address. The next token is 4. It tells the assembler that the value of the allocated space is to be the value 4. The following line does much the same, placing a2 in the symbol table at the next available address (0x0040 0004) allocating 1 doubleword not putting something specific in the allocated space When finished with the data section, we will have the symbol table symbol address --------------------- a1 0040 0000 a2 0040 0004 a3 0040 0008 0040 000c 0040 0010 0040 0014 0040 0018 0040 001c (the next available address within the data section. NOT PART OF THE TABLE.) and, memory map of data section address contents notes hex 0040 0000 0000 0004 for a1 0040 0004 0000 0000 for a2 (defaults to 0) 0040 0008 0000 0000 5 double words for a3 0040 000c 0000 0000 0040 0010 0000 0000 0040 0014 0000 0000 0040 0018 0000 0000 Upon encounting the .code directive, the assembler knows that the next addresses it assigns will be within the code section of the program (separate from the data). Assume that the code will be assembled such that the first instruction is placed at address 0x0000 0000. The code (repeated): .code main: mov ecx, 20 mov eax, 15 mov eax, 0 jz target_label loop1: add edx, eax imul [ebp + 8] dec ecx jg loop1 target_label: done The first token picked up after the .code directive is the label main. (As with ALL symbols,) the assembler looks to see if this symbol is already in the symbol table. It is not, so the assembler assigns the first available address, and places it in the symbol table. symbol table symbol address --------------------- a1 0040 0000 a2 0040 0004 a3 0040 0008 0040 000c 0040 0010 0040 0014 0040 0018 0040 001c (the next available address within the data section. NOT PART OF THE TABLE.) main 0000 0000 Next, the assembler picks up the token mov. It knows that this is an instruction, and reads the rest of the instruction in order to generate the machine code for this instruction. mov ecx, 20 mov reg, immed b8 + rd No ModR/M byte needed, since the register is incorporated into the opcode byte, and the immediate must follow. rd (from table on page 352) is 1, b8+1=b9 The immediate is 0x00000014. So, the machine code will be b9 14 00 00 00 These 5 bytes are placed at address 0x0000 0000, and the next available address for an instruction becomes 0x0000 0005. The assembler is ready for the next token. It will be the second mov instruction in the program. It knows that this is an instruction, and reads the rest of the instruction in order to generate the machine code for this instruction. mov eax, 15 mov reg, immed b8 + rd No ModR/M byte needed, since the register is incorporated into the opcode byte, and the immediate must follow. rd (from table on page 352) is 0, b8+0=b8 The immediate is 0x0000000f. So, the machine code will be b8 10 00 00 00 These 5 bytes are placed at address 0x0000 0005, and the next available address for an instruction becomes 0x0000 000a. The assembler is ready for the next token. It will be the third mov instruction in the program. It knows that this is an instruction, and reads the rest of the instruction in order to generate the machine code for this instruction. mov edx, 0 mov reg, immed b8 + rd No ModR/M byte needed, since the register is incorporated into the opcode byte, and the immediate must follow. rd (from table on page 352) is 2, b8+2=ba The immediate is 0x00000000. So, the machine code will be ba 00 00 00 00 These 5 bytes are placed at address 0x0000 000a, and the next available address for an instruction becomes 0x0000 000f. The assembler is ready for the next token. It will be the jz instruction in the program. It knows that this is an instruction, and reads the rest of the instruction in order to generate the machine code for this instruction. jz target_label jz rel32 0f 84 "cd" The "cd" is a 32-bit code offset. It needs to be the difference between what the PC will be when executing this code and the address assigned for label target_label. The problem with this is that target_label has not yet been assigned an address. So, the assembler will need to wait on figuring out the 32-bit code offset portion of this instruction until the second pass of the assembler. The assembler does know that this instruction will be exactly 6 bytes long, so it can continue with assembly at location 0x0000 0015. A memory map of text section so far is: memory map of text section address contents 0000 0000 b9 14 00 00 00 0000 0005 b8 0f 00 00 00 0000 000a ba 00 00 00 00 0000 000f 0f 84 ?? ?? ?? ?? 0000 0015 The assembler is ready for the next token. It will be the label loop1. The assembler checks if this symbol is in the symbol table. It is not, so the assembler assigns an address and places the symbol in the table. symbol table symbol address --------------------- a1 0040 0000 a2 0040 0004 a3 0040 0008 0040 000c 0040 0010 0040 0014 0040 0018 0040 001c (the next available address within the data section. NOT PART OF THE TABLE.) main 0000 0000 loop1 0000 0015 The assembler is ready for the next token. It will be the add instruction in the program. It knows that this is an instruction, and reads the rest of the instruction in order to generate the machine code for this instruction. add edx, eax add reg, r/m 03 /r or add r/m, reg 01 /r I doesn't matter which one is chosen. They are the same length. Chose the first one. /r means that the ModR/M byte has both a register operand and a R/M operand. BITS 7 6 5 4 3 2 1 0 Mod Reg/Opcode R/M 1 1 0 1 0 0 0 0 In hex, this is d0. So, the machine code for the instruction is 03 d0. These 2 bytes are placed at address 0x0000 0015. The next available address for code will be 0x0000 0017. A memory map of text section so far is: memory map of text section address contents 0000 0000 b9 14 00 00 00 0000 0005 b8 0f 00 00 00 0000 000a ba 00 00 00 00 0000 000f 0f 84 ?? ?? ?? ?? 0000 0015 03 d0 0000 0017 On to the next instruction. imul [ebp + 8] imul r/m f7 /5 /5 means that the ModR/M byte has a register field of 101 The addressing mode for [ebp + 8] is under disp32[EBP] in the table on page 353. BITS 7 6 5 4 3 2 1 0 Mod Reg/Opcode R/M 1 0 1 0 1 1 0 1 In hex, this is ad. The 32-bit displacement follows the ModR/M byte. It contains a 32-bit 2's complement encoding of the value 8. 0x 00 00 00 08 The machine code for this instruction is f7 ad 08 00 00 00. These 6 bytes are placed at address 0x0000 0017. The next available address for code will be 0x0000 0019. A memory map of text section so far is: memory map of code section address contents 0000 0000 b9 14 00 00 00 0000 0005 b8 10 00 00 00 0000 000a ba 00 00 00 00 0000 000f 0f 84 ?? ?? ?? ?? 0000 0015 03 d0 0000 0017 f7 ad 08 00 00 00 0000 001d The next instruction is easy. dec ecx dec reg 48 + rd rd is 1 for ecx. So the machine code is the single byte 49. memory map of code section address contents 0000 0000 b9 14 00 00 00 0000 0005 b8 10 00 00 00 0000 000a ba 00 00 00 00 0000 000f 0f 84 ?? ?? ?? ?? 0000 0015 03 d0 0000 0017 f7 ad 08 00 00 00 0000 001d 49 0000 001e The decrement instruction is followed by jg loop1 jg rel32 0f 8f "cd" Like the other control instruction: the "cd" is a 32-bit code offset. It needs to be the difference between what the PC will be when executing this code and the address assigned for label target_label. The assembler does know that this instruction will be exactly 6 bytes long. To calculate "cd", at execution time (for taken control instruction): contents of PC + offset field --> PC PC points to instruction after the control instruction when offset is added. at assembly time: byte offset = target addr - ( addr of instruction after conditional control instr addr ) = addr loop1 - (6 + 0x0000 001e) (taken from symbol table) = 0x0000 0015 - 0x0000 0024 Notice that this would be a negative number. That is OK -- generate a 32-bit 2's complement value. 0000 0000 0000 0000 0000 0000 0001 0101 - 0000 0000 0000 0000 0000 0000 0010 0100 ------------------------------------------ becomes 0000 0000 0000 0000 0000 0000 0001 0101 + 1111 1111 1111 1111 1111 1111 1101 1100 ------------------------------------------ 1111 1111 1111 1111 1111 1111 1111 0001 in hex 0x ff ff ff f1 this value is "cd", giving the machine code 0f 8f f1 ff ff ff (Remember that the least significant byte comes first.) Again, memory map of code section (so far) address contents 0000 0000 b9 14 00 00 00 0000 0005 b8 10 00 00 00 0000 000a ba 00 00 00 00 0000 000f 0f 84 ?? ?? ?? ?? 0000 0015 03 d0 0000 0017 f7 ad 80 00 00 00 0000 001d 49 0000 001e 0f 8f f1 ff ff ff 0000 0024 The last thing we'll worry about in the table is the next label: target_addr It gets placed in the symbol table at the next available address, 0x0000 0024. We now have a completed symbol table: symbol address --------------------- a1 0040 0000 a2 0040 0004 a3 0040 0008 0040 000c 0040 0010 0040 0014 0040 0018 0040 001c (the next available address within the data section. NOT PART OF THE TABLE.) main 0000 0000 loop1 0000 0015 target_addr0000 0024 After this first pass of the assembler is done, ALL the labels have been given addresses. During this second pass of the assembler, any remaining code left to be completed is completed. For this example code fragment, that is the jz instruction at address 0x0000 000f. All that remains is the offset calculation. It works just like the calculation for the other control instruction. byte offset = target addr - ( addr of instruction after conditional control instr addr ) = addr target_addr - (6 + 0x0000 000f) (taken from symbol table) = 0x0000 0024 - 0x0000 0015 0000 0000 0000 0000 0000 0000 0010 0100 - 0000 0000 0000 0000 0000 0000 0001 0101 ------------------------------------------ 0000 0000 0000 0000 0000 0000 0000 1111 Notice that this offset is a positive number. This is ok. It corresponds to a branch/jump forward in the code. A negative offset would correspond to a branch/jump backward within the code. It is the offset of 0x 0000000f that gets placed into the machine code. The completed machine code is memory map of text section address contents 0000 0000 b9 14 00 00 00 0000 0005 b8 10 00 00 00 0000 000a ba 00 00 00 00 0000 000f 0f 84 0f 00 00 00 0000 0015 03 d0 0000 0017 f7 ad 80 00 00 00 0000 001d 49 0000 001e 0f 8f f1 ff ff ff 0000 0024
Related Questions
drjack9650@gmail.com
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.