4.2 Assemblers and Tools
Assembler
An Assembler is a crucial component in the compilation process, translating assembly language into machine code. Depending on the target system's Instruction Set Architecture (ISA), different assemblers are utilized. Here are some notable ones:
Microsoft Macro Assembler (MASM):
ISA: Primarily for x86 architecture
Usage: Used with MS-DOS and Microsoft Windows
Syntax: Follows Intel syntax
GNU Assembler (GAS):
ISA: Diverse, used in various architectures
Usage: Default back-end assembler for the GNU Compiler Collection (GCC)
Syntax: Flexible, used with AT&T syntax by default
Netwide Assembler (NASM):
ISA: Primarily for x86 architecture (16-bit, 32-bit, 64-bit)
Usage: Popular assembler for Linux
Syntax: Offers multiple syntax styles, including Intel and NASM
Flat Assembler (FASM):
ISA: Primarily for x86 architecture (IA-32, x86-64)
Usage: Supports various operating systems
Syntax: Follows Intel-style assembly language
These assemblers play a crucial role in the software development process, allowing developers to write code in a human-readable assembly language and then converting it into machine code that can be executed by the target hardware. The choice of assembler depends on the target architecture and the specific requirements of the project. We'll use NASM Assembler in this case.
When a source code file is assembled, the resulting file is called an object file. It is a binary representation of the program.
While the assembly instructions and the machine code have a one-to-one correspondence, and the translation process may be simple, the assembler does some further operations such as assigning memory location to variables and instructions and resolving symbolic names.
Once the assembler has created the object file, a linker is needed in order to create the actual executable file. What a linker does is take one or more object files and combine them to create the executable file.
An example of these object files are the kernel32.dll and user32.dll which are required to create a windows executables that accesses certain libraries.
The process from the assembly code to the executable file can be represented here:
ASM File (Assembly Code):
Begin with the assembly code, typically saved with a
.asm
extension. This file contains human-readable instructions written in assembly language.
Assembler:
The assembler translates the assembly code into machine code, producing one or more object files (
.o
or.obj
). The object files contain the binary representation of the program.
Linker:
The linker combines one or more object files along with any necessary static libraries to generate the final executable file. It resolves symbols, assigns memory locations, and establishes connections between different parts of the program.
Executables:
The end result is the executable file. This file contains the compiled machine code that can be directly executed by the computer's hardware. It encapsulates the entire program and any necessary libraries.
Compiler
The compiler is similar to the assembler. It converts high-level source code (such as C) into low-level code or directly into an object file. Therefore, once the output file is created, the previous process will be executed on the file. The end result is an executable file.
NASM
Netwide Assembler (NASM) is an assembler that we will use. To make things easier, we will use the NASM-X project. It is a collection of macros, includes, and examples to help NASM programmers develop applications.
How to install:
Download NASMX and extract it to C:\nasmx
Add the bin to environment variables
Run setpath.bat
To work with demos:
Comment the following line in C:\nasmx\demos\windemos.inc
Add directly below it the following code
Open C:\nasmx\demos\win32\DEMO1
To use the assembler, we run the following command:
To link the object files, use golink, which is located in the same package (NASMX) when NASM is downloaded:
A prompt will show up if the operations are successful.
ASM Basics
High-level functions such as strcpy() are constructed from multiple ASM instructions combined to execute a specific operation, such as copying two strings.
The most basic assembly instruction is MOV, responsible for transferring data from one memory location to another.
Most instructions in assembly language involve two operands and can be categorized into one of the following classes:
MOV
ADD
CALL
STI
XCMG
SUB
RET
CLI
PUSH
MUL
LCOP
IN
POP
XOR
Jcc
OUT
The following is an example of a simple assembly code that sums 2 numbers:
Intel vs AT&T
Depending on the architectural syntax, instructions and rules may vary. For example, the source and destination operands may be in different positions.
example:
Assembly
MOV EAX, 8
MOVL $8, %EAX
Syntax
<instruction><destination><source>
<instruction><source><destination>
In AT&T syntax, "%" is placed before register names, and "$" is placed before numbers. Another thing to notice is that AT&T adds a suffix to the instruction, which defines the operand size:
Q (quad - 64 bits)
L (long - 32 bits)
W (word - 16 bits)
B (byte - 8 bits)
Push instruction
PUSH stores a value at the top of the stack, causing the stack to be adjusted by -4 bytes (on 32-bit systems): -0x04
Operation: PUSH 0x12345678
| | Before PUSH | After PUSH | | |-|-------------|------------| | | Lower memory address |||| || | 12345678 | <- Top of the Stack | | Top of the Stack -> | 00000001 | 00000001 | | | | 00000001 | 00000001 | | | | 00000001 | 00000001 | | | | 00000001 | 00000001 | | | Higher memory address ||||
Another interesting fact: PUSH 0x123456789
can be analogous to some other operations, for example:
Pop instruction
POP stores a value at the top of the stack, causing the stack to be adjusted by +4 bytes (on 32-bit systems): +0x04
Operation: POP reg
| | Before POP | After POP | | |-|-------------|------------| | | Lower memory address |||| | Top of the Stack -> | 12345678 | || || 00000001 | 00000001 | <- Top of the Stack | | | 00000001 | 00000001 | | | | 00000001 | 00000001 | | | | 00000001 | 00000001 | | | Higher memory address ||||
Another interesting fact: POP EAX
can be analogous to some other operations, for example:
Call instruction
Subroutines are implemented using the CALL and RET instruction pair.
The CALL instruction pushes the current instruction pointer (EIP) to the stack and jumps to the specified function address.
Whenever the function executes the RET instruction, the last element is popped from the stack, and the CPU jumps to the address.
Example of CALL in Assembly:
The following is an example of how to call a procedure. This example begins at proc_2
:
Tools Arsenal
Compilers
There are several options for compiling your C/C++ code. It is important to note that different compilers may result in different outputs. You can use IDEs or the command line.
IDEs:
Visual Studio
Orwell Dev-C++
Code::Blocks
Command line:
MinGW
gcc (example:
gcc -m32 main.c -o main.o
)
Debuggers
A debugger is a program that runs other programs in a way that allows us to exercise control over the program itself. In our specific case, the debugger will help us write exploits, analyze programs, reverse engineer binaries, and much more.
As we will see, the debugger allows us to:
Stop the program while it is running
Analyze the stack and its data
Inspect registers
Change the program or program variables and more
There are several options for debuggers:
IDA (Windows, Linux, MacOS)
GDB (Unix, Windows)
X64DBG (Windows)
EDB (Linux)
WinDBG (Windows)
OllyDBG (Windows)
Hopper (MacOS, Linux)
Decompiling
To be a successful pentester, you need the knowledge to reverse a compiled application.
You can use objdump.exe
that is bundled with gcc
to decompile a compiled application.
Example: objdump -d -Mintel main.exe > disasm.txt