4.2 Assemblers and Tools

Assembler

An Assembler is a crucial component in the compilation process, translating assembly language into machine code. Depending on the target system's Instruction Set Architecture (ISA), different assemblers are utilized. Here are some notable ones:

  1. Microsoft Macro Assembler (MASM):

    • ISA: Primarily for x86 architecture

    • Usage: Used with MS-DOS and Microsoft Windows

    • Syntax: Follows Intel syntax

  2. GNU Assembler (GAS):

    • ISA: Diverse, used in various architectures

    • Usage: Default back-end assembler for the GNU Compiler Collection (GCC)

    • Syntax: Flexible, used with AT&T syntax by default

  3. Netwide Assembler (NASM):

    • ISA: Primarily for x86 architecture (16-bit, 32-bit, 64-bit)

    • Usage: Popular assembler for Linux

    • Syntax: Offers multiple syntax styles, including Intel and NASM

  4. Flat Assembler (FASM):

    • ISA: Primarily for x86 architecture (IA-32, x86-64)

    • Usage: Supports various operating systems

    • Syntax: Follows Intel-style assembly language

These assemblers play a crucial role in the software development process, allowing developers to write code in a human-readable assembly language and then converting it into machine code that can be executed by the target hardware. The choice of assembler depends on the target architecture and the specific requirements of the project. We'll use NASM Assembler in this case.

When a source code file is assembled, the resulting file is called an object file. It is a binary representation of the program.

While the assembly instructions and the machine code have a one-to-one correspondence, and the translation process may be simple, the assembler does some further operations such as assigning memory location to variables and instructions and resolving symbolic names.

Once the assembler has created the object file, a linker is needed in order to create the actual executable file. What a linker does is take one or more object files and combine them to create the executable file.

An example of these object files are the kernel32.dll and user32.dll which are required to create a windows executables that accesses certain libraries.

The process from the assembly code to the executable file can be represented here:

  1. ASM File (Assembly Code):

    • Begin with the assembly code, typically saved with a .asm extension. This file contains human-readable instructions written in assembly language.

  2. Assembler:

    • The assembler translates the assembly code into machine code, producing one or more object files (.o or .obj). The object files contain the binary representation of the program.

  3. Linker:

    • The linker combines one or more object files along with any necessary static libraries to generate the final executable file. It resolves symbols, assigns memory locations, and establishes connections between different parts of the program.

  4. Executables:

    • The end result is the executable file. This file contains the compiled machine code that can be directly executed by the computer's hardware. It encapsulates the entire program and any necessary libraries.

Compiler

The compiler is similar to the assembler. It converts high-level source code (such as C) into low-level code or directly into an object file. Therefore, once the output file is created, the previous process will be executed on the file. The end result is an executable file.

NASM

Netwide Assembler (NASM) is an assembler that we will use. To make things easier, we will use the NASM-X project. It is a collection of macros, includes, and examples to help NASM programmers develop applications.

How to install:

  1. Download NASMX and extract it to C:\nasmx

  2. Add the bin to environment variables

  3. Run setpath.bat

To work with demos:

  • Comment the following line in C:\nasmx\demos\windemos.inc

    %include 'nasm.inc'
  • Add directly below it the following code

    %include 'C:\nasmx\inc\nasmx.inc'
  • Open C:\nasmx\demos\win32\DEMO1

To use the assembler, we run the following command:

nasm -f win32 demo1.asm -o demo1.obj

To link the object files, use golink, which is located in the same package (NASMX) when NASM is downloaded:

golink /entry _main demo1.obj kernel32.dll user32.dll

A prompt will show up if the operations are successful.

ASM Basics

High-level functions such as strcpy() are constructed from multiple ASM instructions combined to execute a specific operation, such as copying two strings.

The most basic assembly instruction is MOV, responsible for transferring data from one memory location to another.

Most instructions in assembly language involve two operands and can be categorized into one of the following classes:

Data Transfer
Arithmetic
Control Flow
Other

MOV

ADD

CALL

STI

XCMG

SUB

RET

CLI

PUSH

MUL

LCOP

IN

POP

XOR

Jcc

OUT

The following is an example of a simple assembly code that sums 2 numbers:

MOV EAX,2   ; store 2 in EAX
MOV EBX,5   ; store 5 in EBX
ADD EAX,EBX ; do EAX = EAX + EBX  operations
            : now EAX contains the result

Intel vs AT&T

Depending on the architectural syntax, instructions and rules may vary. For example, the source and destination operands may be in different positions.

example:

Intel (Windows)
AT&T(Linux)

Assembly

MOV EAX, 8

MOVL $8, %EAX

Syntax

<instruction><destination><source>

<instruction><source><destination>

In AT&T syntax, "%" is placed before register names, and "$" is placed before numbers. Another thing to notice is that AT&T adds a suffix to the instruction, which defines the operand size:

  • Q (quad - 64 bits)

  • L (long - 32 bits)

  • W (word - 16 bits)

  • B (byte - 8 bits)

Push instruction

PUSH stores a value at the top of the stack, causing the stack to be adjusted by -4 bytes (on 32-bit systems): -0x04

Operation: PUSH 0x12345678

| | Before PUSH | After PUSH | | |-|-------------|------------| | | Lower memory address |||| || | 12345678 | <- Top of the Stack | | Top of the Stack -> | 00000001 | 00000001 | | | | 00000001 | 00000001 | | | | 00000001 | 00000001 | | | | 00000001 | 00000001 | | | Higher memory address ||||

Another interesting fact: PUSH 0x123456789 can be analogous to some other operations, for example:

SUB ESP, 4            ; subtract 4 from ESP -> ESP=ESP-4
MOV [ESP], 0X12345678 ; store the value 0x12345678 to the location
                      ; pointed by ESP. Square brackets indicate the
                      ; address pointed to by the registers

Pop instruction

POP stores a value at the top of the stack, causing the stack to be adjusted by +4 bytes (on 32-bit systems): +0x04

Operation: POP reg

| | Before POP | After POP | | |-|-------------|------------| | | Lower memory address |||| | Top of the Stack -> | 12345678 | || || 00000001 | 00000001 | <- Top of the Stack | | | 00000001 | 00000001 | | | | 00000001 | 00000001 | | | | 00000001 | 00000001 | | | Higher memory address ||||

Another interesting fact: POP EAX can be analogous to some other operations, for example:

MOV EAX, [ESP]        ; store the value pointed to by ESP to EAX
                      ; the value at the top of the stack
ADD ESP, 4            ; add 4 to ESP - adjust the top of the stack

Call instruction

Subroutines are implemented using the CALL and RET instruction pair.

The CALL instruction pushes the current instruction pointer (EIP) to the stack and jumps to the specified function address.

Whenever the function executes the RET instruction, the last element is popped from the stack, and the CPU jumps to the address.

Example of CALL in Assembly:

MOV EAX, 1      ; store 1 in EAX
MOV EBX, 2      ; store 2 in EBX
CALL ADD_sub    ; call the subroutine named 'ADD_SUB'
INC EAX         ; Increment EAX: now EAX holds "4"
                ; 2(EBX)+1(EAX)+1(INC)
JMP end_sample
ADD_sub:
ADD EAX, EBX
RETN            ; Function completed
                ; so return back to the caller function
end_sample:

The following is an example of how to call a procedure. This example begins at proc_2:

proc proc_1
     Locals none
     MOV ECX, [EBP+8] ; ebp+8 is the function argument
     PUSH ECX
     POP ECX
end_proc
proc proc_2
     Locals none
     invoke proc_1, 5 ; invoke proc_1 proc
endproc

Tools Arsenal

Compilers

There are several options for compiling your C/C++ code. It is important to note that different compilers may result in different outputs. You can use IDEs or the command line.

IDEs:

  • Visual Studio

  • Orwell Dev-C++

  • Code::Blocks

Command line:

  • MinGW

  • gcc (example: gcc -m32 main.c -o main.o)

Debuggers

A debugger is a program that runs other programs in a way that allows us to exercise control over the program itself. In our specific case, the debugger will help us write exploits, analyze programs, reverse engineer binaries, and much more.

As we will see, the debugger allows us to:

  • Stop the program while it is running

  • Analyze the stack and its data

  • Inspect registers

  • Change the program or program variables and more

There are several options for debuggers:

Decompiling

To be a successful pentester, you need the knowledge to reverse a compiled application.

You can use objdump.exe that is bundled with gcc to decompile a compiled application.

Example: objdump -d -Mintel main.exe > disasm.txt

Other Resources