# 4.2 Assemblers and Tools

## Assembler

An **assembler** is a crucial component in the compilation process: it translates assembly language into machine code. Different assemblers are used depending on the target system's Instruction Set Architecture (ISA). Here are some notable ones:

1. **Microsoft Macro Assembler (MASM):**
   * **ISA:** Primarily for x86 architecture
   * **Usage:** Used with MS-DOS and Microsoft Windows
   * **Syntax:** Follows Intel syntax
2. **GNU Assembler (GAS):**
   * **ISA:** Diverse, used in various architectures
   * **Usage:** Default back-end assembler for the GNU Compiler Collection (GCC)
   * **Syntax:** Flexible, used with AT\&T syntax by default
3. **Netwide Assembler (NASM):**
   * **ISA:** Primarily for x86 architecture (16-bit, 32-bit, 64-bit)
   * **Usage:** Popular cross-platform assembler, widely used on Linux and Windows
   * **Syntax:** Intel-style syntax (NASM's own variant)
4. **Flat Assembler (FASM):**
   * **ISA:** Primarily for x86 architecture (IA-32, x86-64)
   * **Usage:** Supports various operating systems
   * **Syntax:** Follows Intel-style assembly language

These assemblers let developers write code in human-readable assembly language and convert it into machine code that the target hardware can execute. The choice of assembler depends on the target architecture and the specific requirements of the project. We'll use **NASM** in these notes.

When a source code file is assembled, the resulting file is called an object file. It is a binary representation of the program.

Although assembly instructions and machine code have a one-to-one correspondence, which keeps the translation itself simple, the assembler also performs further operations, such as assigning memory locations to variables and instructions and resolving symbolic names.
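To make "assigning memory locations and resolving symbolic names" more concrete, here is a minimal Python sketch of a toy two-pass assembler. The fixed 4-byte instruction size, the `assemble` function, and the `SOURCE` listing are all assumptions made for illustration; a real assembler such as NASM encodes variable-length instructions and does much more.

```python
# Toy two-pass "assembler": assumes every instruction occupies 4 bytes
# (a simplification; real x86 instructions vary in length).
INSTRUCTION_SIZE = 4

def assemble(lines, base_address=0):
    """Return (symbol_table, resolved_lines) for a list of source lines."""
    symbols = {}
    instructions = []
    address = base_address

    # Pass 1: assign an address to every label and instruction.
    for line in lines:
        line = line.split(";")[0].strip()   # drop comments and whitespace
        if not line:
            continue
        if line.endswith(":"):              # a label names the next address
            symbols[line[:-1]] = address
        else:
            instructions.append((address, line))
            address += INSTRUCTION_SIZE

    # Pass 2: replace symbolic names with the addresses found in pass 1.
    resolved = []
    for address, text in instructions:
        mnemonic, _, operands = text.partition(" ")
        parts = [p.strip() for p in operands.split(",")] if operands else []
        parts = [hex(symbols[p]) if p in symbols else p for p in parts]
        resolved.append((address, f"{mnemonic} {', '.join(parts)}".strip()))
    return symbols, resolved


SOURCE = [
    "start:",
    "MOV EAX, 1      ; first instruction, address 0",
    "JMP done        ; forward reference to a label",
    "ADD EAX, 2",
    "done:",
    "RET",
]

symbols, resolved = assemble(SOURCE)
for address, text in resolved:
    print(f"{address:#06x}  {text}")
```

The first pass is what allows the forward reference `JMP done` to work: the label's address is only known after the whole listing has been scanned once.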

Once the assembler has created the object file, a linker is needed to create the actual executable file. A linker takes one or more object files and combines them into the executable.

On Windows, for example, the linker is also given libraries such as kernel32.dll and user32.dll, which are needed to create executables that call functions exported by those libraries.

The process from the assembly code to the executable file can be represented here:

1. **ASM File (Assembly Code):**
   * Begin with the assembly code, typically saved with a `.asm` extension. This file contains human-readable instructions written in assembly language.
2. **Assembler:**
   * The assembler translates the assembly code into machine code, producing one or more object files (`.o` or `.obj`). The object files contain the binary representation of the program.
3. **Linker:**
   * The linker combines one or more object files along with any necessary static libraries to generate the final executable file. It resolves symbols, assigns memory locations, and establishes connections between different parts of the program.
4. **Executables:**
   * The end result is the executable file. This file contains the compiled machine code that can be directly executed by the computer's hardware. It encapsulates the entire program and any necessary libraries.

## Compiler

The compiler plays a role similar to the assembler's, one level up. It converts high-level source code (such as C) into assembly code or directly into an object file. From that point on, the process described above applies: the output is assembled if necessary and then linked, and the end result is an executable file.

## NASM

**Netwide Assembler (NASM)** is the assembler we will use. To make things easier, we will also use the NASMX project, a collection of macros, includes, and examples that helps NASM programmers develop applications.

**How to install:**

1. Download NASMX and extract it to C:\nasmx
2. Add its `bin` directory to the `PATH` environment variable
3. Run setpath.bat

**To work with demos:**

* Comment out the following line in C:\nasmx\demos\windemos.inc

  ```nasm
  %include 'nasm.inc'
  ```
* Add the following line directly below it

  ```nasm
  %include 'C:\nasmx\inc\nasmx.inc'
  ```
* Open C:\nasmx\demos\win32\DEMO1

**To use the assembler, we run the following command:**

```bash
nasm -f win32 demo1.asm -o demo1.obj
```

**To link the object file, use GoLink, which ships in the same NASMX package:**

```bash
golink /entry _main demo1.obj kernel32.dll user32.dll
```

A prompt will show up if the operations are successful.
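The two commands above can be wrapped in a small build script. The following Python sketch only reuses the `nasm` and `golink` arguments shown above; the helper names `build_commands` and `build` are hypothetical, and it assumes both tools are reachable through `PATH`.

```python
import subprocess

def build_commands(name, libs=("kernel32.dll", "user32.dll")):
    """Return the assemble and link command lines for <name>.asm."""
    obj = f"{name}.obj"
    assemble = ["nasm", "-f", "win32", f"{name}.asm", "-o", obj]
    link = ["golink", "/entry", "_main", obj, *libs]
    return assemble, link

def build(name):
    """Run both steps in order, stopping at the first failure."""
    for command in build_commands(name):
        # check=True raises CalledProcessError if the tool reports an error
        subprocess.run(command, check=True)

if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to try.
    for command in build_commands("demo1"):
        print(" ".join(command))
```

Calling `build("demo1")` from the demo directory would run the assembler first and the linker second, mirroring the manual steps.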

## ASM Basics

High-level functions such as strcpy() are built from multiple ASM instructions that together perform a specific operation, such as copying one string to another.

The most basic assembly instruction is MOV, which copies data from a source operand (a register, a memory location, or an immediate value) to a destination operand (a register or a memory location). A single MOV cannot copy directly from one memory location to another.

Many instructions take two operands, and instructions can be grouped into the following classes:

| Data Transfer | Arithmetic | Control Flow | Other |
| ------------- | ---------- | ------------ | ----- |
| MOV           | ADD        | CALL         | STI   |
| XCHG          | SUB        | RET          | CLI   |
| PUSH          | MUL        | LOOP         | IN    |
| POP           | XOR        | Jcc          | OUT   |

The following is a simple assembly snippet that adds two numbers:

```nasm
MOV EAX,2   ; store 2 in EAX
MOV EBX,5   ; store 5 in EBX
ADD EAX,EBX ; EAX = EAX + EBX
            ; now EAX contains the result (7)
```
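As a rough mental model, the same three instructions can be mimicked in Python. This is only a sketch: the `regs` dictionary and the `mov` / `add` helpers are hypothetical stand-ins for the CPU registers and instructions, with values wrapped to 32 bits as in EAX and EBX.

```python
MASK32 = 0xFFFFFFFF            # registers such as EAX hold 32 bits
regs = {"EAX": 0, "EBX": 0}

def mov(dst, value):
    """MOV dst, imm: copy an immediate value into a register."""
    regs[dst] = value & MASK32

def add(dst, src):
    """ADD dst, src: dst = dst + src, wrapped to 32 bits."""
    regs[dst] = (regs[dst] + regs[src]) & MASK32

mov("EAX", 2)        # MOV EAX,2
mov("EBX", 5)        # MOV EBX,5
add("EAX", "EBX")    # ADD EAX,EBX
print(regs["EAX"])   # 7
```

Note that the source register EBX keeps its value; only the destination changes.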

### Intel vs AT\&T

Depending on the assembly syntax, instruction notation and rules may vary. For example, the source and destination operands appear in opposite positions.

Example:

|          | Intel (Windows)                            | AT\&T (Linux)                              |
| -------- | ------------------------------------------ | ------------------------------------------ |
| Assembly | MOV EAX, 8                                 | MOVL $8, %EAX                              |
| Syntax   | *\<instruction> \<destination>, \<source>* | *\<instruction> \<source>, \<destination>* |

In AT\&T syntax, "%" is placed before register names, and "$" is placed before immediate values (numbers). AT\&T also adds a suffix to the instruction mnemonic that defines the operand size:

* **Q** (**quad** - 64 bits)
* **L** (**long** - 32 bits)
* **W** (**word** - 16 bits)
* **B** (**byte** - 8 bits)
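The rules above (swapped operand order, `%` and `$` prefixes, size suffix) can be sketched as a small Python converter. The function `intel_to_att` is hypothetical and deliberately limited: it assumes a two-operand instruction whose operands are either one of the listed 32-bit registers or an immediate number, and it does not handle memory operands.

```python
SUFFIX = {64: "q", 32: "l", 16: "w", 8: "b"}
REGISTERS = {"EAX", "EBX", "ECX", "EDX", "ESI", "EDI", "EBP", "ESP"}

def intel_to_att(instruction, bits=32):
    """Convert a simple 'OP dst, src' Intel line to AT&T notation."""
    mnemonic, operands = instruction.split(None, 1)
    dst, src = [op.strip() for op in operands.split(",")]

    def decorate(operand):
        if operand.upper() in REGISTERS:
            return "%" + operand.lower()   # registers get a % prefix
        return "$" + operand               # assumed to be an immediate value

    # AT&T order is source first, destination second.
    return f"{mnemonic.lower()}{SUFFIX[bits]} {decorate(src)}, {decorate(dst)}"

print(intel_to_att("MOV EAX, 8"))     # movl $8, %eax
print(intel_to_att("ADD EAX, EBX"))   # addl %ebx, %eax
```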

### Push instruction

PUSH stores a value on top of the stack. Because the stack grows toward lower memory addresses, ESP is decreased by 4 bytes on 32-bit systems (-0x04).

**Operation:** `PUSH 0x12345678`

|                       | Before PUSH | After PUSH |                     |
| --------------------- | ----------- | ---------- | ------------------- |
| Lower memory address  |             |            |                     |
|                       |             | 12345678   | <- Top of the Stack |
| Top of the Stack ->   | 00000001    | 00000001   |                     |
|                       | 00000001    | 00000001   |                     |
|                       | 00000001    | 00000001   |                     |
|                       | 00000001    | 00000001   |                     |
| Higher memory address |             |            |                     |

Another interesting fact: `PUSH 0x12345678` is equivalent to the following pair of operations:

```nasm
SUB ESP, 4                  ; subtract 4 from ESP -> ESP=ESP-4
MOV DWORD [ESP], 0x12345678 ; store the value 0x12345678 at the location
                            ; pointed to by ESP. Square brackets indicate the
                            ; address pointed to by the register
```
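The same two steps can be modelled in Python to check the bookkeeping. This is a sketch only: `memory` is a plain dictionary standing in for RAM, and the initial `esp` value is a hypothetical address.

```python
memory = {}            # address -> 32-bit value
esp = 0x0012FF00       # hypothetical initial stack pointer

def push(value):
    """PUSH: first decrement ESP by 4, then store the value at [ESP]."""
    global esp
    esp -= 4                            # SUB ESP, 4
    memory[esp] = value & 0xFFFFFFFF    # MOV DWORD [ESP], value

push(0x00000001)
before = esp                  # top of the stack before the second PUSH
push(0x12345678)

print(hex(before - esp))      # 0x4
print(hex(memory[esp]))       # 0x12345678
```

The previous top of the stack is untouched and now sits 4 bytes above the new one, matching the table above.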

### Pop instruction

POP removes the value at the top of the stack and stores it in the operand, so ESP is increased by 4 bytes on 32-bit systems (+0x04).

**Operation:** `POP reg`

|                       | Before POP | After POP |                     |
| --------------------- | ---------- | --------- | ------------------- |
| Lower memory address  |            |           |                     |
| Top of the Stack ->   | 12345678   |           |                     |
|                       | 00000001   | 00000001  | <- Top of the Stack |
|                       | 00000001   | 00000001  |                     |
|                       | 00000001   | 00000001  |                     |
|                       | 00000001   | 00000001  |                     |
| Higher memory address |            |           |                     |

Another interesting fact: `POP EAX` is equivalent to the following pair of operations:

```nasm
MOV EAX, [ESP]        ; store the value pointed to by ESP to EAX
                      ; the value at the top of the stack
ADD ESP, 4            ; add 4 to ESP - adjust the top of the stack
```
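A minimal Python model of the same sequence is shown below. The addresses and the `regs` / `memory` dictionaries are hypothetical stand-ins used only to illustrate the order of the two steps.

```python
esp = 0x0012FEF8                       # hypothetical stack pointer
memory = {0x0012FEF8: 0x12345678,      # value at the top of the stack
          0x0012FEFC: 0x00000001}
regs = {"EAX": 0}

def pop(reg):
    """POP: read the value at [ESP] into reg, then increment ESP by 4."""
    global esp
    regs[reg] = memory[esp]    # MOV reg, [ESP]
    esp += 4                   # ADD ESP, 4

pop("EAX")
print(hex(regs["EAX"]))   # 0x12345678
print(hex(esp))           # 0x12fefc
```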

### Call instruction

Subroutines are implemented using the **CALL** and **RET** instruction pair.

The CALL instruction pushes the return address (the address of the instruction that follows the CALL, taken from EIP) onto the stack and jumps to the specified function address.

When the function executes the RET instruction, that return address is popped from the top of the stack into EIP, and execution resumes in the caller.

**Example of CALL in Assembly:**

```nasm
MOV EAX, 1      ; store 1 in EAX
MOV EBX, 2      ; store 2 in EBX
CALL ADD_sub    ; call the subroutine named 'ADD_sub'
INC EAX         ; Increment EAX: now EAX holds "4"
                ; 2(EBX)+1(EAX)+1(INC)
JMP end_sample
ADD_sub:
ADD EAX, EBX
RETN            ; Function completed
                ; so return back to the caller function
end_sample:
```
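The control flow of this sample can be traced with a small Python interpreter. It is a sketch under simplifying assumptions: each list index stands in for an instruction address, and the `run` function and `program` encoding are hypothetical.

```python
# Each list index stands in for an instruction address (an assumption made
# to keep the model small; real addresses depend on instruction lengths).
program = [
    ("MOV", "EAX", 1),        # 0
    ("MOV", "EBX", 2),        # 1
    ("CALL", "ADD_sub"),      # 2
    ("INC", "EAX"),           # 3  <- return address pushed by CALL
    ("JMP", "end_sample"),    # 4
    ("ADD", "EAX", "EBX"),    # 5  ADD_sub:
    ("RET",),                 # 6
]
labels = {"ADD_sub": 5, "end_sample": 7}

def run(program, labels):
    regs, stack, eip = {"EAX": 0, "EBX": 0}, [], 0
    trace = []                            # order in which addresses execute
    while eip < len(program):
        op, *args = program[eip]
        trace.append(eip)
        next_eip = eip + 1
        if op == "MOV":
            regs[args[0]] = args[1]
        elif op == "ADD":
            regs[args[0]] += regs[args[1]]
        elif op == "INC":
            regs[args[0]] += 1
        elif op == "JMP":
            next_eip = labels[args[0]]
        elif op == "CALL":
            stack.append(next_eip)        # push the return address
            next_eip = labels[args[0]]    # jump to the subroutine
        elif op == "RET":
            next_eip = stack.pop()        # pop the return address into EIP
        eip = next_eip
    return regs, trace

regs, trace = run(program, labels)
print(regs["EAX"])   # 4
print(trace)         # [0, 1, 2, 5, 6, 3, 4]
```

The trace shows execution leaving address 2 for the subroutine and resuming at address 3, the value that CALL pushed.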

The following is an example of how to call a procedure. Execution in this example begins at `proc_2`, which invokes `proc_1` with the argument 5:

```nasm
proc proc_1
     Locals none
     MOV ECX, [EBP+8] ; ebp+8 is the function argument
     PUSH ECX
     POP ECX
endproc
proc proc_2
     Locals none
     invoke proc_1, 5 ; invoke proc_1 proc
endproc
```
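Why the argument is found at `[EBP+8]` can be seen by replaying the stack operations in Python. This sketch assumes the usual 32-bit sequence: the caller pushes the argument, CALL pushes the return address, and the callee's prologue runs `PUSH EBP` followed by `MOV EBP, ESP` (assumed here to be what the `invoke` and `proc` macros expand to). All addresses are hypothetical.

```python
memory = {}
esp = 0x0012FF00          # hypothetical initial stack pointer

def push(value):
    """Decrement ESP by 4 and store the value at the new top of the stack."""
    global esp
    esp -= 4
    memory[esp] = value

push(5)                   # caller: PUSH 5 (the argument passed to proc_1)
push(0x00401005)          # CALL: pushes a hypothetical return address
push(0x0012FF40)          # callee prologue: PUSH EBP (hypothetical old EBP)
ebp = esp                 # callee prologue: MOV EBP, ESP

print(hex(memory[ebp]))       # 0x12ff40  saved EBP
print(hex(memory[ebp + 4]))   # 0x401005  return address
print(memory[ebp + 8])        # 5         first argument
```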

## Tools Arsenal

### **Compilers**

There are several options for compiling your C/C++ code. It is important to note that different compilers may result in different outputs. You can use IDEs or the command line.

**IDEs:**

* Visual Studio
* Orwell Dev-C++
* Code::Blocks

**Command line:**

* MinGW
* gcc (example: `gcc -m32 main.c -o main.exe`)

### **Debuggers**

A debugger is a program that runs other programs in a way that allows us to exercise control over the program itself. In our specific case, the debugger will help us write exploits, analyze programs, reverse engineer binaries, and much more.

As we will see, the debugger allows us to:

* Stop the program while it is running
* Analyze the stack and its data
* Inspect registers
* Change the program or program variables and more

There are several options for debuggers:

* [Immunity Debugger](https://www.immunityinc.com/products/debugger/)
* [IDA](https://www.hex-rays.com/products/ida/) (Windows, Linux, MacOS)
* [GDB](https://www.gnu.org/software/gdb/) (Unix, Windows)
* [X64DBG](http://x64dbg.com/#start) (Windows)
* [EDB](http://codef00.com/projects#debugger) (Linux)
* [WinDBG](https://msdn.microsoft.com/en-us/windows/hardware/hh852365.aspx) (Windows)
* [OllyDBG](http://www.ollydbg.de/) (Windows)
* [Hopper](http://www.hopperapp.com/) (MacOS, Linux)

### Disassembling

To be a successful pentester, you need to know how to reverse a compiled application.

You can use `objdump.exe`, which is part of GNU Binutils and ships alongside `gcc` in MinGW, to disassemble a compiled application.

**Example:** `objdump -d -Mintel main.exe > disasm.txt`

## Other Resources

{% embed url="<https://www.ired.team/offensive-security/code-injection-process-injection/binary-exploitation>" %}

* [KNX - Binary Exploitation v1 YT PlayList 🇮🇹](https://www.youtube.com/watch?v=3a_1zIMKMtE\&list=PLimbw9yhOVDYOJ-1wTlR4Jjxcsr4EuZa2)
* [KNX - Binary Exploitation v2 YT PlayList 🇮🇹](https://www.youtube.com/watch?v=tZsjIUkrcB8\&list=PLimbw9yhOVDZbmYKhKOF4JR9OQbWfvFdi)
* [Kevin Du - BoF YT PlayList](https://www.youtube.com/watch?v=LBo56Xyowvk\&list=PLW56THlev-gAhS9g1HdZn9TcQo3Hm-kgf) 🇬🇧
* [LiveOverflow - Binary Exploitation YT PlayList ](https://www.youtube.com/watch?v=iyAyN3GFM7A\&list=PLhixgUqwRTjxglIswKp9mpkfPNfHkzyeN)🇬🇧

