Cracking the Code: An Advanced Introduction to Reverse Engineering

Reverse engineering is a vital skill for red teamers and pen testers. Most software ships as compiled binaries, with the source code kept private. For a professional operator, however, the binary is not a black box - it is a puzzle waiting to be solved. Whether you are analyzing a custom C2 implant to understand its capabilities, bypassing an EDR’s function hooks, identifying vulnerabilities in a proprietary protocol, or unpacking malware to extract indicators of compromise, reverse engineering is the key that unlocks it all.

In this article, we’ll explore the core concepts of reverse engineering, from the fundamental structure of executable files and x86/x64 assembly language to the psychological warfare of anti-debugging techniques and the practical workflows that professional analysts use.

1. The Fundamentals: What is Reverse Engineering?

Reverse engineering (RE) is the process of analyzing a finished product (in our case, a compiled binary) to understand its design, architecture, and functionality - without access to the original source code or documentation.

For offensive security professionals, RE serves several purposes:

Malware Analysis: Understanding how threats work to develop defenses or leverage their techniques.
Vulnerability Research: Finding exploitable bugs in closed-source software.
Anti-Cheat/DRM Bypass: Understanding protection mechanisms to test or circumvent them.
EDR Evasion: Analyzing how security products detect threats to avoid detection.
Custom Protocol Analysis: Understanding proprietary network protocols used by targets.

2. The Anatomy of an Executable: PE, ELF, and Mach-O

Before you can pull a program apart, you must understand how it’s put together. Different operating systems use different container formats for executable code.

PE (Portable Executable) - Windows

The PE format is used by Windows for executables (.exe), dynamic link libraries (.dll), and drivers (.sys).

Key Sections:

DOS Header: Legacy compatibility; contains the “MZ” magic bytes and a pointer to the PE header.
PE Header: Contains the “PE\0\0” signature and critical metadata.
Optional Header: Despite its name, this is required for executables. Contains the entry point (AddressOfEntryPoint), image base, and section alignment.
Section Table: Describes each section’s virtual address, raw size, and permissions.
.text: The executable code.
.data: Initialized global and static variables.
.rdata: Read-only data, including string literals and constants.
.idata: Import information, including the Import Address Table (IAT) - the external functions the program calls.
.edata: Export information, including the Export Address Table (EAT) - the functions a DLL exposes to other modules.
.rsrc: Resources like icons, dialogs, and embedded files.
.reloc: Relocation information for ASLR.
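
To make the header chain above concrete, here is a minimal Python sketch that follows the pointer from the DOS header to the PE header and reads AddressOfEntryPoint from the Optional Header. It uses only the standard library; the function name parse_pe_header and the returned dictionary keys are illustrative choices, not part of any tool mentioned in this article.

```python
import struct

def parse_pe_header(data: bytes) -> dict:
    """Read the few PE fields needed to locate the entry point (illustrative helper)."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew: 4-byte little-endian file offset of the PE header, stored at 0x3C
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The 20-byte COFF file header follows the 4-byte signature
    machine, num_sections = struct.unpack_from("<HH", data, e_lfanew + 4)
    opt = e_lfanew + 24  # start of the Optional Header
    (magic,) = struct.unpack_from("<H", data, opt)  # 0x10B = PE32, 0x20B = PE32+
    # AddressOfEntryPoint sits 16 bytes into the Optional Header in both variants
    (entry_rva,) = struct.unpack_from("<I", data, opt + 16)
    return {
        "machine": machine,
        "sections": num_sections,
        "pe32_plus": magic == 0x20B,
        "entry_rva": entry_rva,
    }
```

On a real file you would call it as parse_pe_header(open("program.exe", "rb").read()); a full parser such as pefile handles the many edge cases this sketch ignores.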

Analysis Tools:

# Windows
dumpbin /headers program.exe
dumpbin /imports program.exe
dumpbin /exports program.dll

# Cross-platform
# pefile (Python library)
# peframe

ELF (Executable and Linkable Format) - Linux/Unix

ELF is the standard binary format for Linux, BSD, and many embedded systems.

Key Sections:

ELF Header: Contains the magic bytes (\x7fELF), architecture, entry point, and program/section header offsets.
Program Headers: Describe segments for loading the binary into memory.
Section Headers: Describe sections for linking and debugging.
.text: Executable code.
.data: Initialized data.
.bss: Uninitialized data (zeroed at load time).
.rodata: Read-only data (strings, constants).
.plt / .got: Procedure Linkage Table and Global Offset Table for dynamic linking.
.symtab / .dynsym: Symbol tables (may be stripped).
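
The ELF header fields listed above can be read with a few lines of Python. This is a rough sketch rather than a replacement for readelf: the name parse_elf_header and the small lookup tables are illustrative, and only a handful of type and machine values are mapped.

```python
import struct

# Partial lookup tables; values not listed here fall back to the raw number
ELF_TYPES = {1: "relocatable", 2: "executable", 3: "shared object / PIE", 4: "core"}
ELF_MACHINES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}

def parse_elf_header(data: bytes) -> dict:
    """Decode class, type, machine and entry point from an ELF header (illustrative helper)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic")
    ei_class, ei_data = data[4], data[5]      # 1/2 = 32/64-bit, 1/2 = little/big endian
    endian = "<" if ei_data == 1 else ">"
    entry_fmt = "Q" if ei_class == 2 else "I"  # e_entry is 8 bytes on ELF64, 4 on ELF32
    e_type, e_machine, _version, e_entry = struct.unpack_from(
        endian + "HHI" + entry_fmt, data, 16   # fields start right after e_ident
    )
    return {
        "bits": 64 if ei_class == 2 else 32,
        "type": ELF_TYPES.get(e_type, e_type),
        "machine": ELF_MACHINES.get(e_machine, e_machine),
        "entry": e_entry,
    }
```

Only the first 24 to 32 bytes of the file are needed, so parse_elf_header(open("binary", "rb").read(64)) is enough for a quick check.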

Analysis Tools:

# Display headers
readelf -h binary
readelf -S binary  # Section headers
readelf -l binary  # Program headers

# Display symbols
nm binary
objdump -T binary  # Dynamic symbols

# Quick file type identification
file binary

Mach-O - macOS/iOS

Mach-O (Mach Object) is Apple’s executable format. A unique feature is support for “fat binaries” (Universal Binaries) that contain code for multiple architectures (e.g., both Intel x86_64 and Apple Silicon ARM64).

Analysis Tools:

# Display headers
otool -h binary
otool -l binary   # Load commands

# For fat binaries
lipo -info binary
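
As a rough illustration of what lipo -info reports, the sketch below walks the fat header of a Universal Binary: a big-endian magic value, an architecture count, then one 20-byte record per slice. The helper name list_fat_archs is illustrative, and only the two CPU types mentioned above are mapped to names.

```python
import struct

FAT_MAGIC = 0xCAFEBABE  # also used by Java class files, so check the context
CPU_TYPES = {0x01000007: "x86_64", 0x0100000C: "arm64"}

def list_fat_archs(data: bytes) -> list:
    """Return (arch, file offset, size) for each slice of a 32-bit fat header."""
    magic, count = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        return []  # thin Mach-O or not a Mach-O file at all
    archs = []
    for i in range(count):
        # fat_arch record: cputype, cpusubtype, offset, size, align (all 32-bit)
        cputype, _sub, offset, size, _align = struct.unpack_from(">IIIII", data, 8 + i * 20)
        archs.append((CPU_TYPES.get(cputype, hex(cputype)), offset, size))
    return archs
```

Each returned offset points at a complete thin Mach-O image inside the file, which is what you would carve out to analyze a single architecture.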

Tip: Always start analysis by checking the file’s headers using file, readelf, or otool. This tells you the architecture (x86, x64, ARM), whether it’s stripped (no symbols), statically or dynamically linked, and the file type (executable, shared library, object file).

3. x86/x64 Assembly: The Language of the Machine

To reverse engineer compiled code, you must be able to read assembly language. Modern Windows and Linux systems primarily use the x86 (32-bit) or x86_64 (64-bit) instruction sets, though ARM64 is becoming more common on both platforms and is the norm on macOS and most modern mobile devices.

Registers

Registers are small, fast storage locations inside the CPU.

General Purpose Registers (x64):

64-bit	32-bit	16-bit	8-bit (low)	Purpose
RAX	EAX	AX	AL	Accumulator (return values)
RBX	EBX	BX	BL	Base (general purpose)
RCX	ECX	CX	CL	Counter (loop counts)
RDX	EDX	DX	DL	Data (I/O, multiplication)
RSI	ESI	SI	SIL	Source Index
RDI	EDI	DI	DIL	Destination Index
RSP	ESP	SP	SPL	Stack Pointer
RBP	EBP	BP	BPL	Base Pointer (frame pointer)
R8-R15	R8D-R15D	R8W-R15W	R8B-R15B	Additional registers (x64 only)
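
The columns in this table are views of the same physical register, and the write rules differ by width: on x86-64, writing a 32-bit sub-register zeroes the upper half of the 64-bit register, while 16-bit and 8-bit writes leave the remaining bits untouched. The small Python model below sketches that behavior; write_reg is an illustrative helper, not an emulator.

```python
MASK64 = (1 << 64) - 1

def write_reg(rax: int, value: int, width: int) -> int:
    """Return the new 64-bit register value after writing its low `width` bits."""
    if width == 64:
        return value & MASK64
    if width == 32:
        # e.g. mov eax, imm32: the upper 32 bits of RAX are cleared
        return value & 0xFFFFFFFF
    # e.g. mov ax, imm16 or mov al, imm8: the upper bits are preserved
    mask = (1 << width) - 1
    return (rax & ~mask & MASK64) | (value & mask)

rax = 0x1122334455667788
print(hex(write_reg(rax, 0xAAAA, 16)))  # AX write keeps the upper 48 bits
print(hex(write_reg(rax, 5, 32)))       # EAX write zero-extends into RAX
```

This is why compilers emit xor eax, eax rather than xor rax, rax to clear a 64-bit register: the 32-bit form has the same effect with a shorter encoding.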

Special Registers:

RIP/EIP: Instruction Pointer - points to the next instruction to execute.
RFLAGS/EFLAGS: Flags register - contains status flags (Zero, Carry, Sign, Overflow).

Common Instructions

; Data Movement
mov eax, 5          ; EAX = 5
mov eax, [rbx]      ; EAX = value at address in RBX
lea rax, [rbx+rcx]  ; RAX = RBX + RCX (address calculation, no memory access)

; Arithmetic
add eax, ebx        ; EAX = EAX + EBX
sub eax, 10         ; EAX = EAX - 10
imul eax, ebx       ; EAX = EAX * EBX (signed)
inc eax             ; EAX = EAX + 1
dec eax             ; EAX = EAX - 1

; Bitwise
and eax, 0xFF       ; Mask lower byte
or eax, 1           ; Set bit 0
xor eax, eax        ; EAX = 0 (common idiom)
shl eax, 4          ; Shift left by 4 bits (multiply by 16)
shr eax, 1          ; Logical shift right by 1 bit (unsigned divide by 2)

; Comparison and Flags
cmp eax, ebx        ; Sets flags based on EAX - EBX (result discarded)
test eax, eax       ; Sets flags based on EAX AND EAX (checks if zero)

; Branching
jmp label           ; Unconditional jump
je/jz label         ; Jump if Equal/Zero
jne/jnz label       ; Jump if Not Equal/Not Zero
jg/jge label        ; Jump if Greater/Greater or Equal (signed)
ja/jae label        ; Jump if Above/Above or Equal (unsigned)
jl/jle label        ; Jump if Less/Less or Equal (signed)
jb/jbe label        ; Jump if Below/Below or Equal (unsigned)

; Function Calls
call function_addr  ; Push return address, jump to function
ret                 ; Pop return address, jump to it

; Stack Operations
push rax            ; Push RAX onto stack
pop rbx             ; Pop top of stack into RBX
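
The signed versus unsigned jump pairs above are a frequent source of misreadings, so here is a small Python model of how cmp sets the flags and how each conditional jump interprets them. The function names cmp_flags and jump_taken are illustrative; the flag conditions follow the standard x86 definitions.

```python
def cmp_flags(a: int, b: int, bits: int = 32) -> dict:
    """Model the flags set by `cmp a, b` (computes a - b and discards the result)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    return {
        "ZF": result == 0,                          # operands were equal
        "CF": a < b,                                # unsigned borrow
        "SF": bool(result & sign),                  # result looks negative
        "OF": bool((a ^ b) & (a ^ result) & sign),  # signed overflow
    }

def jump_taken(mnemonic: str, f: dict) -> bool:
    """Return True if the given conditional jump would be taken for flags `f`."""
    conditions = {
        "je": f["ZF"],
        "jne": not f["ZF"],
        "jg": not f["ZF"] and f["SF"] == f["OF"],
        "jge": f["SF"] == f["OF"],
        "jl": f["SF"] != f["OF"],
        "jle": f["ZF"] or f["SF"] != f["OF"],
        "ja": not f["CF"] and not f["ZF"],
        "jae": not f["CF"],
        "jb": f["CF"],
        "jbe": f["CF"] or f["ZF"],
    }
    return conditions[mnemonic]

flags = cmp_flags(0xFFFFFFFF, 1)  # 0xFFFFFFFF is -1 signed, 4294967295 unsigned
print(jump_taken("jg", flags), jump_taken("ja", flags))
```

The same bit pattern compares as less than 1 under jg/jl and greater than 1 under ja/jb, which is exactly the distinction that turns a bounds check into an integer-signedness bug.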

Calling Conventions

Understanding how functions receive arguments and return values is crucial for following program logic.

x64 Windows (Microsoft):

First 4 integer/pointer arguments: RCX, RDX, R8, R9
Additional arguments: pushed on stack (right to left)
Return value: RAX
Caller allocates 32-byte “shadow space” on stack

x64 Linux/macOS (System V AMD64 ABI):

First 6 integer/pointer arguments: RDI, RSI, RDX, RCX, R8, R9
Additional arguments: pushed on stack (right to left)
Return value: RAX

x86 (32-bit) cdecl:

All arguments pushed on stack (right to left)
Caller cleans up stack
Return value: EAX
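
When annotating a call site, it helps to map "argument N" to a location mechanically. The sketch below encodes the integer/pointer rules listed above in a lookup table; arg_location and the convention keys are illustrative names, and floating-point arguments (which use XMM registers) are deliberately left out.

```python
# Integer/pointer argument registers per convention, in argument order
INT_ARG_REGS = {
    "win64": ["RCX", "RDX", "R8", "R9"],
    "sysv": ["RDI", "RSI", "RDX", "RCX", "R8", "R9"],
    "cdecl32": [],  # everything goes on the stack
}

def arg_location(convention: str, index: int) -> str:
    """Return where the zero-based integer argument `index` is passed."""
    regs = INT_ARG_REGS[convention]
    if index < len(regs):
        return regs[index]
    # stack[n] means the n-th stack-passed argument, not a byte offset
    return f"stack[{index - len(regs)}]"

for conv in INT_ARG_REGS:
    print(conv, [arg_location(conv, i) for i in range(7)])
```

Note that RCX is the first argument on Windows but the fourth on System V, so the same-looking disassembly means different things depending on the target platform.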

4. Static Analysis: Disassembling and Decompiling

Static analysis is the act of studying code without executing it. It’s safe (no risk of triggering malware) but can be thwarted by obfuscation and packing.

Disassembly

Disassembly transforms machine code (raw bytes) into human-readable assembly language.

Tools:

Ghidra: Free, open-source, NSA-developed. Excellent decompiler. The go-to for most red teamers.
IDA Pro: Industry standard, expensive. Unmatched in analysis capabilities.
Binary Ninja: Modern, scriptable, with a good UI.
Radare2/Cutter: Free, open-source, command-line focused with a GUI option.
objdump: Built into GNU binutils, quick and dirty.

# Quick disassembly with objdump
objdump -d -M intel binary | less

# Disassemble specific function
objdump -d -M intel binary | grep -A 50 "<main>:"
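
To show what "bytes in, mnemonics out" means at the lowest level, here is a toy linear-sweep disassembler in Python. It is a teaching sketch under heavy assumptions: it knows only a handful of x86-64 opcodes chosen for this example, ignores prefixes and ModR/M bytes entirely, and prints anything else as a raw db byte. Real tools decode the full instruction set.

```python
import struct

ONE_BYTE = {0x90: "nop", 0xC3: "ret", 0xCC: "int3", 0x55: "push rbp", 0x5D: "pop rbp", 0xC9: "leave"}
SHORT_JUMPS = {0xEB: "jmp", 0x74: "jz", 0x75: "jnz"}  # opcode + signed 8-bit displacement
NEAR_BRANCHES = {0xE8: "call", 0xE9: "jmp"}           # opcode + signed 32-bit displacement

def disassemble(code: bytes, base: int = 0) -> list:
    """Linear sweep over `code`, returning (address, text) pairs."""
    out = []
    i = 0
    while i < len(code):
        op = code[i]
        addr = base + i
        if op in ONE_BYTE:
            out.append((addr, ONE_BYTE[op]))
            i += 1
        elif op in SHORT_JUMPS and i + 2 <= len(code):
            (rel,) = struct.unpack_from("<b", code, i + 1)
            # Branch targets are relative to the address of the next instruction
            out.append((addr, f"{SHORT_JUMPS[op]} 0x{addr + 2 + rel:x}"))
            i += 2
        elif op in NEAR_BRANCHES and i + 5 <= len(code):
            (rel,) = struct.unpack_from("<i", code, i + 1)
            out.append((addr, f"{NEAR_BRANCHES[op]} 0x{addr + 5 + rel:x}"))
            i += 5
        else:
            out.append((addr, f"db 0x{op:02x}"))  # unknown to this toy decoder
            i += 1
    return out

for address, text in disassemble(bytes.fromhex("5574029090e800000000c3")):
    print(f"{address:04x}  {text}")
```

Because the sweep simply decodes whatever byte comes next, data mixed into code will desynchronize it; that weakness is one reason tools like Ghidra and IDA Pro follow control flow instead.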

Decompilation

Decompilation attempts to reconstruct high-level C/C++ code from assembly. It’s much easier to read but loses variable names, comments, and some structural information.

Ghidra’s Decompiler Workflow:

Create a new project and import the binary.
Run “Auto-Analyze” to identify functions, data types, and cross-references.
Navigate to the function of interest (e.g., main, or follow cross-references from interesting strings).
The decompiler window shows the reconstructed C code alongside the assembly.
Rename variables and functions as you understand them (right-click -> “Edit Function Signature” or “Rename Variable”).
Use “Define Data Type” to tell Ghidra about structures.

Identifying Key Functions

Entry Point: Where execution begins. In PE files, this is AddressOfEntryPoint. In ELF, it’s the e_entry field.
main(): The programmer’s entry point (called by the runtime after initialization).
String Cross-References: Find interesting strings (strings binary | grep -i password), then find where they’re referenced in Ghidra.
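Imports: Look at what Windows API or libc functions are called. CreateRemoteThread and VirtualAlloc together hint at process injection, while socket and connect indicate network activity. None of these is malicious on its own, since benign software uses them too, but they tell you where to look first.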

5. Dynamic Analysis: Watching the Code Breathe

Dynamic analysis involves running the binary (usually in a controlled environment) and observing its behavior. This reveals runtime values, decrypted strings, and actual execution paths.

Debugging

A debugger allows you to pause execution, inspect registers and memory, set breakpoints, and step through instructions.

Windows Debuggers:

x64dbg: Free, open-source, modern. Excellent for malware analysis and CTFs.
WinDbg: Microsoft’s debugger. Powerful for kernel debugging and crash analysis.
OllyDbg: Classic, 32-bit only, largely superseded by x64dbg.

Linux Debuggers:

GDB: The GNU Debugger. Powerful but spartan by default.
GEF (GDB Enhanced Features): A GDB plugin that adds modern features and visualization.
Pwndbg: Another excellent GDB plugin, focused on exploit development.
Radare2: Also functions as a debugger.

Common Debugger Operations:

# GDB with GEF
gdb ./binary
> break main                 # Set breakpoint at main
> run                        # Start execution
> info registers             # View all registers
> x/10gx $rsp                # Examine 10 eight-byte hex values at stack pointer
> stepi                      # Step one instruction
> nexti                      # Step over (don't follow calls)
> continue                   # Resume execution
> disassemble                # Disassemble current function

Behavioral Monitoring

Instead of debugging step-by-step, observe the program’s interactions with the system.

Process Monitor (Windows): Logs file, registry, network, and process activity.
API Monitor (Windows): Logs Windows API calls with arguments.
strace (Linux): Logs system calls.
ltrace (Linux): Logs library calls.
Wireshark: Captures network traffic.

Instrumentation with Frida

Frida is a dynamic instrumentation toolkit that lets you inject JavaScript into running processes to hook functions, modify behavior, and trace execution.

# Install Frida
pip install frida-tools

# List running processes
frida-ps

# Attach to a process and run a script
frida -p <PID> -l my_script.js

Example Frida Script (Hooking a Function):

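// my_script.js
// Note: newer Frida releases (17+) replace Module.findExportByName(null, ...)
// with Module.getGlobalExportByName("strcmp"); adjust for your version.
Interceptor.attach(Module.findExportByName(null, "strcmp"), {
    onEnter: function(args) {
        console.log("strcmp called!");
        // Read the C strings through the NativePointer methods
        console.log("  arg0: " + args[0].readUtf8String());
        console.log("  arg1: " + args[1].readUtf8String());
    },
    onLeave: function(retval) {
        console.log("  return: " + retval);
    }
});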

This is invaluable for bypassing SSL pinning, tracing encryption routines, or understanding complex logic without laboriously stepping through every instruction.

6. Anti-Reverse Engineering: The Defender’s Shield

Malware authors, EDR vendors, and software protection schemes use various tricks to make analysis difficult. As a reverse engineer, you must learn to recognize and bypass these.

Anti-Debugging Techniques

Windows:

IsDebuggerPresent(): Checks the PEB (Process Environment Block) for the BeingDebugged flag.
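CheckRemoteDebuggerPresent(): Checks whether a debugger is attached to a given process. "Remote" here means the debugger runs in a separate process, not on another machine.
NtQueryInformationProcess() with ProcessDebugPort: Queries the debug port.
Timing Checks: Measures execution time; debuggers slow things down significantly.
Hardware Breakpoint Detection: Reads the debug registers (e.g., via GetThreadContext); non-zero values in DR0-DR3 indicate hardware breakpoints.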
Exception Handling Tricks: Throws exceptions that debuggers handle differently than normal execution.
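
The timing check is the easiest of these to reason about in isolation. The Python sketch below shows only the logic: time a block of work and treat an unusually long run as evidence of single-stepping or breakpoints. Native malware typically uses rdtsc, GetTickCount, or QueryPerformanceCounter instead; the function name looks_debugged and the threshold values here are hypothetical.

```python
import time

def looks_debugged(work, threshold_s: float) -> bool:
    """Return True if running `work` took longer than `threshold_s` seconds."""
    start = time.perf_counter()
    work()
    elapsed = time.perf_counter() - start
    return elapsed > threshold_s

def cheap_work():
    # Work that finishes in well under a second when nothing interferes
    sum(range(1000))

# Hypothetical threshold: generous enough that a normal run never trips it
print(looks_debugged(cheap_work, threshold_s=1.0))
```

Recognizing this pattern in disassembly (two time reads around a short block, followed by a compare and branch) tells you exactly which conditional jump to patch or which return value to fake.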

Bypass Strategies:

Patch the check in memory or on disk (change JZ to JMP).
Use x64dbg’s “ScyllaHide” plugin to hide the debugger from common checks.
Set the debugger to pass exceptions to the program instead of handling them.
Manually clear the BeingDebugged flag in the PEB.

Anti-VM Techniques

Malware often refuses to run (or behaves benignly) in virtual environments to evade sandbox analysis.

Common Checks:

VM Artifacts: Checks for VMware/VirtualBox drivers, registry keys, MAC address prefixes, or process names (e.g., vmtoolsd.exe).
Hardware Checks: Low RAM, few CPU cores, small disk size.
Timing Checks: rdtsc instruction to detect VM overhead.
Hypervisor Detection: cpuid instruction with specific leaf values.
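
As an example of the artifact checks, the MAC address test amounts to a prefix lookup against the vendor OUIs assigned to virtualization products. The sketch below covers a few well-known VMware and VirtualBox prefixes; the table is deliberately incomplete and the helper name vm_vendor_from_mac is illustrative.

```python
# Well-known OUI prefixes used for virtual network adapters (not exhaustive)
VM_OUIS = {
    "00:05:69": "VMware",
    "00:0c:29": "VMware",
    "00:1c:14": "VMware",
    "00:50:56": "VMware",
    "08:00:27": "VirtualBox",
}

def vm_vendor_from_mac(mac: str):
    """Return the VM vendor implied by a MAC address prefix, or None."""
    # Normalize case and the Windows-style dash separator before the lookup
    prefix = mac.lower().replace("-", ":")[:8]
    return VM_OUIS.get(prefix)

print(vm_vendor_from_mac("00:0C:29:AB:CD:EF"))
```

Seen from the analyst's side, this is also why changing the guest's MAC address to a non-virtual prefix defeats the check.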

Bypass Strategies:

Use a physical analysis machine.
Remove or mask VM artifacts (uninstall VMware Tools, change MAC addresses).
Use “stealth VM” configurations designed to evade detection.
Patch the checks in the binary.

Packing and Obfuscation

Packers (e.g., UPX, Themida, VMProtect): Packers compress or encrypt the executable’s code. At runtime, a small “stub” decompresses/decrypts the real code into memory. When you analyze a packed binary statically, you see the stub, not the actual logic.

Identifying Packers:

# Detect packer signatures
# Tools (mostly GUI applications rather than shell commands):
# Detect It Easy (DIE)
# PEiD
# ExeInfoPE

# Look for high entropy sections (encrypted data)
binwalk -E binary
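
The entropy measurement behind this check is Shannon entropy over byte frequencies, expressed in bits per byte: 0 for a run of identical bytes, up to 8 for uniformly random data. A minimal Python version is sketched below; the function name is illustrative, and any cutoff you apply to its output is a heuristic rather than a rule.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    # Sum -p * log2(p) over the observed byte values
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())

print(shannon_entropy(b"\x00" * 1024))     # constant data
print(shannon_entropy(bytes(range(256))))  # every byte value exactly once
```

Compressed or encrypted sections tend to sit close to 8 bits per byte, while ordinary code and text land noticeably lower, so computing this per section is a quick way to spot a packed payload.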

Unpacking:

Manual Unpacking: Run the binary in a debugger, wait for it to unpack itself in memory, then dump the unpacked image.
Automatic Unpackers: Some packers (like UPX) can be unpacked automatically: upx -d packed.exe

Obfuscation:

Control Flow Flattening: Breaks the natural structure of loops and conditionals into a giant switch statement.
Dead Code Insertion: Adds meaningless instructions.
Instruction Substitution: Replaces simple instructions with complex equivalents.
String Encryption: Strings are encrypted and decrypted at runtime.
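
String encryption in practice is often nothing more than a repeating-key XOR, which is why the strings utility finds nothing useful until the binary runs. The sketch below shows the scheme in Python; the key and the URL are made-up values for illustration, and real samples may use stronger ciphers.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR `data` with a repeating `key`; the same call encrypts and decrypts."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# Hypothetical values: what an author might embed versus what runs in memory
key = b"\x5a"
stored = xor_bytes(b"http://c2.example.test", key)  # what sits in the binary
print(stored)                  # unreadable to a strings search
print(xor_bytes(stored, key))  # recovered at runtime, just before use
```

Once you find the decryption routine and key in the disassembly, a few lines like these let you decrypt every string offline instead of breaking at each call site.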

7. Practical Workflow: The RE Loop

When you encounter a target binary, follow this methodical approach:

1. Initial Reconnaissance

# Identify file type and architecture
file target.exe

# Check for packing/encryption
diec target.exe   # diec is the console version of Detect It Easy

# Extract strings for quick wins (IPs, URLs, passwords, debug messages)
strings -n 8 target.exe | grep -iE "(http|password|key|secret|admin)"

# Calculate hashes for research
sha256sum target.exe
# Search hash on VirusTotal, Hybrid Analysis, etc.
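
The strings step above is simple enough to reproduce when the utility is not available or when you want to post-process the results. The Python sketch below extracts runs of printable ASCII of a minimum length; extract_strings is an illustrative helper that, unlike some strings builds, does not look for UTF-16 text.

```python
import re

def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Return printable ASCII runs of at least `min_len` characters."""
    # 0x20-0x7e covers space through tilde, i.e. printable ASCII
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

sample = b"\x00\x01password=hunter2\x00ab\x00http://example.test\xff"
print(extract_strings(sample))
```

Windows binaries frequently store text as UTF-16LE, so an empty result from an ASCII-only scan does not mean the file has no readable strings.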

2. Static Analysis Pass

Open in Ghidra (or your preferred tool).
Run Auto-Analysis.
Check the Imports list - what APIs does it call? Networking? File I/O? Process manipulation?
Check the Strings window - anything interesting?
Navigate to main() or the entry point.
Identify the high-level structure: initialization, main logic loop, cleanup.
Name functions and variables as you understand them.

3. Dynamic Verification

Set up a safe analysis environment (VM snapshot, network isolation or FakeNet).
Open the binary in a debugger (x64dbg, GDB).
Set breakpoints at functions you identified as interesting (encryption routines, network calls, config parsing).
Run and observe.
Examine register/memory contents at breakpoints to understand data flow.

4. Patching

If a security check (anti-debug, license check, etc.) is blocking your analysis:

Find the check in the disassembly.
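Identify the conditional jump (e.g., JZ 0x401050 - Jump if Zero).
Patch it to always take the branch or never take it:
- JZ -> JMP (always jump): for a short jump, change the opcode byte 0x74 to 0xEB and keep the displacement byte.
- JZ -> NOP NOP (never jump, continue to next instruction): a short JZ is two bytes, so overwrite both with 0x90; a near JZ (0F 84 rel32) is six bytes and needs six NOPs.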
Save the patched binary or apply the patch in the debugger.
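
For the on-disk variant, the patch is a one- or two-byte edit at a known file offset. The Python sketch below handles only the short form of JZ (opcode 0x74) and refuses to touch anything else; patch_short_jz and its mode names are illustrative, and finding the correct file offset from a virtual address is left to your disassembler.

```python
def patch_short_jz(image: bytes, offset: int, mode: str) -> bytes:
    """Return a copy of `image` with the short JZ at `offset` forced one way."""
    if image[offset] != 0x74:
        raise ValueError("no short JZ/JE opcode (0x74) at this offset")
    patched = bytearray(image)
    if mode == "always":
        patched[offset] = 0xEB                    # JMP rel8, displacement unchanged
    elif mode == "never":
        patched[offset:offset + 2] = b"\x90\x90"  # two NOPs, fall through
    else:
        raise ValueError("mode must be 'always' or 'never'")
    return bytes(patched)

code = bytes.fromhex("85c07402eb00c3")  # test eax, eax / jz +2 / jmp +0 / ret
print(patch_short_jz(code, 2, "always").hex())
print(patch_short_jz(code, 2, "never").hex())
```

Checking the opcode before writing is a cheap safeguard: patching the wrong offset usually produces a crash far from the edit, which is much harder to diagnose than a refused patch.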

8. Resources for Practice

Crackmes and CTF Challenges:

Crackmes.one: User-submitted reverse engineering challenges.
Microcorruption: Embedded CTF with a custom debugger.
PicoCTF: Beginner-friendly CTF with RE challenges.
OverTheWire Narnia: Exploit development.

Learning Resources:

Ghidra Courses - YouTube: Many free tutorials.
Practical Malware Analysis (Book): The bible of malware RE.
Reverse Engineering for Beginners (free ebook): Comprehensive x86/ARM coverage.

Conclusion

Reverse engineering is more than just a technical skill; it is a mindset of relentless curiosity. You are not just reading code; you are reconstructing the intent of another programmer from the machine’s perspective. By mastering the structure of executables, the language of assembly, the tools of disassembly and debugging, and the techniques for bypassing protections, you transform from a user of tools into a creator of exploits and a hunter of threats.

The binary is not a barrier; it is an invitation.

Happy hacking!
