Reverse engineering is a vital skill for red teamers and pen testers. Most commercial software ships only as compiled binaries, with the source code kept away from prying eyes. However, for a professional operator, the binary is not a black box - it is a puzzle waiting to be solved. Whether you are analyzing a custom C2 implant to understand its capabilities, bypassing an EDR’s function hooks, identifying vulnerabilities in a proprietary protocol, or unpacking malware to extract indicators of compromise, reverse engineering is the key that unlocks it all.
In this article, we’ll explore the core concepts of reverse engineering, from the fundamental structure of executable files and x86/x64 assembly language to the psychological warfare of anti-debugging techniques and the practical workflows that professional analysts use.
1. The Fundamentals: What is Reverse Engineering?
Reverse engineering (RE) is the process of analyzing a finished product (in our case, a compiled binary) to understand its design, architecture, and functionality - without access to the original source code or documentation.
For offensive security professionals, RE serves several purposes:
- Malware Analysis: Understanding how threats work to develop defenses or leverage their techniques.
- Vulnerability Research: Finding exploitable bugs in closed-source software.
- Anti-Cheat/DRM Bypass: Understanding protection mechanisms to test or circumvent them.
- EDR Evasion: Analyzing how security products detect threats to avoid detection.
- Custom Protocol Analysis: Understanding proprietary network protocols used by targets.
2. The Anatomy of an Executable: PE, ELF, and Mach-O
Before you can pull a program apart, you must understand how it’s put together. Different operating systems use different container formats for executable code.
PE (Portable Executable) - Windows
The PE format is used by Windows for executables (.exe), dynamic link libraries (.dll), and drivers (.sys).
Key Sections:
- DOS Header: Legacy compatibility; contains the “MZ” magic bytes and a pointer to the PE header.
- PE Header: Contains the “PE\0\0” signature and critical metadata.
- Optional Header: Despite its name, this is required for executables. Contains the entry point (AddressOfEntryPoint), image base, and section alignment.
- Section Table: Describes each section’s virtual address, raw size, and permissions.
  - .text: The executable code.
  - .data: Initialized global and static variables.
  - .rdata: Read-only data, including string literals and constants.
  - .idata: The Import Address Table (IAT) - lists external functions the program calls.
  - .edata: The Export Address Table (EAT) - lists functions the DLL exports.
  - .rsrc: Resources like icons, dialogs, and embedded files.
  - .reloc: Relocation information for ASLR.
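To make the header chain above concrete, here is a minimal Python sketch that walks from the “MZ” magic to the PE signature and reads a few fields, including AddressOfEntryPoint. The function names and the synthetic sample built by build_sample() are illustrative assumptions, not part of any tool; a real analysis would normally rely on a dedicated PE parser.

```python
import struct

def parse_pe_headers(data: bytes) -> dict:
    """Read a handful of fields from the DOS, COFF, and optional headers."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ magic")
    # e_lfanew: 4-byte little-endian offset of the PE signature, stored at 0x3C
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The 20-byte COFF file header follows the signature
    machine, num_sections, _, _, _, _opt_size, _ = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4
    )
    opt_offset = pe_offset + 24
    (opt_magic,) = struct.unpack_from("<H", data, opt_offset)
    # AddressOfEntryPoint sits 16 bytes into the optional header (PE32 and PE32+)
    (entry_rva,) = struct.unpack_from("<I", data, opt_offset + 16)
    return {
        "machine": machine,
        "sections": num_sections,
        "pe32_plus": opt_magic == 0x20B,
        "entry_rva": entry_rva,
    }

def build_sample() -> bytes:
    """Build a tiny synthetic header (hypothetical values) for demonstration."""
    buf = bytearray(0x200)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, 0x80)           # e_lfanew
    buf[0x80:0x84] = b"PE\0\0"
    struct.pack_into("<HHIIIHH", buf, 0x84, 0x8664, 3, 0, 0, 0, 0xF0, 0x22)
    struct.pack_into("<H", buf, 0x98, 0x20B)          # PE32+ magic
    struct.pack_into("<I", buf, 0x98 + 16, 0x1000)    # AddressOfEntryPoint
    return bytes(buf)

if __name__ == "__main__":
    print(parse_pe_headers(build_sample()))
```

Against a real file, the same function can be fed the first few kilobytes read in binary mode; the sketch does no bounds checking beyond what struct raises on truncated input.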
Analysis Tools:
ELF (Executable and Linkable Format) - Linux/Unix
ELF is the standard binary format for Linux, BSD, and many embedded systems.
Key Sections:
- ELF Header: Contains the magic bytes (\x7fELF), architecture, entry point, and program/section header offsets.
- Program Headers: Describe segments for loading the binary into memory.
- Section Headers: Describe sections for linking and debugging.
  - .text: Executable code.
  - .data: Initialized data.
  - .bss: Uninitialized data (zeroed at load time).
  - .rodata: Read-only data (strings, constants).
  - .plt/.got: Procedure Linkage Table and Global Offset Table for dynamic linking.
  - .symtab/.dynsym: Symbol tables (may be stripped).
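The ELF header fields listed above can be read with a few lines of Python. This is a minimal sketch: the function name and the hand-built sample header are assumptions for illustration, and only the class, byte order, machine, and offset fields are decoded.

```python
import struct

def parse_elf_header(data: bytes) -> dict:
    """Decode class, endianness, machine, entry point, and header offsets."""
    if data[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic")
    ei_class, ei_data = data[4], data[5]      # 1 = 32-bit / little, 2 = 64-bit / big
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("invalid EI_CLASS or EI_DATA")
    endian = "<" if ei_data == 1 else ">"
    # e_type, e_machine, e_version, then e_entry, e_phoff, e_shoff
    # (4 bytes each in ELF32, 8 bytes each in ELF64)
    fmt = endian + ("HHIIII" if ei_class == 1 else "HHIQQQ")
    e_type, e_machine, _, e_entry, e_phoff, e_shoff = struct.unpack_from(fmt, data, 16)
    return {
        "bits": 32 if ei_class == 1 else 64,
        "little_endian": ei_data == 1,
        "type": e_type,
        "machine": e_machine,
        "entry": e_entry,
        "phoff": e_phoff,
        "shoff": e_shoff,
    }

# Hypothetical 64-bit little-endian header: ET_EXEC (2), EM_X86_64 (62)
SAMPLE_ELF = (
    b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    + struct.pack("<HHIQQQ", 2, 62, 1, 0x401000, 64, 0)
)

if __name__ == "__main__":
    print(parse_elf_header(SAMPLE_ELF))
```

The values this returns correspond to what readelf -h reports for a real binary, so the sketch is mainly useful for seeing where those numbers come from.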
Analysis Tools:
Mach-O - macOS/iOS
Mach-O (Mach Object) is Apple’s executable format. A unique feature is support for “fat binaries” (Universal Binaries) that contain code for multiple architectures (e.g., both Intel x86_64 and Apple Silicon ARM64).
Analysis Tools:
[!TIP] Always start analysis by checking the file’s headers using file, readelf, or otool. This tells you the architecture (x86, x64, ARM), whether it’s stripped (no symbols), statically or dynamically linked, and the file type (executable, shared library, object file).
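The first thing a tool like file does is look at magic bytes. A rough Python sketch of that first step, covering only the three formats discussed here, might look like the following; the table and function names are illustrative assumptions, and real detection inspects much more than four bytes.

```python
# Leading bytes as they appear on disk, mapped to a format name
MAGICS = [
    (b"MZ", "PE (or another MZ executable)"),
    (b"\x7fELF", "ELF"),
    (b"\xfe\xed\xfa\xce", "Mach-O 32-bit (big-endian)"),
    (b"\xce\xfa\xed\xfe", "Mach-O 32-bit (little-endian)"),
    (b"\xfe\xed\xfa\xcf", "Mach-O 64-bit (big-endian)"),
    (b"\xcf\xfa\xed\xfe", "Mach-O 64-bit (little-endian)"),
    # 0xCAFEBABE is shared with Java class files, so this one is ambiguous
    (b"\xca\xfe\xba\xbe", "Mach-O fat binary (or Java class file)"),
]

def identify_format(header: bytes) -> str:
    """Return a best-guess container format from the leading bytes."""
    for magic, name in MAGICS:
        if header.startswith(magic):
            return name
    return "unknown"

def identify_file(path: str) -> str:
    """Read the first four bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return identify_format(handle.read(4))

if __name__ == "__main__":
    print(identify_format(b"\x7fELF\x02\x01\x01\x00"))
```

This only narrows down the container; details such as stripping or linkage still need the dedicated tools named in the tip.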
3. x86/x64 Assembly: The Language of the Machine
To reverse engineer compiled code, you must be able to read assembly language. Modern Windows and Linux systems primarily use the x86 (32-bit) or x86_64 (64-bit) instruction sets, though ARM64 is becoming more common on both platforms and is the norm on recent Macs and most modern mobile devices.
Registers
Registers are small, fast storage locations inside the CPU.
General Purpose Registers (x64):
| 64-bit | 32-bit | 16-bit | 8-bit (low) | Purpose |
|---|---|---|---|---|
| RAX | EAX | AX | AL | Accumulator (return values) |
| RBX | EBX | BX | BL | Base (general purpose) |
| RCX | ECX | CX | CL | Counter (loop counts) |
| RDX | EDX | DX | DL | Data (I/O, multiplication) |
| RSI | ESI | SI | SIL | Source Index |
| RDI | EDI | DI | DIL | Destination Index |
| RSP | ESP | SP | SPL | Stack Pointer |
| RBP | EBP | BP | BPL | Base Pointer (frame pointer) |
| R8-R15 | R8D-R15D | R8W-R15W | R8B-R15B | Additional registers (x64 only) |
Special Registers:
- RIP/EIP: Instruction Pointer - points to the next instruction to execute.
- RFLAGS/EFLAGS: Flags register - contains status flags (Zero, Carry, Sign, Overflow).
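The columns in the register table are views of the same 64-bit storage, not separate registers. The Python sketch below models that with bit masks (helper names are made up for illustration). It also shows one x86-64 rule that regularly trips people up when reading disassembly: writing a 32-bit sub-register clears the upper 32 bits, while 16-bit and 8-bit writes leave the rest untouched.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF

def sub_registers(rax: int) -> dict:
    """Return the narrower views of a 64-bit RAX value."""
    return {
        "RAX": rax & MASK64,
        "EAX": rax & 0xFFFF_FFFF,
        "AX": rax & 0xFFFF,
        "AL": rax & 0xFF,
    }

def write_eax(rax: int, value: int) -> int:
    """32-bit writes zero-extend: the upper half of RAX is cleared."""
    return value & 0xFFFF_FFFF

def write_ax(rax: int, value: int) -> int:
    """16-bit writes preserve bits 16-63."""
    return (rax & ~0xFFFF & MASK64) | (value & 0xFFFF)

def write_al(rax: int, value: int) -> int:
    """8-bit writes preserve bits 8-63."""
    return (rax & ~0xFF & MASK64) | (value & 0xFF)

if __name__ == "__main__":
    rax = 0x1122334455667788
    for name, val in sub_registers(rax).items():
        print(f"{name} = {val:#x}")
    print(f"after mov eax, 1: {write_eax(rax, 1):#x}")
```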
Common Instructions
Calling Conventions
Understanding how functions receive arguments and return values is crucial for following program logic.
x64 Windows (Microsoft):
- First 4 integer/pointer arguments: RCX, RDX, R8, R9
- Additional arguments: pushed on stack (right to left)
- Return value: RAX
- Caller allocates 32-byte “shadow space” on stack
x64 Linux/macOS (System V AMD64 ABI):
- First 6 integer/pointer arguments: RDI, RSI, RDX, RCX, R8, R9
- Additional arguments: pushed on stack (right to left)
- Return value: RAX
x86 (32-bit) cdecl:
- All arguments pushed on stack (right to left)
- Caller cleans up stack
- Return value: EAX
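When reading a call site, it helps to map each argument position to where it should live. The following Python sketch encodes the three conventions above for integer and pointer arguments only (floating-point arguments use different registers and are deliberately ignored). The convention keys and the stack[n] notation are labels invented for this example.

```python
# Register order for integer/pointer arguments, per the conventions above
INT_ARG_REGISTERS = {
    "win64": ["RCX", "RDX", "R8", "R9"],
    "sysv64": ["RDI", "RSI", "RDX", "RCX", "R8", "R9"],
    "cdecl32": [],  # everything goes on the stack
}

RETURN_REGISTER = {"win64": "RAX", "sysv64": "RAX", "cdecl32": "EAX"}

def argument_locations(convention: str, arg_count: int) -> list:
    """List where each integer/pointer argument is expected to be found."""
    regs = INT_ARG_REGISTERS[convention]
    locations = []
    for index in range(arg_count):
        if index < len(regs):
            locations.append(regs[index])
        else:
            # stack[0] is the first argument that did not fit in a register
            locations.append(f"stack[{index - len(regs)}]")
    return locations

if __name__ == "__main__":
    for name in INT_ARG_REGISTERS:
        print(name, argument_locations(name, 6), "->", RETURN_REGISTER[name])
```

For example, a six-argument call on Windows x64 should show two stack stores before the call instruction, whereas the same call on Linux needs none.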
4. Static Analysis: Disassembling and Decompiling
Static analysis is the act of studying code without executing it. It’s safe (no risk of triggering malware) but can be thwarted by obfuscation and packing.
Disassembly
Disassembly transforms machine code (raw bytes) into human-readable assembly language.
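To show what that transformation involves, here is a deliberately tiny linear-sweep disassembler in Python. It recognizes only a handful of x86-64 encodings chosen for this example and prints anything else as a raw byte; it is a teaching sketch, not a substitute for the tools listed below.

```python
import struct

# A few fixed-length x86-64 encodings (a tiny, hand-picked subset)
FIXED = {
    b"\x55": "push rbp",
    b"\x48\x89\xe5": "mov rbp, rsp",
    b"\x31\xc0": "xor eax, eax",
    b"\x5d": "pop rbp",
    b"\xc9": "leave",
    b"\x90": "nop",
    b"\xcc": "int3",
    b"\xc3": "ret",
}

def disassemble(code: bytes, base: int = 0) -> list:
    """Decode bytes front to back, returning (address, text) pairs."""
    listing = []
    i = 0
    while i < len(code):
        if code[i] == 0xB8 and i + 5 <= len(code):
            # B8 id: mov eax, imm32 (little-endian immediate)
            (imm,) = struct.unpack_from("<I", code, i + 1)
            text, size = f"mov eax, {imm:#x}", 5
        else:
            for pattern, mnemonic in FIXED.items():
                if code.startswith(pattern, i):
                    text, size = mnemonic, len(pattern)
                    break
            else:
                # Unknown opcode: emit the raw byte and move on
                text, size = f"db {code[i]:#04x}", 1
        listing.append((base + i, text))
        i += size
    return listing

if __name__ == "__main__":
    sample = bytes.fromhex("554889e5b82a0000005dc3")
    for address, text in disassemble(sample, base=0x401000):
        print(f"{address:#x}: {text}")
```

Real disassemblers handle variable-length decoding for the full instruction set and also have to decide which bytes are code at all, which is where most of the difficulty lies.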
Tools:
- Ghidra: Free, open-source, NSA-developed. Excellent decompiler. The go-to for most red teamers.
- IDA Pro: Industry standard, expensive. Unmatched in analysis capabilities.
- Binary Ninja: Modern, scriptable, with a good UI.
- Radare2/Cutter: Free, open-source, command-line focused with a GUI option.
- objdump: Built into GNU binutils, quick and dirty.
Decompilation
Decompilation attempts to reconstruct high-level C-like code from assembly. The result is much easier to read, but variable names, comments, and some structural information are discarded at compile time and cannot be recovered.
Ghidra’s Decompiler Workflow:
- Create a new project and import the binary.
- Run “Auto-Analyze” to identify functions, data types, and cross-references.
- Navigate to the function of interest (e.g., main, or follow cross-references from interesting strings).
- The decompiler window shows the reconstructed C code alongside the assembly.
- Rename variables and functions as you understand them (right-click -> “Edit Function Signature” or “Rename Variable”).
- Use “Define Data Type” to tell Ghidra about structures.
Identifying Key Functions
- Entry Point: Where execution begins. In PE files, this is AddressOfEntryPoint. In ELF, it’s the e_entry field.
- main(): The programmer’s entry point (called by the runtime after initialization).
- String Cross-References: Find interesting strings (strings binary | grep -i password), then find where they’re referenced in Ghidra.
- Imports: Look at what Windows API or libc functions are called. CreateRemoteThread, VirtualAlloc, socket, and connect deserve a closer look: legitimate software uses them too, but together they are typical of process injection and network-capable malware.
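The strings-and-grep step above is easy to reproduce in Python when the command-line tools are not available. This sketch extracts printable ASCII runs and filters them by keyword; the function names and the sample blob are assumptions for illustration, and wide (UTF-16) strings are not handled.

```python
import re

def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII at least min_len characters long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]

def find_keywords(strings: list, keywords: list) -> list:
    """Case-insensitive filter, similar in spirit to grep -i."""
    lowered = [keyword.lower() for keyword in keywords]
    return [s for s in strings if any(k in s.lower() for k in lowered)]

if __name__ == "__main__":
    # Hypothetical blob standing in for a binary's contents
    blob = (
        b"\x00\x01admin_password=hunter2\x00\xff\xfeabc\x00"
        b"CreateRemoteThread\x00VirtualAlloc\x00"
    )
    found = extract_strings(blob)
    print(find_keywords(found, ["password"]))
    print(find_keywords(found, ["CreateRemoteThread", "VirtualAlloc"]))
```

The hits only tell you a string exists; the cross-reference view in the disassembler is what tells you which code uses it.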
5. Dynamic Analysis: Watching the Code Breathe
Dynamic analysis involves running the binary (usually in a controlled environment) and observing its behavior. This reveals runtime values, decrypted strings, and actual execution paths.
Debugging
A debugger allows you to pause execution, inspect registers and memory, set breakpoints, and step through instructions.
Windows Debuggers:
- x64dbg: Free, open-source, modern. Excellent for malware analysis and CTFs.
- WinDbg: Microsoft’s debugger. Powerful for kernel debugging and crash analysis.
- OllyDbg: Classic, 32-bit only, largely superseded by x64dbg.
Linux Debuggers:
- GDB: The GNU Debugger. Powerful but spartan by default.
- GEF (GDB Enhanced Features): A GDB plugin that adds modern features and visualization.
- Pwndbg: Another excellent GDB plugin, focused on exploit development.
- Radare2: Also functions as a debugger.
Common Debugger Operations:
Behavioral Monitoring
Instead of debugging step-by-step, observe the program’s interactions with the system.
- Process Monitor (Windows): Logs file, registry, network, and process activity.
- API Monitor (Windows): Logs Windows API calls with arguments.
- strace (Linux): Logs system calls.
- ltrace (Linux): Logs library calls.
- Wireshark: Captures network traffic.
Instrumentation with Frida
Frida is a dynamic instrumentation toolkit that lets you inject JavaScript into running processes to hook functions, modify behavior, and trace execution.
Example Frida Script (Hooking a Function):
This is invaluable for bypassing SSL pinning, tracing encryption routines, or understanding complex logic without laboriously stepping through every instruction.
6. Anti-Reverse Engineering: The Defender’s Shield
Malware authors, EDR vendors, and software protection schemes use various tricks to make analysis difficult. As a reverse engineer, you must learn to recognize and bypass these.
Anti-Debugging Techniques
Windows:
- IsDebuggerPresent(): Checks the PEB (Process Environment Block) for the BeingDebugged flag.
- CheckRemoteDebuggerPresent(): Checks whether a debugger is attached to a given process (“remote” means a separate process, not another machine).
- NtQueryInformationProcess() with ProcessDebugPort: Queries the debug port.
- Timing Checks: Measures execution time; single-stepping and breakpoints slow things down significantly.
- Hardware Breakpoint Detection: Checks debug registers (DR0-DR7).
- Exception Handling Tricks: Throws exceptions that debuggers handle differently than normal execution.
Bypass Strategies:
- Patch the check in memory or on disk (change JZ to JMP).
- Use x64dbg’s “ScyllaHide” plugin to hide the debugger from common checks.
- Set the debugger to pass exceptions to the program instead of handling them.
- Manually clear the BeingDebugged flag in the PEB.
Anti-VM Techniques
Malware often refuses to run (or behaves benignly) in virtual environments to evade sandbox analysis.
Common Checks:
- VM Artifacts: Checks for VMware/VirtualBox drivers, registry keys, MAC address prefixes, or process names (e.g., vmtoolsd.exe).
- Hardware Checks: Low RAM, few CPU cores, small disk size.
- Timing Checks: rdtsc instruction to detect VM overhead.
- Hypervisor Detection: cpuid instruction with specific leaf values.
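Of the checks above, the MAC address prefix test is the simplest to illustrate. The Python sketch below compares the first three octets of an address against a short list of well-known VMware and VirtualBox vendor prefixes. The list is intentionally incomplete and the function name is made up for this example; recognizing this pattern in a sample tells you which artifact to mask in your analysis VM.

```python
# First three octets (OUI) commonly assigned to virtual network adapters.
# This list is a small illustrative subset, not an exhaustive database.
VM_MAC_PREFIXES = {
    "00:05:69": "VMware",
    "00:0c:29": "VMware",
    "00:1c:14": "VMware",
    "00:50:56": "VMware",
    "08:00:27": "VirtualBox",
}

def vm_vendor_from_mac(mac: str):
    """Return the hypervisor vendor implied by a MAC address, or None."""
    normalized = mac.strip().lower().replace("-", ":")
    return VM_MAC_PREFIXES.get(normalized[:8])

if __name__ == "__main__":
    for candidate in ["00:0C:29:12:34:56", "08-00-27-AA-BB-CC", "3c:22:fb:01:02:03"]:
        print(candidate, "->", vm_vendor_from_mac(candidate))
```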
Bypass Strategies:
- Use a physical analysis machine.
- Remove or mask VM artifacts (uninstall VMware Tools, change MAC addresses).
- Use “stealth VM” configurations designed to evade detection.
- Patch the checks in the binary.
Packing and Obfuscation
Packers (e.g., UPX, Themida, VMProtect): Packers compress or encrypt the executable’s code. At runtime, a small “stub” decompresses/decrypts the real code into memory. When you analyze a packed binary statically, you see the stub, not the actual logic.
Identifying Packers:
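One widely used heuristic, offered here as a sketch rather than as the method this article originally showed, is byte entropy: compressed or encrypted data looks close to random, so its Shannon entropy approaches 8 bits per byte, while ordinary code and data usually score lower. The 7.0 threshold below is an assumption that should be tuned, and the function names are illustrative.

```python
import math
import os
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def looks_packed(data: bytes, threshold: float = 7.0) -> bool:
    """Flag data whose entropy exceeds an assumed threshold."""
    return shannon_entropy(data) > threshold

if __name__ == "__main__":
    text_like = b"push rbp; mov rbp, rsp; sub rsp, 0x20; " * 64
    random_like = os.urandom(4096)  # stands in for compressed/encrypted bytes
    print(f"text-like:   {shannon_entropy(text_like):.2f}")
    print(f"random-like: {shannon_entropy(random_like):.2f}")
```

In practice the measurement is more useful per section than per file, since a packed binary typically pairs a small low-entropy stub with a large high-entropy payload.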
Unpacking:
- Manual Unpacking: Run the binary in a debugger, wait for it to unpack itself in memory, then dump the unpacked image.
- Automatic Unpackers: Some packers (like UPX) can be unpacked automatically:
upx -d packed.exe
Obfuscation:
- Control Flow Flattening: Breaks the natural structure of loops and conditionals into a giant switch statement.
- Dead Code Insertion: Adds meaningless instructions.
- Instruction Substitution: Replaces simple instructions with complex equivalents.
- String Encryption: Strings are encrypted and decrypted at runtime.
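As a small illustration of the string encryption case, the sketch below assumes the simplest scheme seen in the wild, a single-byte XOR key, and brute-forces all 256 keys while looking for a known plaintext fragment (a “crib”). Real samples often use stronger or layered schemes, so treat this as a first thing to try rather than a general solution; the names and the sample URL are made up.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key (encrypt and decrypt are identical)."""
    return bytes(b ^ key for b in data)

def brute_force_xor(data: bytes, crib: bytes) -> list:
    """Try all 256 keys and keep the ones whose output contains the crib."""
    hits = []
    for key in range(256):
        candidate = xor_bytes(data, key)
        if crib in candidate:
            hits.append((key, candidate))
    return hits

if __name__ == "__main__":
    # Hypothetical obfuscated string as it might appear in a binary
    hidden = xor_bytes(b"http://example.invalid/c2", 0x5A)
    for key, plaintext in brute_force_xor(hidden, b"http"):
        print(f"key={key:#04x} -> {plaintext!r}")
```

When no crib is obvious, setting a breakpoint after the decryption routine and reading the buffer is usually faster than guessing the algorithm.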
7. Practical Workflow: The RE Loop
When you encounter a target binary, follow this methodical approach:
1. Initial Reconnaissance
2. Static Analysis Pass
- Open in Ghidra (or your preferred tool).
- Run Auto-Analysis.
- Check the Imports list - what APIs does it call? Networking? File I/O? Process manipulation?
- Check the Strings window - anything interesting?
- Navigate to main() or the entry point.
- Identify the high-level structure: initialization, main logic loop, cleanup.
- Name functions and variables as you understand them.
3. Dynamic Verification
- Set up a safe analysis environment (VM snapshot, network isolation or FakeNet).
- Open the binary in a debugger (x64dbg, GDB).
- Set breakpoints at functions you identified as interesting (encryption routines, network calls, config parsing).
- Run and observe.
- Examine register/memory contents at breakpoints to understand data flow.
4. Patching
If a security check (anti-debug, license check, etc.) is blocking your analysis:
- Find the check in the disassembly.
- Identify the conditional jump (e.g., JZ 0x401050 - Jump if Zero).
- Patch it to always pass or always fail:
  - JZ -> JMP (always jump)
  - JZ -> NOP NOP (never jump, continue to next instruction)
- Save the patched binary or apply the patch in the debugger.
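At the byte level, the two patches above are small. The Python sketch below handles only the 2-byte short form of JZ (opcode 0x74 followed by a one-byte displacement), which is an assumption about the target: the 6-byte near form (0F 84 plus a 4-byte displacement) needs different handling. The function name and sample bytes are illustrative.

```python
JZ_SHORT = 0x74   # jz rel8
JMP_SHORT = 0xEB  # jmp rel8
NOP = 0x90

def patch_short_jz(code: bytes, offset: int, always_jump: bool) -> bytes:
    """Return a copy of code with the short JZ at offset neutralized."""
    if offset + 1 >= len(code) or code[offset] != JZ_SHORT:
        raise ValueError("no short JZ (0x74 rel8) at this offset")
    patched = bytearray(code)
    if always_jump:
        # Swap the opcode only; the displacement byte stays valid for JMP
        patched[offset] = JMP_SHORT
    else:
        # Overwrite opcode and displacement so execution falls through
        patched[offset:offset + 2] = bytes([NOP, NOP])
    return bytes(patched)

if __name__ == "__main__":
    # test eax, eax / jz +5 / xor eax, eax / ret
    original = bytes.fromhex("85c0740531c0c3")
    print(patch_short_jz(original, 2, always_jump=True).hex())
    print(patch_short_jz(original, 2, always_jump=False).hex())
```

The offset passed in is a file or buffer offset, not a virtual address, so when patching on disk you first need to translate the address shown in the disassembler.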
8. Resources for Practice
Crackmes and CTF Challenges:
- Crackmes.one: User-submitted reverse engineering challenges.
- Microcorruption: Embedded CTF with a custom debugger.
- PicoCTF: Beginner-friendly CTF with RE challenges.
- OverTheWire Narnia: Exploit development.
Learning Resources:
- Ghidra Courses - YouTube: Many free tutorials.
- Practical Malware Analysis (Book): The bible of malware RE.
- Reverse Engineering for Beginners (free ebook): Comprehensive x86/ARM coverage.
Conclusion
Reverse engineering is more than just a technical skill; it is a mindset of relentless curiosity. You are not just reading code; you are reconstructing the intent of another programmer from the machine’s perspective. By mastering the structure of executables, the language of assembly, the tools of disassembly and debugging, and the techniques for bypassing protections, you transform from a user of tools into a creator of exploits and a hunter of threats.
The binary is not a barrier; it is an invitation.
Happy hacking!