Reverse engineering is a vital skill for red teamers and pen testers. Most software ships as compiled binaries without source code, and some of it is deliberately hardened against analysis. Reverse engineering lets pen testers understand how such software works, identify vulnerabilities, and build effective exploits. This article introduces the basic reverse engineering concepts and provides code examples to illustrate the process.
What is Reverse Engineering?
Reverse engineering is the process of analyzing a software application or system to understand how it works without access to its source code. It relies on a range of techniques to extract useful information from the software, such as its functionality, algorithms, protocols, and data structures. The results can serve different purposes, such as identifying security vulnerabilities, developing exploits, and creating compatible software or hardware.
Basic Concepts of Reverse Engineering
Disassembling
Disassembling is the process of translating a program’s compiled machine code into human-readable assembly language. Disassembly is essential in reverse engineering because it helps pen testers understand the program’s architecture, identify its functions, and follow its control flow. Dedicated disassemblers such as IDA Pro, Ghidra, or Binary Ninja are common choices, and most debuggers can disassemble code as well.
Here’s an example of disassembling a simple C program using the GNU Debugger (gdb):
$ gdb -q ./example
(gdb) disassemble main
Dump of assembler code for function main:
0x000000000040112d <+0>: push %rbp
0x000000000040112e <+1>: mov %rsp,%rbp
0x0000000000401131 <+4>: sub $0x10,%rsp
0x0000000000401135 <+8>: movl $0x0,-0x4(%rbp)
0x000000000040113c <+15>: mov $0x2,%eax
0x0000000000401141 <+20>: add $0x1,%eax
0x0000000000401144 <+23>: mov %eax,-0x8(%rbp)
0x0000000000401147 <+26>: movl $0x0,-0x4(%rbp)
0x000000000040114e <+33>: leave
0x000000000040114f <+34>: ret
End of assembler dump.
In the example above, we used gdb to disassemble the main function of a simple C program called “example.” Each line of the output shows the memory address of an instruction, its offset from the start of main, and the instruction itself in AT&T syntax (gdb’s default on x86, with the source operand written first).
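To make the idea of mapping bytes back to instructions concrete, here is a minimal sketch of a table-driven decoder for a handful of one-byte x86-64 opcodes. The function names decode_one_byte_opcode and print_listing are hypothetical, and the table is deliberately tiny: real disassemblers also handle prefixes, operands, and variable-length encodings, which this sketch does not attempt.

```c
#include <stddef.h>
#include <stdio.h>

/* Return the AT&T-style text for a few one-byte x86-64 opcodes, or NULL
   if the byte is not in this (deliberately tiny) table. */
const char *decode_one_byte_opcode(unsigned char opcode) {
    switch (opcode) {
    case 0x55: return "push %rbp";
    case 0x5d: return "pop %rbp";
    case 0x90: return "nop";
    case 0xc3: return "ret";
    case 0xc9: return "leave";
    default:   return NULL;
    }
}

/* Print one line per decoded byte and stop at the first byte that is not
   in the table. Returns the number of bytes decoded. */
size_t print_listing(const unsigned char *code, size_t len, unsigned long base) {
    size_t i = 0;
    for (; i < len; i++) {
        const char *text = decode_one_byte_opcode(code[i]);
        if (text == NULL) {
            break; /* multi-byte or unknown instruction: out of scope here */
        }
        printf("0x%016lx <+%zu>: %s\n", base + i, i, text);
    }
    return i;
}
```

Calling print_listing on the two bytes c9 c3 decodes the leave and ret pair that ends the function shown above. It stops immediately on a byte such as 48, which is a prefix that starts a longer instruction.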
Decompiling
Decompiling is the process of translating a program’s compiled code back into high-level code that approximates the original source. Decompiling is helpful in reverse engineering because it gives pen testers a better understanding of the program’s structure than raw assembly does. Decompiling can be done using tools such as the Hex-Rays decompiler for IDA Pro, Ghidra, or, for Java bytecode, JD-GUI and the decompiler built into JetBrains IntelliJ IDEA.
Here’s an example of decompiling a simple Java program using JD-GUI:
public class Example {
    public static void main(String[] args) {
        int a = 0;
        int b = 2;
        int c = a + b;
        System.out.println("Result: " + c);
    }
}
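In the example above, we used JD-GUI to decompile a simple Java program called “Example.” The decompiled code is a reconstruction of the original Java source, including the variable declarations, assignments, and method calls. Java bytecode keeps class and method names, so the result is usually close to the original. Local variable names survive only if the class was compiled with debug information, and comments are always lost.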
Patching
Patching is the process of modifying a program’s compiled binary code to fix vulnerabilities or change its behavior. Patching is helpful in reverse engineering because it allows pen testers to bypass security measures or add custom functionality. Patching can be done manually with a hex editor or from within a debugger that supports patching, such as OllyDbg or x64dbg.
Here’s an example of patching a simple C program to remove a security measure:
#include <stdio.h>
void secret_function(void) {
    printf("Secret function called\n");
}
int main() {
    int secret_key = 1234;
    int user_key;
    printf("Enter the secret key: ");
    scanf("%d", &user_key);
    if (user_key == secret_key) {
        secret_function();
    }
    else {
        printf("Wrong key\n");
    }
    return 0;
}
In the example above, a simple C program calls “secret_function” only if the user enters the correct secret key. We can patch the program so that it always calls “secret_function” by changing the conditional jump that follows the key comparison. Here is the disassembly of main:
0804853e <main>:
804853e: 55 push %ebp
804853f: 89 e5 mov %esp,%ebp
8048541: 83 ec 28 sub $0x28,%esp
8048544: c7 45 f4 d2 04 00 00 movl $0x4d2,-0xc(%ebp)
804854b: 8d 45 f4 lea -0xc(%ebp),%eax
804854e: 50 push %eax
804854f: 68 90 86 04 08 push $0x8048690
8048554: e8 b7 fe ff ff call 8048410 <printf@plt>
8048559: 83 ec 0c sub $0xc,%esp
804855c: 8d 45 f4 lea -0xc(%ebp),%eax
804855f: 50 push %eax
8048560: 68 a0 86 04 08 push $0x80486a0
8048565: e8 a6 fe ff ff call 8048410 <printf@plt>
804856a: 83 c4 10 add $0x10,%esp
804856d: 83 ec 08 sub $0x8,%esp
8048570: 8d 45 f8 lea -0x8(%ebp),%eax
8048573: 50 push %eax
8048574: 68 c7 86 04 08 push $0x80486c7
8048579: e8 88 fe ff ff call 8048406 <scanf@plt>
804857e: 83 c4 10 add $0x10,%esp
8048581: 8b 45 f8 mov -0x8(%ebp),%eax
8048584: 3d d2 04 00 00 cmp $0x4d2,%eax
8048589: 75 07 jne 8048592 <main+84>
804858b: e8 95 ff ff ff call 8048525 <secret_function>
8048590: eb 10 jmp 80485a2 <main+100>
8048592: 83 ec 0c sub $0xc,%esp
8048595: 68 d8 86 04 08 push $0x80486d8
804859a: e8 71 fe ff ff call 8048410 <printf@plt>
804859f: 83 c4 10 add $0x10,%esp
80485a2: b8 00 00 00 00 mov $0x0,%eax
80485a7: c9 leave
80485a8: c3 ret
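In the listing above, the “jne” instruction at memory address “0x8048589” (bytes 75 07) skips the call to “secret_function” whenever the entered key does not match. Overwriting those two bytes with two “nop” instructions (90 90) removes the branch, so execution always falls through to the call and the program calls “secret_function” regardless of the user’s input. Replacing “jne” with an unconditional “jmp” would have the opposite effect: the program would always take the “Wrong key” branch.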
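One way to force the call is to overwrite the two bytes of the conditional jump (75 07 in the listing) with two one-byte NOPs (90 90), so that execution falls through to the call. The sketch below applies such a patch to an in-memory copy of the code and refuses to write unless the bytes currently at the target match what we expect. The function name patch_bytes is hypothetical. When patching a file on disk, the file offset of the instruction has to be worked out from the executable's headers; it is not the same number as the virtual address.

```c
#include <stddef.h>
#include <string.h>

/* Overwrite n bytes at `offset` with `replacement`, but only if the bytes
   currently stored there equal `expected`.
   Returns 0 on success and -1 if the range is invalid or the bytes differ. */
int patch_bytes(unsigned char *image, size_t image_len, size_t offset,
                const unsigned char *expected,
                const unsigned char *replacement, size_t n) {
    if (offset > image_len || n > image_len - offset) {
        return -1; /* patch would run past the end of the image */
    }
    if (memcmp(image + offset, expected, n) != 0) {
        return -1; /* wrong location or already patched: do not touch */
    }
    memcpy(image + offset, replacement, n);
    return 0;
}
```

For instance, with image holding the bytes 3d d2 04 00 00 75 07 e8 (the cmp, the jne, and the first byte of the call), calling patch_bytes with offset 5, expected bytes 75 07, and replacement bytes 90 90 turns the conditional jump into two NOPs. The check against the expected bytes guards against patching the wrong build of a binary.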
Dynamic Analysis
Dynamic analysis is the process of analyzing a program while it runs in order to understand its behavior, identify vulnerabilities, and test its defenses. Dynamic analysis can be done using different tools, such as debuggers, profilers, and fuzzers. Dynamic analysis is helpful in reverse engineering because it provides real-time feedback on the program’s behavior, which can help pen testers identify vulnerabilities that are hard to find with static analysis alone.
Here’s an example of using dynamic analysis to identify a buffer overflow vulnerability in a simple C program:
#include <stdio.h>
int main() {
    char buffer[10];
    printf("Enter your name: ");
    gets(buffer);
    printf("Hello, %s!\n", buffer);
    return 0;
}
In the example above, a simple C program reads a string from the user with the “gets” function, which performs no bounds checking and therefore overflows the 10-byte buffer whenever the input is longer than 9 characters. We can use a debugger like gdb to analyze the program while it is running and identify the vulnerable code path:
$ gdb -q ./example
(gdb) run
Starting program: /home/user/example
Enter your name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Program received signal SIGSEGV, Segmentation fault.
0x41414141 in ?? ()
(gdb) bt
#0 0x41414141 in ?? ()
#1 0x08048491 in main ()
(gdb) info registers
eax 0x0 0
ecx 0x0 0
edx 0xb7fb1000 -1208233984
ebx 0xb7e818e0 -1208566112
esp 0xbffff3d0 0xbffff3d0
ebp 0xbffff3d8 0xbffff3d8
esi 0x0 0
edi 0x0 0
eip 0x41414141 0x41414141
eflags 0x10202 [ IF RF ]
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
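es 0x7b 123
The crash report shows that eip holds 0x41414141 rather than a valid code address. The oversized input overran the 10-byte buffer and overwrote the saved return address on the stack, so when main returned, the processor tried to execute code at an address made up of input bytes.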
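A run of identical characters shows that the return address can be overwritten, but not which input bytes ended up in it. A common follow-up is to feed the program a non-repeating pattern and look up the crashed value in that pattern. The sketch below is a minimal version of that idea, similar in spirit to the cyclic patterns used by exploit-development tools. The function names make_pattern and find_offset are hypothetical, and the lookup assumes a 32-bit little-endian target like the one in the gdb session above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill out[0..len-1] with the sequence "Aa0Aa1Aa2...Aa9Ab0..." and add a
   terminating NUL. `out` must have room for len + 1 bytes. */
void make_pattern(char *out, size_t len) {
    for (size_t i = 0; i < len; i++) {
        size_t group = i / 3; /* index of the 3-character group */
        switch (i % 3) {
        case 0:  out[i] = (char)('A' + (group / 260) % 26); break;
        case 1:  out[i] = (char)('a' + (group / 10) % 26);  break;
        default: out[i] = (char)('0' + group % 10);         break;
        }
    }
    out[len] = '\0';
}

/* Find where the four bytes of `value` (as read from a little-endian
   register such as eip) occur in the pattern. Returns the offset, or -1. */
long find_offset(const char *pattern, size_t len, uint32_t value) {
    unsigned char needle[4];
    needle[0] = (unsigned char)(value & 0xff);
    needle[1] = (unsigned char)((value >> 8) & 0xff);
    needle[2] = (unsigned char)((value >> 16) & 0xff);
    needle[3] = (unsigned char)((value >> 24) & 0xff);
    for (size_t i = 0; i + 4 <= len; i++) {
        if (memcmp(pattern + i, needle, 4) == 0) {
            return (long)i;
        }
    }
    return -1;
}
```

If the program crashes with a pattern value in eip, find_offset reports how many input bytes precede the saved return address. The real offset depends on the compiler and its stack layout, so it has to be measured rather than assumed.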
Obfuscation
Obfuscation is a key tactic that malware developers use to conceal the nature and purpose of their code. It means deliberately making code hard to understand, so that it is harder to analyze or reverse engineer. Common forms include code obfuscation, string obfuscation, and control flow obfuscation.
Code obfuscation is the most common method malware creators employ to make their programs harder to understand. The technique modifies the structure and syntax of the code without affecting its functionality. This can include renaming variables and functions, inserting bogus code snippets, and adding redundant code blocks. The result is harder for an analyst to read, and it can also defeat signature-based detection by antivirus software.
String obfuscation hides the strings a program uses, such as URLs, file names, and messages, so that they cannot be found by simply searching the binary. Attackers typically use encryption, compression, or encoding to obscure the strings and restore them only at run time.
Control flow obfuscation modifies the logical flow of the program so that its behavior is harder to determine. This is done by adding dead code, inserting junk instructions, and restructuring the program’s control flow graph, for example with conditions whose outcome is fixed but not obvious.
The primary goal of obfuscation in malware is to evade detection by antivirus software and to slow down reverse engineers. Obfuscated code makes it difficult for security researchers to understand how the malware works, what it does, and how to remove it, so researchers need to understand these techniques in order to counter them.
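As an illustration of control flow obfuscation, the following sketch guards a simple addition with an opaque predicate, a condition that always evaluates the same way but does not look constant at first glance. The function names add_plain and add_obfuscated are hypothetical. An optimizing compiler may well see through a predicate this simple and remove the dead branch, so real obfuscators use harder ones.

```c
/* Plain version, for comparison. */
unsigned add_plain(unsigned a, unsigned b) {
    return a + b;
}

/* Same result, but hidden behind an opaque predicate. */
unsigned add_obfuscated(unsigned a, unsigned b) {
    /* The product of two consecutive integers is always even, and this
       still holds under unsigned wraparound, so the test is always true. */
    if ((a * (a + 1u)) % 2u == 0u) {
        return a + b;   /* the real computation */
    }
    return a * b + 42u; /* dead code: never executed, only adds noise */
}
```

Both functions return the same value for every input. The second one gives a disassembler an extra branch and an extra basic block to account for.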
Here’s an example of using string obfuscation to hide a secret message in a simple C program:
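#include <stdio.h>
#include <string.h>
int main(void) {
    const char *secret = "This is a secret message";
    /* Use strlen here: sizeof(secret) would give the size of the pointer */
    size_t len = strlen(secret);
    for (size_t i = 0; i < len; i++) {
        /* XOR each byte with 0xff and print the encoded byte in hex */
        printf("%02x ", (unsigned char)(secret[i] ^ 0xff));
    }
    printf("\n");
    return 0;
}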
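In the example above, a simple C program XORs each character of a message with the value 0xff and prints the result, which no longer resembles the original text. The plaintext literal is still stored in the compiled binary here, so a string search would find it. In practice, only the encoded bytes are embedded in the program, and they are decoded at run time just before use.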
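The run-time side of this technique can be sketched as follows: the binary contains only XOR-encoded bytes, and a small routine decodes them when the string is needed. The names xor_decode and encoded_secret are hypothetical, and the encoded bytes are the word "secret" XORed with 0xff, chosen purely for illustration.

```c
#include <stddef.h>

/* The word "secret" with every byte XORed with 0xff. Only these bytes,
   not the plaintext, are stored in the compiled program. */
static const unsigned char encoded_secret[] = {
    0x8c, 0x9a, 0x9c, 0x8d, 0x9a, 0x8b
};

/* XOR every byte of `in` with `key`, write the result to `out`, and add
   a terminating NUL. `out` must have room for len + 1 bytes. */
void xor_decode(const unsigned char *in, size_t len, unsigned char key,
                char *out) {
    for (size_t i = 0; i < len; i++) {
        out[i] = (char)(in[i] ^ key);
    }
    out[len] = '\0';
}
```

Calling xor_decode(encoded_secret, sizeof encoded_secret, 0xff, buffer) with a 7-byte buffer recovers the original string. A single-byte XOR is trivial for an analyst to undo once it is spotted, which is why it only raises the bar against simple string searches.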
Conclusion
Reverse engineering is an essential skill for red teamers and pen testers. It allows pen testers to understand how software works, identify vulnerabilities, and create effective exploits. In this article, we introduced you to the basic concepts of reverse engineering, including disassembling, decompiling, patching, dynamic analysis, and obfuscation. We also provided code examples to help you understand the process. With this knowledge, you will be better equipped to reverse engineer software applications and identify vulnerabilities attackers can exploit.