Reverse Engineering - Introduction and Basic Concepts

Reverse engineering is a vital skill for red teamers and pen testers. Most commercial software is distributed only as compiled binaries, often with protections meant to hinder analysis. Reverse engineering lets pen testers work past that: it reveals how the software actually works, where its vulnerabilities are, and how to build effective exploits. This article introduces the basic concepts of reverse engineering and provides code examples to help you understand the process.

What is Reverse Engineering?

Reverse engineering is the process of analyzing a software application or system to understand how it works without access to its source code. It uses a range of techniques to extract useful information from the software, such as its functionality, algorithms, protocols, and data structures. Reverse engineering serves many purposes, including identifying security vulnerabilities, developing exploits, and building compatible software or hardware.

Basic Concepts of Reverse Engineering

Disassembling

Disassembling is the process of translating a program’s machine code into human-readable assembly language. Disassembly is essential in reverse engineering because it helps pen testers understand the program’s structure, identify its functions, and trace its control flow. Common disassemblers include IDA Pro, Ghidra, and Binary Ninja; debuggers such as gdb and command-line tools such as objdump can also disassemble code.

Here’s an example of disassembling a simple C program using the GNU Debugger (gdb):

$ gdb -q ./example
(gdb) disassemble main
Dump of assembler code for function main:
   0x000000000040112d <+0>:     push   %rbp
   0x000000000040112e <+1>:     mov    %rsp,%rbp
   0x0000000000401131 <+4>:     sub    $0x10,%rsp
   0x0000000000401135 <+8>:     movl   $0x0,-0x4(%rbp)
   0x000000000040113c <+15>:    mov    $0x2,%eax
   0x0000000000401141 <+20>:    add    $0x1,%eax
   0x0000000000401144 <+23>:    mov    %eax,-0x8(%rbp)
   0x0000000000401147 <+26>:    movl   $0x0,-0x4(%rbp)
   0x000000000040114e <+33>:    leave
   0x000000000040114f <+34>:    ret
End of assembler dump.

In the example above, gdb disassembles the main function of a small C program called “example.” Each line shows the instruction’s memory address, its offset in bytes from the start of main (<+N>), and the instruction itself in AT&T syntax, which is gdb’s default on x86 (set disassembly-flavor intel switches to Intel syntax).

Decompiling

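Decompiling transforms a program’s compiled code back into a high-level language such as C or Java. The result is a reconstruction, not the original source: comments are lost, and names, types, and control structures have to be inferred unless debug information or, for Java, class-file metadata preserves them. Even so, decompiled code helps pen testers understand a program’s logic and structure far more easily than raw assembly. Common decompilers include the Hex-Rays decompiler for IDA Pro, Ghidra’s built-in decompiler, and, for Java bytecode, the decompiler bundled with JetBrains IntelliJ IDEA.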

Here’s an example of what JD-GUI might show when decompiling a simple compiled Java class:

public class Example {
  public static void main(String[] args) {
    int a = 0;
    int b = 2;
    int c = a + b;
    System.out.println("Result: " + c);
  }
}

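In the example above, JD-GUI decompiled a simple Java class called “Example.” Because Java bytecode keeps class, method, and field names, the output closely resembles the original source, including the variable declarations, assignments, and method calls. Local variable names, however, are recovered only if the class was compiled with debug information (for example, javac -g); otherwise the decompiler has to invent them.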

Patching

Patching is modifying a program’s compiled code to fix vulnerabilities or change its behavior. In reverse engineering, it lets pen testers bypass security checks or add custom functionality. Patches can be applied manually by editing bytes in a hex editor, or interactively with a debugger such as OllyDbg or x64dbg, which can assemble a new instruction in place and save the modified executable.

Here’s an example of patching a simple C program to remove a security measure:

#include <stdio.h>

void secret_function(void) {
  printf("Secret function called\n");
}

int main() {
  int secret_key = 1234;
  int user_key;

  printf("Enter the secret key: ");
  scanf("%d", &user_key);

  if (user_key == secret_key) {
    secret_function();
  }
  else {
    printf("Wrong key\n");
  }

  return 0;
}

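In the example above, the program calls “secret_function” only if the user enters the correct secret key (1234, or 0x4d2). We can patch it to call “secret_function” unconditionally by neutralizing the conditional jump that follows the key comparison. The listing below is an illustrative, simplified objdump-style disassembly of main from a 32-bit build; real addresses and instruction sequences depend on the compiler and its flags: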

0804853e <main>:
 804853e: 55                    push   %ebp
 804853f: 89 e5                 mov    %esp,%ebp
 8048541: 83 ec 28              sub    $0x28,%esp
 8048544: c7 45 f4 d2 04 00 00  movl   $0x4d2,-0xc(%ebp)
 804854b: 8d 45 f4              lea    -0xc(%ebp),%eax
 804854e: 50                    push   %eax
 804854f: 68 90 86 04 08        push   $0x8048690
 8048554: e8 b7 fe ff ff        call   8048410 <printf@plt>
 8048559: 83 ec 0c              sub    $0xc,%esp
 804855c: 8d 45 f4              lea    -0xc(%ebp),%eax
 804855f: 50                    push   %eax
 8048560: 68 a0 86 04 08        push   $0x80486a0
 8048565: e8 a6 fe ff ff        call   8048410 <printf@plt>
 804856a: 83 c4 10              add    $0x10,%esp
 804856d: 83 ec 08              sub    $0x8,%esp
 8048570: 8d 45 f8              lea    -0x8(%ebp),%eax
 8048573: 50                    push   %eax
 8048574: 68 c7 86 04 08        push   $0x80486c7
 8048579: e8 88 fe ff ff        call   8048406 <scanf@plt>
 804857e: 83 c4 10              add    $0x10,%esp
 8048581: 8b 45 f8              mov    -0x8(%ebp),%eax
 8048584: 3d d2 04 00 00        cmp    $0x4d2,%eax
 8048589: 75 07                 jne    8048592 <main+0x54>
 804858b: e8 10 00 00 00        call   80485a0 <secret_function>
 8048590: eb 10                 jmp    80485a2 <main+0x64>
 8048592: 83 ec 0c              sub    $0xc,%esp
 8048595: 68 d8 86 04 08        push   $0x80486d8
 804859a: e8 71 fe ff ff        call   8048410 <printf@plt>
 804859f: 83 c4 10              add    $0x10,%esp
 80485a2: b8 00 00 00 00        mov    $0x0,%eax
 80485a7: c9                    leave
 80485a8: c3                    ret

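In the listing above, the “cmp” at address 0x8048584 compares the entered value with 0x4d2, and the “jne” at 0x8048589 (bytes 75 07) jumps over the call to “secret_function” whenever the values differ. Replacing those two bytes with two NOP instructions (90 90) removes the branch, so execution always falls through to the call and the program calls “secret_function” regardless of the user’s input. Note that replacing “jne” with an unconditional “jmp” (eb 07) would do the opposite: the program would always skip the call and print “Wrong key.”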

Dynamic Analysis

Dynamic analysis examines a program while it runs to understand its behavior, identify vulnerabilities, and test its defenses. Common tools include debuggers, profilers, and fuzzing frameworks. Because it shows in real time what the program actually does, dynamic analysis can reveal vulnerabilities that are hard to find with static analysis alone.

Here’s an example of using dynamic analysis to identify a buffer overflow vulnerability in a simple C program:

#include <stdio.h>

int main() {
    char buffer[10];
    printf("Enter your name: ");
    gets(buffer);
    printf("Hello, %s!\n", buffer);
    return 0;
}

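In the example above, the program reads a line of input with the “gets” function, which performs no bounds checking: any input longer than the 10-byte buffer can hold overwrites adjacent stack memory, including the saved return address. (“gets” was removed from the C standard in C11, so a modern compiler may accept this program only in an older mode such as -std=gnu99; reproducing the 32-bit crash below also assumes a build without stack protection, for example gcc -m32 -fno-stack-protector.) We can run the program under gdb, feed it an oversized input, and inspect the crash: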

$ gdb -q ./example
(gdb) run
Starting program: /home/user/example
Enter your name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Program received signal SIGSEGV, Segmentation fault.
0x41414141 in ?? ()
(gdb) bt
#0  0x41414141 in ?? ()
#1  0x08048491 in main ()
(gdb) info registers
eax            0x0                 0
ecx            0x0                 0
edx            0xb7fb1000          -1208283136
ebx            0xb7e818e0          -1209526048
esp            0xbffff3d0          0xbffff3d0
ebp            0xbffff3d8          0xbffff3d8
esi            0x0                 0
edi            0x0                 0
eip            0x41414141          0x41414141
eflags         0x10202             [ IF RF ]
cs             0x73                115
ss             0x7b                123
ds             0x7b                123
es             0x7b                123

Obfuscation

Obfuscation is a crucial tactic in the malware developer’s arsenal for concealing the true nature and purpose of malicious code. It deliberately makes code unintelligible so that it is harder to analyze or reverse engineer, and it is commonly achieved through code obfuscation, string obfuscation, and control flow obfuscation.

Code obfuscation is the most common method malware authors use to make their programs harder to understand. It changes the structure and syntax of the code without affecting its functionality, for example by renaming variables and functions to meaningless identifiers, inserting bogus code snippets, and adding redundant code blocks. The result is harder to read, and because the added code changes the program’s byte patterns, it can also help the malware evade signature-based antivirus detection.

String obfuscation hides the strings a program uses, such as URLs, file paths, commands, and messages, which would otherwise give an analyst quick clues about its behavior. Attackers typically encrypt, compress, or encode these strings and decode them only at run time, so a tool like strings reveals nothing useful.

Control flow obfuscation rewrites the logical flow of the program so that its behavior is harder to determine. Common techniques include inserting dead code and junk instructions, adding opaque predicates (conditions whose outcome is fixed but hard to work out statically), and flattening the control flow graph so that every basic block is dispatched from a single loop.

The primary goal of obfuscation in malware is to evade detection by antivirus software and to make it harder for reverse engineers to analyze the code. By obfuscating their code, attackers can make it difficult for security researchers to understand how the malware works, what it does, and how to remove it. Obfuscation remains a core tactic for malware developers, making it critical for security researchers to recognize and counter these techniques.

Here’s an example of using string obfuscation to hide a secret message in a simple C program:

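#include <stdio.h>

int main() {
  /* "This is a secret message", stored XOR-encoded with 0xff so that the
     plaintext never appears in the compiled binary */
  static const unsigned char secret[] = {
    0xab, 0x97, 0x96, 0x8c, 0xdf, 0x96, 0x8c, 0xdf, 0x9e, 0xdf, 0x8c, 0x9a,
    0x9c, 0x8d, 0x9a, 0x8b, 0xdf, 0x92, 0x9a, 0x8c, 0x8c, 0x9e, 0x98, 0x9a
  };
  /* sizeof gives the byte count here because secret is an array, not a pointer */
  for (size_t i = 0; i < sizeof(secret); i++) {
    putchar(secret[i] ^ 0xff);  /* decode each byte only at run time */
  }
  putchar('\n');
  return 0;
}

In the example above, the message is stored in the program only in XOR-encoded form (each byte XORed with 0xff) and is decoded one character at a time when the program runs. Running strings on the compiled binary therefore does not reveal the plaintext, which makes it harder for a reverse engineer to spot the message during static analysis.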

Conclusion

Reverse engineering is an essential skill for red teamers and pen testers. It allows pen testers to understand how software works, identify vulnerabilities, and create effective exploits. In this article, we introduced you to the basic concepts of reverse engineering, including disassembling, decompiling, patching, dynamic analysis, and obfuscation. We also provided code examples to help you understand the process. With this knowledge, you will be better equipped to reverse engineer software applications and identify vulnerabilities attackers can exploit.
