In today’s digital age, malware is a pervasive threat to the integrity and confidentiality of data in both personal and corporate environments. As a red teamer or pen tester, you need to understand how malware works and how to analyze it. One of the essential skills in malware analysis is disassembling malicious code to understand what it does. In this article, we will explore disassembly techniques that can help you dissect and understand malware.
What is Disassembly?
Disassembly is the process of converting binary code into human-readable assembly language instructions. It is an essential technique in malware analysis since malware authors often try to obfuscate their code to make it difficult to analyze. Disassembling malware code can provide insight into its functionality, identify system calls and functions used by the malware, and determine if the code is packed or obfuscated.
There are two main types of disassembly techniques: static and dynamic. Static disassembly involves analyzing the binary code without executing it, while dynamic disassembly involves executing the code and analyzing it as it runs.
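One common static heuristic for the "is it packed?" question mentioned above is byte entropy: compressed or encrypted data looks close to random. The following is a minimal sketch, not a definitive detector. The function names and the 7.2 threshold are assumptions chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical threshold: values close to 8.0 suggest compressed or
# encrypted content, which is typical of packed sections.
PACKED_THRESHOLD = 7.2


def looks_packed(data: bytes, threshold: float = PACKED_THRESHOLD) -> bool:
    """Flag a byte range whose entropy is at or above the threshold."""
    return shannon_entropy(data) >= threshold
```

In practice the check is usually applied per section rather than to the whole file, since a packed payload can hide behind a small low-entropy unpacking stub.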
Static Disassembly Techniques
Static disassembly techniques are commonly used to analyze malware, as they can be performed without executing the code. Here are some of the most common static disassembly techniques:
Manual Disassembly
Manual disassembly is a crucial skill in malware analysis, especially when dealing with complex and obfuscated code that automated tools cannot handle. Manual disassembly involves translating machine code into assembly language instructions manually, allowing an analyst to understand the underlying functionality of the code.
To perform manual disassembly, an analyst must have a solid understanding of assembly language and the architecture of the target system. The analyst must also be familiar with the various instructions used by the processor, the function calling conventions, and the structure of the executable file format.
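To give a feel for what hand-decoding involves, here is a small sketch that decodes a single x86 instruction form: the 5-byte “mov r32, imm32” encoding, whose opcode byte is 0xB8 plus the register number. The function name is an assumption for illustration, and the decoder deliberately handles only this one encoding without prefixes.

```python
import struct

# Register order for the B8+r opcode family (32-bit operand size, no REX prefix).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_imm32(raw: bytes) -> str:
    """Decode a 5-byte `mov r32, imm32` instruction into AT&T syntax."""
    if len(raw) != 5 or not 0xB8 <= raw[0] <= 0xBF:
        raise ValueError("not a B8+r `mov r32, imm32` encoding")
    reg = REG32[raw[0] - 0xB8]              # low 3 bits of the opcode select the register
    (imm,) = struct.unpack("<I", raw[1:])   # immediate is stored little-endian
    return f"mov ${imm:#x},%{reg}"


print(decode_mov_imm32(bytes.fromhex("be08000000")))  # prints: mov $0x8,%esi
```

The bytes “be 08 00 00 00” are taken from the objdump listing later in this article, so the result can be compared with what the tool prints.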
There are several steps involved in manual disassembly, which we will discuss in detail below:
Identify the entry point
The entry point is the first instruction executed when the program is loaded into memory. To identify the entry point, the analyst must examine the executable file’s header and locate the program’s entry point address. Once the entry point is identified, the analyst can start tracing the program’s execution path. For example, if we look at the output of the “objdump” tool, we can see the entry point of a binary file:
$ objdump -f malware.bin
malware.bin: file format elf64-x86-64
architecture: i386:x86-64, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x400440
From the output, we can see that the entry point of the malware is at the address 0x400440.
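The same value can be read straight from the file header. In an ELF file the entry point is the “e_entry” field, stored at offset 0x18 for both 32-bit and 64-bit files. The sketch below assumes the helper names “elf_entry_point” and “entry_point_of”, which are made up for this example.

```python
import struct


def elf_entry_point(header: bytes) -> int:
    """Return e_entry from the leading bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = header[4], header[5]
    endian = "<" if ei_data == 1 else ">"   # 1 = little-endian, 2 = big-endian
    if ei_class == 2:                        # ELFCLASS64: 8-byte address
        (entry,) = struct.unpack_from(endian + "Q", header, 0x18)
    elif ei_class == 1:                      # ELFCLASS32: 4-byte address
        (entry,) = struct.unpack_from(endian + "I", header, 0x18)
    else:
        raise ValueError("unknown ELF class")
    return entry


def entry_point_of(path: str) -> int:
    """Read the first 64 bytes of a file and return its ELF entry point."""
    with open(path, "rb") as f:
        return elf_entry_point(f.read(64))
```

Calling “hex(entry_point_of("malware.bin"))” on the sample above should give the same start address that objdump reports.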
Trace the program’s execution path
Tracing the program’s execution path involves following the flow of control from the entry point to various functions and subroutines. This process can be time-consuming and requires a deep understanding of the program’s logic and structure. As the analyst traces the execution path, they should annotate the code to keep track of the program’s state and identify key functions and system calls. For example, if we use the “objdump” tool to disassemble the binary code, we can trace the execution path of the malware:
$ objdump -d malware.bin
...
400440: 48 83 ec 08 sub $0x8,%rsp
400444: c7 44 24 04 00 00 00 movl $0x0,0x4(%rsp)
40044b: 00
40044c: 48 8d 44 24 04 lea 0x4(%rsp),%rax
400451: 48 89 04 24 mov %rax,(%rsp)
400455: 48 8d 45 f8 lea -0x8(%rbp),%rax
400459: be 08 00 00 00 mov $0x8,%esi
40045e: ba 02 00 00 00 mov $0x2,%edx
400463: 48 89 c7 mov %rax,%rdi
400466: e8 c5 fe ff ff callq 400330 <memcpy@plt>
...
From the output, we can see that the malware starts at the entry point address 0x400440 and then performs various operations, including a call to the “memcpy” function.
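When tracing by hand, the destination of a call has to be computed from the raw bytes. A near call with opcode E8 stores a signed 32-bit displacement relative to the address of the next instruction. The sketch below shows that arithmetic; the function name is an assumption for illustration.

```python
import struct


def rel32_call_target(address: int, raw: bytes) -> int:
    """Resolve the destination of a 5-byte near call (opcode E8 + rel32)."""
    if len(raw) != 5 or raw[0] != 0xE8:
        raise ValueError("not an E8 rel32 call")
    (rel,) = struct.unpack("<i", raw[1:])   # signed little-endian displacement
    # The displacement is relative to the end of the call instruction.
    return (address + len(raw) + rel) & 0xFFFFFFFFFFFFFFFF


print(hex(rel32_call_target(0x400466, bytes.fromhex("e8c5feffff"))))  # prints: 0x400330
```

The result matches the target objdump shows for the call at 0x400466 in the listing above.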
Identify key functions and system calls
Identifying key functions and system calls is a critical step in manual disassembly, as it can help the analyst understand the program’s functionality and behavior. Key functions and system calls include file I/O, network communication, encryption and decryption, and process injection. The analyst should examine the parameters passed to these functions and the values returned to understand their purpose.
For example, if we use the “objdump” tool to disassemble the code, we can identify key system calls used by the malware:
$ objdump -d malware.bin
...
4004a3: bf 01 00 00 00 mov $0x1,%edi
4004a8: e8 93 fe ff ff callq 400340 <close@plt>
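4004ad: 48 8d 45 f8 lea -0x8(%rbp),%rax
4004b1: ba 08 00 00 00 mov $0x8,%edx
4004b6: 48 89 c6 mov %rax,%rsi
4004b9: bf 01 00 00 00 mov $0x1,%edi
4004be: e8 8d fe ff ff callq 400350 <write@plt>
...
From the output, we can see that the malware calls “close” and “write” through the PLT. These are the libc wrappers around the system calls of the same name. Under the System V AMD64 calling convention the first three arguments are passed in %rdi, %rsi, and %rdx, so the second call is write(1, buf, 8), where buf is the local buffer at -0x8(%rbp).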
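On a long listing it helps to tally the library functions a sample calls before reading any code. The sketch below scans objdump text for calls annotated with “<name@plt>”. It assumes GNU objdump's annotation format, and the function name and sample text are made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as: "... callq 400340 <close@plt>"
CALL_RE = re.compile(r"\bcallq?\s+[0-9a-f]+\s+<([^@>+]+)@plt>")


def imported_calls(listing: str) -> Counter:
    """Count calls to PLT stubs, keyed by the imported function name."""
    return Counter(CALL_RE.findall(listing))


SAMPLE = """\
400466: e8 c5 fe ff ff callq 400330 <memcpy@plt>
4004a8: e8 93 fe ff ff callq 400340 <close@plt>
"""

for name, count in sorted(imported_calls(SAMPLE).items()):
    print(f"{name}: {count}")
```

Functions related to file I/O, networking, or process manipulation that show up in the tally are good starting points for closer inspection.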
Reconstruct high-level code
Once the analyst has identified the key functions and system calls, they can begin reconstructing high-level code to understand the program’s overall behavior. Reconstructing high-level code involves analyzing the code’s flow and logic and creating a pseudocode representation of the program. This process can be challenging, especially for large and complex programs, but it is essential in understanding the program’s functionality fully.
For example, let’s say we are analyzing a malware sample that encrypts files on an infected system. Using manual disassembly, we have identified the key functions used by the malware, including the encryption algorithm and the file I/O operations. We can then create a pseudocode representation of the malware’s behavior:
for each file in the target directory:
    open the file for reading
    read the contents of the file
    encrypt the contents of the file using the encryption algorithm
    close the file
    open the file for writing
    write the encrypted contents to the file
    close the file
By creating a high-level representation of the malware’s behavior, we can gain a deeper understanding of its functionality and behavior.
Automated Disassembly Tools
Automated disassembly tools are an essential component of malware analysis, as they can quickly generate an assembly code representation of the executable file. They can handle large and complex binaries, identify functions and imported API calls, and build cross-references that point the analyst to interesting code.
There are several popular automated disassembly tools available to malware analysts, including IDA Pro, Ghidra, and Binary Ninja. These tools disassemble statically, typically by following control flow from known entry points, and some can supplement this with debugging or emulation.
IDA Pro
IDA Pro is one of the most popular and powerful automated disassembly tools used by malware analysts. It supports a wide range of executable file formats, including Windows PE, Linux ELF, and macOS Mach-O. IDA Pro is primarily a static disassembler, and it also includes an integrated debugger for dynamic analysis.
IDA Pro’s static analysis capabilities allow analysts to quickly identify key system calls and functions, strings, and variables in the code. It also provides a range of advanced features, including cross-referencing, call graph analysis, and interactive debugging.
For example, let’s say we are analyzing a malware sample using IDA Pro. We can load the executable file into IDA Pro and generate an assembly code representation of the code:
.text:0000000000401000 ; =============== S U B R O U T I N E =======================================
.text:0000000000401000
.text:0000000000401000 ; int __cdecl main(int argc, const char **argv, const char **envp)
.text:0000000000401000 _main proc near ; CODE XREF: __libc_start_main+23↑p
.text:0000000000401000
.text:0000000000401000 var_50 = qword ptr [-50h]
.text:0000000000401000 var_48 = qword ptr [-48h]
.text:0000000000401000 var_40 = qword ptr [-40h]
.text:0000000000401000 var_38 = qword ptr [-38h]
.text:0000000000401000 var_30 = qword ptr [-30h]
.text:0000000000401000 var_20 = qword ptr [-20h]
.text:0000000000401000 var_18 = qword ptr [-18h]
.text:0000000000401000 var_10 = qword ptr [-10h]
.text:0000000000401000 var_8 = qword ptr [-8h]
.text:0000000000401000 argc = dword ptr 8
.text:0000000000401000 argv = qword ptr 10h
.text:0000000000401000 envp = qword ptr 18h
.text:0000000000401000
.text:0000000000401000 push rbp
.text:0000000000401001 mov rbp, rsp
.text:0000000000401004 mov [rbp+var_8], rdi
.text:0000000000401008 mov [rbp+var_10], rsi
.text:000000000040100c mov eax, 0
.text:0000000000401011 pop rbp
.text:0000000000401012 retn
.text:0000000000401012 _main endp
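From the output, we can see that the “main” function starts at address 0x401000 and, according to the CODE XREF comment, is called from “__libc_start_main”. Note that “main” is not the binary’s entry point: the ELF entry point is normally “_start”, which runs the C runtime startup code before “main” is called. We can also see the local variables and parameters that IDA Pro has identified for the function.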
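String extraction, one of the static passes mentioned above, is easy to reproduce outside a disassembler. The sketch below pulls runs of printable ASCII out of a byte buffer, in the spirit of the Unix “strings” utility. The function name and the default minimum length of 4 are assumptions for illustration.

```python
import re


def ascii_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII characters at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


blob = b"\x00\x01Hello, world!\x00\xff\xfeabc\x00/bin/sh\x00"
print(ascii_strings(blob))  # prints: ['Hello, world!', '/bin/sh']
```

This only finds single-byte ASCII text. Windows samples often store strings as UTF-16, which needs a separate pattern.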
Ghidra
Ghidra is another popular automated disassembly tool used by malware analysts. It is an open-source reverse engineering framework developed by the NSA that supports a wide range of executable file formats. Ghidra is primarily a static analysis tool, and newer releases also include a debugger.
Ghidra’s static analysis capabilities are similar to those of IDA Pro, allowing analysts to quickly identify key system calls and functions, strings, and variables in the code. It also provides advanced features, including cross-referencing, function graph analysis, and decompilation.
For example, let’s say we are analyzing a malware sample using Ghidra. We can load the executable file into Ghidra, which shows the disassembly listing alongside the decompiler’s C-like view of the selected function:
entry:
undefined8 main(void)
{
int32_t iVar1;
iVar1 = puts("Hello, world!");
return CONCAT71((int7)(iVar1 >> 8),1);
}
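From the output, we can see a function called “main,” which prints the message “Hello, world!” with “puts” and then returns a value. The CONCAT71 expression is Ghidra’s notation for joining a 7-byte value and a 1-byte value into the 8-byte return value.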
Binary Ninja
Binary Ninja is a modern disassembly tool that supports a wide range of executable file formats. It is primarily a static analysis tool that lifts machine code into a series of intermediate languages, up to a high-level C-like representation.
Binary Ninja’s static analysis capabilities are similar to those of IDA Pro and Ghidra, allowing analysts to quickly identify key system calls and functions, strings, and variables in the code. It also provides advanced features, including cross-referencing, function graph analysis, and automatic detection of common code patterns.
For example, let’s say we are analyzing a malware sample using Binary Ninja. We can load the executable file into Binary Ninja and view a function in its high-level, C-like representation:
int main(void)
{
puts("Hello, world!");
return 0;
}
From the output, we can see a function called “main,” which prints the message “Hello, world!” and then returns 0.
Decompilers
Decompilers are powerful tools that allow analysts to convert machine code back into high-level programming languages such as C, C++, or Java. Decompilers can significantly speed up the analysis process by providing a higher-level representation of the program, which can make it easier to identify vulnerabilities and weaknesses.
There are several popular decompilers available to malware analysts, including the Hex-Rays decompiler used from within IDA Pro and the decompiler built into Ghidra. These tools work statically: they lift the disassembly to an intermediate representation and then recover control flow, variables, and types.
IDA Pro
IDA Pro is one of the most popular and powerful platforms for decompilation used by malware analysts. Decompilation in IDA Pro is provided by the Hex-Rays decompiler, which turns the disassembly of a function into C-like pseudocode. It works with the executable file formats IDA Pro supports, including Windows PE, Linux ELF, and macOS Mach-O. The pseudocode view stays linked to IDA Pro’s other features, including cross-referencing, call graph analysis, and interactive debugging.
For example, let’s say we are analyzing a malware sample using IDA Pro. We can load the executable file into IDA Pro and generate a decompiled representation of the code:
int __cdecl main(int argc, const char **argv, const char **envp)
{
int result; // eax
sub_401B00("Hello, World!");
result = 0;
return result;
}
From the output, we can see a function called “main,” which calls the “sub_401B00” function to print the message “Hello, World!” and then returns 0. The name “sub_401B00” is an auto-generated label for an unnamed function at address 0x401B00.
Ghidra
Ghidra is another popular decompiler used by malware analysts. It is an open-source reverse engineering tool that supports a wide range of executable file formats, and its decompiler is built in and works statically on the disassembled code. Ghidra’s decompilation capabilities are similar to those of IDA Pro, giving analysts a C-like view of each function that stays linked to the disassembly listing, cross-references, and function graph.
For example, let’s say we are analyzing a malware sample using Ghidra. We can load the executable file into Ghidra and generate a decompiled representation of the code:
void __cdecl main(int argc,char **argv)
{
char *__s1;
__s1 = "Hello, world!";
puts(__s1);
return;
}
From the output, we can see a function called “main,” which initializes a string and then calls the “puts” function to print the message “Hello, world!”.
Hex-Rays
Hex-Rays is the decompiler that plugs into IDA Pro, and it is also the name of the company that develops IDA Pro. It decompiles statically, turning the disassembly of a function into C-like pseudocode, and it is invoked from within IDA Pro rather than run as a separate program.
For example, let’s say we are analyzing a malware sample using Hex-Rays. We can load the executable file into IDA Pro, select a function, and press F5 to generate a decompiled representation of the code:
int __cdecl main(int argc, const char **argv, const char **envp)
{
sub_4014E0("Hello, World!");
return 0;
}
From the output, we can see a function called “main,” which calls the “sub_4014E0” function to print the message “Hello, World!” and then returns 0.
Java Decompilers
Java decompilers are used to decompile Java class files back into Java source code. Java decompilers are used in situations where a Java program has been compiled and only the compiled bytecode is available. Some popular Java decompilers include JD-GUI, JAD, and Fernflower. For example, let’s say we are analyzing a Java malware sample using JD-GUI. We can load the Java class file into JD-GUI and generate a decompiled representation of the code:
public static void main(String[] args) throws Exception {
    String message = "Hello, World!";
    Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5PADDING");
    byte[] ivBytes = new byte[16];
    Arrays.fill(ivBytes, (byte) 0);
    IvParameterSpec iv = new IvParameterSpec(ivBytes);
    SecretKeySpec key = new SecretKeySpec("mysecretpassword".getBytes("UTF-8"), "AES");
    cipher.init(Cipher.ENCRYPT_MODE, key, iv);
    byte[] encrypted = cipher.doFinal(message.getBytes());
    System.out.println(new String(encrypted));
}
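From the output, we can see that the “main” method uses the Java Cipher class to encrypt the message “Hello, World!” using AES in CBC mode. Two details stand out for an analyst: the key is a hard-coded string and the IV is all zeros, so anything encrypted this way can be decrypted once the sample has been decompiled.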
Python Decompilers
Python decompilers are used to decompile Python bytecode back into Python source code. Python decompilers are used in situations where a Python program has been compiled and only the compiled bytecode is available. Some popular Python decompilers include uncompyle6 and decompyle3; when decompilation fails, the standard library’s “dis” module can still disassemble the bytecode. For example, let’s say we are analyzing a Python malware sample using uncompyle6. We can load the Python bytecode file into uncompyle6 and generate a decompiled representation of the code:
from Crypto.Cipher import AES
from Crypto.Util.Padding import pad

def main():
    message = "Hello, World!"
    key = b"mysecretpassword"
    iv = b'\x00' * 16
    aes = AES.new(key, AES.MODE_CBC, iv)
    ciphertext = aes.encrypt(pad(message.encode(), AES.block_size))
    print(ciphertext)
From the output, we can see a function called “main,” which uses the PyCryptodome “Crypto.Cipher.AES” API to encrypt the message “Hello, World!” using AES in CBC mode, again with a hard-coded key and an all-zero IV.
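Even without a decompiler, a compiled Python code object exposes the names it references and the constants it embeds. The sketch below walks a code object tree and collects both. The function name “summarize_code” and the sample source are assumptions for illustration, and the source is only compiled, never executed.

```python
import types


def summarize_code(code: types.CodeType):
    """Collect referenced names and str/bytes constants from a code object tree."""
    names, consts = set(code.co_names), set()
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            # Nested functions and classes are stored as code objects in co_consts.
            sub_names, sub_consts = summarize_code(const)
            names |= sub_names
            consts |= sub_consts
        elif isinstance(const, (str, bytes)):
            consts.add(const)
    return names, consts


SOURCE = '''
def main():
    key = b"mysecretpassword"
    aes = AES.new(key, AES.MODE_CBC, b"\\x00" * 16)
'''

names, consts = summarize_code(compile(SOURCE, "<sample>", "exec"))
print(sorted(names))
print(b"mysecretpassword" in consts)  # prints: True
```

For a real “.pyc” file, the code object is typically obtained by skipping the file header and passing the rest to “marshal.loads”. The marshal format is tied to the Python version, so this should be done with the same interpreter version that produced the file.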
Dynamic Disassembly Techniques
Dynamic disassembly techniques involve executing the malware code and analyzing it as it runs. Here are some common dynamic disassembly techniques:
Debuggers
Debuggers are powerful tools that allow analysts to step through an executable file’s code and interact with it as it is running. Debuggers can be used to identify the cause of program crashes, analyze program behavior, and identify security vulnerabilities.
There are several popular debuggers available to malware analysts, including gdb, OllyDbg, and WinDbg. These tools use a variety of techniques to debug code, including setting breakpoints, stepping through code, and examining memory.
gdb
gdb is a powerful debugger that is commonly used in Linux environments, where it works with ELF executables. gdb can be used to set breakpoints, examine memory, and step through code. For example, let’s say we are analyzing a Linux malware sample using gdb. We can load the executable file into gdb and set a breakpoint on “main”. This works only if the binary still has symbols; for a stripped sample, use “info files” to find the entry point address and set the breakpoint with “break *ADDRESS” instead:
(gdb) file malware
Reading symbols from malware...
(gdb) break main
Breakpoint 1 at 0x40113c
(gdb) run
Starting program: /home/user/malware
Breakpoint 1, 0x000000000040113c in main ()
From the output, we can see that gdb has successfully loaded the executable file and stopped at the breakpoint on “main”. We can now use gdb to step through the code and examine memory.
OllyDbg
OllyDbg is a powerful debugger that is commonly used in Windows environments. It is an assembly-level debugger for 32-bit Windows PE executables. OllyDbg can be used to set breakpoints, examine memory, and step through code. For example, let’s say we are analyzing a Windows malware sample using OllyDbg. We can load the executable file into OllyDbg and set a breakpoint at the entry point of the program:
- File > Open
- Select malware.exe
- Press Ctrl+G and enter the address of the entry point of the program
- Press F2 to toggle a breakpoint on the selected instruction
- Press F9 to run the program
After these steps, OllyDbg has loaded the executable file and stops at the breakpoint at the entry point of the program. We can now use OllyDbg to step through the code and examine memory.
WinDbg
WinDbg is Microsoft’s debugger for Windows, and it supports both user-mode and kernel-mode debugging. WinDbg can be used to set breakpoints, examine memory, and step through code. For example, let’s say we are analyzing a Windows malware sample using WinDbg. We can load the executable file into WinDbg and set a breakpoint at the entry point of the program:
- File > Open executable
- Select malware.exe
- Type “bp $exentry” to set a breakpoint at the entry point of the program
- Type “g” to run the program
After these steps, WinDbg has loaded the executable file and stops at the breakpoint at the entry point of the program. We can now use WinDbg to step through the code and examine memory.
Dynamic Binary Instrumentation (DBI)
Dynamic Binary Instrumentation (DBI) is a powerful technique used by malware analysts to observe and manipulate the behavior of a running program. DBI allows analysts to monitor system calls, API calls, and other events in real-time and analyze the program’s behavior. DBI can be used to identify malware behavior, detect malicious activity, and develop countermeasures.
There are several popular DBI frameworks available to malware analysts, including Pin, DynamoRIO, and Frida. Pin and DynamoRIO work by recompiling the program’s code just in time with the analyst’s instrumentation inserted, while Frida injects an agent into the target process that hooks functions in place.
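Pin, DynamoRIO, and Frida instrument native code, which is hard to show in a few self-contained lines. As a loose, Python-level analogy only, the sketch below uses the interpreter’s “sys.setprofile” hook to record every Python function entered while a target runs, without modifying the target’s code. The function names are assumptions for illustration, and this is not how the native frameworks are implemented.

```python
import sys


def trace_calls(func, *args, **kwargs):
    """Run func and return (result, names of Python functions entered)."""
    seen = []

    def profiler(frame, event, arg):
        # "call" fires on entry to a Python function; C functions use "c_call".
        if event == "call":
            seen.append(frame.f_code.co_name)

    sys.setprofile(profiler)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.setprofile(None)   # always remove the hook again
    return result, seen


def helper(x):
    return x * 2


def sample():
    return helper(21)


result, seen = trace_calls(sample)
print(result, seen)
```

The pattern is the same one the native frameworks follow: install a callback, let the target run unmodified, and collect events as they happen.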
PIN
Pin is a powerful DBI framework developed by Intel. It supports a wide range of executable file formats, including Windows PE, Linux ELF, and macOS Mach-O. Pin can be used to instrument the code, monitor system calls, and intercept API calls. For example, let’s say we are analyzing a Linux malware sample using Pin. We can run the executable file under Pin with a custom tool that instruments the code to monitor system calls:
$ pin -t /path/to/pin/source/tools/MyPinTool/obj-intel64/MyPinTool.so -- /path/to/malware
This command runs the malware under Pin with the MyPinTool instrumentation loaded. What gets recorded depends on the tool; with suitable instrumentation, Pin can monitor system calls and intercept API calls in real-time.
DynamoRIO
DynamoRIO is a powerful DBI framework that originated as a collaboration between Hewlett-Packard Laboratories and MIT and is now an open-source project. It runs on Windows and Linux, among other platforms. DynamoRIO can be used to instrument the code, monitor system calls, and intercept API calls. For example, let’s say we are analyzing a Windows malware sample using DynamoRIO. We can run the executable file under DynamoRIO with a client library that instruments the code to monitor system calls:
> drrun -c /path/to/dynamorio/samples/simple/instr.dll -- malware.exe
This command runs malware.exe under DynamoRIO with the instr.dll client loaded. With a suitable client, we can use DynamoRIO to monitor system calls and intercept API calls in real-time.
Frida
Frida is a powerful DBI framework created by Ole André Vadla Ravnås and sponsored by the security company NowSecure. It supports a wide range of platforms, including Windows, Linux, Android, and iOS. Frida can be used to instrument the code, monitor system calls, and intercept API calls. For example, let’s say we are analyzing an Android malware sample using Frida. We can attach Frida to the running app on a USB-connected device and hook the libc “system” function to log every shell command the app runs:
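import sys

import frida

def on_message(message, data):
    # Print messages and errors reported by the injected script.
    print(message)

# Attach to the running app on a USB-connected device.
process = frida.get_usb_device().attach('com.example.malware')
script = process.create_script("""
Interceptor.attach(Module.findExportByName(null, "system"), {
    onEnter: function(args) {
        console.log("[*] system(" + args[0].readUtf8String() + ")");
    }
});
""")
script.on('message', on_message)
script.load()
# Keep the host script alive so the hook stays installed.
sys.stdin.read()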
Once the script is loaded, Frida logs each call the app makes to “system” together with the command string. The same approach can be used to monitor and intercept other API calls in real-time.
Virtual Machines (VMs)
VMs emulate or virtualize a complete computer system, including hardware and software. VMs are useful for analyzing malware in a safe and controlled environment, as they isolate the malware from the host system and can be restored to a clean snapshot after each run. Some popular VMs for malware analysis include VirtualBox, VMware, and QEMU.
Examples
Let’s look at some real-world examples of how disassembly techniques can be used in malware analysis.
Stuxnet Worm
The Stuxnet worm was a sophisticated piece of malware that targeted industrial control systems. It used four Windows zero-day exploits and was designed to attack specific configurations of Siemens programmable logic controllers.
To analyze Stuxnet, malware analysts used a combination of automated and manual disassembly techniques. They discovered that the malware was packed and used multiple layers of obfuscation to make it difficult to analyze.
Using IDA Pro, analysts were able to identify the worm’s main functions and analyze its behavior. They discovered that Stuxnet used a zero-day exploit to spread via USB drives and network shares, and then used additional exploits to attack specific hardware configurations.
Debugging the malware with WinDbg allowed analysts to monitor its behavior in real-time and identify key system calls used by the malware. They discovered that Stuxnet used a custom protocol to communicate with its command and control servers and that it was capable of modifying the code on infected systems to avoid detection.
WannaCry Ransomware
WannaCry was a global ransomware attack that affected hundreds of thousands of computers in May 2017. It exploited a vulnerability in the Windows SMBv1 file-sharing service to spread across networks and encrypt files on infected systems. To analyze WannaCry, malware analysts used a combination of automated and manual disassembly techniques. They discovered that the malware was packed and used multiple layers of obfuscation to avoid detection.
Using IDA Pro, analysts were able to identify the malware’s encryption algorithm and key system calls. They discovered that WannaCry used the NSA’s EternalBlue exploit to spread via network shares and that it communicated with its command and control servers using the Tor network.
Because WannaCry is a Windows binary, debugging it in a Windows debugger allowed analysts to monitor its behavior in real-time and identify specific function calls. They discovered that WannaCry was capable of scanning for and infecting vulnerable systems in a matter of seconds.
Conclusion
Disassembly techniques are essential in malware analysis, as they allow analysts to understand the functionality of malicious code and identify key system calls and functions used by the malware. By using a combination of automated and manual disassembly techniques, analysts can uncover the behavior of complex and sophisticated malware such as Stuxnet and WannaCry.
Whether you are a red teamer, pen tester, or malware analyst, understanding disassembly techniques is crucial in analyzing and understanding the behavior of malware. By using tools such as IDA Pro, Ghidra, WinDbg, and GDB, you can become a more effective and efficient malware analyst, and help protect against the ever-evolving threat of malware.