Advanced Malware Analysis - Disassembly Techniques

Malware is a pervasive threat to the integrity and confidentiality of data in both personal and corporate environments. As a red teamer or pen tester, you need to understand how malware works and how to analyze it. One of the essential skills in malware analysis is disassembling malicious code to understand what it does. This article explores advanced disassembly techniques that can help you dissect and understand malware.

What is Disassembly?

Disassembly is the process of converting binary code into human-readable assembly language instructions. It is an essential technique in malware analysis since malware authors often try to obfuscate their code to make it difficult to analyze. Disassembling malware code can provide insight into its functionality, identify system calls and functions used by the malware, and determine if the code is packed or obfuscated.
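One common way to judge whether code is packed, before disassembling it, is to measure byte entropy: compressed or encrypted data tends toward 8 bits per byte, while ordinary code and text score lower. The following is a minimal sketch of that check. The 7.0 threshold and the sample inputs are assumptions chosen for illustration, not values from any particular tool.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_packed(data: bytes, threshold: float = 7.0) -> bool:
    """Heuristic only: the 7.0 threshold is an assumed rule of thumb."""
    return shannon_entropy(data) >= threshold


if __name__ == "__main__":
    plain = b"push rbp; mov rbp, rsp; " * 200      # repetitive, low entropy
    uniform = bytes(range(256)) * 16               # stands in for packed data
    print(f"plain:   {shannon_entropy(plain):.2f} bits/byte")
    print(f"uniform: {shannon_entropy(uniform):.2f} bits/byte")
```

A high score is only a hint. Legitimate programs can embed compressed resources, so the result is best used to decide where to look first.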

There are two main types of disassembly techniques: static and dynamic. Static disassembly involves analyzing the binary code without executing it, while dynamic disassembly involves executing the code and analyzing it as it runs.

Static Disassembly Techniques

Static disassembly techniques are commonly used to analyze malware, as they can be performed without executing the code. Here are some of the most common static disassembly techniques:

Manual Disassembly

Manual disassembly is a crucial skill in malware analysis, especially when dealing with complex and obfuscated code that automated tools cannot handle. Manual disassembly involves translating machine code into assembly language instructions manually, allowing an analyst to understand the underlying functionality of the code.
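To make the idea of translating machine code by hand concrete, here is a small sketch of a decoder for a handful of x86-64 opcodes. It is far from a real disassembler: it only knows nop, ret, mov with a 32-bit immediate, and call with a 32-bit relative offset, and it prints everything else as a raw byte. The sample bytes and base address are made up for illustration.

```python
import struct

# Register numbers used by the B8+r "mov imm32, reg" encoding.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode(code: bytes, base: int):
    """Yield (address, text) pairs for a tiny subset of x86-64 opcodes."""
    i = 0
    while i < len(code):
        op = code[i]
        addr = base + i
        if op == 0x90:
            text, size = "nop", 1
        elif op == 0xC3:
            text, size = "ret", 1
        elif 0xB8 <= op <= 0xBF:
            # mov imm32 into a 32-bit register; the immediate is little-endian.
            (imm,) = struct.unpack_from("<I", code, i + 1)
            text, size = f"mov    $0x{imm:x},%{REG32[op - 0xB8]}", 5
        elif op == 0xE8:
            # call rel32; the offset is relative to the next instruction.
            (rel,) = struct.unpack_from("<i", code, i + 1)
            text, size = f"call   0x{addr + 5 + rel:x}", 5
        else:
            # Anything this sketch does not understand.
            text, size = f".byte  0x{op:02x}", 1
        yield addr, text
        i += size


if __name__ == "__main__":
    sample = bytes.fromhex("be08000000" "ba02000000" "e8c8feffff" "c3")
    for addr, text in decode(sample, 0x400459):
        print(f"{addr:x}:  {text}")
```

Even this toy shows the two things an analyst does by hand: work out how long each instruction is, and interpret its operands.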

To perform manual disassembly, an analyst must have a solid understanding of assembly language and the architecture of the target system. The analyst must also be familiar with the various instructions used by the processor, the function calling conventions, and the structure of the executable file format.
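Calling conventions matter because they tell the analyst which register holds which argument. As a sketch, assuming a Linux x86-64 target that follows the System V AMD64 convention (the first six integer or pointer arguments travel in rdi, rsi, rdx, rcx, r8, and r9), the helper below pairs parameter names with register values. The register values in the example are hypothetical.

```python
# Integer/pointer argument registers, in order, for the System V AMD64 ABI.
SYSV_INT_ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def label_arguments(arg_names, register_values):
    """Map each parameter name to (register, value) in calling-convention order.

    Only the first six integer arguments are covered; further arguments are
    passed on the stack and are not modelled here.
    """
    labeled = {}
    for name, reg in zip(arg_names, SYSV_INT_ARG_REGS):
        labeled[name] = (reg, register_values.get(reg))
    return labeled


if __name__ == "__main__":
    # Hypothetical register state observed just before a call to write().
    regs = {"rdi": 1, "rsi": 0x7FFE0010, "rdx": 8}
    for name, (reg, value) in label_arguments(["fd", "buf", "count"], regs).items():
        print(f"{name:>5} <- {reg} = {value:#x}")
```

Windows x64 code uses a different order (rcx, rdx, r8, r9), so the table has to match the platform of the sample.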

There are several steps involved in manual disassembly, which we will discuss in detail below:

Identify the entry point

The entry point is the first instruction executed when the program is loaded into memory. To identify the entry point, the analyst must examine the executable file’s header and locate the program’s entry point address. Once the entry point is identified, the analyst can start tracing the program’s execution path. For example, if we look at the output of the “objdump” tool, we can see the entry point of a binary file:

$ objdump -f malware.bin

malware.bin:     file format elf64-x86-64
architecture: i386:x86-64, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x400440

From the output, we can see that the entry point of the malware is at the address 0x400440.
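The same address can be read straight from the file header. The sketch below assumes an ELF file and reads the e_entry field, which sits at offset 24 after the 16-byte identification block and the type, machine, and version fields. The header built in the example is synthetic, and the file name in the usage note comes from the example above.

```python
import struct


def elf_entry_point(header: bytes) -> int:
    """Return e_entry from the first bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64bit = header[4] == 2                   # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if header[5] == 1 else ">"     # EI_DATA: 1 = little-endian
    fmt = endian + ("Q" if is_64bit else "I")   # e_entry is 8 or 4 bytes wide
    (entry,) = struct.unpack_from(fmt, header, 24)
    return entry


if __name__ == "__main__":
    # Synthetic ELF64 little-endian header fragment, built for this example.
    ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
    fields = struct.pack("<HHIQ", 2, 62, 1, 0x400440)  # ET_EXEC, x86-64, version, entry
    print(hex(elf_entry_point(ident + fields)))
```

For a real sample, pass the first 64 bytes of the file, for example `elf_entry_point(open("malware.bin", "rb").read(64))`.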

Trace the program’s execution path

Tracing the program’s execution path involves following the flow of control from the entry point to various functions and subroutines. This process can be time-consuming and requires a deep understanding of the program’s logic and structure. As the analyst traces the execution path, they should annotate the code to keep track of the program’s state and identify key functions and system calls. For example, if we use the “objdump” tool to disassemble the binary code, we can trace the execution path of the malware:

$ objdump -d malware.bin

...

400440:       48 83 ec 08             sub    $0x8,%rsp
400444:       c7 44 24 04 00 00 00    movl   $0x0,0x4(%rsp)
40044b:       00
40044c:       48 8d 44 24 04          lea    0x4(%rsp),%rax
400451:       48 89 04 24             mov    %rax,(%rsp)
400455:       48 8d 45 f8             lea    -0x8(%rbp),%rax
400459:       be 08 00 00 00          mov    $0x8,%esi
40045e:       ba 02 00 00 00          mov    $0x2,%edx
400463:       48 89 c7                mov    %rax,%rdi
400466:       e8 c5 fe ff ff          callq  400330 <memcpy@plt>
...

From the output, we can see that the malware starts at the entry point address 0x400440 and then performs various operations, including a call to the “memcpy” function.
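When tracing by hand, the destination of a call has to be computed from its encoding. A near call (opcode E8) stores a signed 32-bit offset relative to the address of the next instruction. The sketch below applies that rule to the call at 0x400466 in the listing above.

```python
import struct


def call_target(address: int, encoding: bytes) -> int:
    """Resolve the destination of a 5-byte `call rel32` instruction."""
    if len(encoding) != 5 or encoding[0] != 0xE8:
        raise ValueError("expected a 5-byte E8 call")
    (rel32,) = struct.unpack("<i", encoding[1:])   # signed, little-endian
    return address + len(encoding) + rel32         # relative to the next instruction


if __name__ == "__main__":
    target = call_target(0x400466, bytes.fromhex("e8c5feffff"))
    print(hex(target))
```

The result matches the address objdump prints next to the call, which is a quick way to check a manual decode.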

Identify key functions and system calls

Identifying key functions and system calls is a critical step in manual disassembly, as it can help the analyst understand the program’s functionality and behavior. Key functions and system calls include file I/O, network communication, encryption and decryption, and process injection. The analyst should examine the parameters passed to these functions and the values returned to understand their purpose.

For example, if we use the “objdump” tool to disassemble the code, we can identify key system calls used by the malware:

$ objdump -d malware.bin

...

4004a3:       bf 01 00 00 00          mov    $0x1,%edi
4004a8:       e8 93 fe ff ff          callq  400340 <close@plt>
4004ad:       48 8d 45 f8             lea    -0x8(%rbp),%rax
4004b1:       ba 08 00 00 00          mov    $0x8,%edx
4004b6:       48 89 c6                mov    %rax,%rsi
4004b9:       bf 01 00 00 00          mov    $0x1,%edi
4004be:       e8 8d fe ff ff          callq  400350 <write@plt>
...

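From the output, we can see that the malware calls “close” and “write” through the PLT. These are the C library wrappers around the corresponding system calls, and they show that the sample performs I/O on file descriptors.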
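On a larger listing, picking out the calls by eye gets tedious. As a sketch, the snippet below scans objdump-style text for call lines that carry a symbol in angle brackets. The regular expression assumes the AT&T-syntax layout shown in the listings above and would need adjusting for other tools or syntaxes.

```python
import re

# Matches "<address>: <bytes> call[q] <target> <symbol>" in objdump output.
CALL_RE = re.compile(
    r"^\s*([0-9a-f]+):.*?\bcallq?\s+([0-9a-f]+)\s+<([^>]+)>",
    re.MULTILINE,
)

LISTING = """\
400463:       48 89 c7                mov    %rax,%rdi
400466:       e8 c5 fe ff ff          callq  400330 <memcpy@plt>
4004a3:       bf 01 00 00 00          mov    $0x1,%edi
4004a8:       e8 93 fe ff ff          callq  400340 <close@plt>
"""


def extract_calls(listing: str):
    """Return (call site, target address, symbol) for every resolved call."""
    return [
        (int(site, 16), int(target, 16), symbol)
        for site, target, symbol in CALL_RE.findall(listing)
    ]


if __name__ == "__main__":
    for site, target, symbol in extract_calls(LISTING):
        print(f"{site:x} -> {target:x} {symbol}")
```

Sorting or counting the extracted symbols gives a quick profile of what the sample does, such as file, network, or process-related calls.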

Reconstruct high-level code

Once the analyst has identified the key functions and system calls, they can begin reconstructing high-level code to understand the program’s overall behavior. Reconstructing high-level code involves analyzing the code’s flow and logic and creating a pseudocode representation of the program. This process can be challenging, especially for large and complex programs, but it is essential in understanding the program’s functionality fully.

For example, let’s say we are analyzing a malware sample that encrypts files on an infected system. Using manual disassembly, we have identified the key functions used by the malware, including the encryption algorithm and the file I/O operations. We can then create a pseudocode representation of the malware’s behavior:

for each file in the target directory:
    open the file for reading
    read the contents of the file
    encrypt the contents of the file using the encryption algorithm
    close the file
    open the file for writing
    write the encrypted contents to the file
    close the file

By creating a high-level representation of the malware’s behavior, we can gain a deeper understanding of its functionality and behavior.

Automated Disassembly Tools

Automated disassembly tools are an essential component of malware analysis, as they can quickly generate an assembly code representation of the executable file. Automated disassembly tools can handle large and complex codebases, identify key system calls and functions, and highlight potential vulnerabilities and weaknesses.

There are several popular automated disassembly tools available to malware analysts, including IDA Pro, Ghidra, and Binary Ninja. These tools use a variety of techniques to disassemble code, including static analysis, dynamic analysis, and emulation.
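One of the simplest things these tools automate is string extraction. The sketch below shows the idea in a few lines: find runs of printable ASCII bytes of a minimum length. The 4-character minimum and the sample bytes are assumptions for illustration. Real tools also handle wide-character and other encodings.

```python
import re


def extract_strings(data: bytes, min_len: int = 4):
    """Return runs of at least min_len printable ASCII characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


if __name__ == "__main__":
    # Made-up blob standing in for part of a binary's data section.
    blob = b"\x00\x01Hello, world!\x00\x7fELF\x02/bin/sh\x00ab\x00"
    for text in extract_strings(blob):
        print(text)
```

Short runs such as "ELF" and "ab" are skipped because they fall below the minimum length, which keeps the noise down on real binaries.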

IDA Pro

IDA Pro is one of the most popular and powerful automated disassembly tools used by malware analysts. It supports a wide range of executable file formats, including Windows PE, Linux ELF, and macOS Mach-O. IDA Pro generates its assembly listing through static analysis and also includes a debugger for dynamic analysis.

IDA Pro’s static analysis capabilities allow analysts to quickly identify key system calls and functions, strings, and variables in the code. It also provides a range of advanced features, including cross-referencing, call graph analysis, and interactive debugging.

For example, let’s say we are analyzing a malware sample using IDA Pro. We can load the executable file into IDA Pro and generate an assembly code representation of the code:

.text:0000000000401000 ; =============== S U B R O U T I N E =======================================
.text:0000000000401000
.text:0000000000401000 ; int __cdecl main(int argc, const char **argv, const char **envp)
.text:0000000000401000 _main           proc near               ; CODE XREF: __libc_start_main+23↑p
.text:0000000000401000
.text:0000000000401000 var_50          = qword ptr [-50h]
.text:0000000000401000 var_48          = qword ptr [-48h]
.text:0000000000401000 var_40          = qword ptr [-40h]
.text:0000000000401000 var_38          = qword ptr [-38h]
.text:0000000000401000 var_30          = qword ptr [-30h]
.text:0000000000401000 var_20          = qword ptr [-20h]
.text:0000000000401000 var_18          = qword ptr [-18h]
.text:0000000000401000 var_10          = qword ptr [-10h]
.text:0000000000401000 var_8           = qword ptr [-8h]
.text:0000000000401000 argc            = dword ptr  8
.text:0000000000401000 argv            = qword ptr  10h
.text:0000000000401000 envp            = qword ptr  18h
.text:0000000000401000
.text:0000000000401000                 push    rbp
.text:0000000000401001                 mov     rbp, rsp
.text:0000000000401004                 mov     [rbp+var_8], rdi
.text:0000000000401008                 mov     [rbp+var_10], rsi
.text:000000000040100c                 mov     eax, 0
.text:0000000000401011                 pop     rbp
.text:0000000000401012                 retn
.text:0000000000401012 _main           endp

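From the output, we can see that the “main” function starts at address 0x401000. Note that “main” is usually not the executable’s entry point: the entry point recorded in the file header normally belongs to C runtime startup code, which eventually calls “main”. We can also see the local variables and parameters that IDA Pro identified for the function.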
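The push rbp / mov rbp, rsp pair at the top of the listing is a common function prologue, and one heuristic automated tools can use to find function starts is to search for it. The sketch below scans a byte buffer for that pattern. The first function's bytes correspond to the listing above. The padding and the second function are hypothetical additions for the example.

```python
# Encoding of "push rbp; mov rbp, rsp" on x86-64.
PROLOGUE = bytes.fromhex("554889e5")


def find_prologues(code: bytes, base: int):
    """Return the addresses where the standard frame-setup prologue appears."""
    hits = []
    pos = code.find(PROLOGUE)
    while pos != -1:
        hits.append(base + pos)
        pos = code.find(PROLOGUE, pos + 1)
    return hits


if __name__ == "__main__":
    main_bytes = bytes.fromhex("55" "4889e5" "48897df8" "488975f0" "b800000000" "5d" "c3")
    padding = b"\x90" * 13                      # hypothetical alignment padding
    helper = bytes.fromhex("554889e55dc3")      # hypothetical second function
    for addr in find_prologues(main_bytes + padding + helper, 0x401000):
        print(hex(addr))
```

The heuristic misses functions compiled without a frame pointer and can match data by accident, which is one reason disassemblers combine it with call-graph information.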

Ghidra

Ghidra is another popular automated disassembly tool used by malware analysts. It is an open-source reverse engineering tool, developed by the NSA, that supports a wide range of executable file formats. Ghidra is primarily a static analysis tool, and recent versions also include a debugger.

Ghidra’s static analysis capabilities are similar to those of IDA Pro, allowing analysts to quickly identify key system calls and functions, strings, and variables in the code. It also provides advanced features, including cross-referencing, function graph analysis, and decompilation.

For example, let’s say we are analyzing a malware sample using Ghidra. We can load the executable file into Ghidra and, alongside the assembly listing, view the decompiler’s C-like output for a function:

entry:
undefined8 main(void)

{
  int32_t iVar1;

  iVar1 = puts("Hello, world!");
  return CONCAT71((int7)(iVar1 >> 8),1);
}

From the output, we can see that the code has a function called “main,” which prints the message “Hello, world!” and then returns a value.
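The CONCAT71 expression in the output can look cryptic. In Ghidra's decompiler notation, CONCATxy joins an x-byte value (the high part) and a y-byte value (the low part) into a single value of x + y bytes. The sketch below models that operation in Python. The function names are chosen here for illustration.

```python
def concat(high: int, high_size: int, low: int, low_size: int) -> int:
    """Model CONCATxy: place the high_size-byte value above the low_size-byte value."""
    high &= (1 << (8 * high_size)) - 1   # truncate to the declared width
    low &= (1 << (8 * low_size)) - 1
    return (high << (8 * low_size)) | low


def concat71(high7: int, low1: int) -> int:
    """CONCAT71: a 7-byte high part joined with a 1-byte low part."""
    return concat(high7, 7, low1, 1)


if __name__ == "__main__":
    i_var1 = 14                           # hypothetical return value of puts()
    print(hex(concat71(i_var1 >> 8, 1)))  # low byte of the result is always 1
    print(hex(concat71(0x11223344556677, 0x88)))
```

Read this way, the return statement sets the lowest byte of the result to 1 and fills the upper bytes from the shifted value of iVar1.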

Binary Ninja

Binary Ninja is a modern disassembly tool that supports a wide range of executable file formats. It is primarily a static analysis tool that generates an assembly code representation of the executable file.

Binary Ninja’s static analysis capabilities are similar to those of IDA Pro and Ghidra, allowing analysts to quickly identify key system calls and functions, strings, and variables in the code. It also provides advanced features, including cross-referencing, function graph analysis, and automatic detection of common code patterns.

For example, let’s say we are analyzing a malware sample using Binary Ninja. We can load the executable file into Binary Ninja and, alongside the assembly listing, view a high-level representation of a function:

int main(void)

{
  puts("Hello, world!");
  return 0;
}

From the output, we can see that the code has a function called “main,” which prints the message “Hello, world!” and then returns 0.

Decompilers

Decompilers are powerful tools that translate machine code or bytecode into a higher-level representation, such as C-like pseudocode for native binaries or Java source for class files. Decompilers can significantly speed up the analysis process, because a higher-level representation of the program makes it easier to follow its logic and to identify vulnerabilities and weaknesses.

There are several popular decompilers available to malware analysts, including the Hex-Rays decompiler, which is an add-on for IDA Pro, and the decompiler built into Ghidra. Decompilation is mainly a static analysis technique that builds on the disassembly.

IDA Pro

IDA Pro is one of the most popular and powerful tools used by malware analysts for decompilation. It supports a wide range of executable file formats, including Windows PE, Linux ELF, and macOS Mach-O. IDA Pro itself is a disassembler; its decompilation is provided by the Hex-Rays decompiler, an add-on that statically translates the disassembly into C-like pseudocode. This pseudocode allows analysts to quickly identify vulnerabilities and weaknesses in the code. IDA Pro also provides a range of advanced features, including cross-referencing, call graph analysis, and interactive debugging.

For example, let’s say we are analyzing a malware sample using IDA Pro. We can load the executable file into IDA Pro and generate a decompiled representation of the code:

int __cdecl main(int argc, const char **argv, const char **envp)
{
  int result; // eax

  sub_401B00("Hello, World!");
  result = 0;
  return result;
}

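From the output, we can see that the code has a function called “main,” which calls the “sub_401B00” function with the message “Hello, World!” and then returns 0. The name “sub_401B00” is generated by IDA Pro for a function without a symbol, so its role as a print routine has to be confirmed by inspecting it.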

Ghidra

Ghidra is another popular decompiler used by malware analysts. It is an open-source reverse engineering tool that supports a wide range of executable file formats. Ghidra’s decompiler works statically, translating the disassembly into C-like pseudocode. Its decompilation capabilities are similar to those of IDA Pro, allowing analysts to quickly identify vulnerabilities and weaknesses in the code. It also provides advanced features, including cross-referencing and function graph analysis.

For example, let’s say we are analyzing a malware sample using Ghidra. We can load the executable file into Ghidra and generate a decompiled representation of the code:

void __cdecl main(int argc,char **argv)

{
  char *__s1;

  __s1 = "Hello, world!";
  puts(__s1);
  return;
}

From the output, we can see that the code has a function called “main,” which initializes a string and then calls the “puts” function to print the message “Hello, world!”

Hex-Rays

Hex-Rays is the decompiler that ships as an add-on for IDA Pro, so it works on any file IDA Pro can load for a processor architecture that has a decompiler module. It statically translates the disassembly into C-like pseudocode. Its output is comparable to that of Ghidra’s decompiler, allowing analysts to quickly identify vulnerabilities and weaknesses in the code, and it integrates with IDA Pro’s cross-referencing and function graph features.

For example, let’s say we are analyzing a malware sample using Hex-Rays. We can load the executable file into IDA Pro, place the cursor inside a function, and press F5 to generate a decompiled representation of the code:

int __cdecl main(int argc, const char **argv, const char **envp)
{
  sub_4014E0("Hello, World!");
  return 0;
}

From the output, we can see that the code has a function called “main,” which calls the “sub_4014E0” function with the message “Hello, World!” and then returns 0.

Java Decompilers

Java decompilers are used to decompile Java class files back into Java source code. Java decompilers are used in situations where a Java program has been compiled and only the compiled bytecode is available. Some popular Java decompilers include JD-GUI, JAD, and Fernflower. For example, let’s say we are analyzing a Java malware sample using JD-GUI. We can load the Java class file into JD-GUI and generate a decompiled representation of the code:

public static void main(String[] args) throws Exception {
    String message = "Hello, World!";
    Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5PADDING");
    byte[] ivBytes = new byte[16];
    Arrays.fill(ivBytes, (byte) 0);
    IvParameterSpec iv = new IvParameterSpec(ivBytes);
    SecretKeySpec key = new SecretKeySpec("mysecretpassword".getBytes("UTF-8"), "AES");
    cipher.init(Cipher.ENCRYPT_MODE, key, iv);
    byte[] encrypted = cipher.doFinal(message.getBytes());
    System.out.println(new String(encrypted));
}

From the output, we can see that the code has an entry point called “main,” which uses the Java Cipher class to encrypt the message “Hello, World!” using AES in CBC mode with a hard-coded key and an all-zero IV.
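Before decompiling, it can help to confirm that a file really is Java bytecode and which compiler level produced it. As a sketch, the snippet below reads the class file header: a big-endian magic number 0xCAFEBABE followed by the minor and major version. The sample bytes are made up for this example.

```python
import struct


def class_file_version(data: bytes):
    """Return (major, minor) from a Java class file header."""
    magic, minor, major = struct.unpack_from(">IHH", data, 0)
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major, minor


def java_release(major: int) -> int:
    """Map a class file major version to a Java release (valid for major >= 49)."""
    return major - 44


if __name__ == "__main__":
    header = bytes.fromhex("cafebabe" "0000" "0034")   # minor 0, major 52
    major, minor = class_file_version(header)
    print(f"class file version {major}.{minor}, Java {java_release(major)}")
```

The version can matter when choosing a decompiler, since older tools may not understand bytecode produced by newer compilers.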

Python Decompilers

Python decompilers are used to decompile Python bytecode back into Python source code. Python decompilers are used in situations where a Python program has been compiled and only the compiled bytecode is available. Some popular Python decompilers include uncompyle6, decompyle3, and pycdc. For example, let’s say we are analyzing a Python malware sample using uncompyle6. We can run uncompyle6 on the compiled bytecode file and generate a decompiled representation of the code:

def main():
    message = "Hello, World!"
    key = b"mysecretpassword"
    iv = b'\x00' * 16
    aes = AES.new(key, AES.MODE_CBC, iv)
    ciphertext = aes.encrypt(pad(message.encode(), AES.block_size))
    print(ciphertext)

From the output, we can see that the code has a function called “main,” which encrypts the message “Hello, World!” using AES in CBC mode with a hard-coded key and an all-zero IV. The “AES.new” call and the “pad” helper match the PyCryptodome API rather than the “cryptography” package.
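When a decompiler fails on a sample, the bytecode can still be read directly with the standard library's dis module. The sketch below compiles a short source string to stand in for bytecode recovered from a sample and lists its instructions. Opcode names differ between Python versions, so the exact listing depends on the interpreter in use.

```python
import dis

# Stand-in source; with a real sample the code object would come from the
# compiled file instead of compile().
SOURCE = 'message = "Hello, World!"\nprint(message)\n'


def disassemble_source(source: str):
    """Return (opname, argval) pairs for the module-level bytecode."""
    code = compile(source, "<sample>", "exec")
    return [(ins.opname, ins.argval) for ins in dis.get_instructions(code)]


if __name__ == "__main__":
    for opname, argval in disassemble_source(SOURCE):
        print(f"{opname:<20} {argval!r}")
```

Constants and names, such as the string literal and the reference to print, show up as instruction arguments, which is often enough to understand what a short routine does.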

Dynamic Disassembly Techniques

Dynamic disassembly techniques involve executing the malware code and analyzing it as it runs. Here are some common dynamic disassembly techniques:

Debuggers

Debuggers are powerful tools that allow analysts to step through an executable file’s code and interact with it as it is running. Debuggers can be used to identify the cause of program crashes, analyze program behavior, and identify security vulnerabilities.

There are several popular debuggers available to malware analysts, including gdb, OllyDbg, and WinDbg. These tools use a variety of techniques to debug code, including setting breakpoints, stepping through code, and examining memory.
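Software breakpoints on x86 work by patching code: the debugger saves the byte at the target address, overwrites it with the int3 opcode (0xCC), and restores the original byte when the breakpoint is removed. The sketch below simulates that bookkeeping on a bytearray standing in for process memory. The class and its method names are made up for this example.

```python
INT3 = 0xCC  # one-byte x86 breakpoint instruction


class BreakpointTable:
    """Simulate software breakpoints on a buffer that stands in for memory."""

    def __init__(self, memory: bytearray, base: int):
        self.memory = memory
        self.base = base
        self.saved = {}          # address -> original byte

    def set(self, address: int) -> None:
        if address in self.saved:
            return               # already set; keep the original byte
        offset = address - self.base
        self.saved[address] = self.memory[offset]
        self.memory[offset] = INT3

    def clear(self, address: int) -> None:
        offset = address - self.base
        self.memory[offset] = self.saved.pop(address)


if __name__ == "__main__":
    memory = bytearray.fromhex("554889e5c3")   # push rbp; mov rbp, rsp; ret
    table = BreakpointTable(memory, 0x401000)
    table.set(0x401001)
    print(memory.hex())    # the byte at 0x401001 is now cc
    table.clear(0x401001)
    print(memory.hex())    # original bytes restored
```

Because the patch changes the code in memory, malware that checksums its own code can notice it. Hardware breakpoints avoid the patch for that reason.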

gdb

gdb is a powerful debugger that is commonly used in Linux environments, where it works mainly with ELF binaries. gdb can be used to set breakpoints, examine memory, and step through code. For example, let’s say we are analyzing a Linux malware sample using gdb. We can load the executable file into gdb and set a breakpoint on the program’s “main” function:

(gdb) file malware
Reading symbols from malware...
(gdb) break main
Breakpoint 1 at 0x40113c
(gdb) run
Starting program: /home/user/malware

Breakpoint 1, 0x000000000040113c in main ()

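From the output, we can see that gdb has loaded the executable file and stopped at the breakpoint on “main”. Because “break main” relies on symbols, a stripped sample needs a breakpoint on the entry point address instead, which gdb’s “info files” command reports. We can now use gdb to step through the code and examine memory.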

OllyDbg

OllyDbg is a powerful debugger that is commonly used in Windows environments to analyze 32-bit Windows PE executables. OllyDbg can be used to set breakpoints, examine memory, and step through code. For example, let’s say we are analyzing a Windows malware sample using OllyDbg. We can load the executable file into OllyDbg and set a breakpoint at the entry point of the program:

File > Open
Select malware.exe
Press Ctrl+G and enter the address of the entry point of the program
Press F2 to toggle a breakpoint on the selected instruction
Press F9 to run the program

After these steps, OllyDbg has loaded the executable file and stops when execution reaches the breakpoint at the entry point of the program. We can now use OllyDbg to step through the code and examine memory.

WinDbg

WinDbg is a powerful debugger from Microsoft that is commonly used in Windows environments. It can debug Windows PE executables in both user mode and kernel mode. WinDbg can be used to set breakpoints, examine memory, and step through code. For example, let’s say we are analyzing a Windows malware sample using WinDbg. We can load the executable file into WinDbg and set a breakpoint at the entry point of the program:

File > Open executable
Select malware.exe
Type “bp $exentry” to set a breakpoint at the entry point of the program
Type “g” to run the program

After these steps, WinDbg has loaded the executable file and stops at the breakpoint at the entry point of the program. We can now use WinDbg to step through the code and examine memory.

Dynamic Binary Instrumentation (DBI)

Dynamic Binary Instrumentation (DBI) is a powerful technique used by malware analysts to observe and manipulate the behavior of a running program. DBI allows analysts to monitor system calls, API calls, and other events in real-time and analyze the program’s behavior. DBI can be used to identify malware behavior, detect malicious activity, and develop countermeasures.

There are several popular DBI frameworks available to malware analysts, including PIN, DynamoRIO, and Frida. These tools use a variety of techniques to dynamically instrument the code, including binary rewriting, dynamic code generation, and just-in-time compilation.
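The core idea behind instrumentation is to run extra code on entry to and exit from a routine without changing what the routine does. The sketch below shows that idea at the level of Python functions, as an analogy only: real DBI frameworks rewrite machine code at run time, not Python callables. The run_command function is a hypothetical stand-in that executes nothing.

```python
import functools


def instrument(func, trace):
    """Wrap func so every call is recorded in trace before and after it runs."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        trace.append(("enter", func.__name__, args))
        result = func(*args, **kwargs)
        trace.append(("leave", func.__name__, result))
        return result
    return wrapper


def run_command(command: str) -> int:
    """Hypothetical stand-in for an API of interest; it executes nothing."""
    return len(command)


if __name__ == "__main__":
    trace = []
    run_command = instrument(run_command, trace)
    run_command("id")
    for event in trace:
        print(event)
```

The enter and leave records correspond to the kind of callbacks that DBI tools let an analyst attach to functions, system calls, or individual instructions.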

PIN

PIN is a powerful DBI framework developed by Intel. It supports a wide range of executable file formats, including Windows PE, Linux ELF, and MacOS Mach-O. PIN can be used to instrument the code, monitor system calls, and intercept API calls. For example, let’s say we are analyzing a Linux malware sample using PIN. We can load the executable file into PIN and instrument the code to monitor system calls:

$ pin -t /path/to/pin/source/tools/MyPinTool/obj-intel64/MyPinTool.so -- /path/to/malware

This command launches the malware under PIN with MyPinTool loaded, so the tool’s instrumentation runs as the program executes. We can now use PIN to monitor system calls and intercept API calls in real time.

DynamoRIO

DynamoRIO is a powerful open-source DBI framework that grew out of a collaboration between HP Labs and MIT. It supports a wide range of executable file formats, including Windows PE, Linux ELF, and macOS Mach-O. DynamoRIO can be used to instrument the code, monitor system calls, and intercept API calls. For example, let’s say we are analyzing a Windows malware sample using DynamoRIO. We can run the executable file under DynamoRIO with a client library that instruments the code to monitor system calls:

> drrun -c /path/to/dynamorio/samples/simple/instr.dll -- malware.exe

This command runs malware.exe under DynamoRIO with the instr.dll client loaded, so the client’s instrumentation runs as the program executes. We can now use DynamoRIO to monitor system calls and intercept API calls in real time.

Frida

Frida is a powerful DBI toolkit created by Ole André Vadla Ravnås and sponsored by NowSecure. It supports a wide range of platforms, including Windows, Linux, Android, and iOS. Frida can be used to instrument the code, monitor system calls, and intercept API calls. For example, let’s say we are analyzing an Android malware sample using Frida. We can attach Frida to the running app and hook the C library’s “system” function to log the commands it runs:

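import sys

import frida


def on_message(message, data):
    # Called for every message the injected script sends back to Python.
    print(message)


# Attach to the running app on a USB-connected device.
process = frida.get_usb_device().attach('com.example.malware')
# Hook the C library's system() and log the command string passed to it.
script = process.create_script("""
Interceptor.attach(Module.findExportByName(null, "system"), {
    onEnter: function(args) {
        console.log("[*] system(" + args[0].readUtf8String() + ")");
    }
});
""")
script.on('message', on_message)
script.load()
# Keep the Python process alive so the hook stays installed.
sys.stdin.read()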

Once the script is loaded, Frida logs every call the app makes to “system” together with the command string. We can now use Frida to monitor and intercept API calls in real time.
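The on_message callback in the script above simply prints whatever it receives. Frida's Python bindings deliver these messages as dictionaries: values sent from the injected script with send() arrive with type "send" and a "payload" field, and uncaught script exceptions arrive with type "error" and a "description" field. The helper below is a sketch of a more readable handler based on that layout. The function name is chosen for this example, and it can be exercised without Frida installed.

```python
def describe_message(message: dict) -> str:
    """Turn a Frida-style message dictionary into a one-line summary."""
    kind = message.get("type")
    if kind == "send":
        return f"[send] {message.get('payload')!r}"
    if kind == "error":
        return f"[error] {message.get('description')}"
    return f"[{kind}] {message!r}"


def on_message(message, data):
    """Drop-in replacement for the callback passed to script.on('message', ...)."""
    print(describe_message(message))


if __name__ == "__main__":
    # Hand-written sample messages in the shape described above.
    on_message({"type": "send", "payload": "system(id)"}, None)
    on_message({"type": "error", "description": "ReferenceError: x is not defined"}, None)
```

Separating formatting from printing keeps the handler easy to test and makes it simple to write events to a log file later.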

Virtual Machines (VMs)

VMs are programs that can emulate a complete computer system, including hardware and software. VMs can be useful in analyzing malware in a safe and controlled environment, as they can isolate the malware from the host system. Some popular VMs for malware analysis include VirtualBox, VMware, and QEMU.

Examples

Let’s look at some real-world examples of how disassembly techniques can be used in malware analysis.

Stuxnet Worm

The Stuxnet worm was a sophisticated piece of malware that targeted industrial control systems. It used multiple zero-day exploits and was designed to attack specific hardware configurations.

To analyze Stuxnet, malware analysts used a combination of automated and manual disassembly techniques. They discovered that the malware was packed and used multiple layers of obfuscation to make it difficult to analyze.

Using IDA Pro, analysts were able to identify the worm’s main functions and analyze its behavior. They discovered that Stuxnet used a zero-day exploit to spread via USB drives and network shares, and then used additional exploits to attack specific hardware configurations.

Debugging the malware with WinDbg allowed analysts to monitor its behavior in real time and identify key system calls used by the malware. They discovered that Stuxnet used a custom protocol to communicate with its command and control servers and that it was capable of modifying the code on the targeted industrial controllers while hiding those changes to avoid detection.

WannaCry Ransomware

WannaCry was a global ransomware attack that affected hundreds of thousands of computers in May 2017. It exploited a vulnerability in the Windows SMBv1 implementation to spread across networks and encrypt files on infected systems. To analyze WannaCry, malware analysts used a combination of automated and manual disassembly techniques. They discovered that the malware was packed and used multiple layers of obfuscation to avoid detection.

Using IDA Pro, analysts were able to identify the malware’s encryption algorithm and key system calls. They discovered that WannaCry used the NSA’s EternalBlue exploit to spread via network shares and that it communicated with its command and control servers using the Tor network.

Debugging the malware with a Windows debugger such as WinDbg allowed analysts to monitor its behavior in real time and identify specific function calls. They discovered that WannaCry was capable of scanning for and infecting vulnerable systems in a matter of seconds.

Conclusion

Disassembly techniques are essential in malware analysis, as they allow analysts to understand the functionality of malicious code and identify key system calls and functions used by the malware. By using a combination of automated and manual disassembly techniques, analysts can uncover the behavior of complex and sophisticated malware such as Stuxnet and WannaCry.

Whether you are a red teamer, pen tester, or malware analyst, understanding disassembly techniques is crucial in analyzing and understanding the behavior of malware. By using tools such as IDA Pro, Ghidra, WinDbg, and GDB, you can become a more effective and efficient malware analyst, and help protect against the ever-evolving threat of malware.
