Analyzing Packed Binaries
Packing is a technique software developers use to reduce the size of executables and to obfuscate machine code in order to protect intellectual property, among other reasons. These are just some of the legitimate uses of this technique on executables and similar binary files. Malware developers also use packing to keep their malicious executables from being easily detected and to make them harder to analyze.
The process of packing an executable depends on the intended goal, which could be compression, obfuscation, encryption, or a combination of these techniques. Regardless of how extensive the modifications are, the executable's structure is altered to include the code needed to unpack the original machine code in memory when the program runs on the user's system, along with the data the program needs.
There are both open-source and commercial packers available. These tools often insert strings or name the sections containing the unpacking sequence in a specific manner, which can be used to identify the packer that was used. However, some malware developers may devise their own packing techniques or tools. They may also alter section names or remove the strings added by well-known packers to make it even more challenging for malware analysts to determine which packer was employed.
A packing process typically involves several steps, which can vary depending on the specific tools and the purpose of implementation. The general process encompasses the following stages:
- Compression: The original executable undergoes compression using an algorithm to reduce its size.
- Encryption: An additional layer of protection is applied using a specific key or algorithm.
- Header modification: Changes are made to reflect the alterations and ensure the executable remains valid. This often includes markers for identifying the packer used.
- Runtime decryption/decompression: Code that extracts the original executable into memory is added to the new file and serves as the program's new entry point.
- Anti-analysis measures: Some packing tools employ techniques to deter analysis, debugging, or emulation, aiming to hinder examination of the original code.
- Execution flow control: Instructions are added to manage the execution flow, so that control passes to the original code once it has been restored in memory.
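To make the first steps more concrete, the following C sketch packs a buffer with a toy run-length encoder followed by a single-byte XOR, then reverses both steps the way an unpacking stub would. Both algorithms are stand-ins chosen for brevity, not what any real packer uses, and the function names (rle_compress, toy_pack, toy_unpack, and so on) and the XOR key are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy compression: run-length encoding as (count, byte) pairs, count 1..255.
   out must have room for 2 * len bytes in the worst case. */
size_t rle_compress(const uint8_t *in, size_t len, uint8_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len;) {
        size_t run = 1;
        while (i + run < len && in[i + run] == in[i] && run < 255)
            run++;
        out[o++] = (uint8_t)run;
        out[o++] = in[i];
        i += run;
    }
    return o;
}

/* Inverse of rle_compress; returns the number of bytes written. */
size_t rle_decompress(const uint8_t *in, size_t len, uint8_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        for (unsigned k = 0; k < in[i]; k++)
            out[o++] = in[i + 1];
    return o;
}

/* Toy encryption: XOR with a one-byte key; applying it twice restores the data. */
void xor_layer(uint8_t *buf, size_t len, uint8_t key)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= key;
}

/* Packer side: compress, then encrypt. Returns the packed length. */
size_t toy_pack(const uint8_t *in, size_t len, uint8_t *out, uint8_t key)
{
    size_t n = rle_compress(in, len, out);
    xor_layer(out, n, key);
    return n;
}

/* Stub side: decrypt in place, then decompress into out (reverse order). */
size_t toy_unpack(uint8_t *packed, size_t n, uint8_t *out, uint8_t key)
{
    xor_layer(packed, n, key);
    return rle_decompress(packed, n, out);
}
```

The unpacking side has to undo the layers in reverse order: decrypt first, then decompress.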
Additionally, some packing tools may include extra code to dynamically alter behavior during execution, making it more challenging for an analyst to discern the actions taken by the malware.
Detecting Packing
An important first step is determining whether the sample being analyzed is packed, since this determines how the analysis of the binary should proceed. Although most analysis tools can identify whether a binary is packed and which packer was used, it remains useful to understand how to analyze such files.
Tools that detect packed binaries often rely on signatures: widely recognized patterns or identifying strings that packers insert into their output. The following Yara rule can be used to identify binaries packed with RLPack.
rule rlpack {
    meta:
        description = "RLPack packed file"
        block = false
        quarantine = false
    strings:
        $mz = "MZ"
        $text1 = ".packed\x00"
        $text2 = ".RLPack\x00"
    condition:
        $mz at 0 and $text1 in (0..1024) and $text2 in (0..1024)
}
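The condition of this rule can be expressed directly in C, which helps show what the matcher actually checks. This is a minimal sketch, assuming the file (or at least its first kilobyte) has already been read into memory; matches_rlpack is a hypothetical name, and this is not how YARA itself is implemented.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the nlen-byte needle starts at some offset in [0, max_off] of buf,
   mirroring YARA's "$s in (0..max_off)". */
static int found_in_range(const unsigned char *buf, size_t len,
                          const char *needle, size_t nlen, size_t max_off)
{
    for (size_t off = 0; off <= max_off && off + nlen <= len; off++)
        if (memcmp(buf + off, needle, nlen) == 0)
            return 1;
    return 0;
}

/* Same logic as the rlpack rule: "MZ" at offset 0 and both section names,
   including their trailing NUL byte, starting within the first 1024 bytes. */
int matches_rlpack(const unsigned char *buf, size_t len)
{
    if (len < 2 || buf[0] != 'M' || buf[1] != 'Z')
        return 0;
    return found_in_range(buf, len, ".packed\0", 8, 1024) &&
           found_in_range(buf, len, ".RLPack\0", 8, 1024);
}
```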
Signatures can be created from various elements, including section names, header values, structure, and the Import Address Table (IAT) of the binary generated by the packer. Packers often introduce sections with specific names and characteristics. Some packers replace the original sections with their own, while others append the sections they need to those that already exist in the original binary. Section names can help identify the tool used, but developers can also rename these sections without changing how the packer works. Even when the names change, the sections' other characteristics remain the same, and this information can still be used to determine which packer was employed.
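A name-based check is the simplest of these signatures. The sketch below, built around a hypothetical packer_for_section function, looks up the 8-byte Name field of a PE section header in a small table containing only the section names that appear in this post; as the paragraph above points out, it stops working as soon as the sections are renamed.

```c
#include <stddef.h>
#include <string.h>

/* Section names taken from this post's UPX and ASPack samples and the rlpack
   rule; a real tool would rely on a much larger, curated list. */
static const struct { const char *section; const char *packer; } known[] = {
    { "UPX0", "UPX" },       { "UPX1", "UPX" },
    { ".aspack", "ASPack" }, { ".adata", "ASPack" },
    { ".packed", "RLPack" }, { ".RLPack", "RLPack" },
};

/* name is the 8-byte Name field of a section header, which is not
   NUL-terminated when the name uses all 8 bytes. Returns the packer name,
   or NULL if the section name is not in the table. */
const char *packer_for_section(const char name[8])
{
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
        if (strncmp(name, known[i].section, 8) == 0)
            return known[i].packer;
    return NULL;
}
```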
Similarly, the header can be analyzed to determine the packer used. While the header's structure depends on the original binary, packers often introduce specific values or set the header to a specific structure. The section headers specify both the size a section occupies on disk and its size when loaded into memory. A significant difference between these two values can indicate that data is decompressed or decrypted into memory at runtime. For instance, UPX creates a section that has a size of zero on disk; when loaded into memory, this section expands to a much larger size to hold the entire unpacked program.
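One way to turn this size comparison into a check, sketched here under stated assumptions, is to round the on-disk size up to the section alignment before comparing, since the loader maps each section in multiples of that alignment. The size_mismatch function and the 0x1000 alignment used in the test are assumptions; the test values come from the radare2 listings in this post. Sections holding only uninitialized data also have an on-disk size of zero, so a match is a hint rather than proof of packing.

```c
#include <stdint.h>

/* Rounds value up to a multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t value, uint64_t align)
{
    return (value + align - 1) & ~(align - 1);
}

/* Returns 1 when a section's memory size exceeds its on-disk size even after
   rounding the latter up to the section alignment, meaning the loader
   reserves memory that nothing in the file fills (as with UPX0). */
int size_mismatch(uint32_t raw_size, uint32_t virtual_size, uint32_t section_align)
{
    return (uint64_t)virtual_size > align_up(raw_size, section_align);
}
```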
The Import Address Table (IAT) is a structure in PE files, located through the data directories of the PE header, that holds the functions or procedures imported from dynamic link libraries (DLLs). This allows the executable to locate and call the required functions at runtime. ELF binaries use the Procedure Linkage Table (PLT) and Global Offset Table (GOT) for a similar purpose. These structures differ in design and function, but they share the goal of importing functions or procedures from external libraries. Packers rely on specific functions of their own, which they obtain either by linking directly to external libraries or by resolving API calls dynamically. As a result, the import table of a packed binary often lists a characteristic set of functions, or only an incomplete list of those the original program uses.
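A simple import-based heuristic follows from this. The sketch below flags an import list that is short and contains both LoadLibraryA and GetProcAddress, the pair a stub needs to resolve any other API at runtime. The function names and the cutoff of 16 imports are hypothetical choices, and the list of names is assumed to come from a PE parser or from radare2's ii output.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if name appears among the count import names. */
static int has_import(const char *const *names, size_t count, const char *name)
{
    for (size_t i = 0; i < count; i++)
        if (strcmp(names[i], name) == 0)
            return 1;
    return 0;
}

/* Heuristic only: a short import list that still includes LoadLibraryA and
   GetProcAddress suggests the real imports are resolved at runtime.
   The cutoff of 16 is an arbitrary illustrative value. */
int imports_look_packed(const char *const *names, size_t count)
{
    return count < 16 &&
           has_import(names, count, "LoadLibraryA") &&
           has_import(names, count, "GetProcAddress");
}
```

Applied to the import lists shown later in this post, this flags both packed samples but not the original, which imports far more functions and does not import LoadLibraryA at all.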
While the aspects mentioned above are examined during static analysis, dynamic analysis is also used to assess the packing techniques in use. This involves using a debugger and a disassembler to follow the instructions and execution flow of the binary. Dynamic analysis is particularly useful for extracting the unpacked code once it resides in memory. Note that the extracted program requires additional steps to become a valid executable on disk for further analysis, and it won't exactly match the original binary.
Let's examine an example program that has been packed separately with UPX and with ASPack to better understand the differences. It is a Windows PE program that, when executed, displays a message box and performs no other actions.
The program's entire source code is shown below; any additional functions and data are added by the compiler. It was compiled with Microsoft Visual Studio 2022.
#include <Windows.h>
int WINAPI WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nCmdShow) {
    MessageBox(NULL, L"Hello there! - General Kenobi", L"Star Wars Quote", MB_OK | MB_ICONINFORMATION);
    return 0;
}
For the analysis phase, Radare2 is used, but other tools can produce the same results. The original binary is examined first to establish a baseline. The sections are displayed with the iS command, and the entropy of each section is shown by adding the entropy parameter (iS entropy).
[Sections]
nth paddr size vaddr vsize perm entropy type name
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
0 0x00000400 0xe00 0x00401000 0x1000 -r-x 5.92845120 ---- .text
1 0x00001200 0xc00 0x00402000 0x1000 -r-- 4.55813641 ---- .rdata
2 0x00001e00 0x200 0x00403000 0x1000 -rw- 0.28040117 ---- .data
3 0x00002000 0x200 0x00404000 0x1000 -r-- 4.70150326 ---- .rsrc
4 0x00002200 0x200 0x00405000 0x1000 -r-- 4.86675397 ---- .reloc
Several sections are listed. The size column is the amount of space the section occupies on disk, while vsize is the amount of memory allocated for it when the program is loaded. The difference between the two is small: the on-disk sizes range from 512 bytes (0x200) to 3.5 KB (0xe00), while each section is allocated 4 KB (0x1000) in memory. This difference is not large enough to suggest that packing is in use.
Entropy measures the randomness of the data in a section, on a scale from 0 to 8 bits per byte; these values will be compared later with those of the packed samples.
The linked libraries are listed with the il command. This program links only 8 DLLs, none of which are suspicious:
[Linked libraries]
user32.dll
vcruntime140.dll
api-ms-win-crt-runtime-l1-1-0.dll
api-ms-win-crt-math-l1-1-0.dll
api-ms-win-crt-stdio-l1-1-0.dll
api-ms-win-crt-locale-l1-1-0.dll
api-ms-win-crt-heap-l1-1-0.dll
kernel32.dll
8 libraries
The functions imported from these libraries are listed with the ii command. The only function called in the source code of this program is MessageBoxW; the other functions are imported by the startup and runtime code that the compiler adds, and aren't necessarily called.
[Imports]
nth vaddr bind type lib name
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
1 0x00402038 NONE FUNC USER32.dll MessageBoxW
1 0x00402040 NONE FUNC VCRUNTIME140.dll __current_exception
2 0x00402044 NONE FUNC VCRUNTIME140.dll _except_handler4_common
3 0x00402048 NONE FUNC VCRUNTIME140.dll __current_exception_context
4 0x0040204c NONE FUNC VCRUNTIME140.dll memset
1 0x0040206c NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll _crt_atexit
2 0x00402070 NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll _cexit
3 0x00402074 NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll terminate
4 0x00402078 NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll _register_onexit_function
5 0x0040207c NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll _register_thread_local_exe_atexit_callback
6 0x00402080 NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll _exit
7 0x00402084 NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll _initterm
8 0x00402088 NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll _get_narrow_winmain_command_line
9 0x0040208c NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll _initialize_narrow_environment
10 0x00402090 NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll _configure_narrow_argv
11 0x00402094 NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll _c_exit
12 0x00402098 NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll _set_app_type
13 0x0040209c NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll _seh_filter_exe
14 0x004020a0 NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll _initialize_onexit_table
15 0x004020a4 NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll exit
16 0x004020a8 NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll _controlfp_s
17 0x004020ac NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll _initterm_e
1 0x00402064 NONE FUNC api-ms-win-crt-math-l1-1-0.dll __setusermatherr
1 0x004020b4 NONE FUNC api-ms-win-crt-stdio-l1-1-0.dll _set_fmode
2 0x004020b8 NONE FUNC api-ms-win-crt-stdio-l1-1-0.dll __p__commode
1 0x0040205c NONE FUNC api-ms-win-crt-locale-l1-1-0.dll _configthreadlocale
1 0x00402054 NONE FUNC api-ms-win-crt-heap-l1-1-0.dll _set_new_mode
1 0x00402000 NONE FUNC KERNEL32.dll GetCurrentProcessId
2 0x00402004 NONE FUNC KERNEL32.dll GetModuleHandleW
3 0x00402008 NONE FUNC KERNEL32.dll GetStartupInfoW
4 0x0040200c NONE FUNC KERNEL32.dll IsDebuggerPresent
5 0x00402010 NONE FUNC KERNEL32.dll InitializeSListHead
6 0x00402014 NONE FUNC KERNEL32.dll GetSystemTimeAsFileTime
7 0x00402018 NONE FUNC KERNEL32.dll GetCurrentThreadId
8 0x0040201c NONE FUNC KERNEL32.dll UnhandledExceptionFilter
9 0x00402020 NONE FUNC KERNEL32.dll QueryPerformanceCounter
10 0x00402024 NONE FUNC KERNEL32.dll IsProcessorFeaturePresent
11 0x00402028 NONE FUNC KERNEL32.dll TerminateProcess
12 0x0040202c NONE FUNC KERNEL32.dll GetCurrentProcess
13 0x00402030 NONE FUNC KERNEL32.dll SetUnhandledExceptionFilter
Some of these functions may raise concern, because malware often uses them for various purposes. One of them is IsDebuggerPresent, which malware can use as an anti-analysis technique. The axt command shows where a function is referenced; it takes the address of the imported function as its argument.
[0x00401268]> axt 0x0040200c
fcn.004016fe 0x4017d6 [CALL:--x] call dword [sym.imp.KERNEL32.dll_IsDebuggerPresent]
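If this sample were potential malware, it would be necessary to determine what makes the call to this function and why. Since the source code never calls IsDebuggerPresent, the call here must come from code added by the compiler; analyzing it further is outside the scope of this blog post.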
Loading the UPX-packed sample into Radare2 and listing its sections tells a very different story.
[Sections]
nth paddr size vaddr vsize perm entropy type name
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
0 0x00000400 0x0 0x00401000 0x6000 -rwx ---- UPX0
1 0x00000400 0x1200 0x00407000 0x2000 -rwx 7.32773884 ---- UPX1
2 0x00001600 0x600 0x00409000 0x1000 -rw- 3.88507214 ---- .rsrc
This binary contains only three sections, two of which, UPX0 and UPX1, immediately reveal the packer used.
The UPX0 section has a size of 0 bytes on disk but is allocated 24 KB (0x6000) when loaded into memory. This significant difference raises concern. Furthermore, the section has read, write, and execute permissions, which allows code to be written into it at runtime and then executed.
The UPX1 section has a much higher entropy (7.33) than any section of the unpacked version, which is consistent with compressed data. It also has read, write, and execute permissions, which is not a typical configuration.
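The size and permission indicators discussed above all come from the section headers, which can be read without any analysis tool. The following sketch parses the section table of a PE file already loaded into memory, using the IMAGE_SECTION_HEADER layout from the PE format specification (40 bytes per entry, with Characteristics at offset 36) and the IMAGE_SCN_MEM_* permission flags. The struct, the function names, and the minimal error handling are assumptions made for brevity.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Permission flags, matching the PE format's IMAGE_SCN_MEM_* constants. */
#define SCN_MEM_EXECUTE 0x20000000u
#define SCN_MEM_READ    0x40000000u
#define SCN_MEM_WRITE   0x80000000u

struct section_info {
    char name[9];       /* 8-byte Name field plus a terminating NUL */
    uint32_t vsize;     /* VirtualSize */
    uint32_t vaddr;     /* VirtualAddress (RVA) */
    uint32_t raw_size;  /* SizeOfRawData */
    uint32_t raw_ptr;   /* PointerToRawData */
    uint32_t flags;     /* Characteristics */
};

/* Little-endian reads, independent of the host byte order. */
static uint32_t rd16(const uint8_t *p) { return (uint32_t)p[0] | ((uint32_t)p[1] << 8); }
static uint32_t rd32(const uint8_t *p) { return rd16(p) | (rd16(p + 2) << 16); }

/* Reads up to max section headers from a PE file held in memory.
   Returns the number read, or -1 if the headers look malformed. */
int read_sections(const uint8_t *buf, size_t len, struct section_info *out, int max)
{
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return -1;
    size_t pe = rd32(buf + 0x3c);                     /* e_lfanew */
    if (pe > len - 24 || memcmp(buf + pe, "PE\0\0", 4) != 0)
        return -1;
    int count = (int)rd16(buf + pe + 6);              /* NumberOfSections */
    size_t table = pe + 24 + rd16(buf + pe + 20);     /* skip the optional header */

    int n;
    for (n = 0; n < count && n < max; n++) {
        size_t off = table + (size_t)n * 40;          /* 40 bytes per section header */
        if (off + 40 > len)
            return -1;
        const uint8_t *s = buf + off;
        memcpy(out[n].name, s, 8);
        out[n].name[8] = '\0';
        out[n].vsize = rd32(s + 8);
        out[n].vaddr = rd32(s + 12);
        out[n].raw_size = rd32(s + 16);
        out[n].raw_ptr = rd32(s + 20);
        out[n].flags = rd32(s + 36);
    }
    return n;
}

/* True when read, write and execute are all set, as on UPX0 and UPX1. */
int is_rwx(const struct section_info *s)
{
    const uint32_t rwx = SCN_MEM_READ | SCN_MEM_WRITE | SCN_MEM_EXECUTE;
    return (s->flags & rwx) == rwx;
}
```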
The libraries linked to the executable are the same as in the unpacked version.
[Linked libraries]
api-ms-win-crt-heap-l1-1-0.dll
api-ms-win-crt-locale-l1-1-0.dll
api-ms-win-crt-math-l1-1-0.dll
api-ms-win-crt-runtime-l1-1-0.dll
api-ms-win-crt-stdio-l1-1-0.dll
kernel32.dll
user32.dll
vcruntime140.dll
8 libraries
The imported functions, however, differ significantly: the list is much shorter, and it includes LoadLibraryA, GetProcAddress, ExitProcess, and VirtualProtect, none of which the unpacked version imports.
[Imports]
nth vaddr bind type lib name
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
1 0x00409290 NONE FUNC api-ms-win-crt-heap-l1-1-0.dll _set_new_mode
1 0x00409298 NONE FUNC api-ms-win-crt-locale-l1-1-0.dll _configthreadlocale
1 0x004092a0 NONE FUNC api-ms-win-crt-math-l1-1-0.dll __setusermatherr
1 0x004092a8 NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll exit
1 0x004092b0 NONE FUNC api-ms-win-crt-stdio-l1-1-0.dll _set_fmode
1 0x004092b8 NONE FUNC KERNEL32.DLL LoadLibraryA
2 0x004092bc NONE FUNC KERNEL32.DLL ExitProcess
3 0x004092c0 NONE FUNC KERNEL32.DLL GetProcAddress
4 0x004092c4 NONE FUNC KERNEL32.DLL VirtualProtect
1 0x004092cc NONE FUNC USER32.dll MessageBoxW
1 0x004092d4 NONE FUNC VCRUNTIME140.dll memset
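Checking for references to these imports, including MessageBoxW, shows that none exist. This suggests that the code using them is either still compressed or calls them indirectly through addresses computed at runtime, which static cross-referencing cannot resolve.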
[0x00407e70]> axt 0x004092b8
[0x00407e70]> axt 0x004092c0
[0x00407e70]> axt 0x004092c4
[0x00407e70]> axt 0x004092cc
[0x00407e70]>
The afl command lists the functions that Radare2 identified in the binary; for the packed version it shows only two. Although the source code defines only one function, the original binary contains many more functions than the packed version does, since the compiler adds them.
0x00407e70 51 439 entry0
0x004071bb 3 37 fcn.004071bb
Checking where the function fcn.004071bb is called shows no references. The entry0 function contains several call instructions, which isn't uncommon, but they point to areas worth analyzing further when debugging.
[0x00407e70]> axt 0x004071bb
[0x00407e70]> pdf @ entry0 ~ call
│ │╎╎ 0x00407f8a ff96b8820000 call dword [esi + 0x82b8]
│ ╎│ ╎ 0x00407f9f ff96c0820000 call dword [esi + 0x82c0]
│ │└──> 0x00407fb0 ff96bc820000 call dword [esi + 0x82bc]
│ ╎ 0x00407ffe ffd5 call ebp
│ ╎ 0x00408013 ffd5 call ebp
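These call operands are register-relative, which likely explains why axt found no references to the imports. The sketch below resolves such an operand against the IAT entries from the UPX import listing. It rests on an assumption this static output cannot confirm: that esi holds 0x00401000, the start of UPX0, when the calls execute. Under that assumption, the displacements 0x82b8, 0x82c0, and 0x82bc land exactly on LoadLibraryA, GetProcAddress, and ExitProcess. The table and the resolve_indirect_call function are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* IAT slots copied from the UPX sample's import listing (KERNEL32 and USER32). */
static const struct { uint32_t vaddr; const char *name; } upx_iat[] = {
    { 0x004092b8, "LoadLibraryA" },
    { 0x004092bc, "ExitProcess" },
    { 0x004092c0, "GetProcAddress" },
    { 0x004092c4, "VirtualProtect" },
    { 0x004092cc, "MessageBoxW" },
};

/* Resolves a "call dword [reg + disp]" operand for an assumed register value.
   Returns the import name, or NULL if the target is not a known IAT slot. */
const char *resolve_indirect_call(uint32_t reg_value, uint32_t disp)
{
    uint32_t target = reg_value + disp;
    for (size_t i = 0; i < sizeof upx_iat / sizeof upx_iat[0]; i++)
        if (upx_iat[i].vaddr == target)
            return upx_iat[i].name;
    return NULL;
}
```

Stepping through entry0 in a debugger and reading esi at each call would confirm or refute the assumption.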
Finally, the sample packed with ASPack shows a very different section table. It contains the same sections as the original version of the sample, adds the .aspack and .adata sections, and every section is now writable. The .text and .rdata sections also have a much higher entropy than in the original.
[Sections]
nth paddr size vaddr vsize perm entropy type name
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
0 0x00000400 0xa00 0x00401000 0x1000 -rwx 7.31497939 ---- .text
1 0x00000e00 0x600 0x00402000 0x1000 -rw- 7.35641437 ---- .rdata
2 0x00001400 0x200 0x00403000 0x1000 -rw- 0.74297410 ---- .data
3 0x00001600 0x200 0x00404000 0x1000 -rw- 0.66525059 ---- .rsrc
4 0x00001800 0x200 0x00405000 0x1000 -rw- 4.86675397 ---- .reloc
5 0x00001a00 0x1400 0x00406000 0x2000 -rwx 6.17277319 ---- .aspack
6 0x00002e00 0x0 0x00408000 0x1000 -rwx ---- .adata
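When the original binary is available, as it is here, the entropy comparison can be automated. The following sketch counts the sections present in both tables whose entropy rose by more than a threshold. The struct, the function name, and the threshold of 1.0 bit per byte are hypothetical; the values in the test are taken from the original and ASPack section tables in this post.

```c
#include <stddef.h>
#include <string.h>

struct section_row { const char *name; double entropy; };

/* Counts sections that appear in both tables (matched by name) and whose
   entropy increased by more than min_rise bits per byte in the packed one. */
size_t count_entropy_rises(const struct section_row *base, size_t nbase,
                           const struct section_row *packed, size_t npacked,
                           double min_rise)
{
    size_t hits = 0;
    for (size_t i = 0; i < npacked; i++)
        for (size_t j = 0; j < nbase; j++)
            if (strcmp(packed[i].name, base[j].name) == 0 &&
                packed[i].entropy - base[j].entropy > min_rise)
                hits++;
    return hits;
}
```

With these values only .text and .rdata exceed the threshold; .rsrc actually drops in entropy and .reloc is unchanged.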
The linked libraries are the same 8 that the original version references.
[Linked libraries]
kernel32.dll
user32.dll
vcruntime140.dll
api-ms-win-crt-runtime-l1-1-0.dll
api-ms-win-crt-math-l1-1-0.dll
api-ms-win-crt-stdio-l1-1-0.dll
api-ms-win-crt-locale-l1-1-0.dll
api-ms-win-crt-heap-l1-1-0.dll
8 libraries
The imported functions, however, are more in line with the UPX-packed version. The differences are that this version does not import ExitProcess or VirtualProtect, and that it imports GetModuleHandleA, which the UPX-packed version does not.
[Imports]
nth vaddr bind type lib name
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
1 0x00406fd0 NONE FUNC kernel32.dll GetProcAddress
2 0x00406fd4 NONE FUNC kernel32.dll GetModuleHandleA
3 0x00406fd8 NONE FUNC kernel32.dll LoadLibraryA
1 0x00407191 NONE FUNC user32.dll MessageBoxW
1 0x00407199 NONE FUNC vcruntime140.dll __current_exception
1 0x004071a1 NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll _crt_atexit
1 0x004071a9 NONE FUNC api-ms-win-crt-math-l1-1-0.dll __setusermatherr
1 0x004071b1 NONE FUNC api-ms-win-crt-stdio-l1-1-0.dll _set_fmode
1 0x004071b9 NONE FUNC api-ms-win-crt-locale-l1-1-0.dll _configthreadlocale
1 0x004071c1 NONE FUNC api-ms-win-crt-heap-l1-1-0.dll _set_new_mode
The function list contains 12 functions, noticeably more than the 2 found in the UPX-packed version:
0x00406001 1 11 entry0
0x00406014 9 118 fcn.00406014
0x004066e0 5 106 fcn.004066e0
0x00406a9c 3 126 fcn.00406a9c
0x00406d0a 1 14 fcn.00406d0a
0x00406827 1 37 fcn.00406827
0x00406b1a 1 97 fcn.00406b1a
0x00406b7b 33 399 fcn.00406b7b
0x00406d18 29 664 fcn.00406d18
0x004067bc 5 107 fcn.004067bc
0x0040684c 15 380 fcn.0040684c
0x004069c8 14 212 fcn.004069c8
Based on this static analysis, which compares two packing tools, it becomes evident that UPX leaves several indicators that make a packed binary relatively easy to detect, and these indicators are not easy to obfuscate. ASPack, on the other hand, is less conspicuous: it keeps the original section names and adds only two sections, which could be renamed in an attempt to avoid detection, although the higher entropy and the reduced import table still stand out. It's important to note that this analysis didn't cover every aspect of the header that can reveal that a binary is packed. Having the original binary for comparison also greatly aids in understanding the changes a packer makes, although this may not be possible with samples found in the wild.
In an upcoming post, I will explore the process of extracting unpacked data through dynamic analysis using a debugger. Dynamic analysis is a powerful technique that allows observation of how a packed binary behaves when executed, providing insights into its inner workings and uncovering potential security threats. Debugging tools enable stepping through the code, analyzing memory structures, and gaining a deeper understanding of the program's runtime behavior.
As this discussion on packing techniques and analysis concludes, it becomes evident that understanding the intricacies of packing is essential for both cybersecurity professionals and malware analysts. Whether detecting the telltale signs of packing or dynamically analyzing a binary, these skills are invaluable in the ever-evolving landscape of cybersecurity.