Analyzing Packed Binaries

Packing is a technique that software developers use to reduce the size of executables or to obfuscate machine code, typically to protect intellectual property. These are legitimate uses of packing for executables and similar binary files. Malware developers use the same technique to make their malicious executables harder to detect and more difficult to analyze.

How an executable is packed depends on the intended goal, which could be compression, obfuscation, encryption, or a combination of these. Regardless of the extent of the modifications, the executable's structure is altered to include the code needed to unpack the machine code in memory when the program runs on the user's system, along with the data the program needs.

There are both open-source and commercial packers available. These tools often insert strings or name the sections containing the unpacking sequence in a specific manner, which can be used to identify the packer that was used. However, some malware developers may devise their own packing techniques or tools. They may also alter section names or remove the strings added by well-known packers to make it even more challenging for malware analysts to determine which packer was employed.

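A packing process typically involves several steps, which vary with the specific tool and the purpose of the packing. In general, the original code and data are compressed, obfuscated, or encrypted; the code that reverses this at run time is added to the file; and the executable's structure is altered so that this unpacking code runs first and restores the program in memory.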

Additionally, some packing tools may include extra code to dynamically alter behavior during execution, making it more challenging for an analyst to discern the actions taken by the malware.
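To make the idea concrete, here is a minimal sketch of the transform-and-restore cycle. It is a toy, not how any particular packer works: the single-byte XOR key, the use of zlib, and the function names are all assumptions chosen for illustration, and a real packer would also have to rewrite the executable's headers and embed the unpacking code.

```python
import zlib

KEY = 0x5A  # hypothetical one-byte XOR key, for illustration only


def pack(payload: bytes) -> bytes:
    """Compress the payload, then obfuscate the result with a one-byte XOR."""
    compressed = zlib.compress(payload, 9)
    return bytes(b ^ KEY for b in compressed)


def unpack(blob: bytes) -> bytes:
    """Reverse the steps, as an unpacking stub would do in memory at run time."""
    compressed = bytes(b ^ KEY for b in blob)
    return zlib.decompress(compressed)


# Stand-in for a program's code and data.
original = b"\x90" * 4096 + b"example code and data"
packed = pack(original)
restored = unpack(packed)

print(len(original), len(packed), restored == original)
```

The packed blob is much smaller than the input and shares no readable strings with it, which is the property that makes packed files harder to inspect statically.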

Detecting Packing

An important first step is determining whether the sample being analyzed is packed, since this determines how the binary is approached. While most analysis tools can identify whether a binary is packed and which packer was used, it remains useful to understand how to analyze such files.

Tools capable of detecting packed binaries often rely on signatures: widely recognized patterns or identifying strings that packers insert. The following YARA rule can be used to identify binaries packed with RLPack.

rule rlpack {
    meta:
        description = "RLPack packed file"
        block = false
        quarantine = false

    strings:
        $mz = "MZ"
        $text1 = ".packed\x00"
        $text2 = ".RLPack\x00"

    condition:
        $mz at 0 and $text1 in (0..1024) and $text2 in (0..1024)
}
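For readers without YARA at hand, the rule's condition can be approximated in a few lines of Python. This is only a sketch that mirrors the three checks in the rule above (the function name is made up for the example); it is not a replacement for YARA itself.

```python
def looks_like_rlpack(data: bytes) -> bool:
    """Mirror the rule: "MZ" at offset 0 and both names starting within 0..1024."""

    def starts_within(needle: bytes, last_offset: int = 1024) -> bool:
        # bytes.find(sub, start, end) only matches when the whole needle ends
        # before `end`, so the match offset is at most `last_offset`.
        return data.find(needle, 0, last_offset + len(needle)) != -1

    return (
        data.startswith(b"MZ")
        and starts_within(b".packed\x00")
        and starts_within(b".RLPack\x00")
    )


# Synthetic bytes that only imitate the layout the rule looks for.
sample = b"MZ" + b"\x00" * 300 + b".packed\x00" + b"\x00" * 32 + b".RLPack\x00"
print(looks_like_rlpack(sample))
```

Like the rule, this only proves that the strings are present near the start of the file, so a renamed section defeats it.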

Signatures can be created from various elements of the binary that the packer generates, including section names, header values, structure, and the Import Address Table (IAT). Packers often introduce sections with specific names and characteristics. Some packers leave only their own sections in the binary, while others add their sections to those that exist in the original binary. These names can help identify the tool used, but developers can change them without modifying the packer's code or affecting how the resulting binary runs. Even when section names are changed, the sections themselves and their other characteristics remain the same, and this information can still be used to determine which packer was employed.
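A name-based check is easy to sketch. The table below only contains the section names that appear in this post (UPX0 and UPX1 for UPX, .aspack and .adata for ASPack, .packed and .RLPack for RLPack); the function name is hypothetical, and real tools carry far larger signature sets.

```python
# Section names mentioned in this post, mapped to the packer they hint at.
KNOWN_SECTION_NAMES = {
    "UPX0": "UPX",
    "UPX1": "UPX",
    ".aspack": "ASPack",
    ".adata": "ASPack",
    ".packed": "RLPack",
    ".RLPack": "RLPack",
}


def guess_packer(section_names):
    """Return the sorted packer names hinted at by a list of section names."""
    hits = {KNOWN_SECTION_NAMES[n] for n in section_names if n in KNOWN_SECTION_NAMES}
    return sorted(hits)


print(guess_packer(["UPX0", "UPX1", ".rsrc"]))   # names left untouched
print(guess_packer(["AAA0", "AAA1", ".rsrc"]))   # same layout, renamed sections
```

The second call returns an empty list, which is exactly why names alone are a weak signal and the other characteristics of the sections matter.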

Similarly, the header can be analyzed to determine the packer used. While the header's structure may vary depending on the original binary, packers often introduce specific values or give the header a specific structure. The section headers specify both the size of a section's data on disk and its size when loaded into memory. A significant difference between these two values can indicate that the data is decompressed or decrypted in memory. For instance, UPX creates a section that has a size of zero on disk; when loaded into memory, it is allocated enough space to hold the entire unpacked program.
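The two sizes can be read straight from the section table. The sketch below walks the standard PE layout (the e_lfanew field at offset 0x3C, the COFF file header, then 40-byte section headers) using only the standard library. The function names are my own, and the check is deliberately crude: legitimate sections that hold only uninitialized data can also be empty on disk.

```python
import struct

# IMAGE_SECTION_HEADER: Name, VirtualSize, VirtualAddress, SizeOfRawData,
# PointerToRawData, PointerToRelocations, PointerToLinenumbers,
# NumberOfRelocations, NumberOfLinenumbers, Characteristics (40 bytes).
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")


def parse_sections(data: bytes):
    """Return (name, size_on_disk, size_in_memory) for each PE section."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    # The COFF file header starts right after the 4-byte signature.
    (num_sections,) = struct.unpack_from("<H", data, pe_offset + 6)
    (opt_size,) = struct.unpack_from("<H", data, pe_offset + 20)
    table = pe_offset + 24 + opt_size  # section table follows the optional header
    sections = []
    for i in range(num_sections):
        name, vsize, _vaddr, raw_size, *_rest = SECTION_HEADER.unpack_from(
            data, table + i * SECTION_HEADER.size
        )
        label = name.rstrip(b"\x00").decode("ascii", "replace")
        sections.append((label, raw_size, vsize))
    return sections


def empty_on_disk(sections):
    """Names of sections that occupy memory but have no data on disk."""
    return [name for name, raw_size, vsize in sections if raw_size == 0 and vsize > 0]
```

Calling empty_on_disk(parse_sections(open(path, "rb").read())) on a sample, where path is whatever file is under analysis, gives a quick list of sections worth a closer look.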

The Import Address Table (IAT) is a structure in a PE file, located through the header's data directories, that holds the addresses of the functions imported from dynamic link libraries (DLLs). This allows the executable to locate and access the required functions at runtime. ELF binaries use the Procedure Linkage Table (PLT) and Global Offset Table (GOT) to serve a similar purpose. These structures differ in design, but they share the goal of importing functions or procedures from external libraries. Packers call a specific set of functions, either by linking directly to external libraries or by resolving API calls dynamically. As a result, the import table of a packed binary tends to list a characteristic set of functions, or an incomplete list.
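As a rough illustration of what "a specific set of functions or an incomplete list" can look like in practice, the sketch below scores a list of imported function names. The function name, the chosen API names, and the threshold of 12 imports are assumptions made for the example; many legitimate programs import these functions too, so the output is a hint and nothing more.

```python
# Windows APIs that load a library at run time.
LOADERS = {"LoadLibraryA", "LoadLibraryW", "LoadLibraryExA", "LoadLibraryExW"}


def import_hints(names, few=12):
    """Return human-readable hints derived from a list of imported names."""
    names = set(names)
    hints = []
    if "GetProcAddress" in names and names & LOADERS:
        hints.append("can resolve APIs at run time")
    if "VirtualProtect" in names or "VirtualAlloc" in names:
        hints.append("can change or allocate memory at run time")
    if len(names) < few:  # `few` is an arbitrary threshold for this sketch
        hints.append(f"only {len(names)} imports listed")
    return hints


print(import_hints(["LoadLibraryA", "GetProcAddress", "VirtualProtect", "ExitProcess"]))
```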

While the aspects mentioned above are considered during static analysis, dynamic analysis is also used to assess the packing techniques in use. This involves using a debugger and disassembler to follow the instructions and control flow of the binary. Dynamic analysis is particularly useful for extracting the unpacked code once it resides in memory. Note that the extracted program requires additional steps to become a valid executable on disk for further analysis, and it won't exactly match the original version of the binary file.

Let's examine an example binary that has been packed using UPX and ASPack to better understand the differences. This example is a Windows PE program that, when executed, displays a message box and performs no other actions.

The source code of this program consists only of the lines shown below; any additional functions and data are added by the compiler. The program was compiled in Microsoft Visual Studio 2022.

#include <Windows.h>

int WINAPI WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nCmdShow) {
    MessageBox(NULL, L"Hello there! - General Kenobi", L"Star Wars Quote", MB_OK | MB_ICONINFORMATION);
    return 0;
}

Radare2 is used for the analysis, but any comparable tool can achieve the same results. The original binary is examined first to establish a baseline. The iS command displays the sections; adding the entropy parameter (iS entropy) also displays the entropy of each section.

[Sections]

nth paddr        size vaddr        vsize perm entropy    type name
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
0   0x00000400  0xe00 0x00401000  0x1000 -r-x 5.92845120 ---- .text
1   0x00001200  0xc00 0x00402000  0x1000 -r-- 4.55813641 ---- .rdata
2   0x00001e00  0x200 0x00403000  0x1000 -rw- 0.28040117 ---- .data
3   0x00002000  0x200 0x00404000  0x1000 -r-- 4.70150326 ---- .rsrc
4   0x00002200  0x200 0x00405000  0x1000 -r-- 4.86675397 ---- .reloc

Several sections are listed. The size column is the space the section occupies on disk, while vsize is the space allocated for the section in memory. The difference between the two is small: the sizes on disk range from 512 bytes to 3.5 KB, while each section is allocated 4 KB (one memory page) in memory. This difference is not large enough to suggest that packing is being used.
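The figures quoted here come from converting the hexadecimal sizes in the listing above. A few lines of Python make the conversion explicit; the dictionary simply restates the size and vsize columns, and the helper name is made up for the example.

```python
# (size on disk, size in memory) per section, copied from the listing above.
BASELINE = {
    ".text": (0xE00, 0x1000),
    ".rdata": (0xC00, 0x1000),
    ".data": (0x200, 0x1000),
    ".rsrc": (0x200, 0x1000),
    ".reloc": (0x200, 0x1000),
}


def kb(n: int) -> float:
    """Convert a byte count to kilobytes (1 KB = 1024 bytes here)."""
    return n / 1024


for name, (size, vsize) in BASELINE.items():
    print(f"{name:<7} disk {size:5d} B ({kb(size):.1f} KB)  memory {kb(vsize):.1f} KB")
```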

Entropy measures the level of randomness in a section's data. These values will be compared later with those of the packed samples.
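For reference, this is how such a value can be computed. The sketch assumes the tool reports Shannon entropy in bits per byte, which the 0 to 8 range of the values in the listing suggests; the function name is my own.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    # Sum p * log2(1/p) over the byte values that actually occur.
    return sum((c / total) * math.log2(total / c) for c in Counter(data).values())


print(shannon_entropy(b"\x00" * 512))      # a single repeated byte
print(shannon_entropy(bytes(range(256))))  # every byte value exactly once
```

Plain machine code and text sit somewhere in the middle of the range, while compressed or encrypted data approaches 8, which is why the value is useful for spotting packed sections.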

The linked libraries are listed using the il command. This program links only 8 DLLs, and none of them are suspicious.

[Linked libraries]
user32.dll
vcruntime140.dll
api-ms-win-crt-runtime-l1-1-0.dll
api-ms-win-crt-math-l1-1-0.dll
api-ms-win-crt-stdio-l1-1-0.dll
api-ms-win-crt-locale-l1-1-0.dll
api-ms-win-crt-heap-l1-1-0.dll
kernel32.dll

8 libraries

The functions imported from these libraries can be listed with the ii command. The only function called in the source code is MessageBoxW (the Unicode variant that the MessageBox macro resolves to); the others are imported by code that the compiler adds, such as the C runtime startup code, and are not necessarily used.

[Imports]
nth vaddr      bind type lib                               name
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
1   0x00402038 NONE FUNC USER32.dll                        MessageBoxW
1   0x00402040 NONE FUNC VCRUNTIME140.dll                  __current_exception
2   0x00402044 NONE FUNC VCRUNTIME140.dll                  _except_handler4_common
3   0x00402048 NONE FUNC VCRUNTIME140.dll                  __current_exception_context
4   0x0040204c NONE FUNC VCRUNTIME140.dll                  memset
1   0x0040206c NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll _crt_atexit
2   0x00402070 NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll _cexit
3   0x00402074 NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll terminate
4   0x00402078 NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll _register_onexit_function
5   0x0040207c NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll _register_thread_local_exe_atexit_callback
6   0x00402080 NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll _exit
7   0x00402084 NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll _initterm
8   0x00402088 NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll _get_narrow_winmain_command_line
9   0x0040208c NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll _initialize_narrow_environment
10  0x00402090 NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll _configure_narrow_argv
11  0x00402094 NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll _c_exit
12  0x00402098 NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll _set_app_type
13  0x0040209c NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll _seh_filter_exe
14  0x004020a0 NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll _initialize_onexit_table
15  0x004020a4 NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll exit
16  0x004020a8 NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll _controlfp_s
17  0x004020ac NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll _initterm_e
1   0x00402064 NONE FUNC api-ms-win-crt-math-l1-1-0.dll    __setusermatherr
1   0x004020b4 NONE FUNC api-ms-win-crt-stdio-l1-1-0.dll   _set_fmode
2   0x004020b8 NONE FUNC api-ms-win-crt-stdio-l1-1-0.dll   __p__commode
1   0x0040205c NONE FUNC api-ms-win-crt-locale-l1-1-0.dll  _configthreadlocale
1   0x00402054 NONE FUNC api-ms-win-crt-heap-l1-1-0.dll    _set_new_mode
1   0x00402000 NONE FUNC KERNEL32.dll                      GetCurrentProcessId
2   0x00402004 NONE FUNC KERNEL32.dll                      GetModuleHandleW
3   0x00402008 NONE FUNC KERNEL32.dll                      GetStartupInfoW
4   0x0040200c NONE FUNC KERNEL32.dll                      IsDebuggerPresent
5   0x00402010 NONE FUNC KERNEL32.dll                      InitializeSListHead
6   0x00402014 NONE FUNC KERNEL32.dll                      GetSystemTimeAsFileTime
7   0x00402018 NONE FUNC KERNEL32.dll                      GetCurrentThreadId
8   0x0040201c NONE FUNC KERNEL32.dll                      UnhandledExceptionFilter
9   0x00402020 NONE FUNC KERNEL32.dll                      QueryPerformanceCounter
10  0x00402024 NONE FUNC KERNEL32.dll                      IsProcessorFeaturePresent
11  0x00402028 NONE FUNC KERNEL32.dll                      TerminateProcess
12  0x0040202c NONE FUNC KERNEL32.dll                      GetCurrentProcess
13  0x00402030 NONE FUNC KERNEL32.dll                      SetUnhandledExceptionFilter

Some of the functions in this list can raise concern because malware often uses them. One of these is IsDebuggerPresent, which malware can use as an anti-analysis technique. The axt command shows where a function is referenced; it takes the address of the function as its argument.

[0x00401268]> axt 0x0040200c
fcn.004016fe 0x4017d6 [CALL:--x] call dword [sym.imp.KERNEL32.dll_IsDebuggerPresent]

If this sample were potential malware, it would be necessary to determine what calls this function; however, that is outside the scope of this blog post.

Loading the UPX-packed sample into Radare2 and displaying the sections of the PE binary tells a very different story.

[Sections]

nth paddr         size vaddr        vsize perm entropy    type name
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
0   0x00000400     0x0 0x00401000  0x6000 -rwx            ---- UPX0
1   0x00000400  0x1200 0x00407000  0x2000 -rwx 7.32773884 ---- UPX1
2   0x00001600   0x600 0x00409000  0x1000 -rw- 3.88507214 ---- .rsrc

This program consists of only three sections. Two of these sections, namely UPX0 and UPX1, immediately indicate the packer used.

The UPX0 section has a size of 0 bytes on disk but is allocated 24 KB (0x6000 bytes) when loaded into memory. A gap this large suggests that data is unpacked into the section at run time. The section is also readable, writable, and executable, so code can be written into it at run time and then executed.

The UPX1 section has an entropy of about 7.33, well above the highest value in the unpacked version (about 5.93 for .text), which is unusual for ordinary code and consistent with compressed data. It also has all permissions, which is not a typical configuration.
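The three observations made about these sections (no data on disk, write plus execute permissions, high entropy) can be written down as simple checks. The rows below are copied from the UPX listing above; the type name, the function name, and the entropy cut-off of 7.0 are assumptions picked for this sketch, not established thresholds.

```python
from typing import NamedTuple, Optional


class Section(NamedTuple):
    name: str
    size: int                 # bytes on disk
    vsize: int                # bytes in memory
    perm: str                 # permission string as printed by the tool
    entropy: Optional[float]  # None when the tool prints no value


def section_flags(s: Section, entropy_limit: float = 7.0):
    """Return the packing indicators that apply to one section."""
    flags = []
    if s.size == 0 and s.vsize > 0:
        flags.append("empty on disk")
    if "w" in s.perm and "x" in s.perm:
        flags.append("writable and executable")
    if s.entropy is not None and s.entropy > entropy_limit:
        flags.append("high entropy")
    return flags


UPX_SAMPLE = [
    Section("UPX0", 0x0, 0x6000, "-rwx", None),
    Section("UPX1", 0x1200, 0x2000, "-rwx", 7.32773884),
    Section(".rsrc", 0x600, 0x1000, "-rw-", 3.88507214),
]

for s in UPX_SAMPLE:
    print(s.name, section_flags(s))
```

Both UPX sections trip two checks each and .rsrc trips none, which matches the reading of the table given above.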

The libraries linked to the executable are the same as in the unpacked version.

[Linked libraries]
api-ms-win-crt-heap-l1-1-0.dll
api-ms-win-crt-locale-l1-1-0.dll
api-ms-win-crt-math-l1-1-0.dll
api-ms-win-crt-runtime-l1-1-0.dll
api-ms-win-crt-stdio-l1-1-0.dll
kernel32.dll
user32.dll
vcruntime140.dll

8 libraries

However, the list of imported functions differs significantly. It is much shorter, and it includes LoadLibraryA, GetProcAddress, VirtualProtect, and ExitProcess, none of which are imported by the unpacked version.

[Imports]
nth vaddr      bind type lib                               name
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
1   0x00409290 NONE FUNC api-ms-win-crt-heap-l1-1-0.dll    _set_new_mode
1   0x00409298 NONE FUNC api-ms-win-crt-locale-l1-1-0.dll  _configthreadlocale
1   0x004092a0 NONE FUNC api-ms-win-crt-math-l1-1-0.dll    __setusermatherr
1   0x004092a8 NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll exit
1   0x004092b0 NONE FUNC api-ms-win-crt-stdio-l1-1-0.dll   _set_fmode
1   0x004092b8 NONE FUNC KERNEL32.DLL                      LoadLibraryA
2   0x004092bc NONE FUNC KERNEL32.DLL                      ExitProcess
3   0x004092c0 NONE FUNC KERNEL32.DLL                      GetProcAddress
4   0x004092c4 NONE FUNC KERNEL32.DLL                      VirtualProtect
1   0x004092cc NONE FUNC USER32.dll                        MessageBoxW
1   0x004092d4 NONE FUNC VCRUNTIME140.dll                  memset

Checking for references to these functions shows that none exist in the code visible on disk, which suggests that they are called indirectly, through addresses computed at run time, or from code that is still packed.

[0x00407e70]> axt 0x004092b8
[0x00407e70]> axt 0x004092c0
[0x00407e70]> axt 0x004092c4
[0x00407e70]> axt 0x004092cc
[0x00407e70]>

The functions that exist within the binary are listed using the afl command; for the packed version it shows only two. Although the source code defines only one function, the unpacked version contains many more functions than the packed version does.

0x00407e70   51    439 entry0
0x004071bb    3     37 fcn.004071bb

Checking where the function named fcn.004071bb is called shows that there are no references to it. The entry0 function contains several call instructions, which isn't uncommon, but they point to areas worth analyzing further when debugging.

[0x00407e70]> axt 0x004071bb
[0x00407e70]> pdf @ entry0 ~ call
│     │╎╎   0x00407f8a      ff96b8820000   call dword [esi + 0x82b8]
│    ╎│ ╎   0x00407f9f      ff96c0820000   call dword [esi + 0x82c0]
│     │└──> 0x00407fb0      ff96bc820000   call dword [esi + 0x82bc]
│       ╎   0x00407ffe      ffd5           call ebp
│       ╎   0x00408013      ffd5           call ebp
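One possible reading of the register-relative calls is worth checking with simple arithmetic. If esi held 0x00401000, the start of the UPX0 section, the three displacements would land exactly on entries in the import listing shown earlier. That register value is purely an assumption: the static listing does not show it, and it would need to be confirmed in a debugger.

```python
# Import entries copied from the UPX-packed sample's import listing.
IMPORTS = {
    0x004092B8: "LoadLibraryA",
    0x004092BC: "ExitProcess",
    0x004092C0: "GetProcAddress",
    0x004092C4: "VirtualProtect",
    0x004092CC: "MessageBoxW",
}

ASSUMED_ESI = 0x00401000  # assumption: esi points at the start of UPX0


def resolve(displacement: int, base: int = ASSUMED_ESI) -> str:
    """Name of the import entry at base + displacement, if there is one."""
    return IMPORTS.get(base + displacement, "unknown")


for disp in (0x82B8, 0x82C0, 0x82BC):
    print(f"call dword [esi + {disp:#x}] -> {resolve(disp)}")
```

Under that assumption the calls would resolve to LoadLibraryA, GetProcAddress, and ExitProcess, which would also explain why axt finds no direct references to them.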

Finally, the sample packed with ASPack shows yet another section table. It contains the same sections as the original sample and adds the .aspack and .adata sections. The .text and .rdata sections now have much higher entropy (about 7.3, up from about 5.9 and 4.6), the .rsrc section has lower entropy, and every original section is now writable.

[Sections]

nth paddr         size vaddr        vsize perm entropy    type name
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
0   0x00000400   0xa00 0x00401000  0x1000 -rwx 7.31497939 ---- .text
1   0x00000e00   0x600 0x00402000  0x1000 -rw- 7.35641437 ---- .rdata
2   0x00001400   0x200 0x00403000  0x1000 -rw- 0.74297410 ---- .data
3   0x00001600   0x200 0x00404000  0x1000 -rw- 0.66525059 ---- .rsrc
4   0x00001800   0x200 0x00405000  0x1000 -rw- 4.86675397 ---- .reloc
5   0x00001a00  0x1400 0x00406000  0x2000 -rwx 6.17277319 ---- .aspack
6   0x00002e00     0x0 0x00408000  0x1000 -rwx            ---- .adata
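Putting the two listings side by side makes the change per section easier to see. The values below are copied from the original and ASPack section tables in this post; the function name is made up for the example.

```python
# Entropy per section, copied from the original and ASPack listings.
ORIGINAL = {
    ".text": 5.92845120,
    ".rdata": 4.55813641,
    ".data": 0.28040117,
    ".rsrc": 4.70150326,
    ".reloc": 4.86675397,
}
ASPACK = {
    ".text": 7.31497939,
    ".rdata": 7.35641437,
    ".data": 0.74297410,
    ".rsrc": 0.66525059,
    ".reloc": 4.86675397,
}


def entropy_changes(before, after):
    """Entropy difference (after minus before) for sections present in both."""
    return {name: round(after[name] - before[name], 2) for name in before if name in after}


for name, delta in entropy_changes(ORIGINAL, ASPACK).items():
    print(f"{name:<7} {delta:+.2f}")
```

The code and read-only data sections gain the most, .reloc is unchanged, and .rsrc actually drops, so the effect of packing is not uniform across the file.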

The linked libraries remain the same 8 that the original version references.

[Linked libraries]
kernel32.dll
user32.dll
vcruntime140.dll
api-ms-win-crt-runtime-l1-1-0.dll
api-ms-win-crt-math-l1-1-0.dll
api-ms-win-crt-stdio-l1-1-0.dll
api-ms-win-crt-locale-l1-1-0.dll
api-ms-win-crt-heap-l1-1-0.dll

8 libraries

However, the imported functions are more in line with the UPX-packed version. The differences are that ExitProcess and VirtualProtect are not imported, while GetModuleHandleA is.

[Imports]
nth vaddr      bind type lib                               name
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
1   0x00406fd0 NONE FUNC kernel32.dll                      GetProcAddress
2   0x00406fd4 NONE FUNC kernel32.dll                      GetModuleHandleA
3   0x00406fd8 NONE FUNC kernel32.dll                      LoadLibraryA
1   0x00407191 NONE FUNC user32.dll                        MessageBoxW
1   0x00407199 NONE FUNC vcruntime140.dll                  __current_exception
1   0x004071a1 NONE FUNC api-ms-win-crt-runtime-l1-1-0.dll _crt_atexit
1   0x004071a9 NONE FUNC api-ms-win-crt-math-l1-1-0.dll    __setusermatherr
1   0x004071b1 NONE FUNC api-ms-win-crt-stdio-l1-1-0.dll   _set_fmode
1   0x004071b9 NONE FUNC api-ms-win-crt-locale-l1-1-0.dll  _configthreadlocale
1   0x004071c1 NONE FUNC api-ms-win-crt-heap-l1-1-0.dll    _set_new_mode

Listing the functions shows that this sample contains more of them than the UPX-packed version: 12 compared with 2.

0x00406001    1     11 entry0
0x00406014    9    118 fcn.00406014
0x004066e0    5    106 fcn.004066e0
0x00406a9c    3    126 fcn.00406a9c
0x00406d0a    1     14 fcn.00406d0a
0x00406827    1     37 fcn.00406827
0x00406b1a    1     97 fcn.00406b1a
0x00406b7b   33    399 fcn.00406b7b
0x00406d18   29    664 fcn.00406d18
0x004067bc    5    107 fcn.004067bc
0x0040684c   15    380 fcn.0040684c
0x004069c8   14    212 fcn.004069c8

This static analysis allows the two packing tools to be compared. UPX provides several indicators that make a packed binary relatively easy to detect, and these indicators are not necessarily easy to obfuscate. ASPack, on the other hand, presents a less conspicuous picture: it keeps the original sections and adds only two, which could be renamed in an attempt to avoid detection. Note that this analysis didn't cover every aspect of the header that can reveal that a binary is packed. Having the original binary for comparison greatly helps in understanding the changes that occur, but it is rarely available for samples found in the wild.

In an upcoming post, I will explore the process of extracting unpacked data through dynamic analysis using a debugger. Dynamic analysis is a powerful technique that allows observation of how a packed binary behaves when executed, providing insights into its inner workings and uncovering potential security threats. Debugging tools enable stepping through the code, analyzing memory structures, and gaining a deeper understanding of the program's runtime behavior.

As this discussion on packing techniques and analysis concludes, it becomes evident that understanding the intricacies of packing is essential for both cybersecurity professionals and malware analysts. Whether detecting the telltale signs of packing or dynamically analyzing a binary, these skills are invaluable in the ever-evolving landscape of cybersecurity.