Chapter 14 Notes
to accompany Sikorski and Honig, Practical Malware Analysis, no starch press
Data Encoding
Malware may use simple ciphers or more sophisticated ones; the analyst needs to recognize the common forms.
Malware authors may use encoding to hide configuration information, to prepare information before it is
sent out of the host, or to hide strings until they are needed, thereby hiding the malware's malicious properties.
Simple Ciphers
- May be good enough
- Easy to code
- Compact
- May be less obvious than more sophisticated ciphers
- Example: Caesar cipher, shift each letter three places forward in the alphabet so that "malware" becomes "pdozduh"
- Example: single-byte XOR, which is its own inverse (encoding and decoding are the same operation), and not too hard to break
- Textbook describes a script that XORs every byte of a file with each key 0x01 to 0xff in the hope of
finding a known file header such as MZ, or a string such as "This program cannot be run in DOS mode"
- Nobody XORs with 0x00, because XOR with 0x00 leaves the data unchanged. Likewise, when XOR'ing data that
has a lot of NULL bytes, every NULL position is turned into the key, so the key becomes obvious.
- So a NULL-preserving XOR copies NULLs (and bytes equal to the key) through as is, and XORs the other bytes
- Not too hard to spot in IDA: small loops with an XOR instruction in the middle
- But the XOR instruction has other uses!
- Other simple encoding schemes are found
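The brute-force search above, written out as an added Python example (not the textbook's script): it tries keys 0x01 to 0xff with both plain and NULL-preserving single-byte XOR and reports any key that exposes the "MZ" header or the DOS-stub string. The sample blob is constructed here, a fake DOS-stub fragment encoded with the assumed key 0x3b; it is not a real executable and not taken from any malware sample.

```python
"""Brute-force single-byte XOR (plain and NULL-preserving) by looking for known plaintext."""

# Markers the textbook's script looks for: a PE file's "MZ" header and its DOS stub string.
MARKERS = (b"MZ", b"This program cannot be run in DOS mode")

# Constructed sample (assumption): a fake DOS-stub fragment standing in for an embedded PE.
# It is NOT a real executable, just enough bytes to carry both markers plus some NULLs.
SAMPLE_PLAIN = (b"MZ\x90\x00\x03\x00\x00\x00\x04\x00\x00\x00\xff\xff\x00\x00"
                b"This program cannot be run in DOS mode.\r\n")
SAMPLE_KEY = 0x3b   # assumed key used to build the test blob; the brute force must rediscover it


def xor_bytes(data: bytes, key: int) -> bytes:
    """Plain single-byte XOR. Applying it twice with the same key restores the input."""
    return bytes(b ^ key for b in data)


def xor_null_preserving(data: bytes, key: int) -> bytes:
    """NULL-preserving single-byte XOR: 0x00 is copied through unchanged, and so is a byte
    equal to the key (otherwise key ^ key would produce a NULL and the scheme would no
    longer be its own inverse)."""
    return bytes(b if b in (0x00, key) else b ^ key for b in data)


def brute_force_xor(blob: bytes, markers=MARKERS):
    """Try every key 0x01..0xff with both variants (0x00 is skipped: XOR with 0 changes
    nothing). Returns (key, variant, marker) tuples for every decode that exposes a marker."""
    hits = []
    for key in range(0x01, 0x100):
        for variant, decode in (("xor", xor_bytes), ("xor-null-preserving", xor_null_preserving)):
            out = decode(blob, key)
            hits.extend((key, variant, m.decode()) for m in markers if m in out)
    return hits


def candidate_keys(blob: bytes) -> set:
    """Just the distinct keys that produced at least one marker hit."""
    return {key for key, _, _ in brute_force_xor(blob)}


if __name__ == "__main__":
    blob = xor_null_preserving(SAMPLE_PLAIN, SAMPLE_KEY)   # what you would carve out of the binary
    for key, variant, marker in brute_force_xor(blob):
        print(f"key=0x{key:02x} variant={variant} found {marker!r}")
```

Because the NULL-preserving variant never writes the key into the NULL positions, the encoded blob keeps exactly as many 0x00 bytes as the plaintext; a run of identical non-zero bytes where you would expect NULLs (a PE header, for instance) is the tell-tale of the plain variant.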
Base64
- Used to represent binary data using ASCII strings
- Originally invented for email (MIME types) but now used in other contexts
- Three eight-bit bytes (24 bits) are split into four six-bit values, each mapped to one character from a 64-character printable alphabet; input that is not a multiple of three bytes is padded with "="
- As shown in Figure 14-4
- See the utility http://www.opinionatedgeek.com/dotnet/tools/base64decode/
- Finding the string ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/ in the binary is a clue
- But malware authors can reorder or replace the characters of this string as they wish to customize their encoding scheme, so a standard decoder will produce garbage until the custom alphabet is substituted
Example (based on Exercise 13-1)
- A 64-character string that looks like the alphabet, upper and lower case plus numerals, is a Base64 indicator. Although it could mean other things, too, such as a substitution cipher that uses more than just A-Z
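A decoder for the custom-alphabet case, as an added Python example (not from the textbook): the custom alphabet found in the binary is translated onto the standard one, so the standard library does the bit-unpacking, and missing "=" padding is restored. The custom alphabet and the strings dump used below are constructed for illustration; the hypothetical alphabet is the standard one rotated by one position.

```python
"""Spot Base64 alphabets in a strings dump and decode data encoded with a custom alphabet."""
import base64
import string

STD_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# Constructed example (assumption): a malware author's custom alphabet, here the standard
# one rotated by one position so that standard index 0 ('A') is encoded as 'B', and so on.
CUSTOM_ALPHABET = STD_ALPHABET[1:] + STD_ALPHABET[0]


def find_alphabet_candidates(strings_dump):
    """Given printable strings (e.g. the output of `strings`), return those that consist of
    exactly 64 distinct characters: candidate Base64 alphabets, standard or custom."""
    return [s for s in strings_dump if len(s) == 64 and len(set(s)) == 64]


def decode_custom_base64(encoded: str, alphabet: str, pad: str = "=") -> bytes:
    """Decode Base64 text that was produced with a 64-character custom alphabet."""
    if len(alphabet) != 64 or len(set(alphabet)) != 64:
        raise ValueError("alphabet must be 64 distinct characters")
    # Map each custom character onto the standard character at the same index; the
    # standard decoder then does the 4-chars-to-3-bytes unpacking.
    table = str.maketrans(alphabet, STD_ALPHABET)
    std = encoded.rstrip(pad).translate(table)
    std += "=" * (-len(std) % 4)          # restore padding the author may have dropped
    return base64.b64decode(std, validate=True)


def encode_custom_base64(data: bytes, alphabet: str) -> str:
    """The inverse, useful for confirming a recovered alphabet against known plaintext."""
    table = str.maketrans(STD_ALPHABET, alphabet)
    return base64.b64encode(data).decode("ascii").translate(table)


if __name__ == "__main__":
    # Constructed strings dump: one ordinary string, the standard alphabet, a custom one.
    dump = ["This program cannot be run in DOS mode", STD_ALPHABET, CUSTOM_ALPHABET]
    for cand in find_alphabet_candidates(dump):
        print("candidate alphabet:", cand)
    blob = encode_custom_base64(b"malware", CUSTOM_ALPHABET)
    print(blob, "->", decode_custom_base64(blob, CUSTOM_ALPHABET))
```

When the alphabet is recovered from the binary, the quickest sanity check is to encode a string you expect to see in the decoded data (a URL, a registry key) with `encode_custom_base64` and look for that text in the encoded blob.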

Common Crypto
Custom-Encoding
- Use the simple techniques in combination, or...
- Encoding and decoding functions are often found near the I/O functions, i.e. the encoder runs shortly before a write
and the decoder shortly after a read
- A stream cipher generates a pseudo-random keystream that is XOR'ed with the data: without the key and the generator, the output is very tough to decode statically
Decoding
- Let the malware decode for you: run its own decoding routine under a debugger and read the decoded data out of memory
- Or write a short script to do the decoding, in Python or whatever
- See for example https://www.dlitz.net/software/pycrypto/
- Or make a script for the Immunity Debugger
Another Example (based on Exercise 13-1)
- Searching for XOR instructions is easy enough in Ghidra: Search > Text > "xor". Most such instances are of the form "XOR EAX, EAX", which just zeros out EAX. But not this one!

Example (based on Exercise 13-1)
- It looks like an XOR cipher, with 0x3b as the key. The tight loop is obvious in the Ghidra Decompile window.
