Scenario

You are given a file in an unknown format that contains sensitive information. You also have access to the application that reads and writes the file.

Tools

IDA Pro
Immunity Debugger
PEiD / SnD Reverser Tool
Process Monitor
Hex Editor – HxD

Compressed or serialized or encrypted

If a file is simply compressed with a standard algorithm, its first few bytes (the magic number) usually reveal the compression algorithm. Binwalk or the Unix “file” utility can identify many common file formats, and running the “strings” utility may reveal more information about the file.
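The magic-number check that “file” and Binwalk perform at the start of a file can be sketched as a lookup against a few well-known signatures. The signatures below (gzip, bzip2, ZIP, xz and the common zlib headers) are standard; the function name identifyCompression is illustrative, and real tools know far more formats and, in Binwalk's case, also scan for signatures embedded deeper in the file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// A known signature: the bytes expected at offset 0, plus a label.
struct Signature {
    std::vector<uint8_t> magic;
    std::string name;
};

// Returns the label of the first signature matching the start of `data`,
// or "unknown". Only offset 0 is checked in this sketch.
std::string identifyCompression(const std::vector<uint8_t>& data) {
    static const std::vector<Signature> kSignatures = {
        {{0x1F, 0x8B}, "gzip"},
        {{'B', 'Z', 'h'}, "bzip2"},
        {{'P', 'K', 0x03, 0x04}, "zip"},
        {{0xFD, '7', 'z', 'X', 'Z', 0x00}, "xz"},
        {{0x78, 0x01}, "zlib (low compression)"},
        {{0x78, 0x9C}, "zlib (default compression)"},
        {{0x78, 0xDA}, "zlib (best compression)"},
    };
    for (const auto& sig : kSignatures) {
        if (data.size() >= sig.magic.size() &&
            std::equal(sig.magic.begin(), sig.magic.end(), data.begin())) {
            return sig.name;
        }
    }
    return "unknown";
}
```

For example, a buffer that starts with 1F 8B is reported as gzip, while a custom-encrypted file will typically fall through to “unknown”.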

When objects in memory are dumped in binary form, the result is a serialized file. Serialization allows objects to persist, and the contents of the file can easily be read back as objects. Serialized files can usually be identified by chunks of readable text scattered throughout the file. Consider the following example.

The elements marked in blue are the strings and those marked in red are their respective sizes (remember: little endian).
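Reading such a length-prefixed string can be sketched as follows. This assumes one common layout, a 4-byte little-endian length followed by that many bytes of text; real serializers may instead use 1-, 2- or 8-byte lengths or variable-length integers. The helper names readU32LE and readLengthPrefixedString are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Reads an unsigned 32-bit little-endian integer starting at data[pos].
uint32_t readU32LE(const std::vector<uint8_t>& data, size_t pos) {
    return static_cast<uint32_t>(data[pos]) |
           (static_cast<uint32_t>(data[pos + 1]) << 8) |
           (static_cast<uint32_t>(data[pos + 2]) << 16) |
           (static_cast<uint32_t>(data[pos + 3]) << 24);
}

// Assumed layout: [u32 little-endian length][length bytes of text].
// On success returns the string and advances pos past it.
std::optional<std::string> readLengthPrefixedString(const std::vector<uint8_t>& data, size_t& pos) {
    if (pos + 4 > data.size()) return std::nullopt;
    uint32_t len = readU32LE(data, pos);
    if (len > data.size() - pos - 4) return std::nullopt;  // length runs past the buffer
    std::string s(data.begin() + pos + 4, data.begin() + pos + 4 + len);
    pos += 4 + len;
    return s;
}
```

Calling it repeatedly on a buffer such as 05 00 00 00 “hello” 02 00 00 00 “hi” yields “hello” and then “hi”; a length that points past the end of the file is a quick sign that the guessed layout is wrong.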

Files that are encrypted, or compressed with a custom algorithm, look garbled and can be identified only by elimination. The following picture shows the contents of a file encrypted with a simple XOR operation (can you find the key?).
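The property that makes the XOR puzzle solvable is that XOR is its own inverse: if c = p XOR k, then k = c XOR p. Once part of the plaintext is known or can be guessed (a header, a run of zero bytes), the key bytes at those positions fall out directly. The sketch below assumes a repeating-key XOR; xorWithKey and recoverKeyBytes are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Repeating-key XOR: byte i is XORed with key[i % key.size()].
// The same function both encrypts and decrypts.
std::vector<uint8_t> xorWithKey(const std::vector<uint8_t>& data, const std::vector<uint8_t>& key) {
    assert(!key.empty());
    std::vector<uint8_t> out(data.size());
    for (size_t i = 0; i < data.size(); ++i)
        out[i] = static_cast<uint8_t>(data[i] ^ key[i % key.size()]);
    return out;
}

// Known-plaintext recovery: ciphertext XOR plaintext gives the keystream,
// which for a repeating key is the key repeated over and over.
std::vector<uint8_t> recoverKeyBytes(const std::vector<uint8_t>& cipher, const std::vector<uint8_t>& knownPlain) {
    std::vector<uint8_t> keystream;
    for (size_t i = 0; i < cipher.size() && i < knownPlain.size(); ++i)
        keystream.push_back(static_cast<uint8_t>(cipher[i] ^ knownPlain[i]));
    return keystream;
}
```

Where the plaintext contains zero bytes, the ciphertext shows the key itself, which is often why the key is visible by eye in a hex editor.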

Determine the encryption algorithm

Most encryption algorithms depend on constants in one way or another. For instance, AES uses a substitution box (S-box), while Blowfish uses S-boxes and a P-array. These constants can be used to identify the encryption algorithm. Tools like the KANAL plug-in for PEiD and the SnD Reverser Tool can identify most common encryption algorithms. They reveal not only the algorithm but also the exact location in the code where the constants are referenced.
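The core idea behind such signature scanners can be sketched as a byte-pattern search for well-known constants. The two patterns below are real values: the first two Blowfish P-array words (0x243F6A88 and 0x85A308D3, taken from the hexadecimal digits of pi) stored little-endian as they would appear in an x86 binary, and the first eight bytes of the AES S-box. The helper findAll and the choice of patterns are illustrative; KANAL and SnD use much larger signature sets.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns every offset at which `pattern` occurs in `haystack`.
std::vector<size_t> findAll(const std::vector<uint8_t>& haystack, const std::vector<uint8_t>& pattern) {
    std::vector<size_t> hits;
    if (pattern.empty()) return hits;
    auto it = haystack.begin();
    while ((it = std::search(it, haystack.end(), pattern.begin(), pattern.end())) != haystack.end()) {
        hits.push_back(static_cast<size_t>(it - haystack.begin()));
        ++it;  // continue searching after this match
    }
    return hits;
}

// Blowfish P[0] = 0x243F6A88 and P[1] = 0x85A308D3 as little-endian bytes.
const std::vector<uint8_t> kBlowfishP = {0x88, 0x6A, 0x3F, 0x24, 0xD3, 0x08, 0xA3, 0x85};
// First 8 bytes of the AES forward S-box.
const std::vector<uint8_t> kAesSbox = {0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5};
```

A hit gives the file offset of the table; mapping it to a virtual address and following the cross-references in IDA leads to the code that uses the table, which is the location these tools report.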

To use these tools, it is important to identify whether the executable itself calls the encryption/decryption routine or whether the application uses a DLL to read/write the encrypted file. Process Monitor can identify the exact DLL that invokes the read/write operation on the file (right-click the event, then Properties > Stack). Alternatively, API Monitor can be used to achieve the same.

Finding the key

A high-level understanding of the encryption algorithm helps in determining the encryption key. The following is an excerpt from a C++ implementation of Blowfish (http://www.schneier.com/code/bfsh-con.zip).

// constructs the encryption sieve
void CBlowFish::Initialize (BYTE key[], int keybytes)
{
    int i, j ;
    DWORD data, datal, datar ;
    union aword temp ;

    // first fill arrays from data tables
    for (i = 0 ; i < 18 ; i++)
        PArray [i] = bf_P [i] ;

    for (i = 0 ; i < 4 ; i++)
    {
        for (j = 0 ; j < 256 ; j++)
            SBoxes [i][j] = bf_S [i][j] ;
    }

    // XOR the key, 4 bytes at a time and wrapping around, into the P-array
    j = 0 ;
    for (i = 0 ; i < NPASS + 2 ; ++i)
    {
        temp.dword = 0 ;
        temp.w.byte0 = key[j];
        temp.w.byte1 = key[(j+1) % keybytes] ;
        temp.w.byte2 = key[(j+2) % keybytes] ;
        temp.w.byte3 = key[(j+3) % keybytes] ;
        data = temp.dword ;
        PArray [i] ^= data ;
        j = (j + 4) % keybytes ;
    }

    // replace the P-array and S-boxes with repeated encryptions of a zero block
    datal = 0 ;
    datar = 0 ;

    for (i = 0 ; i < NPASS + 2 ; i += 2)
    {
        Blowfish_encipher (&datal, &datar) ;
        PArray [i] = datal ;
        PArray [i + 1] = datar ;
    }

    for (i = 0 ; i < 4 ; ++i)
    {
        for (j = 0 ; j < 256 ; j += 2)
        {
            Blowfish_encipher (&datal, &datar) ;
            SBoxes [i][j] = datal ;
            SBoxes [i][j + 1] = datar ;
        }
    }
}

void CBlowFish::Blowfish_encipher (DWORD *xl, DWORD *xr)
{
    union aword Xl, Xr ;

    Xl.dword = *xl ;
    Xr.dword = *xr ;

    Xl.dword ^= PArray [0];
    ROUND (Xr, Xl, 1) ; ROUND (Xl, Xr, 2) ;
    ROUND (Xr, Xl, 3) ; ROUND (Xl, Xr, 4) ;
    ROUND (Xr, Xl, 5) ; ROUND (Xl, Xr, 6) ;
    ROUND (Xr, Xl, 7) ; ROUND (Xl, Xr, 8) ;
    ROUND (Xr, Xl, 9) ; ROUND (Xl, Xr, 10) ;
    ROUND (Xr, Xl, 11) ; ROUND (Xl, Xr, 12) ;
    ROUND (Xr, Xl, 13) ; ROUND (Xl, Xr, 14) ;
    ROUND (Xr, Xl, 15) ; ROUND (Xl, Xr, 16) ;
    Xr.dword ^= PArray [17] ;

    *xr = Xl.dword ;
    *xl = Xr.dword ;
}

Listing-1

The “CBlowFish::Initialize” routine shows that the key is used only to modify the P-array and nowhere else. Since the initial P-array is a known constant and we know where it is referenced in the code (from KANAL/SnD), the initialization routine can be found directly in the binary. Here is the assembly listing of the Blowfish initialization routine from a commercial application.

.text:6ADD7892 ; int __cdecl init_blowfish(void *,int,int)
.text:6ADD7892 init_blowfish proc near ; CODE XREF: sub_6ADD1DD0+1Ap
.text:6ADD7892 ; sub_6ADD7A20+18p
.text:6ADD7892
.text:6ADD7892 var_14 = dword ptr -14h
.text:6ADD7892 var_10 = dword ptr -10h
.text:6ADD7892 var_C = dword ptr -0Ch
.text:6ADD7892 var_8 = dword ptr -8
.text:6ADD7892 var_4 = dword ptr -4
.text:6ADD7892 arg_0 = dword ptr 8
.text:6ADD7892 arg_4 = dword ptr 0Ch
.text:6ADD7892 arg_8 = dword ptr 10h
.text:6ADD7892
.text:6ADD7892 push ebp
.text:6ADD7893 mov ebp, esp
.text:6ADD7895 sub esp, 14h
.text:6ADD7898 push ebx
.text:6ADD7899 push esi
.text:6ADD789A mov esi, [ebp+arg_0]
.text:6ADD789D push edi
//initializes the 1048h bytes pointed to by esi to 0
.text:6ADD789E push 1048h ; size_t
.text:6ADD78A3 push 0 ; int
.text:6ADD78A5 push esi ; void *
.text:6ADD78A6 call memset
.text:6ADD78AB add esp, 0Ch
.text:6ADD78AE mov eax, offset sbox
.text:6ADD78B3 lea ecx, [esi+48h]
.text:6ADD78B6
.text:6ADD78B6 loc_6ADD78B6: ; CODE XREF: init_blowfish+3Bj
.text:6ADD78B6 mov edx, 100h
.text:6ADD78BB
.text:6ADD78BB loc_6ADD78BB: ; CODE XREF: init_blowfish+34j
//loop begin: copies the sbox to the memory pointed to by esi+48h
.text:6ADD78BB mov edi, [eax]
.text:6ADD78BD mov [ecx], edi
.text:6ADD78BF add eax, 4
.text:6ADD78C2 add ecx, 4
.text:6ADD78C5 dec edx
.text:6ADD78C6 jnz short loc_6ADD78BB
.text:6ADD78C8 cmp eax, offset unk_6ADDE680
.text:6ADD78CD jl short loc_6ADD78B6
//loop end
//parray
.text:6ADD78CF mov edx, offset p_array
.text:6ADD78D4 xor ecx, ecx
.text:6ADD78D6 mov eax, esi
.text:6ADD78D8 sub edx, esi
//size of parray (12h = 18 entries)
.text:6ADD78DA mov [ebp+var_8], 12h
.text:6ADD78E1
.text:6ADD78E1 loc_6ADD78E1: ; CODE XREF: init_blowfish+7Ej
//loop starts
.text:6ADD78E1 xor edi, edi
.text:6ADD78E3 mov [ebp+var_C], 4
.text:6ADD78EA
.text:6ADD78EA loc_6ADD78EA: ; CODE XREF: init_blowfish+6Fj
//It is evident from the following sequence that IDA has wrongly identified the second argument (arg_4) as an int, while it is a pointer, more specifically a byte array.
//The following sequence picks 4 bytes from the byte array and packs them into a dword.
.text:6ADD78EA mov ebx, [ebp+arg_4]
.text:6ADD78ED movzx ebx, byte ptr [ecx+ebx]
.text:6ADD78F1 shl edi, 8
.text:6ADD78F4 or edi, ebx
.text:6ADD78F6 inc ecx
//arg_8 contains the size of arg_4
.text:6ADD78F7 cmp ecx, [ebp+arg_8]
.text:6ADD78FA jl short loc_6ADD78FE
.text:6ADD78FC xor ecx, ecx
.text:6ADD78FE
.text:6ADD78FE loc_6ADD78FE: ; CODE XREF: init_blowfish+68j
.text:6ADD78FE dec [ebp+var_C]
.text:6ADD7901 jnz short loc_6ADD78EA
//In the following sequence, ebx receives the p_array entry and edi holds the 4-byte integer built from the key. Comparing it with the high-level implementation above shows that arg_4 is the key and arg_8 is its size.
.text:6ADD7903 mov ebx, [edx+eax]
.text:6ADD7906 xor ebx, edi ; ebx: p array; edi: key
.text:6ADD7908 mov [eax], ebx
.text:6ADD790A add eax, 4
.text:6ADD790D dec [ebp+var_8]
.text:6ADD7910 jnz short loc_6ADD78E1
//loops back: the next 4 bytes of the byte array are picked and the process is repeated.
.text:6ADD7912 xor eax, eax
.text:6ADD7914 mov [ebp+var_8], eax
.text:6ADD7917 mov [ebp+var_C], eax
.text:6ADD791A mov [ebp+var_4], eax
.text:6ADD791D
.text:6ADD791D loc_6ADD791D: ; CODE XREF: init_blowfish+B3j
.text:6ADD791D lea edi, [ebp+var_C]
.text:6ADD7920 lea ebx, [ebp+var_8]
.text:6ADD7923 mov ecx, esi
.text:6ADD7925 call sub_6ADD784F
.text:6ADD792A mov eax, [ebp+var_4]
.text:6ADD792D mov ecx, [ebp+var_8]
.text:6ADD7930 add [ebp+var_4], 2
.text:6ADD7934 shl eax, 2
.text:6ADD7937 cmp [ebp+var_4], 12h
.text:6ADD793B mov [eax+esi], ecx
.text:6ADD793E mov ecx, [ebp+var_C]
.text:6ADD7941 mov [eax+esi+4], ecx
.text:6ADD7945 jl short loc_6ADD791D
.text:6ADD7947 lea eax, [esi+4Ch]
.text:6ADD794A mov [ebp+var_14], 4
.text:6ADD7951
.text:6ADD7951 loc_6ADD7951: ; CODE XREF: init_blowfish+F2j
.text:6ADD7951 mov [ebp+var_4], eax
.text:6ADD7954 mov [ebp+var_10], 80h
.text:6ADD795B
.text:6ADD795B loc_6ADD795B: ; CODE XREF: init_blowfish+EDj
.text:6ADD795B lea edi, [ebp+var_C]
.text:6ADD795E lea ebx, [ebp+var_8]
.text:6ADD7961 mov ecx, esi
.text:6ADD7963 call sub_6ADD784F
.text:6ADD7968 mov eax, [ebp+var_4]
.text:6ADD796B mov ecx, [ebp+var_8]
.text:6ADD796E mov [eax-4], ecx
.text:6ADD7971 mov ecx, [ebp+var_C]
.text:6ADD7974 mov [eax], ecx
.text:6ADD7976 add eax, 8
.text:6ADD7979 dec [ebp+var_10]
.text:6ADD797C mov [ebp+var_4], eax
.text:6ADD797F jnz short loc_6ADD795B
.text:6ADD7981 dec [ebp+var_14]
.text:6ADD7984 jnz short loc_6ADD7951
.text:6ADD7986 pop edi
.text:6ADD7987 pop esi
.text:6ADD7988 mov al, 1
.text:6ADD798A pop ebx
.text:6ADD798B leave
.text:6ADD798C retn
.text:6ADD798C init_blowfish endp

Listing-2
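The key-mixing loop of Listing-2 (loc_6ADD78E1 through .text:6ADD7910) can be restated in C++ as a sketch: each group of 4 key bytes, read cyclically with the index wrapping at the key size (arg_8), is packed with the first byte as the most significant one (shl edi, 8 / or edi, ebx) and XORed into one of the 18 P-array entries. The name mixKeyIntoPArray is illustrative, and the later Blowfish_encipher passes that overwrite the P-array and S-boxes are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Key-mixing step of the Blowfish key schedule, mirroring Listing-2.
// P must hold the 18 initial P-array constants on entry.
void mixKeyIntoPArray(uint32_t P[18], const std::vector<uint8_t>& key) {
    assert(!key.empty());
    size_t j = 0;  // ecx in Listing-2: index into the key, wraps at key.size()
    for (int i = 0; i < 18; ++i) {
        uint32_t data = 0;
        for (int k = 0; k < 4; ++k) {
            data = (data << 8) | key[j];  // shl edi, 8 / or edi, ebx
            j = (j + 1) % key.size();
        }
        P[i] ^= data;  // xor ebx, edi
    }
}
```

Because each P[i] at this point is just the known constant XOR the packed key, breaking right after the loop (at .text:6ADD7912) and XORing the 18 dwords with the known P-array constants also reveals the key bytes, even when the key is not a hard-coded string.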

A similar idea can be extended to AES. In the AES key expansion phase, the S-box is used together with the AES key to create the round keys. Once we find the code location where the S-box is referenced, the key can easily be found. Here is a high-level implementation of the AES key expansion phase [1].

Constants: int Nb = 4; // but it might change someday
Inputs:    int Nk = 4, 6, or 8; // the number of words in the key
           array key of 4*Nk bytes or Nk words // input key
Output:    array w of Nb*(Nr+1) words or 4*Nb*(Nr+1) bytes // expanded key
Algorithm:

void KeyExpansion(byte[] key, word[] w, int Nk) {
    int Nr = Nk + 6;
    w = new word[Nb*(Nr+1)];
    int temp;
    int i = 0;
    while (i < Nk) {
        w[i] = word(key[4*i], key[4*i+1], key[4*i+2], key[4*i+3]);
        i++;
    }
    i = Nk;
    while (i < Nb*(Nr+1)) {
        temp = w[i-1];
        if (i % Nk == 0)
            temp = SubWord(RotWord(temp)) ^ Rcon[i/Nk];
        else if (Nk > 6 && (i%Nk) == 4)
            temp = SubWord(temp);
        w[i] = w[i-Nk] ^ temp;
        i++;
    }
}

Listing-3

As observed, the bytes of the key (and of the words derived from it) are used as indices into the S-box through SubWord. Spotting the code location that references the S-box therefore leads directly to the key. The details vary with the encryption algorithm involved, but in general a high-level understanding of the algorithm speeds up retrieval of the key.
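As a runnable version of Listing-3 specialized to AES-128 (Nk = 4, Nr = 10), the sketch below builds the S-box from its definition (multiplicative inverse in GF(2^8) followed by the affine transform) instead of hard-coding the table, then expands the key. One property matters for key hunting: the first loop of Listing-3 copies the key unchanged into w[0..Nk-1], so a round-key buffer found through an S-box reference or a memory breakpoint already holds the key in its first 16 bytes. The function names are illustrative.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1.
uint8_t gmul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    for (int i = 0; i < 8; ++i) {
        if (b & 1) p ^= a;
        bool hi = (a & 0x80) != 0;
        a = static_cast<uint8_t>(a << 1);
        if (hi) a ^= 0x1B;
        b >>= 1;
    }
    return p;
}

// AES S-box: inverse in GF(2^8) (0 maps to 0), then the affine transform.
std::array<uint8_t, 256> makeSbox() {
    std::array<uint8_t, 256> s{};
    for (int x = 0; x < 256; ++x) {
        uint8_t inv = 0;
        for (int y = 1; x != 0 && y < 256; ++y) {
            if (gmul(static_cast<uint8_t>(x), static_cast<uint8_t>(y)) == 1) {
                inv = static_cast<uint8_t>(y);
                break;
            }
        }
        // inv ^ rotl(inv,1) ^ rotl(inv,2) ^ rotl(inv,3) ^ rotl(inv,4) ^ 0x63
        uint8_t r = inv;
        for (int k = 1; k <= 4; ++k)
            r ^= static_cast<uint8_t>((inv << k) | (inv >> (8 - k)));
        s[x] = static_cast<uint8_t>(r ^ 0x63);
    }
    return s;
}

// Listing-3 with Nb = 4, Nk = 4, Nr = 10: produces 44 words.
std::array<uint32_t, 44> expandKey128(const std::array<uint8_t, 16>& key) {
    static const std::array<uint8_t, 256> sbox = makeSbox();
    auto subWord = [](uint32_t x) {
        return (uint32_t(sbox[x >> 24]) << 24) | (uint32_t(sbox[(x >> 16) & 0xFF]) << 16) |
               (uint32_t(sbox[(x >> 8) & 0xFF]) << 8) | uint32_t(sbox[x & 0xFF]);
    };
    auto rotWord = [](uint32_t x) { return (x << 8) | (x >> 24); };

    std::array<uint32_t, 44> w{};
    for (int i = 0; i < 4; ++i)  // the first Nk words are the key itself
        w[i] = (uint32_t(key[4 * i]) << 24) | (uint32_t(key[4 * i + 1]) << 16) |
               (uint32_t(key[4 * i + 2]) << 8) | uint32_t(key[4 * i + 3]);

    uint8_t rcon = 0x01;
    for (int i = 4; i < 44; ++i) {
        uint32_t temp = w[i - 1];
        if (i % 4 == 0) {
            temp = subWord(rotWord(temp)) ^ (uint32_t(rcon) << 24);
            rcon = gmul(rcon, 0x02);  // 01, 02, 04, ..., 80, 1B, 36
        }
        w[i] = w[i - 4] ^ temp;
    }
    return w;
}
```

With the FIPS-197 example key 2b7e151628aed2a6abf7158809cf4f3c, w[0] to w[3] equal the key and w[4] is 0xa0fafe17.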

Determine mode of operation

A block cipher operates on a fixed-size block and cannot directly encrypt data longer than that block. AES has a block size of 16 bytes and Blowfish a block size of 8 bytes. To encrypt longer data with Blowfish, the data is split into 8-byte blocks and the cipher is applied to each block. If the data does not divide evenly into blocks, padding bytes are added.
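The text does not say which padding scheme the application uses. The sketch below assumes PKCS#7-style padding (called PKCS#5 for 8-byte blocks), a common choice: append n bytes, each of value n, so the length becomes a multiple of the block size, adding a full block when the data is already aligned. padToBlockSize and splitIntoBlocks are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// PKCS#7-style padding: append n bytes of value n (1 <= n <= blockSize).
std::vector<uint8_t> padToBlockSize(std::vector<uint8_t> data, size_t blockSize) {
    assert(blockSize > 0 && blockSize < 256);
    size_t n = blockSize - data.size() % blockSize;
    data.insert(data.end(), n, static_cast<uint8_t>(n));
    return data;
}

// Splits block-aligned data into blockSize-byte chunks, the units a block
// cipher such as Blowfish (8 bytes) or AES (16 bytes) works on.
std::vector<std::vector<uint8_t>> splitIntoBlocks(const std::vector<uint8_t>& data, size_t blockSize) {
    assert(blockSize > 0 && data.size() % blockSize == 0);
    std::vector<std::vector<uint8_t>> blocks;
    for (size_t off = 0; off < data.size(); off += blockSize)
        blocks.emplace_back(data.begin() + off, data.begin() + off + blockSize);
    return blocks;
}
```

A 13-byte message padded for Blowfish becomes 16 bytes ending in 03 03 03, that is, two 8-byte blocks. When decrypting, the value of the last byte tells how many padding bytes to strip, if the application really uses this scheme.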

The mode of operation decides how these individual blocks are encrypted. There are several modes of operation: ECB, CBC, CFB, OFB, CTR, etc. To identify which mode is being used, it is important to locate the encryption/decryption function. Normally the call to decryption comes right after initialization, with the ciphertext as one of its arguments. But decryption may just as well happen on a “need to know” basis (C-5 in the thick client pentest challenges). The decryption function can be found by tracing the application or by placing a memory breakpoint: in AES, for instance, a memory breakpoint on the round keys may reveal the encryption function. The following assembly listing shows the call to the decryption function immediately after initialization.

.text:64218035 lea eax, [ebp+var_117C]
.text:6421803B push eax ; void *
.text:6421803C call sub_6422355F // memsets 1048h bytes to 0
.text:64218041 lea eax, [ebp+var_21C4]
.text:64218047 push eax ; void *
.text:64218048 call sub_6422355F // memsets 1048h bytes to 0
.text:6421804D push offset byte_642653A4 ; int // hard-coded key
.text:64218052 lea esi, [ebp+var_117C]
.text:64218058 call init_blowfish
.text:6421805D pop ecx
//ebx+1Ch contains the string to be decrypted. It is copied to var_28
.text:6421805E push 10h ; size_t
.text:64218060 lea eax, [ebx+1Ch]
.text:64218063 push eax ; void *
.text:64218064 lea eax, [ebp+var_28]
.text:64218067 push eax ; void *
.text:64218068 call memcpy
.text:6421806D add esp, 0Ch
.text:64218070 push 10h // size of the string to be decrypted
.text:64218072 pop eax
.text:64218073 lea edx, [ebp+var_28] // string
.text:64218076 mov ecx, esi
//decrypts the string
.text:64218078 call decrypt_blowfish ; fastcall
.text:64218078 ; eax: size of the string to be decrypted
.text:64218078 ; edx: string to be encrypted/decrypted
.text:64218078 ; ecx: object containing the modified sbox/pi

Listing-4

Note that no IV is passed as an argument to the decryption function, which suggests ECB. This guess can be confirmed by analyzing the decrypt_blowfish routine itself.

.text:642235A1 ; =============== S U B R O U T I N E =======================================
.text:642235A1
.text:642235A1 ; eax: size of the string to be decrypted
.text:642235A1 ; edx: string to be encrypted/decrypted
.text:642235A1 ; ecx: key xored with sbox/pi
.text:642235A1 ;
.text:642235A1 ; Attributes: bp-based frame
.text:642235A1 var_C = dword ptr -0Ch
.text:642235A1 var_8 = dword ptr -8
.text:642235A1 var_4 = dword ptr -4
.text:642235A1
.text:642235A1 push ebp
.text:642235A2 mov ebp, esp
.text:642235A4 sub esp, 0Ch
.text:642235A7 push edi
//check if the size of the string is greater than 0
.text:642235A8 mov edi, eax
.text:642235AA test edi, edi
.text:642235AC jbe short loc_64223602
//esi = edx+4 (edx: string to be decrypted)
.text:642235AE push ebx
.text:642235AF push esi
.text:642235B0 mov esi, edx
.text:642235B2 add esi, 4
.text:642235B5 dec edi
//divide (size - 1) by 8
.text:642235B6 shr edi, 3
.text:642235B9 inc edi ; edi contains the number of 8-byte blocks
.text:642235BA
.text:642235BA loc_642235BA: ; CODE XREF: encryptordecrypt_blowfish+5Dj
.text:642235BA mov eax, [esi-4]
.text:642235BD mov ebx, [esi]
.text:642235BF lea edx, [ecx+44h]
.text:642235C2 mov [ebp+var_4], edx
.text:642235C5 mov [ebp+var_8], 10h
.text:642235CC
//decryption
.text:642235CC loc_642235CC: ; CODE XREF: encryptordecrypt_blowfish+44j
.text:642235CC mov edx, [ebp+var_4] ; edx points to the key-modified P-array entry, starting at P[17] ([ecx+44h])
.text:642235CF xor eax, [edx] ; eax contains the left half of the block being decrypted
.text:642235D1 mov [ebp+var_C], eax
.text:642235D4 call sub_6422351F ; contains more decryption logic and does not reference any constants
.text:642235D9 sub [ebp+var_4], 4
.text:642235DD xor eax, ebx
.text:642235DF dec [ebp+var_8]
.text:642235E2 mov ebx, [ebp+var_C]
.text:642235E5 jnz short loc_642235CC
.text:642235E7 mov edx, eax
.text:642235E9 mov eax, [ecx+4]
.text:642235EC xor eax, edx
.text:642235EE mov edx, [ecx]
.text:642235F0 xor edx, ebx
.text:642235F2 mov [esi-4], edx ; contains decrypted text
.text:642235F5 mov [esi], eax ; contains decrypted text
.text:642235F7 add esi, 8 ; move to the next 8-byte block
.text:642235FA dec edi
.text:642235FB mov [ebp+var_C], ebx
.text:642235FE jnz short loc_642235BA
.text:64223600 pop esi
.text:64223601 pop ebx
.text:64223602
.text:64223602 loc_64223602: ; CODE XREF: encryptordecrypt_blowfish+Bj
.text:64223602 pop edi
.text:64223603 leave
.text:64223604 retn
.text:64223604 decrypt_blowfish endp

Listing-5

Comments have been added only to the interesting parts of the code (for a complete understanding, refer to the high-level implementation provided earlier). The listing shows no sign of an IV: each 8-byte block is decrypted independently, using only the tables derived from the key. This clearly indicates that the code uses ECB as its mode of operation.

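If an IV is used, the mode of operation can be identified by the following characteristics:

CBC

1) The IV is xored only with the first decrypted block (the output of decrypting the first ciphertext block).

2) Every later decrypted block is xored with the previous ciphertext block.

CFB

1) The IV serves as the input for the first block: it is encrypted and the result is xored with the first ciphertext block.

2) The first ciphertext block (the first 8 bytes, for Blowfish) serves as the input for the 2nd block.

OFB

1) The IV serves as the input for the first block.

2) The output of the first block, before it is xored with the ciphertext, serves as the input for the 2nd block.

CTR

1) A counter (typically built from a nonce) is incremented for each block.

2) If the nonce is predetermined, CTR allows preprocessing: the keystream blocks can be computed in advance, and the plaintext is obtained simply by xoring each ciphertext block with its precomputed block.

Note that in CFB, OFB and CTR the block cipher is only ever run in the encryption direction, even during decryption.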
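To show how an IV changes the decryption loop compared with Listing-5, here is a CBC decryption sketch. The parameter decryptBlock stands in for the real block cipher (for example, Blowfish's 8-byte block decryption); the names cbcDecrypt, Block and decryptBlock are illustrative, and any toy cipher used to try it out is a placeholder, not Blowfish.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Block = std::array<uint8_t, 8>;  // 8-byte block, as in Blowfish

// CBC decryption: P[i] = D(C[i]) XOR C[i-1], with C[-1] = IV.
// Dropping the XOR step turns this into the ECB loop of Listing-5.
std::vector<Block> cbcDecrypt(const std::vector<Block>& cipher, const Block& iv,
                              const std::function<Block(const Block&)>& decryptBlock) {
    std::vector<Block> plain;
    Block prev = iv;
    for (const Block& c : cipher) {
        Block p = decryptBlock(c);
        for (size_t k = 0; k < p.size(); ++k) p[k] ^= prev[k];
        plain.push_back(p);
        prev = c;  // the next block is chained to this ciphertext block
    }
    return plain;
}
```

In a disassembly, the telltale sign of CBC is this extra XOR against a buffer that does not come from the key tables: the IV for the first block, and the previous ciphertext block afterwards.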

Decryption pitfall

There are several options for decrypting the entire file: writing a loader, using dynamic binary instrumentation (DBI) with Pin (for example, pinpy, a homebrew mod), or writing a script that accepts the encrypted file and produces the decrypted file. If you don't have enough information about the encryption used, a loader or DBI may be helpful. But if you have sufficient details, including the name of the encryption algorithm, the key, the location of the ciphertext in the file and the mode of operation, it is better to write a script in the language of your choice.

Here we discuss the pitfalls of writing such a script. Byte ordering (little endian vs. big endian) plays a critical role in decryption: using the wrong byte order may lead us to conclude that our earlier deductions were wrong when they were not. The byte ordering of the key and of the ciphertext is what matters here.

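In the case above, the key is read as a byte array and every 4 bytes are packed into an integer with the first byte as the most significant one (shl edi, 8 followed by or edi, ebx), as in the reference Blowfish key schedule. This means that the key can be used as-is in the script. The excerpt below shows the conversion.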

//convert every 4 bytes in the byte array to an integer
.text:6ADD78E3 mov [ebp+var_C], 4
.text:6ADD78EA
.text:6ADD78EA loc_6ADD78EA: ; CODE XREF: init_blowfish+6Fj
//It is evident that IDA has wrongly identified the second argument (arg_4) as an int, while it is a pointer.
//Also, the following sequence reads 4 bytes, converts them to a dword and stores it in edi
.text:6ADD78EA mov ebx, [ebp+arg_4]
.text:6ADD78ED movzx ebx, byte ptr [ecx+ebx]
.text:6ADD78F1 shl edi, 8
.text:6ADD78F4 or edi, ebx
.text:6ADD78F6 inc ecx
//arg_8 contains the size of arg_4
.text:6ADD78F7 cmp ecx, [ebp+arg_8]
.text:6ADD78FA jl short loc_6ADD78FE
.text:6ADD78FC xor ecx, ecx
.text:6ADD78FE
.text:6ADD78FE loc_6ADD78FE: ; CODE XREF: init_blowfish+68j
.text:6ADD78FE dec [ebp+var_C]
.text:6ADD7901 jnz short loc_6ADD78EA

Listing-6 (excerpt from Listing-2)

Let us assume that the file was read as a byte array (not as an object) and placed in memory, so that the content in memory looks exactly like the file (this can be verified with a debugger). As Listing-5 shows (mov eax, [esi-4] and mov ebx, [esi]), the ciphertext is read from memory as 4-byte integers. This means that when the ciphertext was written to the file, it must have been written as integers in little-endian order: the least significant byte first and the most significant byte last. So if an encrypted block in the file is “ABCDEFGHIJKL”, the actual ciphertext is “DCBAHGFELKJI”. When writing the script, it is therefore important to reverse the byte order of every 4-byte word of the ciphertext before feeding it to the decryption function. Similarly, if the decrypted output is “cnufnoit”, the actual plaintext is “function” (why?).
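The byte-order fix in the script amounts to reversing each 4-byte word. A minimal sketch, assuming the buffer length is a multiple of 4 (always true for 8-byte Blowfish blocks); swapEach32 is an illustrative name, and std::string is used only as a convenient byte container.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Reverses the byte order of every 4-byte word in place, converting between
// the little-endian dwords the application reads and the byte order a
// standard Blowfish implementation expects for the two 32-bit block halves.
void swapEach32(std::string& buf) {
    assert(buf.size() % 4 == 0);
    for (size_t i = 0; i + 4 <= buf.size(); i += 4)
        std::reverse(buf.begin() + i, buf.begin() + i + 4);
}
```

Applied to “ABCDEFGHIJKL” it yields “DCBAHGFELKJI”. In the script the order would be: read a block from the file, swap, decrypt, swap again, write.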

This is only a “heads up” on byte ordering and may not apply to all scenarios.

Modification and Encryption

How a decrypted file is modified and re-encrypted depends entirely on the application, and it requires understanding the structure of the file by reversing the modules that encrypt it. Sometimes the file is divided into blocks of fixed size and each block is encrypted separately. The encryption function itself may be identified using the methods above, but the structure of the file can only be obtained by reversing the modules that call this encryption function.

The following checklist covers the items that may be required for modification and re-encryption:

Header of the file
Magic number/Format Identifier
Checksum/Hash of the header
Checksum/Hash of the encrypted block
Size of the encrypted block
Location of the key in the file (if not hard-coded)
Block size of the encryption algorithm used, to add appropriate padding bytes.

Shown below is a sample of an encrypted file.

Red – Fixed header; its last 4 bytes specify the location of the encrypted text.

Blue – checksum of the encrypted data

Green – key

Orange – Size of plain text

Yellow – Encrypted data
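The text does not give the size or order of these fields, so the parser below is purely hypothetical. It assumes the fields appear in the order listed, with a 16-byte fixed header whose last 4 bytes hold the little-endian offset of the encrypted data, followed by a 4-byte checksum, a 16-byte key and a 4-byte plaintext size. The struct and function names are illustrative. The point is the workflow: parse every field explicitly, and when re-encrypting, recompute the size and checksum fields instead of copying them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// HYPOTHETICAL layout; field sizes and order are assumptions:
// [12 bytes fixed header][u32 LE offset of encrypted data]  (red, 16 bytes)
// [u32 LE checksum][16-byte key][u32 LE plaintext size]      (blue, green, orange)
// [encrypted data starting at that offset]                   (yellow)
struct EncryptedFile {
    uint32_t dataOffset = 0;
    uint32_t checksum = 0;
    std::vector<uint8_t> key;
    uint32_t plainSize = 0;
    std::vector<uint8_t> cipher;
};

static uint32_t u32le(const std::vector<uint8_t>& d, size_t p) {
    return uint32_t(d[p]) | (uint32_t(d[p + 1]) << 8) | (uint32_t(d[p + 2]) << 16) | (uint32_t(d[p + 3]) << 24);
}

std::optional<EncryptedFile> parseEncryptedFile(const std::vector<uint8_t>& d) {
    const size_t kHeader = 16, kKey = 16;
    if (d.size() < kHeader + 4 + kKey + 4) return std::nullopt;
    EncryptedFile f;
    f.dataOffset = u32le(d, kHeader - 4);  // last 4 bytes of the fixed header
    f.checksum = u32le(d, kHeader);
    f.key.assign(d.begin() + kHeader + 4, d.begin() + kHeader + 4 + kKey);
    f.plainSize = u32le(d, kHeader + 4 + kKey);
    if (f.dataOffset > d.size()) return std::nullopt;
    f.cipher.assign(d.begin() + f.dataOffset, d.end());
    return f;
}
```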

Conclusion

Reversing an encrypted file is a time-consuming task and should be attempted only when it is your only option.

Attacker's logs

Reversing Encrypted Files