Few words about Kraken [Archive] - RCE Messageboard's Regroupment

ZaiRoN

April 24th, 2008, 05:50

Kraken is the word of the month for sure, but it has nothing to do with the beast from Jules Verne's nice old book, Twenty Thousand Leagues Under the Sea.
The word refers to a family of malware, something like the Storm trojan but said to be much stronger. Kraken seems to have been around since August 2006, but until now I had never heard of it. Some days ago I read an article (http://searchsecurity.techtarget.com/news/article/0,289142,sid14_gci1308645,00.html) about it; the interesting part is here:
“One somewhat interesting feature of the code is that the binary is not packed, as many malware binaries tend to be. However, Royal said that the code does have some other forms of obfuscation that make it difficult to analyze completely.” I decided to look at it.

I’m not going to give a detailed explanation of the sample I’m working on (MD5 = 592523a88df3d043d61a14b11a79bd55), but I’ll spend some words on the “forms of obfuscation” used by the malware.

Detectors are not able to recognize any specific packer/protector. The file is not packed, but from the first lines of code it’s pretty easy to see that some sort of obfuscation/encryption is built into the file. I did not find any interesting imports or strings, so I tried running the malware. To be sure of retrieving some useful information, I logged all the API calls made by the malware.
The malware calls some nice functions. Almost all the code of the binary is decrypted at runtime. The malware spawns one file and then deletes itself. You can inspect the decrypted code, but I didn’t get anything useful from it. The best thing to do is to look at the code and try to identify a general obfuscation scheme or a decryption routine. Don’t think to trace the entire exe, it’s madness!

In a case like this one, if a light goes on over your head you are lucky; otherwise you can step through and look at each instruction for eternity. I was lucky… the real code is hidden behind a virtual machine. I’m certainly not a virtual machine expert, I have only read some articles about this kind of protection.
I won’t rebuild the entire machine, I’ll only give my findings. If you think they are wrong and/or you want to add more information about the virtual machine, I’ll be happy to see a comment from you.

Like every virtual machine out there, after a little initialization it enters a semi-infinite loop that starts at 4012DA. It simply selects a virtual machine instruction and jumps to the code that implements it. There are a lot of instructions inside the loop; skipping the junk code, you can see the snippet used to select (and then jump to) the next instruction to execute:

Code:

004012E4 MOV AL,BYTE PTR DS:[ESI-1] // Byte pointed by esi-1 decides everything
004012F3 ADD AL,BL
0040F807 DEC AL
004103D9 DEC ESI   // Shift to the next byte
004103E7 ROL AL,2
004103F7 DEC AL
0040F590 XOR AL,0CF
0040F594 SUB AL,6B
004104A6 ADD BL,AL
004104AF MOVZX EAX,AL
004104B7 MOV ECX,DWORD PTR DS:[EAX*4+40FABB]   // EAX = index of the selected instruction
004104C6 NOT ECX
0040129C ROR ECX,1C
00410213 SUB ECX,4DCBE90C
0041021F ROL ECX,7
00410229 INC ECX
0041070D BSWAP ECX
00401195 ADD ECX,5E1E81EF
0040119C XOR ECX,77B911BC
004011AE NOT ECX
0041071B ADD ECX,60334BE6   // ECX = address of the selected instruction
0040FFF3 MOV DWORD PTR SS:[ESP+48],ECX
0040FFFB PUSH DWORD PTR SS:[ESP+48]
0040FFFF RETN 4C   // Go to the selected instruction

Everything starts from the value stored in the buffer pointed to by (esi-1). The buffer contains a series of bytes that are used to select the virtual machine instruction to execute (they are also used to retrieve one or more of the vm_instruction’s operands). The new value stored in EAX (obtained after some minor operations) is used to retrieve a dword value: EAX is the index into the vector that starts at 0x40FABB. As you can see from the code above, that dword is then used to obtain the address of the vm_instruction to execute.
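To make the selection step concrete, here is a minimal Python sketch of the AL/BL arithmetic from the snippet. The names (rol8, next_handler_index) are mine, and the starting value of BL is an assumption, since the snippet doesn't show where it is set. As far as this snippet shows, BL is updated on every dispatch, so the same bytecode byte can select a different table index depending on what was executed before.

```python
def rol8(value, count):
    """Rotate an 8-bit value left by `count` bits."""
    value &= 0xFF
    count %= 8
    return ((value << count) | (value >> (8 - count))) & 0xFF


def next_handler_index(opcode_byte, bl):
    """Replay the 8-bit part of the dispatcher.

    opcode_byte is the byte read from [ESI-1], bl is the current value of BL
    (its initial value is not shown in the snippet, so it is a parameter here).
    Returns (index, new_bl), where index is what MOVZX leaves in EAX.
    """
    al = (opcode_byte + bl) & 0xFF   # ADD AL,BL
    al = (al - 1) & 0xFF             # DEC AL
    al = rol8(al, 2)                 # ROL AL,2
    al = (al - 1) & 0xFF             # DEC AL
    al ^= 0xCF                       # XOR AL,0CF
    al = (al - 0x6B) & 0xFF          # SUB AL,6B
    new_bl = (bl + al) & 0xFF        # ADD BL,AL
    return al, new_bl


if __name__ == "__main__":
    # Same bytecode byte, two different BL values: two different indices.
    print(next_handler_index(0x00, 0x00))
    print(next_handler_index(0x00, 0x01))
```

Every step is reversible, so for a fixed BL each of the 256 possible bytes maps to a distinct index. That is handy if you want to go the other way and find which byte selects a given vm_instruction.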
Unlike a classical virtual machine, this one doesn’t have a clear Instruction Table: looking at the dead listing in your favorite disassembler, you won’t see the address of every single vm_instruction. The Instruction Table is encrypted and the first entry is located at 0x40FABB (there are 256 entries).
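The dword transformation from the dispatcher can be replayed offline to recover the vm_instruction addresses. Below is a rough Python sketch, assuming you have already dumped the 1024 bytes (256 dwords) that start at 0x40FABB, from the file or from memory. decrypt_handler mirrors the instructions between the table read and the RETN; encrypt_handler is only my own inverse, used as a sanity check, and is not taken from the malware.

```python
import struct

MASK32 = 0xFFFFFFFF


def rol32(value, count):
    return ((value << count) | (value >> (32 - count))) & MASK32


def ror32(value, count):
    return ((value >> count) | (value << (32 - count))) & MASK32


def bswap32(value):
    return int.from_bytes(value.to_bytes(4, "little"), "big")


def decrypt_handler(entry):
    """Turn one encrypted table dword into a vm_instruction address."""
    ecx = ~entry & MASK32               # NOT ECX
    ecx = ror32(ecx, 0x1C)              # ROR ECX,1C
    ecx = (ecx - 0x4DCBE90C) & MASK32   # SUB ECX,4DCBE90C
    ecx = rol32(ecx, 7)                 # ROL ECX,7
    ecx = (ecx + 1) & MASK32            # INC ECX
    ecx = bswap32(ecx)                  # BSWAP ECX
    ecx = (ecx + 0x5E1E81EF) & MASK32   # ADD ECX,5E1E81EF
    ecx ^= 0x77B911BC                   # XOR ECX,77B911BC
    ecx = ~ecx & MASK32                 # NOT ECX
    return (ecx + 0x60334BE6) & MASK32  # ADD ECX,60334BE6


def encrypt_handler(address):
    """Inverse of decrypt_handler (hypothetical helper, for testing only)."""
    ecx = (address - 0x60334BE6) & MASK32
    ecx = ~ecx & MASK32
    ecx ^= 0x77B911BC
    ecx = (ecx - 0x5E1E81EF) & MASK32
    ecx = bswap32(ecx)
    ecx = (ecx - 1) & MASK32
    ecx = ror32(ecx, 7)
    ecx = (ecx + 0x4DCBE90C) & MASK32
    ecx = rol32(ecx, 0x1C)
    return ~ecx & MASK32


def decrypt_table(raw):
    """Decrypt a dumped table of 256 little-endian dwords."""
    return [decrypt_handler(entry) for entry in struct.unpack("<256I", raw)]
```

The constants are the ones from this sample only, so expect to adjust the steps for other samples.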
The virtual machine has 16 registers (from r_0 to r_15), which can be used to store byte, word or dword data. The EDI register points to the first one; the registers are stored consecutively in memory, from r_0 to r_15.
The virtual machine has a stack with a fixed size, and the EBP register contains the vm_esp value. After almost all push vm_instructions there’s a stack overflow check. The alignment is two bytes: “push byte_value” is not allowed, and to push a single byte the virtual machine extends the byte to a word value.
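If you want to replay vm_instructions by hand, a toy model of this context is enough. The Python sketch below is only my approximation: the 4-byte register slots, the downward-growing stack, the 0x100 stack size and the zero-extension of pushed bytes are all assumptions, and the class and method names are mine.

```python
import struct


class VmState:
    """Toy model of the VM context (layout details are assumed)."""

    NUM_REGS = 16

    def __init__(self, stack_size=0x100):
        # r_0..r_15 stored consecutively (EDI points here in the real VM).
        self.regs = bytearray(4 * self.NUM_REGS)
        self.stack = bytearray(stack_size)
        # vm_esp (held in EBP by the real VM), assumed to grow downward.
        self.vm_esp = stack_size

    def _push(self, raw):
        # Counterpart of the overflow check done after most pushes.
        if self.vm_esp < len(raw):
            raise OverflowError("vm stack overflow")
        self.vm_esp -= len(raw)
        self.stack[self.vm_esp:self.vm_esp + len(raw)] = raw

    def _pop(self, size):
        if self.vm_esp + size > len(self.stack):
            raise IndexError("vm stack underflow")
        raw = bytes(self.stack[self.vm_esp:self.vm_esp + size])
        self.vm_esp += size
        return raw

    def push_byte(self, value):
        # No byte-sized push: the byte is widened to a word first.
        self._push(struct.pack("<H", value & 0xFF))

    def push_word(self, value):
        self._push(struct.pack("<H", value & 0xFFFF))

    def push_dword(self, value):
        self._push(struct.pack("<I", value & 0xFFFFFFFF))

    def pop_word(self):
        return struct.unpack("<H", self._pop(2))[0]

    def pop_dword(self):
        return struct.unpack("<I", self._pop(4))[0]

    def set_reg(self, index, value, size=4):
        # Store the low `size` bytes of value into r_index.
        offset = 4 * index
        masked = value & ((1 << (8 * size)) - 1)
        self.regs[offset:offset + size] = masked.to_bytes(size, "little")

    def get_reg(self, index, size=4):
        offset = 4 * index
        return int.from_bytes(self.regs[offset:offset + size], "little")
```

For example, pushing a byte moves vm_esp by two, and a later pop_word returns the widened value.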

Is there a cmp/test instruction inside the snippet? Is there a reference to a vm_eip register? It seems this virtual machine doesn’t need them. vm_eip is replaced by (esi-1); it’s not an eip per se but it *guides* the virtual machine. I don’t have all the vm_instructions in my notes, but I think there are no direct cmp/test instructions. They don’t seem to be included in the virtual machine, which is strange.

From what I have seen there are more than 45 vm_instructions in the virtual machine, and to identify each one you have to remove a lot of junk code. Even once you have all the vm_instructions, it’s not immediately clear what the malware is trying to do.
Example: here are the vm_instructions used to patch a dword at 0x41CE06 (the first column is the start address of the vm_instruction, the second column is the name I gave to it):

Code:

401028: push_dword val      //    push F440C1CB
401028: push_dword val      //    push 8040414A
40F5BE: nor_stack           //    The value at vm_esp+4 is updated with a nor(vm_esp+4, vm_esp) operation
4105FA: pop_dword r_i       //    r_15 = 0x00000202
40F36F: push_dword r_i      //    r_0 = 0x0041CE05
401028: push_dword val      //    push 98754A9F
401028: push_dword val      //    push 43179031
40F198: push_dword vm_esp   //    push vm_esp
401396: mov_stack_pstack    //    mov dword ptr [vm_esp], dword ptr [dword ptr [vm_esp]]
40F25C: pop_word r_i        //    r_14 = 0x00009031
401028: push_dword val      //    push 678AB562
40F198: push_dword vm_esp   //    push vm_esp
40FEF3: push_bdword val     //    push 0x00000006, push a dword but the upper 24 bits are 0, so it’s like a push byte extended to dword
410452: add_stack           //    add dword ptr [vm_esp+4], dword ptr [vm_esp]
4105FA: pop_dword r_i       //    r_15 = 0x216
40F0A0: pp_mov_dword        //    mov dword ptr [pop t1], (pop t2)
40F25C: pop_word r_i        //    r_11 = 0x015E4317
410452: add_stack           //    add dword ptr [vm_esp+4], dword ptr [vm_esp] <-- 98754A9F + 678AB562 = 1
4105FA: pop_dword r_i       //    r_14
410452: add_stack           //    add dword ptr [vm_esp+4], dword ptr [vm_esp] <-- 41CE05 + 1 = 41CE06
4105FA: pop_dword r_i       //    r_15
410171: mov_stack_pstack    //    mov dword ptr [dword ptr [vm_esp]], dword ptr [vm_esp+4] <-- patch

Quite a simple patch operation, but the author certainly didn’t take the straight way. Believe it or not, this is the nature of the malware. Now you can understand the phrase: “Don’t think to trace the entire exe, it’s madness!”
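The two additions in the trace are easy to check by hand. Here is a tiny Python sketch; add32 is my own helper and models only the 32-bit wraparound of add_stack, not its effect on the flags or on the vm stack.

```python
MASK32 = 0xFFFFFFFF


def add32(a, b):
    """32-bit addition with wraparound, like an x86 ADD on dwords."""
    return (a + b) & MASK32


# The two pushed constants overflow 32 bits and leave a plain 1...
delta = add32(0x98754A9F, 0x678AB562)
# ...which is added to the value pushed from r_0 to get the patch address.
target = add32(0x0041CE05, delta)

print(hex(delta), hex(target))  # 0x1 0x41ce06
```

So two scary-looking constants are just a long way to write "+1".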

I tried inspecting some more samples of the same Kraken family. There are some similarities/differences:
- they are protected by a virtual machine too
- the routine used to select the next vm_instruction is not the same
- (I think) the vm_instructions are equivalent, but they are not implemented in the same way. I mean, the code used to implement a push is not the same but the result is: in both cases you have a push vm_instruction
- the (encrypted) Instruction Table is not the same. At index i you won’t have the same vm_instruction for malware_x and malware_y
- the vm protection exists for the spawned file too

Now I fully understand the words quoted in the article, it’s hard to understand what’s going on…