View Full Version : IDC scripting a Win32.Virut variant - Part 1


Kayaker
December 19th, 2007, 04:10
I had started live tracing this piece of malware when I realized it was a prime candidate for some IDA idc scripting. There are a few things about the program which make static analysis difficult.

It's encrypted. It's a rather simple encryption however which is easily scripted out.
The code is primarily executed in the .data section and there are many inline character strings and non-standard code instructions which prevent IDA from getting an accurate disassembly.
Variable pointers and import calls are referenced as EBP offsets, so IDA can't recognize absolute addresses to create the proper Xrefs, autogeneration and all the other wonderful analysis it normally performs.
Imports are determined dynamically through GetProcAddress, so until we define them IDA can't recognize them.
Let's address each of these problems through a bit of idc scripting and static analysis to supplement our live tracing. In Part 2 I'll mention a few points about the viral code itself.

I've included several files in the attachment: the idc scripts and a header file prototyping some functions and structures not included in the internal IDA definitions. There is also an IDB file in IDA 4.9 freeware version format which is fully commented with all functions and variables defined (or at least named, this is meant to be a "working" disassembly for further analysis, not necessarily a definitive treatise). The IDB file can be opened in any IDA version 4.9 and above.

Also included is of course the virus, or else what fun would this be? The win32_virut.exe file has been renamed with a .VXE extension and zip password protected with the password malware. It is quite an infectious file, but it readily detects a normal virtual machine sandbox and won't infect under those conditions. You actually have to force the code to decrypt its payload under a VM. Also, the remote site it tries to connect to has been closed for Terms Of Agreement violations (ya think?), so no live connection is ever made.

You may find it easier to read the idc scripts by downloading the originals or by reading this post in the Blogs Forum (follow the link under 'Post or View Comments' at the bottom of this blog).


A brief description of the virus family from
http://www.bitdefender.com/VIRUS-1000163-en--Win32.Virtob

Quote:
This virus is a polymorphic, memory-resident file-infector, with backdoor behaviour. Once executed, it injects itself into WINLOGON, creates a new thread in that process, and passes the execution control to the host file.

It also hooks the following functions in each running process (in NTDLL module):
NtCreateFile, NtOpenFile, NtCreateProcess, NtCreateProcessEx
so that every time an infected process calls one of these functions, the execution is passed to the virus, which infects the accessed file, and then returns the control to the original function.

It infects EXE and SCR files, using different infection techniques:
Appending to the last section of the victim, and setting the Entry Point directly to the viral code. (our variant)

The virus is able to avoid emulators and virtual machines. To ensure there's only one instance of it running in the system, it creates an event with one of the following names:
VT_3, VT_4, VevT, Vx_4

It tries to connect to some IRC server, and join a certain channel. Once it joins the channel, it waits for commands that instruct it to download several files from Internet, and then execute them. The IRC server can be:
proxim.ntkrnlpa.info (our variant, site no longer active)
Much of what is written above can be figured out by live tracing the malware and eventually letting it infect our sandbox. But let's see what damage we can do to it before it does damage to us.


Step 1: Decrypt

Here is the Entry Point of the virus, which is in the .data section:
Code:
:00403200 cld
:00403201 call loc_40322E
:00403206 push ebx
Take note that the return address of the Call pushed onto the stack will be 0x403206. Trace into the Call and after a bit of preliminary code we reach here:

Code:
:00403257 mov ebp, [esp+4]
// call return of 0x403206 placed in ebp

:0040325B sub dword ptr [esp+4], 21E9h
// new return address of (0x403206 - 0x21E9) = 0x40101D placed on stack
...
:0040326A sub ebp, 301006h
// ebp offset becomes (0x403206 - 0x301006) = 0x102200
:00403270 lea eax, [ebp+301082h]
// eax = (0x102200 + 0x301082) = 0x403282
// this is the starting address of the encrypted code
:00403276 mov dx, [eax-65h]
// word pointer at 0x40321D is a decryption seed value: db 8Eh, 0C8h
:0040327D call sub_403206 // Decryption routine
// the code from here on down is all encrypted
:00403282 db 65h
:00403282 enter 0BDDh, 0C1h
:00403287 push ss
None of the code from address 0x403282 onwards makes much sense, which indicates that it is still encrypted and that Call sub_403206 is the decryption routine. Let's take a look at that call:

Code:
:00403206 Decrypt proc near ; CODE XREF: :0040327D
:00403206 push ebx
:00403207 mov ecx, 0DA5h
:0040320C mov ebx, edx
:0040320E
:0040320E loc_40320E: ; CODE XREF: Decrypt+13
:0040320E xor [eax], dx
:00403211 lea eax, [eax+2]
:00403214 xchg dl, dh
:00403216 lea edx, [ebx+edx]
:00403219 loop loc_40320E
:0040321B pop ebx
:0040321C retn
:0040321C Decrypt endp
:0040321C
:0040321C ; ---------------------------------------------------------------
:0040321D Initial_Decrypt_Seed db 8Eh, 0C8h
:0040321F; ----------------------------------------------------------------
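A simple XOR loop decryption: each word is XORed with DX, and the xor value is then modified on each iteration by swapping its two bytes (XCHG) and adding the original seed saved in EBX (LEA). ECX is a word counter decremented by the LOOP opcode. The initial decryption seed value is the db 8Eh, 0C8h we discovered above, i.e. the little-endian word 0xC88E.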
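
As a sanity check on that key schedule, the three encrypted words visible in the listing at 0x403282 (bytes 65 C8 DD 0B C1 16) can be decrypted by hand. This is only a minimal sketch and not one of the attached scripts; next_key() is a helper name made up for illustration, and the words are hard-coded so nothing in the database is touched.

```idc
#include <idc.idc>
// Sketch: decrypt the first three words shown at 0x403282 by hand.
// next_key() is a hypothetical helper, not part of the attachment.

// one step of the key schedule: xchg dl, dh / lea edx, [ebx+edx]
static next_key(key, seed)
{
    auto swapped;

    swapped = ((key & 0xFF00) >> 8) | ((key & 0x00FF) << 8);
    return (swapped + seed) & 0xFFFF;   // only DX takes part in the xor
}

static main()
{
    auto seed, key;

    seed = 0xC88E;
    key = seed;

    Message("%04X \n", 0xC865 ^ key);   // encrypted bytes 65 C8
    key = next_key(key, seed);
    Message("%04X \n", 0x0BDD ^ key);   // encrypted bytes DD 0B
    key = next_key(key, seed);
    Message("%04X \n", 0x16C1 ^ key);   // encrypted bytes C1 16
}
```

This should print 00EB, 5C8B and 0824. Read back in memory order that is EB 00 8B 5C 24 08, which disassembles as a short jmp to the next instruction followed by mov ebx, [esp+8], so the seed and key schedule look right.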

Armed with this small bit of analysis we can create the following idc script for decrypting.


Code:
#include <idc.idc>
// Step 1: idc to decrypt section between .data:0x403282 and .data:0x404DCC

// performs the equivalent asm function (xchg dl, dh)
#define bswap16(x) ((((x) & 0xff00) >> 8) | (((x) & 0x00ff) << 8))

static main()
{
    auto startdecrypt, size, enddecrypt, seed, ea, decryptword, x;

    // starting values determined from decrypt function
    startdecrypt = 0x403282;
    size = 0x0DA5 * 2;                  // ECX counts words, 2 bytes each
    enddecrypt = (startdecrypt + size); // = 0x404DCC
    seed = 0xC88E;

    ea = startdecrypt;
    decryptword = seed;

    Message("\nDecrypting... \n");
    while (ea < enddecrypt)
    {
        // (xor [eax], dx)
        x = Word(ea);                       // fetch the word
        x = (x ^ decryptword);              // decrypt it
        PatchWord(ea, x);                   // put it back
        decryptword = bswap16(decryptword); // xchg dl, dh
        // lea edx, [ebx+edx]; only DX is used by the xor, so keep 16 bits
        decryptword = (decryptword + seed) & 0xFFFF;

        ea = ea + 2;
    }

    // Let's try to get IDA to reanalyze the code
    MakeUnknown (startdecrypt, size, 1);
    AnalyzeArea (startdecrypt, enddecrypt);
    Message("...Done \n");
}


After running this script you MUST go through the decrypted section and manually resolve the embedded string pointers with the IDA A(scii) command and any unrecognized or incorrect disassembly with the C(ode) command. This is a necessary step for the subsequent IDC scripts to work properly!

You will find a lot of things like the following, which you need to make sure are correctly resolved. By itself IDA won't properly disassemble the code.

Code:
:004032D8 E8 0D call loc_4032EA
:004032D8 ; --------------------------------------------
:004032DD 47 65+ aGetlasterror db 'GetLastError',0 // LPCSTR lpProcName
:004032EA ; --------------------------------------------
:004032EA
:004032EA loc_4032EA: ; CODE XREF: :004032D8
:004032EA 03 F3 add esi, ebx
:004032EC 53 push ebx // HMODULE hModule
:004032ED FF D6 call esi // GetProcAddress
Notice the neat little trick in the above code of how the second parameter of GetProcAddress is automatically pushed onto the stack by effectively being the return address of Call loc_4032EA, which jumps over the string. This type of thing is repeated throughout the program.
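
The manual fixup for one such pattern could be sketched in idc roughly as follows. This is an assumption-laden illustration, not one of the attached scripts: fix_inline_string() is a made-up helper name, it assumes the call is the 5-byte E8 rel32 form, and it assumes the inline string is short and null terminated. Anything that doesn't match should still be fixed by hand with A and C.

```idc
#include <idc.idc>
// Sketch: resolve one "call over an inline string" pattern.
// fix_inline_string() is a hypothetical helper, not part of the attachment.

static fix_inline_string(ea)
{
    auto disp, strea, target;

    if (Byte(ea) != 0xE8)           // not a near call (E8 rel32)
    {
        return 0;
    }

    disp = Dword(ea + 1);           // rel32 displacement = string length incl. null
    strea = ea + 5;                 // the string starts right after the call
    target = strea + disp;          // where the call lands

    // assumption: inline strings are short and null terminated
    if (disp < 2 || disp > 0x100 || Byte(target - 1) != 0)
    {
        return 0;
    }

    MakeUnknown(ea, 5 + disp + 1, 0);   // clear the call, the string and the head of the target
    MakeCode(ea);                       // the call itself
    MakeStr(strea, target);             // the inline ascii string
    MakeCode(target);                   // code resumes after the string
    return 1;
}

static main()
{
    // run on the instruction under the cursor, e.g. 0x4032D8
    if (!fix_inline_string(ScreenEA()))
    {
        Message("No call-over-string pattern at 0x%08X \n", ScreenEA());
    }
}
```

For the example above the displacement is 0x0D, which is exactly the length of 'GetLastError' plus its terminating null, so the call at 0x4032D8 lands on 0x4032EA.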

Chances are you won't get every bit of disassembly and ascii string identified correctly the first time through a manual fixup, but after applying the subsequent idc scripts those problem areas should be identified and you can go back and correct them before running the scripts again. You'll find odd things such as wsprintf format strings, non-null terminated string blocks with xrefs to parts of them, call instructions where the offset displacement is dynamically calculated, etc.


Step 2: Resolve EBP offsets to real addresses

You'll notice after decrypting the file that variable pointers and import calls are in the form of [ebp+30xxxx]. We've already determined above that EBP = 0x102200, so we simply need to calculate the real address used and replace the operand.
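
The arithmetic is simple enough to check in a few lines of idc. A minimal sketch, using only values already shown in the listings (the return address 0x403206, the 301006h subtracted from EBP, and two sample displacements):

```idc
#include <idc.idc>
// Sketch: recompute the EBP delta and two real addresses by hand.

static main()
{
    auto retaddr, delta;

    retaddr = 0x403206;             // return address pushed by the first call
    delta = retaddr - 0x301006;     // sub ebp, 301006h

    Message("delta         = 0x%X \n", delta);
    Message("[ebp+301082h] = 0x%X \n", delta + 0x301082);
    Message("[ebp+302BD5h] = 0x%X \n", delta + 0x302BD5);
}
```

This should print 0x102200, 0x403282 (the start of the encrypted code) and 0x404DD5.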

Rather than just replacing the operand text itself with the calculated real address, say by using the idc command
success OpAlt (long ea,long n,string opnd); // manually enter n-th operand

we will actually patch in the proper displacement in the hex bytes with
PatchDword (long ea,long value);

After handling each affected instruction we need to undefine it with
MakeUnkn (long ea, long expand);

and have IDA reanalyze with
AnalyzeArea (long sEA,long eEA);

The operands should be converted to a real offset and the proper xrefs resolved for each instruction.

We also use the idc commands

long GetOperandValue (long ea,long n); // get instruction operand value
string GetOpnd (long ea,long n); // get instruction operand

i.e. for the instruction
mov [ebp+302BD5h], eax

GetOpnd (ea,0); returns the string "[ebp+302BD5h]"
GetOperandValue (ea,0); returns 0x00302BD5

After patching in the real address and reanalyzing the code the operand will be rewritten with an "ss:" prefix and/or "[ebp]" suffix.

i.e. the previous example will resolve to:
ss:dword_404DD5[ebp]

We don't really want that so we can remove those string components by parsing them out. That will be the job of the next idc script. That step could be added here, but for demonstration purposes I keep them separate.
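
The string handling involved can be tried out on its own first. A minimal sketch, where strip_token() is a helper name invented for this example and the operand text is hard-coded rather than read from the database:

```idc
#include <idc.idc>
// Sketch: strip "ss:" and "[ebp]" from an operand string.
// strip_token() is a hypothetical helper, not part of the attachment.

static strip_token(s, token)
{
    auto pos;

    pos = strstr(s, token);
    if (pos == -1)
    {
        return s;                   // token not present, nothing to do
    }
    // keep everything before and after the token
    return substr(s, 0, pos) + substr(s, pos + strlen(token), -1);
}

static main()
{
    auto op;

    op = "ss:dword_404DD5[ebp]";
    op = strip_token(op, "ss:");
    op = strip_token(op, "[ebp]");
    Message("%s \n", op);
}
```

This should print dword_404DD5, which is the text we want IDA to display for the operand.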


AnalyzeArea might not resolve all the instructions properly the first time through, so a second pass is necessary to convert any instructions that are still in the form of [ebp+xxxxxxxx]. This usually occurred where we (I) didn't make the proper manual corrections to the disassembly or inline ascii strings after running the decryption idc script. We can use the GetFlags(long ea) command to get the internal flags for the address the operand points to and deal with each type individually.
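
To see what GetFlags reports for a stubborn operand before deciding how to treat it, something like the following can be run with the cursor on the instruction. It is only a sketch: describe_flags() is a made-up helper name, and it relies on the isCode/isData/isTail/isUnknown tests defined in idc.idc.

```idc
#include <idc.idc>
// Sketch: report how IDA currently classifies an operand's target address.
// describe_flags() is a hypothetical helper, not part of the attachment.

static describe_flags(ea)
{
    auto f;

    f = GetFlags(ea);
    if (isCode(f))    { return "code"; }
    if (isData(f))    { return "data"; }
    if (isTail(f))    { return "tail"; }
    if (isUnknown(f)) { return "unknown"; }
    return "?";
}

static main()
{
    auto ea, n, target;

    ea = ScreenEA();                // instruction under the cursor
    for (n = 0; n < 2; n++)
    {
        if (strstr(GetOpnd(ea, n), "[ebp+") != -1)
        {
            target = GetOperandValue(ea, n);
            Message("0x%08X op%d -> 0x%08X is %s \n",
                    ea, n, target, describe_flags(target));
        }
    }
}
```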

Any problem operands remaining will be pointed out by the idc script, and will also be highlighted in red by IDA. These should be handled manually. For example, the virus may create a dynamically determined call offset or otherwise change an instruction. IDA resolves these as Xrefs into the middle of an instruction, but doesn't quite get the syntax right when running AnalyzeArea through the idc script. However, if you right click on the errant operand you will probably find a more accurate selection.

Enough of the preamble, I just wanted to touch on a few points of using these idc commands.
Here's the second script:


Code:
#include <idc.idc>
// Step 2: idc to resolve EBP offsets to real addresses


static resolve_offsets(ea, n)
{
    auto OpVal, realaddress, patchoffset, i;

    OpVal = GetOperandValue(ea, n);

    if (OpVal > 0x400000)
    {
        return; // we've already converted this operand
    }

    // calculate the real address
    realaddress = OpVal + 0x102200;

    // calculate the offset where the operand begins in the instruction
    patchoffset = BADADDR;
    for (i = 0; i < ItemSize(ea) - 3; i++)
    {
        if (Dword(ea + i) == OpVal)
        {
            // Pattern found
            patchoffset = (ea + i);
        }
    }

    if (patchoffset == BADADDR)
    {
        // no 32-bit displacement matched in this instruction, leave it alone
        return;
    }

    // patch in the real displacement
    PatchDword(patchoffset, realaddress);

    // undefine the instruction so it will be reanalyzed fresh later
    MakeUnkn (ea, 0);
}

static main()
{
    auto startea, endea, ea, n, nextea, OpVal, uFlags, count1, count2, count3;

    startea = 0x403270; // first occurrence of [ebp+30xxxx] offset
    endea = 0x404DCC;   // determined from idc in Step 1

    // Use some counters to check that all operands were handled properly.
    // Remaining errors likely mean we didn't make the correct analysis
    // after running the decrypt script in Step 1.
    // Go back, correct those instructions and rerun this script.
    count1 = 0;
    count2 = 0;
    count3 = 0;

    /////////////////////////////////////////////////////////////////////////
    // Step 1:
    // Convert operands of the form "[ebp+30xxxxh]" to a real offset
    /////////////////////////////////////////////////////////////////////////
    ea = startea;

    Message("\nConverting EBP offset operands to real addresses \n");

    while (ea != BADADDR)
    {
        // calculate next instruction pointer before we modify anything
        nextea = NextHead(ea, endea);

        // check both the first(0) and second(1) operand of the instruction
        for (n = 0; n < 2; n++)
        {
            // for all instructions with an offset in the form of "[ebp+"
            if (strstr(GetOpnd(ea, n), "[ebp+") != -1)
            {
                count1 = count1 + 1;
                resolve_offsets(ea, n);
            }
        }

        ea = nextea; // next instruction
    }

    // Reanalyze
    AnalyzeArea (startea, endea);

    /////////////////////////////////////////////////////////////////////////
    // Step 2:
    // Make a second pass at autoanalysing operands
    // still in the form of "[ebp+"
    /////////////////////////////////////////////////////////////////////////
    ea = startea;

    Message("Running a second pass at autoanalysis \n");
    while (ea != BADADDR)
    {
        nextea = NextHead(ea, endea);

        for (n = 0; n < 2; n++)
        {
            // for all instructions with an offset in the form of "[ebp+"
            if (strstr(GetOpnd(ea, n), "[ebp+") != -1)
            {
                count2 = count2 + 1;

                // Get operand value
                OpVal = GetOperandValue(ea, n);

                // Get value of internal flags to see how IDA
                // has defined the operand to this point
                uFlags = GetFlags(OpVal);

                if (isData(uFlags))
                {
                    // If operand offset is already defined as 'data'
                    // then we only need to reanalyze the instruction
                    // to get IDA to resolve the xref

                    // undefine the instruction so it will be reanalyzed fresh
                    MakeUnkn (ea, 0);
                }
                else if (isUnknown(uFlags))
                {
                    // If operand offset is defined as 'unknown', create
                    // a data xref at the operand address and reanalyze
                    add_dref(ea, OpVal, XREF_USER | dr_O);
                    MakeUnkn (ea, 0);
                }
                else
                {
                    // GetFlags(OpVal) indicates that what is left over is
                    // defined as 'isTail'. Undefine both the operand address
                    // and the calling instruction and let IDA reanalyze
                    MakeUnkn (OpVal, 0);
                    MakeUnkn (ea, 0);
                }
            }
        }

        ea = nextea;
    }

    // Reanalyze
    AnalyzeArea (startea, endea);

    /////////////////////////////////////////////////////////////////////////
    // Step 3:
    // Finally, let's inform ourselves of which instructions are still
    // in the form of "[ebp+" and should be checked manually.
    // The offsets will be highlighted in red by IDA as well.
    /////////////////////////////////////////////////////////////////////////
    ea = startea;
    Message("The following instructions (if any) are still in error and should be fixed manually before rerunning this script \n");

    while (ea != BADADDR)
    {
        nextea = NextHead(ea, endea);

        for (n = 0; n < 2; n++)
        {
            // for all instructions with offset *still*
            // in the form of "[ebp+"
            if (strstr(GetOpnd(ea, n), "[ebp+") != -1)
            {
                count3 = count3 + 1;
                Message("%d 0x%08X %s \n", count3, ea, GetOpnd (ea, n));
            }
        }

        ea = nextea;
    }

    Message("\n%d / %d operands analysed correctly on first pass \n",
            count1-count2, count1);
    Message("%d / %d operands corrected on second pass \n",
            count2-count3, count1);

    /////////////////////////////////////////////////////////////////////////

    Message("...Done \n");
}



Step 3: Parse out unwanted operand text


Code:
#include <idc.idc>
// Step 3: idc to parse out unwanted text
// from an operand such as "ss:dword_404DD5[ebp]"


static clean_text(ea, n)
{
    auto OldOpStr, NewOpStr, pos;

    OldOpStr = GetOpnd (ea, n);
    NewOpStr = OldOpStr;

    // find position of "ss:" if present and remove it
    pos = strstr(NewOpStr, "ss:");
    if (pos != -1) // contains substring
    {
        // combine string parts without "ss:"
        NewOpStr = substr(NewOpStr, 0, pos) + substr(NewOpStr, pos+3, -1);
    }

    // find position of "[ebp]" if present and remove it
    pos = strstr(NewOpStr, "[ebp]");
    if (pos != -1)
    {
        // combine string parts without "[ebp]"
        NewOpStr = substr(NewOpStr, 0, pos) + substr(NewOpStr, pos+5, -1);
    }

    // replace the operand if either part was removed
    // (previously an operand with only "ss:" was left untouched)
    if (NewOpStr != OldOpStr)
    {
        OpAlt(ea, n, NewOpStr);
    }
}


static main()
{
    auto startea, endea, ea, n;

    startea = 0x403270; // first occurrence of [ebp+30xxxx] offset
    endea = 0x404DCC;   // determined from idc in Step 1

    ea = startea;
    Message("\nCleaning up operand syntax... \n");

    while (ea != BADADDR)
    {
        // check both the first(0) and second(1) operand of the instruction
        for (n = 0; n < 2; n++)
        {
            // for all instructions where we find "ss:" or "[ebp]"
            if (strstr(GetOpnd(ea, n), "ss:") != -1 ||
                strstr(GetOpnd(ea, n), "[ebp]") != -1)
            {
                clean_text(ea, n);
            }
        }

        ea = NextHead(ea, endea); // next instruction
    }

    Message("...Done \n");
}



Step 4: Resolve API names


Immediately after the code is decrypted by the program it retrieves the offset of GetProcAddress by finding the base of kernel32.dll and parsing through its export table. All other import addresses, including those it hooks from ntdll.dll, are obtained by using GetProcAddress.

The import names it wants are in an ascii table and a simple routine is used for each dll it searches for import addresses.
For example,
Code:
:00403369 lea ESI, aLstrcat ; "lstrcat"
:0040336F xor ecx, ecx
:00403371 lea EDI, dword_404DE9
:00403377 mov CL, 24h
:00403379 call GetProcAddress_Routine
ESI is the start of the API name table, which begins here:

Code:
:004037B3 aLstrcat db 'lstrcat',0 ; DATA XREF: :00403369t
:004037BB aLstrlen db 'lstrlen',0
:004037C3 aCreatefilea db 'CreateFileA',0
:004037CF aCreatefilemapp db 'CreateFileMappingA',0
...
EDI is the start of the table where it places the API addresses.
ECX (CL) contains the number of import names to find for this particular dll.
The Call is a simple LOOP which calls GetProcAddress for each import and stores their offsets.
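
Before renaming anything, the loop can be mimicked as a dry run that only prints which name goes to which slot. A minimal sketch under some assumptions: list_api_table() is a helper name invented here, the names are assumed to be packed back to back exactly as in the listing above, and GetString is assumed to be able to read the bytes even where the string has not been defined yet.

```idc
#include <idc.idc>
// Sketch: dry run of the GetProcAddress loop; nothing is renamed.
// list_api_table() is a hypothetical helper, not part of the attachment.

static list_api_table(nametable, addrtable, count)
{
    auto name;

    while (count != 0)
    {
        name = GetString(nametable, -1, ASCSTR_C);
        Message("0x%08X <- %s \n", addrtable, name);

        nametable = nametable + strlen(name) + 1;   // ESI: skip the name and its null
        addrtable = addrtable + 4;                  // EDI: next dword slot
        count = count - 1;                          // ECX: decremented by LOOP
    }
}

static main()
{
    // values taken from the listing above: ESI, EDI, CL
    list_api_table(0x4037B3, 0x404DE9, 0x24);
}
```

If the output drifts out of step with the names in the disassembly, a string in the table is probably not terminated where expected and needs a manual look.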


Having cleaned up the disassembly with the first 3 idc scripts we can easily find where this GetProcAddress routine is cross referenced in the file and get the necessary values for each of the 5 dlls in order to resolve API names with the following script:


Code:
#include <idc.idc>
// Step 4: idc to resolve import calls and enter their name

static patchapi(apinametable, apiaddresstable, numapis)
{
    auto apiname;

    while (numapis != 0)
    {
        apiname = GetString(apinametable, -1, ASCSTR_C);

        if (!MakeNameEx(apiaddresstable, apiname, SN_AUTO))
        {
            // we will get an error because LoadLibraryA is already defined
            // rename as LoadLibraryA_0
            Message("API name already in use, renaming as %s \n", apiname + "_0");

            MakeNameEx(apiaddresstable, apiname + "_0", SN_AUTO);
        }

        apinametable = NextHead(apinametable, BADADDR);
        apiaddresstable = apiaddresstable+4;
        numapis = numapis - 1;
    }
}

static main()
{
    Message("\nResolving API names... \n");

    patchapi(0x4037B3, 0x404DE9, 0x24);
    patchapi(0x4039BE, 0x404E79, 0x0D);
    patchapi(0x403B5F, 0x404EDD, 0x04);
    patchapi(0x403AB6, 0x404EAD, 0x07);
    patchapi(0x403AF4, 0x404EC9, 0x05);

    Message("Game over \n");
}



Step 5: Apply C header file

The last step is to read in the header file, defines.h, with the IDA menu command File/Load file/Parse C header file (Ctrl+F9). This file contains some of the function prototypes and structures not defined by default by IDA, primarily the ntdll imports.

You'll probably notice that the parameter definitions for import calls are not always propagated correctly, some may have them, some may not. There are a few things that may help, that fall into the category of "dealing with IDA quirks".

Make sure the code containing the import(s) is within a defined function (Create function).
Make sure the function has a proper endpoint, i.e. some of the virus function blocks may end with a JMP (Set function end).
Select (Edit function). Don't make any changes, just close the dialog box. This seems to force IDA to reanalyze the function and often redefine and propagate any import parameters correctly.
Right click on the import function, undefine and then redefine as Code. Again, this seems to work for some cases.
Once all this "prettying up" of the disassembly is done you can finally get to the fun part of analyzing the program.

The included IDB file has most of the virus functionality in the .data section defined in a general way. The .text section has a small decryption function I didn't bother detailing, it is most easily dealt with under a debugger and is completely safe to let run under a closed sandbox environment.


Again, the idc scripts, IDB file and virus are in the attachment. The exe has been renamed .vxe and zip protected with the password malware.

Part 2 ("http://www.woodmann.com/forum/showthread.php?t=11075") of this post will follow.


Win32_Virut_Analysis.zip ("http://www.woodmann.com/forum/blog_attachment.php?attachmentid=3&d=1198051304") (183.2 KB)

FaTaL_PrIdE
December 19th, 2007, 05:50
Nice work Kayaker and most of all, thanks for sharing. I'm trying to use IDA 4.9 more and more (previously only used Olly as this is more of a distraction/hobby than a profession) so this was interesting to read through.

Quote:

Notice the neat little trick in the above code of how the second parameter of GetProcAddress is automatically pushed onto the stack by effectively being the return address of Call loc_4032EA, which jumps over the string. This type of thing is repeated throughout the program.

Not sure how widely this is used but I think I've seen it before in the adata section of Armadillo protected files (more than willing to be corrected on that one). I think the calls to GetProcAddress and LoadLibraryA use this technique before the list of APIs is decrypted and the first 5 bytes checked for 0xCC breakpoints.

Thanks again for sharing - off to read part 2

JMI
December 19th, 2007, 10:13
Great analysis, as always Kayaker.

Regards,

RolfRolles
January 1st, 2008, 16:16
Thanks for sharing -- very useful stuff for IDC newbies, and superior to DataRescue's own IDC/decryption/malware analysis example at http://www.datarescue.com/idabase/idacdecrypt.htm

Kayaker
January 1st, 2008, 16:51
Thank you Rolf, and others. As you can probably tell from the comments I left in the idc script, that was the example I started from for the decryption part.

The ExtraPass plugin by Sirmabus was helpful too for devising a strategy to get a corrected disassembly using the GetFlags() command.

Regards,
Kayaker