Here is a little addition to the analysis made by Marion, her report is nice but I would like to add something about the way I used to automatically naming/resolving the imported functions.
The idea is to change instructions like:
.text:0040100B call dword ptr [ecx+220h]
into something like:
.text:0040100B call dword ptr [ecx+_API.malloc]
I shared this reversing session with Kayaker, so credit for this post goes to him too.
To perform this switch I have written an IDC script. There are two functions inside it, GetAPINames and ResolveAPINames. The first function is used to retrieve the name of all the hidden API while the other one will change the call instructions into a new readable version.
The script stores all the information inside a structure named _API which is filled with all the API names. The structure is necessary and I’ll use it for some minor manual fix too.
GetAPINames
There’s no trace of clear API names inside the disasm, everything is constructed at runtime inside call 402DB0, take a look at this piece of code (without unnecessary junk code lines):
00403E42 C6 44 24 24 43 mov [esp+179CCh+var_179A8], 'C'
...
00403E49 C6 44 24 25 72 mov [esp+179CCh+var_179A7], 'r'
...
00403EC1 C6 44 24 26 65 mov [esp+179CCh+var_179A6], 'e'
...
00403F14 C6 44 24 2F 61 mov [esp+179D4h+var_179A5], 'a'
00403F19 C6 44 24 30 74 mov [esp+179D4h+var_179A4], 't'
00403F1E C6 44 24 31 65 mov [esp+179D4h+var_179A3], 'e'
00403F23 C6 44 24 32 54 mov [esp+179D4h+var_179A2], 'T'
00403F28 C6 44 24 33 68 mov [esp+179D4h+var_179A1], 'h'
00403F2D C6 44 24 34 72 mov [esp+179D4h+var_179A0], 'r'
00403F32 C6 44 24 35 65 mov [esp+179D4h+var_1799F], 'e'
00403F37 C6 44 24 36 61 mov [esp+179D4h+var_1799E], 'a'
00403F3C C6 44 24 37 64 mov [esp+179D4h+var_1799D], 'd'
00403F41 C6 44 24 38 00 mov [esp+179D4h+var_1799C], 0
As you can see CreateThread string is obtained appending every single char. To create names the malware uses another similar way:
00406EC9 C6 84 24 BC 0F 00 00 47 mov [esp+179CCh+var_16A10], 'G'
00406ED1 C6 84 24 BD 0F 00 00 65 mov [esp+179CCh+var_16A0F], 'e'
...
00406F4C C6 84 24 BE 0F 00 00 74 mov [esp+179CCh+var_16A0E], 't'
...
00406FA5 C6 84 24 C7 0F 00 00 4D mov [esp+179D4h+var_16A0D], 'M'
00406FAD C6 84 24 C8 0F 00 00 6F mov [esp+179D4h+var_16A0C], 'o'
00406FB5 C6 84 24 C9 0F 00 00 64 mov [esp+179D4h+var_16A0B], 'd'
00406FBD C6 84 24 CA 0F 00 00 75 mov [esp+179D4h+var_16A0A], 'u'
00406FC5 C6 84 24 CB 0F 00 00 6C mov [esp+179D4h+var_16A09], 'l'
00406FCD C6 84 24 CC 0F 00 00 65 mov [esp+179D4h+var_16A08], 'e'
00406FD5 C6 84 24 CD 0F 00 00 46 mov [esp+179D4h+var_16A07], 'F'
00406FDD C6 84 24 CE 0F 00 00 69 mov [esp+179D4h+var_16A06], 'i'
00406FE5 C6 84 24 CF 0F 00 00 6C mov [esp+179D4h+var_16A05], 'l'
00406FED C6 84 24 D0 0F 00 00 65 mov [esp+179D4h+var_16A04], 'e'
00406FF5 C6 84 24 D1 0F 00 00 4E mov [esp+179D4h+var_16A03], 'N'
00406FFD C6 84 24 D2 0F 00 00 61 mov [esp+179D4h+var_16A02], 'a'
00407005 C6 84 24 D3 0F 00 00 6D mov [esp+179D4h+var_16A01], 'm'
0040700D C6 84 24 D4 0F 00 00 65 mov [esp+179D4h+var_16A00], 'e'
00407015 C6 84 24 D5 0F 00 00 45 mov [esp+179D4h+var_169FF], 'E'
0040701D C6 84 24 D6 0F 00 00 78 mov [esp+179D4h+var_169FE], 'x'
00407025 C6 84 24 D7 0F 00 00 41 mov [esp+179D4h+var_169FD], 'A'
0040702D C6 84 24 D8 0F 00 00 00 mov [esp+179D4h+var_169FC], 0
The way used to create GetModuleFileNameExA is pretty similar to the previous one but there’s a little difference, look at the opcodes. The mov instructions are similar but the ModR/M byte defines a distinct displacement.
Is it possible to recognize and isolate all the instructions used to create all those strings? Well, it’s not so hard because some bytes are fixed! The idea is to parse all the instructions inside 402DB0 trying to recognize those two special mov instructions:
Code:
if (Byte(currAddress) == 0xC6) {
if (Byte(currAddress+1) == 0x44) {
if (Byte(currAddress+2) == 0x24) {
if (Byte(currAddress+4) != 0x00) {
// Get current char and append it to partial name
szChar = sprintf("%c", Byte(currAddress+4));
szAPI = sprintf("%s", szAPI + szChar);
} else {
if ((strstr(szAPI, ".DLL"

== -1) && (strstr(szAPI, ".dll"

== -1))
{
// Add member to struct (no DLL name)
AddStrucMember(id, szAPI, -1, FF_DATA, -1, 4);
}
szAPI = sprintf("%s", ""

; // reset for next string
}
}
}
}
The instructions are checked byte by byte and the strings are created char by char. szChar is the current char to append to the partial string szAPI.
There’s a little problem with this parser, it constructs DLL names too. I’m not interested in DLL names, so a check over the formatted string is necessary:
Code:
if ((strstr(szAPI, ".DLL"

== -1) && (strstr(szAPI, ".dll"

== -1))
Now that I’m sure I don’t have a DLL name I can insert it into the structure:
AddStrucMember(id, szAPI, -1, FF_DATA, -1, 4);
The use of the structure is fondamental for the script.
Now that you know how to parse the first type of mov instruction you can easily change some checks over the fixed bytes and you’ll retrieve names like GetModuleFileNameExA too:
Code:
if (Byte(currAddress) == 0xC6) {
if (Byte(currAddress+1) == 0x84) {
if (Byte(currAddress+2) == 0x24) {
if (Byte(currAddress+5) == 0x00) {
if (Byte(currAddress+6) == 0x00) {
if (Byte(currAddress+7) != 0x00) {
...
This part of the script works pretty fine but it has a little problem with few functions. To understand it here is an example with GetQueuedCompletionStatus:
00405C2A mov [esp+179CCh+var_179A8], 'G'
00405C31 mov [esp+179CCh+var_179A7], 'e'
00405CA9 mov [esp+179CCh+var_179A6], 't'
00405CF3 mov bl, 'Q'
00405CFE mov [esp+179D4h+var_179A5], bl
00405D02 mov [esp+179D4h+var_179A4], 'u'
00405D07 mov [esp+179D4h+var_179A3], 'e'
00405D0C mov [esp+179D4h+var_179A2], 'u'
00405D11 mov [esp+179D4h+var_179A1], 'e'
00405D16 mov [esp+179D4h+var_179A0], 'd'
00405D1B mov [esp+179D4h+var_1799F], 'C'
00405D20 mov [esp+179D4h+var_1799E], 'o'
00405D25 mov [esp+179D4h+var_1799D], 'm'
...
As you can see the letter ‘Q’ is obtained by a sequence of two instructions and my script is not able to catch it; it creates GetueuedCompletionStatus name. It has been proved that human brain is able to recognize word without few letters or with scrambled letters so I think I can pass over this minor problem!
ResolveAPINames
Ok, now that I have the API structure I need to use it for the resolution part. The function ResolveAPINames scans the entire disasmed code trying to fix the necessary calls. To identify the call you can use a simple strstr function, and to convert it you can use OpStroff (it converts operand to an offset in a structure):
Code:
if(strstr(GetDisasm(ea), "call dword ptr"

!= -1) {
OpStroff(ea, 0, GetStrucIdByName("_API"

);
The malware uses a nice addressing method and IDA is not able to parse the hidden API but the nature of the addressing method lets us to solve the problem with some lines of code only. Now you can understand why the structure is the core of the entire script.
Manual fix
The script is able to resolve 678 calls, but it fails to fix some special cases like this:
00414E36 lea esi, [eax+1FCh]
00414E3C call dword ptr [eax+_API.GetTickCount]
00414E42 push eax
00414E43 call dword ptr [esi]
GetTickCount has been resolved but the next call not. It’s obvious that esi points to the API at offset 0x1FC. You can solve it manually because the structure contains it. Right click over 0x1FC and select the line “[eax+_API.srand]“. Now you know how to manually fix special cases too.