Know this round function? [Archive] - RCE Messageboard's Regroupment

View Full Version : Know this round function?

Extrarius

March 30th, 2006, 23:23

I'm new to reverse engineering, but I'm somewhat familiar with the relevant assembly language (x86) so I've been doing pretty well so far, but I've come across what is almost certainly a round function (meaning one step in some kind of encryption/decription and/or hash function) that I haven't been able to identify. If it had some kind of magic numbers that I could research or the like, it'd be easy (as was identifying a function that calculates the md5 of a string), but the only constants are 1 (the amount everything is shifted by) and 30 (the number of times the same code occurs in sequence and the reason I'm fairly certain it's a round function).

The program was coded in C/C++ and compiled with MSVC 2002 or 2003 with unknown settings (likely standard 'release' settings - I've found at least one instance of a function both inlined and not)

If you have any idea what algorithm the following represents, I'd greatly appreciate your thoughts:

Code:



	mov  edi, [ebp+arg_0]

	shrd edi, esi, 1

	mov  ebx, edi

	mov  [ebp+arg_8], edi

	shr  esi, 1

	shl  eax, 1

	mov  edi, edx

	mov  [ebp+arg_0], ebx

	sub  edi, ebx

	mov  ebx, ecx

	sbb  ebx, esi

	js   short L_Two

	jg   short L_One

	test edi, edi

	jb   short L_Two

L_One:

	mov  edx, edi

	mov  ecx, ebx

	inc  eax

L_Two:

Besides this code segment appearing 30 times in a sequence, there is a somewhat similar (in instructions used, anyways) setup sequence and exit sequence that likely act as the 1st and 32nd round, respectively. The start and sequences use all 3 arguments (arg_0, arg_4, arg_8), while the intermediate rounds only use 2 (as can be easily seen).

The function is not called in a loop, and it doesn't contain a loop itself, so it seems as if it's not an encryption or hash function, but I'm not sure what else would have 32 rounds. Also, arg_0 and arg_4 seem to be based on some function of delta-time in clock cycles(measured based on a stored reading of rdtsc and a current reading), while arg_8 is always an unknown global value that is to be initialized only at program startup, based on some calculation on the result of rdtsc.

Is it perhaps some kind of random number generator? It seems odd that one would require 30 rounds of the same few instructions (and that it would be unrolled when there seems to be little overall inlining or unrolling), but since it's not looped I can't really think of much else that would use such a sequence of operations on nearly-random input.

wtbw

March 31st, 2006, 20:05

Can you post the entire function? Or at least the start/end/some sample calls/...? Is eax the only result used? etc etc

I've done a vague conversion into c++ and analysed some results a bit (and asked some other people) and the general consensus is that it's like to be something to do with ecc codes/parity/interpolation or something... it looks far too regular to be a PRNG/hash function, and doesn't look like crypto. It has 32 "rounds" because there are 32 bits to set in eax, the result :-)

More info please :-)

Cheers,

Will

Extrarius

April 2nd, 2006, 06:26

The prologue is

Code:

arg_0 = dword ptr 8

arg_4 = dword ptr 0Ch

arg_8 = dword ptr 10h



   push ebp

   mov ebp, esp

   mov eax, [ebp+arg_8]

   push ebx

   mov ebx, [ebp+arg_4]

   push esi

   xor esi, esi

   shld esi, eax, 1Fh

   shl eax, 1Fh

   push edi

   mov edi, [ebp+arg_0]

   mov [ebp+arg_0], eax

   xor eax, eax

   mov edx, edi

   sub edi, [ebp+arg_0]

   mov ecx, ebx

   sbb ebx, esi

   js short Prologue_Two

   

   jg short Prologue_One

   

   test edi, edi

   jb short Prologue_Two



Prologue_One:

   xor eax, eax

   mov edx, edi

   mov ecx, ebx

   inc eax

   

Prologue_Two:

And the epilogue is:

Code:

   mov edi, [ebp+arg_0]

   shrd edi, esi, 1

   shr esi, 1

   shl eax, 1

   sub edx, edi

   mov [ebp+arg_8], edi

   pop edi

   sbb ecx, esi

   pop esi

   mov [ebp+arg_4], ecx

   pop ebx

   js short Epilogue_Two

   

   jg short Epilogue_One

   

   test edx, edx

   jb short Epilogue_Two



Epilogue_One:

   inc eax



Epilogue_Two:

   pop ebp

   retn

They're separated by exactly 30 instances of the above code (with the labels local to each instance). I'll investigate a bit to figure out what seems to be a 'typical' call sequence after some sleep, but like I said in my original post, in the few instances I've already looked at, all 3 parameters seem to be based in some way(one calculation to find arg 0 and 4 as the high and low parts {or low and high maybe, don't remeber} and arg 8 is a global only set once using a different calculation) on the timestamp counter read via rdtsc.

From what I remeber, eax is the only value used after return, but I'll investigate that as well to make sure.

Thanks for putting time and effort into trying to figure this out. I greatly appreciate the assistance

wtbw

April 2nd, 2006, 08:04

Well, it seems to have become startlingly obvious now :-)

Converting to c++ gave me:

Code:

DWORD func(DWORD arg0, DWORD arg4, DWORD arg8)

{

	__int64 i=arg8;

	i=i << 32; //so arg8 high, 0 low; prologue simulates the first i/=2



	__int64 j=arg4;

	j=(j << 32)+arg0; //so arg4 high, arg0 low



	DWORD x = 0; //result to return

	for(int z = 0; z < 32; z++) //loop 32 times              

	{

		i= i >> 1; //so dividing by 2 and rounding down

		x *= 2;     

		

		//if j is bigger than i, then subtract i and set the current bit in x

		if (j >= i)

		{

			j -= i;

			x++;

		}

	}

	return x;

}

Which, if you try it with some sample inputs, seems to give the equation it calculates as something like:
floor(j/i * 0x100000000)

(floor meaning round down to the nearest integer)

where i is a 64 bit int with arg8 as the high 32 and 0 as the low 32,
and j is a 64 bit int with arg4 as the high 32 and arg0 as the low 32.

If j>=i, the result is 0xFFFFFFFF.

This does seem to make vague sense if you think about what the subtraction/bit setting is doing, but I agree it's not intuitive by staring at the code!

It's possible some point of my conversion is screwed up (the floor/multiplicand might be slightly different... and negative values for i/j might have odd behaviour), but I'm pretty certain that's the general idea. I haven't tried directly using the asm you pasted; I suggest you do that yourself to confirm that I'm not talking rubbish (I reserve the right to be completely wrong).

So what's the point of this? It gives a nice way to compare j and i without using floating points. What it's used for specifically here is up to you to find out :-) I suggest investigating how arg8 is calculated as a good first step, and of course also what it does with the result.

Cheers,

Will

Extrarius

April 2nd, 2006, 15:41

Here is how the variable used for Arg8 is initialized. I must have been looking at the wrong place previously, because this code is quite understandable =-)

Code:

Time_500ms proc near

   var_4 = dword ptr -4



   push ebp

   mov ebp, esp

   push 8

   pop eax

   call __alloca_probe



   push esi

   push edi

   call _clock

   mov edi, eax

   rdtsc

   mov [ebp+var_4], edx

   mov esi, eax



Wait_500ms_Loop:

   call _clock

   sub eax, edi

   cmp eax, 1F4h

   jl short Wait_500ms_Loop

   

   rdtsc

   sub eax, esi

   pop edi

   mov [ebp+var_4], edx

   shl eax, 1

   pop esi

   leave

   retn

Time_500ms endp



;In 'main'

   call Time_500ms

   mov dword_5E5978, eax

;Later in 'main'

   mov eax, dword_5E5978

   shr eax, 6

   mov d_Unk_Arg8, eax

Here are three example calls to the function UNK_32_Rounds (my temporary name for the function of interest) :

Code:

   rdtsc

   mov edi, d_TimeStamp_Low

   mov ebx, d_TimeStamp_High

   sub eax, edi

   sbb edx, ebx

   mov ebp, eax

   mov eax, edx

   and eax, 80000000h

   xor ecx, ecx

   or eax, ecx

   jz short loc_42B6D7



   rdtsc

   mov edi, eax

   mov ebx, edx

   mov d_TimeStamp_High, ebx

   mov d_TimeStamp_Low, edi

   xor eax, eax

   jmp short loc_42B6E7



loc_42B6D7:

   mov eax, d_Unk_Arg8

   push eax

   push edx

   push ebp

   call UNK_32_Rounds



   add esp, 0Ch

   mov [esi+0FCh], eax

Code:

   rdtsc

   mov ebx, d_TimeStamp_Low

   mov esi, d_TimeStamp_High

   sub eax, ebx

   sbb edx, esi

   mov esi, eax

   mov eax, edx

   and eax, 80000000h

   xor ecx, ecx

   or eax, ecx

   jz short loc_44EADC



   rdtsc

   mov d_TimeStamp_Low, eax

   mov d_TimeStamp_High, edx

   xor eax, eax

   jmp short loc_44EAEC



loc_44EADC:

   mov eax, d_Unk_Arg8

   push eax

   push edx

   push esi

   call UNK_32_Rounds



   add esp, 0Ch

   mov dword_5E597C, eax

   mov [ebp+4594h], eax

Code:

   rdtsc

   mov edi, d_TimeStamp_Low

   mov esi, d_TimeStamp_High

   sub eax, edi

   sbb edx, esi

   mov esi, eax

   mov eax, edx

   and eax, 80000000h

   xor ecx, ecx

   or eax, ecx

   jz short loc_41962C



   rdtsc

   mov d_TimeStamp_Low, eax

   xor eax, eax

   pop edi

   mov d_TimeStamp_High, edx

   mov dword_5E597C, eax

   mov dword_19D7B10, eax

   pop esi

   retn 4



loc_41962C:

   mov ecx, d_Unk_Arg8

   push ecx

   push edx

   push esi

   call UNK_32_Rounds



   add esp, 0Ch

   pop edi

   mov dword_5E597C, eax

   mov dword_19D7B10, eax

   pop esi

   retn 4

I really apprecaite your help with this. This is one of the most helpful forums I've seen =-)
If you have any pointers on general reading materials / resources etc to help with reversing, I'd apprecaite such information also. So far, I have a few assembly books (including intel's instruction set manuals) and I have the book "Reversing: Secrets of Reverse Engineering" which was somewhat helpful (mostly as a guide to assembly idioms - I know assembly well enough to do basic things, but am not proficient enough to always see the patterns).

wtbw

April 2nd, 2006, 15:58

Well, this is okay :-)

Time_500ms is getting the number of processor ticks in 1 second (notice the shl 1 at the end).

It then passes this as arg8 (which as I said above is the *high* dword for i).

A difference between two tickstamps is passed as j.

Therefore the equation now becomes:

ticksindifference/(tickspersecond*0x100000000) * 0x100000000

So the function is returning the number of seconds that has passed between the two tickstamps! (I call them tickstamps because they're measuring processor ticks returned by rdtsc, not an actual "time"

As it seems to have access to a _clock function anyway, I would guess that this is some sort of antidebugging thing.

*edit* As for reading materials, I think the best way is just intuition and experience... follow the registers/variables, make notes, try sample inputs, make predictions and guesses, analyse further. Assembly is just the means, so the manuals and so on will help you learn the syntax and meanings, but reversing is hard to learn from books I think. Maybe I just haven't found a good one!

*edit2* Oh wait, I didn't see the shr eax, 6 before it sets the global... fine, it measures 64ths of seconds then

Cheers,

Will

Extrarius

April 2nd, 2006, 17:21

Thanks for the help! As far as being some kind of anti-debugging, it's possible, but I really don't think so because the program doesn't seem to have anything of that sort (besides being poorly programmed and HUGE), but it is possible I suppose. As far as I can tell, the result is simply stored away and then never referenced again (at least, all the references IDA Freeware finds to the two globals "dword_5E597C" and "dword_19D7B10" are operations storing values into them). It'll take a while to trace down the object referenced by esi, but perhaps it's the only result actually used and the other two variables are 'global temporaries' left by the optimizer or something like that? I think one of my biggest obstacles is simply disbelief that optimzers really are so bad in some cases that are obvious to me when looking at the dissassmbly.

wtbw

April 2nd, 2006, 17:27

Well, does it ever use those values as constants for indirect access later? Or look in IDA around them, maybe they're accessed as members of a struct or array or something (although if they're being written to it's unlikely).

Try using IDA's search for immediate function with the two offsets +FCh and +4594h, maybe that will find something. Or run the application if you can, and use hardware breakpoints on memory access on all the places the results are stored.

Good luck

Extrarius

April 2nd, 2006, 17:49

Well, the best evidence is that I can debug it just fine with OllyDbg without any problems, and I've been using that along with IDE Freeware the whole time =-)
The reason I needed to know what it was is simply to fit a puzzle piece to a hole that could have been important. Since it's not what I thought, it's not as important, but at least I know I need to keep looking.

The root problem is that I'm trying to figure out a network protocol, which from what people tell me is the most difficult thing to do in reversing (besides reversing anti-reversing code =-), and in logging packets I noticed that the logon packet contains the user name as plaintext, but the password is nowhere to be found. I know it's hashed using MD5, but even the MD5 doesn't appear anywhere. I thought perhaps this function was involved in modifying the password, but writing this thread (and your responses) helped me understand that this is really a peripheral function to my quest =-)

I'll probably be back with more obscure code, though, because this executable is full of such weird things (that might not be weird at all in the grand scheme, but are certainly odd to me =-)

LLXX

April 2nd, 2006, 19:22

Maybe the code is just testing the speed of the processor?

Quote:

[Originally Posted by Extrarius]and in logging packets I noticed that the logon packet contains the user name as plaintext, but the password is nowhere to be found. I know it's hashed using MD5, but even the MD5 doesn't appear anywhere. I thought perhaps this function was involved in modifying the password

A second layer of hashing with a random "seed" might be involved, so that someone cannot simply resend the login packet to login without knowing the password. If the login packet is different every time, this might be it.

Maximus

April 3rd, 2006, 07:44

Quote:

[Originally Posted by Extrarius]I know it's hashed using MD5, but even the MD5 doesn't appear anywhere.

This is impossible. You cannot (almost) md5 without its fixed constants. have you tried KANAL or CryptoSearch on the unpacked code?

wtbw

April 3rd, 2006, 07:46

I think he means that the md5 hash does not appear in the login packet anywhere.

Extrarius

April 3rd, 2006, 19:53

Quote:

[Originally Posted by wtbw]I think he means that the md5 hash does not appear in the login packet anywhere.

Indeed. The constants were how I identified it as MD5, but even after doing so and seeing that the password is hashed using standard MD5 (at least, it matched the values returned by a few online hash scripts), I couldn't find any trace of the password (in any form) in the logon packet.

wtbw

April 3rd, 2006, 21:12

Have you tried putting a hardware breakpoint on access on the resulting md5? See where it goes and what it does with it.

Are the packets the same each time or different (with the same user/pass)?