Tutorial: finding encryption code


mike
March 13th, 2002, 16:36
Here's a short tutorial on finding encryption code within a program.

First, you have to know what you're looking for. Is it hashing a password? Is it encrypting a file? Is it doing a digital signature?

Read any docs on the security of the product; often they'll name specific algorithms, which will narrow your search by a lot.

Hash functions

There are a few standardized hashes that are used all the time. MD5 and SHA are two of them. A hash function takes a block of data and mixes it into some state registers. The final state of the registers is output as the hash. The initial state, though, is always the same. You can search the exe for these constants to find the hashing code.
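The constant search described above can be sketched in a few lines of Python. This is only an illustration: the four MD5 initial state words (which SHA-1 shares, adding a fifth) are standard, but the function name, the labels and the sample buffer are made up here, and a real exe would be read from disk instead.

```python
import struct

# Initial state words of MD5; SHA-1 uses the same four plus a fifth.
INIT_WORDS = {
    0x67452301: "MD5 A / SHA-1 h0",
    0xEFCDAB89: "MD5 B / SHA-1 h1",
    0x98BADCFE: "MD5 C / SHA-1 h2",
    0x10325476: "MD5 D / SHA-1 h3",
    0xC3D2E1F0: "SHA-1 h4 only",
}

def find_hash_constants(data: bytes):
    """Return sorted (offset, word, label) for each little-endian occurrence."""
    hits = []
    for word, label in INIT_WORDS.items():
        needle = struct.pack("<I", word)   # x86 stores immediates little-endian
        pos = data.find(needle)
        while pos != -1:
            hits.append((pos, word, label))
            pos = data.find(needle, pos + 1)
    return sorted(hits)

# Hypothetical stand-in for an exe image: 8 filler bytes, then the MD5 state.
sample = b"\x90" * 8 + struct.pack(
    "<4I", 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)

for offset, word, label in find_hash_constants(sample):
    print(f"{offset:#06x}: {word:08X} ({label})")
```

Four hits a few bytes apart, as in an init function that stores them one after another, are a much stronger signal than a single isolated match.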

Hashing code is also full of small nonlinear functions; lots of and's, or's, and xor's, very densely packed. Hashing code doesn't have jumps--it's pages and pages of bit operations.

A hash will nearly always have three calls associated with it: the init function, the update function, and the finalize function. Finalize pads the end of the data out to a certain size and appends the length in bits to form the last block. This prevents certain attacks.
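As a rough sketch of what the finalize step appends, here is the MD5 padding rule in Python: a 0x80 marker byte, zero bytes up to 56 mod 64, then the message length in bits as a 64-bit little-endian value (SHA-1 does the same but stores the length big-endian). The function name is made up for this example.

```python
import struct

def md5_padding(message_len: int) -> bytes:
    """Bytes that MD5's finalize step appends to a message_len-byte message."""
    zero_count = (55 - message_len) % 64               # zeros after the 0x80 marker
    bit_len = (message_len * 8) & 0xFFFFFFFFFFFFFFFF   # length in bits, mod 2^64
    return b"\x80" + b"\x00" * zero_count + struct.pack("<Q", bit_len)

for n in (0, 3, 55, 56, 64):
    pad = md5_padding(n)
    # The padded total is always a whole number of 64-byte blocks.
    print(n, len(pad), (n + len(pad)) % 64)
```

In a disassembly this shows up as a write of 80h into the buffer followed by a shift-left-by-3 of the byte count, which is another handy landmark.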

Stream ciphers

Encrypting a file is often done with a stream cipher like RC4. RC4 has an initialization loop where it fills a 256-byte block with the values 0-255.
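For reference, a minimal RC4 in Python showing the 0-255 fill loop, the key-scheduling swap loop and the keystream generator. The function names are chosen for this sketch; a compiled implementation will look different but keeps the same three loops.

```python
def rc4_keystream(key: bytes, n: int) -> bytes:
    """Generate n keystream bytes from key."""
    S = list(range(256))            # the 0..255 fill loop to look for
    j = 0
    for i in range(256):            # key scheduling: 256 swaps driven by the key
        j = (j + S[i] + key[i % len(key)]) & 0xFF
        S[i], S[j] = S[j], S[i]
    i = j = 0
    out = bytearray()
    for _ in range(n):              # output loop: one swap per keystream byte
        i = (i + 1) & 0xFF
        j = (j + S[i]) & 0xFF
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) & 0xFF])
    return bytes(out)

def rc4_crypt(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, rc4_keystream(key, len(data))))

ciphertext = rc4_crypt(b"Key", b"Plaintext")
print(ciphertext.hex())
print(rc4_crypt(b"Key", ciphertext))
```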

Also, stream ciphers are going to XOR the plaintext with the random output bytes. I found the encryption code for WordPro by dumping the exe in WDASM. I wrote a small program to search for XORs where the first and second parameters weren't the same. I only had to glance at the code surrounding about 20 of them before I found it.
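The XOR filter mentioned above could look roughly like this in Python. The listing format (address, mnemonic, operands, optional `;` comment) and the sample lines are assumptions made for the sketch; the original program and the exact WDASM output are not shown in this thread.

```python
import re

# Matches "xor <op1>, <op2>" and ignores a trailing ";" comment.
XOR_RE = re.compile(r"\bxor\s+([^,;]+?)\s*,\s*([^;]+?)\s*(?:;.*)?$", re.IGNORECASE)

def interesting_xors(listing: str):
    """Return (line_number, text) for XORs whose two operands differ."""
    hits = []
    for lineno, line in enumerate(listing.splitlines(), 1):
        m = XOR_RE.search(line)
        # "xor eax, eax" only zeroes a register, so skip identical operands.
        if m and m.group(1).lower() != m.group(2).lower():
            hits.append((lineno, line.strip()))
    return hits

# Hypothetical listing fragment, not taken from any real program.
LISTING = "\n".join([
    ":00401000 xor eax, eax",
    ":00401002 mov ecx, [ebp+8]",
    ":00401005 xor al, [esi+ecx]",
    ":00401008 xor edx, edx ; clear counter",
])

for lineno, text in interesting_xors(LISTING):
    print(lineno, text)
```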

Block ciphers

There are also a few popular block ciphers that get used a lot. DES used to be popular; Blowfish and IDEA are some others, and now that AES is in place, Rijndael will become much more popular. To make these run fast, implementations usually rely on precomputed tables (especially Blowfish). You can search the exe for the tables from these popular implementations.

Block ciphers generally work on 64-bit or 128-bit chunks of data in a big loop.

Public Key

Public Key crypto is only done well with bignum code. Operations nearly always include modular exponentiation. Modular exponentiation is done in a loop over the bits of the exponent, where the number is squared, conditionally multiplied, and then reduced.
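The loop described above is the classic left-to-right square-and-multiply. A small Python sketch, using built-in integers in place of bignum code (the function name is made up for this example):

```python
def modexp(base: int, exponent: int, modulus: int) -> int:
    """Left-to-right square-and-multiply modular exponentiation."""
    result = 1 % modulus
    for bit in bin(exponent)[2:]:                   # exponent bits, MSB first
        result = (result * result) % modulus        # square, then reduce
        if bit == "1":
            result = (result * base) % modulus      # conditional multiply, reduce
    return result

# Cross-check against Python's built-in three-argument pow().
print(modexp(4, 13, 497), pow(4, 13, 497))
```

In a disassembly the giveaway is the same shape: a loop that tests one exponent bit per iteration and calls a big multiply routine once or twice followed by a reduce.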

Good luck, and happy cracking!

crUsAdEr
March 13th, 2002, 22:19
Thanx Mike,

Was looking for something like this myself :>...

Keep up the great work :>

Cheers

crUsAdEr
March 28th, 2002, 23:09
The Advanced Archive Password Recovery (2.0) story

As requested by Mike, here is my little contribution to the community: how i cracked Archpr 2.0... as this is only about crypto, i will not discuss any other aspect besides the encryption routine..

The app is AsProtected so we have to unpack it first... After unpacking, trying to register the app you will hit the registration check routine at 40A071... I patched it so that it always returns eax=1; the app then looks registered, but trying to bruteforce a 6-character Zip password will cause a crash....

Looking into the registration routine again, you will see this
.text:0041A0A0 lea ebx, [eax-2]
.text:0041A0A3 lea eax, [ebp+var_6C]
.text:0041A0A6 call Call1
.text:0041A0AB lea edi, [esi+4]
.text:0041A0AE mov edx, edi
.text:0041A0B0 lea eax, [ebp+var_6C]
.text:0041A0B3 call Call2
.text:0041A0B8 lea edx, [ebp+var_6C]
.text:0041A0BB lea eax, [ebp+var_10]
.text:0041A0BE call Call3
.text:0041A0C3 mov ebx, 10h
.text:0041A0C8 mov edx, offset unk_43A160
.text:0041A0CD lea eax, [ebp+var_10]
.text:0041A0D0 call Compare10bytes
.text:0041A0D5 test eax, eax
.text:0041A0D7 jnz short bad
.text:0041A0D9 lea eax, [ebp+var_6C]
.text:0041A0DC call Call1
.text:0041A0E1 mov ebx, ecx
.text:0041A0E3 mov edx, esi
.text:0041A0E5 lea eax, [ebp+var_6C]
.text:0041A0E8 call Call2
.text:0041A0ED lea edx, [ebp+var_6C]
.text:0041A0F0 lea eax, [ebp+var_10]
.text:0041A0F3 call Call3
.text:0041A0F8 lea eax, [ebp+var_10]
.text:0041A0FB call FindEqualHash
.text:0041A100 test eax, eax
.text:0041A102 jz exit

Looking into Call1, we'll see this
.text:0041A58B mov dword ptr [eax], 67452301h
.text:0041A591 mov dword ptr [eax+4], 0EFCDAB89h
.text:0041A598 mov dword ptr [eax+8], 98BADCFEh
.text:0041A59F mov dword ptr [eax+0Ch], 10325476h
.text:0041A5A6 mov dword ptr [eax+10h], 0
.text:0041A5AD mov dword ptr [eax+14h], 0
.text:0041A5B4 mov dword ptr [eax+58h], 0
.text:0041A5BB retn

yep, looks like some kind of hashing... refer to Mike's note
*******************************************
A hash will nearly always have three calls associated with it: the init function, the update function, and the finalize function.
*******************************************
Looking into Call2, we will see that it is "also full of small nonlinear functions; lots of and's, or's, and xor's, very densely packed. Hashing code doesn't have jumps--it's pages and pages of bit operations."... it only makes a few calls to the same routine
call sub_41A9E2

Looking into this call, we will see more bit operations and lots of constants
2895b588
173848AA
242070DB
3E423112
0A83F051
4787C62A
etc...

To quote Dakien
*****************************************
You'll notice that NEG(2895B588h)=D76A4A78h. (The same goes for the other values you've given us)
The hash uses Sub EDI,2895B588 instead of Add EDI,D76A4A78. The only thing which has changed is the sign, not the result.
*****************************************

Yep, so this definitely helps us identify that it is MD5, no more doubt about it....

So now we can safely rename Call1, Call2, Call3 to MD5Init, MD5Main and MD5Finalise...

Now change that in IDA and we'll see what the registration routine is doing: first it checks the MD5 hash of the middle part of our serial against a fixed hardcoded hash, then it finds the MD5 hash of the whole serial and compares it with a lookup table containing something like 1000 hardcoded hashes... if either check fails, an error message pops up.. if both pass, then the AsProtect API is called to decrypt code...
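Dakien's observation is easy to check. The sketch below compares the six posted constants against the first six entries of the MD5 sine table from RFC 1321 (T[1]..T[6]), both as-is and negated modulo 2^32. Two things stand out: values below 80000000h, such as 242070DB and 4787C62A, appear unnegated (plain adds), and the first value as posted (2895B588) negates to D76A4A78, which is not a table entry. With two digits swapped (28955B88) it negates to T[1] = D76AA478, so a transcription slip looks likely, though that is only a guess. The helper names are made up for this example.

```python
# First six MD5 additive constants T[1]..T[6] from RFC 1321.
MD5_T = [0xD76AA478, 0xE8C7B756, 0x242070DB, 0xC1BDCEEE, 0xF57C0FAF, 0x4787C62A]

# Constants as posted in the thread.
POSTED = [0x2895B588, 0x173848AA, 0x242070DB, 0x3E423112, 0x0A83F051, 0x4787C62A]

def neg32(x: int) -> int:
    """Two's-complement negation of a 32-bit value (what NEG does)."""
    return (-x) & 0xFFFFFFFF

def classify(c: int) -> str:
    """Say whether c is an MD5 constant as-is, negated, or neither."""
    if c in MD5_T:
        return f"T[{MD5_T.index(c) + 1}] as-is (add)"
    if neg32(c) in MD5_T:
        return f"-T[{MD5_T.index(neg32(c)) + 1}] (sub)"
    return "no match"

for c in POSTED:
    print(f"{c:08X}  neg={neg32(c):08X}  {classify(c)}")

# The same check with the suspected digit swap undone.
print(f"28955B88  neg={neg32(0x28955B88):08X}  {classify(0x28955B88)}")
```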

---------------------------------------------
to be cont.

crUsAdEr
March 28th, 2002, 23:27
Okie, now after we find the AsProtect API routine and step into it, we'll see
heap:00B8C64C push ebp
heap:00B8C64D mov ebp, esp
heap:00B8C64F push ebx
heap:00B8C650 push esi
heap:00B8C651 mov esi, [ebp+0Ch]
heap:00B8C654 mov ebx, [ebp+8]
heap:00B8C657 push esi
heap:00B8C658 mov ecx, ebx
heap:00B8C65A mov edx, ds:length ; edx := 9C
heap:00B8C660 mov eax, ds:offset
heap:00B8C665 call sub_B8C470 ;STOP here
heap:00B8C66A push esi
heap:00B8C66B push ebx
heap:00B8C66C call sub_B8C678
heap:00B8C671 pop esi
heap:00B8C672 pop ebx
heap:00B8C673 pop ebp
heap:00B8C674 retn 8

Okie, trace until line B8C665... look at edx and we'll see edx = 9Ch, look at the data pointed to by eax and we'll see some garbage, look at ecx and we'll see part of our serial.... Okie, trace over the call and you will notice the data pointed to by eax has changed.... Name the call sub_B8C470 as Call Something

Continue tracing, try tracing over the second call and the prog will crash... so restart again... trace into the second call this time, you will see this
heap:00B8C678 push ebp
heap:00B8C679 mov ebp, esp
heap:00B8C67B push ebx
heap:00B8C67C push esi
heap:00B8C67D push edi
heap:00B8C67E mov edi, offset off_B93554
heap:00B8C683 mov ebx, ds:offset
heap:00B8C689 mov eax, ds:length
heap:00B8C68E mov ecx, 0Ch
heap:00B8C693 cdq
heap:00B8C694 idiv ecx
heap:00B8C696 mov esi, eax ; esi := Dh
heap:00B8C696 ; Number of segment of original code to be decrypted
heap:00B8C698 test esi, esi
heap:00B8C69A jle short decryptDONE
heap:00B8C69C
heap:00B8C69C loc_B8C69C: ; CODE XREF: sub_B8C678+69j
heap:00B8C69C mov eax, [edi]
heap:00B8C69E mov eax, [eax]
heap:00B8C6A0 add eax, [ebx]
heap:00B8C6A2 inc eax
heap:00B8C6A3 mov dword ptr [eax], 1
heap:00B8C6A9 mov eax, [edi]
heap:00B8C6AB mov eax, [eax]
heap:00B8C6AD add eax, [ebx+4]
heap:00B8C6B0 mov edx, [ebx+8]
heap:00B8C6B3 call sub_B8C63C
heap:00B8C6B8 mov eax, [ebp+arg_4]
heap:00B8C6BB push eax
heap:00B8C6BC mov eax, [edi]
heap:00B8C6BE mov eax, [eax]
heap:00B8C6C0 add eax, [ebx+4]
heap:00B8C6C3 mov ecx, [ebp+arg_0]
heap:00B8C6C6 mov edx, [ebx+8]
heap:00B8C6C9 call Something
heap:00B8C6CE mov eax, [edi]
heap:00B8C6D0 mov eax, [eax]
heap:00B8C6D2 add eax, [ebx+4]
heap:00B8C6D5 mov edx, [ebx+8]
heap:00B8C6D8 call sub_B8C644
heap:00B8C6DD add ebx, 0Ch
heap:00B8C6E0 dec esi
heap:00B8C6E1 jnz short loc_B8C69C
heap:00B8C6E3
heap:00B8C6E3 decryptDONE: ; CODE XREF: sub_B8C678+22j
heap:00B8C6E3 mov al, 1
heap:00B8C6E5 pop edi
heap:00B8C6E6 pop esi
heap:00B8C6E7 pop ebx
heap:00B8C6E8 pop ebp
heap:00B8C6E9 retn 8

Hmm, that Call Something is called again; tracing down, you will see that the value 9Ch (remember this?) is divided by Ch, which equals Dh, and is stored in esi...

Okie, trace down a bit more, we'll see that ebx is pointing to the garbage that eax was pointing to just now; the first dword is added to eax, eax was originally 400000 (image base), then a value of 1 will be moved into the data pointed to by eax, most likely ur prog will crash here when u execute this instruction ... so again, restart the app and trace, skip that instruction at B8C6A3, trace until you reach that Call Something again, we can see that again the second and third dwords from the garbage we saw earlier are passed on as parameters... stop here... so now we can decide that Call Something is a decrypt call that decrypts the data pointed to by eax... length stored in edx and key stored in ecx....
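One way to picture the loop above: the 9Ch bytes of "garbage", once decrypted, form a table of Dh records of three dwords each (0Ch bytes). Going by the listing, the first dword is an offset where a 1 gets written, the second is the offset of a region to decrypt and the third is its length. The Python sketch below parses such a table; the field meanings are a reading of the listing, and the table contents are made up.

```python
import struct

RECORD = struct.Struct("<3I")   # three little-endian dwords = 0Ch bytes

def parse_records(table: bytes):
    """Split a decrypted table into (flag_offset, region_offset, length) tuples."""
    count = len(table) // RECORD.size          # 9Ch // 0Ch = 0Dh in the listing
    return [RECORD.unpack_from(table, i * RECORD.size) for i in range(count)]

# Made-up table of 0x9C bytes standing in for the real decrypted data.
table = b"".join(
    RECORD.pack(0x1000 + i, 0x2000 + 0x100 * i, 0x80) for i in range(0xD))

records = parse_records(table)
print(hex(len(table)), "bytes ->", hex(len(records)), "records")
for flag_off, region_off, length in records[:3]:
    print(f"flag at +{flag_off:X}, region at +{region_off:X}, {length:X} bytes")
```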

Okie, rename Call Something to Call Decrypt and start tracing into it again....

-----------------------------------------------------
to be cont.

crUsAdEr
March 28th, 2002, 23:47
Phew, okie almost there... this is more stressful than cracking, i just don't know whether i should post more code, or write more explanation.. ah well....

Trace into Call Decrypt we'll see something like this in the middle
heap:00B8C49B mov ebx, eax
heap:00B8C49D mov eax, ebx ; Working Area Offset
heap:00B8C49F mov edx, [eax]
heap:00B8C4A1 call dword ptr [edx] ; MD5 Initialise??
heap:00B8C4A3 mov edx, edi ; serial
heap:00B8C4A5 mov ecx, [ebp+arg_0] ; ecx = 10
heap:00B8C4A8 mov eax, ebx ; Working Area Offset
heap:00B8C4AA mov edi, [eax]
heap:00B8C4AC call dword ptr [edi+4] ; call B84E90
heap:00B8C4AF mov eax, ebx
heap:00B8C4B1 mov edx, [eax]
heap:00B8C4B3 call dword ptr [edx+8] ; call B84E08
heap:00B8C4B6 push 0
heap:00B8C4B8 mov eax, ebx
heap:00B8C4BA mov edx, [eax]
heap:00B8C4BC call dword ptr [edx+0Ch] ; add eax, 48
heap:00B8C4BF push eax
heap:00B8C4C0 mov eax, [ebx]
heap:00B8C4C2 call dword ptr [eax+10h] ; mov eax, 10
heap:00B8C4C5 mov ecx, eax ; ecx = 10
heap:00B8C4C7 mov eax, esi ; eax : offset of some procedure
heap:00B8C4C9 pop edx ; edx = hashstring
heap:00B8C4CA mov edi, [eax]
heap:00B8C4CC call dword ptr [edi+14h] ; call B89418
heap:00B8C4CF mov edx, [ebp+var_8]
heap:00B8C4D2 push edx ; length to be decrypted
heap:00B8C4D3 mov eax, [ebp+var_4]
heap:00B8C4D6 mov ecx, eax ; area to be decrypted
heap:00B8C4D8 mov edx, eax
heap:00B8C4DA mov eax, esi

All the calls [edx], [edx+4].. etc .. [edi+14] look suspicious, so i traced into each of them... looking at the first 3 calls we'll see familiar MD5 routines again.. so it is MD5?

The next 2 calls don't do anything important.. i am not sure what the last call does so forget it for now...

After that, a few instruction down, you should see this,
heap:00B8C4D6 mov ecx, eax ; area to be decrypted
heap:00B8C4D8 mov edx, eax
heap:00B8C4DA mov eax, esi
heap:00B8C4DC call sub_B89050

yeah, step over this call and you will see our data in the area to be decrypted change, so trace again and step into it... keep pressing F8 and watch out for "non-linear operations", then you will see how the decryption is done.... fairly simple, the algo is posted in the other thread... figuring out the algo, we can identify that it is TEA.... chained block cipher with 2 keys... again refer to the thread "stream cipher??" for details on the algo...

Okie, i hope you have read the other thread, now we need to identify how the Dynamic Key is initialised... do a bpm on the key and re-run the app, it will break and after a bit of tracing you will realise that you are in the call [edi+14] (mentioned above)... so now we can identify this last call as DynamicKeyGeneration...

Tracing through the DynamicKeyGeneration routine, you will realise that it is similar to the TEA decryption routine with some small changes; work out the algo and we can see that it is the reverse of the main TEA decryption, so this is the TEA encryption routine...
So now we know that the dynamic key is generated by encrypting "FFFFFFFFFFFFFFFF" with our static key... so all we need to find is the static key and everything is solved... note this static key is an MD5 hash of part of our serial...
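For readers who have not seen TEA before, here is the textbook algorithm in Python, plus a sketch of the key derivation described above. Several details are assumptions and not confirmed by this thread: 32 rounds, the MD5 digest split into four little-endian dwords, and "FFFFFFFFFFFFFFFF" taken to mean eight FFh bytes. The serial fragment is a placeholder, not a real one, and the chaining used by the app is left out.

```python
import hashlib
import struct

DELTA = 0x9E3779B9      # TEA's magic constant, a good search target in an exe
MASK = 0xFFFFFFFF

def _mix(v, s, ka, kb):
    """TEA round function: shifts, adds and XORs, no table lookups."""
    return (((v << 4) + ka) ^ (v + s) ^ ((v >> 5) + kb)) & MASK

def tea_encrypt(v0, v1, k, rounds=32):
    s = 0
    for _ in range(rounds):
        s = (s + DELTA) & MASK
        v0 = (v0 + _mix(v1, s, k[0], k[1])) & MASK
        v1 = (v1 + _mix(v0, s, k[2], k[3])) & MASK
    return v0, v1

def tea_decrypt(v0, v1, k, rounds=32):
    s = (DELTA * rounds) & MASK
    for _ in range(rounds):                      # same steps in reverse order
        v1 = (v1 - _mix(v0, s, k[2], k[3])) & MASK
        v0 = (v0 - _mix(v1, s, k[0], k[1])) & MASK
        s = (s - DELTA) & MASK
    return v0, v1

# Static key = MD5 of part of the serial (placeholder text, assumed byte order).
serial_part = b"CHPR-000000000000-"
static_key = struct.unpack("<4I", hashlib.md5(serial_part).digest())

# Dynamic key = encryption of an all-FF block under the static key (assumed).
dynamic = tea_encrypt(0xFFFFFFFF, 0xFFFFFFFF, static_key)
print([f"{w:08X}" for w in dynamic])
print(tea_decrypt(*dynamic, static_key) == (0xFFFFFFFF, 0xFFFFFFFF))
```

Encryption adds DELTA to the sum each round and decryption subtracts it, which is the quickest way to tell the two routines apart while tracing.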

So now, the task is laid out for you: FIND the TEA key... with this key we can decrypt the code, and dump our decrypted code to obtain a fully functional version...

You have 2 choices, bruteforce it... the key pattern for version 1.0 is ARCHPR-xxxxxxxxxxxx-xxxxxxxxxxxxxx :>>> I tried bruteforcing 6 digits and it took me 14 mins, imagine 12 or 16 digits :>... OR... the second solution is to find the TEA key that the forgetful programmer "FORGOT" to remove from the exe file :>>>
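To put the brute-force option in numbers, the sketch below scales the 14-minute figure for 6 characters up to longer lengths. It assumes the unknown characters are decimal digits (10 possibilities each) and that the test rate stays constant; a larger alphabet only makes things worse. The function name is made up for this example.

```python
def scaled_minutes(known_len, known_minutes, target_len, alphabet=10):
    """Scale a measured brute-force time to a longer key of the same alphabet."""
    return known_minutes * alphabet ** (target_len - known_len)

MINUTES_PER_YEAR = 60 * 24 * 365

for n in (12, 16):
    minutes = scaled_minutes(6, 14, n)
    print(f"{n} digits: about {minutes / MINUTES_PER_YEAR:,.1f} years")
```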

That is it... i don't know if anyone actually bothers to read all this crap... think i don't know how to explain myself well... but HEY, i try :>... so keep cracking... :>

Cheers,

P.S : Off for a beer, i never typed so much before, i am doing all this because i am grateful to all the dudes here who have helped me so much in the last 1.5 months since i joined this forum... all i can tell u, RCE Forums rocks... nowhere else are ppl friendlier and more helpful... ok, enuff.... see ya around

Woodmann
March 29th, 2002, 00:12
Howdy Binh81,

Now you can re-write it and have it published
for a tutorial

Peace, Woodmann

radiant
April 18th, 2002, 23:31
Hi binh81,

I've been following all this very closely, printed both threads (stream cipher) and read over them more times than I care to admit. There is one thing that still doesn't make sense to me.

You said:
****************************************************
"The app is AsProtected so we have to unpack it first... After unpacking, trying to make the app register..."
****************************************************

When I "unpack" ASProtected programs, I normally remove the ASPR code completely. The "heap:00??????" code you posted really has me wondering if ASProtect has really been removed?

The documentation for ASProtect states it has features to create encrypted portions of code that are only decrypted in "Registered User" mode. The real question is if the encrypted portions of the program are being decrypted by ASProtect or being decrypted by custom code written by the application programmers?

As fox3 pointed out in the RSA thread, the typical key for an ASProtected program is a Base64 encoded string that yields 129 bytes (decimal) when it's decoded. A portion of the 129 bytes must be used to store text information like user name, email or whatever.
Unless there is a way to actually generate the "text" from a PK signature, then I doubt it's a signature like he suggests. On the other hand, if the code snippet you posted is actually ASProtect doing the work, then another portion of the ASProtect key should be used to generate the MD5 hash creating the "static" TEA key.

In other words, I'm betting the "Registration Code" (a.k.a. "serial") box in the program is actually expecting an ASProtect key.

For some strange reason I want to remember you stating that only a portion of the "serial" was used in the MD5 to create the TEA key but I can't find the exact statement... -Maybe it's just wishful thinking.

If I wasn't dreaming and you really did say only a portion of the "serial" is used in the MD5 hash to generate the TEA key, could you tell me what part of it was used and whether or not it was Base64 decoded first?

Anyhow, if you look at the project file created by ASProtect when it's used to protect programs, you'll see a set of keys:

[Keys]
A=xKjEkcBL7nU1LRb8wP5yyQ==
E=d7D30I9tXwsrvS <snip> 5Qnoyzs=
D=B4/7wm5aaC6rx <snip> ERsbK8=
N=tXM5G3gsRSyp4 <snip>noaLp64xMrrA=

All of them are Base64 encoded and the latter three are long (probably 129 bytes but I haven't checked) like the registration keys. The "A" key is short and yields 16 bytes on decode... possibly used for the hash generating the 128-bit TEA key?
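The lengths are quick to check in Python. The "A" value from the project file does decode to 16 bytes. The two helper functions (names made up here) show what Base64 text a payload of a given size produces: 128 and 129 bytes both encode to 172 characters, but only the 128-byte case ends in a single "=". The snipped E, D and N strings above each end in one "=", which fits a length of 3k+2 bytes such as 128 rather than 129, though the snips make it impossible to confirm here.

```python
import base64

a_key = base64.b64decode("xKjEkcBL7nU1LRb8wP5yyQ==")
print(len(a_key), "bytes =", len(a_key) * 8, "bits")

def b64_length(n_bytes: int) -> int:
    """Number of Base64 characters (padding included) for n_bytes of data."""
    return 4 * ((n_bytes + 2) // 3)

def b64_pad_chars(n_bytes: int) -> int:
    """Number of trailing '=' characters for n_bytes of data."""
    return (3 - n_bytes % 3) % 3

for n in (16, 128, 129):
    print(n, "bytes ->", b64_length(n), "chars,", b64_pad_chars(n), "pad")
```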

Thanks to everyone for the great threads.

radiant

crUsAdEr
April 19th, 2002, 02:32
Hi radiant,

the Heap was a memory dump i took and inserted at the end of my dumped file so that i can disassemble and analyse the decryption routine... of course this is only for analysing, the final dump will be free of AsProtect... I used LordPE "partial dump" to dump these code sections from AsProtect high memory... i advise u to do so as well cos IDA definitely helps a lot... (Heap was the name of the new section i appended to the dumped file :>

+++++++++++++++++++++++++++++++++++++++++
As fox3 pointed out in the RSA thread, the typical key for an ASProtected program is a Base64 encoded string that yields 129 bytes (decimal) when it's decoded. A portion of the 129 bytes must be used to store text information like user name, email or whatever.
Unless there is a way to actually generate the "text" from a PK signature, then I doubt it's a signature like he suggests. On the other hand, if the code snippet you posted is actually ASProtect doing the work, then another portion of the ASProtect key should be used to generate the MD5 hash creating the "static" TEA key.
+++++++++++++++++++++++++++++++++++

I am not quite sure what you mean here... I have seen this MD5-TEA implementation in Reget Deluxe too, so i think this is one of the options AsProtect provides... it might provide another scheme for RSA maybe?


++++++++++++++++++++++++++++++++++++
For some strange reason I want to remember you stating that only a portion of the "serial" was used in the MD5 to create the TEA key but I can't find the exact statement... -Maybe it's just wishful thinking.

If I wasn't dreaming and you really did say only a portion of the "serial" is used in the MD5 hash to generate the TEA key, could you tell me what part of it was used and whether or not it was Base64 decoded first?
+++++++++++++++++++++++++++++++++++

LOL :>
Yeah if my memory doesn't play tricks on me, you are right... very early in the serial checking routine (in the program code section, not my Heap :>) you will see that the serial length must be even, then a substring is taken starting from the 3rd byte, its length is half of the full serial length - 2 (or was it 4)... something like that...

The serial format is something like
ARCHPR-xxxxxxxxxxxxx-xxxxxxxxxxxxxxxx
i think the part CHPR-xxxxxxxxxxxx- is used to calculate the TEA key but i might be wrong, can't recall at the moment...

Nope, i doubt there was any Base64 involved in this, not that i can remember.... cos i did not manage to find the original serial :<<... tried bruteforcing MD5 with 7 digits and it took me like half a day or something... i only found the TEA key and used it to decrypt the code section, then patched the registration routine and that is it... a full working version to be uninstalled :>>

Hope this clarifies some doubts, :>
Regards

nikolatesla20
April 19th, 2002, 05:18
Dang this stuff is above my head yet......

Wow binh81, you actually were able to decrypt some code.. I could use that in my latest dumps of **protect and **pack

Running fine, just a few encrypted code fragments still in there tho...

Always the option of adding the missing functionality myself too. Code snippet creator works wonders.

-nt20

radiant
April 19th, 2002, 06:02
As for your appended heap, I was pretty sure it was ASProtect code but I just wanted to make sure before I go leaping to incorrect conclusions. I've got some good news for you. Last weekend a new feature was added to WinHex (v10.45) that allows filling unallocated memory with "?" (0x3F)... As you can guess, this is very helpful for dealing with polymorphic code, protectors and whatever other forms of nonsense you might come across, because by keeping the addressing correct with the padding, it allows you to disassemble a "complete" dump including heap, stack and all. If you've got the disk space you can dump the entire VM to a file. With ASProtect and others, just toss the program into an endless loop with sice, pull the whole thing from memory and disassemble it in IDA, without having to find, dump and then properly load every allocated chunk of memory back into IDA. It's a real time saver.

I'm not after learning how to crack programs, instead I'm learning to do security audits on them. Unfortunately, ASProtect doesn't leave any choice but to remove the protection, so in spite of not being a cracker, I'm still learning how to crack. -I should have taken up a hobby like knitting rather than security.

"! 1, perl 2..."
-Ooops, that's supposed to be "knot one, pearl two..." :-)


Anyhow, I'm auditing a program very similar to yours. It uses ASProtect and its typical "Disabled-And-Encrypt-Function" feature. In your case you have a "serial" entry box created by the program's authors. If you look at the ASProtect API you'll see they are probably calling the SetDecryptionKey function someplace in their code. More details on this function (and code snippets) are in the help file for ASProtect.

In my case there is no place to enter a serial, so the protection is done entirely through ASProtect features and ASProtect registration keys. In other words, it's a major pain. The typical ASProtect registration keys take the form of a *.reg or *.key file (an association is made to regedit for the *.key extension) but there are actually four acceptable ways to format them (see ASProtect help file) and a couple places in the registry where they can be stuffed (HKCU, HKLM...).

These ASProtect keys, when Base64 decoded, yield 129 bytes of data. Some of the bytes are user info like name, email and whatever so they are obviously variable. But some portion of these bytes must be "fixed" so they can be used to create the 128 bit TEA key for decrypting the encrypted portions of the program. I had hopes you had figured out what bytes in the ASProtect keys are the "fixed" portion and only now realized I had my apples and oranges confused.

Yep, I tried making multiple keys, decoding them and using bin compare but as expected, it was a fruitless effort.

It's obvious there is a flaw someplace in ASProtect. There is no way that SAC of the UnpackingGods group could have performed the cryptanalysis necessary to actually break TEA and register the ASProtect program. The bytes necessary for creating the MD5-TEA key are probably stored someplace in the program, just like they are in your situation. All the same, it still doesn't make sense why the creator of ASProtect would do such a thing but who knows...

radiant

radiant
April 22nd, 2002, 10:23
Hi binh81,

Is there any chance you'd post the disassembly of the modified TEA encrypt and decrypt routines? I've managed to find all the Base64 and MD5 code but I'm not sure what I'm looking for with this modified TEA algorithm (assuming this is using the same algorithm).

Thanks,
radiant

crUsAdEr
April 22nd, 2002, 11:18
Hi radiant,

Well... i don't think it is modified TEA... i thought it was cos i was learning and i did not know what TEA looks like... so some of the earlier statements were rather inaccurate... It is just usual TEA i think... like i said somewhere in my post (if i remember correctly), when u are done with the MD5 algo, do a bpm on the MD5 hash output then u should be in the middle of the TEA routine....

Hope this helps, if not you should try to trace with F8 into the first call after the routine i posted in the third post of my long "explanation" in this thread... the first "normal" call after all the MD5 calls: call [edx], call [edi+4], call [edx+8],.. call [edi+14]

Cheers,
Binh

P.S : i will post the offset here if u still cannot find it, right now i am not at home....