Thoughts on code de-obfuscator(s)


SiNTAX
September 17th, 2002, 08:23
Has anybody done any work on code deobfuscators (i.e. for the anti-disassembly stuff that SafeDisc and the like add to code)?

I have Imhotep from AntharXes (did I spell that correctly?). It does a decent job, but not a perfect one.

Just wondering if this can simply be done with some crafty IDA scripting?!


In the SafeDisc unwrapper from R!sc there was a cleaned dump of the SD DLLs. I doubt this was all done by hand..

[yAtEs]
September 18th, 2002, 14:20
risc got a maid to clean his dlls (-;

It just takes a little time and analysis. Dead list something,
for example the SafeDisc DLLs, define as bytes whatever you
think is junk and refine the code afterwards. After a while you'll
spot the patterns. Obfuscation generally works by adding
certain bytes next to certain opcodes. For example, JMP EAX
is FF E0 (CALL EAX would be FF D0). Stick an EB in front of it
and you get EB FF E0: EB FF decodes as a jmp -1, which lands on
its own FF byte, so the CPU runs FF E0 while the disassembler
carries on after the two-byte jmp and decodes the rest wrong,
etc.

yates.
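
A minimal sketch of the EB FF E0 trick described above, in Python. It is not a real disassembler: the toy `decode` function is hypothetical and only knows the few opcodes needed here (`EB rel8`, `FF E0`, `90`). It compares a linear sweep, which decodes one instruction after the other, with a decoder that follows the jump target the way the CPU would.

```python
def rel8(b):
    """Interpret a byte as a signed 8-bit displacement."""
    return b - 0x100 if b >= 0x80 else b

def decode(code, off):
    """Toy decoder: returns (text, size, jump_target_or_None)."""
    op = code[off]
    if op == 0x90:
        return "nop", 1, None
    if op == 0xEB and off + 1 < len(code):
        # Short jump: target is relative to the end of the 2-byte instruction.
        target = off + 2 + rel8(code[off + 1])
        return f"jmp short {target:#x}", 2, target
    if op == 0xFF and off + 1 < len(code) and code[off + 1] == 0xE0:
        return "jmp eax", 2, None
    # Anything this sketch does not know is shown as a raw byte.
    return f"db {op:02X}h", 1, None

def linear_sweep(code):
    """Decode instructions back to back, ignoring control flow."""
    out, off = [], 0
    while off < len(code):
        text, size, _ = decode(code, off)
        out.append((off, text))
        off += size
    return out

def follow_flow(code, start=0):
    """Decode along the execution path, following direct jumps."""
    out, off, seen = [], start, set()
    while 0 <= off < len(code) and off not in seen:
        seen.add(off)
        text, size, target = decode(code, off)
        out.append((off, text))
        if text == "jmp eax":
            break  # indirect jump: destination is not known statically
        off = target if target is not None else off + size
    return out

CODE = bytes([0xEB, 0xFF, 0xE0])
print(linear_sweep(CODE))  # the sweep never sees FF E0 as an instruction
print(follow_flow(CODE))   # following the jump finds jmp eax at offset 1
```

The linear sweep leaves E0 as a stray byte after the jmp, while following the jump decodes the overlapping FF E0 at offset 1.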

SiNTAX
September 18th, 2002, 15:06
Yup.. I know how this stuff works, but I doubt anyone in their right mind would do it completely manually

Imhotep does a decent job at it, but not a perfect one... I think using a decent disassembler like IDA with a script file could do wonders.

The 100% automatic tools usually fail in various places.


(It would be nice if OllyDbg had something like that built-in... Now THAT would be cool!)

.. for the record ... been playing a bit with a v2.60.52 SD file (maybe get it to run in WINE one of these days)

Sure miss PROTECT / Frogsice under WinXP

cyberheg
September 18th, 2002, 17:31
I once tried a SafeWrap-protected program, which I assume uses the same type of obfuscation as SafeDisc itself. However, I was surprised how 'stupid' the obfuscation was. If the same goes for SafeDisc, I am sure it's possible to write a good anti-tool for it.

Maybe one of you could post more examples of this type of obfuscation.
In most cases obfuscation is applied to compiled code or to code that is about to be compiled, and since most engines aren't smart enough to track which registers are in use, the obfuscation mostly consists of "null instructions". From SafeWrap I remember stuff like xchg eax, edx; xchg edx, eax and push/pop series.

However, there are ways to make obfuscation more sophisticated so that it is harder to remove again.

// CyberHeg

bsod
September 18th, 2002, 22:05
well, many protectors simply use the same obfuscator code again and again, so we simply search for those bytes over the whole code section and nop them out..

bye,
bsod
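
A rough Python sketch of the search-and-nop approach described above, assuming the junk is a fixed byte sequence. The function name `nop_out` and the sample pattern (push eax / pop eax, i.e. 50 58) are only illustrations, not taken from any real protector.

```python
def nop_out(code, pattern):
    """Replace every non-overlapping occurrence of pattern with NOPs (90).

    Returns the patched bytes and the offsets that were patched.
    """
    buf = bytearray(code)
    hits = []
    start = 0
    while True:
        i = buf.find(pattern, start)
        if i < 0:
            break
        buf[i:i + len(pattern)] = b"\x90" * len(pattern)
        hits.append(i)
        start = i + len(pattern)
    return bytes(buf), hits

# push eax; pop eax; mov eax, ebx; push eax; pop eax; ret
code = bytes.fromhex("50588bc35058c3")
patched, hits = nop_out(code, bytes.fromhex("5058"))
print(hits, patched.hex())
```

A blind byte search like this can also match inside the operand of a legitimate instruction or inside data, so the hits are worth checking against a disassembly before patching.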

SiNTAX
September 19th, 2002, 08:23
SafeDisc doesn't use the exact same sequence over and over again; that would be too easy.

As for an example:

[correctly decoded version]
10007E6F loc_10007E6F: ; CODE XREF: .txt2:10007E76j
10007E6F mov ebx, ebx
10007E71 jg short loc_10007E79
10007E73 nop
10007E74 jle short loc_10007E79
10007E76
10007E76 loc_10007E76: ; CODE XREF: .txt2:10007E6Dj
10007E76 jmp short loc_10007E6F
10007E76 ; ---------------------------------------------------------------------------
10007E78 db 2Bh ; +
10007E79 ; ---------------------------------------------------------------------------
10007E79
10007E79 loc_10007E79: ; CODE XREF: .txt2:10007E71j
10007E79 ; .txt2:10007E74j
10007E79 js short loc_10007E84
10007E7B
10007E7B loc_10007E7B: ; CODE XREF: .txt2:10007E86j
10007E7B nop
10007E7C xchg eax, eax
10007E7E jg short loc_10007E89
10007E80 xchg ebx, ebx
10007E82 jle short loc_10007E89
10007E84
10007E84 loc_10007E84: ; CODE XREF: .txt2:10007E79j
10007E84 jz short $+2
10007E86 js short loc_10007E7B
10007E86 ; ---------------------------------------------------------------------------
10007E88 db 22h ; "
10007E89 ; ---------------------------------------------------------------------------
10007E89
10007E89 loc_10007E89: ; CODE XREF: .txt2:10007E7Ej
10007E89 ; .txt2:10007E82j
10007E89 jnz short loc_10007E96
10007E8B push ebx
10007E8C call nullsub_23
10007E91 ; ---------------------------------------------------------------------------
10007E91 pop ebx
10007E92 jz short loc_10007E96
10007E94
10007E94 ; ¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦ S U B R O U T I N E ¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦
10007E94
10007E94
10007E94 nullsub_22 proc near ; CODE XREF: sub_10007E20+1Dp
10007E94 retn
10007E94 nullsub_22 endp
10007E94
10007E94 ; ---------------------------------------------------------------------------
10007E95 db 38h ; 8
10007E96 ; ---------------------------------------------------------------------------
10007E96
10007E96 loc_10007E96: ; CODE XREF: .txt2:10007E89j

And this is how it normally looks (IDA's default, incorrect decode):


10007E6F ; ---------------------------------------------------------------------------
10007E6F
10007E6F loc_10007E6F: ; CODE XREF: .txt2:10007E76j
10007E6F mov ebx, ebx
10007E71 jg short loc_10007E79
10007E73 nop
10007E74 jle short loc_10007E79
10007E76
10007E76 loc_10007E76: ; CODE XREF: .txt2:10007E6Dj
10007E76 jmp short loc_10007E6F
10007E76 ; ---------------------------------------------------------------------------
10007E78 db 2Bh ; +
10007E79 ; ---------------------------------------------------------------------------
10007E79
10007E79 loc_10007E79: ; CODE XREF: .txt2:10007E71j
10007E79 ; .txt2:10007E74j
10007E79 js short loc_10007E84
10007E7B
10007E7B loc_10007E7B: ; CODE XREF: .txt2:10007E86j
10007E7B nop
10007E7C xchg eax, eax
10007E7E jg short near ptr loc_10007E88+1
10007E80 xchg ebx, ebx
10007E82 jle short near ptr loc_10007E88+1
10007E84
10007E84 loc_10007E84: ; CODE XREF: .txt2:10007E79j
10007E84 jz short $+2
10007E86 js short loc_10007E7B
10007E88
10007E88 loc_10007E88: ; CODE XREF: .txt2:10007E7Ej
10007E88 ; .txt2:10007E82j
10007E88 and dh, [ebp+0Bh]
10007E8B push ebx
10007E8C call nullsub_23
10007E91 pop ebx
10007E92 jz short loc_10007E96
10007E94
10007E94 locret_10007E94: ; CODE XREF: sub_10007E20+1Dp
10007E94 retn
10007E94 ; ---------------------------------------------------------------------------
10007E95 db 38h ; 8
10007E96 ; ---------------------------------------------------------------------------



Fixing this in IDA is as simple as going to the address of the label with the +1 (in this case loc_10007E88), pressing U to undefine the code, then going one byte down and pressing C to turn it into code.

So a simple script that does that sequence will clean up the decode.
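
As a sketch of what such a script has to find, the Python below scans a saved listing for operands of the form loc_XXXXXXXX+N and reports, for each one, the address to undefine and the address where code should be created. It runs outside IDA; `find_fixups` is a hypothetical helper name. Inside IDA the two steps would presumably map to IDC's MakeUnkn and MakeCode, but treat that mapping as an assumption to check against your IDA version.

```python
import re

# Matches operands such as "loc_10007E88+1" (offset shown in hex by IDA).
PLUS_REF = re.compile(r"loc_([0-9A-Fa-f]+)\+([0-9A-Fa-f]+)h?")

def find_fixups(listing):
    """Return sorted (undefine_at, make_code_at) pairs found in a listing."""
    fixups = set()
    for line in listing.splitlines():
        for m in PLUS_REF.finditer(line):
            base = int(m.group(1), 16)    # label IDA decoded at the wrong place
            delta = int(m.group(2), 16)   # distance to the real jump target
            fixups.add((base, base + delta))
    return sorted(fixups)

LISTING = """\
10007E7E jg short near ptr loc_10007E88+1
10007E82 jle short near ptr loc_10007E88+1
"""

for undefine_at, code_at in find_fixups(LISTING):
    print(f"undefine {undefine_at:08X}, make code at {code_at:08X}")
```

For the example above this yields a single fixup: undefine at 10007E88, then create code at 10007E89.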

Imhotep works a bit differently: it finds null instructions like mov ecx,ecx and NOPs them out.
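
How Imhotep does this internally is not known here, so the following Python is only a sketch of the general idea, under two assumptions: the code is 32-bit x86, and a list of instruction start offsets is already available from a disassembler. It treats mov reg,reg and xchg reg,reg with identical registers as null instructions and overwrites them with NOPs. The name `nop_null_instructions` is hypothetical.

```python
# ModRM bytes with mod=11 and reg == rm: C0, C9, D2, DB, E4, ED, F6, FF.
SAME_REG_MODRM = {0xC0 | (r << 3) | r for r in range(8)}

# 87 = xchg r/m32,r32 ; 89 = mov r/m32,r32 ; 8B = mov r32,r/m32
NULL_OPCODES = {0x87, 0x89, 0x8B}

def nop_null_instructions(code, insn_starts):
    """NOP out 2-byte same-register mov/xchg at known instruction starts."""
    buf = bytearray(code)
    patched = []
    for off in insn_starts:
        if (off + 1 < len(buf)
                and buf[off] in NULL_OPCODES
                and buf[off + 1] in SAME_REG_MODRM):
            buf[off:off + 2] = b"\x90\x90"
            patched.append(off)
    return bytes(buf), patched

# mov ecx, ecx; push eax; xchg ebx, ebx; pop eax; ret
code = bytes.fromhex("8bc95087db58c3")
starts = [0, 2, 3, 5, 6]  # assumed to come from a correct disassembly
cleaned, patched = nop_null_instructions(code, starts)
print(patched, cleaned.hex())
```

Working from instruction starts rather than scanning raw bytes avoids patching a matching byte pair that happens to sit inside another instruction's operand.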

SiNTAX
October 8th, 2002, 01:17
Ahh finally took the time to code something up... anyway it's actually not that hard to do.. should have done it sooner