pediff - show code changes between two pe files


diablo2oo2
October 17th, 2011, 14:32
i would like to share my new little project "pediff". it's based on ideas from google's courgette project and zynamics bindiff.
pediff compares two PE files and tries to match identical functions inside the code section. matching is done using the levenshtein distance between the opcode bytes of the instructions. zynamics bindiff does not use this method, but i think it's a very good way to match functions.
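
For readers who have not met the Levenshtein distance: it counts the minimum number of single-element insertions, deletions and substitutions needed to turn one sequence into another. Below is a minimal Python sketch of the textbook dynamic-programming version applied to raw opcode bytes. It is an illustration only and not pediff's code (pediff is written in MASM); the function name and the two sample byte strings are made up for the example.

```python
def levenshtein(a: bytes, b: bytes) -> int:
    """Edit distance between two byte strings (insert/delete/substitute cost 1 each)."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter sequence as the row
    previous = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, byte_a in enumerate(a, start=1):
        current = [i]
        for j, byte_b in enumerate(b, start=1):
            cost = 0 if byte_a == byte_b else 1
            current.append(min(previous[j] + 1,          # drop byte_a
                               current[j - 1] + 1,       # add byte_b
                               previous[j - 1] + cost))  # match or substitute
        previous = current
    return previous[-1]


# Hypothetical example: the same tiny function with one extra nop (0x90) in the new version.
old_func = bytes.fromhex("558bec5dc3")    # push ebp / mov ebp,esp / pop ebp / ret
new_func = bytes.fromhex("558bec905dc3")  # same, with a nop inserted
print(levenshtein(old_func, new_func))    # one inserted byte, so the distance is 1
```

Only two rows of the table are kept in memory, but the run time still grows with the product of the two lengths.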

http://img854.imageshack.us/img854/4356/pediffv01.png (http://imageshack.us/photo/my-854/pediffv01.png/)

http://img845.imageshack.us/img845/4109/pediffwinampexample.png (http://imageshack.us/photo/my-845/pediffwinampexample.png/)

so far you can just compare the code section of two PE files for changes

Code:
Download pediff for testing
http://diablo2oo2.di.funpic.de/downloads/pediff.rar

Mirror: http://free.pages.at/downloads/pediff.rar

http://dev.chromium.org/developers/design-documents/software-updates-courgette
http://www.zynamics.com/bindiff.html

diablo2oo2
October 25th, 2011, 06:30
made some code changes and added a GUI

Code:
http://diablo2oo2.di.funpic.de/downloads/pediff.rar

OHPen
October 25th, 2011, 16:39
i like your project!!! tell us a bit more about levenshtein distance.

regards,
OHPen

diablo2oo2
October 25th, 2011, 18:58
Quote:
[Originally Posted by OHPen;91279]i like your project!!! tell us a bit more about levenshtein distance.

regards,
OHPen


hey OHPen! ...has been a long time

the levenshtein distance is mostly used for string matching, i guess. but why not use it for matching opcode sequences? it works fine for smaller PE files with about 1000 - 3000 functions, like "kernel32.dll".

but analysing, for example, google's "chrome.dll" with over 70,000 functions is a nightmare.

i use the levenshtein distance for matching functions which could not be matched by their opcode hash (CRC32), so if there are just small code changes then pediff works fast.

two functions are "matched" in pediff if their similarity, based on the levenshtein distance, is better than 50%.
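
The two-stage idea described above (hash first, edit distance only for the leftovers) could look roughly like the Python sketch below. The names `match_functions` and `similarity`, the dict inputs, and the way "50%" is computed (1 minus the distance divided by the longer length) are assumptions made for illustration; the thread does not say how pediff normalizes the distance.

```python
import zlib


def levenshtein(a: bytes, b: bytes) -> int:
    """Textbook edit distance, two rows at a time."""
    previous = list(range(len(b) + 1))
    for i, byte_a in enumerate(a, start=1):
        current = [i]
        for j, byte_b in enumerate(b, start=1):
            cost = 0 if byte_a == byte_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


def similarity(a: bytes, b: bytes) -> float:
    """Assumed normalization: 1.0 means identical, 0.0 means nothing in common."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def match_functions(old_funcs: dict, new_funcs: dict, threshold: float = 0.5) -> list:
    """Return (old_name, new_name, similarity) tuples for matched functions."""
    matches, used_new, leftovers = [], set(), []

    # Stage 1: cheap exact matching through a CRC32 index of the new functions.
    new_by_hash = {}
    for name, code in new_funcs.items():
        new_by_hash.setdefault(zlib.crc32(code), []).append(name)
    for old_name, old_code in old_funcs.items():
        for new_name in new_by_hash.get(zlib.crc32(old_code), []):
            # Compare the bytes too, since two different functions can share a CRC32.
            if new_name not in used_new and new_funcs[new_name] == old_code:
                matches.append((old_name, new_name, 1.0))
                used_new.add(new_name)
                break
        else:
            leftovers.append(old_name)  # no exact match found

    # Stage 2: expensive fuzzy matching, only for functions stage 1 could not pair.
    for old_name in leftovers:
        best_name, best_score = None, threshold
        for new_name, new_code in new_funcs.items():
            if new_name in used_new:
                continue
            score = similarity(old_funcs[old_name], new_code)
            if score > best_score:
                best_name, best_score = new_name, score
        if best_name is not None:
            matches.append((old_name, best_name, best_score))
            used_new.add(best_name)
    return matches
```

With few changes between two builds, almost everything is paired in stage 1 and the quadratic stage 2 only runs for a handful of functions, which fits the remark that pediff is fast when the code changes are small.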

http://en.wikipedia.org/wiki/Levenshtein_distance

too bad this algorithm is very slow for long patterns.
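
One common way to soften this cost, shown here as a hedged Python sketch and not as what pediff does, is to give up early once the distance can no longer stay under a limit. Two facts make that safe: the length difference of the two sequences is a lower bound on the distance, and the smallest value in any row of the table is a lower bound on the final result. The function name and the `max_distance` parameter are hypothetical.

```python
from typing import Optional


def bounded_levenshtein(a: bytes, b: bytes, max_distance: int) -> Optional[int]:
    """Return the edit distance if it is <= max_distance, otherwise None."""
    # At least |len(a) - len(b)| insertions or deletions are always needed.
    if abs(len(a) - len(b)) > max_distance:
        return None
    previous = list(range(len(b) + 1))
    for i, byte_a in enumerate(a, start=1):
        current = [i]
        for j, byte_b in enumerate(b, start=1):
            cost = 0 if byte_a == byte_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        # Every alignment passes through this row, so its minimum bounds the result.
        if min(current) > max_distance:
            return None
        previous = current
    return previous[-1] if previous[-1] <= max_distance else None
```

For a 50% cut-off one could, for example, pass half the length of the longer function as `max_distance`, so clearly unrelated pairs are rejected after a few rows.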

OHPen
October 26th, 2011, 02:21
@diablo: Yeah, you are right, but I'm still here and reverse engineering ;D Are you still using Delphi !??? ;DDDDDD

I read the algo description and I agree that you can use it for such a purpose. I think what makes other tools so fast is that they combine different approaches depending on the structure of the binary. There is a lot of interesting theory out there!

A few years ago, maybe already 10 (omg), I wrote a tool together with a friend, which we called patterndiffer. For it we wrote our own algorithm (my friend was studying math at that time) to detect exactly matching patterns in black-box binaries of non-matching size. The cool thing was that it worked quite well for binaries up to 300 KB, maybe a bit more, but when it came to a large amount of data, the analysis took way too long...

Keep on working on that stuff, it's quite interesting!!

Regards,
OHPen.

diablo2oo2
October 26th, 2011, 09:18
did i ever use delphi? i code with MASM only

i made some optimizations today. the function matching works much faster now.

as a next step i could use multiple threads.

diablo2oo2
October 26th, 2011, 18:48
more optimizations done...
now it's really fast! (it only needs multithread support now)

OHPen
October 28th, 2011, 03:57
@diablo: ah, so much time has gone by that my memory of your favourite coding language seems to be corrupted. anyway, i remember that i was always surprised that you didn't use C as your language, but... this is purely a matter of taste, right? ;DD

So when do you plan to release the next version? I would like to test it for you, if you want!

Regards,
OHPen

diablo2oo2
October 28th, 2011, 06:51
i always upload the latest version in the same place. check it out.

diablo2oo2
November 1st, 2011, 20:45
working on a GUI for pediff now. the console version will be replaced by a DLL.

http://img811.imageshack.us/img811/1678/pediffgui.png

for testing:
http://diablo2oo2.di.funpic.de/downloads/pediff.rar

Kayaker
November 1st, 2011, 21:48
Very nice diablo2oo2, this could prove to be a very useful tool. No problems with detecting changed functions with even only a single nop difference. And it is quite fast.

I know this is still a work in progress, so I hate to bring up comments on the interface. One thing I noticed is that the dialog window is cut off at 800x600 resolution. Not too many people still work at that resolution on their main system, but under a VM window one might for display purposes. If you don't want to optimize for 800x600, it would already help if the disassembly windows were resizable or had the horizontal scroll bar property enabled. Also, synchronizing the vertical scroll of the two disasm windows would be useful.

Keep up the good work. I tested the console version too, but the gui is a nice improvement

Regards,
Kayaker

OHPen
November 2nd, 2011, 09:17
Nice GUI ;D

How do you disassemble? I mean, without advanced detection of what is code and what is not, you won't find all pieces of code...
At the beginning you usually have only a few points where to start, like the entry point, exports, TLS and so on. You can track all calls, but obviously this will not be successful when the code is "handmade" using computed jumps like add eax, xxxx, sub eax, yyy, jmp eax. If you want to follow such control flow transfers as well, you will need a sophisticated code analysis. Also, relocations are not that helpful, as you cannot disassemble the code in reverse, but correct me if I am wrong.

I'm interested in how you do your disassembly because this can become quite complicated. This is btw one reason why IDA is still disassembler no. 1!!!! It catches nearly all parts of code.

Regards,
OHPen.

Orkblutt
November 2nd, 2011, 09:18
Looks to be a very nice tool!
The link seems to be broken at the time I write this comment.

regards,

orkblutt

diablo2oo2
November 2nd, 2011, 12:56
the GUI is just for testing. i will make it resizable later, when i find a good resizing library for MASM.

of course pediff can not be as good as IDA. but i think i get very useful results already.

these are the analysis steps:

1. find functions from export table
1.1 get function end (last ret command)
2. find functions in code section by analyzing call instructions
2.1 get function end (last ret command)
4. the entry point is also a function
5. find hidden functions (like threads, callback functions,...)
5.1 get function end (last ret command)

pediff tries to find the hidden functions between the already found functions.
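
To make the "analyzing call instructions" step more concrete, here is a rough Python sketch that scans a 32-bit code section for the near `call rel32` opcode (0xE8) and collects every target that lands inside the same section. It is a naive byte scan, not a real disassembler and not pediff's implementation: a 0xE8 byte that is really part of another instruction or of data will produce false candidates, and calls through registers or import thunks are ignored. The function name and the sample bytes are invented for the example.

```python
import struct


def find_call_targets(code: bytes, section_va: int) -> set:
    """Collect virtual addresses targeted by E8 rel32 calls inside the section."""
    targets = set()
    for offset in range(len(code) - 4):  # an E8 call needs 5 bytes
        if code[offset] != 0xE8:
            continue
        (rel32,) = struct.unpack_from("<i", code, offset + 1)  # signed displacement
        # The displacement is relative to the address of the next instruction.
        target = (section_va + offset + 5 + rel32) & 0xFFFFFFFF
        if section_va <= target < section_va + len(code):
            targets.add(target)
    return targets


# Hypothetical section at VA 0x401000: call +5 / ret / int3 padding / tiny function.
section = bytes.fromhex("e805000000" "c3" "cccccccc" "558bec5dc3")
print([hex(va) for va in sorted(find_call_targets(section, 0x401000))])
```

Each collected address is only a candidate function start. The remaining steps from the list (exports, entry point, finding the function end, and the hidden functions in between) would still have to confirm or extend that set.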


my main website is currently down. but you can visit my mirror site for downloading:
http://free.pages.at/d2k2/downloads/pediff.rar

GEEK
November 9th, 2011, 03:08
works really well!

Great work diablo2oo2

blabberer
November 11th, 2011, 04:10
hi Diablo
pediff means only pe ? no com ??

i was trying to diff two .coms and got your app to crash

see if !analyze -v is of any use to you

Code:

0:001> !analyze -v
*******************************************************************************
* *
* Exception Analysis *
* *
*******************************************************************************


FAULTING_IP:
pediff+2308
10002308 8b5c3010 mov ebx,dword ptr [eax+esi+10h]

EXCEPTION_RECORD: ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 10002308 (pediff+0x00002308)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 00000000
Parameter[1]: 82e20017
Attempt to read from address 82e20017

FAULTING_THREAD: 00000bc4

DEFAULT_BUCKET_ID: INVALID_POINTER_READ

PROCESS_NAME: pediff_gui.exe

ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at "0x%08lx" referenced memory at "0x%08lx". The memory could not be "%s".

EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at "0x%08lx" referenced memory at "0x%08lx". The memory could not be "%s".

EXCEPTION_PARAMETER1: 00000000

EXCEPTION_PARAMETER2: 82e20017

READ_ADDRESS: 82e20017

FOLLOWUP_IP:
pediff+2308
10002308 8b5c3010 mov ebx,dword ptr [eax+esi+10h]

MOD_LIST: <ANALYSIS/>

NTGLOBALFLAG: 0

APPLICATION_VERIFIER_FLAGS: 0

PRIMARY_PROBLEM_CLASS: INVALID_POINTER_READ

BUGCHECK_STR: APPLICATION_FAULT_INVALID_POINTER_READ

LAST_CONTROL_TRANSFER: from 10001753 to 10002308

STACK_TEXT:
WARNING: Stack unwind information not available. Following frames may be wrong.
02e0f350 10001753 02e20020 00000000 7c919daa pediff+0x2308
02e0f388 1000251e 02e20020 06f30020 00000000 pediff+0x1753
02e0f3a4 004013fe 00900650 00000000 445c3a43 pediff!pediff_compare_start+0x2e
02e0ffb4 7c80b729 00000000 7c919daa 00000309 pediff_gui+0x13fe
02e0ffec 00000000 00401330 00000000 00000000 kernel32!BaseThreadStart+0x37


SYMBOL_STACK_INDEX: 0

SYMBOL_NAME: pediff+2308

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: pediff

DEBUG_FLR_IMAGE_TIMESTAMP: 4eb1b3d7

STACK_COMMAND: dt ntdll!LdrpLastDllInitializer BaseDllName ; dt ntdll!LdrpFailureData ; ~1s ; kb

BUCKET_ID: APPLICATION_FAULT_INVALID_POINTER_READ_pediff+2308

IMAGE_NAME: C:\Documents and Settings\Admin\Desktop\pediff\pediff.dll

FAILURE_BUCKET_ID: INVALID_POINTER_READ_c0000005_C:_Documents_and_Settings_Admin_Desktop_pediff_pediff.dll!Unknown

Followup: MachineOwner
---------
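
The dump shows an invalid read at `[eax+esi+10h]` inside pediff.dll. One plausible cause, and this is only a guess from the outside, is that the input is parsed as a PE image without first checking that it is one; a plain .com file has no MZ/PE headers at all. A defensive check before any parsing could look like the Python sketch below. The function name is made up; the offsets are the standard ones from the PE format (`e_lfanew` at 0x3C of the DOS header, the `PE\0\0` signature followed by the 20-byte file header).

```python
import struct


def looks_like_pe(data: bytes) -> bool:
    """Cheap sanity check that data starts with valid DOS and PE headers."""
    # IMAGE_DOS_HEADER is 0x40 bytes long and begins with the "MZ" magic.
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew (file offset of the PE header) sits at offset 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    # The 4-byte signature plus the 20-byte IMAGE_FILE_HEADER must fit in the file.
    if e_lfanew + 24 > len(data):
        return False
    return data[e_lfanew:e_lfanew + 4] == b"PE\0\0"


# Hypothetical inputs: a tiny DOS .com program and a minimal fake PE header.
com_file = bytes.fromhex("b409cd21c3")
fake_pe = b"MZ" + bytes(0x3A) + struct.pack("<I", 0x40) + b"PE\0\0" + bytes(20)
print(looks_like_pe(com_file), looks_like_pe(fake_pe))
```

Rejecting such files up front, with an error message in the GUI, would avoid following header offsets that point outside the mapped file.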

Harakiri
November 24th, 2011, 16:05
This would be a killer app if it could retrieve the symbol information from an IDA db and update another one with it, e.g. when you have debug information for version x, but it is missing for a later revision of the exe.