pediff - show code changes between two PE files
diablo2oo2
October 17th, 2011, 14:32
I would like to share my new little project, "pediff". It is based on ideas from Google's Courgette project and zynamics BinDiff.
pediff compares two PE files and tries to match identical functions inside the code section. Matching is done using the Levenshtein distance between the functions' opcode instruction bytes. zynamics BinDiff does not use this method, but I think it's a very good way to match functions.
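To illustrate the idea, here is a minimal Python sketch of the Levenshtein distance between two byte sequences: the minimum number of single-byte insertions, deletions and substitutions that turns one into the other. This is not pediff's own code (pediff is written in MASM); the function name `levenshtein` and the two x86 byte snippets are illustrative assumptions.

```python
def levenshtein(a: bytes, b: bytes) -> int:
    """Edit distance between two opcode byte sequences.

    Counts the minimum number of single-byte insertions, deletions and
    substitutions needed to turn `a` into `b`, using the classic dynamic
    programming table kept one row at a time (O(len(a) * len(b)) time).
    """
    prev = list(range(len(b) + 1))  # row for the empty prefix of `a`
    for i, x in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute (free if the bytes match)
            )
        prev = cur
    return prev[-1]


# push ebp / mov ebp, esp / xor eax, eax / pop ebp / ret ...
f_old = bytes.fromhex("55 8bec 33c0 5d c3")
# ... and the same function with a single nop inserted
f_new = bytes.fromhex("55 8bec 90 33c0 5d c3")
print(levenshtein(f_old, f_new))  # 1
```

Because the work grows with the product of the two lengths, comparing many pairs of long functions gets expensive quickly.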
http://img854.imageshack.us/img854/4356/pediffv01.png
http://img845.imageshack.us/img845/4109/pediffwinampexample.png
So far, you can only compare the code sections of two PE files for changes.
Download pediff for testing
http://diablo2oo2.di.funpic.de/downloads/pediff.rar
Mirror: http://free.pages.at/downloads/pediff.rar
http://dev.chromium.org/developers/design-documents/software-updates-courgette
http://www.zynamics.com/bindiff.html
diablo2oo2
October 25th, 2011, 06:30
Made some code changes and added a GUI.
http://diablo2oo2.di.funpic.de/downloads/pediff.rar
OHPen
October 25th, 2011, 16:39
I like your project!!! Tell us a bit more about the Levenshtein distance.
Regards,
OHPen
diablo2oo2
October 25th, 2011, 18:58
Quote:
Originally posted by OHPen: I like your project!!! Tell us a bit more about the Levenshtein distance.
Regards,
OHPen
Hey OHPen!
...it has been a long time.
The Levenshtein distance is mostly used for string matching, I guess. But why not use it for matching opcode sequences? It works fine for smaller PE files with about 1,000-3,000 functions, like "kernel32.dll".
But analysing, for example, Google's "chrome.dll" with over 70,000 functions is a nightmare.
I use the Levenshtein distance only for matching functions that could not be matched via the opcode hash (CRC32), so if there are just small code changes, pediff works fast.
Two functions are "matched" in pediff if the Levenshtein distance is better than 50%.
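The two-stage scheme described above (exact matches via a CRC32 of the opcode bytes first, Levenshtein only for the leftovers) can be sketched in Python as follows. This is an illustration under assumptions, not pediff's implementation: the post does not say how the 50% is computed, so the sketch assumes similarity = 1 - distance / length of the longer function, and it pairs leftover functions greedily. `match_functions`, `similarity` and the sample function ids are hypothetical.

```python
import zlib


def levenshtein(a: bytes, b: bytes) -> int:
    """Classic dynamic-programming edit distance over bytes."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y))
        prev = cur
    return prev[-1]


def similarity(a: bytes, b: bytes) -> float:
    """1.0 for identical bytes, 0.0 for nothing in common.

    Assumption: the distance is normalised by the longer sequence.
    """
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def match_functions(old: dict, new: dict, threshold: float = 0.5) -> dict:
    """Pair function ids of the old file with function ids of the new file.

    `old` and `new` map a (hypothetical) function id to its opcode bytes.
    """
    matches = {}

    # Stage 1: exact matches, looked up by CRC32 and confirmed byte-by-byte
    # (different code can share a CRC32).
    new_by_crc = {}
    for fid, code in new.items():
        new_by_crc.setdefault(zlib.crc32(code), []).append(fid)
    leftover_old = []
    for fid, code in old.items():
        bucket = new_by_crc.get(zlib.crc32(code), [])
        same = next((c for c in bucket if new[c] == code), None)
        if same is None:
            leftover_old.append(fid)
        else:
            bucket.remove(same)
            matches[fid] = same

    # Stage 2: expensive Levenshtein comparison for what is left over,
    # greedily taking the most similar unmatched candidate above the threshold.
    taken = set(matches.values())
    leftover_new = [fid for fid in new if fid not in taken]
    for fid in leftover_old:
        best, best_score = None, threshold
        for cand in leftover_new:
            score = similarity(old[fid], new[cand])
            if score > best_score:
                best, best_score = cand, score
        if best is not None:
            matches[fid] = best
            leftover_new.remove(best)
    return matches


old = {"init": bytes.fromhex("558bec33c05dc3"),
       "helper": bytes.fromhex("558bec8b4508405dc3")}      # ... inc eax ...
new = {"f_1000": bytes.fromhex("558bec33c05dc3"),
       "f_2000": bytes.fromhex("558bec8b450883c0025dc3")}  # ... add eax, 2 ...
print(match_functions(old, new))  # {'init': 'f_1000', 'helper': 'f_2000'}
```

Here "init" is found by its CRC32 alone, while "helper" needs the Levenshtein stage: three byte edits out of eleven give a similarity of about 73%.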
http://en.wikipedia.org/wiki/Levenshtein_distance
Too bad this algorithm is very slow for long patterns.
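One common way to reduce that cost, sketched here in Python as a hypothetical helper and not as what pediff actually does, is to give up on a pair as soon as its distance is guaranteed to exceed the largest value that could still count as a match.

```python
def levenshtein_within(a: bytes, b: bytes, max_dist: int):
    """Edit distance of `a` and `b` if it is <= max_dist, else None.

    Two cheap shortcuts skip most of the O(len(a) * len(b)) work for
    pairs that cannot match anyway:
    - the distance is at least |len(a) - len(b)|, so functions of very
      different size are rejected without looking at their bytes;
    - the smallest value in a DP row never decreases from one row to the
      next, so once a whole row exceeds max_dist the result will too.
    """
    if abs(len(a) - len(b)) > max_dist:
        return None
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y))
        if min(cur) > max_dist:
            return None  # early exit: this pair can no longer match
        prev = cur
    return prev[-1] if prev[-1] <= max_dist else None


print(levenshtein_within(b"kitten", b"sitting", 3))  # 3
print(levenshtein_within(b"kitten", b"sitting", 2))  # None
```

Assuming similarity is defined as 1 - distance / max_len, the "better than 50%" rule requires distance < max_len / 2, so a caller would pass max_dist = (max_len - 1) // 2.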
OHPen
October 26th, 2011, 02:21
@diabolo: Yeah, you are right, but I'm still here, reverse engineering ;D Are you still using Delphi?! ;DDDDDD
I read the algorithm description and I agree that you can use it for such a purpose. I think what makes other tools so fast is that they combine different approaches depending on the structure of the binary. There is a lot of interesting theory out there!
A few years ago, maybe already 10 (omg), I wrote a tool together with a friend which we called patterndiffer. For it we wrote our own algorithm (my friend was studying math at that time) to detect exactly matching patterns in black-box binaries of non-matching size. The cool thing was that it worked quite well for binaries up to 300 KB, maybe a bit more, but with large amounts of data the analysis took way too long...
Keep on working on that stuff, it's quite interesting!!
Regards,
OHPen.
diablo2oo2
October 26th, 2011, 09:18
Did I ever use Delphi? I code with MASM only.
I made some optimizations today. The function matching works much faster now.
As a next step I could use multiple threads.
diablo2oo2
October 26th, 2011, 18:48
More optimizations done...
Now it's really fast! (It only needs multithreading support now.)

OHPen
October 28th, 2011, 03:57
@diablo: Ah, so much time has passed that my memory of your favourite coding language seems to be corrupted

Anyway, I remember that I was always surprised that you didn't use C, but... this is purely a matter of taste, right? ;DD
So when do you plan to release the next version? I would like to test it for you, if you want!
Regards,
OHPen
diablo2oo2
October 28th, 2011, 06:51
I always upload the latest version to the same place. Check it out.
diablo2oo2
November 1st, 2011, 20:45
Working on a GUI for pediff now. The console version will be replaced by a DLL.
http://img811.imageshack.us/img811/1678/pediffgui.png
For testing:
http://diablo2oo2.di.funpic.de/downloads/pediff.rar
Kayaker
November 1st, 2011, 21:48
Very nice, diablo2oo2, this could prove to be a very useful tool. It had no problems detecting changed functions with even a single nop difference. And it is quite fast.
I know this is still a work in progress, so I hate to bring up comments on the interface. One thing I noticed is that the dialog window is cut off at 800x600 resolution. Not many people still work at that resolution on their main system, but one might use it in a VM window for display purposes. If you don't want to optimize for 800x600, making the disassembly windows resizable or enabling their horizontal scroll bars would help with that problem. Synchronizing the vertical scrolling of the two disasm windows would also be useful.
Keep up the good work. I tested the console version too, but the GUI is a nice improvement.
Regards,
Kayaker
OHPen
November 2nd, 2011, 09:17
Nice GUI ;D
How do you disassemble? I mean, without advanced detection of what is code and what is not, you won't find all pieces of code....
At the beginning you usually have only a few points to start from, like the entry point, exports, TLS and so on. You can track all calls, but obviously this will not be successful when the code is "handmade" using computed jumps like "add eax, xxxx; sub eax, yyy; jmp eax". If you want to follow such control flow transfers as well, you will need a sophisticated code analysis. Also, relocations are not that helpful, as you cannot disassemble the code backwards, but correct me if I am wrong.
I'm interested in how you do your disassembly because this can become quite complicated. This is, btw, one reason why IDA is still the no. 1 disassembler!!!! It catches nearly all parts of the code.
Regards,
OHPen.
Orkblutt
November 2nd, 2011, 09:18
Looks like a very nice tool!
The link seems to be broken at the time of writing.
Regards,
orkblutt
diablo2oo2
November 2nd, 2011, 12:56
The GUI is just for testing. I will make it resizable later, when I find a good resizing library for MASM.
Of course pediff cannot be as good as IDA, but I think it already gives very useful results.
These are the analysis steps:
1. find functions from the export table
1.1 get the function end (last ret instruction)
2. find functions in the code section by analyzing call instructions
2.1 get the function end (last ret instruction)
3. the entry point is also a function
4. find hidden functions (like threads, callback functions, ...)
4.1 get the function end (last ret instruction)
pediff tries to find the hidden functions in the gaps between the functions already found.
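The listed steps can be sketched in Python on a raw code section. pediff's actual MASM code is not shown in the thread, so this is only an approximation under loudly stated assumptions: call targets come from a naive linear scan for `E8 rel32` instead of a real disassembler, "last ret" means the last `C3` byte before the next known start (`ret imm16` is ignored), only `0x90`/`0xCC` count as padding, and offsets are relative to the section start. All function names are hypothetical.

```python
import struct

PADDING = {0x90, 0xCC}  # nop and int3, assumed to be the only filler bytes


def find_call_targets(code: bytes) -> set:
    """Collect targets of `call rel32` (opcode E8) by a naive linear byte scan.

    Without a real disassembler an E8 byte inside another instruction can
    yield a bogus target; targets outside the section are dropped.
    """
    targets = set()
    for i in range(len(code) - 4):
        if code[i] == 0xE8:
            rel = struct.unpack_from("<i", code, i + 1)[0]
            target = i + 5 + rel  # relative to the next instruction
            if 0 <= target < len(code):
                targets.add(target)
    return targets


def function_bounds(code: bytes, starts) -> list:
    """End each function just after its last ret (C3) before the next start.

    `ret imm16` (C2) is not handled; a function without any C3 simply
    runs up to the next known start.
    """
    ordered = sorted(starts)
    bounds = []
    for k, start in enumerate(ordered):
        limit = ordered[k + 1] if k + 1 < len(ordered) else len(code)
        last_ret = code.rfind(b"\xC3", start, limit)
        bounds.append((start, last_ret + 1 if last_ret != -1 else limit))
    return bounds


def hidden_candidates(code: bytes, bounds) -> list:
    """First non-padding offset in each gap after a known function."""
    found = []
    next_starts = [s for s, _ in bounds[1:]] + [len(code)]
    for (_, end), nxt in zip(bounds, next_starts):
        off = next((o for o in range(end, nxt) if code[o] not in PADDING), None)
        if off is not None:
            found.append(off)
    return found


def discover_functions(code: bytes, entry: int, exports=()) -> list:
    """Exports, call targets and entry point first, then hidden functions."""
    starts = set(exports) | find_call_targets(code) | {entry}
    bounds = function_bounds(code, starts)
    hidden = hidden_candidates(code, bounds)
    if hidden:
        bounds = function_bounds(code, starts | set(hidden))
    return bounds


# call 0Ch / ret / int3 / push 0; jmp $ / int3 / xor eax, eax; ret
code = bytes.fromhex("E807000000 C3 CC 6A00EBFE CC 33C0C3")
print(discover_functions(code, entry=0))  # [(0, 6), (7, 12), (12, 15)]
```

The function at offset 7 is never called directly and has no ret, so only the gap check finds it. Note how the "last ret" rule would have swallowed it into the preceding function if it had contained a `C3` byte, which is one reason OHPen's point about real code analysis matters.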
My main website is currently down, but you can visit my mirror site for downloading:
http://free.pages.at/d2k2/downloads/pediff.rar
GEEK
November 9th, 2011, 03:08
Works really well!
Great work, diablo2oo2!

blabberer
November 11th, 2011, 04:10
Hi Diablo,
does pediff mean PE only?
No COM??
I was trying to diff two .com files and got your app to crash.
See if this !analyze -v output is of any use to you:
Code:
0:001> !analyze -v
*******************************************************************************
* *
* Exception Analysis *
* *
*******************************************************************************
FAULTING_IP:
pediff+2308
10002308 8b5c3010 mov ebx,dword ptr [eax+esi+10h]
EXCEPTION_RECORD: ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 10002308 (pediff+0x00002308)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 00000000
Parameter[1]: 82e20017
Attempt to read from address 82e20017
FAULTING_THREAD: 00000bc4
DEFAULT_BUCKET_ID: INVALID_POINTER_READ
PROCESS_NAME: pediff_gui.exe
ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at "0x%08lx" referenced memory at "0x%08lx". The memory could not be "%s".
EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at "0x%08lx" referenced memory at "0x%08lx". The memory could not be "%s".
EXCEPTION_PARAMETER1: 00000000
EXCEPTION_PARAMETER2: 82e20017
READ_ADDRESS: 82e20017
FOLLOWUP_IP:
pediff+2308
10002308 8b5c3010 mov ebx,dword ptr [eax+esi+10h]
MOD_LIST: <ANALYSIS/>
NTGLOBALFLAG: 0
APPLICATION_VERIFIER_FLAGS: 0
PRIMARY_PROBLEM_CLASS: INVALID_POINTER_READ
BUGCHECK_STR: APPLICATION_FAULT_INVALID_POINTER_READ
LAST_CONTROL_TRANSFER: from 10001753 to 10002308
STACK_TEXT:
WARNING: Stack unwind information not available. Following frames may be wrong.
02e0f350 10001753 02e20020 00000000 7c919daa pediff+0x2308
02e0f388 1000251e 02e20020 06f30020 00000000 pediff+0x1753
02e0f3a4 004013fe 00900650 00000000 445c3a43 pediff!pediff_compare_start+0x2e
02e0ffb4 7c80b729 00000000 7c919daa 00000309 pediff_gui+0x13fe
02e0ffec 00000000 00401330 00000000 00000000 kernel32!BaseThreadStart+0x37
SYMBOL_STACK_INDEX: 0
SYMBOL_NAME: pediff+2308
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: pediff
DEBUG_FLR_IMAGE_TIMESTAMP: 4eb1b3d7
STACK_COMMAND: dt ntdll!LdrpLastDllInitializer BaseDllName ; dt ntdll!LdrpFailureData ; ~1s ; kb
BUCKET_ID: APPLICATION_FAULT_INVALID_POINTER_READ_pediff+2308
IMAGE_NAME: C:\Documents and Settings\Admin\Desktop\pediff\pediff.dll
FAILURE_BUCKET_ID: INVALID_POINTER_READ_c0000005_C:_Documents_and_Settings_Admin_Desktop_pediff_pediff.dll!Unknown
Followup: MachineOwner
---------
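The thread does not say what caused this crash. A DOS .com file has no MZ/PE headers at all, so one plausible explanation is that header fields were read through an offset taken from arbitrary bytes; the faulting read at [eax+esi+10h] is consistent with that but does not prove it. As a hedged illustration, here is a minimal Python sketch of the sanity checks a PE parser can run before dereferencing any header field; `pe_header_offset` is a hypothetical name, while the `MZ` magic, `e_lfanew` at offset 0x3C, the `PE\0\0` signature and the 20-byte file header come from the PE format itself.

```python
import struct


def pe_header_offset(data: bytes):
    """Offset of the PE signature, or None if `data` is not a PE image.

    A DOS .com file has no MZ header at all, and a damaged file can carry an
    e_lfanew that points far outside the buffer; both are rejected here
    before any header field is dereferenced.
    """
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    e_lfanew = struct.unpack_from("<I", data, 0x3C)[0]
    # Signature (4 bytes) + IMAGE_FILE_HEADER (20 bytes) must fit in the file.
    if e_lfanew + 24 > len(data):
        return None
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        return None
    return e_lfanew


# mov ah, 9 / mov dx, 108h / int 21h / ret / "Hello$": a tiny DOS .com image
com_file = bytes.fromhex("B409BA0801CD21C3") + b"Hello$"
print(pe_header_offset(com_file))  # None
```

A caller would simply refuse to diff a file for which this returns None instead of continuing with the section table.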
Harakiri
November 24th, 2011, 16:05
This would be a killer app if it could retrieve the symbol information from one IDA database and apply it to another. E.g. you have debug information for version x, but it is missing for a later revision of the exe.
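Once functions are matched, carrying names over is essentially a lookup. A hedged Python sketch of that step only: reading names from one IDA database and writing them into another (for example via IDAPython) is not shown, `transfer_names` and the addresses are hypothetical, and names with IDA's automatic `sub_` prefix are skipped because they only encode the old address.

```python
def transfer_names(matches: dict, old_names: dict) -> dict:
    """Carry function names from an old build to a matched new build.

    matches:   old function address -> matched new function address
    old_names: old function address -> name (e.g. from debug symbols)
    Auto-generated "sub_XXXXXX" names are skipped because they only
    encode the old address and would be misleading in the new file.
    """
    return {
        new: old_names[old]
        for old, new in matches.items()
        if old in old_names and not old_names[old].startswith("sub_")
    }


matches = {0x401000: 0x402000, 0x401100: 0x402200, 0x401200: 0x402400}
old_names = {0x401000: "ParseHeader", 0x401100: "sub_401100"}
print({hex(a): n for a, n in transfer_names(matches, old_names).items()})
# {'0x402000': 'ParseHeader'}
```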