IDA Pro - Automated Function Recognition on large binaries


OHPen
November 5th, 2012, 10:08
Hi,

I'm currently working on a very large binary, roughly 40 MB. Because the binary is updated frequently, I have to write a plugin that automatically identifies all the functions I have already reverse engineered (only the unchanged ones, of course). I don't want to reapply my comments and changes manually after every update. I know there are free tools for binary diffing, but I need a custom tool. My approach would be to retrieve all relevant data for a given function and hash the parts that are static (excluding instruction bytes that have relocations applied, and so on). Afterwards I would store the signature plus the associated information in a database, so that the pattern can be used for searching next time. If a particular signature is no longer found (let's say after the 3rd update of the binary), I discard it, because I can assume the code was either heavily changed, and thus has to be reverse engineered manually again, or removed.
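A minimal sketch of that idea in Python, purely as an illustration: the function bytes and relocation ranges are assumed to come from the disassembler (no IDA API is used here), and the names `masked_signature`, `remember`, `apply_update` and the table layout are hypothetical. It also assumes that "not found after the 3rd update" means three consecutive misses.

```python
import hashlib
import sqlite3

MAX_MISSES = 3  # assumption: drop a signature after 3 consecutive updates without a match


def masked_signature(code, reloc_ranges):
    """Hash the function bytes with every relocated range zeroed out.

    reloc_ranges is a list of (offset, length) pairs relative to the function start.
    """
    buf = bytearray(code)
    for start, length in reloc_ranges:
        for i in range(start, min(start + length, len(buf))):
            buf[i] = 0
    return hashlib.sha1(bytes(buf)).hexdigest()


def open_db(path=":memory:"):
    """Open (or create) the signature database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS sigs ("
        "sig TEXT PRIMARY KEY, name TEXT, comment TEXT, "
        "misses INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def remember(db, sig, name, comment=""):
    """Store (or refresh) the information attached to one signature."""
    db.execute(
        "INSERT OR REPLACE INTO sigs (sig, name, comment, misses) VALUES (?, ?, ?, 0)",
        (sig, name, comment),
    )


def apply_update(db, new_sigs):
    """Match stored signatures against those computed from the updated binary.

    Returns {sig: (name, comment)} for every match, counts a miss for every
    other stored signature, and deletes the ones that reached MAX_MISSES.
    """
    matched = {}
    rows = db.execute("SELECT sig, name, comment FROM sigs").fetchall()
    for sig, name, comment in rows:
        if sig in new_sigs:
            matched[sig] = (name, comment)
            db.execute("UPDATE sigs SET misses = 0 WHERE sig = ?", (sig,))
        else:
            db.execute("UPDATE sigs SET misses = misses + 1 WHERE sig = ?", (sig,))
    db.execute("DELETE FROM sigs WHERE misses >= ?", (MAX_MISSES,))
    return matched
```

An exact hash only matches functions whose unmasked bytes are identical, so anything the compiler reorders or re-registers after an update will count as a miss; that is the trade-off of this simple scheme.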

I've never done this before, but now I need it because of the size of the binary and how often it changes.

What do you think?

What is the most efficient way to approach this? I have no problem investing a few weeks in this, so even more complex ideas are welcome. Looking forward to your replies!

Thanks in advance!

Regards,
OHPen

Aimless
November 7th, 2012, 00:54
IDA's signature tools (the ones that create .sig files) would be your best bet. While they can't really handle relocated code, they can easily identify functions by the bytes that remain (mostly) static.
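To illustrate what matching on the static bytes means, here is a small Python sketch of wildcard pattern matching. It is not the real .sig format or the FLIRT algorithm; the `..` notation for a variable byte is only loosely modeled on pattern files, and `parse_pattern`, `matches_at` and `find_all` are hypothetical names.

```python
def parse_pattern(text):
    """Turn a hex string such as 'E8........C9C3' into a list of byte values.

    Each '..' pair becomes None, meaning "any byte" (e.g. a relocated operand).
    """
    if len(text) % 2:
        raise ValueError("pattern must contain an even number of characters")
    pattern = []
    for i in range(0, len(text), 2):
        pair = text[i:i + 2]
        pattern.append(None if pair == ".." else int(pair, 16))
    return pattern


def matches_at(data, offset, pattern):
    """Return True if the pattern matches the bytes of data starting at offset."""
    if offset < 0 or offset + len(pattern) > len(data):
        return False
    return all(p is None or data[offset + i] == p for i, p in enumerate(pattern))


def find_all(data, pattern):
    """Return every offset in data where the pattern matches."""
    return [
        off
        for off in range(len(data) - len(pattern) + 1)
        if matches_at(data, off, pattern)
    ]
```

For example, `find_all(image, parse_pattern("5589E5E8........C9C3"))` would report every place where the same prologue, call and epilogue appear, regardless of the call target.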

For adding comments and identifying (or naming) parameters, creating .til files will help.

Have Phun

OHPen
November 7th, 2012, 02:45
Hey Aimless,

I've used the FLIRT stuff before, but it's not exactly what I need. Sure, it is widely used for signature creation, but as you already mentioned, it's not a "complete" solution. I have to admit I had forgotten about the .til export feature, but nevertheless I want a combined solution.

I'll have to keep thinking about the problem. I'm pretty sure I'll end up coding something custom ;D

Thank you anyway!

Regards,
OHPen

Aimless
November 7th, 2012, 04:02
Alternatively,

http://old.idapalace.net/

There are plugins that convert IDB to SIG and vice versa. Should be useful?

Have Phun

OHPen
November 7th, 2012, 08:07
That might be interesting, thank you!

I will have a look this evening.

Regards,
OHPen

disavowed
November 17th, 2012, 09:14
http://recon.cx/2012/schedule/attachments/51_recon-crowdre-final-120621174609-phpapp02.pdf

Aimless
November 17th, 2012, 11:17
Too much glitter.

Not enough gold.

Presentation leans towards "CORPORATE" language. Corporate-hep, I call it.

CrowdRE indeed. Sounds more like a wannabe Microsoft in the making.

I'll pass, personally.

The consideration, however, was not wasted. Thanks, as always, Disa. Always ready to help. A tip of the hat to you.

Have Phun

Sirmabus
December 15th, 2012, 06:22
Zynamics/Google BinDiff is an IDA diffing tool with a "port" feature that can carry the names, labels, and comments of matched functions from one IDB to another. AFAIK it's currently the only one with such a feature.
BinDiff used to choke on anything but small IDBs; it would take maybe a day or more to process a medium to large IDB, if it didn't crash first, leaving tens of thousands of little temp files behind.
Then, around the time of Jeong Wook's DarunGrim talk at BlackHat USA 2010 (DarunGrim is another diffing tool, but free and probably better), BinDiff mysteriously got fixed quite a bit.
I tried a later BinDiff version, 4.0.1, and while they did fix the weird temp file allocation architecture for general diffing, I'll be damned if they didn't leave the same crazy machinery in place for the port feature. Try it (with Process Explorer or something) and you'll see it read and write who knows how many thousands of temp files per second. My hard drive clatters away sounding like an A-10 Warthog chaingun. I thought letting it do its thing for hours would seriously lead to premature HD death.
Please, Zynamics, it's okay to use memory; most users' PCs will have gigabytes of free memory. You don't have to use an archaic 1970s architecture with tiny ~1 KB file buffers and work in ten zillion little temp files. It's also terribly slow, as (hello!) such file I/O can be a major bottleneck.
Maybe it can be workable (and save your drives) if you can somehow make it use a RAM disk for its temp file space.

DarunGrim unfortunately has no "port" feature, but I'm currently rewriting a version of it that is faster (to better handle large IDBs) and has one.
You could do the same using the DarunGrim or Turbodiff source, et al., to build your own tool with the porting feature you want.

If you want to go the SIG route, my plug-in will make a .PAT file that you can build a signature from:
"IDA2PAT Reloaded" http://www.macromonkey.com/bb/viewtopic.php?f=65&t=710