Is there such a tool? [Archive] - RCE Messageboard's Regroupment

homersux

June 7th, 2004, 10:46

listfunc win32exe

1. func_a(DWORD para_a, BYTE para_b....)
2. func_b(....)

Basically it will try to analyze the binary and produce a list of functions with their
arguments. This would be something very useful imo.

sgdt

June 7th, 2004, 12:42

Quote:

[Originally Posted by homersux]

listfunc win32exe

1. func_a(DWORD para_a, BYTE para_b....)
2. func_b(....)

Basically it will try to analyze the binary and produce a list of functions with their
arguments. This would be something very useful imo.

Have you tried IDA? It's a bit more work than simply firing off a command line, but I can assure you it'd provide much more accurate results...

homersux

June 7th, 2004, 13:22

I am sure you can do that with IDA and a script, but it'd be like shooting a mosquito with a cannon. IDA is kinda too big for such a job and a lot of resources are wasted. Something simple and elegant would be much better.

doug

June 7th, 2004, 14:00

To be able to do what you want, there should be some sort of symbols file associated with the executable (or the names should be available via the export table). Otherwise, how can it possibly figure out the parameters?

sgdt

June 7th, 2004, 14:54

Figuring out CDECL functions wouldn't be an issue (never is). You probably could easily make something that could recognize such functions. But FASTCALL, on the other hand, really does require looking at the caller and the callee simultaneously.

IDA, as long as you set your compiler type, does a great job of figuring out most of the grunt work. But it's almost always manual (for good reason) when it comes to the FASTCALL routines.

Writing such a utility wouldn't be particularly hard, but it also wouldn't be particularly accurate. You can usually guess something's FASTCALL by looking at who's calling: if the instruction before the call modifies ECX, it is a good candidate. Additionally, if the callee uses ECX before setting it, it's probably FASTCALL. Same goes for EDX, but I usually look for ECX first.

CDECLs almost always have an "add esp, <arg_count * 4>" after the call, but that's not a rule. Additionally, you can have situations where the callee is a jump to another function. And then there's exception handlers!!!! That's enough to drive anyone writing an analyzer batty...
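To make the "add esp" heuristic concrete, here is a minimal Python sketch. It is only an illustration under loud assumptions: it does a naive byte scan instead of real disassembly, it only knows the `E8 rel32` call form and the `83 C4 imm8` form of `add esp`, and the function name `guess_cdecl_arg_counts` is made up for this example. Expect false positives and misses on real binaries.

```python
import struct

def guess_cdecl_arg_counts(code: bytes, base: int) -> dict:
    """Map call target -> set of argument counts guessed from caller cleanup.

    Naive linear scan (assumption: no real instruction decoding), so bytes
    inside operands or data can be mistaken for a call.
    """
    guesses = {}
    for i in range(len(code) - 4):
        if code[i] != 0xE8:          # E8 rel32 = near relative call
            continue
        (rel,) = struct.unpack_from("<i", code, i + 1)
        target = (base + i + 5 + rel) & 0xFFFFFFFF
        after = code[i + 5 : i + 8]  # bytes right after the call
        # 83 C4 ib = add esp, imm8; keep only positive multiples of 4
        if (len(after) == 3 and after[0] == 0x83 and after[1] == 0xC4
                and 0 < after[2] < 0x80 and after[2] % 4 == 0):
            guesses.setdefault(target, set()).add(after[2] // 4)
    return guesses

# push 1 / push 2 / call +0x10 / add esp, 8 / ret
sample = bytes.fromhex("6A01" "6A02" "E810000000" "83C408" "C3")
result = guess_cdecl_arg_counts(sample, 0x401000)
```

A target that collects more than one guessed count (or none at all) is a hint that it is not plain CDECL, or that the compiler merged the stack cleanup of several calls.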

Hope this is useful.

homersux

June 7th, 2004, 16:50

sgdt, I have thought about those types of function signatures. My question remains whether
such a tool is already available, or whether one has to start from scratch. I hope not.

I firmly believe this would be something that would help debugging etc. tremendously (with more features, such as jump table lookup into imported APIs). I am willing to start a project
if nobody can think of anything remotely related to this and not as bloated as IDA or the like. The accuracy of this analysis tool does not have to be 100%, because by nature this is a static analysis tool and should be combined with a dynamic analysis tool such as Olly or SICE to look for interesting function entry points.

sgdt

June 7th, 2004, 17:59

A number of years back, I wrote a program to hunt for functions and look for ways to make them faster. Kind of like a peephole optimizer. The targets were overbloated many-megabyte exes.

The first trick I did was data. Because the number of pointers I was dealing with was very large, I needed a way to add to a list without re-allocating a ton of data. So I had a series of arrays that would get re-allocated, and the array was chosen based on a hash (I had 32 arrays, so the hash would return a value from 0 to 31). This allowed me to instantly (by the next part of the hash) determine if a pointer already existed and add it if not.
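For anyone wondering what that bucketed layout might look like, here is a rough Python sketch. It is not the original code: the class name `BucketedPointerSet`, the multiplicative hash constant, and the sorted-array lookup inside each bucket are all assumptions, since the post does not say how the "next part of the hash" was used. The point it shows is only that growth touches one small array out of 32 instead of one huge one.

```python
import bisect

class BucketedPointerSet:
    """Set of pointers split over 32 independently growing arrays."""

    BUCKETS = 32

    def __init__(self):
        self._buckets = [[] for _ in range(self.BUCKETS)]

    @staticmethod
    def _bucket_index(ptr: int) -> int:
        # Assumed hash: multiplicative mix, top 5 bits select bucket 0..31
        return ((ptr * 2654435761) & 0xFFFFFFFF) >> 27

    def add(self, ptr: int) -> bool:
        """Insert ptr; return False if it was already present."""
        bucket = self._buckets[self._bucket_index(ptr)]
        pos = bisect.bisect_left(bucket, ptr)   # binary search in one bucket
        if pos < len(bucket) and bucket[pos] == ptr:
            return False
        bucket.insert(pos, ptr)                 # only this bucket grows
        return True

    def __contains__(self, ptr: int) -> bool:
        bucket = self._buckets[self._bucket_index(ptr)]
        pos = bisect.bisect_left(bucket, ptr)
        return pos < len(bucket) and bucket[pos] == ptr

    def __len__(self) -> int:
        return sum(len(b) for b in self._buckets)

pointers = BucketedPointerSet()
first = pointers.add(0x401000)
again = pointers.add(0x401000)
```

In C the same idea would mean 32 separate realloc'd arrays, so each realloc copies roughly 1/32 of the data.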

Second trick was to look for pointers. The first thing I did was look at the relocation table. This gave me all the physical pointers, which were added. Then, I would look at the code and, using an MMX sliding window, determine if something could be a relative pointer (jumps/calls/etc.). These were added.
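The relative-pointer pass can be sketched roughly like this in Python. It is a plain byte loop, not MMX, and it only looks at the `E8` (call rel32) and `E9` (jmp rel32) opcodes; short jumps and conditional jumps are left out. The function name `relative_branch_targets` and the single-section bounds check are assumptions for the example. Like the approach described above, it happily accepts false positives and leaves the weeding out to a later pass.

```python
import struct

def relative_branch_targets(code: bytes, base: int) -> set:
    """Collect candidate targets of E8/E9 rel32 branches that land in 'code'.

    Every byte offset is tried (sliding window), so some hits will be
    operand or data bytes that merely look like a branch.
    """
    end = base + len(code)
    targets = set()
    for i in range(len(code) - 4):
        if code[i] in (0xE8, 0xE9):
            (rel,) = struct.unpack_from("<i", code, i + 1)
            target = base + i + 5 + rel   # relative to the next instruction
            if base <= target < end:      # bounds check: must stay in section
                targets.add(target)
    return targets

# call +5 / 5x nop / ret / jmp +0x1000 (out of bounds, gets dropped)
blob = bytes.fromhex("E805000000" "9090909090" "C3" "E900100000")
found = relative_branch_targets(blob, 0x401000)
```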

Then, I would traverse each pointer with a simple emulator that looked for sane code. The SIB byte was processed as an array of pointers-to-functions, because it was too expensive to parse otherwise. Entries that were deemed non-functions were null-ed out (again, to avoid moving mass quantities of data). In the end, the arrays were collapsed.

Anyway, it identified functions right up there with IDA, and had a run time of a few seconds for a 13MB exe. However, it had problems (which is why I went to IDA). The first problem was that it couldn't see jump tables. This could have been fixed by handling jump *pointer, but I never got that working correctly. It also didn't see VMTs, and functions ending in jmp.

But it was very fast... Adding the MMX sliding window search brought the speed up to the point it was calculating all it did in roughly the same amount of time it took to bring up the target program.

Basically, the idea was to make the bulk of the searching as fast as possible, even at the expense of false positives. Then, looking only at the subset, determine what was real and what was artifact. An MMX sliding window, complete with bounds checking, is only a couple of clocks. Being able to eliminate pointers to data en masse via a known-bad list helped too. But even with all my tricks, I wasn't able to get it to do what I needed, and ended up using IDA pretty much exclusively when reversing (as opposed to cracking, where Olly and SICE reign supreme).

It was a very long time ago, I wish I still had it because of the months of work I poured into it. Some of it is still around, one of the pattern matchers it had became a wicked data parser for the company I work at, dealing with data about 12 times faster than their old hashed string list code.

Oh well. Anyway, about the biggest piece of advice I can give is "realloc is evil" when dealing with lots of data. Find other ways.

Aimless

June 7th, 2004, 23:38

There are 2 parts to your question.

1. Can we output a list of all functions from an .EXE - Answer: Yes you can. Load it in IDA, write an IDC script that parses the entire disassembly and outputs the list to a file. Alternatively, search for the CALL opcode in Ollydbg and output the same. Results are the same.

2. It is very difficult to write the parameters of a function from the file, as it is impossible to determine the number of parameters used by a *user* function. WinAPI is easy, of course.

Have Phun

homersux

June 8th, 2004, 10:08

It's probably faster with olly in a semi-dynamic-static way. Only drawback is that olly has to be able
to load it to work.

Clandestiny

June 10th, 2004, 10:41

Hiya,

I assume you are proposing to build a dynamic list of functions, a sort of coarse-grained profiler / tracer at the function call level. This is indeed doable. I have written a program which does exactly this. It is written as an IDA plugin, but it has the functionality of a simple debugger. An IDA plugin is really an ideal solution to this type of problem because it allows you to combine IDA's great static analysis data with a stand-alone program capable of using it to build a runtime analysis. The list of functions in the target executable is obtained from the IDA database (let IDA do the grunt work) and the target is then loaded under the control of my debugger.

Before the process is set to the running state, int 3 style breakpoints are set on each of the function addresses, which are hit and logged dynamically as the program executes. The number of arguments for each function can also be obtained from the IDA database, so it is a simple matter to extract them from the stack in the breakpoint handler as they're hit. As such it can function as a generic function monitor, capturing local programmer-defined functions, static library functions, and the usual API calls.
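The argument extraction step can be illustrated with a small Python sketch. This is not the plugin's code. It assumes a 32-bit x86 target with stack-passed arguments (CDECL/STDCALL), a breakpoint on the very first instruction of the function so that ESP still points at the return address, and a `stack` buffer that some debugger has already read from the target starting at ESP. The helper name `read_entry_args` is hypothetical.

```python
import struct

def read_entry_args(stack: bytes, arg_count: int):
    """Decode the return address and arg_count DWORD arguments.

    'stack' is a snapshot of target memory starting at ESP, taken when the
    breakpoint on the function's first instruction is hit:
        [ESP]     return address
        [ESP+4]   first argument
        [ESP+8]   second argument, and so on
    """
    needed = 4 * (arg_count + 1)
    if len(stack) < needed:
        raise ValueError(f"need {needed} bytes of stack, got {len(stack)}")
    values = struct.unpack_from(f"<{arg_count + 1}I", stack)
    return values[0], list(values[1:])

# Made-up snapshot: return address 0x401234, then arguments 7 and 0xDEADBEEF
snapshot = struct.pack("<3I", 0x401234, 7, 0xDEADBEEF)
ret_addr, args = read_entry_args(snapshot, 2)
```

FASTCALL functions would additionally need ECX/EDX from the thread context, which this sketch ignores.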

There are also quite a few additional possibilities for automation regarding input tracing across function parameters / returns (can we say "bringing serial sniffing into the 21st century", heh, heh).

Some type of scriptability would also be possible (i.e. tagging the output of a specific function and tracing it as input to subsequent functions). These types of techniques for automated reverse engineering seem to be underutilized in the protections reversing community, but they are quite advanced in the vulnerability reversing community. Consider the possibilities for discovering something like a heap overflow if you can dynamically script a trace to tag the output of any heap creation functions and then trace the input parameters of subsequent functions for that tag to determine where the heap handle is being used. In this manner, one can quickly narrow down the vulnerability space to a subset of the program state space.
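A toy version of that tag-and-trace idea, run over an already captured call log, might look like the Python sketch below. The `Call` record layout, the function name `trace_tagged_values`, and the sample log values are all invented for illustration; only the Win32 API names are real. Matching is by raw value, so a coincidentally equal argument (or a tagged return value of 0) would show up as a false hit.

```python
from typing import NamedTuple

class Call(NamedTuple):
    name: str     # function that was hit
    args: tuple   # argument values captured at entry
    ret: int      # return value captured at exit

def trace_tagged_values(calls, sources):
    """Tag return values of 'sources' and report later calls that take them.

    Returns a list of (call index, function name, argument position,
    name of the source function that produced the value).
    """
    tagged = {}   # value -> source function name
    uses = []
    for index, call in enumerate(calls):
        for pos, arg in enumerate(call.args):
            if arg in tagged:
                uses.append((index, call.name, pos, tagged[arg]))
        if call.name in sources:
            tagged[call.ret] = call.name
    return uses

# Made-up log: one heap is created, then its handle is passed around
log = [
    Call("HeapCreate", (0, 0x1000, 0), 0x150000),
    Call("HeapAlloc", (0x150000, 0, 64), 0x150688),
    Call("strlen", (0x12FF00,), 5),
    Call("HeapFree", (0x150000, 0, 0x150688), 1),
]
uses = trace_tagged_values(log, {"HeapCreate"})
```

Adding "HeapAlloc" to the source set would extend the trace to every allocated block as well as the heap handle.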

Cheers,
Clandestiny