View Full Version : Industrial-Grade Binary-Only Profiling and Coverage


OpenRCE_RolfRolles
February 16th, 2008, 23:51
There are a few options for profiling or performing code-coverage analysis on a per-module binary level:

* Run traces (very slow and generate a huge amount of uninteresting data, but it works);
* MSR tracing (strengths and weaknesses remain to be seen, but seems fairly promising);
* BinNavi/CoverIt/PaiMei/presumably Inspector: put a breakpoint on every function you found in a static disassembly (doesn't work in general; I explained why here ("http://www.openrce.org/forums/posts/716")).

There are more options rooted in academia, the most practical of which is dynamic binary instrumentation (DBI), the technology behind tools such as valgrind ("http://www.valgrind.org") and DynamoRIO ("http://www.cag.lcs.mit.edu/dynamorio/"). The inner workings of this technology are very interesting, but they are rather involved, and the precise technical details are beyond the scope of this entry. Informally speaking, these tools disassemble a basic block, convert the instructions into an intermediate language (IL) like the ones you find inside of a compiler, and finally re-compile the IL with the "instrumentation" code baked directly into the new assembly language. For more information, read the original Ph.D. thesis describing Valgrind ("http://valgrind.org/docs/phd2004.pdf") and then read the source to libVEX, a component thereof. Valgrind is slow and Linux-only, but DynamoRIO was specifically designed with speed in mind (hence the "Dynamo") and runs on Windows.
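To make the "instrumentation baked into the block" idea a bit more concrete, here is a toy sketch in Python. It is not how Valgrind or DynamoRIO represent code: the tuple-based IL, the instrument_block function and the log_call pseudo-op are all made up for illustration. It only shows the general shape of a pass that walks a basic block and emits extra code in front of every call.

```python
# Toy model of an instrumentation pass over one basic block.
# The (mnemonic, operand) "IL" and the "log_call" pseudo-op are hypothetical;
# real DBI frameworks use far richer instruction representations.
from typing import List, Tuple

Instr = Tuple[str, str]  # (mnemonic, operand text)


def instrument_block(block: List[Instr]) -> List[Instr]:
    """Return a new block with a logging pseudo-op placed before every call."""
    out: List[Instr] = []
    for mnemonic, operand in block:
        if mnemonic == "call":
            # The instrumentation sees the same operand the call will use,
            # so it works for direct and indirect calls alike.
            out.append(("log_call", operand))
        out.append((mnemonic, operand))
    return out


block = [("mov", "eax, [ebx]"), ("call", "eax"), ("ret", "")]
instrumented = instrument_block(block)
```

The original block is left untouched and a new one is produced, which loosely mirrors how these tools execute a rewritten copy of the code rather than patching the program in place.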

Here ("http://www.openrce.org/repositories/users/RolfRolles/DRProfileCoverageTool.rar") I present a DynamoRIO extension for code coverage and profiling. It works at the function level (although block-level support could be added easily -- the source weighs in at a measly 70 lines in 2 KB, so if you want some other feature, just code it), and it can serve as either a profiler or a code coverage analyzer. All it does is instrument the code such that each call instruction, direct or indirect, writes its source and target addresses into a file. This data can then be used for either purpose: discard all of the duplicates for code coverage, and use the data as-is for profiling. This is just the back-end, but I imagine that it could be easily integrated into PaiMei's front end to provide an industrial-grade coverage and profiling tool.
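The post-processing described above can be sketched in a few lines of Python. The log format here is an assumption: one call per line, written as two whitespace-separated hex addresses (source, then target). The tool's real output format may differ, and the function names parse_call_log, coverage and profile are made up for this example.

```python
# Sketch of turning a call log into coverage and profiling data.
# ASSUMPTION: each log line is "<source-hex> <target-hex>"; adjust the parser
# to whatever the extension actually writes.
from collections import Counter
from typing import Iterable, List, Set, Tuple

CallPair = Tuple[int, int]  # (source address, target address)


def parse_call_log(lines: Iterable[str]) -> List[CallPair]:
    """Parse log lines into (source, target) pairs, skipping malformed lines."""
    pairs: List[CallPair] = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue
        pairs.append((int(fields[0], 16), int(fields[1], 16)))
    return pairs


def coverage(pairs: Iterable[CallPair]) -> Set[int]:
    """Coverage: discard duplicates, keeping the set of functions reached."""
    return {target for _, target in pairs}


def profile(pairs: Iterable[CallPair]) -> Counter:
    """Profiling: keep duplicates and count how often each function is called."""
    return Counter(target for _, target in pairs)


sample = ["00401010 00402000", "00401020 00402000", "00401010 00403000", ""]
pairs = parse_call_log(sample)
covered = coverage(pairs)
hits = profile(pairs)
```

For a real run you would pass an open file object to parse_call_log instead of the sample list. Keying on the target address gives function-level results; keying on the whole pair would give call-edge results instead.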

Strengths of DynamoRIO:
* speed (you might not even notice the slowdown);
* stability (there used to be a commercial security product based on this technology -- it is literally industrial grade);
* trivial to code extensions for (70 lines, 2 KB for this simple yet powerful extension).

Weaknesses:
* definitely won't work with self-modifying code;
* probably won't work with obfuscated or "self-protecting" code (there's particularly a problem with so-called "pc-relative" addressing, such as call $+5 / pop ebp).

Studious readers may note that automatic indirect call resolution is exceptionally useful for C++ reverse engineering; comment out the direct call resolution, recompile, write a quick IDC script to add the x-refs to the disassembly listing, and you've got a killer C++ RE tool. Credit goes to spoonm for having and implementing this idea initially.
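As a rough illustration of the "quick IDC script" step, here is a Python sketch that turns logged indirect calls into an IDC script. It assumes the (source, target) pairs have already been read from a log that contains only indirect calls; make_idc is a made-up name. AddCodeXref and fl_CN are the IDC function and flow-type constant as I recall them from idc.idc, so check them against your IDA version before relying on this.

```python
# Sketch: generate an IDC script that adds call x-refs for resolved
# indirect calls. The input pairs and the function name are hypothetical.
from typing import Iterable, Tuple


def make_idc(pairs: Iterable[Tuple[int, int]]) -> str:
    """Build IDC source with one AddCodeXref line per unique (source, target)."""
    lines = ["#include <idc.idc>", "", "static main() {"]
    for source, target in sorted(set(pairs)):
        # fl_CN is assumed to be IDC's "call near" flow type.
        lines.append("    AddCodeXref(0x%X, 0x%X, fl_CN);" % (source, target))
    lines.append("}")
    return "\n".join(lines) + "\n"


script = make_idc([
    (0x401010, 0x402000),
    (0x401010, 0x402000),  # duplicate, emitted only once
    (0x401020, 0x403000),
])
```

Write the resulting string to a .idc file and run it from IDA; each virtual or function-pointer call site seen at run time then gets an x-ref to the function it actually reached.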

https://www.openrce.org/blog/view/1061/Industrial-Grade_Binary-Only_Profiling_and_Coverage

dELTA
February 17th, 2008, 06:15
Very nice contribution, RolfRolles, and also very relevant to the current discussions on this topic in the thread at:

http://www.woodmann.com/forum/showthread.php?t=11306



CRCETL:
http://www.woodmann.com/collaborative/tools/Profile_Coverage_Tool