Dynamic Binary Code and Data Flow Analysis Instrumentation


BanMe
July 31st, 2010, 15:58
So I've been integrating Boomerang into Sin32, and I am releasing all future code under the BSD and GPL licenses referenced therein.

In doing this I don't want to use the GC stuff or the weird LOG class provided for logging all the important information gleaned from this project, so a reimplementation of that is needed (all 367 or so calls that I commented out), as well as a reimplementation of the GUI. Removing Qt was fun. But reworking the controller GUI to also view the output of the server is the primary goal. And as seen in my post in rekindled hope (maybe), I'm trying to probe for remote console allocation for output as well as input commands.

For the most part I am done with getting it to compile correctly. Now I have to make the code examine 'mapped binary portions' instead of 'binary files', which isn't anything 'really different' from what it does anyway; my method is just runtime based ...

But I know the benefits from including this marvelous little tool will be great. There is so much to be done, but I will give you the source and the 'complete compiling project' for VS 2k5. This update only runs what has been released in the past for the 'LPC Server portion', maybe with minor updates; expect a BIG update in that regard soon.

BanMe
August 9th, 2010, 11:12
I am adding a console for testing purposes. This is only for testing the command-line interface (looking for testers too, btw), but it is still being refactored to do both static and dynamic analysis. It currently only does static analysis and requires a path to the exe, so some of this will be recognizable and small portions of it are obsolete. The portion I am focusing on is changing the app into something similar to command.com or cmd.exe. So much still to do..
Code:

int Boomerang::commandLine(int argc, const char **argv)
{
char line[1024];
printf("Sin32_Boomerang %s\n", VERSION);// Display a version and date (mainly for release versions)
printf("Sin32_Boomerang: ");
while (fgets(line, sizeof(line), stdin)) {
int argc = splitLine(line,(char***)&argv);
// if (parseCmd(argc, (const char **)argv) == 2)
// return 2;
if(ParseInputCmds(argc,(const char**)argv))
{
printf("Sin32_Boomerang: ");
fflush(stdout);
}
}
return 0;
}
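
The loop above relies on splitLine() to turn one line of input into argc/argv. A minimal sketch of that step, assuming plain whitespace-separated words: splitWords() and toArgv() are hypothetical names for illustration, not the Boomerang functions.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boomerang): split one input line into
// whitespace-separated words. The trailing newline left by fgets() counts
// as whitespace, so it is dropped automatically.
std::vector<std::string> splitWords(const std::string& line)
{
    std::vector<std::string> words;
    std::istringstream in(line);
    std::string word;
    while (in >> word)
        words.push_back(word);
    return words;
}

// Build a C-style argv view over the words. The pointers stay valid only
// while 'words' is alive and unmodified.
std::vector<const char*> toArgv(const std::vector<std::string>& words)
{
    std::vector<const char*> argv;
    for (size_t i = 0; i < words.size(); ++i)
        argv.push_back(words[i].c_str());
    return argv;
}
```

With this, "ad en 0x7c904020" becomes three words, and an empty line yields argc == 0, which the parser has to check for before touching argv[0].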


Don't expect much change to the internal commands (beyond better detail or naming them). Source is below...

regards BanMe

BanMe
August 12th, 2010, 16:07
Code:

int Boomerang::ParseInputCmds(int argc,const char**argv)
{
char CmdIntuit[256];
int kmd = 0;
/*keeping this for later :]
case 'g':
if(argv[i][2]=='d')
dotFile = argv[++i];
else if(argv[i][2]=='c')
generateCallGraph=true;
else if(argv[i][2]=='s') {
generateSymbols=true;
stopBeforeDecompile=true;
}
break;
*/
if(argc > 0 && strlen(argv[0]) < sizeof(CmdIntuit)) // first word must exist and fit the buffer
{
strcpy(CmdIntuit,argv[0]);
}
else
{
return 0;
}
switch(CmdIntuit[0])
{
//alphabetical lower or upper(must be one or other..for now ;P)
//case handler for commands.
case 'a':
{
if(CmdIntuit[1] == 'd')
{
if(argc <= 1)
{
usage();
return 0;
}
else if(argc <=2)
{
usage();
return 0;
}
else
{
if(argv[1][0] == 'e')
{
if(argv[1][1] == 'n')
{
noDecodeChildren = true; // NOTE: help() says lowercase "add entry" also decodes callees
ADDRESS addr;
int n;
decodeMain = false;
if(argv[2][0] == '0' && argv[2][1] == 'x')
{
n = sscanf(argv[2], "0x%x", &addr);
} else {
n = sscanf(argv[2], "%i", &addr);
}
if (n != 1)
{
std::cerr << "bad address: " << argv[2] << std::endl;
return 0;
}
entrypoints.push_back(addr);
return 1;
}
}
}
}
usage();
return 0;
}
case 'A':
{
if(CmdIntuit[1] == 'D')
{
if(argc <= 1)
{
usage();
return 0;
}
else if(argc <=2)
{
usage();
return 0;
}
else
{
if(argv[1][0] == 'E')
{
if(argv[1][1] == 'N')
{
noDecodeChildren = true;
ADDRESS addr;
int n;
decodeMain = false;
if(argv[2][0] == '0' && argv[2][1] == 'X')
{
n = sscanf(argv[2], "%x", &addr); // %x accepts the 0X prefix; a literal "0x" in the format would not match it
} else {
n = sscanf(argv[2], "%i", &addr);
}
if (n != 1)
{
std::cerr << "bad address: " << argv[2] << std::endl;
return 0;
}
entrypoints.push_back(addr);
return 1;
}
}
}
}
usage();
return 0;
}
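case 'b':
case 'B':
case 'c':
case 'C':
return 0; // no commands on these letters yet; do not fall through into 'd'
case 'd':
{
if(CmdIntuit[1] == 'f')
{
dfaTypeAnalysis = true;
return 1;
}
return 0; // unrecognised 'd' command
}
case 'D':
{
if(CmdIntuit[1] == 'F')
{
dfaTypeAnalysis = true;
return 1;
}
return 0; // unrecognised 'D' command
}
case 'e':
case 'E':
case 'f':
case 'F':
case 'g':
case 'G':
return 0; // no commands on these letters yet; do not fall through into 'h'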
case 'h':
{
switch(CmdIntuit[1])
{
case 'e':
{
if(argc <= 1)
{
help();
return 1;
}
else
{
helpcmd();
return 1;
}
}
default:
return 0;
}
}
case 'H':
{
switch(CmdIntuit[1])
{
case 'E':
{
if(argc <= 1)
{
help();
return 1;
}
else
{
helpcmd();
return 1;
}
}
default:
return 0;
}
}
case 'i':
case 'I':
case 'j':
case 'J':
case 'k':
case 'K':
case 'l':
case 'L':
case 'm':
case 'M':
case 'n':
case 'N':
case 'o':
case 'O':
case 'p':
case 'P':
case 'q':
case 'Q':
case 'r':
case 'R':
case 's':
case 'S':
return 0; // no commands on 'i'..'S' yet; do not fall through into 't'
case 't':
{
switch(CmdIntuit[1])
{
case 'c':
{
conTypeAnalysis = true; // -Tc: use old constraint-based type analysis
dfaTypeAnalysis = false;
return 1;
}
default:
return 0;
}
}
case 'T':
{
switch(CmdIntuit[1])
{
case 'C':
{
conTypeAnalysis = true; // -Tc: use old constraint-based type analysis
dfaTypeAnalysis = false;
return 1;
}
default:
return 0;
}
}
case 'u':
case 'U':
return 0; // no commands on 'u'/'U' yet; do not fall through into 'v'
case 'v':
{
switch(CmdIntuit[1])
{
case 'e':
{
vFlag = true;
return 1;
}
default:
return 0;
}
}
case 'V':
{
switch(CmdIntuit[1])
{
case 'E':
{
vFlag = true;
return 1;
}
default:
return 0;
}
}
case 'w':
case 'W':
case 'x':
case 'X':
case 'y':
case 'Y':
case 'z':
case 'Z':
default:
return 0;
}
/*

case 'o': {
outputPath = argv[++i];
char lastCh = outputPath[outputPath.size()-1];
if (lastCh != '/' && lastCh != '\\')
outputPath += '/'; // Maintain the convention of a trailing slash
break;
}
case 'n':
switch(argv[i][2]) {
case 'b':
noBranchSimplify = true;
break;
case 'c':
noDecodeChildren = true;
break;
case 'd':
noDataflow = true;
break;
case 'D':
noDecompile = true;
break;
case 'l':
noLocals = true;
break;
case 'n':
noRemoveNull = true;
break;
case 'P':
noPromote = true;
break;
case 'p':
noParameterNames = true;
break;
case 'r':
noRemoveLabels = true;
break;
case 'R':
noRemoveReturns = true;
break;
case 'g':
noGlobals = true;
break;
case 'G':
break;
default:
help();
}
break;

case 'E':
noDecodeChildren = true;
// Fall through
case 'e':
{
ADDRESS addr;
int n;
decodeMain = false;
if (++i == argc) {
usage();
return 1;
}
if (argv[i][0] == '0' && argv[i][1] == 'x') {
n = sscanf(argv[i], "0x%x", &addr);
} else {
n = sscanf(argv[i], "%i", &addr);
}
if (n != 1) {
std::cerr << "bad address: " << argv[i] << std::endl;
}
entrypoints.push_back(addr);
}
break;
case 's':
{
if (argv[i][2] == 'f') {
symbolFiles.push_back(argv[i+1]);
i++;
break;
}
ADDRESS addr;
int n;
if (++i == argc) {
usage();
return 1;
}
if (argv[i][0] == '0' && argv[i][1] == 'x') {
n = sscanf(argv[i], "0x%x", &addr);
} else {
n = sscanf(argv[i], "%i", &addr);
}
if (n != 1) {
std::cerr << "bad address: " << argv[i] << std::endl;
exit(1);
}
const char *nam = argv[++i];
symbols[addr] = nam;
}
break;
case 'd':
switch(argv[i][2]) {
case 'a':
printAST = true;
break;
case 'c':
debugSwitch = true;
break;
case 'd':
debugDecoder = true;
break;
case 'g':
debugGen = true;
break;
case 'l':
debugLiveness = true;
break;
case 'p':
debugProof = true;
break;
case 's':
stopAtDebugPoints = true;
break;
case 't': // debug type analysis
debugTA = true;
break;
case 'u': // debug unused locations (including returns and parameters now)
debugUnused = true;
break;
default:
help();
}
break;
case 'm':
if (++i == argc) {
usage();
return 1;
}
sscanf(argv[i], "%i", &maxMemDepth);
break;
case 'i':
if (argv[i][2] == 'c')
decodeThruIndCall = true; // -ic;
if (argv[i][2] == 'w') // -iw
if (ofsIndCallReport) {
std::string fname = getOutputPath() + "indirect.txt";
ofsIndCallReport = new std::fstream(fname.c_str());
}
break;
case 'L':
if (argv[i][2] == 'D')
#if USE_XML
loadBeforeDecompile = true;
#else
std::cerr << "LD command not enabled since compiled without USE_XML\n";
#endif
break;
case 'S':
if (argv[i][2] == 'D')
#if USE_XML
saveBeforeDecompile = true;
#else
std::cerr << "SD command not enabled since compiled without USE_XML\n";
#endif
else {
sscanf(argv[++i], "%i", &minsToStopAfter);
}
break;
case 'k':
kmd = 1;
break;
case 'P':
progPath = argv[++i];
if (progPath[progPath.length()-1] != '\\')
progPath += "\\";
break;
case 'a':
assumeABI = true;
break;
case 'l':
if (++i == argc) {
usage();
return 1;
}
sscanf(argv[i], "%i", &propMaxDepth);
break;
default:
help();
}
}

setOutputDirectory(outputPath.c_str());

if (kmd)
return cmdLine();
*/
return decompile(argv[argc-1]);
}


New parser.. still more to do, as can be seen...
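
The address handling in the parser branches on a "0x" prefix and then calls sscanf. A possible alternative, sketched here under the assumption that an address fits in an unsigned long, is to let strtoul detect the base and reject trailing junk. parseAddress() is a hypothetical helper, not something from the Boomerang source.

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: parse an address written in decimal, octal (leading 0)
// or hex (leading 0x or 0X). Returns true and stores the value only if the
// string starts with a digit, was consumed completely, and did not overflow.
bool parseAddress(const char* text, unsigned long& out)
{
    if (text == 0 || !std::isdigit(static_cast<unsigned char>(*text)))
        return false;                                  // rejects "", signs, leading spaces
    errno = 0;
    char* end = 0;
    unsigned long value = std::strtoul(text, &end, 0); // base 0: auto-detect the prefix
    if (end == text || *end != '\0' || errno == ERANGE)
        return false;                                  // nothing parsed, junk left, or overflow
    out = value;
    return true;
}
```

This treats "0x7c904020" and "0X7C904020" the same way, so the lowercase and uppercase command branches would not need separate prefix checks.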

BanMe
August 13th, 2010, 13:32
So with the new parser I needed to modify the help for commands.

This parser only reads the first 2 letters of each word and goes off of that. This can be easily expanded, but there's no need yet. So keep in mind that 'add entry 0x07c904020' can instead be written as 'ad en 0x7c904020' as a shortcut. Here is the current help as I've modified it.. :]
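
The two-letter matching can be sketched as reducing every command word to its first two characters before comparing. abbreviate() is a hypothetical helper for illustration; case is kept as typed because the lowercase and uppercase forms are handled as separate commands in this thread.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: reduce the first 'maxWords' words of a command to
// their first two characters, joined by single spaces
// ("add entry 0x7c904020" with maxWords = 2 gives "ad en").
// Arguments after the command words are ignored and never shortened.
std::string abbreviate(const std::string& command, size_t maxWords)
{
    std::istringstream in(command);
    std::string word, key;
    for (size_t n = 0; n < maxWords && (in >> word); ++n) {
        if (!key.empty())
            key += ' ';
        key += word.substr(0, 2);   // words shorter than 2 chars are kept whole
    }
    return key;
}
```

Both 'add entry 0x07c904020' and 'ad en 0x7c904020' reduce to the same key "ad en", so a lookup on the key would find the same command for either spelling.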

I'm after changes to 'wording' and 'ideas' (I don't need code, just an idea and a direction) for commands one would want to have.
Please post your proposed changes or ideas here...

Code:

void Boomerang::help() {
std::cout << "Symbols\n";
std::cout << " add symbol <addr> <name> : Define a symbol\n";
std::cout << " ADD SYMBOL <addr> <name> : Define a symbol\n";

std::cout << " load symbols <filename> : Read a symbol/signature file\n";
std::cout << " LOAD SYMBOLS <filename> : Read a symbol/signature file\n";

std::cout << "Decoding/decompilation options\n";
std::cout << " add entry <addr> : Decode the procedure beginning at addr, and callees\n";
std::cout << " ADD ENTRY <addr> : Decode the procedure at addr, no callees\n";
std::cout << " decode indirect calls : Decode Indirect Calls\n";//ic
std::cout << " DECODE INDIRECT CALLS : Decode Indirect Calls\n";
std::cout << " trace : Trace (print address of) every instruction decoded\n";
std::cout << " TRACE : Trace (print address of) every instruction decoded\n";
std::cout << " type constraint analysis : Use constraint-based type analysis\n";
std::cout << " TYPE CONSTRAINT ANALYSIS : Use constraint-based type analysis\n";
std::cout << " data flow analysis : Use data-flow-based type analysis\n";
std::cout << " DATA FLOW ANALYSIS : Use data-flow-based type analysis\n";
std::cout << " -a : Assume ABI compliance\n";
std::cout << " -W : Windows specific decompilation mode (requires pdb information)\n";
// std::cout << " -pa : only propagate if can propagate to all\n";
//std::cout << "Output\n";
std::cout << " verbose : Set verbose output\n";
std::cout << " VERBOSE : Set verbose output\n";
std::cout << " help : This help\n";
std::cout << " HELP : This help\n";

std::cout << " -o <output path> : Where to generate output (defaults to ./output/)\n";
std::cout << " -x : Dump XML files\n";
std::cout << " -r : Print RTL for each proc to log before code generation\n";
std::cout << " -gd <dot file> : Generate a dotty graph of the program's CFG and DFG\n";
std::cout << " -gc : Generate a call graph (callgraph.out and callgraph.dot)\n";
std::cout << " -gs : Generate a symbol file (symbols.h)\n";
std::cout << " -iw : Write indirect call report to output/indirect.txt\n";
std::cout << "Misc.\n";
std::cout << " take command : Activate Command mode, for available commands see help command\n";
std::cout << " TAKE COMMAND : Activate Command mode, for available commands see help command\n";
std::cout << " -P <path> : Path to Boomerang files, defaults to where you run\n";
std::cout << " Boomerang from\n";
std::cout << " -X : activate eXperimental code; errors likely\n";
std::cout << " -- : No effect (used for testing)\n";
std::cout << "Debug\n";
std::cout << " -da : Print AST before code generation\n";
std::cout << " -dc : Debug switch (Case) analysis\n";
std::cout << " -dd : Debug decoder to stdout\n";
std::cout << " -dg : Debug code Generation\n";
std::cout << " -dl : Debug liveness (from SSA) code\n";
std::cout << " -dp : Debug proof engine\n";
std::cout << " -ds : Stop at debug points for keypress\n";
std::cout << " -dt : Debug type analysis\n";
std::cout << " -du : Debug removing unused statements etc\n";
std::cout << "Restrictions\n";
std::cout << " -nb : No simplifications for branches\n";
std::cout << " -nc : No decode children in the call graph (callees)\n";
std::cout << " -nd : No (reduced) dataflow analysis\n";
std::cout << " -nD : No decompilation (at all!)\n";
std::cout << " -nl : No creation of local variables\n";
// std::cout << " -nm : No decoding of the 'main' procedure\n";
std::cout << " -ng : No replacement of expressions with Globals\n";
std::cout << " -nG : No garbage collection\n";
std::cout << " -nn : No removal of NULL and unused statements\n";
std::cout << " -np : No replacement of expressions with Parameter names\n";
std::cout << " -nP : No promotion of signatures (other than main/WinMain/\n";
std::cout << " DriverMain)\n";
std::cout << " -nr : No removal of unneeded labels\n";
std::cout << " -nR : No removal of unused Returns\n";
std::cout << " -l <depth> : Limit multi-propagations to expressions with depth <depth>\n";
std::cout << " -p <num> : Only do num propagations\n";
std::cout << " -m <num> : Max memory depth\n";
}

Also be aware, some of these are now obsolete.. I just haven't done the commenting here yet.. ;p

BanMe
August 14th, 2010, 00:23
Here's the source code from today's work.
Enjoy the small update.. I still need to fix this issue..

1>basicblock.obj : warning LNK4210: .CRT section exists; there may be unhandled static initializers or terminators
1>dataflow.obj : warning LNK4210: .CRT section exists; there may be unhandled static initializers or terminators
1>dfa.obj : warning LNK4210: .CRT section exists; there may be unhandled static initializers or terminators
1>exp.obj : warning LNK4210: .CRT section exists; there may be unhandled static initializers or terminators
1>proc.obj : warning LNK4210: .CRT section exists; there may be unhandled static initializers or terminators
1>signature.obj : warning LNK4210: .CRT section exists; there may be unhandled static initializers or terminators
1>sslparser.obj : warning LNK4210: .CRT section exists; there may be unhandled static initializers or terminators
1>type.obj : warning LNK4210: .CRT section exists; there may be unhandled static initializers or terminators
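
As far as I understand it, LNK4210 means that some object files contain dynamic initializers (or terminators) for static objects, while the CRT start-up code that would normally run them is not part of the link. The sketch below is not taken from the project; it only shows which kind of declaration needs start-up code and one common way around it. All names are made up for illustration.

```cpp
#include <cassert>
#include <string>

// Constant-initialized plain data: the values are stored directly in the
// image, so no code has to run at start-up for this object.
struct Limits { int maxDepth; int maxProps; };
const Limits kLimits = { 5, 1000 };

// A namespace-scope object with a constructor needs a dynamic initializer
// that must run before main(). Objects like this are what the warning
// is about.
std::string gOutputPath("./output/");

// Construct-on-first-use alternative: the pointer itself is zero-initialized
// statically, and the string is created the first time it is requested.
// It is deliberately never destroyed, so no terminator is needed either.
std::string& outputPath()
{
    static std::string* path = 0;
    if (path == 0)
        path = new std::string("./output/");
    return *path;
}
```

Whether replacing globals this way is practical depends on how many static objects each of the listed files has; linking the CRT start-up code is the other usual option.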


I'm gonna start in basicblock and work my way down. Also, this project still depends on the Win32Binary DLL and some other one, so the 'true' internals don't work yet. I've just got to add in the class and modify a few bits of code to make it not use the external DLLs. But after this is complete, and we are finally able to have a lil 'public' testing, the output to the console is going to be immense until I implement my own logger...
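
For the logger, one possible starting point is a tiny stream-style class that forwards everything to a single sink chosen at construction (a file, a string buffer, or the console). This is only a sketch under that assumption; SimpleLog is a hypothetical name and not the LOG class that ships with Boomerang.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical minimal logger: anything streamable is forwarded to the
// std::ostream passed to the constructor. The sink must outlive the logger.
class SimpleLog {
public:
    explicit SimpleLog(std::ostream& sink) : sink_(sink) {}

    // Forward ordinary values (strings, numbers, ...).
    template <typename T>
    SimpleLog& operator<<(const T& value)
    {
        sink_ << value;
        return *this;
    }

    // Forward stream manipulators such as std::endl.
    SimpleLog& operator<<(std::ostream& (*manip)(std::ostream&))
    {
        manip(sink_);
        return *this;
    }

private:
    std::ostream& sink_;
};
```

Pointing the sink at a std::ofstream instead of the console would keep the bulk output out of the command prompt, which is the part that matters for the interactive mode.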