hey ben,
yes, i take care of indirect jumps, but not yet jump tables.
i am even thinking of implementing an analysis to resolve call [eax] - not always possible of course.
with some function scanning and disassembling of caves, most executables already give the same results as objdump, which is probably more luck than anything at the moment.
Quote:
recursive flow is good way, but this takes x amount of time.
no, this is not the case. at least i do not understand why it should be? i update a map on the fly, where i immediately see whether i have already disassembled an area.
so with 3 calls to the same function, it is stepped into only once.
the same goes for call function+27: if the destination address is marked as already disassembled, i do not need to go there anymore.
i create a buffer with the size of the executable. for each instruction i disassemble, i set the buffer byte at the address (offset) of the instruction's first byte to e.g. 1. for the remaining bytes of the instruction i set the corresponding buffer values to 2. so if i disassemble offset 345 and buffer[345]==1,
then i was already exactly there.
using the map i can see even more:
i can see when i am about to disassemble at a point i have covered before, but where the byte i want to start at was in the middle of an instruction.
meaning if buffer[345]==2, then i am attempting to disassemble the "middle" of a previously processed instruction.
example:
addr001 instruction -> buf[001]=1,buf[002]=2,buf[003]=2
addr004 instruction -> buf[004]=1,buf[005]=2
addr006 label1: instruction -> buf[006]=1,buf[007]=2,buf[008]=2
addr009 instruction
at another place:
jmp label1     <------- here i immediately see that i have already
                        disassembled exactly this address:
                        buf[006] == 1
....
....
jmp label1+1   <------- here i immediately see that i have already
                        disassembled the address, but the instruction
                        i disassembled before started one byte earlier,
                        so i am disassembling the middle of an instruction:
                        buf[007] == 2
cool, ey
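a minimal sketch of that map in python, just to illustrate the idea (this is not lida's actual code; the names UNSEEN, INSN_START, INSN_BODY, mark and classify are made up for the example):

```python
# hypothetical marker values, following the 1 / 2 convention described above
UNSEEN, INSN_START, INSN_BODY = 0, 1, 2


def mark(buf, offset, length):
    """record one disassembled instruction of `length` bytes at `offset`."""
    buf[offset] = INSN_START
    for i in range(offset + 1, offset + length):
        buf[i] = INSN_BODY


def classify(buf, offset):
    """tell what the map already knows about `offset`."""
    if buf[offset] == INSN_START:
        return "already disassembled exactly here"
    if buf[offset] == INSN_BODY:
        return "middle of a previously disassembled instruction"
    return "not disassembled yet"


# same layout as the example above: instructions at 1 (3 bytes),
# 4 (2 bytes) and 6 (3 bytes); one map byte per byte of the executable
buf = bytearray(16)
mark(buf, 1, 3)
mark(buf, 4, 2)
mark(buf, 6, 3)

print(classify(buf, 6))  # jmp label1
print(classify(buf, 7))  # jmp label1+1
print(classify(buf, 9))  # not reached yet
```

the lookup is a single array access, so checking a jump or call target costs practically nothing.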
Quote:
sometimes the recursive flow can go deep soo much that the stack could not bear it and will crash (theory, but its possible if i am not mistaken)
hehe. you are right. i tested it with nesting over 5000 levels deep and segfaulted only once, because i was using a temp buffer as a local variable. i made that a global var, and since then no problems.
but in theory it is possible of course!
in practice i have not yet found any executable where it happens.
once i have enhanced the algorithm enough i will take the time to build
my own "recursive" data structure for the local vars.
i really see no more comfortable way to disassemble than recursively.
i think you need to do it that way.
the problem is: without recursion i can not create a "complete" list of destination addresses, even in multiple runs, because i can not know where the jumps/calls are located. anyway, i find it the best technique for me, as it takes the same time as plain disassembling from start to end,
or at least the same number of calls to "disassemble_address".
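to make the "stepped into only once" point and the idea of an own "recursive" data structure a bit more concrete, here is a rough python sketch. it is not lida's code: decode() is a made-up stand-in for "disassemble_address" that only knows four real x86 opcodes (nop, ret, jmp rel8, call rel32), and the explicit work list simply replaces the machine stack.

```python
import struct


def decode(code, off):
    """toy decoder: returns (length, targets, falls_through) or None."""
    op = code[off]
    if op == 0x90:                                  # nop
        return 1, [], True
    if op == 0xC3:                                  # ret
        return 1, [], False
    if op == 0xEB and off + 2 <= len(code):         # jmp rel8
        rel = struct.unpack_from("<b", code, off + 1)[0]
        return 2, [off + 2 + rel], False
    if op == 0xE8 and off + 5 <= len(code):         # call rel32
        rel = struct.unpack_from("<i", code, off + 1)[0]
        return 5, [off + 5 + rel], True
    return None                                     # unknown or truncated


def traverse(code, entry):
    """follow the code flow from `entry`, using a work list instead of recursion."""
    buf = bytearray(len(code))      # 0 = unseen, 1 = insn start, 2 = insn body
    work = [entry]                  # explicit stack of addresses still to visit
    decoded = 0                     # number of decode steps that were kept
    while work:
        off = work.pop()
        # stop as soon as we leave the image or reach an already marked byte
        while 0 <= off < len(code) and buf[off] == 0:
            insn = decode(code, off)
            if insn is None:
                break
            length, targets, falls_through = insn
            buf[off] = 1
            for i in range(off + 1, off + length):
                buf[i] = 2
            decoded += 1
            work.extend(targets)    # visit jump/call destinations later
            if not falls_through:
                break
            off += length
    return buf, decoded


# two calls to the same function at offset 11, then ret; two data bytes at the end
code = bytes.fromhex("E806000000 E801000000 C3 90 C3 FFFF")
buf, decoded = traverse(code, 0)
print(list(buf), decoded)
```

the second call finds buf[11] already set and is dropped right away, so each instruction is decoded once, and the nesting depth of the program no longer matters for the stack.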
the good thing nevertheless is that even now, without jump tables, you get
perfect results for nearly all "standard applications, gcc binaries" - and in any situation you get the unrecognized data displayed as
DB xx, xx, xx
from there you can look at the data and disassemble from any address you want. just "d address", "d address+1"... and you see if there is code hidden.
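a small python sketch of how such DB lines could be produced from the map (again only an illustration: db_lines, the address column and the 8 bytes per line are my assumptions here, not lida's real output format):

```python
def db_lines(code, buf, per_line=8):
    """emit 'DB xx, xx, ...' lines for every run of bytes the map never reached."""
    lines = []
    off = 0
    while off < len(code):
        if buf[off] != 0:           # byte belongs to a disassembled instruction
            off += 1
            continue
        start = off
        # collect up to per_line consecutive unseen bytes
        while off < len(code) and buf[off] == 0 and off - start < per_line:
            off += 1
        data = ", ".join(f"{b:02x}" for b in code[start:off])
        lines.append(f"{start:08x}  DB {data}")
    return lines


# nop, ret were disassembled (map value 1), the last three bytes were not
code = bytes([0x90, 0xC3, 0xDE, 0xAD, 0xBE])
buf = bytearray([1, 1, 0, 0, 0])
for line in db_lines(code, buf):
    print(line)
```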
so the goal IS to disassemble as accurately as possible, but there may always be situations where i can not recognize code areas.
something like
call label1
label1:
call function
; after here lets say eax contains a calculated value
pop ebx
add eax, ebx
call [eax] ->>>>>>>> where to go from here??
for that, function scanning would catch the destination if it starts with a stack setup prologue. if not - it would sit in the middle of
DB xx, xx, xx, xx, xx
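a sketch of what such a prologue scan over the leftover bytes could look like in python. assumptions: 32-bit x86, and only the classic "push ebp; mov ebp, esp" prologue in its two common encodings (55 89 E5 and 55 8B EC); scan_prologues is a made-up name and not lida's function.

```python
# push ebp ; mov ebp, esp  - the two usual encodings of the mov
PROLOGUES = (bytes.fromhex("5589e5"), bytes.fromhex("558bec"))


def scan_prologues(code, buf):
    """return offsets in not yet disassembled areas that look like a function start."""
    hits = []
    for off in range(len(code)):
        if buf[off] != 0:           # already covered by the flow analysis
            continue
        if any(code.startswith(p, off) for p in PROLOGUES):
            hits.append(off)
    return hits


# nop, ret (disassembled), two data bytes, then an unreached function
code = bytes.fromhex("90c3 ffff 5589e5 c3")
buf = bytearray(len(code))
buf[0] = buf[1] = 1
print(scan_prologues(code, buf))
```

every hit is only a candidate: the same byte pattern can show up in plain data, and a function without this prologue will not be found this way.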
using the debugger you will see the destination(s) and can then use
"d address"
to update your deadlisting. this is how i planned to use lida
cheers, 0xf001