new linux disassembler


0xf001
August 2nd, 2004, 16:54
hi!

i am currently writing a new linux disassembler, based on mammon_'s libdisasm.
it is currently at a very early stage, but it can already be useful. there are some interesting features to come (at least i think so, haha).

you might have a look at hxxp://lida.sourceforge.net

comments are very welcome. interested ppl of course can join!

cheers, your 0xf001

homersux
August 3rd, 2004, 16:39
looks good, I hope it's not too much to ask for a console CLI version of this tool.

OorjaHalT
August 3rd, 2004, 20:08
The download link seems down

homersux
August 3rd, 2004, 20:28
I downloaded it and tried it. it works ok but it has some bugs.
there is a cli backend that works on a console.

the bug is easy to reproduce,
as root,
dd if=/dev/hda of=mbr count=1 bs=512 (replace /dev/hda with the device that has the boot record, normally it's /dev/hda)
then
lida_end mbr -d 0 512
this doesn't seem to work.

I tried it with another file, it worked. so there is some bug in it.

0xf001
August 4th, 2004, 04:55
hi all!

thanks for your replies!

[OorjaHalT]
the download link has been corrected, anyway it is the standard SF location

i know there are bugs in it - i am working on them. yesterday I also got libdisasm to segfault. damn. i will now update libdisasm (lida uses 0.16, but only disassemble_address from it), which requires some rewriting.

[homersux]
looks good, I hope it's not too much to ask for a console CLI version of this tool.

ahm. hmmm. originally it was intended to be gui based, as you have the navigation there. nevertheless i plan to move much of the perl stuff to the backend, which is CLI based.
as you already tried, this will serve most tasks.
do you have any comments on how you would like to work with a cli version?

regarding your master boot record:
lida_end mbr -d 0 512
this doesn't seem to work.

I tried it with another file, it worked. so there is some bug in it.

currently RAW files are not yet supported. this is a minor effort - I will include it this evening.
you can see that if you use the gui and try to open a raw file.

------

thank you for trying it out. there is definitely much to do and I will update as fast as I can. it is version 0.1 and not intended to be stable or safe to use.
i already have a long todo list. next to come is some cryptoanalysis, automatic disassembling of data section regions and finally flow analysis.

thank you again and please keep posting everything you encounter - this helps a lot!

lifewire
August 4th, 2004, 07:09
the program looks nice. what kind of cryptoanalysis can we expect?

0xf001
August 4th, 2004, 10:01
thanks lifewire

basically in the first run there will be a pattern based "heuristic" scanning like kanal does. This should find common patterns which are used as array initialization for certain algorithms. i am trying to make the search more "fault tolerant" or "fuzzy", so that slight changes to the standard values are still recognized. Researching these patterns takes some time of course.

on the other hand - what is important?
you want to know which algorithms are used, and where. on linux, programs will most likely not implement the algorithms themselves - they will more likely use openssl functions. So possibly the above mentioned scanning is not as valuable as I initially thought. But anyway - I think in the future more effort will be put into SW security on the linux side, e.g. for commercial products.
People will probably try to make finding the algorithms harder by coding them themselves, or at least by changing the default arrays. This is where the above mentioned scanning should help.
Anyway, I would also like to provide an automated "summary overview" of the functions you are interested in.
So even if a program just calls plain functions like md5_init, ... linked from openssl, they should be displayed.
What I also plan is to try some heuristic fingerprint scanning for typical algorithm implementations. This will analyze the code sequences, not the data blocks.
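
The pattern based scanning described above can be sketched roughly as follows. This is only an illustration, not lida's code: the function name `scan_constants`, the `SIGNATURES` table and the `max_mismatch` parameter are assumptions made up for the example. The constants themselves are the standard initial state words of MD5 and SHA-1, and the "fuzzy" part simply tolerates a few changed words.

```python
import struct

# Standard initial state words of MD5 and SHA-1 (little-endian dwords in memory).
# The table layout and names are hypothetical, chosen for this sketch only.
SIGNATURES = {
    "md5 init": [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476],
    "sha1 init": [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0],
}

def scan_constants(data, max_mismatch=1):
    """Return (offset, name, mismatches) for every fuzzy signature hit in data."""
    hits = []
    for name, words in SIGNATURES.items():
        size = 4 * len(words)
        for off in range(len(data) - size + 1):
            found = struct.unpack_from("<%dI" % len(words), data, off)
            # Count how many words differ from the reference table.
            mismatches = sum(1 for a, b in zip(found, words) if a != b)
            if mismatches <= max_mismatch:
                hits.append((off, name, mismatches))
    return hits
```

With `max_mismatch=0` only exact tables are reported. A larger tolerance also catches slightly modified tables, at the price of more false positives, so the reported mismatch count is probably worth showing to the user.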

Any further comments - please let me know! As mentioned, interested people are welcome to take over certain parts, or submit ideas, fingerprints, ...

cheers, 0xf001

Polaris
August 4th, 2004, 11:52
Quote:
[Originally Posted by 0xf001] Any further comments - please let me know! As mentioned, interested people are welcome to take over certain parts, or submit ideas, fingerprints, ...

cheers, 0xf001


Your project looks really good and very promising... Please include support for plugins and an SDK, it will make it easier to expand the tool and develop new ideas.

Byez,

Polaris

0xf001
August 4th, 2004, 19:16
heya!

i have uploaded a new version. this includes bugfixes, mainly:
- a libdisasm segfault when decoding certain instructions
- the display of the disassembly: there is now a difference between
mov eax, 80808080
and
mov eax, [80808080]
sorry for that. of course in the second case the operand is the value stored at that memory address.

btw did you realize that ldasm does not distinguish between the two instructions?
this is really annoying.
even objdump does
(mov $0x80808080, %eax and mov 0x80808080, %eax)
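
To make the difference concrete, here is a small sketch that decodes just these two encodings in 32-bit mode. The opcodes are standard x86 (B8 is mov eax, imm32 and A1 is mov eax, moffs32), but the function `decode_mov_eax` is a hypothetical helper for illustration and has nothing to do with how lida or libdisasm decode instructions.

```python
import struct

def decode_mov_eax(code):
    """Decode the two 5-byte forms of 'mov eax, ...' discussed above.

    Returns (intel_syntax, att_syntax), or None for anything else.
    """
    if len(code) < 5:
        return None
    opcode = code[0]
    (value,) = struct.unpack_from("<I", code, 1)
    if opcode == 0xB8:   # mov eax, imm32: loads the constant itself
        return ("mov eax, 0x%08x" % value, "mov $0x%08x, %%eax" % value)
    if opcode == 0xA1:   # mov eax, moffs32: loads the dword stored at that address
        return ("mov eax, [0x%08x]" % value, "mov 0x%08x, %%eax" % value)
    return None
```

For example, `decode_mov_eax(bytes.fromhex("b880808080"))` gives the immediate form and `decode_mov_eax(bytes.fromhex("a180808080"))` gives the memory form, matching the objdump output quoted above.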

cheers, 0xf001

0xf001
August 6th, 2004, 06:47
hi again!

i have now uploaded lida-0.1.4.

this includes the cryptoanalyzer, which currently detects typical implementations of the
ripemd160, md2, md4, md5, blowfish, cast, des, rc2, sha(1+2)
algorithms

it is definitely getting nicer and nicer to work with

cheers, 0xf001

Bengaly
August 6th, 2004, 16:16
hey 0xf,

does lida have a first pass analyzer? (aka code-flow simulation)

0xf001
August 7th, 2004, 06:46
hi ben,

unfortunately not yet. it currently just disassembles the whole section straight forward and remembers certain addresses as jmp/call destinations, plus exported symbols. at least for usual gcc created executables this is usable, but i definitely need to include control flow analysis ...
besides that, the next step together with flow analysis should be the separation of code and data, and somehow letting the user mark address ranges to set the "type" or similar.
i am currently starting to implement it, but do not have much time. for the next version (apart from bugfixes) i hope to have a first basic implementation, but making it intelligent will require a lot of work, i think ... we will see

cheers, 0xf001

0xf001
August 9th, 2004, 20:09
hi ben!

short update: lida now does code flow analysis
it works great (i am impressed by myself, haha)
it traces during disassembly (which recursively goes through all possible branches) and keeps track of what it has already disassembled (remembering the start of each instruction and the bytes it occupies). where possible, indirect addressing is also covered in this run. so pass 1 == disassembly in this case
for this i found a very efficient method (no structs, no storage of address values or similar needed, hehe!)
the second "pass" scans for function prologues and repeats pass 1 starting at each one found.
the third pass examines the remaining holes.
currently i am thinking about how best to find unreferenced "code regions" / "functions", as i see this as the key to "not forgetting" to disassemble certain ranges. i know which ranges i have not yet processed, but just disassembling them could result in disassembling "data blocks". so i want to put "some more analysis" there before attempting to disassemble.
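
A prologue scan like the second "pass" could look roughly like this. It is a sketch under assumptions: it only looks for the byte sequence 55 89 e5 (push ebp; mov ebp, esp), which is the usual frame setup in 32-bit gcc output, and the names `PROLOGUE` and `find_prologues` are invented for the example. Functions compiled without a frame pointer would be missed, and the pattern can also appear by chance inside data.

```python
# push ebp; mov ebp, esp - the usual 32-bit gcc frame setup (assumed pattern).
PROLOGUE = bytes.fromhex("5589e5")

def find_prologues(code, base=0):
    """Return the address of every occurrence of PROLOGUE in code.

    base is the address at which the first byte of code is loaded.
    """
    found = []
    pos = code.find(PROLOGUE)
    while pos != -1:
        found.append(base + pos)
        pos = code.find(PROLOGUE, pos + 1)
    return found
```

Each returned address would then be handed to pass 1 as a new starting point, skipping those that are already marked as disassembled.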

the nature of my "pass1" implementation also immediately tells me if there is a "jump into the middle of a previously processed instruction" - which is used as an old antidisassembly "trick".
so i am also implementing logic which tries to overcome that automatically
and will probably let you view both disassemblies, somehow specially marked.

while implementing i find it more and more fun, and the cool thing is that it is still extremely fast (i never repeat any already processed address), although the algorithm has totally changed

greets, 0xf001

Bengaly
August 10th, 2004, 16:01
hi 0xf,

great!
if i had linux i could test it, unfortunately i don't.
but keep up the great work!

0xf001
August 10th, 2004, 17:41
hey ben!

i have now released v 00.02

this includes the control flow disassembling engine, and the gui is also pretty much updated, so i now have the typical separate windows for
strings and symbols where you can click on the list items to jump to the address
in total i am now running more than 5 passes. basically these are

1 - recursive disassembly from the entry point (following all branches, stepping into calls)
2 - a "heuristic" scan finds the main() function for glibc binaries and repeats pass 1 from there
3 - repeat pass 1 from the start of all executable sections
4 - scan for function prologues and repeat pass 1 for each
5 - (optional) disassembling of "caves": this disassembles all bytes between already known code blocks
6 - all caves that still exist (this can happen when disassembling the end of the cave would overwrite a previously disassembled instruction) are displayed in DB xx, xx, xx ... form. if 5 is not done, the whole cave is displayed that way.
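
Steps 5 and 6 need to know where the caves are and how to print them. A minimal sketch, assuming the earlier passes leave behind one "covered" flag per byte; the helper names `find_caves` and `format_db` and the exact output layout are made up for this example and are not lida's.

```python
def find_caves(covered):
    """Return (start, end) ranges, end exclusive, of bytes no pass has claimed.

    covered holds one truthy/falsy flag per byte of the section.
    """
    caves = []
    start = None
    for off, done in enumerate(covered):
        if not done and start is None:
            start = off                  # a new cave begins here
        elif done and start is not None:
            caves.append((start, off))   # the cave ended just before off
            start = None
    if start is not None:
        caves.append((start, len(covered)))
    return caves

def format_db(data, start, end, per_line=8):
    """Render data[start:end] as 'DB xx, xx, ...' lines."""
    lines = []
    for off in range(start, end, per_line):
        chunk = data[off:min(off + per_line, end)]
        lines.append("%08x  DB %s" % (off, ", ".join("%02x" % b for b in chunk)))
    return lines
```

For a coverage list `[True, True, False, False, True, False]` this yields the caves `(2, 4)` and `(5, 6)`, and each of them can be passed to `format_db` if the optional pass 5 is skipped or fails.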

cheers, 0xf001

btw i have new screenshots for you to see the new gui if you are interested.
getting linux on a computer btw is very easy nowadays - hehe
i myself switched totally now - at least for work. but that is another topic.
once you have linux, i hope to already have a good disassembler for you

cheers, 0xf001

Bengaly
August 11th, 2004, 04:15
hi 0x0f,

"i hope to already have a good disassembler for you"
hrm, for me?.. for the linux community

do you trace control flows of indirect jumps and jump tables (e.g. jmp eax, jmp [eax+xx])?
recursive flow is a good way, but this takes x amount of time.
sometimes the recursive flow can go so deep that the stack cannot bear it and will crash (theory, but it's possible if i am not mistaken)

i know i can install linux, but i don't have hdd space for it.
keep up the good work.

0xf001
August 11th, 2004, 07:14
hey ben,

yes, i take care of indirect jumps, but not yet jump tables.
i even think of implementing analysis to check for call [eax] - not always possible of course.

for most executables, function scanning plus disassembling of caves so far gives the same results as objdump, which is probably more luck than anything at the moment.


Quote:
recursive flow is a good way, but this takes x amount of time.


no, this is not the case. at least i do not understand why it should be? on the fly i update a map where i immediately see whether i have already disassembled an area.
so with 3 calls to the same function, it is only stepped into once.
the same goes for call function+27: if the destination address is marked as already disassembled, i do not need to go there anymore.
i create a buffer with the size of the executable. for each instruction i disassemble, i set the byte at the address (offset) of the instruction to e.g. 1. for the remaining bytes of the instruction i set the corresponding buffer values to 2. so if i want to disassemble offset 345 and buffer[345]==1,
then i was already exactly there.
using the map i can see even more:
i see if i want to disassemble at a point which i have disassembled before, but where the byte in question was in the middle of an instruction.
meaning if buffer[345]==2, then i am attempting to disassemble the "middle" of a previously processed instruction.

example:

addr001         instruction  -> buf[001]=1, buf[002]=2, buf[003]=2
addr004         instruction  -> buf[004]=1, buf[005]=2
addr006 label1: instruction  -> buf[006]=1, buf[007]=2, buf[008]=2
addr009         instruction
at another place:
jmp label1      <--- here i immediately see that i have already disassembled
                     exactly this address:
                     buf[006] == 1
....
....
jmp label1+1    <--- here i immediately see that i have already disassembled
                     the address, but the instruction i disassembled before
                     started one byte earlier, so i am disassembling the
                     middle of an instruction:
                     buf[007] == 2

cool, ey
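
The map described above fits in a few lines. This is a sketch of the idea only: the marker values 1 and 2 are the ones from the post, while the names `mark` and `classify` and the returned strings are assumptions for the example.

```python
START, MIDDLE = 1, 2   # marker values from the post; 0 means "not disassembled yet"

def mark(buf, offset, length):
    """Record an instruction of `length` bytes decoded at `offset`."""
    buf[offset] = START
    for i in range(offset + 1, offset + length):
        buf[i] = MIDDLE

def classify(buf, target):
    """Say what a branch to `target` would land on."""
    if buf[target] == START:
        return "already disassembled"
    if buf[target] == MIDDLE:
        return "middle of an instruction"
    return "new code"

# Rebuild the example: instructions at 1 (3 bytes), 4 (2 bytes), 6 = label1 (3 bytes).
buf = bytearray(16)
mark(buf, 1, 3)
mark(buf, 4, 2)
mark(buf, 6, 3)
```

Here `classify(buf, 6)` reports an already disassembled address (jmp label1), `classify(buf, 7)` reports the middle of an instruction (jmp label1+1), and `classify(buf, 9)` reports new code. One byte per byte of the executable is all the bookkeeping that is needed.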

Quote:
sometimes the recursive flow can go so deep that the stack cannot bear it and will crash (theory, but it's possible if i am not mistaken)


hehe. you are right. i tested it with nesting of over 5000 levels, and it segfaulted only once, because i was using a temp buffer as a local variable. i made that a global var, and since then there have been no problems.
but in theory it is possible of course!
in practice i have not yet found any executable where it happens.
once i have enhanced the algorithm enough i will take the time to build
my own "recursive" data structure for the local vars.
i really see no more comfortable way to disassemble than recursively.
i think you need to do so.
the problem is: without recursion i can not create a "complete" list of destination addresses, even in multiple runs. i can not know where jumps/calls are located. anyway i find it the best technique for me, as it takes the same time as plain disassembling from start to end.
at least the same number of calls to "disassemble_address".
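
The own "recursive" data structure mentioned above could be an explicit work list, as in the following sketch. It is not lida's implementation: `decode` is a toy stand-in for a real disassembler call such as disassemble_address and only understands five real x86 opcodes (nop, ret, jmp rel8, je rel8, call rel32), and the names `decode` and `trace` are assumptions. Every address is decoded at most once, so the number of decode calls stays the same as for a plain linear sweep over the reachable code.

```python
import struct

def decode(code, off):
    """Toy decoder for a tiny x86 subset (hypothetical stand-in for a real one).

    Returns (length, branch_targets, falls_through), or None if unknown.
    """
    op = code[off]
    if op == 0x90:                                    # nop
        return 1, [], True
    if op == 0xC3:                                    # ret
        return 1, [], False
    if op in (0xEB, 0x74) and off + 2 <= len(code):   # jmp rel8 / je rel8
        (rel,) = struct.unpack_from("<b", code, off + 1)
        return 2, [off + 2 + rel], op == 0x74
    if op == 0xE8 and off + 5 <= len(code):           # call rel32
        (rel,) = struct.unpack_from("<i", code, off + 1)
        return 5, [off + 5 + rel], True
    return None

def trace(code, entry):
    """Follow all branches from entry with a work list instead of recursion."""
    buf = bytearray(len(code))   # 0 = unseen, 1 = instruction start, 2 = inside one
    work = [entry]
    decoded = 0
    while work:
        off = work.pop()
        while 0 <= off < len(code) and buf[off] == 0:
            insn = decode(code, off)
            if insn is None:
                break
            length, targets, falls_through = insn
            if any(buf[off + 1:off + length]):   # would overwrite an earlier instruction
                break
            buf[off] = 1
            for i in range(off + 1, off + length):
                buf[i] = 2
            decoded += 1
            work.extend(targets)                 # visit branch targets later
            if not falls_through:
                break
            off += length
    return buf, decoded
```

Because pending targets live in a list on the heap rather than in stack frames, the nesting depth of the analyzed program no longer matters. A chain of thousands of nested calls is handled the same way as a flat function.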

the good thing nevertheless is that even now, without jump tables, you get
perfect results for nearly all "standard applications", i.e. gcc binaries - and in any situation you get the unrecognized data displayed as
DB xx, xx, xx
from there you can look at the data and disassemble from any address you want. just "d address", "d address+1" ... and you see if there is code hidden.
so the goal IS to disassemble as accurately as possible, but there may always be situations where i can not recognize code areas.

something like
call label1
label1:
call function
; after here lets say eax contains a calculated value
pop ebx
add eax, ebx
call [eax] ->>>>>>>> where to go from here ??

function scanning would therefore catch the destination if it uses a stack setup prologue. if not - it would be in the middle of
DB xx, xx, xx, xx, xx

using the debugger you will see the destination(s) and can then use
"d address"
to update your deadlisting. this is how i planned to use lida

cheers, 0xf001

0xf001
December 5th, 2004, 17:22
hi!

there's a new version of lida available. i have added

- bookmarks
- dump disassembly to text file
- representation of data sections

and removed
- bugs hehe

hxxp://lida.sourceforge.net

check it out

btw you might find the information on the website about the objdump disadvantage stone-age old, but i really have been asked about that