Backwards disassembling


squidge
June 15th, 2003, 17:00
Does anyone know a good way to disassemble in reverse, i.e. from (say) 402000 backwards, while staying as close as possible to the intended disassembly?

I've tried a few methods, but none of them are particularly brilliant and almost always get out of sync very easily.

The first approach I've tried is going back 16 bytes from the current address (just over the 15-byte maximum x86 instruction length) and then coming forwards, attempting to disassemble, until I get a valid instruction whose size equals the number of bytes I've gone back. E.g. I go back 16 bytes and don't find a 16-byte instruction, so I go back 15 bytes instead, and keep going until I eventually go 2 bytes back and find a two-byte instruction. However, this can get out of sync easily: a trailing null on one instruction can screw up a push that comes after it, for example.
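For reference, a minimal C sketch of that first method. It assumes a toy length decoder: `insn_len` and `modrm_len` are hypothetical helpers that only understand a handful of 32-bit opcodes (no prefixes), standing in for a real disassembler's length decoder. `prev_insn_exact` tries the 16-byte step first and accepts the first candidate whose decoded length ends exactly at the current offset.

```c
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL toy length decoder: 32-bit mode, no prefixes, and only the
   opcodes listed in insn_len(). A real tool would use a full decoder. */

/* Bytes used by a ModRM byte plus any SIB byte and displacement,
   or 0 if they do not fit in `avail`. */
static int modrm_len(const uint8_t *p, size_t avail) {
    if (avail < 1) return 0;
    int mod = p[0] >> 6, rm = p[0] & 7, n = 1;
    if (mod != 3 && rm == 4) {                          /* SIB byte follows */
        if (avail < 2) return 0;
        n++;
        if (mod == 0 && (p[1] & 7) == 5) n += 4;        /* SIB without base: disp32 */
    }
    if (mod == 1) n += 1;                               /* disp8 */
    else if (mod == 2 || (mod == 0 && rm == 5)) n += 4; /* disp32 */
    return (size_t)n <= avail ? n : 0;
}

/* Length of the instruction at p, or 0 if the opcode is outside the toy
   subset or the instruction would need more than `avail` bytes. */
static int insn_len(const uint8_t *p, size_t avail) {
    if (avail < 1) return 0;
    uint8_t op = p[0];
    int n;
    if ((op >= 0x50 && op <= 0x5F) || op == 0x90 || op == 0xC3 || op == 0xC9 || op == 0xCC)
        n = 1;                              /* push/pop r32, nop, ret, leave, int3 */
    else if (op == 0x6A || op == 0xEB || (op >= 0x70 && op <= 0x7F))
        n = 2;                              /* push imm8, jmp rel8, jcc rel8 */
    else if (op == 0x68 || op == 0xE8 || op == 0xE9 || (op >= 0xB8 && op <= 0xBF))
        n = 5;                              /* push imm32, call/jmp rel32, mov r32, imm32 */
    else if (op == 0x01 || op == 0x03 || op == 0x29 || op == 0x2B || op == 0x31 ||
             op == 0x33 || op == 0x39 || op == 0x3B || op == 0x85 || op == 0x89 || op == 0x8B) {
        int m = modrm_len(p + 1, avail - 1); /* add/sub/xor/cmp/test/mov with ModRM */
        n = m ? 1 + m : 0;
    } else if (op == 0x83) {
        int m = modrm_len(p + 1, avail - 1); /* group 1 r/m32, imm8 */
        n = m ? 2 + m : 0;
    } else
        return 0;
    return (size_t)n <= avail ? n : 0;
}

/* First method: step back 16, 15, ..., 1 bytes and accept the first
   candidate whose length ends exactly at `cur`. Longest step wins.
   Returns the previous instruction's offset, or -1 if none fits. */
long prev_insn_exact(const uint8_t *buf, size_t cur) {
    size_t max_back = cur < 16 ? cur : 16;
    for (size_t back = max_back; back >= 1; back--) {
        /* Only the `back` bytes before `cur` may belong to the candidate. */
        if ((size_t)insn_len(buf + cur - back, back) == back)
            return (long)(cur - back);
    }
    return -1;
}
```

With the bytes 55 8B EC 83 EC 08 C3 (push ebp / mov ebp, esp / sub esp, 8 / ret), stepping back from the ret at offset 6 finds the sub at offset 3. Because the longest step is tried first, a bogus longer decode that happens to end at the current offset beats the real, shorter instruction, which is exactly how this method drifts out of sync.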

The second method was to go 16 bytes back and try to disassemble from there, keeping a list of all the addresses seen up to the current address, and then simply take the last one seen as the previous address. It doesn't work too well: if the byte 16 bytes back is actually operand data from another instruction, the last disassembled instruction is likely to be bad too. It gradually gets in sync as you go further up, but it looks ugly while doing so.
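A sketch of the second method, with the same caveat: the `insn_len`/`modrm_len` toy decoder is hypothetical and covers only a few opcodes. The post doesn't say how undecodable bytes were shown, so this version lists each one as a single "db" byte (an assumption). `in_sync` reports whether the walk landed exactly on `cur`.

```c
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL toy length decoder (32-bit, no prefixes, small opcode subset). */
static int modrm_len(const uint8_t *p, size_t avail) {
    if (avail < 1) return 0;
    int mod = p[0] >> 6, rm = p[0] & 7, n = 1;
    if (mod != 3 && rm == 4) {                          /* SIB byte follows */
        if (avail < 2) return 0;
        n++;
        if (mod == 0 && (p[1] & 7) == 5) n += 4;        /* SIB without base: disp32 */
    }
    if (mod == 1) n += 1;                               /* disp8 */
    else if (mod == 2 || (mod == 0 && rm == 5)) n += 4; /* disp32 */
    return (size_t)n <= avail ? n : 0;
}

static int insn_len(const uint8_t *p, size_t avail) {
    if (avail < 1) return 0;
    uint8_t op = p[0];
    int n;
    if ((op >= 0x50 && op <= 0x5F) || op == 0x90 || op == 0xC3 || op == 0xC9 || op == 0xCC)
        n = 1;                              /* push/pop r32, nop, ret, leave, int3 */
    else if (op == 0x6A || op == 0xEB || (op >= 0x70 && op <= 0x7F))
        n = 2;                              /* push imm8, jmp rel8, jcc rel8 */
    else if (op == 0x68 || op == 0xE8 || op == 0xE9 || (op >= 0xB8 && op <= 0xBF))
        n = 5;                              /* push imm32, call/jmp rel32, mov r32, imm32 */
    else if (op == 0x01 || op == 0x03 || op == 0x29 || op == 0x2B || op == 0x31 ||
             op == 0x33 || op == 0x39 || op == 0x3B || op == 0x85 || op == 0x89 || op == 0x8B) {
        int m = modrm_len(p + 1, avail - 1); /* ALU/mov with ModRM */
        n = m ? 1 + m : 0;
    } else if (op == 0x83) {
        int m = modrm_len(p + 1, avail - 1); /* group 1 r/m32, imm8 */
        n = m ? 2 + m : 0;
    } else
        return 0;
    return (size_t)n <= avail ? n : 0;
}

/* Second method: start `window` bytes back (16 in the post), disassemble
   forward over buf[0..n), and return the last start seen before `cur`
   (requires cur <= n). An instruction may straddle `cur`. */
long prev_insn_window(const uint8_t *buf, size_t n, size_t cur, size_t window, int *in_sync) {
    size_t pos = cur > window ? cur - window : 0;
    long last = -1;
    while (pos < cur) {
        int len = insn_len(buf + pos, n - pos);
        last = (long)pos;                 /* every start is listed; only the last is kept */
        pos += len ? (size_t)len : 1;     /* undecodable byte: list it as "db" and move on */
    }
    if (in_sync) *in_sync = (pos == cur);
    return last;
}
```

With B8 6A 90 90 90 50 (mov eax, 0x9090906A / push eax), starting only 4 bytes back from the push begins inside the mov: the walk decodes push 0xFFFFFF90 and two nops, lands cleanly on `cur`, and reports offset 4 instead of 0. Landing on `cur` is therefore no proof that the listing is right.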

Does anyone have any alternatives to these?

Thigo
June 15th, 2003, 18:34
I think I remember an article on www.codeproject.com about the stack. It did backwards disassembly to find the first instruction of the current call. You should be able to find it easily.

tgodd
June 15th, 2003, 23:15
Getting the return address and then fetching the called address from the disassembly just before it would be ideal, but only if you knew the stack adjustment size.

How can you tell parameters (i.e. pointers) from return addresses?
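One partial answer is a plausibility check: a stack value is only worth treating as a return address if the bytes just before it encode a call. A minimal sketch, assuming `buf` is a flat copy of the image loaded at virtual address `base`. `preceded_by_call` is a hypothetical helper that checks E8 rel32 plus two common indirect forms of FF /2 (call reg and call [disp32]) and nothing else, so it can both miss real calls and be fooled by data.

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian 32-bit read. */
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns 1 if `value` points just past a direct call (E8 rel32; the callee
   is stored in *callee if non-NULL), 2 if it follows FF 15 disp32 or
   FF D0..D7 (indirect calls, callee unknown), and 0 otherwise. */
int preceded_by_call(const uint8_t *buf, size_t n, uint32_t base, uint32_t value, uint32_t *callee) {
    if (value < base || value - base > n) return 0;   /* outside the image copy */
    size_t off = value - base;
    if (off >= 5 && buf[off - 5] == 0xE8) {
        /* rel32 is relative to the address after the call, i.e. `value` */
        if (callee) *callee = value + rd32(buf + off - 4);
        return 1;
    }
    if (off >= 6 && buf[off - 6] == 0xFF && buf[off - 5] == 0x15) return 2; /* call [abs32] */
    if (off >= 2 && buf[off - 2] == 0xFF && (buf[off - 1] & 0xF8) == 0xD0) return 2; /* call r32 */
    return 0;
}
```

A value that fails this test is more likely a parameter or local than a return address, although a pass is still only circumstantial evidence.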

Sorry squidge.

I do not know of any truly reliable way to do what it appears you are trying to do.

You will notice that even SoftICE has its problems with this when you page up several times.

However.

Assembly language is like any other.
There are patterns.

If you step back 1 byte you may or may not have a valid opcode; step back 2 bytes and, again, it may or may not be valid.
Let's say you went back 16 bytes. Not all 16 possible starting points would give a valid opcode, and of the ones that are valid, some may have a greater likelihood of being correct based on the already known opcodes around them.

This would require a statistical study of opcode pairings and combinations, and writing a lot of code to accomplish something that still may not be 100% accurate.

Processors were never designed to be clocked backwards.

Let me know what you turn up on this one.

Good Luck!


tgodd

Maldoror
June 16th, 2003, 02:33
Yes, Thigo is right
I remember the article too.
Here it is:
http://www.codeproject.com/tips/stackdumper.asp

Greetings!
Maldoror

dELTA
June 16th, 2003, 11:46
Can you tell us more exactly what the situation is, squidge? Maybe we'd be able to help you better then.

Is this in the code section of an executable? One little trick could be to search backwards for the typical signature of a function prologue (such as push ebp / mov ebp, esp), and then disassemble forward from there until you (hopefully) reach the target address. This will of course only work if the code is located inside a function with such a prologue.
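A sketch of the backwards prologue search, assuming the classic 32-bit frame-pointer prologue push ebp / mov ebp, esp. That mov has two encodings, 8B EC (commonly seen in MSVC output) and 89 E5 (commonly seen in GCC output), so both are accepted. `find_prologue_before` is a hypothetical helper; forward disassembly from the offset it returns would use whatever length decoder you already have.

```c
#include <stddef.h>
#include <stdint.h>

/* Search backwards from `cur`, at most `limit` bytes, for 55 8B EC or
   55 89 E5 lying entirely before `cur`. Returns the offset of the push,
   or -1 if no prologue is found. */
long find_prologue_before(const uint8_t *buf, size_t cur, size_t limit) {
    if (cur < 3) return -1;                      /* no room for a 3-byte pattern */
    size_t lo = cur > limit ? cur - limit : 0;
    /* First candidate is cur - 3, so the pattern ends at or before cur. */
    for (size_t i = cur - 2; i-- > lo; ) {
        if (buf[i] == 0x55 &&
            ((buf[i + 1] == 0x8B && buf[i + 2] == 0xEC) ||   /* mov ebp, esp (8B /r) */
             (buf[i + 1] == 0x89 && buf[i + 2] == 0xE5)))    /* mov ebp, esp (89 /r) */
            return (long)i;
    }
    return -1;
}
```

Nearest match wins, which suits a scrolling view: the closest preceding prologue gives the shortest forward walk to the target.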

I would think that searching backwards for common instruction opcodes, such as the different kinds of jumps, xor reg,reg and similar, and then disassembling forward from these to the target address would be a simple yet very effective method that works most of the time. If at some point the disassembly doesn't make sense, go back to the next such opcode and try again from that one. The statistical chance of hitting two such opcodes in a row that are really just operands of another instruction is not big at all, I'd say, and it should be fairly easy to see whether the disassembly makes sense or not.
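A sketch of the anchor idea, again with a hypothetical toy decoder covering only a few opcodes. The anchor set (E8/E9/EB, jcc rel8, and 31/33 with a register-to-itself ModRM, i.e. xor reg,reg) is an illustrative assumption. Anchors are tried from nearest to farthest, and one is rejected if decoding from it hits an unknown byte or steps over `cur`.

```c
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL toy length decoder (32-bit, no prefixes, small opcode subset). */
static int modrm_len(const uint8_t *p, size_t avail) {
    if (avail < 1) return 0;
    int mod = p[0] >> 6, rm = p[0] & 7, n = 1;
    if (mod != 3 && rm == 4) {                          /* SIB byte follows */
        if (avail < 2) return 0;
        n++;
        if (mod == 0 && (p[1] & 7) == 5) n += 4;        /* SIB without base: disp32 */
    }
    if (mod == 1) n += 1;                               /* disp8 */
    else if (mod == 2 || (mod == 0 && rm == 5)) n += 4; /* disp32 */
    return (size_t)n <= avail ? n : 0;
}

static int insn_len(const uint8_t *p, size_t avail) {
    if (avail < 1) return 0;
    uint8_t op = p[0];
    int n;
    if ((op >= 0x50 && op <= 0x5F) || op == 0x90 || op == 0xC3 || op == 0xC9 || op == 0xCC)
        n = 1;                              /* push/pop r32, nop, ret, leave, int3 */
    else if (op == 0x6A || op == 0xEB || (op >= 0x70 && op <= 0x7F))
        n = 2;                              /* push imm8, jmp rel8, jcc rel8 */
    else if (op == 0x68 || op == 0xE8 || op == 0xE9 || (op >= 0xB8 && op <= 0xBF))
        n = 5;                              /* push imm32, call/jmp rel32, mov r32, imm32 */
    else if (op == 0x01 || op == 0x03 || op == 0x29 || op == 0x2B || op == 0x31 ||
             op == 0x33 || op == 0x39 || op == 0x3B || op == 0x85 || op == 0x89 || op == 0x8B) {
        int m = modrm_len(p + 1, avail - 1); /* ALU/mov with ModRM */
        n = m ? 1 + m : 0;
    } else if (op == 0x83) {
        int m = modrm_len(p + 1, avail - 1); /* group 1 r/m32, imm8 */
        n = m ? 2 + m : 0;
    } else
        return 0;
    return (size_t)n <= avail ? n : 0;
}

/* Anchor: call/jmp rel32, jmp rel8, jcc rel8, or xor reg,reg (mod=11, reg==rm). */
static int is_anchor(const uint8_t *p, size_t avail) {
    uint8_t op = p[0];
    if (op == 0xE8 || op == 0xE9 || op == 0xEB || (op >= 0x70 && op <= 0x7F)) return 1;
    if ((op == 0x31 || op == 0x33) && avail >= 2) {
        uint8_t m = p[1];
        return (m >> 6) == 3 && ((m >> 3) & 7) == (m & 7);
    }
    return 0;
}

/* Scan back at most `limit` bytes for anchors; accept the first one whose
   forward decoding stays valid and lands exactly on `cur` (cur <= n).
   Returns the previous instruction's offset, or -1. */
long prev_insn_anchor(const uint8_t *buf, size_t n, size_t cur, size_t limit) {
    size_t lo = cur > limit ? cur - limit : 0;
    for (size_t a = cur; a-- > lo; ) {
        if (!is_anchor(buf + a, n - a)) continue;
        size_t pos = a;
        long last = -1;
        while (pos < cur) {
            int len = insn_len(buf + pos, n - pos);
            if (len == 0) break;                /* disassembly stopped making sense */
            last = (long)pos;
            pos += (size_t)len;
        }
        if (pos == cur && last >= 0) return last;  /* otherwise try the next anchor back */
    }
    return -1;
}
```

For example, in 33 C0 B8 EB 00 00 00 50 (xor eax, eax / mov eax, 0xEB / push eax) the EB inside the mov's immediate is a false anchor. With this toy decoder, which does not know opcode 00, decoding from it fails, so the search falls back to the xor and returns the push at offset 7. A full decoder would accept 00 00 as an instruction, so real code needs a stronger "makes sense" test than this.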


dELTA

squidge
June 16th, 2003, 14:15
Thanks for the replies. The code at CodeProject is almost identical to my own, but it doesn't go back as far, so it makes more mistakes.

The code is required so that the user may view a disassembled listing on screen and be able to scroll backwards and forwards through that listing.

I've got a reply from bitrake over at the Win32asm forums, and his suggestion of just going back further and letting it synchronize itself seems to work quite well.

Here's what he said for anyone interested:

Quote:

Given a long enough string of bytes to disassemble, the algo will stabilize to actual instructions, but this assumes all the bytes belong to instructions. A simple way is what you are already doing, but I'd back up further. Assuming the average instruction length is 8 bytes (pessimistic on purpose), you're only backing up 2-3 instructions, which is not enough room for a misalignment to synchronize. Go back 64 bytes. Begin disassembling, and if the bytes at the present position are not an instruction, increment the pointer and try again. This will synchronize to the instruction boundaries.
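bitrake's suggestion as a sketch, with the same hypothetical toy decoder: back up `window` bytes (64 in his suggestion) and, whenever the bytes at the current position don't decode, advance one byte without listing them. The `nudge` parameter is a hypothetical addition, not from either post; it shifts the starting point by whole bytes, roughly what a one-byte inc/dec shortcut would feed in.

```c
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL toy length decoder (32-bit, no prefixes, small opcode subset). */
static int modrm_len(const uint8_t *p, size_t avail) {
    if (avail < 1) return 0;
    int mod = p[0] >> 6, rm = p[0] & 7, n = 1;
    if (mod != 3 && rm == 4) {                          /* SIB byte follows */
        if (avail < 2) return 0;
        n++;
        if (mod == 0 && (p[1] & 7) == 5) n += 4;        /* SIB without base: disp32 */
    }
    if (mod == 1) n += 1;                               /* disp8 */
    else if (mod == 2 || (mod == 0 && rm == 5)) n += 4; /* disp32 */
    return (size_t)n <= avail ? n : 0;
}

static int insn_len(const uint8_t *p, size_t avail) {
    if (avail < 1) return 0;
    uint8_t op = p[0];
    int n;
    if ((op >= 0x50 && op <= 0x5F) || op == 0x90 || op == 0xC3 || op == 0xC9 || op == 0xCC)
        n = 1;                              /* push/pop r32, nop, ret, leave, int3 */
    else if (op == 0x6A || op == 0xEB || (op >= 0x70 && op <= 0x7F))
        n = 2;                              /* push imm8, jmp rel8, jcc rel8 */
    else if (op == 0x68 || op == 0xE8 || op == 0xE9 || (op >= 0xB8 && op <= 0xBF))
        n = 5;                              /* push imm32, call/jmp rel32, mov r32, imm32 */
    else if (op == 0x01 || op == 0x03 || op == 0x29 || op == 0x2B || op == 0x31 ||
             op == 0x33 || op == 0x39 || op == 0x3B || op == 0x85 || op == 0x89 || op == 0x8B) {
        int m = modrm_len(p + 1, avail - 1); /* ALU/mov with ModRM */
        n = m ? 1 + m : 0;
    } else if (op == 0x83) {
        int m = modrm_len(p + 1, avail - 1); /* group 1 r/m32, imm8 */
        n = m ? 2 + m : 0;
    } else
        return 0;
    return (size_t)n <= avail ? n : 0;
}

/* Back up `window` bytes (plus `nudge`, clamped to [0, cur]), then walk
   forward over buf[0..n), skipping undecodable bytes one at a time.
   Returns the last instruction start before `cur`, or -1; *synced (if
   non-NULL) tells whether the walk ended exactly on `cur`. */
long prev_insn_sync(const uint8_t *buf, size_t n, size_t cur, size_t window,
                    long nudge, int *synced) {
    long start = (long)cur - (long)window + nudge;
    if (start < 0) start = 0;
    if (start > (long)cur) start = (long)cur;
    size_t pos = (size_t)start;
    long last = -1;
    while (pos < cur) {
        int len = insn_len(buf + pos, n - pos);
        if (len == 0) { pos++; continue; }  /* not an instruction: increment and retry */
        last = (long)pos;
        pos += (size_t)len;
    }
    if (synced) *synced = (pos == cur);
    return last;
}
```

The key difference from simply taking the last address seen is that junk bytes are skipped rather than listed, so a misaligned start tends to fall back onto real boundaries within a few instructions; a 64-byte run gives that resynchronization room to happen.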


It certainly seems to work, and since it's a user-viewable thing, I've just added some keyboard shortcuts for incrementing/decrementing the instruction pointer by one byte, just in case the result looks like garbage.

dELTA
June 16th, 2003, 18:13
Cool, what kind of program is this for anyway? Any tool you will publish later?


dELTA

squidge
June 17th, 2003, 02:28
It's for the much redesigned version of RTA, and will certainly be released through this board later.

sgdt
October 18th, 2003, 19:53
Here's my 2 cents, coming from a complete newbie. YMMV.

Most compilers start functions on 0x10 boundaries and put a lot of jump targets on at least 4-byte boundaries. It might pay off to do a quick check for push opcodes at those addresses.
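A sketch of the aligned-push check, assuming `buf` holds a section mapped at virtual address `base_va` and that "push opcodes" means the one-byte push r32 forms 50-57 (push ebp is 55). `aligned_push_candidates` is a hypothetical helper that lists 16-byte-aligned offsets below `cur`, nearest first.

```c
#include <stddef.h>
#include <stdint.h>

/* Walk 16-byte-aligned virtual addresses below base_va + cur, nearest
   first, and record the buffer offsets holding a push r32 (0x50..0x57).
   Returns the number of offsets written to out (at most max_out). */
size_t aligned_push_candidates(const uint8_t *buf, uint64_t base_va, size_t cur,
                               size_t *out, size_t max_out) {
    size_t count = 0;
    if (cur == 0) return 0;
    uint64_t va = (base_va + cur - 1) & ~(uint64_t)0xF;  /* last aligned VA before cur */
    while (va >= base_va && count < max_out) {
        size_t off = (size_t)(va - base_va);
        if (buf[off] >= 0x50 && buf[off] <= 0x57) out[count++] = off;
        if (va < 16) break;                              /* avoid wrapping below zero */
        va -= 16;
    }
    return count;
}
```

Each candidate still has to be confirmed by disassembling forward from it; the check only cheaply narrows down where to start.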

Second, you can easily scan for numbers that represent addresses in the process's address space, keeping a list of them. Do a quick trace from each one until it terminates in a jump/ret, tossing the ones that would generate bad opcodes along the way. If the program's '.reloc' section is intact, you can pull this list from there. Either way, this gives tons of good positions for code starts, and from there, when you rip a function, add any branch/call/jump targets you find.
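A sketch of the first step, collecting dword values that look like addresses. The range [lo, hi) would typically be the code section's virtual addresses (an assumption), and every byte offset is tried, so unaligned hits are included. `collect_address_like` is a hypothetical helper; the jmp/ret trace and the '.reloc' route are not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian 32-bit read. */
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Store every dword in buf[0..n) whose value lies in [lo, hi), up to
   max_out entries. Returns the number stored. Duplicates are kept. */
size_t collect_address_like(const uint8_t *buf, size_t n, uint32_t lo, uint32_t hi,
                            uint32_t *out, size_t max_out) {
    size_t count = 0;
    for (size_t i = 0; i + 4 <= n && count < max_out; i++) {
        uint32_t v = rd32(buf + i);
        if (v >= lo && v < hi) out[count++] = v;
    }
    return count;
}
```

A stricter variant would only look at 4-byte-aligned offsets, which cuts false hits in data that compilers align, at the cost of missing packed values.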

With this approach, there are two keys to speed:
1. How fast you can add a target to a list. Sounds simple, no? Lots of potential target addresses mean you never want to "realloc" your list, or if you do, grow it by something like 1000 targets at a time. What *I* did was process the file twice: first to get an approximate count, and then for real.

2. How fast you can trace a routine. The most expensive part, for me, was looking at the SID byte. I found a look-up table worked best. Again, YMMV.
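Point 1's two-pass trick, sketched with a hypothetical `scan_call_targets` that treats every E8 byte as a call rel32 (so it over-approximates, as any raw byte scan does). The same routine only counts when `out` is NULL and fills the list otherwise, so here the first-pass count is exact and a single malloc suffices.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Little-endian 32-bit read. */
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Count (out == NULL) or store the target of every E8 rel32 in buf[0..n),
   where buf is mapped at virtual address `base`. */
static size_t scan_call_targets(const uint8_t *buf, size_t n, uint32_t base, uint32_t *out) {
    size_t count = 0;
    for (size_t i = 0; i + 5 <= n; i++) {
        if (buf[i] != 0xE8) continue;
        /* target = address of the next instruction + rel32, modulo 2^32 */
        uint32_t target = base + (uint32_t)i + 5u + rd32(buf + i + 1);
        if (out) out[count] = target;
        count++;
    }
    return count;
}

/* Pass 1 counts, one allocation of exactly that size, pass 2 fills it.
   Returns NULL with *count_out = 0 if nothing is found or malloc fails.
   The caller frees the returned list. */
uint32_t *collect_call_targets(const uint8_t *buf, size_t n, uint32_t base, size_t *count_out) {
    size_t count = scan_call_targets(buf, n, base, NULL);
    *count_out = 0;
    if (count == 0) return NULL;
    uint32_t *list = malloc(count * sizeof *list);
    if (!list) return NULL;
    *count_out = scan_call_targets(buf, n, base, list);
    return list;
}
```

Scanning twice costs a second pass over the bytes but never moves the list in memory, which is the trade-off point 1 describes.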

Anyway, MMX byte compares will let you detect potential strings and their lengths very quickly, if you decide to add that functionality. Agner Fog has a very nice article on the subject, albeit a dated one.

Another thing to look at is IDA. The 'DOS' version isn't that big. Running Intel's VTune against it (in sampling mode) will point out all the cool parts almost instantly.

Oh well, it's just a thought. Sorry for butting into the conversation.

EDIT: SID is a typo, it should be SIB. See www.sandpile.org for invaluable info.
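Point 2 mentions a look-up table for the SIB byte. The table itself isn't shown, so this is one plausible shape, assuming 32-bit addressing with no address-size prefix: a 256-entry table indexed by ModRM gives the bytes taken by ModRM, SIB and displacement, and the only case that needs the SIB value itself is mod=00 with SIB base=101, which adds a disp32. `modrm_operand_len` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes taken by ModRM + optional SIB + displacement, indexed by ModRM.
   Built lazily on first use (not thread-safe as written). */
static uint8_t modrm_table[256];
static int table_ready;

static void build_modrm_table(void) {
    for (int m = 0; m < 256; m++) {
        int mod = m >> 6, rm = m & 7, len = 1;
        if (mod != 3 && rm == 4) len += 1;                    /* SIB byte */
        if (mod == 1) len += 1;                               /* disp8 */
        else if (mod == 2 || (mod == 0 && rm == 5)) len += 4; /* disp32 */
        modrm_table[m] = (uint8_t)len;
    }
    table_ready = 1;
}

/* p points at the ModRM byte. One lookup covers everything except
   mod=00, rm=100 with a SIB base of 101, which adds a disp32. p[1] is
   only read in that case. */
size_t modrm_operand_len(const uint8_t *p) {
    if (!table_ready) build_modrm_table();
    size_t len = modrm_table[p[0]];
    if ((p[0] & 0xC7) == 0x04 && (p[1] & 7) == 5) len += 4;
    return len;
}
```

For example, 45 08 ([ebp+8]) takes 2 bytes, 04 24 ([esp]) takes 2, and 04 25 ([disp32] via SIB) takes 6.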

Aimless
October 20th, 2003, 23:46
You might try reading the source code for BORG, by Cronos.

Have Phun

Zwyzum
October 31st, 2003, 19:35
Hi. I'm pretty new here.

I guess the problem should be seen from the user's point of view.

As a user, I would like to view the disassembled code from the address I choose. I'm sure we're all intelligent enough to tell whether the first instruction we see on the screen makes sense or not. Also, should the first instruction be disassembled the wrong way or not be disassembled at all, the rest of the code would look completely silly, and any decent reverser will scroll the code up a bit (or better, a byte).

I know this is not a solution, but I think you shouldn't neglect this opportunity.

Bye
Zwyzum