
Opcode length recognition


Latigo
April 25th, 2001, 12:44
Hi!
Currently I'm developing some kind of tool that has to parse opcodes.
When I started studying the subject I found that many disassemblers make intensive use of look-up tables to work out which opcode is being read.
Now my question is:
Are these tables necessary? Isn't there any way to know, by examining the opcode encoding, how many bytes an opcode takes?
All I want is its length!
Btw, when I say 'opcode' I mean the opcode byte plus the other bytes that make up the complete instruction.

Thanks in advance!

Latigo

mammon_
April 25th, 2001, 13:22
You would do well to check Appendix B of the Intel Architecture Software Developer's Manual, Vol. 2 [which, if you're doing any sort of disasm work, you should have readily available by now]. Personally I found that Intel's "method" for "organizing" their instructions had too many exceptions to be used reliably ... in my opinion the only way to get an accurate reading is to disassemble the entire instruction.

Anyways, Appendix B gives a bit-by-bit listing of the opcodes, and points out which bits are set to indicate the presence of a ModR/M byte, displacement, etc.
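
To make the ModR/M point concrete, here is a minimal Python sketch of how the ModR/M byte alone determines how many addressing bytes follow it. It assumes 32-bit addressing with no address-size prefix, and the function name `modrm_len` is made up for illustration; it is not taken from the Intel manual or from any tool mentioned in this thread.

```python
def modrm_len(code, pos):
    """Bytes taken by ModR/M, optional SIB and displacement (32-bit addressing)."""
    modrm = code[pos]
    mod = modrm >> 6          # top two bits
    rm = modrm & 7            # low three bits
    length = 1                # the ModR/M byte itself
    if mod == 3:
        return length         # register operand, nothing follows
    base_is_disp32 = False
    if rm == 4:               # rm == 100b means a SIB byte follows
        length += 1
        sib_base = code[pos + 1] & 7
        base_is_disp32 = (mod == 0 and sib_base == 5)
    if mod == 1:
        length += 1           # disp8
    elif mod == 2:
        length += 4           # disp32
    elif rm == 5 or base_is_disp32:
        length += 4           # mod == 0 special cases: disp32 with no base
    return length
```

For example, `8B 45 08` (mov eax, [ebp+8]) has ModR/M byte 0x45, so the function returns 2 and the whole instruction is 3 bytes. Whether an opcode has a ModR/M byte at all, and whether an immediate follows, still has to come from somewhere else.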

Now, at the risk of sounding horribly out of touch, why would anyone care about the length of an instruction and not about disassembly of the same? You just counting the number of instructions in a program for the sheer, unmitigated hell of it?

_m

MAK
April 25th, 2001, 14:16
I guess the length of an opcode is nice to know if u want to make intelligent tracers

Here is a package that contains what u need..
Credit goes to the author...

mo k
April 25th, 2001, 15:46
>Currently im developing some kind of tool that
>has to parse opcodes.
aren't we *all* ; )

>When i started studying the subject i found that many disassemblers make intense use of look-up tables to find out about the which opcode is the one being read.
many? it is about the only fscking way,
table-driven dispatch beats the pants off
nested conditions, both in terms of speed
and size, when it comes to parsing languages
with small constructs -- integers in this case.
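
A rough Python sketch of what such a table can look like when only the length is wanted. Everything here is illustrative: the flag names, the `OPCODE_FLAGS` table and `insn_len` are invented for this example, the table covers only a handful of one-byte opcodes, and it assumes 32-bit mode with no prefixes.

```python
# Flag bits describing what follows the opcode byte (illustrative names).
MODRM, IMM8, IMM32 = 1, 2, 4

# Tiny, deliberately incomplete table: a few one-byte opcodes, 32-bit mode.
OPCODE_FLAGS = {
    0x90: 0,              # nop
    0xC3: 0,              # ret
    0x6A: IMM8,           # push imm8
    0x68: IMM32,          # push imm32
    0xEB: IMM8,           # jmp rel8
    0xE8: IMM32,          # call rel32
    0x89: MODRM,          # mov r/m32, r32
    0x8B: MODRM,          # mov r32, r/m32
    0x83: MODRM | IMM8,   # group 1: op r/m32, imm8
    0x81: MODRM | IMM32,  # group 1: op r/m32, imm32
}
for r in range(8):
    OPCODE_FLAGS[0x50 + r] = 0      # push r32
    OPCODE_FLAGS[0xB8 + r] = IMM32  # mov r32, imm32

def modrm_len(code, pos):
    """ModR/M + SIB + displacement size, 32-bit addressing assumed."""
    mod, rm = code[pos] >> 6, code[pos] & 7
    if mod == 3:
        return 1
    length = 1
    sib_base = None
    if rm == 4:                       # SIB byte present
        length += 1
        sib_base = code[pos + 1] & 7
    if mod == 1:
        length += 1                   # disp8
    elif mod == 2 or rm == 5 or sib_base == 5:
        length += 4                   # disp32
    return length

def insn_len(code, pos=0):
    """Length of the instruction at code[pos], driven by the table."""
    opcode = code[pos]
    if opcode not in OPCODE_FLAGS:
        raise ValueError("opcode 0x%02X is not in this toy table" % opcode)
    flags = OPCODE_FLAGS[opcode]
    length = 1
    if flags & MODRM:
        length += modrm_len(code, pos + 1)
    if flags & IMM8:
        length += 1
    if flags & IMM32:
        length += 4
    return length
```

With this table, `insn_len(bytes.fromhex("83ec10"))` (sub esp, 0x10) gives 3 and `insn_len(bytes.fromhex("b878563412"))` (mov eax, 0x12345678) gives 5. Adding an opcode means adding one table row rather than another branch.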

>Now my question is..
>Are these tables necessary? Isn't there any way to know , by examining the opcode encoding, how many bytes an opcode takes?
Unlike those programming languages where the
length of a string is encoded in the *header*
of the string, the Intel instruction format
does not have an easy way to determine the
ending or the beginning of a given instruction.
This is called a variable-length instruction encoding.
Some CPUs have instructions of a predetermined
length, RISC comes to mind.
On Intel, in order to do any analysis on the
instruction stream, you have to implement
all the logic necessary for a complete
disassembler. You can't just whiz around
looking for a hot byte, à la Boyer-Moore.
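
A small Python illustration of why there is no way to find instruction boundaries by scanning: the same bytes decode to different instructions depending on where you start. The three-entry `LENGTHS` table and the `boundaries` helper are hypothetical and cover only the opcodes used in the sample.

```python
# Fixed lengths for the three opcodes used below (32-bit mode assumed).
LENGTHS = {
    0x90: 1,  # nop
    0xC3: 1,  # ret
    0xB8: 5,  # mov eax, imm32
}

def boundaries(code, start):
    """Offsets at which instructions begin when decoding from `start`."""
    offsets = []
    pos = start
    while pos < len(code):
        offsets.append(pos)
        pos += LENGTHS[code[pos]]
    return offsets

code = bytes.fromhex("b890909090c3")  # mov eax, 0x90909090 ; ret

print(boundaries(code, 0))  # [0, 5]
print(boundaries(code, 1))  # [1, 2, 3, 4, 5]
```

Started at offset 0 the stream is two instructions; started one byte later, the immediate is read as four nops. Nothing in the bytes themselves says which reading is the intended one.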

>All i pretend is it's length!
No, you either do it all the way, or never.
You will get the length, the type, the
operands, their locations, their sizes, etc. etc.

Sir do you want to super-size your happy meal? ;P

Go read what mammon told you to read -trust him- and ask about whatever other problems
you have at the newbies board,
the answers you will draw will be helpful to
them as well.

P.S. Latigo, from what you have been asking
lately, i sensed that you are finally seeing the
light, keep it up, coding is the way to go.

penfold
April 26th, 2001, 09:31
you could search the internet for

; LDE32 -- Length-Disassembler Engine
; FREEWARE
;
; programmed by Z0MBiE, http://z0mbie.cjb.net

that URL's dead now, but the source code is available somewhere ..

it's a good engine, will open up the door into decoding opcode lengths for you .. only works with normal x86 opcodes, not FPU, not MMX, not 3DNow!, etc .. and there are a couple of x86 opcodes it screws up on (test word ptr [xx], test word ptr [reg?], i think are the only ones)
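
I can't say what exactly LDE32 gets wrong with TEST, but one reason it is an awkward case for any length table is that in the F6/F7 opcode group the immediate is only present for some values of the ModR/M reg field. A minimal Python sketch, assuming 32-bit mode, no prefixes, and register operands only; `group3_len` is a made-up name and has nothing to do with LDE32's source.

```python
def group3_len(code, pos=0):
    """Length of an F6/F7 instruction with a register operand (mod == 3)."""
    opcode, modrm = code[pos], code[pos + 1]
    if opcode not in (0xF6, 0xF7):
        raise ValueError("not an F6/F7 group opcode")
    if modrm >> 6 != 3:
        raise ValueError("memory operands are not handled in this sketch")
    reg = (modrm >> 3) & 7    # selects TEST, NOT, NEG, MUL, ...
    length = 2                # opcode + ModR/M
    # /0 is TEST and carries an immediate; /1 is treated here as an
    # alias of TEST, which is an assumption about undocumented behaviour.
    if reg in (0, 1):
        length += 1 if opcode == 0xF6 else 4
    return length
```

So `F7 D8` (neg eax) is 2 bytes while `F7 C0` followed by a 32-bit immediate (test eax, imm32) is 6, even though the opcode byte is the same. A table indexed by the opcode byte alone cannot express that.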

just study intel docs, long and hard :-)

later, latigo

penfold (co cracker of codesafe :-)

Latigo
April 26th, 2001, 10:05
Hey guys, thanks for the replies! Such an honour to get advice from leet chaps like you.

Mammon_: I'm coding a tool which adds self-modifying capabilities to the .code section of a given PE file. That is why i need to recognize opcodes (lengths). Or at least that is why i _believe_ i need to know.

Mo k: It's good to know i'm finally seeing the light! Now that you say it, i'll trust m_

penfold: great thing! i'm gonna study that source.

Mak: are you talking about what penfold pointed to?

Bye and thanks.

Latigo

MAK
April 26th, 2001, 15:18
Yep
I've got LDE32
But how the hell do u attach files?

tsehp
April 26th, 2001, 17:40
Quote:
MAK (04-26-2001 05:18):
Yep
I've got LDE32
But how the hell do u attach files?


I made a recent global reinstall, so some settings disappeared, now you have 500k for one file.

my two cents: like arthaxerxes, who gave me the idea, I didn't want to reinvent the wheel and used the available NASM C libraries to disassemble programs from within my exes. They use some simple lookup tables for each opcode, args, etc... maybe you could also check this.

Latigo
April 27th, 2001, 09:41
Thanks Tsehp!
Where can i get these NASM C libraries you mention? In the NASM package?

tsehp
April 27th, 2001, 14:40
here they are, but you'll have to modify them a little to make them usable in C++

regards,

+Tsehp

Latigo
April 28th, 2001, 15:03
Quote:
+Tsehp (04-27-2001 04:40):
here they are, but you'll have to modify them a little to make them usable in c++

regards,

+Tsehp



Thanks mate!

MAK
April 30th, 2001, 14:00
well if u still need LDE32 here it is...