a note on signed and unsigned byte arrays


sna
06-06-2003, 09:04 AM
when converting from assembler into any other language, we need to preserve how each byte's sign is handled. the elements that make up a string or simple array are usually byte-sized (8 bits), and each one is treated as either signed or unsigned.

for example, reading one element of a string whose bytes are treated as unsigned, where esi is the base pointer and ecx is the index:

movzx eax, byte ptr [esi+ecx]
the movzx instruction (move with zero-extend) extends an 8-bit value to 16 bits, or an 8-bit or 16-bit value to 32 bits, by filling the high-order bits with zeros. in this case al holds the source byte and the rest of eax is cleared.
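for anyone translating this into C, here is a minimal sketch of the movzx load, assuming the string is held as an array of uint8_t. the function name load_byte_zx is hypothetical; base stands in for esi and index for ecx:

```c
#include <stdint.h>

/* C counterpart of:  movzx eax, byte ptr [esi+ecx]
 * (load_byte_zx is a hypothetical name; base ~ esi, index ~ ecx)
 * A uint8_t holds 0..255, and widening an unsigned type to uint32_t
 * fills the high-order bits with zeros, just like movzx. */
uint32_t load_byte_zx(const uint8_t *base, uint32_t index)
{
    return (uint32_t)base[index];
}
```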

in contrast, when the source is treated as signed, the msb (most significant bit) of the source is copied into every high-order bit of the destination:

movsx eax, byte ptr [esi+ecx] ; move with sign-extend

so if the source byte is negative (msb set, i.e. 80h-FFh), al holds the source byte unchanged and the rest of eax's bits are set to 1. if the msb is clear, the result is the same as if movzx had been used.
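the sign-extending load can be sketched in C the same way. load_byte_sx is again a hypothetical name. it tests the msb explicitly instead of casting to int8_t, so it does not rely on converting an out-of-range value to a signed type, which C leaves implementation-defined:

```c
#include <stdint.h>

/* C counterpart of:  movsx eax, byte ptr [esi+ecx]
 * (load_byte_sx is a hypothetical name; base ~ esi, index ~ ecx)
 * Bit 7 of the byte is copied into bits 8..31 of the result. */
uint32_t load_byte_sx(const uint8_t *base, uint32_t index)
{
    uint32_t b = base[index];
    if (b & 0x80u)               /* msb set: negative as a signed byte */
        return b | 0xFFFFFF00u;  /* fill bits 8..31 with ones */
    return b;                    /* msb clear: same result as movzx */
}
```

with a byte of E9h it returns FFFFFFE9h (-23 as a signed 32-bit value); with 77h it returns 00000077h, exactly as movzx would.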

we'll look at the three possible cases to help clarify this further:

1) movzx eax, byte ptr [esi+ecx]
al will always hold the source byte and the rest of eax will always be cleared.

2) movsx eax, byte ptr [esi+ecx]   ; source byte is <= 127 dec (msb clear)
al will hold the source byte and the rest of eax will be cleared.

3) movsx eax, byte ptr [esi+ecx]   ; source byte is > 127 dec (msb set)
al will hold the source byte and the rest of eax's bits will be set to 1.
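these cases matter as soon as a whole loop gets translated. as a hypothetical example, suppose an assembler routine sums the bytes of a buffer: in C, the choice between movzx and movsx becomes the choice of pointer type. both functions below are illustrative sketches with made-up names, not taken from any real routine:

```c
#include <stddef.h>

/* Loop that used movzx: each byte counts as 0..255. */
long sum_bytes_movzx(const void *buf, size_t n)
{
    const unsigned char *s = buf;
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += s[i];           /* zero-extended, like movzx */
    return total;
}

/* Loop that used movsx: each byte counts as -128..127. */
long sum_bytes_movsx(const void *buf, size_t n)
{
    const signed char *s = buf;  /* same bytes, read as signed */
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += s[i];           /* sign-extended, like movsx */
    return total;
}
```

for the two bytes "A\xE9" (41h, E9h), the movzx version returns 298 (65 + 233) while the movsx version returns 42 (65 - 23), assuming the usual two's-complement representation.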

hope this makes sense and helps someone out there..

w00tz`
06-19-2003, 10:40 AM
Actually, that is pretty interesting now that you mention it, but have you noticed how OllyDBG displays the bytes on the stack? It doesn't handle that very well.

for instance:

movsx eax, byte ptr [esi+ecx]   ;will move only to the lower parts of the register

so if, say, you wanted to move the ASCII character 'w' into eax while eax holds

FFFFFFFF, OllyDBG will move the hex ASCII value (77) into eax and the result is

FFFFFF77 <-- the register was not cleared. it's a bug, but it can be patched. good thing you mentioned in your article that the rest of the register has to be cleared, otherwise someone might be confused :-)

ciao
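for comparison with the debugger display described above, here is a small C model of what the instruction set itself specifies. per Intel's documentation, movsx with a 32-bit destination writes all of eax, while a byte move such as mov al, byte ptr [esi+ecx] replaces only bits 0..7. the helper names are hypothetical:

```c
#include <stdint.h>

/* mov al, <byte>: only bits 0..7 of eax change. */
uint32_t after_mov_al(uint32_t eax, uint8_t src)
{
    return (eax & 0xFFFFFF00u) | src;   /* upper 24 bits kept */
}

/* movsx eax, <byte>: the whole register is rewritten. */
uint32_t after_movsx_eax(uint32_t eax, uint8_t src)
{
    (void)eax;                          /* old contents are discarded */
    return (src & 0x80u) ? (src | 0xFFFFFF00u) : src;
}
```

starting from FFFFFFFF and loading 'w' (77h), the byte move leaves FFFFFF77 while movsx leaves 00000077, so a value of FFFFFF77 is what a byte-sized move would produce.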

andyistic
01-30-2004, 02:11 AM
The notion of signed characters is annoying.
Why must we have them?

Bytes used for characters should always be unsigned.
Signed bytes are a common source of unexpected negative values, which lead to headaches.
Casting and arranging everything so that results stay positive just wastes time.
Just say NO to signed characters.
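One concrete example of such a headache is using a byte as an array index. Below is a minimal sketch (most_common_byte is a hypothetical helper) that converts each char through unsigned char before indexing; where plain char is signed, a byte such as E9h would otherwise become -23 and index outside the table:

```c
#include <stddef.h>

/* Returns the most frequent byte value (0..255) in s[0..n-1];
 * ties go to the lower value. */
unsigned most_common_byte(const char *s, size_t n)
{
    size_t counts[256] = {0};
    for (size_t i = 0; i < n; i++)
        counts[(unsigned char)s[i]]++;  /* index stays in 0..255 */

    unsigned best = 0;
    for (unsigned v = 1; v < 256; v++)
        if (counts[v] > counts[best])
            best = v;
    return best;
}
```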

kw
01-31-2004, 06:35 PM
Quote:
The notion of signed characters is annoying.
Why must we have them?

Simple: because chars are just byte-sized integer values, and not necessarily characters as such. I agree with you, though, that char is usually used for letters, so in that sense it would be nicer if it defaulted to unsigned. The problem is that this would be VERY inconsistent with something like int, which defaults to signed and needs 'unsigned' to override that default. char is another basic data type, so for consistency reasons it should probably stay the way it is (strictly speaking, the C standard leaves the signedness of plain char to the compiler, and most x86 compilers make it signed). Otherwise you'll have people forgetting to specify 'unsigned' for ints, for example ;)
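Since the signedness of plain char varies between compilers, a short C check can tell which choice the current compiler made: <limits.h> sets CHAR_MIN to 0 when char is unsigned and to SCHAR_MIN when it is signed. GCC also accepts -fsigned-char and -funsigned-char to override the default. The function name below is hypothetical:

```c
#include <limits.h>

/* Returns 1 if plain char is signed with this compiler, 0 if not.
 * CHAR_MIN is SCHAR_MIN for a signed char and 0 for an unsigned one. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```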
Anyway, if you find it easier, use:
typedef unsigned char uchar;
and only use uchar from then on while coding. You'll have no further trouble ;)
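A short sketch of the uchar typedef in use, assuming the goal is to count bytes with the high bit set (for example non-ASCII bytes in Latin-1 or UTF-8 text); count_high_bytes is a hypothetical name. Read through a signed char, the comparison with 0x80 could never be true:

```c
#include <stddef.h>

typedef unsigned char uchar;

/* Counts bytes in 80h..FFh. The text arrives as const char * (as string
 * literals do) and is read through uchar so values stay in 0..255. */
size_t count_high_bytes(const char *s, size_t n)
{
    const uchar *p = (const uchar *)s;
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (p[i] >= 0x80)
            count++;
    return count;
}
```

A uchar is still promoted to int in arithmetic, but its value stays in 0..255, so comparisons like this behave as expected.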

Greets,
kw