How to distinguish a function from the calling program in assembly


cse_india
May 30th, 2006, 07:44
One question, please:

How do you distinguish a function from the calling program in assembly?
That is, are there any special symbols marking the start and end of a function when we look at a disassembled program?
In C++ I can tell a function apart from the calling code by the braces { } before and after the function body.
Is there something like this in assembly too?

Sorry if the question is stupid, but even an advanced cracker would have started from a silly question.

dELTA
May 30th, 2006, 09:36
Short answer:
No, there isn't.

Somewhat longer answer:
If the program is compiled from a normal higher-level language source code, there are several relatively good signs, e.g. that a place targeted by a "call" instruction is probably the beginning of a function, the function prologue/epilogue sequences for setting up the stack in the beginning/end of some/most functions, and return opcodes (although these can occur at many places inside a function, so only the last one can be the end of the function). Also, functions can sometimes be aligned in memory, and the space between the end and beginning of aligned functions might then be padded e.g. with int3:s or other recognizable bytes.

None of these signs are 100% sure though, and when the program is written in pure assembly to begin with, all bets are off.

Now, please someone feel free to fill in the even longer answer.

naides
May 30th, 2006, 10:22
A little more detail:

x86 Asm includes two instructions that were supposed to be the Asm equivalent of the C { and }:

ENTER for {
and LEAVE for }

ENTER in particular is slow, so optimizing compilers rarely emit it; LEAVE still turns up in the epilogues some compilers generate.

If a program was compiled from a high-level source, the beginning of a function is usually so distinctive that a good disassembler can point it out for you.

The instruction series:

push ebp
mov ebp, esp

is typical of a function start,

followed by opening space on the stack for local variables:

sub esp, 0Ch

; which reserves 12 bytes, i.e. room for three 32-bit (dword) variables


The epilogue of a function is usually marked by something like


mov esp, ebp
pop ebp
ret

but as Delta said, there are many variations on the theme, and there is no 100% certain way to "bracket" the code belonging to a function in a disassembly listing.
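
As a rough illustration of the prologue pattern described above, here is a minimal Python sketch that searches raw code bytes for the two common encodings of push ebp / mov ebp, esp. The function name find_prologues, the sample bytes and the base address 0x401000 are all made up for the example. A real disassembler decodes instructions properly instead of matching bytes, so the hits are only candidates.

```python
# Two valid encodings of "push ebp / mov ebp, esp":
#   55 8B EC  and  55 89 E5
PROLOGUES = (bytes.fromhex("558bec"), bytes.fromhex("5589e5"))

def find_prologues(code, base=0):
    """Return the addresses where a typical ebp-frame prologue starts."""
    hits = []
    for pattern in PROLOGUES:
        start = code.find(pattern)
        while start != -1:
            hits.append(base + start)
            start = code.find(pattern, start + 1)
    return sorted(hits)

# Hypothetical sample: one function with a 12-byte local area,
# int3 (CC) padding, then a second, empty function.
sample = bytes.fromhex(
    "558bec83ec0c8be55dc3"  # push ebp / mov ebp,esp / sub esp,0Ch / mov esp,ebp / pop ebp / ret
    "cccccccccccc"          # padding
    "5589e55dc3"            # push ebp / mov ebp,esp / pop ebp / ret
)

print([hex(addr) for addr in find_prologues(sample, base=0x401000)])
```

On this sample the sketch reports 0x401000 and 0x401010. A function that does not set up an ebp frame would be missed entirely, which is exactly the caveat raised in this thread.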

In OO programs, with callback structures and virtual calls, the code structure can get considerably more abstruse, not to mention when programmers deliberately obscure the flow of the program to hide protection schemes.


I guess if you posted a more specific example of what you are looking for, someone could provide a more enlightening answer.

LLXX
May 30th, 2006, 20:38
IDA recognises function extents quite well, based on the idea that the destination of a CALL instruction should be the start of a function. This works for most compiler-generated code, as compilers are quite stupid and only generate code in fixed patterns.

As noted above, a typical entry sequence is push ebp / mov ebp, esp, but only if the compiler is using a standard ebp-based stack frame. Many newer compilers, at least when optimizing, don't use ebp-based stack frames; they access locals and arguments via esp. In that case ebp will not be pushed, and the start of the function can often be identified by a sub esp, xxxx that reserves room for local variables. However, if the function has no local variables, there is no clear start marker, though one can usually still determine the function's start, since it follows a RET instruction and is the destination of one or more CALLs.

In summary, compiler-generated code follows fixed and easily recognisable patterns, which both humans and machines can understand well.
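
The "destination of a CALL is probably a function start" idea can be sketched in a few lines of Python. This is not how IDA works internally; it is only a naive byte scan for the E8 rel32 form of CALL in 32-bit code, and the name call_targets, the sample bytes and the base address are assumptions made for the illustration. Because it does not decode instructions, an E8 byte inside some other instruction's operand would produce a false hit.

```python
from collections import Counter

def call_targets(code, base=0):
    """Count how often each address is the target of an E8 rel32 CALL."""
    targets = Counter()
    for i in range(len(code) - 4):
        if code[i] != 0xE8:          # E8 = near CALL with a 32-bit relative offset
            continue
        rel = int.from_bytes(code[i + 1:i + 5], "little", signed=True)
        target = i + 5 + rel         # the offset is relative to the next instruction
        if 0 <= target < len(code):  # ignore targets outside this blob
            targets[base + target] += 1
    return targets

# Hypothetical sample: two calls to the same esp-based function (no push ebp).
sample = bytes.fromhex(
    "e80b000000"  # call +0x0B -> offset 0x10
    "e806000000"  # call +0x06 -> offset 0x10
    "c3"          # ret
    "cccccccccc"  # int3 padding
    "83ec08"      # sub esp, 8   <- function start, found only via the calls
    "83c408"      # add esp, 8
    "c3"          # ret
)

for addr, count in call_targets(sample, base=0x401000).items():
    print(hex(addr), "called", count, "times")
```

Here the callee has no push ebp prologue at all, yet it is still found because both calls point at it.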

If the program is written in Asm, then there is no guarantee of anything - this is because functions are more a feature of HLLs than Asm, as in most HLLs like C/C++ one cannot jump from one function into another, whereas in Asm there is no such restriction. In such a case, attempting to find functions may be equivalent to trying to decompile HLL code from any binary program, an impossible task. A simple example illustrates well:
Code:
call $+3
xchg al, ah
cmp al, 10
sbb al, 105
das
int 29h
ret
The above code is IMPOSSIBLE to represent in any language of a higher level than Asm. (Extra credit for figuring out what it does )

WaxfordSqueers
May 31st, 2006, 13:41
Quote:
[Originally Posted by LLXX]
Code:
call $+3
xchg al, ah
cmp al, 10
sbb al, 105
das
int 29h
ret
(Extra credit for figuring out what it does )


I'm curious:
the CALL $+3 seems to be calling the xchg al, ah, and the ret seems to be returning to the same place, so it seems we have a perpetual loop. The int 29h is an undocumented DOS interrupt that gives fast output to the screen; whatever is in al is output.

The cmp al, 10 would seem to be comparing the ASCII character 10, which is a -> (right arrow). If that's right, I'm lost from there on. Obviously, a cmp should have a decision after it and the sbb is looking for a carry flag.

The other option is that 0x10 is decimal 16. The cmp subtracts 0x10 from al. If the value in al is less than 0x10, you'll have a carry.

Whatever is in al gets 105 subtracted from it. I don't see how you can subtract 0x105 from al, which is an 8-bit register. If it's subtracting from ax, the only reason I can see for that in relation to ASCII characters is to convert from Unicode, or another standard. Then again, how would you break out of the loop? It seems like a perpetual loop set up to affect the carry flag, that will affect other code outside the loop.

The 0x105 subtracted from ax would clear ah and subtract 5 from al. In that case, it might be converting hex to decimal....a wild guess.

The das is converting to BCD. Is this a clock, or a counter?

Admiral
May 31st, 2006, 14:09
It's not a perpetual loop. The CALL returns to the XCHG, so the code will execute twice then return to wherever happens to be sitting at the top of the stack. Also, the first two hardcoded values are decimal, so the ASCII byte is a line-feed and the 105 becomes 0x69
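
The "executes twice, then returns" behaviour can be checked with a tiny Python model of the stack. It models control flow only, it assumes 16-bit code in which the call is 3 bytes long (so $+3 is the instruction right after it), and the names run_snippet and "caller" are invented for the illustration.

```python
def run_snippet(caller_return="caller"):
    """Model only the control flow of: call $+3 / <body> / ret."""
    stack = [caller_return]  # return address left by whoever called the snippet
    stack.append("body")     # call $+3 pushes the address of the next instruction...
    pc = "body"              # ...and jumps to that same instruction
    passes = 0
    while pc == "body":
        passes += 1          # xchg / cmp / sbb / das / int 29h would execute here
        pc = stack.pop()     # ret pops the next return address
    return passes, pc

print(run_snippet())  # (2, 'caller')
```

The first ret consumes the address pushed by the call itself, and the second ret consumes the address that was already on the stack.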

The three flags modified by CMP are altered by the following SBB and DAS so it stands redundant. It looks like we're 'fast console' outputting two arbitrary bytes (as EAX is undefined at the start of the code) depending on the initial value of AX, then returning.

Hence I conclude that LLXX has just thrown together some instructions that can't be produced by any known compiler and sent us on a wild goosechase

Regards
Admiral

LLXX
June 3rd, 2006, 03:16
I'll just repost again, the forum seems to have deleted my previous post.

I was missing an AAM 16 before the code. The actual sequence I had in mind was
Code:
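aam 16        ; AH = AL / 16 (high nibble), AL = AL mod 16 (low nibble)
call $+3      ; 3-byte near call to the next instruction: the body below runs twice
xchg al, ah   ; bring the next nibble into AL (high one first, then low)
cmp al, 10    ; CF = 1 if the nibble is 0-9
sbb al, 105   ; AL = AL - 105 - CF
das           ; decimal adjust leaves ASCII '0'-'9' or 'A'-'F' in AL
int 29h       ; DOS fast console output of the character in AL
ret           ; first time: back to the xchg; second time: back to the caller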
This is actual code: a routine that outputs the value of AL as two hex digits (higits) to the console via the DOS fast-output interrupt. It is a common idiom in the demoscene and in Asm competitions, where code efficiency is highly prized. Similarly,
Code:
push ax
call $+5
pop ax
db 169
xchg al, ah
aam 16
xchg al, ah
cmp al, 10
sbb al, 105
das
int 29h
ret
outputs AX in four higits, but calling directly at the aam 16 instruction duplicates the functionality of the fragment above, so hex-byte-out and hex-word-out are both possible with the same routine.

I chose these two examples because they are practical real-world code (although I'm a bit saddened by the fact that it wasn't used as much as it should've been in programs that needed this functionality - I've reversed a few hex editors, and have yet to find one that does this) that also shows how the concept of "functions" is not present in pure Asm; they are only a feature of the HLLs.
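
The cmp al, 10 / sbb al, 105 / das step shared by both routines can be checked with a small Python model of the flags involved. This is a sketch, not an emulator: the helper names are made up, only the carry and auxiliary-carry flags are modelled, and das follows the behaviour described in the Intel instruction reference.

```python
def cmp8(al, imm):
    """CMP: only the carry flag matters here (set when al < imm, unsigned)."""
    return int(al < imm)

def sbb8(al, imm, cf):
    """SBB: al - imm - cf on 8 bits; returns (result, carry, aux_carry)."""
    result = (al - imm - cf) & 0xFF
    carry = int(al < imm + cf)                # borrow out of bit 7
    aux = int((al & 0xF) < (imm & 0xF) + cf)  # borrow out of bit 3
    return result, carry, aux

def das(al, cf, af):
    """DAS: decimal adjust AL after subtraction; returns the adjusted AL."""
    old_al, old_cf = al, cf
    if (al & 0xF) > 9 or af:
        al = (al - 6) & 0xFF
    if old_al > 0x99 or old_cf:
        al = (al - 0x60) & 0xFF
    return al

def nibble_to_ascii(n):
    """cmp al, 10 / sbb al, 105 / das for a nibble value 0..15."""
    cf = cmp8(n, 10)
    al, cf, af = sbb8(n, 105, cf)
    return chr(das(al, cf, af))

def hex_byte(value):
    """aam 16 splits AL into two nibbles; each is converted in turn."""
    hi, lo = divmod(value, 16)
    return nibble_to_ascii(hi) + nibble_to_ascii(lo)

print("".join(nibble_to_ascii(n) for n in range(16)))  # 0123456789ABCDEF
print(hex_byte(0x3F))                                  # 3F
```

For 0-9 the cmp sets the carry, so the sbb subtracts 106 and the das removes both 6 and 0x60, landing on '0'-'9'. For 10-15 the carry is clear, the sbb subtracts 105 and the das removes only 0x60, landing on 'A'-'F'.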