blind code analysis [Archive] - RCE Messageboard's Regroupment

ExOrienteLux

July 2nd, 2009, 09:57

Hi ppl,

Maybe there has already been a topic about this, but I lack the terms to search for it.

Brainstorm:
Say we want to analyse a memory region and we know there is executable code somewhere in it, but we have no reference points such as exported functions or an entry point. NOTHING. We have no idea how the code and data are laid out in there.
What do we do? What do tools like OllyDbg and IDA do?

How do we know where a basic block of instructions begins?

My idea: brute-force all possible start offsets and see which disassembly makes the most "sense", e.g. relative jumps always have valid destinations and such.
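A minimal sketch of that idea in Python, assuming 32-bit x86 and a raw dump of the region held in a `bytes` object. It does not disassemble anything: it simply treats every `E8`/`E9` byte as a possible CALL/JMP rel32, keeps the ones whose destination stays inside the region, and reports destinations referenced from several places as candidate function starts. The function names and the `min_refs` threshold are made up for illustration, and false positives are expected because data bytes can look like opcodes.

```python
import struct
from collections import Counter

def rel32_targets(region: bytes) -> Counter:
    """Count in-region destinations of every byte run that could be CALL/JMP rel32."""
    hits = Counter()
    for off in range(len(region) - 4):
        if region[off] in (0xE8, 0xE9):              # CALL rel32 / JMP rel32 opcodes
            (rel,) = struct.unpack_from("<i", region, off + 1)
            target = off + 5 + rel                   # rel32 is relative to the next instruction
            if 0 <= target < len(region):            # keep only "valid" destinations
                hits[target] += 1
    return hits

def likely_entry_points(region: bytes, min_refs: int = 2) -> list[int]:
    """Offsets that at least min_refs candidate branches agree on."""
    hits = rel32_targets(region)
    return sorted(t for t, n in hits.items() if n >= min_refs)
```

Offsets returned by `likely_entry_points` are only guesses; they are reasonable places to start a real disassembly pass and check whether the decoded instructions keep making sense.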

BanMe

July 2nd, 2009, 10:04

NtQueryVirtualMemory...
MemoryBasicInformation..
Read more about the ldr routines. Search for these terms, please.

regards BanMe

ExOrienteLux

July 2nd, 2009, 10:09

How would that help me? :S
This is what I have:

The base address of a loose memory region with execute access (there is NO module associated with it).

Now I want to analyse this region: find out whether there is code in it, and what is data.
I know that in THEORY the problem is undecidable (it could be all data, all code, or any mix), but I want some ideas for heuristic methods that would most likely give a good guess at a disassembly.

BanMe

July 2nd, 2009, 10:18

Well, there are various methods to discern whether code or data is there. You could put a read/write (R/W) breakpoint on the entire region of memory. You could check EIP in all threads and see whether the address lands in the region, which is a sign there is code there. You could use the memalyze tool from uninformed.org to create a memory mirror of that region. You could read articles about "heuristic data detection" and "heuristic code detection". There are many things "you" could do, and as to how it would help you, learn to help yourself..

regards BanMe

ExOrienteLux

July 2nd, 2009, 10:34

Quote:

[Originally Posted by BanMe;81491] you could precheck EIP in all threads and compare see if the address lands in the region a sign there is code there

The downside, of course: the code doesn't have to be called frequently, so it is hard to catch.
Another idea was to disassemble everything reachable from known reference points and see whether any execution path lands there. But then again, this is expensive in time. Still, I'm considering it.

Quote:

[Originally Posted by BanMe;81491]
you could read articles about "heuristic data detection" and "heuristic code detection"

As I said, I am lacking search terms. I am only finding articles about heuristic detection of malware...

Quote:

[Originally Posted by BanMe;81491]
there are many things "you" could do, and as to how it would help you, learn to help yourself..

Come on. Cut the newbie-greeting crap. :/
My question was not thaaaatt obvious or easy to research.
I think it's still an interesting and legit question. Maybe others have put some thought into this problem and can share their basic idea.
I am not asking for spoon-feeding.

BanMe

July 2nd, 2009, 10:44

It does not matter if the information you want is buried under malware crap: read through it and rip out what you need. And patronizing me while I'm helping you "brainstorm" is quite disconcerting. Answers to questions and knowledge are not always one and the same, and if you are lacking search "terms", I think you are lacking search technique (I was also lacking in this area, so no ill meaning intended). There is a remedy, though: http://www.searchlores.com, a site by the late fravia+ that still holds valuable information and will probably continue to do so for a long, long time. I suggest you refine your technique there and continue to read the "malware" articles.

ExOrienteLux

July 2nd, 2009, 10:51

Well, don't get me wrong, but "malware heuristic detection" is not really about "code detection analysis". What I read there is more about behavioural analysis, API usage and all that stuff.
I'll just put the question as sharply as I can: "How does OllyDbg conduct code detection on loose pages?" OllyDbg does it, it works, and I want to know how. (I hope this question is permitted.)

And it's not like I didn't look around. As for code detection heuristics, I only came up with this simple idea: "Kind of brute force all possibilities and look which code makes the most "sense": Like relative jumps always have valid destinations and such."
I found nothing else. So while I am searching around, I'll leave this question here. Maybe someone wants to answer, maybe not.

And no offence, but you didn't really answer anything; you just told me to research more. OK, you mentioned some methods, but not the kind of method I want to use: a heuristic one. We have a block of bytes and I want to pull information out of it. No environment, no real-time analysis, just this.

BanMe

July 2nd, 2009, 11:03

Do you think I got answers to my questions when I wanted them? No, I didn't. I had to read and read, then think and reread, and so on and so forth until I was able to put things together myself. Answers will not come easily, but if you work at it and show that you are working at it, or at least trying to, more people are likely to give their opinions and input.

Oh, and by the way, why not use Olly on this region and just cut the bullshit..

ExOrienteLux

July 2nd, 2009, 11:17

Quote:

[Originally Posted by BanMe;81497]Do you think I got answers to my questions when I wanted them, no I didnt.

Well, why did you answer in the first place? You could just have stayed silent. Eventually I would get no answer and resort to finding a solution myself.
I was just taking a shot, jesus christ....

Quote:

[Originally Posted by BanMe;81497]
I had to read and read,then think and reread. So on and so forth until I was able to put things together myself.. answer will not come easily.. but if you work at it and show that you work at it, or are trying to work at it, more people are likely to give there opinions and input on it.

Gosh, I am just trying to peek in. Maybe someone is like: "Hey, I had the same problem, this is my basic idea: bla bla"

THAT kind of answer would be interesting to me. And it's not all i am relying on. :S

Quote:

[Originally Posted by BanMe;81497]
oh and btw why not use olly on this region and just cut the bullshit..

Why learn maths when you have a calculator...

Well, now this thread is completely ruined...
Maybe a mod can delete all posts except for the starting post? And you just stay quiet. :S How about that?

darawk

July 2nd, 2009, 11:48

The simplest answer is that code doesn't spontaneously begin executing. The processor has to be directed to it somehow, which means you need to figure out how the processor gets there. Of course it begins with the loader mapping the executable and running the TLS callbacks and the entry point; after that you just need to trace the execution path until it gets to the region you're looking for.

You can write a mini-debugger that single-steps the process, or you can use branch tracing to make it faster or whatever. If done properly this method cannot fail, because if the code is executing, then something, somewhere must have directed the processor to it, and there is no way around that.

ExOrienteLux

July 2nd, 2009, 12:19

Hi darawk

Quote:

[Originally Posted by darawk;81500]The simplest answer is that code doesn't spontaneously begin executing. Which means that somehow the processor has to be directed to it, which means you need to figure out how the processor is getting there. Of course it begins with the loader mapping whatever executable and running the TLS callbacks and the entry-point, after that you just need to trace the execution path until it gets to the region you're looking for.

Yeah, that's what I thought. :/
That's how OllyDbg does it though, right?

Quote:

[Originally Posted by darawk;81500]
You can write a mini-debugger that single-steps the process, or you can use branch tracing to make it faster or w/e. If done properly this method cannot fail because if the code is executing, then something, somewhere must have directed it to and there is no way around that.

Well, to be accurate, I guess we would have to emulate. It could be some tricky method like calculating an address and then doing "CALL EAX".
Simple branch tracing will get us nowhere in this case. :/
Of course you won't see this very often, but still, for completeness.

dELTA

July 2nd, 2009, 15:52

ExOrienteLux, the rules of this board are that you should state everything you have done yourself when you first post a question. Anything you don't mention will be assumed not to have been done, and then you will get suggestions that you might think are obvious, which wastes the time of the people trying to help you.

So, follow the rules from now on, and drop the attitude when it comes to problems that are a direct result of your rule breaches...

naides

July 2nd, 2009, 16:11

My answer may be naive, or I may not get the point of your question, but here it goes. . .

If the code author deliberately made the "code" contained in this area of memory obscured, obfuscated, or encrypted with decrypt-on-the-fly, I would venture to say that you have no hope of analyzing the code in static mode. There are way too many ways to hide code and data, including virtual machines, p-code with arbitrary translation, etc., so your only hope is catching the CPU reading the memory in question and hoping that EIP lands in your memory region at least once. Remember that with p-code and virtual machine interpreters, the line between code and data becomes pretty blurred. . .

On the other hand, if you assume (I would) that there is real, plainly executable code in there, I would start by looking for frequent "signatures" of code. For instance, the function call frame prolog:

PUSH EBP
MOV EBP, ESP
SUB ESP, . . .

bracketed by the function epilog:

POP EBP
RET

and the variations peculiar to each of the common compilers.
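As a rough illustration of the prolog search, here is a small Python sketch that scans a dumped region for the two common byte encodings of PUSH EBP / MOV EBP, ESP (`55 8B EC` and `55 89 E5`). The `PROLOGS` table and the `find_prologs` name are assumptions made for this example; a real signature set would need the compiler-specific variations mentioned above.

```python
# Two equivalent encodings of "push ebp; mov ebp, esp" on 32-bit x86.
PROLOGS = {
    b"\x55\x8b\xec": "push ebp; mov ebp, esp (8B EC encoding)",
    b"\x55\x89\xe5": "push ebp; mov ebp, esp (89 E5 encoding)",
}

def find_prologs(region: bytes) -> list[tuple[int, str]]:
    """Return (offset, description) for every prolog signature found in the region."""
    found = []
    for sig, name in PROLOGS.items():
        pos = region.find(sig)
        while pos != -1:
            found.append((pos, name))
            pos = region.find(sig, pos + 1)   # keep searching after this match
    return sorted(found)
```

A hit is only a hint: three bytes can easily occur in data by chance, so it makes sense to combine the hits with other evidence, such as a nearby RET before the prolog or branches that target the same offset.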

Another heuristic is looking for calls that land in the OS API, so searching for the byte patterns that translate into

PUSH E?X
PUSH E?X
CALL 77812354 ; an address that lands in the memory of the OS API (I assume Windows)

may help you start locating stretches of code.
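A hedged Python sketch of this heuristic: it looks for `E8` (CALL rel32) bytes, resolves the absolute destination from the region's base address, and keeps the calls that land in a caller-supplied address range. The function name, the `base` value, and the range bounds are assumptions for illustration; you would take the real module range from the target process. Note that compiled Windows code often reaches the API through an indirect `FF 15` call or an `FF 25` jump thunk instead of a direct call, which this sketch does not cover.

```python
import struct

def calls_into_range(region: bytes, base: int, lo: int, hi: int) -> list[tuple[int, int]]:
    """Return (call_address, target) for candidate CALL rel32 whose target is in [lo, hi)."""
    out = []
    for off in range(len(region) - 4):
        if region[off] == 0xE8:                       # CALL rel32 opcode
            (rel,) = struct.unpack_from("<i", region, off + 1)
            # The destination is relative to the address of the next instruction.
            target = (base + off + 5 + rel) & 0xFFFFFFFF
            if lo <= target < hi:
                out.append((base + off, target))
    return out
```

For example, with a region dumped from a hypothetical base of 0x00400000 and a system DLL assumed to be mapped at 0x77800000 to 0x77900000, `calls_into_range(dump, 0x00400000, 0x77800000, 0x77900000)` lists the call sites worth disassembling around.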

I have also read "The IDA Pro Book" by Chris Eagle, where he goes into some detail about using IDA to analyze files (I guess the file in this case would be a dump of the memory in question) with very little initial information about the structure of the code they contain, mostly for malware analysis purposes.

OR maybe,

I completely missed the spirit of your question. . .

BanMe

July 2nd, 2009, 16:17

Ah, a very fine point, naides.

To improve upon this point I would only add that SigSeek.inc by Opcode0x90 is a fine tool for finding signatures. It is easily translatable to C/C++, and I'm pretty sure Opcode0x90 used a C++ version of it in a post he made on www.rootkit.com. I could be wrong, but I know an example of its usage is there; it could be in asm though :O

Here is the asm link:
http://code.google.com/p/opcode0x90/source/browse/trunk/snippets/SigSeek.inc

regards BanMe

ExOrienteLux

July 2nd, 2009, 16:58

Thanks for your answers.

Quote:

[Originally Posted by naides;81511]
OR maybe,
I completely missed the spirit of your question. . .

No, you didn't.

darawk

July 3rd, 2009, 02:23

Quote:

[Originally Posted by ExOrienteLux;81501]Hi darawk

Yeah, thats what i thought. :/
That's how OllyDbg does it though, right?

That's a way that you could do it with OllyDbg; I don't know that Olly does it automatically. It'd be much more efficient to write your own little debugger to do it, though.

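Quote:

[Originally Posted by ExOrienteLux;81501]Well to be accurate, i guess we would have to emulate. It could be some tricky method like calculating an address then doing "CALL EAX".
Simple branch tracing will get us nowhere in this case. :/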

No, you wouldn't have to emulate it. I'm almost 100% positive that a CALL EAX would be considered a 'branch' by the processor's branch tracing mechanism. Now, a theoretical way to defeat a branch tracer would be to use code that just runs straightforward via self-modification, but my guess is that such code is incredibly unlikely in any real-world scenario. So, for all practical applications a branch tracer is probably a very effective way to solve this problem, but if you're worried about something obscure fucking with your trace you can do a true single-step with partial emulation. The only thing you would need to emulate is the popf instruction (to prevent the code from unsetting the trap flag) and any instruction that modifies the SS register (because traps are disabled for the instruction immediately following an SS modification - try this in olly, create a push ss / pop ss pair and hit single-step 2 times and 3 instructions will execute, and if the 3rd instruction is a branch that goes beyond the instruction immediately following it, olly will lose control of the thread).

disavowed

July 3rd, 2009, 18:56

ExOrienteLux, I would recommend two additional approaches:
1. OfficeMalScanner from http://www.reconstructer.org/code.html seems like it has the ability to heuristically find code in large regions of data. This seems like it's what you want. I haven't tried it myself, but I suggest you take a look (and let us know if it helps).
2. Load the memory region into IDA Pro, select all of it, then press 'C' ("interpret as code"), then 'F' ("force"). IDA will then try to disassemble the entire memory region for you. Once it's done, you can manually look at the code by scrolling up and down, looking for what appears to be valid code.

BanMe

July 3rd, 2009, 19:15

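__asm
{
cld                  ; make stosb walk forward through memory
mov edi, RegionAddr  ; destination: start of the region (RegionAddr holds the pointer)
mov al, 0xCC         ; int3 opcode
mov ecx, RegionSize  ; number of bytes to fill
rep stosb            ; overwrite the whole region with int3
}
Then do a call to GetCallerAddress in the exception handler..
.... //maybe... ;d
though.. that does ruin it.. you find what calls it ;]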

SiGiNT

July 8th, 2009, 23:10

Am I missing the point????? - If you want to see if it's actually code, couldn't you just double-click on the section, or one of the sections, in Olly's Memory window, right-click on the dump and pick "disassemble"? But that's way too simple to be the answer.

SiGiNT

DeepBlueSea

July 9th, 2009, 05:15

Quote:

[Originally Posted by SiGiNT;81645]Am I missing the point????? - If you want to see if it's actually code, couldn't you just double-click on the section, or one of the sections, in Olly's Memory window, right-click on the dump and pick "disassemble"? But that's way too simple to be the answer.

SiGiNT

Yes, you are missing the point. The reverser's eye can spot code pretty easily.
But he wanted to KNOW some automatic heuristic methods, maybe for writing his own tool!

I would go with a complete disassembly of the whole process memory and see whether some execution path leads to the section that should be analysed.

SiGiNT

July 9th, 2009, 09:09

Quote:

[Originally Posted by ExOrienteLux;81487]Hi ppl,

Maybe there already was a topic about this. But i am lacking terms to search for this.

Brainstorm:
Say we want to analyse a memory region and we know there is executable code somewhere in there. But we have no reference points like, exported functions or an entrypoint. NOTHING. we have no idea, how the code and data is aligned in there.
What do we do? What do debuggers like OllyDbg, IDA do?

How do we know where a basic block of instructions begin?

My idea: Kind of brute force all possibilities and look which code makes the most "sense": Like relative jumps always have valid destinations and such.

I see nothing here that mentions heuristics, or reversing for potential use in writing a tool; it is simply a request for a way to analyze an unknown section. As for finding import tables, there are at least 3 ways I can think of; an OEP might be somewhat more difficult. Reading through the thread, a lot of second-guessing was going on, and this grew from a simple request into a perceived greater intention.

SiGiNT - just the way it looks to me

My suggestion is simply a quick check; without a good reference alignment point, the disassembly in Olly could be garbage.

DeepBlueSea

July 9th, 2009, 09:36

Yeah well, I figured so because he said "heuristic method", and that means the theoretical logic behind an approach.

And he asked how debuggers do it and gave a theoretical idea, so that kind of rules out that he just wants a practical method.

SiGiNT

July 10th, 2009, 09:21

I was in a nasty mood when I questioned the rapidly expanding scope of the original request. Of course all of the suggestions are valid; it just seemed to me to be simpler than what it was being expanded into. There are many signatures to use as a heuristic approach, e.g.: "cannot be run in DOS" would indicate the file header, a mess of Unicode strings would indicate the resources section, and in between would be either code, data, or imports. For imports, search in hex for recurring "FF25". Of course, if it's .NET, VB, Java or Python then it's much more difficult. Even though it's been a while since I've been actively reversing due to my lack of time (contract work sucks), these should still be valid starting points. Hope I've been some help!!
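The "FF25" search can be sketched in a few lines of Python. On 32-bit x86, `FF 25` followed by a 4-byte absolute address encodes JMP DWORD PTR [addr], the usual shape of an import thunk. The function name below is made up for this example, and the sketch assumes a 32-bit dump (on x64 the same bytes use RIP-relative addressing, so the slot would be computed differently).

```python
import struct

def find_import_thunks(region: bytes) -> list[tuple[int, int]]:
    """Return (offset, slot_address) for every FF 25 <imm32> sequence in the region."""
    thunks = []
    pos = region.find(b"\xff\x25")
    while pos != -1 and pos + 6 <= len(region):       # need the full 6-byte instruction
        (slot,) = struct.unpack_from("<I", region, pos + 2)
        thunks.append((pos, slot))
        pos = region.find(b"\xff\x25", pos + 1)
    return thunks
```

Several hits whose slot addresses sit close together, a few bytes apart, are a good sign that they point into an import address table rather than being random data.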

SiGiNT