
View Full Version : Good binary code profilers?


dELTA
February 11th, 2008, 13:27
When the subject of profilers came up briefly in some discussion here on the board a while ago, I remember catching myself feeling surprised that they are practically never mentioned in reversing contexts. Coverage tools like Paimei/pStalker are sometimes (but rarely) mentioned in reversing contexts, and I guess that compared to the more complex profilers, these code coverage tools are also the most natural (and quite efficient too, check out Paimei/pStalker if you haven't already: http://www.woodmann.com/forum/showthread.php?t=10851), but for some purposes a more profiler-centric tool would be more efficient, e.g. when pinpointing some code that consumes lots of CPU power (e.g. a bug or other suspect piece of code like this one: http://www.woodmann.com/forum/showthread.php?t=11302), or when you want to efficiently pinpoint e.g. some encryption/decryption or checksum code or similar, where the same code blocks are hit a very high number of times during a short period of time. And of course, the target will be an executable for which we don't have the source code.

My Google searches on this subject have been hard to get good results from. This is partly because of the ambiguous word "profiler", but mostly because most profiler software seems to be primarily aimed at and centered around analyzing programs that you already have the source code for. Also, the area of code profiling (let alone binary, source code-less, code profiling) is so small in relation to other areas of interest that it is easily drowned even further in irrelevant search results, and this also makes it very hard to find out which products, if any, are popular or good within this field.

So, this is an excellent time to consult the vast experience in the areas of debugging, programming and analyzing code that is present on this board, by asking: Which tools do you use and/or recommend for binary profiling as described above?

To clarify: What I'm primarily looking for is logging of code execution hits on the basic block level, with hit counters and sorting in decreasing order of the most frequently hit code blocks (possibly of the approximate kind, i.e. it's not necessary that the hits are counted exactly by means of breakpoints; many profilers use sampling techniques too, to speed up the process at the cost of more approximate results).

Any good tips or ideas, anyone?

Admiral
February 11th, 2008, 14:50
I entirely agree that low-level profiling is something of an undiscovered gem in the RCE trade, but I think that this also explains the absence of any particularly strong tools for the job. I have been using AMD CodeAnalyst for a little while and it's pretty good at what it does, though its results obviously suffer compared to the equivalent generated with accompanying source code. Nevertheless, the profiling method successfully performed a very tricky task in almost no time when I needed to isolate QuickTime's decryption routine, though admittedly this particular example is a large block of code that was quite well isolated from the remainder, making for easy identification using EIP samples alone.

It shouldn't be too difficult to produce a heuristic block-based profiler using a little static analysis, so I'd be disappointed, though not surprised, if none exists. If that's the case, it'd certainly be a project to consider home-brewing. So many ideas, but so little time.

For the record, AMD CodeAnalyst only works with AMD processors. The equivalent for Intel is VTune. Each is available on the respective website for free limited use. It should also be pointed out that these two profilers differentiate themselves from the pack in that they sample in kernel mode, rather than using the Windows debugging API, which makes them indispensable for inter-process and driver-based work.

dELTA
February 11th, 2008, 15:58
Thanks for the tip Admiral! VTune was already in the profiler category of the CRCETL, and I now also added CodeAnalyst:

http://www.woodmann.com/collaborative/tools/Category:Profiler_Tools

I guess that should cover the initial needs for sampling-based profilers. Just like you though, I'd be very interested in getting my hands on some breakpoint-based (of INT3, or even better, memory access breakpoint type) basic block level profilers too, most likely available in the form of OllyDbg or IDA plugins. Hasn't anyone around here ever heard of such a tool? In that case please speak up!

Hmm, the Conditional Branch Logger OllyDbg plugin (http://www.woodmann.com/collaborative/tools/Conditional_Branch_Logger) should be relatively easy to turn into such a thing in the worst case...

Also Admiral, very cool to hear that you had already used the CodeAnalyst profiler for one of the first things that came to mind when I thought about possible areas of use for profilers in the reverse engineering field, i.e. pinpointing crypto code.

Aimless
February 11th, 2008, 17:16
When you say profiler, are you talking about TIMING or about COVERAGE?

I gather for most RCE purposes, COVERAGE is what we are talking about. If so, IDA -> Debug -> Trace works as a "crude" profiler.

Have Phun

dELTA
February 12th, 2008, 06:12
As mentioned above, I'm talking about coverage + hit counters per basic code block, which in some ways I guess could be considered as, well... timing.

The idea is to pinpoint:


First the exact code that was hit to begin with (coverage).
Then how many times each code block was hit, in order to be able to differentiate the interesting code even more, e.g. in order to be able to pinpoint intensely used code blocks like crypto code inside loops, or, the other way around, code that you know will only have been called once during a certain period of time.


In the coverage step, filters like the ones in Paimei/pStalker are really great and useful, I'd just like to combine this coverage functionality with code block counters/timers too, see what I mean?

Or the really short answer I guess: Timing

dELTA
February 12th, 2008, 07:53
The following is a pure code coverage (i.e. one break/logging per code basic block) plugin for IDA, written by Ilfak:

http://www.woodmann.com/collaborative/tools/CoverIt

For the reasons mentioned in the accompanying article (http://www.hexblog.com/2006/03/coverage_analyzer.html), it might not be very easy to convert into the counter-logging kind though:

Quote:
[Originally Posted by Ilfak]Since we do not have 'real' breakpoints that have to be kept intact after firing, the logic becomes very simple (note that the most difficult part of breakpoint handling is resuming the program execution after it: you have to remove the breakpoint, single step, put the breakpoint back and resume the execution - and the debugged program can return something unexpected at any time, like an event from another thread or another exception).
So please keep the tips coming for such a plugin/tool!
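
For anyone wanting to experiment, here is roughly what that restore/single-step/replace cycle looks like with the plain Windows debug API. This is just a minimal sketch of the classic technique, not Ilfak's code; the block/function start addresses are assumed to come from somewhere else (e.g. an IDA dump), and it also shows why another thread's event arriving mid-cycle is such a pain:

Code:
// Minimal sketch of the INT3 hit-counter cycle: restore the byte, single-step,
// put the breakpoint back. 32-bit target assumed (ctx.Eip).
#include <windows.h>
#include <map>

std::map<DWORD_PTR, BYTE>  g_origByte;   // block start -> original first byte
std::map<DWORD_PTR, DWORD> g_hitCount;   // block start -> hit counter
DWORD_PTR g_pending = 0;                 // breakpoint to re-arm after the single step

void WriteByte(HANDLE hProc, DWORD_PTR addr, BYTE b)
{
    SIZE_T written;
    WriteProcessMemory(hProc, (LPVOID)addr, &b, 1, &written);
    FlushInstructionCache(hProc, (LPVOID)addr, 1);
}

void OnBreakpoint(HANDLE hProc, HANDLE hThread, DWORD_PTR addr)
{
    if (!g_origByte.count(addr))
        return;                          // not one of ours

    ++g_hitCount[addr];                  // the actual "profiling"

    CONTEXT ctx = { 0 };
    ctx.ContextFlags = CONTEXT_CONTROL;
    GetThreadContext(hThread, &ctx);
    ctx.Eip = (DWORD)addr;               // back up over the 0xCC we just hit
    ctx.EFlags |= 0x100;                 // trap flag: single-step the real instruction
    SetThreadContext(hThread, &ctx);

    WriteByte(hProc, addr, g_origByte[addr]);   // temporarily restore the original byte
    g_pending = addr;                           // re-arm in the single-step handler
}

void OnSingleStep(HANDLE hProc)
{
    // If an event from another thread arrives between OnBreakpoint and here,
    // the breakpoint is momentarily gone -- exactly the problem Ilfak describes.
    if (g_pending)
    {
        WriteByte(hProc, g_pending, 0xCC);      // put the INT3 back
        g_pending = 0;
    }
}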

roocoon
February 12th, 2008, 14:25
If you're interested in non-free production tools, there's AutomatedQA's AQtime, IBM's ProfilerPlus and a few more that aren't so mainstream.

dELTA
February 12th, 2008, 18:17
Thanks for the tips roocoon! I've actually looked at AQtime before, but at that point I didn't find any information indicating it would be good, or even useful, for programs that you don't have the source code for. Do you (or anyone else) know if it's useful in such a situation at all (which is, as stated above, the main objective of my inquiry here)? While looking at it again now, I did find that it has some kind of "disassembler" feature, but nothing in that information really indicated that it is useful for profiling purposes; at first glance it mostly looks to be for viewing assembly code (http://automatedqa.com/products/aqtime/aqtime5/disassembler_panel.gif). But it is of course very possible that it can be useful for this too, and that they just don't push it in the marketing material because most people aren't primarily interested in that.

So, does anyone have experience with AQtime and can say if it's useful at all for non-source code situations?

About the IBM profiler, after starting to doubt my Google skills there for a moment (and being quite annoyed at Adobe for apparently choosing the same name for their color calibration whatever thingy), I finally concluded that it was rather you who had mistaken its name, which should be "PurifyPlus" rather than "ProfilerPlus", and that, to make things even worse, it was not created by IBM but rather by Rational, and just included in the deal when Rational was acquired by IBM a while ago.

Anyway, I did at least find a well-hidden comment in the PurifyPlus marketing material saying that it "does not require access to source code and can thus be used with third-party libraries in addition to home-grown code", which bodes well, even though I could not find anything more about this feature or any related aspects at the moment.

So, here too, does anyone have experience with IBM/Rational PurifyPlus, and can say if it's useful at all for non-source code situations like the ones I'm inquiring about?

Also roocoon, I'd be glad to hear about all those other "not so mainstream" profilers that you are referring to! When it comes to our area of interest, such tools can often very well be the best ones.

Everyone else is of course still welcome to post any additional tips or suggestions related to my initial inquiry too!

Aimless
February 12th, 2008, 22:04
AQtime DOES NOT profile binary. At best, it takes the API dependencies... Ask someone who has tried it from its old v3.

Intel VTune v5 USED to have PURE ASSEMBLER profiling (Yeah!). However, since v6, they mysteriously discarded that feature. Don't know why.

Rational PurifyPlus NEEDS source code, OR an executable compiled in Visual Studio with the profiling information on.

Numega Truetime & Truecoverage (same - need source code or just view the APIs called)...

Have Phun.

roocoon
February 13th, 2008, 04:31
Sorry for the mistake. PurifyPlus it is.

I had a look at it and it needs debug information (pdb, dbg, or map). I ran it against a file with no debug info in it but it didn't produce any output. (Wasn't there something that produced a map file out of a program, or is my memory starting to fail?)

Aimless is right about AQTime. It needs source code.

The others I had in mind were a couple of older programs.
Turbopower's Sleuth QA (now extinct but some libraries are available for free - check at www.turbopower.com) also needed debug info.
One of Parasoft's programs that I had come across 15+ years ago. Their newer batches have some similar products like C++ Test but none of them is the one I remember. It had a distinct ugly gray screen with a couple of buttons that used to crash too often to be pleasant (then again, it could have been my patch ). But that used to run with plain binaries.
Borland too has Gauntlet but I'm sure this will have the same requirements.

I'll keep my eyes open.

Take care all.

dELTA
February 13th, 2008, 07:48
Thanks for all the useful info roocoon and Aimless!

And Aimless, do you think it's possible that Intel VTune or the Numega tools (nowadays rather included under the group name "Compuware DevPartner Studio") can also handle things as long as they have debug information available (which can often be arranged after the fact with e.g. IDA Pro), as opposed to needing real source code, just like in the case of PurifyPlus?

So far, it would seem like AMD CodeAnalyst Performance Analyzer (http://www.woodmann.com/collaborative/tools/CodeAnalyst_Performance_Analyzer) is our prime candidate for a reverser's profiler tool anyway!?

Admiral also confirmed above that CodeAnalyst works fine, by letting us know he used it successfully to find crypto code in a bloated/complex target for which he definitely didn't have any source code or debug symbols.

Admiral, what version of CodeAnalyst was that? Is it possible that they also removed the binary profiling features in their most recent releases, in this seemingly widespread conspiracy to exterminate binary profiling? Luckily, the statements on their website indicate that they might not have caught on to that trend just yet, but it would be great to get some confirmation anyway, Admiral!

tHE mUTABLE
February 13th, 2008, 23:20
I remember I wrote something similar to that, just as a proof of concept for something I'd like to see in the RCE world. But, later on...

dELTA
February 14th, 2008, 05:42
I'm not exactly sure what you mean by "but later on", but if there was ever a runnable result for that project, I'm sure many people here would be interested in seeing it, if possible?

tHE mUTABLE
February 14th, 2008, 12:17
Well, the problem is that this POC is part of a paper I wrote 7 months ago which is not published yet (it's about proposing a new MDL for Asm in RCE)... Please note, this POC is super dumb but stateful, and there's nothing special about it, except that it would be a breakthrough if I found something like that in the RCE community...

Sirmabus
February 15th, 2008, 12:15
(PART 1)
Sounds exactly along the same lines as ideas I've had too.
And thanks for the info here; some more avenues to investigate.

I'll share my history and research on it.

I thought one day about five years ago, when playing with the memory hack utility "TSearch" (nowadays it would be Cheat Engine, MHS, etc.),
that: "..wouldn't it be nice to have something for code that is what TSearch is for data/memory?"

Originally I was interested in just getting call hits and doing delta operations
on them (again like TSearch did using its various filters).
An example: messing around with MORPG games to make private bots for them.
I found myself always trying to find particular functions in the client,
like the loot/pick function. If I could take before and after snapshots of call hits and apply delta filters, I thought I might be able to pinpoint these functions with less work.

See my blog post here for sort of an introduction of my research:
http://www.openrce.org/blog/view/838/Real_Time_Tracing


The first iteration I did a few years ago took the "breakpoint on every function" approach, like Pedram Amini's "Process Stalker", "CoverIt", etc.

FIRST TRY:
Meant to work in conjunction with IDA, you first had to run a script that would go through the DB and create a simple list of every single function entry point.
The working components consisted of a GUI front end that injected a service DLL into the target. I put either a JMP5 hook or an int3 hook on every function from the IDA list. This requires the creation of automatic stubs/detours for every function.
I would analyze the entry points and put a JMP5 where possible (5 bytes open), or a single-byte int3 if there wasn't enough room. The int3 hooks had stubs too, to avoid the restore/single-step/replace cycle. As long as the hook stubs were align16 and the majority of the hooks were JMPs, it's very fast and makes for a real-time tool.
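
To give an idea, a JMP5-style entry hook written from inside the injected DLL boils down to something like the sketch below. This is simplified and not my actual code; the "are the first 5 bytes safely relocatable" problem, the stub generation, and the counter indexing are all waved away here.

Code:
#include <windows.h>
#include <string.h>

#pragma pack(push, 1)
struct JmpRel32 { BYTE opcode; LONG rel; };   // E9 xx xx xx xx
#pragma pack(pop)

volatile LONG g_hits[0x10000];                // per-function hit counters (toy sizing)

void WriteJmp(void* at, void* to)
{
    JmpRel32 j;
    j.opcode = 0xE9;
    j.rel    = (LONG)((BYTE*)to - ((BYTE*)at + sizeof(JmpRel32)));

    DWORD old;
    VirtualProtect(at, sizeof(j), PAGE_EXECUTE_READWRITE, &old);
    memcpy(at, &j, sizeof(j));
    VirtualProtect(at, sizeof(j), old, &old);
    FlushInstructionCache(GetCurrentProcess(), at, sizeof(j));
}

// Stub layout per hooked function (built at runtime in a VirtualAlloc'ed,
// 16-byte-aligned, executable buffer):
//   lock inc dword ptr [g_hits + index*4]   ; bump this function's counter
//   <the 5+ original bytes displaced from the entry point>
//   jmp  original_entry + displaced_length  ; continue with the real function
// Building it is just memcpy-ing those instruction templates into the buffer
// and then calling WriteJmp(entry_point, stub).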

If you try to do this kind of thing in a debugger and/or using the OS debug APIs, only the simplest processes would work. Those paths (due to the OS IPC and all the layers) are just too slow to use in real time. With a DLL in the same process space I could maximize the speed since it's sharing the process's space. The real acid test is to try a multi-threaded, resource-hogging process like a video game, etc.

This breakpoint-on-every-function approach works fairly well, but IMHO has too many problems to make it generically useful (as a tool). Mainly there are too many errors in trying to find every single function entry point. While IDA may typically find 90% of them correctly, it will either miss many entirely, or create functions at the wrong boundaries.
I'm not knocking IDA; I'm pretty sure the problem exists with any disassembler trying to find function boundaries. I considered doing runtime analysis (like Olly does) but I'm sure one would run into the same issue. It depends on the executable itself, what language, what compiler was used, etc. And after all, while there might be conventions in higher-level languages for an actual function definition, there isn't one for binary code (AFAIK). In particular, if you turn on function-level linking in compilers, take a look at the disassembly. You might find all kinds of disjointed functions: half of something here, the other half in another place.
At best you could find most of them, but not all, no matter how good the analyzer IMHO.

To get this method to work, I had to spend a lot of time hand-fixing the functions in IDA and/or editing the function list to get it to work and not crash all the time (mostly because of bad/wrong entry points).
Another downside of this is that it requires code modification. Most of the time that's not a problem; you could shadow the code space to hide it, etc., but it's not ideal.

(continued in PART 2)

Sirmabus
February 15th, 2008, 14:51
(PART 2)

I had pretty much stopped working on the first attempt, not sure if it would ever see the light of day.
I'd kick myself (or slap myself on the head), thinking "there has to be a better way!".

Then, I ran into Pedram's blog (and sort of hijacked it) one day:
http://www.openrce.org/blog/view/535/Branch_Tracing_with_Intel_MSR_Registers

Wow! A way to do this in hardware you say?


SECOND TRY:
Used the "single step on branch", LBR ("last branch recording" method.
Now this tool is really taking shape!
Gone are the problems of "break point on every function" approach.
No need to attempt a preprocess to get function entry points, no code modifications, etc.
The hardware does many things for you. Looking at the branch recorders you can see the "to" and, "from" over every call.
It's in hardware, no need for a megadollar hardware ICE, etc., it's all there in the CPU. With the right tool and setup anyone with a modern PC can use this.

First, here is a screenshot of my current alpha tool.
As a working title I call it "CFSearch", after "TSearch" ("CSearch" is already taken):

http://img264.imageshack.us/img264/9900/cfsearchalphatest1vz9.jpg

What you see here is a list of call hits. On the toolbar, to the left, are the "Save List" button, then "refresh", some filters, a pause/play button, etc.
You run the GUI front end and attach it to whatever process you want (the tool can only run on one at a time right now).
The right panel is the "keeper list".
You can select which thread, or all threads, to attach to.
Although it's "real time", a target process does slow down considerably while tracing (somewhere around 60 to 90% slower). If the process is multi-threaded (as my main test targets were), selecting only the main thread can really help speed-wise.

The single-step-on-branch thing works on pretty much every Intel CPU from the P2-ish era to present, and on the AMD64 generation or better.

The setup steps (for the user-mode version; a rough sketch of step 3 follows below):
1) Again using a DLL injected into the target for maximum speed and versatility, I install my own exception handler to handle the "single step on branch" exception.
2) Set up the CPU MSR registers (for each logical core).
3) Turn on the trap flag for threads via "SetThreadContext()" to start tracing.
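
Roughly, step 3 looks like this from user mode. Just a sketch: the MSR part of step 2 (setting the BTF bit in IA32_DEBUGCTL, MSR 0x1D9, bit 1, per my reading of the Intel manual) needs ring 0 and a WRMSR per logical core, so only the trap-flag half is shown here.

Code:
#include <windows.h>

bool EnableBranchStep(DWORD threadId)
{
    HANDLE hThread = OpenThread(THREAD_GET_CONTEXT | THREAD_SET_CONTEXT |
                                THREAD_SUSPEND_RESUME, FALSE, threadId);
    if (!hThread)
        return false;

    SuspendThread(hThread);

    CONTEXT ctx = { 0 };
    ctx.ContextFlags = CONTEXT_CONTROL;
    if (GetThreadContext(hThread, &ctx))
    {
        // TF alone means trap on every instruction; with DEBUGCTL.BTF already
        // set from the driver, the CPU only traps on branches instead.
        ctx.EFlags |= 0x100;
        SetThreadContext(hThread, &ctx);
    }

    ResumeThread(hThread);
    CloseHandle(hThread);
    return true;
}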

The heart of the action is in the exception handler.
For maximum speed I create shared linear buffers for each code section to act as 32-bit hit counters. This is a big part of the design; maybe a perfect hash function would work, etc., to save memory.
The per-call overhead has to be minimal for a real-time tool.

Another component of the exception handler (the worker) is a mini code analyzer to reject all branch exceptions except for calls.
There is a little IPC between the front end and the target DLL to synchronize some events, but overall the DLL operates independently, again to reduce overhead. The front end grabs a copy of the hit lists with some extra flags to apply delta/filter operations.
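
The worker boils down to something like the sketch below: one 32-bit counter per byte offset of the traced code section, plus a tiny "was the branch source a CALL?" test. This is simplified; how the "from" address reaches the handler is glossed over (in the real design it comes from the last-branch-record MSRs, which need ring 0 to read), and prefixed call encodings are ignored.

Code:
#include <windows.h>

static BYTE*  g_codeBase;     // base of the traced code section
static SIZE_T g_codeSize;
static LONG*  g_hitCounters;  // g_codeSize 32-bit counters, shared with the GUI

bool LooksLikeCall(const BYTE* from)
{
    if (from[0] == 0xE8) return true;                              // call rel32
    if (from[0] == 0xFF && ((from[1] >> 3) & 7) == 2) return true; // call r/m32
    return false;
}

void OnBranchTrap(const BYTE* from, const BYTE* to)
{
    if (!LooksLikeCall(from))
        return;                                             // reject non-call branches

    SIZE_T off = (SIZE_T)(to - g_codeBase);
    if (off < g_codeSize)
        InterlockedIncrement(&g_hitCounters[off]);          // per-destination hit count
}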


Improvements:
Having the exception handler inside the process space makes it faster than an external one, but it could still be much faster.
There is all that overhead in the OS from the hardware breakpoint, to the kernel, then the kernel dispatching the exception via LPC to the process's space, etc.
Also it crashed a lot from exception frame conflicts in "ntdll.dll".
Probably the bigger (and continuing) issue is that Windows is not aware of the branch trace mechanism. It doesn't know you are using single-step-on-branch, it doesn't know you set those MSRs; it's pretty much dumb to the whole condition (understandably).

For the next step, I removed the DLL exception handler component and replaced it with a simple KMD. It hooks int 1 to handle the single-step branch exception directly in kernel space.
This gave the whole process a big speed boost. Gone is the majority of the user-mode overhead, and crashes are now almost nonexistent.
Plus it's nearly transparent to the target process, with the exception of having to set the trap flag(s) in it.
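
The int 1 hook itself is the standard IDT patch. A stripped-down sketch (32-bit XP only, not my actual driver): per-CPU handling, IRQL concerns, unhook safety, and the asm stub that does the counting/filtering and chains to the original handler are all omitted.

Code:
#include <ntddk.h>

#pragma pack(push, 1)
typedef struct _IDTR    { USHORT Limit; ULONG Base; } IDTR;
typedef struct _IDTGATE { USHORT OffsetLow; USHORT Selector; UCHAR Reserved;
                          UCHAR Type; USHORT OffsetHigh; } IDTGATE;
#pragma pack(pop)

extern void __cdecl Int1HookStub(void);   // asm stub: count/filter, then jump to the original

static ULONG g_originalInt1;

void HookInt1(void)
{
    IDTR idtr;
    __asm sidt idtr                              // where this CPU's IDT lives

    IDTGATE* gate = &((IDTGATE*)idtr.Base)[1];   // vector 1 = #DB (single-step/debug trap)
    g_originalInt1 = ((ULONG)gate->OffsetHigh << 16) | gate->OffsetLow;

    ULONG newHandler = (ULONG)Int1HookStub;
    _disable();                                  // no interrupts while patching the gate
    gate->OffsetLow  = (USHORT)(newHandler & 0xFFFF);
    gate->OffsetHigh = (USHORT)(newHandler >> 16);
    _enable();
}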

Some downsides: for one, it presently wouldn't work on Vista without disabling PatchGuard, since it's not "legal" in Vista to change interrupt vectors.
Another is that branch single-step is global. So you can't, say, debug in Olly and run CFSearch at the same time, because Olly is expecting the default behavior for single steps.
But there are workarounds for these (some Olly setting to use HWBP stepping).
It really needs a kernel hook on context switching to turn it on and off depending on which process the context is in, etc.


CURRENT STATE of research and the tool:
My alpha tool works pretty well, but it could be better.
When I run it on a big game, although it's workable, the game slows to a near crawl. Even using a KMD it's still relatively slow.
Each processor exception takes so many cycles, regardless of how well I optimize the KMD code.

If you look further in the Intel manuals you will find the "debug store" (DS) mechanism. Basically, the LBR can be set to be recorded to a special buffer without the need for an exception on every branch!
The buffer can either be polled, or set up to IRQ when it's near full.
This could potentially be a big speedup.
Although maybe not as good as it sounds, because apparently (from the single post or two I can find on the subject) the CPU operates in a less optimal mode when the DS store is turned on.
Nonetheless, it has to be tried.
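
For reference, here is my reading of the 32-bit DS save area layout and the DEBUGCTL bits involved, written up as a sketch to experiment against. Double-check every offset and bit against the Intel manual; this is exactly the thinly documented part I keep tripping over.

Code:
#include <ntddk.h>

#pragma pack(push, 1)
typedef struct _BTS_RECORD {          // one branch record (32-bit mode)
    ULONG From;                       // branch source linear address
    ULONG To;                         // branch destination linear address
    ULONG Flags;                      // bit 4 = branch predicted (per the SDM)
} BTS_RECORD;

typedef struct _DS_AREA32 {           // pointed to by IA32_DS_AREA (MSR 0x600)
    ULONG BtsBufferBase;
    ULONG BtsIndex;                   // CPU-maintained write pointer
    ULONG BtsAbsoluteMaximum;
    ULONG BtsInterruptThreshold;      // IRQ when the index reaches this (with BTINT set)
    ULONG PebsBufferBase;             // PEBS fields unused here
    ULONG PebsIndex;
    ULONG PebsAbsoluteMaximum;
    ULONG PebsInterruptThreshold;
    ULONGLONG PebsCounterReset;
} DS_AREA32;
#pragma pack(pop)

#define MSR_IA32_DS_AREA    0x600
#define MSR_IA32_DEBUGCTL   0x1D9
#define DEBUGCTL_TR         (1 << 6)  // generate branch trace messages
#define DEBUGCTL_BTS        (1 << 7)  // store them in the DS buffer
#define DEBUGCTL_BTINT      (1 << 8)  // interrupt at the threshold (else circular buffer)

// Per logical CPU: point IA32_DS_AREA at a DS_AREA32 describing a non-paged BTS
// buffer, then set TR|BTS (plus BTINT if interrupt-driven) in IA32_DEBUGCTL.
void EnableBtsOnThisCpu(DS_AREA32* ds)
{
    __writemsr(MSR_IA32_DS_AREA, (ULONG_PTR)ds);
    __writemsr(MSR_IA32_DEBUGCTL,
               __readmsr(MSR_IA32_DEBUGCTL) | DEBUGCTL_TR | DEBUGCTL_BTS);
}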

Some downsides: it's more processor-version specific, and it's not supported on AMD64 (at least not publicly documented).

This is where I am now, and I'm a bit stuck at the moment.
There is only a tiny amount of information available on the DS store mechanism, most of which is the raw description in the Intel manual.
I can find no prior research on anyone attempting to do this on Windows.
There is a tiny amount of documentation and source for Linux "perfmon".

So far in my DS store attempts, even in the "polling" setup, it appears to only record the first branch and then stop recording.
Although while I have the DS store set up (with buffers, all the debug MSR flags, etc.), the PC does slow down about 20%.
That tells me it's at least partially turned on, but I must be missing something.

Only targeting Windows XP 32-bit at the moment. Once working, perhaps it could be extended to work on everything from XP to Vista, both 32- and 64-bit.

Note that to have this working well, it will probably take some Windows kernel hacks to make it work right.
At the very least I will need a kernel context-switch hook to have it ON only while in the desired code spaces.
Out of desperation I RE'ed VTune's driver a bit (not even sure it uses the DS store yet, though), and it does indeed use several kernel hooks.

Any information would be appreciated, in particular anyone from Intel, AMD, etc.

(Continued in part 3)

Sirmabus
February 15th, 2008, 15:14
(Part 3)

More thoughts on this.

#1: I mentioned game hacking. It's certainly not restricted to that, it's just something I (and a lot of others) do for fun.

My main idea for such a tool is a real time reverse engineering tool that could be used in several situations.

Some more uses:
1) Code profilers.
2) Extended real time debuggers.
3) EXE unpacking tools.
4) Virtualisation, sandboxing.
And more..

On the concept in general: so far I don't find the call hits as useful as I thought they would be. Looking at actual call chains (recording all the "from" and "to" pairs) would probably be more useful.
This would be a whole different design, and I'm not so sure it could be done in real time. It would probably require dumping it all (the potentially thousands of calls per second) and running some sort of post-process on the data to make sense of it.

A particular new product of interest is HBGary "Inspector":
http://www.hbgary.com/hbgary_inspector_datasheet.pdf

More of it here:
http://www.rootkit.com/blog.php?newsid=745

Not only looking at code in this way, but also at data, etc.
Unfortunately, I have no government contracts to pay for this research, I have to do it on my/our own.
Anyone know the price of "Inspector"?
(Hoglund, come to THAT IRC channel to talk)

Some data tracking methods here:
http://uninformed.org/index.cgi?v=7&a=1&t=pdf


Also mentioned above is the "AMD CodeAnalyst" tool. Anyone know what mechanisms it's using?
Perhaps the undocumented AMD MSR registers here; based on the names of some of the labels, perhaps AMD does indeed have their own DS store mechanism or similar:
http://cbid.amdclub.ru/html/undocmsrs.html



(End of Part 3, big post)

dELTA
February 15th, 2008, 15:25
Very interesting Sirmabus, I'm looking forward to part 2! ([EDIT] see my next post below)

What you say about only breaking on each function is true: it increases speed a lot and thus can allow for close-to-realtime tracing, but it is also highly likely to miss some (important) functions, it is also likely to crash because of misidentified function entry points, and finally we also have the more general problem of code patching/modification having to be done.

So, this provoked an elaborate idea and possible solution to all this in my mind. It is admittedly quite crazy and also has its disadvantages (mostly speed, but also non-100% accuracy in the case of exceptions and code that is misidentified as data in the static pre-analysis of the executable), but other than that, it's pretty cool... So here it is:

First of all, the accuracy and stability of the whole thing could be increased by doing it all on basic block level instead of on the function level. This would of course be at the cost of execution time, but as soon as you have some good basic filters (i.e. non-logged basic blocks that you have already discarded as uninteresting) this might actually still be acceptable/useful under some conditions.

Then comes the really cool part:
Not counting the possibility of exceptions, a code basic block has at most two possible exit points, i.e. taking a conditional jump or not, and in many cases it only has one (an unconditional jump or a static call, while the targets of ret instructions and dynamic calls/jumps may have to be resolved dynamically). Adding exceptions to this, it has yet one additional possible exit target (dynamically speaking).

So, that leaves us with a maximum of three possible static exit targets from each basic block. Also, some instructions like ret or dynamic jumps/calls (jmp eax/call eax etc.) have a dynamic exit point which needs to be taken care of, along with the quite dynamic exception handler destination (defined by FS:[0] etc.) at any given point in the code.

Well now, how many hardware breakpoints do we have to play with? Yes, that's right, four! This means that if we pre-analyse the code and all its basic blocks in IDA, and dump all this information about the possible exit-points of all basic blocks to disk, we can make a hardware-breakpoint-only based debugger-parasite tool for the target application, which will act according to the following pseudo-code for each basic block (when we enter the code, a hardware breakpoint has just hit, and the first such breakpoint will of course be placed on the global entrypoint of the executable):


Was the just hit breakpoint at the beginning of a basic block? In that case, goto 2, else:

If we get here, the hardware breakpoint that was just hit was not at the beginning of a basic block, and thus it was rather intended to resolve a dynamic block exit target (e.g. a modified FS[0] target or the current ret target, or the current call/jmp eax target etc), so do that.
Place a new hardware breakpoint at the resolved dynamic block exit target.
Return control to the debugged application.

If we get here, the hardware breakpoint that was just hit was at the beginning of a basic block, so start out with dynamically resolving the current exception handler target, and put a hardware breakpoint on that.
For each possible exit target of the current basic block (with a theoretical maximum of two, in addition to the already resolved exception handler exit target), place a hardware breakpoint on it.
If the basic block ends with a "dynamic exit point", or contains a manipulation of FS[0] or on-stack EXCEPTION_REGISTRATION record somewhere in it (which would have to be detected statically before execution, which is the biggest flaw of this entire method of attack I think), put a hardware breakpoint on that too/instead, to be able to resolve it dynamically before it is executed.
Return control to the debugged application.
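
To make this a bit more concrete, here is a rough sketch of the dispatch above in code. All names are hypothetical, the block-exit map is assumed to have been dumped from the IDA pre-analysis, and the tricky parts (resolving dynamic targets, walking FS:[0] in the target, programming DR0-DR3/DR7) are left as placeholders:

Code:
#include <windows.h>
#include <map>
#include <vector>

struct BlockInfo {
    std::vector<DWORD> staticExits;   // 0..2 statically known exit targets
    DWORD dynamicExitAt;              // address of a ret/jmp reg/call reg (0 if none)
};

std::map<DWORD, BlockInfo> g_blocks;  // filled from the IDA dump
std::map<DWORD, DWORD>     g_hits;    // block start -> hit counter

// Placeholders for the parts that need real work in the target process:
DWORD ResolveDynamicTarget(const CONTEXT& ctx)  { return 0; } // read [esp], eax, FS:[0]...
DWORD CurrentSehHandler(HANDLE hProcess)        { return 0; } // walk FS:[0] in the target
void  SetHwBreakpoints(HANDLE hThread, const std::vector<DWORD>& targets) { } // DR0..DR3/DR7

void OnHwBreakpointHit(HANDLE hProcess, HANDLE hThread, const CONTEXT& ctx, DWORD addr)
{
    std::vector<DWORD> next;
    std::map<DWORD, BlockInfo>::iterator it = g_blocks.find(addr);

    if (it == g_blocks.end()) {
        // Step 1b: not a block start, so this breakpoint existed only to resolve
        // a dynamic exit target; resolve it and breakpoint there.
        next.push_back(ResolveDynamicTarget(ctx));
    } else {
        // Step 2: block entry -- count the hit, then arm every possible way out.
        ++g_hits[addr];
        next.push_back(CurrentSehHandler(hProcess));
        next.insert(next.end(), it->second.staticExits.begin(), it->second.staticExits.end());
        if (it->second.dynamicExitAt)
            next.push_back(it->second.dynamicExitAt);   // resolve its target when we get there
    }
    SetHwBreakpoints(hThread, next);   // at most four, which is all we get
}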


With the exception of sneakily modified exception handler targets, we can now trace the entire application by only using those four cheaply allotted hardware breakpoints!

So you say, why is this better than single-stepping?


First of all, no single-stepping flag that can be detected by the target application will ever be set.
It will be much faster than single stepping in most cases.
Contrary to single stepping, filters can be used to include or exclude any exact code areas that you want, and also in a fully dynamic fashion.
Yes, hardware breakpoints can be detected too, but they are much less likely to be detected than software breakpoints since they don't modify any code or data in the memory, AND if they are just removed by the target application when they are presumably detected, you will know in which exact code basic block this was done, and will most likely be able to easily and quickly "fix" this little problem and continue.


So, doesn't that sound at least a little cool, or what?

Sirmabus
February 15th, 2008, 15:44
Cool idea, but I still think the fundamental problem is accurate preprocessing analysis.
It will only be as accurate as you can break apart the code beforehand.
Imagine if the code was obfuscated, etc.; you would have to have a very good analyzer and/or some sort of emulator.
And why have this extra step if you don't need it?

(sorry we're a little out of sync, I was writing part 2 or 3 when you posted).

If we could get the "DS store" working in Windows (and any other OS wanted) then it would be totally outside the process.
Note too this is a different dynamic. Since it would be buffered, you are getting the "call" (branches too if you want) after the fact.
With the exception way you could catch the call and do something else. Even doing a different type of hook for example, although not very practical.

dELTA
February 15th, 2008, 16:11
Aw crap, while I was posting my reply to Sirmabus' "part 1", he posted parts 2 and 3 in between, and to make things even worse, he announced a dream tool in those posts that made my design above look pretty lame (but then again, you gotta give me some points for creativity, and also for doing it without any fancy schmancy extra custom processor features).

But anyway, back to Sirmabus' tool... OMFG that is so unbelievably cool!!!!!1111!!

Will you release this tool soon Sirmabus? Will you release its source code too? That would really be an extremely welcome contribution to the reversing community I think, and it would hopefully also result in more help for you to finish this project in the best way possible too!

I have already created a CRCETL entry for it in anticipation of its arrival:

http://www.woodmann.com/collaborative/tools/CFSearch



Please update it with any news.

And not to be too over-enthusiastic or anything, but it would really seem like YOU ARE DA FUXORING MAN!

Also, the following thread might be of interest, where L. Spiro (author of Memory Hacking Software (MHS)) mentions that he has had one similar feature almost ready in his excellent tool:

http://www.woodmann.com/forum/showthread.php?t=7363

Hey, L. Spiro, is that feature fully implemented and included in current versions of MHS? You mention that you had a problem before because your tool did not have any kernel components, but if I'm not mistaken, it does now, right?


[EDIT]
Oh, and for reference the following thread is also quite related (looking back at it now, Sirmabus even mentions this exact tool in it, but in more secretive words! ):

http://www.woodmann.com/forum/showthread.php?t=10178

That thread in turn references the following thread, which is also related to the same topic/issue:

http://www.woodmann.com/forum/showthread.php?t=9807

Sirmabus
February 15th, 2008, 20:34
Thanks for your encouragement.
I'll hit the Intel and AMD dev boards in hopes they'll give some answers.

JMI
February 15th, 2008, 21:30
As dELTA says, having such a tool would be a very good thing for RCE purposes.

Regards,

dELTA
February 16th, 2008, 05:59
Ok, sounds great Sirmabus, please keep us posted on the progress of this tool, and also please feel very free to return here with any questions that we can be of assistance with!

I'm actually pretty sure there are a bunch of people on this board who would be able to assist you with this problem if we can just find them and make them read this thread...

Come on people, anyone? I know you're there, so why don't you just be a good chap and lend a hand now!?

Kayaker
February 17th, 2008, 00:25
Quote:
[Originally Posted by Sirmabus;72709]It really needs a kernel hook on context switching to turn on and off depending on what process the context is in, etc.


Intriguing work Sirmabus. Re the context switch, there's probably a good reason why what I'm about to say wouldn't work, else KAV might have done it already instead of using a crappy SwapContext hook. Is there any way to safely set a hardware rw breakpoint on _KPCR+124 (or FS:[124] or _KPRCB.CurrentThread if you wish)? This being the field which is constantly updated on context switches.

I tried that in Softice actually, bpm FFDFF124 rw. It sort of worked, it would break within some ntice.sys function when the field was accessed, but after a few times I was greeted with the inevitable BSOD. I was just wondering if it might work without Softice in the picture.
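
For reference, the DR7 encoding that "bpm FFDFF124 rw" boils down to, set from kernel code on one CPU, looks roughly like this. Bit positions are per the Intel manual; no claim that it survives any better than the Softice attempt did, and the 0xFFDFF124 address is of course the 32-bit XP single-processor _KPCR mapping.

Code:
#include <ntddk.h>

void SetRwBreakOnKpcrCurrentThread(void)
{
    ULONG addr = 0xFFDFF124;        // _KPCR + 0x124 (KPRCB.CurrentThread) on 32-bit XP

    __asm {
        mov eax, addr
        mov dr0, eax                // DR0 = linear address to watch
        mov eax, dr7
        or  eax, 0x000F0001         // LEN0=11b (4 bytes), RW0=11b (read/write), L0=1 (enable)
        mov dr7, eax
    }
}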

Sirmabus
February 17th, 2008, 01:21
Humm, interesting. Don't see why a code HWBP in the kernel wouldn't work also.
But, maybe not a good idea to put an exception in an area that is probably very performance intensive.

I was just thinking of a typical binary code patch.
You can see some examples of an ntoskrnl.exe "SwapContext()" hook in the "Tron" and some ARTeam source code, etc.

Kayaker
February 17th, 2008, 02:33
If you can freely set and remove kernel breakpoints then you could also use the trick I mentioned in this thread. Some kind of SYM support would be needed to get the proper address, unless you use a (possibly OS version dependent) pattern search.

http://www.woodmann.com/forum/showthread.php?t=11087

This being straight from the Softice manual itself (in the example 0xFF8B4020 is the ETHREAD you want to break on)

Watch a thread being activated:
bpx ntoskrnl!SwapContext IF (edi==0xFF8B4020)

Watch a thread being deactivated:
bpx ntoskrnl!SwapContext IF (esi==0xFF8B4020)

This works because of the calling function @KiSwapContext where you can see how EDI and ESI are changed.

Code:
:00404DB2 @KiSwapContext@4 proc near
...
:00404DC4 mov ebx, ds:0FFDFF01Ch ; PKPCR SelfPcr
:00404DCA mov esi, ecx
:00404DCC mov edi, [ebx+124h] ; Processor Control Region (KPCR) + 124h
:00404DCC ; aka FS:[124]
:00404DCC ; new (current) ETHREAD pointer
:00404DD2 mov [ebx+124h], esi ; old ETHREAD pointer
:00404DD8 mov cl, [edi+58h]
:00404DDB call SwapContext



The breaks would be your cue to turn your tracer on/off or whatever. It will of course break a lot.

A regular inline code hook would seem to require less overhead (in some ways) and probably wouldn't slow down the system as much, but it too has drawbacks (OS version dependence if a byte pattern search is required, possible incompatibility with the KAV SwapContext hook, you can't unload your driver unless you can guarantee no thread switches occur during unhooking, etc.)
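
For what it's worth, the C-level filter such an inline hook would run, once its asm thunk has saved registers and pulled out the incoming thread (ESI in the listing above), might look roughly like this. Finding SwapContext and building the detour itself, i.e. the fragile parts listed above, are left out, and the MSR number/bit is from the Intel manual (IA32_DEBUGCTL 0x1D9, BTF bit 1):

Code:
#include <ntddk.h>

static PEPROCESS g_targetProcess;   // set when the user attaches the tool

// Called from the hook thunk with the thread being scheduled in.
void OnContextSwitchIn(PKTHREAD incomingThread)
{
    // ETHREAD begins with the KTHREAD, so the cast below is just a view change.
    PEPROCESS proc = IoThreadToProcess((PETHREAD)incomingThread);

    ULONG debugctl = (ULONG)__readmsr(0x1D9);        // IA32_DEBUGCTL on this CPU
    if (proc == g_targetProcess)
        debugctl |= (1 << 1);                        // BTF: single-step on branches
    else
        debugctl &= ~(1 << 1);                       // leave every other process alone
    __writemsr(0x1D9, debugctl);
}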

dELTA
February 17th, 2008, 06:05
Very interesting ideas, I hope they will lead to an optimal solution for this problem!

Btw Sirmabus, will you post the links to your Intel/AMD developer board threads about this issue here, for reference, and so that possibly even more people can read them in full and help?

Also, RolfRolles just posted a blog entry very related to this topic, introducing a cool tool. Still instrumentation based, so not playing in the same league as the Sirmabus tool currently being discussed here, but still cool, and a nice example of yet another tool in this area, DynamoRIO, and its plugin architecture!

http://www.woodmann.com/forum/showthread.php?t=11325

blabberer
February 17th, 2008, 13:32
Well, if you are a nix geek and can find the kernel components, compile kernel modules and do the insmod/sudo su install stuff, you could take a look at Hitachi's branch tracing implementation:

Quote:

/*****************************************************************************/
/* The development of this program is partly supported by IPA */
/* (Information-Technology Promotion Agency, Japan). */
/*****************************************************************************/

/*****************************************************************************/
/* bt_main.h - branch trace module header */
/* Copyright: Copyright (c) Hitachi, Ltd. 2005-2007 */
/* Authors: Yumiko Sugita (yumiko.sugita.yf@hitachi.com), */
/* Satoshi Fujiwara (sa-fuji@sdl.hitachi.co.jp)


This was talked about at the Linux Symposium:
http://www.linuxsymposium.org/2007/view_abstract.php?content_key=129

The bunzip can be downloaded from SourceForge.

Also, the kernel vger mailing lists have a discussion on a btrace implementation under the Linux ptrace APIs (look for the discussion between Ingo Molnar and Markus from Intel GmbH).

And if you are not averse to downloading beta nightly builds, I think you can glean a few ideas from ptrace.c regarding the btrace DS save area setups, etc.

Btw, congrats: for your posts on the Intel developer board, Google returns you on the first page, first hit, first link, if you query "DS save area"; or maybe it's just the lack of information on the subject that forces Google to rank your forum question #1.

Sirmabus
February 18th, 2008, 18:48
Thanks for the info. Good to talk with you again about the subject.

I currently don't have a nix dev setup but I'm contemplating building one.
Having the source for the kernel, and being able to modify and build it, could be a big help.

Hopefully, someone very knowledgeable in the area will show up.

Probably I just have to get back in and play around until I find the hardware flag or setting I'm missing.
That reminds me, I used to work on console games back in the early 90's. What we had to do then was read the bare tech manuals and
play around with the settings until we got our hardware blitters, etc., working (anyone remember those days?). I'm just getting too lazy :-P

blabberer
February 19th, 2008, 13:03
If you are contemplating setting up one and would prefer an almost-clone of Windows (I mean clickety-click with 100s of preinstalled toys), I can suggest Ubuntu, but getting the Ubuntu kernel sources is kinda tedious (they are not the vanilla kernel available at kernel.org), and it doesn't even come with gcc preinstalled (real doze style, you need to apt-get install gcc and the headers) to compile even a simple Hello World.

And that's one DVD full of installable OS (get the alternate install CD or DVD iso, so that if the installer borks you can attempt to manually install it from a root console using some virtual environment; contrary to the claimed minimum requirement of 256 MB of spare RAM, with the alternate install method you can allot and successfully install a working VM image with as little as 32 MB allocated).
And at the lowest extreme you can look at Damn Small Linux, at just ~50 MB of OS (fully functional and expandable).

Every one of these distros works fine in VMware, or yes, even Microsoft's Virtual PC, or the open-source alternatives like VirtualBox, etc.

Thanks, and I'm also glad that we are talking about the subject again as well.

L. Spiro
February 21st, 2008, 07:40
Sorry for my absence lately; I was shipped to San Francisco to attend GDC, which is where I am now as well.


Quote:
Hey, L. Spiro, is that feature fully implemented and included in current versions of MHS? You mention that you had a problem before because your tool did not have any kernel components, but if I'm not mistaken, it does now, right?

The feature is not implemented in any way in the current version. The current version is a complete rewrite of the old version, which did have it nearly 100% done.
I do have a kernel component now, but not kernel debugging. But there is a solution I can employ without the need to actually follow SYSENTER/INT 2E.


Bad news aside, this is a feature I will 100% definitely implement, and not too far from now.

My implementation will:
1: Provide a graphical interface that allows viewing the whole control path as it was executed. It will be a grid with zoom capabilities and each call goes down one stack layer, allowing you to easily see how deep the call-tree goes and of course where it goes. It will be represented as a bunch of contiguous chunks of code stacked on top of each other with the executed code highlighted, allowing you to also see the code that was skipped and all of the code's locational relationship with all the other code.
2: Show how many times the code was hit (for loops that do not go down call depths).
3: Allow you to refollow the code from any point to any other point, even backwards. The context structures are logged for every single instruction so you can go anywhere and then restep forward/backwards to see what the registers were. This will update the current Disassembler window as if it was stepping in real-time.
4: Create multiple “logs” this way and compare them, showing code that was executed here, here, here, and here, but not here. As with searching, there will be many evaluation types to allow finding code various ways by various criteria.
5: Allow filtering. Showing code with greater than X hitcounts, for example.
6: Allow exporting to text or SQL or whatever else I can imagine.



Actual release time for this feature could be 6 or 7 months from now. I have a few things to finish first but I was just planning to get into this soon. This is a feature I really want to have and it was one of the key features I planned for the new rewrite. It will not be a simple side-feature, but one of the main attractions.


L. Spiro

JMI
February 21st, 2008, 12:57
Welcome to the neighborhood!

Regards,

dELTA
February 21st, 2008, 17:20
Quote:
[Originally Posted by L. Spiro;72848]Bad news aside, this is a feature I will 100% definitely implement, and not too far from now.

My implementation will:
1: Provide a graphical interface that allows viewing the whole control path as it was executed. It will be a grid with zoom capabilities and each call goes down one stack layer, allowing you to easily see how deep the call-tree goes and of course where it goes. It will be represented as a bunch of contiguous chunks of code stacked on top of each other with the executed code highlighted, allowing you to also see the code that was skipped and all of the code's locational relationship with all the other code.
2: Show how many times the code was hit (for loops that do not go down call depths).
3: Allow you to refollow the code from any point to any other point, even backwards. The context structures are logged for every single instruction so you can go anywhere and then restep forward/backwards to see what the registers were. This will update the current Disassembler window as if it was stepping in real-time.
4: Create multiple “logs” this way and compare them, showing code that was executed here, here, here, and here, but not here. As with searching, there will be many evaluation types to allow finding code various ways by various criteria.
5: Allow filtering. Showing code with greater than X hitcounts, for example.
6: Allow exporting to text or SQL or whatever else I can imagine.
Damn, that sounds so cool! This (group of) feature(s) would really cement the status of MHS as the total king of application stalker tools, and I'm really looking forward to getting my hands on it!

One question/suggestion:
When you say "filtering", are you referring to the same kind of filtering as pStalker has (see the following URL for a demo of this exact feature: http://pedram.redhive.com/PaiMei/docs/PAIMEIpstalker_flash_demo/index.html )?

That would be extremely useful I think, and especially in a tool like the one you describe above, since the foremost problem I immediately foresee with it is execution speed during tracing, which could indeed be vital in some situations. The data you describe above that your program will save is of incredible value, BUT it will most likely also make the program execute incredibly slowly during this "tracing", won't it? So, if you could turn the tracing/logging on for just some exact parts of the code, it would be a total killer feature, and even a life saver for some hacks. The way these kinds of filters can be dynamically defined in pStalker is incredibly efficient (again, please see the Flash demo at the link above), so I really suggest you implement it in a similar way too (if you don't have an even better idea of how to do it of course, then never mind).

Until this tool is released, I'm just gonna stand here jumping up and down, please just ignore that, I really can't help it.

L. Spiro
February 22nd, 2008, 02:21
The demo is good for GUI applications but not necessarily games which is my primary focus with my software. For example if you wanted to get just mouse-related code it would be virtually impossible to separate just that from all the start-up code, minimization code, maximization code, etc.

My plan is to simply allow the user to specify a starting address and ending address between which code will be logged. The user may set up a counter (pure hit-count based or tied into scripts) to start the log after some time or some number of hits or whatever, and end after some other criteria as well. This is how the previous version already worked as well.

Filters will simply eliminate code from being shown to reduce the clutter and scripts will be able to add criteria for eliminating code. Filters may show code that was only executed in one log and not others or whatever else you can imagine.

As for slowdown, it did get slow in the first version but since you usually only want to log from here to there it isn’t a problem.


L. Spiro

dELTA
February 22nd, 2008, 04:57
Yes, I understand that the feature I'm mentioning won't be good in all situations, and that your ideas will be very good in many situations and should of course be implemented first, please don't get me wrong about that! (btw, isn't Minesweeper a game too? )

There are situations though (also in "real" full screen games etc) where the "auto filter builder" I'm describing above will be of unbeatable value compared to specifying the addresses yourself, so all I'm asking is for you to consider implementing this as an additional feature, after all your other and original great features are in place.

And just to clarify things about this filter: it doesn't have anything directly to do with GUI applications, that was just a (good) example, most likely chosen in that Flash demo because GUI/graphics code is often very messy, is often responsible for a large part of a program's execution time, and can often hide the small pieces of really interesting code (game logic code for example) that you are really looking for in huge amounts of uninteresting information while tracing (and it can of course be very tedious to manually find out which is which beforehand in a disassembler, to be able to specify these exact, possibly intertwined, memory ranges before you can start tracing).

What the filter does is simply letting you first perform any uninteresting operations in the target application that are not the ones you are currently looking for at the moment, so that you are then able to exclusively focus on only the remaining parts when doing your common looking-for-the-needle-in-the-haystack work.

An example: Let's say you have a complex game like Half Life 2, where you want to find the code responsible for decreasing your energy counter when you get hurt in the game. Using a feature like this filter, you would find it like this:


Set the program in "code coverage mode" (contrary to "code profiling mode", where all executions/hits of any code are recorded, while in "code profiling mode" only the first execution/hit is recorded, after which tracing/logging is disabled for the just executed code block/instructions).
Start the game, run around a bit in it, shoot at things with all your weapons, blow up things with your grenades, perform all the common actions, attract the attention of some enemies, and let them shoot at you (but do NOT let them hit you so you decrease the energy counter!). This will of course also execute most of the common graphics/sound/mouse/keyboard code etc, which is exactly what we want.
Now, after doing this to a satisfying degree, tell the stalker program (MHS) to save all executed/covered code so far as a filter (i.e. memory ranges which should not be traced/logged again as long as the filter is applied). This will first of all make sure that the game runs at full speed as long as you don't perform anything that triggers code that has not already been executed (execution speed won't really be an issue as long as you are already operating in "code coverage mode" rather than "code profiling mode" though, since all code that has been executed once will already be automatically excluded from tracing). Secondly, this will start a new recording session that will be sure to not include any of your previous instructions/code hits that you got while performing all the uninteresting operations in step 1 above.
If you want a more detailed analysis of the energy counter decrease code, you can also now activate the "code profiling mode" instead of the current "code coverage mode".
Now, finally, let an enemy hit you, or harm yourself with your own grenade etc, so that your energy counter decreases.
Stop the tracing/logging.
You will now have a detailed trace (viewable in the wonderful GUI that you describe above) for ONLY the unique not-executed-before code that was executed in connection with you losing energy in the game!

I guess you can see the extremely efficient code pinpointing and data pinpointing possibilities that this opens up, which would also be incredibly useful for game hacking?

So again, please don't misunderstand me, what you are describing as the planned feature set is the ultimate binary code profiling tool from a stalker perspective, and that will be invaluable in many situations. What I'm suggesting is just some alternative modes of operation for the same great stalker code, which will also be extremely useful in some other situations, including game hacking indeed (as described above), and it would feel like such a waste not to implement this when you have already done all the hard work of implementing the excellent tracing/logging engine!

Either way, I'm really looking forward to your first code tracing features!

L. Spiro
February 22nd, 2008, 09:22
I can see the usefulness of this.

I can add it to the TODO list, especially since it is quite simple to implement. All it needs to do is start at the entry of the program and breakpoint all functions, then remove each breakpoint as hit.
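
Something along these lines (a tiny sketch only; where the function list comes from is assumed, and the EIP fixup lives in the surrounding debug loop):

Code:
#include <windows.h>
#include <set>

std::set<DWORD_PTR> g_covered;        // addresses hit during the filter-building run

void OnOneShotBreakpoint(HANDLE hProc, DWORD_PTR addr, BYTE originalByte)
{
    g_covered.insert(addr);           // this code is now "uninteresting"

    SIZE_T n;
    WriteProcessMemory(hProc, (LPVOID)addr, &originalByte, 1, &n);  // remove the breakpoint
    FlushInstructionCache(hProc, (LPVOID)addr, 1);
}

bool IsFiltered(DWORD_PTR addr)       // later: skip logging anything already covered
{
    return g_covered.count(addr) != 0;
}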


L. Spiro

dELTA
February 22nd, 2008, 10:27
Great!

And that's exactly what I was hoping for, that it would be (in relative terms speaking) easy to implement this feature given all the other things you will be implementing already! Thanks!

naides
March 5th, 2008, 06:23
Just in case anyone is interested:

I have been playing with AMD CodeAnalyst and can tell you that it works (At least at first glance it works) inside a VMware virtual machine, but the profile of code coverage is substantially different from real machines. I found the most interesting results by running it in two VM clones simultaneously, in order to minimize meaningless differences, then tracking the behavior of an app with or without a license file, or before and after expiration date.

dELTA
March 5th, 2008, 10:32
Cool! Could you elaborate a little more on the "but the profile of code coverage is substantially different from real machines" part? Sounds interesting.

It would be really cool with a little blog post or possibly even a tutorial that sums up your most interesting experiences during this experiment, if possible!

I'm also really, now even more, looking forward to these features in MHS, since a specialized software could make these already-in-the-generic-form useful techniques extremely useful and efficient for reversing purposes I think, as discussed above!

naides
March 5th, 2008, 14:48
A few more interesting factoids:

Trying to figure out how AMD CodeAnalyst would behave on a VM running under a host Intel CPU, I found that, at least superficially, the program does not mind, even though the CPU, as detected from inside the VM, is still read as Intel (a Celeron 1700 in this case). VMware does NOT seem to emulate the CPU, as it does with most of the virtual hardware. What is even more curious, I installed CodeAnalyst directly onto the Intel CPU/host system, and at least with the basic profiling, it works as expected, producing the results you should see as described in the demo tutorial. (??????)
Some more advanced features may be functional only on AMD machines; they even mention some AMD models. But the basic, raw functionality appears to work for Intel CPUs as well. (At least on my crappy Celeron CPU, which is all I have available to test right now.)

Correction:

Only certain profiling options (time-based profiling, pipeline-based profiling) are available for non-AMD CPUs. Instruction-based profiling and event-based profiling, which are potentially the most valuable for RCE, only work for AMD-based real or virtual machines.

dELTA
March 5th, 2008, 17:33
Very good things to know, I was actually wondering those exact things, and was hoping that someone would try/confirm them, thanks naides.

And yes, it is a known and official fact that the CPU is not emulated in VMware, the real processor just "falls through" to the guest.

naides
March 5th, 2008, 18:09
Quote:
[Originally Posted by naides;73108]Just in case anyone is interested:

. . ., but the profile of code coverage is substantially different from real machines.

Let me explain: CodeAnalyst takes a system-wide profile, including all the active modules open in the system, ring 0 and ring 3. That includes invasive processes and services that show up in all user processes, such as firewall services, antivirus guards, video services, virtual disks, and all the bells and whistles you'll have in your main system but that are not installed in a bare-bones VM. So the profiles of an app running in the host system look much more complex (more modules, more calls, more time slices, more events) than the profile generated by the same app inside a VM. There seem to be ways to filter out the events related to your module/process alone, but I am only learning the ins and outs of the tool.

I found the most interesting results by running it in two VM clones simultaneously, in order to minimize meaningless differences, then tracking the behavior of an app with or without a license file, or before and after expiration date.


Sure, I'll be glad to write a little tutorial when I've ironed out all the wrinkles of the tool. Just one question, so I can use a simple example: has anyone seen a time-limit crackme? (I don't want to use a commercial app in a tut.)

dELTA
March 5th, 2008, 18:23
You mean like a "30 day trial period" crackme? I'll whip one up for you if no one has a ready made one (which I'm sure exist though), just let me know.

Looking forward to this tutorial indeed!

tHE mUTABLE
March 5th, 2008, 18:26
@naides. You can try geeko's Donald Duck at http://crackmes.de/users/geeko/donald_duck/

L. Spiro
May 7th, 2008, 22:36
I thought I would give an update since some people are waiting for features and I have been dormant for a while.
I stopped working on MHS in favor of drawing a picture for a while, but I have since gotten back to MHS and am already about 50% through the code profiler discussed above.
I had to make a design decision that slowed me down for a while, but I have finally reached one and can proceed.

However, I am seeing a Japanese girl now and do not work on MHS as much as I normally would, but I am still making progress and expect it to be done within a few weeks.


L. Spiro

JMI
May 7th, 2008, 22:46
Good luck with both!

Some of us can wait patiently while life takes its course. Of course, some of us have been around for a long time and are not quite as impatient as some youngins can be.



Regards,

dELTA
May 8th, 2008, 02:35
Thanks for the progress report L. Spiro, sounds great! Really looking forward to that release.

Oh, and if you'd be interested in sharing the masterpiece that slowed you down (no, I'm not talking about the Japanese girl, even though I'd be up for that too if you really insisted), you'd be very welcome to; the previous artwork you have shared with us has been quite breathtaking.

L. Spiro
May 8th, 2008, 04:00
This is what slowed me down: http://l-spiro.deviantart.com/art/Japanese-Model-WIP-1-82010882
It is not done, but I need a break from it, which is why I went back to MHS.


L. Spiro

dELTA
May 8th, 2008, 06:30
That's completely insane.

And for those not familiar with it, that's a DRAWING, not a photograph (read the text below the picture).

JMI
May 8th, 2008, 08:28
And it is a drawing which would put many photographs to shame. Great work again L. Spiro. It is always good to see people developing different talents and having the patience to put them to productive use.



Regards,

GEEK
May 10th, 2008, 06:10
Astonishing stuff.
Had it not been for dELTA's comment, I would not have realized it was a drawing and not a photo.
Fascinating work, L. Spiro!
I absolutely love portraits.

i did check out some more portraits and the following three are just mind blowing
http://shimoda7.deviantart.com/art/Audrey-Tautou-69472755
http://coffee-lin.deviantart.com/art/SEVEN-Mika-Nakashima-22571871
http://signalbox.deviantart.com/art/Window-To-The-Soul-1868888

L. Spiro
May 13th, 2008, 12:23
Thank you everyone.


Here is a teaser for my upcoming release:
http://www.memoryhacking.com/Pictures/CodeFilter.png
Obviously here I have just done the same Minesweeper example as in the video posted earlier. I found the clicking code in 2 minutes and 23 seconds (I wanted to take my time adding the GUI code).

I just need to add the bells and whistles, and I expect it to be released this week or next.


L. Spiro

Externalist
May 13th, 2008, 20:34
This looks really neat! Reminds me of pStalker... Looking forward to the release!

dELTA
May 14th, 2008, 03:44
That's so cool: it's like pStalker but better, without malfunctioning every other time and without depending on any Python crap. Excellent! I'm so looking forward to this!

L. Spiro
May 19th, 2008, 21:20
It is released.
It sometimes crashes the target process, but I am not sure if anything can be done about this. I will, however, add options to reduce the risk and improve the feature altogether.


L. Spiro

goggles99
May 20th, 2008, 00:55
Nice, but I like this one better...
http://www.100paperclips.com/pcfhacker.html

I think that since you tried to put all of that anti-cheat detection crap in MHS, things have gone downhill with it significantly. It is obviously a cat-and-mouse game since MHS is public (I warned you of this long ago), and you have wasted a lot of time and gotten off track from the original goal of MHS, and to what avail? I see that half of the posts now are complaints from nubs and script kiddies alike that X game detects MHS and when will it be "fixed"? It is obvious that all the complaints (plus the complaints about the new stability issues) have worn on you.

I think that the purpose of software like this should be to help individuals find the proper things in memory to create hacks with themselves. If they don't want to learn how to program, there are plenty of trainer makers around. Why learn a scripting language and API that only works for making game hacks, which eventually get detected anyway? If someone is going to learn to program, don't you think they should be using their time to do something useful in the real world too? Last time I checked, you can't put L. Spiro Script on your résumé. People can learn on non-protected games to start, and after they get some experience, they can tackle the ACs themselves with their private hacks. That was my route, and it was the best route. I am now a well-paid reverser/programmer. If I had only used MHS for everything, I would still be flipping burgers today. (Where would you be?)

It's more than that, though. You try to do too much; your focus is too broad, IMO. You should specialize in fewer things and aim to be the best at them. I use several tools whose functionality also exists in MHS; they are all better at what they do than MHS, because each was created with a specific function and focus. I have never seen software that tried to "do it all" succeed at being the best or preferred product.

Features like this profiler are in the vein of the original intent of MHS. I applaud its addition, and I hope that you forget all of this AC crap and get back to creating new, useful and innovative features (or at least making attempts at them).

L. Spiro
May 20th, 2008, 02:50
I appreciate your input but a few points should probably be taken in a different light.


Quote:
[Originally Posted by goggles99;74700]Why learn a script language and API that only works for making game hacks

This is quite a large misconception.
#1: The language is C with a few additions, so no one needs to learn anything new at all. I made it like C for that exact reason.
#2: While it has features that make it easy to hack games, actually the language could be used for anything. We use it at work for all kinds of automation, such as typing headers into new code files or adding JavaDOC headers to classes and functions. We use it to convert binary files to formats we use in our Nintendo DS and other games. We use it for all kinds of odd-jobs such as sorting files of text by our own custom criteria. My coworker made a script that copies and pastes a sprite in Photoshop in a perfect circular fashion to make a mathematically correct animated path for the enemy to follow in the game.
#3: The API has a few MHS-only features but is mostly the exact Windows API. Most code you write in L. Spiro Script can be copied into a new Windows project. This includes networking features, allowing MHS to help you automate the downloading of files or, in my case, checking from home to see if someone turned off BitComet on my work computer so that I can send a message to MHS on my work computer to tell it to restart BitComet.
Quote:
[Originally Posted by goggles99;74700]someone is going to learn to program, don't you think they should be using their time to do something useful in the real world too? Last time I checked, you can't put L. Spiro Script on your résumé.

#4: Plenty of people have gotten into programming simply because of my language. It removes a lot of headaches, such as finding an IDE, linking issues, libraries/header files, and a few other things that confuse beginners, and there is a place where they know they can always get support. But the best selling point to them is that what they learn in L. Spiro Script can be used in real applications later. The syntax matches C.
#5: Not everyone is thinking about a packed résumé just because they want to get into programming or want to make a few hacks.
Quote:
[Originally Posted by goggles99;74700]which eventually get detected anyway?

#6: They currently work and always will work on all non-protected games (and even some that are protected), and that covers a much larger scope than you realize. For example, every emulated game in the world. And, considering that http://tasvideos.org/ considers my tool a must-have for their emulated needs, and esco is making a highly anticipated mod for Castlevania: Symphony of the Night using nothing but MHS and L. Spiro Script, there seems to be a much larger market for emulated exploration than you realize.
Actually, to be quite honest, I primarily hack emulated games as well (Perfect Dark/GoldenEye 007) and the results of my hacking proved a big help to the famous GoldenEye Source mod—using MHS and L. Spiro Script I hacked the maps from GoldenEye 007.



Quote:
[Originally Posted by goggles99;74700]I use several tools whose functionality also exists in MHS; they are all better at what they do than MHS, because each was created with a specific function and focus.

On the contrary. MHS’s strongest selling point is the fact that it brings together all these features into one package, and, to be blunt, 90% of the time you don’t have to be the best to still get the job done. Getting the job done while not having to switch between applications all the time is more than enough to make up the differences.
On this point, there is no question that plenty of people would agree. The #1 praise MHS gets on this site alone is that it is a handy “Swiss army knife”.

Furthermore, some of the features in MHS are the best available. The searcher is the fastest by a noticeable gap and offers the largest set of options. I have not yet seen a better DLL Injector—one that manages the injected DLL for you and allows you to call any of its functions with any number of parameters. Not to mention that the parameters can be typed as any kind of valid C/C++ mathematical expression.
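For readers who have not run into it before, the core of a user-mode DLL injector is the classic CreateRemoteThread + LoadLibrary trick; a bare-bones sketch in C is below. This is just the textbook technique, not MHS's actual implementation, which layers management of the injected DLL and remote calls with arbitrary parameters on top of something like this:

Code:
#include <windows.h>
#include <string.h>

/* Minimal CreateRemoteThread + LoadLibraryA injection sketch. */
BOOL InjectDll(DWORD pid, const char *dllPath)
{
    BOOL ok = FALSE;
    HANDLE hProc = OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid);
    if (!hProc) return FALSE;

    /* Copy the DLL path into the target process. */
    SIZE_T len = strlen(dllPath) + 1;
    LPVOID remote = VirtualAllocEx(hProc, NULL, len, MEM_COMMIT, PAGE_READWRITE);
    if (remote && WriteProcessMemory(hProc, remote, dllPath, len, NULL)) {
        /* LoadLibraryA happens to match the thread-routine prototype. */
        LPTHREAD_START_ROUTINE pLoadLibrary = (LPTHREAD_START_ROUTINE)
            GetProcAddress(GetModuleHandleA("kernel32.dll"), "LoadLibraryA");
        HANDLE hThread = CreateRemoteThread(hProc, NULL, 0, pLoadLibrary,
                                            remote, 0, NULL);
        if (hThread) {
            WaitForSingleObject(hThread, INFINITE);
            CloseHandle(hThread);
            ok = TRUE;
        }
    }
    if (remote) VirtualFreeEx(hProc, remote, 0, MEM_RELEASE);
    CloseHandle(hProc);
    return ok;
}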



Quote:
[Originally Posted by goggles99;74700]Features like this profiler are in the vein of the original intent of MHS. I applaud its addition, and I hope that you forget all of this AC crap and get back to creating new, useful and innovative features (or at least making attempts at them).

You can expect a lot more. The Code Filter is only 50% done, and I am disappointed at the software in your link because I was already planning all of those features and now it will seem as if I just copied from them. Luckily I have more features than those in mind as well, giving my Code Filter a chance at being the best at what it does.

I intend to give MHS the ability to load OllyDbg plug-ins and to perform all of OllyDbg’s tasks as well, giving people a reasonable alternative.
Ambitious, but fun. And that is really all that matters to me.


In the meantime, you may want to take a second look at those scripts. To be honest, hacking was one of my intentions, but my real goal was to give myself a convenient way to execute all kinds of odd-job tasks without having to start a whole new Visual Studio project each time or mess with makefiles.


L. Spiro

dELTA
May 20th, 2008, 04:32
Cool! I'm really gonna have some fun with this as soon as I get a free minute! And it sounds extremely interesting with the upcoming code filter features too, really looking forward to them!

Oh, I'm a little curious about why you are "not sure if anything can be done" about the target crashes you mention. Is it because you have to statically analyze the code in order to inject breakpoints at all basic blocks, so the code analysis is sometimes not correct, or is it something else? Maybe someone here has good ideas if you just tell us a bit more about the details?

And keep all the other features coming too; it's just nice (as long as program stability isn't affected too much). I know exactly what you mean in your explanation above, L. Spiro; I'm just like that too.

L. Spiro
May 20th, 2008, 04:55
I am not sure if the problems can be helped, partly because I am not sure what the problem really is.
It seems to be a problem with Windows in how it handles the breakpoints. The high load of breakpoints suddenly being hit in the target process causes a hiccup, and sometimes it cannot recover (usually it can).

The other crashes are caused by code analysis, but I added options to reduce the risk of this. I recently uploaded a new version, and I believe it has a Settings menu option which allows you to turn off various sets of functions. Guessed functions can be risky, but Good Guesses are rarely risky.

Exported functions are actually the cause of most crashes. If you include a DLL that exports a global data value rather than a function (as NtOsKrnl.exe does), the analyzer would have to be very advanced to tell the difference. For now, it ends up breakpointing the data, causing who-knows-what madness afterwards.
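One heuristic that might reduce this particular class of crashes (just an idea, not what is done now) would be to check whether an export's address actually falls inside an executable section of the module before breakpointing it. A rough sketch in C, walking the section headers of a module loaded in the current process:

Code:
#include <windows.h>

/* Returns TRUE if the exported symbol's address lies in a section marked
   executable -- a rough way to avoid breakpointing exported data. */
BOOL ExportLooksLikeCode(HMODULE hMod, const char *exportName)
{
    BYTE *addr = (BYTE *)GetProcAddress(hMod, exportName);
    if (!addr) return FALSE;

    BYTE *base = (BYTE *)hMod;
    IMAGE_DOS_HEADER *dos = (IMAGE_DOS_HEADER *)base;
    IMAGE_NT_HEADERS *nt  = (IMAGE_NT_HEADERS *)(base + dos->e_lfanew);
    IMAGE_SECTION_HEADER *sec = IMAGE_FIRST_SECTION(nt);

    DWORD rva = (DWORD)(addr - base);
    for (WORD i = 0; i < nt->FileHeader.NumberOfSections; ++i) {
        DWORD start = sec[i].VirtualAddress;
        DWORD size  = sec[i].Misc.VirtualSize;
        if (rva >= start && rva < start + size)
            return (sec[i].Characteristics & IMAGE_SCN_MEM_EXECUTE) != 0;
    }
    return FALSE;   /* not inside any section we recognize */
}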


L. Spiro

dELTA
May 20th, 2008, 15:40
Quote:
[Originally Posted by L. Spiro;74710]I am not sure if the problems can be helped, partly because I am not sure what the problem really is.
It seems to be a problem with Windows in how it handles the breakpoints. The high load of breakpoints suddenly being hit in the target process causes a hiccup, and sometimes it cannot recover (usually it can).
Ok, I see. It would seem quite strange, though, if Windows itself had such breakpoint synchronization problems, since many tracers have been built before, and many of them depend on massive breakpointing. For example, during the development/testing of the Conditional Branch Logger Olly plugin (http://www.woodmann.com/collaborative/tools/Conditional_Branch_Logger), we never experienced anything like this as far as I know (even though I guess breakpoints can be hit at a somewhat higher rate in your case than with that plugin, since Olly's code sits in between there). Still, it would be very strange if Windows itself was fully to blame.
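Just to make the discussion concrete, the standard INT3 counting loop that such breakpoint-based tools presumably rely on looks roughly like the sketch below: plant an INT3 at each block start, and on every EXCEPTION_BREAKPOINT count the hit, restore the original byte, single-step, and re-arm. A stripped-down C sketch for a single address (real tools obviously manage thousands at once, and handle multi-thread races):

Code:
#include <windows.h>
#include <stdio.h>

/* Count hits on one breakpointed address in a process we are already
   attached to as a debugger (e.g. via DebugActiveProcess). */
void CountHits(HANDLE hProc, LPVOID addr)
{
    BYTE orig, int3 = 0xCC;
    DWORD hits = 0;

    ReadProcessMemory(hProc, addr, &orig, 1, NULL);
    WriteProcessMemory(hProc, addr, &int3, 1, NULL);
    FlushInstructionCache(hProc, addr, 1);

    DEBUG_EVENT ev;
    while (WaitForDebugEvent(&ev, INFINITE)) {
        if (ev.dwDebugEventCode == EXCEPTION_DEBUG_EVENT &&
            ev.u.Exception.ExceptionRecord.ExceptionCode == EXCEPTION_BREAKPOINT &&
            ev.u.Exception.ExceptionRecord.ExceptionAddress == addr) {
            ++hits;                                   /* one more execution of this block */

            /* Back the instruction pointer up over the INT3, restore the
               original byte, and single-step so we can re-arm afterwards. */
            HANDLE hThread = OpenThread(THREAD_ALL_ACCESS, FALSE, ev.dwThreadId);
            CONTEXT ctx;
            ctx.ContextFlags = CONTEXT_CONTROL;
            GetThreadContext(hThread, &ctx);
#ifdef _WIN64
            ctx.Rip -= 1;
#else
            ctx.Eip -= 1;
#endif
            ctx.EFlags |= 0x100;                      /* trap flag = single-step */
            SetThreadContext(hThread, &ctx);
            CloseHandle(hThread);
            WriteProcessMemory(hProc, addr, &orig, 1, NULL);
            FlushInstructionCache(hProc, addr, 1);
        } else if (ev.dwDebugEventCode == EXCEPTION_DEBUG_EVENT &&
                   ev.u.Exception.ExceptionRecord.ExceptionCode == EXCEPTION_SINGLE_STEP) {
            /* The real instruction has now executed; put the INT3 back. */
            WriteProcessMemory(hProc, addr, &int3, 1, NULL);
            FlushInstructionCache(hProc, addr, 1);
        } else if (ev.dwDebugEventCode == EXIT_PROCESS_DEBUG_EVENT) {
            break;
        }
        ContinueDebugEvent(ev.dwProcessId, ev.dwThreadId, DBG_CONTINUE);
    }
    printf("Block at %p hit %lu times\n", addr, hits);
}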


Quote:
[Originally Posted by L. Spiro;74710]The other crashes are caused by code analysis, but I added options to reduce the risk of this. I recently uploaded a new version, and I believe it has a Settings menu option which allows you to turn off various sets of functions. Guessed functions can be risky, but Good Guesses are rarely risky.

Exported functions are actually the cause of most crashes. If you include a DLL that exports a global data value rather than a function (as NtOsKrnl.exe does), the analyzer would have to be very advanced to tell the difference. For now, it ends up breakpointing the data, causing who-knows-what madness afterwards.
If I remember correctly, pStalker can import IDA databases (or rather some kind of data file derived from an IDA database, called PIDA files or something like that), which gives it a lot more useful information about the target when profiling it like this. Adding something similar to MHS would be a great addition I think, since it would make it possible to specify in much greater detail which areas should be profiled and which should not (e.g. by selecting among all the functions defined in IDA, by their names), and it would also make IDA's information about exported data vs. exported functions available, making errors and crashes caused by this much easier to avoid. You would have much of the power of IDA directly at your hands!

And this doesn't have to be as hard as parsing the IDA databases yourself either; you could base it on simple MAP files exported directly from IDA, or, in a somewhat more complex and powerful case, on files generated inside IDA by a custom IDC script or plugin that you provide. That would be some seriously powerful stuff, and it might very well be one of those unique features that would push MHS a large step closer to that "undisputed best tool" position you mention above.
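To give an idea of how little work the simple MAP-file variant would be: an IDA-generated MAP file is essentially just lines of "segment:offset name", so pulling out the function addresses and names is a few lines of parsing. A rough sketch in C, assuming the usual "XXXX:YYYYYYYY name" layout (the exact format may vary between IDA versions):

Code:
#include <stdio.h>
#include <string.h>

/* Very small parser for IDA-style MAP lines such as:
   " 0001:000123A0       _SomeFunctionName"
   Prints segment, offset and symbol name for each line that matches. */
int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s file.map\n", argv[0]); return 1; }

    FILE *f = fopen(argv[1], "r");
    if (!f) { perror("fopen"); return 1; }

    char line[512];
    while (fgets(line, sizeof(line), f)) {
        unsigned seg, off;
        char name[256];
        /* %x for the hex fields, %255s for the symbol name. */
        if (sscanf(line, " %x:%x %255s", &seg, &off, name) == 3)
            printf("seg %04X  offset %08X  %s\n", seg, off, name);
    }
    fclose(f);
    return 0;
}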

JMI
July 24th, 2008, 02:00
Just some quick news of a new update to L. Spiro's Memory Hacking Software, discussed in this thread. It's now at version 5.001, updated 10:29 PM, 7/23/2008. The CRCETL entry has already been updated.

http://www.woodmann.com/collaborative/tools/Memory_Hacking_Software

The new features include:

The Sub Search dialog now allows all expressions as valid input.
Fixed a crash in the Code Filter “Highlight by Expression” feature related to using the [] operators.
Added support for Windows® Vista® SP1. Thanks to Napalm of http://www.rohitab.com/ for the EPROCESS definition.
Fixed the for ( ; CONSTANTVALUE; ) bug in the scripts.
Holding Shift while moving the caret with the arrow keys now causes the selection to change in the Hex Editor.
The Code Filter is more stable while single-stepping and opening a process for debug.
# prefix added to the Expression Evaluator to indicate a number should be evaluated as a decimal number. Applies to the Auto-Assembler, which defaults to treating all numbers as hexadecimal.
Added the CaptureScreen function to the scripts.
Added the CallLocalFunction, LoadLibrary, LoadLibraryEx, FreeLibrary, and GetModuleHandle functions to the scripts.

Regards,