Clandestiny
September 19th, 2003, 18:53
Hiya,
This is an interesting topic... You mentioned not being able to find a lot of information on heuristics, so I would like to suggest an interesting avenue for research... Have you considered researching heuristic techniques as they relate to anti-viral cleaning? Incidentally, I'm considering exploring this topic for a project in a graduate course I'm taking and have just started some research into it. In doing that research, I came to the realization that the obstacles faced in attempting to design a program to heuristically clean virus-infected files bear a striking similarity to the obstacles faced in attempting to design a heuristic OEP locator. Essentially, it is the same problem, albeit with a different objective (i.e. ridding an .exe of a "protector" as compared to ridding an .exe of a "virus").

Indeed, when you think about it, "protectors" behave similarly to many virii, albeit without the feature of "replication". They append themselves to PE files, redirect the OEP of the original program, encrypt and decrypt themselves dynamically at run-time, and ultimately often return control to their target so that it runs without alerting the user that there's a problem. A good heuristic viral cleaner, then, would essentially be a very good generic unpacker, and it would have to solve the same difficulties of locating the OEP, which may or may not be obfuscated by the virus. For example, a good virus would no doubt attempt anti-anti techniques similar to the "stolen bytes" trick used in some packers.
As I understand it, heuristics involves defining a solution based on probabilities associated with a "rule-set", and it is this feature which makes it "generic", as opposed to non-heuristic methods which would attempt to identify the OEP with a specific "byte sequence". Each rule is assigned a certain weight, and the sum of the weights at the end of the analysis defines a probability which may or may not cross a minimum threshold value.

Consider the simple case of determining whether or not a program is "packed"... One may look at it in a PE editor and observe that the entry point of the .exe is located in the last section of the file. From experience, one would know that this could indicate that the program might be "packed", but it's not enough to draw a 100% definite conclusion. One may then note that the section names are not consistent with those generated by common compilers. This lends a little more weight to the hypothesis that the program is "packed". Next, one might load it up in a debugger and note the absence of common APIs called during the startup of a normal Windows program. Once again, this lends more weight. Finally, one may begin tracing, note the presence of self-modifying code, and conclude that the program is almost certainly packed. From this thought process, we have some rules that we tend to apply when determining whether a program is packed:
1. Location of the entry point at the end of the PE file
2. Non-standard section names
3. Absence of common APIs called during Windows startup
4. Presence of SMC (self-modifying code)
Clearly, some of the rules would carry more weight than others. The presence of SMC would probably carry a very high weight compared to the simple presence of non-standard section names. Taken by itself, the weight of a single rule may not be enough to determine that a program is packed (or infected with a virus, as the case may be), but taken together the rules may let us conclude with high probability that the program is packed (or infected).
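The weighted rule-set idea above can be sketched in a few lines of Python. To be clear, the specific weights and the threshold below are invented purely for illustration, not taken from any real scanner:

```python
# Sketch of a weighted rule-set "is this packed?" heuristic.
# All weights and the threshold are made up for illustration.

RULES = [
    # (rule description, weight)
    ("entry point in last section",           0.30),
    ("non-standard section names",            0.15),
    ("startup APIs absent from import table", 0.25),
    ("self-modifying code observed",          0.60),
]

PACKED_THRESHOLD = 0.70  # arbitrary cut-off for illustration

def packed_score(findings):
    """Sum the weights of every rule that fired.

    `findings` is the set of rule descriptions that the analysis
    observed to be true for the target executable.
    """
    return sum(weight for desc, weight in RULES if desc in findings)

def looks_packed(findings):
    return packed_score(findings) >= PACKED_THRESHOLD

# Odd section names alone are weak evidence...
print(looks_packed({"non-standard section names"}))    # False
# ...but several rules firing together cross the threshold.
print(looks_packed({"entry point in last section",
                    "non-standard section names",
                    "self-modifying code observed"}))  # True
```

This mirrors the thought process above: no single observation is conclusive, but the accumulated weight of several observations is.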
So, as I understand it, the goal is to generically define the common behaviors by which encryptors return control to their targets, and then to define a set of rules that will be used to locate potential OEPs. The candidate with the highest weight would then be returned as the most *probable* location of the entry point. Of course, since you are doing an OEP locator and not a full unpacker, if you found more than one potential OEP you could return them all, sorted in order of likelihood. Using the signatures of common compilers is a great idea for the design of these rules.
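The ranking idea could look something like the following sketch. The candidate addresses, the rule checks, and their weights are all hypothetical; the compiler-prologue rule uses the common x86 `push ebp / mov ebp, esp` sequence (55 8B EC) as one example of a compiler signature, and the cross-section jump rule stands in for spotting the classic "tail jump" out of an unpacking stub:

```python
# Sketch of ranking candidate OEPs by heuristic weight.
# Addresses, checks, and weights are invented for illustration.

def score_candidate(candidate):
    """Apply each rule to a candidate OEP and sum the weights."""
    score = 0.0
    code = candidate["bytes"]
    # Rule: code begins with a common compiler prologue
    # (push ebp / mov ebp, esp), a typical compiler signature.
    if code.startswith(b"\x55\x8b\xec"):
        score += 0.50
    # Rule: candidate lies in a section marked executable.
    if candidate["in_executable_section"]:
        score += 0.20
    # Rule: control reached it via a jump from another section
    # (the classic "tail jump" out of an unpacking stub).
    if candidate["reached_by_cross_section_jump"]:
        score += 0.40
    return score

def rank_candidates(candidates):
    """Return all candidates sorted most-likely-OEP first."""
    return sorted(candidates, key=score_candidate, reverse=True)

candidates = [
    {"addr": 0x401000, "bytes": b"\x55\x8b\xec\x83\xec\x44",
     "in_executable_section": True,
     "reached_by_cross_section_jump": True},
    {"addr": 0x405A10, "bytes": b"\x60\xe8\x00\x00\x00\x00",
     "in_executable_section": True,
     "reached_by_cross_section_jump": False},
]

for c in rank_candidates(candidates):
    print(hex(c["addr"]), score_candidate(c))
```

A real locator would of course derive its candidates from tracing rather than a hard-coded list, but the scoring-and-sorting skeleton stays the same.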
Like I said, I've just started researching this topic, so I don't have any definitive answers yet. Your post just happened to catch my eye, so I thought I would suggest the heuristic AV cleaners as an avenue of research.
Here are a couple of links I found so far:
vx.netlux.org/lib/static/vdat/tumisc72.htm
vx.netlux.org/lib/static/vdat/epheurs1.htm
www.extremetech.com/article2/0,3973,367051,00.asp
www.extremetech.com/article2/0,3973,1166167,00.asp
www.extremetech.com/article2/0,3973,1166168,00.asp
Who would've thought that "crackers" and AV people are working on the same side of the fence,
Cheers,
Clandestiny