
Signature Scanning & Packers


Clandestiny
September 30th, 2003, 09:59
Hiya guys,

I am working on a very simplistic heuristic anti-virus scanner for a graduate course I'm taking. At present my focus is on static and statistical analysis of PE structure rather than dynamic analysis using emulation. As you all know, many packers modify the PE structure in much the same way viruses do, so there is a good possibility of false positives from packed / encrypted targets when using only static analysis. I am thinking of incorporating a basic signature scanner to rule out the false positives for packed programs. I realize that there are a number of packer identifiers out there, and I was curious whether there is a repository of packer "signatures" somewhere. I have to admit that it's not particularly thrilling to contemplate d/ling and disassembling hundreds of packed programs looking for signature byte sequences.

Thanks,
Clandestiny

volodya
September 30th, 2003, 15:10
Clandestiny

PE Sniffer contains all signatures in plain text format. You are allowed to use it (download the tool from wasm.ru - the package is called PE Tools), but be nice and provide me with YOUR source code to study in return.

Nat
September 30th, 2003, 15:35
Hi Clandestiny,

Unfortunately, I do not know much about reverse engineering. I do know, however, that coding a good unpacking engine is quite difficult.

Let me show you an example:

Even Kaspersky AV (one of the most highly respected AV scanners) seems to use a static unpacking engine (i.e., there is apparently no emulation or, alternatively, the generic unpacking engine is not switched on before a compressor is statically identified). I am not 100% sure whether Kaspersky uses signatures or checksums in order to identify compressor/crypters.

However, VX guys & gals know quite well how to outfox Kaspersky's detection algorithm. It is sufficient to slightly modify the unpacking stub of a well-known packer like UPX or Petite in order to prevent KAV from detecting a compressed malware sample.

Two methods are frequently used in order to camouflage well-known malware:

(a) Adding NOPs

For instance, a UPX-packed malware sample (e.g., a trojan) can be modified in the following way: Firstly, a few NOPs (90) are inserted with a hex editor in front of the first jump (starting from the OEP). Secondly, the first jump is corrected by subtracting a number of bytes equal to the included NOPs.

//******************** Program Entry Point ********
:004C27F0 60 pushad
:004C27F1 90 nop
:004C27F2 90 nop
:004C27F3 BE00E04700 mov esi, 0047E000
:004C27F8 8DBE0030F8FF lea edi, dword ptr [esi+FFF83000]
:004C27FE C787D0240A00881C4006 mov dword ptr [edi+000A24D0], 06401C88
:004C2808 57 push edi
:004C2809 83CDFF or ebp, FFFFFFFF
:004C280C EB0C jmp 004C281A

... (the listing shows the stub of UPX-compressed Bionet 3.18 trojan)

Thirdly, two NOPs, which are located shortly after the first jump, must be deleted.
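
To illustrate why inserted NOPs defeat a strict byte compare, and how a matcher might tolerate them, here is a minimal Python sketch. The pattern is a hypothetical one read off the first bytes of the listing above (None marks a wildcard operand byte); it is not taken from any real scanner's database, and the names `match_exact`, `match_skipping_nops` and `max_nops` are made up for this example. Skipping raw 0x90 bytes is a crude stand-in: a real opcode filter would decode instructions first.

```python
from typing import Optional, Sequence

NOP = 0x90

def match_exact(data: bytes, pattern: Sequence[Optional[int]]) -> bool:
    """Strict compare: every non-wildcard byte must line up exactly."""
    if len(data) < len(pattern):
        return False
    return all(want is None or data[i] == want for i, want in enumerate(pattern))

def match_skipping_nops(data: bytes, pattern: Sequence[Optional[int]],
                        max_nops: int = 8) -> bool:
    """Tolerant compare: padding NOPs in the data may be skipped."""
    pos = 0
    skipped = 0
    for want in pattern:
        # Skip NOPs only where the pattern expects a concrete, non-NOP byte.
        while (want is not None and want != NOP and pos < len(data)
               and data[pos] == NOP and skipped < max_nops):
            pos += 1
            skipped += 1
        if pos >= len(data):
            return False
        if want is not None and data[pos] != want:
            return False
        pos += 1
    return True

# Hypothetical pattern: pushad; mov esi, imm32; lea edi, [esi+disp32]
STUB_PATTERN = [0x60, 0xBE, None, None, None, None,
                0x8D, 0xBE, None, None, None, None]

original = bytes.fromhex("60BE00E047008DBE0030F8FF")      # unmodified stub start
patched = bytes.fromhex("609090BE00E047008DBE0030F8FF")   # two NOPs after pushad

print(match_exact(original, STUB_PATTERN), match_exact(patched, STUB_PATTERN))
print(match_skipping_nops(patched, STUB_PATTERN))
```

The strict compare accepts only the unmodified bytes, while the tolerant one accepts both. The cap on skipped NOPs is an arbitrary guard against matching long runs of padding.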



(b) OEP Redirection

Another way to confuse the Kaspersky unpacking engine is to change the OEP of a malware sample which is, let's say, packed with PKLite. The OEP is simply changed with a hex editor. At the new OEP a jump to the original OEP is included.

**************
Program Entry Point = 004BB028


:004BB000 6880B04B00 push 004BB080
:004BB005 68B7685100 push 005168B7
:004BB00A 6800000000 push 00000000
:004BB00F E8A3B80500 call 005168B7
:004BB014 E9B742FEFF jmp 0049F2D0
:004BB019 40 inc eax
:004BB01A 2823 sub byte ptr [ebx], ah
:004BB01C 295067 sub dword ptr [eax+67], edx
:004BB01F 4C dec esp
:004BB020 49 dec ecx

* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:004BB090(C)
|
:004BB021 6745 inc ebp
:004BB023 3332 xor esi, dword ptr [edx]
:004BB025 20436F and byte ptr [ebx+6F], al

//******************** Program Entry Point ********
:004BB028 EBD6 jmp 004BB000

... (this is again a Bionet 3.18 trojan)
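
One possible way to cope with a redirected entry point is to follow unconditional jumps before matching anything. The Python sketch below handles only the two direct forms, `EB rel8` and `E9 rel32`, and works on offsets into a byte buffer rather than on virtual addresses; `follow_entry_jumps` and `max_hops` are names invented for this example. In the listing above the jump lands on a small stub that pushes, calls and only then jumps on to 0049F2D0, so a real filter would need a disassembler or emulator to get all the way.

```python
import struct

def follow_entry_jumps(image: bytes, entry: int, max_hops: int = 4) -> int:
    """Return the offset reached after following direct jmp instructions."""
    pos = entry
    for _ in range(max_hops):
        opcode = image[pos]
        if opcode == 0xEB and pos + 2 <= len(image):        # jmp rel8
            (rel,) = struct.unpack_from("<b", image, pos + 1)
            target = pos + 2 + rel
        elif opcode == 0xE9 and pos + 5 <= len(image):      # jmp rel32
            (rel,) = struct.unpack_from("<i", image, pos + 1)
            target = pos + 5 + rel
        else:
            break                                           # not a direct jump
        if not 0 <= target < len(image):
            raise ValueError("jump target lies outside the buffer")
        pos = target
    return pos

# Same layout as the listing, with 004BB000 mapped to offset 0x00:
# the entry point at offset 0x28 holds "EB D6", i.e. jmp back to offset 0x00.
image = bytearray(0x40)
image[0x00:0x05] = bytes.fromhex("6880B04B00")   # push 004BB080
image[0x28:0x2A] = bytes.fromhex("EBD6")         # jmp 004BB000
print(hex(follow_entry_jumps(bytes(image), 0x28)))
```

The displacement is relative to the end of the jump instruction, which is why 0x28 + 2 - 0x2A gives offset 0.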




Since KAV is unable to detect the above trojan samples I conclude that it uses a strict, static unpacking routine (i.e., a minor modification of the OEP area will keep KAV from unpacking a known packer). Kaspersky has been notified about this issue.

If you want to code a better static unpacking engine, you need to include an opcode filter which not only analyzes the unpacking stub of a compressor but also detects "dirty" tricks such as OEP redirection or the insertion of NOPs.

I am not aware of any public signature databases for compressors/crypters but this does not have to mean anything. (You may want to start with procdump.)

Btw: Other scanners like McAfee do not need to unpack each and every compressed malware sample since they (additionally) use signatures from the .rsrc section of a file. The resource section is frequently not compressed at all. However, this detection method comes at a bitter price. Signatures taken from the resource section can be easily modified (e.g., with a tool like Resource Hacker) and are therefore not secure. For example, it frequently suffices to change the icon of a trojan in order to keep McAfee from detecting it. By contrast, KAV uses "strong" signatures taken from the code section of a malware sample. (Therefore, the Kaspersky signature database was "cracked" by SennaSpy...)

In summary, I feel that a person equipped with the knowledge of an experienced reverse engineer has the power to significantly damage the reputation of any AV/AT scanner. On the other hand, a reverse engineer who supports the development of unpacking routines can be extremely helpful to any AV/AT software producer (and their customers of course). It is really a matter of using or abusing knowledge.

Cheers,

Nat

Clandestiny
October 2nd, 2003, 22:29
Volodya:
A BIG thanks to you for those signatures. There seem to have been some problems with the site for the past couple of days, but I finally got 'em today. The PE Tools package looks very nice. Let me just clarify my interpretation of the signature database though...

For example: [Crunch/PE=55E8000000005D83ED068BC5556089AD::::::::2B85]

Am I correctly interpreting the "::::" as representing "wild card" bytes?

BTW, I'll be happy to share my source with ya when it's done, although I'm not too sure you'll learn much new since this is going to be a very basic scanner. It has to be completed by the end of the course I'm taking in 2.5 months and has to fit in with homework from several other graduate classes... Needless to say, the possibilities are limited by my time. I am, however, interested in this area and plan to continue research with the eventual goal of gaining some understanding of AV emulation systems.

Nat:
Also, I wanted to thank you for your very thoughtful and informative reply. I understand that OEP obfuscation is a big issue; however, my present efforts are going to be in the basic realm and will most likely not address this complex issue. I am curious, however, whether the OEP obfuscation problem can be handled with emulation. Say you locate the general area of the OEP and then systematically run an emulator from potential OEP addresses. Could you trap the resulting faults from incorrect OEP guesses and restart the emulator systematically until it ran a significant way without crashing, and then based on that conclude the true OEP value?

Cheers,
Clandestiny

Nat
October 3rd, 2003, 06:08
Quote:

I am curious, however, whether the OEP obfuscation problem can be handled with emulation. Say you locate the general area of the OEP and then systematically run an emulator from potential OEP addresses. Could you trap the resulting faults from incorrect OEP guesses and restart the emulator systematically until it ran a significant way without crashing, and then based on that conclude the true OEP value?


@Clandestiny

(a)
In theory, I believe that an emulation can indeed handle "unknown" compressors or manipulated "known" compressors. However, generic unpacking is difficult to implement and very slow. For example, you need to make sure that an emulation is not outfoxed by a wait loop which takes advantage of the speed difference between a real computer and a virtual emulation.

There are only a few AV scanners which permanently use an emulation for detecting non-viral malware. One of them is NOD32 version 2. I understand that NOD32 supports static unpacking only for UPX and ASPack. Therefore, I believe that NOD32 is a good example for studying the modus operandi of an emulation (you need to enable "advanced heuristic" to activate the emu):

If you scan OEP obfuscated, non-viral malware with NOD32 you will see that some samples can be unpacked and correctly identified. Other samples cannot be unpacked at all (depending on the obfuscation method). Moreover, the NOD32 heuristic will frequently - but not always - detect an OEP obfuscated malware sample as an unknown virus.

In summary, you will see that an emulation is nice but not necessarily the ideal way to detect protected malware.


(b)
I am unable to explain the exact details of an emulation since I have not coded one. I believe that one of the main problems connected with your suggestion ("restart the emulator systematically") is the speed penalty caused by an emulation.
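
For what it is worth, the "restart systematically" idea can be written down as a small driver loop, which also makes the cost visible. This is only a sketch in Python: `emulate` is a hypothetical callable standing in for a real x86 emulator (nothing like it is implemented here), and `EmulationFault`, `min_steps` and `max_steps` are names and thresholds invented for the example.

```python
from typing import Callable, Iterable, Optional

class EmulationFault(Exception):
    """Raised by the (hypothetical) emulator when execution faults."""

def guess_entry_point(candidates: Iterable[int],
                      emulate: Callable[[int, int], int],
                      min_steps: int = 10_000,
                      max_steps: int = 50_000) -> Optional[int]:
    """Return the first candidate address that runs long enough without faulting.

    emulate(addr, max_steps) is assumed to return the number of instructions
    executed, or to raise EmulationFault on an invalid opcode or bad access.
    """
    for addr in candidates:
        try:
            steps = emulate(addr, max_steps)
        except EmulationFault:
            continue                 # wrong guess: trap the fault, try the next one
        if steps >= min_steps:
            return addr              # ran "a significant way" without crashing
    return None
```

In the worst case this emulates roughly the number of candidates times `max_steps` instructions per file, which is the speed penalty mentioned above. Running for a long time without a fault is also only weak evidence that a guess is right.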

(c)
You may be interested in the following links (unless you already know them):

http://www.extremetech.com/article2/0,3973,325439,00.asp

http://online.securityfocus.com/infocus/1552

http://www.nai.com/common/media/vil/pdf/imuttik_VB_%20conf_2000.pdf

http://www.sophos.com/virusinfo/whitepapers/savdetection.html

http://www.norman.com/documents/nvc5_sandbox_technology.pdf

http://securityresponse.symantec.com/avcenter/reference/heuristc.pdf


Cheers,

Nat

volodya
October 3rd, 2003, 09:55
Am I correctly interpreting the "::::" as representing "wild card" bytes?

Yes. Absolutely.
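
A minimal Python sketch of how such an entry could be parsed and matched, assuming that each "::" pair stands for exactly one wildcard byte and that the rest is plain hex. The function names are made up for this example, and nothing here is taken from the PE Sniffer source.

```python
from typing import List, Optional, Tuple

def parse_signature(entry: str) -> Tuple[str, List[Optional[int]]]:
    """Split '[Name=HEX...]' into a name and a byte pattern (None = wildcard)."""
    name, _, hex_part = entry.strip().strip("[]").partition("=")
    if len(hex_part) % 2:
        raise ValueError("signature must contain whole bytes")
    pattern: List[Optional[int]] = []
    for i in range(0, len(hex_part), 2):
        pair = hex_part[i:i + 2]
        pattern.append(None if pair == "::" else int(pair, 16))
    return name, pattern

def matches_at(data: bytes, pattern: List[Optional[int]], offset: int = 0) -> bool:
    """True if the pattern matches data starting at offset."""
    if offset < 0 or offset + len(pattern) > len(data):
        return False
    return all(want is None or data[offset + i] == want
               for i, want in enumerate(pattern))

name, pattern = parse_signature(
    "[Crunch/PE=55E8000000005D83ED068BC5556089AD::::::::2B85]")

# Made-up test bytes: the fixed prefix, four arbitrary bytes, then 2B 85.
sample = (bytes.fromhex("55E8000000005D83ED068BC5556089AD")
          + b"\x11\x22\x33\x44" + bytes.fromhex("2B85"))
print(name, len(pattern), matches_at(sample, pattern))
```

In a scanner, `offset` would normally be the file offset of the entry point, since that is where most identifiers look.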

disavowed
October 3rd, 2003, 15:37
Quote:
[Originally Posted by Nat]You may be interested in the following links (unless you already know them):

http://www.extremetech.com/article2/0,3973,325439,00.asp

http://online.securityfocus.com/infocus/1552

http://www.nai.com/common/media/vil/pdf/imuttik_VB_%20conf_2000.pdf

http://www.sophos.com/virusinfo/whitepapers/savdetection.html

http://www.norman.com/documents/nvc5_sandbox_technology.pdf

http://securityresponse.symantec.com/avcenter/reference/heuristc.pdf
thanks for posting those great links!

v0kram
October 4th, 2003, 05:33
Hi Clandestiny,

If you are just going for static analysis, which is more appropriate because of your shortage of time, it is best to follow the simple signature scanning techniques that most identifiers use nowadays. Use a list of the most common packer/compiler signatures (given the short time again, using Mr. Volodya's infamous list would be fine). Also, along the way, try finding out which bytes/areas are being scanned for the packers concerned. Most of the time it is the EntryPoint which is being scanned, but sometimes it might be wiser to scan other areas. For example, some simple polymorphic code can still be detected by signature scanning instead of dynamic analysis; just look in the right places.

Also you might like building up your own, semi-heuristic scanner by analysing other parts other than executable code. Some packers subtly change headers/import tables/section info.

All these can together give you a powerful scanner using just signature scanning with some extra tricks ( Two of the best PE detectors currently use this technique and are quite reliable IMHO )

I myself have been developing a similar scanner for sometime and would be happy to see more work in the area.

Also thanks to Nat for the wonderful links

Good luck

Clandestiny
October 4th, 2003, 23:04
Thanks Nat for the info and excellent links and thanks v0kram for the advice... The "building an AV engine" article was very helpful and definitely gave me some insight into designing my little scanner with modularity in mind. Also, the "sandbox" article was particularly intriguing, albeit shocking to realize the complex techniques some of these scanners use!

Currently, my plans are for a *basic* PE scanner that will use both simple heuristics and a traditional signature scan. (I found a reasonably up-to-date database of virus signatures from Clam Antivirus, an open source *nix antivirus package.) I am thinking of using heuristics based on PE structural abnormalities that would normally occur in either packed or infected executables (i.e., abnormal entry point position, raw and virtual section size inconsistencies, etc.) to identify "suspicious" files, and then sending the files identified as "suspicious" into the signature scanner for a positive identification as "infected". That way, the scan would be faster, as it would not have to waste time doing an entire signature scan on a file that had no other abnormal characteristics. However, I'm not actually certain it would be possible to obtain positive identification of an infected file in the event it didn't match a known signature, because it doesn't seem like simple heuristics based on structural abnormalities could give you enough data to reasonably eliminate "false positives".
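
A rough Python sketch of the structural checks described above, assuming the section table has already been parsed into simple records. The `Section` fields, the flag wording and the `ratio` threshold are all hypothetical choices for illustration, and the flags are hints rather than verdicts (a normal executable can legitimately have a section with no raw data).

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Section:
    name: str
    virtual_address: int   # RVA where the section is mapped
    virtual_size: int      # size in memory
    raw_size: int          # size on disk

def structural_flags(sections: List[Section], entry_rva: int,
                     ratio: int = 4) -> List[str]:
    """Return a list of structural oddities; an empty list means none found."""
    flags: List[str] = []
    owner = next((s for s in sections
                  if s.virtual_address <= entry_rva
                  < s.virtual_address + s.virtual_size), None)
    if owner is None:
        flags.append("entry point outside all sections")
    elif len(sections) > 1 and owner is sections[-1]:
        flags.append("entry point in last section")
    for s in sections:
        if s.raw_size == 0 and s.virtual_size > 0:
            flags.append(f"{s.name}: no raw data but non-zero virtual size")
        elif s.raw_size and s.virtual_size > ratio * s.raw_size:
            flags.append(f"{s.name}: virtual size far exceeds raw size")
    return flags

# Made-up layouts for illustration only.
normal = [Section(".text", 0x1000, 0x5000, 0x5000),
          Section(".data", 0x6000, 0x1000, 0x800),
          Section(".rsrc", 0x7000, 0x1000, 0x1000)]
packed = [Section("UPX0", 0x1000, 0x7D000, 0),
          Section("UPX1", 0x7E000, 0x45000, 0x44A00),
          Section(".rsrc", 0xC3000, 0x1000, 0x1000)]

print(structural_flags(normal, 0x1200))
print(structural_flags(packed, 0xC27F0))
```

Only files that come back with a non-empty list would then be handed to the signature scanner, which is the two-stage arrangement described above.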

Quote:
Most of the time it is the EntryPoint which is being scanned, but sometimes it might be wiser to scan other areas. For example some simple polymorphic code detection can be still be analysed by signature scanning instead of dynamic analysis, just look at the right places.


I'll admit that I don't know a whole heck of a lot about viral infection as I'm just coming from a general knowledge of the PE format as it relates to manual unpacking, but off hand I can see 3 routes of infection:

1. Code is appended to .exe in new section and entry point redirected to that last section where the code was appended. This case shouldn't be too hard to identify and it would make sense that you would scan the entry point containing section. I can see, however, that the real entry point might just be modified with a far jump or call to point to the appended malicious code so then it might make sense to disassemble the first few instructions looking at jmp and call instructions to identify the best area to scan.

2. I can also see where the virus might not want to change the file size and might just insert its code into some empty space at the end of an existing section. Nevertheless, it seems the area to scan could be identified the same way as in the first scenario.

3. Inline patch scenario - the virus could insert a call at some arbitrary point in the file and redirect code flow to the inserted viral section. I don't know how many viruses might actually do this, but it seems this would make it quite difficult to narrow down a specific section to scan. I guess you would just scan the whole file in this case?
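
The three cases above boil down to a simple rule: scan the raw data of the section that contains the entry point, and fall back to the whole file otherwise. A minimal Python sketch follows, assuming the section table is already parsed. The `Section` record and the function names are hypothetical, and it deliberately does not follow a jmp or call at the entry point.

```python
from typing import List, NamedTuple, Optional, Tuple

class Section(NamedTuple):
    name: str
    virtual_address: int   # RVA of the section in memory
    virtual_size: int
    raw_offset: int        # file offset of the section data
    raw_size: int

def section_for_rva(sections: List[Section], rva: int) -> Optional[Section]:
    """Find the section whose virtual range contains rva."""
    for s in sections:
        if s.virtual_address <= rva < s.virtual_address + s.virtual_size:
            return s
    return None

def rva_to_offset(sections: List[Section], rva: int) -> Optional[int]:
    """Translate an RVA to a file offset, or None if it has no bytes on disk."""
    s = section_for_rva(sections, rva)
    if s is None:
        return None
    delta = rva - s.virtual_address
    return s.raw_offset + delta if delta < s.raw_size else None

def pick_scan_region(sections: List[Section], entry_rva: int,
                     file_size: int) -> Tuple[int, int]:
    """Return (file offset, length) of the region to signature-scan."""
    s = section_for_rva(sections, entry_rva)
    if s is None or s.raw_size == 0:
        return 0, file_size                      # no good guess: whole file
    return s.raw_offset, min(s.raw_size, file_size - s.raw_offset)

# Made-up layout: a normal .text plus a section appended by an infector.
layout = [Section(".text", 0x1000, 0x5000, 0x400, 0x5000),
          Section(".added", 0x6000, 0x1000, 0x5400, 0x1000)]
print(pick_scan_region(layout, 0x6010, 0x6400))
```

For the inline patch case the entry point sits in an ordinary code section, so this rule still narrows the scan to that section but gives no hint where in it the patch is.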

Am I missing any? BTW v0kram, are you using "heuristics" in your scanner in the sense of regular "wild card" signature strings, or are you actually trying to define generic "behaviors" with generic signatures... as in, say, identifying a decryption loop by using generic signatures that could indicate the behavioral components of a decryption, like initializing a memory pointer, initializing a counter, moving some memory, etc.?

Clandestiny

v0kram
November 13th, 2003, 01:15
Hi Clan and the others, sorry for the late reply, was busy coding something

Anyways, what I meant by saying scanning other areas has been slightly misinterpreted by you...

Scanning code in PE files is not always the only way to do signature scanning... Consider the case of any good/decent polymorphic code: I assume it's obvious one would be lost with signature scanning. But some other aspects might remain the same (the Import Table would be a nice option, I guess?).

I admit scanning PE headers etc. is usually a bad thing to rely on, but if you base your results partly on them and then do the other scans (whatever you have in mind) the results might be quite accurate...
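
To make the Import Table idea concrete, here is one possible way to turn a parsed import list into a fingerprint that stays the same when the code bytes change. This is a Python sketch under my own assumptions: the normalisation (lower-cased DLL name, sorted entries) and the choice of SHA-256 are arbitrary, and `import_fingerprint` is a name made up for the example.

```python
import hashlib
from typing import Dict, List

def import_fingerprint(imports: Dict[str, List[str]]) -> str:
    """Hash the set of imported functions, ignoring order and DLL-name case."""
    entries = sorted({f"{dll.lower()}!{func}"
                      for dll, funcs in imports.items()
                      for func in funcs})
    return hashlib.sha256("\n".join(entries).encode("ascii")).hexdigest()

# Made-up import lists: same functions, different order and DLL-name case.
variant_a = {"KERNEL32.DLL": ["LoadLibraryA", "GetProcAddress"]}
variant_b = {"kernel32.dll": ["GetProcAddress", "LoadLibraryA"]}
other = {"kernel32.dll": ["GetProcAddress", "LoadLibraryA", "ExitProcess"]}

print(import_fingerprint(variant_a) == import_fingerprint(variant_b))
print(import_fingerprint(variant_a) == import_fingerprint(other))
```

A match on such a fingerprint would only be one input among several, in line with the point above about not relying on header data alone.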

Building a good heuristic engine is quite some work I'm sure, and getting ideas is probably the toughest part in it...

Hope your project is coming along good.

Peace

NoLoader
October 18th, 2007, 23:34
Hi Clandestiny,

> I am working on a very simplistic heuristic anti-virus
> scanner for a graduate course...
> ... packed / encrypted targets using only static analysis.
If you have a copy of Peter Szor's The Art of Computer Virus Research and Defense, he covers the Packer topic extensively.

He and I have bounced some emails regarding the topic. If I recall correctly, he stated that Symantec's scanner detects over 500 packers (the book was published in 2005, so I imagine it recognizes a bit more now).

> ...rather than dynamic analysis using emulation.
It's there in case you want it - Szor also discusses the emulator. Eugene Kaspersky (of Kaspersky Labs) lets the virus do the work in an encrypted/packed EXE using an emulator. After the virus completes its decompress and/or decrypt, the file is then scanned. I believe Symantec has adopted this approach.

> if there is a repository of packer "signatures" somewhere?
I am not aware of one, but hopefully one of the senior reversers can point you in the right direction. Perhaps there is a tool out there from which you can shamelessly rip code.

Jeff
Jeffrey Walton


Kayaker
October 19th, 2007, 00:25
The additional info is appreciated, even though it's about 4 years too late to help Clandestiny graduate

NoLoader
October 19th, 2007, 00:35
Hi Kayaker,

Quote:
[Originally Posted by Kayaker;69564]The additional info is appreciated, even though it's about 4 years too late to help Clandestiny graduate

Whoops... I was wondering why one of the other posters stated Kaspersky was having issues detecting packers.

Another interesting topic with regard to BlackHat, Packers, and Malware: http://www.blackhat.com/html/bh-usa-06/bh-usa-06-speakers.html#Morgenstern. It is up to date.

I guess my next question is, How did I manage to find a 4 year old thread?

Jeff

JMI
October 19th, 2007, 02:22
Relatively simple. The "index" indexes the individual words in all the threads and is one of the largest indexes in the vBulletin system. Therefore, if the thread is in the database and the database has been indexed, it will find all threads containing the searched "word."

Regards,

blabberer
October 19th, 2007, 12:13
iinw clandestiny now walks around the world showing how to outscan scanners