Pioneering New Anti-Viral Technologies
by Adam Young
I am a hacker and a computer scientist, and I have studied viruses from both perspectives since the mid-1980s. Knowing how viruses work helps to distinguish between good anti-viral software and bad anti-viral software. Similarly, knowing how anti-viral programs work helps one to write better and more effective viruses. This article summarizes many years of my independent study of computer viruses.
This article is divided into several sections. In the first section, I correct the misinformation in a 2600 article called "Protecting Your Virus." Background information is then provided on the use of cryptographic checksums for anti-viral purposes. In the third section I assume the role of an anti-viral developer and explain an idea of mine that could significantly reduce the viral threat to society. The last section covers how this new method can be bypassed by certain viruses.
This will be of use to virus writers and anti-viral developers alike. It contains information that can help anti-viral developers make software more resistant to viral attack. It also explains how to correctly "protect your virus" and explains one possible method to bypass programs that do cryptographic checksums.
How to Really Protect Your Virus
In order to explain the new anti-viral development, the concept of "polymorphic viruses" must first be explained. A polymorphic virus is a self-replicating program whose object code changes to avoid detection by anti-viral scanners. This change can be made to occur once every generation of the virus or more often, depending on how safe the virus needs to be. The topic of polymorphic viruses was presented incorrectly in the article "Protecting Your Virus" by Dr. Bloodmoney in 2600 Magazine, Vol. 10, No. 3. Dr. Bloodmoney provided a "viral protection mechanism" that will, on the contrary, cause viruses that use it to be easily detected by anti-viral programs. The concept of polymorphic viruses has been around since at least the 1980s; the Internet worm exhibited certain polymorphic attributes. Refer to the comp.virus newsgroup on the Internet for more on the subject.
The following is the structure of a virus that can evade detection by anti-viral scanners:
- Decryption Header
- Jump to Main Part of Virus
- Body - MtE
- Body - Main Part of Virus
Here is how it works:
1.) The operating system sends control to the virus.
2.) The Header executes and decrypts the entire body of the virus.
3.) Control jumps over the MtE routine to the main part of the virus.
4.) The main part of the virus executes and the virus replicates. The MtE (mutating engine) is executed to make the child virus have a different header than the parent. A random number is generated. The random number is XORed with each machine word in the body of the child to ensure that the encrypted body of the child is different from the encrypted body of the parent. The random number is then written to the header of the child virus.
5.) Control is sent to the host program.
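The mutation step (4 above) can be sketched in C. This is a toy model, not real MtE code: it assumes the viral body is a plain byte buffer, uses a single-byte XOR key standing in for a full generated decryption routine, and uses rand() where a real engine would want a better entropy source.

```c
#include <stdlib.h>

/* Toy model of step 4: re-encrypt the child's body under a fresh
 * random key so its encrypted bytes differ from the parent's.
 * The key is written into the child's header, where the decryption
 * routine will find it at run time. */
struct toy_virus {
    unsigned char key;      /* stands in for the decryption header */
    unsigned char body[16]; /* encrypted body */
};

static void mutate_child(const unsigned char *plain_body, size_t len,
                         struct toy_virus *child)
{
    child->key = (unsigned char)(rand() & 0xFF);
    for (size_t i = 0; i < len; i++)
        child->body[i] = plain_body[i] ^ child->key; /* encrypt */
}

static void decrypt_in_place(struct toy_virus *v, size_t len)
{
    for (size_t i = 0; i < len; i++)
        v->body[i] ^= v->key; /* XOR with the same key restores plaintext */
}
```

Two children made from the same plaintext body will, with high probability, carry different encrypted bytes, yet both decrypt back to identical code.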
The Dark Avenger is credited with the term MtE. He is the infamous hacker who distributed source code for an MtE function. This source code is not very special, since it is easy to write such a function once its purpose is understood.
The mutation routine creates modified versions of the decryption header in the viral offspring. Dijkstra once said that all that is necessary to represent program structure is sequence, iteration, and condition. As it turns out, very often portions of "sequence code" in programs can be rearranged without changing the output of the code. The mutating routine can therefore generate headers with varying instruction sequences. Many mutating routines also interleave "dummy" instructions between the useful instructions in the header.
The following is a list of example dummy instructions in pseudo-assembler:
OR  #0, reg1
ADD #0, reg1
SUB #0, reg1
MUL #1, reg2
DIV #1, reg2
NOP

The above instructions are based on the mathematical property that:

x + 0 = x
x - 0 = x
... etc.

Microprocessors support such instances of these instructions even though they obviously accomplish nothing. By randomly interleaving dummy instructions in the header, the header becomes harder for anti-viral scanners to detect. Using this method, both the header and the body therefore mutate from generation to generation.
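A minimal sketch of the interleaving idea follows. It models the header as a list of instruction strings purely for illustration; a real MtE emits machine code bytes, not text.

```c
#include <stdlib.h>

/* Toy mutation step: copy the "real" header instructions in order,
 * randomly interleaving no-effect dummies between them.  Each child
 * gets a different-looking header with identical behavior. */
static const char *dummies[] = {
    "OR  #0, reg1", "ADD #0, reg1", "SUB #0, reg1",
    "MUL #1, reg2", "DIV #1, reg2", "NOP",
};

static size_t emit_header(const char **real, size_t nreal,
                          const char **out, size_t max_out)
{
    size_t n = 0;
    for (size_t i = 0; i < nreal && n < max_out; i++) {
        /* flip a coin: maybe insert a dummy before the next real op */
        if ((rand() & 1) && n + 1 < max_out)
            out[n++] = dummies[rand() % 6];
        out[n++] = real[i]; /* the real op always survives, in order */
    }
    return n;
}
```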
Dr. Bloodmoney's mechanism uses a header that never gets mutated. Therefore, all a scanner has to do is search for Dr. Bloodmoney's header. Polymorphic viruses are loved by virus writers because they cause the number of false positives during anti-viral scans to increase.
Cryptographic Checksums
A checksum is defined as "any fixed length block functionally dependent on every bit of the message, so that different messages have different checksums with high probability" [1].
In the case of checksums on programs, the program's object code is the "message." A program can detect viral infection by performing a cryptographic checksum on itself when it runs. If the checksum fails, the program concludes that it has been modified in some way, and notifies the user. A checksum will almost always indicate an infection when a virus attaches itself to a host that performs integrity checking.
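The self-check logic can be sketched as follows. FNV-1a stands in for a true cryptographic checksum here purely for brevity; a real implementation would use a keyed cryptographic function, since a plain hash over public code could simply be recomputed by a virus after infection.

```c
#include <stdint.h>
#include <stddef.h>

/* 32-bit FNV-1a over a byte buffer.  NOTE: FNV-1a is NOT a
 * cryptographic hash; it stands in for a real cryptographic
 * checksum.  The principle is the same: every bit of the
 * "message" (the program's object code) affects the result. */
static uint32_t fnv1a(const unsigned char *msg, size_t len)
{
    uint32_t h = 2166136261u;          /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= msg[i];
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}

/* Compare the image's current checksum against the value recorded
 * at build time.  If a virus modifies the image, the bytes change,
 * so the check fails with high probability. */
static int selfcheck_image(const unsigned char *image, size_t len,
                           uint32_t recorded)
{
    return fnv1a(image, len) == recorded;
}
```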
Since most programmers do not even know what a cryptographic self-check is, self-checks are often not included in final products. Another reason why they are not widely used is that the software needed to perform strong checksums is not widely available. The main cost of a self-check is that it uses a small amount of CPU time, and that amount is insignificant compared to the increase in product reliability. This is why all well-written commercial programs perform integrity checks.
The Need for Availability and Standardization
I have seen too many public domain programs succumb to infection by pathetic viruses, and I have seen too many programs perform weak self-checks.
It is embarrassing how many viruses flourish on the IBM PC compatible platform. You want to know why there are so few Mac viruses? Everyone wants to know why. I know why. The main reason is that more Mac programs perform self-checks than PC programs. It's that simple. In the rest of this section I will explain how all programs can be made to be more resistant to viral infection.
It may not be obvious at first, but this new anti-viral development is in the best interest of society and hackers alike. Hackers are egomaniacs who pride themselves on knowing more about computers than everyone else. It therefore follows that every hacker wants to make a name for himself. How many people have written PC viruses? 1,500 or 2,000 people? If writing a virus that spreads becomes more challenging, then only the best hackers will be able to do so and only they will achieve recognition.
The need for standardization is apparent from my own research. Very few programs perform self-checks. Of those that do, very few perform strong cryptographic self-checks. Most self-checking programs simply verify their own size in bytes and verify that certain resources and overlays are present. This is not good enough. A virus could delete non-critical resources in a host, infect the host, and then pad the end of the code with garbage so that the size of the host is the same as it was originally.
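The padding attack just described can be sketched on a toy host image. The layout (code followed by a non-critical resource at the tail) is assumed for illustration only:

```c
#include <string.h>
#include <stddef.h>

/* Toy host image: [code][non-critical resource].  The attack
 * "deletes" the resource by reusing its space for viral code, so
 * the total size never changes -- defeating a size-only check
 * while still altering the bytes. */
#define HOST_LEN 32

static void infect_keeping_size(unsigned char host[HOST_LEN],
                                const unsigned char *virus, size_t vlen)
{
    /* overwrite the tail (the expendable resource) with viral code;
     * HOST_LEN is unchanged, so "size in bytes" still checks out */
    memcpy(host + HOST_LEN - vlen, virus, vlen);
}
```

A size check passes after this, but any checksum functionally dependent on every byte of the image does not.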
I propose that the standard libraries of all popular commercial languages should include a strong cryptographic checksum function. This would significantly reduce the viral threat to society.
For example, the ANSI C standard library should contain a function called selfcheck(). The following is the prototype:
int selfcheck(void); /* returns true if checksum succeeds, false otherwise */

If this were standardized and included with all major compilers, then programmers would have easy access to a strong cryptographic self-checking routine. It is widely known that most viruses spread through public-domain software. If public-domain software developers had this function in their standard libraries, then it would be easy for them to call it in their programs. Then, in time, only a small subset of viruses would be able to spread effectively. Also, these viruses would be larger and more complex, since they would have to circumvent this protection mechanism. A large virus is much easier to detect than a small one.
The next question is, why hasn't this already been done? Strong cryptographic checksum technology has been around for quite a while. I think I know the answer to this question. It probably hasn't been done because it would be too easy to write a virus that disables the proposed checksum routines. For example, consider the following attack.
Hacker X is writing a virus for the PC platform. He knows that the commercial C compiler "comp A" has selfcheck() in its standard library. He also knows that selfcheck() is in the library of the popular C compiler "comp B". For the sake of argument, let's say these compilers were used to make roughly 90% of all public-domain software for the PC platform. Hacker X then compiles the following program using each compiler:
#include <stdlib.h>

main()
{
    selfcheck();
}
He then analyzes the object code of each program and chooses two search strings. Hacker X then programs his virus to search for these functions in any potential host. If the functions are found in the host, the routine selfcheck() in the host is overwritten with NOPs. The very last instruction in selfcheck() is made to return "true". Therefore, whenever the infected program calls selfcheck(), true is returned.
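Hacker X's patch might look like the following sketch, operating on a byte buffer that stands in for the host's object code. The signature bytes, the 0x90 (x86 NOP) fill, and the "return true" stub byte are illustrative, not taken from any real compiler's output:

```c
#include <string.h>
#include <stddef.h>

/* Scan the host's code for the compiler-specific selfcheck()
 * signature.  Returns a pointer to the match, or NULL. */
static unsigned char *find_sig(unsigned char *code, size_t len,
                               const unsigned char *sig, size_t siglen)
{
    for (size_t i = 0; i + siglen <= len; i++)
        if (memcmp(code + i, sig, siglen) == 0)
            return code + i;
    return NULL;
}

/* Overwrite the matched routine with NOPs, leaving a stub at the
 * end that "returns true" (0x01 is a placeholder for that stub). */
static int disable_selfcheck(unsigned char *code, size_t len,
                             const unsigned char *sig, size_t siglen)
{
    unsigned char *p = find_sig(code, len, sig, siglen);
    if (!p)
        return 0;
    memset(p, 0x90, siglen - 1); /* NOP sled over the routine */
    p[siglen - 1] = 0x01;        /* placeholder "return true" stub */
    return 1;
}
```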
One could therefore conclude from the above argument that if programs included standardized self-checking routines, then viruses would soon include standardized selfcheck() scanners!
As it turns out, this problem can be circumvented. To see how, let me ask the following question. Is polymorphic technology only useful as a viral technology? Of course not. I propose that in addition to adding selfcheck() to the ANSI C standard library, a mutation engine should be added to all ANSI C compilers!!! The new ANSI C compiler would then work as follows.
Every time a program that calls selfcheck() is compiled, the compiler completely mutates selfcheck(). This mutated version is then included in the final program. The linker ensures that selfcheck() is placed at random among the functions from the source files. Adleman [2] proved that detecting an arbitrary virus is an intractable problem. In a similar manner, one can conclude that, using this method, detecting selfcheck() is an intractable problem for a virus.
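Why a mutated selfcheck() is hard to fingerprint can be illustrated with a toy: if the routine is built from independent, commuting "instructions," each compilation can emit them in a different order without changing the computed result. This models only the reordering idea, not a real compiler:

```c
#include <stdlib.h>
#include <stddef.h>

/* Toy model of the mutating compiler: selfcheck() is a list of
 * independent "instructions" (indices into the image).  Each
 * "compilation" shuffles their order with its own seed, so no fixed
 * search string matches every build. */
static void shuffle(int *ops, size_t n, unsigned seed)
{
    if (n < 2)
        return;
    srand(seed); /* per-"compilation" seed */
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);
        int t = ops[i]; ops[i] = ops[j]; ops[j] = t;
    }
}

/* The "instructions" commute: each XORs one word of the image into
 * the running checksum, so any execution order gives the same answer. */
static unsigned run_selfcheck(const int *ops, size_t n,
                              const unsigned *image)
{
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum ^= image[ops[i]];
    return sum;
}
```

Two builds with different seeds contain differently ordered code yet compute identical checksums.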
If the above idea is implemented, everyone who uses standard libraries will be able to significantly increase the security of their programs by simply including the following code:
#include <stdio.h>
#include <stdlib.h>

main()
{
    if (!selfcheck()) {
        printf("You got problems pal!\n");
        exit(1);
    }
    /* rest of program */
}

This would significantly enhance the security of all Division D ADPs (i.e., Macs and IBM PCs). See the DoD Orange Book for details.
How to Bypass Cryptographic Self-Checks
I have included this section for comparison with the previous section. It is important that the general public realize that cryptographic self-checks are not the be-all and end-all of preventative measures. The aforementioned method is meant to supplement viral protection systems, not to replace them.
Consider a three-phase virus. The virus can reside in RAM, in a program, or in the boot sector. When the virus is run in an application it tries to infect the boot sector. When the computer is booted, the virus in the boot sector infects RAM. When the virus is in RAM it tries to infect programs. Rather than having the virus patch an operating system routine so that it infects a program when it starts up, let's assume it patches a routine such that it infects applications when they terminate. Now traditionally, when the virus finishes executing in a host, it remains in the host and sends control to the host. If the host calls selfcheck(), the virus will be detected. But what if, prior to sending control to the host, the virus disinfects itself?
Does this make the virus more vulnerable? Think about it.
Bibliography
1.) Denning, Dorothy E., Cryptography and Data Security, Addison-Wesley Publishing Co., 1982, p. 147.
2.) Adleman, Leonard M., "An Abstract Theory of Computer Viruses", in S. Goldwasser (ed.), Advances in Cryptology - CRYPTO '88, Lecture Notes in Computer Science, Vol. 403, Springer-Verlag, 1990.