vm for the masses - a vm compiler incl source
0rp
April 11th, 2007, 14:56
Hi,
I have attached the complete source code of a working VM compiler. This compiler was used for the 'impossible crackme' crackmes.
I have also included a brief explanation of everything.
Please keep in mind that this VM underwent some major changes (read the impossible crackme threads); that's why parts of the code are messy and smelly.
p.
Sab
April 11th, 2007, 15:05
Great to see a good public contribution. Thanks orp.
ZaiRoN
April 11th, 2007, 15:58
Thank you Orp

b3n
April 11th, 2007, 21:56
Thanks 0rp, I was looking for something like this!

winndy
April 12th, 2007, 06:16
Thank you very much!!
That's what I'm looking for.
FaTaL_PrIdE
April 12th, 2007, 07:56
Great contribution. Thank you for sharing!

winndy
April 12th, 2007, 21:51
I tried to compile it with VS6.
I downloaded msvcr80.dll and msvcp80.dll,
but opcodetoheader still can't be executed.
Finally I found it's a side-by-side configuration error.
I installed vcredist_x86.exe, but it still can't run.
The opcodetoheader source isn't included.
0rp, would you please upload the opcodetoheader source code?
Thanks!!
Another question:
What's the BYTE base[] array?
How is it generated?
NeOXOeN
April 13th, 2007, 06:50
Thanks for the source, I was looking for something like this for a long time.
I think it's for VC 7.
Bye
0rp
April 13th, 2007, 12:32
Hi,
I have attached the opcodetoheader sources.
The base[] array is the ready-to-use VM binary code.
This whole source file (vmfuncs.cpp) is generated by the backend;
see void Backend::generateCPP().
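As a rough illustration of what "generated by the backend" means for vmfuncs.cpp: the finished VM image ends up as a C++ array initializer. The helper below is a hypothetical sketch (emitBaseArray is not Backend::generateCPP, whose exact output format isn't shown in this thread), and it assumes BYTE is a typedef for unsigned char, as in <windows.h>.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: render a finished VM image as a C++ array definition,
// in the spirit of the generated vmfuncs.cpp (BYTE assumed to be unsigned char).
std::string emitBaseArray(const std::vector<unsigned char>& image,
                          std::size_t bytesPerLine = 16) {
    assert(!image.empty() && bytesPerLine > 0);   // C++ forbids zero-size arrays
    std::string out = "BYTE base[] = {";
    char hex[8];
    for (std::size_t i = 0; i < image.size(); ++i) {
        if (i > 0) out += ',';
        out += (i % bytesPerLine == 0) ? "\n    " : " ";   // new row every bytesPerLine bytes
        std::snprintf(hex, sizeof hex, "0x%02x", static_cast<unsigned>(image[i]));
        out += hex;
    }
    out += "\n};\n";
    return out;
}
```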
Silver
April 13th, 2007, 12:52
Nice stuff 0rp, I'll have a browse through your code later.
Is there a lot of interest in VM these days? Was mulling over a RECON submission for this year...
winndy
April 13th, 2007, 21:58
Thank you, 0rp.
I'll study your code carefully.
So if we want to add more functions to vmfuncs.cpp,
we should write code to generate them.
Every function in vmfuncs.cpp has a different offset,
and instructions.dat is the base array.
char* mem points to the randomized data, which is written into base[] later.
In compiler.cpp, some DWORDs of the base array are overwritten with function addresses or variable addresses:
Code:
*(DWORD *)(base + 0) = (DWORD)xm_allocate;
*(DWORD *)(base + 4) = (DWORD)xm_free;
*(DWORD *)(base + 8) = (DWORD)sprintf;
*(DWORD *)(base + 12) = (DWORD)globals;
*(DWORD *)(base + 16) = (DWORD)xm_printf;
*(DWORD *)(base + 20) = (DWORD)xm_export;
I'll study it more carefully to understand the blueprint of how the VM works.
BR
0rp
April 14th, 2007, 02:21
If you want more functions in vmfuncs.cpp, you have to put more functions into your input script (test.txt).
Basically, each function has its own start offset in this base array, but only functions that are exported (__export) get special code that pushes the real stack parameters onto the VM stack:
Code:
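if (function->containsDeclSpec("export"))
{
    // emit an ENTER instruction and record it as the export entry point
    INSTR_BEGIN(ENTER);
    vmFunction->exportStart = instr;
    INSTR_END();
    // copy each native stack parameter onto the VM stack
    for (int i = 0; i < function->parameters.size(); i++)
    {
        MOV_TEMP_CONST(TEMP(1), (10 + i) * 4);  // TEMP(1) = offset of parameter i
        ADD(TEMP(1), APPREGS);                  // TEMP(1) += APPREGS -> address of parameter i
        MOV_TEMP_MEM(TEMP(0), TEMP(1));         // TEMP(0) = value at that address
        MOV_MEM_TEMP(ESP, TEMP(0));             // store it where the VM ESP points
        MOV_TEMP_CONST(TEMP(0), 4);
        ADD(ESP, TEMP(0));                      // advance the VM ESP by 4
    }
}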
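The export loop can be modeled in plain C++ to see what each micro-op does. This is a sketch under assumptions read off the snippet only: APPREGS is taken to be the address of a saved block of 32-bit values in which native argument i sits at byte offset (10 + i) * 4, and the VM stack grows upward by 4 bytes per push (the snippet adds 4 to ESP). VmState and copyExportParams are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical model of VM memory: a flat byte array addressed by offsets.
struct VmState {
    std::vector<unsigned char> mem;
    std::uint32_t appRegs;   // offset of the saved native register/stack block (assumed)
    std::uint32_t esp;       // VM stack pointer; grows upward in this model

    std::uint32_t load(std::uint32_t addr) const {
        assert(addr + 4u <= mem.size());
        std::uint32_t v;
        std::memcpy(&v, mem.data() + addr, 4);
        return v;
    }
    void store(std::uint32_t addr, std::uint32_t v) {
        assert(addr + 4u <= mem.size());
        std::memcpy(mem.data() + addr, &v, 4);
    }
};

// Mirrors the emitted micro-ops for each parameter i:
//   TEMP1 = (10 + i) * 4; TEMP1 += APPREGS; TEMP0 = [TEMP1]; [ESP] = TEMP0; ESP += 4
void copyExportParams(VmState& vm, int paramCount) {
    for (int i = 0; i < paramCount; ++i) {
        std::uint32_t temp1 = static_cast<std::uint32_t>((10 + i) * 4) + vm.appRegs;
        std::uint32_t temp0 = vm.load(temp1);
        vm.store(vm.esp, temp0);
        vm.esp += 4;
    }
}
```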
Code:
*(DWORD *)(base + 0) = (DWORD)xm_allocate;
*(DWORD *)(base + 4) = (DWORD)xm_free;
*(DWORD *)(base + 8) = (DWORD)sprintf;
.....
These are required 'imports' that the VM needs in order to run happily. So once you have generated a VM and want to start it, you have to write these function pointers to those VM addresses. It's done in compiler.cpp just for testing purposes, since the VM gets executed there:
Code:
char msg[1024];
test(43, msg);
info("%s", msg);
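The `*(DWORD *)(base + N) = (DWORD)fn;` pattern relies on a 32-bit build, where a function pointer fits in a DWORD, and on the slot being usable through a DWORD pointer. A hedged, more portable sketch of the same idea (write the imports into the image, then let VM-side code call through the slots) follows. It assumes pointer-sized slots, so offsets become index * sizeof(function pointer) instead of 0, 4, 8, ..., and patchImport, callImport, demoAdd and demoMul are hypothetical stand-ins for the real import setup and for xm_allocate, xm_free and friends.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

using ImportFn = int (*)(int, int);   // a single signature keeps the sketch small

// Hypothetical stand-ins for real imports such as xm_allocate or sprintf.
static int demoAdd(int a, int b) { return a + b; }
static int demoMul(int a, int b) { return a * b; }

// Host side: store a function pointer in import slot `index` of the VM image.
// memcpy avoids the alignment and aliasing pitfalls of *(DWORD *)(base + off).
void patchImport(std::vector<unsigned char>& image, std::size_t index, ImportFn fn) {
    const std::size_t off = index * sizeof fn;
    assert(off + sizeof fn <= image.size());
    std::memcpy(image.data() + off, &fn, sizeof fn);
}

// VM side: fetch the slot and call through it, as generated code would.
int callImport(const std::vector<unsigned char>& image, std::size_t index, int a, int b) {
    ImportFn fn;
    const std::size_t off = index * sizeof fn;
    assert(off + sizeof fn <= image.size());
    std::memcpy(&fn, image.data() + off, sizeof fn);
    return fn(a, b);
}
```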
winndy
April 14th, 2007, 03:48
0rp, thanks for your explanation.
I'm sorry to trouble you again.
Coco.exe caused a side-by-side configuration error.
It just doesn't work.
It seems that you rebuilt your coco.exe.
Is your Coco source this one:
Quote:
Coco/R for C++
ported and maintained by Markus Löberbauer and Csaba Balazs
I replaced Coco.exe with that coco.exe,
and I just got this error:
Quote:
Coco/R (Jan 15, 2007)
checking
FuncCallParams deletable
Statements deletable
XM deletable
LL1 warning in Factor: "(" is start of several alternatives
LL1 warning in IfElse: "else" is start & successor of deletable structure
parser -- incomplete or corrupt parser frame file
Which coco.exe did you use? Thanks.
I just want to compile your xm source code; I didn't expect so many problems.
Sorry.
I think I should switch to VS2005.
BR
0rp
April 14th, 2007, 04:59
Check the attachment.
I recompiled Coco without the msvcrt DLLs; I've also included its source.
I changed Coco a bit to fit my needs.
I also re-enabled a fancy VM feature:
data MessageBoxA = __export("user32.dll", "MessageBoxA");
MessageBoxA(0, "oook", "hi", 3);
winndy
April 14th, 2007, 05:40
That's very kind of you.
I'll study it.
You're a great coder and reverser.
What's more, you are my patient teacher.

NeOXOeN
April 14th, 2007, 18:53
I think it's one of the rare sources that have come out in public, and it's really great...
Thanks again..
b3n
April 19th, 2007, 07:40
Do I get this right that Coco only generates the parser and scanner for you, but you have to write the compiler yourself?
From what I understand so far, Coco is run on a language to produce some sort of output. Is the Coco output already the code that gets executed by the virtual machine, or is it processed further to create virtual machine bytecode?
I'm a bit lost here (even after having a look at the sources), so maybe someone can point me in the right direction.
0rp
April 19th, 2007, 13:29
Coco generates the source code of the compiler that is used; it is configured by the grammar file xm.atg.
So basically I don't write the compiler source myself, I just make a config file for Coco. Based on this config, Coco generates the sources for the compiler, which are then used.
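Coco produces the real frontend from xm.atg, so the following is not its output. It is a minimal hand-written recursive-descent sketch (ExprFrontend is a hypothetical name) of what such a frontend does: one function per precedence level, a fresh temp per literal, and one instruction per operator. For 1 + 2 * 3 it emits mov temp_0000, 1 / mov temp_0001, 2 / mov temp_0002, 3 / mul temp_0001, temp_0002 / add temp_0000, temp_0001. It handles only non-negative integer literals, + and *, and assumes well-formed input.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal frontend: Expr = Term {'+' Term}; Term = Factor {'*' Factor}.
class ExprFrontend {
public:
    explicit ExprFrontend(std::string src) : src_(std::move(src)) {}

    std::vector<std::string> compile() { expr(); return code_; }

private:
    std::string src_;
    std::size_t pos_ = 0;
    int nextTemp_ = 0;
    std::vector<std::string> code_;

    static std::string temp(int n) {
        char buf[16];
        std::snprintf(buf, sizeof buf, "temp_%04d", n);
        return buf;
    }
    void emit(const std::string& op, const std::string& a, const std::string& b) {
        code_.push_back(op + " " + a + ", " + b);
    }
    char peek() {                                  // skip blanks, look at next char
        while (pos_ < src_.size() && src_[pos_] == ' ') ++pos_;
        return pos_ < src_.size() ? src_[pos_] : '\0';
    }
    int factor() {                                 // integer literal -> fresh temp
        peek();
        int value = 0;
        while (pos_ < src_.size() && std::isdigit(static_cast<unsigned char>(src_[pos_])))
            value = value * 10 + (src_[pos_++] - '0');
        int t = nextTemp_++;
        emit("mov", temp(t), std::to_string(value));
        return t;
    }
    int term() {                                   // '*' binds tighter than '+'
        int t = factor();
        while (peek() == '*') { ++pos_; int r = factor(); emit("mul", temp(t), temp(r)); }
        return t;
    }
    int expr() {
        int t = term();
        while (peek() == '+') { ++pos_; int r = term(); emit("add", temp(t), temp(r)); }
        return t;
    }
};
```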
b3n
April 19th, 2007, 20:34
So the compiler generated by Coco transforms your instructions into this, for example:
00000000 mov temp_0000, 0
00000001 mov i, temp_0000
00000002 mov temp_0000, i
00000003 mov_data temp_0000, src
00000004 mov temp_0001, 0
00000005 not_equal temp_0000, temp_0001
(taken from your strcpy snippet in bigpicture.txt)
And this is then executed by the VM? Or is it processed further into some sort of binary code? Which of the methods in the packages actually executes the instructions?
Silver
April 20th, 2007, 04:19
b3n, I don't think following 0rp's code is going to help you with what you want. It might actually make it harder to understand.
0rp, no reflection on your code, just that b3n and I had quite a detailed discussion about VMs via privmsg.
b3n
April 20th, 2007, 09:31
Hi Silver,
it's not really related to what we talked about; I just want to understand how 0rp's code works, and I couldn't figure that out yet.
0rp
April 20th, 2007, 15:31
Let's assume you have this expression:
1 + 2 * 3
The Coco-generated compiler (aka the frontend) transforms this expression into:
Code:
00000000 mov temp_0000, 1
00000001 mov temp_0001, 2
00000002 mov temp_0002, 3
00000003 mul temp_0001, temp_0002
00000004 add temp_0000, temp_0001
(If you prefer stack machines, this code is equivalent to
Code:
push 1
push 2
push 3
mul
add
Actually, the first xm generation was a stack machine.)
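The stack-machine form is just as simple to execute. A minimal interpreter sketch for push/mul/add (StackOp and runStackCode are hypothetical, not taken from the first xm generation): binary operators pop two values and push the result, so push 1, push 2, push 3, mul, add leaves 7 on the stack.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

enum class Op { Push, Add, Mul };
struct StackOp { Op op; std::int32_t arg; };   // arg is only used by Push

std::int32_t runStackCode(const std::vector<StackOp>& code) {
    std::vector<std::int32_t> stack;
    for (const StackOp& ins : code) {
        if (ins.op == Op::Push) { stack.push_back(ins.arg); continue; }
        if (stack.size() < 2) throw std::runtime_error("stack underflow");
        std::int32_t rhs = stack.back(); stack.pop_back();   // top of stack
        std::int32_t lhs = stack.back(); stack.pop_back();
        stack.push_back(ins.op == Op::Add ? lhs + rhs : lhs * rhs);
    }
    if (stack.size() != 1) throw std::runtime_error("malformed program");
    return stack.back();
}
```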
This frontend code is given to the backend, which transforms it into real, down-to-the-metal VM instructions:
Code:
00000000 mov temp_0000, 1
---------------------------------------------------------
10 00000126 MOV_TEMP_CONST 00000064, 00000001
11 00000ca0 MOV_TEMP_CONST 00000078, 00000000
12 0000020a ADD 00000078, 00000008
13 00000ce5 MOV_MEM_TEMP 00000078, 00000064
00000001 mov temp_0001, 2
---------------------------------------------------------
14 000003f1 MOV_TEMP_CONST 00000064, 00000002
15 00000944 MOV_TEMP_CONST 00000078, 00000004
16 0000074c ADD 00000078, 00000008
17 0000031a MOV_MEM_TEMP 00000078, 00000064
00000002 mov temp_0002, 3
---------------------------------------------------------
18 00000f62 MOV_TEMP_CONST 00000064, 00000003
19 00000d2e MOV_TEMP_CONST 00000078, 00000008
1a 000008fd ADD 00000078, 00000008
1b 00000ff0 MOV_MEM_TEMP 00000078, 00000064
00000003 mul temp_0001, temp_0002
---------------------------------------------------------
1c 00001187 MOV_TEMP_CONST 00000078, 00000004
1d 000011cc ADD 00000078, 00000008
1e 00000d73 MOV_TEMP_MEM 00000064, 00000078
1f 0000125a MOV_TEMP_CONST 00000078, 00000008
20 00000c59 ADD 00000078, 00000008
21 00000e46 MOV_TEMP_MEM 00000068, 00000078
22 0000081f MUL 00000064, 00000068
23 0000004f MOV_TEMP_CONST 00000078, 00000004
24 00000a62 ADD 00000078, 00000008
25 00000a19 MOV_MEM_TEMP 00000078, 00000064
00000004 add temp_0000, temp_0001
---------------------------------------------------------
26 000012b3 MOV_TEMP_CONST 00000078, 00000000
27 00000989 ADD 00000078, 00000008
28 000003a8 MOV_TEMP_MEM 00000064, 00000078
29 00000507 MOV_TEMP_CONST 00000078, 00000004
2a 00000f1b ADD 00000078, 00000008
2b 00000094 MOV_TEMP_MEM 00000068, 00000078
2c 000012f8 ADD 00000064, 00000068
2d 00000ed6 MOV_TEMP_CONST 00000078, 00000000
2e 0000066c ADD 00000078, 00000008
2f 000009d0 MOV_MEM_TEMP 00000078, 00000064
(first the frontend instruction, followed by the required VM instructions)
As you can see, a lot of VM instructions are required for one frontend instruction (e.g. add temp, temp requires 10 VM instructions).
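Reading the listing, each frontend temp appears to live in memory at (value of VM register 0x08) + slot * 4, with VM registers 0x64 and 0x68 used as scratch values and 0x78 as the address register. That interpretation is inferred from the dump, not stated in the thread. Under that assumption, the lowering can be sketched as below (lowerMovConst, lowerBinOp and the constant names are hypothetical); lowerBinOp("ADD", 0, 1) reproduces the ten instructions listed for add temp_0000, temp_0001.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// VM register offsets as they appear in the listing (roles are assumed).
constexpr std::uint32_t kScratchA = 0x64;   // holds the value being computed
constexpr std::uint32_t kScratchB = 0x68;   // second operand value
constexpr std::uint32_t kAddr     = 0x78;   // address of a frontend temp
constexpr std::uint32_t kTempBase = 0x08;   // register holding the temp area base

static std::string ins(const char* name, std::uint32_t a, std::uint32_t b) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%s %08x, %08x", name,
                  static_cast<unsigned>(a), static_cast<unsigned>(b));
    return buf;
}

// kAddr = temp area base + slot * 4
static void addrOf(std::vector<std::string>& out, std::uint32_t slot) {
    out.push_back(ins("MOV_TEMP_CONST", kAddr, slot * 4));
    out.push_back(ins("ADD", kAddr, kTempBase));
}

// mov temp_<slot>, value  ->  4 VM instructions
std::vector<std::string> lowerMovConst(std::uint32_t slot, std::uint32_t value) {
    std::vector<std::string> out;
    out.push_back(ins("MOV_TEMP_CONST", kScratchA, value));
    addrOf(out, slot);
    out.push_back(ins("MOV_MEM_TEMP", kAddr, kScratchA));   // store value at kAddr
    return out;
}

// <op> temp_<dst>, temp_<src> (op is "ADD" or "MUL")  ->  10 VM instructions
std::vector<std::string> lowerBinOp(const char* op, std::uint32_t dst, std::uint32_t src) {
    std::vector<std::string> out;
    addrOf(out, dst);
    out.push_back(ins("MOV_TEMP_MEM", kScratchA, kAddr));   // load dst value
    addrOf(out, src);
    out.push_back(ins("MOV_TEMP_MEM", kScratchB, kAddr));   // load src value
    out.push_back(ins(op, kScratchA, kScratchB));           // compute in registers
    addrOf(out, dst);
    out.push_back(ins("MOV_MEM_TEMP", kAddr, kScratchA));   // write result back
    return out;
}
```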
b3n
April 21st, 2007, 20:13
Thanks for that explanation 0rp, that made it a lot clearer. I'm currently still digging through the code, commenting as much as I can, but I haven't found the method that executes the VM instructions yet. Where is the generated backend code executed? Or is the backend code generated and executed on the fly when the frontend instructions are read?
Edit:
Am I right in assuming that the following snippet of VM code translates to the instructions shown below?
10 00000126 MOV_TEMP_CONST 00000064, 00000001
11 00000ca0 MOV_TEMP_CONST 00000078, 00000000
12 0000020a ADD 00000078, 00000008
13 00000ce5 MOV_MEM_TEMP 00000078, 00000064
mov dword [ebx+0xededed00], 0xededed01
mov dword [ebx+0xededed00], 0xededed01
mov eax, [ebx+0xededed01]
add [ebx+0xededed00], eax
mov eax, [ebx+0xededed01]
mov ecx, [ebx+0xededed00]
mov [ecx], eax
I don't know what 0xededed00 and 0xededed01 are used for; could you please explain that to me?
[--MOV_TEMP_CONST--]
//initialize temp reg with 1 (ebx+0xededed00 points to the first temp reg?)
//is 00000064 in ebx?
mov dword [ebx+0xededed00], 0xededed01
[--END MOV_TEMP_CONST--]
[--MOV_TEMP_CONST--]
//same as above, initialize second temp reg with 0
mov dword [ebx+0xededed00], 0xededed01
[--END MOV_TEMP_CONST--]
[--ADD--]
//move value of temp reg 2 into eax
mov eax, [ebx+0xededed01]
//probably adds the value in eax to the first temp reg, but I'm not sure what
//the 00000008 in the VM code stands for
add [ebx+0xededed00], eax
[--END ADD--]
[--MOV_MEM_TEMP--]
//move value of second temp reg into eax
mov eax, [ebx+0xededed01]
//move the address held in the first temp reg into ecx
mov ecx, [ebx+0xededed00]
//store eax at that address
mov [ecx], eax
[--END MOV_MEM_TEMP--]
0rp
April 22nd, 2007, 08:08
The instructions themselves are executable. When the VM is entered, it goes straight to the first opcode; this opcode knows which one is next and jumps to it, and so on.
The edededXX values are markers. I compile the opcode source into a .bin and overwrite the edededXX markers with their real values (done in void Backend::writeParam).
Example:
ADD TEMP_0064, TEMP_0078
ADD opcode source:
mov eax, [ebx+0xededed01]
add [ebx+0xededed00], eax
which becomes:
mov eax, [ebx+0x78]
add [ebx+0x64], eax
So 0xededed01 (the source operand) is replaced with 0x78 during generation, and 0xededed00 (the destination) is replaced with 0x64.
And you are right about your example of those 4 instructions and their real asm.
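The marker replacement can be sketched at the byte level. The template bytes here are an assumed encoding of the two-instruction add opcode (mov eax, [ebx+disp32] is 8B 83 disp32 and add [ebx+disp32], eax is 01 83 disp32, with each marker stored little-endian as the displacement). patchMarker is a hypothetical stand-in for Backend::writeParam, whose code isn't shown; it scans for the marker value, which only works because values like 0xededed00 are unlikely to occur by accident elsewhere in a template.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed encoding of the ADD opcode template:
//   mov eax, [ebx+0xededed01]   ; source marker
//   add [ebx+0xededed00], eax   ; destination marker
const std::vector<std::uint8_t> kAddTemplate = {
    0x8B, 0x83, 0x01, 0xED, 0xED, 0xED,
    0x01, 0x83, 0x00, 0xED, 0xED, 0xED,
};

// Replace every little-endian occurrence of `marker` in `code` with `value`
// and return how many occurrences were patched.
int patchMarker(std::vector<std::uint8_t>& code, std::uint32_t marker, std::uint32_t value) {
    int patched = 0;
    for (std::size_t i = 0; i + 4 <= code.size(); ++i) {
        std::uint32_t v = code[i] | (code[i + 1] << 8) | (code[i + 2] << 16) |
                          (static_cast<std::uint32_t>(code[i + 3]) << 24);
        if (v != marker) continue;
        for (int b = 0; b < 4; ++b)
            code[i + b] = static_cast<std::uint8_t>(value >> (8 * b));
        i += 3;     // skip past the bytes just written
        ++patched;
    }
    return patched;
}
```

Patching happens once, at generation time, so the resulting bytes are exactly mov eax, [ebx+0x78] / add [ebx+0x64], eax with no marker lookups left at runtime.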
b3n
April 22nd, 2007, 08:22
Thanks 0rp!
So do I get this right:
1. you let the compiler generate the VM instructions from the input script
2. the VM runs over this script and executes the matching instructions
So:
ADD TEMP_0064, TEMP_0078
will be executed by the VM like this:
1. find out the instruction (in this case add)
2. look up the compiled opcode
3. patch the 0xededed00 and 0xededed01 markers
4. execute the opcode instructions
5. get the next instruction
Did I get this right?
0rp
April 22nd, 2007, 09:36
This replacement of edededXX is done during generation, not during execution.
So when generation is done, you have a big block of executable x86 code that makes up the single steps, so somewhere it will contain
mov eax, [ebx+0x78]
add [ebx+0x64], eax
which was required for something.
Here is what the final generated result looks like without encryption:
mov temp_64, 1:
0049E845 mov dword ptr [ebx+64h],1
0049E84F mov ecx,4FCh
0049E854 mov edx,19h
0049E859 add ecx,dword ptr [ebx+2Ch]
0049E85C jmp ecx
mov temp_78, 0:
0049ECDC mov dword ptr [ebx+78h],0
0049ECE6 mov ecx,0C8h
0049ECEB mov edx,1Bh
0049ECF0 add ecx,dword ptr [ebx+2Ch]
0049ECF3 jmp ecx
add temp_78, temp_8:
0049E8A8 mov eax,dword ptr [ebx+8]
0049E8AE add dword ptr [ebx+78h],eax
0049E8B4 mov ecx,516h
0049E8B9 mov edx,1Dh
0049E8BE add ecx,dword ptr [ebx+2Ch]
0049E8C1 jmp ecx
So the VM instructions end up as small, executable, customized x86 blocks (the edededXX markers are replaced) that are chained together.
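Each generated block ends by computing its successor's address (mov ecx, <offset>; add ecx, [ebx+2Ch]; jmp ecx), so there is no central dispatch loop; [ebx+2Ch] appears to hold the base address those offsets are relative to. Portable C++ has no computed jump, so in the hypothetical sketch below each x86 block is modeled as a record holding its operation and the offset of the next block, and a small driver loop stands in for jmp ecx. The blocks are stored out of order, as in the dump, to show that execution order comes only from the chain. Block, CodeArea and run are illustrative names, and the offsets are arbitrary.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical model of one generated x86 block: a customized operation on the
// register file plus the offset of the block it jumps to next (0 = leave the VM).
struct Block {
    enum Kind { MovConst, Add, Mul } kind;
    std::uint32_t dst, src;   // register offsets; for MovConst, src is the constant
    std::uint32_t next;       // like "mov ecx, 4FCh" before "add ecx, [ebx+2Ch]"
};

// Blocks keyed by their offset inside the generated code area.
using CodeArea = std::map<std::uint32_t, Block>;

void run(const CodeArea& code, std::uint32_t entry,
         std::map<std::uint32_t, std::uint32_t>& regs) {
    for (std::uint32_t pc = entry; pc != 0; ) {
        const Block& b = code.at(pc);
        switch (b.kind) {
            case Block::MovConst: regs[b.dst] = b.src; break;
            case Block::Add:      regs[b.dst] += regs[b.src]; break;
            case Block::Mul:      regs[b.dst] *= regs[b.src]; break;
        }
        pc = b.next;          // stands in for "jmp ecx"
    }
}
```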
b3n
April 22nd, 2007, 17:24
I see, so the compiled opcode snippets are just small code templates that get customized by the VM environment and put together to form the final program? The way I see it, the backend is a kind of compiler too, which produces the final binary as output. The final program is then run by executing the first instruction in the instruction chain?
0rp
April 23rd, 2007, 12:54
Yes, exactly.

b3n
April 23rd, 2007, 18:39
Why did you decide to create a final binary version of the input program instead of letting the VM execute the VM instructions at runtime as a kind of interpreter? If you have a binary version of the input program, what do you need the VM for? (Maybe I missed something along the way, but that's what I'm asking myself.)
0rp
April 24th, 2007, 13:38
There was an xm version that worked like you suggested.
It had a fixed number of generic opcodes (add, mov, mul, ...) that were parameterized, so the VM also contained a big parameter stream.
I didn't like this idea too much, because you can easily replace the fixed set of opcodes with your own hacked opcodes and do whatever you want.
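For contrast, the earlier design (a fixed set of generic handlers plus a separate parameter stream) is the classic interpreter layout. A minimal sketch with a hypothetical opcode numbering and encoding: because every program reuses the same few handlers, patching one handler changes the behavior of every program, which is the weakness described above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical opcode numbering for a fixed set of generic handlers.
enum Opcode : std::uint8_t { OP_MOVC = 0, OP_ADD = 1, OP_MUL = 2, OP_HALT = 3 };

// Classic interpreter: fetch an opcode, pull its operands from the separate
// parameter stream, dispatch to the (shared) handler.
std::uint32_t interpret(const std::vector<std::uint8_t>& opcodes,
                        const std::vector<std::uint32_t>& params) {
    std::array<std::uint32_t, 8> regs{};
    std::size_t p = 0;                                // parameter stream cursor
    auto param = [&]() { return params.at(p++); };
    for (std::uint8_t op : opcodes) {
        switch (op) {
            case OP_MOVC: { std::uint32_t r = param(); regs.at(r) = param(); break; }
            case OP_ADD:  { std::uint32_t d = param(); std::uint32_t s = param(); regs.at(d) += regs.at(s); break; }
            case OP_MUL:  { std::uint32_t d = param(); std::uint32_t s = param(); regs.at(d) *= regs.at(s); break; }
            case OP_HALT: return regs[0];
            default: throw std::runtime_error("unknown opcode");
        }
    }
    return regs[0];
}
```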
b3n
April 24th, 2007, 20:28
Maybe you can help me with this, 0rp: I'm just trying to develop my own little grammar to play around with, but the generated scanner and parser use wchar_t* everywhere instead of char*. I saw that your scanner and parser use just char*. Is there any way to tell Coco to use char* instead of wchar_t? It's driving me nuts, because every time I change something in the grammar and have to regenerate the parser and scanner, I have to edit all the files manually...
dELTA
April 25th, 2007, 02:23
Quote:
[Originally Posted by b3n]Why did you decide to create a final binary version of the input program instead of letting the VM execute the VM instructions at runtime as a kind of interpreter? If you have a binary version of the input program, what do you need the VM for?
Quote:
[Originally Posted by 0rp]because you can easily replace the fixed set of opcodes with your own hacked opcodes and do whatever you want
Well, sure, but when building a normal binary like this, you lose the entire idea of people not being able to analyze the code statically with any tool they like, not to mention writing a simple IDC script that marks up all these sequences as their corresponding VM instructions (or even dumps the entire original script to a text file). (And yes, a much more advanced IDC script could do this even if you do it in VM code, but that's much harder. And again, what exactly is the reason for, or advantage of, a VM in the first place with this method?)
And I really don't want to be rude or anything; I just wanted to check whether I missed something here, just like b3n.
b3n
April 25th, 2007, 02:56
I think you got more to the point than I did, dELTA.

0rp
April 25th, 2007, 12:09
It uses VM registers or a VM stack, so I would still call it a VM. Or what's the definition of a VM?
As I said, it was a VM in the sense you mean in an early version:
http://woodmann.com/forum/attachment.php?attachmentid=1531&d=1166647623
The opcodes were much bigger and generic, and there was an array of VM instructions that were in fact the parameters for those generic opcodes.
An opcode looked like this:
Code:
0040D0EE 8B6B 24 mov ebp, dword ptr ds:[ebx+24]
0040D0F1 036B 14 add ebp, dword ptr ds:[ebx+14]
0040D0F4 8D75 6C lea esi, dword ptr ss:[ebp+6C]
0040D0F7 8B06 mov eax, dword ptr ds:[esi]
0040D0F9 B9 08000000 mov ecx, 8
0040D0FE 8B148E mov edx, dword ptr ds:[esi+ecx*4]
0040D101 3353 28 xor edx, dword ptr ds:[ebx+28]
0040D104 0353 14 add edx, dword ptr ds:[ebx+14]
0040D107 3302 xor eax, dword ptr ds:[edx]
0040D109 ^ E2 F3 loopd short testcon.0040D0FE
0040D10B 8943 4C mov dword ptr ds:[ebx+4C], eax
0040D10E 8DB5 90000000 lea esi, dword ptr ss:[ebp+90]
0040D114 8B06 mov eax, dword ptr ds:[esi]
0040D116 B9 08000000 mov ecx, 8
0040D11B 8B148E mov edx, dword ptr ds:[esi+ecx*4]
0040D11E 3353 28 xor edx, dword ptr ds:[ebx+28]
0040D121 0353 14 add edx, dword ptr ds:[ebx+14]
0040D124 3302 xor eax, dword ptr ds:[edx]
0040D126 ^ E2 F3 loopd short testcon.0040D11B
0040D128 8943 50 mov dword ptr ds:[ebx+50], eax
0040D12B 8B43 4C mov eax, dword ptr ds:[ebx+4C]
0040D12E 8B4B 50 mov ecx, dword ptr ds:[ebx+50]
0040D131 890C03 mov dword ptr ds:[ebx+eax], ecx
0040D134 8D75 00 lea esi, dword ptr ss:[ebp]
0040D137 8B06 mov eax, dword ptr ds:[esi]
0040D139 B9 08000000 mov ecx, 8
0040D13E 8B148E mov edx, dword ptr ds:[esi+ecx*4]
0040D141 3353 28 xor edx, dword ptr ds:[ebx+28]
0040D144 0353 14 add edx, dword ptr ds:[ebx+14]
0040D147 3302 xor eax, dword ptr ds:[edx]
0040D149 ^ E2 F3 loopd short testcon.0040D13E
0040D14B 8943 24 mov dword ptr ds:[ebx+24], eax
0040D14E 8D75 48 lea esi, dword ptr ss:[ebp+48]
0040D151 8B06 mov eax, dword ptr ds:[esi]
0040D153 B9 08000000 mov ecx, 8
0040D158 8B148E mov edx, dword ptr ds:[esi+ecx*4]
0040D15B 3353 28 xor edx, dword ptr ds:[ebx+28]
0040D15E 0353 14 add edx, dword ptr ds:[ebx+14]
0040D161 3302 xor eax, dword ptr ds:[edx]
0040D163 ^ E2 F3 loopd short testcon.0040D158
0040D165 50 push eax
0040D166 8D75 24 lea esi, dword ptr ss:[ebp+24]
0040D169 8B06 mov eax, dword ptr ds:[esi]
0040D16B B9 08000000 mov ecx, 8
0040D170 8B148E mov edx, dword ptr ds:[esi+ecx*4]
0040D173 3353 28 xor edx, dword ptr ds:[ebx+28]
0040D176 0353 14 add edx, dword ptr ds:[ebx+14]
0040D179 3302 xor eax, dword ptr ds:[edx]
0040D17B ^ E2 F3 loopd short testcon.0040D170
0040D17D 8F43 28 pop dword ptr ds:[ebx+28]
0040D180 0343 14 add eax, dword ptr ds:[ebx+14]
0040D183 FFE0 jmp eax
But again, then you just need to make this basic opcode set patch-safe (CRCing and a backup, or removing the CRCing completely). That's why I switched to executable instructions, which are harder to retrieve from the VM, especially when they are encrypted (yes, I failed here too: http://woodmann.com/forum/attachment.php?attachmentid=1572&d=1170436383).
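Reading the old opcode listing, each operand appears to be stored as a record of 9 DWORDs (0x24 bytes, which would explain the ebp+00/24/48/6C/90 spacing). It decodes as record[0] XOR the DWORDs found at base + (record[i] XOR key) for i = 8 down to 1, where the key is [ebx+28] and the base is [ebx+14]. The five records appear to yield the next parameter pointer, the next handler, the next key, and the two operands. This is an interpretation of the disassembly, not code from the thread; the sketch takes offsets relative to the start of a hypothetical image buffer, and encodeOperand exists only to build test data.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using Record = std::array<std::uint32_t, 9>;   // one encoded value, 0x24 bytes

static std::uint32_t read32(const std::vector<std::uint8_t>& image, std::uint32_t off) {
    assert(static_cast<std::size_t>(off) + 4 <= image.size());
    std::uint32_t v;
    std::memcpy(&v, image.data() + off, 4);
    return v;
}

// Interpretation of the loop: value = rec[0] ^ [base + (rec[i] ^ key)] for i = 8..1.
// Offsets are relative to the start of `image` here instead of [ebx+14].
std::uint32_t decodeOperand(const Record& rec, std::uint32_t key,
                            const std::vector<std::uint8_t>& image) {
    std::uint32_t value = rec[0];
    for (int i = 8; i >= 1; --i)                 // ecx counts 8 down to 1
        value ^= read32(image, rec[i] ^ key);
    return value;
}

// Inverse, only for building test data: pick 8 image offsets and choose
// rec[0] so that the record decodes to `value` under `key`.
Record encodeOperand(std::uint32_t value, std::uint32_t key,
                     const std::array<std::uint32_t, 8>& offsets,
                     const std::vector<std::uint8_t>& image) {
    Record rec{};
    std::uint32_t mask = 0;
    for (int i = 0; i < 8; ++i) {
        rec[i + 1] = offsets[i] ^ key;
        mask ^= read32(image, offsets[i]);
    }
    rec[0] = value ^ mask;
    return rec;
}
```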
b3n: try switching your project to multibyte, or, if you use Coco, you can change the parser/lexer code templates; they are in parser.frame and scanner.frame.
Fh_prg
September 18th, 2009, 17:48
Hello, how can we use this source code to protect a sample app with its VM?
0rp
November 1st, 2009, 07:49
You can't protect x86 code with it. You have to write your secret code in the VM script language and compile it to the VM.
(Don't use it for serious business, because it's too weak.)
Fh_prg
November 1st, 2009, 09:04
Thank you so much.