Reversing VMs [Archive] - RCE Messageboard's Regroupment

Maximus

July 15th, 2006, 13:00

Hi,
Last night I wrote this tute, as I received requests (e.g. from foreigner) on how VMs are analysed. It is not complete, and I don't know whether I want to complete it.

It is based on a crackme at crackmes.de, but it does not explain much of the crackme itself, so reversing that crackme should not really be affected.

I apologise for the style, but it was very late and I was writing in one flow, so it's not refined.

@Zairon: if you think this would compromise cm, feel free to remove. But I don't think so.

Regards,
Maximus

dELTA

July 15th, 2006, 17:31

Thanks for your contribution.

JMI

July 15th, 2006, 17:38

Yup. This is a topic which could use more tutorials and/or information as it gains in popularity.

Regards,

ZaiRoN

July 16th, 2006, 06:11

Wow, really interesting article!!!

Here is the link of the target used in the tutorial:
http://www.crackmes.de/users/thehyper/hyperunpackme2/download

Maximus

July 16th, 2006, 10:15

Thanks! When I have time, I might reshape it in a more formal way and fix the few glitches in it, eventually adding samples from harder stuff (commercial protectors?).
However, my goal was to make something that could serve as an introduction to VM analysis, as nothing good is around (apart from a short paper by Yates on SF's VM, which explains really little about VM attacks, being rather a description of a morphed opcode).

Regards,
Maximus

Polaris

July 16th, 2006, 11:12

Great job man, really interesting stuff!

JMI

July 16th, 2006, 13:40

Zairon:

I got an error with:

http://www.crackmes.de/users/thehyper/hyperunpackme2/download

but this one gets the page, and the download link there works correctly:

http://www.crackmes.de/users/thehyper/hyperunpackme2/

Just a heads-up, if anyone is having similar problems and wants the unpackme.

Regards,

5aLIVE

July 17th, 2006, 12:26

That's an excellent article you have written Maximus. I have often heard mention of virtual machines being adopted in commercial protections and was curious about what they are and how they work.

You have certainly helped fill a void by writing this tutorial, as such tutorials are far from common. I look forward to reading more from you in relation to their use in commercial protectors.

Could you perhaps help me understand what is meant by a binded flow VM? In Yates' tute he describes it as opcode data which cannot be examined, as it appears to be dynamic.

It then goes on to say that the structure of the opcode data is so large that it is unfeasible to step through it and reverse it; the trouble is that every time you trace over an instruction, the data will be extracted in a different way.
He suggests that the only feasible way to analyse this is by using heuristics, and that's it. Can you elaborate on this, or should I ask Yates?

I'm probably getting in way over my head, but I thought I'd ask and take the time to acknowledge your efforts at the same time. I think I'll read through your tute again just to refresh my memory.

5aLIVE

countryman

July 17th, 2006, 19:19

this is good tut.

gabri3l

July 17th, 2006, 20:38

I agree, a well written article about something that is not too often covered. Good work maximus, hoping to see more.

Woodmann

July 17th, 2006, 21:57

Howdy,

maximus, a most excellent work.

Please continue your efforts in this area of research.
You have given us (the community) a great work.

Woodmann

OHPen

July 18th, 2006, 06:30

Aloha,

I agree with my reverse engineering colleagues on all points. A fine work. It's nice to see other people around working actively on VMs.

OHP.

Maximus

July 18th, 2006, 11:29

Well, really thank you! I didn't honestly expect so much good feedback on it!
It seems I have to make an _interesting_ sequel, someday...

Thanks for your attention!

Regards,
Maximus

Silver

July 19th, 2006, 08:27

5Alive, I have the same question actually. I assume you saw that in Yates' starforce doc? I've done some work on VMs and even coded my own, but I've never come across the phrase "binded flow". It might be that he's created the term to describe something that is commonly known as something else.

In principle at least I can see what he means: a large VM that supports a large set of instructions with variable operands would be very difficult to understand. I did a little exercise a while ago creating a very simple crackme in my scripting language. Tracing script execution through the VM was awkward, even without me adding any protection code to it and knowing what my own code did!

Excellent doc Maximus.

Maximus

July 19th, 2006, 16:56

SF does not have a 'decoder'; all the VM logic is intrinsic to the opcodes. Let's take a look together at Yates' paper on that SF version:
You can see that each VM instruction contains the current and the next opcode, where 'flow' is arranged by a simple jump to another VM instruction.
Opcodes are encrypted using dynamic keys, which change at each instruction (check the [edi+24h] usage: it is the key for decrypting the next opcode, contained within the first opcode).
Let's remove the complexities and lay out the instruction this way:

Code:

Instruction:
    byte Current_opcode;
    byte Next_opcode_to_be_masked_with_accumulator;
    byte Next_opcode_accumulator_mask;
    byte Current_operand_1;
    byte Current_operand_2;
-----
VM_Context
    byte Opcode_mask_accumulator;
-----
function get_next_opcode()
    Opcode_mask_accumulator ::= Opcode_mask_accumulator xor Next_opcode_accumulator_mask;
    next_opcode ::= Next_opcode_to_be_masked_with_accumulator xor Opcode_mask_accumulator;
endfunction

Now think: we get the next opcode to jump to by xoring "Next_opcode_to_be_masked_with_accumulator" with "Opcode_mask_accumulator".
Let's suppose we accumulate each "Next_opcode_accumulator_mask" into "Opcode_mask_accumulator", as the names I gave them suggest.
So, for each instruction:
What might happen if, at a certain point, we jump to that instruction again? What might the next instruction after it be?
This should be what Yates called 'EIP Stream' (also note that nothing prevents EIP flows from 'crossing', as with x86 opcodes, if not better).
The same goes for 'Data Stream', but related to the instruction's operands, so instructions can e.g. stay the same while their data changes.
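
To make the question above concrete, here is a minimal Python sketch of the simplified instruction layout. It only illustrates the pseudocode in this post and is not StarForce's actual code: the class names, the 8-bit width, and every byte value are assumptions picked for the example. It shows that the same instruction, visited again after the accumulator has changed, yields a different next opcode.

```python
from dataclasses import dataclass


@dataclass
class Instruction:
    current_opcode: int
    next_opcode_masked: int     # Next_opcode_to_be_masked_with_accumulator
    next_accumulator_mask: int  # Next_opcode_accumulator_mask
    operand_1: int = 0
    operand_2: int = 0


@dataclass
class VMContext:
    opcode_mask_accumulator: int = 0


def get_next_opcode(ctx: VMContext, ins: Instruction) -> int:
    # Fold this instruction's mask into the running accumulator...
    ctx.opcode_mask_accumulator = (
        ctx.opcode_mask_accumulator ^ ins.next_accumulator_mask
    ) & 0xFF
    # ...then unmask the stored "next opcode" field with the new accumulator.
    return (ins.next_opcode_masked ^ ctx.opcode_mask_accumulator) & 0xFF


# Hypothetical byte values, chosen only for illustration.
ctx = VMContext(opcode_mask_accumulator=0x00)
ins = Instruction(current_opcode=0x10,
                  next_opcode_masked=0x21,
                  next_accumulator_mask=0x5A)
other = Instruction(current_opcode=0x7B,
                    next_opcode_masked=0x00,
                    next_accumulator_mask=0x0F)

first_visit = get_next_opcode(ctx, ins)   # accumulator becomes 0x5A
get_next_opcode(ctx, other)               # accumulator becomes 0x55
second_visit = get_next_opcode(ctx, ins)  # accumulator becomes 0x0F

print(hex(first_visit), hex(second_visit))  # 0x7b 0x2e
```

The instruction bytes never change, yet the first visit leads to 0x7B and the second to 0x2E, because the accumulator depends on the path taken so far. A static dump of the bytes therefore does not tell you where the flow goes next.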

the hell, I want to see Ultima recompiler.
(edit: sorry JMI, ff went mad on the 'save' button, so I reopened in Opera ...Windows hates me)

Silver

July 20th, 2006, 05:17

Ahhh, that's smart, I see how that works. Thanks Maximus. Two follow-up questions:

Wouldn't that process be incredibly slow in a VM, as you're increasing the number of real instructions (VM code to CPU) needed to process one VM instruction?

What's the theory behind generating that VM bytecode with the language compiler? I can see how it works, I just can't see how the compiler gets from your script code to the final bytecode with this kind of flow, especially if the flow crosses back as you mentioned. The bit I'm having trouble with is, if every instruction contains a JMP to the next instruction, you're effectively making every single instruction re-entrant, so how can you maintain the accumulator mask correctly? If the code only has one logic flow then sure, it's just a matter of working backwards, but if there are multiple logic branches then you've got a precedence issue and everything...
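
For the single-flow case only, the encoding step can be sketched roughly as follows. This is a minimal Python sketch assuming the simplified XOR layout from the pseudocode earlier in the thread; the function names, the opcode values and the masks are hypothetical, and it says nothing about how a real protector (or the branching case asked about above) handles this. The encoder simulates the accumulator along the one known path and stores each next opcode already masked.

```python
def encode_linear_flow(opcodes, masks, initial_accumulator=0):
    """Encode a straight-line opcode path as (opcode, masked_next, mask) triples."""
    acc = initial_accumulator
    encoded = []
    for i in range(len(opcodes) - 1):
        acc ^= masks[i]                    # accumulator as the VM will see it
        masked_next = opcodes[i + 1] ^ acc # field stored inside instruction i
        encoded.append((opcodes[i], masked_next, masks[i]))
    return encoded


def decode_linear_flow(first_opcode, encoded, initial_accumulator=0):
    """Replay the encoded path the way the simplified VM would."""
    acc = initial_accumulator
    trace = [first_opcode]
    for _current, masked_next, mask in encoded:
        acc ^= mask
        trace.append(masked_next ^ acc)
    return trace


# Hypothetical path: opcode 0x10 appears twice, with different successors.
opcodes = [0x10, 0x22, 0x31, 0x10, 0x4F]
masks = [0x5A, 0x0F, 0xC3, 0x99]

encoded = encode_linear_flow(opcodes, masks)
recovered = decode_linear_flow(opcodes[0], encoded)
print(recovered == opcodes)
```

Because there is only one path, the accumulator value at every step is known at encode time. With several branches reaching the same instruction, the accumulator values would have to agree at the join point, which is exactly the problem raised in the question.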

Maximus

July 22nd, 2006, 09:40

eheh, I'm trying to understand&replicate the thing in code

5aLIVE

August 22nd, 2006, 10:38

An exhaustive analysis of the hyperunpackme2 crackme was posted a few days ago by Rolf Rolles of SABRE Security.

It looks to be an excellent complement to the tutorial written by Maximus.

Regards 5aLIVE.