Question concerning .NET file format... [Archive] - RCE Messageboard's Regroupment

rendari

March 7th, 2008, 16:43

Quote:

As you can see, some numbers are missing, that's because some tables, as I said before, are not defined yet. It's important you understand how the tables are stored. A table is made of an array of rows; a row is a structure (let's call it this way for the moment to make things easier). After the rows of a given table end, the rows of the next table follow. The problem with a row (remember, think of it like a structure) is that some of its fields aren't always of the same size and they change from assembly to assembly, so you have to calculate them dynamically. For example, I talked about the HeapOffsetSizes field and how it tells us the size that indexes into the "#String", "#GUID" and "#Blob" streams will have; this means if I have in a structure of one of these tables an index into the "#String" stream, its size is determined by HeapOffsetSizes, and so it could be a word or a dword. Of course that's not the only kind of index that can change of size, there are others. A very simple one to calculate is a direct index into another table. For example, the first element of a NestedClass row is an index into the TypeDef table, the size of this index depends on how much rows the TypeDef table counts: if the rows are > 0xFFFF, a dword is necessary to store the number, otherwise a word will do the job. The remaining indexes are the most annoying, they can index into a table or another. The Microsoft documentation is not so clear about this (at all), so I'll try to explain it in an easy way. Let's consider the TypeDefOrRefIndex, this is a kind of index that can either reference a row in the TypeRef table, in the TypeDef table or in the TypeSpec table. The low bits of the value tell us which table is being indexed and the remaining bits represent the actual index; since the choice is between 3 tables, it only takes 2 bits to encode the table for this kind of index. 
So if we have a word and the 2 low bits are reserved to encode the table that is being indexed, the remaining 14 bits can index a row in one of the three tables, but what if one of those 3 tables has more rows than a value of 14 bits can encode? Well, then a dword is needed. So, to get the size of an index like this it's necessary to compare the rows of each table it can reference, get the table with the biggest number of rows and then see if this number fits into the remaining bits of a word, if not, a dword is required. I paste you from the SDK the list of this kind of indexes and the values to encode the tables for each index type (which is the "Tag" column):

http://ntcore.com/Files/dotnetformat.htm
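
The two simplest size rules in the quoted passage (heap indexes and direct table indexes) can be sketched roughly as follows. This is only an illustration: the function names are made up here, and the flag values and the Module row layout are taken from ECMA-335 Partition II rather than from the quoted text, so check them against the spec before relying on them.

```python
# Bit flags of the HeapOffsetSizes byte (values as given in ECMA-335 Partition II;
# the constant names are hypothetical).
HEAP_STRING_WIDE = 0x01  # indexes into "#Strings" are dwords
HEAP_GUID_WIDE = 0x02    # indexes into "#GUID" are dwords
HEAP_BLOB_WIDE = 0x04    # indexes into "#Blob" are dwords


def heap_index_size(heap_offset_sizes: int, flag: int) -> int:
    """Size in bytes of an index into a heap: 4 if its flag bit is set, else 2."""
    return 4 if heap_offset_sizes & flag else 2


def table_index_size(row_count: int) -> int:
    """Size in bytes of a direct index into one table.

    A word is enough unless the table has more than 0xFFFF rows.
    """
    return 4 if row_count > 0xFFFF else 2


def module_row_size(heap_offset_sizes: int) -> int:
    """Size of one Module row: Generation (word), Name (string index),
    Mvid, EncId and EncBaseId (three GUID indexes)."""
    string_idx = heap_index_size(heap_offset_sizes, HEAP_STRING_WIDE)
    guid_idx = heap_index_size(heap_offset_sizes, HEAP_GUID_WIDE)
    return 2 + string_idx + 3 * guid_idx
```

With small heaps the Module row comes out at 10 bytes (5 words), but it grows as soon as one of the heaps needs dword indexes, so row sizes are not fixed across assemblies.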

I've been reading the above text from the article Daniel Pistelli wrote, and have spent the last couple of hours trying to understand it. So, how exactly do you calculate the size of an index pointing to a table? If the number of rows in the table is > 0xFFFF, then all indexes into that table are DWORDs instead of WORDs, right? So, if for example there are 0x11001 methods in a .NET exe (wow, is that even possible? =/) then all indexes going into MethodRef will be DWORDs, right? But doesn't that then make all the other Metadata tables referencing MethodRef bigger, too? So in the end you have one big mess???

It's so confusing

Furthermore, do the sizes of the table rows change from assembly to assembly? Like, will the rows of the Module metadata table always be 5 WORDs and the rows of the TypeRef metadata table always 3 WORDs? Or can that change if the assembly is big enough? =/

Daniel Pistelli

March 7th, 2008, 21:45

Ciao rendari. I'll try to explain this briefly here. Be a bit patient, it's 4 am and I have a really bad headache.

String, blob and GUID indexes are one thing: they are easy to calculate from the HeapOffsetSizes field. Table indexes are a totally different thing. A table index can be of different types: TypeDefOrRef, MethodDefOrRef, etc. They are all listed in the article. In the text you pasted above I take the TypeDefOrRef index as an example. This kind of index can reference 3 different tables. The low 2 bits of the index tell you WHICH table it references (either TypeDef, TypeRef or TypeSpec) and the other bits represent the indexed row of that table. Let's make a real-world example. Let's say I have a TypeDefOrRef index. Let's see how this index works:

TypeDefOrRef: 2 bits to encode tag

TypeDef 0
TypeRef 1
TypeSpec 2

Ok, 2 bits to encode the tag. Let's say I want to reference row 33 of the TypeRef table. How do I create the index?

Index = (33 << 2) | 1;
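
A minimal sketch of the same encoding, plus the reverse direction, assuming the TypeDefOrRef tag values listed above. The function and constant names are made up for illustration.

```python
# Tables a TypeDefOrRef index can point to, in tag order (tag 0, 1, 2).
TYPE_DEF_OR_REF = ("TypeDef", "TypeRef", "TypeSpec")
TAG_BITS = 2  # two low bits hold the tag


def encode_coded_index(row: int, tag: int, tag_bits: int = TAG_BITS) -> int:
    """Pack a row number and a table tag into one coded index value."""
    return (row << tag_bits) | tag


def decode_coded_index(value: int, tables=TYPE_DEF_OR_REF, tag_bits: int = TAG_BITS):
    """Split a coded index into (table name, row number).

    A tag with no table assigned (3 for TypeDefOrRef) raises IndexError.
    """
    tag = value & ((1 << tag_bits) - 1)  # low bits select the table
    row = value >> tag_bits              # remaining bits are the row
    return tables[tag], row
```

Here `encode_coded_index(33, 1)` gives 133 (0x85), and decoding 0x85 gives back row 33 of TypeRef.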

So, if I use a WORD to index this row I will have 2 bits for the table and 14 bits for the row. If the row can't be encoded in 14 bits because the number is too high, then I'll need a DWORD. Of course, then all TypeDefOrRef indexes will be DWORDs.
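
The WORD-or-DWORD decision could be sketched like this. The function name is hypothetical, and the exact threshold (fewer than 2^(16 - tag bits) rows in the largest table) follows my reading of ECMA-335 Partition II, so treat it as an assumption to verify.

```python
def coded_index_size(row_counts, tag_bits: int) -> int:
    """Size in bytes of a coded index.

    row_counts: number of rows of every table the index can reference.
    tag_bits:   number of low bits reserved for the table tag.
    """
    max_rows = max(row_counts)
    # With a word, only (16 - tag_bits) bits remain for the row number.
    return 2 if max_rows < (1 << (16 - tag_bits)) else 4
```

For TypeDefOrRef (2 tag bits), `coded_index_size([10, 16383, 5], 2)` is 2, while a single table reaching 16384 rows, as in `coded_index_size([10, 16384, 5], 2)`, pushes every index of that kind to 4 bytes.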

I hope this clears up your doubts. I know it's not as easy as normal PE stuff, but you should see the Microsoft specs on this... Dreadful!

rendari

March 8th, 2008, 01:27

Yeah thanks man, I figured it out myself through rereading the text and general trial and error. Man, I really feel sorry for all the crap you must've put up with in CFF Explorer. I'm only doing 1% of that and I already have a headache here.

BTW, where are the Microsoft specs? I looked them up on MSDN, but no go there. Or am I not searching correctly? =/

Daniel Pistelli

March 8th, 2008, 08:18

Well, fortunately, the CFF Explorer was structured to read the .NET format from the beginning. So, I didn't have to adapt the code. Other PE editors which came before .NET have a hard time keeping up, because typical PE code is not suitable for reading .NET data. They have to write a part of the code which stands apart from the rest, and this is a bit ugly. The CFF Explorer was written mainly for the .NET format. It then evolved into a complete PE editor. However, the next step will be making a multi file format editor out of it which supports every format =).

As for the specs, you can find all the docs in your .NET SDK dir under \Tools\Docs, if I remember correctly. Anyway, what you need (the file format specs) is this doc:

http://jilc.sourceforge.net/ecma_p2_cil.shtml

Unfortunately this doc is not the official MS one, which I can't find at the moment. You should get the latest version of this doc (dated 2006), which has much more info than the preceding version. Also, one or two more tables are documented in the latest doc.

If you can't find the docs on your HD, look for "Partition II Metadata.doc".

rendari

March 8th, 2008, 12:33

Alright, thanks a lot Daniel

OHPen

March 10th, 2008, 04:17

@rendari: the official cli format documentation can be found on the ecma page.

http://www.ecma-international.org

Search for the Paper ECMA-335 or download it here

http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-335.pdf

You don't need more. But one hint if you really want to get into the .net format: not everything is described in the ecma document. microsoft has some special encoding / decoding which is not documented so far.
For example, if you try to read the reflector format you will run into trouble by applying the documented table encoding/decoding. you will have to figure that out on your own.

i recently wrote a .net file format reader for protection purposes and i can tell you that it is not little work. anyway it's an interesting topic, so keep on.

regards,

OHPen aka PAPiLLiON

Daniel Pistelli

March 10th, 2008, 05:29

Well, I wrote a .NET re-compiler, so I also know the (MANY) troubles of building an assembly from scratch (without using any framework API). There are many rules to follow in order to create a working assembly, but they're not written in the specs. This was something which had nothing to do with the CFF Explorer, but in the next version I'll merge the work and give the user the possibility to add tables, change the size of streams, add methods etc. To do this there's only one (clean) way: rebuild the assembly from scratch. As for the reflector, I don't know exactly what you mean; it doesn't seem to me that there are additional / strange tables in the metadata. Not even GenericParam, which was somehow missing in the 2005 specs but is present in the 2006 one.

Ciao,

Daniel

OHPen

March 10th, 2008, 09:23

@daniel: i can not go into detail but we had problems parsing all the tables of reflector using the decoding rules in the specs. maybe we did something wrong, but probably not. a colleague of mine discovered the issue and therefore i do not have more details atm.

but it would be interesting for me if you could provide a full dump of the current reflector's tables and structure in txt file format, if possible.

have you ever tried to dump all information of reflector ?

regards,

OHPen

Daniel Pistelli

March 10th, 2008, 14:53

Well, to dump a text file I'd need to write extra code right now. But you can view the tables in the CFF Explorer. I can see no problem in those tables. Also, if there was a problem in a table, it would show up in the tables following that table as well. And even the AssemblyRef table (one of the last tables) looks perfectly ok to me. So, if you have discovered a problem, compare your results with the ones produced by the CFF Explorer and we will dig further...

Ciao,

Daniel

OHPen

March 12th, 2008, 04:17

@daniel: i talked to my colleague and he told me that he already solved his problem after reading "your" .net document.

So it seems that nothing is missing in the ecma doc; it is simply described in a bad way.

regards,

OHPen

Daniel Pistelli

March 12th, 2008, 13:27

I agree that the ecma specs are not crystal clear, quite the opposite. That's why I wrote the .NET format article.

Ciao,

Daniel