View Full Version : How to find a string in a .NET assembly (manually)


are
November 7th, 2010, 12:42
Hey, I have a fundamental question that I've never looked into. I was wondering how to locate a string in a .NET 'binary' using only a hex editor (and presumably some kind of decryption algorithm to reference).

Consider (c#):
Code:

blaa...
MessageBox.Show("Look for this text");
blaa...


When you look at the compiled assembly in Reflector, Reflector is able to output the C# code and display the text along with it, but how does it do this?

Kurapica
November 7th, 2010, 16:26
You need a good background in MSIL. Your simple message looks like this in MSIL:

Code:
.method private instance void Button1_Click(object sender, class [mscorlib]System.EventArgs e) cil managed
{
.maxstack 8
L_0000: ldstr "Hello World !"
L_0005: ldc.i4.0
L_0006: ldnull
L_0007: call valuetype [Microsoft.VisualBasic]Microsoft.VisualBasic.MsgBoxResult [Microsoft.VisualBasic]Microsoft.VisualBasic.Interaction::MsgBox(object, valuetype [Microsoft.VisualBasic]Microsoft.VisualBasic.MsgBoxStyle, object)
L_000c: pop
L_000d: ret
}


Looking at offset L_0000, you can see the instruction that pushes the string onto the stack:

Code:
L_0000: ldstr "Hello World !"


With the metadata tokens left unresolved, the disassembly of the entire procedure looks like this:

Code:

L_00000000: ldstr 0x700000D5
L_00000005: ldc.i4.0
L_00000006: ldnull
L_00000007: call 0x0A000050
L_0000000C: pop
L_0000000D: ret


The ldstr instruction simply pushes a string object identified by a metadata token, which is 0x700000D5 in this case.

These strings are stored in the "#US" (user string) stream inside the assembly metadata.
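
To make the token a bit less opaque, here is a minimal Python sketch that splits a token into its top byte (which selects the table or stream) and its lower three bytes. The function name split_token is made up for illustration and is not part of any .NET tooling.

```python
def split_token(token):
    """Split a 32-bit metadata token into (top byte, lower 24 bits)."""
    kind = (token >> 24) & 0xFF      # 0x70 marks a user string
    value = token & 0x00FFFFFF       # offset or row number, depending on kind
    return kind, value


kind, value = split_token(0x700000D5)
print(hex(kind), hex(value))  # 0x70 0xd5
```

So the ldstr operand above is a 0x70 token whose lower bytes are 0xD5, while the call operand 0x0A000050 has a different top byte and refers to a metadata table row instead.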

are
November 7th, 2010, 18:07
Thanks =D
I have a loose grasp of MSIL thx to win32 asm (I did something awesome once with a very low quality .NET quiz application), but I was clueless about metadata and their streams until your informative post. Thanks for your help on this, I appreciate it.

are
November 16th, 2010, 15:19
Quote:
[Originally Posted by are;88138]I was wondering how to locate a string in a .NET 'binary' using only a hex editor (and presumably some kind of decryption algorithm to reference).


Oops, I feel dumb now lol. I was under the false impression that .NET encrypted all its strings automatically (idk why I thought that). It seems as though they were there in Unicode all along. But now, they're being referenced as "ldstr 0x700000D5". That 0x700000D5 doesn't correspond to the real position of the string within the file (I mean, when you look for the string with a hex editor, you find it at some other address, say 0x409a4345, wherever it actually is).

I guess I'm still wondering, after a little browsing, where to look in the file to find where the #US stream begins, without having to search through the file for the Unicode string that's caught my eye. I was able to pull up a lot of info using ildasm.exe, but I still can't figure out how that string section gets bound to the 0x70000000 address.

Kurapica
November 16th, 2010, 16:25
you could use google from time to time ?

Quote:
Metadata tokens identify both the stream and the location of the item in the stream. The top byte identifies the metadata table (one of the CorTokenType enumerated types documented in the corhdr.h). All of these tables except mdtString can be found in the #~ stream; the mdtString items are located in the #US stream. The lower three bytes of tokens for items in the #~ stream give the record ID (RID) of the item in the stream. In contrast, the lower three bytes of tokens for items in the #US stream are an offset from the beginning of the stream of the item. For example, the code:
Code:

Test* t = new Test;
String* str1 = S"Test1";
String* str2 = S"Test2";


will generate the following MSIL:

Code:
newobj instance void Test/* 02000003 */::.ctor() /* 06000008 */
stloc.2
ldstr "Test1" /* 70000001 */
stloc.1
ldstr "Test2" /* 7000000D */
stloc.0


The string Test1 is stored as the first item in the #US stream. (All streams are indexed from 1.) The string is stored as a Unicode string (0xa bytes long) prefixed with the length of the entire entry. Metadata uses a compressed format for the length of the string so that strings with a short length will use a single byte for the length, which is the case for the Test1 string: it has a length of 0x0b (0xa + 1). This layout means that the second string in the #US stream will be at location 0xd, which is the reason that the string Test2 has the token 0x7000000d (a top byte of 0x70 is a user string). Here is the actual data held in the #US stream:

Code:
71c8 00 00 00 00 00 0b 54 00 ......T.
71d0 65 00 73 00 74 00 31 00 e.s.t.1.
71d8 00 0b 54 00 65 00 73 00 ..T.e.s.
71e0 74 00 32 00 00 00 00 00 t.2.....
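
The lookup described in the quote can be sketched in a few lines of Python. This is only an illustration: the function names are invented, and the byte string assumes the #US stream begins at file offset 0x71cc in the dump above, which is what the two tokens imply.

```python
def read_compressed_length(data, pos):
    """Decode a metadata compressed unsigned integer: (value, bytes used)."""
    first = data[pos]
    if first & 0x80 == 0:                    # 0xxxxxxx: one byte
        return first, 1
    if first & 0xC0 == 0x80:                 # 10xxxxxx: two bytes
        return ((first & 0x3F) << 8) | data[pos + 1], 2
    return (((first & 0x1F) << 24) | (data[pos + 1] << 16)
            | (data[pos + 2] << 8) | data[pos + 3]), 4   # 110xxxxx: four bytes


def read_user_string(us_stream, token):
    """Return the string that a 0x70xxxxxx token points at in #US."""
    if token >> 24 != 0x70:
        raise ValueError("not a user string token")
    offset = token & 0x00FFFFFF              # byte offset into the stream
    length, used = read_compressed_length(us_stream, offset)
    start = offset + used
    # The length covers the UTF-16LE bytes plus one trailing byte.
    return us_stream[start:start + length - 1].decode("utf-16-le")


# Stream bytes copied from the dump, starting at the assumed offset 0x71cc.
US = bytes.fromhex(
    "00 0b 5400 6500 7300 7400 3100 00"
    "0b 5400 6500 7300 7400 3200 00 000000"
)

print(read_user_string(US, 0x70000001))  # Test1
print(read_user_string(US, 0x7000000D))  # Test2
```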


are
December 3rd, 2010, 21:26
Not quite satisfied, I did some further digging in my free time, in the middle of writing an essay on the topic. I have a picture that tells the story much faster, but can't upload it atm =(

Part One:
(Irrelevant) Information about
.NET Strings and the #US Stream


The flow chart for how the location of the token .70000001 is determined works like this.
First, the address .00402010 holds a pointer to the MetadataRoot
Next, the MetadataRoot contains a stream header which points towards the #US stream (the "User String" stream)
The #US stream is the area that contains the string data we tend to be interested in. The start of the file's #US stream corresponds to .70000000, so .70000001 is the string 1 byte into the stream

MetadataRoot
Let's begin by going over the intricacies of MetadataRoot. MetadataRoot is laid out as: (variable) irrelevant info + stream count + (variable) stream headers
StreamHeaders are laid out as: offset, size, (variable) tag...
The offset in the #US stream header is measured from the start of MetadataRoot and gives the start of the #US stream; the token .70000001 that interests us sits 1 byte past that point.

Because the irrelevant information has a variable size and the streamHeaders presumably have a variable size & layout as well, it is difficult to answer the question, Where is .70000000, but I will attempt to do so.


I've devised a flow chart for explaining where the string of a .NET assembly will be located. Assume numbers prefixed with '.' are addresses in hex. Addresses suffixed with (Xn) refer to a data block n bytes long beginning at the address. The suffix [I] refers to a specific item of the data block. Consider the string token .70000001 (or rather simply 01, the first string)
(for -> read "leads to")

To find string 70000001...
.00402010(X4) -> .MetadataRoot
.MetadataRoot + 16 + m+x + 2 -> .StreamCount
.StreamCount + 2 -> .StreamHeaderArray

.StreamHeaderArray(XN) -> AllStreamHeaders
AllStreamHeaders[I] -> .iStreamHeader

.MetadataRoot + .iStreamHeader[offset] + 1 -> .70000001 note that: .iStreamHeader[BlockName] is iStream's Named Block


To reiterate our objective, we want iStreamHeader[offset] (within the #US stream header, which tells us where the .70000001 address lives)
but first we logically need to find the .StreamHeaderArray, which starts 2 bytes after .StreamCount, found by .MetadataRoot + 16 + m+x + 2
So we need to get
.MetadataRoot, found by .00402010(x4)
and m+x, found by V + (V -%4) But this is explained better down below. (hint: V is the length of the .NET version string (+ 1 for the null), and -%4 refers to the padding needed to reach a 4 byte boundary).



Well, let's walk through the process in more elucidating detail.

MetaDataRoot...
.00402010: 4 byte offset pointing to MetadataRoot (stored little-endian, so read the bytes backwards)
.MetadataRoot = .00400000 + .00402010(x4) (.00400000 being the image base)
Let's say for example .00402010(x4) = e8 23 ...then...
.MetadataRoot = say .004023E8

Great, that's the first step of the equation
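
For anyone who'd rather not flip bytes by hand, here's a small Python sketch of that first step. The function name is hypothetical, the image base .00400000 is assumed as in the post, and the two bytes hidden behind the "..." are taken to be 00 00, since that is what the result .004023E8 implies.

```python
import struct

IMAGE_BASE = 0x00400000  # assumed, as in the post


def metadata_root_address(field_bytes, image_base=IMAGE_BASE):
    """Turn the 4 little-endian bytes stored at .00402010 into an address."""
    (offset,) = struct.unpack("<I", field_bytes)  # "<I" = little-endian uint32
    return image_base + offset


# e8 23 followed by the assumed 00 00
print(hex(metadata_root_address(bytes.fromhex("e8230000"))))  # 0x4023e8
```

Note that this yields an address as seen with the image loaded at its base; in a plain hex editor the same data sits at a file offset that depends on the section table.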

StreamHeaders...
MetadataRoot contains a count of "streams" which we're somewhat interested in. We need to find the stream header tagged with "#US" (in ASCII, not Unicode), which also includes a chunk pointing to where the strings section begins and a size chunk! Each stream header consists of 2 chunks (4 bytes each) plus n chunks that hold the null-terminated ASCII tag name... So the "#US" stream header actually consists of 12 bytes in total, whereas some other streams like "#Strings" would consist of 2chk + ⌈ 9byte / 4byte/chk ⌉ or 5 chunks... 20 bytes in all. (ref: ceiling function)


.StreamCount = .MetadataRoot + 16 + m+x + 2
...such that
.StreamCount is (the address of) a 2 byte count of the number of streams in the assembly,
m is a block of chunks consisting of the null-terminated .NET version string (plain ASCII/UTF-8, one byte per character), and
x is the padding of that block of chunks to ensure the bytes consumed by the version string are divisible by 4.



If the .NET version is "v4.0.30319" then m+x = 11+1 because... Let's try this notation, I find it handy atm and can't think of a better way to express this... If anyone can think of other, more concise ways, please share.

11 + (11 -% 4) is the same as 11 + ((-11) mod 4), which happens to be the same as the ceiling function's notation 4byte/chk * ⌈ 11byte / 4byte/chk ⌉; they all equal 12 bytes

So with that discrete consideration behind us, the version string plus the padding is explained as
m+x = V + (V -% 4)
V = 10 + 1 = 11 (don't forget the +1 for the null terminator of the string)
x = (11 -% 4) = 1 byte long
m+x = 11 + (11 -% 4)
m+x = 12 bytes

.StreamCount = .MetadataRoot + 16 + m+x + 2
.StreamCount = .MetadataRoot + 16 + 12 + 2
.StreamCount = .004023E8 + 30 (that's 0x1E)
.StreamCount = .00402406 ; remember there's 2 bytes (x2) of interesting data at that location
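
The same arithmetic as a quick Python check. The helper names pad4 and stream_count_address are made up for this sketch, and it simply follows the formula above rather than reading the length field stored in the file.

```python
def pad4(n):
    """Round n up to the next multiple of 4 (the "-%" padding above)."""
    return (n + 3) & ~3


def stream_count_address(metadata_root, version):
    """Apply .MetadataRoot + 16 + m+x + 2 for a given version string."""
    v = len(version.encode("ascii")) + 1     # + 1 for the null terminator
    return metadata_root + 16 + pad4(v) + 2


print(pad4(11))                                              # 12
print(hex(stream_count_address(0x004023E8, "v4.0.30319")))   # 0x402406
```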

Let's just say that our Stream Count is five, or
.StreamCount(x2) = 05 00 as stored in the file, or rather 0x0005 streams counted!

.MetadataRoot: contains a stream header holding the ASCII tag "#US". The #US stream is the stream that contains the .NET string that we're looking for. Its header is found somewhere after .StreamCount+2 but before the end of the stream headers. The end of the stream headers is found by digging through each stream header in turn...

Knowing that each stream header is simply: an Offset, size, and tag...

Code:
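.CurrentHeader = .StreamCount + 2
Remaining = StreamCount(x2)

Loop:
Stop (no #US stream found) if Remaining == 0
Tag = ASCII text read from .CurrentHeader + 8 till 0x00
TagLength = length of Tag + 1 ; count the null terminator
.HeaderEnd = .CurrentHeader + 8 + TagLength + (TagLength -% 4)
Return the offset at .CurrentHeader(x4) if Tag == "#US"
.CurrentHeader = .HeaderEnd
Remaining = Remaining - 1
Jmp Loop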


...digging through as I said.
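
Here's roughly the same loop as runnable Python, in case it helps. It's a sketch under a few assumptions: root is a bytes object that begins at MetadataRoot, find_stream is a name I made up, and instead of recomputing m+x it reads the 4 byte version length stored at MetadataRoot + 12 (padding it again is harmless). The offset it returns is relative to the start of MetadataRoot.

```python
import struct


def find_stream(root, wanted="#US"):
    """Return (offset, size) from the named stream's header, or None."""
    (version_len,) = struct.unpack_from("<I", root, 12)
    pos = 16 + ((version_len + 3) & ~3) + 2   # skip version string and 2 flag bytes
    (count,) = struct.unpack_from("<H", root, pos)
    pos += 2                                  # first stream header
    for _ in range(count):
        offset, size = struct.unpack_from("<II", root, pos)
        end = root.index(b"\x00", pos + 8)    # tag is null-terminated ASCII
        name = root[pos + 8:end].decode("ascii")
        if name == wanted:
            return offset, size
        tag_len = end - (pos + 8) + 1         # include the terminator
        pos += 8 + ((tag_len + 3) & ~3)       # tag is padded to 4 bytes
    return None
```

With the offset in hand, a token like .70000001 is at MetadataRoot + offset + 1.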

Here's the white paper I referenced, Partition II Metadata.doc ("http://download.microsoft.com/download/d/c/1/dc1b219f-3b11-4a05-9da3-2d0f98b20917/partition%20ii%20metadata.doc"). See 24.2.1 and 24.2.2 especially.