View Full Version : How are C strings accessed???


rebible
May 5th, 2012, 21:48
I am trying to find the offset of some code. I can see the output string tables that the compiler left. How do C compilers normally access the strings?

thanks
robert

disavowed
May 5th, 2012, 23:14
C compilers/linkers parse strings in source code and put them into data sections as multi-byte or wide-char arrays, depending on the source code and compiler options.

rebible
May 6th, 2012, 21:26
I don't understand something. People seem to use strings to find the code offsets. If the compiler just uses the string locations in-line, I am not sure I see any way to use them to your advantage.

I was hoping that there would be an index table or ...


thanks
robert

owl
May 10th, 2012, 09:02
I think what you want is to look at the strings in order to find a msg or hardcoded pwd, URLs, or license key. In IDA, go to the strings tab. In OllyDbg, right-click in the main (code) window and pick "Search for" -> "All referenced text strings".
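
For a rough idea of what a strings view is built from, here is a minimal Python sketch that lists runs of printable ASCII in a blob of bytes. The function name `extract_strings`, the 4-character minimum, and the sample blob are all made up for illustration; real tools also handle wide-char strings and other encodings.

```python
import re

def extract_strings(data: bytes, min_len: int = 4):
    """Return (offset, text) for each run of printable ASCII of at least min_len bytes."""
    # 0x20..0x7e covers space through '~', i.e. printable ASCII.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]

# Hypothetical data: two NUL-terminated messages surrounded by non-text bytes.
SAMPLE = b"\x00\x01Program registered!\x00\xff\xfeBad key. Try again.\x00ab\x00"

if __name__ == "__main__":
    for offset, text in extract_strings(SAMPLE):
        print(hex(offset), text)
```

The offsets are positions within the blob; to turn them into virtual addresses you would add the address at which the section is loaded.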

TBone
May 10th, 2012, 14:50
To be a little more nuts-and-bolts, C references strings by their starting address. As Disavowed said, strings whose content is known at compile time are usually found in a/the data section. The program's code will be found in the code section (sometimes called the text section, confusingly enough). When the code needs to access a string, it will reference its address one way or another.

A picture is worth 1000 words. Here's a bit of disassembly from IDA free of "dia-app.dll" from the windows binary of the open source diagramming tool, Dia (http://projects.gnome.org/dia/). Suppose I want to find where in the program it displays the "About" dialogue. One of the things it displays there is "A program for drawing structured diagrams." If you take a look in the .rdata section, you'll find this string:

Code:
.rdata:10042334 aComments db 'comments',0 ; DATA XREF: sub_100049AA+CDo
.rdata:1004233D align 10h
.rdata:10042340 aAProgramForDra db 'A program for drawing structured diagrams.',0
.rdata:10042340 ; DATA XREF: sub_100049AA+BFo
.rdata:1004236B align 4

And the corresponding hex dump:
Code:
.rdata:10042330 44 69 61 00 63 6F 6D 6D 65 6E 74 73 00 00 00 00 Dia.comments....
.rdata:10042340 41 20 70 72 6F 67 72 61 6D 20 66 6F 72 20 64 72 A program for dr
.rdata:10042350 61 77 69 6E 67 20 73 74 72 75 63 74 75 72 65 64 awing structured
.rdata:10042360 20 64 69 61 67 72 61 6D 73 2E 00 00 63 6F 70 79 diagrams...copy
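
The relationship between the dump and the string's address can be sketched in a few lines of Python. This is only an illustration: the bytes are copied from the four dump rows above, and the helper names `find_c_string` and `read_c_string` are invented for the example.

```python
# The four rows of the .rdata hex dump, starting at virtual address 0x10042330.
RDATA_BASE = 0x10042330
RDATA = bytes.fromhex(
    "44 69 61 00 63 6F 6D 6D 65 6E 74 73 00 00 00 00 "
    "41 20 70 72 6F 67 72 61 6D 20 66 6F 72 20 64 72 "
    "61 77 69 6E 67 20 73 74 72 75 63 74 75 72 65 64 "
    "20 64 69 61 67 72 61 6D 73 2E 00 00 63 6F 70 79"
)

def find_c_string(section: bytes, base: int, text: str):
    """Return the virtual address of the NUL-terminated string, or None if absent."""
    # Naive search: it would also match the tail of a longer string ending in `text`.
    offset = section.find(text.encode("ascii") + b"\x00")
    return None if offset < 0 else base + offset

def read_c_string(section: bytes, base: int, address: int) -> str:
    """Read bytes from `address` up to (not including) the terminating NUL."""
    start = address - base
    end = section.index(b"\x00", start)
    return section[start:end].decode("ascii")

if __name__ == "__main__":
    addr = find_c_string(RDATA, RDATA_BASE, "A program for drawing structured diagrams.")
    print(hex(addr), read_c_string(RDATA, RDATA_BASE, addr))
```

A C string is nothing more than its starting address plus the convention that it ends at the first zero byte, which is why the address alone is enough for the code to use it.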


The string starts at virtual address 0x10042340. IDA is kind enough to inform us that there is a reference to this string somewhere in a subroutine located at 0x100049AA. Sure enough, if we follow that reference we find this in the .text (code) section:

Code:

.text:10004A3C push offset aTranslatorCred ; "translator-credits"
.text:10004A41 push offset off_1005F3D0
.text:10004A46 push offset aDocumenters ; "documenters"
.text:10004A4B push offset off_1005F330
.text:10004A50 push offset aAuthors ; "authors"
.text:10004A55 push offset aHttpLive_gnome ; "http://live.gnome.org/Dia"
.text:10004A5A push offset aWebsite ; "website"
.text:10004A5F push offset aC19982009TheFr ; "(C) 1998-2009 The Free Software Foundat"...
.text:10004A64 push offset aCopyright ; "copyright"
.text:10004A69 push offset aAProgramForDra ; "A program for drawing structured diagra"...
.text:10004A6E call libintl_gettext
.text:10004A73 add esp, 4
.text:10004A76 push eax
.text:10004A77 push offset aComments ; "comments"
.text:10004A7C push offset a0_97_2 ; "0.97.2"
.text:10004A81 push offset aVersion ; "version"
.text:10004A86 push offset aDia_0 ; "Dia"
.text:10004A8B push offset aName ; "name"
.text:10004A90 mov ecx, [ebp+var_C]
.text:10004A93 push ecx
.text:10004A94 push offset aLogo ; "logo"
.text:10004A99 push 0
.text:10004A9B call gtk_show_about_dialog

Yep, that looks like code for showing the About dialog, all right.

In this case, the address of the "A program for drawing..." string is pushed onto the stack as the argument to a call to libintl_gettext, which obviously does something with text. My guess from the name is that Dia is an "internationalized" program, and this function fetches translated strings for whatever language you happen to be using, if not English.

Anyway, if we "zoom in" on that push, you can see exactly how this reference works:

Code:
.text:10004A60 78 23 04 10 68 6C 23 04 10 68 40 23 04 10 E8 65 x#..hl#..h@#...e
.text:10004A70 AA 03 00 83 C4 04 50 68 34 23 04 10 68 CC 19 04 ......Ph4#..h...


Starting at address 0x10004A69 we have this instruction:
Code:
68 40 23 04 10

The first byte (0x68) is the x86 opcode for pushing a 4-byte immediate value onto the stack. The following 4 bytes are the value pushed. They contain the address 0x10042340, stored least-significant byte first (x86 is little-endian). Scroll back and you'll see this is the starting address of the "A program for drawing..." string.
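
To check the byte order yourself, here is a small Python sketch that decodes those five bytes. It assumes the instruction is exactly the `68 imm32` form of push; the function name `decode_push_imm32` is made up for this example and this is not a general disassembler.

```python
import struct

def decode_push_imm32(insn: bytes) -> int:
    """Decode a 5-byte x86 'push imm32' (opcode 0x68) and return the immediate."""
    if len(insn) != 5 or insn[0] != 0x68:
        raise ValueError("not a 5-byte push imm32 instruction")
    # "<I" = unsigned 32-bit integer, least-significant byte first.
    (value,) = struct.unpack("<I", insn[1:5])
    return value

if __name__ == "__main__":
    insn = bytes.fromhex("68 40 23 04 10")
    print(hex(decode_push_imm32(insn)))
    # Reading the same four bytes most-significant byte first gives a different value.
    print(hex(struct.unpack(">I", insn[1:5])[0]))
```

Reading the four bytes in the order they appear in the dump would give 0x40230410, which is not an address in this module; reversing them gives the string's address.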

Olly's disassembly looks a little different from IDA's, but the information is the same. Finding key-checking code by following the strings uses the same logic. If the program displays a message like "Program registered!" or "Bad key. Try again.", these strings will probably be referenced immediately after the code checks whether the entered key is valid. It isn't always quite that straightforward, but in a lot of cases, it will at least get you in the right neighborhood.
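
Going from a string to the code that uses it can be approximated with a plain byte search, sketched below in Python. This is a simplification under stated assumptions: it only looks for the `68 imm32` push form seen here, it scans raw bytes rather than decoded instructions (so it can produce false hits), and the name `find_push_refs` is invented for the example. The bytes are the two rows of the .text hex dump above.

```python
import struct

# The two rows of the .text hex dump, starting at virtual address 0x10004A60.
CODE_BASE = 0x10004A60
CODE = bytes.fromhex(
    "78 23 04 10 68 6C 23 04 10 68 40 23 04 10 E8 65 "
    "AA 03 00 83 C4 04 50 68 34 23 04 10 68 CC 19 04"
)

def find_push_refs(code: bytes, base: int, target: int):
    """Return addresses where 'push imm32' with immediate == target appears in code."""
    # Opcode 0x68 followed by the target address, least-significant byte first.
    pattern = b"\x68" + struct.pack("<I", target)
    refs = []
    pos = code.find(pattern)
    while pos != -1:
        refs.append(base + pos)
        pos = code.find(pattern, pos + 1)
    return refs

if __name__ == "__main__":
    for ref in find_push_refs(CODE, CODE_BASE, 0x10042340):
        # Express the hit relative to the subroutine start, like IDA's XREF comment.
        print(hex(ref), "= sub_100049AA+%X" % (ref - 0x100049AA))
```

The hit for 0x10042340 lands on the same instruction that IDA reports as sub_100049AA+BF, since 0x100049AA + 0xBF is the address of that push.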

proc-self-maps
May 13th, 2012, 10:50
Quote:
Originally Posted by TBone:
Starting at address 0x1004A69 we have this instruction:
Code:
68 40 23 04 10

The first byte (0x68) is an x86 opcode that means to push a 4-byte value on the stack. The following 4 bytes are the data pushed onto the stack. The contain the address:
0x10042340 (Intel is big-endian).


Minor nitpick: x86 is little-endian, and that is a little-endian representation of 0x10042340.

Civa
June 24th, 2012, 07:01
Quote:
The string starts at virtual address 0x10032340.


There is a typo; the address should be 0x10042340.