I need to find compression type (Data files)


Aquatic
February 4th, 2004, 16:08
I have a game that I play, and I want to be able to extract and decompress the individual files within the large data files for modding purposes.

MrMouse from Xantax already helped me with the extraction part, so I can extract the individual files all right. The problem is that they are still compressed, and therefore unintelligible, so there is no way to make sense of them.

I heard that companies usually don't make their own compression for their data files, but instead they use some pre-made open src compressors. If I can just identify the method being used, then I could maybe decompress the files.

sgdt
February 4th, 2004, 18:39
Quote:
I have a game that I play, and I want to be able to extract and decompress the individual files within the large data files for modding purposes.


If you modify the files, you will need to recompress them. If they are using a proprietary compression, this may be very difficult.

Assuming it's proprietary, you can get the game's decompression routines quite easily. Load up Intel's VTune or AMD CodeAnalyst, run the game while sampling (stop sampling soon after decompression), and voila, the interesting parts will be pointed out.

Load up IDA and look for places calling those locations and you'll have the outer loop of their decompressor.

Then, load up OllyDbg, and after decompression set up a patch to dump files.

If you believe it's an "open src compressor" (say, GNU zip or compress), you will probably see a GNU-style copyright notice very close to the module they linked in. The same is usually true for 3rd-party compressors. It might be a good idea to look through the executable.
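Looking through the executable for such a notice can be scripted. The following is only a rough sketch in Python: it pulls printable ASCII runs out of a byte buffer and keeps the ones that mention a copyright. The sample buffer is made up for illustration (it imitates the kind of notice zlib embeds), and the function name is hypothetical; in practice you would read the game's EXE or DLLs from disk instead.

```python
import re

def find_copyright_strings(data: bytes, min_len: int = 8):
    """Return (offset, text) for printable ASCII runs that mention a copyright."""
    # Runs of at least min_len printable ASCII characters.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    hits = []
    for match in pattern.finditer(data):
        text = match.group().decode("ascii")
        if "copyright" in text.lower():
            hits.append((match.start(), text.strip()))
    return hits

# Made-up stand-in for the bytes of an executable (assumption, not real game data).
sample = (b"\x00\x01MZ\x90\x00"
          + b" inflate 1.2.1 Copyright 1995-2003 Mark Adler "
          + b"\x00\xff\x10")

for offset, text in find_copyright_strings(sample):
    print(hex(offset), text)
```

Any hit is a lead to search for, not proof: a linked-in library may be used for something other than the data files.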

Aquatic
February 4th, 2004, 21:14
I know that the game reads from these data files at the character customization screen. Then when you save your character it outputs a file with the same compression. So all the decompression and compression happens in the character creation part of the program. I saw this with Filemon.

For the part where it creates the character file, I set a BP on WriteFile and then checked the ASCII at the lpBuffer address. It showed exactly what was written to the file; I confirmed this by opening the file in WinHex. The problem was that the data in lpBuffer had already been compressed.

Too bad there isn't a function I can BP on that has a parameter like "Data to be compressed", or "Data to be uncompressed".

Anyway, that's as close as I got.

dELTA
February 4th, 2004, 23:28
Disassemble the program and check out the code near the location where the WriteFile function is called from. Trace the buffer backwards from there in the disassembly, and it is very likely that you find your compression function quite quickly. If you don't find the decompression function at the same time (they are quite likely to be placed very near each other in the asm code) do the same with the ReadFile function for the equivalent read operation, and you should be able to find it too (but you will of course have to trace the buffer forwards after the ReadFile operation instead).

sgdt
February 5th, 2004, 00:16
You can also use hardware read breakpoints on the data read to assist, if the code is long-winded. Most likely, you *will* find routines that take a couple of parameters like "data to be compressed" and "buffer to put compressed data to be written" (and vice versa for the read routine). Despite all the marketing hype, there's seldom much magic to a program's innards.

Aimless
February 5th, 2004, 00:19
Of course, you may want to rename the compressed files as .ZIP and try to decompress them with WinZip before trying anything else, as most of the compressed files (hello Quake, Quake II, Quake III, Unreal Tournament, et al.) are simply packed with the zip algorithm.

Have Phun
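The same idea can be tested without renaming anything. A minimal sketch in Python, assuming the data might be a ZIP archive, a gzip stream, a zlib stream, or bare deflate data: it checks the leading magic bytes and then simply tries to inflate the buffer with the standard `zlib` module. The function names are hypothetical, and a file extracted from a larger archive may carry its own header in front of the compressed data, so a failure here does not rule deflate out.

```python
import zlib

def guess_container(data: bytes) -> str:
    """Guess the format from the leading magic bytes."""
    if data[:4] == b"PK\x03\x04":
        return "zip"        # ZIP local file header
    if data[:2] == b"\x1f\x8b":
        return "gzip"
    # zlib header: compression method 8, and the two header bytes
    # taken as a big-endian number are a multiple of 31.
    if len(data) >= 2 and (data[0] & 0x0F) == 8 and ((data[0] << 8) | data[1]) % 31 == 0:
        return "zlib"
    return "unknown"

def try_inflate(data: bytes):
    """Try zlib, gzip and raw deflate framing; return (label, output) or None."""
    for label, wbits in (("zlib", 15), ("gzip", 31), ("raw deflate", -15)):
        try:
            return label, zlib.decompress(data, wbits)
        except zlib.error:
            continue
    return None

# Self-made test data, standing in for one extracted file.
payload = b"character data " * 10
packed = zlib.compress(payload)
print(guess_container(packed), try_inflate(packed)[0])
```

If `try_inflate` returns something readable, any util that handles deflate will do; if it returns `None` for every file, the advice earlier in the thread about locating the game's own routine still applies.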

Aquatic
February 5th, 2004, 00:46
Quote:
[Originally Posted by Aimless]Of course, you may want to rename the compressed files as .ZIP and try to decompress them with WinZip before trying anything else, as most of the compressed files (hello Quake, Quake II, Quake III, Unreal Tournament, et al.) are simply packed with the zip algorithm.

Have Phun


Do I need WinZip specifically? Or can I use WinRAR?

It would be bizarre if this actually worked, but I am skeptical.

Note: *edit* it appears some files have diff characters.

Aimless
February 5th, 2004, 03:25
Basically, any util that uncompresses .zip files.

Have Phun

Aimless
February 5th, 2004, 03:52
Just what is this game, btw?

Have Phun,

Aquatic
February 6th, 2004, 20:23
How do I trace the buffer forward/backward?

I'm using Olly.

Also, how will I know where the decompression code starts and finishes?

I will maybe post some of the code.

dELTA
February 7th, 2004, 09:17
Debuggers aren't nearly as good as disassemblers for analyzing code; use the right tool for the right task. Disassemble it in IDA, and then you can view where the value that is pushed as the buffer parameter for the read/write calls comes from, and where it goes afterwards. If that same buffer is pushed as a parameter to another function, there is a high probability that this function compresses/decompresses it.

Aquatic
July 3rd, 2004, 15:01
I want to follow up on this because I still haven't done it.

Are you saying to use a chart?

If I do find the decompression/compression code, then how can I compile it?

I was thinking of a small RadASM app that takes a compressed file and outputs it for you in a decompressed state, and can then recompress it. But the routines that I dump will be raw ASM, so I don't know if they will compile in MASM.

Aquatic
July 3rd, 2004, 16:56
When I BP on WriteFile in Olly it goes to some address starting with '77'. IDA doesn't have addresses that high.

So how will the code even be in IDA?

I can attach to the process with Olly, but it crashes in IDA's debugger.

Aquatic
July 4th, 2004, 17:05
Sorry, that means I was in kernel32.dll.

Anyway, I have had some success. I managed to go back through the code all the way to the point where clicking the 'save button' no longer triggers a breakpoint on the code.

So should I look at everything from the line where the first breakpoint is triggered up to the call to WriteFile? Somewhere in there should be the compression code.

I did see a lot of code that looked weird, and so it is likely that is the compression routine. The compression may span over several routines, I had to go back through about 7 or so till I got to the first line that is triggered by the saving.


I also noticed that over time the address of the buffer will randomly change, so I have to BP on WriteFile again to get the new addy.


I will post the code from the first line that breaks up to the WriteFile call. Maybe you guys can help me identify the code that I need to dump.

shadz
July 5th, 2004, 05:41
If it's a commercial compression algo, chances are there is going to be a signature stream at the start of the file, so looking at the first 50h or so bytes of the compressed file and googling for any suspect textual data could quickly lead you in the right direction.

No need to pull out the heavy duty toolz until you've done your background work...

Just my 2c worth

-Shadz
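For anyone without a hex editor at hand, inspecting those first 50h bytes takes only a few lines. This is a small Python sketch, not a finished tool: it prints offset, hex bytes and the printable ASCII for the start of a buffer. The `hexdump` name and the sample bytes are assumptions for illustration; with a real file you would pass in the result of reading it in binary mode.

```python
def hexdump(data: bytes, limit: int = 0x50, width: int = 16) -> str:
    """Format the first `limit` bytes as offset, hex and printable ASCII."""
    pad = width * 3 - 1  # width of a full hex column, for alignment
    lines = []
    for off in range(0, min(len(data), limit), width):
        chunk = data[off:min(off + width, limit)]
        hexpart = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as dots, as hex editors do.
        text = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        lines.append(f"{off:08x}  {hexpart:<{pad}}  {text}")
    return "\n".join(lines)

# Made-up sample bytes; replace with the contents of a compressed file.
print(hexdump(b"PK\x03\x04hello"))
```

Any readable tag in the right-hand column is the text worth searching for.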

Aquatic
July 5th, 2004, 18:22
^ no google results

Do you guys think that my logic is sound?

From the WriteFile call I go back in the code to the first line that reacts to the clicking of the 'save' button, because clicking the save button must initialize the process of getting the data ready to be written. Right?

Seems logical.


I will look for similar code with 'Openfile'. (the reverse)

I can't seem to get IDA's flow-chart to show a routine and then draw an arrow to an xref. It seems to only show a chart of the code within a single routine. Sure, it has the xref charts, but those don't show the code.

doug
July 5th, 2004, 21:07
BOOL WriteFile(
HANDLE hFile,
LPCVOID lpBuffer,
DWORD nNumberOfBytesToWrite,
LPDWORD lpNumberOfBytesWritten,
LPOVERLAPPED lpOverlapped
);

Your first thought should be: where does this lpBuffer get modified?
Check the cross-references to lpBuffer.

Leave the flow-chart bullshit alone, and start hunting down every reference to that buffer using a disassembler.

Furthermore, is this buffer within the .data section of the game? Probably not. Then it must have been allocated "somehow". Is lpBuffer entirely on the stack? Is it on the stack, but pointing to an allocated memory slot? Was it allocated using VirtualAlloc/malloc/GlobalAlloc?

---
I didn't follow this thread from the beginning, but if you're interested in unpacking data files, I would not look for file writing, but file reading. Just watch it unpack its own data; it's more than likely the same compression/decompression algorithm.

Aquatic
July 5th, 2004, 21:35
Well, it works like this.


Big Data file(resource)--------->EXE---------->small data file(your character)


The data in the big data file is compressed in the same way as the data in the small data file.

The EXE chooses the compressed data that it needs from the big data file and then outputs it to the small data file (depending on how you tweak your character).

I'm worried that the EXE doesn't even need to decompress the data, it just grabs from the big data file, then prepares an output for the small data file to be written. Maybe it just knows which offsets to grab from.

doug
July 6th, 2004, 07:42
At some point, if it wants to display it to you, it's going to have to decompress it.

About your character file, I doubt it's data copied from a bigger data file; that would be a big waste. It would make more sense if your character file was just references (offsets, as you mentioned) into the big data file. That doesn't change the fact that it could be compressed, but if what you want is to extract resources, I'd look elsewhere.

Look for what happens after a new handle has been opened to this big data file.

Aquatic
July 6th, 2004, 12:47
Quote:
[Originally Posted by doug]At some point, if it wants to display it to you, it's going to have to decompress it.


Well, it doesn't display it to you 'as such'. You are presented with a point&click/drag&drop GUI where you drag things onto your character or tick specific boxes to give them attributes, etc. Each time I edit a feature for my character like this, Filemon shows a READ from the big data file, and then I watch what is being read at the ReadFile buffer in Olly.


Do you think that the compression/decompression functions could all be in one of the game's DLLs? If so, then I could just use this DLL in my own app.

I will try your advice though.

shadz
July 7th, 2004, 14:45
If the datafile is large, there will most likely be a single call to CreateFileA() to open the file (handle being stashed away somewhere) followed by an initial ReadFile() which may validate headers and stuff before going into some kind of read/write mode as items get read out or written back.

The header is a good starting point, so you may want to bpx on ReadFile and verify that what gets read matches what's known to be in the datafile in question. You can then xref this address to your disassembly and start looking for associated code to parse the header.

Alternatively, to avoid the overhead of constant seeks when accessing random items within the datafile, the file (or parts of it) could have been mapped using MapViewOfFile(), effectively mapping the file to an address range that can be traversed like normal memory.

Again, you would want to catch where the initial file header is acquired and parsed, with the decompression routines most likely near by.

As for all this being in a dll - good probability. You should be able to verify this by traversing the call-stack after you have located a ReadFile operation with the data in question and seeing if the call originated from a dll or the main binary.

-Shadz
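The DLL-or-main-binary check boils down to comparing each return address on the call stack against the load ranges of the modules in the process (a debugger's module list shows a base and size for each). A minimal sketch of that comparison in Python follows. Every module name, base and size below is invented for illustration, as are the `Module` and `module_for` names; take the real values from your own debugging session.

```python
from typing import NamedTuple, Optional, Sequence

class Module(NamedTuple):
    name: str
    base: int   # load address
    size: int   # size of the mapped image in bytes

def module_for(address: int, modules: Sequence[Module]) -> Optional[str]:
    """Return the name of the module whose address range contains `address`."""
    for mod in modules:
        if mod.base <= address < mod.base + mod.size:
            return mod.name
    return None

# Hypothetical module list (assumption, not taken from the game in this thread).
modules = [
    Module("game.exe", 0x00400000, 0x00300000),
    Module("gamedata.dll", 0x10000000, 0x00080000),
    Module("kernel32.dll", 0x77E60000, 0x000E6000),
]

# Hypothetical return addresses as read off the stack, innermost first.
for ret in (0x77E7A123, 0x10001234, 0x00401000):
    print(hex(ret), module_for(ret, modules))
```

The first return address that falls outside the system DLLs tells you which module issued the read, and that is the file to load into the disassembler.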