View Full Version : Editing a CHM file "live".


5aLIVE
October 23rd, 2005, 05:22
Hi fellas,
I've got a compressed HTML eBook which I'd like to edit a little. The contents page contains working links to the respective chapters and subsections. My niggle is that the chapters and subsections don't have the expected names.

For example: Objects and Classes becomes ch06. All subsections of this chapter share the same name, Objects and Classes.

Can anyone recommend a tool that will allow me to edit the HTML names and save in the CHM format without having to manually decompress the .CHM to a folder, rename the HTML files and then recompress everything?

I've Googled for such a tool but haven't found anything that appears to do this. I'd be interested to hear what you guys recommend.

I found a shareware tool called Pocket CHM that claims to do the job. I'll give it a go and post back my findings.

UPDATE. Looks like a nice little tool, shame it won't let me compile a file with more than 10 topics during evaluation. The search continues...

Thanks,
5aLIVE.

Admiral
October 23rd, 2005, 07:37
"During evaluation"?
Surely you know which board you're on. If you need to use the cripple-free version just this once, there's surely no guilt in putting your RCE skills to the test.

5aLIVE
October 23rd, 2005, 07:52
Hi Admiral, yes, I know which board I am on. I'm aware that some of you guys are developers, so I thought some of you might have previous experience in creating help files.

I may look at this if I cannot find an unrestricted alternative. I don't want to get distracted too much by this you see.

I only planned on using a tool once or twice to better organise my eBook. Besides, I have other projects on the boil which demand my time.

Thanks

LLXX
October 23rd, 2005, 16:55
This is a bit like editing PDFs - the file format wasn't designed to facilitate editing in the first place. I've briefly read over the documentation for the file format, and it seems that all the content is compressed. That software mentioned must decompress the file first, if only temporarily, just so that it can be edited.

I've downloaded and installed it. Simple user+regkey scheme. Inspect the executable, and notice a lot of suspiciously-registration-key-looking hex strings in it. Hmm... "flate" compressed binary. Let's decompress and see what's in it...

A complete PE executable of approximately 1.2M that unfortunately doesn't run. Quite possibly a little live work with SoftIce on the reg dialog would land us in the algorithm. It does its regchecking at startup. If I had loaded SoftIce I'd check, but I'm not desperately wanting a challenge right now. A moderate-skill-level RCE exercise.

5aLIVE
October 24th, 2005, 04:38
Hi LLXX, Thanks for taking a look at this.
I see that some refs state the "C" in CHM means compressed and others compiled. Though both terms apply, I think the correct expansion is Compiled HTML Help. But I digress.

Yes I agree, the .CHM must be "decompressed" before it can be edited, i.e., "on-the-fly".

I see that the PE has the string "inflate 1.1.4", which is part of the zlib library, so it is packed as you say. I searched for a tool, expecting to find a freely available compressor/decompressor like UPX, but couldn't find anything that looked usable.

I know a little about unpacking but not enough to do it manually. How did you do it?
BP on .code section to reach OEP and dump?
Ah! The SFX string in the header, which I assume means "self-extracting", and I remember that Olly has an SFX Options tab. I'll have a look at the manual and see what I can come up with.

Hmm. Purely by chance I noticed that when you run the main exe it creates a hidden, uncompressed file at runtime and deletes it on closing the program. Trying to run it without the "father" gives an error saying instruction xxxx referenced memory at xxx that could not be read. So the "child" reads mem space from the "father" to do a check and runs if it is present. I've never tackled this type of target before. When I get a chance I'll have a look at this erroneous instruction in Olly and try and figure out what it is doing.

Any suggestions on how I might approach this are more than welcome.


If I can get it unpacked and running, I can probably patch or sniff a serial.

Thanks LLXX.

Silver
October 24th, 2005, 06:07
Quote:
Yes I agree, the .CHM must be "decompressed" before it can be edited, i.e., "on-the-fly".


FWIW, the CHM format is a closed proprietary format that Microsoft won't disclose. The only way to manipulate it is through the Win32 API, so all these different CHM programs are just glorified HTML editors that use the MS HTMLHELP functions. A few years back I was trying to read CHM files on a Pocket PC, which doesn't have the necessary APIs. A few people have tried to reverse the format of the CHM, but at that point they mostly had the basics and a lot of "unknown fields"...

5aLIVE
October 24th, 2005, 06:57
Hi Silver, I recently learned that it is a closed format.

I can decompile the eBook easily enough and view all the images, HTML files and directories of the CHM.

Can you recommend an editor which would allow me to edit the chapter names and so on with the minimum of fuss? It appears it's not just a case of renaming folder and HTML file names as I suspected unless I am going about it the wrong way?

I have installed a copy of MS HTML Help Workshop which allows you to de/compile CHM files. I'm not sure if it will be of any other use beyond that, though.
Is this all I really need to get the job done?
UPDATE: It looks like all I need to do is edit the .hhc file (table of contents), create a project file (.HHP), recompile, and that's it.
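
For anyone following along, the .hhc edit can be scripted rather than done by hand. What follows is a minimal Python sketch, assuming the decompiled .hhc uses the usual sitemap layout (an OBJECT block holding "Name" and "Local" param tags). The function name retitle_hhc and the titles mapping are made up for illustration and are not part of HTML Help Workshop.

```python
import re
from html import escape

# One sitemap entry in the .hhc, e.g. <OBJECT type="text/sitemap"> ... </OBJECT>
OBJECT_RE = re.compile(r"<OBJECT\b.*?</OBJECT>", re.IGNORECASE | re.DOTALL)
# The displayed title of the entry.
NAME_RE = re.compile(r'(<param\s+name="Name"\s+value=")([^"]*)(")', re.IGNORECASE)
# The HTML file the entry points at.
LOCAL_RE = re.compile(r'<param\s+name="Local"\s+value="([^"]*)"', re.IGNORECASE)


def retitle_hhc(hhc_text, titles):
    """Return hhc_text with the Name of each entry replaced according to
    titles, a dict mapping the entry's Local path to its new title."""

    def fix_entry(match):
        block = match.group(0)
        local = LOCAL_RE.search(block)
        if local is None or local.group(1) not in titles:
            return block  # leave entries we have no new title for
        new_name = escape(titles[local.group(1)], quote=True)
        # A function is used as the replacement so backslashes in the
        # title are never interpreted as regex group references.
        return NAME_RE.sub(lambda m: m.group(1) + new_name + m.group(3),
                           block, count=1)

    return OBJECT_RE.sub(fix_entry, hhc_text)
```

Calling retitle_hhc(text, {"ch06.html": "Objects and Classes"}) would rename only the entry that points at ch06.html. Keying on the Local path rather than the old name matters here, because several entries share the same name.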

Job done. Well, nearly... clicking a thumbnail image should load a larger image but doesn't, since the JPEG files haven't been added to the project file. The project wizard only prompts for index, TOC and HTML files, NOT .jpg.
A little help please.
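
One possible fix, sketched in Python, is to list the images explicitly in the [FILES] section of the .hhp project before recompiling. This assumes the compiler packs whatever is listed there, which is worth verifying on a small test project first. The helper name add_files_to_hhp and the file names in the usage note are invented for the example.

```python
def add_files_to_hhp(hhp_text, new_files):
    """Return the project text with new_files listed under [FILES].

    Entries already present anywhere in the project are skipped, and a
    [FILES] section is appended if the project does not have one yet.
    """
    lines = hhp_text.splitlines()
    existing = {line.strip().lower() for line in lines}
    to_add = [name for name in new_files if name.lower() not in existing]

    for index, line in enumerate(lines):
        if line.strip().upper() == "[FILES]":
            # Insert directly under the section header.
            lines[index + 1:index + 1] = to_add
            break
    else:
        # No [FILES] section found: create one at the end.
        lines += ["", "[FILES]", *to_add]

    return "\n".join(lines) + "\n"
```

For example, add_files_to_hhp(project_text, ["images\\fig1.jpg"]) returns the project text with that path added under [FILES].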


Thanks,
5aLIVE.

LLXX
October 25th, 2005, 03:35
Quote:
[Originally Posted by 5aLIVE]I see that the PE has the string "inflate 1.1.4", which is part of the zlib library, so it is packed as you say. I searched for a tool, expecting to find a freely available compressor/decompressor like UPX, but couldn't find anything that looked usable.

I know a little about unpacking but not enough to do it manually. How did you do it?
BP on .code section to reach OEP and dump?
Ah! The SFX string in the header, which I assume means "self-extracting", and I remember that Olly has an SFX Options tab. I'll have a look at the manual and see what I can come up with.

Hmm. Purely by chance I noticed that when you run the main exe it creates a hidden, uncompressed file at runtime and deletes it on closing the program. Trying to run it without the "father" gives an error saying instruction xxxx referenced memory at xxx that could not be read. So the "child" reads mem space from the "father" to do a check and runs if it is present. I've never tackled this type of target before. When I get a chance I'll have a look at this erroneous instruction in Olly and try and figure out what it is doing.

Any suggestions on how I might approach this are more than welcome.


If I can get it unpacked and running, I can probably patch or sniff a serial.

Thanks LLXX.

Well, I've seen a lot of these "home-made" packers before, and they all seem to use some sort of standard compression algorithm, so I wrote a little skeletal program that can use various compression library DLLs to essentially compress or decompress raw data from files. It's also very useful for decompressing software installation packages too. If you'd rather not code your own, searching the Internet reveals "demonstration" programs for particular compression libraries, which often allow you to compress and decompress raw data.

As for the unpacking... let me tell you that I never touched it with a debugger at all. From reading the paragraph above I think you can already guess what I did. In a hex editor I located the start of the compressed data portion (a compressed executable in zlib will start with an end letter of the alphabet like x, y, or z due to the fixed "MZ" header, this particular one begins with 'x'), extracted it into another file and decomp'd it.
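
The hex-editor approach described above can be approximated in a few lines of Python, using the standard zlib module in place of a custom tool. This is a sketch under assumptions, not what was actually run: find_zlib_streams and min_size are names made up here. One detail worth knowing: with zlib's default window size the first byte of a stream is the header byte 0x78 (ASCII 'x') whatever the input is, so a leading 'x' marks a zlib stream in general rather than a compressed "MZ" specifically.

```python
import zlib


def find_zlib_streams(data, min_size=64):
    """Return a list of (offset, decompressed_bytes), one per complete
    zlib stream found inside data (for example, a packed executable)."""
    found = []
    pos = data.find(b"\x78")
    while pos != -1 and pos + 1 < len(data):
        # A zlib header is two bytes whose big-endian value is a multiple of 31.
        if ((data[pos] << 8) | data[pos + 1]) % 31 == 0:
            decomp = zlib.decompressobj()
            try:
                out = decomp.decompress(data[pos:])
            except zlib.error:
                out = b""
            # eof is True only when the stream ended cleanly; bytes that
            # follow the stream are ignored (they land in unused_data).
            if decomp.eof and len(out) >= min_size:
                found.append((pos, out))
        pos = data.find(b"\x78", pos + 1)
    return found
```

Each hit can then be written to its own file and checked for a leading "MZ". The min_size cut-off is there only to discard tiny accidental matches.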

I also noticed the "drop to disk" image, after I'd decompressed the thing already (well, it was kind of fun...). I'd guess both are the same. (I normally examine new software, unpacking it first if necessary, by browsing through it with Notepad, which works well and lets you see code compactly, to ensure there is nothing particularly malicious or strange in there.) Running the dropped image on my system yields an IPF in MFC42.DLL. I presume the dropper initialises the DLL first or does some similar action to prevent this from happening.

The decompressor stub is about 175K but most of that is static data (a lot of hex strings and possible reg keys?) and zlib unpacking routines, so it shouldn't be that much of a bother to reverse and see exactly how it drops the image to disk and then loads it. (There is a CreateProcessA and some file-related imps in the IT, so it's quite obvious what it's doing.)

The registration key is encoded in base-64. Figure out how I hypothesized that without actually using a debugger or a disassembler, only using Notepad

5aLIVE
October 25th, 2005, 08:30
Quote:
[Originally Posted by LLXX]Well, I've seen a lot of these "home-made" packers before, and they all seem to use some sort of standard compression algorithm, so I wrote a little skeletal program that can use various compression library DLLs to essentially compress or decompress raw data from files. It's also very useful for decompressing software installation packages too.

Sounds like a neat little tool. Would you be willing to share the source so that I can learn from it please?
Quote:
[Originally Posted by LLXX] If you'd rather not code your own, searching the Internet reveals "demonstration" programs for particular compression libraries, which often allow you to compress and decompress raw data.

I wouldn't mind coding my own tool some day, but I don't yet have the experience to do it. I am working my way through some C++ books at the moment, so with time and practice I should be able to come up with something. I imagine it's a bit of an advanced project for me at the moment.
Quote:
[Originally Posted by LLXX]As for the unpacking... let me tell you that I never touched it with a debugger at all. From reading the paragraph above I think you can already guess what I did.

Yes, I think so.

Quote:
[Originally Posted by LLXX] In a hex editor I located the start of the compressed data portion (a compressed executable in zlib will start with an end letter of the alphabet like x, y, or z due to the fixed "MZ" header, this particular one begins with 'x'), extracted it into another file and decomp'd it.

Now that is an interesting fact, I had no idea a compressed exe begins with x, y or z. You must have learned this from direct experience with zlib, and perhaps you are a data compression expert? You decomp'd the new file using your customised tool in this case, yes?

Quote:
[Originally Posted by LLXX]I also noticed the "drop to disk" image, after I'd decompressed the thing already (well, it was kind of fun...).

Absolutely, I was surprised by this myself (never seen such a "trick"). You've just proven that there is more than one way to approach this.

Quote:
[Originally Posted by LLXX] I'd guess both are the same. (I normally examine new software, unpacking it first if necessary, by browsing through it with Notepad, which works well and lets you see code compactly, to ensure there is nothing particularly malicious or strange in there.) Running the dropped image on my system yields an IPF in MFC42.DLL. I presume the dropper initialises the DLL first or does some similar action to prevent this from happening.

I wouldn't mind making the comparison to verify, if you like. Using Notepad as a reversing tool, now that is a new one on me. Quite a revelation; I normally use hiew or IDA. I'm afraid I don't understand what you mean by an IPF, what is it?
Some sort of page fault would be my best guess.

Quote:
[Originally Posted by LLXX]The decompressor stub is about 175K but most of that is static data (a lot of hex strings and possible reg keys?) and zlib unpacking routines, so it shouldn't be that much of a bother to reverse and see exactly how it drops the image to disk and then loads it. (There is a CreateProcessA and some file-related imps in the IT, so it's quite obvious what it's doing.)

I understand in essence what you are saying but I really need to read up on some of the Win32 APIs to get anywhere close to seeing what makes it tick.

Quote:
[Originally Posted by LLXX]The registration key is encoded in base-64. Figure out how I hypothesized that without actually using a debugger or a disassembler, only using Notepad

I can only guess that you made this discovery having seen string data with a characteristic familiar from base64 strings: it looks random but there appears to be some pattern to it, and the case of the ASCII characters changes every so often, with upper case characters occurring in groups of 3. Well, that was the case for the test base64 string I looked at.
Am I right?

LLXX, I really must thank you for taking the time to write such a lengthy and informative reply.

UPDATE:
I found a user name and serial number for an old release of this which looks like base64. I thought that if I decoded it, it would reveal the stolen/sniffed user name and password. The decoder gives unexpected ASCII chars like LSÿ›3Ú£4dKAaÌ‹¼1, so I can only assume they are not valid base64 strings or I am assuming the registration scheme is simpler than it actually is. I also tried encoding the user name and password, which gives VEZQL216UGFvelFTQldSTFFXSE1pN3d4, which is easier to read but still unintelligible. I won't really know how it works unless I actually try and debug the reg scheme, of course. This is proving to be an interesting distraction.

UPDATE: The user name and serial number don't even work with the old release, so there is probably not much point playing with the base64 de/encoder at this point.

Best regards,
5aLIVE

LLXX
October 25th, 2005, 19:57
Quote:
[Originally Posted by 5aLIVE]Sounds like a neat little tool. Would you be willing to share the source so that I can learn from it please?

I lost the source long ago (it was written 5+ years ago), and I really don't want to disassemble the tool to recover it. But all it is is a dialog box with a few textboxes (input filename, output filename, codec DLL, codec DLL function name) and two buttons, Execute and Exit. It opens the input file, reads a block into memory, runs the specified codec, writes the output to the output file, reads in the next block, etc. Not too complicated at all. All I had to do was "standardise" the compression library interfaces by writing wrapper DLLs for them.
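
A rough Python equivalent of that read-block, run-codec, write-block loop is sketched below. It is not the original tool: the standard-library decompressors stand in for the wrapper DLLs, and the names CODECS and run_codec are invented for this example.

```python
import bz2
import lzma
import zlib

# Stand-ins for the "codec DLLs": each entry builds a streaming
# decompressor object that exposes a decompress(block) method.
CODECS = {
    "zlib": zlib.decompressobj,
    "bz2": bz2.BZ2Decompressor,
    "lzma": lzma.LZMADecompressor,
}


def run_codec(src, dst, codec_name, block_size=64 * 1024):
    """Stream src through the named codec into dst, one block at a time.

    src and dst are binary file objects. Returns the number of bytes written.
    """
    codec = CODECS[codec_name]()
    written = 0
    while True:
        block = src.read(block_size)
        if not block:
            break
        out = codec.decompress(block)
        dst.write(out)
        written += len(out)
    # zlib decompressors may hold back some output until flush();
    # the bz2 and lzma objects have no flush method.
    flush = getattr(codec, "flush", None)
    if flush is not None:
        out = flush()
        dst.write(out)
        written += len(out)
    return written
```

With real files this would be called as run_codec(open(in_name, "rb"), open(out_name, "wb"), "zlib"). Keeping every codec behind the same decompress(block) call plays the role that the "standardised" wrapper interface plays in the description above.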

Quote:
[Originally Posted by 5aLIVE]
Now that is an interesting fact, I had no idea a compressed exe begins with x, y or z. You must have learned this from direct experience with zlib, and perhaps you are a data compression expert? You decomp'd the new file using your customised tool in this case, yes?

This is just from experimenting with zlib. Since all EXEs have "MZ" as the first two letters, there must be a constant in the compressed version. Same goes for other compressed files with constant headers (BMPs, GIFs, ...)

Quote:
[Originally Posted by 5aLIVE]I'm afraid I don't understand what you mean by an IPF, what is it?
Some sort of page fault would be my best guess.

Invalid Page Fault.

Quote:
[Originally Posted by 5aLIVE]I can only guess that you made this discovery having seen string data with a characteristic familiar from base64 strings: it looks random but there appears to be some pattern to it, and the case of the ASCII characters changes every so often, with upper case characters occurring in groups of 3. Well, that was the case for the test base64 string I looked at.
Am I right?

Good observation but no, that's not it. This is a little more tricky but you'll understand this quite easily. The string

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/

is found. There are 52 letters, 10 numbers, and 2 extra symbols bringing the total to... 64. It's a very good guess that this is a base-64 converting array. Every serial I could find is 24 characters in length, composed of only characters from the above array. Each base-64 character corresponds to 6 bits (2^6), so 144 bits are present. This is 18 bytes. Now, in the dropper/unpacker, there are interesting sections - first, a large array that seems to be composed of 12-byte elements. Next, hex strings that look like 14-byte (112-bit) hashes, all appearing to begin with "7" (some appear concatenated together but there is a fair bit of padding between them). I think I have a hypothesis of the protection system already, without ever having touched a disassembler or debugger to the program. The username is hashed in some way, and the hashed username is supposed to agree with the resulting 18-byte value converted from the regkey. The 14-byte hashes somehow play a role in this... how exactly I'm not sure.
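
The arithmetic in that paragraph is easy to check in Python. This is only an illustration of the size reasoning (64 symbols, 6 bits each, 24 characters giving 18 bytes). The sample string is generated from dummy bytes and is not a serial from the program, and the helper names are invented.

```python
import base64
import string

# The 64-character table quoted above: 26 + 26 letters, 10 digits, '+' and '/'.
B64_ALPHABET = (string.ascii_uppercase + string.ascii_lowercase
                + string.digits + "+/")


def looks_like_base64(text):
    """True if text uses only the 64 table characters and its length is a
    multiple of 4 (i.e. unpadded, whole base-64 groups)."""
    return len(text) % 4 == 0 and all(ch in B64_ALPHABET for ch in text)


def payload_bits(text):
    """Each base-64 character carries 6 bits."""
    return len(text) * 6


# Dummy 18-byte value encoded to a 24-character string, the same shape
# as the serials described above.
sample = base64.b64encode(bytes(range(18))).decode("ascii")

print(len(B64_ALPHABET))              # size of the table
print(len(sample), payload_bits(sample))
print(len(base64.b64decode(sample)))  # decoded size in bytes
```

Decoding such a string yields arbitrary binary bytes rather than readable text, which fits the idea that the key carries a hash-like value instead of an encoded name.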

I've essentially done a partial "mental reverse", using assumptions and observations.

upb
October 26th, 2005, 20:23
Quote:
[Originally Posted by 5aLIVE]So the "child" reads mem space from the "father" to do a check and runs if it is present


That is impossible under win32

LLXX
October 27th, 2005, 03:01
Quote:
[Originally Posted by upb]That is impossible under win32

Apparently you never heard of the OpenProcess, ReadProcessMemory, and WriteProcessMemory APIs...
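
For reference, here is a minimal Python/ctypes sketch of calling two of those APIs. It is Windows-only and assumes the caller has sufficient rights to open the target process. The wrapper name read_process_memory is invented, while OpenProcess, ReadProcessMemory and CloseHandle are the real kernel32 exports. It shows only that cross-process reads are possible, not what the program in this thread actually does.

```python
import ctypes
import sys

PROCESS_VM_READ = 0x0010  # access right required by ReadProcessMemory


def read_process_memory(pid, address, size):
    """Read size bytes at address from the process with the given pid."""
    if sys.platform != "win32":
        raise OSError("OpenProcess/ReadProcessMemory exist only on Windows")
    from ctypes import wintypes

    k32 = ctypes.WinDLL("kernel32", use_last_error=True)
    k32.OpenProcess.argtypes = [wintypes.DWORD, wintypes.BOOL, wintypes.DWORD]
    k32.OpenProcess.restype = wintypes.HANDLE
    k32.ReadProcessMemory.argtypes = [
        wintypes.HANDLE, wintypes.LPCVOID, wintypes.LPVOID,
        ctypes.c_size_t, ctypes.POINTER(ctypes.c_size_t),
    ]
    k32.ReadProcessMemory.restype = wintypes.BOOL
    k32.CloseHandle.argtypes = [wintypes.HANDLE]

    handle = k32.OpenProcess(PROCESS_VM_READ, False, pid)
    if not handle:
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        buf = ctypes.create_string_buffer(size)
        n_read = ctypes.c_size_t(0)
        ok = k32.ReadProcessMemory(handle, address, buf, size,
                                   ctypes.byref(n_read))
        if not ok:
            raise ctypes.WinError(ctypes.get_last_error())
        return buf.raw[:n_read.value]
    finally:
        k32.CloseHandle(handle)
```

A harmless way to try it is to read a buffer in the calling process itself, passing os.getpid() and the address of a ctypes buffer.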