Reverse-engineering old database application


MavisBeacon
April 14th, 2012, 18:42
I have in my possession a database containing about 900MB of useful information[1]. It uses a relatively obscure database system (nothing proprietary, just one of those third-party DBMSs that were popular in the mid-1990s), but the program takes rather extensive steps to protect itself.

[1] This isn't illegally acquired information; it's an old company database from a third-party program that my boss would like me to recover data from, if I can.

To begin with, the actual data is distributed on its own CD (separate from the installer), which is protected with SecuROM and needs Alcohol 120%'s RMPS emulation. Then, within the program directory there's the tell-tale sign of CrypKey. Finally, the database files themselves seem to be encrypted, though I can't tell whether this is encryption built into the DBMS: what little documentation I've found about it denies that encryption is supported, which suggests that records are encrypted or compressed before being sent to the DBMS.
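One rough way to tell "compressed" apart from "encrypted" is to scan a data file for well-known compression signatures. The sketch below is only a heuristic under stated assumptions: the signature table is a small hand-picked set, the names `find_signatures` and `looks_like_zlib` are made up for this example, and a miss proves nothing (a custom or headerless compressor would not show up).

```python
import zlib

# Common compression/container magic numbers. Short signatures such as the
# two-byte zlib header will also turn up by chance in random-looking data,
# so every hit is only a candidate.
SIGNATURES = {
    "gzip": b"\x1f\x8b\x08",
    "zip local header": b"PK\x03\x04",
    "bzip2": b"BZh",
    "zlib (default level)": b"\x78\x9c",
}

def find_signatures(data: bytes):
    """Return a sorted list of (offset, name) for every signature occurrence."""
    hits = []
    for name, magic in SIGNATURES.items():
        start = data.find(magic)
        while start != -1:
            hits.append((start, name))
            start = data.find(magic, start + 1)
    return sorted(hits)

def looks_like_zlib(data: bytes, offset: int, window: int = 4096) -> bool:
    """Try to inflate a window starting at offset; invalid streams raise zlib.error."""
    try:
        zlib.decompressobj().decompress(data[offset:offset + window])
        return True
    except zlib.error:
        return False
```

Candidate offsets from `find_signatures` can be passed to `looks_like_zlib` to weed out chance matches before spending time on them.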

SecuROM and CrypKey seem to work together to defeat my attempts to debug the running process. I've tried a static analysis with IDA Pro and OllyDbg, but after poking around I quickly get lost. The program quits (thanks to SecuROM) if it's launched from within a debugger anyway, and CrypKey (or another tool) actively prevents me from attaching to the process after it's launched.

I decided to start from the other end and try to access the database files directly using the DBMS engine DLL file included in the project (and a few header files and documentation pieces I found on the web) - I'm making some progress, but I won't know if it's worked for a while.
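When driving an engine DLL directly, it can help to log every call and its return value so that failed experiments leave a trace. Below is a minimal sketch of such a logging wrapper. It assumes the DLL would be loaded with Python's `ctypes` (for example `ctypes.WinDLL` or `ctypes.CDLL`, depending on the calling convention, which is not known here). The class name `CallLogger` and the function name `open_file` are hypothetical and are not real C-Tree API names.

```python
class CallLogger:
    """Wrap an object (e.g. a ctypes-loaded DLL) and record calls made through it."""

    def __init__(self, target, log=None):
        self._target = target
        self.log = [] if log is None else log

    def __getattr__(self, name):
        # Only reached for names not found on CallLogger itself,
        # i.e. the functions exported by the wrapped object.
        func = getattr(self._target, name)
        if not callable(func):
            return func

        def wrapper(*args):
            result = func(*args)
            self.log.append((name, args, result))
            return result

        return wrapper


class FakeEngine:
    """Stand-in for the real DLL so the sketch runs anywhere."""

    def open_file(self, name, mode):  # hypothetical function name
        return 0


engine = CallLogger(FakeEngine())
engine.open_file(b"towns.dat", 1)
print(engine.log)
```

With a real DLL the `FakeEngine()` instance would be replaced by the loaded library object, and each function's argument and return types would still need to be declared from the header files.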

Is there a reliable way to remove SecuROM and CrypKey from a program and still have it function okay? I know this program uses CrypKey to control access to certain features of the program, which means removing it might be harder than removing SecuROM (which only seems to be involved during the program's bootstrap phase).

Presumably after the DRM has been stripped out I'd be able to debug the EXE properly and inspect calls to the DBMS DLL directly and see where/why/how data is being obfuscated before being written.

Is my plan okay so far? Can anyone recommend resources for removing SecuROM et al?

Thanks.

reverser
April 28th, 2012, 17:12
Maybe try API hooking or dynamic instrumentation instead of straight debugging. Something like Intel's Pin might work.

Aimless
April 29th, 2012, 11:24
Maybe you could let me know the name of the DBMS, since you say you have some basic documentation on it (as long as you are not giving out the application name, it's ok).

Let me know if it's Btrieve or DBase X.

Your approach depends on what you want to do:

1. If you want to get a working executable, then you don't need to concern yourself with the data, especially since you mentioned the DBMS does not support encryption.

2. If you want to get the data, you need to forget CrypKey and SecuROM and directly attack the DBMS. Note that encrypted data can be put in the database, as opposed to the database itself supporting encryption built in. A major difference. That said, this option then becomes problematic, as you need option 1 to understand how the data is being encrypted. But first things first - let's see you access the data at least.

Have Phun

MavisBeacon
April 29th, 2012, 12:55
Quote:
[Originally Posted by Aimless;92421]Maybe you could let me know the name of the DBMS, since you say you have some basic documentation on it (as long as you are not giving out the application name, it's ok).

Let me know if it's Btrieve or DBase X.


It's "C-Tree" (the C-Tree DLL is from 1997, so it's the "old-old" version, not the new client/server version). However it's slightly more complicated than that:

I'll come clean and announce the software is a CD full of marketing data and leads - it's a massive database of companies, consumers, potential customers - that kind of stuff (all legal!).

The data is broken into a series of .dat and .idx files. Each .dat file corresponds to a single table of data, and each .idx is just an index.

Most of the tables are unencrypted and accessing the data is easy (I could write a quick C# program to dump the data, no problem). When I view these files with a hex editor or even Notepad I can see the file's header, field definitions, and the raw data. Each table contains one segment of a street address, so one table contains a listing of every town/city name, another two files contain all of the street and building names, and so on. None of these readable tables contain data that "joins everything up" (i.e. there's no readable data that says which streets exist in which town).
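For a first look at a readable table without knowing its record layout, pulling out runs of printable text is often enough. The sketch below does this in the spirit of the Unix `strings` tool. It makes no assumption about the C-Tree file format; the function name `printable_runs` and the sample bytes are invented for illustration.

```python
import re

def printable_runs(data: bytes, min_len: int = 4):
    """Yield (offset, text) for each run of printable ASCII at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")

# Made-up bytes standing in for a slice of a readable .dat file.
sample = b"\x00\x01Main Street\x00ab\x00Springfield\xff"
for offset, text in printable_runs(sample):
    print(f"{offset:08x}  {text}")
```

For a real file, `open(path, "rb").read()` would supply the bytes; the offsets make it easier to find the same record again in a hex editor.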

However there are a couple of giant files (also with a .dat extension) that are obviously encrypted (i.e. there's no obvious file header, no field definitions, the statistical distribution of data seems completely random). It's obvious that these giant files would contain the data that correlates the address component data together.
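The "looks completely random" impression can be put on a number by measuring Shannon entropy in bits per byte. The sketch below is illustrative only and the function names are made up. Values close to 8.0 are typical of both encrypted and compressed data, so a high score cannot tell the two apart, while plain text and fixed-width records usually score much lower.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def file_entropy(path: str, sample_size: int = 1 << 20) -> float:
    """Entropy of the first sample_size bytes of the file at path."""
    with open(path, "rb") as f:
        return shannon_entropy(f.read(sample_size))
```

Running `file_entropy` over every .dat file and sorting by the result should separate the readable tables from the giant opaque ones without opening each in a hex editor.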

After the program boots up, it is possible to run Process Monitor to see what data files the program accesses when you perform a query with the protected software. It seems the unencrypted data files are used to populate the auto-complete and "index" areas of the program, however when you actually hit the Search button in the GUI the program finally makes attempts to read the giant encrypted files.

Process Monitor's Stack Trace tool revealed that attempts to read the unencrypted files all go through the C-Tree DLL file, but attempts to read the encrypted files go through a different DLL file - this second DLL seems to be protected by CrypKey (and there are also hints of CrypKey protection in the entrypoint EXE too).
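A Process Monitor capture can also be saved as CSV and summarised offline to see which data files a single search touches. The sketch below assumes the export uses Process Monitor's usual "Operation" and "Path" column headers; the function name `reads_per_path` and the sample rows are invented. The CSV export does not carry the stack trace, so this only tallies file reads and says nothing about which DLL issued them.

```python
import csv
import io
from collections import Counter

def reads_per_path(csv_file, suffix=".dat"):
    """Count ReadFile events per path (ending in suffix) in a Process Monitor CSV export."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        path = row.get("Path") or ""
        if row.get("Operation") == "ReadFile" and path.lower().endswith(suffix):
            counts[path] += 1
    return counts

# Made-up rows in the shape of a Process Monitor CSV export.
SAMPLE = (
    '"Time of Day","Process Name","PID","Operation","Path","Result","Detail"\n'
    '"12:00:01","app.exe","100","CreateFile","C:\\app\\towns.dat","SUCCESS",""\n'
    '"12:00:02","app.exe","100","ReadFile","C:\\app\\towns.dat","SUCCESS",""\n'
    '"12:00:03","app.exe","100","ReadFile","C:\\app\\towns.dat","SUCCESS",""\n'
    '"12:00:04","app.exe","100","ReadFile","C:\\app\\towns.idx","SUCCESS",""\n'
    '"12:00:05","app.exe","100","ReadFile","C:\\app\\big1.dat","SUCCESS",""\n'
)

for path, n in reads_per_path(io.StringIO(SAMPLE)).most_common():
    print(n, path)
```

For a real export, opening the file with `open(path, newline="", encoding="utf-8-sig")` tolerates a byte-order mark if one is present.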

In summary, the unprotected data is easily readable and uses C-Tree, however it's useless without the central data which seems to be encrypted and possibly using a proprietary or homebrew DBMS. The DLL that reads that data is about 400KB.

Quote:
[Originally Posted by Aimless;92421]Your approach depends on what you want to do:

1. If you want to get a working executable, then you don't need to concern yourself with the data, especially since you mentioned the DBMS does not support encryption.


The data is the important part, but because it's encrypted (or at least obfuscated) I've no chance of reading it without getting into the program that is able to read it. However, said program is protected by SecuROM and CrypKey.

Quote:
[Originally Posted by Aimless;92421]Have Phun


I'm glad this is a "hobby" assignment - if this job had a deadline I'd be stressing myself - far from having fun.

Thanks!

Aimless
April 29th, 2012, 13:53
Incidentally...

since you mention .dat and .idx files...

I presume the application is written in some form of COBOL?

Have Phun

MavisBeacon
April 29th, 2012, 14:12
Quote:
[Originally Posted by Aimless;92424]Incidentally...

since you mention .dat and .idx files...

I presume the application is written in some form of COBOL?

Have Phun


Nah, from what I can tell it's an MFC application. There are also references to msvcrt.

I think I'm being inaccurate when I describe the system as old. This CD was released in 2008, but when I first made this thread I assumed that all of the data was in 1997 C-Tree data files (hence "old"). It's only in the past week that I saw how the encrypted data goes through a separate (and more modern) DLL.

I'm going to start by trying to remove SecuROM first - I've found lots of resources online about getting the original executable out, and the version used here is very similar to the ones used in the guides.

I suspect removing SecuROM will be the easy part. CrypKey also has anti-debugging defenses, but there are far fewer resources on the web about it, and unlike SecuROM, the CrypKey system is integrated with the application rather than just being a wrapper/unpacker.