BGL (babylon glossary) to GLS (babylon glossary source). [Archive]

acidmelt

April 19th, 2005, 04:15

hello reversers, a while back i tried to reverse the babylon BGL format so i would be able to read data out of it and use it in my own personal project. this task however is beyond my very noobish debugging skills and obviously i failed.

i wasnt sure about where this post best fits in, it was either advanced reversing (since this requires pretty advanced practises) or the mini project area, feel free to move it.

anyways this is the data i have gathered so far:
the decryption algo can be found at the "babylon" program itself (http://www.babylon.com)
the encryption algo can be found at the "babylon builder" program which is used to write dictionaries and is publicly available (http://www.babylon.com/builder)
-----------
there is zero documentation about this format available on the net.
ive found this page (http://fjolliton.free.fr/babytrans/) which asserts that the new babylon bgl format is encrypted using the "Cipher Square" algorithm (http://www.esat.kuleuven.ac.be/~rijmen/square/).
-----------
after examining a few *.bgl's it is visible that the first 8 bytes of the file are the signature.
ive checked wotsit.org for documentation and found nothing.

in a recent thread (http://www.woodmann.com/forum/showthread.php?t=6934) bilbo has suggested this as a project.. so i thought that id start this thread and see what happens.

what do you say?

bilbo

April 19th, 2005, 11:05

In my opinion, that would be a nice true RE activity, and not related to software stealing...
You have my support, as long as I have time...!

For the moment, I tell you how I would start...

(1) Install Babylon - I have 5.0.1 r7 - dunno if it is the latest - and focus on one BGL you have installed...
The program is not compressed/protected in any way from debuggers.
(2) Menu->Glossaries->Glossary Options, and remove your BGL
(3) Attach to Babylon.exe with your preferred debugger and set a breakpoint on CreateFileA / ReadFile
(4) Menu->Glossaries->Install glossary from disk, and reinstall your target BGL
(5) Debugger will break at start of API: on stack you will find the return address and the BGL file name.
(6) ... no time for now to go on...

Best regards, bilbo

acidmelt

April 20th, 2005, 07:49

hey bilbo, i have tried your suggestion for debugging the app (using olly) and i have encountered a rather strange behaviour.. it seems that at startup babylon is iterating thru all the files inside %windir%\fonts and opening each one of them... i dont see any reason for that.
anyways, i have stepped-over the code searching for the right CreateFile() and i wasnt able to find any reference that is opening a *.bgl.

another problem was that as soon as i go into glossaries->add glossaries olly reports a memory access violation.. id guess that babylon does holds some sort of anti-reversal protection

as i said my debugging skills are very limited and i would be glad if you (bilbo) or any other experienced reversers would take a look at that

oh, and one last thing.. judging by the ram usage and the speed of seeking i assume that the glossaries are being loaded into memory at startup (duh.) so taking a memory dump should provide us with a valid copy of the decrypted gloss right?

bilbo

April 20th, 2005, 11:07

Hello, acidmelt!

I had some time to make other nice steps in "our" project. Let's see...

Quote:

[Originally Posted by acidmelt]it seems that at startup babylon is iterating thru all the files inside %windir%\fonts and opening each one of them... i dont see any reason for that.

No, that's not my case (I've checked with FILEMON). It could be that you have a still-fresh installation of Babylon, and it is still auto-learning the fonts installed on your system for OCR. If that is the case you should also see a high CPU load on your system for the following hours.

Quote:

[Originally Posted by acidmelt]anyways, i have stepped-over the code searching for the right CreateFile() and i wasnt able to find any reference that is opening a *.bgl.

That was the reason I suggested setting the breakpoint only after the initial phase, and loading a new BGL once the program has already started.

Quote:

[Originally Posted by acidmelt]another problem was that as soon as i go into glossaries->add glossaries olly reports a memory access violation.. id guess that babylon does holds some sort of anti-reversal protection

You're right, I don't use Olly and I had not noticed it. It is neither a memory access violation nor an anti-debugging trick. It is a lot of C++ exceptions (code E06D7363). I dunno the exact reason. Anyway: Options->Debugging Options->Exceptions->select Ignore Custom Exceptions and press the "Add last exception" button. This solves the Olly problem!

Quote:

[Originally Posted by acidmelt]oh, and one last thing.. judging by the ram usage and the speed of seeking i assume that the glossaries are being loaded into memory at startup (duh.)

Correct!

Quote:

[Originally Posted by acidmelt]so taking a memory dump should provide us with a valid copy of the decrypted gloss right?

You still have to locate the data and interpret it, though!

Quote:

[Originally Posted by acidmelt]Ive found this page (http://fjolliton.free.fr/babytrans/) which asserts that the new babylon bgl format is encrypted using the "Cipher Square" algorithm (http://www.esat.kuleuven.ac.be/~rijmen/square/).

That's a wrong info, as far as I've seen!

And now the good news.
What you already found, the 4(8?)-bytes signature, can be of three types:
12340003 .BDC extension - to be studied
12340002 .BGL generated by the builder in some cases - to be studied
12340001 .BGL distributed on Babylon site - I've started from these...

I've managed to identify their decompression (not decryption) algorithm, using the 5 steps I suggested you. It is simply ZLIB, release 1.1.3 (rather old...). The routines are inside BabyServices.DLL, but they are called from BContentServer.DLL. I will tell you more details in the following messages if you are interested.

Since the Library is completely free, and not GPL-ed, they cannot be blamed for performing a GPL violation, I suppose.

Now, take one BGL of the last type, remove the first 0x47 bytes, and save it with a .GZ extension. The new file must start with 0x1F. Then you can extract it with WinZip, and you can browse its uncompressed contents.
Not so bad, is it? There are still many initial fields we must discover, though.
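For anyone who prefers not to cut the file by hand, the manual step above can be sketched in C. This is only a minimal sketch under stated assumptions: the function name strip_bgl_header is made up for illustration, the offset (0x47 for the files discussed here) is passed in by the caller, and the check on the second magic byte 0x8B comes from the gzip format itself rather than from anything observed in Babylon.

```c
#include <stdio.h>

/* Copy everything from `offset` to end of file into out_path.
   Returns 0 on success, -1 on I/O error, -2 if the payload does not
   start with the gzip magic bytes 0x1F 0x8B.
   Hypothetical helper: name and return codes are illustrative only. */
int strip_bgl_header(const char *in_path, const char *out_path, long offset)
{
    unsigned char buf[4096];
    size_t n;
    int rc = 0;
    FILE *in = fopen(in_path, "rb");
    FILE *out = NULL;

    if (!in) return -1;
    if (fseek(in, offset, SEEK_SET) != 0) { fclose(in); return -1; }

    /* read the first chunk and verify that it looks like gzip */
    n = fread(buf, 1, sizeof buf, in);
    if (n < 2 || buf[0] != 0x1F || buf[1] != 0x8B) { fclose(in); return -2; }

    out = fopen(out_path, "wb");
    if (!out) { fclose(in); return -1; }

    /* copy the rest of the file unchanged */
    while (n > 0) {
        if (fwrite(buf, 1, n, out) != n) { rc = -1; break; }
        n = fread(buf, 1, sizeof buf, in);
    }
    if (ferror(in)) rc = -1;
    fclose(in);
    if (fclose(out) != 0) rc = -1;
    return rc;
}
```

Calling strip_bgl_header("some.bgl", "some.gz", 0x47) should leave a file that any gzip-aware tool can unpack, provided the offset is right for that file.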

If you want to play reversing some more, put a breakpoint at 0x9B29AF, run Baby and "Install glossary from disk" as I told you at step (4).
You must land at this code

Code:



009B29AF   lea         ecx,[ebp-1030h]  ; uncompressed buffer to be filled

009B29B5   push        1  ; number of bytes to uncompress

009B29B7   push        ecx

009B29B8   mov         ecx,dword ptr [ebp-1Ch]

009B29BB   push        ecx  ; compression structure 64h bytes

009B29BC   mov         ecx,eax  ; ZLIB object (Baby source is in C++)

009B29BE   call        dword ptr [edx+18h]  ; inflate

Execute the whole subroutine and you will find in the buffer the first uncompressed byte, 60 in my case. Try to discover the meaning of that value...
I stop here at the moment... no more time.

Best regards, bilbo

P.S. JMI, I don't know if I can go on. Maybe the subforum is not correct, the matter is against rules, nobody else is interested, etc. etc.
Please let me know...

JMI

April 20th, 2005, 11:20

Seems OK so far. Go for it.

Regards,

acidmelt

April 20th, 2005, 12:48

bilbo that is some awesome information!

here are my findings:
to my surprise, after decompression the resulting files dont require any further decryption.. after scrolling a bit (offset 0xC47 in the eng_eng dictionary) you can see simple html tags, and in between them are the definitions
try changing the extension of the uncompressed file to html

i have created a simple glossary with only 3 words to figure out the way that the definitions are aligned:
TERM 0x000C DEFINITION 0x101809 TERM 0x000C and so on..
however this is different in 12340001 bgls.. ill further analyse them.

the byte at offset 0x5 points to the begining of the gzip header, convenient

the gzip header of 12340001 files starts at 0x47 (as you said).
the gzip header of 12340002 files starts at 0x69.

on 12340003 (*.bdc) files however this is not the case.. these files seem to be uncompressed and it seems that their format is similar to the old *.dic.

p.s im stupid, i totally forgot about babylons ocr capabilities.

thank you bilbo

bilbo

April 21st, 2005, 11:28

Hello, acidmelt, and everyone interested (nobody seems to be...),

Quote:

try changing the extension of the uncompressed file to html

ok, but that is just a resource... the whole dictionary is not HTML format

Quote:

the byte at offset 0x5 points to the begining of the gzip header, convenient

great... I would say at offset 0x4, though, because all the Baby entities are in big-endian form (the high byte first, read on)
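A minimal sketch of reading that pointer from the start of a BGL, for what it is worth. Several things here are assumptions and not confirmed in this thread: that the signature is stored as the four bytes 12 34 00 01 (or 12 34 00 02) in that order, that the pointer is a 16-bit big-endian value at offsets 4 and 5, and the function name bgl_gzip_offset, which is made up for illustration.

```c
#include <stddef.h>

/* Return the offset of the gzip header as stored in the BGL header,
   or -1 if the buffer is too short or the signature is not one of the
   two compressed types (12340001 / 12340002).
   Assumption: the pointer is a 2-byte big-endian field at offset 4. */
long bgl_gzip_offset(const unsigned char *hdr, size_t len)
{
    unsigned long sig;

    if (len < 6) return -1;

    /* signature, high byte first */
    sig = ((unsigned long)hdr[0] << 24) | ((unsigned long)hdr[1] << 16) |
          ((unsigned long)hdr[2] << 8)  |  (unsigned long)hdr[3];
    if (sig != 0x12340001UL && sig != 0x12340002UL) return -1;

    /* big-endian pointer to the gzip header */
    return ((long)hdr[4] << 8) | (long)hdr[5];
}
```

With the two file types seen so far this would give 0x47 and 0x69 respectively; reading only the byte at offset 0x5 gives the same result as long as the pointer stays below 0x100.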

Quote:

the gzip header of 12340002 files starts at 0x69

great! one point to you!

And now the step for today...

I started from the address I told you yesterday and I have reversed some stuff here and there (sub_9B1DC0 and related ones). These are my findings.

The uncompressed file is a collection of records.
Every record has a one-byte header.
The low nibble is the record type.
The high nibble holds indication of the record length, with the following rule:

high nibble>=4: subtract 4; that is the length
high nibble <4: add 1: that is the number of bytes for the following length (in big-endian format)

As for the record types:
0 - one-byte specifier will follow, and the data next
1 - this is an entry: the entry name will follow as a string preceded by one byte for length, and the definition next
2 - this is a named resource: the resource name will follow as above (e.g xxx.bmp, xxx.html) (and the data next)
3 - two byte specifier will follow, and the data next
4/6 - no specifier, 0 bytes of data - type 6 is at end
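The header rule above can be sketched as a small function that works on a memory buffer instead of a file. The struct and function names (bgl_record, bgl_read_record) are made up for illustration; the sketch only splits a record into type, length and payload position, and does not interpret the specifiers or names that follow.

```c
#include <stddef.h>

struct bgl_record {
    unsigned type;      /* low nibble of the header byte */
    size_t   length;    /* payload length in bytes */
    size_t   data_off;  /* offset of the payload within the buffer */
};

/* Decode the record that starts at buf[pos].
   Returns 0 on success, -1 if the buffer ends before the record does.
   The next record starts at rec->data_off + rec->length. */
int bgl_read_record(const unsigned char *buf, size_t len, size_t pos,
                    struct bgl_record *rec)
{
    unsigned high;
    size_t i, nbytes;

    if (pos >= len) return -1;
    rec->type = buf[pos] & 0x0F;
    high = buf[pos] >> 4;
    pos++;

    if (high >= 4) {
        /* short record: the length is encoded in the header itself */
        rec->length = high - 4;
    } else {
        /* high+1 length bytes follow, most significant byte first */
        nbytes = high + 1;
        if (len - pos < nbytes) return -1;
        rec->length = 0;
        for (i = 0; i < nbytes; i++)
            rec->length = (rec->length << 8) | buf[pos++];
    }
    rec->data_off = pos;
    if (len - pos < rec->length) return -1;
    return 0;
}
```

For example, a header byte of 60 (hex) decodes as type 0 with 2 bytes of data, and 11 00 05 decodes as an entry (type 1) with 5 bytes of data.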

But I hate the theory, so here is a little program which will scan the whole uncompressed file.
I have tried it successfully on a little BGL: Code Analysis, at http://info.babylon.com/gl_index/gl_template.php?id=46760

Code:



#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

int
main(int argc, char **argv)
{
	char resname[256];
	unsigned char hdr, high_nibble, lenbyte;
	unsigned char specifier[2];
	int i, record_length;
	FILE *fpin;
	long curpos, datapos;

	if (argc != 2) {
		printf("usage: %s uncompressed_filename\n", argv[0]);
		return 1;
		}

	fpin = fopen(argv[1], "rb");
	if (!fpin) goto ko;

		// a record per loop
	while (1) {
		curpos = ftell(fpin);
			// stop cleanly when no header byte is left
		if (fread(&hdr, 1, sizeof(hdr), fpin) != sizeof(hdr)) break;

			// get the record size
		high_nibble = hdr >> 4;
		if (high_nibble >= 4) record_length = high_nibble - 4;
		else for (i=record_length=0; i<high_nibble+1; i++) {
				// length bytes come high byte first
			record_length *= 256;
			fread(&lenbyte, 1, sizeof(lenbyte), fpin);
			record_length += lenbyte;
			}
		datapos = ftell(fpin);

		switch (hdr & 0xF) {  // low nibble

		case 0:  // one-byte specifier follows
			fread(specifier, 1, 1, fpin);
			printf("@%lx: <id %x> %x bytes\n",
				curpos, specifier[0], record_length);
			break;
		case 3:  // two-byte specifier follows
			fread(specifier, 1, 2, fpin);
			printf("@%lx: <id %x> %x bytes\n",
				curpos, specifier[0]*256+specifier[1], record_length);
			break;
		case 4:  // no specifier
		case 6:  // no specifier
			printf("@%lx: <no id(%d)> %x bytes\n",
				curpos, hdr&0xF, record_length);
			break;

		case 2:  // named resource
			fread(&lenbyte, 1, sizeof(lenbyte), fpin);
			fread(resname, 1, lenbyte, fpin);
			printf("@%lx: <res %.*s> %x bytes\n",
				curpos, lenbyte, resname, record_length);
			break;

		case 1:  // entry
			fread(&lenbyte, 1, sizeof(lenbyte), fpin);
			fread(resname, 1, lenbyte, fpin);
			printf("@%lx: <entry \"%.*s\"> %x bytes\n",
				curpos, lenbyte, resname, record_length);
			break;

		default:
			printf("unexpected low_nibble %x\n", hdr & 0xF);
			fclose(fpin);
			return 1;
		}
			// skip to the next record header
		fseek(fpin, datapos+record_length, SEEK_SET);
		}

	fclose(fpin);
	return 0;
ko:
	printf("exit due to error %d: %s\n", errno, strerror(errno));
	return 1;
}

We need only to understand the meaning of the specifiers...
Best regards, bilbo

dELTA

April 22nd, 2005, 03:20

Nice work as always bilbo.

Quote:

and everyone interested (nobody seems to be...)

Sure we are, just lurking.

Keep up the good work.

acidmelt

April 22nd, 2005, 03:56

hey bilbo!

again thats plenty of great information.. thank you

i wrote a little program to explore uncompressed bgls based on your code

Code:



#include <stdio.h>

#include <windows.h>

#include <conio.h>



int isvalidchar(char ch);

void stripjunk(char *buffer);



struct bdc {

	char szTerm[256];

	char szDefinition[256];

} **babyterm[27]; 

int ptrcnt[27]; //sorted



void main(int argc,char **argv) {

FILE *fdic;

int ix,iy,rec_length;

unsigned char hdr,high_nibble,lenbyte,tmpch;

unsigned long datapos;

char tmpbuff[256];

char uterm[256];

int bg,eg;



if(argc!=2) { printf("usage: %s uncompressed_filename\n", argv[0]); return; }

//initial allocation of pointers

for(ix=0;ix<27;ix++)  {

	babyterm[ix]=(struct bdc**)malloc(sizeof(struct bdc*));

	ptrcnt[ix]=0;

}

//>>parsing

fdic=fopen(argv[1],"rb");

if(!fdic) { printf("error opening file [%s].\n",argv[1]); return; }

bg=GetTickCount();

while(1) {

	fread(&hdr,sizeof(char),1,fdic);

	if(feof(fdic)) break;



	//get record size

	high_nibble=hdr >> 4;

	if(high_nibble>=4) rec_length=high_nibble-4;

	else {

		for(ix=rec_length=0;ix<high_nibble+1;ix++) {

			rec_length*=256;

			fread(&lenbyte,sizeof(char),1,fdic);

			rec_length+=lenbyte;

		}

	}

	datapos=ftell(fdic);



	switch(hdr & 0xF) {

			case 1: {

			fread(&lenbyte,sizeof(char),1,fdic);

			memset(tmpbuff,0,lenbyte+1);

			fread(tmpbuff,sizeof(char),lenbyte,fdic);

			if(!isalpha(tmpbuff[0])) break;

			stripjunk(tmpbuff);

			//printf("TERM [%s] -> \n",tmpbuff);

			//>>allocating space for term struct

			tmpch=tolower(tmpbuff[0])-'a';

			babyterm[tmpch][ptrcnt[tmpch]]=(struct bdc*)malloc(sizeof(struct bdc));

			if(babyterm[tmpch][ptrcnt[tmpch]]==NULL) {

				printf(":O ran out of space.\n");

				return;

			}

			strcpy(babyterm[tmpch][ptrcnt[tmpch]]->szTerm,tmpbuff);

			//>>

			fseek(fdic,1,SEEK_CUR); //definiton lenbyte is next

			fread(&lenbyte,sizeof(char),1,fdic);

			memset(tmpbuff,0,lenbyte+1);

			fread(tmpbuff,sizeof(char),lenbyte,fdic);

			stripjunk(tmpbuff);

			strcpy(babyterm[tmpch][ptrcnt[tmpch]]->szDefinition,tmpbuff);

			//printf("DEF [%s]\n",tmpbuff);

			ptrcnt[tmpch]++;

			} break;

			default: break;

	}

	fseek(fdic,datapos+rec_length,SEEK_SET);

}

eg=GetTickCount();

fclose(fdic);	



printf("total parsing time: %dms\n",eg-bg);



printf("--------------------------\n");

for(ix=0;ix<27;ix++) {

	if(ptrcnt[ix]>0) {

		for(iy=0;iy<ptrcnt[ix];iy++) {

		printf("--\n[%s][%s]\n",babyterm[ix][iy]->szTerm,babyterm[ix][iy]->szDefinition);

		if(getch()==27) goto takeinp;

		}

	}

}

printf("--------------------------\n\n");

takeinp:

for(;;) { //input loop

	memset(uterm,0,256);

	printf("Term:");

	scanf("%255s",uterm);

	if(uterm[0]) {

		tmpch=tolower(uterm[0])-'a';

		for(ix=0;ix<ptrcnt[tmpch];ix++) 

			if(!strcmpi(babyterm[tmpch][ix]->szTerm,uterm))

			printf("%s = \n%s\n",uterm,babyterm[tmpch][ix]->szDefinition);

	}

}

}



void stripjunk(char *buffer) {

int ix,slen;

slen=strlen(buffer);



for(ix=1;ix<slen;ix++)

	if(buffer[ix]=='$') { buffer[ix]=0; break; }

slen=ix;

for(ix=0;ix<slen;ix++) 

	if(!isvalidchar(buffer[ix])) { buffer[ix]=0; break; }

	

}



int isvalidchar(char ch) {

	int ix;

	char valtab[]="abcdefghijklmnopqrstuvwxyz 0123456789!@#$%&8()_-+=|{}[]<>\"',.%%/\\:;!?";

	ch=tolower(ch);



	for(ix=0;(unsigned)ix<strlen(valtab);ix++)

		if(ch==valtab[ix]) return 1;

return 0;

}

however whats about the rest of the data?
it seems as if the uncompressed files have some sort of header?

i just took a look at some *.gls and i believe our goal is completed (well you did most of the work so kudos to you)

the format is really simple:
### Glossary title:testTitle
### Author:testAuthor
### Description:testGlossDescription
### Source language:English
### Source alphabet:Default
### Browsing enabled?No
### Type of glossary:00000000
### Case sensitive words?0
### Glossary section:

test1
meaning1

test2
meaning2

test3
meaning3
--------
using the code above it is really easy to produce gls's..

bilbo

April 22nd, 2005, 09:52

Good, acidmelt, you added some indexing feature (dynamic array 'babyterm'), but... there is a bug...

You initialized babyterm[ix] just at init time with only one pointer in it! In this way the entries are overwritten as they grow.
You can remove the whole initialization loop, but you must add, before every new entry allocation, a resizing of the **babyterm array:

Code:



babyterm[tmpch] = (struct bdc**)realloc(babyterm[tmpch],

                           (ptrcnt[tmpch]+1)*sizeof(struct bdc*));

babyterm[tmpch][ptrcnt[tmpch]] = (struct bdc*)malloc(sizeof(struct bdc));

instead of the simple

Code:



babyterm[tmpch][ptrcnt[tmpch]] = (struct bdc*)malloc(sizeof(struct bdc));

By the way, realloc will work also the first time, when the area to reallocate has address 0.

Ok.
And you removed a lot of things: not just spaces in the source, I see, you don't like spaces ;) but also non-ASCII characters which are used as quotes or underscores, etc. If you try your program on the BGL I suggested, many definitions are cut.

That's all for this weekend, I have other things to do...
A simple addition would be to integrate ZLIB in the program in order to uncompress the file automatically...

Best regards, bilbo

P.S. thx dELTA (and acidmelt) for appreciation...
P.P.S. I suggest to have a look at the dictionary I linked in my previous message, there is also something for Fravia

Quote:

+Fravia: One of the best reverser in the world. Founder of +Fravia's Pages of Reverse Engineering

and for our friend Zero

Quote:

Universitas Virtualis: Free knowledge project which provides a professional place for Algorithms, Software-Engineering, Software-Protection and Reverse Code Engineering, Cryptography and Cryptanalysis.

acidmelt

April 24th, 2005, 03:45

hey bilbo!

thanks for the corrections

in my previous code i have ignored some important details which made the parsing crippled.. anyways here is a fixed code incorporating zlib, so there is no need to manually unpack bgls

Code:



#include <stdio.h>

#include <windows.h>

#include "zlib.h"



#pragma comment(lib,"zlib.lib")



int isvalidchar(char ch);

void stripjunk(char *buffer,char type);

int focc(char *cstr,char ch);

int uncomp_bgl(char *bglname,char *datname);

int writegls(char *datname);



char glsheader[1024];

char glsheadertemplate[]=

"### Glossary title:%s\r\n"

"### Author:%s\r\n"

"### Description:%s\r\n"

"### Source language:English\r\n"

"### Source alphabet:Default\r\n"

"### Target language:English\r\n"

"### Target alphabet:Default\r\n"

"### Browsing enabled?No\r\n"

"### Type of glossary:00000000\r\n"

"### Case sensitive words?0\r\n"

";gls generated by bglgls\r\n\r\n"

"### Glossary section:\r\n\r\n";



int main(int argc,char **argv) {

int ix;

char szAuth[32];

char szTitle[32];

char szDescription[128];

char datfname[128];



if(argc!=2) { 

	printf("usage: bglgls.exe filename.bgl\n");

	return 0; 

}

//>get input

printf("gls Author:");

fgets(szAuth,32,stdin);

printf("gls Title:");

fgets(szTitle,32,stdin);

printf("gls Description:");

fgets(szDescription,128,stdin);



szAuth[strlen(szAuth)-1]=0;

szTitle[strlen(szTitle)-1]=0;

szDescription[strlen(szDescription)-1]=0;

//the template expects title, author, description in this order

sprintf(glsheader,glsheadertemplate,szTitle,szAuth,szDescription);

//>set output filename

strncpy(datfname,argv[1],128);

ix=focc(datfname,'.');

if(ix<0) { printf("invalid filename\n"); return 0; }

datfname[ix]=0;

strcat(datfname,".dat");

//>>

if(!uncomp_bgl(argv[1],datfname)) { printf("error uncompressing BGL.\n"); return 0; }

if(!writegls(datfname)) { printf("error writing GLS.\n"); return 0; }

return 0;

}

//>>uncompression routine

int uncomp_bgl(char *bglname,char *datname) {

FILE *ztmp;

FILE *zfile;

char iobuff[128];

char tmppath[256];

char tmpfname[256];

unsigned char zptrbyte;

int tread;



//get temp filename

GetTempPath(256,tmppath);

GetTempFileName(tmppath,"bgl",0,tmpfname);

ztmp=fopen(tmpfname,"wb");

if(!ztmp) return 0;

//>

zfile=fopen(bglname,"rb");

if(!zfile) return 0;

fseek(zfile,0x5,SEEK_SET);

fread(&zptrbyte,sizeof(char),1,zfile);

printf("zlib header@0x%X\n",zptrbyte);

fseek(zfile,zptrbyte,SEEK_SET);

while(!feof(zfile)) {

	tread=fread(iobuff,sizeof(char),128,zfile);

	fwrite(iobuff,sizeof(char),tread,ztmp);

}

fclose(zfile);

fclose(ztmp);

//>>uncompressing >

zfile=fopen(datname,"wb");

ztmp=gzopen(tmpfname,"rb");

if(!zfile||!ztmp) return 0;

while(!gzeof(ztmp)) {

	tread=gzread(ztmp,iobuff,128);

	fwrite(iobuff,sizeof(char),tread,zfile);

}

gzclose(ztmp);

fclose(zfile);

DeleteFile(tmpfname); //get rid of temporary file

return 1;

}

//write gls

int writegls(char *datname) {

FILE *fdic,*fgls;

int ix,rec_length;

short int lenword;

unsigned char hdr,high_nibble,lenbyte;

unsigned char lenmul,lenadd;

unsigned long datapos;

char tmpbuff[1024];

char glsf[256];

int tt=0,lt=0;



//gls filename

strcpy(glsf,datname);

ix=focc(glsf,'.');

glsf[ix]=0;

strcat(glsf,".gls");

printf("gls filename:%s\n",glsf);

fgls=fopen(glsf,"wb");

if(!fgls) return 0;

//>write header

printf("writing GLS");

fwrite(glsheader,sizeof(char),strlen(glsheader),fgls);

//>>parsing

fdic=fopen(datname,"rb");

if(!fdic) return 0;

while(1) {

	fread(&hdr,sizeof(char),1,fdic);

	if(feof(fdic)) break;



	//get record size

	high_nibble=hdr >> 4;

	if(high_nibble>=4) rec_length=high_nibble-4;

	else {

		for(ix=rec_length=0;ix<high_nibble+1;ix++) {

			rec_length*=256;

			fread(&lenbyte,sizeof(char),1,fdic);

			rec_length+=lenbyte;

		}

	}

	datapos=ftell(fdic);



	switch(hdr & 0xF) {

			case 1: {

			fread(&lenbyte,sizeof(char),1,fdic);

			memset(tmpbuff,0,1024);

			fread(tmpbuff,sizeof(char),lenbyte,fdic);

			if(!isalpha(tmpbuff[0])) break;

			stripjunk(tmpbuff,0);

			strcat(tmpbuff,"\r\n");

			fwrite(tmpbuff,sizeof(char),strlen(tmpbuff),fgls);

			fread(&lenmul,sizeof(char),1,fdic);

				fread(&lenadd,sizeof(char),1,fdic);

			memset(tmpbuff,0,1024);

			lenword=lenmul*256+lenadd;

			if(lenword>1019) lenword=1019;

			fread(tmpbuff,sizeof(char),lenword,fdic);

			stripjunk(tmpbuff,1);

			strcat(tmpbuff,"\r\n\r\n");

			fwrite(tmpbuff,sizeof(char),strlen(tmpbuff),fgls);

			if(tt-100==lt) { lt=tt; printf("."); }

			tt++;

			} break;

			default: break;

	}

	fseek(fdic,datapos+rec_length,SEEK_SET);

}

fclose(fdic);

fclose(fgls);	

DeleteFile(datname); //we dont need the *.dat anymore..

printf("%d terms written to file!\n",tt);

return 1;

}

//find occurrence

int focc(char *cstr,char ch) { 

int ix;

for(ix=0;(unsigned)ix<strlen(cstr);ix++)

	if(cstr[ix]==ch) return ix;

return -1;

}

//>

void stripjunk(char *buffer,char type) {

int ix,slen;

slen=strlen(buffer);



if(!type) {

	for(ix=1;ix<slen;ix++)

		if(buffer[ix]=='$') { buffer[ix]=0; break; }

	slen=ix;

}	

for(ix=0;ix<slen;ix++) 

	if(!isvalidchar(buffer[ix])) { buffer[ix]=0; break; }

}

//valid term/definition char

int isvalidchar(char ch) {

	int ix;

	char valtab[]="abcdefghijklmnopqrstuvwxyz 0123456789!@#$%&8()_-+=|{}[]<>\"',.%%/\\:;!?";

	ch=tolower(ch);

	for(ix=0;(unsigned)ix<strlen(valtab);ix++)

		if(ch==valtab[ix]) return 1;

return 0;

}

i have tested it with the code_analysis bgl that you suggested and it now works perfectly. i have also tested it with babylons english_english dictionary (since it is the largest, 18mb unpacked) and it works really well.

though something is missing and i couldnt figure it out.. babylon shows part-of-speech for each word and id guess that this information is stored in a table somewhere inside the bgl.. thats the last piece missing i believe.

heres a binary (http://www.woodmann.com/forum/attachment.php?attachmentid=1230&stc=1) compiled and linked with zlib.

hrmprog

May 19th, 2008, 23:08

hi
i tried to use above code with unicode BGL but output file didn't complete
which part of this code should be corrected?

dELTA

May 20th, 2008, 03:44

The one that fails. And why don't you debug it and tell us which one that is?

hrmprog

May 20th, 2008, 06:53

when using the above code with a unicode BGL, the output file has no unicode letters and only english letters appear. i tried to convert an english to farsi BGL, but in the output file only english words appear.

dELTA

May 20th, 2008, 15:11

Windows uses special APIs to handle unicode strings, you must integrate these into the existing source code.

szereshki

October 7th, 2008, 02:45

hi. I have the same problem as hrmprog and waiting a long time to an answer in this post. But this seems not to be continued. in fact the main guys didn't go here since 2005!
Many of the babylon BGLs are in unicode, so its very important to be able to handle unicode BGLs as well. I know little about C coding and had no success in manipulating acidmelts code for unicode. would someone please help me modify his code for unicode BGLs?
dELTA should be right. But it's in theory. Thanks to acidmelt, the code is presented above. it will be appreciated if someone put the unicode corrected code here. thx

bilbo

October 7th, 2008, 22:42

What is the release of Babylon you are referring to (7.0 is out) and what is an example of a unicode BGL? It has been a long time since I used Babylon, and it has become more and more commercial...
Anyway, some new activity on the target could be interesting... But be prepared to make your own contribution: if you do not know C, you can use ASM as well!
Best regards, bilbo

szereshki

October 8th, 2008, 02:09

i'm using v5. but it doesn't matter, cuz the BGLs should work on the new versions the same as on the old ones. Better to work on the new version of course. I tried a little farsi BGL file which is attached.
the unicode (farsi) words dont appear in the output file of acidmelt code.
thx for help

bilbo

October 11th, 2008, 23:05

szereshki, I looked at the file you posted; it has exactly the same format as the files we were talking about three years ago... The problem is that the data are discarded by the conversion program because they are not valid ASCII characters.
Let's see for example the first definition, taken from the uncompressed dictionary:

Code:



00000ED5 1241 6273 6F72 7074 696F 6E20 636F 7374 .Absorption cost

00000EE5 696E 6700 0FE5 D2ED E4E5 20ED C7C8 ED20 ing....... ....

00000EF5 CCD0 C8ED                               ....

First byte (12) is the length of the first part of the definition; after 12h bytes ("Absorption costing") you will find the length of the second part, on two bytes in big-endian order (00 0F). And finally the Unicode stuff follows, 15 bytes. The strangeness is that they are not even (2 bytes per character). Can you interpret this stuff ("E5 D2 ED E4 E5 20 ED C7 C8 ED 20 CC D0 C8 ED"), or can you provide a GLS source with the corresponding compiled BGL file?
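The layout just described (one length byte, the term, a two-byte big-endian length, then the definition bytes) can be sketched in C as follows. The names bgl_entry and bgl_parse_entry are made up for illustration, and the sketch assumes the buffer starts at the term's length byte, i.e. right after the record header and its length bytes. It returns the raw bytes untouched and makes no guess about their character set.

```c
#include <stddef.h>

struct bgl_entry {
    const unsigned char *term;  /* points into the input buffer */
    size_t term_len;
    const unsigned char *def;   /* points into the input buffer */
    size_t def_len;
};

/* Split an entry payload into term and definition.
   Returns 0 on success, -1 if the payload is shorter than its own
   length fields claim. Nothing is copied or NUL-terminated. */
int bgl_parse_entry(const unsigned char *p, size_t len, struct bgl_entry *e)
{
    size_t pos = 0;

    if (len < 1) return -1;
    e->term_len = p[pos++];                 /* one-byte term length */
    if (len - pos < e->term_len + 2) return -1;
    e->term = p + pos;
    pos += e->term_len;

    /* two-byte definition length, high byte first */
    e->def_len = ((size_t)p[pos] << 8) | p[pos + 1];
    pos += 2;
    if (len - pos < e->def_len) return -1;
    e->def = p + pos;
    return 0;
}
```

Fed the bytes from the dump above, this should report an 18-byte term and a 15-byte definition starting with E5 D2 ED.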

Best regards, bilbo

szereshki

October 13th, 2008, 07:22

Dear Bilbo, you are right.

Code:

The strangeness is that they are not even (2 bytes per character)

Because its not a unicode stuff. Im sorry. I tried removing the first 0x47 bytes and extracting gz file to a html again. I opened it with ie and found the encoding should be on arabic not unicode. indeed the problem is why your program discard this codes and how should not?
many thanks

bilbo

October 15th, 2008, 05:49

Quote:

[Originally Posted by szereshki]Because its not a unicode stuff

Yeah! Simple indeed!
If we launch "charmap" selecting Arial and we select "Windows: Arabic" we will see exactly the codes E5 D2 ED... in arabic chars!

Quote:

[Originally Posted by szereshki]Indeed the problem is why your program discard this codes and how should not?

Simply remove the following check:

Code:

for(ix=0;ix<slen;ix++) 

	if(!isvalidchar(buffer[ix])) { buffer[ix]=0; break; }

Best regards. bilbo

szereshki

October 17th, 2008, 10:41

Many thanks Bilbo. I'll go through checking it.
You are great as you know so much in Reversing, and greatest as an F1 for me and the other newbies.

Bigal

October 24th, 2008, 05:27

Quote:

[Originally Posted by bilbo;77380]Yeah! Simple indeed!
If we launch "charmap" selecting Arial and we select "Windows: Arabic" we will see exactly the codes E5 D2 ED... in arabic chars!

Simply remove the following check:

Code:
for(ix=0;ix<slen;ix++)
if(!isvalidchar(buffer[ix])) { buffer[ix]=0; break; }

Best regards. bilbo

Hi, congrats for the excellent work. I am not a C programmer (just only some Perl hacking). Nevertheless I have tried to compile your code with different compilers but I am always getting compile errors apparently related with zlib. Anyway, it would be great if you could post the binary file without the above piece of code, which apparently gives problems whenever there are Unicode chars (also accents, umlauts, etc).

Maybe you could also add some input parameters too. That would surely be great.

In any case, thanks again for your great work.

szereshki

October 25th, 2008, 02:53

You are right Bigal. I also had some problems and finally unpacked the previously posted binary and changed the asm code in olly. the new problem...
(I'll post some images later) there are also some encoding problems. the reconverted BGL has all the chars, but it doesn't follow the encoding. I mean it doesn't show the true characters, arabic win in my case.
maybe I should explain more. I'll post two images of the original BGL and the reconverted one and also the new binary one week later.

szereshki

October 25th, 2008, 04:29

The original bgl result is attached (Farsi with windows Arabic encoding). The converted one has all the letters, but with wrong encoding. The source and target languages in the glossary properties tab is selected as English although. Changing from English will result in a more defective output.
I also attached the changed binary file. It seems there is just one step more. Bilbo! Its your turn again!
(p.s. I also tried these on Babylon v5)

bilbo

October 28th, 2008, 00:39

Quote:

[Originally Posted by Bigal;77467]I have tried to compile your code with different compilers but I am always getting compile errors apparently related with zlib.

ZLIB must be downloaded separately

Quote:

[Originally Posted by szereshki]I also had some problems and finally unpacked the previously posted binary and changed the asm code in olly

great approach: if you can find the spot to patch, you already know 'C' well! but why don't you try some free C compiler? see for example http://www.thefreecountry.com/compilers/cpp.shtml

Anyway, since I do not know arabic (I tried to learn it when I was young but I have forgotten everything) and I don't know arabic/farsi Windows, could you please post both BGL and GLS files, not just a BGL like ahsan.zip?

Best regards, bilbo

szereshki

October 29th, 2008, 05:28

The original sample BGL and its converted GLS, and also some recreated BGLs with different settings is attached:

bilbo

November 1st, 2008, 00:23

szereshki,

what is interesting is neither the broken BGLs (you posted 1.BGL, 2.BGL, 3.BGL) nor the reconstructed GLS which, as you said, does not work, but the original GLS used to generate the working BGL.
Only in this way can we - me, or you - compare the original GLS with the reconstructed GLS and see what is different!

Anyway, here is another homework for you... I suspect that the problem is no more in the data contents, but in the initial lines of the reconstructed GLS.

Code:

### Source language:English
### Source alphabet:Default
### Target language:English
### Target alphabet:Default

Try editing them with an ascii editor (e.g. replacing English or Default with Arabic), and see what happens...
Best regards... bilbo

szereshki

November 1st, 2008, 11:11

Bilbo,

I have posted the original GLS and BGL as well (Read readme>ahsan.gls), although I know you are busy.
You are right as always. Changing the target alphabet from Default to Arabic (or Farsi) solved the problem. Your work seems to be complete now, unless someone wants to add an option to the C source to handle this.
Best wishes for you
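
The header fix described above can be scripted instead of done by hand in an editor. A minimal Python sketch, assuming the GLS header uses the "### Source alphabet:" and "### Target alphabet:" lines shown in this thread; the function name set_gls_alphabet and its parameters are made up for illustration:

```python
def set_gls_alphabet(header_text, target="Arabic", source=None):
    """Return header_text with the alphabet lines rewritten.

    Only lines starting with '### Target alphabet:' (and, if `source`
    is given, '### Source alphabet:') are touched; line endings and all
    other lines are kept exactly as they were.
    """
    out = []
    for line in header_text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        eol = line[len(body):]          # original line ending, if any
        if body.startswith("### Target alphabet:"):
            body = "### Target alphabet:" + target
        elif source is not None and body.startswith("### Source alphabet:"):
            body = "### Source alphabet:" + source
        out.append(body + eol)
    return "".join(out)
```

To use it, read the reconstructed GLS as text (with whatever single-byte encoding the file really uses), pass it through the function, and write the result back before loading it in the Glossary Builder.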

afree

November 3rd, 2008, 04:09

Hi,
Has anyone compiled it with these new changes (to work with Arabic)? If yes, could you post it? I just can't compile it.

bilbo

November 3rd, 2008, 23:54

Quote:

[Originally Posted by afree]I just can't Compile it

Please don't be so categorical! Get a free compiler, get ZLIB.LIB (first hit in Google) and you too will be able to compile it.

Best regards, bilbo

afree

November 4th, 2008, 02:58

I worked a little bit in C, but I almost forgot it all.
Anyway, I did manage to compile it, but something doesn't work.

The program starts and reads data (I think), but it doesn't write anything to the file except the header. I will take a look at it later.

szereshki

November 4th, 2008, 06:52

Hi, please read the previous posts carefully before asking such questions. I also want you to compile it and work on it yourself, but I have already posted the compiled file (#25). Use it and, once you have the GLS, edit it and change the default lang to Arabic.

dreamer155

November 20th, 2008, 15:29

hi
first of all, i wanna thank for this superb code and app. it's been very useful for me. i got plenty of babylon dictionaries, some of them really big. Yesterday, i noticed something regarding the size of characters deflated. for some words, definitions sometimes exceed 1000 chars with all formatting tags eg. "<font><hr><br>...." and these are cut off after 1000 chars, i think. is it possible to increase the buffer size or something to overcome this?

antico2

November 25th, 2008, 18:35

Hi everyone,

I'm trying to compile acidmelt's code but something goes wrong.
I've discovered that #include <ctype.h> is missing (I'm using DEVC++).
The other error, which I cannot solve, is:

line1> //>>uncompressing >
line2> zfile=fopen(datname,"wb");
line3> ztmp=gzopen(tmpfname,"rb");
line4> if(!zfile||!ztmp) return 0;

line3> error is: In function `int uncomp_bgl(char*, char*)': invalid conversion from `void*' to `FILE*'

I think it is a type cast error, but I'm not able to solve it..

Can anyone help me?

regards

antico2

December 1st, 2008, 11:04

Quote:

[Originally Posted by antico2;77902]Hi everyone,

I'm trying to compile acidmelt's code but something goes wrong.
I've discovered that #include <ctype.h> is missing (I'm using DEVC++).
The other error, which I cannot solve, is:

line1> //>>uncompressing >
line2> zfile=fopen(datname,"wb");
line3> ztmp=gzopen(tmpfname,"rb");
line4> if(!zfile||!ztmp) return 0;

line3> error is: In function `int uncomp_bgl(char*, char*)': invalid conversion from `void*' to `FILE*'

I think it is a type cast error, but I'm not able to solve it..

Can anyone help me?

regards

That's all OK...

I've solved the problem by adding a cast: ztmp=(FILE*)gzopen(tmpfname,"rb");

i.e.:

before: ztmp= gzopen(tmpfname,"rb"); > error
after: ztmp=(FILE*)gzopen(tmpfname,"rb"); > ok (the cast is not implicit in C++)

The other problem was during the linking process with Dev-C++:

I had 4 linker errors:

[Linker error] undefined reference to `gzopen'
[Linker error] undefined reference to `gzeof'
[Linker error] undefined reference to `gzread'
[Linker error] undefined reference to `gzclose'
ld returned 1 exit status

To solve this problem, you have to tell the linker where libz.a is located; otherwise it cannot resolve the zlib functions.

(In Dev-C++ go to project>option>linker and add the required file.)

bye

d8o8s8

December 1st, 2008, 15:31

You go guys, great effort and good cause. (:
I'm gonna join you soon trying to kick the .bdc files.

szereshki

December 2nd, 2008, 02:13

You are absolutely right, dreamer. The buffer size is 1024, and everything after the first 1019 chars is cut. Maybe it is not a big problem in the C code. But since I, like some other guys, couldn't compile the code and went for patching the ASM instead, I don't know how to work around this. If only our melted guy (acidmelt) were here! And probably the busy guy - Bilbo - may have a suggestion.

antico2

December 2nd, 2008, 03:52

Hi,

If you want, I can help with the development of the full version of the program. Tell me what to modify and I can do it (I have everything installed on my PC..).

Bigal

December 2nd, 2008, 04:29

Quote:

[Originally Posted by antico2;78023]Hi,

if you want I can help in the development of the full version of program. Say me what to modify I can do it ( I've all installed on my pc..).

It would be great if you could help with that buffer, which is not big enough for all the characters. With some dictionaries an entry can be really big. It would also be great if you could produce a compiled version so that we could all test it. I had a lot of trouble trying to compile the sources, especially with ZLIB. In the end, after a lot of wasted time, I finally had to give up.

One more thing that would be great to have is decompiling of the BDC format.

Good luck and thanks a million.

antico2

December 2nd, 2008, 05:13

Quote:

[Originally Posted by szereshki;77490]The original bgl result is attached (Farsi with windows Arabic encoding). The converted one has all the letters, but with wrong encoding. The source and target languages in the glossary properties tab is selected as English although. Changing from English will result in a more defective output.
I also attached the changed binary file. It seems there is just one step more. Bilbo! Its your turn again!
(p.s. I also tried these on Babylon v5)

OK, I can make modifications to the file (but let me understand what to modify!!)
First of all, you can find a compiled version of it in the attached file of the post I've quoted. Try it, and afterwards we can talk about the modifications to make.

regards

Bigal

December 2nd, 2008, 05:29

Quote:

[Originally Posted by antico2;78025]Ok, I can make modification to the file ( but let me understand what to modify!! )
First of all you can find a compiled version of it in the atthached file I've quoted. Try it and after we speak about modification to do.

regards

Don't see any attachments. Am I missing something?

antico2

December 2nd, 2008, 05:32

Go to the main address of the forum: http://www.woodmann.com/forum/ and log in from there.

Bigal

December 2nd, 2008, 07:09

Quote:

[Originally Posted by antico2;78027]go to the main address of forum: http://www.woodmann.com/forum/ and make the login from there

I've done that but i still don't see your attachment

szereshki

December 2nd, 2008, 07:22

Bigal: Go directly to my post#25. I attached it there.
Antico2: How did you compile the C code? Any special compiler? Let us know about it. thx

d8o8s8

December 2nd, 2008, 08:23

I found a simple small decompiler using google at
http://tankado.com/?2008/06/21/281-babylon-bgl-decompiler
I checked and it worked with the older version of the BGL file I had (it didn't work for the new ones).
I also verified with filemon it doesn't mess around (still not taking responsibility, not my file).
Hope this helps.

------------------ EDIT: sorry, just realized its the same app as bglgls previously posted on this thread.

antico2

December 2nd, 2008, 08:28

Szereshki, I've used nothing special: simply DevC++ 4.9.9.2 and of course zlib 1.2.3, downloaded as a DevC++ package. The only things to remember are the cast problem (not implicit in C++, see my posts #35 and #36), the inclusion of ctype.h in the main file, and finally the linker problem, solved by pointing the DevC++ linker to the location (folder) of libz.a.
If it is useful I can post my entire DevC++ project.

regards

szereshki

December 3rd, 2008, 05:49

Thx antico2.
Has anybody tried to simply increase the buffer from 1024 up?

antico2

December 3rd, 2008, 17:57

That's OK.

I've taken the code of the post #11 and modified the buffer size to 2048 in:

//write gls

.
.
char tmpbuff[2048];
.
.

Test it and let me know...

regards

Bigal

December 4th, 2008, 03:05

Quote:

[Originally Posted by antico2;78059]That's OK.

I've taken the code of the post #11 and modified the buffer size to 2048 in:

//write gls

.
.
char tmpbuff[2048];
.
.

Test it and let me know...

regards

Thanks. However I am sure that won't be enough for many dictionaries. I remember I had to multiply the buffer size by 6 or 7 or even more for some dictionaries.

szereshki

December 4th, 2008, 15:35

Quote:

[Originally Posted by antico2;78059]That's OK.

I've taken the code of the post #11 and modified the buffer size to 2048 in:

//write gls

.
.
char tmpbuff[2048];
.
.

Test it and let me know...

regards

It doesn't change anything. How about also increasing the 1019 in the 'if' statement?
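
Why enlarging the array alone changes nothing can be shown with a small Python sketch. It assumes, as the bglgls source in this thread does, that a definition is preceded by a 2-byte big-endian length; read_definition and max_len are hypothetical names, and max_len plays the role of the 1019 clamp in the 'if' statement:

```python
import io

def read_definition(stream, max_len=None):
    """Read a definition stored as <2-byte big-endian length><bytes>.

    If max_len is given, at most that many bytes are returned and the
    rest of the definition is skipped, which mimics a fixed clamp.
    """
    raw = stream.read(2)
    if len(raw) < 2:
        raise EOFError("truncated length field")
    length = raw[0] * 256 + raw[1]
    take = length if max_len is None else min(length, max_len)
    data = stream.read(take)
    stream.seek(length - take, io.SEEK_CUR)  # skip what was not copied
    return data
```

With max_len=1019 a 1500-byte definition comes back cut to 1019 bytes no matter how large the destination buffer is; only dropping or raising the clamp returns all of it.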

szereshki

December 7th, 2008, 09:29

Quote:

[Originally Posted by antico2;78033]Szereshki I've used nothing of special, I've simply used DevC++ 4.9.9.2 and of course the zlib 1.2.3 downloaded as package from DevC++. The only think to remind is the casting problem ( not implicit in c++ see my post #35 and #36 ), the inclusion of ctype.h in the main and finally the linker problem solved by addressing DevC++ linker to the location ( folder ) of libz.a.
If can be useful I can post my entire DevC++ project.

regards

Could you plz post the DevC++ project? I have problems with the linker. Would you please tell us the details? Sorry.

Ulrezaj

December 7th, 2008, 16:30

Since I got most of my help from here, I figured I'd register and post what I've discovered for the benefit of all, to give something back

I'm working specifically on the japanese->english dictionary http://info.babylon.com/glossaries/4E9/Babylon_Japanese_English_dicti.BGL. My goal was to decompile it, extract all the data, then add additional entries from a different dictionary.

I don't know why (not a huge C person) but the code provided thus far misses a lot of entries, and, more importantly in my case, doesn't extract the alternate spellings from each record, which is critical for word recognition in Japanese. So I decided to write my own extractor (in Python, because it's awesome).

I don't know about other dictionaries, but in this one the record structure is:

header byte - record type/length byte as described earlier by Bilbo
length bytes - 1-2 bytes holding length of record
term length byte - byte holding length of term
term - the dictionary entry for this record
0x00 byte
unknown byte - never figured out what this does. I suspect it specifies the record contents, eg: has a definition, has alternate spelling, has a classification, etc
definition - term's definition, including html code and such
0x14 - separator byte (or end byte if definition was the last part of record)
0x02 - classification specifier - means a word type (noun, verb, etc) will follow
classification - in this case, was between 0x30 and 0x3b and was mapped in one of the 'id' records earlier in the dictionary
alternate spellings - separated by 0x## between 0x00 and 0x30 (seems arbitrary what the separator character is)

Note that the record length does not include the record header byte or the length bytes themselves.
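
The first two fields of this layout (header byte and length bytes) can be sketched in Python 3. The rule used here is the one the bglgls C source in this thread applies: the low nibble of the header byte is the record type; if the high nibble is 4 or more, the length is the high nibble minus 4, otherwise high nibble + 1 length bytes follow, big-endian. Treat that rule as an assumption taken from that code, not as a documented format; read_record is a hypothetical name:

```python
import io

def read_record(stream):
    """Return (record_type, payload) for the next record, or None at EOF."""
    first = stream.read(1)
    if not first:
        return None
    hdr = first[0]
    rec_type = hdr & 0x0F          # low nibble: record type
    high = hdr >> 4                # high nibble: how the length is stored
    if high >= 4:
        length = high - 4          # short record, length packed in header
    else:
        length = int.from_bytes(stream.read(high + 1), "big")
    # the length counts neither the header byte nor the length bytes
    return rec_type, stream.read(length)
```

Calling read_record in a loop until it returns None walks the whole un-gz'd file; type 1 payloads are the dictionary entries whose inner layout is listed above.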

Anyway, armed with this I created a quick and dirty program to parse it, and lo and behold, it works. The resulting file can be run through the Glossary builder and, at least as far as I've tested, appears to be identical to the original.

Code:

# Python 2 script (byte strings, print statement)
import traceback

styles = {48: "n", 49: "adj", 50: "v", 51: "adv", 52: "interj", 53: "pron",
          54: "prep", 55: "conj", 56: "suff", 57: "pref", 58: "art", 59: "aux"}

ps = open("u.gls","w")

# Header
ps.write("### Glossary title:Uru\n")
ps.write("### Author:Urudict\n")
ps.write("### Description:Urudict\n")
ps.write("### Source language:Japanese\n")
ps.write("### Source alphabet:Default\n")
ps.write("### Target language:English\n")
ps.write("### Target alphabet:Default\n")
ps.write("### Browsing enabled?No\n")
ps.write("### Type of glossary:00000000\n")
ps.write("### Case sensitive words?0\n\n")
# Glossary section
ps.write("### Glossary section:\n\n")

# un-gz'd dictionary (see earlier posts); binary mode so offsets match on Windows
r = open("Babylon_Japanese_English_dicti.txt", "rb")
e = open("entries.txt") # output of Bilbo's original code
for line in e.readlines():
    line = line.decode('utf-8')
    if '<entry>' not in line: continue
    line = line.split()
    offset = int(line[0][1:-1],16)
    entry = " ".join(line[2:-2])[1:-2]
    record = int(line[-2],16)
    spelling, type, defn, alts = "", "", "", ""

    try:
        r.seek(offset)
        bin = r.read(record)
        nib = int(hex(ord(bin[0]))[2])+1 # length of the 'length' header
        if len(hex(ord(bin[0]))) == 3: nib = 1 # 0x1 case
        bin += r.read(nib+1) # first byte and length headers aren't part of record length
        term = bin[nib+2:nib+2+ord(bin[nib+1])] # Get term from 2 bytes after nib + 'length'
        bin = bin[nib+4+ord(bin[nib+1]):] # discard up to and including term
        if bin.find("<I>") != -1: # check for spelling
            spelling = bin[bin.find("("):bin.find(")")+1] # extract spelling
            bin = bin[bin.find(")")+2:] # discard spelling if exists
        # at this point, bin should start with definition
        defn = bin[:bin.find('\x14')] # extract definition
        bin = bin[bin.find('\x14')+1:] # discard definition
        if bin and bin[0] == '\x02': # check for type
            type = styles[ord(bin[1])] # extract type
            bin = bin[2:] # discard type
        if bin: # if bin isn't empty, rest of record is alts
            alts = bin
            for c in [chr(k) for k in range(1,30)]:
                alts = alts.replace(c, '\x00')

        ps.write(term.decode("shift-jis").encode('utf-8')) # write term
        for k in alts.split('\x00'):
            if k: ps.write("|"+k.decode("shift-jis").encode('utf-8')) # write alts
        #ps.write("\n<font color='blue'>"+type+"</font> ") #type
        ps.write("\n"+type.encode('utf-8')) # write type
        if type: ps.write(". ")
        ps.write(spelling.decode("shift-jis").encode('utf-8'))
        if spelling: ps.write(" ")
        ps.write(defn.decode("shift-jis").encode('utf-8')+"\n\n") # definition
    except Exception:
        # dump the offending record and stop
        print hex(offset), hex(record)
        dmp = open("dmp.txt","w")
        r.seek(offset)
        dmp.write(r.read(record+nib+1))
        dmp.close()
        print traceback.format_exc()
        break

e.close()
r.close()
ps.close()

The code clearly isn't designed to be flexible or anything - I seriously just threw it together in an hour - but hopefully it might provide some insight as to how to go about making the perfect decompiler :P
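
For anyone who wants to reuse the idea outside this script, the part that comes after the term (definition, 0x14 separator, optional 0x02 + classification byte, alternate spellings) can be isolated in a few lines of Python 3. This is a sketch based only on the layout described in this post for this one dictionary; split_tail is a hypothetical name, and treating every byte from 0x00 to 0x1D as an alternate-spelling separator mirrors the range(1,30) replacement in the script:

```python
import re

# classification codes as mapped in the script above
STYLES = {48: "n", 49: "adj", 50: "v", 51: "adv", 52: "interj", 53: "pron",
          54: "prep", 55: "conj", 56: "suff", 57: "pref", 58: "art", 59: "aux"}

def split_tail(tail):
    """Split the bytes after the term into (definition, word_type, alts)."""
    defn, _sep, rest = tail.partition(b"\x14")   # definition ends at 0x14
    word_type = ""
    if rest[:1] == b"\x02" and len(rest) >= 2:   # 0x02 announces a word type
        word_type = STYLES.get(rest[1], "")
        rest = rest[2:]
    # whatever is left is a list of alternate spellings
    alts = [a for a in re.split(rb"[\x00-\x1d]", rest) if a]
    return defn, word_type, alts
```

The returned byte strings still have to be decoded (Shift-JIS in this dictionary) before being written to the GLS.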

antico2

December 7th, 2008, 19:35

Quote:

[Originally Posted by szereshki;78108]May you plz post theDevC++ project? I have problems with linker? would you please tell us the details? Sorry.

OK, attached you can find the DevC++ project I use.

I've also attached a compiled bglgls exe with a larger buffer.

good luck
p.s.
If you have problems with devc++ let me help you.

regards

szereshki

December 12th, 2008, 03:32

Quote:

[Originally Posted by antico2;78118]Ok, here attached you can find the devc++ project I use.

I have building problems:
Dev C++: [Build Error] [Progetto1.exe] Error 1
Borland C++: Error: 'C:\BC5\LIB\ZLIB.LIB' contains invalid OMF record, type 0x21

Quote:

[Originally Posted by antico2;78118]I've also atthached a compiled bglgls exe with the buffer more capable.

Doesn't work. I tried a big BGL (>9 MB). The previous bgl2gls works great but has the problem of cutting long definitions. This one, however, exports an incomplete 600 KB GLS and also doesn't delete the 50 MB temporary .dat file.

thx antico

szereshki

December 17th, 2008, 05:15

My compiler errors have been solved. Now it's much simpler to change parts of the code.
antico: you should change some of the other 1024s to 2048 or more.

Does anybody know the attributes of a bitmap that can be used in a GLS? E.g. 24-bit or 16-bit? 72 or 96? ...

szereshki

December 17th, 2008, 09:42

I increased all the 1024s and it worked. I also changed the character validation function to always return 1 (to allow characters other than English), except for the 1E and 1F characters (which are placed before and after a bitmap file name). I changed the target language and alphabet from English and Default to Arabic (in my case).
Now it can generate a GLS from my big BGL. Two problems still exist:
1- I tried hFarsi advanced version (a Persian dic, 9.38 MB) and the reproduced BGL is 4.78 MB. Some parts of the definitions are still cut. This problem is not related to the buffer size (try “forces” or “cut” to see).
2- Now it recognizes the bitmap file, but doesn't include it correctly in the BGL.

These point to a basic defect in the code. Compare the same two words (author name) from the original and reproduced BGLs:

antico2

December 17th, 2008, 17:18

Quote:

[Originally Posted by szereshki;78270]I increased all 1024s to more and it worked. I also changed the character validation function to always return 1 (for characters other than English) except for 1E and 1F characters (which are placed before and after a bitmap file name). I changed the Target language and alphabet from English and Default to Arabic (in my case).
Now it can generate a gls from my big bgl. To problems still exist:
1- I tried hFrasi advanced version (a Persian dic, 9.38mb) and the reproduced bgl is 4.78mb. Some part of definitions is still cut. This problem is not related to the buffer size. (try “forces” or “cut” to see).
2- Now it realizes the bitmap file, but doesn’t correctly include it in the bgl.

These represent a basic defect in the code. Compare two same words (author name) from the original and reproduced bgls:

Ok szereshki, I'm happy for your progress to solve compilation errors.
I think that we need the help of the original author of the code..

please post here the code you had modified so we can start to see it and think what's to do.

regards

szereshki

December 18th, 2008, 00:00

here is the code:

Code:



#include <stdio.h>
#include <stdlib.h>
#include <string.h> //strlen, strcpy, strcat, memset
#include <windows.h>
#include <ctype.h>
#include <zlib.h>
#pragma comment(lib,"zlib.lib")
#include "zlib.h"
#include "zconf.h"

//include namespace std
int isvalidchar(char ch);
void stripjunk(char *buffer,char type);
int focc(char *cstr,char ch);
int uncomp_bgl(char *bglname,char *datname);
int writegls(char *datname);

char glsheader[32768];
char glsheadertemplate[]=
"### Glossary title:%s\r\n"
"### Author:%s\r\n"
"### Description:%s\r\n"
"### Source language:English\r\n"
"### Source alphabet:Default\r\n"
"### Target language:Arabic\r\n"
"### Target alphabet:Arabic\r\n"
"### Browsing enabled?No\r\n"
"### Type of glossary:00000000\r\n"
"### Case sensitive words?0\r\n"
";gls generated by bglgls\r\n\r\n"
"### Glossary section:\r\n\r\n";

int main(int argc,char **argv) {
int ix;
char szAuth[32];
char szTitle[32];
char szDescription[128];
char datfname[128];

if(argc!=2) { 
	printf("usage: bglgls.exe filename.bgl\n"); 
	return 0; 
}
//>get input
printf("gls Author:");
fgets(szAuth,32,stdin);
printf("gls Title:");
fgets(szTitle,32,stdin);
printf("gls Description:");
fgets(szDescription,128,stdin);

szAuth[strlen(szAuth)-1]=0;
szTitle[strlen(szTitle)-1]=0;
szDescription[strlen(szDescription)-1]=0;
//the template expects title, author, description in this order
sprintf(glsheader,glsheadertemplate,szTitle,szAuth,szDescription);
//>set output filename
strncpy(datfname,argv[1],128);
ix=focc(datfname,'.');
if(ix<0) { printf("invalid filename\n"); return 0; }
datfname[ix]=0;
strcat(datfname,".dat");
//>>
if(!uncomp_bgl(argv[1],datfname)) { printf("error uncompressing BGL.\n"); return 0; }
if(!writegls(datfname)) { printf("error writing GLS.\n"); return 0; }
return 0;
}
//>>uncompression routine
int uncomp_bgl(char *bglname,char *datname) {
FILE *ztmp;
FILE *zfile;
gzFile gzin; //gzopen returns a gzFile, so no FILE* cast is needed
char iobuff[128];
char tmppath[256];
char tmpfname[256];
unsigned char zptrbyte;
int tread;

//get temp filename
GetTempPath(256,tmppath);
GetTempFileName(tmppath,"bgl",0,tmpfname);
ztmp=fopen(tmpfname,"wb");
if(!ztmp) return 0;
//>
zfile=fopen(bglname,"rb");
if(!zfile) return 0;
fseek(zfile,0x5,SEEK_SET);
fread(&zptrbyte,sizeof(char),1,zfile);
printf("zlib header@0x%X\n",zptrbyte);
fseek(zfile,zptrbyte,SEEK_SET);
while(!feof(zfile)) {
	tread=fread(iobuff,sizeof(char),128,zfile);
	fwrite(iobuff,sizeof(char),tread,ztmp);
}
fclose(zfile);
fclose(ztmp);
//>>uncompressing >
zfile=fopen(datname,"wb");
gzin=gzopen(tmpfname,"rb");
if(!zfile||!gzin) return 0;
while(!gzeof(gzin)) {
	tread=gzread(gzin,iobuff,128);
	if(tread<=0) break; //gzread returns -1 on error
	fwrite(iobuff,sizeof(char),tread,zfile);
}
gzclose(gzin);
fclose(zfile);
DeleteFile(tmpfname); //get rid of temporary file
return 1;
}
//write gls
int writegls(char *datname) {
FILE *fdic,*fgls;
int ix,rec_length;
int lenword; //int, not short: lenmul*256+lenadd can reach 65535
unsigned char hdr,high_nibble,lenbyte;
unsigned char lenmul,lenadd;
unsigned long datapos;
char tmpbuff[32768];
char glsf[256];
int tt=0,lt=0;

//gls filename
strcpy(glsf,datname);
ix=focc(glsf,'.');
glsf[ix]=0;
strcat(glsf,".gls");
printf("gls filename:%s\n",glsf);
fgls=fopen(glsf,"wb");
if(!fgls) return 0;
//>write header
printf("writing GLS");
fwrite(glsheader,sizeof(char),strlen(glsheader),fgls);
//>>parsing
fdic=fopen(datname,"rb");
if(!fdic) return 0;
while(1) {
	fread(&hdr,sizeof(char),1,fdic);
	if(feof(fdic)) break;

	//get record size
	high_nibble=hdr >> 4;
	if(high_nibble>=4) rec_length=high_nibble-4;
	else {
		for(ix=rec_length=0;ix<high_nibble+1;ix++) {
			rec_length*=256;
			fread(&lenbyte,sizeof(char),1,fdic);
			rec_length+=lenbyte;
		}
	}
	datapos=ftell(fdic);

	switch(hdr & 0xF) {
			case 1: {
			fread(&lenbyte,sizeof(char),1,fdic);
			memset(tmpbuff,0,32768);
			fread(tmpbuff,sizeof(char),lenbyte,fdic);
			if(!isalpha((unsigned char)tmpbuff[0])) break;
			stripjunk(tmpbuff,0);
			strcat(tmpbuff,"\r\n");
			fwrite(tmpbuff,sizeof(char),strlen(tmpbuff),fgls);
			fread(&lenmul,sizeof(char),1,fdic);
				fread(&lenadd,sizeof(char),1,fdic);
			memset(tmpbuff,0,32768);
			lenword=lenmul*256+lenadd;
			if(lenword>32608) lenword=32608;
			fread(tmpbuff,sizeof(char),lenword,fdic);
			stripjunk(tmpbuff,1);
			strcat(tmpbuff,"\r\n\r\n");
			fwrite(tmpbuff,sizeof(char),strlen(tmpbuff),fgls);
			if(tt-100==lt) { lt=tt; printf("."); }
			tt++;
			} break;
			default: break;
	}
	fseek(fdic,datapos+rec_length,SEEK_SET);
}
fclose(fdic);
fclose(fgls);	
DeleteFile(datname); //we dont need the *.dat anymore..
printf("%d terms written to file!\n",tt);
return 1;
}
//find occurrence
int focc(char *cstr,char ch) { 
int ix;
for(ix=0;(unsigned)ix<strlen(cstr);ix++)
	if(cstr[ix]==ch) return ix;
return -1;
}
//>
void stripjunk(char *buffer,char type) {
int ix,slen;
slen=strlen(buffer);

if(!type) {
	for(ix=1;ix<slen;ix++)
		if(buffer[ix]=='$') { buffer[ix]=0; break; }
	slen=ix;
}	
for(ix=0;ix<slen;ix++) 
	if(!isvalidchar(buffer[ix])) { buffer[ix]=0; break; }
}
//valid term/definition char
int isvalidchar(char ch) {
    if (ch==30 ||ch==31) return 0;
    return 1;//I didn't delete the old code here, but u can.
	int ix;
	char valtab[]="abcdefghijklmnopqrstuvwxyz 0123456789!@#$%&8()_-+=|{}[]<>\"',.%%/\\:;!?";
	ch=tolower(ch);
	for(ix=0;(unsigned)ix<strlen(valtab);ix++)
		if(ch==valtab[ix]) return 1;
return 0;
}
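
For reference, the unpacking step of the C code above (read the byte at offset 5, treat it as the offset of the gzip stream, inflate everything from there) might look like this in Python 3. It is only a sketch under the same assumption the C code makes about the header; unpack_bgl is a hypothetical name, and ignoring any bytes after the end of the gzip stream is my own choice, not something the C code does explicitly:

```python
import zlib

def unpack_bgl(raw):
    """Return the inflated record stream of a BGL file given as bytes."""
    if len(raw) < 6:
        raise ValueError("file too short to contain the offset byte")
    offset = raw[5]                       # same byte the C code reads
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header/trailer
    inflater = zlib.decompressobj(16 + zlib.MAX_WBITS)
    return inflater.decompress(raw[offset:])
```

Typical use would be unpack_bgl(open("some.bgl", "rb").read()), with the result then parsed record by record instead of being written to a temporary .dat file.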

peterparler

December 29th, 2008, 04:08

hi, I found this site that links to a project, StarDict ("http://stardict.sourceforge.net/download.php"), a sort of open source dictionary. In the source code ("http://sourceforge.net/project/showfiles.php?group_id=146506&package_id=221578") there is something that may interest you.

peterparler

December 29th, 2008, 04:18

Quote:

[Originally Posted by peterparler;78395]hi, I found this site that links to a project, StarDict ("http://stardict.sourceforge.net/download.php"). sort of open source dictionary. in the source code ("http://sourceforge.net/project/showfiles.php?group_id=146506&package_id=221578"), there is something that may interesting you

brilliant! I didn't post the urls... here they are
StarDict: http://stardict.sourceforge.net/download.php
code: http://sourceforge.net/project/showfiles.php?group_id=146506&package_id=221578

bilbo

December 29th, 2008, 07:01

I'll never stop being astonished at all the nice things we can dig out from the net... Nice find, peterparler, and a nice C++ implementation (if the C one was too garbled :-)). I also like the Python implementation by Ulrezaj, even if it is not yet a stand-alone application...
Best regards, bilbo

naides

January 1st, 2009, 09:25

Bilbo: Long time no see

dlldlldll

January 15th, 2009, 15:06

Please share the source code.

thanks, is very good

dlldlldll

January 16th, 2009, 00:20

After many tests it has evolved a bit, but it still has many errors. I am sending the source code in Delphi; if someone wants to improve it or add new features, feel free, but do not forget to share.

It will be great to have the dictionaries of babylon running on Linux and other OS.

Thank you all.
http://rapidshare.de/files/41722761/Unpack_BGL_-_Proj_Delphi_v1.0.7z.html

szereshki

January 17th, 2009, 04:46

Great KJWLCSLC! I'll test it with some Arabic, Persian and Unicode BGLs soon. I hope it works.

dlldlldll

January 17th, 2009, 12:58

Fix more errors

- working with unicode
- remove $xxxxxx$ from terms
- save attached files
- working with premium content
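
As an illustration of the "$xxxxxx$" item: assuming these markers are simply a run of characters enclosed between two dollar signs inside a term (the thread does not document what they mean), they could be removed like this in Python. strip_dollar_tags is a hypothetical name; note that the C stripjunk routine posted earlier takes a blunter approach and cuts the term at the first '$':

```python
import re

_DOLLAR_TAG = re.compile(r"\$[^$]*\$")   # "$", anything but "$", "$"

def strip_dollar_tags(term):
    """Remove every $...$ marker from a term; a lone '$' is left alone."""
    return _DOLLAR_TAG.sub("", term)
```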

http://i39.tinypic.com/9r3m7m.jpg

http://rapidshare.de/files/42101483/Unpack_BGL_1.2.7z.html

szereshki

January 18th, 2009, 04:40

Quote:

[Originally Posted by KJWLCSLC;78754]Thank you dear dlldlldll!
1)Sorry about sharing the source code right here but if you want to have the source code I will mail it to you after you mail me your e-mail address to BGLCONVERT@GMAIL.COM .

2)I tested it with many BGLs but there was not any errors.

Thx KJWLCSLC. But why not a GLS output? what could I do with a 50mb html result!?
could I take a look at your code plz?
Thanks anyway. great work.

szereshki

January 22nd, 2009, 06:07

I'm using Babylon Glossary Builder v3.1.0(r10) and there is no option to import a html file. the only accepted file formats are gpr, gls or xls. would u plz guide me?

dlldlldll

January 29th, 2009, 12:11

Fix more errors

- Save to Firebird Database

http://rapidshare.de/files/44176320/Unpack_BGL_1.3.7z.html

leomoon

February 3rd, 2009, 04:39

Anybody knows how to extract the exe glossaries to get the BGL files out? An example would be the Concise Oxford English Dictionary and Thesaurus:
http://www.babylon.com//display.php?id=227&tree=5&level=3

Update: Sorry found the answer. When you run the exe, it extracts it into the temp folder. Then you can go and phish for it.

szereshki

February 4th, 2009, 02:09

Quote:

[Originally Posted by szereshki;78852]I'm using Babylon Glossary Builder v3.1.0(r10) and there is no option to import a html file. the only accepted file formats are gpr, gls or xls. would u plz guide me?

Anyone would answer me?

Quote:

[Originally Posted by dlldlldll;79024]Fix more errors

- Save to Firebird Database

http://rapidshare.de/files/44176320/Unpack_BGL_1.3.7z.html

Thanks dlldlldll. I applied it on hFarsi advanced version. but babylon builder crashes while using it. may u try it?

cysin

February 14th, 2009, 23:51

Quote:

[Originally Posted by dlldlldll;79024]Fix more errors

- Save to Firebird Database

http://rapidshare.de/files/44176320/Unpack_BGL_1.3.7z.html

Thank dlldlldll for your great work.

I am using it extracting English-Chinese dictionary, but it seems I can only extract the English words and its Chinese meaning. How can I get phonetic symbols and more information？

vragon

February 21st, 2009, 18:53

hi,
I attempted to use the unpack BGL program to decrypt the BGL file but it said that "realpoint not found"
so what does it mean???
and how do I fix it
regards,

cousinitt

March 28th, 2009, 13:37

I'm trying to convert the babylon german-english (http://www.babylon.com/dictionary/2522/Babylon-German-English.html) dictionary to stardict format so I can use it on my n810.

The problem is that when I use unpackbgl to transform the dictionary to GLS, the gender information is lost, and this is very important for German.

The question: Is there any way to retrieve the gender information? And maybe the correct plural form?

Looking at the raw dat I saw that there are bits of text like "Frau (die) -Frauen" so I'm assuming the information is there.

dlldlldll

September 3rd, 2009, 08:29

http://rapidshare.de/files/48273276/Unpack_BGL_1.4_-_Proj_Delphi.rar.html

- Source code
- support sqlite3 database

cli.iface

September 18th, 2009, 13:52

I've to say thanks for such effort, dictionaries are essential when we look for information

The babylon host a lot of "free" dictionaries that would be very interesting if they could be used in unix-like env too

Once again, thanks.

That's an amazing reverser work

BabylonBob

October 13th, 2009, 20:41

Quote:

[Originally Posted by dlldlldll;82768]http://rapidshare.de/files/48273276/Unpack_BGL_1.4_-_Proj_Delphi.rar.html

- Source code
- support sqlite3 database

Hey dlldlldll, I just bumped into this page and downloaded your software. However, the exe doesn't start at all (Sending reports) ... Did I miss something in the readme, or how is it intended to be used?

Quote:

[Originally Posted by bilbo;77321]It is a long time I'm not using Babylon and it is become ever and ever more commercial...

Out of curiosity, what do you use if not Babylon?

Btw, can we freely decompile the aspell dicts, if the BGLs are so problematic?

BabylonBob

October 14th, 2009, 19:08

P.S. Just to mention, I downloaded the previous version (1.3.7z), and it seems to run well with unicode characters. So, which features the newest 1.4 version has will remain a secret forever.

BabylonBob

November 21st, 2009, 12:26

P.S2. Sorry to inform you, but apparently with 1.3.7 unicode is not fully supported (e.g. some unicode turkish characters). So, do you know whats wrong with 1.4 ?

olegt

November 23rd, 2009, 01:30

Hello, everybody!!!
I've used your prog and it's amazing!!! Thank you !!!
I'm wondering if such work could be done in Java?
I would like to implement it, but the problem is I don't really understand the BGL format.
Any suggestions/thoughts are welcome.

Thanks in advance.

cysin

December 11th, 2009, 01:44

Looks like the 1.4 version doesn't work properly: it can't be started at all. Any idea about this?

maya

January 17th, 2010, 10:41

don't have any fix on 1.4? thanks

glikoz

January 19th, 2010, 16:59

we are waiting fix

or .net port of program ..
Thx for endeavor ..

dELTA

January 19th, 2010, 17:57

I am waiting blowjob by playboy bunnies
or threesome with the same ..
thx for endeavor ..

dlldlldll

May 3rd, 2010, 15:47

Unpack.BGL.v1.4.1.0
http://www.multiupload.com/FN97JZQBCU

Sorry for the delay, I am very busy.

jatvarthur

May 29th, 2010, 08:44

Quote:

[Originally Posted by dlldlldll;86400]Unpack.BGL.v1.4.1.0
http://www.multiupload.com/FN97JZQBCU

dlldlldll, is it possible to get source for this? thanks.

PPCC

November 21st, 2010, 12:34

Quote:

[Originally Posted by antico2;78118]Ok, here attached you can find the devc++ project I use.

I've also atthached a compiled bglgls exe with the buffer more capable.

good luck
p.s.
If you have problems with devc++ let me help you.

regards

How can I download bglgls3.zip? Thanks! >> It's OK now. Sorry.

PPCC

November 21st, 2010, 12:49

Quote:

[Originally Posted by jatvarthur;86671]dlldlldll, is it possible to get source for this? thanks.

Unpack.BGL.v1.4.1.0 works quite well for me, but it fails on some files, such as this BGL file:

Code:

http://www.4shared.com/file/ikJChutb/Business_EV_2008_release_01.html

Please help me.

-------

And the other problem: I need to convert a BDC (Babylon Dictionary File) to GLS too. For example,

Code:

cambridge_advanced_learners_dictionary_2nd_ed.rar 

(6.75 MB, BDC inside)

http://www.mediafire.com/?2iyls1hjlmo

That is because I would like to use it in BGL format in GoldenDict, but I can't find a BGL version of this glossary.

Thanks!

PPCC

November 24th, 2010, 08:53

Quote:

[Originally Posted by PPCC;88312]Unpack.BGL.v1.4.1.0 works quite well for me, but it fails on some files, such as this BGL file:

Code:
http://www.4shared.com/file/ikJChutb/Business_EV_2008_release_01.html

Please help me.

-------
...
Thanks!

I was able to solve the first problem with the help of stardict-tools on Linux.

Zeper

November 28th, 2010, 22:54

Quote:

[Originally Posted by PPCC;88312]Unpack.BGL.v1.4.1.0 works quite well for me, but it fails on some files, such as this BGL file:

Code:
http://www.4shared.com/file/ikJChutb/Business_EV_2008_release_01.html

Please help me.

-------

And the other problem: I need to convert a BDC (Babylon Dictionary File) to GLS too. For example,

Code:
cambridge_advanced_learners_dictionary_2nd_ed.rar
(6.75 MB, BDC inside)
http://www.mediafire.com/?2iyls1hjlmo

That is because I would like to use it in BGL format in GoldenDict, but I can't find a BGL version of this glossary.

Thanks!

Where did you download Unpack.BGL.v1.4.1.0?
Could you give me the download address?
Thanks.

PPCC

November 29th, 2010, 04:54

Quote:

[Originally Posted by Zeper;88408]Where did you download Unpack.BGL.v1.4.1.0?
Could you give me the download address?
Thanks.

Here is the MU (Megaupload) link copied from the multiupload page above:

Code:

http://www.megaupload.com/?d=CFDK4ZKD

dlldlldll

January 15th, 2011, 08:18

http://www.interupload.com/files/P4OIG9KS/Unpack_BGL_1.4.2.0.rar_links

seyed_farid

April 8th, 2011, 09:02

Would you mind sharing the Delphi source code of this program?
Thanks a lot for your attention.

altay

October 11th, 2011, 03:05

Quote:

[Originally Posted by dlldlldll;89099]http://www.interupload.com/files/P4OIG9KS/Unpack_BGL_1.4.2.0.rar_links

All links from interupload are dead. If anyone has this version, please upload it. Thanks.

giang_asl_8

January 5th, 2012, 00:55

Quote:

[Originally Posted by dlldlldll;89099]http://www.interupload.com/files/P4OIG9KS/Unpack_BGL_1.4.2.0.rar_links

Could you share the source code for this version?

altay

January 5th, 2012, 08:10

Quote:

[Originally Posted by giang_asl_8;91689]Could you share the source code for this version?

Thanks. It seems meanwhile dlldlldll reuploaded the file to some of those hosts.

giang_asl_8

January 9th, 2012, 12:34

Quote:

[Originally Posted by altay;91692]Thanks. It seems meanwhile dlldlldll reuploaded the file to some of those hosts.

Yes, the links are alive, but they contain only the binary, not the source code. Could anyone share the source code with me (any version is OK)?
Thanks.

Jackob

February 29th, 2012, 06:26

Would somebody please reupload:

Unpack.BGL.v1.4.2.0

Thanks

Jackob

March 2nd, 2012, 08:43

I found Unpack_BGL_1.4.1.0

But when I extract dictionaries in Greek or Hebrew, the characters don't display properly.

I get something like this:

buffet supper
�ّه�ن ل��ّهْ ٍِ�� {ل�وًه�}

If someone has Unpack_BGL_1.4.2.0, please reupload it.

Help please.

Thanks

douglascm

March 18th, 2012, 01:32

I think maybe it is an encoding problem with your text editor, not a problem with Unpack_BGL. Try to change the encoding to UTF8 or some Greek/Hebrew encodings.
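To illustrate the encoding suggestion, here is a minimal Python sketch. It is not part of Unpack_BGL; the helper names and the list of candidate code pages (cp1255 and ISO 8859-8 for Hebrew, cp1253 and ISO 8859-7 for Greek) are assumptions chosen for illustration. It decodes the raw bytes of an extracted file with each candidate so the results can be compared by eye, and then converts the bytes to UTF-8 once the right source encoding is known.

```python
# Hypothetical helpers for diagnosing garbled Greek/Hebrew text.
# The candidate list is an assumption; extend it for other languages.
CANDIDATES = ["utf-8", "cp1255", "iso8859_8", "cp1253", "iso8859_7"]


def decode_candidates(raw, encodings=CANDIDATES):
    """Return {encoding: text} for every candidate that decodes raw without error.

    A successful decode does not prove the encoding is correct; single-byte
    code pages accept almost any input, so the results must be inspected.
    """
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            # This candidate cannot represent the byte sequence; skip it.
            continue
    return results


def reencode_to_utf8(raw, source_encoding):
    """Convert bytes from a known legacy encoding to UTF-8 bytes."""
    return raw.decode(source_encoding).encode("utf-8")
```

In practice one would read the extracted file in binary mode (`open(path, "rb").read()`), print the output of `decode_candidates` for a short sample, pick the encoding whose text looks right, and write the result of `reencode_to_utf8` to a new file.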

Anyway, I downloaded v1.4.2.0 before that site went down. This is the original archive. Thanks dlldlldll for his great work!

http://depositfiles.com/files/ephugnuo8
https://hotfile.com/dl/149783679/753f80d/Unpack_BGL_1.4.2.0.rar.html

I am also wondering how to convert BDC (Babylon Dictionary File) to GLS. If someone knows, please give me some advice. Thanks!

Quote:

[Originally Posted by Jackob;91993]I found Unpack_BGL_1.4.1.0

But when I extract dictionaries in Greek or Hebrew, the characters don't display properly.

I get something like this:

buffet supper
�ّه�ن ل��ّهْ ٍِ�� {ل�وًه�}

If someone has Unpack_BGL_1.4.2.0, please reupload it.

Help please.

Thanks

jerryim

March 22nd, 2012, 07:25

Quote:

[Originally Posted by douglascm;92080]I think maybe it is an encoding problem with your text editor, not a problem with Unpack_BGL. Try to change the encoding to UTF8 or some Greek/Hebrew encodings.

Anyway, I downloaded v1.4.2.0 before that site went down. This is the original archive. Thanks dlldlldll for his great work!

http://depositfiles.com/files/ephugnuo8
https://hotfile.com/dl/149783679/753f80d/Unpack_BGL_1.4.2.0.rar.html

I am also wondering how to convert BDC (Babylon Dictionary File) to GLS. If someone knows, please give me some advice. Thanks!

Thank you very much. Have a nice day.

Jackob

March 23rd, 2012, 17:33

Thank you very much for your kindness.

Unfortunately, I still couldn't view the Hebrew text from the Hebrew dictionary.

Thanks

Quote:

[Originally Posted by douglascm;92080]I think maybe it is an encoding problem with your text editor, not a problem with Unpack_BGL. Try to change the encoding to UTF8 or some Greek/Hebrew encodings.

Anyway, I downloaded v1.4.2.0 before that site went down. This is the original archive. Thanks dlldlldll for his great work!

http://depositfiles.com/files/ephugnuo8
https://hotfile.com/dl/149783679/753f80d/Unpack_BGL_1.4.2.0.rar.html

I am also wondering how to convert BDC (Babylon Dictionary File) to GLS. If someone knows, please give me some advice. Thanks!

aryaei

December 5th, 2012, 18:03

Hi,
I am trying to add BGL support to my dictionary application, which is written in WPF and C#,

but I don't have much knowledge of C++. Can anyone help me with this and give me some pointers or some kind of help (source code, perhaps)? Thanks, that would be great.

aryaei

December 9th, 2012, 04:37

Nothing? Could someone at least give me C++ code that works with any BGL file and any language?

altay

June 24th, 2013, 10:55

Hello,
I still have dlldlldll's Delphi project for v1.4. The executable doesn't work.
http://netload.in/dateiF2igS59xMN/UnpackBGL1.4.7z.htm