Segfault
January 12th, 2011, 13:34
Greetings all!
I recently came across a packer I'd like to classify. Actually I came across some packed data while hex editing a BIOS rom for a HP laptop. I managed to locate a document that weakly describes the algorithm, but it doesn't mention any names, and only documents the decompression routine. I'd like to now the name of this algorithm, and if there are any (open-source) packers and unpackers available for it. The unpacking routine goes like this:
Suppose you have two buffers, one filled with compressed data and another empty one for uncompressed data. The compressed data is structured like this: There is always an instruction byte followed by at least 8 more data bytes. This pattern repeats throughout the compressed data...
[instruction byte][at least 8 bytes of data][instruction byte][at least 8 bytes of data][instruction byte][at least 8 bytes of data] etc...
If the instruction byte is 0xFF it means that the operation is a full copy operation. All eight bytes following the instruction are copied to the output buffer. The other possibility is that the instruction is not 0xFF. Consider an instructon byte of 0xF3. If we convert 0xF3 to binary we get 11110011. For each "set" bit (1) the code copies one data byte to the output buffer (it appends it at the end). If we now look back at the previous 0xFF instruction, and convert it to binary, we see that it consists of 11111111. This is eight set bits, meaning that all eight bytes get copied.
The code reads these bits in reverse order. For instruction 0xF3 we get 11110011 in binary, and reversing that gives us 11001111. So what happens when a bit is "not set" (0)? Now things are getting interesting. For each bit that is not set, the code will read two bytes of packed data at the appropriate location. This means the number of bytes that follow an instruction byte is always one byte for each set bit plus two bytes for each unset bit in the instruction byte (unless we hit an EOF). The two bytes read are then processed like this:
The code swaps the bytes (i.e. AABB would end up BBAA), and splits them at 12 bits. We get one 12-bit number and another 4-bit one. The first number is the offset i.e. how many bytes back from the write pointer in the output buffer does the code need to set the read pointer in order to copy [3 + (the second number)] of bytes to the location of the output buffer write pointer. It must always copy at least 3 bytes. By doing this the uncompressed data buffer actually acts like a wordlist.
Example of compressed data:
FF 00 11 22 33 44 55 66 77 FF 88 99 AA BB CC DD EE FF 1B 12 34 04 01 56 78 EOF
How it works:
The first instruction is 0xFF - the code copies all 8 bytes to the output buffer, which looks like this:
00 11 22 33 44 55 66 77
Next operation is another copy operation. The output buffer is now:
00 11 22 33 44 55 66 77 88 99 AA BB CC DD EE FF
Now we hit an instruction of 0x1B (binary 00011011, reverse 11011000). The first two bits in reverse order are set. This means the first two bytes (0x12 and 0x34) are copied to output. The buffer is now:
00 11 22 33 44 55 66 77 88 99 AA BB CC DD EE FF 12 34
Next, we hit an unset bit. We now read two bytes (0x04 and 0x01). If we reverse 0401 we get 0104. The code then splits this into 12:4 bit ratio: 010 and 4 - we get 0x10 and 0x04. The first number (0x10) is an offset that means 16 bytes backwards in the uncompressed code from the current write location. The second number indicates the number of bytes to copy + 3 = 7. So let's go 16 bytes backwards and select 7 bytes at that location:
00 11 22 33 44 55 66 77 88 99 AA BB CC DD EE FF 12 34
The selected bytes are now copied at the end. The new output buffer now looks like:
00 11 22 33 44 55 66 77 88 99 AA BB CC DD EE FF 12 34 22 33 44 55 66 77 88
Finally, there are two more set bits - the last two bytes are copied from compressed data to output buffer. There are still some bits left, but since we hit and EOF in the compressed section it means we have finished unpacking. The final uncompressed data is now:
00 11 22 33 44 55 66 77 88 99 AA BB CC DD EE FF 12 34 22 33 44 55 66 77 88 56 78
So now my questions to you:
What is the name of this routine?
Is there any existing code available to pack and unpack data using this algorithm?
Thanks in advance!
I recently came across a packer I'd like to classify. Actually I came across some packed data while hex editing a BIOS rom for a HP laptop. I managed to locate a document that weakly describes the algorithm, but it doesn't mention any names, and only documents the decompression routine. I'd like to now the name of this algorithm, and if there are any (open-source) packers and unpackers available for it. The unpacking routine goes like this:
Suppose you have two buffers, one filled with compressed data and another empty one for uncompressed data. The compressed data is structured like this: There is always an instruction byte followed by at least 8 more data bytes. This pattern repeats throughout the compressed data...
[instruction byte][at least 8 bytes of data][instruction byte][at least 8 bytes of data][instruction byte][at least 8 bytes of data] etc...
If the instruction byte is 0xFF it means that the operation is a full copy operation. All eight bytes following the instruction are copied to the output buffer. The other possibility is that the instruction is not 0xFF. Consider an instructon byte of 0xF3. If we convert 0xF3 to binary we get 11110011. For each "set" bit (1) the code copies one data byte to the output buffer (it appends it at the end). If we now look back at the previous 0xFF instruction, and convert it to binary, we see that it consists of 11111111. This is eight set bits, meaning that all eight bytes get copied.
The code reads these bits in reverse order. For instruction 0xF3 we get 11110011 in binary, and reversing that gives us 11001111. So what happens when a bit is "not set" (0)? Now things are getting interesting. For each bit that is not set, the code will read two bytes of packed data at the appropriate location. This means the number of bytes that follow an instruction byte is always one byte for each set bit plus two bytes for each unset bit in the instruction byte (unless we hit an EOF). The two bytes read are then processed like this:
The code swaps the bytes (i.e. AABB would end up BBAA), and splits them at 12 bits. We get one 12-bit number and another 4-bit one. The first number is the offset i.e. how many bytes back from the write pointer in the output buffer does the code need to set the read pointer in order to copy [3 + (the second number)] of bytes to the location of the output buffer write pointer. It must always copy at least 3 bytes. By doing this the uncompressed data buffer actually acts like a wordlist.
Example of compressed data:
FF 00 11 22 33 44 55 66 77 FF 88 99 AA BB CC DD EE FF 1B 12 34 04 01 56 78 EOF
How it works:
The first instruction is 0xFF - the code copies all 8 bytes to the output buffer, which looks like this:
00 11 22 33 44 55 66 77
Next operation is another copy operation. The output buffer is now:
00 11 22 33 44 55 66 77 88 99 AA BB CC DD EE FF
Now we hit an instruction of 0x1B (binary 00011011, reverse 11011000). The first two bits in reverse order are set. This means the first two bytes (0x12 and 0x34) are copied to output. The buffer is now:
00 11 22 33 44 55 66 77 88 99 AA BB CC DD EE FF 12 34
Next, we hit an unset bit. We now read two bytes (0x04 and 0x01). If we reverse 0401 we get 0104. The code then splits this into 12:4 bit ratio: 010 and 4 - we get 0x10 and 0x04. The first number (0x10) is an offset that means 16 bytes backwards in the uncompressed code from the current write location. The second number indicates the number of bytes to copy + 3 = 7. So let's go 16 bytes backwards and select 7 bytes at that location:
00 11 22 33 44 55 66 77 88 99 AA BB CC DD EE FF 12 34
The selected bytes are now copied at the end. The new output buffer now looks like:
00 11 22 33 44 55 66 77 88 99 AA BB CC DD EE FF 12 34 22 33 44 55 66 77 88
Finally, there are two more set bits - the last two bytes are copied from compressed data to output buffer. There are still some bits left, but since we hit and EOF in the compressed section it means we have finished unpacking. The final uncompressed data is now:
00 11 22 33 44 55 66 77 88 99 AA BB CC DD EE FF 12 34 22 33 44 55 66 77 88 56 78
So now my questions to you:
What is the name of this routine?
Is there any existing code available to pack and unpack data using this algorithm?
Thanks in advance!