LLXX
March 13th, 2006, 06:32
Not a reversing question, but a programming one. ("Advanced reversing and programming" so I believe it goes here.)
I've written a base-85 encoder and decoder (http://en.wikipedia.org/wiki/ascii85) for a project. However, one of them does not work correctly for some inputs. I've done the arithmetic by hand, and the cause seems to be a flaw in how the official Ascii85 specification handles partial blocks (base85 encodes 4-byte blocks into 5 bytes): it truncates the output. One example I ran into was encoding the three-letter string "abt":
61 62 74 ; original string
61627400 ; pad with zero and turn into a dword, as per specification
01254ca8 ; quotient after division by 85 (55h) remainder: 38h
00037359 ; quotient after division by 85 (55h) remainder: 1bh
00000a64 ; quotient after division by 85 (55h) remainder: 25h
0000001f ; quotient after division by 85 (55h) remainder: 19h
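For anyone who wants to reproduce the division steps above, here is a minimal Python sketch. The function name `base85_digits` is my own and is not taken from the specification; it only does the zero-padding and repeated division by 85, nothing else.

```python
def base85_digits(block: bytes) -> list[int]:
    """Return the five base-85 digits (most significant first) of a
    1-4 byte block, zero-padded on the right to a 32-bit dword."""
    if not 1 <= len(block) <= 4:
        raise ValueError("block must be 1 to 4 bytes")
    value = int.from_bytes(block.ljust(4, b"\x00"), "big")
    digits = []
    for _ in range(5):
        value, remainder = divmod(value, 85)
        digits.append(remainder)      # least significant digit comes out first
    return digits[::-1]               # reorder to most significant first

print([f"{d:02x}" for d in base85_digits(bytes.fromhex("616274"))])
# ['1f', '19', '25', '1b', '38']
```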
The "digits" of the base-85 number are then 1f,19,25,1b,38, but according to the specification, since I'm encoding a partial block, only the first 4 are used and the last one is dropped, giving 1f,19,25,1b. However, when I try to decode this, it is padded with nulls to become 1f,19,25,1b,00, and the decoding process goes:
1f*55 + 19 = a64
a64*55 + 25 = 37359
37359*55 + 1b = 1254ca8
1254ca8*55 + 0 = 616273c8
Output: 61 62 73
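The same decoding steps as a small Python sketch, assuming the decoder pads the missing digits with zero as described. `decode_zero_padded` is a made-up name for illustration only.

```python
def decode_zero_padded(digits: list[int], n_bytes: int) -> bytes:
    """Decode a truncated group by padding the missing digits with 0,
    then keep the first n_bytes bytes of the resulting dword."""
    value = 0
    for d in digits + [0] * (5 - len(digits)):
        value = value * 85 + d        # same multiply-and-add as the steps above
    return value.to_bytes(4, "big")[:n_bytes]

print(decode_zero_padded([0x1f, 0x19, 0x25, 0x1b], 3).hex(" "))
# 61 62 73
```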
Notice the original string was 61 62 74, while this is 61 62 73. Given the c8 at the end, perhaps it is necessary to round the output before taking the first 3 bytes? This is *not* mentioned at all in the official specification (which only details the encoding process, and only says that "decoding is the inverse of encoding"). I don't understand why a lossless encoding such as this would just drop the last byte of the output: information is being lost there. I think the same thing can also happen with 1- and 2-byte partial blocks.
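To check the 1- and 2-byte cases, a quick brute-force sketch in Python. It assumes the same scheme as above (encoder keeps the top n+1 digits, decoder pads the dropped digits with zero); `roundtrip_fails` is just a name I picked.

```python
def roundtrip_fails(block: bytes) -> bool:
    """True if truncating on encode and zero-padding on decode
    does not give back the original 1-3 byte block."""
    n = len(block)
    value = int.from_bytes(block.ljust(4, b"\x00"), "big")
    # Encoder: keep only the top n+1 of the five base-85 digits.
    kept = value // 85 ** (4 - n)
    # Decoder: zero-pad the dropped digits, then keep the first n bytes.
    decoded = (kept * 85 ** (4 - n)).to_bytes(4, "big")[:n]
    return decoded != block

for n in (1, 2):
    failures = sum(roundtrip_fails(i.to_bytes(n, "big")) for i in range(256 ** n))
    print(f"{n}-byte blocks: {failures} of {256 ** n} fail to round-trip")
```

The 3-byte case is left out of the loop only because 16 million iterations is slow in plain Python; the function itself accepts 3-byte blocks.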
From searching the Internet for the source code of other base85 decoders, I've found three variations:
1. Ignore the extra bytes completely when outputting partial blocks
2. Add a power of 85 corresponding to the length of the partial block (length=3, add 85; length=2, add 85^2; length=1, add 85^3)
3. Add 128 to the byte preceding the last one to be output
I assume only one of those is correct. My decoder currently follows (1) and doesn't seem to work, so it's either (2) or (3).
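A rough way to try the three variations on the 61 62 74 example. This is only a sketch: `decode_variation` is a name I made up, and for (3) I'm assuming "add 128" means adding 80h to the first byte that gets discarded, which may not be what those decoders actually do. It tests a single input, so it says nothing about whether a variation is right in general.

```python
def decode_variation(digits: list[int], n_bytes: int, variation: int) -> bytes:
    """Decode the n_bytes+1 kept digits of a partial block using one of
    the three variations listed above (1, 2 or 3)."""
    value = 0
    for d in digits + [0] * (5 - len(digits)):    # zero-pad to five digits
        value = value * 85 + d
    if variation == 2:
        # length=3 -> add 85, length=2 -> add 85^2, length=1 -> add 85^3
        value += 85 ** (4 - n_bytes)
    elif variation == 3:
        # Assumed reading: add 80h to the first byte that is thrown away.
        value += 0x80 << (8 * (3 - n_bytes))
    return value.to_bytes(4, "big")[:n_bytes]

kept = [0x1f, 0x19, 0x25, 0x1b]
for v in (1, 2, 3):
    print(v, decode_variation(kept, 3, v).hex(" "))
# 1 61 62 73
# 2 61 62 74
# 3 61 62 74
```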
Another possibility is that my encoder is wrong, as incrementing the last byte output may solve the problem (but introduce another one in the process).
Any thoughts or comments on this?