Disassembling Chip8 ROM but cannot seem to separate code from data
Pingcrosby
December 20th, 2011, 07:54
Hi
I hope this is the correct place to post on this forum.
I am currently completing work on a Chip8 disassembler; however, I am having difficulty separating code from data. I am hoping somebody can shed some light on techniques that would let me distinguish code from data.
Chip8 instructions are 2 bytes wide, so currently I just step 2 bytes at a time and dump the file according to the opcodes I read.
E.g. if my simplified source assembly looks like
JP START
DB 'Hello word hope your ok..etc"
START:
HIGH
and my disassembly looks like (please note this is a trivialised example)
0x200 JMP 209 ;; correct
0x202 SHL 12, 7 ;; incorrect: this is just the "Hello word hope your ok" data
0x204 MOV 12, 7 ;; incorrect: this is just the "Hello word hope your ok" data
0x206 SNE 22, 4 ;; incorrect: this is just the "Hello word hope your ok" data
0x208 SHR 22, 3 ;; incorrect: this is just the "Hello word hope your ok" data
0x20A MOV 55, 3 ;; incorrect: the disassembly should now be at 0x209 HIGH
;; the rest of the disassembly is out of sync and is just incorrect
What I ideally want to see is
0x200 JMP 209 ;; correct
0x202 DATA;
0x204 DATA;
0x206 DATA;
0x208 DATA;
0x209 HIGH; ;; correct
0x20B MOV 10, 19 ;; correct
Are there any generic techniques to deal with this?
Thanks in advance
Maximus
December 20th, 2011, 09:49
...if such techniques existed, decompiling would be a trivial exercise... at the end of the day, THAT is the problem of decompiling.
Short answer: no.
Long answer: you can use some heuristics and code analysis techniques to predict whether something is 'code' or 'data'. The problem is that there is always a certain degree of failure, which depends on the kind of code/machine you are disassembling.
blabberer
December 20th, 2011, 10:56
Googling around to find out what Chip8 is/was, I see someone has written a disassembler:
http://www.emulator101.com/chip-8-disassembler/ Maybe it is useful to you; maybe you have already seen it.
Wikipedia says Chip8 programs have only 3584 bytes of memory available (4096 bytes in total, with the first 512 reserved for the interpreter)
and that it has only 35 opcodes:
"CHIP-8 has 35 opcodes, which are all two bytes long. The most significant byte is stored first. The opcodes are listed below, in hexadecimal and with the following symbols:"
Well, IIRC Bengaly's PVDasm understands Chip8, and IIRC it is open source; maybe you can check it out.
Anyway, I compiled the disassembler from the above link after some tweaking
and disassembled a Tetris game that I randomly downloaded from the net.
It seems to be working; I don't know whether it does code analysis or not.
Code:
VISUAL~1\Projects\chip8dis>chip8dis.exe TETRIS
0200 a2 b4 MVI I,#$2b4
0202 23 e6 CALL $3e6
0204 22 b6 CALL $2b6
0206 70 01 ADI V0,#$01
0208 d0 11 SPRITE V0,V1,#$1
020a 30 25 SKIP.EQ V0,#$25
020c 12 06 JUMP $206
020e 71 ff ADI V1,#$ff
0210 d0 11 SPRITE V0,V1,#$1
0212 60 1a MVI V0,#$1a
0214 d0 11 SPRITE V0,V1,#$1
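For reference, a listing like the one above can be produced by a plain linear sweep. The following is a minimal sketch, not the emulator101 source: the function names `decode` and `linear_sweep` are made up for illustration, only the seven opcode groups that appear in the listing are decoded, and the mnemonics simply imitate the output shown above. The input bytes are the 22 bytes copied from that listing.

```python
def decode(hi: int, lo: int) -> str:
    """Decode one big-endian CHIP-8 word (only the opcode groups seen above)."""
    op = (hi << 8) | lo          # most significant byte is stored first
    top = op >> 12               # leading nibble selects the opcode group
    x = (op >> 8) & 0xF          # first register index
    y = (op >> 4) & 0xF          # second register index
    nnn = op & 0x0FFF            # 12-bit address
    nn = op & 0xFF               # 8-bit immediate
    n = op & 0xF                 # 4-bit immediate
    if top == 0x1:
        return f"JUMP ${nnn:03x}"
    if top == 0x2:
        return f"CALL ${nnn:03x}"
    if top == 0x3:
        return f"SKIP.EQ V{x:X},#${nn:02x}"
    if top == 0x6:
        return f"MVI V{x:X},#${nn:02x}"
    if top == 0x7:
        return f"ADI V{x:X},#${nn:02x}"
    if top == 0xA:
        return f"MVI I,#${nnn:03x}"
    if top == 0xD:
        return f"SPRITE V{x:X},V{y:X},#${n:x}"
    return f"DW ${op:04x}"       # placeholder: not decoded in this sketch


def linear_sweep(rom: bytes, base: int = 0x200) -> list[str]:
    """Step through the ROM two bytes at a time, treating every word as code."""
    lines = []
    for off in range(0, len(rom) - 1, 2):
        hi, lo = rom[off], rom[off + 1]
        lines.append(f"{base + off:04x} {hi:02x} {lo:02x} {decode(hi, lo)}")
    return lines


# The first 22 bytes of the TETRIS listing above.
TETRIS_START = bytes.fromhex("a2b4 23e6 22b6 7001 d011 3025 1206 71ff d011 601a d011")
for line in linear_sweep(TETRIS_START):
    print(line)
```

Because every word is treated as an instruction, this does no code analysis at all; any data embedded in the ROM would be decoded as if it were code, which is exactly the problem described in the first post.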
Pingcrosby
December 20th, 2011, 11:13
Hi,
Thanks for the replies. I had not seen http://www.emulator101.com/chip-8-disassembler/ but it essentially does what I do.
Unfortunately I am trying to go further than just dumping opcodes and do some kind of analysis.
So my current plan is:
Read the file, assuming that everything is an instruction.
Validate each instruction, e.g. ensure it is a valid opcode and that the operands are in range (addresses inside memory bounds, register indexes between 0 and 15).
Find the first branch in the code (and re-read, if necessary, all instructions from this offset onwards).
Repeat...
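The validation step in the plan above could look roughly like the sketch below. The name `is_plausible` is made up for illustration, and two choices are assumptions of this sketch rather than rules of the instruction set: 0NNN machine-code calls other than 00E0/00EE are treated as implausible, and JP/CALL targets are required to fall inside the loaded ROM. Note that register indexes are 4-bit fields and addresses are 12-bit fields, so they can never be out of range by themselves; only the low bits of some opcode groups can actually be invalid.

```python
def is_plausible(op: int, rom_len: int, base: int = 0x200) -> bool:
    """Heuristic check that a 16-bit word could be a standard CHIP-8 instruction."""
    top = op >> 12
    nnn = op & 0x0FFF
    nn = op & 0xFF
    n = op & 0xF
    if top == 0x0:
        # Assumption: only CLS (00E0) and RET (00EE) are accepted here.
        return op in (0x00E0, 0x00EE)
    if top in (0x1, 0x2):
        # Assumption: JP/CALL should land inside the loaded ROM image.
        return base <= nnn < base + rom_len
    if top in (0x5, 0x9):
        return n == 0                      # 5XY0 and 9XY0 only
    if top == 0x8:
        return n <= 0x7 or n == 0xE        # 8XY0..8XY7 and 8XYE
    if top == 0xE:
        return nn in (0x9E, 0xA1)          # EX9E and EXA1
    if top == 0xF:
        return nn in (0x07, 0x0A, 0x15, 0x18, 0x1E, 0x29, 0x33, 0x55, 0x65)
    return True  # 3XNN, 4XNN, 6XNN, 7XNN, ANNN, BNNN, CXNN, DXYN take any operand bits


# The ASCII pair "He" (0x48, 0x65) passes as a 4XNN instruction.
print(is_plausible(0x4865, rom_len=15))
```

A check like this can only reject a word, never confirm it as code. Lowercase ASCII letters, for example, start with nibble 6 or 7 and so always decode as valid 6XNN/7XNN instructions, which is why validation alone will not separate a text string from code.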
Somehow this seems logically incorrect! If a branch jumps backwards, I could end up in some kind of never-ending loop.
The thing is, the Chip8 instruction set is quite simple, so I should be able to do this relatively easily, but for the life of me I cannot think of a reasonable solution. It does not have to be 100% perfect, just a really good guess.
I think PVDasm does some form of analysis; is the source code available to give me a hint as to how it could be done?
Thanks again
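One common way around the never-ending-loop worry in the post above is to follow control flow from the entry point while remembering every address already visited, often called recursive-traversal disassembly. The sketch below is only an illustration of that idea and makes no claim about what PVDasm or Borg actually do. The names `trace_code` and `listing` are made up, and the sample ROM is invented to mirror the example in the first post, assuming HIGH is the SUPER-CHIP opcode 00FF.

```python
def trace_code(rom: bytes, base: int = 0x200):
    """Follow control flow from `base`; return (instruction addresses, unresolved jumps)."""
    end = base + len(rom)
    code = set()        # start addresses of instructions reached so far
    unresolved = []     # BNNN sites: target depends on V0 at run time
    work = [base]       # addresses still to be examined
    while work:
        addr = work.pop()
        # The visited check is what stops backward jumps from looping forever.
        if addr in code or addr < base or addr + 1 >= end:
            continue
        code.add(addr)
        off = addr - base
        op = (rom[off] << 8) | rom[off + 1]
        top, nnn, nn = op >> 12, op & 0x0FFF, op & 0xFF
        if top == 0x1:                       # JP nnn: only the target follows
            work.append(nnn)
        elif top == 0x2:                     # CALL nnn: callee and return point
            work += [nnn, addr + 2]
        elif op == 0x00EE:                   # RET: this path ends here
            pass
        elif top == 0xB:                     # JP V0, nnn: cannot resolve statically
            unresolved.append(addr)
        elif top in (0x3, 0x4, 0x5, 0x9) or (top == 0xE and nn in (0x9E, 0xA1)):
            work += [addr + 2, addr + 4]     # skip instructions: both successors
        else:
            work.append(addr + 2)            # everything else falls through
    return code, unresolved


def listing(rom: bytes, base: int = 0x200) -> list[str]:
    """Emit raw opcodes for reached addresses and DB for every other byte."""
    code, _ = trace_code(rom, base)
    out, addr, end = [], base, base + len(rom)
    while addr < end:
        off = addr - base
        if addr in code:
            out.append(f"0x{addr:03X} OP 0x{rom[off]:02X}{rom[off + 1]:02X}")
            addr += 2
        else:
            out.append(f"0x{addr:03X} DB 0x{rom[off]:02X}")
            addr += 1
    return out


# Invented ROM: JP 0x209, seven bytes of text, HIGH (00FF), LD VA,0x13, JP 0x20D.
SAMPLE = bytes.fromhex("1209") + b"Hello w" + bytes.fromhex("00ff 6a13 120d")
for line in listing(SAMPLE):
    print(line)
```

Each address enters `code` at most once, so the loop terminates even when a branch jumps backwards. The limits are the ones raised elsewhere in this thread: code reached only through a computed jump (BNNN) or self-modifying code is never visited and would be listed as data, and `listing` does not handle overlapping instructions.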
Maximus
December 20th, 2011, 13:36
There's more behind the scenes than this.
Example: say you have a JMP $ADDR opcode. Now imagine you have an LDA + BNE/BEQ + ADC + STA sequence that changes the value of $ADDR depending on the condition's outcome.
How can you discover the address of your code segments without actually emulating (all) of the code paths?
Pingcrosby
December 21st, 2011, 07:23
All,
I have downloaded the source for the excellent "Borg", which looks promising. From what I can gather, having taken a brief (20 min) look at the code, it reads blocks of instructions, stopping on branches, and decodes/validates each instruction in the block, adding them to a table. If a branch causes an overlap with code already in the list, it removes the old code and overwrites it with the new.
That's what I think it does. I am probably wrong.
Can anyone suggest any other references for this? I presume other people have had this issue when writing their own disassemblers.
naides
December 21st, 2011, 09:30
If this is a serious project and you are willing to invest a substantial amount of time in it, consider IDA.
It gives you the ability to build custom processor modules (see the IDA Pro book by Chris Eagle, available in the wild). It may take a substantial amount of initial analysis and effort, but once you finish it, it gives you the power of IDA, with all of its bells and whistles.
Pingcrosby
December 21st, 2011, 10:10
To be honest, it's not particularly serious. I was just bored at work, so I knocked up a Chip8 emulator.
Once I had that working I thought "mmmm, it would be nice to debug it!", so I began my journey with disassembling. However, I soon realised that trivially dumping instructions and opcodes from file bytes is not the way forward and that some form of simple analysis of the input is needed.
It is at this point that I am kind of stuck, to be honest.
blabberer
December 22nd, 2011, 01:12
Somebody has written a disassembler/debugger/emulator in .NET that runs on Win7;
see sharpchip8.
Bengaly
January 15th, 2012, 03:24
Here's my (very) old code for CrazyChip-8. Sadly I lost the code for the PVDasm plugin, but that can easily be re-coded.
Sources are in VC++ / ASM.
Though I know the CPU emulation has some bugs in it, maybe it will point you in some coding direction.
