[Essay02 - XCOM Pitch-Patching (introducing alien code)]
I played UFO: Enemy Unknown (XCOM: UFO Defense in the US) a lot in the good old DOS days. Some months ago, I felt like playing the game again, and found out it didn't run under win2k. (Don't know what happens under win9x.) Still, when I saw the Collector's Edition in a store a few weeks later, for around $12, I just had to buy it, even if I would only be able to play it on my kid brother's machine.
Great was my joy when I found out that the games had been ported to win32 and DirectDraw/DirectSound - now they might finally work on my own machine! And annoyed was I when I started the game and the graphics were all garbled. It looked pretty much like some problems I had with my own first DirectDraw tests... ignoring the fact that width!=pitch. No bugfixes were to be found. No official patches. And Mythos Games seem to have disappeared.
A couple of months later I was pretty bored, and needed something to do, to stay awake. One of my friends was here, an XCOM fan like me, and he had been annoyed by the problem as well. So I thought... why not fix it? ;).
A lot of thanks to cynica_l for helping me find a stupid register / calling convention / lack of sleep related bug. A lot of thanks to _Stone for "oh.... interesting... do you use the correct flags for your SetThreadContext?" which made me smack my head against the wall, and made the 9x loader work. Thanks to the various people who have tested my loaders.
Ok, enough background information. Now time for some technical information you might or might not know. I must admit I don't know the reason pitch was invented. Something to do with performance I guess, but what performance? More efficient algorithms, or internal card architecture? Too lazy to google, if anybody knows why, let me know ;).
Now, the following COM/OOP explanation is based on my own limited knowledge of the issue. There might be flaws and/or misunderstandings. But it should at least give enough (correct) information to be able to follow the rest of this essay ;).
DirectDraw is COM based. This means you'll most likely only see a single API reference: DirectDrawCreate. From this, you get an object back. I won't deal with the issue of Interface Querying here, as XCOM uses the original DDraw interface only :). Objects are pretty easy to use in C++. object.method or object->method. With standard objects, this ends up in "push offset object" followed by "call method".
However, to support the multiple interfaces, the DirectDraw objects have "virtual" methods. Depending on the internal implementation, this perhaps wouldn't have been necessary, but it helps a bit when you want to support multiple programming languages, I guess. When using virtual methods in a C++ object, something called the "vtable" is constructed.
The vtable is an array of method pointers, and a pointer to it is stored at offset 0 of an object instance (object variable). This allows you, for instance, to have a dummy "base class" that doesn't implement any functions, and have a lot of derived classes that implement the base methods in different ways. Like an encryption class. A simple example might include setkey, encrypt, and decrypt. You could then have classes that implement Twofish, Rijndael, Serpent. In your application you could have a single object variable of the base type. However, this object could be used to "point to" any of the descendants... and your code wouldn't need to know which algorithm is used, as the vtable takes care of redirecting the generic "encrypt" method to the specific implementation. (Ok, this could be written clearer I guess).
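To make the encryption example a bit more concrete, here is a minimal C++ sketch. The class names and the two toy "algorithms" (XOR and byte addition) are made up for illustration and stand in for real ciphers; the point is only that round_trip sees the base type and the vtable picks the implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical base class: declares the interface, implements nothing.
struct Cipher {
    virtual ~Cipher() = default;
    virtual void setkey(std::uint8_t key) = 0;
    virtual std::uint8_t encrypt(std::uint8_t b) const = 0;
    virtual std::uint8_t decrypt(std::uint8_t b) const = 0;
};

// Toy stand-in #1: XOR with the key (encrypt and decrypt are the same).
struct XorCipher : Cipher {
    std::uint8_t key = 0;
    void setkey(std::uint8_t k) override { key = k; }
    std::uint8_t encrypt(std::uint8_t b) const override {
        return static_cast<std::uint8_t>(b ^ key);
    }
    std::uint8_t decrypt(std::uint8_t b) const override { return encrypt(b); }
};

// Toy stand-in #2: add the key modulo 256.
struct AddCipher : Cipher {
    std::uint8_t key = 0;
    void setkey(std::uint8_t k) override { key = k; }
    std::uint8_t encrypt(std::uint8_t b) const override {
        return static_cast<std::uint8_t>(b + key);
    }
    std::uint8_t decrypt(std::uint8_t b) const override {
        return static_cast<std::uint8_t>(b - key);
    }
};

// Only knows about the base type; each call is dispatched through the vtable.
std::uint8_t round_trip(Cipher& c, std::uint8_t key, std::uint8_t b) {
    c.setkey(key);
    return c.decrypt(c.encrypt(b));
}
```

Passing an XorCipher or an AddCipher to round_trip gives the original byte back either way, without round_trip knowing which one it got.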
When a C++ method is called (virtual or not), its first parameter is always the this pointer, a pointer to the object instance. You don't see this in your programs though, as the C++ compiler conveniently hides the details for you.
Anyway, when you do object->method(parm) and the method is virtual, code similar to the following would be generated:
mov eax, [pObject] ; get pointer to object
mov ebx, [eax] ; ebx now points to vtable
push parm ; function argument
push eax ; "this" pointer
call [ebx + 0] ; call method through vtable
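The same steps can be spelled out by hand in C++ with plain structs and function pointers, which is roughly how COM interfaces look from C. This is only a sketch: Object, VTable and the two methods are invented names, not anything from DirectDraw.

```cpp
#include <cassert>

struct Object;  // forward declaration

// The vtable: a table of function pointers. Every method takes the
// "this" pointer as an explicit first argument.
struct VTable {
    int (*method0)(Object* self, int parm);  // first slot, [vtable + 0]
    int (*method1)(Object* self, int parm);  // second slot
};

// The object: the vtable pointer sits at offset 0, the data follows.
struct Object {
    const VTable* vtbl;
    int value;
};

static int add_impl(Object* self, int parm) { return self->value + parm; }
static int mul_impl(Object* self, int parm) { return self->value * parm; }

static const VTable g_vtable = { add_impl, mul_impl };

// Hand-written equivalent of "object->method0(parm)".
int call_method0(Object* pObject, int parm) {
    assert(pObject != nullptr);
    const VTable* vt = pObject->vtbl;   // mov ebx, [eax]
    return vt->method0(pObject, parm);  // push parm / push eax / call [ebx + 0]
}
```

With an Object whose value is 40, call_method0(&obj, 2) ends up in add_impl and returns 42.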
This code is somewhat longer than normal API or function calls, and can certainly be more difficult to follow. And this is what you deal with when digging into DirectDraw code. There are a couple of ways to make this easier. The first method, which is pretty brute, is to create an application to do lookups. If you know the class interface (which is necessary to use it in your programs - thus the DDraw interface is pretty well known ;), you can easily work out what vtable entries contain which methods. So, you enter class and method names in a couple of tables. Select class name from listbox, write vtable index in an edit box, and presto the app comes up with the method name.
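A bare-bones version of such a lookup, without any listbox, could look like the sketch below. The method order follows the IDirectDraw declaration in DDraw.h; the function name method_at_offset is made up, and 4 bytes per vtable slot is assumed (32-bit code).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// IDirectDraw methods in declaration order (IUnknown's three come first).
static const char* const kIDirectDrawMethods[] = {
    "QueryInterface", "AddRef", "Release",
    "Compact", "CreateClipper", "CreatePalette", "CreateSurface",
    "DuplicateSurface", "EnumDisplayModes", "EnumSurfaces",
    "FlipToGDISurface", "GetCaps", "GetDisplayMode", "GetFourCCCodes",
    "GetGDISurface", "GetMonitorFrequency", "GetScanLine",
    "GetVerticalBlankStatus", "Initialize", "RestoreDisplayMode",
    "SetCooperativeLevel", "SetDisplayMode", "WaitForVerticalBlank",
};

constexpr std::size_t kMethodCount =
    sizeof(kIDirectDrawMethods) / sizeof(kIDirectDrawMethods[0]);
static_assert(kMethodCount == 23, "IDirectDraw has 23 methods");

// Map a byte offset such as the 50h in "call dword ptr [ecx+50h]" to a
// method name. Returns "?" for misaligned or out-of-range offsets.
std::string method_at_offset(unsigned offset) {
    if (offset % 4 != 0 || offset / 4 >= kMethodCount) return "?";
    return kIDirectDrawMethods[offset / 4];
}
```

For example, method_at_offset(0x50) gives "SetCooperativeLevel" and method_at_offset(0x18) gives "CreateSurface".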
Yeah, this works. But you still see "call dword ptr [ecx + 064h]" or similar in IDA, and you have to add comments yourself and... ick.
After my reversing job was done, I was informed of an easier way to
do this (which I had looked for but was unable to find... silly me).
First, open the "structures" subview, and add a new structure. We'll
add the IDirectDraw definition, so type in "IDirectDraw". We see in
DDraw.h that there are 23 methods. So add (23*4) 92 bytes to the
structure. Time to start naming them... the structure editor is a bit
weird at first, but you'll get used to it ;). First you have to change
data type. The first time you press "d" the field will be given the
name field_x. All method pointers are (obviously) dwords.
The first method is QueryInterface, the next is AddRef, and the third is Release. These three methods are common in all COM classes. The rest you'll have to look up in DDraw.h ;). When you look through it, you will notice that there are IDirectDraw and IDirectDraw2 - we're only interested in IDirectDraw (the original version) for this essay. Yes, all this typing in is pretty tedious, but when you're through, you can "file->produce->dump typeinfo to IDC file". You can then cut away the crap from the IDC, and use your structure definitions in future IDA sessions.
When you've typed in the IDirectDraw definition, also add a struct entry for IDirectDrawSurface. Argh, even *more* methods ;).
Ok, now the prerequisites should have been taken care of. The actual reversing can take place. We know (or suspect ;) that DirectDraw calls are what we are interested in. The game is an old DOS game, running 320x200. Back then it was normal to keep a "pixel buffer" in system memory, and for each frame copy this directly to the framebuffer. Converting this directly to DirectDraw, without taking the surface pitch into consideration, causes graphics problems like XCOM has. If you have no prior DirectDraw experience, now is probably a good time to look at the DirectX SDK info ;).
A brief overview: it all starts with DirectDrawCreate. This gives you a pointer to a DirectDraw object. A method of this object is used to create a DirectDrawSurface - which is another object. A surface represents video memory. To get direct access to the pixel data of a surface, you call its Lock method. So we can assume the flawed code is located close to a DDrawSurf->Lock call.
Ok. Start by going to the DirectDrawCreate thunk function (the one that
calls [__imp_DirectDrawCreate]). Open the cross reference window. There
should be only one reference. Go there. DirectDrawCreate has the following
prototype:
HRESULT WINAPI DirectDrawCreate(GUID FAR *lpGUID, LPDIRECTDRAW FAR *lplpDD,
IUnknown FAR *pUnkOuter);
We are interested in the lplpDD parameter, as it will be filled with the pointer to a DirectDraw object on output. Yes, the DX SDK really does come in handy - always try to gather as much information about your target as possible.
Go to the variable that is passed as lplpDD, and change its name from "dword_xxxxxx" to pDirectDraw (or whatever you fancy - pDirectDraw would be a logical choice though ;). Now open the cross reference window. You will want to go through all the cross references that involve a mov *from* the variable, as this is what's done in the method calling stuff.
An example piece of code, from my first cross reference. This is the
initial commenting I did on it.
.text:0045C5FD mov eax, pDirectDraw ; get pointer to object
.text:0045C602 mov edx, dword_479B60
.text:0045C608 push 51h
.text:0045C60A push edx
.text:0045C60B mov ecx, [eax] ; ecx = ptr to object vtable
.text:0045C60D push eax ; push "this" pointer
.text:0045C60E call dword ptr [ecx+50h]
.text:0045C611 test eax, eax
.text:0045C613 jz short @@OK_2
.text:0045C615 xor eax, eax
.text:0045C617 pop ebx
.text:0045C618 add esp, 74h
.text:0045C61B retn
.text:0045C61C @@OK_2: ; CODE XREF: sub_45C5E0+33j
Ok, now comes the magic. We know we're dealing with an IDirectDraw object. We'd like to transform "ecx+50h" into a method name. Since you went through all the trouble of typing in all the vtable pointers, you can now right-click the line containing "ecx+50h", and you should see "ecx+IDirectDraw.SetCooperativeLevel" in the list. Great, this makes life a lot easier :). With some DX SDK browsing and header file studying, you could come up with the following disassembly (no, you don't need to do this, as this is not the call we're interested in - but I don't think it ever hurts to document what's going on):
.text:0045C5FD mov eax, pDirectDraw ; get pointer to object
.text:0045C602 mov edx, hWnd
.text:0045C608 push 51h ; DDSCL_FULLSCREEN | DDSCL_EXCLUSIVE | DDSCL_ALLOWMODEX
.text:0045C60A push edx
.text:0045C60B mov ecx, [eax] ; ecx = ptr to object vtable
.text:0045C60D push eax ; push "this" pointer
.text:0045C60E call [ecx+IDirectDraw.SetCooperativeLevel]
.text:0045C611 test eax, eax
.text:0045C613 jz short @@OK_2
.text:0045C615 xor eax, eax
.text:0045C617 pop ebx
.text:0045C618 add esp, 74h
.text:0045C61B retn
.text:0045C61C @@OK_2: ; CODE XREF: sub_45C5E0+33j
The first comment-adding was already an improvement, and I think the
second comment-adding makes the code rather easy to read. This is my
first "real" reversing project, and it's been some time since I dealt
with DirectDraw, so I found it helpful to comment almost all the method
calls like this. You can just convert the "[register + offset]"
and move on if it's not "CreateSurface". Up to you :). There should
only be one cross reference to CreateSurface. CreateSurface prototype:
HRESULT CreateSurface(LPDDSURFACEDESC lpDDSurfaceDesc, LPDIRECTDRAWSURFACE FAR
*lplpDDSurface, IUnknown FAR *pUnkOuter);
It is obviously the lplpDDSurface we're interested in. Note that there are multiple DirectX SDKs. (old), DX5, DX7, DX8... the problem here is that the DX documentation in DX7, well, assumes DX7. So when you look up "CreateSurface" in the DX7 SDK, it will say "LPDDSURFACEDESC2" and "LPDIRECTDRAWSURFACE7". This is obviously not the case when you're working on older DirectDraw programs ;). So either use a bit of guessing (like me), or find an older DX SDK.
Well, give the LPDIRECTDRAWSURFACE variable a name. I chose "pSurface". Again, time to track cross references. This time they will obviously be IDirectDrawSurface methods, not IDirectDraw :). I was a bit surprised after doing this. I found a reference to "Flip" but not to "Lock". Hm. However, there's a call to "GetAttachedSurface". Ok, after you read and add comments (be sure to (ab)use IDA's "standard symbol constants"), you'll see that a surface with "DDSCAPS_BACKBUFFER" is requested.
Ok, so this call gives us YET another DirectDrawSurface object, this time for the backbuffer. The backbuffer is an offscreen (not visible) buffer, which is almost always stored in display memory. The reason to have a front- and backbuffer is the following: if you draw directly to the frontbuffer, and accidentally draw while the screen update is in progress (quite easy to do ;), the image on the monitor would show part of the old image, part of the new. If "enough" pixels on screen have been changed (like in a 3d game), this will be quite noticeable, and will give the "tearing" effect. The solution is to draw into the backbuffer and, when done, tell the display card to swap front- and backbuffers. This is done by the video card by changing a "where do I get my data from" pointer, instead of copying the data over.
Ok, enough explanation of video techniques for now ;). It's time to follow cross references to the backbuffer surface pointer (which I named "pSurface2") and hope we find a Lock method call. Well, I did ;). A function located at 45C800 which has one purpose: to Lock() the backbuffer (after verifying it isn't a NULL pointer), and return the pointer to video memory. You might want to study the "return pointer or NULL" code. See, the Lock method doesn't return the pointer - it returns DD_OK or one of the DDERR_* error codes. The backbuffer pixel pointer is stored in the DDSurfaceDesc structure that's passed on to the method. The code that returns either NULL or the pointer from the struct is pretty neat and avoids conditional jumps :).
I named this function "lockSurface", and proceeded to the cross references
window. IDA is such a nice and strong disassembler. One xref. Another short
function, which I dubbed "internalBufferToVidmem". And the offending code:
.text:0042FF29 mov esi, [ebp+ptrDest]
.text:0042FF2C mov edi, [ebp+ptrSrc]
.text:0042FF2F mov ecx, 3E80h ; (320 * 200) / 4
.text:0042FF34 repe movsd
This code definitely does not take the surface "pitch" into account ;). So, how do we fix this? We need a modified transfer loop. We need the pitch of the surface. This all takes up a few bytes. There are multiple ways to apply the patch: add a section, extend a section, search for a cave... I chose to write a loader since it's smaller to distribute, I had never done a loader before, and it seemed like a fun project.
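In C++ terms, the modified transfer loop boils down to copying one row at a time and stepping the destination by the pitch instead of the width. The sketch below is a portable illustration of that idea, not the code from my loader (which is hand-written assembly); the function name and parameters are made up, and 8-bit pixels are assumed, as in the 320x200 buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy a tightly packed width x height 8-bit image into a destination
// whose rows start `pitch` bytes apart (pitch >= width).
void copy_with_pitch(std::uint8_t* dst, std::size_t pitch,
                     const std::uint8_t* src,
                     std::size_t width, std::size_t height) {
    assert(pitch >= width);
    for (std::size_t y = 0; y < height; ++y) {
        // The source advances by width, the destination by pitch.
        std::memcpy(dst + y * pitch, src + y * width, width);
    }
}
```

For XCOM this would be called with width 320, height 200 and whatever pitch the surface reports. When the pitch happens to be exactly 320, the result is the same as the single big copy, which is why the original code works on some cards and garbles the picture on others.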
The "mov ecx, 3E80h / rep movsd" pair takes up 7 bytes, which is exactly
enough space to insert a 5-byte CALL and two NOPs, while leaving us with
source and destination pointers
in ESI and EDI. How do we get the pitch then? We call pSurface2->GetSurfaceDesc.
So take note of the address of the pSurface2 variable, and the byte offset
of GetSurfaceDesc in the vtable. The size of a DDSURFACEDESC is 108
bytes, and the pitch is stored at offset 16.
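To illustrate those two numbers, here is a small sketch that treats the DDSURFACEDESC as a raw 108-byte buffer, the way the patch code sees it. The helper names are made up. Setting dwSize (the first dword) before the call is something DirectDraw expects; the rest just reads the dword at offset 16.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kSurfaceDescSize = 108;  // size of a DDSURFACEDESC
constexpr std::size_t kSizeOffset      = 0;    // dwSize is the first field
constexpr std::size_t kPitchOffset     = 16;   // lPitch

// Zero the buffer and fill in dwSize, as needed before GetSurfaceDesc.
void prepare_desc(std::uint8_t (&desc)[kSurfaceDescSize]) {
    std::memset(desc, 0, kSurfaceDescSize);
    const std::uint32_t size = static_cast<std::uint32_t>(kSurfaceDescSize);
    std::memcpy(desc + kSizeOffset, &size, sizeof size);
}

// Read the pitch back out of a filled-in descriptor.
std::int32_t read_pitch(const std::uint8_t (&desc)[kSurfaceDescSize]) {
    std::int32_t pitch;
    std::memcpy(&pitch, desc + kPitchOffset, sizeof pitch);
    return pitch;
}
```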
Under NT, the job of the loader is simple. CreateProcess in CREATE_SUSPENDED
mode. Then you VirtualAllocEx a page of memory, and write the "Proper
Pixel Transfer" code there. Finally patch the original "mov, rep movsd"
code with a call to the new page. Use the E8 opcode (CALL), which takes a 32bit
immediate. Note that the offset is relative to the beginning of the next
instruction. Thus, you calculate the 32bit immediate as "DESTINATION - (SOURCE + 5)".
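As a worked example of that calculation, the sketch below builds the 7 replacement bytes: E8, the little-endian 32bit offset, and two NOPs (90h). The function name is made up, and the destination address used in the example afterwards is a hypothetical one, since the real address is whatever VirtualAllocEx hands back.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Build "CALL rel32" plus two NOPs for a patch placed at `source`
// that should call code located at `destination`.
std::array<std::uint8_t, 7> make_call_patch(std::uint32_t source,
                                            std::uint32_t destination) {
    // Relative to the instruction after the 5-byte CALL; unsigned
    // wraparound gives the right bit pattern for backward calls too.
    const std::uint32_t rel = destination - (source + 5);
    return {{
        0xE8,
        static_cast<std::uint8_t>(rel),
        static_cast<std::uint8_t>(rel >> 8),
        static_cast<std::uint8_t>(rel >> 16),
        static_cast<std::uint8_t>(rel >> 24),
        0x90,
        0x90,
    }};
}
```

With the patch at 0042FF2F (the "mov ecx, 3E80h") and a made-up destination of 00500000, the offset is 00500000 - 0042FF34 = 000D00CC, so the bytes are E8 CC 00 0D 00 90 90.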
After the WriteProcessMemory is done, you can finally ResumeThread, and the game should be working and such. However, to make this work under 9x, there's a lot of work to be done. Why? Because there's no VirtualAllocEx under 9x (oh thanks so much Microsoft...)
To get the stuff to work under 9x (the method is compatible with NT as well), the following steps are involved:
1. CreateEvent
2. save bytes at program entrypoint
3. overwrite bytes at entrypoint with helper code
4. VirtualProtectEx (so helper code can patch)
5. ResumeThread
6. WaitForSingleObject
7. SuspendThread
8. GetThreadContext (with control regs specified)
9. set context.eIP back to entrypoint, SetThreadContext
10. VirtualProtectEx (set back old protection - not *really* necessary)
11. ResumeThread
See, there are a couple of peculiarities. First, when you create a process in suspended mode, the thread is NOT suspended at program initial EIP, it is suspended somewhere in kernel32 code. So you can't depend on context.eIP after the process is created. You'll have to postpone the get/set context to after the helper has run, to have a valid stack pointer - this also means the helper has to preserve esp. However, esp is the only register you really need to preserve, as win32 does NOT guarantee register state on program entry.
The event stuff is needed so the loader can wait for the helper code to finish. The helper code will call SetEvent when it's done. Make sure you pass a SECURITY_ATTRIBUTES structure with bInheritHandle set to TRUE to CreateEvent (so the event can be shared), and make sure you set "bInheritHandles" to TRUE in the CreateProcess call.
Suspending the thread before Get/SetThreadContext and WriteProcessMemory is important :).
The function of the helper code is to VirtualAlloc a page and move the "Proper Pixel Transfer" code in place - a task that was handled by the loader in the NT example. I chose to make the helper code do the program patch as well, that's why you have to mess with VirtualProtectEx.
Ok. To call API functions you will need to look up their __imp_* VA in IDA. Not a big deal. But XCOM doesn't import SetEvent. You must import this yourself. There are a lot of ways, but the shortest & easiest is probably to import GetModuleHandle and GetProcAddress.
To see how this is all put together, have a look at my loader.
Essay by f0dder(a)yahoo.com (f0dder.cjb.net), last edit at 2002-01-03.