Crash analysis of a plugin in a browser [need help]


nothize
October 7th, 2009, 13:10
Code:
/--------------------\
| 0 - Introduction |
\--------------------/

This is a request for help with the crash analysis of xxxxx10b.ocx, which crashed inside a browser. The crash dump has been analyzed, but 2 mysteries remain unexplained.

There are 4 sections below:

1. Crash dump analysis
The crash dump of the browser was analyzed with windbg. This section includes the result of the !analyze -v command, the r(register) command and the page protection information. Comments are colored in blue.

2. IDA analysis
The xxxxx10b.ocx binary was loaded into IDA for static analysis. This section contains the disassembly of the caller leading up to the crash. The nProcessor variable was traced through its cross references and found to store the number of processors.

3. Disassembly around crash point
Since the crash point != eax (the last function pointer called), the disassembly starting from eax is analyzed to help find the most reasonable execution path.

4. Analysis evaluation
The results and analysis of the incident are presented in this section. The unknowns / questions are also listed here.

/---------------------------------\
| 1 - Crash dump !analyze -v |
\---------------------------------/

EXCEPTION_RECORD: ffffffff -- (.exr ffffffffffffffff)
ExceptionAddress: 035db7c3
ExceptionCode: c0000096
ExceptionFlags: 00000000
NumberParameters: 0

BUGCHECK_STR: c0000096

DEFAULT_BUCKET_ID: APPLICATION_FAULT

PROCESS_NAME: browser.exe

ERROR_CODE: (NTSTATUS) 0xc0000096 - {

THREAD_ATTRIBUTES:
LAST_CONTROL_TRANSFER: from 045761a0 to 035db7c3

STACK_TEXT:
WARNING: Frame IP not in any known module. Following frames may be wrong.
0617e334 045761a0 00000001 18b83ef0 00000000 0x35db7c3
0790d444 00000000 00000003 00080813 00000000 Xxxxx10b+0x761a0
As seen from the stack, the return address is 045761a0 and arg0 is 00000001. This matches push 1; call eax at .text:1007619C.
18b83ef0 matches the push edi at .text:10076170.

FOLLOWUP_IP:
Xxxxx10b+761a0
045761a0 8b4f44 mov ecx,[edi+0x44]

SYMBOL_STACK_INDEX: 1

FOLLOWUP_NAME: MachineOwner

SYMBOL_NAME: Xxxxx10b+761a0

MODULE_NAME: Xxxxx10b

IMAGE_NAME: Xxxxx10b.ocx

DEBUG_FLR_IMAGE_TIMESTAMP: 4987a6c3

STACK_COMMAND: .ecxr ; kb

FAILURE_BUCKET_ID: c0000096_Xxxxx10b+761a0

BUCKET_ID: c0000096_Xxxxx10b+761a0

Followup: MachineOwner
---------

0:029> r
eax=035db7b0 ebx=00000000 ecx=00020002 edx=08740000 esi=00000002 edi=18b83ef0
eip=035db7c3 esp=0617e338 ebp=0790d444 iopl=0 nv up ei pl nz na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202
035db7c3 ed in eax,dx

0:029> !vprot eax
BaseAddress: 03590000
AllocationBase: 03590000
AllocationProtect: 00000004 PAGE_READWRITE
RegionSize: 00068000
State: 00001000 MEM_COMMIT
Protect: 00000004 PAGE_READWRITE
Type: 00020000 MEM_PRIVATE
There is no EXECUTE access in this memory region.

0:029> !vprot 45761a0
BaseAddress: 04501000
AllocationBase: 04500000
AllocationProtect: 00000080 PAGE_EXECUTE_WRITECOPY
RegionSize: 002e9000
State: 00001000 MEM_COMMIT
Protect: 00000010 PAGE_EXECUTE
Type: 01000000 MEM_IMAGE
The return address, by contrast, lies in a region that has EXECUTE access.
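
The dump addresses (045761a0) and the IDA addresses (.text:100761A0) refer to the same code at different load bases. A minimal sketch of the conversion, assuming IDA used an image base of 10000000 (inferred from Xxxxx10b+0x761a0) and taking the dump base from the AllocationBase above. The function names are made up for illustration.

```python
# Both bases are read off the output above; neither is confirmed by symbols.
IDA_IMAGE_BASE = 0x10000000    # assumption: .text:100761A0 minus offset 0x761a0
DUMP_MODULE_BASE = 0x04500000  # AllocationBase from "!vprot 45761a0"


def dump_to_ida(addr: int) -> int:
    """Translate an address seen in the crash dump to its IDA address."""
    return addr - DUMP_MODULE_BASE + IDA_IMAGE_BASE


def ida_to_dump(addr: int) -> int:
    """Translate an IDA address to where it was mapped in the crashed process."""
    return addr - IDA_IMAGE_BASE + DUMP_MODULE_BASE


if __name__ == "__main__":
    # Return address from the stack, then the call eax instruction in IDA.
    print(hex(dump_to_ida(0x045761A0)))
    print(hex(ida_to_dump(0x1007619E)))
```

The second conversion gives the in-process address of the call eax instruction, which is the address the return address 045761a0 follows by two bytes.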

/---------------\
| 2 - IDA |
\---------------/

.text:10076170 sub_10076170 proc near ; CODE XREF: sub_10082F10+131j
.text:10076170 ; sub_10088C30+251p
.text:10076170 57 push edi ;
.text:10076171 8B F9 mov edi, ecx
.text:10076173 83 7F 44+ cmp dword ptr [edi+44h], 0
.text:10076177 74 3B jz short loc_100761B4
.text:10076179 56 push esi
.text:1007617A 33 F6 xor esi, esi
.text:1007617C 39 35 8C+ cmp nProcessor, esi
.text:10076182 7E 2F jle short loc_100761B3
.text:10076184
.text:10076184 loc_10076184: ; CODE XREF: sub_10076170+41j
.text:10076184 8B 47 44 mov eax, [edi+44h] ; edi+44 = 18b83f34 -> 1886bc78
.text:10076187 83 3C B0+ cmp dword ptr [eax+esi*4], 0
.text:1007618B 8D 04 B0 lea eax, [eax+esi*4] ; for esi=2, [eax+esi*4] -> 00020002
.text:1007618E 74 1A jz short loc_100761AA
.text:10076190 8B 00 mov eax, [eax]
.text:10076192 85 C0 test eax, eax
.text:10076194 74 0A jz short loc_100761A0
.text:10076196 8B 10 mov edx, [eax] ; edx = 08740000
.text:10076198 8B C8 mov ecx, eax ; ecx = 00020002
.text:1007619A 8B 02 mov eax, [edx] ; eax = 035db7b0
.text:1007619C 6A 01 push 1
.text:1007619E FF D0 call eax ; crash later at 035db7c3
.text:100761A0
.text:100761A0 loc_100761A0: ; CODE XREF: sub_10076170+24j
.text:100761A0 8B 4F 44 mov ecx, [edi+44h]
.text:100761A3 C7 04 B1+ mov dword ptr [ecx+esi*4], 0
.text:100761AA
.text:100761AA loc_100761AA: ; CODE XREF: sub_10076170+1Ej
.text:100761AA 46 inc esi
.text:100761AB 3B 35 8C+ cmp esi, nProcessor ; nProcessor is 2 in the crash dump.
.text:100761AB 40 37 10 ; cmp esi, 2
.text:100761AB ;
.text:100761B1 7C D1 jl short loc_10076184 ; Why would jl jmp when esi = 2?
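
The chain of dereferences annotated in the listing can be followed with a small model. This is only a sketch: memory is a sparse dictionary holding just the four values quoted in the comments above, and reading the chain as "table slot -> object -> vtable -> first slot" is an interpretation, not something the dump confirms. All names are hypothetical.

```python
# Sparse model of 32-bit memory: address -> dword. Only the values quoted in
# the annotated listing are present; every other address counts as unmapped.
MEMORY = {
    0x18B83F34: 0x1886BC78,          # [edi+44h]: base of the per-processor table
    0x1886BC78 + 2 * 4: 0x00020002,  # table[2]: the slot read when esi = 2
    0x00020002: 0x08740000,          # dword read as the object's "vtable" pointer
    0x08740000: 0x035DB7B0,          # first "vtable" entry: the call eax target
}


def read_dword(addr: int) -> int:
    """Read a dword, failing the way an unmapped page would."""
    if addr not in MEMORY:
        raise LookupError(f"unmapped read at {addr:08x}")
    return MEMORY[addr]


def call_target(edi: int, esi: int):
    """Mirror loc_10076184 .. call eax for one loop iteration."""
    table = read_dword(edi + 0x44)        # mov eax, [edi+44h]
    obj = read_dword(table + esi * 4)     # cmp/lea/mov eax, [eax+esi*4]
    if obj == 0:                          # jz: empty slot, nothing to call
        return None
    vtable = read_dword(obj)              # mov edx, [eax]
    return read_dword(vtable)             # mov eax, [edx]


if __name__ == "__main__":
    print(hex(call_target(0x18B83EF0, 2)))
```

Any break in the chain raises LookupError here, which corresponds to an access violation before call eax is ever reached.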

/-------------------------------------\
| 3 - Disassembly on 035db7b0 |
\-------------------------------------/

Crash dump disassembly:

035db7b0 c00000 rol byte ptr [eax],0x0
035db7b3 0000 add [eax],al
035db7b5 0200 add al,[eax]
035db7b7 0000 add [eax],al
035db7b9 0100 add [eax],eax
035db7bb 0000 add [eax],al
035db7bd 0100 add [eax],eax
035db7bf 0128 add [eax],ebp
035db7c1 55 push ebp
035db7c2 e7ed out ed,eax

Reconstructed disassembly (since the code here appears to be self-modifying):

035db7b0 0000 add [eax],al
035db7b2 0000 add [eax],al
035db7b4 0002 add [edx],al
035db7b6 0000 add [eax],al
035db7b8 0001 add [ecx],al
035db7ba 0000 add [eax],al
035db7bc 0001 add [ecx],al
035db7be 0001 add [ecx],al
035db7c0 2855e7 sub [ebp-0x19],dl
035db7c3 ed in eax,dx


/---------------------------\
| 4 - Analysis evaluation |
\---------------------------/

From the information available, the following points are inferred:

- The crash point was at 035db7c3.
- The call came from 0457619e, via call eax.
- eax was 035db7b0.
- The machine code from 035db7b0 to 035db7c0 shows that control flowed from 035db7b0 to 035db7c3. ((4 * 0xb0) & 0xff = c0 at 035db7b0)
- The register dump shows that the loop at loc_10076184 was executed before the crash.
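
The (4 * 0xb0) & 0xff arithmetic can be checked by replaying the reconstructed listing from section 3. This is a sketch under stated assumptions: the byte at 035db7b0 was 00 before execution, eax stays 035db7b0 throughout, and the writes through ecx, edx and ebp do not alias 035db7b0. The tuple layout is invented for illustration.

```python
# Bytes at 035db7b0..035db7c3 exactly as they appear in the crash dump.
DUMP_BYTES = bytes.fromhex(
    "c0000000" "00020000" "00010000" "00010001" "2855e7ed"
)

BASE = 0x035DB7B0
EAX = BASE          # call eax leaves eax pointing at the first byte
AL = EAX & 0xFF     # low byte of eax, 0xb0

# Reconstructed instruction stream: (length in bytes, writes through [eax]?).
RECONSTRUCTED = [
    (2, True),   # 035db7b0  add [eax],al
    (2, True),   # 035db7b2  add [eax],al
    (2, False),  # 035db7b4  add [edx],al
    (2, True),   # 035db7b6  add [eax],al
    (2, False),  # 035db7b8  add [ecx],al
    (2, True),   # 035db7ba  add [eax],al
    (2, False),  # 035db7bc  add [ecx],al
    (2, False),  # 035db7be  add [ecx],al
    (3, False),  # 035db7c0  sub [ebp-0x19],dl
]


def simulate():
    """Return (final byte at 035db7b0, address of the next instruction)."""
    first_byte = 0x00  # assumed value before the code modified itself
    eip = BASE
    for length, writes_eax in RECONSTRUCTED:
        if writes_eax:
            first_byte = (first_byte + AL) & 0xFF
        eip += length
    return first_byte, eip


if __name__ == "__main__":
    byte, eip = simulate()
    print(f"byte at {BASE:08x} = {byte:02x}, next eip = {eip:08x}")
```

The replay ends with the first byte equal to the c0 seen in the dump and eip on the final byte, ed (in eax,dx), which is consistent with the exception address.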

However, there are 2 mysteries:

1. Why was the branch at 100761b1 still taken when esi = 2?
2. Why could 035db7b0 be executed when the protection info shows that there was no EXECUTE access?
An experiment shows that setting eip to a memory location with the same characteristics as 035db7b0 results in an access violation.

Any comments or suggestions are welcome!

nothize
October 7th, 2009, 23:56
Since the default value of nProcessor is 0x0F, the best bet would be a race condition in which nProcessor was set to 2 only after the jl branch at .text:100761B1 had been taken.

That would allow an out-of-bounds function pointer to be dereferenced. With a lot of luck, the chain of pointers wasn't broken in the middle and finally reached call eax.

Then why is 035db7b0 executable? How can I confirm this from the minidump, which includes all optional data (in case !vprot is unreliable)?

Should I check the page table manually?

nothize
October 8th, 2009, 01:19
Hi Kayaker

I am comfortable with your explanation of the page protection question, since the code really was executed. The information from !vprot cannot be taken as a 100% accurate picture of the facts.


Reaching 35db7b0 was a very very very very very (5 passes!) lucky outcome if 0x00020002 is not really a pointer: in a live debug session, the value at ([edi+44])+8 is something like 0x000200xx, and it can only be dereferenced once, twice or thrice before it points to ????? (unmapped?) memory instead of reaching call eax.


Yup!! Google has many hits, but the cases I looked at either didn't resemble my incident or were missing important information. And since Xxxxx is famous for crashing when run with an incorrect script, there could be many ways to crash it, so I chose to inspect this case on its own after reading a few search results.

From what you did (brute-forcing the masks and doing some manual RE) I can see that you are very willing to help! Thank you!

Kayaker
October 8th, 2009, 01:27
Hi nothize

Yes, my comment about JL was garbage, which is why I deleted my post after rereading it, but obviously not before you had read it. Sorry about that.

Just to repost the sensible stuff:

Two MS definitions of PAGE_READWRITE indicate that execution IS allowed (if DEP isn't enabled). DEP is always enabled for 64-bit processes on 64-bit Windows.

Code:
PAGE_READWRITE - Read, write, and execute access to the specified
region of pages is allowed. If write access to the underlying section is
allowed, then a single copy of the pages are shared. Otherwise the pages
are shared read only/copy on write.

PAGE_READWRITE
Enables read-only or read/write access to the committed region of pages.
If Data Execution Prevention is enabled, attempting to execute code
in the committed region results in an access violation.
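
As a rough sketch of the distinction the two definitions draw: the constants below are the documented Windows memory protection values, while the function names and the nx_enforced flag are made up for illustration. The rule "readable implies executable when NX is not enforced" reflects classic 32-bit x86 paging and is stated here as an assumption about the crashed machine.

```python
# Documented Windows memory protection constants (low byte of the Protect field).
PAGE_NOACCESS = 0x01
PAGE_READONLY = 0x02
PAGE_READWRITE = 0x04
PAGE_WRITECOPY = 0x08
PAGE_EXECUTE = 0x10
PAGE_EXECUTE_READ = 0x20
PAGE_EXECUTE_READWRITE = 0x40
PAGE_EXECUTE_WRITECOPY = 0x80

EXECUTE_MASK = (PAGE_EXECUTE | PAGE_EXECUTE_READ |
                PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY)
READABLE_MASK = PAGE_READONLY | PAGE_READWRITE | PAGE_WRITECOPY


def has_execute_bit(protect: int) -> bool:
    """True if the protection value itself grants execute access."""
    return bool(protect & EXECUTE_MASK)


def may_execute(protect: int, nx_enforced: bool) -> bool:
    """Hypothetical helper: can code run from a page with this protection?"""
    if has_execute_bit(protect):
        return True
    # Without NX/DEP enforcement the CPU does not tell read from execute.
    return bool(protect & READABLE_MASK) and not nx_enforced


if __name__ == "__main__":
    # 035db7b0 is PAGE_READWRITE (0x04); 045761a0 is PAGE_EXECUTE (0x10).
    print(may_execute(PAGE_READWRITE, nx_enforced=False))
    print(may_execute(PAGE_READWRITE, nx_enforced=True))
```

Under this model the region reported by !vprot eax can still run code when DEP is off, which would reconcile the !vprot output with the fact that execution reached 035db7c3.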


Cheers,
Kayaker

nothize
October 8th, 2009, 01:33
Hi Kayaker

Hehe take it easy!! On second thought I had deleted my race-condition assumption about the nProcessor variable, but now I'd like to bring it back.

And you taught me to look up the reference even when the symbol name seems descriptive enough.

Thanks again!

nothize
October 12th, 2009, 13:08
After a live debug session tonight, with breakpoints set on the writes to nProcessor, I saw that the routine updating this variable is called often.

Since this variable is always set to 0xF before being set to the number of processors (2 in my case), and nProcessor looks like a global variable (it resides in the data section), a race condition is likely the culprit of the crash.

Code:

.text:100871E8 83 F8 0F cmp eax, 0Fh
.text:100871EB C7 05 8C+ mov nProcessor, 0Fh
.text:100871F5 7F 05 jg short loc_0_100871FC
.text:100871F7 A3 8C E0+ mov nProcessor, eax


And as simulated, when more than one browser window instance is using the plugin, a race on nProcessor can be created by freezing a thread at 100871f5 and waiting for another thread to access the variable, which produces the out-of-bounds function pointer dereference.

Kayaker
October 12th, 2009, 23:35
That's some nice intuitive work there nothize to come up with, and test, a race condition as a possible explanation. Since we're not discussing breaking a commercial protection or anything, I think it can be said that the file in question is Flash10b.ocx.

It makes sense in retrospect, you can see how the race condition might occur, but it's amazing that it *would* occur because of the timing involved, all within the space of a couple of instructions.

What it seems to look like is this. There are actually 2 occurrences of the following code (which doubles the chances of an error), where the global variable nProcessors is changed briefly to what in most systems would be an incorrect value (at least I don't know anyone with more than a Quad-core!).


Code:

:10087686 call GetNumberPhysicalProcessors
// Call GetSystemInfo, return [SystemInfo.dwNumberOfProcessors]

:1008768B cmp eax, 0Fh
:1008768E mov nProcessors, 0Fh
:10087698 jg short loc_1008769F

:1008769A mov nProcessors, eax


In pseudocode:

Code:

n = GetNumberPhysicalProcessors();
nProcessors = 15;
if ( n <= 15 )
    nProcessors = n;


The problem appears to be that nProcessors briefly (over the course of 2 instructions) can have the value 0x0F.


At the same time, there are 2 separate loops which might be running (in another browser instance/thread) which *use* the nProcessors global variable. They are simple loops of the kind:

Code:

for ( i = 0; i < nProcessors; ++i )
{

}



The race condition would occur if the loop limit nProcessors was checked, at the same time that the global nProcessors variable had been changed to 0x0F.
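
The interleaving can be replayed deterministically. This is a sketch, not the plugin's code: the shared dictionary stands in for the global, the generators stand in for threads, and the scheduling point (right before the reader's last loop check) is chosen by hand. single_store_writer is a hypothetical variant that computes the clamp locally and stores only once, included for comparison.

```python
def racy_writer(shared, n):
    """The store order from the disassembly: 15 first, then the real count."""
    shared["nProcessors"] = 15       # mov nProcessors, 0Fh
    yield                            # window in which other threads see 15
    if n <= 15:
        shared["nProcessors"] = n    # mov nProcessors, eax
    yield


def single_store_writer(shared, n):
    """Hypothetical variant: clamp in a local, publish with one store."""
    shared["nProcessors"] = min(n, 15)
    yield


def run_interleaving(writer, real_count=2):
    """Run the reader loop, letting the writer step in before the last check."""
    shared = {"nProcessors": real_count}
    slots = real_count               # the table really has this many entries
    visited = []
    other_thread = writer(shared, real_count)
    i = 0
    while i < shared["nProcessors"]:  # cmp esi, nProcessor / jl
        visited.append(i)
        i += 1
        if i == slots:
            next(other_thread)       # other thread runs up to its next yield
        if len(visited) > slots:
            break                    # stop after the first out-of-bounds index
    return visited


if __name__ == "__main__":
    print(run_interleaving(racy_writer))
    print(run_interleaving(single_store_writer))
```

With the racy writer the reader goes on to index 2, one past the end of a two-entry table, which corresponds to the esi = 2 seen in the register dump. With the single store the loop stops at the table boundary.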

Wow! what are the chances of that? Nothize nicely simulated the race condition. Could there be another explanation?


I suppose this is what can happen when you get too many cooks in the kitchen. There are a few other places where GetNumberPhysicalProcessors is called (though the return value isn't stored in a global variable). Programmer A figured the "upper limit" on the number of processors would be 8, in another case Programmer B chose 0x10, and Programmer C above decided on 0x0F. Hmmm...

nothize
October 13th, 2009, 01:07
Thanks Kayaker for the supplementary explanation!

The experiment mentioned in my last post was based on Flash10c.ocx, so the offsets might differ slightly. That's because the crash was dumped on another machine but the experiment was done on my own machine.

Here are the steps to re-produce the situation:

1. Open a browser (in theory, both the .ocx and .dll versions have the same problem).
2. Enter a URL that has Flash embedded (make sure the Flash plugin is loaded).
3. windbg -pd -pn iexplore.exe
4. Set 2 breakpoints:
i) at the first jg (after mov nProcessor, 0Fh)
ii) at the second jg (after mov nProcessor, 0Fh)
5. Start another browser instance and load another URL that has Flash.
6. g (keep continuing when bp 0 or 1 is hit, until ~# shows a thread other than thread 0)
7. ~f (freeze the current thread, which is stopped at the jg)
8. g
9. Access violation........

It should be noted that during the experiment, if the jg is hit on thread 0, no other thread is likely to be reading nProcessor. But when the jg is hit on another thread, thread 0 will try to read it inside the loop.

(I'm going to try to crash firefox using the dll version, so have to submit the reply now!!)

nothize
October 13th, 2009, 02:01
I tried with Firefox, but it always uses thread 0 when accessing nProcessor (read or write).

So npswf32.dll on Firefox is not affected by this problem in this test case. It could be the nature of Firefox not to call the plugin from a worker thread.

Chrome should be tried too, though it's not available on this machine.....