!dump in IceExt 0.62


naides
April 5th, 2004, 18:27
When I try to use the memory dumper that IceExt provides I get an error message: "error exception occurred while dumping memory"

I am asking this question because I am not sure if this is a problem with my computer, my Windows setup, my SoftICE version (4.27) or a bug in IceExt. Sten actually mentions that a bug was fixed in this version.

Could other people who have IceExt installed tell me if !dump works for them?

thank you

crUsAdEr
April 6th, 2004, 01:44
Hi naides, it seems to work fine on my box...

Make sure that you are dumping a valid range of memory addresses, as that exception occurs when RtlCopyMemory fails...

naides
April 7th, 2004, 06:52
Quote:
[Originally Posted by crUsAdEr]Hi naides, it seems to work fine on my box...

Make sure that you are dumping a valid range of memory addresses, as that exception occurs when RtlCopyMemory fails...


Thanks for the info, crUsAdEr. I have been experimenting and I still run into problems. Example:

I run Notepad.exe. The process ImageBase address is 01000000h.
The process ImageSize is 13000h. I can dump it seamlessly with LordPE, for instance.

But if I try to use !dump from IceExt, it complains with the exception message.
I can dump the first 2000h bytes with no problem, but anything beyond that results in the error. It also happens with other processes.
Can you dump memory segments of arbitrary length?

crUsAdEr
April 7th, 2004, 09:21
That is true... I do encounter random errors with dumping; sometimes it works, sometimes it doesn't... perhaps others can do more tests on this...

Kayaker
April 7th, 2004, 11:21
Hi

If you try it from a fresh reboot you might be able to dump it. ExAllocatePool doesn't necessarily allocate contiguous memory, and over time the limited non-paged memory becomes fragmented. These driver memory allocation APIs generally allocate page-sized chunks, even if you only want a few bytes of memory. If one standard 4K page is 1000h, and you're trying to dump 13000h, I suppose that's a lot of contiguous memory to expect. It *sounds* like this might be the problem.
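
To put numbers on the page arithmetic above, here is a minimal user-mode sketch in C. It only models the counting; `pages_spanned` is a hypothetical helper name, not something from IceExt or the DDK, and the 1000h page size is the standard x86 value mentioned in the post.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u  /* standard 4K x86 page, as in the post above */

/* Number of pages touched by the byte range [base, base + size).
   Hypothetical helper for illustration only. */
uint32_t pages_spanned(uint32_t base, uint32_t size)
{
    if (size == 0)
        return 0;
    /* The range must not wrap around the 32-bit address space. */
    assert(size - 1 <= UINT32_MAX - base);
    uint32_t first = base / PAGE_SIZE;
    uint32_t last  = (base + size - 1) / PAGE_SIZE;
    return last - first + 1;
}
```

For the Notepad case in this thread, `pages_spanned(0x01000000, 0x13000)` comes to 19 pages, against the 2 pages (2000h bytes) that dumped without an error.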

The newsgroups talk about this issue of allocating large blocks of contiguous memory a lot. The only time you are assured of getting it is when you first turn the machine on. As a driver developer the suggestions are to use MmAllocateContiguousMemory or AllocateCommonBuffer in your DriverEntry routine to secure your slice of the non-paged pool pie. I think there are ways of 'tracking' non-contiguous memory chunks using MDLs, but that may only be for standard user app+driver applications.

I was kind of wondering if this might become a problem, cuz I use ExAllocatePool for memory dumping too. Unfortunately the problem is compounded by having to schedule a file output routine in a separate thread which occurs *after* you have exited from Softice. I'm sure there's a way around the problem, dumping in pieces if necessary from within the extension, tracking the physical pages of memory allocations that need to be split, say with a doubly linked list, and somehow scheduling them all to be dumped to the same open file later on, ...yadda.

But in general that sounds like a big pain in the ass. Better to tell the user to reboot, cross your fingers and kwitcher griping. ;-)

crUsAdEr
April 7th, 2004, 11:28
Hi Kayaker,

Just wondering, do we really need a NonPagedPool for dumping? I thought Windows should be able to handle paging by itself since we are dumping at Dispatch Level...

Kayaker
April 7th, 2004, 12:33
Possibly, if you're lucky. The memory allocated isn't actually freed until after the file is written, so one might hope your dump(s) is still there come WriteFile time. But once out of Softice (or maybe even within), I don't know if you can be assured there might not be paging out/paging in or reallocations of paged memory going on beforehand.

I guess it would depend on how the system handles pageable memory "earmarked" for later use. If you allocate paged memory, say across several pages, you get what in return? A pointer to the start of the memory. If between that time and WriteFile time (several seconds perhaps), the system needs some more memory for its own use... who's to say you're protected over the full range? There may be some kind of internal bookkeeping going on that tries to keep paging out of memory that hasn't been freed yet to a minimum, but I couldn't begin to guess.

I do know that after exiting Softice after queuing via a semaphore a few file output routines from an extension for later processing, the file writing isn't 'immediate', and it's even possible to shut your driver down (and the separate I/O thread it created) before the file output is complete, so synchronization is needed. Setting the thread to HIGH_PRIORITY would likely help too.

It might be worth a shot to use the paged pool just to see what *does* happen; the worst you'll get is a bad Asp dump.

crUsAdEr
April 7th, 2004, 14:14
No, but even if the pages are paged out, won't Windows be able to handle the page fault and page in the appropriate pages when the memory is written out to the file?

Also, if you look at the IceExt source, the error is caused by RtlCopyMemory failing... the error for reserving NonPagedPool is different... so that should not be the issue, I think!

Kayaker
April 7th, 2004, 14:51
Quote:
[Originally Posted by crUsAdEr]
Also, if you look at the IceExt source, the error is caused by RtlCopyMemory failing... the error for reserving NonPagedPool is different... so that should not be the issue, I think!


Yeah, that's true, I see where the error message lies now; it's not with the actual allocation (I was sourceless). Do you think it's the fault of RtlCopyMemory or within SheduleDumpFile?

crUsAdEr
April 7th, 2004, 21:17
Hi Kayaker,

I have recompiled IceExt with more error messages and the error is definitely in RtlCopyMemory... I have no clue why it fails though...

"Callers of RtlCopyMemory can be running at any IRQL if both memory blocks are resident. Otherwise, the caller must be running at IRQL < DISPATCH_LEVEL"

I am sure cmd_dump is running @ DISPATCH_LEVEL, so perhaps one of the memory blocks is not resident?

Perhaps Sten could comment on this?
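
The DDK rule quoted above can be restated as a small predicate. This is only a user-mode model in C to make the logic explicit; `copy_is_safe` is a hypothetical name, and the residency flags are inputs the caller would have to know by some other means.

```c
#include <assert.h>
#include <stdbool.h>

/* x86 IRQL values as defined in the DDK headers. */
enum { PASSIVE_LEVEL = 0, APC_LEVEL = 1, DISPATCH_LEVEL = 2, HIGH_LEVEL = 31 };

/* Models the quoted rule: if both blocks are resident the copy is fine
   at any IRQL; otherwise the caller must be below DISPATCH_LEVEL so the
   resulting page fault can be serviced. Hypothetical helper. */
bool copy_is_safe(unsigned irql, bool src_resident, bool dst_resident)
{
    assert(irql <= HIGH_LEVEL);   /* x86 IRQLs run from 0 to 31 */
    if (src_resident && dst_resident)
        return true;
    return irql < DISPATCH_LEVEL;
}
```

With a nonpaged destination buffer and a pageable source, `copy_is_safe(DISPATCH_LEVEL, false, true)` is false, which is the situation suspected here.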

Kayaker
April 8th, 2004, 02:08
Hi

So you can duplicate the error? That's handy. Out of curiosity, do you have any idea what *size* of buffer you can get away with?

I had a thought about this, just a theory that might be false. Assume under *normal* driver conditions you allocate a large buffer with ExAllocatePool. While it may not return a range of pages that is *physically* contiguous, because of fragmentation or whim, it will be *virtually* contiguous and the Virtual Memory Manager should be able to handle any memory copy to it. The paging mechanism and PTEs and all that mysterious stuff being involved.

The thing is, we're not really in a *normal* driver condition here but in Softice space within an extension. While you can call most kernel APIs and such, I wonder if the VMM itself is affected and the page fault that occurs which would normally trigger a new page being brought in doesn't work as expected?

Naides mentioned being able to dump 2 pages worth but no more. Perhaps that time ExAllocatePool allocated 2 contiguous *physical* pages, which were copied into OK, but the rest were scattered physically, requiring the services of the VMM, which seemed to have gone belly up on him.


This might be hokum because Softice itself uses ExAllocatePoolWithTag (the tag being "SIce"), but I can't tell what procedure it's used in or how many bytes might be allocated. It might be for small mem allocations <= 1 page so the VMM never comes into play. Sice also uses MmAllocateContiguousMemory, where the pages are guaranteed to be physically contiguous and which, again a theory, might be used for larger allocations. So much for theorizing.

crUsAdEr
April 8th, 2004, 04:28
Hi naides,

Can you try this... scroll down in the Data window of sice through the whole file... use PageDown or something... then you will be able to dump with no problem...

I think RtlCopyMemory will fail when part of the file is paged out... also !dump will fail on any "uninitialized" data section; sometimes you will see ???? in sice in certain sections of the file. Using the pagein command I was able to dump properly...

So I guess the problem is memory not being resident, so we have to somehow lower IRQL before dumping... but I am not sure; we will have to wait and see if Sten will do anything about this :/

naides
April 8th, 2004, 12:56
Quote:
[Originally Posted by crUsAdEr]Hi naides,

Can you try this... scroll down in the Data window of sice through the whole file... use PageDown or something... then you will be able to dump with no problem...

I think RtlCopyMemory will fail when part of the file is paged out... also !dump will fail on any "uninitialized" data section; sometimes you will see ???? in sice in certain sections of the file. Using the pagein command I was able to dump properly...

So I guess the problem is memory not being resident, so we have to somehow lower IRQL before dumping... but I am not sure; we will have to wait and see if Sten will do anything about this :/


Results of the experiment:

Loaded Notepad.exe in symbol loader

ImageBase 01000000 Size 13000

Tried to dump: only the first two pages (2000h) dump.

Let me expand on what I mean: if I enter !dump c:\dump.dmp 01000000 100 it dumps. I gradually increased the size of the dump until I got the error message. The limit here was 2000.

I placed the file in the data window and looked at it by paging down. Around addresses 01009000 to 01009FFF and 0100E000 to 0100EFFF there were pages that were not mapped, i.e. they showed as ????? marks.

After this maneuver, I was able to dump up to 8FFFh. Once I hit the area that belongs to the file but was not mapped in when I examined it in the Sice data window, I hit the exception error. It seems that !dump uses an API that does not have the paging in capability, am I correct?
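
The experiment above amounts to finding how far a copy can run before it reaches the first page that is not present. A minimal user-mode sketch in C, assuming the per-page residency is already known as an array (`present` and `dumpable_prefix` are hypothetical names, nothing from IceExt):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u

/* Returns how many bytes, counted from the start of the image, can be
   copied before the first non-present page is reached.
   present[i] describes page i of the image (hypothetical input). */
uint32_t dumpable_prefix(const bool *present, size_t page_count)
{
    assert(present != NULL || page_count == 0);
    size_t i = 0;
    while (i < page_count && present[i])
        i++;
    return (uint32_t)(i * PAGE_SIZE);
}
```

With 13h pages of which page 9 (01009000) and page 0Eh (0100E000) are not present, the function returns 9000h, i.e. offsets 0 through 8FFFh, which matches the limit observed after paging through the data window.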

crUsAdEr
April 8th, 2004, 20:11
Hi naides,

Try Disable Mapping of Nonpresent Pages in the sice settings... then you will be able to see all the ???? where the page is not present...

Quote:
It seems that !dump uses an API that does not have the paging in capability, am I correct?


I think it is just that !dump is executing at IRQL DISPATCH_LEVEL, at which, according to MSDN, paging is not supported. Which is why RtlCopyMemory fails... somehow we need to lower IRQL before dumping...

Kayaker
April 8th, 2004, 23:32
Nice one crU
("Good Spot" in the birding world)

If I use KeGetCurrentIrql in regular driver code it returns 0, or PASSIVE_LEVEL, which is normal. If I use it in my Softice extension it returns a whopping 29, or IPI_LEVEL, which is the 3rd highest IRQL. Softice itself (or its extensions at least) must run at an elevated level so it doesn't get interrupted by anything lower; in fact the only things which could interrupt its execution are Machine checks/bus errors and Power-fail interrupts. Interestingly there is no KeRaiseIrql API in Sice code, so it must do it some other way during initialization.
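
For reference, the x86 IRQL ladder behind those numbers can be written out as a small C model. The values are the ones in the x86 DDK headers; `can_interrupt` and `levels_above` are hypothetical helpers added only to illustrate the ordering.

```c
#include <assert.h>
#include <stdbool.h>

/* x86 IRQL values from the DDK headers (ntddk.h). */
enum x86_irql {
    PASSIVE_LEVEL  = 0,
    APC_LEVEL      = 1,
    DISPATCH_LEVEL = 2,
    PROFILE_LEVEL  = 27,
    CLOCK2_LEVEL   = 28,
    IPI_LEVEL      = 29,
    POWER_LEVEL    = 30,   /* power-fail interrupt */
    HIGH_LEVEL     = 31    /* machine check / bus error */
};

/* An interrupt is taken only if its IRQL is strictly higher than the
   IRQL the processor is currently running at. */
bool can_interrupt(unsigned current_irql, unsigned incoming_irql)
{
    assert(current_irql <= HIGH_LEVEL && incoming_irql <= HIGH_LEVEL);
    return incoming_irql > current_irql;
}

/* How many IRQLs remain above the given one. */
unsigned levels_above(unsigned irql)
{
    assert(irql <= HIGH_LEVEL);
    return HIGH_LEVEL - irql;
}
```

`levels_above(IPI_LEVEL)` is 2, which corresponds to the power-fail and machine-check levels named above.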

Programming the MS WDM by Oney talks about IRQL and Paging:

--------------------------------------------
IRQL and Paging
One consequence of running at elevated IRQL is that the system becomes incapable of servicing page faults. The rule this fact implies is simply stated:

Code executing at or above DISPATCH_LEVEL must not cause page faults.

One implication of this rule is that any of the subroutines in your driver that execute at or above DISPATCH_LEVEL must be in nonpaged memory. Furthermore, all the data you access in such a subroutine must also be in nonpaged memory. Finally, as IRQL rises, fewer and fewer kernel-mode support routines are available for your use.

...it’s well to point out that the rule against page faults is really a rule prohibiting any sort of hardware exception, including page faults, divide checks, bounds exceptions, and so on.
-------------------------------------------


During testing I was able to change the IRQL level within a Sice extension to that of PASSIVE_LEVEL (0) or DISPATCH_LEVEL (2) in the following way.

Code:

LOCAL OldIrql:DWORD
LOCAL DummyIrql:DWORD

invoke KeGetCurrentIrql ; (include hal.inc/hal.lib)
movzx eax, al ; KIRQL is a byte value, so keep only AL
mov OldIrql, eax
; returns IPI_LEVEL equ 29 ; Interprocessor interrupt level


invoke KeLowerIrql, PASSIVE_LEVEL ; or DISPATCH_LEVEL
invoke KeGetCurrentIrql
; returns PASSIVE_LEVEL or DISPATCH_LEVEL


; Do our memory copy here?


; Raise it back
lea eax, DummyIrql ; storage for returned value, we won't use
invoke KeRaiseIrql, OldIrql, eax
invoke KeGetCurrentIrql
; returns IPI_LEVEL


Btw, this ignores the DDK, again from Oney:
-----------------------------
The DDK documentation says that you must always call KeLowerIrql with the same value as that returned by the immediately preceding call to KeRaiseIrql, but this information isn’t exactly right. The only rule that KeLowerIrql actually applies is that the new IRQL must be less than or equal to the current one.
-----------------------------


The one possible caveat to lowering the Irql is that something other than the page fault(s) we want to occur might interrupt the mem copy, so ExAllocatePool should likely be done beforehand, then raise the Irql again as soon as possible:
-----------------------------
It’s a mistake (and a big one!) to lower IRQL below whatever it was when a system routine called your driver, even if you raise it back before returning. Such a break in synchronization might allow some activity to preempt you and interfere with a data object that your caller assumed would remain inviolate.
-----------------------------


It makes sense now that the error is probably coming from the source buffer side of things (generally pageable user code), and not the destination buffer (non-pageable allocated memory):
-----------------------------
Paged pool.
Driver routines running below DISPATCH_LEVEL IRQL can use a heap area called paged pool. As the name implies, memory in this area is pagable, and a page fault can occur when it is accessed.

Nonpaged pool.
Driver routines running at elevated IRQLs need to allocate temporary storage from another heap area called nonpaged pool. The system guarantees that the virtual memory in nonpaged pool is always physically resident. The device and controller extensions created by the I/O Manager come from this pool area.
----------------------------------------------------------

So it sounds like either page everything in before copying or temporarily lower the IRQL.

crUsAdEr
April 9th, 2004, 02:40
Quote:
So it sounds like either page everything in before copying or temporarily lower the IRQL


I was thinking of altering the IceExt implementation a bit: instead of copying the dump to a nonpaged pool and saving it to file later, we could schedule a file dump with a context pointer and skip the nonpaged pool altogether. On sice exit, before dumping, switch context, dump, then switch context back...

Seems complicated :/... looks like dumping is a user-mode job really... a quick look at the ntdump source code also suggests that it can only dump @ IRQL less than 2... ah well, with !suspend, we can always use LordPE and avoid the ugly EB FE

evaluator
April 9th, 2004, 03:07
Probably, to be safe, Sten uses MmIsAddressValid before copying memory.
Check it, then...

Clandestiny
April 9th, 2004, 16:28
Quote:
[Originally Posted by crUsAdEr]Hi Kayaker,

Just wondering, do we really need a NonPagedPool for dumping? I thought Windows should be able to handle paging by itself since we are dumping at Dispatch Level...


In an SI extension, you will need to do all memory allocations from the non paged pool. Consider the fact that SI can be arbitrarily entered at any point in time. As such, no guarantees can be made regarding the state of the system.

There are 3 execution contexts for kernel mode code.

1. Trap or Exception context - normally when a ring 3 app makes a request of a kernel mode driver by "trapping" into kernel mode. In this case, the context is equivalent to the user code that caused the trap. Therefore, the kernel mode code sees memory as it is seen by the user mode requestor. Dispatch routines would run in this context.

2. Kernel mode thread context - can make no assumptions about current processes, threads, or memory.

3. Interrupt context - because the interrupt can occur asynchronously and at any time, the context of the code executing at the time of the interrupt is arbitrary. Code running in interrupt context therefore can likewise make no assumptions about the state of the page tables or current processes / threads.

What you are dealing with in an SI extension would be essentially interrupt context because you can make no assumptions about anything, including the IRQL level of the code that was just interrupted when you entered SI. Because no assumptions can be made regarding IRQL level, you will be very limited in what you can do. You may or may not be running at or below Dispatch level, so basically you will have to assume that you could potentially be running at the highest IRQL level and obey all restrictions. The only memory that will be available to a driver at this level will be the non paged pool (remember no assumptions can be made about the page tables). Also, it should be noted that a driver must be executing at or below DISPATCH_LEVEL to allocate or free even non paged memory. In order to get stuff done, an SI extension will have to delegate & synchronize a lot of stuff with kernel threads (which are running at passive level IRQL).
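
One way to picture that delegation is a fixed-size request queue that is set up once, filled from the extension without allocating anything, and drained later by the worker thread. The following is only a single-threaded user-mode sketch in C; every name in it is hypothetical, the capacity is arbitrary, and a real driver would also need proper synchronization (for example the semaphore mentioned earlier in the thread).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_SLOTS 8u   /* hypothetical capacity, fixed up front */

struct dump_request { uint32_t base; uint32_t size; };

struct dump_queue {
    struct dump_request slots[QUEUE_SLOTS];
    unsigned head;   /* next slot the worker reads     */
    unsigned tail;   /* next slot the extension writes */
};

/* Called from the extension: touches only preallocated storage. */
bool queue_push(struct dump_queue *q, uint32_t base, uint32_t size)
{
    assert(q != NULL);
    if (q->tail - q->head == QUEUE_SLOTS)
        return false;               /* full: reject, never allocate */
    q->slots[q->tail % QUEUE_SLOTS] = (struct dump_request){ base, size };
    q->tail++;
    return true;
}

/* Called later from the worker thread running at PASSIVE_LEVEL. */
bool queue_pop(struct dump_queue *q, struct dump_request *out)
{
    assert(q != NULL && out != NULL);
    if (q->head == q->tail)
        return false;               /* nothing pending */
    *out = q->slots[q->head % QUEUE_SLOTS];
    q->head++;
    return true;
}
```

The point of the design is that the part running at elevated IRQL never allocates or frees; it only records what should be done once the system is back at passive level.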

Cheers,
Clandestiny

Kayaker
April 10th, 2004, 06:13
It looks like lowering the Irql isn't going to work in practice. Softice has its own exception handler built in for just this scenario, when you try to access a page not paged in you get:

Extension aborted: A page fault at CS:EIP %04x:%08x occurred when address %08x was referenced SS:EBP %04x:%08x

If you were to comment out the exception handler in IceExt which covers the RtlCopyMemory call, you would probably see this default Softice one come up instead. It points exactly to the faulting MOVSD instruction and exactly to the non-paged in address where the fault occurred. You can exit gracefully afterwards.


Even if I lower the Irql before copying, the extension is aborted outright by Sice, so there's no chance of the page fault being handled properly anyway. Plus there's no way to raise the Irql back to what it should be and all that happens is a nice BSOD.

The only way I could get a successful full dump of notepad was, as crUsAdEr mentioned, to first scroll down in the data window to page in each section that showed as NP by the PAGE command, and where necessary use the PAGEIN command.


As for paging in programmatically, the PAGEIN command in Softice points us back to the same problem:

c_PAGEIN_
...
call pGetIRQLLevel
cmp eax, 2
jb short loc_19F36
push offset aIrqlMustBeBelo ; "IRQL must be below DISPATCH_LEVEL to page in memory"


Boooo...


Using MmIsAddressValid to test each page would be a good idea, at least to inform the user. Unfortunately it can't be used within an extension either:
Callers of MmIsAddressValid must be running at IRQL <= DISPATCH_LEVEL.
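
If a per-page validity check were available at the right IRQL, informing the user could look roughly like the following user-mode sketch in C. The `page_probe_fn` callback is a hypothetical stand-in for such a check, and `notepad_probe` simply hard-codes the two non-present pages reported earlier in this thread so the example can run on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u

/* Hypothetical stand-in for a per-page validity check. */
typedef bool (*page_probe_fn)(uint32_t page_addr);

/* Demo probe: models the Notepad observation from this thread. */
bool notepad_probe(uint32_t page_addr)
{
    return page_addr != 0x01009000u && page_addr != 0x0100E000u;
}

/* Walks every page of [base, base + size) and records the pages the
   probe rejects. Returns the total number of rejected pages; at most
   max_missing addresses are stored in missing[]. */
size_t find_missing_pages(uint32_t base, uint32_t size, page_probe_fn probe,
                          uint32_t *missing, size_t max_missing)
{
    assert(probe != NULL);
    assert(missing != NULL || max_missing == 0);
    if (size == 0)
        return 0;
    size_t found = 0;
    uint32_t page = base & ~(PAGE_SIZE - 1);
    uint32_t last = (base + size - 1) & ~(PAGE_SIZE - 1);
    for (;;) {
        if (!probe(page)) {
            if (found < max_missing)
                missing[found] = page;
            found++;
        }
        if (page == last)
            break;
        page += PAGE_SIZE;
    }
    return found;
}
```

Calling it with base 01000000h and size 13000h and the demo probe reports two pages, 01009000h and 0100E000h, which is the information the user would need before using PAGEIN by hand.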

Clandestiny
April 10th, 2004, 08:53
Quote:
[Originally Posted by crUsAdEr]
Seems complicated :/... looks like dumping is a user-mode job really... a quick look at the ntdump source code also suggests that it can only dump @ IRQL less than 2... ah well, with !suspend, we can always use LordPE and avoid the ugly EB FE


You have a point there with the !suspend... Clearly due to the elevated IRQL you can't do much in the SI extension if the memory is not paged in already... But couldn't you *programmatically* suspend the process and then queue it out to a kernel thread (runs at IRQL = PASSIVE_LEVEL)? You should have access to the process environment block (PEB = fs+30). I don't know all of the undocumented fields, but there may be some way to set the process to a suspended state manually.

Cheers,
Clandestiny

crUsAdEr
April 10th, 2004, 16:33
Sure Clan,

Instead of scheduling the buffer and length for dumping, we can also pass the PID or context of the file to dump, then use ZwOpenProcess and ZwReadVirtualMemory, but then I just feel that defeats the purpose... might as well use a ring3 app like LordPE, which is real good at what it is doing already.

crUsAdEr
April 11th, 2004, 01:44
Hey Clan,

Quote:
1. Trap or Exception context - normally when a ring 3 app makes a request of a kernel mode driver by "trapping" into kernel mode. In this case, the context is equivalent to the user code that caused the trap. Therefore, the kernel mode code sees memory as it is seen by the user mode requestor. Dispatch routines would run in this context.


This means if we enter sice via a Breakpoint on a ring3 app, we will be @ IRQL==PASSIVE_LEVEL?
I think entering sice via hotkey, we will be @ IRQL==DISPATCH_LEVEL because sice pops up via DISPATCH_INTERNAL_DEVICE_CONTROL???

Sten
April 11th, 2004, 03:55
Quote:
[Originally Posted by crUsAdEr]This means if we enter sice via a Breakpoint on a ring3 app, we will be @ IRQL==PASSIVE_LEVEL?
I think entering sice via hotkey, we will be @ IRQL==DISPATCH_LEVEL because sice pops up via DISPATCH_INTERNAL_DEVICE_CONTROL???


The current !DUMP command implementation does not support paged out memory. RtlCopyMemory generates an exception when trying to access such pages.

I try to copy the memory being dumped to a temporary buffer and dump it later, when you have exited from SoftICE. This is needed to avoid possible modification of the memory before it is dumped.

The solution to support paged out memory seems to be not to copy memory while SoftICE is active, but to suspend the process being dumped (all its threads), dump the memory from a worker thread and then resume the process again.

There is no way to access paged out memory when SoftICE is active, since SoftICE itself executes at the highest IRQL (for ex. when an APIC is available, SoftICE sets Task Priority to 0FFh, then does some low level programming and lowers it to 0EFh to allow some IRQs it needs).
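
The effect of those two Task Priority values can be illustrated with the local APIC's priority-class rule: a vector's class is its upper four bits, and a fixed interrupt is delivered only if that class is higher than the class in TPR bits 7:4. The sketch below is a user-mode C model of that comparison only; it ignores in-service interrupts and NMI-type events, and `tpr_allows_vector` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Priority class = upper four bits of a vector or of the TPR value. */
static unsigned priority_class(uint8_t value)
{
    return (unsigned)value >> 4;
}

/* True if a fixed interrupt with this vector would pass the task
   priority check for the given TPR value (simplified model). */
bool tpr_allows_vector(uint8_t tpr, uint8_t vector)
{
    return priority_class(vector) > priority_class(tpr);
}
```

Under this model a TPR of 0FFh lets no vector through, while 0EFh lets through only vectors 0F0h to 0FFh, which fits the description of lowering it just far enough for the few interrupts SoftICE needs.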

Clandestiny
April 11th, 2004, 15:22
Quote:
[Originally Posted by crUsAdEr]Hey Clan,
This means if we enter sice via a Breakpoint on a ring3 app, we will be @ IRQL==PASSIVE_LEVEL?
I think entering sice via hotkey, we will be @ IRQL==DISPATCH_LEVEL because sice pops up via DISPATCH_INTERNAL_DEVICE_CONTROL???


The context refers to what assumptions about memory / current processes the driver can make. I don't think the execution context has a direct relationship to driver IRQL level. The user app would be executing at IRQL PASSIVE_LEVEL because all user threads execute at that level, but when the bp is hit, that doesn't mean that SI will also be executing at that IRQL. This is how I understand it anyway... I'm not an expert on driver development by any means... Just trying to learn myself. What Sten said about SI executing at the highest IRQL makes a lot of sense though, since the definition of IRQL implies that code executing at a high IRQL cannot be interrupted by code at a lower IRQL. SI must be able to freeze the system, and if it were not executing at the very highest IRQL, then it could be interrupted by other code.

Cheers,
Clan

crUsAdEr
April 11th, 2004, 20:37
Thanks guys... I was a bit confused by the !irql command output... I realised that it actually outputs the IRQL of the popup client context, not the IRQL of sice itself...