Bilbo, thanks for your replies. I will try the answer from your first experiment shortly.
Now on to the VirtualAlloc issue. We all know that once you reserve a large chunk of memory,
you can subsequently obtain a pointer 0x1000 bytes past the first one. What I didn't make clear is that in my
test program the amount of memory allocated by the first call cannot be changed: it is 0x1000 bytes, not 0x10000 bytes. Here are the results of my study of NtAllocateVirtualMemory:
Internals of the VirtualAlloc Win32 API
The Win32 API VirtualAlloc allocates a memory region whose size is a
multiple of the page size. The function takes 4 arguments and returns a
pointer to the allocated region:
ptr = VirtualAlloc(address, size, mmFlag, pgFlag)
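address: 0 or a fixed address (rounded down: to a 64 KB boundary when reserving, to a page boundary when committing)
size: rounded up to a multiple of the page size
mmFlag: MEM_COMMIT, MEM_RESERVE, ...
pgFlag: PAGE_READONLY, PAGE_READWRITE, PAGE_NOACCESS, ...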
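The rounding rules in the parameter list come down to simple mask arithmetic. Here is a minimal sketch, assuming 4 KB pages and a 64 KB allocation granularity (the usual x86 values, and the same ones the disassembly further down relies on). The helper names are hypothetical and are not Win32 functions.

```c
#include <stdint.h>

#define PAGE_SIZE              0x1000u   /* assumed: 4 KB pages (x86)       */
#define ALLOCATION_GRANULARITY 0x10000u  /* assumed: 64 KB reservation unit */

/* Round a reservation base address down to the 64 KB granularity. */
uint32_t round_down_to_granularity(uint32_t address)
{
    return address & ~(ALLOCATION_GRANULARITY - 1);
}

/* Round a size up to a whole number of pages. */
uint32_t round_up_to_pages(uint32_t size)
{
    return (size + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
}

/* Number of pages touched by [address, address + size); size must be > 0. */
uint32_t pages_spanned(uint32_t address, uint32_t size)
{
    uint32_t first = address >> 12;
    uint32_t last  = (address + size - 1) >> 12;
    return last - first + 1;
}
```

Note that a fixed address that is not page aligned can make a 0x1000-byte request span two pages. For example, pages_spanned(0x002f1800, 0x1000) is 2.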
The function calls VirtualAllocEx to do the job. The pseudo code for
VirtualAlloc is simply: call VirtualAllocEx(-1, address, size, mmFlag, pgFlag).
VirtualAllocEx takes one more argument, the process handle; -1 means the
calling process itself. VirtualAllocEx in turn calls IVirtualAlloc. Pseudo
code for VirtualAllocEx:
setup stack frame
setup SEH
if(address){
    ecx = [7c5cf048]          // pointer to a global structure
    if(address < [ecx+13c])   // [ecx+13c] = 0x00010000 (64 KB), the lowest valid user address
        goto out_bad_below_64KB
}
// The 3rd argument is always 0 when called from a ring 3 application?
eax = IVirtualAlloc(-1, &address, 0, &size, mmFlag, pgFlag)
label_1:
remove SEH
return EAX
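The wrapper chain above can be modeled in a few lines of C. This is a sketch under stated assumptions: addresses are 32-bit integers as on Win2k, the model_* functions and echo_stub are hypothetical stand-ins (echo_stub plays the kernel and simply "allocates" at the requested address), and the 0x00010000 bound is the value the disassembly reads from [ecx+13c].

```c
#include <stdint.h>

#define CURRENT_PROCESS     0xFFFFFFFFu  /* the -1 pseudo-handle           */
#define LOWEST_USER_ADDRESS 0x00010000u  /* value observed at [ecx+13c]    */

/* Assumed shape of the int 2e stub the text calls IVirtualAlloc. */
typedef uint32_t (*alloc_stub_fn)(uint32_t hProcess, uint32_t *pAddress,
                                  uint32_t zeroArg, uint32_t *pSize,
                                  uint32_t mmFlag, uint32_t pgFlag);

/* VirtualAllocEx as described: reject low fixed addresses, then call the stub. */
uint32_t model_VirtualAllocEx(alloc_stub_fn stub, uint32_t hProcess,
                              uint32_t address, uint32_t size,
                              uint32_t mmFlag, uint32_t pgFlag)
{
    if (address != 0 && address < LOWEST_USER_ADDRESS)
        return 0;                 /* rejected: below the lower bound       */
    /* The 3rd argument is passed as 0 on this path. */
    return stub(hProcess, &address, 0, &size, mmFlag, pgFlag);
}

/* VirtualAlloc only forwards, adding the current-process handle. */
uint32_t model_VirtualAlloc(alloc_stub_fn stub, uint32_t address,
                            uint32_t size, uint32_t mmFlag, uint32_t pgFlag)
{
    return model_VirtualAllocEx(stub, CURRENT_PROCESS, address, size,
                                mmFlag, pgFlag);
}

/* Hypothetical kernel stand-in: succeeds at the requested address for -1. */
uint32_t echo_stub(uint32_t hProcess, uint32_t *pAddress, uint32_t zeroArg,
                   uint32_t *pSize, uint32_t mmFlag, uint32_t pgFlag)
{
    (void)zeroArg; (void)pSize; (void)mmFlag; (void)pgFlag;
    return hProcess == CURRENT_PROCESS ? *pAddress : 0;
}
```

For example, model_VirtualAlloc(echo_stub, 0x1000, 0x1000, 0x1000, 0x04) returns 0 before the stub is even reached, because 0x1000 is below the 64 KB bound.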
IVirtualAlloc:
mov eax, 10    <-- system service index of ntoskrnl!NtAllocateVirtualMemory
edx = esp+4    <-- edx points to the arguments on the user stack ([esp] holds the return address, label_1 in VirtualAllocEx)
int 2e
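Because [esp] holds the return address, esp+4 is where the six 32-bit arguments start. The sketch below shows that layout and the copy the kernel side makes. The struct and field names are hypothetical, the offsets assume a 32-bit stdcall frame (arguments pushed right to left, so the first one sits lowest), and demo_stack is a made-up snapshot, not captured data.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The six 32-bit arguments as they sit on the user stack. */
struct alloc_args {
    uint32_t hProcess;  /* +00: -1 = current process             */
    uint32_t pAddress;  /* +04: user pointer to the base address */
    uint32_t zeroArg;   /* +08: the 3rd argument, 0 from ring 3  */
    uint32_t pSize;     /* +0c: user pointer to the size         */
    uint32_t mmFlag;    /* +10: MEM_COMMIT / MEM_RESERVE ...     */
    uint32_t pgFlag;    /* +14: page protection                  */
};

/* edx = esp+4: skip the return address at [esp], copy the arguments. */
struct alloc_args read_args(const uint32_t *esp)
{
    struct alloc_args args;
    memcpy(&args, esp + 1, sizeof args);
    return args;
}

/* Hypothetical snapshot of the user stack right at "int 2e". */
static const uint32_t demo_stack[7] = {
    0x7c5a0000u,  /* return address (made-up value for label_1) */
    0xFFFFFFFFu,  /* hProcess = -1                              */
    0x0012ff00u,  /* &address (made up)                         */
    0x00000000u,  /* 3rd argument                               */
    0x0012ff04u,  /* &size (made up)                            */
    0x00001000u,  /* MEM_COMMIT                                 */
    0x00000004u   /* PAGE_READWRITE                             */
};
```

The whole frame is 0x18 bytes, which is what has to be copied from the user stack to the kernel stack.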
NtAllocateVirtualMemory:
setup kernel SEH
if(user land esp > 7fff0000) goto called_from_kernel
copy stack parameters from user land (source ESI) to kernel land (destination EDI)
called_from_kernel:
mov ebx, 80499e48 (win2ksp4)
call ebx
if(eax != 0) eax = 0 <------ Interesting
else eax = AllocatedAddress
remove kernel SEH
iretd (return to user mode)
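The "Interesting" step after call ebx turns every non-zero status into a zero return, which is why the caller only ever sees 0 on failure. Here is a minimal model of that step as the text describes it. The function name is hypothetical, and STATUS_CONFLICTING_ADDRESSES (0xC0000018) is the NTSTATUS value that comes up later. In the documented NT interface, NtAllocateVirtualMemory returns an NTSTATUS and writes the address through its base-address pointer, so it is worth confirming in a debugger exactly where this conversion happens.

```c
#include <stdint.h>

#define STATUS_SUCCESS               0x00000000u
#define STATUS_CONFLICTING_ADDRESSES 0xC0000018u

/* Mirrors: if(eax != 0) eax = 0; else eax = AllocatedAddress */
uint32_t map_worker_result(uint32_t status, uint32_t allocated_address)
{
    if (status != STATUS_SUCCESS)
        return 0;              /* any failure code collapses to 0 */
    return allocated_address;  /* success hands back the address  */
}
```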
This subroutine, called through ebx, is the real workhorse that
allocates the memory. It was definitely written in a higher-level
language, probably C, given the presence of a stack frame and
unoptimized compiler output such as the redundant mov [ebp-20], eax;
mov eax, [ebp-20] sequence. This subroutine takes the same argument
list as IVirtualAlloc.
The pseudo code follows (when MEM_COMMIT and address are set):
setup stack frame
setup SEH
sub esp, 124 ; lots of local variables
local_stack_ebp-18 = esp
if(arg3 >= 15) goto called_from_kernel
local_mmFlag_ebp-128 = mmFlag & 200000
if(mmFlag & MEM_RESERVE) goto attempt_to_reserve_page
if(addr > 7fff0000) goto out_bad_MMTooHigh
if(addr > (eax = 7ffeffff + ffff0000 = 7ffdffff)) goto out_bad_MMTooHigh ; 32-bit wraparound
if(eax - addr < 64kb) goto out_bad_NotEnoughMem
if(size == 0) goto out_bad_invalidSize
if((pHandle = arg0) != -1) goto allocate_in_another_process
setup thread related parameters
call ExAcquireFastMutex
call ExAcquireFastMutexUnsafe
if(addr == 0) goto free_base_address_allocate
fixed_base_address_allocate:
verify pgFlag and mmFlag
align size to 4k boundary
push ptr to memory_block_linkedlist ; arguments are pushed right to left,
push (address + size - 1) >> 0c    ; so address >> 0c ends up at [esp+4]
push address >> 0c                 ; inside the callee
call sub_8044eb8a <--interesting subroutine that walks through the LL
if(eax == 0) goto out_bad_STATUS_conflicting_address
prepare to allocate memory with the information in eax
free_base_address_allocate:
allocate the memory at address
... // abbreviated
out_bad:
mov eax, ERROR_CODE ; e.g. C0000018 = STATUS_CONFLICTING_ADDRESSES
ret
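To make the branch order in the pseudo code easier to follow, here is a C model that only decides which label is reached and allocates nothing. Assumptions: the MEM_RESERVE test is a bitwise AND (the OR written in some copies of this pseudo code would always be true), MEM_COMMIT and MEM_RESERVE take their standard values 0x1000 and 0x2000, and the enum names are hypothetical labels that mirror the pseudo code.

```c
#include <stdint.h>

#define MEM_COMMIT  0x00001000u
#define MEM_RESERVE 0x00002000u

/* Hypothetical names mirroring the labels in the pseudo code. */
enum worker_path {
    PATH_RESERVE,         /* attempt_to_reserve_page      */
    PATH_BAD_TOO_HIGH,    /* out_bad_MMTooHigh            */
    PATH_BAD_NOT_ENOUGH,  /* out_bad_NotEnoughMem         */
    PATH_BAD_SIZE,        /* out_bad_invalidSize          */
    PATH_OTHER_PROCESS,   /* allocate_in_another_process  */
    PATH_FREE_BASE,       /* free_base_address_allocate   */
    PATH_FIXED_BASE       /* fixed_base_address_allocate  */
};

enum worker_path worker_path(uint32_t hProcess, uint32_t addr,
                             uint32_t size, uint32_t mmFlag)
{
    const uint32_t limit = 0x7ffeffffu + 0xffff0000u;  /* wraps to 0x7ffdffff */

    if (mmFlag & MEM_RESERVE)     return PATH_RESERVE;
    if (addr > 0x7fff0000u)       return PATH_BAD_TOO_HIGH;
    if (addr > limit)             return PATH_BAD_TOO_HIGH;
    if (limit - addr < 0x10000u)  return PATH_BAD_NOT_ENOUGH;
    if (size == 0)                return PATH_BAD_SIZE;
    if (hProcess != 0xFFFFFFFFu)  return PATH_OTHER_PROCESS;
    return addr == 0 ? PATH_FREE_BASE : PATH_FIXED_BASE;
}
```

A fixed address with MEM_COMMIT alone therefore lands on fixed_base_address_allocate, while adding MEM_RESERVE diverts to attempt_to_reserve_page before any of the range checks run.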
sub_8044eb8a(address >> 0c, (address + size - 1) >> 0c, MM_BLOCK_LL_head)
mov eax, [esp+0c] ; head ptr to the MM_BLOCK_LL
l_1:
test eax, eax
jz out_bad
mov ecx, [esp+4] ; address >> 0c
cmp ecx, [eax+4]
jbe l_2
mov eax, [eax+10]
jmp l_1
l_2:
mov ecx, [esp+8] ; (address + size -1) >> 0c, upper bound
cmp ecx, [eax]
jae out_good
mov eax, [eax+0c]
jmp l_1
out_bad:
xor eax, eax
out_good:
ret 0c ; 3 arguments
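Here is a line-by-line C rendering of sub_8044eb8a. Node fields are named by offset and by how the loop uses them, not by the struct given below; those names are hypothetical. The sketch uses real pointers, so the offsets only match on a 32-bit build. Read literally, the loop loads both [eax+0c] and [eax+10] into eax and dereferences them on the next pass, so both fields act as pointers. It picks one or the other depending on whether the requested page range lies below or above the current node. That is the shape of a binary-search-tree lookup. Windows tracks reserved ranges in virtual address descriptors (VADs) arranged as a tree, but treat the identification of this structure as an assumption to verify.

```c
#include <stddef.h>
#include <stdint.h>

struct mm_node {
    uint32_t at_00;               /* compared against the upper page number */
    uint32_t at_04;               /* compared against the lower page number */
    uint32_t at_08;               /* never read by this routine             */
    const struct mm_node *at_0c;  /* followed when the range lies below     */
    const struct mm_node *at_10;  /* followed when the range lies above     */
};

/* Returns the node whose [at_00, at_04] range overlaps [lo, hi]
   (given at_00 <= at_04), or NULL, matching xor eax, eax. */
const struct mm_node *find_overlap(uint32_t lo, uint32_t hi,
                                   const struct mm_node *node)
{
    while (node != NULL) {            /* test eax, eax / jz out_bad      */
        if (lo > node->at_04)         /* cmp ecx, [eax+4] / jbe l_2      */
            node = node->at_10;       /* mov eax, [eax+10]               */
        else if (hi >= node->at_00)   /* cmp ecx, [eax] / jae out_good   */
            return node;
        else
            node = node->at_0c;       /* mov eax, [eax+0c]               */
    }
    return NULL;
}

/* Hypothetical sample data: three reserved page ranges. */
static const struct mm_node low_node  = { 0x100, 0x10f, 0, NULL, NULL };
static const struct mm_node high_node = { 0x400, 0x40f, 0, NULL, NULL };
static const struct mm_node root_node = { 0x2f0, 0x2ff, 0, &low_node, &high_node };
```

With this sample data, find_overlap(0x105, 0x105, &root_node) goes down the +0c side and returns &low_node, and find_overlap(0x350, 0x360, &root_node) runs off the end and returns NULL. Note that jbe/jae are unsigned comparisons, hence uint32_t.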
The structure of an MM_BLOCK is:
struct MM_BLOCK_LINKEDLIST {
DWORD linear_address;  // >> 0c
DWORD lower_bound_LA;  // >> 0c
DWORD unknown;         // 0
DWORD upper_bound_LA;  // >> 0c
DWORD next_block;      // always points to a higher address block
};
The head of the MM_BLOCK linked list is derived by:
mov eax, FS:[124] ; FS=30 selects the ring 0 KPCR; [124] = current thread
mov eax, [eax+44]
mov eax, [eax+194]
mov MM_BLOCK_LL_HEAD, eax
With MEM_COMMIT and a fixed address, the last MM_BLOCK is
21f0, 21f0, 0, 21f0, 0 (hypothetically, in my test case), so
sub_8044eb8a always ends up returning 0 in eax. This in turn causes
the allocation subroutine (80499e48) to return C0000018 in EAX, and
eventually 0 is returned in EAX to IVirtualAlloc.
Simply changing the mmFlag argument to MEM_RESERVE|MEM_COMMIT changes
the execution path to:
attempt_to_reserve_page:
804AFC03 (IoOpenDeviceRegistryKey+0284)
mov eax, [ebp-8c] ; address
lea edx, [eax+edi-1] ; address+size-1
or di, 0fff ; align size to page boundary
and ax, 0000 ; round address down to a 64k boundary <--- MOST INTERESTING
mov [ebp-20], eax
...
push MM_BLOCK_HEAD
push address+size-1 >> 0c
mov eax, [ebp-20]
shr eax, 0c
push eax ; 02f0 instead of 02f1 due to and ax, 0000
call sub_8044eb8a(02f0, 02f1, MM_BLOCK_HEAD)
if(eax != 0) goto out_bad_STATUS_CONFLICTING_... <-------- AHHHH!!!
else proceed...
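Here is the page-number arithmetic of the two paths side by side. This is a sketch assuming the example address 0x002f1000 (implied by the 02f1/02f0 values above) and the 0x1000-byte request from the top of this post. The helper names are hypothetical.

```c
#include <stdint.h>

/* MEM_COMMIT path: push address >> 0c */
uint32_t commit_lower_vpn(uint32_t address)
{
    return address >> 12;
}

/* Reserve path: and ax, 0000 drops the low 16 bits, then shr eax, 0c */
uint32_t reserve_lower_vpn(uint32_t address)
{
    return (address & 0xffff0000u) >> 12;
}

/* Both paths: lea edx, [eax+edi-1] with the unaligned size, then >> 0c */
uint32_t upper_vpn(uint32_t address, uint32_t size)
{
    return (address + size - 1) >> 12;
}
```

For 0x002f1000 and 0x1000 this gives (02f1, 02f1) on the commit path and (02f0, 02f1) on the reserve path, matching the call sub_8044eb8a(02f0, 02f1, MM_BLOCK_HEAD) shown above.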
Do you see the difference from the MEM_COMMIT case? Here the kernel
bails out when eax != 0, whereas in the MEM_COMMIT case it bails out
when eax == 0. Why is that? Check sub_8044EB8A again: when the code
walks through the MM_BLOCK_LL and reaches the end, the jbe and jae
instructions lead to 02f0 being returned in eax, so it is never zero.
What does this tell you? When you use MEM_RESERVE|MEM_COMMIT, your
address has to be on a 64k boundary! There is no such restriction for
MEM_COMMIT alone; however, the kernel still bails out whether or not
your fixed address is on a 64k boundary. If you use a fixed address,
your call to VirtualAlloc must use MEM_RESERVE|MEM_COMMIT to succeed.
If you start out with address 0, though, you can use MEM_COMMIT alone
and it will work. This sounds a little complicated, so try playing with
some sample code and see what the results are.
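The two opposite tests can be played with without a Windows box. The toy model below uses a flat array of reserved page ranges and a simple overlap check in place of the sub_8044eb8a walk. Commit-only fails when nothing overlaps (eax == 0), and reserve+commit rounds the base down to 64 KB and fails when something does overlap (eax != 0). Everything here is hypothetical and illustrative: struct range, sample_reserved and the *_ok helpers are not kernel structures, and the model ignores the address-0 path.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy: one reserved region as an inclusive page range. */
struct range { uint32_t lo_vpn, hi_vpn; };

static bool overlaps_any(const struct range *r, size_t n,
                         uint32_t lo, uint32_t hi)
{
    for (size_t i = 0; i < n; i++)
        if (lo <= r[i].hi_vpn && hi >= r[i].lo_vpn)
            return true;
    return false;
}

/* MEM_COMMIT alone: bails out when the lookup finds nothing (eax == 0). */
bool commit_ok(const struct range *r, size_t n, uint32_t address, uint32_t size)
{
    return overlaps_any(r, n, address >> 12, (address + size - 1) >> 12);
}

/* MEM_RESERVE|MEM_COMMIT: base rounded down to 64 KB, bails out when the
   lookup finds something (eax != 0). */
bool reserve_ok(const struct range *r, size_t n, uint32_t address, uint32_t size)
{
    uint32_t lo = (address & 0xffff0000u) >> 12;
    return !overlaps_any(r, n, lo, (address + size - 1) >> 12);
}

/* Hypothetical state after reserving 64 KB at 0x002f0000. */
static const struct range sample_reserved[1] = { { 0x2f0, 0x2ff } };
```

In this model, with nothing reserved, a commit at 0x002f1000 fails while reserve+commit succeeds. Once 0x002f0000 to 0x002fffff is reserved, the reverse holds: committing a page inside it works, which matches the observation at the top of this post that you can obtain a pointer 0x1000 bytes past a reserved chunk.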