"Descriptor tables in kernel exploitation" - a new article [Archive]

j00ru vx tech blog

January 16th, 2010, 20:18

Hi there!

Not so long ago (a few weeks, actually), Gynvael Coldwind ("http://gynvael.coldwind.pl/") and I had a chance to research the use of the Global and Local Descriptor Tables as a write-what-where target when exploiting ring-0 vulnerabilities on 32-bit Microsoft Windows NT-family systems. The result of our work is a short article describing the actual steps taken to escalate privileges through the GDT/LDT. As usual, example source code snippets are attached to the document, so that readers can check their effectiveness on their own.

I would like to say THANK YOU to Unavowed ("http://unavowed.vexillium.org/") and Agnieszka Zerka ("http://aishikami.wordpress.com") for their comments and help in the process of assembling this publication.

A complete package, including the PDF file "GDT and LDT in Windows kernel vulnerability exploitation" (with the source.zip file attached to the paper), can be downloaded from here ("http://vexillium.org/dl.php?call_gate_exploitation.pdf") (682 kB).

Content:
1. Abstract
2. The need of a stable exploit path
3. Windows GDT and LDT
4. Creating a Call-Gate entry in LDT
4.1. 4-byte write-what-where exploitation
4.2. 1-byte write-what-where exploitation
4.3. Custom LDT goes User Mode
5. Summary
+ References
+ Attachments

Have fun && Leave your comments!

http://j00ru.vexillium.org/?p=290&lang=en

Indy

January 17th, 2010, 17:45

Thanks, good article!
-
Using the descriptor tables is not desirable for privilege escalation. A call gate has many drawbacks (a high probability of a crash; it is better to use the IDT), such as:
o TF and IF are not reset.
o A trap frame is not formed (this is a great disadvantage).
It is better to use a mechanism that reaches the target code after the CPL change with the trap frame already formed and interrupts unmasked. System callbacks are one such mechanism. PatchGuard should also be taken into account. I use different methods, one of which is to replace the SDT and then call a service through it. Write a pointer to the table into ETHREAD.ServiceTable:

Code:

ThServiceTable	equ 0E0H	; ETHREAD.ServiceTable

Pi_Start:
	_align	dd 0
; +
; SDT/SST
;
SDT_BASE	equ 4
;
; [4]:
; SERVICE_DESCRIPTOR_TABLE
; SYSTEM_SERVICE_TABLE
SST:
	PVOID (offset ServiceTable - offset SST + SDT_BASE)
	PVOID 0	; CounterTable
	ULONG 1	; ServiceLimit
	PBYTE (offset ArgumentTable - offset SST + SDT_BASE)
	; - (The shadow table is not used.)
ServiceTable:
	PVOID (offset DispatchServiceInternal - offset Pi_Start)
ArgumentTable:
	BYTE 0	; The number of parameters passed to the handler.
Align 4
; +
; o Trap frame already formed.
; o IRQL = PASSIVE_LEVEL
; o IF = 1
;
DispatchServiceInternal proc uses ebx esi edi
; Repair the SDT.
	mov ecx,dword ptr ds:[offset OldSDT - offset Pi_Start]
	mov edx,dword ptr fs:[124H]	; KPRCB.CurrentThread:PETHREAD
	xor eax,eax
	mov dword ptr [edx + ThServiceTable],ecx
	Call R0_PayloadRoutine
	ret
DispatchServiceInternal endp
;
OldSDT	PVOID ?	; for ETHREAD.ServiceTable
	; ...
; -
; Call our handler.
	xor eax,eax
	mov edx,esp
	Int 2EH	; (You can use sysenter).
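
For readers who find the table layout easier to follow in C, here is a minimal sketch of the four 32-bit fields emitted at the SST label in the listing above. The type name Sst32 and the helper sst_index_in_range are hypothetical, introduced only for illustration; the field names follow the comments in the listing, and the range check is an assumption about how a dispatcher would typically validate the service number passed in eax.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the four 32-bit fields at the SST label.
 * Pointers are stored as raw 32-bit values because the layout is x86-32. */
typedef struct {
    uint32_t ServiceTable;   /* address of the array of handler pointers */
    uint32_t CounterTable;   /* 0 in the listing */
    uint32_t ServiceLimit;   /* number of services; 1 in the listing */
    uint32_t ArgumentTable;  /* address of the per-service argument bytes */
} Sst32;

/* Four 4-byte fields with no padding. */
static_assert(sizeof(Sst32) == 16, "unexpected SST layout");

/* Assumed dispatcher-style check: a service index is usable only when it
 * is below ServiceLimit. Returns 1 if in range, 0 otherwise. */
static int sst_index_in_range(const Sst32 *table, uint32_t index) {
    return index < table->ServiceLimit;
}
```

With ServiceLimit set to 1, only service index 0 passes the check, which matches the xor eax,eax before the Int 2EH in the listing.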

Gynvael

January 18th, 2010, 09:18

Hi Indy! Thank you for your comment!

Quote:

Using the descriptor tables is not desirable for privilege escalation. A call gate has many drawbacks (a high probability of a crash; it is better to use the IDT), such as:
o TF and IF are not reset.
o A trap frame is not formed (this is a great disadvantage).
It is better to use a mechanism that reaches the target code after the CPL change with the trap frame already formed and interrupts unmasked. System callbacks are one such mechanism. PatchGuard should also be taken into account. I use different methods, one of which is to replace the SDT and then call a service through it.

You are correct about the drawbacks; however, let me comment on the issues you raised:
1. The TF is controllable - if one does not explicitly set the TF flag with a POPF in the instruction right before the far call into the call gate, then the TF flag is negligible in my opinion.
2. The IF flag is not reset, which creates a small but real race condition window - after the far call, but before explicitly clearing IF (cli). I admit that an interrupt can hit there, but I would judge the probability to be rather small.
3. The trap frame is indeed not formed, but it can be emulated on demand by the shellcode writer.
4. As for PatchGuard - the Call-Gate approach we described applies to x86 protected-mode systems. PatchGuard, as far as I know, is present only in the x86-64 long-mode versions of Windows.
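
To make points 1 and 2 concrete, here is a small sketch of the two EFLAGS bits being discussed. TF is bit 8 and IF is bit 9 of EFLAGS; the helper names are hypothetical, and the sample value 0x202 is only an assumed typical user-mode EFLAGS image (IF set, TF clear). Since a call gate leaves both bits unchanged, whatever the caller has in them is what the ring-0 code starts with.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_TF (1u << 8)  /* trap flag: single-step after each instruction */
#define EFLAGS_IF (1u << 9)  /* interrupt enable flag */

static_assert(EFLAGS_TF == 0x100u, "TF is bit 8 of EFLAGS");
static_assert(EFLAGS_IF == 0x200u, "IF is bit 9 of EFLAGS");

/* Returns 1 if the trap flag is set in the given EFLAGS image. */
static int tf_set(uint32_t eflags) { return (eflags & EFLAGS_TF) != 0; }

/* Returns 1 if maskable interrupts are enabled in the given EFLAGS image. */
static int if_set(uint32_t eflags) { return (eflags & EFLAGS_IF) != 0; }

/* Models the effect of cli on an EFLAGS image: IF is cleared and every
 * other bit is left as it was. */
static uint32_t eflags_after_cli(uint32_t eflags) {
    return eflags & ~EFLAGS_IF;
}
```

The window mentioned in point 2 is the time during which if_set would still report 1 in ring 0, that is, between the far call and the cli.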

Additionally, the LDT Data Segment Descriptor to Call-Gate change is local - it is done only for the given (exploit) process, not system-wide, which in my opinion is an advantage.
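
As an illustration of why a data segment descriptor and a call gate are so close to each other, here is a minimal sketch of the descriptor access byte (byte 5 of the 8-byte entry) as laid out in the Intel manuals: P in bit 7, DPL in bits 6-5, S in bit 4 and the type in bits 3-0. The function and enum names are hypothetical, and the example values (a present, DPL 3, read/write data segment versus a present, DPL 3, 32-bit call gate) are assumptions chosen for illustration rather than values taken from the paper.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    DESC_NOT_PRESENT,
    DESC_CODE_OR_DATA,   /* S = 1 */
    DESC_CALL_GATE32,    /* S = 0, type = 0xC */
    DESC_OTHER_SYSTEM    /* S = 0, any other type */
} DescKind;

/* Builds the access byte: P | DPL (2 bits) | S | Type (4 bits). */
static uint8_t access_byte(unsigned present, unsigned dpl,
                           unsigned s, unsigned type) {
    assert(present <= 1 && dpl <= 3 && s <= 1 && type <= 0xF);
    return (uint8_t)((present << 7) | (dpl << 5) | (s << 4) | type);
}

/* Classifies a descriptor by its access byte alone. */
static DescKind classify(uint8_t access) {
    if ((access & 0x80u) == 0) return DESC_NOT_PRESENT;
    if ((access & 0x10u) != 0) return DESC_CODE_OR_DATA;
    return (access & 0x0Fu) == 0x0Cu ? DESC_CALL_GATE32 : DESC_OTHER_SYSTEM;
}
```

Under these assumptions the data segment access byte is 0xF2 and the call gate access byte is 0xEC, so the two descriptor kinds differ in this single byte.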

Anyway, please remember that our proposed Call-Gate method is not the 'ultimate best method'. It's just an alternative, and we wanted to point out that it exists ;>

Indy

January 18th, 2010, 10:03

Hi Gynvael!
The technique can be used, but it is not desirable.
1. If the gate is called with TF = 1, a trace exception (#DB) is raised in the kernel. No handler is installed for it at that point, so this causes a crash. Of course this is not a critical situation, but it can cause problems. You can ignore it, though.
2, 3. Until a trap frame is formed, interrupts cannot be unmasked. Although the stack is switched to the kernel stack from the TSS, the context of the task is not formed, i.e. the task does not exist as far as the kernel is concerned, and this leads to a crash. Of course you can reduce the probability of a crash: mask interrupts right after passing through the gate, and then create the trap frame manually. Also, all user memory is pageable (it can be swapped out), and a page can be loaded back from disk (swap) only if interrupts are unmasked and only at the first two IRQLs.
Until a trap frame is formed, no kernel runtime is available. You can use an exception instead, for example by registering a handler in the kernel and then raising an exception, or use any other mechanism that forms a trap frame. For building the trap frame by hand (the ENTER_TRAP macro, although it is not strictly necessary), the best way is to morph the handlers in memory (transfer).

Gynvael

January 18th, 2010, 18:50

@Indy
Thank you for the discussion, it's very interesting ;>

Taking into account the drawbacks you've described, I can propose the following scenario (I'll digress from the trap frame emulation method we both mentioned in the previous posts):
1. Create a Call-Gate (1 byte overwrite is sufficient for that, as we've shown)
2. Jump to ring0, and overwrite some other thing (since now you can overwrite any desired number of bytes, and also read from kernel memory)
3. Jump back to ring3
4. Use the more stable method set up in point 2

Using this scenario you don't have to worry about IRQL, trap frames, etc., since you are not waiting for a context switch (you can force a context switch before the far call using various methods), you have interrupts blocked, and you are not using any other system-related functions. A quick jump into ring0, and a jump back to ring3.

I agree that there is a small time frame here for an interrupt to hit, but it's imho almost negligible.
I also agree that there are times when other, more straightforward methods can be used, but this is always an option ;>

Indy

January 18th, 2010, 20:23

Masked interrupts are equivalent to the highest IRQL (0xff). At that level it is forbidden to:
- Use pageable memory, in particular the process address space.
- Apply to other modules and subsystems.
- Use the scheduler runtime: waiting on objects, working with threads, service calls, I/O operations, APCs (and others, since the context has not been formed), etc.
That is, in fact, we cannot do anything.
Privilege escalation exploits are too valuable to be written with some probability of a crash accepted by the coder.

This probability exists because of memory paging and context switching (a hardware interrupt can arrive after passing through the gate but before interrupts are masked manually (cli), and hardware interrupts complete by calling the scheduler).
All of this is just IMHO. I think the call gate mechanism is not suitable for use on NT, and in exploits as a special case.

Gynvael

January 19th, 2010, 11:01

@Indy
Thank you for your reply ;>

However, I'm afraid I must once again disagree. Furthermore, I don't recall there being a list of forbidden things in vulnerability exploitation.

As for the drawbacks you stated:
- If swapping out of pages that will be used during the short ring0 switch is a concern, then these pages can be probed (forcing them to be read into memory) before the jump. Additionally, I think this is negligible on systems that do not swap very frequently. An intensive swapping condition is easy to detect in user mode, and one can wait for it to pass imho.
- "Apply to other modules and subsystems" - I don't see how this changes anything in the scenario I've posted in post #5
- As I've stated in post #5, if the Call-Gate is used only for a small number of cycles, this is negligible, since no context switch would occur in that time anyway

Additionally, as far as I'm concerned, an exploit may have a modest failure rate. This rate is very low when the Call-Gate method is used properly.

I've addressed the hardware interrupts issue in post #3.

Additionally, one can always choose to act above the system. This would probably lead to a system-wide crash, but it can still be used to install a bootkit or a similar root-preserving utility.

disavowed

January 25th, 2010, 11:52

"Unavowed", eh? What is it again that they say about imitation being the sincerest form of flattery?