Kayaker
February 9th, 2006, 23:36
Interesting idea nikolatesla. I'm not quite sure what you mean by Olly chokes though either. I made up a little test app for tracing, simply a call to MessageBeep, which uses an underlying Int2E/SYSENTER call. In either 2K or XP Olly seems to be able to step over the syscall OK, though if you F8 over the Int2E it doesn't necessarily return *directly* to the RET immediately after. For MessageBeep at least, Olly returns to the calling function User32!_NtUserCallOneParam; why it seems to miss a RET I'm not sure. I suppose an EXCEPTION_DEBUG_EVENT is generated and the handler gets that return address from the exception record.
One interesting thing I did notice is that Olly sets a BP immediately after the MessageBeep call in user code. This effect is totally hidden unless you use Softice as well to set a BP deeper in the system code and trace back. I'd like to check if Olly does this with *all* API calls. I don't know if other debuggers do this, but it almost seems like a protection mechanism by Olly so an API call never "gets away" from it. By having a BP already set after the call, the debugger ensures control will get back to it no matter what the API might do.
For example if you set a Softice BP on MessageBeep, or deeper on the actual syscall it uses, win32k!UserSoundSentryWorker, you can single step trace back to the user code. When doing this I noticed Olly had set an Int3 0xCC opcode at 00401128 immediately after the initial API call.
00401121 6A00 PUSH 0 // MB_OK
00401123 E838010000 CALL <JMP.&user32.MessageBeep>
00401128 E98A000000 JMP TestDlg.004011B7 // Olly sets a BP here
Normally you wouldn't notice this without the ring0 debugger since Olly would have already handled the BP transparently.
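To make the idea concrete, here is a minimal sketch in C of a "temporary breakpoint after the call", working on a plain byte buffer rather than a live debuggee. The struct and function names are hypothetical and this is not Olly's actual code; a real debugger would read and write the target's memory through the OS debug API (e.g. ReadProcessMemory/WriteProcessMemory) and restore the byte when the breakpoint exception arrives.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE    0xCC  /* one-byte x86 breakpoint instruction */
#define CALL_REL32_LEN 5     /* E8 opcode + 32-bit displacement */

/* Hypothetical bookkeeping for one temporary breakpoint. */
typedef struct {
    size_t  offset;  /* offset of the patched byte in the code buffer */
    uint8_t saved;   /* original byte that the INT3 replaced */
    int     active;  /* nonzero while the INT3 is in place */
} TempBreakpoint;

/* Offset of the instruction that follows a CALL rel32 located at call_off. */
static size_t return_offset_after_call(const uint8_t *code, size_t call_off)
{
    assert(code[call_off] == 0xE8);  /* only the E8 form is handled here */
    return call_off + CALL_REL32_LEN;
}

/* Save the original byte and overwrite it with INT3. */
static void temp_bp_set(uint8_t *code, size_t off, TempBreakpoint *bp)
{
    bp->offset = off;
    bp->saved  = code[off];
    bp->active = 1;
    code[off]  = INT3_OPCODE;
}

/* Put the original byte back, as the debugger would after the BP fires. */
static void temp_bp_remove(uint8_t *code, TempBreakpoint *bp)
{
    if (bp->active) {
        code[bp->offset] = bp->saved;
        bp->active = 0;
    }
}
```

Fed the twelve bytes from the listing above (starting at 00401121), return_offset_after_call() on the CALL at offset 2 gives offset 7, i.e. 00401128, and temp_bp_set() replaces the E9 there with CC until temp_bp_remove() puts it back.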
As for the LastBranch MSR, that's quite an interesting idea. As I remember, Softice for Win9x output that info as part of its BPM command; I don't think it does anymore, at least not directly. There's an internal Sice function that IceExt makes use of for its !LASTBRANCH command. This function was identified back in Icedump as pRecordLastBranchInfo, and using the byte pattern given in IceExt we can find and analyse it. The proper MSR numbers can be identified from string references in Sice. It's all based on setting the LBR bit of the IA32_DEBUGCTL MSR. See sec 14.5.1 of Intel vol3.
Code:
.text:00017279 MSR_RecordLastBranchInfo proc near ; CODE XREF: .text:0001340Bp
.text:00017279 cmp cs:MSR_LBR_BitEnabled, 0
.text:00017281 jnz short @RecordLastBranchInfo
.text:00017283 retn
.text:00017284 ; ---------------------------------------------------------------------------
.text:00017284
.text:00017284 @RecordLastBranchInfo: ; CODE XREF: MSR_RecordLastBranchInfo+8j
.text:00017284 pusha
.text:00017285 mov ebp, esp
.text:00017287 push ds
.text:00017288 db 66h
.text:00017288 mov ds, cs:wNTICE_SS
.text:00017290 mov ecx, 1DBh ; LastBranchFromIP
.text:00017290 ; "Last Branch From EIP"
.text:00017295 call c_RDMSR
.text:0001729A mov dMSR_LAST_BRANCH_0, eax
.text:0001729F mov ecx, 1DCh ; LastBranchToIP
.text:0001729F ; "Last Branch To EIP"
.text:000172A4 call c_RDMSR
.text:000172A9 mov dMSR_LAST_BRANCH_1, eax
.text:000172AE mov ecx, 1D9h ; DebugCtlMSR
.text:000172AE ; "Debug Control"
.text:000172B3 call c_RDMSR
.text:000172B8 or eax, 1 ; Set LBR bit of IA32_DEBUGCTL MSR
.text:000172BB call c_WRMSR
.text:000172C0 pop ds
.text:000172C1 popa
.text:000172C2 retn
.text:000172C2 MSR_RecordLastBranchInfo endp
14.5.1. IA32_DEBUGCTL MSR (Pentium 4 Processors)
The IA32_DEBUGCTL MSR enables and disables the various last branch recording mechanisms
described in the previous section. This register can be written to using the WRMSR
instruction, when operating at privilege level 0 or when in real-address mode. A protected-mode
operating system procedure is required to provide user access to this register. Figure 14-2 shows
the flags in the IA32_DEBUGCTL MSR. The functions of these flags are as follows:
LBR (last branch/interrupt/exception) flag (bit 0)
When set, the processor records a
running trace of the most recent branches, interrupts, and/or exceptions taken
by the processor (prior to a debug exception being generated) in the last branch
record (LBR) stack. Each branch, interrupt, or exception is recorded as a 64-bit
branch record (see Section 14.5.2., “LBR Stack (Pentium 4 Processors)”). The
processor clears this flag whenever a debug exception is generated (for
example, when an instruction or data breakpoint or a single-step trap occurs).
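For anyone who finds the listing hard to follow, the same sequence can be sketched in C. This is only an illustration: the function and type names are made up, the MSR numbers are simply the ones that appear in the disassembly above, and the RDMSR/WRMSR instructions are replaced by function pointers backed by a fake register array, since the real instructions only work in ring 0. Because the processor clears the LBR flag on every debug exception, the routine presumably has to set it again each time it runs, which is what the final write does.

```c
#include <assert.h>
#include <stdint.h>

/* MSR numbers as they appear in the Sice listing above. */
#define MSR_DEBUGCTL          0x1D9u  /* "Debug Control" */
#define MSR_LASTBRANCH_FROMIP 0x1DBu  /* "Last Branch From EIP" */
#define MSR_LASTBRANCH_TOIP   0x1DCu  /* "Last Branch To EIP" */
#define DEBUGCTL_LBR          0x1u    /* bit 0: last branch recording */

/* Stand-ins for the ring-0 RDMSR/WRMSR instructions (hypothetical). */
typedef uint64_t (*rdmsr_fn)(uint32_t msr);
typedef void     (*wrmsr_fn)(uint32_t msr, uint64_t value);

typedef struct {
    uint32_t from_ip;  /* low 32 bits only, like the mov from eax */
    uint32_t to_ip;
} LastBranch;

/* Read the last branch pair, then re-arm LBR recording. */
static LastBranch record_last_branch(rdmsr_fn rd, wrmsr_fn wr)
{
    LastBranch lb;
    lb.from_ip = (uint32_t)rd(MSR_LASTBRANCH_FROMIP);
    lb.to_ip   = (uint32_t)rd(MSR_LASTBRANCH_TOIP);
    /* Read-modify-write so the other DEBUGCTL bits are preserved. */
    wr(MSR_DEBUGCTL, rd(MSR_DEBUGCTL) | DEBUGCTL_LBR);
    return lb;
}

/* Fake MSR file so the sketch can run in user mode. */
static uint64_t fake_msrs[0x200];
static uint64_t fake_rdmsr(uint32_t msr) { return fake_msrs[msr]; }
static void fake_wrmsr(uint32_t msr, uint64_t v) { fake_msrs[msr] = v; }
```

In a driver the two fake accessors would be swapped for real RDMSR/WRMSR wrappers; the logic of record_last_branch() would stay the same.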
I don't know how something like this could best be implemented in Olly, but as you say the *mechanism* is there anyway.
Cheers,
Kayaker