To determine whether your processor supports these new instructions, execute the CPUID instruction with EAX=1 and examine the "Feature Flags" returned in EDX. Bit 24 is called FXSR, "Fast floating point save and restore".
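Here is a minimal sketch of that check. It assumes GCC or Clang on an x86 target, because `<cpuid.h>` and its `__get_cpuid` helper are compiler-specific. The function names `edx_has_fxsr` and `cpu_has_fxsr` are hypothetical. The pure bit test is kept separate so it can be exercised without executing CPUID.

```c
#include <stdint.h>

/* CPUID leaf 1, EDX bit 24: FXSR (FXSAVE/FXRSTOR supported). */
#define CPUID1_EDX_FXSR (UINT32_C(1) << 24)

/* Pure decode step: does this leaf-1 EDX value advertise FXSR? */
static int edx_has_fxsr(uint32_t edx) {
    return (edx & CPUID1_EDX_FXSR) != 0;
}

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>

/* Execute CPUID with EAX=1 and test the FXSR bit.
   __get_cpuid returns 0 when leaf 1 is not available. */
static int cpu_has_fxsr(void) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return edx_has_fxsr(edx);
}
#endif
```

Every x86-64 processor reports FXSR, so on a modern 64-bit machine `cpu_has_fxsr()` should return 1.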
The FXSR bit indicates whether the processor supports the FXSAVE and FXRSTOR instructions for fast saving and restoring of the floating point coprocessor's context. There is also a new bit in Control Register 4 (CR4.OSFXSR, bit 9) that the operating system must set to indicate that it supports these new instructions.
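The operating-system side can be sketched in C. These are hypothetical helpers, not code from any real kernel. Moving to or from CR4 is privileged, so `kernel_enable_osfxsr` only makes sense in ring 0; an application that executes it gets a general protection fault. The pure bit helpers run anywhere.

```c
/* CR4 bit 9: OSFXSR, set by the operating system. */
#define CR4_OSFXSR (1UL << 9)

/* Pure helpers operating on a CR4 value. */
static unsigned long cr4_with_osfxsr(unsigned long cr4) { return cr4 | CR4_OSFXSR; }
static int cr4_has_osfxsr(unsigned long cr4) { return (cr4 & CR4_OSFXSR) != 0; }

#if defined(__i386__) || defined(__x86_64__)
/* Ring 0 only: read CR4, set OSFXSR, write it back.
   In user mode the first MOV from CR4 faults (#GP). */
static void kernel_enable_osfxsr(void) {
    unsigned long cr4;
    __asm__ volatile("mov %%cr4, %0" : "=r"(cr4));
    __asm__ volatile("mov %0, %%cr4" : : "r"(cr4_with_osfxsr(cr4)) : "memory");
}
#endif
```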
These new instructions are similar to the regular floating point FSAVE and FRSTOR instructions, which save and restore the FPU context to a 94 or 108 byte memory area depending on the CPU operating mode (16 or 32 bit). FXSAVE and FXRSTOR save and restore the FPU context to a 160 byte memory area, allowing 16 bytes per ST register instead of the usual 10. Although I have not fully analysed the stored data yet, it appears that bytes 0x20..0x9F hold the ST registers (8 registers x 16 bytes). Only the memory forms of the ModR/M encoding can be used, since no register is 160 bytes long, and the address must be paragraph aligned (16 bytes). Using a register operand raises exception 06h (#UD, invalid opcode); using a buffer that is not paragraph aligned raises exception 0Dh (#GP, general protection fault).
Who uses these instructions? Can you prove it? Microsoft uses them in VMCPD.VXD (Virtual Math Co-Processor Device), which is part of the Windows 98 Beta. Download a copy of DUMPLX and verify it for yourself.
I have written a little demonstration program (supplied in source and object form) which shows the difference between FSAVE and FXSAVE. It is a 32-bit DOS-extended application that can be run from DOS or from a DOS prompt under Win95. Download FXSAVE.ZIP
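The difference between the two layouts can be seen with a small C sketch. It is in the spirit of the FSAVE/FXSAVE comparison described above, but it is not the FXSAVE.ZIP program. It assumes GCC or Clang with x86 inline assembly (32- or 64-bit), and the type and function names are hypothetical. The FXSAVE area is sized at 512 bytes, the size Intel later documented, and the sketch relies on Intel's later-documented register offsets. The code pushes +1.0 and captures the state both ways.

```c
/* FXSAVE image: 512 bytes, must be 16-byte (paragraph) aligned or FXSAVE raises #GP. */
typedef struct { _Alignas(16) unsigned char b[512]; } fxsave_area;
/* FSAVE image in the 32-bit (108-byte) format. */
typedef struct { unsigned char b[108]; } fsave_area;

/* Push +1.0 so ST0 holds a known value, then capture the state both ways.
   FNSAVE re-initialises the FPU after storing, which also empties the
   register stack again, so no explicit pop is needed. */
static void capture_fpu_state(fxsave_area *fx, fsave_area *fs) {
    __asm__ volatile("fld1\n\t"
                     "fxsave %0\n\t"
                     "fnsave %1"
                     : "=m"(*fx), "=m"(*fs)
                     :
                     : "memory");
}

/* FSAVE: 28-byte environment, then ST0..ST7 packed at 10 bytes each. */
static const unsigned char *fsave_st(const fsave_area *a, int i) {
    return a->b + 28 + 10 * i;
}

/* FXSAVE: ST0..ST7 in 16-byte slots from 0x20 to 0x9F (10 bytes used per slot). */
static const unsigned char *fxsave_st(const fxsave_area *a, int i) {
    return a->b + 0x20 + 16 * i;
}
```

In both images ST0 should contain the 80-bit encoding of +1.0 (bytes 00 00 00 00 00 00 00 80 FF 3F). FSAVE packs it at offset 28, while FXSAVE places it in a 16-byte slot at 0x20.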
Update 5:17pm 27-Feb-98 - A couple of things have been bugging me over the last couple of days. I'm not claiming that these instructions are related to MMX2, yet! However, some of Intel's claims about them seem quite bizarre. For instance, if you want to speed up an instruction or process, how can writing and reading roughly 50% more data (160 bytes instead of 108) help? Intel changes the number of clock cycles instructions take to execute all the time, so why create new opcodes and hoops to jump through to make them work? I think these instructions would be better described as eXtended save and restore. With MMX2, Intel is going to have to increase the size of the FPU "context data" to allow their SIMD-FP (Single Instruction Multiple Data) to work at a reasonable precision level. I find it hard to believe that a 32 bit FP representation will be enough; heck, my prehistoric BBC Micro had 40 bit FP, and the current FPU uses 80 bit FP. Intel's dilemma is that they can't change the behaviour of the classic FSAVE/FRSTOR without breaking a whole slew of legacy applications and operating systems that use 108 byte buffers. Furthermore, they can't switch context unless they save all of the information in the current context; if it doesn't get saved, the register contents you are expecting will get trashed. For now these new instructions use 160 bytes, but I'm very suspicious that by the time MMX2 arrives this figure will be closer to 500+. Have to go now; I'll keep you posted.
Update 7:42pm 20-Apr-98 - Intel has posted information about these instructions, but has buried it within the Manual Addendums in the Celeron section. Download these manuals from Intel. The instructions are documented as requiring 512 bytes of space, most of which is undefined/reserved. Windows 98 allocates 528 bytes so that it can align a 512 byte buffer on a paragraph boundary.
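The 528-byte allocation is 512 bytes plus room to slide the start up to the next 16-byte boundary. Fifteen bytes of slack would be enough; 16 keeps the total a multiple of a paragraph. A minimal C sketch of that arithmetic follows. It illustrates the idea and is not Windows 98's actual code, and the names are hypothetical.

```c
#include <stdint.h>

#define FXSAVE_AREA_SIZE 512u  /* size documented by Intel */
#define FXSAVE_ALIGN     16u   /* paragraph alignment required by FXSAVE/FXRSTOR */
/* 512 + 16 = 528 bytes: enough slack to reach a 16-byte boundary. */
#define FXSAVE_RAW_SIZE  (FXSAVE_AREA_SIZE + FXSAVE_ALIGN)

/* Round a pointer up to the next 16-byte (paragraph) boundary. */
static unsigned char *paragraph_align(unsigned char *raw) {
    uintptr_t p = (uintptr_t)raw;
    p = (p + (FXSAVE_ALIGN - 1)) & ~(uintptr_t)(FXSAVE_ALIGN - 1);
    return (unsigned char *)p;
}
```

Given `unsigned char raw[FXSAVE_RAW_SIZE]`, `paragraph_align(raw)` returns an address inside `raw` that is 16-byte aligned and still has 512 bytes available after it.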
| modR/M | xx000xxx | xx001xxx | xx010xxx | xx011xxx | xx100xxx | xx101xxx | xx110xxx | xx111xxx |
|---|---|---|---|---|---|---|---|---|
| group #B 0F AE | FXSAVE M (160 Bytes) Deschutes | FXRSTOR M (160 Bytes) Deschutes | | | | | | |

| Encoding | Instruction |
|---|---|
| 0F AE 06 | fxsave [esi] |
| 0F AE 47 0C | fxsave [edi+0Ch] |
| 0F AE 4F 0C | fxrstor [edi+0Ch] |
| 0F AE 0D D0 D0 AD DE | fxrstor [0DEADD0D0h] |
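The example encodings follow from the ModR/M byte's three fields:

- mod (bits 7-6) selects the addressing mode.
- reg (bits 5-3) selects FXSAVE (/0) or FXRSTOR (/1).
- r/m (bits 2-0) selects the base register.

The C sketch below is a hypothetical, deliberately minimal decoder for just these forms under 32-bit addressing. It does not handle prefixes or SIB bytes, and it rejects mod=11 because the register form is invalid.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 32-bit register names, indexed by the ModR/M r/m field. */
static const char *const reg32[8] = {"eax", "ecx", "edx", "ebx",
                                     "esp", "ebp", "esi", "edi"};

/* Little-endian 32-bit read, as used for disp32 fields. */
static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* MASM-style hex literal, e.g. "0Ch" or "0DEADD0D0h". */
static void masm_hex(char *out, size_t n, uint32_t v) {
    char digits[9];
    snprintf(digits, sizeof digits, "%X", (unsigned)v);
    /* A literal starting with A-F needs a leading 0 (digits[0] is never NUL). */
    snprintf(out, n, "%s%sh", strchr("ABCDEF", digits[0]) ? "0" : "", digits);
}

/* Decode one 0F AE /0 (fxsave) or /1 (fxrstor) with 32-bit addressing.
   Returns 0 on success, -1 for anything outside this sketch's scope. */
static int decode_0f_ae(const uint8_t *code, size_t len, char *out, size_t n) {
    if (len < 3 || code[0] != 0x0F || code[1] != 0xAE) return -1;
    unsigned mod = code[2] >> 6, reg = (code[2] >> 3) & 7, rm = code[2] & 7;
    const char *mnem = reg == 0 ? "fxsave" : reg == 1 ? "fxrstor" : NULL;
    char disp[16];

    if (mnem == NULL) return -1;  /* other reg values: not covered here */
    if (mod == 3) return -1;      /* register form: invalid opcode (#UD) */
    if (rm == 4) return -1;       /* SIB byte follows: not handled here */

    if (mod == 0 && rm == 5) {    /* [disp32], no base register */
        if (len < 7) return -1;
        masm_hex(disp, sizeof disp, read_le32(code + 3));
        snprintf(out, n, "%s [%s]", mnem, disp);
    } else if (mod == 0) {        /* [reg] */
        snprintf(out, n, "%s [%s]", mnem, reg32[rm]);
    } else if (mod == 1) {        /* [reg+disp8], disp8 is sign-extended */
        if (len < 4) return -1;
        int d = (int8_t)code[3];
        masm_hex(disp, sizeof disp, (uint32_t)(d < 0 ? -d : d));
        snprintf(out, n, "%s [%s%c%s]", mnem, reg32[rm], d < 0 ? '-' : '+', disp);
    } else {                      /* mod == 2: [reg+disp32], printed unsigned */
        if (len < 7) return -1;
        masm_hex(disp, sizeof disp, read_le32(code + 3));
        snprintf(out, n, "%s [%s+%s]", mnem, reg32[rm], disp);
    }
    return 0;
}
```

For example, the bytes `0F AE 0D D0 D0 AD DE` decode as mod=00, reg=001, r/m=101. That is FXRSTOR with a bare 32-bit address, printed as `fxrstor [0DEADD0D0h]`.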