Geoff Chappell - Software Analyst
CURRENT WORK ITEM - PREVIEW ONLY
Programs and drivers have many reasons to want to know something of the processors they’re running on. Especially notable is whether this or that extension of the basic instruction set is available. The Internet has no shortage of answers along the lines of executing the cpuid instruction to test for the desired processor feature. Even Microsoft helps with its notes on and a code sample for its compiler’s __cpuid intrinsic. But a good answer, of course, is not nearly so simple. Indeed, the question isn’t either. Even if it were, there’d be a good case for an API function, if only for features that seem likely to be common queries, just to spare programmers from the mere inconvenience of writing their own tests. Throw in that such tests (and even just the interpretation of feature flags produced by cpuid) can depend on hardware-specific details that aren’t widely known, and the case for an API function becomes compelling.
Sparing programmers some inconvenience and standardising their tests is superficially what the ProcessorFeatures array in the KUSER_SHARED_DATA is for. It supports the exported functions ExIsProcessorFeaturePresent and IsProcessorFeaturePresent in kernel mode and user mode respectively. Each takes as its one input some predefined constant that represents one abstracted notion of some processor feature that was at least thought once upon a time to have wide interest. Each returns a simple yes or no for whether the feature is “present”. Except for two cases from the early history of 64-bit Windows, the answer comes simply from looking up the array. The kernel has done the work of determining what’s present and has placed its answers in the KUSER_SHARED_DATA so that the user-mode API function doesn’t even have the expense of transitions to and from kernel mode. For the features that are covered, the answers from this API must count as definitive: if you wrote your own test with cpuid and got a different answer, you’d do well to wonder if your test has missed some point, if not about the processor support for the feature, then about the kernel’s.
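As a rough picture of how little the API functions have to do, here is a minimal sketch in Python that models the lookup. It is not Microsoft's code. The function name, the sample array contents and the 64-entry size of the array are assumptions made for illustration; the indices are the documented PF_ constants.

```python
# A model of the array lookup, not the real implementation.
# Assumption: the array has 64 one-byte entries, non-zero meaning "present".
PROCESSOR_FEATURE_MAX = 64

PF_COMPARE_EXCHANGE_DOUBLE = 2
PF_MMX_INSTRUCTIONS_AVAILABLE = 3
PF_RDTSC_INSTRUCTION_AVAILABLE = 8


def is_processor_feature_present(processor_features, index):
    """Return True if the modelled array marks the feature as present."""
    if not 0 <= index < PROCESSOR_FEATURE_MAX:
        return False  # unknown indices are treated as "not present"
    return processor_features[index] != 0


# Hypothetical contents, as if the kernel had filled them in at startup.
features = bytearray(PROCESSOR_FEATURE_MAX)
features[PF_COMPARE_EXCHANGE_DOUBLE] = 1
features[PF_RDTSC_INSTRUCTION_AVAILABLE] = 1

if __name__ == "__main__":
    for name in ("PF_COMPARE_EXCHANGE_DOUBLE", "PF_MMX_INSTRUCTIONS_AVAILABLE"):
        print(name, is_processor_feature_present(features, globals()[name]))
```

The point of the model is only that the caller gets an answer the kernel prepared earlier: nothing is tested at the time of the call.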
And there we might leave it, continuing only to list the predefined indices and describe each briefly as does Microsoft’s documentation. But let’s get back to why the question is not so simple.
One reason has to do with the potential for difference in what it means for a feature to be available, enabled, present, usable or whatever other adjective you can think of. For more than a few processor features, the processor itself makes the distinction. A feature flag from cpuid may say that the feature is available but not necessarily that it is enabled or that it behaves in its most general way. Configuring it is left to the operating system, typically by writing to a control register but often nowadays to a Model Specific Register (MSR). This distinction goes back a long way: the cr4 register, which is architectural to the earliest Pentium, arguably has no other reason for existence.
For an early example of how subtle this could be, consider that a set TSC bit (4) in edx from cpuid leaf 1 tells that the processor has the rdtsc instruction, but were the operating system to set the TSD bit (2) in cr4, then although the rdtsc instruction is usable in kernel mode, all attempts to execute it in user mode would cause a General Protection fault. Note that user-mode software cannot read cr4 even to find out whether an available feature is or is not enabled.
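The rdtsc example can be put as a small decision function. This is only a sketch in Python of the logic just described: the function name and the returned strings are mine, and the register values are passed in as plain integers because, as noted, user-mode code cannot read cr4 for itself.

```python
CPUID_1_EDX_TSC = 1 << 4  # TSC bit (4) in edx from cpuid leaf 1
CR4_TSD = 1 << 2          # TSD bit (2) in cr4


def rdtsc_outcome(cpuid_1_edx, cr4, ring):
    """Say what executing rdtsc would do, given the flags and the ring."""
    if not (cpuid_1_edx & CPUID_1_EDX_TSC):
        return "not available"
    if (cr4 & CR4_TSD) and ring != 0:
        return "General Protection fault"
    return "executes"


if __name__ == "__main__":
    # Same cpuid answer, different results depending on cr4 and ring.
    print(rdtsc_outcome(CPUID_1_EDX_TSC, 0, 3))
    print(rdtsc_outcome(CPUID_1_EDX_TSC, CR4_TSD, 3))
    print(rdtsc_outcome(CPUID_1_EDX_TSC, CR4_TSD, 0))
```

A test that looks only at cpuid sees the first argument and nothing else, which is exactly why it can be wrong.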
Another difficulty is that although Windows is designed for symmetric multi-processor systems, it does not require that all processors are identical, just that Windows can use them equally. Faced with different processors that all meet minimum requirements, Windows can configure its use of optional features to the least of what the processors have in common. An application’s determination of some feature’s availability on one processor does not certainly apply to all. You may think that worrying about this is impractical and even that it takes caution to extremes, and I may agree with you to some extent, yet if you write a feature test that executes cpuid multiple times, e.g., to get the vendor string and the maximum leaf number before proceeding to the leaves you want, and you don’t at least realise that each cpuid can have executed on a different processor, then I can’t escape wondering whether you should (yet) go anywhere near to testing for processor features.
The table below lists the defined indices for the API functions and the ProcessorFeatures array. Some, in the nature of abstractions, have no applicability to x86 or x64 processors and therefore none to this note. For those that an x86 or x64 kernel is known to set, in contrast to leaving as zero-initialised, the table shows which versions. Implementation details of how a feature’s presence is decided as TRUE or FALSE follow the table (but are something of an open-ended project).
|0||PF_FLOATING_POINT_PRECISION_ERRATA||4.0 and higher (x86)|
|1||PF_FLOATING_POINT_EMULATED||4.0 and higher (x86)|
|2||PF_COMPARE_EXCHANGE_DOUBLE||4.0 and higher|
|3||PF_MMX_INSTRUCTIONS_AVAILABLE||4.0 and higher|
|6||PF_XMMI_INSTRUCTIONS_AVAILABLE||5.0 and higher|
|7||PF_3DNOW_INSTRUCTIONS_AVAILABLE||5.0 and higher|
|8||PF_RDTSC_INSTRUCTION_AVAILABLE||5.0 and higher|
|9||PF_PAE_ENABLED||5.0 and higher|
|10||PF_XMMI64_INSTRUCTIONS_AVAILABLE||5.1 and higher|
late 5.2 and higher
|13||PF_SSE3_INSTRUCTIONS_AVAILABLE||6.0 and higher|
|14||PF_COMPARE_EXCHANGE128||6.0 and higher (x64)|
|17||PF_XSAVE_ENABLED||6.1 and higher|
|20||PF_SECOND_LEVEL_ADDRESS_TRANSLATION||6.2 and higher|
|21||PF_VIRT_FIRMWARE_ENABLED||6.2 and higher|
|22||PF_RDWRFSGSBASE_AVAILABLE||6.2 and higher (x64)|
|23||PF_FASTFAIL_AVAILABLE||6.2 and higher|
|28||PF_RDRAND_INSTRUCTION_AVAILABLE||6.3 and higher|
|32||PF_RDTSCP_INSTRUCTION_AVAILABLE||10.0 and higher|
The kernel started as long ago as version 3.50 to accumulate feature bits in an internal variable as its own record of features that are present on all processors. Version 3.51 made this record available outside the kernel as the ProcessorFeatureBits in the SYSTEM_PROCESSOR_INFORMATION structure which is filled in by the ZwQuerySystemInformation and NtQuerySystemInformation functions when given SystemProcessorInformation (0x01) as the SystemInformationClass argument.
This internal variable—which, by the way, is named KeFeatureBits—was designed from the start such that each bit is an intersection of the corresponding feature over all processors. At first, the features for each processor were discarded. The kernel truly was interested only in which features it found on all processors and prepared for use on all processors. Version 4.0 starts saving each processor’s features as the FeatureBits in the processor’s KPRCB and saving them from there to the registry:
The individual bits, however, seem never to have been documented. Microsoft’s names for a handful appear in assembly-language headers KS386.INC and KSAMD64.INC from various development kits. A few others are defined in NTOSP.H from early editions of the Windows Driver Kit (WDK) for Windows 10. None correspond directly to feature flags in registers from cpuid. Their relevance to the ProcessorFeatures in the KUSER_SHARED_DATA is that most of the ProcessorFeatures are set to TRUE or FALSE according to whether some corresponding bit is set or clear in the KeFeatureBits variable.
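The idea of a system-wide record as the intersection of per-processor records is simple enough to sketch. The following Python is an illustration only, with hypothetical per-processor values. KF_RDTSC and KF_CMPXCHG8B are Microsoft's names for the two bits shown; FEATURE_BIT_MMX is a made-up name for 0x00000100.

```python
from functools import reduce

KF_RDTSC = 0x00000002
KF_CMPXCHG8B = 0x00000080
FEATURE_BIT_MMX = 0x00000100  # hypothetical name for this bit


def common_feature_bits(per_processor_bits):
    """AND together each processor's feature bits, as KeFeatureBits is described."""
    if not per_processor_bits:
        raise ValueError("at least one processor is required")
    return reduce(lambda a, b: a & b, per_processor_bits)


if __name__ == "__main__":
    # Hypothetical system: the second processor lacks the MMX bit.
    processors = [
        KF_RDTSC | KF_CMPXCHG8B | FEATURE_BIT_MMX,
        KF_RDTSC | KF_CMPXCHG8B,
    ]
    print(hex(common_feature_bits(processors)))
```

A feature that depends on a bit which any one processor lacks then shows as absent for the whole system, whatever cpuid says on the other processors.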
The PF_FLOATING_POINT_PRECISION_ERRATA feature is explicitly set to FALSE in x86 version 6.1 and higher.
It can be TRUE in versions 4.0 to 5.0 if any processor’s ability at floating-point arithmetic has the particular defect that dividing 4,195,835 by 3,145,727 and then multiplying by it doesn’t get back to what was started with. To be specific, these numbers are loaded into the Floating Point Unit (FPU) from the qwords 0x4150017E`C0000000 and 0x4147FFFF`80000000, and the test is done with interrupts disabled, with the MP, EM and TS bits of cr0 all clear, and with the FPU newly initialised—all, presumably, to make sure that the test really is of the FPU’s arithmetic, not some disruption of it. The division, using either the fdiv or fdivr instruction, depending on the version, is well-known (since 1994) to be incorrect on some early Pentium processors.
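For anyone who wants to check the numbers, the two qwords do decode as IEEE doubles to the operands named above, and the arithmetic of the test is easily reproduced. The Python sketch below is not the kernel's code: it has none of the kernel's care over interrupts, cr0 and FPU initialisation, and the function names are mine.

```python
import struct

DIVIDEND_QWORD = 0x4150017EC0000000
DIVISOR_QWORD = 0x4147FFFF80000000


def qword_to_double(qword):
    """Reinterpret a 64-bit integer as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", qword))[0]


def fdiv_residual(x, y):
    """Divide, multiply back, and report how far the result is from x."""
    return x - (x / y) * y


def has_fdiv_defect():
    x = qword_to_double(DIVIDEND_QWORD)
    y = qword_to_double(DIVISOR_QWORD)
    return fdiv_residual(x, y) != 0.0


if __name__ == "__main__":
    print(qword_to_double(DIVIDEND_QWORD), qword_to_double(DIVISOR_QWORD))
    print("defect" if has_fdiv_defect() else "no defect")
```

With correct double-precision arithmetic the residual is exactly zero, which is what makes this operand pair a convenient test.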
The oldest Intel documentation that I have of this is a Pentium® Processor Specification Update (order number 242480-002, dated March 1995) in which this problem is titled Slight Precision Loss for Floating Point Divides on Specific Operand Pairs. As errata 20 for the 60- and 66-MHz Pentium processor, it is “fixed in the D stepping”. As errata 23 for the 75-, 90- and 100-MHz Pentium processors, it “affects B1 and B3 steppings” and “is fixed in B5 and later steppings.” In terms of the family, model and stepping that are produced in eax from cpuid leaf 1, this means the defect is in family 5 model 1 steppings 3 and 5, and model 2 steppings 1 and 2. This, however, is only background. The kernel does not infer mis-computation by particular steppings: it tests an actual computation.
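Purely as background to the background, here is how the family, model and stepping come out of eax. The Python sketch decodes only the basic fields (bits 11:8, 7:4 and 3:0), which is all that a Pentium needs, and checks them against the steppings listed above. To repeat: this is not what the kernel does, and the names are mine.

```python
def decode_signature(eax):
    """Split eax from cpuid leaf 1 into (family, model, stepping)."""
    stepping = eax & 0xF
    model = (eax >> 4) & 0xF
    family = (eax >> 8) & 0xF
    return family, model, stepping


# Family 5 model 1 steppings 3 and 5, and model 2 steppings 1 and 2.
LISTED_AS_AFFECTED = {(5, 1, 3), (5, 1, 5), (5, 2, 1), (5, 2, 2)}


def stepping_listed_as_affected(eax):
    return decode_signature(eax) in LISTED_AS_AFFECTED


if __name__ == "__main__":
    for eax in (0x0513, 0x0521, 0x0525):
        print(hex(eax), decode_signature(eax), stepping_listed_as_affected(eax))
```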
The 32-bit kernel has code to test for this defect as early as Windows NT 3.50 SP3 from June 1995. (It is not in the original, and neither SP1 nor SP2 has yet been found for study.) What it does about the defect, and even whether it tests for the defect, depends on a registry value that the kernel reads while initialising:
|Data:||0 to emulate only if FPU not present; 1 to emulate if FPU not present or fdiv defect discovered; else to emulate always|
|Versions:||3.50 to 5.0|
Microsoft documented this registry value in Knowledge Base article Q122323, titled WinNT 3.5 Software Update for the Pentium Floating Point Error, now apparently long gone from Microsoft’s web site. Its introduction was there dated to Windows NT 3.50 SP1, which I see no reason to disbelieve.
Floating-point emulation (see the next feature) is an ancient provision. Even an 8086 that has no numeric coprocessor generates an interrupt in response to attempted execution of any floating-point instruction. The interrupt then allows software to emulate the instruction and compensate for the FPU’s absence. The 80386 can be configured—via the EM bit (2) in cr0—so that executing an FPU instruction generates this Coprocessor Not Available interrupt (7) even if an FPU is present and working. Emulation can then compensate for an FPU that is present but defective.
All versions up to and including 5.0 have an emulator so that user-mode software can, for all practical effect, execute floating-point instructions without an FPU. Given that an FPU is present, versions 3.50 and 3.51 test for the defect if ForceNpxEmulation is 1, intending to compensate for it by continuing as if no FPU is present. Versions 4.0 and 5.0 have the added need to report the defect as this PF_FLOATING_POINT_PRECISION_ERRATA feature, and so they test for it no matter what ForceNpxEmulation is set to. The feature shows as TRUE only if the defect is found but is not worked around by forcing emulation.
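The interplay of ForceNpxEmulation, the FPU's presence and the defect is perhaps easier to follow as a decision table in code. The Python below is a sketch of the behaviour as described, not a translation of the kernel's code, and the function names are hypothetical.

```python
def emulation_chosen(force_npx_emulation, fpu_present, fdiv_defect):
    """Decide whether the kernel proceeds as if no FPU is present."""
    if not fpu_present:
        return True          # nothing to use, so emulate
    if force_npx_emulation == 0:
        return False         # emulate only if FPU not present
    if force_npx_emulation == 1:
        return fdiv_defect   # emulate if the defect is discovered
    return True              # any other value: emulate always


def precision_errata_feature(force_npx_emulation, fpu_present, fdiv_defect):
    """Model PF_FLOATING_POINT_PRECISION_ERRATA for versions 4.0 and 5.0."""
    emulated = emulation_chosen(force_npx_emulation, fpu_present, fdiv_defect)
    # TRUE only if the defect is found but not worked around by emulation.
    return fpu_present and fdiv_defect and not emulated


if __name__ == "__main__":
    for value in (0, 1, 2):
        print(value, precision_errata_feature(value, True, True))
```

Note that on a defective processor the feature is TRUE only when ForceNpxEmulation is 0, since any other value has the defect worked around.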
Versions 5.1 to 6.0 are odd. They retain code to test for the defect and they retain ForceNpxEmulation as a string. But they don’t look it up in the registry and they return to testing for the defect only if the internal variable that ForceNpxEmulation would have been loaded into is 1. Further study may show differently, but it is presently thought that testing for the defect never can happen in these versions.
The PF_FLOATING_POINT_EMULATED feature is explicitly set to FALSE in x86 version 6.2 and higher.
The days of emulating the floating-point instruction set are long gone. Still, it’s perhaps as well that Microsoft’s documentation labours even now to tell us that PF_FLOATING_POINT_EMULATED gives “a non-zero value if floating-point operations are emulated”. That Microsoft spells this out covers for a coding oversight that once had it the other way round (and which, in fairness, Microsoft did admit to in documentation of the IsProcessorFeaturePresent function before the SDK for Windows Vista). Before Windows NT 4.0 SP4, PF_FLOATING_POINT_EMULATED is TRUE if user-mode floating-point operations actually are left to a physical FPU.
So, what is this emulation? That each processor now has its own FPU is nowadays taken as granted, but versions 3.10 to 5.0 anticipate running on processors that pre-date the Pentium. Some such processors have a built-in FPU. Others can instead execute floating-point instructions through a numeric coprocessor. Such a thing need not be present, of course, and in the days of the 80386 typically wasn’t. Early versions of Windows can do without one, though they do require symmetry: if any processor has access to an FPU, then all must, else Windows stops at startup with the bug check MULTIPROCESSOR_CONFIGURATION_NOT_SUPPORTED (0x3E).
If no FPU is present even as a coprocessor, these versions set up for floating-point emulation to support user-mode software that tries to execute floating-point instructions. Note that this truly is just for user-mode support. The interrupt handler is in NTDLL.DLL. The kernel’s role in setting this up is mostly to locate the interrupt handler in NTDLL and write a suitable trap gate into the Interrupt Descriptor Table (IDT) so that occurrence of the Coprocessor Not Available interrupt (7) in ring 3 is handled without leaving ring 3. Note the implication for the use of floating-point instructions in kernel mode. Executing these instructions in kernel mode when emulation is enabled has absolutely no support and would crash Windows. Well-written kernel-mode code in version 4.0 and higher is spared this worry because it never executes any floating-point instruction without a successful call to KeSaveFloatingPointState, which fails if the kernel has configured itself for the absence of an FPU.
As noted above, versions 3.50 to 5.0 can force this emulation even with an FPU by the simple expedient of having the kernel proceed as if without an FPU. In all versions 4.0 to 6.1, what TRUE for the PF_FLOATING_POINT_EMULATED feature really means is not that floating-point instructions are emulated or that no physical FPU is present, but that the kernel is configured as if no physical FPU is present.
This difference in meaning is real or not, depending on perspective and taste, in versions 5.1 to 6.1. These versions are odd. The kernel continues to test for presence of an FPU and would set PF_FLOATING_POINT_EMULATED as TRUE if no FPU is detected. As noted above, versions 5.1 to 6.0 have code that would treat the FPU as absent, including because the FPU has a known defect with one instruction which can be worked around by emulating all instructions. In these cases too, the kernel would set PF_FLOATING_POINT_EMULATED as TRUE. Either way, there is no emulation to enable! Though NTDLL retains the handler in the original release of version 5.1, it does not export the NPXEMULATORTABLE variable through which earlier versions of the kernel locate the handler, and the build from Windows XP SP1 loses the handler (though it retains much of the supporting code and data).
Fortunately, the removal of emulation in version 5.1 looks like it never can matter. As noted above, the registry value for forcing emulation in earlier versions is not read in versions 5.1 to 6.0. The kernel’s code that would set PF_FLOATING_POINT_EMULATED as TRUE because emulation is forced can never run. (If you start these versions under a kernel-mode debugger and set the internal variable KeI386ForceNpxEmulation to 2, as if the registry value had been read to force emulation, then you will confirm that the kernel sets PF_FLOATING_POINT_EMULATED as TRUE but you will also see Windows crash when user-mode code first executes a floating-point instruction—well, unless you have a Cyrix processor, which is another story.) As for the kernel’s code that would set PF_FLOATING_POINT_EMULATED as TRUE when no FPU actually is present, this too is thought never to execute. Version 6.1 requires floating-point support from the boot processor, else Windows stops at startup with the bug check UNSUPPORTED_PROCESSOR (0x5D). Although versions 5.1 to 6.0 do not formally require an FPU, they do require the cmpxchg8b instruction (see below), which in effect requires at least a Pentium and therefore a built-in FPU. Further study may show differently, but it is presently thought that PF_FLOATING_POINT_EMULATED can only be FALSE in versions 5.1 to 6.1.
Whether absence of an FPU is real or simulated—or, put another way, whatever causes PF_FLOATING_POINT_EMULATED to show as TRUE (or as FALSE before Windows NT 4.0 SP4)—it forces other features to be FALSE:
This has real-world effect in versions 4.0 and 5.0. A naive test with cpuid can show that the processor has MMX instructions, but this doesn’t mean they are usable. Their execution instead causes an Invalid Opcode exception (6) because the kernel has set the EM bit in cr0 to enable floating-point emulation, which the processor regards as incompatible with MMX Technology. Though version 5.0 ordinarily does support the SSE instructions if cpuid shows both the SSE and FXSR bits, your own test of cpuid for this does not suffice: as a side-effect of being configured to act as if no FPU is present, the kernel will not have set the OSFXSR bit (9) in cr4 and most SSE instructions therefore cause the Invalid Opcode exception.
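To make plain what the naive test misses, here is a sketch in Python that takes the control registers into account. It models only the conditions named above. The MMX, FXSR and SSE bit positions in edx (23, 24 and 25) are the architectural ones; the function names are mine. Of course, the cr0 and cr4 values are exactly what user-mode code has no means to read, which is the point.

```python
CPUID_1_EDX_MMX = 1 << 23
CPUID_1_EDX_FXSR = 1 << 24
CPUID_1_EDX_SSE = 1 << 25
CR0_EM = 1 << 2      # EM bit (2) in cr0
CR4_OSFXSR = 1 << 9  # OSFXSR bit (9) in cr4


def mmx_usable(cpuid_1_edx, cr0):
    """MMX instructions fault with Invalid Opcode if EM is set in cr0."""
    return bool(cpuid_1_edx & CPUID_1_EDX_MMX) and not (cr0 & CR0_EM)


def sse_usable(cpuid_1_edx, cr4):
    """Most SSE instructions need OSFXSR set in cr4 as well as the cpuid bits."""
    wanted = CPUID_1_EDX_SSE | CPUID_1_EDX_FXSR
    return (cpuid_1_edx & wanted) == wanted and bool(cr4 & CR4_OSFXSR)


if __name__ == "__main__":
    edx = CPUID_1_EDX_MMX | CPUID_1_EDX_FXSR | CPUID_1_EDX_SSE
    # A kernel configured as if no FPU is present: EM set, OSFXSR clear.
    print(mmx_usable(edx, CR0_EM), sse_usable(edx, 0))
```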
The PF_COMPARE_EXCHANGE_DOUBLE feature is necessarily TRUE in the x86 version 6.0 starting from Windows Vista SP1, and in all later x86 versions, and in all x64 versions.
In x86 versions 4.0 to 6.0, this feature is TRUE if all processors have the 0x00000080 (KF_CMPXCHG8B) feature bit. Broadly speaking, this bit is set for a processor for which edx from cpuid leaf 1 has a set CX8 bit (8). AMD names this the CMPXCHG8B bit. Whatever its name, its purpose is to declare the processor as having the cmpxchg8b instruction.
The ultimate cause of the “broadly speaking” caveat is that when Windows first started using the cmpxchg8b instruction, in version 4.0, Microsoft recognised it only in processors from Intel, AMD and Cyrix. For processors from other vendors, a set CX8 bit had the unpleasant consequence that early builds of version 4.0 stopped at startup. That the bug check is MULTIPROCESSOR_CONFIGURATION_NOT_SUPPORTED even on a single-processor machine may or may not have meant the unpleasantness was just a coding oversight. (Remember, these were years when anti-competitive trickery by Microsoft was not much hidden.) The vendor-specific testing of CX8, though not of other cpuid feature bits, was relaxed for Windows NT 4.0 SP4.
The affected vendors inevitably developed means by which their processors that have cmpxchg8b instructions would have cpuid show a clear CX8 bit so that early builds of Windows NT 4.0 could at least run, even if not optimally. Starting with version 5.1, Windows recognises cases of this happening and either reprograms the processor to enable cmpxchg8b or just accepts that the processor actually does have cmpxchg8b. Thus can it be that the PF_COMPARE_EXCHANGE_DOUBLE feature shows as TRUE though a naive test for the cmpxchg8b instruction via cpuid would fail.
In version 5.1 and higher, cmpxchg8b is required of the boot processor. If its initialisation does not produce the KF_CMPXCHG8B feature bit, Windows stops with the bug check UNSUPPORTED_PROCESSOR. If the same initialisation of any other processor misses this feature bit, then Windows stops with the bug check MULTIPROCESSOR_CONFIGURATION_NOT_SUPPORTED. In all practical cases therefore, PF_COMPARE_EXCHANGE_DOUBLE can only be TRUE in x86 versions 5.1 to early 6.0.
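The two bug checks follow a simple rule that depends only on which processor misses the feature bit. As a sketch in Python, with a hypothetical function name and with each processor's feature bits supplied as a list whose first entry is the boot processor:

```python
KF_CMPXCHG8B = 0x00000080

UNSUPPORTED_PROCESSOR = 0x5D
MULTIPROCESSOR_CONFIGURATION_NOT_SUPPORTED = 0x3E


def startup_bug_check(per_processor_feature_bits):
    """Return the bug check code for a missing cmpxchg8b, else None."""
    for number, bits in enumerate(per_processor_feature_bits):
        if bits & KF_CMPXCHG8B:
            continue
        if number == 0:
            return UNSUPPORTED_PROCESSOR  # the boot processor lacks it
        return MULTIPROCESSOR_CONFIGURATION_NOT_SUPPORTED
    return None  # every processor has the feature bit


if __name__ == "__main__":
    print(startup_bug_check([0x80, 0x80]))
    print(hex(startup_bug_check([0x00, 0x80])))
    print(hex(startup_bug_check([0x80, 0x00])))
```

Since the only way to get past this check is for every processor to have the bit, a system that is running at all must show the feature as TRUE.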
That the PF_COMPARE_EXCHANGE_DOUBLE feature is hard-coded to TRUE in all x64 versions is because they won’t run without it. If any processor has a clear CX8 bit in edx from cpuid leaf 1, 64-bit Windows stops with the UNSUPPORTED_PROCESSOR bug check.
The PF_MMX_INSTRUCTIONS_AVAILABLE feature is necessarily TRUE in all x64 versions.
In x86 versions 4.0 and higher, the feature is TRUE if all processors have the 0x00000100 feature bit and the kernel hasn’t disabled its use of the feature. Broadly speaking, the kernel sets this feature bit for a processor if edx from cpuid leaf 1 has a set MMX bit (23). This, of course, indicates that the processor has the mm0 to mm7 registers and the instructions that work with them.
Again, one aspect to “broadly speaking” is that vendor-specific constraints once applied. Version 4.0 recognises the MMX bit only in processors from Intel, AMD and Cyrix. Also familiar should be that this feature bit is cleared from the overall collection of feature bits if the kernel anticipates ever setting the EM bit in cr0. This happens in versions before 6.1 if no FPU is present or in versions before 5.1 if the kernel is configured to behave as if no FPU is present. Put floating-point emulation aside as archaeology, and the MMX instruction set is perhaps the least unsafe processor feature to test by executing cpuid instead of querying through IsProcessorFeaturePresent.
That the PF_MMX_INSTRUCTIONS_AVAILABLE feature is hard-coded to TRUE in all x64 versions is because they won’t run without it. If any processor has a clear MMX bit in edx from cpuid leaf 1, 64-bit Windows stops with the UNSUPPORTED_PROCESSOR bug check.
The PF_XMMI_INSTRUCTIONS_AVAILABLE feature is necessarily TRUE in x86 version 6.2 and higher, and in all x64 versions.
In x86 versions 5.0 to 6.1, the feature is TRUE if all processors have both the 0x00000800 and 0x00002000 feature bits.
The PF_3DNOW_INSTRUCTIONS_AVAILABLE feature is TRUE if all processors have the 0x00004000 feature bit.
The PF_RDTSC_INSTRUCTION_AVAILABLE feature is necessarily TRUE in x86 version 6.0 and higher, and in all x64 versions.
In x86 versions 5.0 to 5.2, this feature is TRUE if all processors have the 0x00000002 (KF_RDTSC) feature bit.
Whether the PF_PAE_ENABLED feature is TRUE for 32-bit Windows depends entirely on which kernel is loaded. Windows is supplied with as many as four kernels:
|Kernel||Versions||PAE||Remarks|
|ntoskrnl.exe||5.0 to 5.2||no|
|ntoskrnl.exe||6.0 to 6.1||no||built as ntkrnlmp.exe, renamed at installation|
|ntoskrnl.exe||6.2 and higher||yes||built as ntkrpamp.exe, renamed at installation|
|ntkrnlpa.exe||5.0 to 5.2||yes|
|ntkrnlpa.exe||6.0 to 6.1||yes||built as ntkrpamp.exe, renamed at installation|
|ntkrnlmp.exe||5.0 to 5.2||no|
|ntkrpamp.exe||5.0 to 5.2||yes|
The feature is TRUE for a PAE kernel, else FALSE. For 64-bit Windows, the PF_PAE_ENABLED feature is necessarily TRUE.
The PF_XMMI64_INSTRUCTIONS_AVAILABLE feature is necessarily TRUE in all x64 versions.
In x86 version 6.2 and higher, it is TRUE if all processors have the 0x00010000 feature bit. Earlier versions require both this feature bit and 0x00000800.
The kernel’s use of the NX feature is subject to a large handful of configurable options. Before version 6.2, the PF_NX_ENABLED feature is TRUE if a particular feature bit is set: 0x80000000 for x86 but 0x20000000 for x64.
In x86 version 6.2 and higher and in all x64 versions, the PF_SSE3_INSTRUCTIONS_AVAILABLE feature is TRUE if all processors have the 0x00080000 feature bit. Earlier x86 versions require both this feature bit and 0x00000800.
The PF_COMPARE_EXCHANGE128 feature is necessarily TRUE in x64 version 6.3 and higher.
It is TRUE in earlier x64 versions if all processors have the 0x00100000 feature bit.
In version 6.0 only, the PF_CHANNELS_ENABLED feature is TRUE if all processors have the 0x00100000 (x86) or 0x00200000 (x64) feature bit. These feature bits have other meanings in later versions.
The PF_XSAVE_ENABLED feature is TRUE if all processors have the 0x00400000 (x86) or 0x00800000 (x64) feature bit.
The PF_SECOND_LEVEL_ADDRESS_TRANSLATION feature is TRUE if all processors have the 0x04000000 feature bit.
The PF_VIRT_FIRMWARE_ENABLED feature is TRUE if all processors have the 0x08000000 feature bit.
The PF_RDWRFSGSBASE_AVAILABLE feature is TRUE if all processors have the 0x10000000 feature bit.
The PF_FASTFAIL_AVAILABLE feature is necessarily TRUE (in the applicable versions).
The PF_RDRAND_INSTRUCTION_AVAILABLE feature is TRUE if all processors have the 0x02000000 (x86) or 0x00000001`00000000 (x64) feature bit.
The PF_RDTSCP_INSTRUCTION_AVAILABLE feature is TRUE if all processors have the 0x00000001`00000000 (x86) or 0x00000004`00000000 (x64) feature bit.
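For the features whose presence is decided by a single feature bit, the dependence on architecture can be tabulated. The Python below is a sketch only: it collects the masks given above for a few of the later features, takes no account of which versions each applies to, and uses names of my own for the table and the function.

```python
# Feature-bit masks as given in this note, keyed by architecture.
FEATURE_MASKS = {
    "PF_XSAVE_ENABLED": {"x86": 0x00400000, "x64": 0x00800000},
    "PF_SECOND_LEVEL_ADDRESS_TRANSLATION": {"x86": 0x04000000, "x64": 0x04000000},
    "PF_VIRT_FIRMWARE_ENABLED": {"x86": 0x08000000, "x64": 0x08000000},
    "PF_RDRAND_INSTRUCTION_AVAILABLE": {"x86": 0x02000000, "x64": 0x0000000100000000},
    "PF_RDTSCP_INSTRUCTION_AVAILABLE": {
        "x86": 0x0000000100000000,
        "x64": 0x0000000400000000,
    },
}


def feature_from_bits(name, arch, common_bits):
    """True if every bit of the feature's mask is set for all processors."""
    mask = FEATURE_MASKS[name][arch]
    return (common_bits & mask) == mask


if __name__ == "__main__":
    # Hypothetical feature bits common to all processors.
    bits = 0x0000000100000000 | 0x00800000
    for arch in ("x86", "x64"):
        for name in FEATURE_MASKS:
            print(arch, name, feature_from_bits(name, arch, bits))
```

The same bit, 0x00000001`00000000, means RDRAND to an x64 kernel but RDTSCP to an x86 kernel, which is reason enough not to interpret the feature bits without knowing which kernel produced them.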