Geoff Chappell, Software Analyst
If only for now, this article is specific to 32-bit Windows (i386 or x86).
The 32-bit Windows kernel started using the 8-byte compare-exchange instruction (cmpxchg8b) in version 4.0. At first, the instruction had only a few uses, for more efficient coding of the following exported functions:
and of one internal routine that improves on the ExInterlockedAddLargeInteger function (exported in version 3.51 and higher). With successive versions, cmpxchg8b found ever more use, not just in more exported functions, such as ExInterlockedFlushSList (added in version 5.0), but especially internally, e.g., for working with 64-bit page table entries when using Physical Address Extension (PAE).
Curiously, the Driver Development Kit (DDK) for Windows NT 4.0 left ExInterlockedCompareExchange64 undocumented, which is conspicuous because that particular function is little but a wrapper to get C-language arguments into appropriate registers for executing the instruction:
The Lock argument provides for the “interlocked” functionality to be implemented without the cmpxchg8b instruction. All use of cmpxchg8b can be coded without the instruction, but at the price (in a multi-processor coding, for which temporarily disabling interrupts does not suffice) of having the caller provide storage for a primitive synchronisation object known as a spin lock. For instance, the single instruction
lock cmpxchg8b qword ptr [esi]
is replaceable with the following sequence
        pushfd
try:
        cli
        lock    bts dword ptr [edi],0
        jnb     acquired
        popfd
        pushfd
wait:
        test    dword ptr [edi],1
        je      try
        pause                           ; if available
        jmp     wait
acquired:
        cmp     eax,[esi]
        jne     fail
        cmp     edx,[esi+4]
        je      exchange
fail:
        mov     eax,[esi]
        mov     edx,[esi+4]
        jmp     done
exchange:
        mov     [esi],ebx
        mov     [esi+4],ecx
done:
        mov     byte ptr [edi],0
        popfd
provided that the 8 bytes at esi are never modified without acquiring the spin lock at edi which is in turn never used for any other purpose than guarding those 8 bytes. Even putting aside the undesirability of depending on all users of those 8 bytes to cooperate regarding the spin lock, there is the problem that the replacement is a lot of code, not just in terms of space but of execution time. Just for its savings on this point, cmpxchg8b is clearly a nice feature to have to hand, and it was a natural addition when Intel’s processors started working with a 64-bit external bus.
In the early days of Windows NT, however, not all the extant processors implemented the cmpxchg8b instruction. In versions before 5.1, every function that uses the instruction has an alternate coding for processors that do not support the instruction. Very early during its initialisation, the kernel checks whether the boot processor supports the cmpxchg8b instruction. If the support is missing, the kernel patches jmp instructions at the start of each of those functions to redirect execution to their alternates. Conversely, if the boot processor does support the instruction, and the functions are left unpatched, then the kernel requires all processors to support the instruction, under pain of the bug check MULTIPROCESSOR_CONFIGURATION_NOT_SUPPORTED (0x3E). Version 5.1 dropped the alternate codings and made it mandatory that the boot processor support the cmpxchg8b instruction. Without this support, version 5.1 and higher raise the bug check UNSUPPORTED_PROCESSOR (0x5D).
If reading only Intel’s literature, one might think that testing for the cmpxchg8b instruction is a simple matter of executing the cpuid instruction with 1 in eax and testing for the CX8 bit (masked by 0x0100) in the feature flags that are returned in edx. However, there have always been quirks.
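The simple test that Intel's literature suggests comes down to one mask. The following C sketch assumes that the caller has already executed cpuid with 1 in eax and passes in the resulting edx. The names SKETCH_CPUID_EDX_CX8 and sketch_has_cx8 are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* CX8 bit in the feature flags returned in edx by cpuid with 1 in eax. */
#define SKETCH_CPUID_EDX_CX8 0x00000100u

/*
 * Naive test for cmpxchg8b: look only at the CX8 feature flag.
 * Obtaining leaf1_edx (by executing cpuid) is outside this sketch.
 */
static bool sketch_has_cx8(uint32_t leaf1_edx)
{
    return (leaf1_edx & SKETCH_CPUID_EDX_CX8) != 0;
}
```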
Versions 4.0 and 5.0 test for the cmpxchg8b instruction twice. The first test applies only to the boot processor. Its purpose is to determine whether functions that use the instruction must be patched, as described above. If the CPUID bit (masked by 0x00200000) in the eflags register cannot be changed, then there is no cpuid instruction, let alone cmpxchg8b. The kernel then sets the CPUID bit and executes the cpuid instruction with 1 in eax to produce the feature flags in edx. If these feature flags have the CX8 bit set, then cmpxchg8b is supported and no patches are needed. The second test is done for all processors, including the boot processor, as part of a wider examination of processor features. In builds of version 4.0 from before Windows NT 4.0 SP4, however, this second test recognises the CX8 bit only if the processor’s vendor string is GenuineIntel, AuthenticAMD or CyrixInstead. (The vendor string is the sequence of characters obtained by executing cpuid with 0 in eax and then storing the values of ebx, edx and ecx at successive memory locations.)
For processors that set the CX8 bit but are not made by Intel, AMD or Cyrix, the two tests do not agree and the processor falls foul of the requirement that if the boot processor supports cmpxchg8b then all processors must. The result is the bug check MULTIPROCESSOR_CONFIGURATION_NOT_SUPPORTED—yes, even if there is only one processor. In the Knowledge Base article CMPXCHG8B CPUs in Non-Intel/AMD x86 Compatibles Not Supported, Microsoft is at best disingenuous in suggesting that the first test is only a rough guess from the processor’s “type”, which a second test must “verify” by querying for “specific features”: both tests are specifically for the CX8 bit.
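The disagreement can be modelled in a few lines of C. This is a sketch of the logic as described above, not a reconstruction of the kernel's code: all function names are invented for illustration, and the case of a boot processor without cmpxchg8b alongside another processor that has it is not addressed here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_CX8_MASK 0x0100u
#define SKETCH_BUGCHECK_MP_CONFIG_NOT_SUPPORTED 0x3Eu

/* First test (boot processor only): the CX8 bit, whatever the vendor. */
static bool sketch_first_test(uint32_t edx)
{
    return (edx & SKETCH_CX8_MASK) != 0;
}

/* Second test, as in builds of version 4.0 before SP4: the CX8 bit
   counts only for the three recognised vendor strings. */
static bool sketch_second_test_pre_sp4(uint32_t edx, const char *vendor)
{
    bool known = strcmp(vendor, "GenuineIntel") == 0
              || strcmp(vendor, "AuthenticAMD") == 0
              || strcmp(vendor, "CyrixInstead") == 0;
    return known && (edx & SKETCH_CX8_MASK) != 0;
}

/* Returns the bug check code, or 0 if this processor is accepted.
   On a single-processor machine, boot_edx and edx are the same value. */
static uint32_t sketch_check_processor(uint32_t boot_edx, uint32_t edx,
                                       const char *vendor)
{
    if (sketch_first_test(boot_edx)
            && !sketch_second_test_pre_sp4(edx, vendor)) {
        return SKETCH_BUGCHECK_MP_CONFIG_NOT_SUPPORTED;
    }
    return 0;
}
```

A lone CentaurHauls processor that sets CX8 passes the first test, fails the second, and so gets the bug check. The same processor with CX8 clear is accepted, because the first test then has the functions patched and nothing is required of the second.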
Despite its acknowledgement of trouble caused by restricting one CPU feature to known vendors, Microsoft took its time to relax the restrictions for other features. Although the version 4.0 from Windows NT 4.0 SP4 removes the vendor restrictions from its test for the CX8 bit, other feature flags of interest to it are recognised only for particular vendors:
Only with version 5.0 did Microsoft stop its routine practice of excluding unknown (or unfavoured) CPU manufacturers from having Intel-compatible features be usable by Windows.
Meanwhile, of course, the CPU manufacturers who were initially excluded from having Windows use their cmpxchg8b instruction will naturally have wanted to sell their processors. Even once they obtained recognition in new Windows versions, they will just as naturally have wanted to sell their processors to people who might want to run an early build of Windows NT 4.0. To avoid the bug check on these builds, the processor must start with CX8 clear. Inevitably, these vendors developed ways to turn the CX8 bit off, and even to have it turned off by default. After all, if they’re going to sell a CPU for use in computers that might be sold to just about anyone, then to compete at all they need all Windows versions to be at least usable on their processors, even if less than optimally. Competing equally might then be possible if Microsoft would make up for the earlier omission and build into new versions of its kernel some recognition of these processors to turn the CX8 bit back on. Again, Microsoft took its time.
Starting with version 5.1, which is also the first version that won’t start without support for cmpxchg8b, the Windows kernel makes special cases for processors that may implement the cmpxchg8b instruction but do not show the CX8 bit in the feature flags. The processors that are catered for are identified by the vendor strings GenuineTMx86, CentaurHauls and (a little later) RiseRiseRise. The following notes describe the provisions as actually made by the Windows kernel, and are in no way concerned with how well the implementation corresponds with documentation by the vendors.
For GenuineTMx86 processors starting with family 5 model 4 stepping 2, if the cmpxchg8b instruction is not indicated in the CPU feature flags, it is enabled by setting the 0x0100 bit in the model-specific register 0x80860004.
The previous paragraph describes the presumed intention. As actually coded, the model and stepping, taken together, must be at least 4 and 2 even if the family is greater than 5. A hypothetical family 6 model 1 stepping 1 would need to have the CX8 bit set or cleared in advance of booting Windows, depending on which version is to be run.
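The difference between the presumed intention and the actual coding is perhaps clearest when the two comparisons are set side by side. The following C sketch is an interpretation of the description above, not a transcription of the kernel's code. The function names are invented, and combining model and stepping into one byte assumes that each fits in 4 bits.

```c
#include <stdbool.h>

/* Presumed intention: (family, model, stepping) is at least (5, 4, 2),
   compared field by field with the family most significant. */
static bool sketch_tm_intended(unsigned family, unsigned model,
                               unsigned stepping)
{
    if (family != 5) return family > 5;
    if (model != 4)  return model > 4;
    return stepping >= 2;
}

/* As described for the actual coding: the family must be at least 5, and
   the model and stepping together must be at least 4 and 2, whatever
   the family. */
static bool sketch_tm_as_coded(unsigned family, unsigned model,
                               unsigned stepping)
{
    return family >= 5 && ((model << 4) | stepping) >= 0x42u;
}
```

The two agree for family 5, but the hypothetical family 6 model 1 stepping 1 satisfies the first and not the second.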
If the CPU feature flags for a CentaurHauls processor do not show support for cmpxchg8b, then the support is enabled by slightly different methods depending on the family:
Clearing the 0x01 bit is omitted in early builds. It begins with the version 5.1 from Windows XP SP2 and the version 5.2 from Windows Server 2003 SP1.
Starting with the version 5.1 from Windows XP SP2 and the version 5.2 from Windows Server 2003 SP1, all RiseRiseRise processors are treated as supporting cmpxchg8b, even without the CX8 bit in the feature flags from cpuid.