Geoff Chappell, Software Analyst
The spin lock is the kernel’s simplest synchronisation object. Indeed, it is not formally an object, just a pointer-sized variable. This is formalised by its definition: the KSPIN_LOCK is not a structure, just a ULONG_PTR.
The simplicity of spin locks means that they are in one sense the least demanding of all synchronisation objects. The more arbitrary the circumstances of your execution, the more likely that other synchronisation objects are out of bounds. Generally speaking, no other synchronisation objects are available to kernel-mode code that can’t be certain of executing in a non-arbitrary thread at no higher an Interrupt Request Level (IRQL) than APC_LEVEL. The exceptions at DISPATCH_LEVEL are very few: some synchronisation objects can be signalled, but waiting never is permitted. Spin locks, in contrast, can be waited on at any IRQL, not just at DISPATCH_LEVEL but at the even higher IRQL of hardware interrupt routines.
In another sense, spin locks are the most demanding. They are simple only because they greatly restrict everything else. In ordinary execution at PASSIVE_LEVEL or even at APC_LEVEL, just asking to acquire a spin lock raises the IRQL to DISPATCH_LEVEL. This of itself constrains the thread that now owns the lock. Notably, it must cause no exception even from touching pageable memory. But at least this thread can do something: no other can execute at all on the same processor. And if any thread on another processor contends for the lock, then that thread is made to spin in a tight loop, also at DISPATCH_LEVEL, and no other thread gets to run on that processor either. The potential disruption for other people’s software while holding a spin lock, and especially while multiple processors contend for the same lock, is presumably significant, for Microsoft has been strikingly consistent with its warnings that “No routine should hold a spin lock for longer than 25 microseconds” (even if this warning is not always obeyed by Microsoft itself).
Also special to spin locks is that their simplicity does not allow for re-entering (as if, for instance, a spin lock is some sort of mutex for processors). If a thread tries to acquire a spin lock that it already owns, the thread (and thus the processor that it executes on) hangs. Not for nothing is Microsoft’s documentation of spin locks loaded with warnings about how careful programmers must be when using spin locks.
There are nowadays two distinct types of spin lock as implemented by the Kernel Core and exposed to programmers through exported functions. The basic spin lock that’s discussed below is ancient. Let’s call it classic. It has to some extent been supplanted by the queued spin lock. This dates from Windows 2000 internally but was not developed for general use until Windows XP. Both these types of lock are a KSPIN_LOCK. A newer type is implemented in the Executive. Although its implementation is not so very much different and has the same instrumentation, it is instead an EX_SPIN_LOCK and is presently left to be taken up elsewhere.
The kernel’s two types were for many years implemented together, i.e., in the one source file. The .DBG files for the Windows NT 3.1 kernels confirm that spin locks were originally coded in assembly language. The 32-bit source code plausibly was still in assembly language for Windows 7. For 64-bit Windows, meanwhile, the implementation started in C and separated the functions for queued spin locks into their own source file. Not until Windows 8 is this code obviously used for 32-bit Windows.
For the coding in C, the exported functions for the classic spin lock are defined inline in headers. This has practical consequences for those who debug or reverse engineer the kernel: many large stretches of code—as long as several dozen instructions in 64-bit Windows 7—are nothing but an inlined acquisition or release of a spin lock.
The classic spin lock really is as basic as can be. Its entire state is just the 0x01 bit of the KSPIN_LOCK. This bit is set while the lock is owned, and clear while the lock is available.
That said, initialisation clears the whole lock to zero. Starting with the version 5.2 from Windows Server 2003 SP1, the spin lock’s state is tested by whether the whole lock is non-zero, but this is only for efficiency and for commonality with the queued spin lock (so that the simple coding of the KeTestSpinLock function works with both types). While making asides, it may be as well to mention that the whole spin lock is meaningful to debug builds, which these notes ordinarily ignore as being infeasible to cover. Debug builds set the whole lock to the address of the owner’s KTHREAD but still with the 0x01 bit set. Release by any other thread would be a serious error that the release builds do not catch but which the debug builds pick up as a bug check.
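As a rough illustration of these states, the following user-mode sketch models the lock as a plain pointer-sized integer. The names model_spin_lock, model_thread and the three helper functions are inventions for this illustration, not anything from the kernel, and the ownership check simply returns false where a debug build would raise a bug check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for KSPIN_LOCK and KTHREAD. */
typedef uintptr_t model_spin_lock;
struct model_thread { int id; };

/* Debug-build style ownership: the whole lock becomes the owner's address,
   still with the 0x01 bit set. This assumes the address has bit 0 clear,
   which holds for any object aligned to at least two bytes. */
static void model_debug_mark_owner(model_spin_lock *lock,
                                   const struct model_thread *owner)
{
    *lock = (uintptr_t)owner | 1;
}

/* The check a debug build could make on release: is the caller the owner? */
static bool model_debug_release_ok(const model_spin_lock *lock,
                                   const struct model_thread *caller)
{
    return *lock == ((uintptr_t)caller | 1);
}

/* Testing the whole lock for non-zero works for release-build values
   (0 or 1) and for debug-build values (owner address with bit 0 set). */
static bool model_is_owned(const model_spin_lock *lock)
{
    return *lock != 0;
}
```

Because the owner's address is only ever combined with the 0x01 bit, code that looks only at that bit behaves the same whichever kind of value the lock holds.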
Given that the IRQL is already at or above DISPATCH_LEVEL, acquiring an unowned spin lock is a simple matter of setting the 0x01 bit and finding that it wasn’t already set. Even the 80386 has the lock bts instruction for doing this atomically. A contending processor need just execute this same instruction over and over for as long as it keeps setting the carry flag. On exiting the loop, the processor owns the lock. There are elaborations, of course, especially to reduce the impact from contention. From the start, the lock bts instruction is executed only when trying to claim the lock. When this shows the lock was owned, the better loop is just to keep testing the bit, avoiding the inter-processor effects of a lock prefix, until the lock seems ready to try claiming again. Version 5.0 helps further with a pause between the bit tests and the version 6.0 from Windows Vista SP1 conditionally notifies a hypervisor. Version 6.1 adds performance counting and, in 64-bit Windows, instrumentation (which 32-bit Windows picks up in version 6.2). Through all this development, acquiring the spin lock remains at its heart that while the processor is stopped from switching to another thread, a lock bts instruction to set the low bit of the lock is repeated until it clears the carry flag.
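The shape of this loop can be sketched in portable C11, with atomic_fetch_or standing in for lock bts and its returned old value standing in for the carry flag. This is a user-mode model under stated assumptions: the names are hypothetical, nothing here touches the IRQL, and the pause, hypervisor notification and instrumentation of later versions are left out.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-mode stand-in for KSPIN_LOCK: a pointer-sized word. */
typedef _Atomic uintptr_t model_spin_lock;

/* One attempt to claim the lock: atomically set bit 0 and report whether
   it was clear beforehand (the role of lock bts and the carry flag). */
static bool model_try_acquire(model_spin_lock *lock)
{
    uintptr_t old = atomic_fetch_or_explicit(lock, 1, memory_order_acquire);
    return (old & 1) == 0;
}

/* Spin until the lock is claimed. The inner loop only reads the lock, so a
   waiting processor issues no locked writes while the lock stays owned. */
static void model_acquire(model_spin_lock *lock)
{
    while (!model_try_acquire(lock)) {
        while (atomic_load_explicit(lock, memory_order_relaxed) & 1) {
            /* a real implementation would pause between tests */
        }
    }
}
```

A second call to model_try_acquire on a lock that is already held returns false whoever the caller is, which is why re-entering model_acquire from the owner would spin forever.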
Releasing a spin lock is simple: just clear the bit. If any other processor wants the lock, whether it has been spinning on the lock or happens to ask just then, one of them will find that it has set the bit and become the lock’s new owner. To avoid reading from the lock just to clear a bit, even the earliest version clears a whole byte of the lock. Version 5.2 adds the lock prefix. The C-language implementation extends this to clearing the whole lock.
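A matching sketch of release, again in user-mode C11 with hypothetical names, follows the C-language implementation in clearing the whole lock rather than one byte. The release-ordered store is this sketch's choice for making the owner's earlier writes visible before the lock is seen as free. It is not a claim about which instruction any particular kernel version uses.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-mode stand-in for KSPIN_LOCK. */
typedef _Atomic uintptr_t model_spin_lock;

/* Release: clear the whole lock. The old value is never read. */
static void model_release(model_spin_lock *lock)
{
    atomic_store_explicit(lock, 0, memory_order_release);
}

/* What a contender does next: set bit 0 and see whether it was clear. */
static bool model_try_acquire(model_spin_lock *lock)
{
    return (atomic_fetch_or_explicit(lock, 1, memory_order_acquire) & 1) == 0;
}
```

After model_release, the first contender whose model_try_acquire finds the bit clear becomes the new owner. Nothing in the lock itself decides which contender that is.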
Historically, if not still, the IRQL is the HAL’s business. Spin locks are creatures of the kernel but the HAL’s management of the IRQL was dealt with originally by exporting some of the functionality from the kernel and some from the HAL. The kernel exports only the functions that work with the lock but have nothing to do with the IRQL: first for initialising spin locks; and then for acquiring and releasing them while the IRQL stays at or higher than DISPATCH_LEVEL. Compound functions that both adjust the IRQL and manage the lock are all exported from the HAL. This division got reorganised for 64-bit Windows so that all x64 builds have spin locks entirely as kernel functionality. For 32-bit Windows, this reorganisation waited until version 6.2 adopted the coding in C. The HAL’s functions then moved to the kernel: they continue to be exported from the HAL, but only as forwards to the kernel.
|Function||HAL Versions (x86 Only)||Kernel Versions|
|KeAcquireSpinLock||all||6.2 and higher (x86 only)|
|KeAcquireSpinLockRaiseToDpc|| ||5.2 from Windows Server 2003 SP1, and higher (x64)|
|KeAcquireSpinLockRaiseToSynch||4.0 and higher||5.2 from Windows Server 2003 SP1, and higher (x64); 6.2 and higher (x86)|
|KeReleaseSpinLock||all||5.2 from Windows Server 2003 SP1, and higher (x64 only); 6.2 and higher (x86)|
|KeTestSpinLock|| ||5.2 and higher|
|KeTryToAcquireSpinLockAtDpcLevel|| ||5.2 from Windows Server 2003 SP1, and higher|
|KefAcquireSpinLockAtDpcLevel|| ||3.50 and higher (x86 only)|
|KefReleaseSpinLockFromDpcLevel|| ||3.50 and higher (x86 only)|
|KfAcquireSpinLock||3.50 and higher||6.2 and higher (x86 only)|
|KfReleaseSpinLock||3.50 and higher||6.2 and higher (x86 only)|
|KiAcquireSpinLock|| ||all (x86 only)|
|KiReleaseSpinLock|| ||all (x86 only)|
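The division of labour described before the table can be modelled in a few lines. In this sketch, which assumes a single simulated processor and uses hypothetical names throughout, a compound function raises the IRQL, then calls the function that works only with the lock, and returns the old IRQL for the matching release to restore. The numeric levels 0, 1 and 2 for PASSIVE_LEVEL, APC_LEVEL and DISPATCH_LEVEL are the usual values from the WDK headers.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for KSPIN_LOCK and KIRQL. */
typedef _Atomic uintptr_t model_spin_lock;
typedef unsigned char model_irql;

enum {
    MODEL_PASSIVE_LEVEL = 0,
    MODEL_APC_LEVEL = 1,
    MODEL_DISPATCH_LEVEL = 2
};

/* Assumption: one simulated processor, so one global IRQL is enough. */
static model_irql model_current_irql = MODEL_PASSIVE_LEVEL;

/* Lock-only functions: they never change the IRQL. */
static void model_acquire_at_dpc_level(model_spin_lock *lock)
{
    while (atomic_fetch_or_explicit(lock, 1, memory_order_acquire) & 1) {
        /* spin until bit 0 is found to have been clear */
    }
}

static void model_release_from_dpc_level(model_spin_lock *lock)
{
    atomic_store_explicit(lock, 0, memory_order_release);
}

/* Compound functions: adjust the IRQL and manage the lock. */
static model_irql model_acquire(model_spin_lock *lock)
{
    model_irql old_irql = model_current_irql;
    model_current_irql = MODEL_DISPATCH_LEVEL;  /* raise first */
    model_acquire_at_dpc_level(lock);           /* then take the lock */
    return old_irql;
}

static void model_release(model_spin_lock *lock, model_irql old_irql)
{
    model_release_from_dpc_level(lock);         /* drop the lock first */
    model_current_irql = old_irql;              /* then restore the IRQL */
}
```

The ordering matters even in the model: the IRQL goes up before the lock is taken and comes down only after the lock is released, so the lock is never held below DISPATCH_LEVEL.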
The undocumented KefAcquireSpinLockAtDpcLevel and KefReleaseSpinLockFromDpcLevel are coded exactly as the documented KeAcquireSpinLockAtDpcLevel and KeReleaseSpinLockFromDpcLevel functions except for having the __fastcall convention. The KfAcquireSpinLock and KfReleaseSpinLock functions are similarly related, though less exactly for the former, to KeAcquireSpinLock and KeReleaseSpinLock. All four of these undocumented functions have C-language declarations at least as early as the DDK for Windows NT 3.51 to support macro redefinitions of the older functions so that new drivers use the presumably faster new functions without needing to change the source code.
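The macro technique can be sketched as follows. This is a user-mode model with hypothetical names, not the text of the DDK headers: ModelKfAcquire and ModelKfRelease stand for the newer functions, and the ModelKe macros stand for the older names that existing driver source keeps using. The __fastcall keyword is left out because it is compiler-specific, and the lock handling assumes a single processor with no contention.

```c
#include <stdint.h>

/* Hypothetical stand-ins for KSPIN_LOCK and KIRQL. */
typedef uintptr_t model_spin_lock;
typedef unsigned char model_irql;

static model_irql model_current_irql = 0;  /* 0 models PASSIVE_LEVEL */
static int model_fast_calls = 0;           /* counts calls to the new functions */

/* Newer-style acquire: returns the old IRQL as its result. */
static model_irql ModelKfAcquire(model_spin_lock *lock)
{
    model_irql old_irql = model_current_irql;
    model_current_irql = 2;                /* 2 models DISPATCH_LEVEL */
    *lock |= 1;                            /* no contention in this model */
    ++model_fast_calls;
    return old_irql;
}

/* Newer-style release: same arguments as the older function. */
static void ModelKfRelease(model_spin_lock *lock, model_irql new_irql)
{
    *lock = 0;
    model_current_irql = new_irql;
    ++model_fast_calls;
}

/* Older-style names, redirected by macro so that callers compile unchanged. */
#define ModelKeAcquire(lock, old_irql) (*(old_irql) = ModelKfAcquire(lock))
#define ModelKeRelease(lock, new_irql) ModelKfRelease((lock), (new_irql))
```

In this model the release macro is a plain renaming, but the acquire macro has to supply an assignment because the newer function returns the old IRQL instead of storing it through a pointer. That is one way in which such a pair can be related less exactly.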
Though their names do not say so explicitly, the KiAcquireSpinLock and KiReleaseSpinLock functions do not change the IRQL. In version 3.10, if only in the release builds, they exactly duplicate KeAcquireSpinLockAtDpcLevel and KeReleaseSpinLockFromDpcLevel. Version 3.50 changed them to the __fastcall convention, which left them as exact duplicates of what were then the new KefAcquireSpinLockAtDpcLevel and KefReleaseSpinLockFromDpcLevel (and also made them the oldest examples of Microsoft breaking an exported function from one version to another). Version 6.2 essentially eliminated them: they continue as kernel exports, but plausibly only as aliases defined in an EXPORTS section in the kernel’s module definition file.
Why KiAcquireSpinLock and KiReleaseSpinLock ever existed is unclear. The kernel calls them internally, and even liberally in the early versions, but also calls the documented functions (or, later, their __fastcall counterparts), also liberally. The difference is in the debug builds. The oldest obtained for inspection is version 3.51. It has KefAcquireSpinLockAtDpcLevel enforce what was then documented as a requirement: if the IRQL is not exactly DISPATCH_LEVEL, the debug build of this function stops Windows with the IRQL_NOT_GREATER_OR_EQUAL bug check. The internal function KiAcquireSpinLock, by contrast, does not.