SKETCH OF HOW RESEARCH MIGHT CONTINUE AND RESULTS BE PRESENTED

RTL_BARRIER

The RTL_BARRIER (formally _RTL_BARRIER) is the control structure for the user-mode synchronisation barrier. A process may want some number of its threads to share some work such that none proceeds until all have done their share. The RTL_BARRIER is memory that the process provides for managing the participating threads as they reach the barrier, each in its own time, wait until all catch up, and are then released together. The process initialises the barrier, notably to set the number of participating threads, by calling the API function InitializeSynchronizationBarrier. Participating threads each call EnterSynchronizationBarrier to signal that their execution has reached the barrier. This call does not return until the configured number of participating threads have all called this function.

Apparently essential to the design is that the work that is shared ahead of the barrier can be merely one phase. Having each called EnterSynchronizationBarrier to end one phase and been released when all have ended this phase, the participating threads can each call EnterSynchronizationBarrier again but now to mark that they have ended the next phase, and so on. Behaviour is undefined if the function is called for the same barrier by more threads than the barrier is initialised for. When the barrier is no longer needed, the process calls DeleteSynchronizationBarrier.

Though the RTL_BARRIER structure has no known kernel-mode use and is ordinarily known to user-mode source code from a definition in WINNT.H, i.e., the standard header for user-mode programming, private symbol files that Microsoft has published for a handful of user-mode modules say the structure is defined in a header, named NTRTL_X.H, that is included by the kernel’s own source code.

Documentation Status

The RTL_BARRIER is not itself documented. It is meant to be treated as opaque, being operated on only by calling the documented API functions InitializeSynchronizationBarrier, EnterSynchronizationBarrier and DeleteSynchronizationBarrier. What these take as their SYNCHRONIZATION_BARRIER is just a typedef for the RTL_BARRIER. The SYNCHRONIZATION_BARRIER is not documented, either.

The documented API functions date from version 6.2 as exports from KERNELBASE.DLL. They repackage lower-level exports from NTDLL. These date from version 6.0 and have never been documented. The strong suggestion is that RTL_BARRIER is the original name, if only for matching the original functions: RtlInitBarrier, RtlBarrier and RtlDeleteBarrier.

Layout

The RTL_BARRIER is 0x18 or 0x20 bytes in 32-bit and 64-bit Windows, respectively. Since callers provide the memory as an uninitialised blob, these sizes are fixed in stone for as long as new versions of Windows support the functions’ use by old code. Given that callers conform to the documentation and do not interpret the memory they provide, the structure is free to change internally with any re-implementation of the functions, which was indeed done for the 1607 release of Windows 10.

Defined Structure

It happens, though, that the implementation’s changes have not redefined the structure. The only known changes of the RTL_BARRIER are in the very formal sense of how the structure is presented to programmers outside Microsoft. The published definition in WINNT.H would have it that all the structure’s members are reserved:

Offset (x86)  Offset (x64)  Definition                Versions
0x00          0x00          DWORD Reserved1;          6.2 and higher
0x04          0x04          DWORD Reserved2;          6.2 and higher
0x08          0x08          ULONG_PTR Reserved3 [2];  6.2 and higher
0x10          0x18          DWORD Reserved4;          6.2 and higher
0x14          0x1C          DWORD Reserved5;          6.2 and higher

It is here thought that this definition was introduced for Windows 8, with the new higher-level documented API functions and the exposure of the RTL_BARRIER as the SYNCHRONIZATION_BARRIER. The definition that Microsoft uses for its own code is in the unpublished header NTRTL_X.H. This is known from private symbol files that Microsoft has released in packages of public symbols, e.g., OLE32.PDB, starting from Windows 8. In this definition, the structure is an unnamed union of two unnamed structures. The first has the structure’s meaningful members. The second has the dummy members from the public definition. The relevant lines in the unseen NTRTL_X.H will look very much like

typedef struct _RTL_BARRIER {                       // winnt
    union {
        struct {
            /* meaningful members, see below  */
        };
        struct {
            ULONG Reserved1;                        // winnt
            ULONG Reserved2;                        // winnt
            ULONG_PTR Reserved3[2];                 // winnt
            ULONG Reserved4;                        // winnt
            ULONG Reserved5;                        // winnt
        } DUMMYRESERVEDSTRUCTNAME;
    };
} RTL_BARRIER, *PRTL_BARRIER;                       // winnt

The comment in the body of the first nested structure is mine. It stands for definitions of five members, spread over probably 12 lines. Symbol files for the original Windows 10 place the opening brace of the _RTL_BARRIER definition at line 332 and of the second nested structure at line 348. There may be blank lines, comments or who knows what else in the remainder too, but not very many, since symbol files place another structure’s definition at line 370. The single-line “winnt” comments are almost certainly in Microsoft’s NTRTL_X.H with exactly this spacing. These are the lines that survive in WINNT.H, each with trailing white space to where the comment is stripped. That this structure’s extraction to WINNT.H involves a translation from ULONG to DWORD, such as known for lines that WINNT.H shares with headers such as NTDEF.H, is just a supposition: symbol files tell much but not all.

The meaningful members as known from these symbol files for version 6.2 and higher are consistent with the NTDLL implementation in version 6.0. It is here thought these were originally the whole definition, i.e., that the union, first unnamed struct and the whole DUMMYRESERVEDSTRUCTNAME member were added for version 6.2 when exposing the structure as opaque for higher-level use:

Offset (x86)  Offset (x64)  Definition              Versions
0x00          0x00          LONG volatile Barrier;  6.0 and higher
0x04          0x04          LONG LeftBarrier;       6.0 and higher
0x08          0x08          HANDLE WaitEvent [2];   6.0 and higher
0x10          0x18          LONG TotalProcessors;   6.0 and higher
0x14          0x1C          ULONG Spins;            6.0 and higher
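
The two tables can be checked against each other mechanically. What follows is a minimal sketch in portable C11, not Microsoft's header. The Windows types are replaced by assumed stand-ins (uint32_t for DWORD and ULONG, int32_t for LONG, void * for HANDLE, uintptr_t for ULONG_PTR), and the names RTL_BARRIER_SKETCH, Live and Reserved are hypothetical, chosen only so that the nested structures can be addressed by offsetof.

```c
#include <assert.h>   /* static_assert, assert */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Assumed stand-ins: DWORD/ULONG -> uint32_t, LONG -> int32_t,
   HANDLE -> void *, ULONG_PTR -> uintptr_t. */
typedef union RTL_BARRIER_SKETCH {
    struct {                        /* the meaningful members */
        volatile int32_t Barrier;
        int32_t LeftBarrier;
        void *WaitEvent[2];
        int32_t TotalProcessors;
        uint32_t Spins;
    } Live;                         /* hypothetical name */
    struct {                        /* the public, all-reserved view */
        uint32_t Reserved1;
        uint32_t Reserved2;
        uintptr_t Reserved3[2];
        uint32_t Reserved4;
        uint32_t Reserved5;
    } Reserved;                     /* hypothetical name */
} RTL_BARRIER_SKETCH;

/* Each meaningful member overlays exactly one reserved member. */
static_assert(offsetof(RTL_BARRIER_SKETCH, Live.Barrier)
           == offsetof(RTL_BARRIER_SKETCH, Reserved.Reserved1), "Barrier");
static_assert(offsetof(RTL_BARRIER_SKETCH, Live.LeftBarrier)
           == offsetof(RTL_BARRIER_SKETCH, Reserved.Reserved2), "LeftBarrier");
static_assert(offsetof(RTL_BARRIER_SKETCH, Live.WaitEvent)
           == offsetof(RTL_BARRIER_SKETCH, Reserved.Reserved3), "WaitEvent");
static_assert(offsetof(RTL_BARRIER_SKETCH, Live.TotalProcessors)
           == offsetof(RTL_BARRIER_SKETCH, Reserved.Reserved4), "TotalProcessors");
static_assert(offsetof(RTL_BARRIER_SKETCH, Live.Spins)
           == offsetof(RTL_BARRIER_SKETCH, Reserved.Reserved5), "Spins");

/* Four 32-bit members plus two pointer-sized ones, with no padding. */
static_assert(sizeof(RTL_BARRIER_SKETCH) == 16 + 2 * sizeof(uintptr_t), "size");
```

If the stand-in types are fair, the sketch compiles only when the meaningful members sit at the same offsets as the reserved members that hide them, which is all that the public definition needs to guarantee.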

Set aside some elaborations and the essence of the implementation is that Barrier and LeftBarrier count participating threads on their way into and out of the barrier. When no threads are at the barrier, as when the barrier is newly initialised, both these counts equal the number of participating threads. When a thread enters the barrier, Barrier is decremented and the thread is ordinarily made to wait. The exception is for the thread whose entry brings Barrier to zero. It resets Barrier from LeftBarrier (which, being the count of threads that left the previous phase, should also be the count of participating threads), sets LeftBarrier to one (to count itself), and signals the others to leave. Each other thread increments LeftBarrier on its way out of the barrier.
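
The counting just described can be followed in a single-threaded model. This is only a sketch of the arithmetic, assuming nothing beyond the description above: it has no atomic operations, no waiting and no events, and the names BARRIER_MODEL, model_init, model_enter, model_leave and model_run_phase are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int32_t Barrier;      /* threads still to arrive in the current phase */
    int32_t LeftBarrier;  /* threads that have left the previous phase */
} BARRIER_MODEL;          /* hypothetical name */

/* A new barrier has both counts equal to the number of participants. */
void model_init(BARRIER_MODEL *m, int32_t participants)
{
    m->Barrier = participants;
    m->LeftBarrier = participants;
}

/* Models one thread arriving. Returns 1 if the caller is the last to
   arrive and so ends the phase, 0 if the caller would have to wait. */
int model_enter(BARRIER_MODEL *m)
{
    if (--m->Barrier != 0) {
        return 0;                   /* not last: would wait here */
    }
    m->Barrier = m->LeftBarrier;    /* reload the count for the next phase */
    m->LeftBarrier = 1;             /* the last arrival counts itself out */
    return 1;                       /* would now signal the others */
}

/* Models one released waiter on its way out. */
void model_leave(BARRIER_MODEL *m)
{
    ++m->LeftBarrier;
}

/* Runs one whole phase in arrival order; returns how many threads waited. */
int model_run_phase(BARRIER_MODEL *m, int32_t participants)
{
    int waited = 0;
    for (int32_t i = 0; i < participants; i++) {
        if (!model_enter(m)) {
            waited++;
        }
    }
    for (int i = 0; i < waited; i++) {
        model_leave(m);
    }
    return waited;
}
```

With three participants, two calls to model_enter return 0 and the third returns 1, leaving Barrier at 3 and LeftBarrier at 1. Two calls to model_leave then bring LeftBarrier back to 3, which is the state of a newly initialised barrier.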

If what the waiting threads wait on is to be specifically an event, then because of the intention that threads can repeat their calls to RtlBarrier as all progress from phase to phase, one event does not suffice. A thread that is released when the event is set could call RtlBarrier for its next phase before the ending thread of the previous phase has cleared the event. The first elaboration, then, is that the barrier has two events and an indicator of which is the one that threads wait on for the current phase. When a phase ends, the event for the next phase is cleared and the indicator is toggled before the event for the ending phase is set. The second elaboration is that the indicator is encoded as the high bit of the Barrier. A phase ends when decrementing the Barrier brings the low 31 bits to zero.
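
The two elaborations can be sketched as bit arithmetic on the Barrier plus an ordering of three steps at the end of a phase. This is again a single-threaded model, not the NTDLL code. The events are modelled as plain flags, the names PHASE_BIT, COUNT_MASK, TWO_EVENT_MODEL and the functions are hypothetical, and it is an assumption of the sketch that a clear high bit selects the first event and a set high bit the second.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PHASE_BIT  0x80000000u  /* hypothetical name: indicator in the high bit */
#define COUNT_MASK 0x7FFFFFFFu  /* hypothetical name: low 31 bits hold the count */

typedef struct {
    uint32_t Barrier;       /* indicator and count together */
    int32_t LeftBarrier;
    bool Signalled[2];      /* stand-ins for the two events */
} TWO_EVENT_MODEL;          /* hypothetical name */

/* Which event do threads wait on in the current phase? (assumed mapping) */
unsigned event_index(uint32_t barrier)
{
    return barrier >> 31;
}

/* A phase ends when the low 31 bits reach zero after a decrement. */
bool phase_ended(uint32_t barrier_after_decrement)
{
    return (barrier_after_decrement & COUNT_MASK) == 0;
}

/* What the last thread to arrive does, in the order the text describes. */
void end_phase(TWO_EVENT_MODEL *m)
{
    unsigned ending = event_index(m->Barrier);
    unsigned next = ending ^ 1u;

    m->Signalled[next] = false;                    /* 1: clear next phase's event */
    m->Barrier = ((m->Barrier ^ PHASE_BIT) & PHASE_BIT)
               | ((uint32_t)m->LeftBarrier & COUNT_MASK);  /* 2: toggle, reload */
    m->LeftBarrier = 1;
    m->Signalled[ending] = true;                   /* 3: release the waiters */
}
```

The point of the ordering is that a thread released at step 3 already finds the indicator selecting the other event, which was cleared at step 1, so an early call for the next phase cannot pass straight through.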

WRITING IN PROGRESS

New Implementation

As noted above, the implementation changed for the 1607 release of Windows 10. The definition, however, has not. At least to the 2004 edition, symbol files retain either or both of the reduced and full definitions from earlier versions. Microsoft’s name for the structure that’s interpreted by the new implementation is not known.

This new structure is 0x10 or 0x18 bytes in 32-bit and 64-bit Windows, respectively. In both, the structure has 8-byte alignment. Callers do not know this. They continue to think they are providing memory for an RTL_BARRIER, with only 4-byte alignment in 32-bit Windows, and so the new structure begins at the first 8-byte boundary at or above the given address.

Offset (x86)  Offset (x64)  Description                                                                                   Versions
0x00          0x00          volatile 64 bits in two parts (low 32: number of threads in barrier; high 32: phase number)   1607 and higher
0x08          0x08          an RTL_SRWLOCK for safety of RtlDelete                                                        1607 and higher
0x0C          0x10          32-bit number of participating threads                                                        1607 and higher
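
The alignment and the packing of the first member can be sketched in portable C. Since Microsoft's names are not known, every name here (NEW_BARRIER_SKETCH, State, DeleteLock, Participants and the helper functions) is hypothetical. The RTL_SRWLOCK is modelled as one pointer, which is its documented size, and the stand-in types are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    volatile uint64_t State;   /* low 32: threads in barrier; high 32: phase */
    void *DeleteLock;          /* stand-in for the pointer-sized RTL_SRWLOCK */
    uint32_t Participants;     /* number of participating threads */
} NEW_BARRIER_SKETCH;          /* hypothetical name */

static_assert(offsetof(NEW_BARRIER_SKETCH, DeleteLock) == 0x08, "lock");
static_assert(offsetof(NEW_BARRIER_SKETCH, Participants) == 8 + sizeof(void *),
              "participants");
static_assert(sizeof(NEW_BARRIER_SKETCH) == (sizeof(void *) == 8 ? 0x18 : 0x10),
              "size");

/* First 8-byte boundary at or above the address the caller provides. */
uintptr_t align_up_8(uintptr_t address)
{
    return (address + 7) & ~(uintptr_t)7;
}

/* Pack and unpack the two halves of the 64-bit first member. */
uint64_t make_state(uint32_t threads_in_barrier, uint32_t phase)
{
    return ((uint64_t)phase << 32) | threads_in_barrier;
}

uint32_t threads_in_barrier(uint64_t state)
{
    return (uint32_t)state;
}

uint32_t phase_number(uint64_t state)
{
    return (uint32_t)(state >> 32);
}
```

For a caller's address that is only 4-byte aligned, such as 0x1004, align_up_8 gives 0x1008, so that the new structure starts 4 bytes into the memory that the caller thinks is an RTL_BARRIER.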