Geoff Chappell, Software Analyst
Starting with Windows 8, the public symbol files for the kernel have numerous LF_UDT_SRC_LINE (0x1606) records that tell where the kernel got its definitions of structures, unions and enumerations. By where, I mean source file and line number. In all cases but one, the source file is a header file, meaning a source file that’s intended for inclusion in other source files, and its location is given as a full pathname. This allows for reconstruction of the directory tree that holds the Windows kernel header files that are included by the Windows kernel source files—and for a catalogue of unseen headers in which Microsoft defines numerous undocumented structures for kernel-mode programming.
The “but one” in this record of headers is named only by the relative pathname
That it’s just one anomaly in an otherwise rich collection might easily let it be passed over. This would be a mistake. The NTSYM.C file is central to how the public symbol files for the kernel have type information. That they do is depended on by all kernel-mode programmers who don’t have favoured access, e.g., to Microsoft’s source code or at least to private symbol files. Even when the public symbol files do have type information, some ! commands in Microsoft’s debuggers do not work as advertised. Without it, many don’t work at all—as attested by protests in Internet forums whenever the public symbol file for a recently updated kernel (or NTDLL) accidentally doesn’t have type information. Against this background of type information in public symbol files as a practical necessity, the question of how Microsoft gets type information into public symbol files has received curiously little attention.
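As an aside on the LF_UDT_SRC_LINE records, here is a minimal sketch of decoding one in Python. It assumes the layout that Microsoft's published cvinfo.h gives for this leaf: a 2-byte leaf kind, a 4-byte type index for the structure, union or enumeration, a 4-byte ID of the string record that names the source file, and a 4-byte line number. The function name and the sample values are invented for illustration; they are not taken from any real symbol file.

```python
import struct

LF_UDT_SRC_LINE = 0x1606  # leaf kind named in the text


def parse_udt_src_line(record: bytes) -> dict:
    """Decode one LF_UDT_SRC_LINE record, starting at its 2-byte leaf kind.

    Assumed layout (little-endian, no padding):
    leaf (2 bytes), type index (4), source-file string ID (4), line (4).
    """
    leaf, type_index, source_id, line = struct.unpack_from("<HIII", record, 0)
    if leaf != LF_UDT_SRC_LINE:
        raise ValueError(f"not an LF_UDT_SRC_LINE record: leaf {leaf:#06x}")
    return {"type_index": type_index, "source_id": source_id, "line": line}


# Hypothetical record: the type index and string ID are made up.
sample = struct.pack("<HIII", LF_UDT_SRC_LINE, 0x1234, 0x1001, 70)
decoded = parse_udt_src_line(sample)
print(decoded)
```

The source file is not stored in the record itself. The ID has to be resolved against another record in the same stream to obtain the pathname, which this sketch does not attempt.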
It is here thought that NTSYM.C truly is a source file, not a header, and that the full path of the directory that contains this source file is
where srcroot is a version-dependent root for the source tree:
Also thought to be in this directory are roughly a dozen other source files:
The seven in the first set are each compiled to object files that go into a library named init.lib. This is known certainly from the public symbol files, since module information names the object files and the library, placing them all in:
Here, objroot is a version-dependent root for build products:
Within this root, build is x86fre or amd64fre, respectively for 32-bit and 64-bit Windows. For the checked builds, also known as debug builds, which this website mostly ignores and which Microsoft itself promotes far less than in years past, build is x86chk or amd64chk, and objfre is instead objchk. The mp placeholder remains from when the kernel was built in as many as four varieties. It is paemp for 32-bit Windows versions 6.2 and 6.3, but is otherwise just mp. The arch placeholder stands straightforwardly for the processor architecture, which is either i386 or amd64. All this is consistent with Microsoft’s practice, long established in programming kits, of compiling into subdirectories according to the type of build. The least certain inference is that the source files are all in C: I do not discount that they are in C++ with extern "C" directives.
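The placeholder rules just described can be summarised as a small Python sketch. The function names are hypothetical, versions are passed as strings such as "6.2" or "1607", and objroot is deliberately left as a literal placeholder because its version-dependent value is not reproduced here.

```python
def placeholders(bits: int, version: str, checked: bool = False) -> dict:
    """Return the placeholder values described in the text.

    bits is 32 or 64; version is a string such as "6.2" or "1607".
    """
    if bits not in (32, 64):
        raise ValueError("bits must be 32 or 64")
    suffix = "chk" if checked else "fre"
    return {
        "build": ("x86" if bits == 32 else "amd64") + suffix,
        # paemp only for 32-bit Windows versions 6.2 and 6.3
        "mp": "paemp" if bits == 32 and version in ("6.2", "6.3") else "mp",
        "objfre": "obj" + suffix,
        "arch": "i386" if bits == 32 else "amd64",
    }


def typeinfo_directory(bits: int, version: str, checked: bool = False) -> str:
    """Compose the typeinfo directory, leaving objroot as a placeholder."""
    p = placeholders(bits, version, checked)
    parts = ["objroot", "minkernel", "ntos", "init",
             p["mp"], p["objfre"], p["arch"], "typeinfo"]
    return "\\".join(parts)


print(typeinfo_directory(32, "6.2"))
print(typeinfo_directory(64, "1607"))
```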
The second set may also contain files named ntoskrnl.c and ntkrnlpa.c as relics of earlier versions for which the kernel is built both with and without PAE support (in 32-bit Windows) and with and without multi-processor support. Single-processor kernels were dropped with Windows Vista and since 32-bit Windows 8 requires PAE, it’s very possible that only two types of kernel continue to be built: ntkrpamp.exe for 32-bit Windows and ntkrnlmp.exe for 64-bit Windows. Both anyway get renamed to plain old ntoskrnl.exe for installation. Whichever is wanted, the corresponding ntkrpamp.c or ntkrnlmp.c is compiled to an object file. Each is nearly trivial, for although the object file is linked into the binary (see below), it contributes no code or data.
The third set has the least certain inference about the source files. The module information tells of ntkrpamp.exp and ntkrnlmp.exp, respectively, for 32-bit and 64-bit Windows. If these are anything like typical, they are object files produced by the linker acting as the librarian in response to a specification of exports. The most easily maintained specification of the kernel’s thousands of exports would be as module definition files, presumably named ntkrpamp.def and ntkrnlmp.def, which might in turn be generated by the compiler’s pre-processor acting on conditional-compilation directives in a common source file. In Microsoft’s practice for this, again as long established in programming kits, such source files have the .src extension. Only one is needed: it would presumably be named ntoskrnl.src.
Finally, ntoskrnl.rc is natural as the source file from which the Resource Compiler produces the ntoskrnl.res that is named in the module information.
The srcroot\minkernel\ntos\init directory is in some sense the home directory of the kernel’s source code. The object file ntkrpamp.obj or ntkrnlmp.obj is linked with init.lib and many other libraries, mostly from other subdirectories of source files that are specifically for the kernel, and with ntkrpamp.exp or ntkrnlmp.exp, and with ntoskrnl.res, and thus is the kernel created as a binary.
Note that NTSYM.C has no place in this sketch of how the kernel gets built.
The point to caring where NTSYM.C fits among Microsoft’s source files on Microsoft’s build machine—which we can’t expect ever to see—and to caring even more what NTSYM.C contains and how it’s built, is that only by the separate compilation of NTSYM.C do the public symbol files for the kernel have any type information at all.
Except that the merging of the HAL into the kernel for 64-bit Windows 10 Version 2004 brings in an NTHALSYM.C as a complication to put aside for now, NTSYM.C, taken together with the headers that it includes, is the source of all type information in public symbol files for the Windows kernel. Although the included headers surely must be involved in building the kernel, NTSYM.C itself is not. Strictly speaking, the type information in the kernel’s public symbol files is not the kernel’s type information: it is the NTSYM.C file’s.
This is not without implications, most notably that type information in public symbol files is not as certainly correct as many suppose it must be. Type information in a binary’s private symbol file is in there from compiling and linking the binary. Its correctness is that of the compiler and linker. Type information in a public symbol file is in there from separate compilation. Its correctness, relative to how the code in the binary uses the types, depends on how closely the separate compilation matches the binary’s compilation. Discrepancies must be rare, perhaps even rare enough not to worry about, but avoiding them needs care at Microsoft, perhaps more care than is prudently taken for granted.
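To make the risk concrete, here is a toy illustration in Python. It is an analogy only: the two-member structure is hypothetical, and nothing here claims that any such discrepancy exists in the kernel's symbol files. Python's struct module stands in for two compilations of the same definition under different alignment settings, much as a C compiler's structure-packing switch would change a layout.

```python
import struct

# A hypothetical structure: one byte followed by a 32-bit integer.
MEMBERS = "BI"


def layout(alignment_prefix: str) -> tuple:
    """Return (offset of the integer, total size) for one 'compilation'.

    '@' asks for native alignment, '=' for no padding at all.
    """
    size = struct.calcsize(alignment_prefix + MEMBERS)
    offset_of_integer = size - struct.calcsize(alignment_prefix + "I")
    return offset_of_integer, size


as_built = layout("@")     # stands in for the binary's own compilation
as_described = layout("=")  # stands in for a mismatched separate compilation

print("binary uses:      ", as_built)
print("symbols describe: ", as_described)
print("layouts agree:", as_built == as_described)
```

The definition is identical in both cases. Only the settings differ, yet a debugger trusting the second layout would read the integer from the wrong offset.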
That nothing from NTSYM.C is linked in to the kernel is a safe deduction from the public symbol files. Even a stripped PDB has module information and section contributions (in PDB stream 3). The public symbol files for the kernel have these for very many object files, but none for any object file that’s obviously compiled from an NTSYM.C. Yet compilation of NTSYM.C is recorded in the public symbol files. The PDB stream (4) that has the relatively new LF_UDT_SRC_LINE records of which headers define which types also has an LF_BUILDINFO (0x1603) record. Full PDB files typically have many such records, one for each source file that got compiled. Stripped PDB files ordinarily have none, compilation details surely counting as stuff “you would not want to ship to your customers” (as Microsoft puts it when documenting the linker’s switch for creating a stripped PDB). Yet the public symbol files for the kernel in Windows 8 and higher have this one record of compiling this one source file. Much in this record has no immediate value for the present discussion, but since the business of this page is to note what can be deduced—or at least be inferred with high confidence—that might otherwise be thought secret, it is perhaps as well to be detailed:
|Current Directory:||srcroot\minkernel\ntos\init\mp||6.2 and higher|
|Build Tool:||vcpath\x86\cl.exe||6.2 and higher (x86)|
|vcpath\amd64\cl.exe||6.2 and higher (x64)|
|Source File:||objroot\minkernel\ntos\init\mp\objfre\arch\typeinfo\ntsym.c||6.2 and higher|
|Program Database File:||objroot\minkernel\ntos\init\mp\objfre\arch\typeinfo\kernel.pdb||6.2 and higher|
|Command Arguments:||-DPASS2_COMPILE||6.3 and higher|
|-nologo||6.2 and higher|
|-Gz||6.2 and higher (x86)|
|-Zi||6.2 and higher|
|-c||6.2 and higher|
|-Zc:wchar_t-||6.2 to 1803|
|-Zc:implicitNoexcept-||1607 to 1709|
|-Zc:threadSafeInit-||1607 and higher|
|-Zc:sizedDealloc-||1607 and higher|
|-Wv:17||1607 to 1709|
|-d1vc7dname||1607 to 1703|
|-vc7dname||1709 to 1903|
|-d1vc7dname||2004 and higher|
|-d1NonStandardNewDelete||1607 to 1703|
|-NonStandardNewDelete||1709 to 1903|
|-d1NonStandardNewDelete||2004 and higher|
|-Zc:wchar_t-||1809 and higher|
|-MT||6.2 and higher|
|-TC||6.2 and higher|
|-X||6.2 and higher|
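For readers who want to find this record themselves, here is a minimal sketch of decoding an LF_BUILDINFO record in Python. It assumes the layout in Microsoft's published cvinfo.h: a 2-byte leaf kind, a 2-byte count, then that many 4-byte IDs, with the first five in the order of the rows above. Each ID refers to another record that holds the text, which this sketch does not resolve. The function name and sample IDs are invented.

```python
import struct

LF_BUILDINFO = 0x1603  # leaf kind named in the text

# Assumed order of the first five arguments.
FIELD_NAMES = (
    "Current Directory",
    "Build Tool",
    "Source File",
    "Program Database File",
    "Command Arguments",
)


def parse_buildinfo(record: bytes) -> dict:
    """Decode one LF_BUILDINFO record, starting at its 2-byte leaf kind."""
    leaf, count = struct.unpack_from("<HH", record, 0)
    if leaf != LF_BUILDINFO:
        raise ValueError(f"not an LF_BUILDINFO record: leaf {leaf:#06x}")
    ids = struct.unpack_from(f"<{count}I", record, 4)
    # zip stops at the shorter sequence, so extra or missing IDs are tolerated
    return dict(zip(FIELD_NAMES, ids))


# Hypothetical record with made-up string IDs 0x1001 to 0x1005.
sample = struct.pack("<HH5I", LF_BUILDINFO, 5,
                     0x1001, 0x1002, 0x1003, 0x1004, 0x1005)
fields = parse_buildinfo(sample)
for name, string_id in fields.items():
    print(f"{name}: ID {string_id:#x}")
```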
The compiler’s location on Microsoft’s build machine is a good example of a detail that plausibly is completely irrelevant, but for completeness I note that the vcpath varies as follows:
Perhaps some value—or slight amusement—will come from the strong suggestion that the x64 kernel is compiled by an x86 program, i.e., by a cross-compiler, even years after Microsoft’s development of a native x64 compiler.
Listing all the compiler switches is also for completeness, mostly. Some may be incidental. Since NTSYM.C does not contribute code to the kernel (nor, almost certainly, to any binary), the switches for disabling recently standardised C++ behaviour look to be superfluous, except perhaps to mark that Visual Studio’s increasing accommodation of standards that have been developed for convenience in application programming threatens to leave kernel-mode programmers with ever more need for switches to turn this stuff off. That said, some of the switches are clues, including for being not present. As listed in this PDB record, they typically are not exactly the switches as given. Notably, -Fd and -Fo, if given, are not recorded, but -FI and -I would be, and -I switches would ordinarily be confected for directories named by the INCLUDE environment variable. The three from -MT onwards look to me to be compiler-generated, the -X recording that there is no INCLUDE variable.
This last observation is a clue that the compilation that’s recorded in the LF_BUILDINFO is of a source file that has already been pre-processed. The current directory at the time of this compilation is the mp subdirectory, so that the ..\ntsym.c in the LF_UDT_SRC_LINE record mentioned at the outset is apparently an NTSYM.C in srcroot\minkernel\ntos\init. This is the original NTSYM.C source file. It is pre-processed, with output captured as the intermediate source file, also named NTSYM.C but deep into the tree of build products. In this intermediate source file, #include directives are gone, having been replaced by the contents of the included headers, and #line directives identify these headers by full pathnames. Content from the original source file, in contrast to included headers, is represented by #line directives that identify the original source file by its relative pathname. The compilation that’s recorded in the LF_BUILDINFO is of this intermediate NTSYM.C, and so has no need of any means of finding more headers to include.
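The distinction between the two kinds of #line directive can be sketched in Python. The sample text below is invented: the header path and both type definitions are hypothetical, and the doubled backslashes are an assumption about how the compiler writes pathnames in pre-processed output. The function name is likewise made up for illustration.

```python
import ntpath
import re

# Matches directives of the form: #line 70 "some\\path.c"
LINE_DIRECTIVE = re.compile(r'^\s*#line\s+(\d+)\s+"(.*)"\s*$')


def classify_line_directives(preprocessed: str) -> tuple:
    """Split the files named by #line directives into two sets.

    Full pathnames are taken to be included headers; relative
    pathnames are taken to be the original source file.
    """
    headers, originals = set(), set()
    for text in preprocessed.splitlines():
        match = LINE_DIRECTIVE.match(text)
        if match is None:
            continue
        # Undo the assumed doubling of backslashes.
        path = match.group(2).replace("\\\\", "\\")
        if ntpath.isabs(path):
            headers.add(path)
        else:
            originals.add(path)
    return headers, originals


# Hypothetical pre-processed output.
sample = "\n".join([
    r'#line 1 "..\\ntsym.c"',
    r'#line 1 "d:\\hypothetical\\inc\\example.h"',
    'typedef struct _EXAMPLE { int Field; } EXAMPLE;',
    r'#line 70 "..\\ntsym.c"',
    'typedef struct _PLACEHOLDER { int Unknown; } PLACEHOLDER;',
])

headers, originals = classify_line_directives(sample)
print("headers:", sorted(headers))
print("original source:", sorted(originals))
```

A type that follows a directive naming the relative pathname would be attributed to the source file itself, which is how ..\ntsym.c comes to appear among the full pathnames of headers.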
The last clue to note is what PDB file is recorded as the output of compiling the intermediate NTSYM.C file. See that it has the name of the already built kernel, i.e., ntkrpamp or ntkrnlmp. The whole point is that this PDB file is not new output from compiling NTSYM.C but is instead a stripped PDB into which this separate compilation will merge its otherwise private type information.
Thus do the public symbol files for the kernel record how they were built. Given that the kernel is compiled and linked, to have produced a stripped PDB as objroot\minkernel\ntos\init\mp\objfre\arch\typeinfo\kernel.pdb (and presumably also a full PDB as objroot\minkernel\ntos\init\mp\objfre\arch\kernel.pdb), compilation of NTSYM.C goes something like
cl kernel_switches -E ..\ntsym.c > objroot\minkernel\ntos\init\mp\objfre\arch\typeinfo\ntsym.c
cl typeinfo_switches -Fdobjroot\minkernel\ntos\init\mp\objfre\arch\typeinfo\kernel.pdb -Foobjroot\minkernel\ntos\init\mp\objfre\arch\typeinfo\ntsym.obj objroot\minkernel\ntos\init\mp\objfre\arch\typeinfo\ntsym.c
Note that the pre-processing step, i.e., the cl -E command, has no direct evidence in the PDB. The typeinfo_switches for the compiler in the second command are the switches from the LF_BUILDINFO but the kernel_switches for the compiler in the pre-processing step are unknown.
No NTSYM.C is available for inspection, but some of its content can be known with reasonable certainty from the public symbol files. In both 32-bit and 64-bit Windows, these have it that the following type definitions are in the ..\ntsym.c file itself, not from inclusion of any header:
|70 (6.2 to 1607); 80 (1803 to 1809);||struct _ETIMER||6.2 and higher|
|97 (6.2 to 10.0)||struct _POOL_BLOCK_HEAD||6.2 to 10.0|
|102 (6.2 to 10.0)||struct _POOL_HACKER||6.2 to 10.0|
|107 (6.2 to 10.0)||struct _SEGMENT_OBJECT||6.2 to 10.0|
|119 (6.2 to 10.0)||struct _SECTION_OBJECT||6.2 to 10.0|
The reason the symbol files show these types as defined in a source file, not a header, may be that they actually aren’t defined in any header. These five structures are odds and sods. All are known in public symbol files starting from Windows 2000 SP3 but four are dropped after the original Windows 10.
The _ETIMER certainly remains in use—it is the timer object to which a handle can be obtained even from user mode by calling NtCreateTimer—but its definition believably isn’t needed anywhere in the kernel’s source code except for the Executive’s TIMER.C (here presumed as the source file for the TIMER.OBJ that the public symbol files identify as the linker’s source not only of functions such as NtCreateTimer but of all internal routines that I can see as relevant). If the _ETIMER is in fact defined in a kernel source file that is not a header, then the definition that shows in the public symbol files for the kernel is a copy-and-paste from the source file into NTSYM.C. Presumably, it’s needed in NTSYM.C so that _ETIMER shows in the public symbol files and the KDEXTS debugger extension can offer !timer as a command that works without needing private symbols.