Geoff Chappell, Software Analyst
The public symbol file NTKRPAMP.PDB for the original release of Windows 10 tells that the kernel is built with the NTDEF.H header at
and draws from it the type definitions that are tabulated below.
Nowadays, NTDEF.H is among the headers in the Software Development Kit (SDK). It is there in the “shared” subdirectory with many other headers that are intended for use in both kernel-mode and user-mode programming. The SDK is intended to be installed before the Windows Driver Kit (WDK), typically by installing Visual Studio first. Before Windows 8, kernel-mode programming kits were self-standing, and so NTDEF.H is in the WDK and, earlier still, in the Device Driver Kit (DDK). Indeed, NTDEF.H is ancient, being one of relatively few headers in the DDK for Windows NT 3.1.
NTDEF.H defines many of the basic types for all of Windows programming. It is directly included by each of the standard headers for kernel-mode programming, here meaning WDM.H, NTDDK.H and NTIFS.H. Much of its content is duplicated in other headers, both for kernel-mode and user-mode programming. The line numbers on the left are known from the symbol file. All agree with the NTDEF.H from the SDK for the original release of Windows 10. The line numbers on the right are from headers that are readily available in the WDK or SDK and which are thought to acquire their definitions of these types as duplicates from NTDEF.H.
|1086||unnamed struct for u in _LARGE_INTEGER||867||873||796|
|1104||unnamed struct for u in _ULARGE_INTEGER||885||891||814|
Whether the kernel’s source code includes NTDEF.H directly or through some other header is not known, though the latter seems more likely. It is known, however, that this header is not WDM.H, NTDDK.H or NTIFS.H.
However NTDEF.H gets included by the kernel’s source code, it’s included early. This is true also for nearly all kernel-mode source code since a standard header is typically the first inclusion and it in turn includes NTDEF.H very early. What type information is present in Microsoft’s public symbol files for kernel-mode executables therefore begins identically in almost all of them and in very much the same way in the handful of exceptions. If you’re a reverse engineer (or even if you’re a Microsoft programmer who is concerned about what’s revealed by public symbol files or has an opinion about what can only be learnt from source code), you might do well to familiarise yourself with NTDEF.H as the readiest example of type information getting into symbol files from headers.
In the public symbol files for the kernel, and in most others that have type information, NTDEF.H is the first header to be named in the PDB stream (4) that tells which user-defined types came from which headers, but it is only the second to contribute type information (to stream 2). How this happens is that before NTDEF.H defines any class, structure, union or enumeration, it includes BASETSD.H, which in turn defines several inline routines that use built-in types with just enough elaboration to need their own entries in the type information. Use of void const * and void const * __ptr64 by PtrToPtr64 and Ptr64ToPtr in BASETSD.H accounts for the first three type-information entries in public symbol files for the 32-bit kernel. These routines are macros when building for 64-bit Windows. Instead, the first seven entries in public symbol files for the 64-bit kernel record that the inline routines HandleToULong, ULongToHandle, LongToHandle, IntToPtr, UIntToPtr and Ptr32ToPtr use void const *, unsigned long const, long const, int const, unsigned int const and void const * __ptr32. Why should you trouble about simple compounds of built-in types? Not for themselves, of course, but to learn from the start that just using a type in the body of an inline routine is enough to get type information created, regardless of whether the inline routine is ever referenced anywhere else in any code.
Next come the first entries from NTDEF.H itself. These demonstrate the same point but now for user-defined types, here specifically structures. Type information is created for LIST_ENTRY64 and LIST_ENTRY32 because of their use by the inline routine ListEntry32To64. The only way the kernel is unusual in this respect is that its symbol file would eventually pick up these structures from later use, e.g., from the TlsLinks member of the _TEB64 structure. Almost every kernel-mode executable whose public symbol file has any type information at all has it for LIST_ENTRY32 and LIST_ENTRY64 just for including NTDEF.H. See that it doesn’t matter that the executable’s own code makes no use of the structures, just that NTDEF.H uses them in inline routines even if these routines are never inlined into any of the executable’s code. For these particular structures, the immediate result is nothing but a small waste of space in the symbol files. For other structures, it’s a possibly unwanted disclosure.
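As a minimal sketch of the pattern just described, and not a copy of NTDEF.H: the structure and routine names are the ones named above, but the member types are written here with `<stdint.h>` and the sign-extending body is an assumption made for illustration. Nothing else in the translation unit needs to call the routine. By the behaviour described above, its mere definition in an included header is what gets both structures into the type information.

```c
#include <stdint.h>

/* Stand-ins for the two structures, using portable fixed-width types. */
typedef struct LIST_ENTRY32 {
    uint32_t Flink;
    uint32_t Blink;
} LIST_ENTRY32;

typedef struct LIST_ENTRY64 {
    uint64_t Flink;
    uint64_t Blink;
} LIST_ENTRY64;

/* An inline routine defined in the header itself. Its parameters are the
   only use of the two structures. The body (sign-extension of each 32-bit
   link to 64 bits) is assumed for this sketch. */
static inline void ListEntry32To64(const LIST_ENTRY32 *l32, LIST_ENTRY64 *l64)
{
    l64->Flink = (uint64_t)(int64_t)(int32_t)l32->Flink;
    l64->Blink = (uint64_t)(int64_t)(int32_t)l32->Blink;
}
```

Running this proves nothing about symbol files, of course. The point is only to show how little it takes: two small structures and one routine that the including source file may never mention.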
Unnoticed use of a type by otherwise unused inline routines in headers seems all too plausible as the main mechanism by which a structure that Microsoft’s programmers intend as their internal plaything nonetheless has its name, and the names, types and offsets of its members, end up in public symbol files and then as common knowledge. Microsoft’s programmers have even opined in public that inline routines in headers that Microsoft doesn’t publish are secrets whose knowledge outside Microsoft is explained only by leaked source code. The reality is that if a routine is declared in a header and then is used anywhere else, even just in the same or another header, then if the public symbol file has type information, the routine’s type is disclosed. So too is its name, if building with the compiler from Visual Studio 2012 or later. This is very much the sort of disclosure that might be missed by programmers, even Microsoft’s, but also by reverse engineers (since their work seems to depend ever more on what their tools tell them and less on actually knowing their craft).
This disclosure of inline routines can be seen at work just from NTDEF.H as the most ready example. Indeed, it is shown by the very next entries in the type information in the public symbol files for the kernel. It comes about because NTDEF.H includes GUIDDEF.H which in turn includes STRING.H from the kernel-mode implementation of the C Run Time (CRT). This STRING.H defines strnlen as an inline routine. Its use of char const * creates type information, as pointed out above. What’s new in our quick survey of simple examples is that strnlen is called from another inline routine, strnlen_s, not much further into STRING.H. That the one routine is called from another counts as use of a type. Specifically, it creates type information for a pointer to a function with the prototype of the referenced routine. Moreover, the referenced routine gets named in the PDB stream that tells which headers supplied which types. See that it doesn’t matter whether either routine is used anywhere else in the kernel’s source code. Mere inclusion of STRING.H is enough to leave a trace of the inline routine.
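The shape of this second case can be sketched as follows. The names `sketch_strnlen` and `sketch_strnlen_s` are hypothetical stand-ins (chosen so as not to collide with any real CRT header) and the bodies are assumptions, not the CRT’s code. What matters is only that one inline routine in a header calls another.

```c
#include <stddef.h>

/* First inline routine: its parameter is the use of char const *. */
static inline size_t sketch_strnlen(const char *s, size_t max)
{
    size_t n = 0;
    while (n < max && s[n] != '\0') {
        n++;
    }
    return n;
}

/* Second inline routine, a little further into the same header. Its call
   to the first is the reference that, as described above, counts as a use
   of the first routine's function type. */
static inline size_t sketch_strnlen_s(const char *s, size_t max)
{
    return s == NULL ? 0 : sketch_strnlen(s, max);
}
```

Neither routine need be called by the source file that includes the header for the reference between them to exist.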
NTDEF.H is also a ready illustration of how some, if not many, headers in the WDK and SDK are created from some sort of script or master header that extracts from yet more headers. This applies especially to some of the most prominent headers: WDM.H, NTDDK.H and NTIFS.H for kernel-mode programming and WINNT.H for user-mode programming. As noted above, those three standard headers for kernel-mode programming include NTDEF.H directly. Some specialised drivers, known as minidrivers, interact with the kernel through a port driver. Ideally, they have no direct interaction with the kernel and so their source code does not need any of the big three standard headers. What they instead include from the WDK is either MINIPORT.H or MINITAPE.H. Rather than include NTDEF.H, these headers duplicate much of it. This duplication can also be seen in the WUDFWDM.H that is a standard inclusion in source code for user-mode drivers and in WINNT.H which almost all user-mode Windows source code includes indirectly through WINDOWS.H.
Each of these headers has one contiguous region in which each line is a duplicate or slight edit of a corresponding line in NTDEF.H. For the headers from the WDK and SDK for the original release of Windows 10, these regions are:
Moreover, the correspondence is well-ordered: for each line in succession in these headers, the corresponding line is further into NTDEF.H. These corresponding lines in NTDEF.H make disjoint regions. Since NTDEF.H is not very large, the full map is perhaps instructive without being too tedious:
The map is consistent with a process of extraction. At some point in preparing each output header, the extraction selects NTDEF.H as input, parses successive lines in NTDEF.H, and extracts some to the output. Be aware, though, that this is only the simplest process that is consistent with the files as observed. The input could instead be another header that is also the input for constructing NTDEF.H. Either way, the choosing of which lines to extract is fully accounted by directions within NTDEF.H.
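These directions for selecting which lines are in both NTDEF.H and another header are keywords in single-line comments. What can be seen in NTDEF.H are keywords to begin a range of lines for extraction, to end such a range, and to select just the one line that carries the comment.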
It is observed that the keywords for these three cases are begin_key, end_key and key, where the placeholder key differs for each output header:
|Begin Range||End Range||Same Line||Versions||Output Header|
|begin_ntminitape||end_ntminitape||4.0 and higher||MINITAPE.H|
|begin_ntndis||end_ntndis||ntndis||3.51 to 5.2||NDIS.H|
|6.0 and higher|
|begin_ntoshvp||end_ntoshvp||6.2 and higher|
|begin_r_winnt||end_r_winnt||r_winnt||4.0 and higher||WINNT.RH|
|windbgkd||3.10 to 4.0||WINDBGKD.H|
|begin_windbgkd||end_windbgkd||5.0 and higher|
|begin_wudfwdm||end_wudfwdm||6.2 and higher||WUDFWDM.H|
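The selection that these keywords imply can be sketched in a few lines of C. This is only the simplest reading of the observed files, not Microsoft’s tool, whose workings are not known. The function names are hypothetical. Two assumptions are made loudly: a keyword counts only as a whole word inside a `//` comment, and a line that carries the begin_key or end_key comment is not itself selected.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the // comment on `line` contains `word` as a whole word. */
static int comment_has_word(const char *line, const char *word)
{
    const char *p = strstr(line, "//");
    size_t len = strlen(word);

    if (p == NULL) {
        return 0;
    }
    p += 2;
    while (*p != '\0') {
        size_t n;
        p += strspn(p, " \t\r\n");      /* skip white space */
        n = strcspn(p, " \t\r\n");      /* measure the next word */
        if (n == len && strncmp(p, word, len) == 0) {
            return 1;
        }
        p += n;
    }
    return 0;
}

/* Decides whether one input line goes to the output header for `key`.
   `*in_range` carries the begin/end state from line to line. */
static int select_line(const char *line, const char *key, int *in_range)
{
    char word[64];

    snprintf(word, sizeof word, "begin_%s", key);
    if (comment_has_word(line, word)) {
        *in_range = 1;
        return 0;                       /* assumption: directive line is dropped */
    }
    snprintf(word, sizeof word, "end_%s", key);
    if (comment_has_word(line, word)) {
        *in_range = 0;
        return 0;
    }
    /* Inside a range, or marked for this key on the same line. */
    return *in_range || comment_has_word(line, key);
}
```

A caller would initialise `in_range` to 0, feed the input header to `select_line` one line at a time with, say, `"winnt"` as the key, and copy to the output each line for which the function returns 1. Matching whole words matters: a `r_winnt` comment must not select a line for `winnt`.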
Whether Microsoft still has a header named WINDBGKD.H is not known. None is supplied with any WDK or SDK nowadays, but a header with this name was supplied among the directories of sample code up to and including the DDK for Windows NT 4.0 and in the ordinary INC directory in the DDK for Windows 2000. Its contents in these versions are consistent with extraction directed by windbgkd comments. It is not impossible that the windbgkd comments remain in NTDEF.H even though no WINDBGKD.H is ever created or that they now govern extraction to some unpublished header that superseded WINDBGKD.H.
Comments for extraction to NDIS.H must be vestigial. They evidently were active for the NDIS.H that Microsoft supplied first among the NETWORK samples in the DDK for Windows NT 3.51 and later among the general headers in the DDK for Windows 2000. Before Windows Vista, NDIS.H was not just a standard inclusion for network drivers: it also aimed to limit these drivers to interacting with NDIS.SYS, and thus only indirectly with the kernel, much as if network drivers are miniport drivers and NDIS.SYS is the corresponding port driver. Much like MINIPORT.H still, NDIS.H in these early versions has its own knowledge of the kernel, substantially less than defined in WDM.H. This reduction shows in NDIS.H as its own section of INTERNAL DEFINITIONS. The first thousand lines or so are duplicated from NTDEF.H, consistently with extraction according to ntndis comments. For Windows Vista, however, this section of NDIS.H was reworked to include NTDDK.H.
No header from any WDK or SDK is known to have lines in common with NTDEF.H such as selected by ntoshvp comments. An obvious guess is that Microsoft has a header named NTOSHVP.H and even that it’s something like NTOSP.H but for Hyper-V components. The guess may even be sound. Headers other than NTDEF.H have comments that are similarly suggestive of an NTHAL.H and the public symbol files for the HAL do indeed confirm that an NTHAL.H is compiled when building the HAL. No such sign, however, is known of an NTOSHVP.H. If it exists, Microsoft is keeping it very private.
Lines in MINIPORT.H and MINITAPE.H that have corresponding lines in NTDEF.H are exact duplicates from NTDEF.H, but some lines in WINNT.H and WUDFWDM.H differ very slightly from their corresponding lines in NTDEF.H. What editing, if any, is done of each line in the output is apparently specified as part of the process, not from directions in the input. No evidence is known for the mechanism, only for what it must be capable of.
One option for editing the output concerns the comments that direct which lines to extract. Inasmuch as this extraction is a detail of construction, an ideal might be that these comments stay in master headers, which Microsoft keeps private, and are eliminated from headers that Microsoft publishes. Instead, some such comments are in plain sight all the way back to the DDK for Windows NT 3.1. Just as plain is that elimination is provided for but also that it is applied imperfectly. Exactly how it works is unclear. At one extreme, all lines that MINIPORT.H and MINITAPE.H have in common with NTDEF.H have the comments intact. At the other extreme, WUDFWDM.H has none. For instance, where NTDEF.H has begin_wudfwdm and begin_ntoshvp comments on successive lines, the second contributes to WUDFWDM.H only as an empty line: the comment is stripped. Contrast with WINNT.H, which does not have this filtering: where NTDEF.H has a begin_winnt and begin_ntoshvp on successive lines, WINNT.H has the whole begin_ntoshvp line.
WINNT.H demonstrates a much more significant translation. All Microsoft’s literature for kernel-mode programming uses UCHAR, USHORT and ULONG for unsigned integral types. The headers do not even define the BYTE, WORD and DWORD that were long preferred in user-mode programming (even before any Windows NT existed). Since NTDEF.H is written for kernel-mode programming, it uses UCHAR, etc., and never the others. For all the lines that WINNT.H has in common with NTDEF.H, every UCHAR, etc., in NTDEF.H is instead a BYTE, etc., in WINNT.H. Again, how the translation is specified is unclear. It even translates TUCHAR and MAXUCHAR to TBYTE and MAXBYTE.
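Since the mechanism is unknown, the following is no more than a sketch of one translation that would produce the observed pairs. It assumes a table of whole identifiers, with TUCHAR and MAXUCHAR as entries of their own. The pairs are the ones named above; the function name, the table and the whole-identifier matching are all hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Kernel-mode name -> user-mode name. Listing each pair as a separate
   whole-identifier entry is an assumption of this sketch. */
static const struct { const char *from, *to; } type_map[] = {
    { "UCHAR",    "BYTE"    },
    { "USHORT",   "WORD"    },
    { "ULONG",    "DWORD"   },
    { "TUCHAR",   "TBYTE"   },
    { "MAXUCHAR", "MAXBYTE" },
};

static int is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Copies `in` to `out` (capacity `cap` bytes), replacing every whole
   identifier that is listed in type_map. Output is truncated if too long. */
static void translate_line(const char *in, char *out, size_t cap)
{
    size_t used = 0;

    if (cap == 0) {
        return;
    }
    while (*in != '\0') {
        const char *piece = in;         /* text to copy for this step */
        size_t len = 1;                 /* default: one other character */
        size_t i;

        if (is_ident_char(*in)) {
            /* Measure the whole identifier, then look it up. */
            len = 0;
            while (is_ident_char(in[len])) {
                len++;
            }
            in += len;
            for (i = 0; i < sizeof type_map / sizeof type_map[0]; i++) {
                if (strlen(type_map[i].from) == len &&
                    strncmp(piece, type_map[i].from, len) == 0) {
                    piece = type_map[i].to;
                    len = strlen(piece);
                    break;
                }
            }
        } else {
            in++;
        }
        if (used + len >= cap) {
            break;                      /* keep room for the terminator */
        }
        memcpy(out + used, piece, len);
        used += len;
    }
    out[used] = '\0';
}
```

Given a made-up input line such as `UCHAR Flags; ULONG Count;`, this produces `BYTE Flags; DWORD Count;`. A longer identifier that merely begins with one of the listed names is left alone, which is a property of this sketch rather than anything established about the real process.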