Geoff Chappell, Software Analyst
OLD MATERIAL BEING UPDATED - TAKE CARE
Windows 7 brings a significant reorganisation of the lower levels of the Win32 subsystem. Long-familiar ADVAPI32 functions are moved to KERNEL32. Many functions from both of those are moved to a new DLL named KERNELBASE. Other ADVAPI32 functions are moved to a new DLL named SECHOST. Very many executables in Windows 7 import functions from new DLLs that have unusually long names such as API-MS-Win-Core-LocalRegistry-L1-1-0.dll. This importing is done by ADVAPI32 and KERNEL32, by DLLs for general support such as MFC42, MSVCRT and OLE32, by many services, and by all sorts of other executables all through the lower levels of Windows out as far as SHELL32 and SHLWAPI. Whatever it is that’s going on, it’s developed much further in Windows 8. Not only are there very many more of those DLLs with the long names but they apply in kernel-mode too. The NT kernel itself (NTOSKRNL), which once upon a time imported only from the HAL, now imports from such DLLs as ext-ms-win-ntos-ksigningpolicy-l1-1-0.dll.
There is not much official documentation of this. The original Software Development Kit (SDK) for Windows 7 makes just the one mention of KERNELBASE, in a brief page about New Low-Level Binaries, and says nothing about the new DLLs with the unusual names. If not much more ever is documented about it, then in one sense there should not be much surprise. After all, higher-level executables distributed with Windows continue to import as before from such DLLs as KERNEL32, and since the SDK has no import libraries for the new DLLs, the intention is surely that programs written outside Microsoft, and probably also most that are written inside Microsoft, will know nothing of the new DLLs and should be unaffected. The new DLLs with the long names are anyway just stubs in which all exported functions are implemented no more than needed for hard-coded failure. Moreover, these failing implementations have not all received great care: see for instance that CreateFileW in API-MS-Win-Core-File-L1-1-0.dll returns a hard-coded NULL (0) instead of INVALID_HANDLE_VALUE (-1).
In another sense, the lack of documentation may astonish, depending on what one expects to be told about the Windows architecture in order to assess its security and robustness. These new DLLs are part of a small but significant embellishment of how NTDLL resolves imports when loading user-mode modules. It turns out that all imports from any DLL whose (case-insensitive) module name starts with API- are checked for a new form of redirection. Windows 8 adds EXT- as an applicable prefix and applies the same embellishment to the kernel’s resolution of imports when loading kernel-mode modules and to the loader’s when loading the kernel (and HAL, etc). Since very many Windows executables import from modules that have these prefixes, and especially since KERNEL32 and ADVAPI32 do so for the initial handling of several hundred of the most commonly used Windows API functions, software that can interfere with this new redirection could be very powerful in terms of modifying behaviour throughout Windows for relatively little effort.
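The prefix test described above is simple enough to state as code. What follows is only an illustrative sketch in Python, not a transcription of what NTDLL or the kernel does: the function name and the `allow_ext` parameter are inventions for this example.

```python
def is_api_set_candidate(module_name: str, allow_ext: bool = True) -> bool:
    """Say whether imports from module_name would be checked for redirection.

    allow_ext=False models Windows 7, which recognises only the API- prefix.
    allow_ext=True models Windows 8 and higher, which also recognise EXT-.
    Both the function name and the parameter are hypothetical.
    """
    prefixes = ("api-", "ext-") if allow_ext else ("api-",)
    # The comparison is case-insensitive, so fold the name before testing.
    return module_name.lower().startswith(prefixes)
```

A module name that passes this test is only a candidate: whether it actually redirects depends on the map described below.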
Perhaps both Microsoft and the computer security industry were just slow to formalise or assess, respectively, this huge change in what’s where in Windows. Though the introduction for Windows 7 seems to have passed almost entirely unnoticed outside Microsoft, the SDKs for Windows 8 and Windows 8.1 bring import libraries with which non-Microsoft programmers can import from the new DLLs and there’s even some documentation, of Windows API Sets generally, and of Windows 8 API Sets and Windows 8.1 API Sets specifically.
Curiously—or not, depending perhaps on whether you look at it as a practising Windows programmer or as the designer of a competing operating system—far and away the most documentation of API Sets is in a patent: Dynamic Management of Composable API Sets (filed on 7th June 2013, i.e., a little before the release of Windows 8.1). It may be too cynical to infer that Microsoft is more concerned to stop its invention from being copied than to help its users understand how they’re affected, but even so, something important is being done to the Windows architecture and it’s mostly happening without disclosure by Microsoft or comment from outside.
In Windows 7, the new redirection of imports from DLLs is managed by NTDLL as a preferred alternative to isolation through activation contexts. Whether the imports from any particular API- or EXT- module are redirected depends entirely on the contents of a new file, named ApiSetSchema.dll in the System32 directory. Although ApiSetSchema is a DLL, it is wanted only for data. The whole file is mapped into kernel-mode address space by the NT kernel during phase 1 of system initialisation. From there, the wanted data is mapped into the user-mode address space of each newly initialised process and a pointer to this data is placed in a new member, named ApiSetMap (at offset 0x38 and 0x68 in x86 and x64 builds respectively), of the process’s semi-documented PEB structure. The kernel recognises the data only as the whole contents of a section that is named “.apiset” and is aligned to 64KB (i.e., whose VirtualAddress member in the IMAGE_SECTION_HEADER has the low 16 bits clear). The kernel has nothing to do with interpreting these contents: it just provides them for NTDLL to interpret. Conversely, NTDLL knows nothing of where the contents came from. To NTDLL, whenever it is to resolve an import to one module from another, whatever is at the address given by ApiSetMap is accepted as a map from which to learn whether to resolve the import from somewhere else instead.
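For readers who want to look at the data themselves, the following Python sketch finds the “.apiset” section in the raw bytes of a PE file and reports whether its VirtualAddress satisfies the 64KB alignment that the Windows 7 kernel expects. It works from the file on disk, not from a mapped image as the kernel does, and the function names are hypothetical.

```python
import struct

def find_apiset_section(image: bytes):
    """Return (VirtualAddress, raw section bytes) for ".apiset", else None."""
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("not a PE image")
    # IMAGE_FILE_HEADER follows the 4-byte signature.
    (section_count,) = struct.unpack_from("<H", image, e_lfanew + 6)
    (optional_size,) = struct.unpack_from("<H", image, e_lfanew + 20)
    table = e_lfanew + 24 + optional_size
    for i in range(section_count):
        header = table + 40 * i          # each IMAGE_SECTION_HEADER is 40 bytes
        name = image[header:header + 8].rstrip(b"\0")
        _vsize, vaddr, raw_size, raw_ptr = struct.unpack_from("<4I", image, header + 8)
        if name == b".apiset":
            return vaddr, image[raw_ptr:raw_ptr + raw_size]
    return None

def is_64kb_aligned(virtual_address: int) -> bool:
    """True if the low 16 bits of the address are clear."""
    return virtual_address & 0xFFFF == 0
```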
Windows 8 complicates this neat division by having the kernel join the redirection game. Indeed, the kernel’s own imports are subject to redirection. So that this redirection is done before any kernel code executes, the work of loading ApiSetSchema is brought forward to the Windows loader (WINLOAD), which loads the kernel. Again, the map is the whole contents of the “.apiset” section (though now with no alignment requirement). Again, these contents are just assumed to have the correct format. WINLOAD both loads the map and uses it (to resolve imports by the kernel itself, by the HAL and by other modules that must be ready before the kernel first executes). The kernel does not load the map, but it does know that the map comes from the ApiSetSchema file. WINLOAD leaves ApiSetSchema in the list of loaded modules that it passes to the kernel through the undocumented LOADER_PARAMETER_BLOCK structure. The kernel then finds the loaded image, extracts its own copy of the “.apiset” section, and unloads the image. Except for these changes in preparation and that the kernel, too, interprets the map for redirecting imports in kernel-mode modules such as drivers, the mechanism (even down to the file format) is that of Windows 7.
In Windows 8.1 and higher, WINLOAD has the entire responsibility for building the map. ApiSetSchema is already unloaded when the kernel initialises. The kernel knows nothing of where the map comes from. It just gets an address and size in an extension of the LOADER_PARAMETER_BLOCK and it accepts whatever’s there.
In these versions, the map of API Sets to hosts need not come from the one DLL. The file named ApiSetSchema.dll in the System32 directory is required, but only as a base schema. Unless a flag in that file marks this map as sealed, WINLOAD looks in the registry for more files from which to extend the map. There can be arbitrarily many schema extensions, each described by its own registry subkey.
The subkey is irrelevant except that each different subkey allows a different extension. WINLOAD recognises Name and FileName values in the subkey, but interprets only the latter. Data for the FileName value names a file in the System32 directory. The file has the same format as ApiSetSchema but with a different interpretation. Where the base schema is a list of API Sets and gives for each the rules for redirecting the API Set to a host (depending possibly on who’s importing), a schema extension is a list of hosts and gives for each a list of API Sets that this host implements. WINLOAD has the job of merging the extensions into the base. The principle seems to be that if an API Set is implemented by a host that is listed in a schema extension, then the API Set redirects to that host, not to whatever was specified in the base schema. The composed schema, in the same format as the base schema, is what WINLOAD itself uses for resolving imports and is all that the kernel and NTDLL ever receive as the map they’re to use for resolving imports.
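The merging can be modelled very loosely with dictionaries. The Python sketch below is an assumption-laden simplification, not WINLOAD's algorithm: it ignores the binary format, ignores rules that depend on the importing module, and ignores per-entry sealing. The function and parameter names are hypothetical.

```python
def compose_schema(base, extensions, sealed=False):
    """Merge schema extensions into a base schema (simplified model).

    base       -- dict mapping API Set name to host name (None if no host)
    extensions -- list of dicts, each mapping a host name to the list of
                  API Sets that the host implements
    sealed     -- models the flag that marks the base schema as sealed
    """
    composed = dict(base)
    if sealed:
        return composed              # a sealed base schema is the whole map
    for extension in extensions:
        for host, api_sets in extension.items():
            for api_set in api_sets:
                # WINLOAD requires that the API Set be defined already in
                # the base schema, so unknown names are skipped here.
                if api_set in composed:
                    composed[api_set] = host
    return composed
```

The point of the model is only the inversion: the base schema is keyed by API Set, each extension is keyed by host, and the composed result is again keyed by API Set.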
The map begins as a header followed immediately by an array of entries which each describe one API Set. Beware, however, that a different interpretation applies to schema extensions in Windows 8.1. Throughout the description below, structures are presented first for the base and composed schemas and the difference for schema extensions is left to comments after each structure.
The fixed-size header is 8 bytes originally, but version 6.3 expands it to 0x10. A file named APISET.H in the SDKs for Windows 8 and Windows 8.1 documents that Microsoft’s name for the map’s header, including the array, is API_SET_NAMESPACE_ARRAY. (All the symbolic names given in this article are from that file.)
|Offset (6.1)||Offset (6.3)||Size||Symbolic Name||Description|
|0x00||0x00||dword||Version||ignored before 6.3, observed to be 2; 3 or higher for recognition as schema extension in 6.3; observed to be 4 in 6.3|
| ||0x04||dword||Size||size of map in bytes|
| ||0x08||dword||Flags||0x01 bit set in ApiSetSchema if base schema is “sealed”; 0x02 bit set in schema extension|
|0x04||0x0C||dword||Count||number of API Sets described by array that follows|
|0x08||0x10||unsized||Array||array of namespace entries|
The only known interpretation of the Version member is by WINLOAD and only then if looking for schema extensions. The only known interpretation of the Size member is also by WINLOAD and only then if extension actually occurs (such that the name of a new host, at least, is appended to the base schema).
The Flags member is meaningful only to WINLOAD. The 0x01 bit (API_SET_SCHEMA_FLAGS_SEALED) matters only in the ApiSetSchema.dll from the System32 directory. If it is set, then the base schema from this file is the whole map. WINLOAD does not look in the registry for schema extensions. The 0x02 bit (API_SET_SCHEMA_FLAGS_HOST_EXTENSION) matters only in a file that is named as a schema extension. It must be set, else WINLOAD ignores the file.
In a schema extension, the Count entries in the Array list hosts, not API Sets.
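As a reading aid, here is a minimal Python sketch that unpacks the header in either layout. Because the Version member is ignored before 6.3, the sketch does not try to guess the layout from the data: the caller says which one applies. The type and function names, and the `new_layout` parameter, are hypothetical.

```python
import struct
from typing import NamedTuple, Optional

class NamespaceHeader(NamedTuple):
    version: int
    size: Optional[int]      # present only in the 6.3 layout
    flags: Optional[int]     # present only in the 6.3 layout
    count: int
    array_offset: int        # where the array of namespace entries begins

def parse_header(data: bytes, new_layout: bool) -> NamespaceHeader:
    """Unpack the map's header; new_layout selects the 0x10-byte 6.3 form."""
    if new_layout:
        version, size, flags, count = struct.unpack_from("<4I", data, 0)
        return NamespaceHeader(version, size, flags, count, 0x10)
    version, count = struct.unpack_from("<2I", data, 0)
    return NamespaceHeader(version, None, None, count, 0x08)
```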
Each entry in the array is an API_SET_NAMESPACE_ENTRY. Each is 0x0C bytes originally, expanded to 0x18 in version 6.3. Each names an API Set but without the API- prefix (or EXT- prefix in version 6.2 and higher) and without a file extension. Names are in Unicode and are not null-terminated. The array is assumed to be already sorted in case-insensitive alphabetical order.
|Offset (6.1)||Offset (6.3)||Size||Symbolic Name||Description|
| ||0x00||dword||Flags||0x01 bit set in ApiSetSchema if API Set is sealed; 0x02 bit observed to be clear for API- and set for EXT-|
|0x00||0x04||dword||NameOffset||offset from start of map to name of API Set|
|0x04||0x08||word in version 6.1; dword in later versions||NameLength||size, in bytes, of name of API Set|
| ||0x0C||dword|| ||observed to be same as NameOffset|
| ||0x10||dword|| ||observed to be NameLength less 8|
|0x08||0x14||dword||DataOffset||offset from start of map to structure that lists the API Set’s hosts|
The Flags member is meaningful only to WINLOAD. The 0x01 bit (API_SET_SCHEMA_ENTRY_FLAGS_SEALED) matters only in the ApiSetSchema.dll from the System32 directory. If it is set, then the API Set described by this entry cannot be overridden by a schema extension.
Though the NameLength is formally a ULONG in the headers that Microsoft publishes with SDKs for later versions, the Windows 7 implementation of NTDLL uses only the low 16 bits.
The structure that lists the API Set’s hosts is an API_SET_VALUE_ARRAY, described next.
In a schema extension, an API_SET_NAMESPACE_ENTRY names a host, i.e., a DLL to which imports from one or more API Sets may be redirected. Names are again in Unicode and not null-terminated. No sorting of the array is assumed. The DataOffset is again the offset from the start of the map to an API_SET_VALUE_ARRAY, but to list the API Sets that the host implements (and which should redirect to this host, not to whatever host is specified in the base schema).
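Because the array in a base schema is sorted, finding an API Set is naturally a binary search. The Python sketch below does this for the original (6.1) layout only. It is an illustration under stated assumptions, not NTDLL's code: the function names are hypothetical, and the case-insensitive comparison is approximated by lower-casing, which agrees with an upper-casing comparison for the letters, digits and hyphens that these names contain.

```python
import struct

HEADER_SIZE = 0x08   # 6.1 layout: Version, Count
ENTRY_SIZE = 0x0C    # 6.1 layout: NameOffset, NameLength, DataOffset

def lookup_key(module_name: str) -> str:
    """Drop the file extension and the 4-character API- or EXT- prefix."""
    stem = module_name.rsplit(".", 1)[0]
    return stem[4:].lower()

def read_entry(data: bytes, index: int):
    """Return (name, DataOffset) for one namespace entry."""
    name_off, name_len, data_off = struct.unpack_from(
        "<3I", data, HEADER_SIZE + ENTRY_SIZE * index)
    name_len &= 0xFFFF   # the Windows 7 NTDLL uses only the low 16 bits
    name = data[name_off:name_off + name_len].decode("utf-16-le")
    return name, data_off

def find_api_set(data: bytes, module_name: str):
    """Binary search of the sorted array; returns DataOffset or None."""
    (count,) = struct.unpack_from("<I", data, 4)
    key = lookup_key(module_name)
    lo, hi = 0, count - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        name, data_off = read_entry(data, mid)
        if name.lower() == key:
            return data_off
        if name.lower() < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return None
```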
If the module to be imported from is an API Set as found in the array, then the import may be redirected to some host module. Before version 6.3, NTDLL assumes that at least one host is specified (else why list the API Set). Later versions allow that an API Set can be defined but inactive, in the sense of naming no host (presumably anticipating that a host will be specified in a schema extension). The hosts for an API Set are described by a header and an array. Microsoft’s name for the header, including the array, is API_SET_VALUE_ARRAY. Originally, the header contains only a count of entries in the array. Version 6.3 expands this header to 8 bytes.
|Offset (Original)||Offset (New)||Size||Symbolic Name||Description|
| ||0x00||dword|| ||observed to be 0|
|0x00||0x04||dword||Count||number of hosts described by array that follows|
|0x04||0x08||unsized||Array||array of entries for hosts|
In a schema extension, the Count entries in the Array list API Sets, not hosts.
Each entry in the value array is an API_SET_VALUE_ENTRY. Each is 0x10 or 0x14 bytes, depending on the version. The first entry in the array describes a default host. Subsequent entries, if any, are selected according to the name of the importing module. Entries for these exceptional hosts are assumed to be already sorted in case-insensitive alphabetical order of the importing module. Note, however, that no schema has yet been seen that defines more than two hosts for any one API Set.
|Offset (6.1)||Offset (6.3)||Size||Symbolic Name||Description|
| ||0x00||dword||Flags||observed to be 0|
|0x00||0x04||dword||NameOffset||offset from start of map to name of importing module, in Unicode|
|0x04||0x08||word in version 6.1; dword in later versions||NameLength||size, in bytes, of name of importing module|
|0x08||0x0C||dword||ValueOffset||offset from start of map to name of host module, in Unicode|
|0x0C||0x10||word or dword||ValueLength||size, in bytes, of name of host module|
Both names are in Unicode and are not null-terminated. For a default host, with no importing module to specify, the NameOffset and NameLength members are irrelevant and are observed to be 0.
Though the ValueLength is formally a ULONG in the headers that Microsoft publishes with SDKs for Windows 8 and Windows 8.1, annotations for static analysis tools document that the length must fit 16 bits, and NTDLL uses only the low 16 bits. When WINLOAD processes value entries in schema extensions, it takes the whole dword.
In a schema extension, the ValueOffset and ValueLength name an API Set. WINLOAD requires that this API Set be defined already in the base schema.
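For a base or composed schema, the choice of host from a value array can be sketched as follows, again in Python and again for the original (6.1) layout only. This is a model of the rule as described, not NTDLL's code: it scans the exceptional entries linearly where the sorted order suggests a search, it assumes the importing module is matched by its whole file name, and the function names are hypothetical.

```python
import struct

VALUE_ENTRY_SIZE = 0x10   # 6.1 layout: NameOffset, NameLength, ValueOffset, ValueLength

def read_utf16(data: bytes, offset: int, length: int) -> str:
    """Read a counted UTF-16 string, using only the low 16 bits of the length."""
    return data[offset:offset + (length & 0xFFFF)].decode("utf-16-le")

def select_host(data: bytes, array_offset: int, importer: str):
    """Return the host for an importing module, or None if no host is listed."""
    (count,) = struct.unpack_from("<I", data, array_offset)
    if count == 0:
        return None
    entries = [
        struct.unpack_from("<4I", data, array_offset + 4 + VALUE_ENTRY_SIZE * i)
        for i in range(count)
    ]
    # Entries after the first apply only to particular importing modules.
    for name_off, name_len, value_off, value_len in entries[1:]:
        if read_utf16(data, name_off, name_len).lower() == importer.lower():
            return read_utf16(data, value_off, value_len)
    # Otherwise the first entry gives the default host.
    _, _, value_off, value_len = entries[0]
    return read_utf16(data, value_off, value_len)
```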
Version 10.0 changes the data format enough that it seems better described fresh. The most notable change is the introduction of a hash table so that the binary search of namespace entries can be faster by comparing 32-bit hashes rather than case-insensitive strings. Another is a simplification that removes the API_SET_VALUE_ARRAY. Perhaps even more notable to some is a change not in the data format itself but in what Microsoft formally reveals of it: in the SDK for Windows 10, APISET.H drops the structural definitions.
The header is 0x1C bytes in which the first 0x10 are compatible with the header from version 6.3. The map now has two arrays. Even though the array of namespace entries does still follow the header, if only in the one example that is yet observed, both arrays are located by giving their offsets.
|Offset||Size||Description|
|0x00||dword||5 or higher for recognition as schema extension in 10.0; observed to be 6 in 10.0|
|0x04||dword||size of map in bytes|
|0x08||dword||0x01 bit set in ApiSetSchema if schema is sealed|
|0x0C||dword||number of API Sets|
|0x10||dword||offset from start of map to array of namespace entries for API Sets|
|0x14||dword||offset from start of map to array of hash entries for API Sets|
|0x18||dword||multiplier to use when computing hash|
The algorithm for hashing a sequence of characters is to start with zero and then for each character multiply the previous hash by the multiplier from the header and add the character’s lower-case conversion. Each hash entry is 0x08 bytes:
|Offset||Size||Description|
|0x00||dword||hash of API Set’s lower-case name up to but not including last hyphen|
|0x04||dword||index of API Set in array of namespace entries|
The hash entries are assumed to be already sorted in increasing order of the hash. To find the API_SET_NAMESPACE_ENTRY for a supposed API Set, NTDLL first hashes the supposed name up to but not including the last hyphen and then searches the array of hash entries for one that has the same hash. Only when a matching hash is found are the names themselves compared. Note that in version 10, the last part in the name of an API Set, i.e., from the last hyphen onwards, is insignificant.
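The hashing and the search can be sketched in Python as below. The wrap to 32 bits follows from the hash being a dword. Everything else that is not in the description above is an assumption for illustration: the function names, the use of in-memory lists in place of the binary arrays, and the treatment of a name that has no hyphen at all.

```python
import bisect

def api_set_hash(text: str, multiplier: int) -> int:
    """Start with zero; for each character multiply, then add its lower case."""
    value = 0
    for ch in text.lower():
        value = (value * multiplier + ord(ch)) & 0xFFFFFFFF
    return value

def hashed_part(name: str) -> str:
    """The name up to but not including its last hyphen."""
    cut = name.rfind("-")
    return name if cut < 0 else name[:cut]   # no hyphen: an assumed fallback

def find_api_set(names, hash_entries, module_name, multiplier):
    """Return the index of the matching namespace entry, or None.

    names        -- API Set names, indexed as the array of namespace entries
    hash_entries -- list of (hash, index) pairs sorted by increasing hash
    """
    wanted = hashed_part(module_name).lower()
    target = api_set_hash(wanted, multiplier)
    hashes = [h for h, _ in hash_entries]
    pos = bisect.bisect_left(hashes, target)
    if pos == len(hashes) or hashes[pos] != target:
        return None
    index = hash_entries[pos][1]
    # Only when a matching hash is found are the names themselves compared.
    if hashed_part(names[index]).lower() != wanted:
        return None
    return index
```

Cutting the supposed name at its last hyphen also discards any file extension, which is consistent with the last part of the name being insignificant.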
The API_SET_NAMESPACE_ENTRY, which describes a single API Set, changes a little. The API Set’s name is again in Unicode, with no null terminator, but now includes the prefix that earlier versions omit. The hosts for the API Set are described directly by an array of API_SET_VALUE_ENTRY structures instead of indirectly through an API_SET_VALUE_ARRAY. The latter’s count of hosts moves to the namespace entry.
|Offset||Size||Description|
|0x00||dword||0x01 bit set in ApiSetSchema.dll if API Set is “sealed”|
|0x04||dword||offset from start of map to name of API Set|
|0x08||dword||observed to be size, in bytes, of name of API Set|
|0x0C||dword||size, in bytes, of name of API Set up to but not including last hyphen; thus also number of bytes hashed from name of API Set for corresponding hash entry|
|0x10||dword||offset from start of map to array of value entries for hosts|
|0x14||dword||number of hosts|
The API_SET_VALUE_ENTRY is unchanged from version 6.3, except that non-zero Flags are observed for one API Set (0x18 for api-ms-win-security-provider-l1-1-0 and only then in the x86 build).
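To tie the version 10.0 header and namespace entries together, here is a minimal Python sketch that walks them. It rests on assumptions that should be plain: the 0x18-byte size of a namespace entry is inferred from the offsets above rather than stated, the dword at offset 0x08 is trusted as the size of the name because that is what is observed, and the function name and dictionary keys are hypothetical.

```python
import struct

V10_ENTRY_SIZE = 0x18   # inferred: six dwords at offsets 0x00 to 0x14

def parse_v10(data: bytes) -> dict:
    """Unpack the 0x1C-byte header and each namespace entry it locates."""
    (version, size, flags, count,
     entries_off, hashes_off, multiplier) = struct.unpack_from("<7I", data, 0)
    api_sets = []
    for i in range(count):
        (entry_flags, name_off, name_len, hashed_len,
         values_off, host_count) = struct.unpack_from(
            "<6I", data, entries_off + V10_ENTRY_SIZE * i)
        name = data[name_off:name_off + name_len].decode("utf-16-le")
        api_sets.append({
            "name": name,                                   # includes the prefix
            "hashed": name[:hashed_len // 2],               # lengths are in bytes
            "sealed": bool(entry_flags & 0x01),
            "values_offset": values_off,
            "host_count": host_count,
        })
    return {
        "version": version,
        "size": size,
        "sealed": bool(flags & 0x01),
        "hashes_offset": hashes_off,
        "multiplier": multiplier,
        "api_sets": api_sets,
    }
```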