Geoff Chappell, Software Analyst
This function collects into an XML file various sorts of information about files that seem related to a given executable. The intended use seems to be that the executable needs a patch, shim or similar support in particular circumstances and the collected descriptions of the related files, or some extract from them, can help those circumstances be recognised later (and elsewhere) by the presence of so-called matching files.
INT
SdbGrabMatchingInfoEx (
    PCWSTR szExe,
    DWORD dwFilter,
    PCWSTR szFile,
    BOOL (*lpCallback) (PVOID, PCWSTR, PCWSTR, ATTRINFO *, PCWSTR),
    PVOID lpContext);
The szExe argument addresses a pathname of a file-system object, such as an executable file, and the function is to collect information about the matching files in the same directory and (typically) beneath. The pathname can instead specify just the directory. Depending on other arguments, szExe can be ignored.
The low 16 bits of the dwFilter argument define what type of files count as matching files, what type of information to collect, and may even define where to look (overriding szExe). The high 16 bits of dwFilter are bit flags that can more generally vary the function’s behaviour, e.g., to stop the search’s recursion through subdirectories, and to arrange for the one output file to collect descriptions from multiple calls to the function. The types and flags that are supported for the Matching Information Filter are described separately.
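The division of dwFilter into a type and flags can be sketched in a few lines of portable C. The type names and values are those used in these notes (the list is not exhaustive). The FLAG_* names and the two helper functions are hypothetical placeholders, not Microsoft's; the flag values are the bits whose effects are described in the notes on behaviour.

```c
#include <stdint.h>

/* Filter types (low 16 bits), names and values as used in these notes.
   The list is not exhaustive. */
enum {
    GRABMI_FILTER_NORMAL       = 0,
    GRABMI_FILTER_PRIVACY      = 1,
    GRABMI_FILTER_VERBOSE      = 3,
    GRABMI_FILTER_SYSTEM       = 4,
    GRABMI_FILTER_THISFILEONLY = 5
};

/* Flags (high 16 bits). The FLAG_* names are hypothetical placeholders. */
#define FLAG_NO_RECURSION  0x80000000u  /* do not descend into subdirectories */
#define FLAG_LIMIT_FILES   0x40000000u  /* stop a directory at 25 files in total */
#define FLAG_APPEND        0x20000000u  /* append to an existing output file */
#define FLAG_KEEP_DB_OPEN  0x10000000u  /* leave the DATABASE tag unclosed */

/* The type is whatever sits in the low 16 bits. */
static uint32_t filter_type(uint32_t dwFilter)
{
    return dwFilter & 0x0000FFFFu;
}

/* The flags are whatever sits in the high 16 bits. */
static uint32_t filter_flags(uint32_t dwFilter)
{
    return dwFilter & 0xFFFF0000u;
}
```

A caller would combine one type with any flags by bitwise OR, e.g., GRABMI_FILTER_VERBOSE | FLAG_NO_RECURSION.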
The szFile argument names the output file.
The optional lpCallback argument is the address of a function that is to be called for each file that the function examines. For each such call, the lpContext argument is the callback function’s first argument.
The function returns 1 if wholly successful, 0 for complete failure, and -1 if output is aborted by the callback function.
The SdbGrabMatchingInfoEx function is exported by name from APPHELP.DLL in version 5.1 and higher, and before then from SDBAPIU.DLL.
The SdbGrabMatchingInfoEx function is not documented.
The names given above for the dwFilter and szFile arguments are plausibly Microsoft’s, being extracted from text that APPHELP can write to a log file. Other names are mere placeholders awaiting the discovery of Microsoft’s.
The following notes on the function’s behaviour are deduced from inspecting the implementation from the original release of Windows 10. Since the behaviour is complex, it may help to begin with what the function aims for as its externally visible effects, notably for the output file and the callback function.
The output to be produced from one or more calls to the function is an XML file in Unicode characters. There is first the two-byte signifier of byte ordering, thus 0xFF 0xFE for both the x86 and x64 platforms. The XML content is a header and a DATABASE tag that contains yet more tags to present the “matching” information for one or more executables, one per call to the function:
<?xml version="1.0" encoding="UTF-16"?>
<DATABASE>
    exe-descriptions
</DATABASE>
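To make the byte layout concrete, here is a minimal sketch that encodes ASCII text as little-endian UTF-16 with the 0xFF 0xFE signifier in front. The helper name is hypothetical and the restriction to ASCII is a simplification for illustration. Nothing here is taken from APPHELP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodes an ASCII string as UTF-16LE bytes, preceded by the byte-order
   mark FF FE when with_bom is non-zero. Returns the number of bytes
   written, or 0 if the output buffer is too small. */
static size_t utf16le_from_ascii(const char *text, int with_bom,
                                 uint8_t *out, size_t out_size)
{
    size_t n = 0;

    assert(text != NULL && out != NULL);

    if (with_bom) {
        if (out_size < 2) return 0;
        out[n++] = 0xFF;
        out[n++] = 0xFE;
    }
    for (; *text != '\0'; ++text) {
        if (out_size - n < 2) return 0;
        out[n++] = (uint8_t)*text;  /* low byte first: little-endian */
        out[n++] = 0x00;            /* high byte is zero for ASCII */
    }
    return n;
}
```

Applied to the 39-character XML header, this produces 80 bytes, of which the first four are FF FE 3C 00.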
Each of the exe-descriptions is an EXE tag whose content describes whatever count as matching files for the one executable:
<EXE NAME="exe-name" FILTER="filter-type">
    file-descriptions
</EXE>
The exe-name is typically the filename from the szExe argument, but it may also be either of two literals: Exe Not Specified or SYSTEM INFO. The filter-type is the symbolic name of the type that’s specified in the low 16 bits of dwFilter.
Each of the file-descriptions is a self-closing MATCHING_FILE or SYS tag:
<MATCHING_FILE NAME="file-name" attribute-value-items />
<SYS NAME="file-name" attribute-value-items />
in which the file-name is in general a relative pathname (relative to the search path that is specified by or inferred from the szExe argument) and the attribute-value-items are arbitrarily many, including zero, descriptions in the form
attribute="value"
Typical for attribute are SIZE and CHECKSUM, but there can be very many more (up to 33 in total for Windows 10), as when the matching file is an executable with a version resource.
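The shape of one file description can be sketched as follows. This is an illustration only: the structure and function names are hypothetical, the attribute values in the usage are made up, narrow characters stand in for the Unicode that the real output uses, and no escaping of XML special characters is attempted. The real function obtains and formats its attributes through SdbGetFileAttributes and SdbFormatAttribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One already-formatted attribute, e.g. SIZE and its value (hypothetical). */
struct attr_item {
    const char *name;
    const char *value;
};

/* Builds  <tag NAME="file-name" attribute="value" ... />  in buf.
   Returns 1 on success, 0 if the text does not fit in cap characters. */
static int format_file_tag(char *buf, size_t cap, const char *tag,
                           const char *file_name,
                           const struct attr_item *items, size_t count)
{
    size_t used, i;
    int n;

    assert(buf != NULL && tag != NULL && file_name != NULL);

    n = snprintf(buf, cap, "<%s NAME=\"%s\"", tag, file_name);
    if (n < 0 || (size_t)n >= cap) return 0;
    used = (size_t)n;

    for (i = 0; i < count; ++i) {
        n = snprintf(buf + used, cap - used, " %s=\"%s\"",
                     items[i].name, items[i].value);
        if (n < 0 || (size_t)n >= cap - used) return 0;
        used += (size_t)n;
    }

    n = snprintf(buf + used, cap - used, " />");
    if (n < 0 || (size_t)n >= cap - used) return 0;
    return 1;
}
```

With the tag MATCHING_FILE, the file-name sub\app.dll and two items named SIZE and CHECKSUM, the result is a single self-closing tag with NAME first and the two attributes after it in the order given.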
Do not let it escape your attention that except for SYS, which is anyway specific to one type of filter, all the XML tags and all the possibilities for attribute are string representations of supported values for the TAG as known for Shim Database (SDB) files.
The callback function gives the caller some control over what goes into the XML output and certainly over when to end it.
BOOL
Callback (
    PVOID Context,
    PCWSTR FileName,
    PCWSTR RelativeFileName,
    ATTRINFO *AttrInfo,
    PCWSTR XmlTag);
The Context argument is whatever was passed to SdbGrabMatchingInfoEx as its last argument.
The FileName argument addresses a pathname for the matching file, constructed by extending whatever search path was specified by or inferred from szExe. The RelativeFileName argument points into this pathname to that part that follows the search path, i.e., to what gets used as file-name in the XML output.
The AttrInfo argument addresses an array of ATTRINFO structures that have been used to prepare the MATCHING_FILE or SYS tag that is addressed by the XmlTag argument.
The callback function returns FALSE to tell SdbGrabMatchingInfoEx that no more matching files are wanted.
Microsoft’s names for the arguments are not known. Neither are the types. That the pointers to strings are specifically pointers to const strings is inferred from the symbol file for COMPATUI.DLL, this DLL having used C++ for its callback function. It is not clear, however, that this is what APPHELP expects. Much of the point, if not to the callback function itself, then surely to its last two arguments, would be to allow that the XML output for the matching file can be edited. The string at XmlTag is not written to the output file until the callback function returns, and APPHELP recomputes the length. That said, if the callback function is permitted to edit the tag, it would either have to know how much space is available (0x1000 characters in all known versions) or be constrained only to reducing the tag. Tight coupling seems to be presumed: see, for instance, that the AttrInfo argument is entirely useless to the callback function without independent knowledge of how many ATTRINFO structures are in the array. (The ATTRINFO structure, incidentally, is documented by Microsoft but only online.)
The callback function is hardly used even by Microsoft. Few known callers of SdbGrabMatchingInfoEx specify a callback function and these anyway ignore most of the arguments. Much, not just of names and types but even of functionality, may simply be unknowable.
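For illustration, a callback that only counts files and asks for the output to end after a caller-chosen number might look like the sketch below. The typedefs are portable stand-ins for the Windows types so that the fragment compiles anywhere, the context structure and function name are hypothetical, and the calling convention that APPHELP expects on x86 is not addressed. In keeping with the uncertainty noted above, the sketch treats all its string arguments as read-only and ignores AttrInfo.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Portable stand-ins for the Windows types, for illustration only. */
typedef int BOOL;
typedef void *PVOID;
typedef const wchar_t *PCWSTR;
typedef struct ATTRINFO_STANDIN ATTRINFO;   /* opaque in this sketch */

/* Hypothetical context that the caller would pass as lpContext. */
struct grab_context {
    unsigned files_seen;    /* incremented on every call */
    unsigned files_wanted;  /* ask to stop once this many have been seen */
};

/* Returns non-zero (TRUE) to continue, zero (FALSE) to say that no more
   matching files are wanted. */
static BOOL count_files_callback(PVOID Context, PCWSTR FileName,
                                 PCWSTR RelativeFileName,
                                 ATTRINFO *AttrInfo, PCWSTR XmlTag)
{
    struct grab_context *ctx = Context;

    (void)FileName;
    (void)AttrInfo;   /* unusable without knowing the array's length */
    (void)XmlTag;     /* treated as read-only here */

    assert(ctx != NULL && RelativeFileName != NULL);

    ctx->files_seen++;
    return ctx->files_seen < ctx->files_wanted;
}
```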
The dwFilter and szFile arguments are required in all cases. If szFile is NULL or if the low 16 bits of dwFilter are not among the supported types, the function fails.
The szExe argument is not formally required to be non-NULL, since it can be ignored. In general, however, it may supply both a search path and filename. The latter typically becomes the exe-name in the XML output. The function allows MAX_PATH characters for it on the stack. For the search path, and for whatever the function appends to it to make pathnames of matching files, the function uses a dynamically allocated buffer with a capacity of 0x1000 Unicode characters. If the function cannot get this buffer, it fails. For the remainder of these notes, it is to be understood that if preparation of a pathname in this buffer ever needs a bigger buffer, the function fails. (However, this is not quite true: if this problem occurs during the function’s recursion through subdirectories, the effect can be that the function moves on to the next subdirectory.)
If the low 16 bits of dwFilter are GRABMI_FILTER_SYSTEM (4), the matching files are necessarily in the Windows system directory. If the function cannot locate the Windows system directory, it fails. For this filter, the szExe argument is ignored, and exe-name in the XML output will be SYSTEM INFO.
For all other types of filter, the szExe argument is required to name either a file or directory. If the function cannot get file attributes for the supposed file or directory, it fails. If szExe turns out to name a directory, as learnt from the file attributes, then this is the directory to search for files and the exe-name in the XML output will be Exe Not Specified. Ordinarily, the pathname at szExe is both the search path and the exe-name, separated at the last backslash. In the special case where szExe names just a file, with no path, as learnt from the absence of any backslash, then the function adopts the current directory as the search path.
If the low 16 bits of dwFilter are GRABMI_FILTER_THISFILEONLY (5), then szExe is required to name a file. If instead it names a directory, the function fails. Or so seems to be the intention. What the function actually tests as the case to reject is that the filename that was just extracted for use as exe-name in the XML output is empty, but this can never happen because of the default to the fake name Exe Not Specified. (The obvious experiment is to create a file named Exe Not Specified in an arbitrary directory and then give the function just the directory as szExe and GRABMI_FILTER_THISFILEONLY as dwFilter. The function succeeds, with output that describes the contrived file, when surely it is not meant to.)
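The separation of szExe into search path and exe-name, and the reason the GRABMI_FILTER_THISFILEONLY test cannot fire for a directory, can be modelled as below. This is a sketch of one reading of the notes above, not a transcription of the implementation. All names are hypothetical, narrow characters stand in for Unicode, the is_directory argument stands in for the query of file attributes, and whether the real search path keeps a trailing backslash is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PATH_CHARS 0x1000  /* capacity of the search-path buffer */
#define NAME_CHARS 260     /* MAX_PATH, allowed for the exe-name */

struct exe_split {
    char search_path[PATH_CHARS];
    char exe_name[NAME_CHARS];
};

/* Copies len characters and terminates; fails if they do not fit. */
static int copy_checked(char *dst, size_t cap, const char *src, size_t len)
{
    if (len >= cap) return 0;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 1;
}

/* Returns 1 on success, 0 if a component is too long. */
static int split_exe_path(const char *szExe, int is_directory,
                          const char *current_dir, struct exe_split *out)
{
    const char *slash;

    assert(szExe != NULL && current_dir != NULL && out != NULL);

    /* The fake name is in place before any real name is extracted. */
    strcpy(out->exe_name, "Exe Not Specified");

    if (is_directory)   /* the whole of szExe is the search path */
        return copy_checked(out->search_path, PATH_CHARS, szExe, strlen(szExe));

    slash = strrchr(szExe, '\\');
    if (slash == NULL)  /* bare filename: search the current directory */
        return copy_checked(out->search_path, PATH_CHARS,
                            current_dir, strlen(current_dir))
            && copy_checked(out->exe_name, NAME_CHARS, szExe, strlen(szExe));

    /* Ordinary case: separate at the last backslash. */
    return copy_checked(out->search_path, PATH_CHARS,
                        szExe, (size_t)(slash - szExe))
        && copy_checked(out->exe_name, NAME_CHARS, slash + 1, strlen(slash + 1));
}

/* The test that GRABMI_FILTER_THISFILEONLY appears to rely on. */
static int this_file_only_rejects(const struct exe_split *s)
{
    return s->exe_name[0] == '\0';
}
```

In this model a directory leaves exe_name as the fake name, so this_file_only_rejects returns 0 and the search goes ahead for a file named Exe Not Specified.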
The function ordinarily creates the output file as new, overwriting the file if it already exists. However, if the 0x20000000 bit is set in dwFilter, the function appends to the file if it already exists. If the function cannot create or open the output file, it fails. (The function requires write access and does not share.) For the remainder of these notes, it is to be understood that whenever the function writes to the output file, success is simply assumed: it never checks for success or failure.
To complete the output file, the function will go through potentially many cycles of preparing some amount of XML and writing it to the output file. All such preparation is done in the one buffer whose capacity is 0x1000 Unicode characters. If the function cannot get this memory, it fails. For the remainder of these notes, it is to be understood that if preparation of any XML to write to the output file ever needs a bigger buffer, the function fails. Again, however, this description of the error handling is not quite true of problems that occur during the function’s recursion into subdirectories.
In each directory that the function examines, it looks first for matching files and then (possibly) for subdirectories. If the low 16 bits of dwFilter are GRABMI_FILTER_THISFILEONLY (5), then the only file to consider as a matching file is the one that is named by szExe. Otherwise, the function examines all files in the directory. If the function cannot find a first file, then at the top level of recursion the function fails, but at any deeper level it merely proceeds to the next subdirectory.
If recursion is not yet in progress, the function at least begins its XML output. Ordinarily, the first output is the XML header and an open DATABASE tag. If the 0x20000000 bit is set in dwFilter, the function infers that it is being called in a sequence to generate the one output file for multiple searches, such that the output file has the XML header and DATABASE tag already. Either way, the function writes an open EXE tag, including its NAME and FILTER.
Among the files the function finds in its search of the directory, the ones that count as matching files that are to be described in the output file are determined by the low 16 bits of dwFilter. The desired match may be with particular file extensions, as with GRABMI_FILTER_NORMAL (0), GRABMI_FILTER_PRIVACY (1) and GRABMI_FILTER_DRIVERS, or with particular file names, as with GRABMI_FILTER_SYSTEM. Some filter types require this match of any file they list in the output. Others count up to 10 non-matching files as matching. For GRABMI_FILTER_VERBOSE (3), all files match.
For each matching file, the function prepares a MATCHING_FILE tag, typically, and fills it with formatted representations of whatever attributes it can obtain (via the documented SdbGetFileAttributes function, formatting the results via the documented SdbFormatAttribute function). The significant variation is that for GRABMI_FILTER_DRIVERS only, the tag is SYS and is kept to only a small selection of attributes. The function then describes the matching file to the callback function, if one is provided, and notes the result. Only then is the tag for this matching file written to the output file. If the 0x40000000 bit is set in dwFilter, then enumeration of this directory, both for files and subdirectories, is abandoned if the total count of matching files has reached 25. If the callback function returned FALSE, then all enumeration is abandoned, and the function returns -1.
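The order of events for each matching file matters to a callback, so a small simulation may help. Everything in it is hypothetical: the names, the simplified two-argument callback, and the array of names that stands in for directory enumeration. One point is an assumption rather than something the notes settle: when the limit is reached on the same file for which the callback returns FALSE, the sketch lets the callback's result take precedence.

```c
#include <assert.h>
#include <stddef.h>

#define FLAG_LIMIT_FILES 0x40000000u  /* hypothetical name for the bit */
#define FILE_LIMIT       25u

/* Simplified stand-in for the real five-argument callback. */
typedef int (*file_callback)(void *context, const char *file_name);

struct grab_state {
    unsigned long filter;     /* the dwFilter argument */
    unsigned total_written;   /* tags written so far, over all directories */
    int cancelled;            /* set when the callback returns FALSE */
};

/* Example callback: counts calls, asks to stop on the third. */
static int stop_on_third(void *context, const char *file_name)
{
    unsigned *calls = context;
    (void)file_name;
    return ++*calls < 3;
}

/* Describes the matching files of one directory. Returns the number of
   tags written for this directory. */
static unsigned describe_files(struct grab_state *st,
                               const char *const *names, size_t count,
                               file_callback cb, void *context)
{
    unsigned written_here = 0;
    size_t i;

    assert(st != NULL);

    for (i = 0; i < count; ++i) {
        int keep_going = 1;

        /* 1. The tag for names[i] would be prepared here. */
        /* 2. The callback sees the file, and its result is noted. */
        if (cb != NULL) keep_going = cb(context, names[i]);
        /* 3. Only then is the tag written, whatever the result. */
        st->total_written++;
        written_here++;
        /* 4. A FALSE result abandons all enumeration. */
        if (!keep_going) { st->cancelled = 1; break; }
        /* 5. The limit abandons enumeration of this directory. */
        if ((st->filter & FLAG_LIMIT_FILES) && st->total_written >= FILE_LIMIT)
            break;
    }
    return written_here;
}
```

Note that in this model the file for which the callback returns FALSE is still described in the output. A caller that sees cancelled set would return -1.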
Enumeration of a directory typically continues into its subdirectories, though only to a depth of 3 and not at all if the low 16 bits of dwFilter are GRABMI_FILTER_SYSTEM or GRABMI_FILTER_THISFILEONLY or if the 0x80000000 bit is set.
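The conditions for descending into subdirectories reduce to a single predicate, sketched here. The function and macro names are hypothetical. How the depth of 3 is counted is an assumption: the sketch takes depth to be the number of levels already descended below the starting directory and permits a further descent while that number is less than 3.

```c
#include <stdint.h>

#define GRABMI_FILTER_SYSTEM       4u
#define GRABMI_FILTER_THISFILEONLY 5u
#define FLAG_NO_RECURSION          0x80000000u  /* hypothetical name */
#define MAX_DEPTH                  3u           /* counting is an assumption */

/* Returns 1 if enumeration may continue into subdirectories of a
   directory that is depth levels below the starting directory. */
static int may_descend(uint32_t dwFilter, unsigned depth)
{
    uint32_t type = dwFilter & 0xFFFFu;

    if (dwFilter & FLAG_NO_RECURSION) return 0;
    if (type == GRABMI_FILTER_SYSTEM) return 0;
    if (type == GRABMI_FILTER_THISFILEONLY) return 0;
    return depth < MAX_DEPTH;
}
```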
The function ends its enumeration of matching files by closing the EXE tag and, unless the 0x10000000 bit is set in dwFilter, the DATABASE tag too.
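Taken together, the 0x20000000 and 0x10000000 bits suggest how a caller would build one output file from a sequence of calls: every call but the first appends, and every call but the last leaves the DATABASE tag open. The sketch below computes the flags for each call in such a sequence. It is an inference from these notes rather than something confirmed by experiment, and the names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define FLAG_APPEND        0x20000000u  /* hypothetical names for the bits */
#define FLAG_KEEP_DB_OPEN  0x10000000u

/* Flags to OR into dwFilter for call number index (from 0) of total calls
   that all name the same output file. */
static uint32_t flags_for_call(size_t index, size_t total)
{
    uint32_t flags = 0;

    if (index > 0)         flags |= FLAG_APPEND;        /* header is already there */
    if (index + 1 < total) flags |= FLAG_KEEP_DB_OPEN;  /* more EXE tags to come */
    return flags;
}
```

For three calls, the flags are 0x10000000, then 0x30000000, then 0x20000000. A single call needs neither bit.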