Geoff Chappell - Software Analyst
This function sets parameters for an enumeration of URL cache containers and produces information about the first entry.
HANDLE FindFirstUrlCacheEntryEx (
    LPCTSTR lpszUrlSearchPattern,
    DWORD dwFlags,
    DWORD dwFilter,
    GROUPID GroupId,
    LPINTERNET_CACHE_ENTRY_INFO lpFirstCacheEntryInfo,
    LPDWORD lpcbEntryInfo,
    LPVOID lpGroupAttributes,
    LPDWORD lpcbGroupAttributes,
    LPVOID lpReserved);
This function has multi-byte and wide-character forms distinguished by A and W suffixes respectively. The function is natively ANSI. The notes below (mostly) do not address variations for Unicode.
The lpszUrlSearchPattern argument provides the address of a string that selects which cache container (or, exceptionally, containers) to enumerate. This argument can be NULL to search all the fixed containers, i.e., Content, Cookies and History.
Bits in the dwFlags argument vary the behaviour of the function, most notably to ask that some types of information not be produced for the enumerated entries. The following are meaningful:
|0x01||unmatched pattern that doesn’t begin with a colon selects all fixed containers|
|0x02||omit all variable-sized information; not valid for Unicode form|
|0x04||omit variable-sized information except for local file name; overrides the 0x02 flag; not valid for Unicode form|
Mostly, the dwFilter argument confines the search to entries whose cache entry type matches the given filter: a bit should be set in the filter if entries that have that bit set in their cache entry type are wanted in the enumeration.
The GroupId argument confines the search to entries that belong to a particular group. This argument may be 0 to enumerate entries regardless of group membership.
The lpFirstCacheEntryInfo argument provides the address of a buffer that is to receive information about the first enumerated entry. The information is produced as a fixed-size header followed by variable-sized data. The lpcbEntryInfo argument provides the address of a dword whose value on input is the size, in bytes, of the buffer. On output, the dword at lpcbEntryInfo may have changed to show how much information has been produced into the buffer or could be (were the buffer sufficiently large). The lpFirstCacheEntryInfo argument can be NULL (unless the 0x02 or 0x04 flag is specified) as an explicit query for how much information is available, but the size declared on input must in this case be zero, else the function misbehaves.
The lpGroupAttributes and lpcbGroupAttributes arguments are ignored. The lpReserved argument must be NULL.
If successful, the function returns a handle which can then be passed to the FindNextUrlCacheEntry or FindNextUrlCacheEntryEx functions to discover more entries and must be passed to FindCloseUrlCache when further enumeration is not wanted. The buffer at lpFirstCacheEntryInfo contains information about the first entry. The dword at lpcbEntryInfo tells how much information was produced into that buffer.
Failure is indicated by returning NULL. An error code is available from GetLastError. Two error codes are particularly important: ERROR_NO_MORE_ITEMS means the function has behaved correctly but found that no entries match the given criteria; and ERROR_INSUFFICIENT_BUFFER indicates that the function would have succeeded if given a buffer at least as large as now reported in the dword at lpcbEntryInfo.
The function expects to produce information in a buffer described by the lpFirstCacheEntryInfo and lpcbEntryInfo arguments. If lpcbEntryInfo is NULL, the function has no means to report how much memory is used (or needed) for the requested information, and so the function fails. The function also fails if a buffer is given at lpFirstCacheEntryInfo but its size as given through lpcbEntryInfo is zero. (Curiously, if no buffer is given, the function does not reject a non-zero size.) The last three arguments are all documented as reserved and the very last actually is checked: if lpReserved is not NULL, the function fails. The error code in all these cases is ERROR_INVALID_PARAMETER.
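The up-front checks described above can be summarised in a few lines of portable C. This is only a sketch of the stated rules, not WININET’s code: the helper name check_find_first_args is hypothetical, and the error codes are reproduced as plain constants so that the fragment compiles without the Windows headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Win32 error codes, reproduced here so the sketch builds anywhere. */
#define ERROR_SUCCESS            0u
#define ERROR_INVALID_PARAMETER  87u

/* Hypothetical helper: mirrors only the argument checks listed in the text. */
uint32_t check_find_first_args(const void *entry_info,
                               const uint32_t *pcb_entry_info,
                               const void *reserved)
{
    if (pcb_entry_info == NULL)
        return ERROR_INVALID_PARAMETER;  /* no way to report a size */
    if (entry_info != NULL && *pcb_entry_info == 0)
        return ERROR_INVALID_PARAMETER;  /* buffer given, but size is zero */
    if (reserved != NULL)
        return ERROR_INVALID_PARAMETER;  /* lpReserved must be NULL */
    /* A NULL buffer with a non-zero size is deliberately not rejected. */
    return ERROR_SUCCESS;
}
```

Note the asymmetry in the last comment: a buffer with no size is an error, but a size with no buffer is let through.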
If URL caching is not yet initialised, it gets initialised as part of this function. Among other things, this involves loading the registry configuration of all cache containers in the applicable registry set and creating default groups in the Content container. If this initialisation fails, so too does the function (having set ERROR_INTERNET_INTERNAL_ERROR as the error code).
This function has two jobs. One is to find a URL entry that matches the search criteria. The other is to arrange that subsequent calls to the FindNextUrlCacheEntry or FindNextUrlCacheEntryEx functions can find other entries that match the same search criteria. This function obtains memory for holding whatever needs to persist between such calls, and represents this memory by an opaque handle (actually a 1-based index into an array of pointers to such memory). If the necessary memory cannot be obtained or if a handle cannot be generated, the function fails (with ERROR_NOT_ENOUGH_MEMORY as the error code).
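To illustrate what a handle that is “a 1-based index into an array of pointers” might look like, here is a minimal sketch. Everything in it is an assumption made for illustration: the table capacity, the find_state layout and the three function names are hypothetical, and WININET’s real table is surely managed differently (with locking, for one thing).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_FIND_HANDLES 8  /* hypothetical capacity */

/* Hypothetical per-enumeration state: the saved search criteria. */
typedef struct {
    uint32_t flags;
    uint32_t filter;
    uint64_t group_id;
} find_state;

find_state *g_find_table[MAX_FIND_HANDLES];  /* slot i backs handle i + 1 */

/* Returns a 1-based handle, or 0 if no memory or no free slot is available. */
uintptr_t alloc_find_handle(uint32_t flags, uint32_t filter, uint64_t group_id)
{
    for (size_t i = 0; i < MAX_FIND_HANDLES; i++) {
        if (g_find_table[i] == NULL) {
            find_state *state = malloc(sizeof *state);
            if (state == NULL)
                return 0;
            state->flags = flags;
            state->filter = filter;
            state->group_id = group_id;
            g_find_table[i] = state;
            return (uintptr_t)(i + 1);
        }
    }
    return 0;
}

/* Translates a handle back to its state, or NULL if the handle is invalid. */
find_state *lookup_find_handle(uintptr_t handle)
{
    if (handle == 0 || handle > MAX_FIND_HANDLES)
        return NULL;
    return g_find_table[handle - 1];
}

/* Releases the state behind a handle and frees the slot for reuse. */
void free_find_handle(uintptr_t handle)
{
    if (handle == 0 || handle > MAX_FIND_HANDLES)
        return;
    free(g_find_table[handle - 1]);
    g_find_table[handle - 1] = NULL;
}
```

One consequence of counting from 1 is that zero is never a valid handle, which fits with NULL being the function’s indication of failure.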
The search criteria are saved in this memory and thus apply to the whole enumeration. The flags, filter and group ID are explicit search criteria. The URL search pattern is also a search criterion, but indirectly. When a pattern is given, the function treats it as a URL for which an entry might be created, and looks through all containers, both fixed and extensible, in the applicable registry set, to find the container in which that entry would be created. In general, that container is the one container that the whole enumeration is restricted to. There are two exceptions. If the selected container is the Content container and the 0x01 flag is set, the enumeration is widened to the three fixed containers, in the order Content, Cookies and History. Enumeration of all the fixed containers is also understood if no pattern is given, i.e., if the lpszUrlSearchPattern argument is NULL.
Container selection for a URL depends on how the URL starts. Configuration of URL cache containers is planned as the subject of a separate article. Particularly relevant here is the specification of a case-insensitive prefix. The Content container has no prefix. The Cookies and History containers have the prefixes “Cookie:” and “Visited:” respectively. Extensible containers get their prefix from the CachePrefix value in their registry key. A URL matches a container if the URL begins with the container’s prefix. An unmatched URL selects the Content container by default, except if the URL begins with a colon: in that case, it would not be stored in any container, and the function fails for having no matching entry to describe (and so ERROR_NO_MORE_ITEMS is the error code).
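The selection rules for the fixed containers can be sketched as follows. This is an illustration of the rules as described, not the actual implementation: the selection enumeration and both function names are hypothetical, and extensible containers, whose prefixes come from the registry, are left out entirely.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical outcome of matching a search pattern against the containers. */
typedef enum {
    SEL_NONE,       /* nothing to enumerate: ERROR_NO_MORE_ITEMS */
    SEL_CONTENT,
    SEL_COOKIES,
    SEL_HISTORY,
    SEL_ALL_FIXED   /* Content, Cookies and History, in that order */
} selection;

/* Case-insensitive test of whether s begins with prefix. */
int has_prefix_nocase(const char *s, const char *prefix)
{
    while (*prefix != '\0') {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;  /* also taken when s ends before prefix does */
        s++;
        prefix++;
    }
    return 1;
}

/* Fixed containers only; lookup of extensible containers is omitted. */
selection select_containers(const char *pattern, unsigned flags)
{
    if (pattern == NULL)
        return SEL_ALL_FIXED;
    if (has_prefix_nocase(pattern, "Cookie:"))
        return SEL_COOKIES;
    if (has_prefix_nocase(pattern, "Visited:"))
        return SEL_HISTORY;
    if (pattern[0] == ':')
        return SEL_NONE;  /* unmatched and would be stored nowhere */
    /* Unmatched: Content by default, widened if the 0x01 flag is set. */
    return (flags & 0x01u) ? SEL_ALL_FIXED : SEL_CONTENT;
}
```

For example, select_containers("http://example.com/", 0x01) gives SEL_ALL_FIXED, but the same flag makes no difference to a pattern that begins with “Cookie:”.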
Entries are enumerated in the order that their hash items appear in the container’s hash table. For the relevant structures, see the separate article on the INDEX.DAT file format. Only hash items for URL entries are considered: hash items for redirection entries are ignored. If the search criteria include a group, then a hash item that is not marked for group membership is ignored. If a hash item does not have a valid file offset for a URL entry, it is freed for reuse, and then ignored. The URL entry is ignored if the search criteria have no filter, i.e., if dwFilter is zero. (Indeed, in this case, no entries can ever be found.) If the entry has any bit set in its cache entry type that is not set in the filter and is not in the INCLUDE_BY_DEFAULT_CACHE_ENTRY collection (0x2200F1C0), it is ignored. If a group is specified for the search and the entry does not belong to that group, it is ignored. If the entry is in the Content container, it is ignored unless OTHER_USER_CACHE_ENTRY is set in the filter or the header information for the entry contains the case-insensitive characters ~U:username, in which the placeholder stands for the current user’s logon name. The first entry that survives all these tests is considered found. If no entry survives these tests, the function fails, with ERROR_NO_MORE_ITEMS as the error code.
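The test of cache entry type against the filter reduces to a little bit arithmetic. The sketch below covers just that one test (not the group, user or hash-item checks). The function name is hypothetical; the INCLUDE_BY_DEFAULT_CACHE_ENTRY value is the one quoted above.

```c
#include <stdint.h>

/* Bits that are accepted whether or not the filter asks for them. */
#define INCLUDE_BY_DEFAULT_CACHE_ENTRY 0x2200F1C0u

/* Returns 1 if an entry of the given type survives the filter test, else 0. */
int type_passes_filter(uint32_t entry_type, uint32_t filter)
{
    if (filter == 0)
        return 0;  /* an empty filter matches nothing at all */

    uint32_t allowed = filter | INCLUDE_BY_DEFAULT_CACHE_ENTRY;

    /* Any bit outside the allowed set disqualifies the entry. */
    return (entry_type & ~allowed) == 0;
}
```

So an entry whose type is 0x05 is rejected by a filter of 0x01, because the 0x04 bit is neither asked for nor included by default, but is accepted by a filter of 0x05.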
Now that an entry is found, the function’s success or failure is essentially just a matter of whether information about the entry can be copied to the given buffer. However, there are a few quirks.
Believe it or not, only now are the 0x02 and 0x04 flags rejected if the function was called through the Unicode form or if the lpFirstCacheEntryInfo argument is NULL. The error code is ERROR_INVALID_PARAMETER.
The information to be produced in the buffer is a fixed-sized header, in the form of an INTERNET_CACHE_ENTRY_INFO structure, to be followed by as many as four variable-sized items:
If the entry does not have a URL name, which should not be possible, the function ignores it, i.e., returns to the search. A URL entry need not have the others, however. The function tallies how much space it requires in total for the fixed-sized header and whichever of the variable-sized items are both wanted by the caller and possessed by the entry. The information to be copied to the buffer, and thus also the space to be required beyond the INTERNET_CACHE_ENTRY_INFO structure, depends on the flags. If the 0x04 flag is set, then the only variable-sized item that is wanted is the pathname for the local file. The function then requires additional space for MAX_PATH characters, no matter how long the pathname turns out to be. If the 0x04 flag is clear but the 0x02 flag is set, then no variable-sized data at all is wanted, and thus no extra space. When neither flag is set, which is the ordinary case, additional space is required for the URL name and for as many of the other three variable-sized items as the entry possesses. Moreover, each item is to be dword-aligned when copied into the buffer.
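The tally can be sketched as below. Several things here are assumptions: the flag names are invented (the flags are undocumented), the header size is passed in rather than taken from the Windows headers, each item size is taken to be whatever the function counts for that item, and alignment is modelled by rounding each item’s size up to a multiple of four. Since the 0x02 and 0x04 flags are valid only for the ANSI form, MAX_PATH characters are counted as MAX_PATH bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define FLAG_NO_VARIABLE_DATA 0x02u  /* hypothetical name */
#define FLAG_LOCAL_FILE_ONLY  0x04u  /* hypothetical name */
#define MAX_PATH_BYTES        260u   /* MAX_PATH ANSI characters */

/* Rounds n up to the next multiple of 4. */
static size_t align_dword(size_t n)
{
    return (n + 3u) & ~(size_t)3u;
}

/* item_sizes holds the byte counts of the URL name and of the up-to-three
   other variable-sized items; zero means the entry lacks that item. */
size_t required_size(size_t header_size, uint32_t flags,
                     const size_t item_sizes[4])
{
    size_t total = header_size;

    if (flags & FLAG_LOCAL_FILE_ONLY)
        return total + MAX_PATH_BYTES;  /* fixed, whatever the real length */
    if (flags & FLAG_NO_VARIABLE_DATA)
        return total;                   /* header only */

    for (size_t i = 0; i < 4; i++) {
        if (item_sizes[i] != 0)
            total += align_dword(item_sizes[i]);
    }
    return total;
}
```

With an 80-byte header and items of 21, 0, 130 and 4 bytes, the ordinary case comes to 80 + 24 + 132 + 4 = 240 bytes, the 0x02 case to 80, and the 0x04 case to 340 whether or not 0x02 is also set.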
If the computed requirement exceeds the size of the buffer, as declared through the lpcbEntryInfo argument, the function fails. The error code in this case is ERROR_INSUFFICIENT_BUFFER and the dword at lpcbEntryInfo is changed so that the caller may know how much space is required. Importantly, in the persistent state accessed through the handle, markers of where the enumeration has reached are reset so that if the function is called again, it will resume its search from the previously found entry, not from this one, and so will find this entry again.
Having established that the buffer is sufficiently large, the function copies each of the applicable variable-sized items in turn from the URL entry to the space after the INTERNET_CACHE_ENTRY_INFO structure and then fills in the structure. Most members of the INTERNET_CACHE_ENTRY_INFO are copied directly from counterparts in the URL entry. The exceptions are:
If an exception occurs while copying to the buffer, e.g., because only now is it found that the address given for the buffer is invalid for write access, the function not only fails, as one would hope, but sets ERROR_FILE_NOT_FOUND as the error code, which takes old jokes about cryptic error codes to new heights.
For a function that has existed so long and been documented all the while, this one has surprisingly many quirks.
If the 0x04 flag is set, then as noted above, the only variable-sized item wanted is the pathname for the found entry’s local file. Unfortunately, that the entry has a local file is merely assumed. If in fact the entry does not have a local file, a pathname is produced in the buffer but is spurious. Of course, this greatly reduces the usefulness of the 0x04 flag (not that this seems likely as the reason the flag is not documented).
The spurious pathname is partly predictable. The URL entry, as saved in its container file, has a member whose value is the offset from the start of the entry to the filename, or is zero to mean that the entry has no local file. When zero is accepted as the offset, the filename appears to be whatever is at the start of the URL entry. This is the “URL ” signature followed by a dword whose value is necessarily small enough that the first few bytes of the URL entry make a null-terminated string. Appended to the path for all the container’s local files, this phantom filename actually is copied to the buffer and pointed to by the lpszLocalFileName member.
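Why the start of a URL entry reads as a short string is easily demonstrated. The sketch below builds the first eight bytes of an entry, i.e., the “URL ” signature and the dword that follows, and measures the string that a zero filename offset would pick up. The helper names are hypothetical, and the dword is written in little-endian order on the assumption of an x86 container file.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes the "URL " signature and the following dword (little-endian). */
void make_entry_start(uint8_t out[8], uint32_t dword_after_signature)
{
    memcpy(out, "URL ", 4);
    for (int i = 0; i < 4; i++)
        out[4 + i] = (uint8_t)(dword_after_signature >> (8 * i));
}

/* Length of the null-terminated string found at offset zero of the entry,
   or 8 if none of the first eight bytes is null. */
size_t phantom_name_length(const uint8_t entry_start[8])
{
    const uint8_t *nul = memchr(entry_start, 0, 8);
    return nul != NULL ? (size_t)(nul - entry_start) : 8;
}
```

With 2 as the dword, the phantom filename is the five characters “URL ” and a byte 0x02. Any dword less than 0x01000000 has at least one null byte among its four, which is what keeps the string this short.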
It is already noted that when the function is called with no buffer, it does not insist that zero be declared as the buffer size. Not noted is what the function does about a non-zero size for a non-existent buffer. This anomalous combination isn’t even noticed until an entry has been found. If the 0x02 or 0x04 flags are set, then the lack of a buffer is itself an error. If the declared size is too small for all the information that might be produced for the entry, then the function complains of an insufficient buffer and all is well. But what if the declared size would be large enough for success had a buffer of that size actually been supplied?
The response in this case is to reduce the dword at lpcbEntryInfo to the size of information that could be produced for this entry, and then to ignore this entry and search for another! Since there is still no buffer, this case will recur until an entry is found for which there is more information to copy to the buffer than for the entry found before it. The function will then fail with ERROR_INSUFFICIENT_BUFFER as the error code but reporting the size required for information about the last entry that was found. This need not be adequate for the first entry that was found, and which would be found again by a repeat call that actually does provide a buffer.
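The effect is easier to follow when simulated. In the sketch below, required[] stands for the sizes of information that would be produced for the successive matching entries, and the function models one call with no buffer but a declared size at *pcb. The function name is hypothetical, and that the call ends with ERROR_NO_MORE_ITEMS when the entries run out is an assumption taken from the general rule for a search that finds nothing.

```c
#include <stddef.h>
#include <stdint.h>

#define ERROR_INSUFFICIENT_BUFFER 122u
#define ERROR_NO_MORE_ITEMS       259u

/* Models a call with lpFirstCacheEntryInfo NULL and *pcb as declared size.
   On return, *pcb holds whatever size the function last wrote there. */
uint32_t simulate_null_buffer_call(const uint32_t *required, size_t count,
                                   uint32_t *pcb)
{
    for (size_t i = 0; i < count; i++) {
        if (required[i] > *pcb) {
            *pcb = required[i];  /* reports the size for THIS entry */
            return ERROR_INSUFFICIENT_BUFFER;
        }
        /* Declared size would suffice: shrink it, skip the entry, go on. */
        *pcb = required[i];
    }
    return ERROR_NO_MORE_ITEMS;
}
```

Suppose the matching entries need 120, 100, 90 and 110 bytes and the caller declares 200. The declared size shrinks to 120, then 100, then 90, and the call fails at the fourth entry with 110 as the reported requirement, which is too small for the first entry that a repeat call with a real buffer would find.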
Some other curious coding has no consequence outside the function but may as well be noted as an example of how Microsoft’s introduction of string-safe functions is no substitute for getting programmers to think through what they’re doing rather than grasp mindlessly at supposed aids to security. Many components of Windows and Internet Explorer, and probably many other Microsoft products, were treated in the early 2000s to a revision in which calls to CRT functions such as strlen and strcpy were replaced with calls to new functions such as StringCchLength and StringCchCopy. WININET appears to be no different. I perhaps miss the point but it seems to me that the new functions are as open to abuse and oversight, albeit in new ways, as are the ones they replace. Changing to the new functions must sometimes change good code to bad. TO BE CONTINUED
The FindFirstUrlCacheEntryEx function is exported by name (with ANSI and Unicode suffixes) from WININET version 4.71 and higher. It has long been documented. The documentation would have it that no flags are currently implemented, though three are described above as meaningful. More importantly, the documentation gives the wrong error code, ERROR_NO_MORE_FILES, for the case of failure that is actually a successful discovery that no files match the criteria.
The behaviour described in this note is of version 7.0 from the original Windows Vista.