Geoff Chappell, Software Analyst
The KPROFILE is the structure in which the kernel keeps information about an active request to examine and act on a profile interrupt.
The name KPROFILE is hypothesised, Microsoft’s name not being known even from symbol files. That is some measure of the structure’s being internal to the kernel. Against that, however, is that the structure is in the formal scheme of kernel objects that start with a type from the KOBJECTS enumeration and a size. Many such objects are documented, if only as being opaque, and have full C-language definitions in headers from as far back as the Device Driver Kit (DDK) for Windows NT 3.51. The main difference is that the documented kernel objects can be caller-supplied but the only creator of a KPROFILE is the kernel itself.
Historically, a KPROFILE is created only when user-mode software has called the undocumented NtCreateProfile or NtCreateProfileEx function to describe what execution to sample via what sort of profile interrupt, subject to what conditions, with what storage of results, and has then proceeded to start this profiling by calling NtStartProfile. The kernel creates a KPROFILE, which then carries its conditions for which interrupts it will act on and its parameters for what’s to be done as this action. Among the possible conditions is that profiling can be specific to a process or may apply globally. The KPROFILE goes into a per-process list, whose head is the ProcessListHead very near the start of the KPROCESS, or into a global list, whose head is in the kernel’s own data section. The KPROFILE is removed from its list and destroyed when the corresponding profiling is stopped, typically by a call to NtStopProfile.
Profile interrupts are arranged with the Hardware Abstraction Layer (HAL), either to recur periodically or when some limit is reached for a processor-specific Performance Monitoring Counter (PMC). Whenever the kernel learns of a profile interrupt’s occurrence, from the HAL via KeProfileInterrupt or KeProfileInterruptWithSource, the global list of profile objects and the list for the current process are both examined and acted on.
Originally, and even still for a profile object that is created as described in the preceding paragraphs, the examination and action are tightly constrained by the inputs to the NtCreateProfile and NtCreateProfileEx functions. The examination matches the circumstances of the interrupt against the conditions that are recorded in the profile object. That the interrupted execution is for the process that was specified at the profile’s creation is known from the object’s presence in the list for the current process at the time of the interrupt. Among the other conditions are that the interrupt was generated from the specified profile source, that the interrupted processor is among those specified for profiling, and that the interrupted address lies within the profiled region.
If all these conditions are met, the action is simply to increment an execution count in a specified buffer according to where the interrupted execution lies within the profiled address range. The set of these execution counts is then a frequency distribution of execution within the profiled region, as sampled by the recurring profile interrupts.
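The examination and action just described can be sketched in portable C. This is a model, not the kernel’s code: the names are invented, plain arrays stand in for the global and per-process lists, a single 64-bit mask stands in for the affinity (so processor groups and the x86 segment are ignored), and the execution counts are assumed to be 32-bit, one per bucket of 2 raised to the power of the stored bucket shift plus two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a basic profile object: the names are invented
   and only the members that matter to the examination are kept. */
typedef struct {
    int        started;       /* non-zero while started */
    int16_t    source;        /* profile source */
    uint64_t   affinity;      /* bit n set if processor n is profiled */
    uintptr_t  range_start;   /* start address of profiled region */
    uintptr_t  range_end;     /* non-inclusive end address */
    unsigned   bucket_shift;  /* log2 of bucket size, less two */
    uint32_t  *buffer;        /* one 32-bit execution count per bucket */
} basic_profile;

/* Examine one profile object against an interrupt at address pc on
   processor cpu from profile source src, and count it if it qualifies. */
static void examine_and_act(basic_profile *p, uintptr_t pc,
                            unsigned cpu, int16_t src)
{
    assert(cpu < 64);   /* this model ignores processor groups */
    if (!p->started || p->source != src)
        return;
    if ((p->affinity & ((uint64_t)1 << cpu)) == 0)
        return;
    if (pc < p->range_start || pc >= p->range_end)  /* end is non-inclusive */
        return;
    /* Each bucket of 2^(bucket_shift + 2) bytes has one count. */
    size_t index = (size_t)((pc - p->range_start) >> (p->bucket_shift + 2));
    p->buffer[index]++;
}

/* On each profile interrupt, both the global list and the current
   process's list are examined. Arrays stand in for the kernel's lists. */
static void on_profile_interrupt(basic_profile *global, size_t nglobal,
                                 basic_profile *process, size_t nprocess,
                                 uintptr_t pc, unsigned cpu, int16_t src)
{
    for (size_t i = 0; i < nglobal; i++)
        examine_and_act(&global[i], pc, cpu, src);
    for (size_t i = 0; i < nprocess; i++)
        examine_and_act(&process[i], pc, cpu, src);
}
```

With a bucket shift of 2, each count covers 16 bytes of code, so a 64-byte region needs four counts; an interrupt at the region’s non-inclusive end, on an unprofiled processor, or from another profile source is not counted.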
Except that qualification by profile source and executing processor had to wait for version 3.51, all this is in place right from version 3.10. Moreover, this basic profiling has changed remarkably little in the decades since. For present purposes, arguably the main change is simply in the object type at the beginning of every KPROFILE that is created as described above: it is 0x0F up to and including version 3.51 but 0x17 ever after.
As Windows developed, however, the kernel allowed other reasons to ask the HAL to generate profile interrupts and thus acquired more things to do on learning of a profile interrupt’s occurrence. Except for processing the applicable lists of profile objects, all that the kernel originally did with profile interrupts was to count them. Starting with Windows XP, however, the kernel allows that profile interrupts can be arranged not for building a profile in a specified buffer, as above, but for recording each one in an event trace. Such special cases in the handling of profile interrupts had accreted enough by Windows 8 that some unification must have seemed worthwhile. This took the form of introducing a second type of profile object, apparently thought of as a profile callback object.
The object type at the beginning of every KPROFILE that is created specifically as a profile callback object is 0x11. For a profile callback object, the examination is less specific but the action is very general. The only condition to meet is whether the interrupt was generated from the expected profile source. The action to be taken is left to an essentially arbitrary callback routine. The Windows 10 kernel supplies three routines for profile callback objects. One is for internal bookkeeping (to do with cache errata support) but two are for behaviour that can be (and typically is) directed from user mode for event tracing.
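The profile callback object’s much simpler examination can be modelled the same way. Again the names are hypothetical, a one-member structure stands in for the KTRAP_FRAME, and the example callback merely counts, where the kernel’s own routines would write an event or do their bookkeeping. That the started indicator applies to callback objects as to basic profile objects is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the KTRAP_FRAME: only the interrupted address. */
typedef struct { uintptr_t pc; } trap_frame;

/* Callback: first argument is the trap frame, second the context. */
typedef void (*profile_callback)(trap_frame *frame, void *context);

/* Hypothetical model of a profile callback object (invented names). */
typedef struct {
    int              started;
    int16_t          source;   /* the only condition to meet */
    profile_callback routine;  /* held where a basic profile has its start address */
    void            *context;  /* held where a basic profile has its end address */
} callback_profile;

/* If the interrupt came from the expected source, hand everything
   else to the callback routine. */
static void dispatch_callback(const callback_profile *p, trap_frame *frame,
                              int16_t src)
{
    assert(p->routine != NULL);
    if (p->started && p->source == src)
        p->routine(frame, p->context);
}

/* Example callback, standing in for writing an event: counts calls. */
static void count_samples(trap_frame *frame, void *context)
{
    (void)frame;
    ++*(unsigned *)context;
}
```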
A built-in profile callback object for a periodically recurring profile interrupt is “started” by enabling PERF_PROFILE (0x20000002) in the group mask for an NT Kernel Logger session. The documented way to do this from user mode is to set EVENT_TRACE_FLAG_PROFILE (0x01000000) in the EnableFlags member of an EVENT_TRACE_PROPERTIES structure that is given to the StartTrace and ControlTrace functions when starting or controlling an NT Kernel Logger session. The event that results on each interrupt has the hook ID PERFINFO_LOG_TYPE_SAMPLED_PROFILE (0x0F2E).
An array of up to four profile callback objects can be dynamically allocated for similar event tracing on receipt of profile interrupts that are generated from processor-specific performance monitoring counters. Little or nothing is documented about the steps required for arranging this. The counters must be specified in advance. The only known way from user mode is through TraceSetInformation with the information class TraceProfileSourceConfigInfo (0x06). The profiling of these sources, however, is not supported through the EnableFlags, only the group mask. The bit to set is PERF_PMC_PROFILE (0x20000400), again through TraceSetInformation but for the information class TraceSystemTraceEnableFlagsInfo (0x04). The event that results on each interrupt has the hook ID PERFINFO_LOG_TYPE_PMC_INTERRUPT (0x0F2F).
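Both group-mask values above, PERF_PROFILE and PERF_PMC_PROFILE, have the same top three bits. A reading of such values that is assumed here, not established above, is that the top three bits select one of eight 32-bit masks and the remaining 29 bits are the flag within that mask. The sketch below builds such an array with hypothetical helper names; whether this array is exactly what TraceSetInformation expects for TraceSystemTraceEnableFlagsInfo is also an assumption, and no call is made.

```c
#include <assert.h>
#include <stdint.h>

#define GROUP_MASK_COUNT 8                /* assumed: eight 32-bit masks */

#define PERF_PROFILE     0x20000002u
#define PERF_PMC_PROFILE 0x20000400u

/* Assumed encoding: the top three bits of a group-mask value select
   one of the masks, the other 29 bits are the flag within it. */
static void set_group_flag(uint32_t masks[GROUP_MASK_COUNT], uint32_t value)
{
    unsigned index = value >> 29;
    assert(index < GROUP_MASK_COUNT);
    masks[index] |= value & 0x1FFFFFFFu;
}

static int group_flag_is_set(const uint32_t masks[GROUP_MASK_COUNT],
                             uint32_t value)
{
    uint32_t bits = value & 0x1FFFFFFFu;
    return (masks[value >> 29] & bits) == bits;
}
```

Under this assumption, enabling both bits leaves 0x402 in the second mask and every other mask clear.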
Though the KPROFILE is internal, it is almost as stable as many a documented structure, presumably as a side-effect of its very tightly constrained use. After version 3.51 allowed for specification of the profile source and of which processors will have their execution profiled, the only formal change is for Windows 7 to support more than 32 or 64 processors by way of processor groups. That the size then increases for 64-bit Windows 8 is simply from allowing for more processor groups.
|Version||Size (x86)||Size (x64)|
|3.10 to 3.50||0x28|
|3.51 to 6.0||0x2C||0x58|
|6.2 to 10.0||0x34||0xF8|
The layout below does not present C-language definitions of members, as could be done if type information were available from symbol files. This avoids making up names but complicates the structure’s description in its two roles. Broadly speaking, members at the start and end are common to both roles.
|Offset (x86)||Offset (x64)||Type||Description|
|0x00||0x00||word||object type from KOBJECTS enumeration|
|0x02||0x02||word||size, in bytes, of this kernel object|
|0x04||0x08||LIST_ENTRY||linkage in per-process or global list|
|0x0C||0x18||pointer||address of EPROCESS for profiled process|
|0x10||0x20||pointer||basic profile object: start address of profiled region; profile callback object: address of callback routine|
|0x14||0x28||pointer||basic profile object: non-inclusive end address of profiled region; profile callback object: context argument for callback routine (first argument is address of interrupt’s KTRAP_FRAME)|
|0x18||0x30||dword||basic profile object: two less than the logarithm base 2 of size, in bytes, of bucket for sampling the profiled area|
|0x1C||0x38||pointer||basic profile object: address of buffer that is to receive execution counts|
|0x20 (3.10 to 3.50)|| ||byte||non-zero while started; later at offset 0x2A|
|0x24 (3.10 to 3.50); 0x20 (3.51 and higher)||0x40||dword||basic profile object (x86 only): segment address of profiled region, else zero|
|0x24 (3.51 and higher)||0x48||KAFFINITY (3.51 to 6.0); KAFFINITY_EX (6.1 and higher)||processors to be profiled|
|0x28 (3.51 to 6.0);||0x50 (5.2 to 6.0);||signed word||profile source for generation of profile interrupt|
|0x2A (3.51 to 6.0);||0x52 (5.2 to 6.0);||byte||non-zero while started; previously at offset 0x20|
Note that the profile source in this structure does not have the full width of the KPROFILE_SOURCE enumeration. Since it is sometimes sign-extended when read, it is here thought to be a SHORT rather than USHORT. The one-byte boolean indicator of whether profiling has started moved for version 3.51 to space that was left by alignment after the narrowed profile source.
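For readers who would rather check the layout arithmetically, the offsets for versions 3.51 to 6.0 can be restated as C structures with compile-time assertions. The member and structure names are invented for the purpose and are not Microsoft’s; pointers are modelled by fixed-width integers and padding is explicit, so the assertions hold whatever the width of the compiling machine’s own pointers. The small function at the end illustrates why the narrowed profile source is taken as signed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Names invented for illustration. Pointers are fixed-width integers and
   all padding is explicit, so offsets do not depend on the host. */

typedef struct {                  /* x86, versions 3.51 to 6.0 */
    uint16_t Type;                /* 0x00: from KOBJECTS */
    uint16_t Size;                /* 0x02 */
    uint32_t ListFlink;           /* 0x04: LIST_ENTRY */
    uint32_t ListBlink;           /* 0x08 */
    uint32_t Process;             /* 0x0C: EPROCESS pointer */
    uint32_t RangeStart;          /* 0x10 */
    uint32_t RangeEnd;            /* 0x14: non-inclusive */
    uint32_t BucketShift;         /* 0x18 */
    uint32_t Buffer;              /* 0x1C */
    uint32_t Segment;             /* 0x20 */
    uint32_t Affinity;            /* 0x24: KAFFINITY */
    int16_t  Source;              /* 0x28: narrowed KPROFILE_SOURCE */
    uint8_t  Started;             /* 0x2A */
    uint8_t  Pad;                 /* 0x2B */
} PROFILE_X86_V351;

typedef struct {                  /* x64, versions 5.2 to 6.0 */
    uint16_t Type;                /* 0x00 */
    uint16_t Size;                /* 0x02 */
    uint32_t Pad0;                /* 0x04 */
    uint64_t ListFlink;           /* 0x08 */
    uint64_t ListBlink;           /* 0x10 */
    uint64_t Process;             /* 0x18 */
    uint64_t RangeStart;          /* 0x20 */
    uint64_t RangeEnd;            /* 0x28 */
    uint32_t BucketShift;         /* 0x30 */
    uint32_t Pad1;                /* 0x34 */
    uint64_t Buffer;              /* 0x38 */
    uint32_t Segment;             /* 0x40: meaningful on x86 only */
    uint32_t Pad2;                /* 0x44 */
    uint64_t Affinity;            /* 0x48: KAFFINITY */
    int16_t  Source;              /* 0x50 */
    uint8_t  Started;             /* 0x52 */
    uint8_t  Pad3[5];             /* 0x53 */
} PROFILE_X64_V52;

static_assert(offsetof(PROFILE_X86_V351, Source) == 0x28, "x86 source");
static_assert(sizeof(PROFILE_X86_V351) == 0x2C, "x86 size");
static_assert(offsetof(PROFILE_X64_V52, Affinity) == 0x48, "x64 affinity");
static_assert(sizeof(PROFILE_X64_V52) == 0x58, "x64 size");

/* Read back with sign extension: a stored 0xFFFF widens to -1,
   not 65535, which is the behaviour of a SHORT. */
static int32_t read_source(const PROFILE_X86_V351 *p)
{
    return p->Source;
}
```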
Special mention must be made of what the profile object records of the profiled region’s end address. As input to NtCreateProfile and NtCreateProfileEx, the profiled region is described by its address and size. Adding the two produces a non-inclusive end address, which is what’s described in the layout above. The intention seems plain that an interrupted instruction lies in the profiled region if its address is greater than or equal to the start address and less than the non-inclusive end address.
You may be wondering why an article that can exist only for advanced programmers troubles over so simple a point. And then you might infer that there must be at least an ambiguity, if not an outright defect, in the implementation. And so there is, but in the reverse direction from usual.
For most of the history of Windows there’s not even ambiguity. Up to and including Windows 7, the end address that’s saved in the profile object is the sum of the address and size, and when the profile object is examined on receipt of a profile interrupt this end address is interpreted as non-inclusive. Had the code been left like that, then the layout above would say “non-inclusive end address of profiled region” and that would be that. I certainly don’t want to treat my readers as if basic knowledge of their craft would better be spelt out in laboured detail.
Unfortunately, when the introduction of profile callback objects for Windows 8 brought a reworking of the code for KeProfileInterruptWithSource, the reworking introduced a simple error of arithmetic. The end address that’s saved in a profile object is still the sum of address and size, but when the profile object is examined at interrupt time this non-inclusive end address is instead interpreted as inclusive. A consequence is that after a sequence of correctly formed calls to create and start a profile for which the buffer that receives the execution counts just happens to end at a page boundary, chance execution at exactly the non-inclusive end of the profiled area crashes Windows!
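The arithmetic of the defect can be shown with a small model. It assumes, consistently with the stored bucket shift being two less than the logarithm of the bucket size, that each bucket has a 32-bit count whose byte offset in the buffer is the interrupted address’s offset into the region, shifted right by the bucket shift and rounded down to a multiple of 4. The function names are hypothetical and this is not the kernel’s instruction sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset of the 32-bit count for an address in the region: the
   address's offset shifted right by the stored bucket shift (log2 of the
   bucket size, less two), rounded down to a multiple of 4. */
static size_t count_offset(uintptr_t start, uintptr_t pc, unsigned shift)
{
    assert(pc >= start);
    return (size_t)((pc - start) >> shift) & ~(size_t)3;
}

/* Intended test: the saved end address is non-inclusive. */
static int in_range_intended(uintptr_t start, uintptr_t end, uintptr_t pc)
{
    return pc >= start && pc < end;
}

/* Defective test from Windows 8 on: the same non-inclusive end address
   is treated as inclusive, so pc == end also passes. */
static int in_range_defective(uintptr_t start, uintptr_t end, uintptr_t pc)
{
    return pc >= start && pc <= end;
}
```

For a region of 0x1000 bytes at 0x10000, sampled with 4-byte buckets (a stored shift of zero), the counts occupy exactly 0x1000 bytes, at offsets 0 to 0xFFC. The intended test rejects the address 0x11000, but the defective test admits it, and its count lies at offset 0x1000: the first byte past the buffer and, if the buffer ends at a page boundary, on the next page.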
This defect persists at least to the 1709 release of Windows 10. Presumably, Microsoft will correct it some time, but don’t hold your breath: it was explained to them in December 2016. Perhaps I didn’t explain it well enough. Perhaps it really is too subtle. Perhaps it just got lost as incidental to the different coding error that can cause the same crash and which I reported concurrently. Still, while this defect goes unfixed, it remains true that one or another simple error in the kernel’s coding allows that even a low-integrity user-mode program can crash every known version of Windows.