Geoff Chappell - Software Analyst
Windows 95 introduces a scheme for presenting statistics on system performance. The essential component in this scheme is a VxD named PERF.VXD, which is supplied by Microsoft in the standard Windows package. PERF acts as a performance statistics server. A VxD may register with the server as a performance statistics provider. A performance statistics client is an application that retrieves statistics from PERF more or less regularly for presentation to the user. The particular client that Microsoft supplies in the standard Windows package is called the System Monitor.
A statistic is any 32-bit performance measure that a VxD cares to provide. A statistic may be specified as requiring differentiation by the client, meaning that instead of reporting the statistic as provided by the VxD, the client is to compute and present the average rate per second at which the provided statistic has changed since the last sampling. In what follows, the term counter is used for the statistic as provided by the VxD and the term rate (of counted events per second) for the differentiated statistic presented to the user by the client.
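The differentiation described above can be sketched in a few lines. This is only an illustration of the idea, not code taken from any client: the function name `rate_per_second` is hypothetical, and the handling of counter wrap-around (taking the difference modulo 2^32) is an assumption about what a careful client would do with a 32-bit counter.

```python
def rate_per_second(prev_counter, curr_counter, elapsed_ms):
    """Average number of counted events per second between two samples.

    prev_counter and curr_counter are the 32-bit counter values reported
    by the provider at consecutive samples; elapsed_ms is the time
    between those samples in milliseconds.
    """
    # Assumption: the counter may wrap, so the increase is taken modulo 2**32.
    delta = (curr_counter - prev_counter) & 0xFFFFFFFF
    # Multiply before dividing so that no precision is lost to truncation.
    return delta * 1000 // elapsed_ms


print(rate_per_second(1000, 4000, 1500))
```

Here the counter rose by 3000 over 1.5 seconds, so the rate presented to the user would be 2000 events per second.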
When differentiated statistics are presented by the particular performance statistics client known as System Monitor, the rate may be rounded down to whole thousands.
Inspection reveals a coding error in the SYSMON.EXE program. Specifically, if the increase in the counter between samples is 65535 or more, then to get the rate, SYSMON first divides by the elapsed time in milliseconds and then multiplies by 1000. The intention appears to be the avoidance of overflow in 32-bit registers, but a consequence is to pick up a rounding error instead.
Observation of the effect is more likely and more significant when SYSMON is configured to sample at longer intervals. For instance, when sampling once per second, rounding down to whole thousands occurs only if the rate is at least 65535 events per second; but when sampling every 10 seconds, rounding down to whole thousands occurs if 65535 or more events were counted over the 10 seconds between samples, with the consequence that an average rate of 6554 events per second over the 10 seconds is presented to the user as just 6000 events per second.
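The arithmetic can be checked with a small model of the two paths. This is a sketch that follows the behaviour described above, not a disassembly: the function names are hypothetical, and the masking to 32 bits is included only to mimic register width.

```python
MASK32 = 0xFFFFFFFF


def sysmon_rate(delta, elapsed_ms):
    """Model of the rate computation as described for unpatched SYSMON."""
    if delta < 0xFFFF:
        # Small increase: multiply first, then divide.
        return ((delta * 1000) & MASK32) // elapsed_ms
    # Increase of 65535 or more: divide first, then multiply.
    # The integer division discards the remainder, so the result
    # is always a whole number of thousands.
    return ((delta // elapsed_ms) * 1000) & MASK32


def exact_rate(delta, elapsed_ms):
    """Reference computation without the early division."""
    return delta * 1000 // elapsed_ms


for delta in (65534, 65535):
    print(delta, sysmon_rate(delta, 10000), exact_rate(delta, 10000))
```

With a 10-second interval, an increase of 65534 is presented as 6553 events per second, but an increase of 65535 drops to 6000.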
The coding error is observed in SYSMON.EXE versions from Windows 95 and Windows 98. File sizes, dates and times for the versions inspected are:
|Version||Size||Date and Time||Package|
|4.00.950||65,024||09:50, 11th July 1995||Windows 95 upgrade|
| || ||11:11, 24th August 1996||Windows 95 OSR2|
|4.10.1998||81,920||19:01, 11th May 1998||Windows 98|
The problem can be corrected by patching better arithmetic into the SYSMON.EXE file. The three patch sites, given as offsets in bytes from the start of the file, vary with the version:
|4.00.950 (Windows 95)||262Fh, 263Fh and 2645h|
|4.10.1998 (Windows 98)||30B9h, 30C9h and 30CFh|
At the first site, the expected byte is 72. It is to be changed to EB. At the second site, the expected bytes are 69 C0. They are to be changed to C7 C2. At the third site, the expected bytes are 2B D2 for the Windows 95 version and 33 D2 for the Windows 98 version. They are to be changed to F7 E2.
If you are even slightly uncertain how to patch a file, do not try it.
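For readers who would rather script the change than use a hex editor, the following is a minimal sketch of how the patch might be applied defensively. The offsets and byte values are those given above. Everything else (the table layout, the function name `apply_patch`, and the choice to work on an in-memory copy) is an assumption made for illustration. The function refuses to change anything unless every expected byte is found at every site.

```python
# (offset, expected bytes, replacement bytes), as hex strings, per version.
PATCHES = {
    "4.00.950": [(0x262F, "72", "EB"), (0x263F, "69C0", "C7C2"), (0x2645, "2BD2", "F7E2")],
    "4.10.1998": [(0x30B9, "72", "EB"), (0x30C9, "69C0", "C7C2"), (0x30CF, "33D2", "F7E2")],
}


def apply_patch(image, version):
    """Return a patched copy of the file contents given in image (bytes).

    Raises ValueError, leaving nothing changed, if any patch site does
    not hold the bytes expected for the named version.
    """
    sites = PATCHES[version]
    data = bytearray(image)
    # First pass: verify every site before touching anything.
    for offset, expected, _ in sites:
        want = bytes.fromhex(expected)
        found = bytes(data[offset:offset + len(want)])
        if found != want:
            raise ValueError("unexpected bytes %s at offset %Xh" % (found.hex(), offset))
    # Second pass: overwrite in place; each replacement has the same length.
    for offset, _, replacement in sites:
        new = bytes.fromhex(replacement)
        data[offset:offset + len(new)] = new
    return bytes(data)
```

A cautious way to use this is to read SYSMON.EXE into memory, pass the contents to `apply_patch`, and write the result to a new file, keeping the original untouched as a backup.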
The following table presents on the left some instructions from near the patch site and on the right the instructions that change by applying the patch. Differences in version are accommodated by use of some symbols: zero stands for the sub or xor instruction in the Windows 95 and Windows 98 versions respectively; time and result are both ebx for Windows 95 but are ecx and esi respectively for Windows 98.
|Before patch||After patch|
|cmp eax,0000FFFFh|| |
|jb @f||jmp @f|
|zero edx,edx|| |
|div time|| |
|mov result,eax|| |
|imul result,result,1000|| |
|jmp done|| |
|@@:||@@:|
|imul eax,eax,1000||mov edx,1000|
|zero edx,edx||mul edx|
|div time||div time|
|mov result,eax||mov result,eax|
|jmp done||jmp done|
The effect of the patch is therefore first to render redundant the set of instructions that would divide first then multiply, and second, to change from using the imul instruction to mul. The imul instruction, in the form shown above, multiplies a 32-bit variable by a 32-bit constant and stores the result in a 32-bit register. If the result is too large for the 32-bit register, then the overflow is lost. The mul instruction multiplies a 32-bit variable by the contents of the 32-bit register eax and stores the result in a 64-bit combination of edx and eax. There can be no overflow to lose.
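The difference between the two multiplications can be illustrated with a small model of the registers. This is a sketch only: the helper names are hypothetical, and Python integers stand in for the 32-bit registers. The unpatched code never meets this overflow, because large increases take the divide-first path. Once the patch sends every case through the multiply-first path, the wider product matters.

```python
MASK32 = 0xFFFFFFFF


def imul32(a, b):
    """Three-operand imul: only the low 32 bits of the product are kept."""
    return (a * b) & MASK32


def mul_edx_eax(eax, operand):
    """One-operand mul: the full 64-bit product is returned as (edx, eax)."""
    product = eax * operand
    return product >> 32, product & MASK32


def div_edx_eax(edx, eax, divisor):
    """div: the 64-bit value edx:eax is divided by a 32-bit divisor.

    Assumes the quotient fits in 32 bits, as it does in the example below.
    """
    return ((edx << 32) | eax) // divisor


delta, elapsed_ms = 5000000, 10000   # 5 million events over 10 seconds

# imul loses the overflow, so the rate computed from it is wrong.
print(imul32(delta, 1000) // elapsed_ms)

# mul keeps the whole product in edx:eax, so the rate is right.
edx, eax = mul_edx_eax(delta, 1000)
print(div_edx_eax(edx, eax, elapsed_ms))
```

The first figure printed is 70503 and the second is 500000, which is the true rate for 5 million events in 10 seconds.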