Strange Things LINK Knows about 80x86 Processors

The history of the dominant operating systems designed for Intel’s processors is such that software intended to work at or near the level of the operating system is sometimes developed by programmers who did not write the operating system itself. These programmers mean to add features to the operating system—and it may be that inventive additions by programmers outside Microsoft have contributed much to the stability and success of Microsoft’s operating systems in the consumer market.

Because they work more closely with the processor than some think should ever be necessary in a well-designed operating system, low-level programmers of Intel’s 80x86 processors are occasionally faced with the practical problem of identifying the processor more precisely than simply distinguishing an 80386 from an 80486 from a Pentium. It is also the nature of these programmers to wonder whether they are working from the full story—and suspicions of this sort have not been allayed by the occasional revelation that Intel’s processors support instructions that Intel does not document.

Going at least as far back as the use of an undocumented LOADALL instruction for a RAM disk driver in an early DOS version, Microsoft has been seen to know more about Intel’s processors than could be learned just by studying Intel’s 80x86 manuals. This is only to be expected in programs that Microsoft writes as components of the operating systems that Intel’s processors are most often bought to support, though it does raise the question of whether similar information is as readily available to developers of utility programs that replace or enhance operating system functionality or to other designers of operating systems for Intel’s processors. That unusual knowledge of these processors occasionally finds its way into applications (most notably, development tools such as assemblers, compilers and linkers) is more clearly unsatisfactory, since in the market of these programs, Microsoft is generally just one of many.

In 1997, I examined two Microsoft programs for unusual knowledge of Intel’s processors, intending to write a paper in two parts. The first showed Microsoft’s 32-bit linker, which was then fairly new, as knowing opcodes for as many as 15 instructions that do not seem to have been documented for everyone. A second part was to look at the operating system kernel from Windows NT 4.0 for some finer points of CPU identification, but it never got beyond a draft. (That said, see CPU Identification by the Windows Kernel, which treads some of the same ground but is up-to-date for Windows Vista.)

This article is essentially what was originally published as a Word document titled Strange Things That Microsoft Knows About Intel’s 80x86 Processors. It describes how a linker that was supplied with different versions of Microsoft Visual C++ knows the opcodes and operand requirements of a dozen or so 80x86 instructions that Intel does not seem to have documented for general knowledge: LOADALL, CFLSH, WRECR, RDECR, SVDC, RSDC, SVLDT, RSLDT, SVTS, RSTS, SMINT, XBTS, IBTS and ZALLOC.

(I am grateful to Robert Collins and Christian Ludloff for their separate information that seven of these, namely SVDC, RSDC, SVLDT, RSLDT, SVTS, RSTS and SMINT, are not actually Intel’s instructions but are documented as instructions for Cyrix’s 80x86 look-alike processors. I would never have thought to look.)

Versions

In the versions of Microsoft Visual C++ for developing 32-bit applications, the linker has an option to dump the contents of its input files (which may be object files or executables). You turn LINK into a COFF Binary File Dumper by giving LINK the /dump switch on the command line or, more usually, by running LINK indirectly from a stub program called DUMPBIN.

The following table shows the versions of LINK that have been examined for this paper. All come from releases of Microsoft Visual C++: the linker supplied with the Windows 95 DDK is apparently an amended version of the linker from some release of Microsoft Visual C++ 2.0.[1]

File Version    Source
2.60.5046       Windows 95 DDK
3.00.5270       Microsoft Visual C++ 4.0
3.10.6038       Microsoft Visual C++ 4.1
4.20.6164       Microsoft Visual C++ 4.2

Among the features offered for the file dump is a simple disassembly of code sections. This option is invoked by also giving the linker the /disasm switch. Microsoft’s linker can recognise files developed for many processors—indeed, for a few more than are listed in relevant Microsoft documentation—and can disassemble code for all but one of them.[2]

Machine ID    Description    Is Disassembly Supported?
014Ch         i386           yes
0162h         R3000          yes
0166h         R4000          yes
0168h         R10000         yes
0184h         Alpha AXP      yes
01F0h         PPC            yes
0268h         M68K           yes
0290h         PARISC         no
0601h         MPPC           yes

The R10000 machine is a relatively recent addition to the list of supported machines: it is not recognised by LINK version 2.60.
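For readers who want to check a file's machine type themselves, the following is a minimal sketch in Python. It relies on the fact that a COFF object file begins with the file header, whose first field is the 16-bit little-endian machine ID. A PE executable would first need its COFF header located after the PE signature, which is not attempted here. The names MACHINES and describe_machine are hypothetical, chosen for this illustration only. The descriptions and the disassembly column are copied from the table above.

```python
import struct

# Machine IDs, descriptions and disassembly support, as listed in the table above.
MACHINES = {
    0x014C: ("i386", True),
    0x0162: ("R3000", True),
    0x0166: ("R4000", True),
    0x0168: ("R10000", True),
    0x0184: ("Alpha AXP", True),
    0x01F0: ("PPC", True),
    0x0268: ("M68K", True),
    0x0290: ("PARISC", False),
    0x0601: ("MPPC", True),
}


def describe_machine(header: bytes):
    """Return (machine_id, description, can_disassemble) for a COFF object header.

    Assumes `header` starts at the COFF file header, so that the machine ID
    is the first little-endian word. Unlisted IDs are reported as "unknown".
    """
    if len(header) < 2:
        raise ValueError("need at least two bytes of header")
    (machine_id,) = struct.unpack_from("<H", header, 0)
    description, can_disassemble = MACHINES.get(machine_id, ("unknown", False))
    return machine_id, description, can_disassemble
```

Given the first bytes of an object file, for instance from open(path, "rb").read(20), the function reports whether the table above lists the machine and whether LINK would be expected to disassemble its code.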

The Undocumented Instructions

Over the four versions studied, the disassembler for Intel’s processors knows of 15 instructions that do not appear in opcode maps supplied with Intel’s widely available manuals. The following table shows these unusual opcodes and the corresponding instructions, using placeholders to represent the operands that LINK includes with the instruction mnemonic when disassembling.

Opcode    Mnemonic    Operands
0F 05     LOADALL
0F 07     LOADALL     esi16 or esi32
0F 0A     CFLSH
0F 34     WRECR
0F 36     RDECR
0F 78     SVDC        mem80,sreg
0F 79     RSDC        sreg,mem80
0F 7A     SVLDT       mem80
0F 7B     RSLDT       mem80
0F 7C     SVTS        mem80
0F 7D     RSTS        mem80
0F 7E     SMINT
0F A6     XBTS        reg16,r/m16 or reg32,r/m32
0F A7     IBTS        r/m16,reg16 or r/m32,reg32
0F AE     ZALLOC      mem256

Placeholders for operands are adapted from the convention used in Intel’s manuals. Thus, reg16 and reg32 stand for 16-bit and 32-bit general registers, and sreg stands for a segment register. The r/m16 and r/m32 combinations may be filled by a register or by a memory reference. The mem80 and mem256 placeholders are for references to memory only, specifically to ten-byte and 32-byte variables. The esi16 and esi32 placeholders are for references to memory but with DS:SI or DS:ESI as the implied address.
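The opcode table can be expressed as a simple lookup, sketched here in Python. This is only an illustration of the table as data, not a reconstruction of LINK's own disassembly tables, whose format is not described in this article. It ignores prefixes, ModRM bytes and operand decoding, and it merges all four versions even though no single version is said to recognise every entry. The names UNUSUAL_0F_OPCODES and lookup_unusual are hypothetical.

```python
# Second opcode byte (after the 0F escape byte) mapped to the mnemonic and the
# operand placeholders that the table above gives for LINK's disassembly.
UNUSUAL_0F_OPCODES = {
    0x05: ("LOADALL", ""),
    0x07: ("LOADALL", "esi16 or esi32"),
    0x0A: ("CFLSH", ""),
    0x34: ("WRECR", ""),
    0x36: ("RDECR", ""),
    0x78: ("SVDC", "mem80,sreg"),
    0x79: ("RSDC", "sreg,mem80"),
    0x7A: ("SVLDT", "mem80"),
    0x7B: ("RSLDT", "mem80"),
    0x7C: ("SVTS", "mem80"),
    0x7D: ("RSTS", "mem80"),
    0x7E: ("SMINT", ""),
    0xA6: ("XBTS", "reg16,r/m16 or reg32,r/m32"),
    0xA7: ("IBTS", "r/m16,reg16 or r/m32,reg32"),
    0xAE: ("ZALLOC", "mem256"),
}


def lookup_unusual(code: bytes):
    """Return (mnemonic, operands) if `code` starts with a listed opcode, else None."""
    if len(code) >= 2 and code[0] == 0x0F:
        return UNUSUAL_0F_OPCODES.get(code[1])
    return None
```

Calling lookup_unusual(bytes([0x0F, 0x78])) yields the SVDC entry, while a documented opcode such as 0F A2 yields None because it is not among the unusual opcodes listed.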

Strictly speaking, recognition of opcodes by LINK does not imply that the corresponding instructions ever existed for any of Intel’s processors. It could be, for instance, that the programmer who prepared LINK’s disassembly tables worked from an opcode map that reflected only some intention at Intel. On the other hand, Microsoft’s use of an opcode map with more detail than the one Intel makes available to most programmers is clearly no one-shot: different versions of the linker use disassembly tables that support different selections from the preceding table and which do not match up easily with published opcode maps for successive processors.

Consider that LINK version 2.60 knows of all the documented Pentium instructions. It does not know of the FCOMI, FCOMIP, FUCOMI and FUCOMIP instructions that Intel documents as being introduced for the Pentium Pro, nor of the RDPMC instruction that Intel documents as being available on the Pentium Pro and the Pentium with MMX technology. Yet this version of LINK does recognise opcodes for the CMOVcc and FCMOVcc instructions that Intel documents as having been introduced for the Pentium Pro.[3]

It is possible that the disassembly tables in LINK version 2.60 were prepared for the Pentium Pro, but some instructions were omitted by oversight. It may be that the conditional move instructions were designed first among new instructions for the Pentium Pro and were merely anticipated when the disassembly tables were prepared. Finally, it could be that the conditional move instructions existed, undocumented, on at least some Pentium processors and that this was known to whoever prepared the disassembly tables for LINK version 2.60.

LINK version 3.00 knows of all instructions that are documented for the Pentium Pro, but not of any MMX instructions. It is also the only version studied that recognises opcodes for the seven instructions SVDC, RSDC, SVLDT, RSLDT, SVTS, RSTS and SMINT. Note that disassembly of the opcode 0Fh 7Eh as SMINT conflicts with Intel’s (presumably later) assignment of that opcode as a MOVD instruction for reading a dword from an MMX register. LINK version 3.10, which introduces support for MMX instructions, drops all seven of these instructions, which we may surmise exist only on the Pentium Pro, if at all.

Operands that LINK gives for the instructions SVDC and RSDC are consistent with an interpretation of the mnemonics as suggesting that the instructions save and restore the internal descriptor cache that corresponds to a given segment register. The descriptor cache would presumably consist of a dword each for the base and limit, and a word of access rights and other flags. A similar interpretation of SVLDT, RSLDT, SVTS and RSTS as operating on the internal descriptors for the current LDT and TSS would have those instructions also access ten bytes of memory.
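The arithmetic behind the ten-byte operand is easy to check. The sketch below, in Python, packs a dword base, a dword limit and a word of access rights with no padding and confirms that the result occupies ten bytes, matching the mem80 placeholder. The order of the fields in memory is an assumption made purely for illustration, since nothing in LINK reveals the actual layout. The names DESCRIPTOR_CACHE and pack_descriptor are hypothetical.

```python
import struct

# Hypothetical layout (field order is an assumption, not taken from LINK):
# 32-bit base, 32-bit limit, 16-bit access rights and flags, little-endian,
# with no padding between fields.
DESCRIPTOR_CACHE = struct.Struct("<IIH")


def pack_descriptor(base: int, limit: int, access: int) -> bytes:
    """Pack one presumed descriptor-cache image into its ten-byte form."""
    return DESCRIPTOR_CACHE.pack(base, limit, access)


image = pack_descriptor(0x00100000, 0x0000FFFF, 0x0093)
print(DESCRIPTOR_CACHE.size, len(image))
```

Both numbers printed are 10, which is at least consistent with the operand size that LINK shows for SVDC, RSDC, SVLDT, RSLDT, SVTS and RSTS.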

A Grain of Salt

The correctness of LINK’s knowledge of operands should not be taken for granted, however. LINK makes rather too many errors even when disassembling documented instructions:

There are also instructions that LINK decodes correctly but represents inappropriately (not that the difference is anything but a fine point). Examples that involve some significant loss of information from the disassembly are:

Of the undocumented instructions recognised by LINK, the 80386 LOADALL (opcode 0Fh 07h) is certainly disassembled with the wrong operand. For the purpose of listing operands, LINK treats the 80386 LOADALL the same as it treats LODSW and LODSD, so that the operand is shown to be a word or dword at DS:SI or DS:ESI (depending on the operand and address sizes). The reality is that the 80386 LOADALL takes its operand from ES:DI or ES:EDI (depending on the address size) and works with a 0127h-byte region of memory rather than a word or dword.[6] Curiously, Microsoft has better information in another of its programs: the WDEB386.EXE debugger that Microsoft includes with various Windows SDKs and DDKs shows the 80386 LOADALL as taking the byte at ES:DI or ES:EDI as its operand.

Software Analysis

From the perspective of Software Analysis as a technique of software development or as a future academic discipline, it is very interesting that LINK’s opcode tables have so many errors, even for instructions that are well-known. Indeed, this article’s primary motivation was not to list some undocumented CPU instructions but to demonstrate the feasibility and practical value of having a second person check a first person’s programming work for errors.

Successive versions of the program have corrected some errors but not others, which suggests that the program’s manufacturer has a will to have the correct tables but a difficulty in detecting the errors. Moreover, with successive upgrades to support instructions for new processors, new errors have been introduced. The program’s opcode tables are presumably generated through macros. Although these may be convenient for development, they may also obscure errors from someone who reviews the program’s source code. The manufacturer could have detected more errors by having someone review the relevant code and data as actually generated in the program. The article demonstrates that this can be done even by someone external to the manufacturer, who has no prior knowledge of the format of those tables and, still less, any access to the source code. It may even be that such a process of review is commercially feasible.


[1] See the README.TXT file in the MSVC20 directory for a hint that none of the linkers from the several Microsoft Visual C++ 4.x releases can be relied on to link object files correctly if building Virtual Device Drivers (VxDs), even though documentation in these products continues to describe a /vxd switch.

[2] Machine types are documented in Microsoft’s Portable Executable and Common Object File Format (PE/COFF) Specification 4.1, which is on the MSDN Library CD, and as symbols beginning with IMAGE_FILE_MACHINE in the WINNT.H header file supplied with both Microsoft Visual C++ and the Win32 SDK.

[3] For information about instruction compatibility, see the Intel Architecture Software Developer’s Manual, Volume 2: Instruction Set Reference for the Pentium Pro, Order Number 243191, and available via anonymous FTP at download.intel.com in the directory design/pro/manuals.

[4] A Microsoft Knowledge Base (MSKB) article that describes these errors and notes any fixes is something I should be happy to cite, but I could not find one, for instance by looking among articles listed as containing “DUMPBIN” in the Knowledge Base collections on the MSDN Library CD (January 1997), the Microsoft Visual C++ 4.2 CD and the TechNet CD (February 1997). Analysts of Microsoft’s bug-reporting practices may care to consider that no matter how many articles in the MSKB describe bugs, there is no reason to suppose that the MSKB lists even a tiny proportion of bugs known to Microsoft.

[5] See for instance the opcode maps presented as Appendix A in Intel’s 386 DX Microprocessor Programmer’s Reference Manual (Order Number 230985) or, more recently and available on-line from www.intel.com, the Pentium Pro Family Developer’s Manual, Volume 3: Operating System Writer’s Guide (Order Number 242692). The latter at least has a footnote to mark opcode 82h as reserved.

[6] This instruction seems to have been brought to wide attention by Robert Collins in an article published in Tech Specialist, October, 1991, and available on-line at www.x86.org (which is probably where most interested readers will find it for the first time). Before publication of that article, the instruction was certainly known to BIOS developers, the most notable use being for emulation of the 80286 LOADALL on 80386 machines.