Geoff Chappell - Software Analyst
SKETCH OF HOW RESEARCH MIGHT CONTINUE AND RESULTS BE PRESENTED - PREVIEW ONLY
The introductory sequence for a character constant is a single-quote, optionally preceded by a prefix.
An S in the position of prefix does not introduce a character constant. It is instead a fatal error (C1190), unless compiling for managed code (as with the /clr option) and either the /E option is active or the System.Object symbol has been successfully imported as metadata (as by the #using <mscorlib.dll> directive).
The optional c-char-sequence begins immediately after the opening single-quote. It is any number of elements of the following types in any order:
A single-quote where a c-char or escape-sequence is permitted terminates the c-char-sequence. It is an error (C2001) if the line ends without this closing single-quote.
Data for the character constant is built initially as a string, with a null character appended. It is an error (C2026) if this string data gets too long. The present limit is roughly 2048 bytes. (The imprecision arises when the string data is produced as wide characters. Each conversion of one or two source-set characters to a wide character uses space further into the same buffer in which the string data is built, so the limit is reached when the string data is a few bytes short of 2048. Exactly how short depends on the mix of single-byte and double-byte characters as the limit is approached.)
If prefix is absent, then each c-char or escape-sequence specifies one character or byte, respectively, of string data. Be aware, however, of a subtlety in the ordering. Once an escape sequence has been encountered in the c-char-sequence, the byte for each further escape sequence is added not to the end of the string data but to the start. With a prefix, each c-char or escape-sequence specifies one wide character of string data, in the natural order.
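The ordering rule for a constant without prefix can be modelled as below. This is a minimal sketch, not the compiler's own code: the element type and build_narrow_data are hypothetical names, each element is assumed to be already reduced to the single byte it specifies, and the sketch reads the rule as leaving the first escape sequence at the end while c-chars are always appended.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical representation of one c-char-sequence element: either a
   plain c-char or the single byte that an escape-sequence evaluates to. */
typedef struct {
    int is_escape;       /* nonzero for an escape-sequence */
    unsigned char byte;  /* the character or byte it specifies */
} element;

/* Models the ordering described above for a constant without prefix.
   Writes the string data (without the null terminator) to out, which must
   have room for n bytes, and returns the number of bytes written. */
size_t build_narrow_data(const element *elems, size_t n, unsigned char *out)
{
    size_t len = 0;
    int seen_escape = 0;

    for (size_t i = 0; i < n; i++) {
        if (elems[i].is_escape && seen_escape) {
            /* Every escape-sequence after the first goes to the start. */
            memmove(out + 1, out, len);
            out[0] = elems[i].byte;
        } else {
            /* c-chars, and the first escape-sequence, go to the end. */
            out[len] = elems[i].byte;
            if (elems[i].is_escape)
                seen_escape = 1;
        }
        len++;
    }
    return len;
}
```

For the elements a, \x01, b, \x02 in that order, the sketch produces the bytes 02, 'a', 01, 'b' rather than the source order.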
The character constant becomes one token, a constant token, whose value is obtained from the string data specified by c-char-sequence (less the null terminator).
When prefix is absent, it is an error (C2137) if there is no string data, i.e., if c-char-sequence is omitted, and it is an error (C2015) if there are more than 4 bytes of string data. The number formed from the string data, interpreting successive bytes as base-256 digits from most significant to least, becomes the value of the token. The type of the constant is a char if there is one byte of string data, else an int.
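The checks and the base-256 arithmetic for a constant without prefix can be sketched as follows. The names narrow_value and narrow_result are hypothetical, and the sketch returns the raw number only: it does not model how that number is then represented as a char or an int.

```c
#include <stddef.h>

/* Hypothetical result codes mirroring the diagnostics named above. */
enum narrow_result {
    NARROW_OK,
    NARROW_EMPTY,    /* no string data: C2137 */
    NARROW_TOO_LONG  /* more than 4 bytes: C2015 */
};

/* data/len: string data without the null terminator.
   On success, *value receives the base-256 number (first byte most
   significant) and *is_char is set when the type would be char. */
enum narrow_result narrow_value(const unsigned char *data, size_t len,
                                unsigned long *value, int *is_char)
{
    if (len == 0)
        return NARROW_EMPTY;
    if (len > 4)
        return NARROW_TOO_LONG;

    unsigned long v = 0;  /* unsigned long holds at least 32 bits */
    for (size_t i = 0; i < len; i++)
        v = v * 256 + data[i];

    *value = v;
    *is_char = (len == 1);
    return NARROW_OK;
}
```

Under this model, string data 'a' 'b' gives 0x6162 with type int, and a single byte gives that byte's value with type char.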
With a prefix, the first wide character of string data (else zero, if c-char-sequence is omitted) becomes the value of the token. The type of the constant is a wchar_t if the /Zc:wchar_t option is active, else an unsigned short. Wide characters other than the first are ignored, with a warning (C4066).
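The rule for a constant with prefix reduces to taking the first wide character. In this sketch, wide_value is a hypothetical name, and wide characters are assumed to be 16 bits and held as unsigned short.

```c
#include <stddef.h>

/* data/len: wide characters of string data, without the null terminator.
   Returns the token's value (zero if c-char-sequence was omitted) and sets
   *warn_c4066 when wide characters beyond the first are ignored. */
unsigned int wide_value(const unsigned short *data, size_t len, int *warn_c4066)
{
    *warn_c4066 = (len > 1);
    return len == 0 ? 0u : data[0];
}
```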
In general, the characters of a character constant are read (and the preceding notes on syntax are to be interpreted) as if trigraphs and line splices are already translated.
An exception exists, whether by design or oversight. Where the introduction to a character constant has both a prefix and single-quote, the two must be consecutive in the actual input stream. Separation by a trigraph or line splice prevents recognition of prefix as starting a character constant.
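The difference between the general rule and the exception can be illustrated by comparing a check on the raw input with the same check after line splices are removed. This is a sketch under stated assumptions: remove_line_splices and raw_prefix_recognized are hypothetical names, trigraphs are not handled, and taking L as the prefix is an assumption for illustration.

```c
#include <string.h>

/* Removes line splices (a backslash immediately followed by a newline).
   out must have room for strlen(in) + 1 bytes. Trigraphs are not handled. */
void remove_line_splices(const char *in, char *out)
{
    while (*in) {
        if (in[0] == '\\' && in[1] == '\n') {
            in += 2;  /* drop the splice entirely */
            continue;
        }
        *out++ = *in++;
    }
    *out = '\0';
}

/* The exception described above: prefix and single-quote must be
   adjacent in the input actually examined. */
int raw_prefix_recognized(const char *p, char prefix)
{
    return p[0] == prefix && p[1] == '\'';
}
```

Given the raw input L, backslash, newline, 'x', the check fails on the raw text even though the spliced text begins L'x', which is where the general rule would lead one to expect a character constant with prefix.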