Geoff Chappell, Software Analyst
The RtlUnicodeStringToInt64 function parses a 64-bit integer from a string.
NTSTATUS RtlUnicodeStringToInt64 ( UNICODE_STRING const *String, ULONG Base, LONG64 *Number, PWSTR *EndPointer);
The required String argument indirectly provides the size and address of an array of Unicode characters. These input characters seem intended to be as many at Buffer as fit within Length bytes, but subtleties apply. The Length, MaximumLength and Buffer members of the input structure all matter. The characters are treated as read-only.
The optional Base argument is the numerical base to use for parsing characters as digits. The supported bases are 2 to 36 inclusive. This argument can be zero to direct that the base be inferred from a prefix in the string or, failing that, be defaulted to 10.
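For illustration, the following C sketch models how a Base of zero might be resolved, using the prefix rules that are given below for the base indicator. It is a model of the described behaviour, not Microsoft's code. Characters are represented as uint16_t to match the 16-bit WCHAR regardless of the compiler's wchar_t, and the name model_resolve_base is hypothetical. It also assumes the C run-time convention that an explicit Base of 16 may be preceded by 0x or 0X.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model. 's' points just past any sign character.
   Returns the effective base, or 0 if the requested base is unsupported.
   *skip receives how many prefix characters are consumed as a base indicator. */
static unsigned model_resolve_base(const uint16_t *s, unsigned base, size_t *skip)
{
    /* Short-circuit keeps s[1] from being read when s[0] is the terminating null. */
    int has_0x = s[0] == 0x0030 && (s[1] == 0x0058 || s[1] == 0x0078);

    *skip = 0;
    if (base == 0) {
        if (s[0] != 0x0030)
            return 10;          /* no base indicator: default to 10 */
        if (!has_0x)
            return 8;           /* leading zero: octal, and the zero is itself a digit */
        *skip = 2;              /* "0x" or "0X": hexadecimal */
        return 16;
    }
    if (base < 2 || base > 36)
        return 0;               /* unsupported base */
    if (base == 16 && has_0x)
        *skip = 2;              /* assumed C run-time convention for explicit base 16 */
    return base;
}
```

A return of zero stands for an unsupported base, which the function treats not as failure but as evaluation to zero.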
The required Number argument is the address of a variable that is to receive the integer that the characters evaluate to.
The optional EndPointer argument is the address of a variable that is to receive a pointer to the first character that is not used for the evaluation. This argument can be NULL if the pointer is not wanted.
The function returns STATUS_SUCCESS if successful, else a negative error code.
It looks to be deliberate that variables at Number and EndPointer are set even on failure.
The RtlUnicodeStringToInt64 function is exported by name from the kernel in version 10.0 and higher.
The RtlUnicodeStringToInt64 function is not documented but a C-language declaration is published in WDM.H from the Windows Driver Kit (WDK).
The essence of the RtlUnicodeStringToInt64 function is to dress the C Run-Time (CRT) routine _wcstoi64 for kernel-mode programming. The broad strokes are:
If the UNICODE_STRING is prepared by a successful call to the RtlInitUnicodeStringEx function, then the characters to parse are the Length bytes at Buffer. If Buffer is not NULL, then these characters are followed by a null, and the Length bytes at Buffer are the non-null characters of a null-terminated string exactly as suitable for the _wcstoi64 routine. Other valid preparations of the UNICODE_STRING can be problematic. If the function thinks that Buffer might not address a null-terminated string, it double-buffers to ensure that what it passes to _wcstoi64 is null-terminated, but the implementation is quirky.
Remember that the UNICODE_STRING structure tells of MaximumLength bytes of memory at the address Buffer, the first Length bytes of which are in use as Unicode characters. Microsoft’s documentation of the structure has always been clear that the Length bytes need not (and ordinarily do not) contain a null character and need not be followed by a null character. Moreover, although addresses are valid for MaximumLength bytes at Buffer, the contents beyond the first Length bytes are undefined.
If MaximumLength is at least two more than Length, then the Buffer can contain a null character beyond the first Length bytes, but the function looks for this null character only at the end of the buffer, i.e., as the last whole character in the MaximumLength bytes. If this is indeed a null, then what _wcstoi64 parses is the Unicode characters in the first Length bytes plus the possibly undefined contents after the first Length bytes, up to the first null character. Unless this happens to be immediately after the Length bytes, the parsing may extend into the undefined (or stale) contents.
Double-buffering is done in all other cases, i.e., if MaximumLength is too small to allow for a null character beyond the first Length bytes or if the last whole character in the MaximumLength bytes happens not to be a null. The double buffer is on the stack. What _wcstoi64 parses is as many as 0x40 of the Unicode characters in the first Length bytes, up to the first null character.
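The choice between parsing in place and double-buffering, as described above, can be modelled roughly as below. This is a hypothetical sketch, not the actual implementation: model_ustr stands in for UNICODE_STRING with characters as uint16_t, the local buffer is assumed to hold 0x40 characters plus a terminating null, and Buffer is assumed to be non-NULL whenever MaximumLength is non-zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for UNICODE_STRING. */
typedef struct {
    uint16_t Length;        /* bytes in use */
    uint16_t MaximumLength; /* bytes addressable at Buffer */
    const uint16_t *Buffer;
} model_ustr;

#define MODEL_LOCAL_CHARS 0x40  /* cap on characters copied to the double buffer */

/* Returns the null-terminated string the model would hand to the parser.
   'local' must have room for MODEL_LOCAL_CHARS + 1 characters. */
static const uint16_t *model_select_input(const model_ustr *s,
                                          uint16_t local[MODEL_LOCAL_CHARS + 1],
                                          int *double_buffered)
{
    size_t len_chars = s->Length / 2;
    size_t max_chars = s->MaximumLength / 2;

    /* In place: room for a null beyond Length, and the LAST whole character
       of the MaximumLength bytes is a null. Nothing in between is checked. */
    if (s->MaximumLength >= s->Length + 2 && s->Buffer[max_chars - 1] == 0) {
        *double_buffered = 0;
        return s->Buffer;
    }

    /* Otherwise copy at most 0x40 characters from the first Length bytes,
       stopping early at a null, and terminate the copy. */
    size_t n = 0;
    while (n < len_chars && n < MODEL_LOCAL_CHARS && s->Buffer[n] != 0) {
        local[n] = s->Buffer[n];
        n++;
    }
    local[n] = 0;
    *double_buffered = 1;
    return local;
}
```

The in-place test looks only at the last whole character. For a Length of 2 and a MaximumLength of 8 over the characters 1, 2, 3 and null, the model hands over "123", not "1".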
An unsupported Base is not failure for the function, but just results in evaluation as zero.
Since Microsoft documents the parsing by _wcstoi64, a summary ought to suffice here. The parsing allows for the following elements in sequence, each being optional:
The sign indicator is a plus (0x002B) or minus (0x002D). The base indicator begins with a zero (0x0030). When Base is zero, a following upper- or lower-case X (0x0058 or 0x0078) makes the base 16; otherwise, the base is 8. Valid digits for a base are those characters that evaluate to less than the base. Characters from zero (0x0030) to nine (0x0039) count as 0 to 9; characters from 'A' to 'Z' (0x0041 to 0x005A) and 'a' to 'z' (0x0061 to 0x007A) count as 10 to 35.
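The digit rule reduces to a small lookup, sketched here in C with hypothetical names and with characters as uint16_t. It covers only the ranges listed above and is not taken from Microsoft's code.

```c
#include <stdint.h>

/* '0'-'9' -> 0..9, 'A'-'Z' and 'a'-'z' -> 10..35.
   Anything else gets 36, which is not less than any supported base. */
static unsigned model_digit_value(uint16_t c)
{
    if (c >= 0x0030 && c <= 0x0039) return c - 0x0030u;
    if (c >= 0x0041 && c <= 0x005A) return c - 0x0041u + 10;
    if (c >= 0x0061 && c <= 0x007A) return c - 0x0061u + 10;
    return 36;
}

/* A character is a valid digit for a base if its value is less than the base. */
static int model_is_digit_for_base(uint16_t c, unsigned base)
{
    return model_digit_value(c) < base;
}
```

Returning 36 for non-digits lets a single comparison with the base reject both non-digits and digits that are too large for the base.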
Failure to parse into these elements is not failure for the function, but just results in evaluation as zero. Unless the parsing finds at least one valid digit, the address returned through the EndPointer argument is the Buffer member from String.
Evaluation starts as zero and accumulates as an unsigned 64-bit integer for as many characters as are valid for the base, including none. If a minus sign is present as the sign indicator, this unsigned evaluation is negated to produce the returned evaluation.
If a minus sign is present, overflow occurs if the unsigned evaluation exceeds 0x80000000`00000000. With no minus sign, overflow occurs if the unsigned evaluation exceeds 0x7FFFFFFF`FFFFFFFF. Overflow is failure: the function returns STATUS_INTEGER_OVERFLOW. The overflow limit becomes the evaluation, i.e., 0x80000000`00000000 (the most negative 64-bit integer) if a minus sign is present, else 0x7FFFFFFF`FFFFFFFF. The address returned through the EndPointer argument is that of the digit that caused the overflow.
Without overflow, the address returned through the EndPointer argument is that of one character past the last valid digit. Note that this need not be a valid address!
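The evaluation and overflow rules above can be put together as a C sketch. It models the description, not Microsoft's code: the function name and the model_status values are hypothetical stand-ins for the NTSTATUS codes, characters are uint16_t, and the input is assumed to start just past any sign and base indicator, with the base already resolved to 2 to 36. The returned index is relative to that input. A zero index without overflow means that no valid digit was found, in which case, as described above, the function instead returns the Buffer member through EndPointer.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { MODEL_SUCCESS, MODEL_INTEGER_OVERFLOW } model_status;

/* Digit value by the ranges given in the text; 36 means "not a digit". */
static unsigned model_digit(uint16_t c)
{
    if (c >= 0x30 && c <= 0x39) return c - 0x30u;
    if (c >= 0x41 && c <= 0x5A) return c - 0x41u + 10;
    if (c >= 0x61 && c <= 0x7A) return c - 0x61u + 10;
    return 36;
}

/* Accumulates digits from 'p' as an unsigned 64-bit value.
   *stop receives the index of the first unused character: one past the
   last valid digit, or the digit that caused overflow. */
static model_status model_accumulate(const uint16_t *p, unsigned base, int negative,
                                     int64_t *number, size_t *stop)
{
    const uint64_t limit = negative ? UINT64_C(0x8000000000000000)
                                    : UINT64_C(0x7FFFFFFFFFFFFFFF);
    uint64_t value = 0;
    size_t i = 0;
    unsigned d;

    while ((d = model_digit(p[i])) < base) {
        /* Tests value * base + d > limit without wrapping. */
        if (value > (limit - d) / base) {
            *stop = i;                                  /* the offending digit */
            *number = negative ? INT64_MIN : INT64_MAX; /* the limit becomes the evaluation */
            return MODEL_INTEGER_OVERFLOW;
        }
        value = value * base + d;
        i++;
    }
    *stop = i;
    /* value <= 2^63 here; 2^63 is handled separately to avoid signed overflow. */
    if (negative)
        *number = value == UINT64_C(0x8000000000000000) ? INT64_MIN : -(int64_t)value;
    else
        *number = (int64_t)value;
    return MODEL_SUCCESS;
}
```

The overflow test is rearranged as value > (limit - d) / base so that it cannot itself wrap around. For decimal input, 9223372036854775808 overflows at its last digit without a minus sign but evaluates exactly to the most negative 64-bit integer with one.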