@@ -207,18 +207,17 @@
 \terminal{\textbackslash U} hex-quad hex-quad
 \end{bnf}
 
-The character designated by the \grammarterm{universal-character-name} \tcode{\textbackslash
-U00NNNNNN} is that character
-that has \tcode{U+NNNNNN} as a code point short identifier;
-the character designated by the \grammarterm{universal-character-name}
-\tcode{\textbackslash uNNNN} is that character
-that has \tcode{U+NNNN} as a code point short identifier.
-If a \grammarterm{universal-character-name} does not correspond to
-a code point in ISO/IEC 10646 or
-if a \grammarterm{universal-character-name} corresponds to
-a surrogate code point,
-the program is ill-formed. Additionally, if
-a \grammarterm{universal-character-name} outside
+A \grammarterm{universal-character-name}
+designates the character in ISO/IEC 10646 (if any)
+whose code point is the hexadecimal number represented by
+the sequence of \grammarterm{hexadecimal-digit}s
+in the \grammarterm{universal-character-name}.
+The program is ill-formed if that number is not a code point
+or if it is a surrogate code point.
+Noncharacter code points and reserved code points
+are considered to designate separate characters distinct from
+any ISO/IEC 10646 character.
+If a \grammarterm{universal-character-name} outside
 the \grammarterm{c-char-sequence}, \grammarterm{s-char-sequence}, or
 \grammarterm{r-char-sequence} of
 a character or
@@ -228,10 +227,10 @@
 \grammarterm{r-char-sequence}\iref{lex.string} does not form a
 \grammarterm{universal-character-name}.}
 \begin{note}
-ISO/IEC 10646 code points are within the range 0x0-0x10FFFF (inclusive).
-A surrogate code point is a value in the range 0xD800-0xDFFF (inclusive).
+ISO/IEC 10646 code points are integers in the range $[0, \mathrm{10FFFF}]$ (hexadecimal).
+A surrogate code point is a value in the range $[\mathrm{D800}, \mathrm{DFFF}]$ (hexadecimal).
 A control character is a character whose code point is
-in either of the ranges 0x0-0x1F or 0x7F-0x9F (both inclusive).
+in either of the ranges $[0, \mathrm{1F}]$ or $[\mathrm{7F}, \mathrm{9F}]$ (hexadecimal).
 \end{note}
 
 \pnum
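The numeric check that the new universal-character-name wording performs can be sketched as a small compile-time helper. This is an illustration, not wording from the standard: `UcnValue`, `classify_ucn_value`, and `is_control` are hypothetical names, the sketch assumes C++17 (for `static_assert` without a message), and it models only the value of the hexadecimal digits, not the separate rule about control and basic characters outside literals.

```cpp
#include <cstdint>

// Hypothetical classification of the number spelled by the hexadecimal
// digits of a universal-character-name (illustration only).
enum class UcnValue { ok, not_a_code_point, surrogate };

constexpr UcnValue classify_ucn_value(std::uint32_t n) {
    if (n > 0x10FFFF) {
        return UcnValue::not_a_code_point;  // outside [0, 10FFFF]: ill-formed
    }
    if (n >= 0xD800 && n <= 0xDFFF) {
        return UcnValue::surrogate;         // in [D800, DFFF]: ill-formed
    }
    return UcnValue::ok;                    // designates a character
}

// Control characters as described in the note: [0, 1F] or [7F, 9F].
constexpr bool is_control(std::uint32_t n) {
    return n <= 0x1F || (n >= 0x7F && n <= 0x9F);
}

static_assert(classify_ucn_value(0x0041) == UcnValue::ok);        // U+0041
static_assert(classify_ucn_value(0xD800) == UcnValue::surrogate); // U+D800
static_assert(classify_ucn_value(0x110000) == UcnValue::not_a_code_point);
// A noncharacter such as U+FFFF is not an error under the new wording.
static_assert(classify_ucn_value(0xFFFF) == UcnValue::ok);
static_assert(is_control(0x7F) && !is_control(0xA0));
```

The noncharacter case is the interesting one: under the added sentence, U+FFFF designates a character distinct from any ISO/IEC 10646 character instead of making the program ill-formed.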
@@ -1144,7 +1143,7 @@
 provided that the code point value
 can be encoded as a single UTF-8 code unit.
 \begin{note}
-That is, provided the code point value is in the range 0x0-0x7F (inclusive).
+That is, provided the code point value is in the range $[0, \mathrm{7F}]$ (hexadecimal).
 \end{note}
 If the value is not representable with a single UTF-8 code unit,
 the program is ill-formed.
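The single-code-unit condition for UTF-8 character literals reduces to a bound check on the code point. Here is a minimal sketch, assuming C++17 or later. `fits_single_utf8_code_unit` is a hypothetical helper, not a standard facility.

```cpp
#include <cstdint>

// Hypothetical helper: a code point can be encoded as a single UTF-8 code
// unit exactly when it lies in [0, 7F].
constexpr bool fits_single_utf8_code_unit(std::uint32_t code_point) {
    return code_point <= 0x7F;
}

// U+0041 is one UTF-8 code unit, so a u8 literal for it is well-formed.
static_assert(fits_single_utf8_code_unit(0x41));
static_assert(u8'A' == 0x41);

// U+00E9 needs two UTF-8 code units, so a u8 character literal
// for it would be ill-formed.
static_assert(!fits_single_utf8_code_unit(0xE9));
```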
@@ -1163,7 +1162,7 @@
 provided that the code point value is
 representable with a single 16-bit code unit.
 \begin{note}
-That is, provided the code point value is in the range 0x0-0xFFFF (inclusive).
+That is, provided the code point value is in the range $[0, \mathrm{FFFF}]$ (hexadecimal).
 \end{note}
 If the value is not representable
 with a single 16-bit code unit, the program is ill-formed.
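The analogous condition for UTF-16 character literals can be sketched the same way. This assumes C++17 or later, and `fits_single_utf16_code_unit` is a hypothetical helper. It checks only the width. Surrogate values fit in 16 bits but are already rejected by the universal-character-name rule.

```cpp
#include <cstdint>

// Hypothetical helper: a code point fits in a single 16-bit code unit
// exactly when it lies in [0, FFFF]. Surrogates are not excluded here;
// they are rejected earlier, when the universal-character-name is checked.
constexpr bool fits_single_utf16_code_unit(std::uint32_t code_point) {
    return code_point <= 0xFFFF;
}

// U+00E9 fits in one 16-bit code unit, so a u literal for it is well-formed.
static_assert(fits_single_utf16_code_unit(0xE9));
static_assert(u'\u00E9' == 0x00E9);

// U+1F600 needs a surrogate pair, so a u character literal
// for it would be ill-formed.
static_assert(!fits_single_utf16_code_unit(0x1F600));
```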
@@ -1685,9 +1684,13 @@
 character requiring a surrogate pair, plus one for the terminating
 \tcode{u'\textbackslash 0'}. \begin{note} The size of a \tcode{char16_t}
 string literal is the number of code units, not the number of
-characters. \end{note} Within \tcode{char32_t} and \tcode{char16_t}
-string literals, any \grammarterm{universal-character-name}{s} shall be within the range
-\tcode{0x0} to \tcode{0x10FFFF}. The size of a narrow string literal is
+characters. \end{note}
+\begin{note}
+Any \grammarterm{universal-character-name}{s} are required to
+correspond to a code point in the range
+$[0, \mathrm{D800})$ or $[\mathrm{E000}, \mathrm{10FFFF}]$ (hexadecimal)\iref{lex.charset}.
+\end{note}
+The size of a narrow string literal is
 the total number of escape sequences and other characters, plus at least
 one for the multibyte encoding of each \grammarterm{universal-character-name}, plus
 one for the terminating \tcode{'\textbackslash 0'}.
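The size rule for `char16_t` string literals can be checked at compile time with `std::size`, which returns the array extent, including the terminating null. This sketch assumes C++17 (for `std::size` from `<iterator>`). It compares the same character in a `char32_t` literal, where every character occupies one code unit.

```cpp
#include <cstddef>
#include <iterator>

// U+1F600 lies outside [0, FFFF], so a char16_t string literal encodes it
// as a surrogate pair: two code units, plus one for the terminating u'\0'.
constexpr std::size_t utf16_units = std::size(u"\U0001F600");
static_assert(utf16_units == 3);

// In a char32_t string literal the same character is one code unit.
constexpr std::size_t utf32_units = std::size(U"\U0001F600");
static_assert(utf32_units == 2);

// A character in [0, FFFF], such as U+00E9, needs one 16-bit code unit.
static_assert(std::size(u"\u00E9") == 2);
```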