@@ -208,21 +208,31 @@
 \end{bnf}
 
 The character designated by the \grammarterm{universal-character-name} \tcode{\textbackslash
-UNNNNNNNN} is that character whose character short name in ISO/IEC 10646 is
-\tcode{NNNNNNNN}; the character designated by the \grammarterm{universal-character-name}
-\tcode{\textbackslash uNNNN} is that character whose character short name in
-ISO/IEC 10646 is \tcode{0000NNNN}. If the hexadecimal value for a
-\grammarterm{universal-character-name} corresponds to a surrogate code point (in the
-range 0xD800--0xDFFF, inclusive), the program is ill-formed. Additionally, if
-the hexadecimal value for a \grammarterm{universal-character-name} outside
+U00NNNNNN} is that character
+that has \tcode{U+NNNNNN} as a code point short identifier;
+the character designated by the \grammarterm{universal-character-name}
+\tcode{\textbackslash uNNNN} is that character
+that has \tcode{U+NNNN} as a code point short identifier.
+If a \grammarterm{universal-character-name} does not correspond to
+a code point in ISO/IEC 10646 or
+if a \grammarterm{universal-character-name} corresponds to
+a surrogate code point,
+the program is ill-formed. Additionally, if
+a \grammarterm{universal-character-name} outside
 the \grammarterm{c-char-sequence}, \grammarterm{s-char-sequence}, or
 \grammarterm{r-char-sequence} of
 a character or
-string literal corresponds to a control character (in either of the
-ranges 0x00--0x1F or 0x7F--0x9F, both inclusive) or to a character in the basic
+string literal corresponds to a control character or
+to a character in the basic
 source character set, the program is ill-formed.\footnote{A sequence of characters resembling a \grammarterm{universal-character-name} in an
 \grammarterm{r-char-sequence}\iref{lex.string} does not form a
 \grammarterm{universal-character-name}.}
+\begin{note}
+ISO/IEC 10646 code points are within the range 0x0-0x10FFFF (inclusive).
+A surrogate code point is a value in the range 0xD800-0xDFFF (inclusive).
+A control character is a character whose code point is
+in either of the ranges 0x0-0x1F or 0x7F-0x9F (both inclusive).
+\end{note}
 
 \pnum
 The \defnx{basic execution character set}{character set!basic execution} and the
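The revised wording maps \uNNNN and \U00NNNNNN to the code point U+NNNN or U+NNNNNN, and makes the program ill-formed for values that are not ISO/IEC 10646 code points, for surrogates, and (outside literals) for control characters and basic source characters. The C++ sketch below restates those checks as constexpr predicates, purely as an illustration: the function names are hypothetical and not standard library facilities, the basic source character set is assumed to be the C++20 one (91 graphic characters plus space, horizontal tab, vertical tab, form feed and new-line), and the code assumes C++17 or later.

```cpp
// Hypothetical helpers restating the checks on a universal-character-name value.

// ISO/IEC 10646 code points lie in 0x0-0x10FFFF.
constexpr bool is_code_point(char32_t cp) { return cp <= 0x10FFFF; }

// Surrogate code points lie in 0xD800-0xDFFF.
constexpr bool is_surrogate(char32_t cp) { return cp >= 0xD800 && cp <= 0xDFFF; }

// Control characters lie in 0x0-0x1F or 0x7F-0x9F.
constexpr bool is_control(char32_t cp) {
    return cp <= 0x1F || (cp >= 0x7F && cp <= 0x9F);
}

// Assumption: the C++20 basic source character set (96 characters).
constexpr bool is_basic_source_char(char32_t cp) {
    if (cp == U' ' || cp == U'\t' || cp == U'\v' || cp == U'\f' || cp == U'\n') {
        return true;
    }
    // The 91 graphic characters.
    constexpr char graphic[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789"
        "_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'";
    for (char c : graphic) {
        // Skip the terminating null of the array.
        if (c != '\0' && static_cast<char32_t>(c) == cp) {
            return true;
        }
    }
    return false;
}

// Inside a character or string literal: must be a non-surrogate code point.
constexpr bool ucn_ok_in_literal(char32_t cp) {
    return is_code_point(cp) && !is_surrogate(cp);
}

// Outside literals: additionally no control or basic source characters.
constexpr bool ucn_ok_outside_literal(char32_t cp) {
    return ucn_ok_in_literal(cp) && !is_control(cp) && !is_basic_source_char(cp);
}

// \u00E9 designates U+00E9; \U0001F600 designates U+1F600.
static_assert(U'\u00E9' == 0xE9);
static_assert(U'\U0001F600' == 0x1F600);
static_assert(ucn_ok_in_literal(0x41));        // \u0041 is fine inside a literal
static_assert(!ucn_ok_outside_literal(0x41));  // but not, e.g., in an identifier
static_assert(!ucn_ok_in_literal(0xD800));     // surrogate code point
static_assert(!ucn_ok_in_literal(0x110000));   // beyond 0x10FFFF
```

All checks run at compile time, so the file compiling is itself the demonstration.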
@@ -1132,8 +1142,10 @@
 The value of a UTF-8 character literal
 is equal to its ISO/IEC 10646 code point value,
 provided that the code point value
-is representable with a single UTF-8 code unit
-(that is, provided it is in the C0 Controls and Basic Latin Unicode block).
+can be encoded as a single UTF-8 code unit.
+\begin{note}
+That is, provided the code point value is in the range 0x0-0x7F (inclusive).
+\end{note}
 If the value is not representable with a single UTF-8 code unit,
 the program is ill-formed.
 A UTF-8 character literal containing multiple \grammarterm{c-char}{s} is ill-formed.
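The new note pins the single-code-unit condition to the range 0x0-0x7F. As a hedged illustration (assuming C++17 or later, where u8 character literals exist; their type is char in C++17 and char8_t in C++20), the sketch below checks that range with a hypothetical helper fits_one_utf8_unit, and counts UTF-8 code units with a hypothetical utf8_length to show why any larger code point needs more than one unit.

```cpp
// Hypothetical helper: a code point fits in one UTF-8 code unit iff it is 0x0-0x7F.
constexpr bool fits_one_utf8_unit(char32_t cp) { return cp <= 0x7F; }

// Hypothetical helper: number of UTF-8 code units for a code point in 0x0-0x10FFFF.
// Surrogates are not valid input; this sketch does not check for them.
constexpr int utf8_length(char32_t cp) {
    return cp <= 0x7F ? 1 : cp <= 0x7FF ? 2 : cp <= 0xFFFF ? 3 : 4;
}

// The value of a UTF-8 character literal equals its code point value.
static_assert(u8'a' == 0x61);
static_assert(fits_one_utf8_unit(U'a'));

// U+00E9 needs two UTF-8 code units, so u8'\u00E9' would be ill-formed.
static_assert(!fits_one_utf8_unit(0xE9));
static_assert(utf8_length(0xE9) == 2);
```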
@@ -1148,8 +1160,11 @@
 is a character literal of type \tcode{char16_t}. The value
 of a \tcode{char16_t} character literal containing a single \grammarterm{c-char} is
 equal to its ISO/IEC 10646 code point value, provided that the code point value is
-representable with a single 16-bit code unit (that is, provided it is in the
-basic multi-lingual plane). If the value is not representable
+representable with a single 16-bit code unit.
+\begin{note}
+That is, provided the code point value is in the range 0x0-0xFFFF (inclusive).
+\end{note}
+If the value is not representable
 with a single 16-bit code unit, the program is ill-formed. A \tcode{char16_t} character literal
 containing multiple \grammarterm{c-char}{s} is ill-formed.
 
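The corresponding condition for char16_t character literals is that the code point lies in 0x0-0xFFFF. A minimal compile-time sketch, assuming C++17 or later; fits_one_utf16_unit is a hypothetical helper, not a standard facility.

```cpp
// Hypothetical helper: a code point fits in one 16-bit code unit iff it is 0x0-0xFFFF.
constexpr bool fits_one_utf16_unit(char32_t cp) { return cp <= 0xFFFF; }

// The value of a char16_t character literal equals its code point value.
static_assert(u'\u00E9' == 0xE9);
static_assert(u'\uFFFD' == 0xFFFD);
static_assert(fits_one_utf16_unit(0xFFFD));

// U+1F600 lies outside 0x0-0xFFFF, so u'\U0001F600' would be ill-formed.
static_assert(!fits_one_utf16_unit(0x1F600));
```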
@@ -1554,6 +1569,10 @@
 is initialized with the given characters. A single \grammarterm{c-char} may
 produce more than one \tcode{char16_t} character in the form of
 surrogate pairs.
+\begin{note}
+A surrogate pair is a representation for a single code point
+as a sequence of two 16-bit code units.
+\end{note}
 
 \pnum
 \indextext{literal!string!\idxcode{char32_t}}%
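The added note defines a surrogate pair as two 16-bit code units standing for one code point. The sketch below shows the UTF-16 encoding arithmetic for a code point in 0x10000-0x10FFFF: subtract 0x10000, then add the upper 10 bits of the 20-bit remainder to 0xD800 and the lower 10 bits to 0xDC00. SurrogatePair, to_surrogate_pair and from_surrogate_pair are hypothetical names, and the code assumes C++17. The final assertions compare the result with what a conforming compiler stores for the single c-char in u"\U0001F600".

```cpp
// Hypothetical type holding the two 16-bit code units of a surrogate pair.
struct SurrogatePair {
    char16_t high;  // lead surrogate, 0xD800-0xDBFF
    char16_t low;   // trail surrogate, 0xDC00-0xDFFF
};

// Encode a code point as a surrogate pair.
// Precondition (not checked): 0x10000 <= cp <= 0x10FFFF.
constexpr SurrogatePair to_surrogate_pair(char32_t cp) {
    const char32_t v = cp - 0x10000;  // 20-bit offset
    return SurrogatePair{
        static_cast<char16_t>(0xD800 + (v >> 10)),     // upper 10 bits
        static_cast<char16_t>(0xDC00 + (v & 0x3FF))};  // lower 10 bits
}

// Inverse: recombine the two 10-bit halves and add 0x10000 back.
constexpr char32_t from_surrogate_pair(SurrogatePair p) {
    return 0x10000 + ((static_cast<char32_t>(p.high) - 0xD800) << 10)
                   + (static_cast<char32_t>(p.low) - 0xDC00);
}

// One c-char produces two char16_t elements (plus the terminating null).
static_assert(sizeof(u"\U0001F600") / sizeof(char16_t) == 3);
static_assert(u"\U0001F600"[0] == to_surrogate_pair(0x1F600).high);
static_assert(u"\U0001F600"[1] == to_surrogate_pair(0x1F600).low);
```

The precondition is left unchecked to keep the sketch short; a real encoder would reject values below 0x10000 or above 0x10FFFF.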