|
1139 | 1139 | A UTF-8 character literal containing multiple \grammarterm{c-char}{s} is ill-formed.
|
1140 | 1140 |
|
1141 | 1141 | \pnum
|
1142 |
| -\indextext{literal!character!\tcode{char16_t}}% |
1143 |
| -\indextext{char16_t character@\tcode{char16_t} character}% |
| 1142 | +\indextext{literal!character!UTF-16}% |
1144 | 1143 | \indextext{type!\idxcode{char16_t}}%
|
1145 | 1144 | A character literal that
|
1146 | 1145 | begins with the letter \tcode{u}, such as \tcode{u'x'},
|
1147 | 1146 | \indextext{prefix!\idxcode{u}}%
|
1148 |
| -is a character literal of type \tcode{char16_t}. The value |
1149 |
| -of a \tcode{char16_t} character literal containing a single \grammarterm{c-char} is |
| 1147 | +is a character literal of type \tcode{char16_t}, |
| 1148 | +known as a \defn{UTF-16 character literal}. |
| 1149 | +The value |
| 1150 | +of a UTF-16 character literal containing a single \grammarterm{c-char} is |
1150 | 1151 | equal to its ISO/IEC 10646 code point value, provided that the code point value is
|
1151 | 1152 | representable with a single 16-bit code unit (that is, provided it is in the
|
1152 | 1153 | basic multi-lingual plane). If the value is not representable
|
1153 |
| -with a single 16-bit code unit, the program is ill-formed. A \tcode{char16_t} character literal |
| 1154 | +with a single 16-bit code unit, the program is ill-formed. |
| 1155 | +A UTF-16 character literal |
1154 | 1156 | containing multiple \grammarterm{c-char}{s} is ill-formed.
|
1155 | 1157 |
|
1156 | 1158 | \pnum
|
1157 |
| -\indextext{literal!character!\tcode{char32_t}}% |
1158 |
| -\indextext{char32_t character@\tcode{char32_t} character}% |
| 1159 | +\indextext{literal!character!UTF-32}% |
1159 | 1160 | \indextext{type!\idxcode{char32_t}}%
|
1160 | 1161 | A character literal that
|
1161 | 1162 | begins with the letter \tcode{U}, such as \tcode{U'y'},
|
1162 | 1163 | \indextext{prefix!\idxcode{U}}%
|
1163 |
| -is a character literal of type \tcode{char32_t}. The value of a |
1164 |
| -\tcode{char32_t} character literal containing a single \grammarterm{c-char} is equal |
1165 |
| -to its ISO/IEC 10646 code point value. A \tcode{char32_t} character literal containing |
| 1164 | +is a character literal of type \tcode{char32_t}, |
| 1165 | +known as a \defn{UTF-32 character literal}. |
| 1166 | +The value of a |
| 1167 | +UTF-32 character literal containing a single \grammarterm{c-char} is equal |
| 1168 | +to its ISO/IEC 10646 code point value. |
| 1169 | +A UTF-32 character literal containing |
1166 | 1170 | multiple \grammarterm{c-char}{s} is ill-formed.
|
1167 | 1171 |
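As an illustration of the UTF-16 and UTF-32 character literal rules in the revised wording above, here is a minimal sketch. The variable names (`latin_x`, `euro`, `emoji`) are hypothetical and chosen only for this example, and the commented-out lines show cases the text describes as ill-formed.

```cpp
#include <type_traits>

// A literal prefixed with u has type char16_t; one prefixed with U has type char32_t.
static_assert(std::is_same<decltype(u'x'), char16_t>::value, "u prefix yields char16_t");
static_assert(std::is_same<decltype(U'y'), char32_t>::value, "U prefix yields char32_t");

// The value of a single-c-char literal is its ISO/IEC 10646 code point.
constexpr char16_t latin_x = u'x';          // U+0078
constexpr char16_t euro    = u'\u20AC';     // U+20AC lies in the basic multilingual plane
constexpr char32_t emoji   = U'\U0001F600'; // U+1F600 needs more than 16 bits

static_assert(latin_x == 0x78, "code point of x");
static_assert(euro == 0x20AC, "code point of the euro sign");
static_assert(emoji == 0x1F600, "code point outside the BMP");

// Ill-formed according to the wording above (left commented out on purpose):
// constexpr char16_t bad1 = u'\U0001F600'; // not representable in one 16-bit code unit
// constexpr char16_t bad2 = u'ab';         // multiple c-chars
```

The checks are written as `static_assert` so that a conforming compiler verifies them at translation time.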
|
1168 | 1172 | \pnum
|
|
1530 | 1534 | \indextext{literal!string!UTF-8}%
|
1531 | 1535 | A \grammarterm{string-literal} that begins with \tcode{u8},
|
1532 | 1536 | \indextext{prefix!\idxcode{u8}}%
|
1533 |
| -such as \tcode{u8"asdf"}, is a \defn{UTF-8 string literal}, |
1534 |
| -also referred to as a \tcode{char8_t} string literal. |
1535 |
| -A \tcode{char8_t} string literal |
| 1537 | +such as \tcode{u8"asdf"}, is a \defn{UTF-8 string literal}. |
| 1538 | +A UTF-8 string literal |
1536 | 1539 | has type ``array of \placeholder{n} \tcode{const char8_t}'',
|
1537 | 1540 | where \placeholder{n} is the size of the string as defined below;
|
1538 | 1541 | each successive element of the object representation\iref{basic.types} has
|
|
1543 | 1546 | also referred to as narrow string literals.
|
1544 | 1547 |
|
1545 | 1548 | \pnum
|
1546 |
| -\indextext{literal!string!\idxcode{char16_t}}% |
| 1549 | +\indextext{literal!string!UTF-16}% |
1547 | 1550 | \indextext{type!\idxcode{char16_t}}%
|
1548 | 1551 | A \grammarterm{string-literal} that begins with \tcode{u},
|
1549 | 1552 | \indextext{prefix!\idxcode{u}}%
|
1550 | 1553 | such as \tcode{u"asdf"}, is
|
1551 |
| -a \tcode{char16_t} string literal. A \tcode{char16_t} string literal has |
| 1554 | +a \defn{UTF-16 string literal}. |
| 1555 | +A UTF-16 string literal has |
1552 | 1556 | type ``array of \placeholder{n} \tcode{const char16_t}'', where \placeholder{n} is the
|
1553 |
| -size of the string as defined below; it |
1554 |
| -is initialized with the given characters. A single \grammarterm{c-char} may |
| 1557 | +size of the string as defined below; |
| 1558 | +each successive element of the array |
| 1559 | +has the value of the corresponding code unit of |
| 1560 | +the UTF-16 encoding of the string. |
| 1561 | +\begin{note} |
| 1562 | +A single \grammarterm{c-char} may |
1555 | 1563 | produce more than one \tcode{char16_t} character in the form of
|
1556 | 1564 | surrogate pairs.
|
| 1565 | +\end{note} |
1557 | 1566 |
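The note about surrogate pairs can be sketched as follows. This is an illustrative example rather than part of the change itself. The names `s16` and `n16` are hypothetical, and the code point U+1F600 was picked only because it lies outside the basic multilingual plane.

```cpp
#include <cstddef>

// One c-char (written here as a universal-character-name) outside the BMP.
constexpr char16_t s16[] = u"\U0001F600";

// Number of array elements: two surrogate code units plus the terminating u'\0'.
constexpr std::size_t n16 = sizeof(s16) / sizeof(s16[0]);

static_assert(n16 == 3, "one c-char produced two code units, plus the terminator");
static_assert(s16[0] == 0xD83D, "high surrogate of U+1F600");
static_assert(s16[1] == 0xDE00, "low surrogate of U+1F600");
static_assert(s16[2] == 0, "terminating null code unit");
```

Each array element holds one UTF-16 code unit, so the array length counts code units rather than characters.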
|
1558 | 1567 | \pnum
|
1559 |
| -\indextext{literal!string!\idxcode{char32_t}}% |
| 1568 | +\indextext{literal!string!UTF-32}% |
1560 | 1569 | \indextext{type!\idxcode{char32_t}}%
|
1561 | 1570 | A \grammarterm{string-literal} that begins with \tcode{U},
|
1562 | 1571 | \indextext{prefix!\idxcode{U}}%
|
1563 | 1572 | such as \tcode{U"asdf"}, is
|
1564 |
| -a \tcode{char32_t} string literal. A \tcode{char32_t} string literal has |
| 1573 | +a \defn{UTF-32 string literal}. |
| 1574 | +A UTF-32 string literal has |
1565 | 1575 | type ``array of \placeholder{n} \tcode{const char32_t}'', where \placeholder{n} is the
|
1566 |
| -size of the string as defined below; it |
1567 |
| -is initialized with the given characters. |
| 1576 | +size of the string as defined below; |
| 1577 | +each successive element of the array |
| 1578 | +has the value of the corresponding code unit of |
| 1579 | +the UTF-32 encoding of the string. |
1568 | 1580 |
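For comparison, a UTF-32 string literal stores exactly one code unit per code point. The following sketch uses the hypothetical names `s32` and `n32` and is meant only to illustrate the revised wording.

```cpp
#include <cstddef>
#include <type_traits>

// Two code points: U+0061 and U+1F600.
constexpr char32_t s32[] = U"a\U0001F600";

// Number of array elements: one per code point, plus the terminating U'\0'.
constexpr std::size_t n32 = sizeof(s32) / sizeof(s32[0]);

// The literal has type "array of n const char32_t".
static_assert(std::is_same<decltype(U"a\U0001F600"), const char32_t(&)[3]>::value,
              "array of 3 const char32_t");
static_assert(n32 == 3, "two code points plus the terminator");
static_assert(s32[0] == 0x61, "code unit equals the code point U+0061");
static_assert(s32[1] == 0x1F600, "code unit equals the code point U+1F600");
static_assert(s32[2] == 0, "terminating null code unit");
```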
|
1569 | 1581 | \pnum
|
1570 | 1582 | \indextext{literal!string!wide}%
|
|
1643 | 1655 | \tcode{\textbackslash'}, and the double quote \tcode{"} shall be preceded by a
|
1644 | 1656 | \tcode{\textbackslash},
|
1645 | 1657 | and except that a \grammarterm{universal-character-name} in a
|
1646 |
| -\tcode{char16_t} string literal may yield a surrogate pair. |
| 1658 | +UTF-16 string literal may yield a surrogate pair. |
1647 | 1659 | \indextext{string!\idxcode{sizeof}}%
|
1648 | 1660 | In a narrow string literal, a \grammarterm{universal-character-name} may map to more
|
1649 | 1661 | than one \tcode{char} or \tcode{char8_t} element due to \defnadj{multibyte}{encoding}. The
|
1650 | 1662 | size of a \tcode{char32_t} or wide string literal is the total number of
|
1651 | 1663 | escape sequences, \grammarterm{universal-character-name}{s}, and other characters, plus
|
1652 | 1664 | one for the terminating \tcode{U'\textbackslash 0'} or
|
1653 |
| -\tcode{L'\textbackslash 0'}. The size of a \tcode{char16_t} string |
| 1665 | +\tcode{L'\textbackslash 0'}. The size of a UTF-16 string |
1654 | 1666 | literal is the total number of escape sequences,
|
1655 | 1667 | \grammarterm{universal-character-name}{s}, and other characters, plus one for each
|
1656 | 1668 | character requiring a surrogate pair, plus one for the terminating
|
|