Commit facaa17

Dawn Perchik authored and zygoloid committed
P1041R4 Make char16_t/char32_t string literals be UTF-16/32
1 parent 0f9ebdc commit facaa17

1 file changed

source/lex.tex

Lines changed: 31 additions & 18 deletions
@@ -1139,30 +1139,34 @@
 A UTF-8 character literal containing multiple \grammarterm{c-char}{s} is ill-formed.
 
 \pnum
-\indextext{literal!character!\tcode{char16_t}}%
-\indextext{char16_t character@\tcode{char16_t} character}%
+\indextext{literal!character!UTF-16}%
 \indextext{type!\idxcode{char16_t}}%
 A character literal that
 begins with the letter \tcode{u}, such as \tcode{u'x'},
 \indextext{prefix!\idxcode{u}}%
-is a character literal of type \tcode{char16_t}. The value
-of a \tcode{char16_t} character literal containing a single \grammarterm{c-char} is
+is a character literal of type \tcode{char16_t},
+known as a \defn{UTF-16 character literal}.
+The value
+of a UTF-16 character literal containing a single \grammarterm{c-char} is
 equal to its ISO/IEC 10646 code point value, provided that the code point value is
 representable with a single 16-bit code unit (that is, provided it is in the
 basic multi-lingual plane). If the value is not representable
-with a single 16-bit code unit, the program is ill-formed. A \tcode{char16_t} character literal
+with a single 16-bit code unit, the program is ill-formed.
+A UTF-16 character literal
 containing multiple \grammarterm{c-char}{s} is ill-formed.
 
 \pnum
-\indextext{literal!character!\tcode{char32_t}}%
-\indextext{char32_t character@\tcode{char32_t} character}%
+\indextext{literal!character!UTF-32}%
 \indextext{type!\idxcode{char32_t}}%
 A character literal that
 begins with the letter \tcode{U}, such as \tcode{U'y'},
 \indextext{prefix!\idxcode{U}}%
-is a character literal of type \tcode{char32_t}. The value of a
-\tcode{char32_t} character literal containing a single \grammarterm{c-char} is equal
-to its ISO/IEC 10646 code point value. A \tcode{char32_t} character literal containing
+is a character literal of type \tcode{char32_t},
+known as a \defn{UTF-32 character literal}.
+The value of a
+UTF-32 character literal containing a single \grammarterm{c-char} is equal
+to its ISO/IEC 10646 code point value.
+A UTF-32 character literal containing
 multiple \grammarterm{c-char}{s} is ill-formed.
 
 \pnum
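
Note on the hunk above: the wording for UTF-16 and UTF-32 character literals can be illustrated with a small C++ sketch. This is not part of the commit; it only restates the rules in the diff as compile-time checks. The helper name is_bmp is hypothetical and introduced purely for illustration.

```cpp
#include <type_traits>

// A literal prefixed with u has type char16_t; one prefixed with U has type char32_t.
static_assert(std::is_same<decltype(u'x'), char16_t>::value, "u'x' is char16_t");
static_assert(std::is_same<decltype(U'y'), char32_t>::value, "U'y' is char32_t");

// The value of a single-c-char literal equals its ISO/IEC 10646 code point.
static_assert(u'\u00E9' == 0x00E9, "U+00E9 fits in one 16-bit code unit");
static_assert(U'\U0001F600' == 0x1F600, "UTF-32 holds any code point directly");

// By the wording above, u'\U0001F600' would be ill-formed, because U+1F600
// is not representable with a single 16-bit code unit.

// Hypothetical helper (not from the commit): true if the code point lies in
// the basic multilingual plane, i.e. fits in a single 16-bit code unit.
constexpr bool is_bmp(char32_t code_point) {
    return code_point <= 0xFFFF;
}

static_assert(is_bmp(U'\u00E9'), "U+00E9 is in the BMP");
static_assert(!is_bmp(U'\U0001F600'), "U+1F600 is outside the BMP");
```

The checks use static_assert so that a conforming compiler confirms each statement during translation.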
@@ -1543,28 +1547,37 @@
 also referred to as narrow string literals.
 
 \pnum
-\indextext{literal!string!\idxcode{char16_t}}%
+\indextext{literal!string!UTF-16}%
 \indextext{type!\idxcode{char16_t}}%
 A \grammarterm{string-literal} that begins with \tcode{u},
 \indextext{prefix!\idxcode{u}}%
 such as \tcode{u"asdf"}, is
-a \tcode{char16_t} string literal. A \tcode{char16_t} string literal has
+a \defn{UTF-16 string literal}.
+A UTF-16 string literal has
 type ``array of \placeholder{n} \tcode{const char16_t}'', where \placeholder{n} is the
-size of the string as defined below; it
-is initialized with the given characters. A single \grammarterm{c-char} may
+size of the string as defined below;
+each successive element of the array
+has the value of the corresponding code unit of
+the UTF-16 encoding of the string.
+\begin{note}
+A single \grammarterm{c-char} may
 produce more than one \tcode{char16_t} character in the form of
 surrogate pairs.
+\end{note}
 
 \pnum
-\indextext{literal!string!\idxcode{char32_t}}%
+\indextext{literal!string!UTF-32}%
 \indextext{type!\idxcode{char32_t}}%
 A \grammarterm{string-literal} that begins with \tcode{U},
 \indextext{prefix!\idxcode{U}}%
 such as \tcode{U"asdf"}, is
-a \tcode{char32_t} string literal. A \tcode{char32_t} string literal has
+a \defn{UTF-32 string literal}.
+A UTF-32 string literal has
 type ``array of \placeholder{n} \tcode{const char32_t}'', where \placeholder{n} is the
-size of the string as defined below; it
-is initialized with the given characters.
+size of the string as defined below;
+each successive element of the array
+has the value of the corresponding code unit of
+the UTF-32 encoding of the string.
 
 \pnum
 \indextext{literal!string!wide}%
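
Note on the hunk above: the new wording says that each array element holds the corresponding code unit of the UTF-16 or UTF-32 encoding of the string. The following C++ sketch, which is not part of the commit, shows what that implies for a code point outside the basic multilingual plane. The array names and the helper utf16_code_units are hypothetical, and the sketch assumes the array size includes the terminating null character.

```cpp
#include <cstddef>

// 'a' followed by U+1F600, which needs a surrogate pair in UTF-16.
constexpr char16_t utf16_text[] = u"a\U0001F600";
constexpr char32_t utf32_text[] = U"a\U0001F600";

// UTF-16: 'a', high surrogate, low surrogate, terminating null = 4 elements.
static_assert(sizeof(utf16_text) / sizeof(utf16_text[0]) == 4, "4 UTF-16 code units");
static_assert(utf16_text[0] == 0x0061, "code unit for 'a'");
static_assert(utf16_text[1] == 0xD83D, "high surrogate of U+1F600");
static_assert(utf16_text[2] == 0xDE00, "low surrogate of U+1F600");
static_assert(utf16_text[3] == 0x0000, "terminating null");

// UTF-32: one code unit per code point, plus the terminating null = 3 elements.
static_assert(sizeof(utf32_text) / sizeof(utf32_text[0]) == 3, "3 UTF-32 code units");
static_assert(utf32_text[1] == 0x1F600, "code point stored directly");

// Hypothetical helper (not from the commit): number of UTF-16 code units
// needed to encode one code point.
constexpr std::size_t utf16_code_units(char32_t code_point) {
    return code_point > 0xFFFF ? 2 : 1;
}

static_assert(utf16_code_units(U'a') == 1, "BMP code point: one unit");
static_assert(utf16_code_units(U'\U0001F600') == 2, "non-BMP code point: surrogate pair");
```

This matches the note in the diff: a single c-char may produce more than one char16_t element, while a UTF-32 string literal always stores one element per code point.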
