Commit cad039d

Merge 2019-02 CWG Motion 6
P1041R4 Make char16_t/char32_t string literals be UTF-16/32
Fixes #2686.
2 parents 0f9ebdc + 0e6ff4b commit cad039d

3 files changed

+40
-28
lines changed

source/compatibility.tex

Lines changed: 2 additions & 2 deletions
@@ -67,10 +67,10 @@
 The type of a UTF-8 string literal is changed
 from ``array of \tcode{char}''
 to ``array of \tcode{const char8_t}''.
-The type of a \tcode{char16_t} string literal is changed
+The type of a UTF-16 string literal is changed
 from ``array of \textit{some-integer-type}''
 to ``array of \tcode{const char16_t}''.
-The type of a \tcode{char32_t} string literal is changed
+The type of a UTF-32 string literal is changed
 from ``array of \textit{some-integer-type}''
 to ``array of \tcode{const char32_t}''.
 The type of a wide string literal is changed
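
The C++ types named in this hunk can be checked at compile time. The following is a minimal sketch, not part of the commit; it assumes a C++11 or later compiler and uses only `u`- and `U`-prefixed literals so that it does not depend on `char8_t` support.

```cpp
#include <type_traits>

// decltype applied to a string literal yields an lvalue reference to the
// array type; the bound includes the terminating null character.
static_assert(std::is_same<decltype(u"ab"), const char16_t (&)[3]>::value,
              "a UTF-16 string literal is an array of const char16_t");
static_assert(std::is_same<decltype(U"ab"), const char32_t (&)[3]>::value,
              "a UTF-32 string literal is an array of const char32_t");
```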

source/declarations.tex

Lines changed: 3 additions & 3 deletions
@@ -5081,9 +5081,9 @@
 or \tcode{wchar_t} array
 can be initialized by
 an ordinary string literal,
-\tcode{char8_t} string literal,
-\tcode{char16_t} string literal,
-\tcode{char32_t} string literal, or
+UTF-8 string literal,
+UTF-16 string literal,
+UTF-32 string literal, or
 wide string literal,
 respectively, or by an appropriately-typed string literal enclosed in
 braces\iref{lex.string}.
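
The initialization rule in this hunk can be illustrated as follows. This is a sketch only; the array names are hypothetical and chosen for illustration, and the example omits `char8_t` so that it compiles before C++20.

```cpp
// Each array is initialized by the string literal kind matching its element
// type; the deduced bound counts the terminating null character.
constexpr char16_t utf16_array[] = u"asdf";
constexpr char32_t utf32_array[] = U"asdf";
constexpr wchar_t  wide_array[]  = L"asdf";

// The same initialization with the literal enclosed in braces.
constexpr char16_t braced_array[] = {u"asdf"};

static_assert(sizeof(utf16_array) / sizeof(utf16_array[0]) == 5,
              "four characters plus the terminating null");
```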

source/lex.tex

Lines changed: 35 additions & 23 deletions
@@ -1139,30 +1139,34 @@
 A UTF-8 character literal containing multiple \grammarterm{c-char}{s} is ill-formed.
 
 \pnum
-\indextext{literal!character!\tcode{char16_t}}%
-\indextext{char16_t character@\tcode{char16_t} character}%
+\indextext{literal!character!UTF-16}%
 \indextext{type!\idxcode{char16_t}}%
 A character literal that
 begins with the letter \tcode{u}, such as \tcode{u'x'},
 \indextext{prefix!\idxcode{u}}%
-is a character literal of type \tcode{char16_t}. The value
-of a \tcode{char16_t} character literal containing a single \grammarterm{c-char} is
+is a character literal of type \tcode{char16_t},
+known as a \defn{UTF-16 character literal}.
+The value
+of a UTF-16 character literal containing a single \grammarterm{c-char} is
 equal to its ISO/IEC 10646 code point value, provided that the code point value is
 representable with a single 16-bit code unit (that is, provided it is in the
 basic multi-lingual plane). If the value is not representable
-with a single 16-bit code unit, the program is ill-formed. A \tcode{char16_t} character literal
+with a single 16-bit code unit, the program is ill-formed.
+A UTF-16 character literal
 containing multiple \grammarterm{c-char}{s} is ill-formed.
 
 \pnum
-\indextext{literal!character!\tcode{char32_t}}%
-\indextext{char32_t character@\tcode{char32_t} character}%
+\indextext{literal!character!UTF-32}%
 \indextext{type!\idxcode{char32_t}}%
 A character literal that
 begins with the letter \tcode{U}, such as \tcode{U'y'},
 \indextext{prefix!\idxcode{U}}%
-is a character literal of type \tcode{char32_t}. The value of a
-\tcode{char32_t} character literal containing a single \grammarterm{c-char} is equal
-to its ISO/IEC 10646 code point value. A \tcode{char32_t} character literal containing
+is a character literal of type \tcode{char32_t},
+known as a \defn{UTF-32 character literal}.
+The value of a
+UTF-32 character literal containing a single \grammarterm{c-char} is equal
+to its ISO/IEC 10646 code point value.
+A UTF-32 character literal containing
 multiple \grammarterm{c-char}{s} is ill-formed.
 
 \pnum
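
The character literal rules in this hunk can be sketched with compile-time checks. The code points used here (U+20AC and U+1F600) are arbitrary choices for illustration, not taken from the commit.

```cpp
#include <type_traits>

// The prefix determines the type of the character literal.
static_assert(std::is_same<decltype(u'x'), char16_t>::value,
              "u prefix: UTF-16 character literal of type char16_t");
static_assert(std::is_same<decltype(U'y'), char32_t>::value,
              "U prefix: UTF-32 character literal of type char32_t");

// The value equals the ISO/IEC 10646 code point value.
static_assert(u'\u20AC' == 0x20AC,
              "U+20AC fits in a single 16-bit code unit");
static_assert(U'\U0001F600' == 0x1F600,
              "UTF-32 character literals cover every code point");

// By contrast, u'\U0001F600' would be ill-formed under the wording above,
// because U+1F600 needs two 16-bit code units; it is left out so that the
// sketch compiles.
```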
@@ -1530,9 +1534,8 @@
 \indextext{literal!string!UTF-8}%
 A \grammarterm{string-literal} that begins with \tcode{u8},
 \indextext{prefix!\idxcode{u8}}%
-such as \tcode{u8"asdf"}, is a \defn{UTF-8 string literal},
-also referred to as a \tcode{char8_t} string literal.
-A \tcode{char8_t} string literal
+such as \tcode{u8"asdf"}, is a \defn{UTF-8 string literal}.
+A UTF-8 string literal
 has type ``array of \placeholder{n} \tcode{const char8_t}'',
 where \placeholder{n} is the size of the string as defined below;
 each successive element of the object representation\iref{basic.types} has
@@ -1543,28 +1546,37 @@
 also referred to as narrow string literals.
 
 \pnum
-\indextext{literal!string!\idxcode{char16_t}}%
+\indextext{literal!string!UTF-16}%
 \indextext{type!\idxcode{char16_t}}%
 A \grammarterm{string-literal} that begins with \tcode{u},
 \indextext{prefix!\idxcode{u}}%
 such as \tcode{u"asdf"}, is
-a \tcode{char16_t} string literal. A \tcode{char16_t} string literal has
+a \defn{UTF-16 string literal}.
+A UTF-16 string literal has
 type ``array of \placeholder{n} \tcode{const char16_t}'', where \placeholder{n} is the
-size of the string as defined below; it
-is initialized with the given characters. A single \grammarterm{c-char} may
+size of the string as defined below;
+each successive element of the array
+has the value of the corresponding code unit of
+the UTF-16 encoding of the string.
+\begin{note}
+A single \grammarterm{c-char} may
 produce more than one \tcode{char16_t} character in the form of
 surrogate pairs.
+\end{note}
 
 \pnum
-\indextext{literal!string!\idxcode{char32_t}}%
+\indextext{literal!string!UTF-32}%
 \indextext{type!\idxcode{char32_t}}%
 A \grammarterm{string-literal} that begins with \tcode{U},
 \indextext{prefix!\idxcode{U}}%
 such as \tcode{U"asdf"}, is
-a \tcode{char32_t} string literal. A \tcode{char32_t} string literal has
+a \defn{UTF-32 string literal}.
+A UTF-32 string literal has
 type ``array of \placeholder{n} \tcode{const char32_t}'', where \placeholder{n} is the
-size of the string as defined below; it
-is initialized with the given characters.
+size of the string as defined below;
+each successive element of the array
+has the value of the corresponding code unit of
+the UTF-32 encoding of the string.
 
 \pnum
 \indextext{literal!string!wide}%
@@ -1643,14 +1655,14 @@
 \tcode{\textbackslash'}, and the double quote \tcode{"} shall be preceded by a
 \tcode{\textbackslash},
 and except that a \grammarterm{universal-character-name} in a
-\tcode{char16_t} string literal may yield a surrogate pair.
+UTF-16 string literal may yield a surrogate pair.
 \indextext{string!\idxcode{sizeof}}%
 In a narrow string literal, a \grammarterm{universal-character-name} may map to more
 than one \tcode{char} or \tcode{char8_t} element due to \defnadj{multibyte}{encoding}. The
 size of a \tcode{char32_t} or wide string literal is the total number of
 escape sequences, \grammarterm{universal-character-name}{s}, and other characters, plus
 one for the terminating \tcode{U'\textbackslash 0'} or
-\tcode{L'\textbackslash 0'}. The size of a \tcode{char16_t} string
+\tcode{L'\textbackslash 0'}. The size of a UTF-16 string
 literal is the total number of escape sequences,
 \grammarterm{universal-character-name}{s}, and other characters, plus one for each
 character requiring a surrogate pair, plus one for the terminating
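
The size rules in this hunk can be illustrated with a code point outside the basic multilingual plane. This is a sketch under the assumption that the implementation encodes `u`-prefixed literals as UTF-16 and `U`-prefixed literals as UTF-32, which is what the wording merged here requires; the variable names are hypothetical.

```cpp
#include <cstddef>

// 'a' followed by U+1F600, which UTF-16 encodes as a surrogate pair.
constexpr char16_t utf16_text[] = u"a\U0001F600";
constexpr char32_t utf32_text[] = U"a\U0001F600";

// UTF-16: 2 characters + 1 extra unit for the surrogate pair + 1 terminator.
constexpr std::size_t utf16_size = sizeof(utf16_text) / sizeof(utf16_text[0]);
// UTF-32: 2 characters + 1 terminator.
constexpr std::size_t utf32_size = sizeof(utf32_text) / sizeof(utf32_text[0]);

static_assert(utf16_size == 4, "a surrogate pair adds one element");
static_assert(utf32_size == 3, "one element per code point");

// Each element holds the corresponding code unit of the encoded string.
static_assert(utf16_text[1] == 0xD83D && utf16_text[2] == 0xDE00,
              "high and low surrogates of U+1F600");
static_assert(utf32_text[1] == 0x1F600, "single UTF-32 code unit");
```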
