Commit d53ab9c

Merge 2023-06 CWG Motion 3
P1854R4 Making non-encodable string literals ill-formed
2 parents d1bf633 + d3de50a commit d53ab9c

1 file changed (+12, -44 lines)

source/lex.tex

Lines changed: 12 additions & 44 deletions
@@ -1436,42 +1436,21 @@
 \indextext{type!\idxcode{char32_t}}%
 \indextext{wide-character}%
 \indextext{type!\idxcode{wchar_t}}%
-A \defnx{non-encodable character literal}{literal!character!non-encodable}
-is a \grammarterm{character-literal}
-whose \grammarterm{c-char-sequence} consists of a single \grammarterm{c-char}
-that is not a \grammarterm{numeric-escape-sequence} and
-that specifies a character
-that either lacks representation in the literal's associated character encoding
-or that cannot be encoded as a single code unit.
 A \defnadj{multicharacter}{literal} is a \grammarterm{character-literal}
 whose \grammarterm{c-char-sequence} consists of
 more than one \grammarterm{c-char}.
-The \grammarterm{encoding-prefix} of
-a non-encodable character literal or a multicharacter literal
-shall be absent.
-Such \grammarterm{character-literal}s are conditionally-supported.
+A multicharacter literal shall not have an \grammarterm{encoding-prefix}.
+If a multicharacter literal contains a \grammarterm{c-char}
+that is not encodable as a single code unit in the ordinary literal encoding,
+the program is ill-formed.
+Multicharacter literals are conditionally-supported.
 
 \pnum
 The kind of a \grammarterm{character-literal},
 its type, and its associated character encoding\iref{lex.charset}
 are determined by
 its \grammarterm{encoding-prefix} and its \grammarterm{c-char-sequence}
 as defined by \tref{lex.ccon.literal}.
-The special cases for
-non-encodable character literals and multicharacter literals
-take precedence over the base kind.
-\begin{note}
-The associated character encoding for ordinary character literals
-determines encodability,
-but does not determine the value of
-non-encodable ordinary character literals or
-ordinary multicharacter literals.
-The examples in \tref{lex.ccon.literal}
-for non-encodable ordinary character literals assume that
-the specified character lacks representation in
-the ordinary literal encoding or
-that encoding the character would require more than one code unit.
-\end{note}
 
 \begin{floattable}{Character literals}{lex.ccon.literal}
 {l|l|l|l|l}
@@ -1482,15 +1461,10 @@
 none &
 \defnx{ordinary character literal}{literal!character!ordinary} &
 \keyword{char} &
-ordinary &
+ordinary literal &
 \tcode{'v'} \\ \cline{2-3}\cline{5-5}
 &
-non-encodable ordinary character literal &
-\keyword{int} &
-literal &
-\tcode{'\textbackslash U0001F525'} \\ \cline{2-3}\cline{5-5}
-&
-ordinary multicharacter literal &
+multicharacter literal &
 \keyword{int} &
 encoding &
 \tcode{'abcd'} \\ \hline
@@ -1522,8 +1496,7 @@
 the value of a \grammarterm{character-literal} is determined
 using the range of representable values
 of the \grammarterm{character-literal}'s type in translation phase 7.
-A non-encodable character literal or a multicharacter literal
-has an
+A multicharacter literal has an
 \impldef{value of non-encodable character literal or multicharacter literal}
 value.
 The value of any other kind of \grammarterm{character-literal}
@@ -1537,12 +1510,10 @@
 \grammarterm{universal-character-name}
 is the code unit value of the specified character
 as encoded in the literal's associated character encoding.
-\begin{note}
 If the specified character lacks
 representation in the literal's associated character encoding or
 if it cannot be encoded as a single code unit,
-then the literal is a non-encodable character literal.
-\end{note}
+then the program is ill-formed.
 \item
 A \grammarterm{character-literal} with
 a \grammarterm{c-char-sequence} consisting of
@@ -1568,7 +1539,7 @@
 $v$ does not exceed the range of representable values of the corresponding unsigned type for the underlying type of the \grammarterm{character-literal}'s type,
 then the value is the unique value of the \grammarterm{character-literal}'s type \tcode{T} that is congruent to $v$ modulo $2^N$, where $N$ is the width of \tcode{T}.
 \item
-Otherwise, the \grammarterm{character-literal} is ill-formed.
+Otherwise, the program is ill-formed.
 \end{itemize}
 \item
 A \grammarterm{character-literal} with
@@ -2006,10 +1977,7 @@
 is encoded to a code unit sequence
 using the \grammarterm{string-literal}'s associated character encoding.
 If a character lacks representation in the associated character encoding,
-then the \grammarterm{string-literal} is conditionally-supported and
-an
-\impldef{code unit sequence for non-representable \grammarterm{string-literal}}
-code unit sequence is encoded.
+then the program is ill-formed.
 \begin{note}
 No character lacks representation in any Unicode encoding form.
 \end{note}
@@ -2050,7 +2018,7 @@
 the \grammarterm{string-literal}'s array element type \tcode{T}
 that is congruent to $v$ modulo $2^N$, where $N$ is the width of \tcode{T}.
 \item
-Otherwise, the \grammarterm{string-literal} is ill-formed.
+Otherwise, the program is ill-formed.
 \end{itemize}
 When encoding a stateful character encoding,
 these sequences should have no effect on encoding state.
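
As a rough illustration of the wording touched by this diff (not part of the commit itself), the following C++ sketch checks the literal types from the character-literal table and the "congruent to $v$ modulo $2^N$" rule for numeric escape sequences. The helper name `hex_escape_wraps` is hypothetical. The cases that this change makes ill-formed appear only in comments, since whether a given compiler already rejects them is an assumption that depends on its version and on the ordinary literal encoding in use.

```cpp
#include <type_traits>

// An ordinary character literal with a single encodable c-char has type char.
static_assert(std::is_same<decltype('v'), char>::value,
              "ordinary character literal has type char");

// With an encoding-prefix, the literal takes the corresponding character type.
static_assert(std::is_same<decltype(u'v'), char16_t>::value,
              "u prefix gives char16_t");
static_assert(std::is_same<decltype(U'\U0001F525'), char32_t>::value,
              "U prefix gives char32_t");

// U+1F525 is a single code unit in UTF-32, so its value is the code point.
static_assert(U'\U0001F525' == 0x1F525, "UTF-32 code unit value");

// Numeric escape sequence: v = 0xFF fits the unsigned type corresponding to
// char, so the value is the char value congruent to v modulo 2^N.
// Converting back to unsigned char recovers 0xFF whether char is signed or not.
constexpr bool hex_escape_wraps() {
    return static_cast<unsigned char>('\xff') == 0xFFu;
}
static_assert(hex_escape_wraps(), "numeric escape wraps modulo 2^N");

// Assumed to be ill-formed under the new wording (left as comments):
//
//   auto a = '\U0001F525';  // no prefix: not encodable as a single code unit
//                           // in, for example, a UTF-8 ordinary literal
//                           // encoding; previously an int with an
//                           // implementation-defined value
//   auto b = u'\U0001F525'; // needs two UTF-16 code units
//
// A string-literal containing a character that lacks representation in its
// associated encoding is likewise ill-formed rather than
// conditionally-supported.
```

Multicharacter literals such as `'abcd'` remain conditionally-supported with type `int`. They are left out of the compiled part of the sketch because many compilers warn about them by default.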
