|
1436 | 1436 | \indextext{type!\idxcode{char32_t}}%
|
1437 | 1437 | \indextext{wide-character}%
|
1438 | 1438 | \indextext{type!\idxcode{wchar_t}}%
|
1439 |
| -A \defnx{non-encodable character literal}{literal!character!non-encodable} |
1440 |
| -is a \grammarterm{character-literal} |
1441 |
| -whose \grammarterm{c-char-sequence} consists of a single \grammarterm{c-char} |
1442 |
| -that is not a \grammarterm{numeric-escape-sequence} and |
1443 |
| -that specifies a character |
1444 |
| -that either lacks representation in the literal's associated character encoding |
1445 |
| -or that cannot be encoded as a single code unit. |
1446 | 1439 | A \defnadj{multicharacter}{literal} is a \grammarterm{character-literal}
|
1447 | 1440 | whose \grammarterm{c-char-sequence} consists of
|
1448 | 1441 | more than one \grammarterm{c-char}.
|
1449 |
| -The \grammarterm{encoding-prefix} of |
1450 |
| -a non-encodable character literal or a multicharacter literal |
1451 |
| -shall be absent. |
1452 |
| -Such \grammarterm{character-literal}s are conditionally-supported. |
| 1442 | +A multicharacter literal shall not have an \grammarterm{encoding-prefix}. |
| 1443 | +If a multicharacter literal contains a \grammarterm{c-char} |
| 1444 | +that is not encodable as a single code unit in the ordinary literal encoding, |
| 1445 | +the program is ill-formed. |
| 1446 | +Multicharacter literals are conditionally-supported. |
1453 | 1447 |
|
1454 | 1448 | \pnum
|
1455 | 1449 | The kind of a \grammarterm{character-literal},
|
1456 | 1450 | its type, and its associated character encoding\iref{lex.charset}
|
1457 | 1451 | are determined by
|
1458 | 1452 | its \grammarterm{encoding-prefix} and its \grammarterm{c-char-sequence}
|
1459 | 1453 | as defined by \tref{lex.ccon.literal}.
|
1460 |
| -The special cases for |
1461 |
| -non-encodable character literals and multicharacter literals |
1462 |
| -take precedence over the base kind. |
1463 |
| -\begin{note} |
1464 |
| -The associated character encoding for ordinary character literals |
1465 |
| -determines encodability, |
1466 |
| -but does not determine the value of |
1467 |
| -non-encodable ordinary character literals or |
1468 |
| -ordinary multicharacter literals. |
1469 |
| -The examples in \tref{lex.ccon.literal} |
1470 |
| -for non-encodable ordinary character literals assume that |
1471 |
| -the specified character lacks representation in |
1472 |
| -the ordinary literal encoding or |
1473 |
| -that encoding the character would require more than one code unit. |
1474 |
| -\end{note} |
1475 | 1454 |
|
1476 | 1455 | \begin{floattable}{Character literals}{lex.ccon.literal}
|
1477 | 1456 | {l|l|l|l|l}
|
|
1482 | 1461 | none &
|
1483 | 1462 | \defnx{ordinary character literal}{literal!character!ordinary} &
|
1484 | 1463 | \keyword{char} &
|
1485 |
| -ordinary & |
| 1464 | +ordinary literal & |
1486 | 1465 | \tcode{'v'} \\ \cline{2-3}\cline{5-5}
|
1487 | 1466 | &
|
1488 |
| -non-encodable ordinary character literal & |
1489 |
| -\keyword{int} & |
1490 |
| -literal & |
1491 |
| -\tcode{'\textbackslash U0001F525'} \\ \cline{2-3}\cline{5-5} |
1492 |
| - & |
1493 |
| -ordinary multicharacter literal & |
| 1467 | +multicharacter literal & |
1494 | 1468 | \keyword{int} &
|
1495 | 1469 | encoding &
|
1496 | 1470 | \tcode{'abcd'} \\ \hline
|
|
1522 | 1496 | the value of a \grammarterm{character-literal} is determined
|
1523 | 1497 | using the range of representable values
|
1524 | 1498 | of the \grammarterm{character-literal}'s type in translation phase 7.
|
1525 |
| -A non-encodable character literal or a multicharacter literal |
1526 |
| -has an |
| 1499 | +A multicharacter literal has an |
1527 | 1500 | \impldef{value of non-encodable character literal or multicharacter literal}
|
1528 | 1501 | value.
|
1529 | 1502 | The value of any other kind of \grammarterm{character-literal}
|
|
1537 | 1510 | \grammarterm{universal-character-name}
|
1538 | 1511 | is the code unit value of the specified character
|
1539 | 1512 | as encoded in the literal's associated character encoding.
|
1540 |
| -\begin{note} |
1541 | 1513 | If the specified character lacks
|
1542 | 1514 | representation in the literal's associated character encoding or
|
1543 | 1515 | if it cannot be encoded as a single code unit,
|
1544 |
| -then the literal is a non-encodable character literal. |
1545 |
| -\end{note} |
| 1516 | +then the program is ill-formed. |
1546 | 1517 | \item
|
1547 | 1518 | A \grammarterm{character-literal} with
|
1548 | 1519 | a \grammarterm{c-char-sequence} consisting of
|
|
1568 | 1539 | $v$ does not exceed the range of representable values of the corresponding unsigned type for the underlying type of the \grammarterm{character-literal}'s type,
|
1569 | 1540 | then the value is the unique value of the \grammarterm{character-literal}'s type \tcode{T} that is congruent to $v$ modulo $2^N$, where $N$ is the width of \tcode{T}.
|
1570 | 1541 | \item
|
1571 |
| -Otherwise, the \grammarterm{character-literal} is ill-formed. |
| 1542 | +Otherwise, the program is ill-formed. |
1572 | 1543 | \end{itemize}
|
1573 | 1544 | \item
|
1574 | 1545 | A \grammarterm{character-literal} with
|
|
2006 | 1977 | is encoded to a code unit sequence
|
2007 | 1978 | using the \grammarterm{string-literal}'s associated character encoding.
|
2008 | 1979 | If a character lacks representation in the associated character encoding,
|
2009 |
| -then the \grammarterm{string-literal} is conditionally-supported and |
2010 |
| -an |
2011 |
| -\impldef{code unit sequence for non-representable \grammarterm{string-literal}} |
2012 |
| -code unit sequence is encoded. |
| 1980 | +then the program is ill-formed. |
2013 | 1981 | \begin{note}
|
2014 | 1982 | No character lacks representation in any Unicode encoding form.
|
2015 | 1983 | \end{note}
|
|
2050 | 2018 | the \grammarterm{string-literal}'s array element type \tcode{T}
|
2051 | 2019 | that is congruent to $v$ modulo $2^N$, where $N$ is the width of \tcode{T}.
|
2052 | 2020 | \item
|
2053 |
| -Otherwise, the \grammarterm{string-literal} is ill-formed. |
| 2021 | +Otherwise, the program is ill-formed. |
2054 | 2022 | \end{itemize}
|
2055 | 2023 | When encoding a stateful character encoding,
|
2056 | 2024 | these sequences should have no effect on encoding state.
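
The numeric-escape-sequence rule touched by the hunks above (the value is the one congruent to $v$ modulo $2^N$; otherwise the program is ill-formed) can be illustrated with a small C++ sketch. It is not part of the commit. The helper name `code_unit` is hypothetical, and the commented-out line assumes an 8-bit `char`.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper (not from the commit): view a char as its
// unsigned code unit value, independent of whether char is signed.
constexpr unsigned code_unit(char c) {
    return static_cast<unsigned char>(c);
}

// An ordinary character literal with a single c-char has type char.
static_assert(std::is_same<decltype('v'), char>::value,
              "ordinary character literal has type char");

// v = 0xff fits in unsigned char, so the literal's value is the char
// value congruent to 0xff modulo 2^N (N = width of char).
static_assert(code_unit('\xff') == 0xffu, "congruent modulo 2^N");

// The same rule applied to a char16_t literal.
static_assert(std::is_same<decltype(u'\xffff'), char16_t>::value,
              "u prefix yields char16_t");
static_assert(u'\xffff' == 0xffff, "value fits in char16_t");

// Assuming an 8-bit char, v = 0x100 exceeds the range of unsigned char,
// so the following falls under "Otherwise, the program is ill-formed":
// char bad = '\x100';

// Runtime counterpart of the compile-time checks above.
inline void check_numeric_escapes() {
    assert(code_unit('\x41') == 0x41u);
    assert(code_unit('\xff') == 0xffu);
}
```

The checks are `static_assert`s because the rule applies during translation; a violation is a compile error, not a runtime condition.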
|
|