Commit fbc6356

author
Dawn Perchik
committed
P1139R2 Address wording issues related to ISO 10646
[lex] Turn notes into separate sentences.
1 parent cafdbd8 commit fbc6356

2 files changed: 33 additions & 14 deletions

source/lex.tex

Lines changed: 32 additions & 13 deletions
@@ -208,21 +208,31 @@
 \end{bnf}

 The character designated by the \grammarterm{universal-character-name} \tcode{\textbackslash
-UNNNNNNNN} is that character whose character short name in ISO/IEC 10646 is
-\tcode{NNNNNNNN}; the character designated by the \grammarterm{universal-character-name}
-\tcode{\textbackslash uNNNN} is that character whose character short name in
-ISO/IEC 10646 is \tcode{0000NNNN}. If the hexadecimal value for a
-\grammarterm{universal-character-name} corresponds to a surrogate code point (in the
-range 0xD800--0xDFFF, inclusive), the program is ill-formed. Additionally, if
-the hexadecimal value for a \grammarterm{universal-character-name} outside
+U00NNNNNN} is that character
+that has \tcode{U+NNNNNN} as a code point short identifier;
+the character designated by the \grammarterm{universal-character-name}
+\tcode{\textbackslash uNNNN} is that character
+that has \tcode{U+NNNN} as a code point short identifier.
+If a \grammarterm{universal-character-name} does not correspond to
+a code point in ISO/IEC 10646 or
+if a \grammarterm{universal-character-name} corresponds to
+a surrogate code point,
+the program is ill-formed. Additionally, if
+a \grammarterm{universal-character-name} outside
 the \grammarterm{c-char-sequence}, \grammarterm{s-char-sequence}, or
 \grammarterm{r-char-sequence} of
 a character or
-string literal corresponds to a control character (in either of the
-ranges 0x00--0x1F or 0x7F--0x9F, both inclusive) or to a character in the basic
+string literal corresponds to a control character or
+to a character in the basic
 source character set, the program is ill-formed.\footnote{A sequence of characters resembling a \grammarterm{universal-character-name} in an
 \grammarterm{r-char-sequence}\iref{lex.string} does not form a
 \grammarterm{universal-character-name}.}
+\begin{note}
+ISO/IEC 10646 code points are within the range 0x0-0x10FFFF (inclusive).
+A surrogate code point is a value in the range 0xD800-0xDFFF (inclusive).
+A control character is a character whose code point is
+in either of the ranges 0x0-0x1F or 0x7F-0x9F (both inclusive).
+\end{note}

 \pnum
 The \defnx{basic execution character set}{character set!basic execution} and the
@@ -1132,8 +1142,10 @@
 The value of a UTF-8 character literal
 is equal to its ISO/IEC 10646 code point value,
 provided that the code point value
-is representable with a single UTF-8 code unit
-(that is, provided it is in the C0 Controls and Basic Latin Unicode block).
+can be encoded as a single UTF-8 code unit.
+\begin{note}
+That is, provided the code point value is in the range 0x0-0x7F (inclusive).
+\end{note}
 If the value is not representable with a single UTF-8 code unit,
 the program is ill-formed.
 A UTF-8 character literal containing multiple \grammarterm{c-char}{s} is ill-formed.
@@ -1148,8 +1160,11 @@
 is a character literal of type \tcode{char16_t}. The value
 of a \tcode{char16_t} character literal containing a single \grammarterm{c-char} is
 equal to its ISO/IEC 10646 code point value, provided that the code point value is
-representable with a single 16-bit code unit (that is, provided it is in the
-basic multi-lingual plane). If the value is not representable
+representable with a single 16-bit code unit.
+\begin{note}
+That is, provided the code point value is in the range 0x0-0xFFFF (inclusive).
+\end{note}
+If the value is not representable
 with a single 16-bit code unit, the program is ill-formed. A \tcode{char16_t} character literal
 containing multiple \grammarterm{c-char}{s} is ill-formed.

@@ -1554,6 +1569,10 @@
 is initialized with the given characters. A single \grammarterm{c-char} may
 produce more than one \tcode{char16_t} character in the form of
 surrogate pairs.
+\begin{note}
+A surrogate pair is a representation for a single code point
+as a sequence of two 16-bit code units.
+\end{note}

 \pnum
 \indextext{literal!string!\idxcode{char32_t}}%
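
The numeric ranges that this change moves into notes in source/lex.tex can be written as small predicates. The following is a minimal C++ sketch for illustration only, not part of the commit: every function and type name is hypothetical, and the surrogate pair arithmetic is the usual UTF-16 encoding form from ISO/IEC 10646, which the diff mentions but does not spell out.

```cpp
#include <cstdint>

// Hypothetical helpers; the ranges are the ones stated in the added notes.

// ISO/IEC 10646 code points are within 0x0-0x10FFFF (inclusive).
constexpr bool is_code_point(std::uint32_t v) { return v <= 0x10FFFF; }

// A surrogate code point is a value in 0xD800-0xDFFF (inclusive).
constexpr bool is_surrogate(std::uint32_t v) { return v >= 0xD800 && v <= 0xDFFF; }

// A control character has a code point in 0x0-0x1F or 0x7F-0x9F (both inclusive).
constexpr bool is_control(std::uint32_t v) {
    return v <= 0x1F || (v >= 0x7F && v <= 0x9F);
}

// First rule in the hunk: a universal-character-name must correspond to a
// code point and must not correspond to a surrogate code point.
constexpr bool ucn_value_is_acceptable(std::uint32_t v) {
    return is_code_point(v) && !is_surrogate(v);
}

// UTF-8 character literal: the code point must fit in one UTF-8 code unit.
constexpr bool fits_one_utf8_code_unit(std::uint32_t v) { return v <= 0x7F; }

// char16_t character literal: the code point must fit in one 16-bit code unit.
constexpr bool fits_one_utf16_code_unit(std::uint32_t v) { return v <= 0xFFFF; }

// Assumption: standard UTF-16 encoding of a code point above 0xFFFF as two
// 16-bit code units. Callers must pass a value in 0x10000-0x10FFFF.
struct SurrogatePair {
    char16_t high;
    char16_t low;
};

constexpr SurrogatePair to_surrogate_pair(std::uint32_t v) {
    return SurrogatePair{
        static_cast<char16_t>(0xD800 + ((v - 0x10000) >> 10)),
        static_cast<char16_t>(0xDC00 + ((v - 0x10000) & 0x3FF))};
}

static_assert(!ucn_value_is_acceptable(0xD800), "surrogates are rejected");
static_assert(!ucn_value_is_acceptable(0x110000), "beyond the code point range");
static_assert(fits_one_utf8_code_unit(0x7F) && !fits_one_utf8_code_unit(0x80),
              "UTF-8 single code unit boundary");
```

For example, under these assumptions `to_surrogate_pair(0x1F600)` yields the code units 0xD83D and 0xDE00, which is why a single \grammarterm{c-char} in a \tcode{char16_t} string literal can occupy two elements.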

source/preprocessor.tex

Lines changed: 1 addition & 1 deletion
@@ -1549,7 +1549,7 @@
 An integer literal of the form \tcode{yyyymmL} (for example,
 \tcode{199712L}).
 If this symbol is defined, then every character in the Unicode required set, when
-stored in an object of type \tcode{wchar_t}, has the same value as the short identifier
+stored in an object of type \tcode{wchar_t}, has the same value as the code point
 of that character. The \defn{Unicode required set} consists of all
 the characters that are defined by ISO/IEC 10646, along with
 all amendments and technical corrigenda as of the specified year and month.
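
The guarantee reworded in source/preprocessor.tex can be spot-checked in code. The sketch below assumes the symbol being described is `__STDC_ISO_10646__`; the macro name is not visible in this hunk, and both function names are hypothetical. It only samples two characters, so it illustrates the wording rather than verifying an implementation.

```cpp
// Hypothetical helper: true if the value stored in a wchar_t equals the
// given ISO/IEC 10646 code point.
constexpr bool wchar_holds_code_point(wchar_t stored, unsigned long code_point) {
    return static_cast<unsigned long>(stored) == code_point;
}

// Assumption: the symbol described in the hunk is __STDC_ISO_10646__.
// When it is defined, sampled wide characters should equal their code points.
// When it is not defined, the implementation makes no such promise, so this
// sketch reports true because there is nothing to check.
inline bool iso_10646_promise_holds_for_samples() {
#ifdef __STDC_ISO_10646__
    return wchar_holds_code_point(L'A', 0x41UL) &&
           wchar_holds_code_point(L'\u00E9', 0xE9UL);
#else
    return true;
#endif
}
```

The comparison is against the code point value itself, which matches the new wording ("the same value as the code point") more directly than the old reference to a short identifier, since a short identifier such as U+00E9 is a textual notation rather than a number.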