Commit 716bee3

Merge 2018-11 CWG Motion 11
P0482R6 char8_t: A type for UTF-8 characters and strings

Fixes #2403
2 parents 5d81baf + b4eec9d commit 716bee3

14 files changed: 533 additions and 241 deletions

source/atomics.tex

Lines changed: 7 additions & 0 deletions
@@ -43,6 +43,7 @@
4343
// \ref{atomics.lockfree}, lock-free property
4444
#define ATOMIC_BOOL_LOCK_FREE @\unspec@
4545
#define ATOMIC_CHAR_LOCK_FREE @\unspec@
46+
#define ATOMIC_CHAR8_T_LOCK_FREE @\unspec@
4647
#define ATOMIC_CHAR16_T_LOCK_FREE @\unspec@
4748
#define ATOMIC_CHAR32_T_LOCK_FREE @\unspec@
4849
#define ATOMIC_WCHAR_T_LOCK_FREE @\unspec@
@@ -203,6 +204,7 @@
203204
using atomic_ulong = atomic<unsigned long>;
204205
using atomic_llong = atomic<long long>;
205206
using atomic_ullong = atomic<unsigned long long>;
207+
using atomic_char8_t = atomic<char8_t>;
206208
using atomic_char16_t = atomic<char16_t>;
207209
using atomic_char32_t = atomic<char32_t>;
208210
using atomic_wchar_t = atomic<wchar_t>;
@@ -272,6 +274,7 @@
272274
\indexlibrary{\idxcode{atomic_ulong}}%
273275
\indexlibrary{\idxcode{atomic_llong}}%
274276
\indexlibrary{\idxcode{atomic_ullong}}%
277+
\indexlibrary{\idxcode{atomic_char8_t}}%
275278
\indexlibrary{\idxcode{atomic_char16_t}}%
276279
\indexlibrary{\idxcode{atomic_char32_t}}%
277280
\indexlibrary{\idxcode{atomic_wchar_t}}%
@@ -535,6 +538,7 @@
535538

536539
\indexlibrary{\idxcode{ATOMIC_BOOL_LOCK_FREE}}%
537540
\indexlibrary{\idxcode{ATOMIC_CHAR_LOCK_FREE}}%
541+
\indexlibrary{\idxcode{ATOMIC_CHAR8_T_LOCK_FREE}}%
538542
\indexlibrary{\idxcode{ATOMIC_CHAR16_T_LOCK_FREE}}%
539543
\indexlibrary{\idxcode{ATOMIC_CHAR32_T_LOCK_FREE}}%
540544
\indexlibrary{\idxcode{ATOMIC_WCHAR_T_LOCK_FREE}}%
@@ -547,6 +551,7 @@
547551
\begin{codeblock}
548552
#define ATOMIC_BOOL_LOCK_FREE @\unspec@
549553
#define ATOMIC_CHAR_LOCK_FREE @\unspec@
554+
#define ATOMIC_CHAR8_T_LOCK_FREE @\unspec@
550555
#define ATOMIC_CHAR16_T_LOCK_FREE @\unspec@
551556
#define ATOMIC_CHAR32_T_LOCK_FREE @\unspec@
552557
#define ATOMIC_WCHAR_T_LOCK_FREE @\unspec@
@@ -927,6 +932,7 @@
927932
\tcode{unsigned long},
928933
\tcode{long long},
929934
\tcode{unsigned long long},
935+
\tcode{char8_t},
930936
\tcode{char16_t},
931937
\tcode{char32_t},
932938
\tcode{wchar_t},
@@ -1745,6 +1751,7 @@
17451751
\tcode{unsigned long},
17461752
\tcode{long long},
17471753
\tcode{unsigned long long},
1754+
\tcode{char8_t},
17481755
\tcode{char16_t},
17491756
\tcode{char32_t},
17501757
\tcode{wchar_t},
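
A minimal sketch of how the additions to `<atomic>` shown above might be exercised from user code. It assumes a toolchain that may or may not yet define `ATOMIC_CHAR8_T_LOCK_FREE`; the helper names `char8_lock_free_level` and `round_trip` are hypothetical and not part of this commit.

```cpp
#include <atomic>

// Hypothetical helper: reports the lock-free level advertised for char8_t.
// 0 = never, 1 = sometimes, 2 = always lock-free; -1 means the macro is
// absent (for example when compiling in a language mode without char8_t).
constexpr int char8_lock_free_level() {
#if defined(ATOMIC_CHAR8_T_LOCK_FREE)
    return ATOMIC_CHAR8_T_LOCK_FREE;
#else
    return -1;
#endif
}

// Hypothetical helper: stores a value in std::atomic<T> and reads it back.
// Where char8_t is available, round_trip<char8_t>(u8'a') uses the same
// specialization that the new alias atomic_char8_t names.
template <typename T>
T round_trip(T value) {
    std::atomic<T> cell{T{}};
    cell.store(value);
    return cell.load();
}
```

The macro is checked with `#if defined` so that the sketch still compiles on implementations that predate this change.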

source/basic.tex

Lines changed: 19 additions & 5 deletions
@@ -3575,7 +3575,7 @@
35753575
\tcode{alignof} expression\iref{expr.alignof}. Furthermore,
35763576
the narrow character types\iref{basic.fundamental} shall have the weakest
35773577
alignment requirement.
3578-
\begin{note} This enables the narrow character types to be used as the
3578+
\begin{note} This enables the ordinary character types to be used as the
35793579
underlying type for an aligned memory area\iref{dcl.align}.\end{note}
35803580

35813581
\pnum
@@ -4289,6 +4289,7 @@
42894289
\defnx{extended integer types}{extended integer type}.
42904290

42914291
\pnum
4292+
\indextext{underlying type|see{type, underlying}}%
42924293
A fundamental type specified to have
42934294
a signed or unsigned integer type as its \defn{underlying type} has
42944295
the same object representation,
@@ -4300,6 +4301,7 @@
43004301
\pnum
43014302
\indextext{type!\idxcode{char}}%
43024303
\indextext{type!character}%
4304+
\indextext{type!ordinary character}%
43034305
\indextext{type!narrow character}%
43044306
\indextext{\idxcode{char}!implementation-defined sign of}%
43054307
\indextext{type!\idxcode{signed char}}%
@@ -4311,6 +4313,9 @@
43114313
The values of type \tcode{char} can represent distinct codes
43124314
for all members of the implementation's basic character set.
43134315
The three types \tcode{char}, \tcode{signed char}, and \tcode{unsigned char}
4316+
are collectively called
4317+
\defnx{ordinary character types}{type!ordinary character}.
4318+
The ordinary character types and \tcode{char8_t}
43144319
are collectively called \defnx{narrow character types}{narrow character type}.
43154320
For narrow character types,
43164321
each possible bit pattern of the object representation represents
@@ -4326,14 +4331,20 @@
43264331
\pnum
43274332
\indextext{\idxcode{wchar_t}|see{type, \tcode{wchar_t}}}%
43284333
\indextext{type!\idxcode{wchar_t}}%
4329-
\indextext{underlying type|see{type, underlying}}%
43304334
\indextext{type!underlying!\idxcode{wchar_t}}%
43314335
Type \tcode{wchar_t} is a distinct type that has
43324336
an \impldef{underlying type of \tcode{wchar_t}}
43334337
signed or unsigned integer type as its underlying type.
43344338
The values of type \tcode{wchar_t} can represent
43354339
distinct codes for all members of the largest extended character set
43364340
specified among the supported locales\iref{locale}.
4341+
4342+
\pnum
4343+
\indextext{\idxcode{char8_t}|see{type, \tcode{char8_t}}}%
4344+
\indextext{type!\idxcode{char8_t}}%
4345+
\indextext{type!underlying!\idxcode{char8_t}}%
4346+
Type \tcode{char8_t} denotes a distinct type
4347+
whose underlying type is \tcode{unsigned char}.
43374348
\indextext{\idxcode{char16_t}|see{type, \tcode{char16_t}}}%
43384349
\indextext{\idxcode{char32_t}|see{type, \tcode{char32_t}}}%
43394350
\indextext{type!\idxcode{char16_t}}%
@@ -4364,8 +4375,11 @@
43644375

43654376
\pnum
43664377
\indextext{type!integral}%
4367-
Types \tcode{bool}, \tcode{char}, \tcode{char16_t}, \tcode{char32_t},
4368-
\tcode{wchar_t}, and the signed and unsigned integer types are
4378+
Types
4379+
\tcode{bool},
4380+
\tcode{char}, \tcode{wchar_t},
4381+
\tcode{char8_t}, \tcode{char16_t}, \tcode{char32_t},
4382+
and the signed and unsigned integer types are
43694383
collectively called
43704384
\defnx{integral types}{integral type}.
43714385
A synonym for integral type is \defn{integer type}.
@@ -4737,7 +4751,7 @@
47374751
\indextext{type!\idxcode{wchar_t}}%
47384752
\indextext{type!\idxcode{char16_t}}%
47394753
\indextext{type!\idxcode{char32_t}}%
4740-
\item The ranks of \tcode{char16_t}, \tcode{char32_t}, and
4754+
\item The ranks of \tcode{char8_t}, \tcode{char16_t}, \tcode{char32_t}, and
47414755
\tcode{wchar_t} shall equal the ranks of their underlying
47424756
types\iref{basic.fundamental}.
47434757

source/compatibility.tex

Lines changed: 68 additions & 2 deletions
@@ -64,6 +64,9 @@
6464
The type of a string literal is changed
6565
from ``array of \tcode{char}''
6666
to ``array of \tcode{const char}''.
67+
The type of a UTF-8 string literal is changed
68+
from ``array of \tcode{char}''
69+
to ``array of \tcode{const char8_t}''.
6770
The type of a \tcode{char16_t} string literal is changed
6871
from ``array of \textit{some-integer-type}''
6972
to ``array of \tcode{const char16_t}''.
@@ -1796,9 +1799,11 @@
17961799
to introduce constraints through a \grammarterm{requires-clause} or
17971800
a \grammarterm{requires-expression}. The \tcode{concept} keyword is
17981801
added to enable the definition of concepts\iref{temp.concept}.
1802+
The \tcode{char8_t} keyword is added to differentiate
1803+
the types of ordinary and UTF-8 literals\iref{lex.string}.
17991804
\effect
1800-
Valid ISO \CppXVII{} code using \tcode{concept} or \tcode{requires}
1801-
as an identifier is not valid in this International Standard.
1805+
Valid ISO \CppXVII{} code using \tcode{concept}, \tcode{requires},
1806+
or \tcode{char8_t} as an identifier is not valid in this International Standard.
18021807

18031808
\diffref{lex.operators}
18041809
\change New operator \tcode{<=>}.
@@ -1815,6 +1820,34 @@
18151820
}
18161821
\end{codeblock}
18171822

1823+
\diffref{lex.literal}
1824+
\change Type of UTF-8 string and character literals.
1825+
\rationale Required for new features.
1826+
The changed types enable function overloading, template specialization, and
1827+
type deduction to distinguish ordinary and UTF-8 string and character literals.
1828+
\effect Valid ISO \CppXVII{} code that depends on
1829+
UTF-8 string literals having type ``array of \tcode{const char}'' and
1830+
UTF-8 character literals having type ``char''
1831+
is not valid in this International Standard.
1832+
\begin{codeblock}
1833+
const auto *u8s = u8"text"; // \tcode{u8s} previously deduced as \tcode{const char*}; now deduced as \tcode{const char8_t*}
1834+
const char *ps = u8s; // ill-formed; previously well-formed
1835+
1836+
auto u8c = u8'c'; // \tcode{u8c} previously deduced as \tcode{char}; now deduced as \tcode{char8_t}
1837+
char *pc = &u8c; // ill-formed; previously well-formed
1838+
1839+
std::string s = u8"text"; // ill-formed; previously well-formed
1840+
1841+
void f(const char *s);
1842+
f(u8"text"); // ill-formed; previously well-formed
1843+
1844+
template<typename> struct ct;
1845+
template<> struct ct<char> {
1846+
using type = char;
1847+
};
1848+
ct<decltype(u8'c')>::type x; // ill-formed; previously well-formed.
1849+
\end{codeblock}
1850+
18181851
\rSec2[diff.cpp17.basic]{\ref{basic}: basics}
18191852

18201853
\diffref{intro.races}
@@ -2031,6 +2064,39 @@
20312064
Translation units compiled against this version of \Cpp{} may be incompatible with
20322065
translation units compiled against \CppXVII{}, either failing to link or having undefined behavior.
20332066

2067+
\rSec2[diff.cpp17.input.output]{\ref{input.output}: input/output library}
2068+
2069+
\diffref{ostream.inserters.character}
2070+
\change
2071+
Overload resolution for ostream inserters used with UTF-8 literals.
2072+
\rationale
2073+
Required for new features.
2074+
\effect
2075+
Valid ISO \CppXVII{} code that passes UTF-8 literals
2076+
to \tcode{basic_ostream::\brk{}operator<<}
2077+
no longer calls character-related overloads.
2078+
\begin{codeblock}
2079+
std::cout << u8"text"; // previously called \tcode{operator<<(const char*)} and printed a string;
2080+
// now calls \tcode{operator<<(const void*)} and prints a pointer value
2081+
std::cout << u8'X'; // previously called \tcode{operator<<(char)} and printed a character;
2082+
// now calls \tcode{operator<<(int)} and prints an integer value
2083+
\end{codeblock}
2084+
2085+
\diffref{fs.class.path}
2086+
\change
2087+
Return type of filesystem path format observer member functions.
2088+
\rationale
2089+
Required for new features.
2090+
\effect
2091+
Valid ISO \CppXVII{} code that depends on the \tcode{u8string()} and
2092+
\tcode{generic_u8string()} member functions of \tcode{std::filesystem::path}
2093+
returning \tcode{std::string} is not valid in this International Standard.
2094+
\begin{codeblock}
2095+
std::filesystem::path p;
2096+
std::string s1 = p.u8string(); // ill-formed; previously well-formed
2097+
std::string s2 = p.generic_u8string(); // ill-formed; previously well-formed
2098+
\end{codeblock}
2099+
20342100
\rSec2[diff.cpp17.depr]{\ref{depr}: compatibility features}
20352101

20362102
\nodiffref
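
Code affected by the compatibility entries above could be adapted by copying code units explicitly instead of relying on an implicit conversion. This is one possible migration sketch, not a recommendation from the commit; `to_narrow` and `to_narrow_string` are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copies the code units of a string literal into a
// std::string. It accepts u8"..." whether its element type is char (C++17)
// or char8_t (with this change), because the element type is deduced.
template <typename CharT, std::size_t N>
std::string to_narrow(const CharT (&literal)[N]) {
    static_assert(sizeof(CharT) == 1, "expects a one-byte code unit type");
    std::string out;
    out.reserve(N - 1);
    for (std::size_t i = 0; i + 1 < N; ++i) {  // skip the terminating null
        out.push_back(static_cast<char>(literal[i]));
    }
    return out;
}

// Hypothetical helper: the same idea for string objects, such as the result
// of std::filesystem::path::u8string(), whatever its character type is.
template <typename String>
std::string to_narrow_string(const String& s) {
    return std::string(s.begin(), s.end());
}
```

With these helpers, `std::string s = to_narrow(u8"text");` and `std::string s1 = to_narrow_string(p.u8string());` are well-formed both before and after the change. The bytes are copied unchanged, so the resulting `std::string` still holds UTF-8.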

source/declarations.tex

Lines changed: 19 additions & 11 deletions
@@ -1247,6 +1247,7 @@
12471247
nested-name-specifier \terminal{template} simple-template-id\br
12481248
\opt{nested-name-specifier} template-name\br
12491249
\terminal{char}\br
1250+
\terminal{char8_t}\br
12501251
\terminal{char16_t}\br
12511252
\terminal{char32_t}\br
12521253
\terminal{wchar_t}\br
@@ -1278,6 +1279,7 @@
12781279

12791280
\pnum
12801281
\indextext{type specifier!\idxcode{char}}%
1282+
\indextext{type specifier!\idxcode{char8_t}}%
12811283
\indextext{type specifier!\idxcode{char16_t}}%
12821284
\indextext{type specifier!\idxcode{char32_t}}%
12831285
\indextext{type specifier!\idxcode{wchar_t}}%
@@ -1326,6 +1328,7 @@
13261328
\tcode{char} & ``\tcode{char}'' \\
13271329
\tcode{unsigned char} & ``\tcode{unsigned char}'' \\
13281330
\tcode{signed char} & ``\tcode{signed char}'' \\
1331+
\tcode{char8_t} & ``\tcode{char8_t}'' \\
13291332
\tcode{char16_t} & ``\tcode{char16_t}'' \\
13301333
\tcode{char32_t} & ``\tcode{char32_t}'' \\
13311334
\tcode{bool} & ``\tcode{bool}'' \\
@@ -4170,41 +4173,41 @@
41704173
\begin{itemize}
41714174
\item
41724175
If an indeterminate value of
4173-
unsigned narrow character type\iref{basic.fundamental}
4176+
unsigned ordinary character type\iref{basic.fundamental}
41744177
or \tcode{std::byte} type\iref{cstddef.syn}
41754178
is produced by the evaluation of:
41764179
\begin{itemize}
41774180
\item the second or third operand of a conditional expression\iref{expr.cond},
41784181
\item the right operand of a comma expression\iref{expr.comma},
41794182
\item the operand of a cast or conversion~(\ref{conv.integral},
41804183
\ref{expr.type.conv}, \ref{expr.static.cast}, \ref{expr.cast}) to an
4181-
unsigned narrow character type
4184+
unsigned ordinary character type
41824185
or \tcode{std::byte} type\iref{cstddef.syn}, or
41834186
\item a discarded-value expression\iref{expr.prop},
41844187
\end{itemize}
41854188
then the result of the operation is an indeterminate value.
41864189

41874190
\item
41884191
If an indeterminate value of
4189-
unsigned narrow character type
4192+
unsigned ordinary character type
41904193
or \tcode{std::byte} type
41914194
is produced by the evaluation of the right
41924195
operand of a simple assignment operator\iref{expr.ass} whose first operand
41934196
is an lvalue of
4194-
unsigned narrow character type
4197+
unsigned ordinary character type
41954198
or \tcode{std::byte} type,
41964199
an indeterminate value replaces
41974200
the value of the object referred to by the left operand.
41984201

41994202
\item
4200-
If an indeterminate value of unsigned narrow character type is produced by the
4203+
If an indeterminate value of unsigned ordinary character type is produced by the
42014204
evaluation of the initialization expression when initializing an object of
4202-
unsigned narrow character type, that object is initialized to an indeterminate
4205+
unsigned ordinary character type, that object is initialized to an indeterminate
42034206
value.
42044207

42054208
\item
42064209
If an indeterminate value of
4207-
unsigned narrow character type
4210+
unsigned ordinary character type
42084211
or \tcode{std::byte} type
42094212
is produced by the
42104213
evaluation of the initialization expression when initializing an object of
@@ -4292,6 +4295,7 @@
42924295
If the destination type is a reference type, see~\ref{dcl.init.ref}.
42934296
\item
42944297
If the destination type is an array of characters,
4298+
an array of \tcode{char8_t},
42954299
an array of \tcode{char16_t},
42964300
an array of \tcode{char32_t},
42974301
or an array of
@@ -4986,13 +4990,17 @@
49864990
\indextext{initialization!character array}
49874991

49884992
\pnum
4989-
An array of narrow character type\iref{basic.fundamental},
4993+
An array of ordinary character type\iref{basic.fundamental},
4994+
\tcode{char8_t} array,
49904995
\tcode{char16_t} array,
49914996
\tcode{char32_t} array,
49924997
or \tcode{wchar_t} array
4993-
can be initialized by a
4994-
narrow string literal, \tcode{char16_t} string literal, \tcode{char32_t} string
4995-
literal, or wide string literal,
4998+
can be initialized by
4999+
an ordinary string literal,
5000+
\tcode{char8_t} string literal,
5001+
\tcode{char16_t} string literal,
5002+
\tcode{char32_t} string literal, or
5003+
wide string literal,
49965004
respectively, or by an appropriately-typed string literal enclosed in
49975005
braces\iref{lex.string}.
49985006
\indextext{initialization!character array}%
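
The array-initialization rule above can be illustrated with a short sketch. The alias `utf8_unit` and the helpers `extent_of` and `utf8_array_extent` are hypothetical; the fallback to `char` reflects the pre-change type of UTF-8 string literals and is there only to keep the block compilable without `char8_t`.

```cpp
#include <cstddef>

// Assumption: before this change a UTF-8 string literal is an array of
// const char, so char is the matching element type in that mode.
#if defined(__cpp_char8_t)
using utf8_unit = char8_t;
#else
using utf8_unit = char;
#endif

// Hypothetical helper: returns the number of elements in an array.
template <typename T, std::size_t N>
constexpr std::size_t extent_of(const T (&)[N]) {
    return N;
}

// Initializes arrays from a UTF-8 string literal, plain and in braces.
// Both forms deduce the bound from the literal, terminator included.
inline std::size_t utf8_array_extent() {
    utf8_unit plain[] = u8"abc";
    utf8_unit braced[] = {u8"abc"};
    return extent_of(plain) == extent_of(braced) ? extent_of(plain) : 0;
}
```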

source/expressions.tex

Lines changed: 4 additions & 3 deletions
@@ -1086,7 +1086,7 @@
10861086
converted to \tcode{float}.
10871087

10881088
\item Otherwise, the integral promotions\iref{conv.prom} shall be
1089-
performed on both operands.\footnote{As a consequence, operands of type \tcode{bool}, \tcode{char16_t},
1089+
performed on both operands.\footnote{As a consequence, operands of type \tcode{bool}, \tcode{char8_t}, \tcode{char16_t},
10901090
\tcode{char32_t}, \tcode{wchar_t}, or an enumerated type are converted
10911091
to some integral type.}
10921092
Then the following rules shall be applied to the promoted operands:
@@ -4267,8 +4267,9 @@
42674267
has function or incomplete type,
42684268
to the parenthesized name of such
42694269
types, or to a glvalue that designates a bit-field.
4270-
\tcode{sizeof(char)}, \tcode{sizeof(signed char)} and
4271-
\tcode{sizeof(unsigned char)} are \tcode{1}. The result of
4270+
The result of \tcode{sizeof}
4271+
applied to any of the narrow character types is \tcode{1}.
4272+
The result of
42724273
\tcode{sizeof} applied to any other fundamental
42734274
type\iref{basic.fundamental} is \impldef{\tcode{sizeof} applied to
42744275
fundamental types
