P0482R6 char8_t: A type for UTF-8 characters and strings #2463

Merged
merged 3 commits into from
Nov 25, 2018
7 changes: 7 additions & 0 deletions source/atomics.tex
@@ -43,6 +43,7 @@
// \ref{atomics.lockfree}, lock-free property
#define ATOMIC_BOOL_LOCK_FREE @\unspec@
#define ATOMIC_CHAR_LOCK_FREE @\unspec@
#define ATOMIC_CHAR8_T_LOCK_FREE @\unspec@
#define ATOMIC_CHAR16_T_LOCK_FREE @\unspec@
#define ATOMIC_CHAR32_T_LOCK_FREE @\unspec@
#define ATOMIC_WCHAR_T_LOCK_FREE @\unspec@
@@ -203,6 +204,7 @@
using atomic_ulong = atomic<unsigned long>;
using atomic_llong = atomic<long long>;
using atomic_ullong = atomic<unsigned long long>;
using atomic_char8_t = atomic<char8_t>;
using atomic_char16_t = atomic<char16_t>;
using atomic_char32_t = atomic<char32_t>;
using atomic_wchar_t = atomic<wchar_t>;
@@ -272,6 +274,7 @@
\indexlibrary{\idxcode{atomic_ulong}}%
\indexlibrary{\idxcode{atomic_llong}}%
\indexlibrary{\idxcode{atomic_ullong}}%
\indexlibrary{\idxcode{atomic_char8_t}}%
\indexlibrary{\idxcode{atomic_char16_t}}%
\indexlibrary{\idxcode{atomic_char32_t}}%
\indexlibrary{\idxcode{atomic_wchar_t}}%
@@ -535,6 +538,7 @@

\indexlibrary{\idxcode{ATOMIC_BOOL_LOCK_FREE}}%
\indexlibrary{\idxcode{ATOMIC_CHAR_LOCK_FREE}}%
\indexlibrary{\idxcode{ATOMIC_CHAR8_T_LOCK_FREE}}%
\indexlibrary{\idxcode{ATOMIC_CHAR16_T_LOCK_FREE}}%
\indexlibrary{\idxcode{ATOMIC_CHAR32_T_LOCK_FREE}}%
\indexlibrary{\idxcode{ATOMIC_WCHAR_T_LOCK_FREE}}%
@@ -547,6 +551,7 @@
\begin{codeblock}
#define ATOMIC_BOOL_LOCK_FREE @\unspec@
#define ATOMIC_CHAR_LOCK_FREE @\unspec@
#define ATOMIC_CHAR8_T_LOCK_FREE @\unspec@
#define ATOMIC_CHAR16_T_LOCK_FREE @\unspec@
#define ATOMIC_CHAR32_T_LOCK_FREE @\unspec@
#define ATOMIC_WCHAR_T_LOCK_FREE @\unspec@
@@ -927,6 +932,7 @@
\tcode{unsigned long},
\tcode{long long},
\tcode{unsigned long long},
\tcode{char8_t},
\tcode{char16_t},
\tcode{char32_t},
\tcode{wchar_t},
@@ -1745,6 +1751,7 @@
\tcode{unsigned long},
\tcode{long long},
\tcode{unsigned long long},
\tcode{char8_t},
\tcode{char16_t},
\tcode{char32_t},
\tcode{wchar_t},
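Reading aid for the `atomics.tex` hunks above: a minimal sketch of what the added alias and macro give user code. It assumes a toolchain that defines the feature-test macros `__cpp_char8_t` and `__cpp_lib_char8_t`; without them it falls back to `unsigned char`. The names `code_unit` and `publish_code_unit` are hypothetical and are not part of this patch.

```cpp
#include <atomic>
#include <type_traits>

#if defined(__cpp_char8_t) && defined(__cpp_lib_char8_t)
// The alias added by the patch names the same specialization as atomic<char8_t>.
static_assert(std::is_same<std::atomic_char8_t, std::atomic<char8_t>>::value,
              "atomic_char8_t is an alias for atomic<char8_t>");
// The draft leaves the macro's value unspecified; like the other lock-free
// macros it is 0 (never), 1 (sometimes), or 2 (always lock-free).
static_assert(ATOMIC_CHAR8_T_LOCK_FREE >= 0 && ATOMIC_CHAR8_T_LOCK_FREE <= 2,
              "lock-free macro is 0, 1, or 2");
using code_unit = char8_t;
#else
using code_unit = unsigned char;  // assumption: fallback for pre-char8_t toolchains
#endif

// Hypothetical helper: atomically store a new code unit and return the old one.
inline code_unit publish_code_unit(std::atomic<code_unit>& slot, code_unit next) {
    return slot.exchange(next);
}
```

A caller would declare `std::atomic<code_unit> slot{code_unit{0x41}};` and call `publish_code_unit(slot, code_unit{0x42})` to swap in the next unit.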
24 changes: 19 additions & 5 deletions source/basic.tex
@@ -3575,7 +3575,7 @@
\tcode{alignof} expression\iref{expr.alignof}. Furthermore,
the narrow character types\iref{basic.fundamental} shall have the weakest
alignment requirement.
\begin{note} This enables the narrow character types to be used as the
\begin{note} This enables the ordinary character types to be used as the
underlying type for an aligned memory area\iref{dcl.align}.\end{note}

\pnum
@@ -4289,6 +4289,7 @@
\defnx{extended integer types}{extended integer type}.

\pnum
\indextext{underlying type|see{type, underlying}}%
A fundamental type specified to have
a signed or unsigned integer type as its \defn{underlying type} has
the same object representation,
@@ -4300,6 +4301,7 @@
\pnum
\indextext{type!\idxcode{char}}%
\indextext{type!character}%
\indextext{type!ordinary character}%
\indextext{type!narrow character}%
\indextext{\idxcode{char}!implementation-defined sign of}%
\indextext{type!\idxcode{signed char}}%
@@ -4311,6 +4313,9 @@
The values of type \tcode{char} can represent distinct codes
for all members of the implementation's basic character set.
The three types \tcode{char}, \tcode{signed char}, and \tcode{unsigned char}
are collectively called
\defnx{ordinary character types}{type!ordinary character}.
The ordinary character types and \tcode{char8_t}
are collectively called \defnx{narrow character types}{narrow character type}.
For narrow character types,
each possible bit pattern of the object representation represents
@@ -4326,14 +4331,20 @@
\pnum
\indextext{\idxcode{wchar_t}|see{type, \tcode{wchar_t}}}%
\indextext{type!\idxcode{wchar_t}}%
\indextext{underlying type|see{type, underlying}}%
\indextext{type!underlying!\idxcode{wchar_t}}%
Type \tcode{wchar_t} is a distinct type that has
an \impldef{underlying type of \tcode{wchar_t}}
signed or unsigned integer type as its underlying type.
The values of type \tcode{wchar_t} can represent
distinct codes for all members of the largest extended character set
specified among the supported locales\iref{locale}.

\pnum
\indextext{\idxcode{char8_t}|see{type, \tcode{char8_t}}}%
\indextext{type!\idxcode{char8_t}}%
\indextext{type!underlying!\idxcode{char8_t}}%
Type \tcode{char8_t} denotes a distinct type
whose underlying type is \tcode{unsigned char}.
\indextext{\idxcode{char16_t}|see{type, \tcode{char16_t}}}%
\indextext{\idxcode{char32_t}|see{type, \tcode{char32_t}}}%
\indextext{type!\idxcode{char16_t}}%
@@ -4364,8 +4375,11 @@

\pnum
\indextext{type!integral}%
Types \tcode{bool}, \tcode{char}, \tcode{char16_t}, \tcode{char32_t},
\tcode{wchar_t}, and the signed and unsigned integer types are
Types
\tcode{bool},
\tcode{char}, \tcode{wchar_t},
\tcode{char8_t}, \tcode{char16_t}, \tcode{char32_t},
and the signed and unsigned integer types are
collectively called
\defnx{integral types}{integral type}.
A synonym for integral type is \defn{integer type}.
@@ -4737,7 +4751,7 @@
\indextext{type!\idxcode{wchar_t}}%
\indextext{type!\idxcode{char16_t}}%
\indextext{type!\idxcode{char32_t}}%
\item The ranks of \tcode{char16_t}, \tcode{char32_t}, and
\item The ranks of \tcode{char8_t}, \tcode{char16_t}, \tcode{char32_t}, and
\tcode{wchar_t} shall equal the ranks of their underlying
types\iref{basic.fundamental}.

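Reading aid for the `basic.tex` hunks above: a sketch of the new taxonomy, in which `char`, `signed char`, and `unsigned char` are the ordinary character types and the narrow character types are those three plus `char8_t`. The traits `is_ordinary_character` and `is_narrow_character` are hypothetical illustrations, not standard library facilities, and they handle only cv-unqualified types. The `char8_t` parts assume the compiler defines `__cpp_char8_t`.

```cpp
#include <type_traits>

// Hypothetical trait: the three ordinary character types.
template <typename T> struct is_ordinary_character : std::false_type {};
template <> struct is_ordinary_character<char> : std::true_type {};
template <> struct is_ordinary_character<signed char> : std::true_type {};
template <> struct is_ordinary_character<unsigned char> : std::true_type {};

// Hypothetical trait: ordinary character types plus char8_t.
template <typename T> struct is_narrow_character : is_ordinary_character<T> {};

#if defined(__cpp_char8_t)
template <> struct is_narrow_character<char8_t> : std::true_type {};

// char8_t is narrow but not ordinary.
static_assert(!is_ordinary_character<char8_t>::value, "char8_t is not ordinary");
// It is a distinct type even though its underlying type is unsigned char.
static_assert(!std::is_same<char8_t, unsigned char>::value, "distinct type");
// It is an integral type with the alignment of its underlying type.
static_assert(std::is_integral<char8_t>::value, "char8_t is integral");
static_assert(alignof(char8_t) == alignof(unsigned char), "same alignment");
#endif
```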
70 changes: 68 additions & 2 deletions source/compatibility.tex
@@ -64,6 +64,9 @@
The type of a string literal is changed
from ``array of \tcode{char}''
to ``array of \tcode{const char}''.
The type of a UTF-8 string literal is changed
from ``array of \tcode{char}''
to ``array of \tcode{const char8_t}''.
The type of a \tcode{char16_t} string literal is changed
from ``array of \textit{some-integer-type}''
to ``array of \tcode{const char16_t}''.
@@ -1796,9 +1799,11 @@
to introduce constraints through a \grammarterm{requires-clause} or
a \grammarterm{requires-expression}. The \tcode{concept} keyword is
added to enable the definition of concepts\iref{temp.concept}.
The \tcode{char8_t} keyword is added to differentiate
the types of ordinary and UTF-8 literals\iref{lex.string}.
\effect
Valid ISO \CppXVII{} code using \tcode{concept} or \tcode{requires}
as an identifier is not valid in this International Standard.
Valid ISO \CppXVII{} code using \tcode{concept}, \tcode{requires},
or \tcode{char8_t} as an identifier is not valid in this International Standard.

\diffref{lex.operators}
\change New operator \tcode{<=>}.
@@ -1815,6 +1820,34 @@
}
\end{codeblock}

\diffref{lex.literal}
\change Type of UTF-8 string and character literals.
\rationale Required for new features.
The changed types enable function overloading, template specialization, and
type deduction to distinguish ordinary and UTF-8 string and character literals.
\effect Valid ISO \CppXVII{} code that depends on
UTF-8 string literals having type ``array of \tcode{const char}'' and
UTF-8 character literals having type ``\tcode{char}''
is not valid in this International Standard.
\begin{codeblock}
const auto *u8s = u8"text"; // \tcode{u8s} previously deduced as \tcode{const char*}; now deduced as \tcode{const char8_t*}
const char *ps = u8s; // ill-formed; previously well-formed

auto u8c = u8'c'; // \tcode{u8c} previously deduced as \tcode{char}; now deduced as \tcode{char8_t}
char *pc = &u8c; // ill-formed; previously well-formed

std::string s = u8"text"; // ill-formed; previously well-formed

void f(const char *s);
f(u8"text"); // ill-formed; previously well-formed

template<typename> struct ct;
template<> struct ct<char> {
using type = char;
};
ct<decltype(u8'c')>::type x; // ill-formed; previously well-formed
\end{codeblock}

\rSec2[diff.cpp17.basic]{\ref{basic}: basics}

\diffref{intro.races}
@@ -2031,6 +2064,39 @@
Translation units compiled against this version of \Cpp{} may be incompatible with
translation units compiled against \CppXVII{}, either failing to link or having undefined behavior.

\rSec2[diff.cpp17.input.output]{\ref{input.output}: input/output library}

\diffref{ostream.inserters.character}
\change
Overload resolution for ostream inserters used with UTF-8 literals.
\rationale
Required for new features.
\effect
Valid ISO \CppXVII{} code that passes UTF-8 literals
to \tcode{basic_ostream::\brk{}operator<<}
no longer calls character-related overloads.
\begin{codeblock}
std::cout << u8"text"; // previously called \tcode{operator<<(const char*)} and printed a string;
// now calls \tcode{operator<<(const void*)} and prints a pointer value
std::cout << u8'X'; // previously called \tcode{operator<<(char)} and printed a character;
// now calls \tcode{operator<<(int)} and prints an integer value
\end{codeblock}

\diffref{fs.class.path}
\change
Return type of filesystem path format observer member functions.
\rationale
Required for new features.
\effect
Valid ISO \CppXVII{} code that depends on the \tcode{u8string()} and
\tcode{generic_u8string()} member functions of \tcode{std::filesystem::path}
returning \tcode{std::string} is not valid in this International Standard.
\begin{codeblock}
std::filesystem::path p;
std::string s1 = p.u8string(); // ill-formed; previously well-formed
std::string s2 = p.generic_u8string(); // ill-formed; previously well-formed
\end{codeblock}

\rSec2[diff.cpp17.depr]{\ref{depr}: compatibility features}

\nodiffref
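Reading aid for the `compatibility.tex` hunks above: the examples there show `std::string s = u8"text";` becoming ill-formed. One possible way to keep such code compiling under both the old and the new literal type is to copy the code units explicitly. This is a sketch only; `to_ordinary_string` is a hypothetical helper, not something this patch or the standard library provides. `CharT` deduces as `char` when UTF-8 literals are arrays of `const char`, and as `char8_t` under this change.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical helper: copy the code units of a UTF-8 string literal into a
// std::string, whichever one-byte character type the literal's elements have.
template <typename CharT, std::size_t N>
std::string to_ordinary_string(const CharT (&literal)[N]) {
    static_assert(std::is_integral<CharT>::value && sizeof(CharT) == 1,
                  "expects an array of a one-byte character type");
    std::string out;
    out.reserve(N - 1);
    for (std::size_t i = 0; i + 1 < N; ++i) {  // skip the terminating null
        out.push_back(static_cast<char>(literal[i]));
    }
    return out;
}
```

With this helper, `std::string s = to_ordinary_string(u8"text");` is accepted in either mode, and streaming `s` to `std::cout` selects the character overloads rather than the pointer overload described above.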
30 changes: 19 additions & 11 deletions source/declarations.tex
@@ -1247,6 +1247,7 @@
nested-name-specifier \terminal{template} simple-template-id\br
\opt{nested-name-specifier} template-name\br
\terminal{char}\br
\terminal{char8_t}\br
\terminal{char16_t}\br
\terminal{char32_t}\br
\terminal{wchar_t}\br
Expand Down Expand Up @@ -1278,6 +1279,7 @@

\pnum
\indextext{type specifier!\idxcode{char}}%
\indextext{type specifier!\idxcode{char8_t}}%
\indextext{type specifier!\idxcode{char16_t}}%
\indextext{type specifier!\idxcode{char32_t}}%
\indextext{type specifier!\idxcode{wchar_t}}%
@@ -1326,6 +1328,7 @@
\tcode{char} & ``\tcode{char}'' \\
\tcode{unsigned char} & ``\tcode{unsigned char}'' \\
\tcode{signed char} & ``\tcode{signed char}'' \\
\tcode{char8_t} & ``\tcode{char8_t}'' \\
\tcode{char16_t} & ``\tcode{char16_t}'' \\
\tcode{char32_t} & ``\tcode{char32_t}'' \\
\tcode{bool} & ``\tcode{bool}'' \\
@@ -4170,41 +4173,41 @@
\begin{itemize}
\item
If an indeterminate value of
unsigned narrow character type\iref{basic.fundamental}
unsigned ordinary character type\iref{basic.fundamental}
or \tcode{std::byte} type\iref{cstddef.syn}
is produced by the evaluation of:
\begin{itemize}
\item the second or third operand of a conditional expression\iref{expr.cond},
\item the right operand of a comma expression\iref{expr.comma},
\item the operand of a cast or conversion~(\ref{conv.integral},
\ref{expr.type.conv}, \ref{expr.static.cast}, \ref{expr.cast}) to an
unsigned narrow character type
unsigned ordinary character type
or \tcode{std::byte} type\iref{cstddef.syn}, or
\item a discarded-value expression\iref{expr.prop},
\end{itemize}
then the result of the operation is an indeterminate value.

\item
If an indeterminate value of
unsigned narrow character type
unsigned ordinary character type
or \tcode{std::byte} type
is produced by the evaluation of the right
operand of a simple assignment operator\iref{expr.ass} whose first operand
is an lvalue of
unsigned narrow character type
unsigned ordinary character type
or \tcode{std::byte} type,
an indeterminate value replaces
the value of the object referred to by the left operand.

\item
If an indeterminate value of unsigned narrow character type is produced by the
If an indeterminate value of unsigned ordinary character type is produced by the
evaluation of the initialization expression when initializing an object of
unsigned narrow character type, that object is initialized to an indeterminate
unsigned ordinary character type, that object is initialized to an indeterminate
value.

\item
If an indeterminate value of
unsigned narrow character type
unsigned ordinary character type
or \tcode{std::byte} type
is produced by the
evaluation of the initialization expression when initializing an object of
@@ -4292,6 +4295,7 @@
If the destination type is a reference type, see~\ref{dcl.init.ref}.
\item
If the destination type is an array of characters,
an array of \tcode{char8_t},
an array of \tcode{char16_t},
an array of \tcode{char32_t},
or an array of
@@ -4986,13 +4990,17 @@
\indextext{initialization!character array}

\pnum
An array of narrow character type\iref{basic.fundamental},
An array of ordinary character type\iref{basic.fundamental},
\tcode{char8_t} array,
\tcode{char16_t} array,
\tcode{char32_t} array,
or \tcode{wchar_t} array
can be initialized by a
narrow string literal, \tcode{char16_t} string literal, \tcode{char32_t} string
literal, or wide string literal,
can be initialized by
an ordinary string literal,
\tcode{char8_t} string literal,
\tcode{char16_t} string literal,
\tcode{char32_t} string literal, or
wide string literal,
respectively, or by an appropriately-typed string literal enclosed in
braces\iref{lex.string}.
\indextext{initialization!character array}%
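Reading aid for the last `declarations.tex` hunk above: a short sketch of initializing a `char8_t` array from a UTF-8 string literal, both directly and enclosed in braces. The variable names are hypothetical, and the `char8_t` lines assume the compiler defines `__cpp_char8_t`.

```cpp
// An ordinary character array initialized by an ordinary string literal.
constexpr char ordinary[] = "hi";
static_assert(sizeof(ordinary) == 3, "two characters plus the terminating null");

#if defined(__cpp_char8_t)
// A char8_t array initialized by a UTF-8 string literal.
constexpr char8_t utf8[] = u8"hi";
// The same initialization with the literal enclosed in braces.
constexpr char8_t utf8_braced[] = { u8"hi" };

static_assert(sizeof(utf8) == 3, "two code units plus the terminating null");
static_assert(sizeof(utf8_braced) == sizeof(utf8), "braces do not change the bound");
#endif
```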
7 changes: 4 additions & 3 deletions source/expressions.tex
@@ -1086,7 +1086,7 @@
converted to \tcode{float}.

\item Otherwise, the integral promotions\iref{conv.prom} shall be
performed on both operands.\footnote{As a consequence, operands of type \tcode{bool}, \tcode{char16_t},
performed on both operands.\footnote{As a consequence, operands of type \tcode{bool}, \tcode{char8_t}, \tcode{char16_t},
\tcode{char32_t}, \tcode{wchar_t}, or an enumerated type are converted
to some integral type.}
Then the following rules shall be applied to the promoted operands:
@@ -4267,8 +4267,9 @@
has function or incomplete type,
to the parenthesized name of such
types, or to a glvalue that designates a bit-field.
\tcode{sizeof(char)}, \tcode{sizeof(signed char)} and
\tcode{sizeof(unsigned char)} are \tcode{1}. The result of
The result of \tcode{sizeof}
applied to any of the narrow character types is \tcode{1}.
The result of
\tcode{sizeof} applied to any other fundamental
type\iref{basic.fundamental} is \impldef{\tcode{sizeof} applied to
fundamental types
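Reading aid for the `expressions.tex` hunks above: a sketch checking that `sizeof` yields 1 for every narrow character type and that `char8_t` operands of an arithmetic operator are promoted like operands of the underlying type `unsigned char`. The helper `code_unit_value` is hypothetical, and the `char8_t` checks assume the compiler defines `__cpp_char8_t`.

```cpp
#include <type_traits>

static_assert(sizeof(char) == 1 && sizeof(signed char) == 1 &&
              sizeof(unsigned char) == 1,
              "sizeof is 1 for the ordinary character types");

#if defined(__cpp_char8_t)
static_assert(sizeof(char8_t) == 1, "sizeof is 1 for char8_t as well");
// Integral promotion applies to both operands; char8_t has the rank of
// unsigned char, so the sum has the same type as a sum of unsigned chars.
static_assert(std::is_same<decltype(char8_t{} + char8_t{}),
                           decltype(static_cast<unsigned char>(0) +
                                    static_cast<unsigned char>(0))>::value,
              "char8_t promotes like unsigned char");
#endif

// Hypothetical helper: widen one narrow code unit to unsigned
// without sign extension.
template <typename NarrowChar>
constexpr unsigned code_unit_value(NarrowChar c) {
    static_assert(sizeof(NarrowChar) == 1, "expects a narrow character type");
    return static_cast<unsigned char>(c);
}
```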