Closed
27 changes: 27 additions & 0 deletions lib/std/fmt.zig
@@ -76,6 +76,7 @@ fn peekIsAlign(comptime fmt: []const u8) bool {
/// - `d`: output numeric value in decimal notation
/// - `b`: output integer value in binary notation
/// - `c`: output integer as an ASCII character. Integer type must have 8 bits at max.
/// - `u`: output integer as an UTF-8 sequence. Integer type must have 21 bits at max.
/// - `*`: output the address of the value instead of the value itself.
///
/// If a formatted user type contains a function of the type
@@ -520,6 +521,12 @@ pub fn formatIntValue(
} else {
@compileError("Cannot print integer that is larger than 8 bits as a ascii");
}
} else if (comptime std.mem.eql(u8, fmt, "u")) {
if (@TypeOf(int_value).bit_count <= 21) {
Contributor

Does a comptime int pass this check?

Contributor Author

This check is copy-pasted from the c modifier.

Member

I do want this to handle comptime_int please. The code for the c modifier is also incorrect if it cannot handle them. Until recently, due to limitations of the Zig language, comptime_ints could not be passed to formatted printing. However, that is now fixed, and comptime_ints can be passed.

I would be OK with a non-explicit @compileError call here, allowing the @as to do the job. That would be my personal strategy to handle comptime_int correctly. Note also that the @as is redundant. Every parameter always gets type coerced to the parameter type.
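
The coercion behaviour described above can be illustrated with a small sketch. `codepointParam` is a hypothetical stand-in for any function that takes a `u21` parameter (such as `formatUtf8Codepoint` in this diff); it is not part of the PR. The sketch assumes the usual Zig coercion rules: a comptime_int coerces when its value fits, and a narrower unsigned integer widens implicitly.

```zig
const std = @import("std");

// Hypothetical stand-in for a function with a `u21` parameter.
fn codepointParam(c: u21) u21 {
    return c;
}

test "arguments coerce to the u21 parameter type" {
    // A comptime_int literal coerces because its value fits in 21 bits.
    std.debug.assert(codepointParam(0x1F310) == 0x1F310);

    // A narrower unsigned integer widens implicitly, no @as needed.
    const small: u8 = 'a';
    std.debug.assert(codepointParam(small) == 'a');

    // Uncommenting the next line should fail to compile, because
    // 0x200000 needs 22 bits and cannot be represented in a u21:
    // _ = codepointParam(0x200000);
}
```

If this holds for the formatting code as well, the explicit bit-count check and the `@as` could both be dropped, and the parameter type alone would reject values that are too wide.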

return formatUtf8Codepoint(@as(u21, int_value), options, context, Errors, output);
} else {
@compileError("Cannot print integer that is larger than 21 bits as an UTF-8 sequence");
}
} else if (comptime std.mem.eql(u8, fmt, "b")) {
radix = 2;
uppercase = false;
@@ -587,6 +594,18 @@ pub fn formatAsciiChar(
return format(context, Errors, output, "\\x{x:0<2}", .{c});
}

pub fn formatUtf8Codepoint(
c: u21,
options: FormatOptions,
context: var,
comptime Errors: type,
output: fn (@TypeOf(context), []const u8) Errors!void,
) Errors!void {
var buf: [4]u8 = undefined;
const len = std.unicode.utf8Encode(c, buf[0..]) catch unreachable;
Contributor

Why is this catch unreachable? What if someone passed in, e.g., 0x1fffff?

Contributor Author

attempt to unwrap error: CodepointTooLarge

Personally I'd prefer to have utf8EncodeUnsafe with 0xFFFD replacement.
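
A rough sketch of what such a replacement-based encoder might look like follows. The name `encodeOrReplace` is hypothetical and nothing like it is confirmed to exist in `std.unicode`; the sketch only relies on `std.unicode.utf8Encode` returning an error for invalid input.

```zig
const std = @import("std");

// Hypothetical helper, not part of std.unicode: encodes `c` as UTF-8 and
// falls back to U+FFFD (REPLACEMENT CHARACTER) when `c` cannot be encoded.
fn encodeOrReplace(c: u21, buf: *[4]u8) []const u8 {
    // U+FFFD is a valid scalar value, so the fallback call cannot fail.
    const len = std.unicode.utf8Encode(c, buf) catch
        (std.unicode.utf8Encode(0xFFFD, buf) catch unreachable);
    return buf[0..len];
}

test "invalid code points become U+FFFD" {
    var buf: [4]u8 = undefined;
    // 0x1FFFFF is above U+10FFFF, so the replacement is emitted instead.
    std.debug.assert(std.mem.eql(u8, encodeOrReplace(0x1FFFFF, &buf), "\xEF\xBF\xBD"));
    // A valid code point is encoded unchanged.
    std.debug.assert(std.mem.eql(u8, encodeOrReplace(0x1F310, &buf), "\xF0\x9F\x8C\x90"));
}
```

The trade-off is that invalid input is silently rewritten rather than reported, which may or may not be acceptable for a formatting function.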

Member

I'm OK with this function asserting that the codepoint fits, with a bit more defensive coding:

  • Document the assertion in doc comments of the function
  • Rather than catch unreachable, catch |err| switch (err) and then list the error(s) explicitly that are unreachable. This will cause compile errors if new errors are added to the set.
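
The two suggestions above could be sketched roughly as follows. `encodeAsserting` is a hypothetical helper used only for illustration, not the code that was merged. It assumes `std.unicode.utf8Encode` can return exactly `error.CodepointTooLarge` and `error.Utf8CannotEncodeSurrogateHalf`; if the error set differs in a given Zig version, the exhaustive switch fails to compile, which is the point of the suggestion.

```zig
const std = @import("std");

/// Hypothetical helper for illustration.
/// Asserts that `c` is a valid Unicode scalar value: at most U+10FFFF
/// and not a surrogate half (U+D800 through U+DFFF).
fn encodeAsserting(c: u21, buf: *[4]u8) []const u8 {
    const len = std.unicode.utf8Encode(c, buf) catch |err| switch (err) {
        // Each error is listed explicitly, so adding a new error to the
        // set turns this switch into a compile error.
        error.CodepointTooLarge => unreachable,
        error.Utf8CannotEncodeSurrogateHalf => unreachable,
    };
    return buf[0..len];
}

test "valid code points encode to the expected bytes" {
    var buf: [4]u8 = undefined;
    std.debug.assert(std.mem.eql(u8, encodeAsserting('a', &buf), "a"));
    std.debug.assert(std.mem.eql(u8, encodeAsserting(0x1F310, &buf), "\xF0\x9F\x8C\x90"));
}
```

Note that a `u21` can still hold surrogate halves and values above U+10FFFF, so the parameter type alone does not make the assertion redundant.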

return output(context, @as(*const [4]u8, &buf)[0..len]);
}

pub fn formatBuf(
buf: []const u8,
options: FormatOptions,
@@ -1207,6 +1226,14 @@ test "int.specifier" {
const value: u8 = 'a';
try testFmt("u8: a\n", "u8: {c}\n", .{value});
}
{
const value: u8 = 'a';
try testFmt("UTF-8: a\n", "UTF-8: {u}\n", .{value});
}
{
const value: u21 = 0x1F310;
try testFmt("UTF-8: 🌐\n", "UTF-8: {u}\n", .{value});
}
{
const value: u8 = 0b1100;
try testFmt("u8: 0b1100\n", "u8: 0b{b}\n", .{value});