2 changes: 1 addition & 1 deletion lib/std/fmt.zig
@@ -42,7 +42,7 @@ pub const FormatOptions = struct {
/// - *specifier* is a type-dependent formatting option that determines how a type should formatted (see below)
/// - *fill* is a single character which is used to pad the formatted text
/// - *alignment* is one of the three characters `<`, `^`, or `>` to make the text left-, center-, or right-aligned, respectively
-/// - *width* is the total width of the field in characters
+/// - *width* is the total width of the field in "characters" (unicode codepoints)
Member
Suggested change
-/// - *width* is the total width of the field in "characters" (unicode codepoints)
+/// - *width* is the total width of the field in bytes

Contributor Author
So, a change just went in to allow any Unicode codepoint to be used for the fill "character" (279607c). Is that wrong too?

Member
That one is a bit tricky. It's a counter-intuitive UI, but it's technically OK, since the implementation does not need to be Unicode-aware to use an arbitrary sequence of bytes as a fill character. It may as well be `fill_bytes: []const u8`, with the implementation assuming that all those bytes are to be treated as one width unit. However, it's not worth having that field be a reference to external memory, so having it be a fixed-size integer is worth the limitation. It's a similar rationale to Zig's character literals, which are `comptime_int` and support any single Unicode codepoint, but do not, for example, support 👨‍👩‍👧‍👦, which is 4 codepoints joined with 3 Zero Width Joiner codepoints, because the purpose of a character literal is to be an integer.

This kind of unfortunate complexity (the fact that there is not a single integer corresponding to every Unicode character) is one reason I have no intention for Zig to depend on the large amount of volatile data needed to keep up with Unicode.
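
A minimal sketch of that distinction, assuming a Zig version where `std.unicode.utf8CountCodepoints` is available. The constant name `family` is only illustrative, and the emoji is spelled out with escapes so the joiners are visible:

```zig
const std = @import("std");

// The family emoji from the comment above, written as escapes:
// four emoji codepoints joined by three U+200D ZERO WIDTH JOINER codepoints.
// `family` is an illustrative name, not something from std.
const family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}";

test "one visible glyph is seven codepoints and twenty-five bytes" {
    // Each emoji takes 4 bytes in UTF-8 and each joiner takes 3: 4*4 + 3*3 = 25.
    try std.testing.expectEqual(@as(usize, 25), family.len);

    // utf8CountCodepoints decodes the UTF-8 and counts codepoints, not glyphs.
    const codepoints = try std.unicode.utf8CountCodepoints(family);
    try std.testing.expectEqual(@as(usize, 7), codepoints);
}
```

Run with `zig test`. Neither number matches the one "character" a reader sees, which is why no single integer can stand for it.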

This points up how ambiguous the term "character" is; Unicode itself doesn't define it, for good reason. The example of 👨‍👩‍👧‍👦 is what's more technically termed a grapheme cluster (that it looks like a single "character" here depends entirely on the font and the display context, in this case a web browser).
The Zig term "character literal" trips some people up, because it's actually a "Unicode code point" literal. It would be nice to transition the discussion and the docs to this term, even if it departs from the C terminology. Lots of folks wish for a "character cell" model for text formatting, but this always falls apart in the face of combining characters, worldwide text, fonts, and rendering technology. These are well beyond the scope of the standard library. What's most often of concern when writing format-to-buffer code is the storage for the data, so stick to bytes for sizes, and to return values that give you the resulting sizes of things. The fill quantity perhaps should be neither bytes nor characters, but a count of repetitions of the fill codepoint. Even a Unicode character database is not sufficient in general for text layout. Counts of Unicode codepoints are in general not useful, and they tend to encourage the wrong mental model of worldwide (Unicode) text.

I think Andrew has drawn just the right lines of compromise for fmt functionality.
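
As a rough sketch of the "count of repetitions, sizes in bytes" idea above: `padLeft` below is a hypothetical helper, not part of `std.fmt`, and the only std calls assumed are `std.unicode.utf8Encode` and the `std.testing` helpers. It treats the fill as an opaque byte sequence repeated a given number of times and reports the result size in bytes:

```zig
const std = @import("std");

/// Hypothetical helper, not part of std.fmt: writes `repeats` copies of the
/// fill codepoint followed by `text` into `buf`, returning bytes written.
fn padLeft(buf: []u8, text: []const u8, fill: u21, repeats: usize) !usize {
    // Encode the fill codepoint once; after this it is just 1 to 4 bytes.
    var encoded: [4]u8 = undefined;
    const fill_len: usize = try std.unicode.utf8Encode(fill, &encoded);

    const total = fill_len * repeats + text.len;
    if (total > buf.len) return error.NoSpaceLeft;

    var pos: usize = 0;
    var i: usize = 0;
    while (i < repeats) : (i += 1) {
        for (encoded[0..fill_len]) |byte| {
            buf[pos] = byte;
            pos += 1;
        }
    }
    for (text) |byte| {
        buf[pos] = byte;
        pos += 1;
    }
    return pos;
}

test "fill is counted in repetitions, the result in bytes" {
    var buf: [16]u8 = undefined;
    // U+00B7 MIDDLE DOT is 2 bytes in UTF-8, so 3 repeats + "42" = 8 bytes.
    const n = try padLeft(&buf, "42", 0xB7, 3);
    try std.testing.expectEqual(@as(usize, 8), n);
    try std.testing.expectEqualStrings("\u{B7}\u{B7}\u{B7}42", buf[0..n]);
}
```

The caller decides how many fill repetitions are wanted; the helper never needs to know how wide anything renders.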

/// - *precision* specifies how many decimals a formatted number should have
///
/// Note that most of the parameters are optional and may be omitted. Also you can leave out separators like `:` and `.` when