I have had some ideas for std.fmt for a while now but I've been having trouble figuring out how to present them as concrete proposals. The following is an attempt at summarizing a few of them:
Summary of current problems
- Numeric formatting options are too limited and conflated with generic alignment options.
- It is difficult for users to implement custom formatters correctly (in fact, most of the formatters in std fail at this task).
- Rarely used features like named placeholder options increase the overall complexity of std.fmt.
- std.fmt should deal in raw bytes only but some parts of it currently deal in Unicode scalar values/UTF-8 sequences.
Summary of proposed solutions
- Break out numeric formatting options from generic alignment options and instead make them part of relevant base specifiers like d or e.
- Add more numeric formatting options such as for controlling positive signs, zero-padding or 0x prefixes (among others), to make it easier to format numbers correctly for tasks such as pretty-printing or code generation.
- Remove the options: std.fmt.FormatOptions parameter from custom format formatter functions and instead process alignment generically, separately from formatting through clever use of std.io.countingWriter, to make it much easier for users to correctly implement custom formatters.
- Remove named placeholder options to reduce complexity (this use case is better handled by custom formatters).
- Remove the u specifier (better handled by a formatter from the std.unicode namespace), clarify that the s and c specifiers output bytes verbatim, redefine fill to be a literal byte and redefine width to be in bytes.
In more detail
Numeric formatting options are too limited
Currently, the only available numeric formatting option is precision, which controls the number of digits in the fractional part of a floating-point number. We can quickly think of a few other properties a user might want to control when formatting a number:
- The minimum number of digits the number should be zero-padded to.

  There is currently no way to correctly pad a number with leading zeroes such that the sign is written in the correct place. Code like std.debug.print("{:0>5}\n", .{-123}) prints 0-123 instead of the expected -0123 or -00123.

- Whether a positive number should have a plus sign.

  It is currently possible to format a positive integer with a plus sign, but how to do this is very obscure: You need to specify a width of 1 or greater, and the integer type must be signed. std.debug.print("{d:1}\n", .{@as(i32, 123)}) prints +123. Unsigned integers and floating-point numbers, however, can currently not be formatted with a plus sign.

- Whether a hexadecimal number should be prefixed with 0x.

  It is currently not possible to format hexadecimal (or binary or octal) numbers with a prefix. In some situations you may be able to get around this by using a placeholder string like 0x{x}, but this will not work for negative values (-77 would produce 0x-4d) or when left-padding is involved.
There may be other options worth considering, but this should hopefully demonstrate that just precision is probably not enough and that width is not a suitable substitute for zero-padding.
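To make the first two points concrete, here is a minimal sketch of sign-aware zero-padding. It assumes a Zig version around 0.12 or 0.13 (for @abs and writeByteNTimes); writeZeroPadded and its parameters are hypothetical names used only for illustration and are not part of std.fmt.

```zig
const std = @import("std");

/// Hypothetical helper (not part of std.fmt): writes `value` in decimal,
/// zero-padded to at least `min_digits` digits, with the sign written
/// before the padding rather than after it.
fn writeZeroPadded(writer: anytype, value: i64, min_digits: usize, force_plus: bool) !void {
    if (value < 0) {
        try writer.writeByte('-');
    } else if (force_plus) {
        try writer.writeByte('+');
    }

    // @abs returns the unsigned magnitude, so minInt(i64) is handled too.
    const magnitude: u64 = @abs(value);

    // A u64 needs at most 20 decimal digits, so this cannot run out of space.
    var buf: [20]u8 = undefined;
    const digits = std.fmt.bufPrint(&buf, "{d}", .{magnitude}) catch unreachable;

    if (digits.len < min_digits) {
        try writer.writeByteNTimes('0', min_digits - digits.len);
    }
    try writer.writeAll(digits);
}

test "the sign is written before the zero padding" {
    var buf: [32]u8 = undefined;
    var stream = std.io.fixedBufferStream(&buf);
    try writeZeroPadded(stream.writer(), -123, 4, false);
    try std.testing.expectEqualStrings("-0123", stream.getWritten());
}
```

Because the sign is emitted separately from the digits, the same structure would also work for a plus sign on unsigned or floating-point values.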
Which leads us to the next point...
Numeric formatting options are conflated with alignment options
The four std.fmt.FormatOptions are fill, alignment, width and precision. This is a bit strange, because the first three deal with alignment and are always applicable no matter which type of value is being formatted, while precision is only relevant for numbers.
Numeric formatting (positive sign, leading zeroes, etc.) is a different concern from alignment; "zero-pad a number to a minimum number of digits" is different from "right-align a string to a minimum width by left-padding with the character 0" and there is currently no way for users to simultaneously zero-pad and right-align a number.
It is also a bit funny that a nonsensical precision option used with a non-numeric specifier like in {s:.3} is not an error.
I think it would make sense to break out precision and other future numeric formatting options from the generic alignment options specified after the : and instead make them part of the base specifiers (e.g. d or e) themselves. In other words, today's placeholder {d: >10.3} might become {d.3: >10}.
This also opens up the door for specifier-specific options; for example, an option specifying whether to prefix the number with 0x makes sense for x (similarly for b or o) but not for d and should be a compile error for the latter.
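As a rough illustration of how such a placeholder could be split, here is a small sketch. The grammar it assumes (base specifier, optional .precision, optional colon followed by alignment options) is only one possible reading of the idea above, and Placeholder and parsePlaceholder are hypothetical names, not existing std.fmt declarations.

```zig
const std = @import("std");

/// Hypothetical result type for this sketch.
const Placeholder = struct {
    base: []const u8, // base specifier, e.g. "d" or "e"
    precision: ?usize, // numeric option attached to the specifier
    alignment_options: []const u8, // everything after the colon, left unparsed
};

/// Hypothetical parser for the text between `{` and `}`.
/// Assumed layout: <base>[.<precision>][:<alignment options>]
fn parsePlaceholder(inner: []const u8) !Placeholder {
    // The specifier ends at the first colon, so a fill byte of ':' or '.'
    // in the alignment options is left untouched.
    const colon = std.mem.indexOfScalar(u8, inner, ':');
    const spec = if (colon) |i| inner[0..i] else inner;
    const rest: []const u8 = if (colon) |i| inner[i + 1 ..] else "";

    const dot = std.mem.indexOfScalar(u8, spec, '.');
    return .{
        .base = if (dot) |i| spec[0..i] else spec,
        .precision = if (dot) |i| try std.fmt.parseInt(usize, spec[i + 1 ..], 10) else null,
        .alignment_options = rest,
    };
}

test "numeric options live before the colon" {
    const p = try parsePlaceholder("d.3: >10");
    try std.testing.expectEqualStrings("d", p.base);
    try std.testing.expectEqual(@as(?usize, 3), p.precision);
    try std.testing.expectEqualStrings(" >10", p.alignment_options);
}
```

With this split, a specifier-specific check (for example, rejecting a 0x prefix option on d) only has to look at the part before the colon.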
With std.fmt.FormatOptions reduced to only the three alignment-related options, we can move on to the third point...
It is difficult to implement custom format formatter functions correctly
Custom format formatter functions currently have the following signature:
    pub fn format(
        self: ?,
        comptime fmt: []const u8,
        options: std.fmt.FormatOptions,
        writer: anytype,
    ) !void

The options parameter of type std.fmt.FormatOptions specifies the fill character, alignment, minimum width and numeric precision, corresponding to the options passed after the colon in the placeholder string. {:_>9.3} is parsed as .{ .fill = '_', .alignment = .right, .width = 9, .precision = 3 }.
The problem is, most custom formatters (both in std and in external packages) completely ignore these options:
    std.debug.print("{s:_>20}\n", .{"hello"});
    std.debug.print("{:_>20}\n", .{std.fmt.fmtSliceHexLower("hello")});
    std.debug.print("{:_>20}\n", .{std.SemanticVersion.parse("1.2.3") catch unreachable});

Output:

    _______________hello
    68656c6c6f
    1.2.3

One could argue that the onus is on the custom formatters to correctly implement padding and that it is a bug that formatters like fmtSliceHexLower or SemanticVersion.format don't handle padding.
I will instead point out that padding could be trivially handled in the main std.fmt.format function, without burdening custom formatters with the task of implementing it, simply by writing in two passes: first to a std.io.countingWriter(std.io.null_writer) to determine the width of the unpadded string, then again to the real writer, padding the difference on either side as needed. Left-alignment only requires a single pass to a std.io.countingWriter(writer), since the padding is written after the value.
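A minimal sketch of the two-pass idea follows. It assumes a Zig version around 0.12 or 0.13, where std.io.countingWriter, std.io.null_writer and the four-argument format signature exist. formatPadded, Alignment and Point are hypothetical names for illustration, and the real change would live inside std.fmt.format rather than in a separate helper.

```zig
const std = @import("std");

const Alignment = enum { left, center, right };

/// Hypothetical helper, not an existing std.fmt function: formats `args`
/// once to measure the output, then again with fill bytes around it.
fn formatPadded(
    writer: anytype,
    comptime fmt: []const u8,
    args: anytype,
    fill: u8,
    alignment: Alignment,
    width: usize,
) !void {
    // Pass 1: write to a counting writer that discards its output.
    var counting = std.io.countingWriter(std.io.null_writer);
    try std.fmt.format(counting.writer(), fmt, args);
    const len: usize = @intCast(counting.bytes_written);

    const padding: usize = if (width > len) width - len else 0;
    const before: usize = switch (alignment) {
        .left => 0,
        .center => padding / 2,
        .right => padding,
    };

    // Pass 2: write the fill bytes and the value to the real writer.
    try writer.writeByteNTimes(fill, before);
    try std.fmt.format(writer, fmt, args);
    try writer.writeByteNTimes(fill, padding - before);
}

/// A custom formatter that ignores `options`, like many existing ones.
const Point = struct {
    x: i32,
    y: i32,

    pub fn format(
        self: Point,
        comptime fmt: []const u8,
        options: std.fmt.FormatOptions,
        writer: anytype,
    ) !void {
        _ = fmt;
        _ = options;
        try writer.print("({d},{d})", .{ self.x, self.y });
    }
};

test "padding is applied without the formatter's help" {
    var buf: [32]u8 = undefined;
    var stream = std.io.fixedBufferStream(&buf);
    try formatPadded(stream.writer(), "{}", .{Point{ .x = 1, .y = 2 }}, '_', .right, 9);
    try std.testing.expectEqualStrings("____(1,2)", stream.getWritten());
}
```

The cost of this approach is that right-aligned and centered values are formatted twice, which is worth weighing against the code size and runtime concerns noted at the end of this proposal.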
With fill, alignment and width handled generically, the remaining option would be precision. But with that one removed by the above sub-proposal, we are left with no options and can remove the std.fmt.FormatOptions parameter, simplifying the format signature to

    pub fn format(
        self: ?,
        comptime fmt: []const u8,
        writer: anytype,
    ) !void

which makes it much easier for users to implement correctly.
(As a side note, the fmt argument here should really be renamed specifier or spec so that it doesn't get mixed up with the fmt string itself.)
Remove named placeholder options
Did you know that the following is possible?
    var width: usize = 10;
    var precision: usize = 3;
    std.debug.print("{d:_>[1].[2]}\n", .{ @as(f32, 1.23456789), width, precision });

That's correct; certain placeholder options like width and precision don't have to be specified literally but can also be resolved at runtime by specifying the name of a field of args.
This is a fairly obscure feature which increases the overall complexity of std.fmt. It is also limited to only width and precision; other options like fill or alignment must be specified literally and can not be resolved at runtime.
Instead of putting all of this complexity in the parsing and handling of the placeholder string itself, runtime control of formatting options is probably better handled by custom formatters, which are not only more flexible but also make the intent of such code more immediately visible and explicit to readers. To help users with the task of runtime-controlled aligned formatting, the std.fmt namespace could expose a formatter function for this purpose.
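As a sketch of what such a formatter could look like, the following wraps a byte string together with its alignment options as ordinary struct fields, so all of them (including fill and alignment) can be chosen at runtime. Aligned is a hypothetical name, and the example uses the current four-argument format signature as found in Zig around 0.12 or 0.13.

```zig
const std = @import("std");

/// Hypothetical formatter: pads a byte string using options that are plain
/// struct fields and can therefore be set at runtime.
const Aligned = struct {
    bytes: []const u8,
    fill: u8 = ' ',
    alignment: enum { left, right } = .right,
    width: usize = 0,

    pub fn format(
        self: Aligned,
        comptime fmt: []const u8,
        options: std.fmt.FormatOptions,
        writer: anytype,
    ) !void {
        _ = fmt;
        _ = options;
        // Width is measured in bytes here.
        const padding = if (self.width > self.bytes.len) self.width - self.bytes.len else 0;
        if (self.alignment == .right) try writer.writeByteNTimes(self.fill, padding);
        try writer.writeAll(self.bytes);
        if (self.alignment == .left) try writer.writeByteNTimes(self.fill, padding);
    }
};

test "fill and width are ordinary values" {
    var buf: [32]u8 = undefined;
    const out = try std.fmt.bufPrint(&buf, "{}", .{Aligned{ .bytes = "hello", .fill = '_', .width = 8 }});
    try std.testing.expectEqualStrings("___hello", out);
}
```

At the call site the placeholder stays a plain {} and the options are visible as named fields, which is the readability benefit argued for above.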
Remove any notion of Unicode-awareness from std.fmt
(Related: #18536 (comment), 2d9c479, #234)
Simple: std.fmt should not be Unicode-aware and should deal in raw bytes only, for simplicity. Therefore,
- the u "format u21 as UTF-8 sequence" specifier should be removed (better handled by a formatter from std.unicode),
- the s and c specifiers should clarify that they output (sequences of) bytes verbatim, without any sort of replacement or transformation,
- the width placeholder option should clarify that it controls the minimum width in bytes (not code points, grapheme clusters or some other unit of measure), and
- the fill placeholder option should clarify that it is a literal byte repeated verbatim to pad out the string.
Applications that need powerful Unicode-aware formatting should use a third-party package.
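To illustrate what "width in bytes" means in practice, here is a small sketch comparing the byte length of a string with its code point count. It only uses std.unicode.utf8CountCodepoints and assumes nothing about how std.fmt would implement the rule.

```zig
const std = @import("std");

test "width measured in bytes versus code points" {
    // "héllo" spelled with explicit UTF-8 bytes: 5 code points, 6 bytes.
    const s = "h\xc3\xa9llo";
    try std.testing.expectEqual(@as(usize, 6), s.len);
    try std.testing.expectEqual(@as(usize, 5), try std.unicode.utf8CountCodepoints(s));

    // Under a byte-based rule, a minimum width of 8 adds 8 - 6 = 2 fill
    // bytes, even though a code-point-based rule would add 3.
    const width: usize = 8;
    try std.testing.expectEqual(@as(usize, 2), width - s.len);
}
```

The output may therefore look one column short in a terminal, which is the trade-off accepted by keeping std.fmt byte-oriented.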
Other considerations
std.fmt currently generates a lot of code, which is undesirable and can be problematic for constrained embedded targets. These problems are described in great detail in #9635. It's important that the above suggestions, if applied, do not negatively affect code size, compile times or runtime performance.