std.fmt: Improve numeric options, simplify custom formatters, reduce complexity and more #20152

I have had some ideas for `std.fmt` for a while now but I've been having trouble figuring out how to present them as concrete proposals. The following is an attempt at summarizing a few of them:

## Summary of current problems

- Numeric formatting options are too limited and conflated with generic alignment options.
- It is difficult for users to implement custom formatters correctly (in fact, most of the formatters in `std` fail at this task).
- Rarely used features like named placeholder options increase the overall complexity of `std.fmt`.
- `std.fmt` should deal in raw bytes only but some parts of it currently deal in Unicode scalar values/UTF-8 sequences.

## Summary of proposed solutions

- Break out numeric formatting options from generic alignment options and instead make them part of relevant base specifiers like `d` or `e`.
- Add more numeric formatting options, such as control over positive signs, zero-padding or `0x` prefixes (among others), to make it easier to format numbers correctly for tasks such as pretty-printing or code generation.
- Remove the `options: std.fmt.FormatOptions` parameter from custom `format` formatter functions and instead process alignment generically, separately from formatting through clever use of `std.io.countingWriter`, to make it much easier for users to correctly implement custom formatters.
- Remove named placeholder options to reduce complexity (this use case is better handled by custom formatters).
- Remove the `u` specifier (better handled by a formatter from the `std.unicode` namespace), clarify that the `s` and `c` specifiers output bytes verbatim, redefine `fill` to be a literal byte and redefine `width` to be in bytes.

## In more detail

### Numeric formatting options are too limited

(Related: #14436, #19488)

Currently, the only available numeric formatting option is `precision`, which controls the number of digits in the fractional part of a floating-point number. We can quickly think of a few other properties a user might want to control when formatting a number:

- The minimum number of digits the number should be zero-padded to.

  There is currently no way to correctly pad a number with leading zeroes such that the sign is written in the correct place. Code like `std.debug.print("{:0>5}\n", .{-123})` prints `0-123` instead of the expected `-0123` or `-00123`.
- Whether a positive number should have a plus sign.

  It is currently possible to format a positive integer with a plus sign, but how to do this is very obscure: you need to specify a `width` of 1 or greater, and the integer type must be signed. `std.debug.print("{d:1}\n", .{@as(i32, 123)})` prints `+123`. Unsigned integers and floating-point numbers, however, currently cannot be formatted with a plus sign.
- Whether a hexadecimal number should be prefixed with `0x`.

  It is currently not possible to format hexadecimal (or binary or octal) numbers with a prefix. In some situations you may be able to get around this by using a placeholder string like `0x{x}`, but this will not work for negative values (`-77` would produce `0x-4d`) or when left-padding is involved.
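
To make the first two points concrete, here is a minimal sketch of sign-aware zero-padding. `writeZeroPadded` and its parameters are hypothetical names invented for illustration and are not part of `std.fmt`; the code assumes the `std.io` writer API of Zig 0.12/0.13.

```zig
const std = @import("std");

/// Hypothetical helper (not part of `std.fmt`): writes `value` in decimal,
/// zero-padded to at least `min_digits` digits, with the sign written first.
fn writeZeroPadded(
    writer: anytype,
    value: i64,
    min_digits: usize,
    force_plus: bool,
) !void {
    // The sign goes out before any padding, so it can never end up
    // in the middle of the zeroes.
    if (value < 0) {
        try writer.writeByte('-');
    } else if (force_plus) {
        try writer.writeByte('+');
    }

    // `@abs` yields the unsigned magnitude, which also covers minInt(i64).
    const magnitude: u64 = @abs(value);

    // A u64 has at most 20 decimal digits, so this buffer cannot overflow.
    var buf: [20]u8 = undefined;
    const digits = std.fmt.bufPrint(&buf, "{d}", .{magnitude}) catch unreachable;

    if (digits.len < min_digits) {
        try writer.writeByteNTimes('0', min_digits - digits.len);
    }
    try writer.writeAll(digits);
}

pub fn main() !void {
    const stdout = std.io.getStdOut().writer();
    try writeZeroPadded(stdout, -123, 5, false); // prints -00123
    try stdout.writeByte('\n');
    try writeZeroPadded(stdout, 123, 5, true); // prints +00123
    try stdout.writeByte('\n');
}
```

Here `min_digits` counts digits only, so the sign does not eat into the padding. Whether a real option should count the sign as part of the width is a design decision this sketch does not settle.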

There may be other options worth considering, but this should hopefully demonstrate that just `precision` is probably not enough and that `width` is not a suitable substitute for zero-padding.

Which leads us to the next point...

### Numeric formatting options are conflated with alignment options

The four `std.fmt.FormatOptions` are `fill`, `alignment`, `width` and `precision`. This is a bit strange, because the first three deal with alignment and are always applicable no matter which type of value is being formatted, while `precision` is only relevant for numbers.

Numeric formatting (positive sign, leading zeroes, etc.) is a different concern from alignment; "zero-pad a number to a minimum number of digits" is different from "right-align a string to a minimum width by left-padding with the character `0`" and there is currently no way for users to simultaneously zero-pad and right-align a number.

It is also a bit funny that a nonsensical `precision` option used with a non-numeric specifier like in `{s:.3}` is not an error.

I think it would make sense to break out `precision` and other future numeric formatting options from the generic alignment options specified after the `:` and instead make them part of the base specifiers (e.g. `d` or `e`) themselves. In other words, today's placeholder `{d: >10.3}` might become `{d.3: >10}`.

This also opens up the door for specifier-specific options; for example, an option specifying whether to prefix the number with `0x` makes sense for `x` (similarly for `b` or `o`) but not for `d` and should be a compile error for the latter.

With `std.fmt.FormatOptions` reduced to only the three alignment-related options, we can move on to the third point...

### It is difficult to implement custom `format` formatter functions correctly

Custom `format` formatter functions currently have the following signature:

```zig
pub fn format(
    self: ?,
    comptime fmt: []const u8,
    options: std.fmt.FormatOptions,
    writer: anytype,
) !void
```

The `options` parameter of type `std.fmt.FormatOptions` specifies the fill character, alignment, minimum width and numeric precision, corresponding to the options passed after the colon in the placeholder string. `{:_>9.3}` is parsed as `.{ .fill = '_', .alignment = .right, .width = 9, .precision = 3 }`.

The problem is, most custom formatters (both in `std` and in external packages) completely ignore these options:

```zig
std.debug.print("{s:_>20}\n", .{"hello"});
std.debug.print("{:_>20}\n", .{std.fmt.fmtSliceHexLower("hello")});
std.debug.print("{:_>20}\n", .{std.SemanticVersion.parse("1.2.3") catch unreachable});
```

```txt
_______________hello
68656c6c6f
1.2.3
```

One could argue that the onus is on the custom formatters to correctly implement padding and that it is a bug that formatters like `fmtSliceHexLower` or `SemanticVersion.format` don't handle padding.

I will instead point out that padding could be trivially handled in the main `std.fmt.format` function, without burdening custom formatters with the task of implementing it, simply by writing in two passes: first to a `std.io.countingWriter(std.io.null_writer)` to determine the width of the unpadded string, then again to the real writer, padding the difference on either side as needed. Left-alignment only requires a single pass to a `std.io.countingWriter(writer)`.
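
A rough sketch of the two-pass idea, written as a free function rather than as a patch to `std.fmt.format`. `writeAligned` and the `Version` stand-in type are hypothetical names made up for this example; the code assumes the Zig 0.12/0.13 `std.io` and `std.fmt` APIs.

```zig
const std = @import("std");

/// Stand-in for a custom formatter that ignores `options`, like the ones above.
const Version = struct {
    major: u32,
    minor: u32,
    patch: u32,

    pub fn format(
        self: Version,
        comptime fmt: []const u8,
        options: std.fmt.FormatOptions,
        writer: anytype,
    ) !void {
        _ = fmt;
        _ = options;
        try writer.print("{d}.{d}.{d}", .{ self.major, self.minor, self.patch });
    }
};

/// Hypothetical helper: pads on behalf of the formatter, which never sees
/// `fill`, `alignment` or `width`.
fn writeAligned(
    writer: anytype,
    value: anytype,
    fill: u8,
    alignment: std.fmt.Alignment,
    width: usize,
) !void {
    // Pass 1: format into a counting writer that discards its output.
    var counter = std.io.countingWriter(std.io.null_writer);
    try std.fmt.format(counter.writer(), "{}", .{value});
    const len: usize = @intCast(counter.bytes_written);

    const padding = if (len < width) width - len else 0;
    const before = switch (alignment) {
        .left => 0,
        .center => padding / 2,
        .right => padding,
    };

    // Pass 2: write the real output with fill bytes on either side.
    try writer.writeByteNTimes(fill, before);
    try std.fmt.format(writer, "{}", .{value});
    try writer.writeByteNTimes(fill, padding - before);
}

pub fn main() !void {
    const stdout = std.io.getStdOut().writer();
    const v = Version{ .major = 1, .minor = 2, .patch = 3 };
    // Writes 15 fill bytes followed by "1.2.3".
    try writeAligned(stdout, v, '_', .right, 20);
    try stdout.writeByte('\n');
}
```

The cost of this approach is that right- and center-aligned values are formatted twice, which is worth weighing against the code size and performance concerns in #9635.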

With `fill`, `alignment` and `width` handled generically, the remaining option would be `precision`. But with that one removed by the above sub-proposal, we are left with no options and can remove the `std.fmt.FormatOptions` parameter, simplifying the `format` signature to

```zig
pub fn format(
    self: ?,
    comptime fmt: []const u8,
    writer: anytype,
) !void
```

which makes it *much* easier for users to implement correctly.

(As a side note, the `fmt` argument here should really be renamed `specifier` or `spec` so that it doesn't get mixed up with the `fmt` string itself.)

### Remove named placeholder options

Did you know that the following is possible?

```zig
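// Any runtime-known `usize` values work here; `const` avoids the
// "local variable is never mutated" compile error.
const width: usize = 10;
const precision: usize = 3;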
std.debug.print("{d:_>[1].[2]}\n", .{ @as(f32, 1.23456789), width, precision });
```

That's correct; certain placeholder options like `width` and `precision` don't have to be specified literally but can also be resolved at runtime by specifying, in square brackets, the name of a field of `args` (for a tuple, its index).

This is a fairly obscure feature which increases the overall complexity of `std.fmt`. It is also limited to only `width` and `precision`; other options like `fill` or `alignment` must be specified literally and cannot be resolved at runtime.

Instead of putting all of this complexity in the parsing and handling of the placeholder string itself, runtime control of formatting options is probably better handled by custom formatters, which are not only more flexible but also make the intent of such code more immediately visible and explicit to readers. To help users with the task of runtime-controlled aligned formatting, the `std.fmt` namespace could expose a formatter function for this purpose.
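
As an illustration of what such a formatter might look like, here is a small sketch of a wrapper that left-pads any value to a runtime-chosen width. `PadLeft` and `fmtPadLeft` are hypothetical names and do not exist in `std.fmt`; the code uses today's four-parameter `format` signature and assumes the Zig 0.12/0.13 standard library.

```zig
const std = @import("std");

/// Hypothetical wrapper type carrying runtime padding options.
fn PadLeft(comptime T: type) type {
    return struct {
        value: T,
        fill: u8,
        width: usize,

        pub fn format(
            self: @This(),
            comptime spec: []const u8,
            options: std.fmt.FormatOptions,
            writer: anytype,
        ) !void {
            _ = options;
            // Forward the specifier (e.g. `d` or `x`) to the wrapped value.
            const inner = "{" ++ spec ++ "}";
            // `std.fmt.count` returns the number of bytes the output would take.
            const len: usize = @intCast(std.fmt.count(inner, .{self.value}));
            if (len < self.width) {
                try writer.writeByteNTimes(self.fill, self.width - len);
            }
            try std.fmt.format(writer, inner, .{self.value});
        }
    };
}

/// Hypothetical constructor, named in the style of existing `std.fmt.fmt*` helpers.
fn fmtPadLeft(value: anytype, fill: u8, width: usize) PadLeft(@TypeOf(value)) {
    return .{ .value = value, .fill = fill, .width = width };
}

pub fn main() !void {
    const stdout = std.io.getStdOut().writer();
    var width: usize = 4;
    width += 6; // stands in for a value that is only known at runtime
    try stdout.print("{d}\n", .{fmtPadLeft(@as(u32, 42), '_', width)});
}
```

Unlike the bracket syntax, both `fill` and `width` are ordinary runtime values here, and the call site states the intent by name.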

### Remove any notion of Unicode-awareness from `std.fmt`

(Related: https://github.com/ziglang/zig/pull/18536#issuecomment-1891098784, 2d9c4792ae2cab0ff3b1df54b15913dcbcaef112, #234)

Simple: `std.fmt` should not be Unicode-aware and should deal in raw bytes only, for simplicity. Therefore,

- the `u` "format `u21` as UTF-8 sequence" specifier should be removed (better handled by a formatter from `std.unicode`),
- the `s` and `c` specifiers should clarify that they output (sequences of) bytes verbatim, without any sort of replacement or transformation,
- the `width` placeholder option should clarify that it controls the minimum width in bytes (not code points, grapheme clusters or some other unit of measure), and
- the `fill` placeholder option should clarify that it is a literal byte repeated verbatim to pad out the string.

Applications that need powerful Unicode-aware formatting should use a third-party package.

## Other considerations

`std.fmt` currently generates a lot of code, which is undesirable and can be problematic for constrained embedded targets. These problems are described in great detail in #9635. It's important that the above suggestions, if applied, do not negatively affect code size, compile times or runtime performance.
