Improve code generation of zig.fmt #9635

@N00byEdge

Today the code generated by std.fmt is far from desirable.

Here is a sketch that I put together in an afternoon. It is not a complete implementation: it can't print everything yet and, most importantly, it doesn't properly handle a runtime-known writer yet. It does show what the main issues are.

In my opinion, a call like log("Hello, {d}!", .{asm("rdrand %[reg]" : [reg] "=r" (->u64))}); should be entirely inlined at the call site and reduced to a couple of calls to shared helper functions. It should be equivalent to the following code (and this is what I achieved):

    call get_some_log_lock
    push rax
    lea rdi, "Hello, "
    call print_string
    rdrand rdi
    call print_runtime_decimal_value
    lea rdi, "!\n"
    call print_string
    pop rdi
    call release_log_lock

Today, you get the worst of both worlds. You get one function for each call site, which parses the string at runtime, causing a lot of code bloat. Let's get some numbers on this. Replacing the std.fmt usage in the Florence OS kernel resulted in the following change:

$ bloaty --debug-vmaddr=0xF zig-out/bin/Flork_stivale2_x86_64_newlog -- zig-out/bin/Flork_stivale2_x86_64
    FILE SIZE        VM SIZE
 --------------  --------------
  +0.1%     +14  [ = ]       0    .debug_abbrev
   +22%      +4  [ = ]       0    [Unmapped]
  [ = ]       0  [DEL]      -8    [LOAD #2 [RW]]
  [ = ]       0  -0.0%     -24    .bss
  -2.3% -1.64Ki  [ = ]       0    .debug_pubnames
 -10.0% -3.03Ki  [ = ]       0    .debug_frame
 -12.6% -5.08Ki  [ = ]       0    .debug_pubtypes
  -5.1% -6.32Ki  [ = ]       0    .debug_str
 -68.4% -7.01Ki -68.8% -7.01Ki    .data
 -10.8% -7.68Ki -10.8% -7.68Ki    .rodata
 -24.3% -11.6Ki  [ = ]       0    .strtab
 -43.1% -17.1Ki  [ = ]       0    .symtab
 -17.4% -35.3Ki  [ = ]       0    .debug_line
 -25.9% -38.7Ki  [ = ]       0    .debug_ranges
 -21.7% -72.8Ki -21.7% -72.8Ki    .text
 -24.3%  -133Ki  [ = ]       0    .debug_info
 -23.7%  -136Ki  [ = ]       0    .debug_loc
 -21.1%  -476Ki  -0.5% -87.5Ki    TOTAL

As we can see, over 20% of the kernel's code was just unnecessary code generated by std.fmt (.text shrank by 21.7%, or 72.8 KiB). The same goes for the executable file as a whole: 476 KiB (21.1%), almost half a megabyte, just disappeared.

I think the main issue with std.fmt today is that too few functions are explicitly marked either callconv(.Inline) or noinline. In my sketch I went the extra mile: everything that takes comptime fmt: []const u8 is callconv(.Inline), and anything that doesn't is noinline. Just doing this gets you very far. That way the code generated at each call site is as specialized to its format string as possible, and the call is broken down into calls to elementary, shared components.
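As a rough illustration of that split (not the code from my sketch), here is a minimal example. It assumes a recent Zig compiler, where `inline fn` is the spelling that replaced callconv(.Inline). The names printString, printDecimal and log are hypothetical, std.debug.print stands in for a real writer, and only the {d} placeholder is handled.

```zig
const std = @import("std");

// Hypothetical runtime primitives. They are noinline, so every call site
// shares a single copy of each.
noinline fn printString(s: []const u8) void {
    std.debug.print("{s}", .{s});
}

noinline fn printDecimal(value: u64) void {
    std.debug.print("{d}", .{value});
}

// Takes a comptime format string, so it is forced inline. The scanning loop
// runs at compile time; only calls to the primitives above are emitted.
inline fn log(comptime fmt: []const u8, args: anytype) void {
    // Append the newline at comptime so it merges with the last literal run.
    const line = fmt ++ "\n";
    comptime var i: usize = 0;
    comptime var arg_index: usize = 0;
    inline while (i < line.len) {
        const start = i;
        // Scan forward to the next placeholder (or the end of the string).
        inline while (i < line.len and line[i] != '{') : (i += 1) {}
        if (start != i) printString(line[start..i]);
        if (i < line.len) {
            // Assumption: this sketch only understands "{d}".
            if (comptime !std.mem.startsWith(u8, line[i..], "{d}"))
                @compileError("this sketch only supports {d}");
            printDecimal(args[arg_index]);
            arg_index += 1;
            i += 3;
        }
    }
}

pub fn main() void {
    var value: u64 = 41;
    value += 1; // stand-in for a runtime-known value such as rdrand
    log("Hello, {d}!", .{value});
}
```

With this shape, the call in main should lower to three calls, printString("Hello, "), printDecimal(value) and printString("!\n"), which matches the assembly listing above apart from the locking.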

I also took extra care to always append comptime-known values to the format string itself, which merges as many adjacent writes as possible into a single one.
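The effect of that folding can be sketched as follows. This is only an illustration using std.fmt.comptimePrint and `++`, not the mechanism from my sketch; the names version and banner are made up for the example.

```zig
const std = @import("std");

// A comptime-known value (hypothetical, for illustration only).
const version = 3;

// Treated as a runtime argument, this would take three writes:
// "Kernel v", the number, and "\n". Because the value is comptime-known,
// it can be formatted at compile time and joined with the surrounding
// literals, leaving one string constant and one write.
const banner = "Kernel v" ++ std.fmt.comptimePrint("{d}", .{version}) ++ "\n";

pub fn main() void {
    std.debug.print("{s}", .{banner});
}
```

The same idea applies inside a format function: a comptime-known argument can be turned into text and glued onto the neighbouring literal runs before any runtime write is emitted.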

Labels: enhancement, optimization, standard library
