-
-
Notifications
You must be signed in to change notification settings - Fork 3.1k
Description
Today the code generated by zig.fmt is far from desirable.
Here is a sketch that I made up in a afternoon. Of course it's not a complete implementation, so it can't print everything yet, and most importantly it doesn't handle a runtime known writer properly yet, but it shows what the main issues are.
In my opinion, doing log("Hello, {d}!", .{asm("rdrand %[reg]" : [reg] "=r" (->u64))}); should be entirely inlined, and instead just call a couple of functions, and be equivalent to the following code (and this is what I achieved)
call get_some_log_lock
push rax
lea rdi, "Hello, "
call print_string
rdrand rdi
call print_runtime_decimal_value
lea rdi, "!\n"
call print_string
pop rdi
call release_log_lockToday, you get the worst of both worlds. You get one function for each call site, which parses the string at runtime, causing a lot of code bloat. Let's get some numbers on this. Replacing the std.fmt usage in the Florence OS kernel resulted in the following change:
$ bloaty --debug-vmaddr=0xF zig-out/bin/Flork_stivale2_x86_64_newlog -- zig-out/bin/Flork_stivale2_x86_64
FILE SIZE VM SIZE
-------------- --------------
+0.1% +14 [ = ] 0 .debug_abbrev
+22% +4 [ = ] 0 [Unmapped]
[ = ] 0 [DEL] -8 [LOAD #2 [RW]]
[ = ] 0 -0.0% -24 .bss
-2.3% -1.64Ki [ = ] 0 .debug_pubnames
-10.0% -3.03Ki [ = ] 0 .debug_frame
-12.6% -5.08Ki [ = ] 0 .debug_pubtypes
-5.1% -6.32Ki [ = ] 0 .debug_str
-68.4% -7.01Ki -68.8% -7.01Ki .data
-10.8% -7.68Ki -10.8% -7.68Ki .rodata
-24.3% -11.6Ki [ = ] 0 .strtab
-43.1% -17.1Ki [ = ] 0 .symtab
-17.4% -35.3Ki [ = ] 0 .debug_line
-25.9% -38.7Ki [ = ] 0 .debug_ranges
-21.7% -72.8Ki -21.7% -72.8Ki .text
-24.3% -133Ki [ = ] 0 .debug_info
-23.7% -136Ki [ = ] 0 .debug_loc
-21.1% -476Ki -0.5% -87.5Ki TOTAL
As we can see, over 20% of the entire kernel executable was just unnecessary code generated by std.fmt, and the same goes for the executable in large, half a meg just disappeared.
I think the main issues with std.fmt today is that not enough functions are callconv(.Inline) nor noinline. I achieved this by going the extra mile to say that everything that takes comptime fmt: []const u8 is callconv(.Inline), and anything that doesn't is noinline. Just doing this gets you very far. That way the format string code generation is as specialized to the format string as possible, and it instead tries to break it down into elementary components.
I also took extra caution to always append comptime known values to the format string itself and, by doing that, combine as many value writes as possible.