-
-
Notifications
You must be signed in to change notification settings - Fork 3.1k
Introduce std.fmt.scan - Draft #7158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- similar to scanf in c but with curly specifiers. - initial support for c, d, e, x, o, b, s specifiers.
|
Crazy idea (thanks to new const myStruct = struct {
u: u32,
i: i32
};will transformed to struct {
u: ?u32,
i: ?i32
};And parse to this new type. |
|
I like this idea. So this would look like the following and return an instance of the given type? try scan("42", "{}", struct{a: u32})Only problem would be parsing strings / slices. This would require allocating memory. Maybe thats the right direction to go? |
Why? |
|
Oh yeah. I guess not. A slice would just point to the input. This would also allow setting the length of the slice. Currently the length of the parsed substring is lost. It would be easy to check for a |
struct {
u: ?u32,
i: ?i32
};Not sure this is necessary, Just return a regular instance of the struct on success and error otherwise. No need for nullables I think. |
|
I also suggest thinking in advance about more flexible patterns. |
And don't return the size? |
|
It looks I misread scanf's return value. Rather than the number of bytes matched, its actually the number of items matched. Maybe returning a new struct type would be better: struct {
items_matched: usize,
items: T
}This would indicate that the first Not sure if this is any better than using the nullable fields suggestion... |
|
after a little thought, nullable fields seem like a better option to me. this would also allow for named specifiers in addition to in-order field assignment. maybe an |
lib/std/fmt/scan.zig
Outdated
| inline for (fmt) |fmt_c, fmt_i| { | ||
| var neg = false; | ||
| if (i == std.math.maxInt(usize)) return error.InputMatchFailure; | ||
| switch (state) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rewrote the parser in #6411, perhaps we can extract it and use in both the print and scan features.
The idea is to support the width & precision specifiers to limit the amount of scanned input, otherwise the user has no control over how much each specifier will munch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the reviews. This seems like the best way forward to me, extracting and re-using the existing parser. I'll take a closer look at that pr. Let me know if you have an idea of what this might look like once its extracted. A struct with method fields? Or maybe just fmt.format() but with method argument? I hope to work more on this tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it depends on what the specification for scan is. I doubt the alignment and fill specifiers are needed and the only place where I can see a width specifier to be useful is with the c specifier.
If you can live with this reduced set of options I'd simply extract the parser (and arg_state) struct and all you have to do is copy-paste the inline while in format and modify it to suit your needs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just pushed a commit that addresses many of these points. Haven't handled precision yet, but initial width support is there. There are many TODOs in the code. For one, I wasn't sure what to do if {c} has a width or precision.
lib/std/fmt/scan.zig
Outdated
| '}' => if (fmt_i > 0 and fmt[fmt_i - 1] == '}') { | ||
| if (input[i] == '}') { | ||
| i += 1; | ||
| } else i = std.math.maxInt(usize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The string is being parsed at compile time, you can simply stop the compilation with a @compileError telling the user what's wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto for all the error cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed in new commit
lib/std/fmt/scan.zig
Outdated
| i += 1; | ||
| } else i = std.math.maxInt(usize); | ||
| }, | ||
| else => if (std.ascii.isSpace(fmt_c)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably need a comptime prefix, the isSpace call should not end up in the final binary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed in last commit
lib/std/fmt/scan.zig
Outdated
| } | ||
|
|
||
| if (i < input.len) { | ||
| if (input[i] == '-') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're eating the trailing -/+ even if the argument is not a number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think the last commit solves this. still need to add a test for this.
lib/std/fmt/scan.zig
Outdated
| var arg = args[arg_i]; | ||
| switch (fmt_c) { | ||
| '}', 'd' => { | ||
| if (i < input.len) i += try parseInt(input[i..], 10, std.ascii.isDigit, arg, neg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should split the scanning from the parsing, the latter is done by fmt.parseInt so you only have to do the former here.
Write some small helper routines such as scanSigned, scanUnsigned, scanFloat (remember of "weird" values such as NaN, infinite and their different casing/abbreviations), scanBool to return a slice of text to parse and update the current input position.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still needs work as i'm still mixing the scanning and parsing in new commit (look for is_negative).
lib/std/fmt/scan.zig
Outdated
| 's' => { | ||
| var si: usize = 0; | ||
| while (i < input.len) { | ||
| if (std.ascii.isSpace(input[i]) or (fmt_i + 2 < fmt.len and input[i] == fmt[fmt_i + 2])) break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's input[i] == fmt[fmt_i + 2] checking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i removed this behavior in new commit. it was allowing for a fmt like "{s}[{s}]" (with no spaces) to work. without this, everything (including square brackets) until a space will be gobbled up.
lib/std/fmt/scan.zig
Outdated
| if (arg_i >= args.len) return error.TooFewArguments; | ||
| var arg = args[arg_i]; | ||
| switch (fmt_c) { | ||
| '}', 'd' => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each specifier must check that arg is a pointer to the correct type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i also believe this is happening now.
lib/std/fmt/scan.zig
Outdated
| var si: usize = 0; | ||
| while (i < input.len) { | ||
| if (std.ascii.isSpace(input[i]) or (fmt_i + 2 < fmt.len and input[i] == fmt[fmt_i + 2])) break; | ||
| arg[si] = input[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, s is returning a copy of the string? Not a slice of the input?
You're not checking arg's type nor its length, this is dangerous either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
attempted to address in new commit
lib/std/fmt/scan.zig
Outdated
| }, | ||
| } | ||
| } | ||
| if (arg_i < args.len) return error.TooManyArguments; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of this can be statically checked at compile time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed in new commit
|
I don't think transforming the types to nullables is the better option here. If the programmer wants to scan the string for a struct, I would assume they want the struct to be returned. If they had an assortment of options that could optionally be set, shouldn't those fields be optional in the original struct to begin with? |
- ArgsState now a type function with an args_len parameter
|
My advice here is to start from the documentation, write down a few lines about what specifiers are allowed, what the precision/width modifiers do, how they behave, how are literal pieces of the format string matched, how are the whitespaces treated (and what you consider as whitespace).
Well I'd say |
|
I think this commit introduced a spin-loop somewhere. Locally $ zig build test-std -Dskip-non-native
# ...
# many successes/ skips
# ...
Test [637/1619] event.loop.test "std-native-ReleaseFast-bare-multi std.event.Loop - sleep"...SKIP
Test [658/1619] fmt.scan.test "std-native-ReleaseFast-bare-multi scan basic"... ^C
Output hangs here before i killed it. I don't understand what the multiple sets of tests are, here's another output with test filter just for my scan tests $ zig build test-std -Dskip-non-native -Dtest-filter=scan
All 8 tests passed.
All 8 tests passed.
All 8 tests passed.
Test [2/8] fmt.scan.test "std-native-ReleaseFast-bare-multi scan basic"... (this spin-loops too) |
|
Ok, it looks like this commit is spin-looping in release-fast and release-small modes. $ zig build test-std -Dskip-non-native=true -Dtest-filter=scan -Dskip-release-fast -Dskip-release-small
All 8 tests passed.
All 8 tests passed.
All 8 tests passed.
All 8 tests passed.
All 8 tests passed.
All 8 tests passed.
# no hangI have no idea why... any ideas? |
|
Here's a backtrace from gdb: Program received signal SIGINT, Interrupt.
0x000000000020f0a0 in io.fixed_buffer_stream.FixedBufferStream([]const u8).read (self=0x7fffffffddb0, dest=...)
at /media/data/Users/Travis/Documents/Code/zig/zig/lib/std/io/fixed_buffer_stream.zig:67
67 const size = std.math.min(dest.len, self.buffer.len - self.pos);
(gdb) bt
#0 0x000000000020f0a0 in io.fixed_buffer_stream.FixedBufferStream([]const u8).read (self=0x7fffffffddb0, dest=...)
at /media/data/Users/Travis/Documents/Code/zig/zig/lib/std/io/fixed_buffer_stream.zig:67
#1 io.reader.Reader(*io.fixed_buffer_stream.FixedBufferStream([]const u8),io.fixed_buffer_stream.ReadError,io.fixed_buffer_stream.FixedBufferStream([]const u8).read).read
(self=..., buffer=...) at /media/data/Users/Travis/Documents/Code/zig/zig/lib/std/io/reader.zig:32
#2 io.peek_stream.PeekStream(fifo.LinearFifoBufferType { .Static = 4},io.reader.Reader(*io.fixed_buffer_stream.FixedBufferStream([]const u8),io.fixed_buffer_stream.ReadError,io.fixed_buffer_stream.FixedBufferStream([]const u8).read)).read (self=0x7fffffffdd78, dest=...)
at /media/data/Users/Travis/Documents/Code/zig/zig/lib/std/io/peek_stream.zig:71
#3 fmt.readByte (reader=0x7fffffffdd78) at fmt.zig:1983
#4 fmt.matchByteFn (reader=0x7fffffffdd78) at fmt.zig:1999
#5 fmt.scanReader (args=..., backing_reader=...) at fmt.zig:2076
#6 fmt.scan (input=..., args=...) at fmt.zig:2025
#7 fmt.scan.test "std-native-ReleaseFast-bare-multi scan basic" () at /media/data/Users/Travis/Documents/Code/zig/zig/lib/std/fmt/scan.zig:9
#8 0x0000000000218bf7 in std.special.main () at /media/data/Users/Travis/Documents/Code/zig/zig/lib/std/special/test_runner.zig:61
#9 start.callMain () at start.zig:334
#10 start.initEventLoopAndCallMain () at start.zig:276
#11 start.callMainWithArgs (argc=<optimized out>, argv=<optimized out>, envp=...) at start.zig:239
#12 start.posixCallMainAndExit () at start.zig:230
#13 0x000000000021849f in start._start () at start.zig:162 |
|
Looks like something related to the io.peekStream. I tried upping the peekStream size from 4 to 16 but its still spinning. I hope the ci job will time out. Its been running more than an hour now i think. |
- all previous tests passing - added tests for reader, bools, floats (-nan, -inf), width - initial width support for int, float, slice types - expect mutable slice pointer and assign its length, don't write beyond its length - try to stop eating +/- - ScanErrors only add InputMatchFailure, NegationOfUnsignedInteger, and ConversionFailure - other previous errors are now mis-compilations - added scanReader(). scan() is now just a wrapper around scanReader with fixedBufferStream - scanReader uses an io.peekStream for look ahead - add comptime prefix to isSpace
| '{' => { | ||
| if (start_index < i) { | ||
| //try writer.writeAll(fmt[start_index..i]); | ||
| // TODO not sure what to do here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LemonBoy any idea what should be done here? I don't understand what it means when start_index < i.
|
I'm cleaning up old drafts, feel free to re-open if you are still working on it. |
Initial draft, please review.
I was bored solving some old advent of code problems and wrote this. I think the community has been wanting this in zig std lib for a while.
Perhaps this should depend on
std.fmt.parseInt()rather than rolling my ownparseInt()? ThisparseInt()takes a out pointer and directly assigns to it. Open to suggestions here and in general.