Add float parsing support to std #1958

tiehuis · 2019-02-13T11:23:01Z

This is derived from http://krashan.ppa.pl/articles/stringtofloat/. It is a fairly simple implementation and doesn't handle some edge cases, but I figure something 95% of the way there is better than nothing right now. There is also the question, down the line, of how easily we can get away without using an allocator: most implementations use a big-number implementation for various edge cases, but we forgo those cases for now, so we don't need to worry about it just yet.

This also adds some more f16 and f128 support to std to support the float parsing requirements in generating specific output types.

I haven't thought about the error behavior at all and will look over that before merging this. It does what C does right now and fails with a zero value if nothing could be parsed. We can do better than that.

A discussion over the expected allowed values would be good too. For example, supporting hex-floats may be valuable.

Closes #375.

Allows addition/subtraction of f128 and narrowing casts to f16 from larger float types.

This is not intended to be the long-term implementation as it doesn't provide various properties that we eventually will want (e.g. round-tripping, denormal support). It also uses f64 internally so the wider f128 will be inaccurate.

tiehuis · 2019-02-13T11:25:23Z

std/special/compiler_rt/addXf3.zig

@@ -0,0 +1,191 @@
+// Ported from:
+//
+// https://github.com/llvm-mirror/compiler-rt/blob/92f7768ce940f6437b32ecc0985a1446cd040f7a/lib/builtins/fp_add_impl.inc


I've referenced the git mirror here regarding #1504 since it was easier. Can switch to the svn-id however if there is a preference.

There's an official git repository now: https://github.com/llvm/llvm-project

tiehuis · 2019-02-13T11:27:14Z

std/fmt/parse_float.zig

+        d.d2 = @truncate(u32, w);
+    }
+
+    fn dump(d: Z96) void {


Forgot I left this in. Can remove this tomorrow.

tiehuis · 2019-02-13T11:29:18Z

std/fmt/parse_float.zig

+    MinusInf,
+};
+
+inline fn isDigit(c: u8) bool {


Would an ascii module be valuable? Similar to <ctype.h>. There are some implementations under os/path.zig that perform similar functions.

I think that sounds quite reasonable.

tiehuis · 2019-02-13T11:30:35Z

std/json.zig

            Value{ .Integer = try std.fmt.parseInt(i64, token.slice(input, i), 10) }
        else
-            @panic("TODO: fmt.parseFloat not yet implemented");
+            Value{ .Float = std.fmt.parseFloat(f64, token.slice(input, i)) };


Maybe a parseFloatExact or something similar would be valuable and would not allow trailing characters?

tiehuis · 2019-02-13T11:41:47Z

std/fmt/parse_float.zig

+        .mantissa = 0,
+    };
+
+    if (caseInEql(s, "nan")) {


Would be worthwhile to add an extra condition here to avoid 4 compares in the non-special case.

From my experience, doing anything smart around string comparisons tends not to be worth it, as the compiler is pretty good in this area. https://github.com/Hejsil/fun-with-zig/blob/e8f857a401151dbb716acce1775e83c4d1656132/bench/match.zig#L9-L18

Good point. Let's measure it!

https://zig.godbolt.org/z/9zC4yU

The slow case is actually far worse than the 4 compares I mentioned, since there are a lot of branches in the slice equality check. Without the length check we have to go through most of those branches for any value 3 or 4 digits in length. I definitely think this is worth it, given that float-parsing code is often the bottleneck in high-performance parsing.
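A rough sketch of the kind of length guard being discussed, not the code from this PR. The `Special` enum, the `classifySpecial` helper, and the exact set of accepted spellings ("nan", "inf", "+inf", "-inf") are assumptions made for illustration, and the example assumes a recent Zig release where `std.ascii.eqlIgnoreCase` is available and the testing helpers return errors.

```zig
const std = @import("std");

// Hypothetical classification of special float spellings (illustration only).
const Special = enum { nan, plus_inf, minus_inf, none };

fn classifySpecial(s: []const u8) Special {
    // Cheap guard: every spelling handled below is 3 or 4 bytes long, so any
    // other length skips all of the slice comparisons.
    if (s.len < 3 or s.len > 4) return .none;

    if (std.ascii.eqlIgnoreCase(s, "nan")) return .nan;
    if (std.ascii.eqlIgnoreCase(s, "inf")) return .plus_inf;
    if (std.ascii.eqlIgnoreCase(s, "+inf")) return .plus_inf;
    if (std.ascii.eqlIgnoreCase(s, "-inf")) return .minus_inf;
    return .none;
}

test "classifySpecial" {
    try std.testing.expectEqual(Special.nan, classifySpecial("NaN"));
    try std.testing.expectEqual(Special.minus_inf, classifySpecial("-Inf"));
    try std.testing.expectEqual(Special.none, classifySpecial("12345.678"));
}
```

Note that ordinary inputs of 3 or 4 characters still fall through to the comparisons, so whether the guard pays off depends on the input distribution and should be measured.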

tiehuis · 2019-02-13T22:50:04Z

std/fmt/parse_float.zig

+}
+
+test "fmt.parseFloat" {
+    const assert = std.debug.assert;


Should use the new testing.expect here instead.

tiehuis · 2019-02-14T10:46:56Z

I've modified this so that parseFloat only accepts a slice that is, in its entirety, a valid floating-point literal. This is in line with how parseInt currently works.
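A minimal sketch of what this stricter behavior looks like from the caller's side. It assumes a recent Zig release in which `std.fmt.parseFloat` reports malformed input as `error.InvalidCharacter` and the `std.testing` helpers return errors (hence the `try`); the API at the time of this PR may have differed in those details.

```zig
const std = @import("std");

test "parseFloat accepts only a complete float slice" {
    // A well-formed slice parses to the expected value.
    try std.testing.expectEqual(@as(f64, 1.5), try std.fmt.parseFloat(f64, "1.5"));

    // Trailing characters are rejected rather than silently ignored.
    try std.testing.expectError(error.InvalidCharacter, std.fmt.parseFloat(f64, "1.5abc"));

    // An empty slice is an error instead of a zero value.
    try std.testing.expectError(error.InvalidCharacter, std.fmt.parseFloat(f64, ""));
}
```

Returning an error union lets the caller distinguish "could not parse" from a genuine 0.0, which the C-style zero return cannot do.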

Hejsil · 2019-02-14T10:56:29Z

std/fmt/parse_float.zig

We have testing.expectError for this

Hejsil · 2019-02-14T10:56:46Z

std/fmt/parse_float.zig

We have testing.expectEqual for this

Hejsil · 2019-02-14T11:02:14Z

std/fmt/parse_float.zig

Is there a reason we compare against 2 instead of 3? All our cases are >=3 long

We probably also want to measure this optimization's impact and leave a comment here explaining why it is worth it to do this check.

I've ended up removing this optimization for the moment. While it definitely produces better assembly, it doesn't really factor into the runtime performance, since the current parsing/conversion dominates the runtime anyway.

I'll leave this for down the track when this is optimized specifically for performance instead.

andrewrk

Exciting! I think it's a great goal to one day have non-allocating float parsing that works for all values. Eliminate that entire class of nondeterminism.

andrewrk · 2019-02-16T02:14:56Z

🎉

Should we have an open issue for the particular values that are not parsable yet? Seems like that would be a loose end to tie up before 1.0.0, yeah?

donpdonp · 2019-05-17T23:18:08Z

From Zig 0.4.0: std.json.Parser on '{"float": 0.7062146892655368}' results in integer overflow. (Note that this is not a contrived value, though it looks that way. It's from a Mastodon public feed JSON, which includes picture aspect ratios as overly specific floats.)

tiehuis · 2019-05-20T05:30:36Z

Thanks @donpdonp. Fixed in this commit: 163a8e9.
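A regression check for the reported literal could be sketched as follows. This is not the test from the fix commit; it calls `std.fmt.parseFloat` directly rather than going through the JSON parser, and it assumes a recent Zig release where `std.testing.expectApproxEqAbs` exists.

```zig
const std = @import("std");

test "parseFloat handles a long decimal fraction" {
    // The literal reported in the thread, with 16 fractional digits.
    const parsed = try std.fmt.parseFloat(f64, "0.7062146892655368");

    // Compare with a small absolute tolerance rather than exact equality, so
    // the check does not depend on the parser being correctly rounded.
    try std.testing.expectApproxEqAbs(@as(f64, 0.7062146892655368), parsed, 1e-12);
}
```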

tiehuis added 4 commits February 13, 2019 23:24

compiler-rt: Add __addtf3, __subtf3 and __truncdfhf2

be861a8

Add f128 support for fabs, isinf, isnan, inf and nan functions

cf007e3

Add parseFloat to std.fmt

c34ce68

Add parseFloat support to json.zig

de7c551

tiehuis added the standard library label Feb 13, 2019

tiehuis commented Feb 13, 2019

tiehuis mentioned this pull request Feb 14, 2019

FreeBSD PR CI does not checkout the correct pr branch #1960

Closed

Hejsil reviewed Feb 14, 2019

Make parseFloat stricter in what it accepts as input

18ad509

tiehuis force-pushed the parse-float branch from a77628e to 18ad509 February 15, 2019 04:32

Use official llvm mirror for compiler-rt commit ref

170ec50

andrewrk approved these changes Feb 16, 2019

tiehuis merged commit 77a4e7b into master Feb 16, 2019

tiehuis deleted the parse-float branch February 16, 2019 02:04

andrewrk mentioned this pull request Apr 7, 2019

robust float parsing in the standard library #2207

Closed
