Skip to content

Conversation

@tiehuis
Copy link
Member

@tiehuis tiehuis commented Feb 13, 2019

This is derived from http://krashan.ppa.pl/articles/stringtofloat/. it is a fairly simple implementation and it doesn't handle some edge cases but I figure something 95% of the way is better than nothing right now. There is also the question down the line of how easily we can get away without using an allocator since most implementations use a big number implementation for various edge cases but we forgo these cases now so don't need to worry about it just yet.

This also adds some more f16 and f128 support to std to support the float parsing requirements in generating specific output types.

I haven't thought about the error behavior at all and will look over that before merging this. It does what C does right now and fails with a zero value if nothing could be parsed. We can do better than that.

A discussion over the expected allowed values would be good too. For example, supporting hex-floats may be valuable.

Closes #375.

Allows addition/subtraction of f128 and narrowing casts to f16 from
larger float types.
This is not intended to be the long-term implementation as it doesn't
provide various properties that we eventually will want (e.g.
round-tripping, denormal support). It also uses f64 internally so the
wider f128 will be inaccurate.
@tiehuis tiehuis added the standard library This issue involves writing Zig code for the standard library. label Feb 13, 2019
@@ -0,0 +1,191 @@
// Ported from:
//
// https://github.com/llvm-mirror/compiler-rt/blob/92f7768ce940f6437b32ecc0985a1446cd040f7a/lib/builtins/fp_add_impl.inc
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've referenced the git mirror here regarding #1504 since it was easier. Can switch to the svn-id however if there is a preference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an official git repository now: https://github.com/llvm/llvm-project

d.d2 = @truncate(u32, w);
}

fn dump(d: Z96) void {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot I left this in. Can remove this tomorrow.

MinusInf,
};

inline fn isDigit(c: u8) bool {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would an ascii module be valuable? Similar to <ctype.h>. There are some implementations under os/path.zig that perform similar functions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that sounds quite reasonable.

std/json.zig Outdated
Value{ .Integer = try std.fmt.parseInt(i64, token.slice(input, i), 10) }
else
@panic("TODO: fmt.parseFloat not yet implemented");
Value{ .Float = std.fmt.parseFloat(f64, token.slice(input, i)) };
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a parseFloatExact or something similar would be valuable and would not allow trailing characters?

.mantissa = 0,
};

if (caseInEql(s, "nan")) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be worthwhile to add an extra condition here to avoid 4 compares in the non-special case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my experience doing anything smart around string comparisons tend to not be worth it as the compiler is pretty good in this area. https://github.com/Hejsil/fun-with-zig/blob/e8f857a401151dbb716acce1775e83c4d1656132/bench/match.zig#L9-L18

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Let's measure it!

https://zig.godbolt.org/z/9zC4yU

The slow-case is actually far worse than the 4 I said since there are a lot of branches in the slice equality check. Without the check we have to go through most of the branches for any value 3 or 4 digits in length. Definitely think this is worth it given the float-parsing code is often the bottleneck in high-performance parsing.

}

test "fmt.parseFloat" {
const assert = std.debug.assert;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should use the new testing.expect here instead.

@tiehuis
Copy link
Member Author

tiehuis commented Feb 14, 2019

I've modified this so parseFloat accepts strictly a complete floating point slice only. This is in line with how parseInt works currently.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have testing.expectError for this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have testing.expectEqual for this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we compare against 2 instead of 3? All our cases are >=3 long

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we probably want to measure this optimizations impact and leave a comment here explaining why it is worth it do do this check

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've ended up removing this optimization for the moment. While it definitely is better assembly it doesn't really factor in to the runtime performance due to the current parsing/conversion dominating the runtime anyway.

I'll leave this for down the track when this is optimized specifically for performance instead.

Copy link
Member

@andrewrk andrewrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exciting! I think it's a great goal to one day have non-allocating float parsing that works for all values. Eliminate that entire class of nondeterminism.

@tiehuis tiehuis merged commit 77a4e7b into master Feb 16, 2019
@tiehuis tiehuis deleted the parse-float branch February 16, 2019 02:04
@andrewrk
Copy link
Member

🎉

should we have an open issue for the particular values that are not parsable yet? Seems like that would be a loose end to tie up before 1.0.0, yeah?

@donpdonp
Copy link

from zig 0.4.0. std.json.Parser on '{"float": 0.7062146892655368}' results in integer overflow. (note not a contrived value, though it looks that way. its from a mastodon public feed json which includes picture aspect ratios as overly specific floats).

@tiehuis
Copy link
Member Author

tiehuis commented May 20, 2019

Thanks @donpdonp. Fixed in this commit: 163a8e9.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

standard library This issue involves writing Zig code for the standard library.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

add to standard library: parse floats

4 participants