Skip to content

Add CG-07-18 meeting notes #1330

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
219 changes: 218 additions & 1 deletion main/2023/CG-07-18.md
Original file line number Diff line number Diff line change
Expand Up @@ -39,4 +39,221 @@ Installation is required, see the calendar invite.

## Meeting Notes

To be added after the meeting.
###
Attendees
Thomas Trankler
Michael Smith
Conrad Watt
Daniel Hillerström
Justin MIchaud
Zalim Bashorov
Francis McCabe
Paolo Severini
Alon Zakai
Chris Woods
Yuri Delendik
Manos Koukoutos
Johnnie Birch
Nuno Perieira
Bruce He
Yuri Iozzelli
Adam Klein
Ilya Rezvov
Ryan Hunt
Deepti Gandluri
Sean Jensen-Grey
Andreas Rossberg
Asumu Takikawa
Ben Titzer
Thomas Lively
Dan Philips
Alex Crichton
Jakob Kummerow
Brendan Dahl
Andrew Brown
Matthias Liedtke
Nick Fitzgerald
Shaoib
Igor Iakovlev
Ashley Nelson
Jeff Charles
Mingqiu Sun
Emanuel Ziegler
Nick Ruff
Dan Gohman
Richard Winterton
Slava Kuzmich
Heejin Ahn
Shravan Narayan




### Proposals and Discussions

#### Strings in Wasm

Introduction, and quick summaries [Adam Klein, Jacob Kummerow, Ryan Hunt, 15 mins]

Adam’s quick update

AK: Hoping to get some resolution (tentative) - hear details on stringref from JK then discussion

straw polls on direction?
how positive do we feel about stringref
how positive do we feel about compile-time imports

JK update - presenting (TODO: Add slides)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
JK update - presenting (TODO: Add slides)
JK update - presenting ([slides](https://docs.google.com/presentation/d/1rc2M5oYXeRmiAKUNR3FrJQcOhgqHTRlZc11zlZn7PKw/edit?usp=sharing))


RH: QQ about bulk memory operations, about toUpper as a 2x performance difference, can you talk a little bit more about the performance difference?

JK: I haven’t measured all of the ops. I have seen about 3x for int to string conversion, comparing the building v8 implementation that takes all the shortcuts, vs the version from the JVM compiled to wasm. Presumably the JVM is a good implementation. It comes down to iteration performance, bounds checks, ability to operate on more than one char at a time, lots of implementation tricks that add up.

RH: With bulk memory, we did go down the route of having native implementations for common functions, would be hesitant to carry that over to strings, if there’s a way to have better performance across the board, we’d like to look into mitigating them

JK: I'm certainly not saying we have to add all of these to core wasm. If we have a string type it makes it easy to add those things as imports. We can make case by case decisions and start small, and possibly extend in the future.

CW: This is the hybrid solution that was floated earlier, where we can add a string type to core wasm that doesn’t do much, and then include compile time imports as well


JK: it’s up to us where we draw the line, yeah

FM: I’d like to look more at the gap between wasm and non-wasm implementations. Generally if there's a big gap between native and wasm implementations of the same algorithm, then we have a problem in wasm. It would be worth taking a naive implementation of toUpperCase for example, not a tricky one, and compare in wasm vs non-wasm. If there’s a factor of 4, then there's a problem in wasm that we should fix.

JK: Sure, ideally Wasm would be as fast as native, but in practise thats not the case, we can sufficiently fast compiler optimizations in the fullness of time, but we don’t expect to have that anytime soon

TL: for these performance comparisons, are these comparing implementations in wasm with e.g. i8/i16 arrays, or translating char by char to a JS string, and calling out to JS to do the operation?

JK: Comparing fewer string implementations instead of builtins. Leaving super tight, optimizable ops to the engine is a lot faster than rolling your own. We can also do string builtins so that 3rd parties can roll their own functions, but that’s the case where you’d see the 4x difference.

AB [chat] Question for Jakob: with stringref, you expect accesses inside a string to all be bounds-checked, right?

JK: yes. For iterating over string chars, there would be bounds checks, just like iterating over an array, and maybe some could be optimized away.

AB: This proposal would include new instructions for iterating over chars etc. right? Stringref has the key decision of introducing an encoding agnostic type, when something requires a particular encoding, then there’s a view that makes the engine do whatever it needs to do under the hood to ensure that you get that encoding. That’s how we get generic strings that support different source languages that can’t easily change

CW: my interest is really in the embedded space. There is’t usually linear applications like Rust and C. My concerns are around the affected languages and toolchain changes; i haven’t seen the languages you’ve considered. I’m also concerned about the continual growth of the runtime size. As we add more operations that we must support, the runtimes get bigger and the number of devices we can support drops. Have we considered how big runtimes will get, or have a policy about how far we’re willing to go? I like the idea of importing from the host, so I can choose what I have in my runtime. Embedded space is moving away from linear languages toward newer ones, so there’s a bit of a quandary.

JK: Haven’t looked at the smallest possible binary size we can get away with for a naive strings, I have experimented with having the stringref proposal implemented in pure wasm in an afternoon, and given the size of implementing GC, GC + Strings will be a smaller binary size
RH: I do think there’s a question about what is a compliant implementation. In the browser its a lot of code to do all those optimizations, there's a real chance that source modules will expect these optimia

CW: Its either a large binary size or that we have a disproportionately large performance gap

DS: If you want to support these newer GCd languages and have fast string performance, you’ll have to pay for it one way or another. You can either support them in your runtime in the form of stringref, or import them (but you import them from your runtime!), you really can’t choose to not do it if you want the performance benefit, it’s just about where

CW: Practically we can only get C or rust, but not other languages..

DS: There will be environments that just don’t support GC languages

CW: and this may be where profiles help

BT: There was a slide that had future revolutions - would argue having imported strings is easier for future extensions because we don’t need to support them right away, I disagree with how fundamental strings are, there are 80% of platforms that need strings and 20% that don’t so having an extensible choice makes more sense

JK: What I meant by more future proof is that the world is generally moving toward UTF8 but it’s relatively rare now. The stringref will scale without further additions to a world where UTF8 is more common. Importing won’t scale, we’d have to add to the spec. But I think the stringref proposal is more future proof than imported strings in that respect.

BT: I don’t understand that point, views if there are types can also be typed imports, all the functions can also be imported, the only difference is whether you call them type imports or have dedicated bytecode

AR: I would also make the point that abstracting these semantics is not enough to be able to change the representation in the future without braking code, because the performance characteristics matter, e.g .whether a loop is O(n) or O(n^2) and you can’t just argue that away by saying there’s and abstraction. I think this is less of a problem with imported strings because the developer has control of what they pick, they can pick one with the properties they want. But with stringref they are just at the mercy of what the engine picks. That’s a portability/compatibility hazard.

NF: <in chat> The Wasmtime project would prefer not to implement `stringref`. We don't have a managed host string to expose to Wasm guests, and we don't want to have to implement one. We would rather that Wasm guests implement their own strings on top of the GC proposal, where they can build exactly what they need, which can be very different across languages. As far as interoperability of strings across different languages goes, we believe this should be addressed at the component model and interface types layer. For tight integration with JS strings on the Web, string imports seem like a good solution.

CW: We have a position from wasmtime in chat, would anyone like to speak to or respond to that?

TT<in chat>: could you import stringref as a generic type being backed on browsers engines by domstring?

AK: That's good feedback, it’s pretty clear. I also wanted to address TT’s question. I think that’s effectively what a future version of the imports proposal would do with the addition of a type imports facility where you’d import a type from the host. So you’d import an externref that represents the DOMString. Alternatively a minimal version would be to have something in caore called “string” that was implemented in browsers as JS string or DOMstring.

CW: At the limit you have a type that is a string, where you can interact with the host but nothing else, which fixes boundary issues but it doesn’t mitigate the bulk performance issues that JK mentioned,

RH: one thing i'd like to discuss more, i think JK had a good point about what we should be thinking is a good compile target. I don’t think stringref is a good compile target. There are 2 problems it tries to solve. One is host interop, where you have the same as what the host uses for native APIs. The other is bing a good fast string internally that supports fast operations. With host strings, I view wasm as not really having an opinion about what the host strings are so we can’t say much atou what kinds of encodings nwe use, what operations it supports. So we can’t say much about what you can import from particular hosts.
With bulk operations you really do want to know about the performance characteristics of what you're importing.

There’s not much you can say about them when importing from specific hosts, for the bulk operations you really want to know about the perf characteristics, It’s not a good compile target because we don’t know how its implemented, it's not so much about the eJS string types, but what hosts have to do to make them performant its more about what you’re mandating the hosts to do or implement based on whats expected.

If you have a UTF8 languages the hos string is UTF16 and needs to copy, then you will get bad performance.

BT: I think we should be relatively conservative about adding things to core wasm and pick things that have a big bang for the buck, things that are on most machine. If we pick between stringref, which does have a lot of value, but I think compile-time imports and type imports are bigger value because they solve multiple problems at once. They allow for other intrinsics like JS operations, typedarray operations, and things from multiple different platforms. So there's a bit difference in terms of the capabilities. Stringref solves a small set of problems, but type imports solves a whole family of problems.

JK: I think type imports are great top have, and would want them, I also think that strings are also very fundamental that we still want them

CW: theres also a timing issues. Type imports can take a lot longer to spec an put in the platform. What would the impact be on users if they had to wait say a year or more for type imports to get in?

AK: from Jakob’s slides, even without type imports today we can do pretty well on performance today without that. It would be definitely good to add, but externref is workable in the short term.

JK: We can work with exnref for now, having type imports would eliminate some type checks, generally speaking we can do now

RH: you’d want to have a clear migration story to using type imports in the future.

CW: One of the motivations was this boundary problem, where does that show up if its not the difference between exnref and the stringref?

RH: my understanding is that it’s with copies, not so much about the type check, but sharing the representation.

JK: If we did neither, imported strings or stringref, then wasm modules have their own strings rolled in, then when exposing to the outside, then you always have to copy, and then it would still be slow

CW: Even in the case where you're trying to solve the problem of interop performance, that still requires the language to change its representation everywhere to whatever you’re using at the boundary?

JK: Both of these approaches can give fast copy primitives, that does mitigate some of the worst problems, but it is still a copy, I have a microbenchmark, that can demonstrates some of this performance issues scale badly

AR: Going back a bit, I was surprised by the assumption Conrad made that implementing stringref is probably going to happen faster than imported intrinsics. I would assume they are on par at best for an engine that hasn’t started either one, I would guess that imports would more likely be faster. I guess we don’t have real data but I would still guess that imports would be less to implement.

CW: IT does require a change to the way the JS API looks at instantiating a Wasm module

AR:yes but I would still guess that’s a much smaller change.

JK: Probably about the same. Most of the code you need is the same between the 2. Then you either need to implement decoding, or implement the builtins that recognize the builtins and dispatch them to the same place in the end.

CW: In the interest of moving on, we should moving on to the straw polls

AK: Presenting slide of two competing approaches?

RH: on the “lightweight stringref” i think that could use some discussion, I’m not sure I really understand it.

AK: if it’s useful it might come out of the imports work as we do it.

CW: The way we should do this is how positive do we feel about moving forward with the string ref approach and then a follow up of the next one as well?


Question: how do we feel about moving forward with the stringref approach

Favorable: 18
Neutral: 10
Unfavorable/Against: 9

Q: how do we feel about moving forward with the imported strings approach
Favorable: 24
Neutral: 13
Unfavorable: 0

CW: thanks, that’s useful to inform future discussions

#### Update and Phase 1 vote for [Compile-time imports](https://github.com/WebAssembly/design/issues/1479) and [String builtins](https://github.com/WebAssembly/design/issues/1480)

RH: [presenting, TODO add slides]

RH: also I don’t know what sort of vote?

CW: Given that we didn’t have any unfavorable results in the straw poll, I think unanimous consent would be sufficient, if anyone objects we can fall back to a full poll. We also need a champion.

RH: i could be the champion

FM: doesn’t this suppose that this proposal is winning in preference to stringref?

CW: Stringref is already phase 1, this is just putting both proposals at par with each other
CW: this looks like unanimous consent. Thanks to all the folks who put in all the work in preparing and discussion

RH: and prototyping, thanks to the V8 folks who did that.

Unanimous consent, Proposal at Phase 1

#### Introduce Mike Smith as the W3C contact

DG: I’d like to introduce Mike our new W3C contact

MS: we’ve been without much staff support from W3C for a while, but I’ve just started out. W3C doesn’t usually devote staff to CGs, only to WGs, but that doesn’t really matter because it’s all one effort with wasm. So I’m here to help support what the group needs to get work done. One area in particular is publishing. I have an action item to help get publishing working. But in general if you need support you can reach out, I can help. I plan to attend more of these meetings in the future

Note about October live meeting: https://github.com/WebAssembly/meetings/blob/main/main/2023/CG-10.md


AZ [chat]: Idea for future polls: rather than F/N/A we could use +/0/- which is less ambiguous

### Closure