Support symbolic variables #12

rossberg · 2015-08-18T21:35:24Z

@lukewagner @kg, this makes the parser a bit more ugly, but I suppose it's useful for hand-written code. I tried to keep all of it out of the AST. The only compromise is that vars are now lazy integers (to deal with forward references w/o an extra pass). WDYT?
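A minimal OCaml sketch of the lazy-variable idea, assuming a single global table of function names. `var`, `func_table`, `use_var`, and `define_func` are hypothetical names, not ml-proto's actual code. A use of a name produces a suspension, and the index is looked up only when that suspension is forced, after the definition has been parsed:

```ocaml
(* Hypothetical sketch: a variable is an int computed on first use, so a
   call can mention a function that is defined later in the file. *)
type var = int Lazy.t

(* Filled in as function definitions are parsed, in definition order. *)
let func_table : (string, int) Hashtbl.t = Hashtbl.create 16

(* A use creates a suspension; the lookup happens only when it is forced. *)
let use_var (name : string) : var =
  lazy
    (match Hashtbl.find_opt func_table name with
     | Some i -> i
     | None -> failwith ("unbound function: " ^ name))

(* A definition takes the next consecutive index. *)
let define_func (name : string) : unit =
  if Hashtbl.mem func_table name then failwith ("duplicate function: " ^ name);
  Hashtbl.replace func_table name (Hashtbl.length func_table)

let () =
  let target = use_var "fac" in  (* forward reference: "fac" is not defined yet *)
  define_func "main";
  define_func "fac";
  Printf.printf "fac = %d\n" (Lazy.force target)  (* prints "fac = 1" *)
```

The cost shows up in the types: any AST node that holds a `var` now holds an `int Lazy.t` instead of an `int`.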

kg · 2015-08-18T22:25:48Z

I did a lot of thinking/experimenting on this and discussed it with @jfbastien in the past. It might be easier to discuss it over a call and then document it here after-the-fact. I'll try to write up my thoughts and conclusions at length though, once I've had time to understand your changes here.

lukewagner · 2015-08-19T03:19:21Z

I'm totally in favor of having names in the S-Expr language; we'll be writing a lot of these, so names and readability are a win as long as we keep it out of the AST/semantics.

I'm a little sad to see the laziness enter ast.ml/eval.ml, since those are supposed to be our little temples of speciness and this is an impl detail leaking out from SExpr parsing limitations. At the cost of making SExpr parsing more verbose (but simpler, I think), what if we introduced a new SExpr ADT (which stored all names as simple strings) which was generated by parser.mly and then converted to an Ast.modul in a second pass (which could then easily do the name resolution)? The SExpr ADT wouldn't have to duplicate all of ast.ml; it could use generic "unary" and "binary" nodes for most expressions and only have meaningful name-related nodes. Another potential benefit is that right now parser.mly is rather tightly coupled with Ast.modul and has magic ocamlyacc syntax, and I expect this would get much simpler with the new intermediate.
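A minimal OCaml sketch of that shape, assuming opcodes are already resolved by the lexer (here only `Add` and `Mul`) and that names stay plain strings until a second pass maps them to indices. All types and functions are hypothetical:

```ocaml
(* Opcodes are already decoded; only names are left symbolic. *)
type binop = Add | Mul

(* Intermediate tree produced by the parser: names are plain strings. *)
type sexpr =
  | Const of int
  | GetLocal of string
  | Call of string * sexpr list
  | Binary of binop * sexpr * sexpr  (* one generic node for all binary ops *)

type sfunc = { name : string; locals : string list; body : sexpr }

(* Target AST: only integer indices. *)
type expr =
  | AConst of int
  | AGetLocal of int
  | ACall of int * expr list
  | ABinary of binop * expr * expr

(* Position of [x] in [names], or an error naming the unbound variable. *)
let index_of kind names x =
  let rec go i = function
    | [] -> failwith ("unbound " ^ kind ^ ": " ^ x)
    | y :: rest -> if y = x then i else go (i + 1) rest
  in
  go 0 names

(* Second pass: all function names are known up front, so forward
   references need no laziness. *)
let resolve (funcs : sfunc list) : expr list =
  let func_names = List.map (fun f -> f.name) funcs in
  let rec conv locals = function
    | Const n -> AConst n
    | GetLocal x -> AGetLocal (index_of "local" locals x)
    | Call (f, args) ->
      ACall (index_of "function" func_names f, List.map (conv locals) args)
    | Binary (op, a, b) -> ABinary (op, conv locals a, conv locals b)
  in
  List.map (fun f -> conv f.locals f.body) funcs

let () =
  let m = [
    { name = "main"; locals = []; body = Call ("double", [Const 21]) };
    { name = "double"; locals = ["x"]; body = Binary (Add, GetLocal "x", GetLocal "x") };
  ] in
  match resolve m with
  | [ACall (1, [AConst 21]); ABinary (Add, AGetLocal 0, AGetLocal 0)] ->
    print_endline "resolved"
  | _ -> print_endline "unexpected"
```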

rossberg · 2015-08-19T06:54:02Z

Yes, I agree the lazy leaking into the AST is sad. It was the one smallish price to pay for avoiding an extra representation and pass. But I'm happy to discuss other options.

FWIW, though, I had actually started out the prototype just like you suggest: having a simple parser that just creates a generic S-expr tree and then transforms that in a separate pass. The problem was that the transformation then had to manually lex all the mnemonics to tokenise and decompose them. Trust me, that was -way- more complicated and ugly than using mllex's declarative syntax, which is why I abandoned it.

So if you want to avoid that, you'd still want the intermediate ADT to already have all the opcodes resolved. Thus, it would pretty much duplicate the AST. Not that I wouldn't be okay with that, just pointing out.

Can you elaborate what magic ocamlyacc syntax you are referring to? At least this change doesn't add anything magic, AFAICT.

kg · 2015-08-19T14:19:15Z

My prototype had a separate SExpr AST and grammar, then a transform to map SExpr to the wasm AST. Symbol parsing occurred in the SExpr parser, and mapping symbol names to integer IDs happened in the SExpr -> wasm AST stage. That allows us to avoid having any laziness in the AST (I think we definitely don't want anything fancy in there).

The general model was that during sexpr -> wasm parsing, it would encounter symbols (i.e. @symbolName or @0). A raw integer symbol basically represents a stripped symbol - anywhere we're referring to a semantic identifier like a function or a local by index, that's a symbol. Some general rules would govern which (if any) symbol names are round-tripped through the binary format, and code generators could be free to generate sexprs only containing numeric symbols instead of names.

When converting a SExpr module to a wasm module, a symbol table is generated so that the @symbolName <-> @0 mapping is consistent. I defined a simple rule such that each function also had a child symbol table, such that the list of a function's formal parameters and locals could also define local symbol names. I'm not sure whether that's a good idea in the long run, though. This was pretty simple to implement and meant that at the wasm ast level you just had Symbols as arguments to operations like setlocal and invoke, where the symbol is either a name+integer pair or just an integer.

See https://github.com/WebAssembly/semantics-prototype/blob/master/sexpr/sexpr.fs , https://github.com/WebAssembly/semantics-prototype/blob/master/text-decoder/sexpr-to-ast.fs and https://github.com/WebAssembly/semantics-prototype/blob/master/text-decoder/symbols.fs .

I definitely don't think we should have laziness in the wasm AST. I think it should have symbolic references (distinct from just bare ints), and potentially have names attached to them. Tooling will absolutely require names, but it's possible we want to leave that concept out of the formal spec and build it into a separate reference toolchain that we encourage people to use.
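A rough OCaml rendering of the symbol model described above. kg's prototype was written in F#, and this is not a port of it. It assumes one table per index space (the module's functions, plus a child table per function for parameters and locals) and the `@` sigil from the examples. `symbol`, `table`, `define`, and `resolve` are hypothetical names:

```ocaml
type symbol =
  | Named of string * int  (* @symbolName, resolved to its index *)
  | Index of int           (* @0: a stripped symbol, index only *)

(* One table per index space: module functions, or one function's locals. *)
type table = (string, int) Hashtbl.t

(* Each definition takes the next consecutive index in its table. *)
let define (t : table) (name : string) : int =
  if Hashtbl.mem t name then failwith ("duplicate symbol: " ^ name);
  let i = Hashtbl.length t in
  Hashtbl.replace t name i;
  i

(* Resolve a textual reference such as "@acc" or "@0" against a table. *)
let resolve (t : table) (text : string) : symbol =
  if String.length text < 2 || text.[0] <> '@' then
    invalid_arg ("not a symbol: " ^ text);
  let body = String.sub text 1 (String.length text - 1) in
  match int_of_string_opt body with
  | Some i when i >= 0 -> Index i
  | _ ->
    (match Hashtbl.find_opt t body with
     | Some i -> Named (body, i)
     | None -> failwith ("unbound symbol: " ^ text))

let () =
  let funcs : table = Hashtbl.create 8 in
  let fac_locals : table = Hashtbl.create 8 in
  ignore (define funcs "main");
  ignore (define funcs "fac");
  ignore (define fac_locals "n");
  ignore (define fac_locals "acc");
  match resolve fac_locals "@acc" with
  | Named (n, i) -> Printf.printf "%s -> %d\n" n i
  | Index i -> Printf.printf "index %d\n" i
```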

kg · 2015-08-19T14:21:57Z

Ah, sorry. And for the record, here's what a toy module definition looked like in my grammar, using named symbols.

https://github.com/WebAssembly/semantics-prototype/blob/master/test.fsx#L36

In that context you can replace all the named symbols with integers and it works the same.

(You'll also note that my grammar had the concept of keywords, for things like the type int32. I think we need this in cases where we don't have the ability to pack the type into the opcode name.)

rossberg · 2015-08-19T15:15:56Z

On 19 August 2015 at 16:19, Katelyn Gadd wrote:

> My prototype had a separate SExpr AST and grammar, then a transform to map SExpr to the wasm AST. Symbol parsing occurred in the SExpr parser, and mapping symbol names to integer IDs happened in the SExpr -> wasm AST stage. That allows us to avoid having any laziness in the AST (I think we definitely don't want anything fancy in there).
>
> The general model was that during sexpr -> wasm parsing, it would encounter symbols (i.e. @symbolName or @0). A raw integer symbol basically represents a stripped symbol - anywhere we're referring to a semantic identifier like a function or a local by index, that's a symbol. Some general rules would govern which (if any) symbol names are round-tripped through the binary format, and code generators could be free to generate sexprs only containing numeric symbols instead of names.
>
> When converting a SExpr module to a wasm module, a symbol table is generated so that the @symbolName <-> @0 mapping is consistent. I defined a simple rule such that each function also had a child symbol table, such that the list of a function's formal parameters and locals could also define local symbol names. I'm not sure whether that's a good idea in the long run, though. This was pretty simple to implement and meant that at the wasm ast level you just had Symbols as arguments to operations like setlocal and invoke, where the symbol is either a name+integer pair or just an integer.

Oh, I agree that symbol conversion is straightforward (which is why I was even able to fold it into the parser :) ). FWIW, the current change allows a mix of symbolic and raw indexes just like you describe.

My concern was about Luke's broader suggestion of doing generic S-expressions first. Recognising and decomposing mnemonics isn't entirely trivial, and it's exactly the kind of thing that Lex is designed for. Hacking that manually is inferior in almost every way, so I don't see the benefit.

But I'm fine with having an intermediate AST with symbolic names.

> See https://github.com/WebAssembly/semantics-prototype/blob/master/sexpr/sexpr.fs , https://github.com/WebAssembly/semantics-prototype/blob/master/text-decoder/sexpr-to-ast.fs and https://github.com/WebAssembly/semantics-prototype/blob/master/text-decoder/symbols.fs .
>
> I definitely don't think we should have laziness in the wasm AST. I think it should have symbolic references (distinct from just bare ints), and potentially have names attached to them. Tooling will absolutely require names, but it's possible we want to leave that concept out of the formal spec and build it into a separate reference toolchain that we encourage people to use.

Yes, I don't think the actual AST should be polluted with symbolic references. The way I see it, they are a matter of external representation and tooling, not a proper part of the low-level language we are defining. That one just knows indexes.

lukewagner · 2015-08-19T16:27:54Z

@rossberg-chromium Oops, I didn't mean to imply a fully generic/unstructured SExpr AST; it definitely makes sense to leverage lex/yacc and capture the results in the intermediate AST. The "magic" I was referring to was module_fields but this was more owing to my lack of familiarity and on second glance it looks fine so n/m, sorry.

Actually, it seems like you could use Ast.modul as the intermediate with var = int and no laziness. The first (parsing) pass would give functions nonconsecutive integers based on the order in which they are defined/used, and the second pass would fix up all the integers so that functions have consecutive indices. Seems pretty easy to do with a mutable map or two maintained while parsing. The only downside is the boilerplate of traversing the whole AST in the second pass just to find the few relevant nodes (does ocaml have generic traversals by any chance? :), but I think it's worth it.
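A minimal OCaml sketch of this two-pass scheme, assuming a toy AST in which `Call` carries a plain `int`. The `state`, `on_use`, `on_define`, and `renumber` names are hypothetical. Pass 1 hands out a provisional id the first time a name is seen, whether at a use or a definition; pass 2 maps each id to the function's position in definition order:

```ocaml
type expr = Call of int * expr list | Const of int

type state = {
  ids : (string, int) Hashtbl.t;  (* name -> provisional id, in order first seen *)
  mutable defs : int list;        (* provisional ids in definition order, reversed *)
}

let create () = { ids = Hashtbl.create 16; defs = [] }

let id_of st name =
  match Hashtbl.find_opt st.ids name with
  | Some id -> id
  | None ->
    let id = Hashtbl.length st.ids in
    Hashtbl.add st.ids name id;
    id

(* Pass 1: hooks the parser would call on uses and definitions. *)
let on_use st name = id_of st name

let on_define st name =
  let id = id_of st name in
  st.defs <- id :: st.defs;
  id

(* Pass 2: map provisional ids to definition order and rewrite the AST. *)
let renumber st (bodies : expr list) : expr list =
  let final = Hashtbl.create 16 in
  List.iteri (fun i id -> Hashtbl.replace final id i) (List.rev st.defs);
  let rec fix = function
    | Const n -> Const n
    | Call (id, args) ->
      (match Hashtbl.find_opt final id with
       | Some i -> Call (i, List.map fix args)
       | None -> failwith "call to an undefined function")
  in
  List.map fix bodies

let () =
  let st = create () in
  let _ = on_define st "main" in              (* provisional id 0 *)
  let call_b = Call (on_use st "b", []) in    (* forward use: provisional id 1 *)
  let _ = on_define st "a" in                 (* provisional id 2 *)
  let _ = on_define st "b" in                 (* reuses id 1 *)
  match renumber st [call_b] with
  | [Call (i, [])] -> Printf.printf "b is function %d\n" i
  | _ -> print_endline "unexpected"
```

Here `b` is used before `a` is defined, so it gets provisional id 1 but ends up as function 2 in definition order. The hand-written `fix` is the kind of per-node traversal boilerplate mentioned above.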

kg · 2015-08-19T19:19:36Z

FTR since I didn't clearly express this, after looking over this commit it looks like symbols are just bare identifiers, like opcode names and types? I very strongly believe they should have some sort of syntactic disambiguation, like @symbolName or :symbolName or $symbolName. Thoughts?

rossberg · 2015-08-19T19:39:59Z

@lukewagner, ah, ok, seems I misread what you said. I will look into making name resolution a separate pass. No generic traversal in OCaml, but it shouldn't be too bad in this case, the AST is pretty small.

@kg, I'd be fine with making that a requirement, if others prefer so as well. But what ambiguity do you worry about specifically? With the current change, almost any sequence of characters that can't be mistaken for a number is allowed (Lisp-style), so any of the above could be used by convention.

kg · 2015-08-19T19:45:53Z

Essentially, I think our sexpr grammar should never require context to figure out what a particular set of characters is. This is partially a readability consideration but it's also a parsing one.

For example, stripping symbol names is a thing that will happen in some capacity. In that case, I really, really don't want it to go from getlocal foo to getlocal 0. getlocal :foo -> getlocal :0 is more immediately clear, I think. This matters more in cases where a single sexpr may contain many symbols, like a function definition where we're potentially putting a name, list of argument types, list of argument names, list of local types, etc., all next to each other in some particular arrangement. Line breaks might get introduced there.

Your current parse rules are pretty unambiguous, so I don't see a problem there. An alternate solution to my concern is just to ensure that numeric symbols are disambiguated in some way from raw integer literals.

rossberg · 2015-08-19T20:12:04Z

@kg, as long as we avoid tagless S-expr nodes, this is a fairly simple problem, because tags and leaves can always be distinguished trivially. Then you only need to disambiguate different leaf types, which is what you are getting at in the last paragraph, I suppose?
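A minimal OCaml sketch of leaf disambiguation, assuming tags are handled separately and taking a leading `$` as the marker for symbolic names (one of the sigils suggested above). `leaf` and `classify` are hypothetical names, and `int_of_string_opt` accepts more numeric forms (hex, underscores) than a real grammar might:

```ocaml
type leaf =
  | Name of string     (* $foo: a symbolic variable *)
  | Int of int         (* 42: a raw index or literal *)
  | Keyword of string  (* anything else, e.g. a type such as i32 *)

(* A leaf's kind is decided by its spelling alone, with no context needed. *)
let classify (s : string) : leaf =
  let n = String.length s in
  if n > 1 && s.[0] = '$' then Name (String.sub s 1 (n - 1))
  else
    match int_of_string_opt s with
    | Some i -> Int i
    | None -> Keyword s

let () =
  List.iter
    (fun s ->
      match classify s with
      | Name x -> Printf.printf "%s: name %s\n" s x
      | Int i -> Printf.printf "%s: int %d\n" s i
      | Keyword k -> Printf.printf "%s: keyword %s\n" s k)
    ["$foo"; "0"; "i32"]
```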

rossberg · 2015-08-20T08:29:06Z

Okay, I figured out a way to avoid laziness -- and it's even simpler than before. A tiny extra suspension in the right place when parsing functions is all you need. No extra pass. :)
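The comment doesn't spell out the mechanism, so the following is only one plausible reading, sketched in OCaml with hypothetical names. Each function's name is bound as soon as it is parsed, while its body is returned as a closure over the context and applied only once all names in the module are known. The AST itself then holds plain `int`s, and only the parser's intermediate results are suspended:

```ocaml
type expr = Call of int | Nop

type context = { funcs : (string, int) Hashtbl.t }

(* What the parser returns for a body: a function from the final context to AST. *)
type 'a suspended = context -> 'a

(* A call looks up its target only when the suspension is applied. *)
let call name : expr suspended = fun c ->
  match Hashtbl.find_opt c.funcs name with
  | Some i -> Call i
  | None -> failwith ("unknown function " ^ name)

(* Module level: bind every function name first, then apply the bodies. *)
let parse_module (fields : (string * expr suspended) list) : expr list =
  let c = { funcs = Hashtbl.create 16 } in
  List.iteri (fun i (name, _) -> Hashtbl.replace c.funcs name i) fields;
  List.map (fun (_, body) -> body c) fields

let () =
  match parse_module [("main", call "helper"); ("helper", fun _ -> Nop)] with
  | [Call i; Nop] -> Printf.printf "forward call resolved to %d\n" i
  | _ -> print_endline "unexpected"
```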

lukewagner · 2015-08-20T17:46:50Z

Hah, nicely done sir! Once this is rebased over the conflict with named exports, we'll have names on both sides of the export. Since we don't exactly have a daunting test suite to update, do you think we could just remove the unnamed alternatives for func/local declaration and update the tests to use names? Either way, lgtm.

rossberg · 2015-08-20T21:00:03Z

@kg, enforcing symbolic names starting with $ now.

@lukewagner, changed other tests to use symbolic names.

kg · 2015-08-20T22:07:28Z

lgtm

lukewagner · 2015-08-20T23:49:25Z

@rossberg-chromium That's great; what do you think about removing support for the nameless declarations?

rossberg · 2015-08-21T05:35:29Z

@lukewagner, hm, why? Isn't it useful to still be able to express the "raw" AST format, too? S-expr generators might also benefit.

lukewagner · 2015-08-21T14:23:10Z

@rossberg-chromium I don't see any real expressiveness benefits by allowing unnamed but it also doesn't really matter so I'll drop it.

* [spec] Initial documentation of syntax Includes the basic structure needed for the proposal, but no validation, binary, text or execution.

Fixes WebAssembly#12. The effect of these dynamic constructors can be achieved with an S.const instruction followed by a sequence of S.replaceLane instructions. This is also what LLVM does.

rossberg added 2 commits August 18, 2015 23:15

Enable named variables

2c1bc98

Allow forward references

773177f

rossberg added 3 commits August 19, 2015 20:08

Merge pull request #13 from WebAssembly/named-exports

f3d7186

Add named exports

Adapted test logs; minor beautifications

54b03ce

Nit

dafa14c

rossberg added 2 commits August 20, 2015 09:10

Removed contributed test for unclear IP status

7960a17

Allow forward references without leaking laziness into the AST

dbef419

rossberg added 6 commits August 20, 2015 21:54

Document naming conventions

22902b5

Enable named variables

c8b6c4e

Allow forward references

26333ed

Allow forward references without leaking laziness into the AST

5dfdac7

Merge branch 'named-variables' of https://github.com/WebAssembly/spec into named-variables

646f6da

Conflicts: ml-proto/README.md ml-proto/src/eval.ml ml-proto/src/parser.mly ml-proto/test/expected-output/fac.wasm.log ml-proto/test/fac.wasm

Enforce that symbolic variables start with $; make sure that source positions are extracted eagerly

18fcc34

rossberg added a commit that referenced this pull request Aug 21, 2015

Merge pull request #12 from WebAssembly/named-variables

93441b0

Support symbolic variables

rossberg merged commit 93441b0 into master Aug 21, 2015

rossberg deleted the named-variables branch August 21, 2015 05:37

eqrion pushed a commit to eqrion/wasm-spec that referenced this pull request Sep 18, 2019

Remove ref.eq (WebAssembly#12)

ab760d0

rossberg referenced this pull request in effect-handlers/wasm-spec Feb 15, 2021

Link to the JS type reflection proposal (#12)

7cf1a94

dhil pushed a commit to dhil/webassembly-spec that referenced this pull request Mar 2, 2023

Merge pull request WebAssembly#12 from KarlSchimpf/label

54a506c

Handle issues 7, 9, and 10.

dhil pushed a commit to dhil/webassembly-spec that referenced this pull request Mar 2, 2023

fix cont.bind to bind arguments in the correct order (fixes WebAssembly#12)

3d81921

backes pushed a commit to backes/spec that referenced this pull request Jul 12, 2023

Update syntax for elem/data to match bulk proposal (WebAssembly#12)

768400e

dhil pushed a commit to dhil/webassembly-spec that referenced this pull request Oct 3, 2023

Fix typo in type grammar

273691c

Addresses WebAssembly#12.

dhil added a commit to dhil/webassembly-spec that referenced this pull request Nov 13, 2023

Merge pull request WebAssembly#12 from dhil/wasmfx-merge

ab570e6

This patch pulls in the recent changes to WebAssembly/spec, WebAssembly/function-references, and WebAssembly/gc.

dhil pushed a commit to dhil/webassembly-spec that referenced this pull request Apr 12, 2024

Merge pull request WebAssembly#12 from WebAssembly/merge-upstream2

33d73c4

Merge upstream

rossberg pushed a commit that referenced this pull request Jul 19, 2024

Fix binary grammar definition of the branch hints custom section (#12)

b0913a9

* Fix binary grammar definition of the branch hints custom section The overall section structure definition was missing.
