Support symbolic variables #12

Merged
merged 13 commits into from
Aug 21, 2015

rossberg
Member

@lukewagner @kg, this makes the parser a bit uglier, but I suppose it's useful for hand-written code. I tried to keep all of it out of the AST. The only compromise is that vars are now lazy integers (to deal with forward references without an extra pass). WDYT?
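
To illustrate what "lazy integers" could mean here, the following is a minimal Python sketch, not the OCaml code from this PR. The toy `func NAME` / `call NAME` input format and the names `LazyVar` and `parse_calls` are assumptions made up for the example. Each reference is stored as a thunk that looks up its index only when forced, so a name defined later in the file still resolves in a single pass.

```python
from typing import Callable, Dict, List

# Hypothetical: a "lazy integer" is a zero-argument function that yields
# the index only when called, i.e. after parsing has finished.
LazyVar = Callable[[], int]


def parse_calls(source: List[str]) -> List[LazyVar]:
    """Parse lines of the form 'func NAME' or 'call NAME' in one pass."""
    func_index: Dict[str, int] = {}  # filled in as definitions are seen
    calls: List[LazyVar] = []

    for line in source:
        keyword, name = line.split()
        if keyword == "func":
            func_index[name] = len(func_index)
        elif keyword == "call":
            # Do not look the name up yet: it may be defined further down.
            # The default argument binds the current name to this thunk.
            calls.append(lambda n=name: func_index[n])
        else:
            raise ValueError(f"unknown keyword: {keyword}")
    return calls


program = ["func main", "call helper", "func helper", "call main"]
resolved = [force() for force in parse_calls(program)]
print(resolved)  # prints [1, 0]
```

The cost is visible in the types: every consumer of a variable has to force it, which is the leak into the AST that the rest of the thread discusses.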

@kg
Contributor

kg commented Aug 18, 2015

I did a lot of thinking/experimenting on this and discussed it with @jfbastien in the past. It might be easier to discuss it over a call and then document it here after-the-fact. I'll try to write up my thoughts and conclusions at length though, once I've had time to understand your changes here.

@lukewagner
Member

I'm totally in favor of having names in the S-Expr language; we'll be writing a lot of these, so names and readability are a win as long as we keep it out of the AST/semantics.

I'm a little sad to see the laziness enter ast.ml/eval.ml, since those are supposed to be our little temples of speciness and this is an impl detail leaking out from SExpr parsing limitations. At the cost of making SExpr parsing more verbose (but simpler, I think), what if we introduced a new SExpr ADT (which stored all names as simple strings) which was generated by parser.mly and then converted to an Ast.modul in a second pass (which could then easily do the name resolution)? The SExpr ADT wouldn't have to duplicate all of ast.ml; it could use generic "unary" and "binary" nodes for most expressions and only have meaningful name-related nodes. Another potential benefit: right now parser.mly is rather tightly coupled with Ast.modul and has magic ocamlyacc syntax, and I expect this would get much simpler with the new intermediate.
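
A rough Python sketch of the two-pass shape proposed above, under stated assumptions: the node types `NamedCall`, `NamedFunc`, `Call` and `Func` are hypothetical stand-ins and do not correspond to the actual definitions in ast.ml. The intermediate form keeps names as plain strings, and a separate `resolve` pass produces an index-only result.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NamedCall:  # intermediate node: target kept as a plain string
    target: str


@dataclass
class NamedFunc:  # intermediate function: carries its own name
    name: str
    body: List[NamedCall]


@dataclass
class Call:  # final node: target is an index, no name survives
    target: int


@dataclass
class Func:
    body: List[Call]


def resolve(funcs: List[NamedFunc]) -> List[Func]:
    """Second pass: map each function name to its position, then rewrite."""
    index: Dict[str, int] = {}
    for i, f in enumerate(funcs):
        if f.name in index:
            raise ValueError(f"duplicate function name: {f.name}")
        index[f.name] = i

    result = []
    for f in funcs:
        body = []
        for call in f.body:
            if call.target not in index:
                raise ValueError(f"unknown function: {call.target}")
            body.append(Call(index[call.target]))
        result.append(Func(body))
    return result


parsed = [
    NamedFunc("main", [NamedCall("helper")]),
    NamedFunc("helper", [NamedCall("main")]),
]
module = resolve(parsed)
print(module)
```

Because all names are known before any reference is rewritten, forward references need no laziness; the price is a second set of node types and a traversal.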

@rossberg
Member Author

Yes, I agree the lazy leaking into the AST is sad. It was the one smallish price to pay for avoiding an extra representation and pass. But I'm happy to discuss other options.

FWIW, though, I had actually started out the prototype just like you suggest: having a simple parser that just creates a generic S-expr tree and then transforms that in a separate pass. The problem was that the transformation then had to manually lex all the mnemonics to tokenise and decompose them. Trust me, that was -way- more complicated and ugly than using mllex's declarative syntax, which is why I abandoned it.

So if you want to avoid that, you'd still want the intermediate ADT to already have all the opcodes resolved. Thus, it would pretty much duplicate the AST. Not that I wouldn't be okay with that, just pointing out.

Can you elaborate what magic ocamlyacc syntax you are referring to? At least this change doesn't add anything magic, AFAICT.

@kg
Contributor

kg commented Aug 19, 2015

My prototype had a separate SExpr AST and grammar, then a transform to map SExpr to the wasm AST. Symbol parsing occurred in the SExpr parser, and mapping symbol names to integer IDs happened in the SExpr -> wasm AST stage. That allows us to avoid having any laziness in the AST (I think we definitely don't want anything fancy in there).

The general model was that during sexpr -> wasm parsing, it would encounter symbols (i.e. @symbolName or @0). A raw integer symbol basically represents a stripped symbol - anywhere we're referring to a semantic identifier like a function or a local by index, that's a symbol. Some general rules would govern which (if any) symbol names are round-tripped through the binary format, and code generators could be free to generate sexprs only containing numeric symbols instead of names.

When converting a SExpr module to a wasm module, a symbol table is generated so that the @symbolName <-> @0 mapping is consistent. I defined a simple rule such that each function also had a child symbol table, such that the list of a function's formal parameters and locals could also define local symbol names. I'm not sure whether that's a good idea in the long run, though. This was pretty simple to implement and meant that at the wasm ast level you just had Symbols as arguments to operations like setlocal and invoke, where the symbol is either a name+integer pair or just an integer.
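
The symbol-table scheme described above might look roughly like this in Python. This is an illustrative sketch only, not a port of the linked F# prototype: the class and method names are invented, and falling back from a function's table to its parent table is an assumption about how the child tables were meant to behave.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Symbol:
    index: int
    name: Optional[str] = None  # None models a stripped symbol such as @0


class SymbolTable:
    def __init__(self, parent: Optional["SymbolTable"] = None) -> None:
        self.parent = parent  # assumption: a function table chains to the module table
        self._by_name: Dict[str, Symbol] = {}
        self._count = 0

    def define(self, name: Optional[str] = None) -> Symbol:
        """Allocate the next index, optionally attaching a name to it."""
        if name is not None and name in self._by_name:
            raise ValueError(f"duplicate symbol: {name}")
        sym = Symbol(self._count, name)
        self._count += 1
        if name is not None:
            self._by_name[name] = sym
        return sym

    def lookup(self, text: str) -> Symbol:
        """Resolve '@name' through the scope chain, or '@0' directly."""
        if not text.startswith("@"):
            raise ValueError(f"not a symbol: {text}")
        body = text[1:]
        if body.isascii() and body.isdigit():
            return Symbol(int(body))  # numeric symbol: just an integer
        table = self
        while table is not None:
            if body in table._by_name:
                return table._by_name[body]
            table = table.parent
        raise KeyError(body)


module_scope = SymbolTable()
module_scope.define("add")
func_scope = SymbolTable(parent=module_scope)  # child table for one function
func_scope.define("x")
func_scope.define("y")
print(func_scope.lookup("@y"))  # prints Symbol(index=1, name='y')
```

Named and numeric symbols resolve to the same index, which is what keeps the `@symbolName <-> @0` mapping consistent when names are stripped.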

See https://github.com/WebAssembly/semantics-prototype/blob/master/sexpr/sexpr.fs , https://github.com/WebAssembly/semantics-prototype/blob/master/text-decoder/sexpr-to-ast.fs and https://github.com/WebAssembly/semantics-prototype/blob/master/text-decoder/symbols.fs .

I definitely don't think we should have laziness in the wasm AST. I think it should have symbolic references (distinct from just bare ints), and potentially have names attached to them. Tooling will absolutely require names, but it's possible we want to leave that concept out of the formal spec and build it into a separate reference toolchain that we encourage people to use.

@kg
Contributor

kg commented Aug 19, 2015

Ah, sorry. And for the record, here's what a toy module definition looked like in my grammar, using named symbols.

https://github.com/WebAssembly/semantics-prototype/blob/master/test.fsx#L36

In that context you can replace all the named symbols with integers and it works the same.

(You'll also note that my grammar had the concept of keywords, for things like the type int32. I think we need this in cases where we don't have the ability to pack the type into the opcode name.)

@rossberg
Member Author

On 19 August 2015 at 16:19, Katelyn Gadd wrote:

> My prototype had a separate SExpr AST and grammar, then a transform to map
> SExpr to the wasm AST. Symbol parsing occurred in the SExpr parser, and
> mapping symbol names to integer IDs happened in the SExpr -> wasm AST
> stage. That allows us to avoid having any laziness in the AST (I think we
> definitely don't want anything fancy in there).
>
> The general model was that during sexpr -> wasm parsing, it would
> encounter symbols (i.e. @symbolName or @0). A raw integer symbol
> basically represents a stripped symbol - anywhere we're referring to a
> semantic identifier like a function or a local by index, that's a symbol.
> Some general rules would govern which (if any) symbol names are
> round-tripped through the binary format, and code generators could be free
> to generate sexprs only containing numeric symbols instead of names.
>
> When converting a SExpr module to a wasm module, a symbol table is
> generated so that the @symbolName <-> @0 mapping is consistent. I defined
> a simple rule such that each function also had a child symbol table, such
> that the list of a function's formal parameters and locals could also
> define local symbol names. I'm not sure whether that's a good idea in the
> long run, though. This was pretty simple to implement and meant that at the
> wasm ast level you just had Symbols as arguments to operations like
> setlocal and invoke, where the symbol is either a name+integer pair or just
> an integer.

Oh, I agree that symbol conversion is straightforward (which is why I was
even able to fold it into the parser :) ). FWIW, the current change allows
a mix of symbolic and raw indexes just like you describe.

My concern was about Luke's broader suggestion of doing generic
S-expressions first. Recognising and decomposing mnemonics isn't entirely
trivial, and it's exactly the kind of thing that Lex is designed for.
Hacking that manually is inferior in almost every way, so I don't see the
benefit.

But I'm fine with having an intermediate AST with symbolic names.

> See https://github.com/WebAssembly/semantics-prototype/blob/master/sexpr/sexpr.fs , https://github.com/WebAssembly/semantics-prototype/blob/master/text-decoder/sexpr-to-ast.fs and https://github.com/WebAssembly/semantics-prototype/blob/master/text-decoder/symbols.fs .
>
> I definitely don't think we should have laziness in the wasm AST. I think
> it should have symbolic references (distinct from just bare ints), and
> potentially have names attached to them. Tooling will absolutely require
> names, but it's possible we want to leave that concept out of the formal
> spec and build it into a separate reference toolchain that we encourage
> people to use.

Yes, I don't think the actual AST should be polluted with symbolic
references. The way I see it, they are a matter of external representation
and tooling, not a proper part of the low-level language we are defining.
That one just knows indexes.

@lukewagner
Member

@rossberg-chromium Oops, I didn't mean to imply a fully generic/unstructured SExpr AST; it definitely makes sense to leverage lex/yacc and capture the results in the intermediate AST. The "magic" I was referring to was module_fields but this was more owing to my lack of familiarity and on second glance it looks fine so n/m, sorry.

Actually, it seems like you could use Ast.modul as the intermediate with var = int and no laziness. The first (parsing) pass would give functions nonconsecutive integers based on order defined/used and the second pass would fix up all the integers so that functions had consecutive indices. Seems pretty easy to do with a mutable map or two maintained while parsing. The only downside is the boilerplate of traversing the whole AST in the second pass just to find the few relevant nodes (does OCaml have generic traversals by any chance? :), but I think it's worth it.
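
One way to read this suggestion, sketched in Python under assumptions: the `func NAME` / `call NAME` line format and the function names `parse` and `fix_up` are invented for illustration. While parsing, every name gets a provisional integer in the order it is first seen, whether defined or used; a second pass renumbers so that functions get consecutive indices in definition order.

```python
from typing import Dict, List, Tuple


def parse(source: List[str]) -> Tuple[List[int], List[int]]:
    """First pass: hand out provisional ids in order of first appearance."""
    provisional: Dict[str, int] = {}  # mutable map maintained while parsing
    definitions: List[int] = []       # provisional ids, in definition order
    calls: List[int] = []             # provisional ids of call targets

    def ident(name: str) -> int:
        return provisional.setdefault(name, len(provisional))

    for line in source:
        keyword, name = line.split()
        if keyword == "func":
            definitions.append(ident(name))
        elif keyword == "call":
            calls.append(ident(name))
        else:
            raise ValueError(f"unknown keyword: {keyword}")
    return definitions, calls


def fix_up(definitions: List[int], calls: List[int]) -> List[int]:
    """Second pass: renumber so the n-th defined function has index n."""
    final = {old: new for new, old in enumerate(definitions)}
    # A KeyError here means a function was called but never defined.
    return [final[c] for c in calls]


definitions, calls = parse(["call helper", "func main", "call main", "func helper"])
print(definitions, calls)          # prints [1, 0] [0, 1]
print(fix_up(definitions, calls))  # prints [1, 0]
```

In this toy the "AST" is a flat list, so the traversal is trivial; with a real tree the second pass is where the boilerplate mentioned above would show up.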

@kg
Contributor

kg commented Aug 19, 2015

FTR since I didn't clearly express this, after looking over this commit it looks like symbols are just bare identifiers, like opcode names and types? I very strongly believe they should have some sort of syntactic disambiguation, like @symbolName or :symbolName or $symbolName. Thoughts?

@rossberg
Member Author

@lukewagner, ah, ok, seems I misread what you said. I will look into making name resolution a separate pass. No generic traversal in OCaml, but it shouldn't be too bad in this case, the AST is pretty small.

@kg, I'd be fine with making that a requirement, if others prefer so as well. But what ambiguity do you worry about specifically? With the current change, almost any sequence of characters that can't be mistaken for a number is allowed (Lisp-style), so any of the above could be used by convention.

@kg
Contributor

kg commented Aug 19, 2015

Essentially, I think our sexpr grammar should never require context to figure out what a particular set of characters is. This is partially a readability consideration but it's also a parsing one.

For example, stripping symbol names is a thing that will happen in some capacity. In that case, I really, really don't want it to go from getlocal foo to getlocal 0. getlocal :foo -> getlocal :0 is more immediately clear, I think. This matters more in cases where a single sexpr may contain many symbols, like a function definition where we're potentially putting a name, list of argument types, list of argument names, list of local types, etc all next to each other in some particular arrangement. Line breaks might get introduced there.

Your current parse rules are pretty unambiguous, so I don't see a problem there. An alternate solution to my concern is just to ensure that numeric symbols are disambiguated in some way from raw integer literals.
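
The context-free property argued for above can be sketched as a classifier that looks only at the atom itself. This Python sketch is hypothetical: the `:` sigil is taken from the `getlocal :foo` example in this comment, `classify` and `strip_names` are invented names, and the "anything else is a keyword" rule is a simplification.

```python
import re
from typing import Dict, List

SIGIL = ":"  # assumption: ':' as in the example above; '@' or '$' would work the same


def classify(atom: str) -> str:
    """Decide what an atom is from its own characters, with no context."""
    if atom.startswith(SIGIL):
        return "symbol"
    if re.fullmatch(r"[+-]?[0-9]+", atom):
        return "integer"
    return "keyword"


def strip_names(tokens: List[str], indices: Dict[str, int]) -> List[str]:
    """Replace named symbols by numeric ones, leaving everything else alone."""
    out = []
    for tok in tokens:
        if classify(tok) == "symbol":
            name = tok[len(SIGIL):]
            if not re.fullmatch(r"[0-9]+", name):
                tok = SIGIL + str(indices[name])
        out.append(tok)
    return out


print(strip_names(["getlocal", ":foo"], {"foo": 0}))  # prints ['getlocal', ':0']
print(classify(":0"), classify("0"))                  # prints symbol integer
```

After stripping, `:0` is still recognisably a symbol and cannot be confused with the integer literal `0`, which is the readability point being made.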

@rossberg
Member Author

@kg, as long as we avoid tagless S-expr nodes, this is a fairly simple problem, because tags and leaves can always be distinguished trivially. Then you only need to disambiguate different leaf types, which is what you are getting at in the last paragraph, I suppose?

@rossberg
Member Author

Okay, I figured out a way to avoid laziness -- and it's even simpler than before. A tiny extra suspension in the right place when parsing functions is all you need. No extra pass. :)
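
The comment does not say where exactly the suspension sits, so the following Python sketch is only one guess at the general technique, not the code in this PR. The assumption is that parsing a function binds its name immediately but returns its body as a closure over a naming context; the module-level step applies those closures once every function name is known. All names here (`parse_func`, `parse_module`, `Context`) are made up.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, int]                    # function name -> index
Suspended = Callable[[Context], List[int]]  # body waiting for a context


def parse_func(lines: List[str]) -> Tuple[str, Suspended]:
    """Parse one function: read its name now, suspend resolving its body."""
    name = lines[0].split()[1]
    callees = [line.split()[1] for line in lines[1:]]

    def body(ctx: Context) -> List[int]:
        # Runs only after the whole module has been read.
        return [ctx[c] for c in callees]

    return name, body


def parse_module(funcs: List[List[str]]) -> List[List[int]]:
    ctx: Context = {}
    suspended: List[Suspended] = []
    for lines in funcs:
        name, body = parse_func(lines)
        if name in ctx:
            raise ValueError(f"duplicate function name: {name}")
        ctx[name] = len(ctx)
        suspended.append(body)
    # All names are bound now, so forward references resolve directly.
    return [body(ctx) for body in suspended]


module = parse_module([
    ["func main", "call helper"],
    ["func helper", "call main", "call helper"],
])
print(module)  # prints [[1], [0, 1]]
```

Compared with lazy variables, the delay lives entirely inside the parser: the result contains plain integers, and there is no separate traversal over a finished tree.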

@lukewagner
Member

Hah, nicely done sir! After rebasing over the conflict with named exports, we'll have names on both sides of the export. Since we don't exactly have a daunting test suite to update, do you think we could just remove the unnamed alternatives for func/local declaration and update the tests to use names? Either way, lgtm.

@rossberg
Member Author

@kg, enforcing symbolic names starting with $ now.

@lukewagner, changed other tests to use symbolic names.

@kg
Contributor

kg commented Aug 20, 2015

lgtm

@lukewagner
Member

@rossberg-chromium That's great; what do you think about removing support for the nameless declarations?

@rossberg
Member Author

@lukewagner, hm, why? Isn't it useful to still be able to express the "raw" AST format, too? S-expr generators might also benefit.

rossberg added a commit that referenced this pull request Aug 21, 2015
@rossberg rossberg merged commit 93441b0 into master Aug 21, 2015
@rossberg rossberg deleted the named-variables branch August 21, 2015 05:37
@lukewagner
Member

@rossberg-chromium I don't see any real expressiveness benefit in allowing unnamed declarations, but it also doesn't really matter, so I'll drop it.
