-
Notifications
You must be signed in to change notification settings - Fork 478
Support symbolic variables #12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I did a lot of thinking/experimenting on this and discussed it with @jfbastien in the past. It might be easier to discuss it over a call and then document it here after-the-fact. I'll try to write up my thoughts and conclusions at length though, once I've had time to understand your changes here. |
I'm totally in favor of having names in the S-Expr language; we'll be writing a lot of these, so names and readability are a win as long as we keep it out of the AST/semantics. I'm a little sad to see the laziness enter ast.ml/eval.ml, since those are supposed to be our little temples of speciness and this is an impl detail leaking out from SExpr parsing limitations. At the cost of making SExpr parsing more verbose (but simpler, I think), what if we introduced a new SExpr ADT (which stored all names as simple strings) which was generated by parser.mly and then converted to an Ast.modul in a second pass (which could then easily do the name resolution)? The SExpr ADT wouldn't have to duplicate all of ast.ml; it could use generic "unary" and "binary" nodes for most expressions and only have meaningful name-related nodes. Another potential benefit is that, right now parser.mly is rather tightly coupled with Ast.modul and has magic ocamlyacc syntax and I expect this would get much simpler with the new intermediate. |
Yes, I agree the lazy leaking into the AST is sad. It was the one smallish price to pay for avoiding an extra representation and pass. But I'm happy to discuss other options. FWIW, though, I had actually started out the prototype just like you suggest: having a simple parser that just creates a generic S-expr tree and then transforms that in a separate pass. The problem was that the transformation then had to manually lex all the mnemonics to tokenise and decompose them. Trust me, that was -way- more complicated and ugly than using mllex's declarative syntax, which is why I abandoned it. So if you want to avoid that, you'd still want the intermediate ADT to already have all the opcodes resolved. Thus, it would pretty much duplicate the AST. Not that I wouldn't be okay with that, just pointing out. Can you elaborate what magic ocamlyacc syntax you are referring to? At least this change doesn't add anything magic, AFAICT. |
My prototype had a separate SExpr AST and grammar, then a transform to map SExpr to the wasm AST. Symbol parsing occurred in the SExpr parser, and mapping symbol names to integer IDs happened in the SExpr -> wasm AST stage. That allows us to avoid having any laziness in the AST (I think we definitely don't want anything fancy in there). The general model was that during sexpr -> wasm parsing, it would encounter symbols (i.e. When converting a SExpr module to a wasm module, a symbol table is generated so that the See https://github.com/WebAssembly/semantics-prototype/blob/master/sexpr/sexpr.fs , https://github.com/WebAssembly/semantics-prototype/blob/master/text-decoder/sexpr-to-ast.fs and https://github.com/WebAssembly/semantics-prototype/blob/master/text-decoder/symbols.fs . I definitely don't think we should have laziness in the wasm AST. I think it should have symbolic references (distinct from just bare ints), and potentially have names attached to them. Tooling will absolutely require names, but it's possible we want to leave that concept out of the formal spec and build it into a separate reference toolchain that we encourage people to use. |
Ah, sorry. And for the record, here's what a toy module definition looked like in my grammar, using named symbols. https://github.com/WebAssembly/semantics-prototype/blob/master/test.fsx#L36 In that context you can replace all the named symbols with integers and it works the same. (You'll also note that my grammar had the concept of keywords, for things like the type int32. I think we need this in cases where we don't have the ability to pack the type into the opcode name.) |
On 19 August 2015 at 16:19, Katelyn Gadd [email protected] wrote:
Oh, I agree that symbol conversion is straightforward (which is why I was My concern was about Luke's broader suggestion of doing generic But I'm fine with having an intermediate AST with symbolic names. See
Yes, I don't think the actual AST should be polluted with symbolic |
@rossberg-chromium Oops, I didn't mean to imply a fully generic/unstructured SExpr AST; it definitely makes sense to leverage lex/yacc and capture the results in the intermediate AST. The "magic" I was referring to was module_fields but this was more owing to my lack of familiarity and on second glance it looks fine so n/m, sorry. Actually, it seems like you could use Ast.modul as the intermediate with |
FTR since I didn't clearly express this, after looking over this commit it looks like symbols are just bare identifiers, like opcode names and types? I very strongly believe they should have some sort of syntactic disambiguation, like |
@lukewagner, ah, ok, seems I misread what you said. I will look into making name resolution a separate pass. No geenric traversal in Ocaml, but it shouldn't be too bad in this case, the AST is pretty small. @kg, I'd be fine with making that a requirement, if others prefer so as well. But what ambiguity do you worry about specifically? With the current change, almost any sequence of characters that can't be mistaken for a number is allowed (Lisp-style), so any of the above could be used by convention. |
Essentially, I think our sexpr grammar should never require context to figure out what a particular set of characters is. This is partially a readability consideration but it's also a parsing one. For example, stripping symbol names is a thing that will happen in some capacity. In that case, I really, really don't want it to go from Your current parse rules are pretty unambiguous, so I don't see a problem there. An alternate solution to my concern is just to ensure that numeric symbols are disambiguated in some way from raw integer literals. |
@kg, as long as we avoid tagless S-expr nodes, this is a fairly simple problem, because tags and leafs can always be distinguished trivially. Then you only need to disambiguate different leaf types, which is what you are getting at in the last paragraph, I suppose? |
Okay, I figured out a way to avoid laziness -- and it's even simpler than before. A tiny extra suspension in the right place when parsing functions is all you need. No extra pass. :) |
Hah, nicely done sir! After rebasing over the conflict with named exports, then we'll have names on both sides of the export. Since we don't exactly have a daunting text suite to update, do you think we could just remove the unnamed alternatives for func/local declaration and update the tests to use names? Either way, lgtm. |
…into named-variables Conflicts: ml-proto/README.md ml-proto/src/eval.ml ml-proto/src/parser.mly ml-proto/test/expected-output/fac.wasm.log ml-proto/test/fac.wasm
…ositions are extracted eagerly
@kg, enforcing symbolic names starting with $ now. @lukewagner, changed other tests to use symbolic names. |
lgtm |
@rossberg-chromium That's great; what do you think about removing support for the nameless declarations? |
@lukewagner, hm, why? Isn't it useful to still be able to express the "raw" AST format, too? S-expr generators might also benefit. |
@rossberg-chromium I don't see any real expressiveness benefits by allowing unnamed but it also doesn't really matter so I'll drop it. |
* [spec] Initial documentation of syntax Includes the basic structure needed for the proposal, but no validation, binary, text or execution.
Fixes WebAssembly#12. The effect of these dynamic constructors can be achieved with an S.const instruction followed by a sequence of S.replaceLane instructions. This is also what LLVM does.
Handle issues 7, 9, and 10.
Addresses WebAssembly#12.
This patch pulls in the recent changes to WebAssembly/spec, WebAssembly/function-references, and WebAssembly/gc.
Merge upstream
* Fix binary grammar definition of the branch hints custom section The overall section structure definition wasm missing.
@lukewagner @kg , this makes the parser a bit more ugly, but I suppose it's useful for hand-written code. I tried to keep all of it out of the AST. The only compromise is that vars are now lazy integers (to deal with forward references w/o an extra pass). WDYT?