Skip to content

Binary encoder & decoder #268

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Apr 20, 2016
Merged
50 changes: 39 additions & 11 deletions ml-proto/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,10 +5,13 @@ This repository implements a prototypical reference interpreter for WebAssembly.
Currently, it can

* *parse* a simple S-expression format,
* *decode* the binary format (work in progress),
* *validate* modules defined in it,
* *execute* invocations to functions exported by a module.
* *execute* invocations to functions exported by a module,
* *encode* the binary format,
* *prettyprint* the S-expression format (work in progress).

The file format is a (very dumb) form of *script* that cannot just define a module, but also batch a sequence of invocations.
The S-expression format is a (very dumb) form of *script* that cannot just define a module, but in fact a sequence of them, and a batch of invocations, assertions, and conversions to each one. As such it is different from the binary format, with the additional functionality purely intended as testing infrastructure. (See [below](#scripts) for details.)

The interpreter can also be run as a REPL, allowing to enter pieces of scripts interactively.

Expand Down Expand Up @@ -61,17 +64,34 @@ Either way, in order to run the test suite you'll need to have Python installed.
You can call the executable with

```
wasm [option] [file ...]
wasm [option | file ...]
```

where `file` is a script file (see below) to be run. If no file is given, you'll get into the REPL and can enter script commands interactively. You can also get into the REPL by explicitly passing `-` as a file name. You can do that in combination to giving a module file, so that you can then invoke its exports interactively, e.g.:
where `file`, depending on its extension, either should be an S-expression script file (see below) to be run, or a binary module file to be loaded.

A file prefixed by `-o` is taken to be an output file. Depending on its extension, this will write out the preceding module definition in either S-expression or binary format. This option can be used to convert between the two in both directions, e.g.:

```
./wasm module.wast -
wasm -d module.wast -o module.wasm
wasm -d module.wasm -o module.wast
```
Note however that the REPL currently is too dumb to allow multi-line input. :)

See `wasm -h` for (the few) options.
The `-d` option selects "dry mode" and ensures that the input module is not run, even if it has a start section.
In the second case, the produced script contains exactly one module definition (work in progress).

Finally, the option `-e` allows to provide arbitrary script commands directly on the command line. For example:

```
wasm module.wasm -e '(invoke "foo")'
```

If neither a file nor any of the previous options is given, you'll land in the REPL and can enter script commands interactively. You can also get into the REPL by explicitly passing `-` as a file name. You can do that in combination to giving a module file, so that you can then invoke its exports interactively, e.g.:

```
wasm module.wast -
```

See `wasm -h` for (the few) additional options.


## S-Expression Syntax
Expand Down Expand Up @@ -168,9 +188,13 @@ cmd:
( assert_return_nan (invoke <name> <expr>* )) ;; assert return with floating point nan result of invocation
( assert_trap (invoke <name> <expr>* ) <failure> ) ;; assert invocation traps with given failure string
( assert_invalid <module> <failure> ) ;; assert invalid module with given failure string
( input <string> ) ;; read script or module from file
( output <string> ) ;; output module to file
```

Invocation is only possible after a module has been defined.
Commands are executed in sequence. Invocation, assertions, and output apply to the most recently defined module (the _current_ module), and are only possible after a module has been defined. Note that there only ever is one current module, the different module definitions cannot interact.

The input and output commands determine the requested file format from the file name extension. They can handle both `.wast` and `.wasm` files. In the case of input, a `.wast` script will be recursively executed.

Again, this is only a meta-level for testing, and not a part of the language proper.

Expand Down Expand Up @@ -202,11 +226,15 @@ The implementation consists of the following parts:

* *Parser* (`lexer.mll`, `parser.mly`, `desguar.ml[i]`). Generated with ocamllex and ocamlyacc. The lexer does the opcode encoding (non-trivial tokens carry e.g. type information as semantic values, as declared in `parser.mly`), the parser the actual S-expression parsing. The parser generates a full AST that is desugared into the kernel AST in a separate pass.

* *Pretty Printer* (`prettyprint.ml[i]`). Turns a module AST back into the textual S-expression format. (Work in progress)

* *Decoder*/*Encoder* (`decode.ml[i]`, `encode.ml[i]`). The former (work in progress) parses the binary format and turns it into an AST, the latter does the inverse.

* *Validator* (`check.ml[i]`). Does a recursive walk of the kernel AST, passing down the *expected* type for expressions, and checking each expression against that. An expected empty type can be matched by any result, corresponding to implicit dropping of unused values (e.g. in a block).

* *Evaluator* (`eval.ml[i]`, `values.ml`, `arithmetic.ml[i]`, `int.ml`, `float.ml`, `memory.ml[i]`, and a few more). Evaluation of control transfer (`br` and `return`) is implemented using local exceptions as "labels". While these are allocated dynamically in the code and addressed via a stack, that is merely to simplify the code. In reality, these would be static jumps.

* *Driver* (`main.ml`, `script.ml[i]`, `error.ml`, `print.ml[i]`, `flags.ml`). Executes scripts, reports results or errors, etc.
* *Driver* (`main.ml`, `run.ml[i]`, `script.ml[i]`, `error.ml`, `print.ml[i]`, `flags.ml`). Executes scripts, reports results or errors, etc.

The most relevant pieces are probably the validator (`check.ml`) and the evaluator (`eval.ml`). They are written to look as much like a "specification" as possible. Hopefully, the code is fairly self-explanatory, at least for those with a passing familiarity with functional programming.

Expand All @@ -215,6 +243,6 @@ In typical FP convention (and for better readability), the code tends to use sin

## What Next?

* Binary format as input and output.
* More tests.

* Compilation to JS/asm.js.
* Compilation to JS/asm.js?
3 changes: 3 additions & 0 deletions ml-proto/given/lib.ml
Original file line number Diff line number Diff line change
@@ -1,5 +1,8 @@
module List =
struct
let rec make n x =
if n = 0 then [] else x :: make (n - 1) x

let rec take n xs =
match n, xs with
| 0, _ -> []
Expand Down
1 change: 1 addition & 0 deletions ml-proto/given/lib.mli
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

module List :
sig
val make : int -> 'a -> 'a list
val take : int -> 'a list -> 'a list
val drop : int -> 'a list -> 'a list

Expand Down
11 changes: 8 additions & 3 deletions ml-proto/given/source.ml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
type pos = {file : string; line : int; column : int}
type region = {left : pos; right : pos}
type 'a phrase = { at : region; it : 'a}
type 'a phrase = {at : region; it : 'a}


(* Positions and regions *)
Expand All @@ -9,9 +9,14 @@ let no_pos = {file = ""; line = 0; column = 0}
let no_region = {left = no_pos; right = no_pos}

let string_of_pos pos =
string_of_int pos.line ^ "." ^ string_of_int (pos.column + 1)
if pos.line = -1 then
string_of_int pos.column
else
string_of_int pos.line ^ "." ^ string_of_int (pos.column + 1)

let string_of_region r =
r.left.file ^ ":" ^ string_of_pos r.left ^ "-" ^ string_of_pos r.right
r.left.file ^ ":" ^ string_of_pos r.left ^
(if r.right = r.left then "" else "-" ^ string_of_pos r.right)

let before region = {left = region.left; right = region.left}
let after region = {left = region.right; right = region.right}
Expand Down
Loading