truffle/docs/bytecode_dsl/BytecodeDSL.md
42 additions & 29 deletions
@@ -13,11 +13,11 @@ Though Truffle AST interpreters enjoy excellent peak performance, they can strug
 - *Memory footprint*. Trees are not compact data structures. A root node's entire AST, with all of its state (e.g., `@Cached` parameters) must be allocated before it can execute. This allocation is especially detrimental for code that is only executed a handful of times (e.g., bootstrap code).
 - *Interpreted performance*. AST interpreters contain many highly polymorphic `execute` call sites that are difficult for the JVM to optimize. These sites pose no problem for runtime-compiled code (where partial evaluation can eliminate the polymorphism), but cold code that runs in the interpreter suffers from poor performance.

-Bytecode interpreters enjoy the same peak performance as ASTs, but they can also be encoded with less memory and are more amenable to optimization (e.g., via [host compilation](HostCompilation.md)). Unfortunately, these benefits come at a cost: bytecode interpreters are more difficult and tedious to implement properly. Bytecode DSL simplifies the implementation effort for bytecode interpreters by generating them automatically from AST node-like specifications called "operations".
+Bytecode interpreters enjoy the same peak performance as ASTs, but they can also be encoded with less memory and are more amenable to optimization (e.g., via [host compilation](../HostCompilation.md)). Unfortunately, these benefits come at a cost: bytecode interpreters are more difficult and tedious to implement properly. Bytecode DSL reduces the implementation effort by generating a bytecode interpreter automatically from a set of AST node-like specifications called "operations".

 ## Operations

-An operation in Bytecode DSL is an atomic unit of language semantics. Each operation can be executed, performing some computation and optionally returning a value. Operations can be nested together to form a program. As an example, the following pseudocode
+An operation in Bytecode DSL is a basic unit of language semantics. Each operation can be executed, performing some computation and optionally returning a value. Operations can be nested together to form a program. As an example, the following pseudocode
 ```python
 if 1 == 2:
     print("what")
@@ -39,9 +39,9 @@ Each of these operations has its own execution semantics. For example, the `IfTh

 The operations in Bytecode DSL are divided into two groups: built-in and custom.

-- Built-in operations come with the DSL itself, and their semantics cannot be changed. They model behaviour that is common across languages, such as control flow (`IfThen`, `While`, etc.), constant accesses (`LoadConstant`) and local variable manipulation (`LoadLocal`, `StoreLocal`). We describe the precise semantics of the built-in operations later in [Built-in Operations](#built-in-operations).
+- Built-in operations come with the DSL itself, and their semantics cannot be changed. They model behaviour that is common across languages, such as control flow (`IfThen`, `While`, etc.), constant accesses (`LoadConstant`) and local variable manipulation (`LoadLocal`, `StoreLocal`). We describe the precise semantics of the built-in operations later in [Built-in Operations](UserGuide.md#built-in-operations).

-- Custom operations are provided by the language. They model language-specific behaviour, such as the semantics of operators, value conversions, calls, etc. In our previous example, `Equals`, `CallFunction` and `LoadGlobal` are custom operations. There are two kinds of custom operations: regular (eager) operations and short-circuiting operations.
+- Custom operations are provided by the language. They model language-specific behaviour, such as the semantics of operators, value conversions, calls, etc. In our previous example, `Equals`, `CallFunction` and `LoadGlobal` are custom operations. We can define two kinds of custom operations: [regular (eager) operations](UserGuide.md#defining-custom-operations) and [short-circuiting operations](UserGuide.md#defining-short-circuiting-custom-operations).

 ## Simple example

@@ -60,10 +60,10 @@ As an example, let us implement a Bytecode DSL interpreter for a simple language

 ### Defining the Bytecode class

-The entry-point to a Bytecode DSL interpreter is the `@GenerateBytecode` annotation. This annotation must be attached to a class that `extends RootNode` and `implements BytecodeRootNode`:
+The entry-point to a Bytecode DSL interpreter is the `@GenerateBytecode` annotation. It should annotate a class that `extends RootNode` and `implements BytecodeRootNode`:
@@ -72,10 +72,10 @@ public abstract class ExampleBytecodeRootNode extends RootNode implements Byteco
 ``````
 The class must have a two-argument constructor that takes a `TruffleLanguage<?>` and a `FrameDescriptor` (or `FrameDescriptor.Builder`). This constructor is used by the generated code to instantiate root nodes, so any other instance fields must be initialized separately.

-Inside the bytecode class we define custom operations. Each operation is structured similarly to a Truffle DSL node, except it does not need to be a subclass of `Node` and all of its specializations should be `static`. In our example language, the `+` operator can be expressed with its own operation:
+Inside the bytecode class we define [custom operations](UserGuide.md#defining-custom-operations). Each operation is structured similarly to a Truffle DSL node, except it does not need to be a subclass of `Node` and all of its specializations should be `static` (see the `Operation` Javadoc for full details). In our example language, the `+` operator can be expressed with its own operation:

 ```java
-// place inside ExampleBytecodeRootNode
+// define inside ExampleBytecodeRootNode
 @Operation
 public static final class Add {
     @Specialization
@@ -94,42 +94,47 @@ public static final class Add {

 Within operations, we can use most of the Truffle DSL, including `@Cached` and `@Bind` parameters, guards, and specialization limits. We cannot use features that require node instances, such as `@NodeChild`, `@NodeField`, nor any instance fields or methods.

-One limitation of custom operations is that they eagerly evaluate all of their operands. They cannot perform conditional execution, loops, etc. For those use-cases, we have to use the built-in operations or define custom short-circuiting operations.
+One limitation of custom operations is that they eagerly evaluate all of their operands. They cannot perform conditional execution, loops, etc. For those use-cases, we have to use the [built-in operations](UserGuide.md#control-flow-operations) or define [custom short-circuiting operations](UserGuide.md#defining-short-circuiting-custom-operations).

-From this simple description, the DSL will generate a `ExampleBytecodeRootNodeGen` class that contains a full bytecode interpreter definition.
+The bytecode class and its operations define a specification for the interpreter.
+From this specification, the DSL generates an entire bytecode interpreter definition inside the `ExampleBytecodeRootNodeGen` class.

 ### Converting a program to bytecode

 In order to execute a guest program, we need to convert it to the bytecode defined by the generated interpreter.
 We refer to this process as "parsing" the bytecode root node.
 <!-- We refer to the process of converting a guest program to bytecode (and thereby creating a `BytecodeRootNode`) as parsing. -->

-To parse a program to a bytecode root node, we encode the program in terms of operations.
-We invoke methods on the generated `Builder` class to construct these operations; the builder translates these method calls to a sequence of bytecodes that can be executed by the generated interpreter.
-
-
-For this example, let's assume the guest program has already been parsed to an AST as follows:
+For this example, let's assume the guest program can be parsed to an AST with the following node kinds:
 Let's also assume there is a simple visitor pattern implemented over the AST.

-The expression `1 + 2` can be expressed as operations `(Add (LoadConstant 1) (LoadConstant 2))`. It can be parsed using the following sequence of builder calls:
+To parse a program to a bytecode root node, we use the generated `Builder` class to encode the program as a "tree" of operations.
+For each operation `X`, the builder defines `beginX` and `endX` methods that can be used to encode the operation.
+Simple operations that have no data dependency (i.e., no children) instead have `emitX` methods.
+The builder translates calls to these methods to a flat sequence of bytecodes that can be executed by the generated interpreter.
+
+As an example, the expression `1 + (2 + 3)` can be expressed as operations `(Add (LoadConstant 1) (Add (LoadConstant 2) (LoadConstant 3)))`. It can be parsed using the following sequence of builder calls:

 ```java
 b.beginAdd();
-b.emitLoadConstant(1);
-b.emitLoadConstant(2);
+b.emitLoadConstant(1);
+b.beginAdd();
+b.emitLoadConstant(2);
+b.emitLoadConstant(3);
+b.endAdd();
 b.endAdd();
 ```

-You can think of the `beginX` and `endX` as opening and closing `<X>` and `</X>` XML tags, while `emitX` is the empty tag `<X/>` used when the operation does not take children. Each operation has either `beginX` and `endX` methods or an `emitX` method.
+This sequence of calls automatically produces bytecode to perform the computation.
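
To make the `beginX`/`endX`/`emitX` protocol more concrete, the following is a minimal, self-contained sketch of how a builder could flatten nested calls into a linear instruction sequence. It is a toy model and not the code that Bytecode DSL generates: the `ToyBuilder` class, its opcode constants, and the `execute` loop are hypothetical names invented for this illustration, and we assume a simple stack machine with only two instructions.

```java
// Toy model only: NOT the generated Builder, just a sketch of the idea.
final class ToyBuilder {
    static final int LOAD_CONSTANT = 0; // followed by one operand: the constant
    static final int ADD = 1;           // pops two values, pushes their sum

    private int[] code = new int[8];
    private int length = 0;

    private void append(int value) {
        if (length == code.length) {
            code = java.util.Arrays.copyOf(code, length * 2);
        }
        code[length++] = value;
    }

    // An operation without children is emitted with a single call.
    void emitLoadConstant(int value) {
        append(LOAD_CONSTANT);
        append(value);
    }

    // Add has children, so it is encoded with a begin/end pair.
    void beginAdd() {
        // Nothing to append yet: the children append their instructions first.
    }

    void endAdd() {
        // Both children have been encoded, so their results are on the stack.
        append(ADD);
    }

    int[] toBytecode() {
        return java.util.Arrays.copyOf(code, length);
    }

    // A minimal stack interpreter for the toy bytecode.
    static int execute(int[] bytecode) {
        int[] stack = new int[bytecode.length];
        int sp = 0;
        int bci = 0;
        while (bci < bytecode.length) {
            switch (bytecode[bci]) {
                case LOAD_CONSTANT: {
                    stack[sp++] = bytecode[bci + 1];
                    bci += 2;
                    break;
                }
                case ADD: {
                    int right = stack[--sp];
                    int left = stack[--sp];
                    stack[sp++] = left + right;
                    bci += 1;
                    break;
                }
                default:
                    throw new IllegalStateException("unknown opcode " + bytecode[bci]);
            }
        }
        return stack[sp - 1];
    }

    // Encodes 1 + (2 + 3) using the same call sequence as the example above.
    static int[] buildExample() {
        ToyBuilder b = new ToyBuilder();
        b.beginAdd();
        b.emitLoadConstant(1);
        b.beginAdd();
        b.emitLoadConstant(2);
        b.emitLoadConstant(3);
        b.endAdd();
        b.endAdd();
        return b.toBytecode();
    }

    public static void main(String[] args) {
        int[] bytecode = buildExample();
        System.out.println(java.util.Arrays.toString(bytecode) + " evaluates to " + execute(bytecode));
    }
}
```

In this sketch the nested calls come out in postfix order (both operands first, then `ADD`), which is one simple way a tree of operations can become a flat sequence. The actual encoding chosen by the generated builder may differ.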

-We can then write a visitor to construct bytecode from the AST representation:
+Observe that the sequence of builder calls is essentially a traversal of the operations "tree".
+A simple way to encode this traversal is with a visitor over the AST:
@@ -156,14 +161,14 @@ class ExampleBytecodeVisitor implements ExprVisitor {
 }
 ```

-Now that we have a visitor, we can define a `parse` method. This method converts an AST to a `ExampleBytecodeRootNode`, which can then be executed by the language runtime:
+Now that we have a visitor, we can define a top-level `parse` method. This method converts an AST to an `ExampleBytecodeRootNode`, which can then be executed by the language runtime:
-// Root operation must enclose each function. It is further explained later.
+// Root operation must enclose each function. See the User Guide for details.
 builder.beginRoot(language);

 // This root node returns the result of executing the expression,
@@ -184,13 +189,21 @@ public static ExampleBytecodeRootNode parseExample(ExampleLanguage language, Exp
 }
 ```

-We first invoke the `ExampleBytecodeRootNodeGen#create` function, which is the entry-point for parsing. Its first argument is a `BytecodeConfig`, which defines a parsing mode. `BytecodeConfig.DEFAULT` will suffice for our purposes (there are other modes that include source positions and/or instrumentation info; see [Reparsing](#reparsing)).
+We first invoke the `ExampleBytecodeRootNodeGen#create` function, which is the entry-point for parsing.
+Its first argument is a `BytecodeConfig`, which defines a [parsing mode](UserGuide.md#parsing-modes).
+The default mode is sufficient for most use cases.
+
+The second argument is the parser. The parser implements the `BytecodeParser` functional interface, which uses a supplied `Builder` argument to parse a guest language program.
+In this example, the parser uses the visitor to parse `program`, wrapping the operations in `Root` and `Return` operations.
+The parser must be deterministic (i.e., each parse should produce the same sequence of `Builder` calls), since it may be called more than once to implement [reparsing](#reparsing).

-The second argument is the parser. The parser is an implementation of the `BytecodeParser` functional interface, which is responsible for parsing a program using a given `Builder` parameter.
-In this example, the parser uses the visitor to parse `program`, wrapping the operations within `Root` and `Return` operations.
-The parser must be deterministic (i.e., if invoked multiple times, it should invoke the same sequence of `Builder` methods), since it may be called more than once to implement reparsing (see [Reparsing](#reparsing)).
+The result is a `BytecodeNodes` instance, which acts as a wrapper class for the `BytecodeRootNode`s produced by the parse (along with other shared information). The nodes can be extracted using `getNode()` or `getNodes()`.

-The result is a `BytecodeNodes` instance, which acts as a wrapper class for the `BytecodeRootNode`s produced by the parse (along with other shared information). The nodes can be extracted using the `getNode()` or `getNodes()`.
+And that's it! The `parse` method returns a root node containing a sequence of bytecode.
+When the root node is invoked, it executes the bytecode using the generated bytecode interpreter.

-And that's it! During parsing, the builder generates a sequence of bytecode for each root node. The generated bytecode interpreter executes this bytecode sequence when a root node is executed.
+## Next steps

+This introduction covers the basics of Bytecode DSL.
+For more specific usage information, consult the [User guide](UserGuide.md) and [Javadoc](https://www.graalvm.org/truffle/javadoc/com/oracle/truffle/api/bytecode/package-summary.html).
+The Bytecode DSL implementations for [SimpleLanguage](https://github.com/oracle/graal/blob/master/truffle/src/com.oracle.truffle.sl/src/com/oracle/truffle/sl/bytecode/SLBytecodeRootNode.java) and [GraalPython](https://github.com/oracle/graalpython/blob/master/graalpython/com.oracle.graal.python/src/com/oracle/graal/python/nodes/bytecode_dsl/PBytecodeDSLRootNode.java) may also be useful references.
truffle/docs/bytecode_dsl/Optimization.md
38 additions & 2 deletions
@@ -1,15 +1,51 @@
 # Optimization

-Bytecode interpreters commonly employ [quickening](https://dl.acm.org/doi/10.1145/1869631.1869633) and [superinstructions](https://dl.acm.org/doi/abs/10.1145/1059579.1059583) to achieve better interpreted performance. This section discusses how to employ these optimizations in Bytecode DSL interpreters.
+Bytecode interpreters commonly employ a variety of optimizations to achieve better interpreted performance.
+This section discusses how to employ these optimizations in Bytecode DSL interpreters.
+
+## Boxing elimination
+
+A major source of overhead in interpreted code (for both Truffle AST and bytecode interpreters) is boxing.
+By default, values are passed between operations as objects, which forces primitive values to be boxed up.
+Often, the boxed value is subsequently unboxed when it gets consumed.
+
+Boxing elimination avoids these unnecessary boxing steps.
+The interpreter can speculatively rewrite bytecode to pass primitive values between operations whenever possible.
+
+To enable boxing elimination, specify a set of `boxingEliminationTypes` on the `@GenerateBytecode` annotation. For example, the following configuration
+will instruct the interpreter to automatically avoid boxing for `long` and `boolean` values.
+
+Boxing elimination is implemented using quickening, which is described below.

 ## Quickening

-**TODO**: talk about how quickening works, how it connects with BE, @ForceQuickening, and tracing
+[Quickening](https://dl.acm.org/doi/10.1145/1869631.1869633) is a general technique to rewrite an instruction with a specialized version that (typically) requires less work.
+Bytecode DSL supports quickened operations, which handle a subset of the specializations defined by an operation.
+
+Quickened operations can reduce the amount of work required to evaluate an operation.
+For example, a quickened operation that only accepts `int` inputs can avoid operand boxing and the additional type checks required by the general operation.
+
+Quickened instructions can be automatically derived using [tracing](#tracing), or specified manually using `@ForceQuickening`.
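
As a rough illustration of the general technique (not of the code Bytecode DSL generates), the sketch below shows an instruction that rewrites itself after observing its operands. Everything in it is hypothetical: the `QuickeningDemo` class, the three opcodes, and the one-element bytecode array are assumptions made for the example. The unspecialized `ADD` picks a variant on first execution, and `ADD_LONG` falls back to `ADD_GENERIC` if its speculation fails.

```java
// Toy model only: illustrates quickening in general, not Bytecode DSL internals.
final class QuickeningDemo {
    static final int ADD = 0;         // unspecialized: has not observed operands yet
    static final int ADD_LONG = 1;    // quickened: speculates that both operands are Long
    static final int ADD_GENERIC = 2; // general version: handles every supported case

    // A one-instruction "program", kept mutable so the instruction can rewrite itself.
    final int[] bytecode = {ADD};

    Object executeAdd(Object left, Object right) {
        switch (bytecode[0]) {
            case ADD:
                // First execution: choose a variant based on the observed operands.
                bytecode[0] = (left instanceof Long && right instanceof Long) ? ADD_LONG : ADD_GENERIC;
                return executeAdd(left, right);
            case ADD_LONG:
                if (left instanceof Long && right instanceof Long) {
                    return (Long) left + (Long) right;
                }
                // Speculation failed: rewrite to the general instruction and retry.
                bytecode[0] = ADD_GENERIC;
                return executeAdd(left, right);
            case ADD_GENERIC:
                return addGeneric(left, right);
            default:
                throw new IllegalStateException("unknown opcode " + bytecode[0]);
        }
    }

    // The general operation: long addition, or string concatenation otherwise.
    static Object addGeneric(Object left, Object right) {
        if (left instanceof Long && right instanceof Long) {
            return (Long) left + (Long) right;
        }
        return String.valueOf(left) + String.valueOf(right);
    }

    public static void main(String[] args) {
        QuickeningDemo demo = new QuickeningDemo();
        System.out.println(demo.executeAdd(1L, 2L) + " (opcode is now " + demo.bytecode[0] + ")");
        System.out.println(demo.executeAdd("a", 2L) + " (opcode is now " + demo.bytecode[0] + ")");
    }
}
```

This toy still passes boxed `Object` values, so it only demonstrates the rewriting step. In a real interpreter, the benefit comes from the quickened instruction doing less work, for instance by reading primitive operands directly.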
+

 ## Superinstructions

 **Note: Superinstructions are not yet supported**.

+[Superinstructions](https://dl.acm.org/doi/abs/10.1145/1059579.1059583) combine common sequences of instructions together into single instructions.
+Using superinstructions can reduce the overhead of instruction dispatch, and it can enable the host compiler to perform optimizations across the instructions (e.g., eliding a stack push for a value that is subsequently popped).
+
+Superinstructions can be automatically derived using [tracing](#tracing).
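
Since superinstructions are not yet supported, the following is only a sketch of the general idea on a toy stack machine. It says nothing about how Bytecode DSL will implement them. The `SuperinstructionDemo` class, its opcodes, and the `fuse` peephole pass are hypothetical. The pass replaces each `LOAD_CONSTANT c; ADD` pair with a single `ADD_CONSTANT c` instruction, and the interpreter counts dispatches so the effect can be observed. We assume the toy bytecode has no jumps, so fusing adjacent instructions is always safe here.

```java
// Toy model only: illustrates superinstructions in general, not Bytecode DSL internals.
final class SuperinstructionDemo {
    static final int LOAD_CONSTANT = 0; // followed by one operand: the constant
    static final int ADD = 1;           // pops two values, pushes their sum
    static final int ADD_CONSTANT = 2;  // fused LOAD_CONSTANT c; ADD (one operand: c)

    // Peephole pass: fuse every LOAD_CONSTANT that is immediately followed by ADD.
    static int[] fuse(int[] code) {
        int[] out = new int[code.length];
        int n = 0;
        int bci = 0;
        while (bci < code.length) {
            if (code[bci] == LOAD_CONSTANT && bci + 2 < code.length && code[bci + 2] == ADD) {
                out[n++] = ADD_CONSTANT;
                out[n++] = code[bci + 1];
                bci += 3;
            } else if (code[bci] == ADD) {
                out[n++] = ADD;
                bci += 1;
            } else {
                // LOAD_CONSTANT or ADD_CONSTANT: copy the opcode and its operand.
                out[n++] = code[bci];
                out[n++] = code[bci + 1];
                bci += 2;
            }
        }
        return java.util.Arrays.copyOf(out, n);
    }

    // Runs the code and returns {result, number of instruction dispatches}.
    static int[] run(int[] code) {
        int[] stack = new int[code.length + 1];
        int sp = 0;
        int bci = 0;
        int dispatches = 0;
        while (bci < code.length) {
            dispatches++;
            switch (code[bci]) {
                case LOAD_CONSTANT: {
                    stack[sp++] = code[bci + 1];
                    bci += 2;
                    break;
                }
                case ADD: {
                    int right = stack[--sp];
                    int left = stack[--sp];
                    stack[sp++] = left + right;
                    bci += 1;
                    break;
                }
                case ADD_CONSTANT: {
                    // No push/pop of the constant: update the top of stack in place.
                    stack[sp - 1] += code[bci + 1];
                    bci += 2;
                    break;
                }
                default:
                    throw new IllegalStateException("unknown opcode " + code[bci]);
            }
        }
        return new int[] {stack[sp - 1], dispatches};
    }

    public static void main(String[] args) {
        // (1 + 2) + 3 in postfix form.
        int[] original = {LOAD_CONSTANT, 1, LOAD_CONSTANT, 2, ADD, LOAD_CONSTANT, 3, ADD};
        int[] fused = fuse(original);
        System.out.println("original: " + java.util.Arrays.toString(run(original)));
        System.out.println("fused:    " + java.util.Arrays.toString(run(fused)));
    }
}
```

In this sketch the fused program computes the same result with fewer dispatches, and `ADD_CONSTANT` never pushes the constant onto the stack. This mirrors the elided push described above.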