Commit 6be1c20

Continue with docs
1 parent 0ce9bc2 commit 6be1c20

3 files changed

+143
-49
lines changed

truffle/docs/bytecode_dsl/BytecodeDSL.md

Lines changed: 42 additions & 29 deletions
@@ -13,11 +13,11 @@ Though Truffle AST interpreters enjoy excellent peak performance, they can strug
1313
- *Memory footprint*. Trees are not compact data structures. A root node's entire AST, with all of its state (e.g., `@Cached` parameters), must be allocated before it can execute. This allocation is especially detrimental for code that is only executed a handful of times (e.g., bootstrap code).
1414
- *Interpreted performance*. AST interpreters contain many highly polymorphic `execute` call sites that are difficult for the JVM to optimize. These sites pose no problem for runtime-compiled code (where partial evaluation can eliminate the polymorphism), but cold code that runs in the interpreter suffers from poor performance.
1515

16-
Bytecode interpreters enjoy the same peak performance as ASTs, but they can also be encoded with less memory and are more amenable to optimization (e.g., via [host compilation](HostCompilation.md)). Unfortunately, these benefits come at a cost: bytecode interpreters are more difficult and tedious to implement properly. Bytecode DSL simplifies the implementation effort for bytecode interpreters by generating them automatically from AST node-like specifications called "operations".
16+
Bytecode interpreters enjoy the same peak performance as ASTs, but they can also be encoded with less memory and are more amenable to optimization (e.g., via [host compilation](../HostCompilation.md)). Unfortunately, these benefits come at a cost: bytecode interpreters are more difficult and tedious to implement properly. Bytecode DSL reduces the implementation effort by generating a bytecode interpreter automatically from a set of AST node-like specifications called "operations".
1717

1818
## Operations
1919

20-
An operation in Bytecode DSL is an atomic unit of language semantics. Each operation can be executed, performing some computation and optionally returning a value. Operations can be nested together to form a program. As an example, the following pseudocode
20+
An operation in Bytecode DSL is a basic unit of language semantics. Each operation can be executed, performing some computation and optionally returning a value. Operations can be nested together to form a program. As an example, the following pseudocode
2121
```python
2222
if 1 == 2:
2323
print("what")
@@ -39,9 +39,9 @@ Each of these operations has its own execution semantics. For example, the `IfTh
3939

4040
The operations in Bytecode DSL are divided into two groups: built-in and custom.
4141

42-
- Built-in operations come with the DSL itself, and their semantics cannot be changed. They model behaviour that is common across languages, such as control flow (`IfThen`, `While`, etc.), constant accesses (`LoadConstant`) and local variable manipulation (`LoadLocal`, `StoreLocal`). We describe the precise semantics of the built-in operations later in [Built-in Operations](#built-in-operations).
42+
- Built-in operations come with the DSL itself, and their semantics cannot be changed. They model behaviour that is common across languages, such as control flow (`IfThen`, `While`, etc.), constant accesses (`LoadConstant`) and local variable manipulation (`LoadLocal`, `StoreLocal`). We describe the precise semantics of the built-in operations later in [Built-in Operations](UserGuide.md#built-in-operations).
4343

44-
- Custom operations are provided by the language. They model language-specific behaviour, such as the semantics of operators, value conversions, calls, etc. In our previous example, `Equals`, `CallFunction` and `LoadGlobal` are custom operations. There are two kinds of custom operations: regular (eager) operations and short-circuiting operations.
44+
- Custom operations are provided by the language. They model language-specific behaviour, such as the semantics of operators, value conversions, calls, etc. In our previous example, `Equals`, `CallFunction` and `LoadGlobal` are custom operations. We can define two kinds of custom operations: [regular (eager) operations](UserGuide.md#defining-custom-operations) and [short-circuiting operations](UserGuide.md#defining-short-circuiting-custom-operations).
4545

4646
## Simple example
4747

@@ -60,10 +60,10 @@ As an example, let us implement a Bytecode DSL interpreter for a simple language
6060

6161
### Defining the Bytecode class
6262

63-
The entry-point to a Bytecode DSL interpreter is the `@GenerateBytecode` annotation. This annotation must be attached to a class that `extends RootNode` and `implements BytecodeRootNode`:
63+
The entry-point to a Bytecode DSL interpreter is the `@GenerateBytecode` annotation. It should annotate a class that `extends RootNode` and `implements BytecodeRootNode`:
6464

6565
```java
66-
@GenerateBytecode
66+
@GenerateBytecode(...)
6767
public abstract class ExampleBytecodeRootNode extends RootNode implements BytecodeRootNode {
6868
public ExampleBytecodeRootNode(TruffleLanguage<?> language, FrameDescriptor frameDescriptor) {
6969
...
@@ -72,10 +72,10 @@ public abstract class ExampleBytecodeRootNode extends RootNode implements Byteco
7272
```
7373
The class must have a two-argument constructor that takes a `TruffleLanguage<?>` and a `FrameDescriptor` (or `FrameDescriptor.Builder`). This constructor is used by the generated code to instantiate root nodes, so any other instance fields must be initialized separately.
7474

75-
Inside the bytecode class we define custom operations. Each operation is structured similarly to a Truffle DSL node, except it does not need to be a subclass of `Node` and all of its specializations should be `static`. In our example language, the `+` operator can be expressed with its own operation:
75+
Inside the bytecode class we define [custom operations](UserGuide.md#defining-custom-operations). Each operation is structured similarly to a Truffle DSL node, except it does not need to be a subclass of `Node` and all of its specializations should be `static` (see the `Operation` Javadoc for full details). In our example language, the `+` operator can be expressed with its own operation:
7676

7777
```java
78-
// place inside ExampleBytecodeRootNode
78+
// define inside ExampleBytecodeRootNode
7979
@Operation
8080
public static final class Add {
8181
@Specialization
@@ -94,42 +94,47 @@ public static final class Add {
9494

9595
Within operations, we can use most of the Truffle DSL, including `@Cached` and `@Bind` parameters, guards, and specialization limits. We cannot use features that require node instances, such as `@NodeChild` and `@NodeField`, or any instance fields or methods.
9696

97-
One limitation of custom operations is that they eagerly evaluate all of their operands. They cannot perform conditional execution, loops, etc. For those use-cases, we have to use the built-in operations or define custom short-circuiting operations.
97+
One limitation of custom operations is that they eagerly evaluate all of their operands. They cannot perform conditional execution, loops, etc. For those use-cases, we have to use the [built-in operations](UserGuide.md#control-flow-operations) or define [custom short-circuiting operations](UserGuide.md#defining-short-circuiting-custom-operations).
9898

99-
From this simple description, the DSL will generate a `ExampleBytecodeRootNodeGen` class that contains a full bytecode interpreter definition.
99+
The bytecode class and its operations define a specification for the interpreter.
100+
From this specification, the DSL generates an entire bytecode interpreter definition inside the `ExampleBytecodeRootNodeGen` class.
100101

101102
### Converting a program to bytecode
102103

103104
In order to execute a guest program, we need to convert it to the bytecode defined by the generated interpreter.
104105
We refer to this process as "parsing" the bytecode root node.
105106
<!-- We refer to the process of converting a guest program to bytecode (and thereby creating a `BytecodeRootNode`) as parsing. -->
106107

107-
To parse a program to a bytecode root node, we encode the program in terms of operations.
108-
We invoke methods on the generated `Builder` class to construct these operations; the builder translates these method calls to a sequence of bytecodes that can be executed by the generated interpreter.
109-
110-
111-
For this example, let's assume the guest program has already been parsed to an AST as follows:
108+
For this example, let's assume the guest program can be parsed to an AST with the following node kinds:
112109

113110
```java
114111
class Expr { }
115112
class AddExpr extends Expr { Expr left; Expr right; }
116113
class IntExpr extends Expr { int value; }
117114
class StringExpr extends Expr { String value; }
118115
```
119-
Let's also assume there is a simple visitor pattern implemented over the AST.
120116

121-
The expression `1 + 2` can be expressed as operations `(Add (LoadConstant 1) (LoadConstant 2))`. It can be parsed using the following sequence of builder calls:
117+
To parse a program to a bytecode root node, we use the generated `Builder` class to encode the program as a "tree" of operations.
118+
For each operation `X`, the builder defines `beginX` and `endX` methods that can be used to encode the operation.
119+
Simple operations that have no data dependency (i.e., no children) instead have `emitX` methods.
120+
The builder translates calls to these methods to a flat sequence of bytecodes that can be executed by the generated interpreter.
121+
122+
As an example, the expression `1 + (2 + 3)` can be expressed as operations `(Add (LoadConstant 1) (Add (LoadConstant 2) (LoadConstant 3)))`. It can be parsed using the following sequence of builder calls:
122123

123124
```java
124125
b.beginAdd();
125-
b.emitLoadConstant(1);
126-
b.emitLoadConstant(2);
126+
b.emitLoadConstant(1);
127+
b.beginAdd();
128+
b.emitLoadConstant(2);
129+
b.emitLoadConstant(3);
130+
b.endAdd();
127131
b.endAdd();
128132
```
129133

130-
You can think of the `beginX` and `endX` as opening and closing `<X>` and `</X>` XML tags, while `emitX` is the empty tag `<X/>` used when the operation does not take children. Each operation has either `beginX` and `endX` methods or an `emitX` method.
134+
This sequence of calls automatically produces bytecode to perform the computation.
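
To make the flattening step concrete, here is a minimal sketch of how nested `begin`/`end`/`emit` calls could turn into a linear instruction sequence. `ToyBuilder`, its string instruction format, and `run` are hypothetical names invented for this illustration; the builder generated by Bytecode DSL is considerably more involved and may encode instructions differently.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: this is NOT the builder that Bytecode DSL generates.
// It shows how nested begin/end/emit calls can flatten into a linear
// instruction sequence for a simple stack machine.
final class ToyBuilder {
    private final List<String> code = new ArrayList<>();

    void emitLoadConstant(int value) {
        // A childless operation produces its instruction immediately.
        code.add("LOAD_CONST " + value);
    }

    void beginAdd() {
        // Nothing to emit yet: the operands must be emitted first.
    }

    void endAdd() {
        // Both operands have been emitted, so the ADD follows them.
        code.add("ADD");
    }

    List<String> build() {
        return List.copyOf(code);
    }

    // Executes the flat sequence using an operand stack.
    static int run(List<String> code) {
        int[] stack = new int[code.size()];
        int sp = 0;
        for (String insn : code) {
            if (insn.startsWith("LOAD_CONST ")) {
                stack[sp++] = Integer.parseInt(insn.substring("LOAD_CONST ".length()));
            } else if (insn.equals("ADD")) {
                int right = stack[--sp];
                int left = stack[--sp];
                stack[sp++] = left + right;
            } else {
                throw new IllegalStateException("unknown instruction: " + insn);
            }
        }
        return stack[sp - 1];
    }
}
```

In this sketch, replaying the builder calls for `1 + (2 + 3)` yields three loads followed by two `ADD` instructions, i.e., a post-order traversal of the operation tree.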
131135

132-
We can then write a visitor to construct bytecode from the AST representation:
136+
Observe that the sequence of builder calls is essentially a traversal of the operations "tree".
137+
A simple way to encode this traversal is with a visitor over the AST:
133138

134139
```java
135140
class ExampleBytecodeVisitor implements ExprVisitor {
@@ -156,14 +161,14 @@ class ExampleBytecodeVisitor implements ExprVisitor {
156161
}
157162
```
158163

159-
Now that we have a visitor, we can define a `parse` method. This method converts an AST to a `ExampleBytecodeRootNode`, which can then be executed by the language runtime:
164+
Now that we have a visitor, we can define a top-level `parse` method. This method converts an AST to an `ExampleBytecodeRootNode`, which can then be executed by the language runtime:
160165

161166
```java
162167
public static ExampleBytecodeRootNode parseExample(ExampleLanguage language, Expr program) {
163168
var nodes = ExampleBytecodeRootNodeGen.create(
164169
BytecodeConfig.DEFAULT,
165170
builder -> {
166-
// Root operation must enclose each function. It is further explained later.
171+
// Root operation must enclose each function. See the User Guide for details.
167172
builder.beginRoot(language);
168173

169174
// This root node returns the result of executing the expression,
@@ -184,13 +189,21 @@ public static ExampleBytecodeRootNode parseExample(ExampleLanguage language, Exp
184189
}
185190
```
186191

187-
We first invoke the `ExampleBytecodeRootNodeGen#create` function, which is the entry-point for parsing. Its first argument is a `BytecodeConfig`, which defines a parsing mode. `BytecodeConfig.DEFAULT` will suffice for our purposes (there are other modes that include source positions and/or instrumentation info; see [Reparsing](#reparsing)).
192+
We first invoke the `ExampleBytecodeRootNodeGen#create` method, which is the entry point for parsing.
193+
Its first argument is a `BytecodeConfig`, which defines a [parsing mode](UserGuide.md#parsing-modes).
194+
The default mode is sufficient for most use cases.
195+
196+
The second argument is the parser. The parser implements the `BytecodeParser` functional interface, which uses a supplied `Builder` argument to parse a guest language program.
197+
In this example, the parser uses the visitor to parse `program`, wrapping the operations in `Root` and `Return` operations.
198+
The parser must be deterministic (i.e., each parse should produce the same sequence of `Builder` calls), since it may be called more than once to implement [reparsing](#reparsing).
188199

189-
The second argument is the parser. The parser is an implementation of the `BytecodeParser` functional interface, which is responsible for parsing a program using a given `Builder` parameter.
190-
In this example, the parser uses the visitor to parse `program`, wrapping the operations within `Root` and `Return` operations.
191-
The parser must be deterministic (i.e., if invoked multiple times, it should invoke the same sequence of `Builder` methods), since it may be called more than once to implement reparsing (see [Reparsing](#reparsing)).
200+
The result is a `BytecodeNodes` instance, which acts as a wrapper class for the `BytecodeRootNode`s produced by the parse (along with other shared information). The nodes can be extracted using `getNode()` or `getNodes()`.
192201

193-
The result is a `BytecodeNodes` instance, which acts as a wrapper class for the `BytecodeRootNode`s produced by the parse (along with other shared information). The nodes can be extracted using the `getNode()` or `getNodes()`.
202+
And that's it! The `parseExample` method returns a root node containing a sequence of bytecode.
203+
When the root node is invoked, it executes the bytecode using the generated bytecode interpreter.
194204

195-
And that's it! During parsing, the builder generates a sequence of bytecode for each root node. The generated bytecode interpreter executes this bytecode sequence when a root node is executed.
205+
## Next steps
196206

207+
This introduction covers the basics of Bytecode DSL.
208+
For more specific usage information, consult the [User guide](UserGuide.md) and [Javadoc](https://www.graalvm.org/truffle/javadoc/com/oracle/truffle/api/bytecode/package-summary.html).
209+
The Bytecode DSL implementations for [SimpleLanguage](https://github.com/oracle/graal/blob/master/truffle/src/com.oracle.truffle.sl/src/com/oracle/truffle/sl/bytecode/SLBytecodeRootNode.java) and [GraalPython](https://github.com/oracle/graalpython/blob/master/graalpython/com.oracle.graal.python/src/com/oracle/graal/python/nodes/bytecode_dsl/PBytecodeDSLRootNode.java) may also be useful references.

truffle/docs/bytecode_dsl/Tracing.md renamed to truffle/docs/bytecode_dsl/Optimization.md

Lines changed: 38 additions & 2 deletions
@@ -1,15 +1,51 @@
11
# Optimization
22

3-
Bytecode interpreters commonly employ [quickening](https://dl.acm.org/doi/10.1145/1869631.1869633) and [superinstructions](https://dl.acm.org/doi/abs/10.1145/1059579.1059583) to achieve better interpreted performance. This section discusses how to employ these optimizations in Bytecode DSL interpreters.
3+
Bytecode interpreters commonly employ a variety of optimizations to achieve better interpreted performance.
4+
This section discusses how to employ these optimizations in Bytecode DSL interpreters.
5+
6+
## Boxing elimination
7+
8+
A major source of overhead in interpreted code (for both Truffle AST and bytecode interpreters) is boxing.
9+
By default, values are passed between operations as objects, which forces primitive values to be boxed.
10+
Often, the boxed value is subsequently unboxed when it gets consumed.
11+
12+
Boxing elimination avoids these unnecessary boxing steps.
13+
The interpreter can speculatively rewrite bytecode to pass primitive values between operations whenever possible.
14+
15+
To enable boxing elimination, specify a set of `boxingEliminationTypes` on the `@GenerateBytecode` annotation. For example, the following configuration
16+
17+
```java
18+
@GenerateBytecode(
19+
...
20+
boxingEliminationTypes = {long.class, boolean.class}
21+
)
22+
```
23+
24+
will instruct the interpreter to automatically avoid boxing for `long` and `boolean` values.
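
The cost that boxing elimination targets can be seen in plain Java. The sketch below contrasts a handler that receives its operands as `Object` with one that receives them as `long`. `BoxingSketch` and both method names are hypothetical and only illustrate the idea; they are not what the DSL generates.

```java
// Illustrative only: hypothetical handlers, not code generated by Bytecode DSL.
final class BoxingSketch {
    // Generic path: operands arrive as Object, so the producer had to box
    // each long and the consumer has to unbox it again.
    static Object addGeneric(Object left, Object right) {
        long l = ((Long) left).longValue();   // unbox
        long r = ((Long) right).longValue();  // unbox
        return Long.valueOf(l + r);           // box the result for the next consumer
    }

    // Primitive path: values stay as long from producer to consumer,
    // so no Long objects are created.
    static long addLong(long left, long right) {
        return left + right;
    }
}
```

Both methods compute the same sum; the difference is only in how the values travel between operations.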
25+
26+
Boxing elimination is implemented using quickening, which is described below.
427

528
## Quickening
629

7-
**TODO**: talk about how quickening works, how it connects with BE, @ForceQuickening, and tracing
30+
[Quickening](https://dl.acm.org/doi/10.1145/1869631.1869633) is a general technique that replaces an instruction with a specialized version that (typically) requires less work.
31+
Bytecode DSL supports quickened operations, which handle a subset of the specializations defined by an operation.
32+
33+
Quickened operations can reduce the amount of work required to evaluate an operation.
34+
For example, a quickened operation that only accepts `int` inputs can avoid operand boxing and the additional type checks required by the general operation.
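
As a rough sketch of the general technique (not of the code Bytecode DSL generates), the toy handler below rewrites a generic `ADD` opcode into an `ADD_INT` opcode once it has seen two `Integer` operands, and reverts to the generic form if that speculation later fails. The opcode names and the rewrite policy are assumptions made for this example.

```java
// Toy illustration of quickening; the opcodes and rewrite policy are
// assumptions for this sketch, not Bytecode DSL's generated code.
final class QuickeningSketch {
    static final int ADD = 0;     // generic: handles Integer and other operands
    static final int ADD_INT = 1; // quickened: assumes both operands are Integer

    // Executes the single instruction stored at code[0]; may rewrite it in place.
    static Object execute(int[] code, Object left, Object right) {
        switch (code[0]) {
            case ADD_INT:
                if (left instanceof Integer && right instanceof Integer) {
                    return (Integer) left + (Integer) right; // fast path, one check
                }
                code[0] = ADD; // speculation failed: revert to the generic form
                return addGeneric(code, left, right);
            case ADD:
                return addGeneric(code, left, right);
            default:
                throw new IllegalStateException("unknown opcode " + code[0]);
        }
    }

    private static Object addGeneric(int[] code, Object left, Object right) {
        if (left instanceof Integer && right instanceof Integer) {
            code[0] = ADD_INT; // quicken: later executions take the fast path
            return (Integer) left + (Integer) right;
        }
        // Fallback semantics for this toy language: string concatenation.
        return String.valueOf(left) + right;
    }
}
```

A real interpreter would typically also avoid flip-flopping between the two forms; this sketch omits that for brevity.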
35+
36+
Quickened instructions can be automatically derived using [tracing](#tracing), or specified manually using `@ForceQuickening`.
37+
838

939
## Superinstructions
1040

1141
**Note: Superinstructions are not yet supported**.
1242

43+
[Superinstructions](https://dl.acm.org/doi/abs/10.1145/1059579.1059583) combine common sequences of instructions together into single instructions.
44+
Using superinstructions can reduce the overhead of instruction dispatch, and it can enable the host compiler to perform optimizations across the instructions (e.g., eliding a stack push for a value that is subsequently popped).
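
The dispatch saving can be illustrated with a toy stack machine. In the sketch below, the hypothetical `ADD_CONST` instruction fuses `LOAD_CONST` followed by `ADD`, which also removes the intermediate push and pop. All opcode names are invented for this example and say nothing about how Bytecode DSL will eventually implement superinstructions.

```java
// Toy illustration of a superinstruction; opcodes are assumptions for this sketch.
final class SuperinstructionSketch {
    static final int LOAD_CONST = 0; // operand: constant; pushes it
    static final int ADD = 1;        // pops two values, pushes their sum
    static final int ADD_CONST = 2;  // fused LOAD_CONST + ADD; operand: constant

    // Runs the code with `initial` already on the stack.
    // Returns {result, number of instructions dispatched}.
    static int[] run(int[] code, int initial) {
        int[] stack = new int[code.length + 1];
        int sp = 0;
        stack[sp++] = initial;
        int dispatches = 0;
        int pc = 0;
        while (pc < code.length) {
            dispatches++;
            switch (code[pc]) {
                case LOAD_CONST:
                    stack[sp++] = code[pc + 1];
                    pc += 2;
                    break;
                case ADD: {
                    int right = stack[--sp];
                    stack[sp - 1] += right;
                    pc += 1;
                    break;
                }
                case ADD_CONST:
                    // No push/pop of the constant: add it to the top of stack directly.
                    stack[sp - 1] += code[pc + 1];
                    pc += 2;
                    break;
                default:
                    throw new IllegalStateException("unknown opcode " + code[pc]);
            }
        }
        return new int[] {stack[sp - 1], dispatches};
    }
}
```

Running `{LOAD_CONST, 5, ADD}` and `{ADD_CONST, 5}` on the same initial value gives the same result, but the fused form dispatches one instruction instead of two.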
45+
46+
Superinstructions can be automatically derived using [tracing](#tracing).
47+
48+
1349
## Tracing
1450

1551
**Note: Tracing is not yet supported**.
