Continue with docs

DSouzaM · DSouzaM · commit 6be1c2032c29 · 2024-01-24T08:50:26.000-05:00
diff --git a/truffle/docs/bytecode_dsl/BytecodeDSL.md b/truffle/docs/bytecode_dsl/BytecodeDSL.md
@@ -13,11 +13,11 @@ Though Truffle AST interpreters enjoy excellent peak performance, they can strug
 - *Memory footprint*. Trees are not compact data structures. A root node's entire AST, with all of its state (e.g., `@Cached` parameters) must be allocated before it can execute. This allocation is especially detrimental for code that is only executed a handful of times (e.g., bootstrap code).
 - *Interpreted performance*. AST interpreters contain many highly polymorphic `execute` call sites that are difficult for the JVM to optimize. These sites pose no problem for runtime-compiled code (where partial evaluation can eliminate the polymorphism), but cold code that runs in the interpreter suffers from poor performance.
 
-Bytecode interpreters enjoy the same peak performance as ASTs, but they can also be encoded with less memory and are more amenable to optimization (e.g., via [host compilation](HostCompilation.md)). Unfortunately, these benefits come at a cost: bytecode interpreters are more difficult and tedious to implement properly. Bytecode DSL simplifies the implementation effort for bytecode interpreters by generating them automatically from AST node-like specifications called "operations".
+Bytecode interpreters enjoy the same peak performance as ASTs, but they can also be encoded with less memory and are more amenable to optimization (e.g., via [host compilation](../HostCompilation.md)). Unfortunately, these benefits come at a cost: bytecode interpreters are more difficult and tedious to implement properly. Bytecode DSL reduces the implementation effort by generating a bytecode interpreter automatically from a set of AST node-like specifications called "operations".
 
 ## Operations
 
-An operation in Bytecode DSL is an atomic unit of language semantics. Each operation can be executed, performing some computation and optionally returning a value. Operations can be nested together to form a program. As an example, the following pseudocode
+An operation in Bytecode DSL is a basic unit of language semantics. Each operation can be executed, performing some computation and optionally returning a value. Operations can be nested together to form a program. As an example, the following pseudocode
 ```python
 if 1 == 2:
     print("what")
@@ -39,9 +39,9 @@ Each of these operations has its own execution semantics. For example, the `IfTh
 
 The operations in Bytecode DSL are divided into two groups: built-in and custom.
 
-- Built-in operations come with the DSL itself, and their semantics cannot be changed. They model behaviour that is common across languages, such as control flow (`IfThen`, `While`, etc.), constant accesses (`LoadConstant`) and local variable manipulation (`LoadLocal`, `StoreLocal`). We describe the precise semantics of the built-in operations later in [Built-in Operations](#built-in-operations).
+- Built-in operations come with the DSL itself, and their semantics cannot be changed. They model behaviour that is common across languages, such as control flow (`IfThen`, `While`, etc.), constant accesses (`LoadConstant`) and local variable manipulation (`LoadLocal`, `StoreLocal`). We describe the precise semantics of the built-in operations later in [Built-in Operations](UserGuide.md#built-in-operations).
 
-- Custom operations are provided by the language. They model language-specific behaviour, such as the semantics of operators, value conversions, calls, etc. In our previous example, `Equals`, `CallFunction` and `LoadGlobal` are custom operations. There are two kinds of custom operations: regular (eager) operations and short-circuiting operations.
+- Custom operations are provided by the language. They model language-specific behaviour, such as the semantics of operators, value conversions, calls, etc. In our previous example, `Equals`, `CallFunction` and `LoadGlobal` are custom operations. We can define two kinds of custom operations: [regular (eager) operations](UserGuide.md#defining-custom-operations) and [short-circuiting operations](UserGuide.md#defining-short-circuiting-custom-operations).
 
 ## Simple example
 
@@ -60,10 +60,10 @@ As an example, let us implement a Bytecode DSL interpreter for a simple language
 
 ### Defining the Bytecode class
 
-The entry-point to a Bytecode DSL interpreter is the `@GenerateBytecode` annotation. This annotation must be attached to a class that `extends RootNode` and `implements BytecodeRootNode`:
+The entry-point to a Bytecode DSL interpreter is the `@GenerateBytecode` annotation. It should annotate a class that `extends RootNode` and `implements BytecodeRootNode`:
 
 ```java
-@GenerateBytecode
+@GenerateBytecode(...)
 public abstract class ExampleBytecodeRootNode extends RootNode implements BytecodeRootNode {
     public ExampleBytecodeRootNode(TruffleLanguage<?> language, FrameDescriptor frameDescriptor) {
         ...
@@ -72,10 +72,10 @@ public abstract class ExampleBytecodeRootNode extends RootNode implements Byteco
 ``````
 The class must have a two-argument constructor that takes a `TruffleLanguage<?>` and a `FrameDescriptor` (or `FrameDescriptor.Builder`). This constructor is used by the generated code to instantiate root nodes, so any other instance fields must be initialized separately.
 
-Inside the bytecode class we define custom operations. Each operation is structured similarly to a Truffle DSL node, except it does not need to be a subclass of `Node` and all of its specializations should be `static`. In our example language, the `+` operator can be expressed with its own operation:
+Inside the bytecode class we define [custom operations](UserGuide.md#defining-custom-operations). Each operation is structured similarly to a Truffle DSL node, except it does not need to be a subclass of `Node` and all of its specializations should be `static` (see the `Operation` Javadoc for full details). In our example language, the `+` operator can be expressed with its own operation:
 
 ```java
-// place inside ExampleBytecodeRootNode
+// define inside ExampleBytecodeRootNode
 @Operation
 public static final class Add {
     @Specialization
@@ -94,42 +94,47 @@ public static final class Add {
 
 Within operations, we can use most of the Truffle DSL, including `@Cached` and `@Bind` parameters, guards, and specialization limits. We cannot use features that require node instances, such as `@NodeChild`, `@NodeField`, nor any instance fields or methods.
 
-One limitation of custom operations is that they eagerly evaluate all of their operands. They cannot perform conditional execution, loops, etc. For those use-cases, we have to use the built-in operations or define custom short-circuiting operations.
+One limitation of custom operations is that they eagerly evaluate all of their operands. They cannot perform conditional execution, loops, etc. For those use-cases, we have to use the [built-in operations](UserGuide.md#control-flow-operations) or define [custom short-circuiting operations](UserGuide.md#defining-short-circuiting-custom-operations).
 
-From this simple description, the DSL will generate a `ExampleBytecodeRootNodeGen` class that contains a full bytecode interpreter definition.
+The bytecode class and its operations define a specification for the interpreter.
+From this specification, the DSL generates an entire bytecode interpreter definition inside the `ExampleBytecodeRootNodeGen` class.
 
 ### Converting a program to bytecode
 
 In order to execute a guest program, we need to convert it to the bytecode defined by the generated interpreter.
 We refer to this process as "parsing" the bytecode root node.
 <!-- We refer to the process of converting a guest program to bytecode (and thereby creating a `BytecodeRootNode`) as parsing. -->
 
-To parse a program to a bytecode root node, we encode the program in terms of operations.
-We invoke methods on the generated `Builder` class to construct these operations; the builder translates these method calls to a sequence of bytecodes that can be executed by the generated interpreter.
-
-
-For this example, let's assume the guest program has already been parsed to an AST as follows:
+For this example, let's assume the guest program can be parsed to an AST with the following node kinds:
 
 ```java
 class Expr { }
 class AddExpr extends Expr { Expr left; Expr right; }
 class IntExpr extends Expr { int value; }
 class StringExpr extends Expr { String value; }
 ```
-Let's also assume there is a simple visitor pattern implemented over the AST.
 
-The expression `1 + 2` can be expressed as operations `(Add (LoadConstant 1) (LoadConstant 2))`. It can be parsed using the following sequence of builder calls:
+To parse a program to a bytecode root node, we use the generated `Builder` class to encode the program as a "tree" of operations.
+For each operation `X`, the builder defines `beginX` and `endX` methods that can be used to encode the operation.
+Simple operations that have no data dependency (i.e., no children) instead have `emitX` methods.
+The builder translates calls to these methods to a flat sequence of bytecodes that can be executed by the generated interpreter.
+
+As an example, the expression `1 + (2 + 3)` can be expressed as operations `(Add (LoadConstant 1) (Add (LoadConstant 2) (LoadConstant 3)))`. It can be parsed using the following sequence of builder calls:
 
 ```java
 b.beginAdd();
-b.emitLoadConstant(1);
-b.emitLoadConstant(2);
+  b.emitLoadConstant(1);
+  b.beginAdd();
+    b.emitLoadConstant(2);
+    b.emitLoadConstant(3);
+  b.endAdd();
 b.endAdd();
 ```
 
-You can think of the `beginX` and `endX` as opening and closing `<X>` and `</X>` XML tags, while `emitX` is the empty tag `<X/>` used when the operation does not take children. Each operation has either `beginX` and `endX` methods or an `emitX` method.
+This sequence of calls automatically produces bytecode to perform the computation.
 
-We can then write a visitor to construct bytecode from the AST representation:
+Observe that the sequence of builder calls is essentially a traversal of the operations "tree".
+A simple way to encode this traversal is with a visitor over the AST:
 
 ```java
 class ExampleBytecodeVisitor implements ExprVisitor {
@@ -156,14 +161,14 @@ class ExampleBytecodeVisitor implements ExprVisitor {
 }
 ```
 
-Now that we have a visitor, we can define a `parse` method. This method converts an AST to a `ExampleBytecodeRootNode`, which can then be executed by the language runtime:
+Now that we have a visitor, we can define a top-level `parse` method. This method converts an AST to an `ExampleBytecodeRootNode`, which can then be executed by the language runtime:
 
 ```java
 public static ExampleBytecodeRootNode parseExample(ExampleLanguage language, Expr program) {
     var nodes = ExampleBytecodeRootNodeGen.create(
         BytecodeConfig.DEFAULT,
         builder -> {
-            // Root operation must enclose each function. It is further explained later.
+            // Root operation must enclose each function. See the User Guide for details.
             builder.beginRoot(language);
 
             // This root node returns the result of executing the expression,
@@ -184,13 +189,21 @@ public static ExampleBytecodeRootNode parseExample(ExampleLanguage language, Exp
 }
 ```
 
-We first invoke the `ExampleBytecodeRootNodeGen#create` function, which is the entry-point for parsing. Its first argument is a `BytecodeConfig`, which defines a parsing mode. `BytecodeConfig.DEFAULT` will suffice for our purposes (there are other modes that include source positions and/or instrumentation info; see [Reparsing](#reparsing)).
+We first invoke the `ExampleBytecodeRootNodeGen#create` function, which is the entry-point for parsing.
+Its first argument is a `BytecodeConfig`, which defines a [parsing mode](UserGuide.md#parsing-modes).
+The default mode is sufficient for most use cases.
+
+The second argument is the parser. The parser implements the `BytecodeParser` functional interface, which uses a supplied `Builder` argument to parse a guest language program.
+In this example, the parser uses the visitor to parse `program`, wrapping the operations in `Root` and `Return` operations.
+The parser must be deterministic (i.e., each parse should produce the same sequence of `Builder` calls), since it may be called more than once to implement [reparsing](#reparsing).
 
-The second argument is the parser. The parser is an implementation of the `BytecodeParser` functional interface, which is responsible for parsing a program using a given `Builder` parameter.
-In this example, the parser uses the visitor to parse `program`, wrapping the operations within `Root` and `Return` operations.
-The parser must be deterministic (i.e., if invoked multiple times, it should invoke the same sequence of `Builder` methods), since it may be called more than once to implement reparsing (see [Reparsing](#reparsing)).
+The result is a `BytecodeNodes` instance, which acts as a wrapper class for the `BytecodeRootNode`s produced by the parse (along with other shared information). The nodes can be extracted using `getNode()` or `getNodes()`.
 
-The result is a `BytecodeNodes` instance, which acts as a wrapper class for the `BytecodeRootNode`s produced by the parse (along with other shared information). The nodes can be extracted using the `getNode()` or `getNodes()`.
+And that's it! The `parse` method returns a root node containing a sequence of bytecode.
+When the root node is invoked, it executes the bytecode using the generated bytecode interpreter.
 
-And that's it! During parsing, the builder generates a sequence of bytecode for each root node. The generated bytecode interpreter executes this bytecode sequence when a root node is executed.
+## Next steps
 
+This introduction covers the basics of Bytecode DSL.
+For more specific usage information, consult the [User guide](UserGuide.md) and [Javadoc](https://www.graalvm.org/truffle/javadoc/com/oracle/truffle/api/bytecode/package-summary.html).
+The Bytecode DSL implementations for [SimpleLanguage](https://github.com/oracle/graal/blob/master/truffle/src/com.oracle.truffle.sl/src/com/oracle/truffle/sl/bytecode/SLBytecodeRootNode.java) and [GraalPython](https://github.com/oracle/graalpython/blob/master/graalpython/com.oracle.graal.python/src/com/oracle/graal/python/nodes/bytecode_dsl/PBytecodeDSLRootNode.java) may also be useful references.
diff --git a/truffle/docs/bytecode_dsl/Optimization.md b/truffle/docs/bytecode_dsl/Optimization.md
@@ -1,15 +1,51 @@
 # Optimization
 
-Bytecode interpreters commonly employ [quickening](https://dl.acm.org/doi/10.1145/1869631.1869633) and [superinstructions](https://dl.acm.org/doi/abs/10.1145/1059579.1059583) to achieve better interpreted performance. This section discusses how to employ these optimizations in Bytecode DSL interpreters.
+Bytecode interpreters commonly employ a variety of optimizations to achieve better interpreted performance. 
+This section discusses how to employ these optimizations in Bytecode DSL interpreters.
+
+## Boxing elimination
+
+A major source of overhead in interpreted code (for both Truffle AST and bytecode interpreters) is boxing.
+By default, values are passed between operations as objects, which forces primitive values to be boxed up.
+Often, the boxed value is subsequently unboxed when it gets consumed.
+
+Boxing elimination avoids these unnecessary boxing steps.
+The interpreter can speculatively rewrite bytecode to pass primitive values between operations whenever possible.
+
+To enable boxing elimination, specify a set of `boxingEliminationTypes` on the `@GenerateBytecode` annotation. For example, the following configuration
+
+```
+@GenerateBytecode(
+    ...
+    boxingEliminationTypes = {long.class, boolean.class}
+)
+```
+
+will instruct the interpreter to automatically avoid boxing for `long` and `boolean` values.
+
+Boxing elimination is implemented using quickening, which is described below.
 
 ## Quickening
 
-**TODO**: talk about how quickening works, how it connects with BE, @ForceQuickening, and tracing
+[Quickening](https://dl.acm.org/doi/10.1145/1869631.1869633) is a general technique to rewrite an instruction with a specialized version that (typically) requires less work.
+Bytecode DSL supports quickened operations, which handle a subset of the specializations defined by an operation.
+
+Quickened operations can reduce the amount of work required to evaluate an operation.
+For example, a quickened operation that only accepts `int` inputs can avoid operand boxing and the additional type checks required by the general operation.
+
+Quickened instructions can be automatically derived using [tracing](#tracing), or specified manually using `@ForceQuickening`.
+
 
 ## Superinstructions
 
 **Note: Superinstructions are not yet supported**.
 
+[Superinstructions](https://dl.acm.org/doi/abs/10.1145/1059579.1059583) combine common sequences of instructions together into single instructions.
+Using superinstructions can reduce the overhead of instruction dispatch, and it can enable the host compiler to perform optimizations across the instructions (e.g., eliding a stack push for a value that is subsequently popped).
+
+Superinstructions can be automatically derived using [tracing](#tracing).
+
+
 ## Tracing
 
 **Note: Tracing is not yet supported**.
diff --git a/truffle/docs/bytecode_dsl/UserGuide.md b/truffle/docs/bytecode_dsl/UserGuide.md