dotnet · jamesmontemagno · Apr 7, 2025 · Mar 27, 2025 · Apr 3, 2025 · Apr 3, 2025
diff --git a/release-notes/10.0/preview/preview3/runtime.md b/release-notes/10.0/preview/preview3/runtime.md
@@ -8,6 +8,94 @@ Here's a summary of what's new in the .NET Runtime in this preview release:
 
 - [What's new in .NET 10](https://learn.microsoft.com/dotnet/core/whats-new/dotnet-10/overview) documentation
 
-## Feature
+## Stack Allocation of Small Arrays of Reference Types
 
-Something about the feature
+Since .NET 9's release, we have introduced new enhancements to the JIT compiler's ability to stack-allocate objects that don't outlive their creation contexts. Preview 1 expanded the JIT's stack allocation optimization to small, fixed-sized arrays of value types. This means small arrays of types not tracked by the garbage collector (GC) are allocated on the stack instead of the heap when it is safe to do so, reducing GC pressure and unlocking additional optimizations like scalar promotion. However, this optimization would not kick in for examples like the below:
+
+```csharp
+static void Print()
+{
+    string[] words = {"Hello", "World!"};
+    foreach (var str in words)
+    {
+        Console.WriteLine(str);
+    }
+}
+```
+
+The lifetime of `words` is scoped to the `Print` method, and the JIT can already stack-allocate the strings `"Hello"` and `"world!"`. However, the fact that `words` is an array of `strings`, a reference type, would previously stop the JIT from stack-allocating it. Now, the JIT can eliminate every heap allocation in the above example. At the assembly level, the code for `Print` used to look like this:
+```asm
+Program:Print() (FullOpts):
+       push     rbp
+       push     r15
+       push     rbx
+       lea      rbp, [rsp+0x10]
+       mov      rdi, 0x7624BAEF8360      ; System.String[]
+       mov      esi, 2
+       call     CORINFO_HELP_NEWARR_1_OBJ
+       mov      rdi, 0x762534A02D10      ; 'Hello'
+       mov      gword ptr [rax+0x10], rdi
+       mov      rdi, 0x762534A02D30      ; 'World!'
+       mov      gword ptr [rax+0x18], rdi
+       lea      rbx, bword ptr [rax+0x10]
+       mov      r15d, 2
+G_M2084_IG03:  ;; offset=0x0043
+       mov      rdi, gword ptr [rbx]
+       call     [System.Console:WriteLine(System.String)]
+       add      rbx, 8
+       dec      r15d
+       jne      SHORT G_M2084_IG03
+       pop      rbx
+       pop      r15
+       pop      rbp
+       ret
+```
+
+Notice how we call `CORINFO_HELP_NEWARR_1_OBJ` to allocate `words` on the heap. Now, the assembly looks like this:
+```
+Program:Print() (FullOpts):
+       push     rbp
+       push     r15
+       push     rbx
+       sub      rsp, 32
+       lea      rbp, [rsp+0x30]
+       vxorps   xmm8, xmm8, xmm8
+       vmovdqu  ymmword ptr [rbp-0x30], ymm8
+       mov      rdi, 0x700BFC98B0C0      ; System.String[]
+       mov      qword ptr [rbp-0x30], rdi
+       lea      rdi, [rbp-0x30]
+       mov      dword ptr [rdi+0x08], 2
+       lea      rbx, [rbp-0x30]
+       mov      rdi, 0x700C76402D10      ; 'Hello'
+       mov      gword ptr [rbx+0x10], rdi
+       mov      rdi, 0x700C76402D30      ; 'World!'
+       mov      gword ptr [rbx+0x18], rdi
+       add      rbx, 16
+       mov      r15d, 2
+G_M2084_IG03:  ;; offset=0x005A
+       mov      rdi, gword ptr [rbx]
+       call     [System.Console:WriteLine(System.String)]
+       add      rbx, 8
+       dec      r15d
+       jne      SHORT G_M2084_IG03
+       add      rsp, 32
+       pop      rbx
+       pop      r15
+       pop      rbp
+       ret
+```
+
+For more information on stack allocation improvements in the JIT compiler, check out [dotnet/runtime #108913](https://github.com/dotnet/runtime/issues/108913).
+
+## Improved Code Layout
+
+The JIT compiler organizes your method's code into basic blocks that can only be entered at the first instruction, and exited via the last instruction. As long as the JIT appends a jump instruction to the end of each block, the program can be laid out in any block order without changing runtime behavior. However, some layouts produce better runtime performance than others:
+
+* Placing a block before its successor in memory means the JIT does not need to emit a jump instruction to it, reducing code size and the potential for CPU pipelining penalties.
+* Placing frequently-executed blocks near each other increases their likelihood of sharing an instruction cache line, reducing instruction cache misses.
+
+Thus, the JIT has an optimization where it tries to find a block ordering that exhibits the above traits. Previously, the JIT would compute a reverse postorder (RPO) traversal of the program's flowgraph as an initial layout, and then make iterative transformations to it. RPO tends to produce layouts with little branching by placing each block before its successors, unless the block is in a loop. If profile data suggests a block is rarely executed, the JIT then moves it to the end of the method to compact the hotter parts of the method. Finally, the JIT tries to eliminate hot branches by moving the successor of the branch up to its predecessor.
+
+The challenge of finding a performant block layout stems from the fact that the above goals are frequently orthogonal to each other. For example, in the process of eliminating a hot branch, the JIT might move other hot blocks further away from their successors, reducing hot code density overall. Because each transformation is local in scope, it's difficult to model how one transformation's changes affect another's. To solve this, the JIT now models the block reordering problem as a reduction of the asymmetric [Travelling Salesman Problem](https://en.wikipedia.org/wiki/Travelling_salesman_problem), and implements the [3-opt](https://en.wikipedia.org/wiki/3-opt) heuristic for finding a near-optimal traversal. The "distance" between each block is modeled by the execution count of the preceding block, multiplied by the likelihood that the block branches to its successor. The JIT then searches for a layout with the shortest distance from the method entry to the method exit, frequently yielding a layout with dense hot paths, and relatively short branches.
+
+To learn more about improvements to code layout in .NET 10, check out [dotnet/runtime #107749](https://github.com/dotnet/runtime/issues/107749).