Commit a8c08b1

rednaxelafx authored and maropu committed
[SPARK-31187][SQL] Sort the whole-stage codegen debug output by codegenStageId
### What changes were proposed in this pull request?

Spark SQL's whole-stage codegen (WSCG) supports dumping the generated code to help with debugging. One way to get the generated code is through `df.queryExecution.debug.codegen`, or SQL `EXPLAIN CODEGEN` statement.

The generated code is currently printed without specific ordering, which can make debugging a bit annoying. This PR makes a minor improvement to sort the codegen dump by the `codegenStageId`, ascending.

After this change, the following query:

```scala
spark.range(10).agg(sum('id)).queryExecution.debug.codegen
```

will always dump the generated code in a natural, stable order. A version of this example with shorter output is:

```
spark.range(10).agg(sum('id)).queryExecution.debug.codegenToSeq.map(_._1).foreach(println)
*(1) HashAggregate(keys=[], functions=[partial_sum(id#8L)], output=[sum#15L])
+- *(1) Range (0, 10, step=1, splits=16)

*(2) HashAggregate(keys=[], functions=[sum(id#8L)], output=[sum(id)#12L])
+- Exchange SinglePartition, true, [id=#30]
   +- *(1) HashAggregate(keys=[], functions=[partial_sum(id#8L)], output=[sum#15L])
      +- *(1) Range (0, 10, step=1, splits=16)
```

The number of codegen stages within a single SQL query tends to be very small, most likely < 50, so the overhead of adding the sorting shouldn't be significant.

### Why are the changes needed?

Minor improvement to aid WSCG debugging.

### Does this PR introduce any user-facing change?

No user-facing change for end-users; minor change for developers who debug WSCG generated code.

### How was this patch tested?

Manually tested the output; all other tests still pass.

Closes #27955 from rednaxelafx/codegen.

Authored-by: Kris Mok <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
(cherry picked from commit a177628)
Signed-off-by: Takeshi Yamamuro <[email protected]>
1 parent d712a7a commit a8c08b1

1 file changed: +1 -1 lines changed
  • sql/core/src/main/scala/org/apache/spark/sql/execution/debug

sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala

Lines changed: 1 addition & 1 deletion
@@ -113,7 +113,7 @@ package object debug {
         s
       case s => s
     }
-    codegenSubtrees.toSeq.map { subtree =>
+    codegenSubtrees.toSeq.sortBy(_.codegenStageId).map { subtree =>
       val (_, source) = subtree.doCodeGen()
       val codeStats = try {
         CodeGenerator.compile(source)._2
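
The effect of the one-line change can be sketched outside Spark. The snippet below is only an illustration, not the code in `package.scala`: `Subtree` is a hypothetical stand-in for the real codegen subtree class, and it assumes the subtrees are gathered in an unordered mutable set, which the hunk above does not show. Converting such a set with `toSeq` gives no ordering guarantee, while adding `sortBy(_.codegenStageId)` yields ascending stage IDs regardless of insertion or hash order.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a codegen subtree; only the stage ID matters here.
final case class Subtree(codegenStageId: Int, description: String)

object CodegenOrderSketch {
  // Mirrors the shape of the changed line: set -> Seq -> sortBy(stage ID).
  def inStageOrder(subtrees: mutable.HashSet[Subtree]): Seq[Subtree] =
    subtrees.toSeq.sortBy(_.codegenStageId)

  def main(args: Array[String]): Unit = {
    // Inserted out of order on purpose; a hash set does not preserve this order anyway.
    val subtrees = mutable.HashSet(
      Subtree(3, "Project"),
      Subtree(1, "partial HashAggregate"),
      Subtree(2, "final HashAggregate"))

    // Prints the stages with IDs 1, 2, 3 in that order.
    inStageOrder(subtrees).foreach { s =>
      println(s"*(${s.codegenStageId}) ${s.description}")
    }
  }
}
```

Because the sort key is a plain `Int` and, per the commit message, a query rarely has more than a few dozen stages, the extra `sortBy` is a negligible cost next to generating and compiling the code.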
