Commit 5d09828
[SPARK-37447][SQL] Cache LogicalPlan.isStreaming() result in a lazy val
### What changes were proposed in this pull request?
This PR adds caching to `LogicalPlan.isStreaming()`: the default implementation's result will now be cached in a `private lazy val`.
### Why are the changes needed?
This improves the performance of the `DeduplicateRelations` analyzer rule.
The default implementation of `isStreaming` recursively visits every node in the tree rooted at the node it is called on. `DeduplicateRelations.renewDuplicatedRelations` is itself invoked recursively on every node in the tree, and each invocation calls `isStreaming`, so the same subtrees are traversed again at every level. This leads to `O(n^2)` invocations of `isStreaming` on leaf nodes.
Caching `isStreaming` avoids this performance problem.
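The effect can be illustrated with a small, self-contained sketch. This is a toy tree, not Spark's `LogicalPlan`: the names `IsStreamingSketch`, `Node`, `Leaf`, `Uncached`, `Cached`, `chain`, `visitAll`, and `countCalls` are all hypothetical and exist only for this example. It counts how many times `isStreaming` runs when a rule-like traversal calls it at every node, with and without a `lazy val` cache.

```scala
// Toy stand-in for a plan tree. This is NOT Spark's LogicalPlan; every name
// here is hypothetical and chosen only for illustration.
object IsStreamingSketch {
  // Total number of isStreaming invocations across all nodes.
  var calls = 0

  sealed trait Node {
    def children: Seq[Node]
    def isStreaming: Boolean
  }

  final case class Leaf(streaming: Boolean) extends Node {
    val children: Seq[Node] = Nil
    def isStreaming: Boolean = { calls += 1; streaming }
  }

  // Recomputes the answer on every call by walking the subtree again.
  final class Uncached(val children: Seq[Node]) extends Node {
    def isStreaming: Boolean = { calls += 1; children.exists(_.isStreaming) }
  }

  // Computes the answer once per node and reuses it afterwards.
  final class Cached(val children: Seq[Node]) extends Node {
    private lazy val cachedIsStreaming: Boolean = children.exists(_.isStreaming)
    def isStreaming: Boolean = { calls += 1; cachedIsStreaming }
  }

  // Builds a chain of `depth` unary nodes above a single non-streaming leaf.
  def chain(depth: Int, mk: Seq[Node] => Node): Node =
    (1 to depth).foldLeft[Node](Leaf(streaming = false))((child, _) => mk(Seq(child)))

  // Mimics a rule that recurses into every node and calls isStreaming at each.
  def visitAll(node: Node): Unit = {
    node.isStreaming
    node.children.foreach(visitAll)
  }

  // Resets the counter, runs the traversal, and reports the number of calls.
  def countCalls(root: Node): Int = {
    calls = 0
    visitAll(root)
    calls
  }
}
```

For a chain of 100 unary nodes above one leaf, `countCalls(chain(100, new Uncached(_)))` yields 5151 calls (quadratic in the depth), while `countCalls(chain(100, new Cached(_)))` yields 201 (linear). The sketch assumes the tree is immutable, which is what makes caching the result safe.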
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Correctness should be covered by existing tests.
In local microbenchmarks with large query plans, this change significantly improved `DeduplicateRelations` performance (a ~20% reduction in that rule's runtime in one of my tests).
Closes #34691 from JoshRosen/cache-LogicalPlan.isStreaming.
Authored-by: Josh Rosen <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
1 parent a3886ba commit 5d09828
1 file changed, with 2 additions and 1 deletion, in sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical