Commit 5d09828

JoshRosen authored and cloud-fan committed
[SPARK-37447][SQL] Cache LogicalPlan.isStreaming() result in a lazy val
### What changes were proposed in this pull request?

This PR adds caching to `LogicalPlan.isStreaming()`: the default implementation's result will now be cached in a `private lazy val`.

### Why are the changes needed?

This improves the performance of the `DeduplicateRelations` analyzer rule.

The default implementation of `isStreaming` recursively visits every node in the tree. `DeduplicateRelations.renewDuplicatedRelations` is recursively invoked on every node in the tree and each invocation calls `isStreaming`. This leads to `O(n^2)` invocations of `isStreaming` on leaf nodes.

Caching `isStreaming` avoids this performance problem.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Correctness should be covered by existing tests.

This significantly improved `DeduplicateRelations` performance in local microbenchmarking with large query plans (~20% reduction in that rule's runtime in one of my tests).

Closes #34691 from JoshRosen/cache-LogicalPlan.isStreaming.

Authored-by: Josh Rosen <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
1 parent a3886ba commit 5d09828

1 file changed: +2 -1 lines changed
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala

Lines changed: 2 additions & 1 deletion
@@ -41,7 +41,8 @@ abstract class LogicalPlan
   def metadataOutput: Seq[Attribute] = children.flatMap(_.metadataOutput)

   /** Returns true if this subtree has data from a streaming data source. */
-  def isStreaming: Boolean = children.exists(_.isStreaming)
+  def isStreaming: Boolean = _isStreaming
+  private[this] lazy val _isStreaming = children.exists(_.isStreaming)

   override def verboseStringWithSuffix(maxFields: Int): String = {
     super.verboseString(maxFields) + statsCache.map(", " + _.toString).getOrElse("")
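
To make the quadratic behavior described in the commit message concrete, here is a minimal, self-contained sketch. It does not use Spark: `Plan`, `Leaf`, `Uncached`, `Cached`, and `countCalls` are hypothetical names invented for illustration, and the tree is a simple linear chain rather than a real query plan. The sketch counts every `isStreaming` invocation (not only those on leaves) while a stand-in for the analyzer rule calls `isStreaming` once on each node.

```scala
// Toy model only: `Plan`, `Leaf`, `Uncached`, `Cached` and `countCalls` are
// hypothetical names for illustration, not Spark classes.
object IsStreamingCachingSketch {
  // Total number of isStreaming invocations across all nodes.
  private var calls = 0L

  abstract class Plan {
    def children: Seq[Plan]
    def isStreaming: Boolean
  }

  final class Leaf extends Plan {
    val children: Seq[Plan] = Nil
    def isStreaming: Boolean = { calls += 1; false }
  }

  // Pre-patch shape: every call re-walks the whole subtree.
  final class Uncached(child: Plan) extends Plan {
    val children: Seq[Plan] = Seq(child)
    def isStreaming: Boolean = { calls += 1; children.exists(_.isStreaming) }
  }

  // Post-patch shape: the subtree is walked once, then the result is reused.
  final class Cached(child: Plan) extends Plan {
    val children: Seq[Plan] = Seq(child)
    def isStreaming: Boolean = { calls += 1; _isStreaming }
    private[this] lazy val _isStreaming = children.exists(_.isStreaming)
  }

  // Builds a chain of `depth` unary nodes over one leaf, then calls
  // isStreaming once on every node, as a per-node rule would.
  def countCalls(depth: Int, cached: Boolean): Long = {
    val leaf: Plan = new Leaf
    val wrap: Plan => Plan =
      if (cached) (p: Plan) => new Cached(p) else (p: Plan) => new Uncached(p)
    val nodes = (1 to depth).scanLeft(leaf)((child, _) => wrap(child))
    calls = 0L
    nodes.foreach(_.isStreaming)
    calls
  }

  def main(args: Array[String]): Unit = {
    println(s"uncached: ${countCalls(100, cached = false)}") // uncached: 5151
    println(s"cached:   ${countCalls(100, cached = true)}")  // cached:   201
  }
}
```

For a chain of depth n, the uncached variant makes (n + 1)(n + 2) / 2 invocations while the cached variant makes 2n + 1. Because a `lazy val` is evaluated at most once per instance, the cache is only sound if a node's children never change after construction, which holds for this toy model by design.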
