-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-48047][SQL] Reduce memory pressure of empty TreeNode tags #46285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala
Outdated
Show resolved
Hide resolved
|
|
||
| def getTagValue[T](tag: TreeNodeTag[T]): Option[T] = { | ||
| tags.get(tag).map(_.asInstanceOf[T]) | ||
| if (_tags eq null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to add a comment clarifying the intent of using both _tags and tags here -- "To avoid initializing the tags when getTagValue is called on a TreeNode without any tags set"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I think the intent is clear enough as is
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala
Show resolved
Hide resolved
|
test failures look unrelated. |
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Thank you, @n-young-db and all.
|
Merged to master for Apache Spark 4.0.0. |
| // so we make a compromise here to copy tags to node with no tags | ||
| if (tags.isEmpty) { | ||
| if (isTagsEmpty && !other.isTagsEmpty) { | ||
| tags ++= other.tags |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since tags are def, doesn't this line mutate nothing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, the map is probably pass-by reference.
### What changes were proposed in this pull request? - Changed the `tags` variable of the `TreeNode` class to initialize lazily. This will reduce unnecessary driver memory pressure. ### Why are the changes needed? - Plans with large expression or operator trees are known to cause driver memory pressure; this is one step in alleviating that issue. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing UT covers behavior. Outwards facing behavior does not change. ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#46285 from n-young-db/treenode-tags. Authored-by: Nick Young <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
tagsvariable of theTreeNodeclass to initialize lazily. This will reduce unnecessary driver memory pressure.Why are the changes needed?
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing UT covers behavior. Outwards facing behavior does not change.
Was this patch authored or co-authored using generative AI tooling?
No