@n-young-db
Contributor

What changes were proposed in this pull request?

  • Changed the tags variable of the TreeNode class to initialize lazily. This will reduce unnecessary driver memory pressure.
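The lazy-initialization idea described above can be sketched roughly as follows. This is a simplified illustration, not the actual Spark source: `Tag` and `Node` are hypothetical stand-ins for `TreeNodeTag` and `TreeNode`, and `tagsAllocated` exists only so the effect can be observed.

```scala
import scala.collection.mutable

// Hypothetical stand-in for Spark's TreeNodeTag; only the type parameter matters here.
final case class Tag[T](name: String)

// Hypothetical stand-in for TreeNode, reduced to the tag-handling logic.
class Node {
  // Backing field stays null until a tag is first written,
  // so nodes that never carry tags never allocate a map.
  private var _tags: mutable.Map[Tag[_], Any] = null

  // Allocates the map on first access and returns the same instance afterwards.
  private def tags: mutable.Map[Tag[_], Any] = {
    if (_tags eq null) _tags = mutable.Map.empty
    _tags
  }

  // True when no map has been allocated yet or the map holds no entries.
  def isTagsEmpty: Boolean = (_tags eq null) || _tags.isEmpty

  def setTagValue[T](tag: Tag[T], value: T): Unit = tags.update(tag, value)

  // The read path checks the backing field first, so a lookup never allocates.
  def getTagValue[T](tag: Tag[T]): Option[T] =
    if (isTagsEmpty) None else tags.get(tag).map(_.asInstanceOf[T])

  // Exposed only for this sketch, to show whether the map was allocated.
  def tagsAllocated: Boolean = _tags ne null
}
```

In this sketch, calling `getTagValue` on a fresh `Node` returns `None` without allocating anything; the map is created only by the first `setTagValue`.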

Why are the changes needed?

  • Plans with large expression or operator trees are known to cause driver memory pressure; this is one step in alleviating that issue.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing unit tests cover this behavior. Outward-facing behavior does not change.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Apr 29, 2024

  def getTagValue[T](tag: TreeNodeTag[T]): Option[T] = {
-   tags.get(tag).map(_.asInstanceOf[T])
+   if (_tags eq null) {
Contributor

I think it makes sense to add a comment clarifying the intent of using both _tags and tags here -- "To avoid initializing the tags when getTagValue is called on a TreeNode without any tags set"

Contributor Author

Hm, I think the intent is clear enough as is

@HyukjinKwon
Member

test failures look unrelated.

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Thank you, @n-young-db and all.

@dongjoon-hyun
Member

Merged to master for Apache Spark 4.0.0.

    // so we make a compromise here to copy tags to node with no tags
-   if (tags.isEmpty) {
+   if (isTagsEmpty && !other.isTagsEmpty) {
      tags ++= other.tags
Contributor

Since tags is a def, doesn't this line mutate nothing?

Contributor

Ah, the def probably returns a reference to the same underlying mutable map, so ++= mutates it in place.
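A small sketch of why `tags ++= other.tags` still has an effect when `tags` is a `def`: the accessor hands back a reference to one shared mutable map, and `++=` is a method call on that map rather than a reassignment. `Holder` is a hypothetical class used only for illustration, and `String` keys stand in for `TreeNodeTag`.

```scala
import scala.collection.mutable

// Hypothetical class for illustration; String keys stand in for TreeNodeTag.
class Holder {
  private var _tags: mutable.Map[String, Any] = null

  // A def, not a val: each call returns a reference to the same backing map,
  // allocating it on first use.
  def tags: mutable.Map[String, Any] = {
    if (_tags eq null) _tags = mutable.Map.empty
    _tags
  }

  def isTagsEmpty: Boolean = (_tags eq null) || _tags.isEmpty

  // Mirrors the shape of the condition in the diff: copy only into a node with no tags.
  def copyTagsFrom(other: Holder): Unit = {
    if (isTagsEmpty && !other.isTagsEmpty) {
      // `++=` is invoked on the map returned by `tags`, so the backing map is mutated.
      tags ++= other.tags
    }
  }
}
```

Because the map is mutable and shared by reference, entries added through the accessor remain visible on later calls to `tags`.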

JacobZheng0927 pushed a commit to JacobZheng0927/spark that referenced this pull request May 11, 2024
Closes apache#46285 from n-young-db/treenode-tags.

Authored-by: Nick Young <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>