Skip to content

Conversation

@mcdull-zhang
Copy link
Contributor

What changes were proposed in this pull request?

UnsafeHashedRelation should serialize numKeys out

Why are the changes needed?

One case I found was this:
We turned on ReusedExchange(BroadcastExchange), but the returned UnsafeHashedRelation is missing numKeys.

The reason is that the current type of TorrentBroadcast._value is SoftReference, so the UnsafeHashedRelation obtained by deserialization loses numKeys, which will lead to incorrect calculation results.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added a line of assert to an existing unit test

@github-actions github-actions bot added the SQL label Mar 13, 2022
Copy link
Member

@wangyum wangyum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also encountered this issue after replacing SoftReference with WeakReference. cc @cloud-fan

readInt: () => Int,
readLong: () => Long,
readBuffer: (Array[Byte], Int, Int) => Unit): Unit = {
numKeys = readInt()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how did we get the num keys before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field has not been serialized from the beginning.
The reason why this problem was not triggered before is because TorrentBroadcast._value is like this:
private lazy val _value: T = readBroadcastBlock()

It was modified after #22995 and became like this:
private var _value: SoftReference[T] = _

The driver side may deserialize UnsafeHashedRelation, and then cause this problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The executors also need to deserialize the hash relation, why they are fine?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the numKeys field is only used in the UnsafeHashedRelation.keys() method, and UnsafeHashedRelation.keys() is only used on the driver side, so the executor will not trigger this problem.

And the numKeys field does not affect the execution of the UnsafeHashedRelation.get() method, so there was no problem before.

My personal understanding, you can also help analyze.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@cloud-fan
Copy link
Contributor

thanks, merging to master/3.2!

@cloud-fan cloud-fan closed this in 8476c8b Mar 16, 2022
cloud-fan pushed a commit that referenced this pull request Mar 16, 2022
### What changes were proposed in this pull request?
UnsafeHashedRelation should serialize numKeys out

### Why are the changes needed?
One case I found was this:
We turned on ReusedExchange(BroadcastExchange), but the returned UnsafeHashedRelation is missing numKeys.

The reason is that the current type of TorrentBroadcast._value is SoftReference, so the UnsafeHashedRelation obtained by deserialization loses numKeys, which will lead to incorrect calculation results.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added a line of assert to an existing unit test

Closes #35836 from mcdull-zhang/UnsafeHashed.

Authored-by: mcdull-zhang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 8476c8b)
Signed-off-by: Wenchen Fan <[email protected]>
@ulysses-you
Copy link
Contributor

@cloud-fan it seems this pr does not backport into branch-3.3, guess the master has become 3.4 at that time ...

@cloud-fan
Copy link
Contributor

hmm, @mcdull-zhang can you open a backport PR for 3.3? thanks!

cloud-fan pushed a commit that referenced this pull request Mar 23, 2022
### What changes were proposed in this pull request?
UnsafeHashedRelation should serialize numKeys out

### Why are the changes needed?
One case I found was this:
We turned on ReusedExchange(BroadcastExchange), but the returned UnsafeHashedRelation is missing numKeys.

The reason is that the current type of TorrentBroadcast._value is SoftReference, so the UnsafeHashedRelation obtained by deserialization loses numKeys, which will lead to incorrect calculation results.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added a line of assert to an existing unit test

Closes #35836 from mcdull-zhang/UnsafeHashed.

Authored-by: mcdull-zhang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
@cloud-fan
Copy link
Contributor

I've backported it to 3.3

@mcdull-zhang
Copy link
Contributor Author

hmm, @mcdull-zhang can you open a backport PR for 3.3? thanks!

oh sorry, I just saw these conversations.

@mcdull-zhang mcdull-zhang deleted the UnsafeHashed branch March 24, 2022 12:26
kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022
### What changes were proposed in this pull request?
UnsafeHashedRelation should serialize numKeys out

### Why are the changes needed?
One case I found was this:
We turned on ReusedExchange(BroadcastExchange), but the returned UnsafeHashedRelation is missing numKeys.

The reason is that the current type of TorrentBroadcast._value is SoftReference, so the UnsafeHashedRelation obtained by deserialization loses numKeys, which will lead to incorrect calculation results.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added a line of assert to an existing unit test

Closes apache#35836 from mcdull-zhang/UnsafeHashed.

Authored-by: mcdull-zhang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 8476c8b)
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants