@szehon-ho
Member

What changes were proposed in this pull request?

If spark.sql.v2.bucketing.allowJoinKeysSubsetOfPartitionKeys.enabled is true, change the KeyGroupedPartitioning.satisfies0(distribution) check from requiring all clustering keys (here, the join keys) to be among the partition keys, to requiring only that the two sets overlap.

Why are the changes needed?

When spark.sql.v2.bucketing.allowJoinKeysSubsetOfPartitionKeys.enabled is true, storage-partitioned join (SPJ) no longer triggers if there are more join keys than partition keys, even though SPJ is supported in that case when the flag is false.
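
To illustrate the difference between the two checks, here is a minimal sketch that models keys as plain strings. It is not the Catalyst implementation: the real `satisfies0` compares expressions semantically and may apply further conditions that are omitted here. The names `SatisfiesSketch`, `allJoinKeysInPartitionKeys`, and `keysOverlap` are hypothetical and chosen only for this example.

```scala
// Simplified model of the check: keys are plain strings, not Catalyst expressions.
object SatisfiesSketch {
  // Behaviour described as the old check (flag enabled):
  // every join key must also be a partition key.
  def allJoinKeysInPartitionKeys(joinKeys: Seq[String], partitionKeys: Seq[String]): Boolean =
    joinKeys.forall(k => partitionKeys.contains(k))

  // Behaviour described as the new check (flag enabled):
  // the join keys and partition keys only need to share at least one key.
  def keysOverlap(joinKeys: Seq[String], partitionKeys: Seq[String]): Boolean =
    joinKeys.exists(k => partitionKeys.contains(k))
}
```

With a table partitioned on `id` and a join on `id` and `name` (more join keys than partition keys), the old check rejects the partitioning while the overlap check accepts it. With a join on `id` only and partition keys `id` and `date`, both checks accept it.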

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added tests in KeyGroupedPartitioningSuite

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label May 1, 2024
@szehon-ho szehon-ho force-pushed the fix_spj_less_join_key branch from 18aec40 to c2c2659 Compare May 2, 2024 00:49
@szehon-ho
Member Author

@sunchao I think it's a simple fix, can you take a look?

Member

@sunchao sunchao left a comment

LGTM

@sunchao sunchao closed this in 5ec62a7 May 2, 2024
@sunchao
Member

sunchao commented May 2, 2024

Merged to master, thanks @szehon-ho ! Do you think we need to backport this to branch-3.4 and branch-3.5?

@szehon-ho
Member Author

Thanks for the fast review! Yes, will do that.

@szehon-ho
Member Author

Actually, I just checked: it looks like the original PR #42306 was not backported because it is a new feature and not a bug fix. So I think there is no need.

IgorBerman pushed a commit to IgorBerman/spark that referenced this pull request Jun 18, 2025
Closes apache#46325 from szehon-ho/fix_spj_less_join_key.

Authored-by: Szehon Ho <[email protected]>
Signed-off-by: Chao Sun <[email protected]>
(cherry picked from commit 5ec62a7)
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025