You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[SPARK-38124][SQL][SS] Introduce StatefulOpClusteredDistribution and apply to stream-stream join
### What changes were proposed in this pull request?
This PR revives `HashClusteredDistribution` and renames to `StatefulOpClusteredDistribution` so that the rationalization of the distribution is clear from the name. Renaming is safe because this class no longer needs to be general one - in SPARK-35703 we moved out the usages of `HashClusteredDistribution` to `ClusteredDistribution`; stateful operators are exceptions.
Only `HashPartitioning` with same expressions and number of partitions can satisfy `StatefulOpClusteredDistribution`. That said, we cannot modify `HashPartitioning` unless we clone `HashPartitioning` and assign the clone to `StatefulOpClusteredDistribution`.
This PR documents the expectation of stateful operator on partitioning in the classdoc of `StatefulOpClusteredDistribution`.
This PR also changes stream-stream join to use `StatefulOpClusteredDistribution` instead of `ClusteredDistribution`. This effectively reverts a part of SPARK-35703 which hasn't been shipped to any releases. This PR doesn't deal with other stateful operators since it has been long standing issue (probably Spark 2.2.0+) and we need a plan for dealing with existing state.
### Why are the changes needed?
Spark does not guarantee stable physical partitioning for stateful operators across query lifetime, and due to the relaxed distribution requirement it is hard to expect what would be the current physical partitioning of the state.
(We expect hash partitioning with grouping keys, but ClusteredDistribution does not "guarantee" the partitioning. It is much more relaxed.)
This PR will enforce the physical partitioning of stream-stream join operators to be hash partition with grouping keys, which is our general expectation of state store partitioning.
### Does this PR introduce _any_ user-facing change?
No, since SPARK-35703 hasn't been shipped to any release yet.
### How was this patch tested?
Existing tests.
Closes#35419 from HeartSaVioR/SPARK-38124.
Authored-by: Jungtaek Lim <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>
0 commit comments