[SPARK-12125][SQL] pull out nondeterministic expressions from Join #10128
|
@rxin @cloud-fan @chenghao-intel @jeanlyn Could you give some suggestions on this PR? |
|
I think it's not a good example to show that we need to allow nondeterministic expressions in the join condition. We can use |
|
I can understand the motivation for this change. We do have a workaround for relieving the data skew, but we probably don't want to change existing SQL queries that were written for a legacy system (like Hive). |
|
@cloud-fan I think your case is different from @zhonghaihua's. The SQL only deals with some join keys ('' and null) before the shuffle, to keep those meaningless keys from causing skew during the join operator, while |
|
@cloud-fan Thanks for your advice. But, as @jeanlyn said, |
|
This seems like a reasonable thing to do, but the implementation seems unnecessarily complex. Why not just:
|
|
ok to test |
|
Test build #48778 has finished for PR 10128 at commit
|
|
@marmbrus Thanks for your suggestions. I think your idea solves the problem simply, but in some situations it does not seem very suitable. When |
|
Test build #48833 has finished for PR 10128 at commit
|
Why? Multiplying by |
|
@marmbrus you are right. But I think @zhonghaihua's solution is trying to reduce the possibility of a Cartesian product, right? |
|
How does this make a difference in join selection? I think the logic in |
|
Oh, I see what you are trying to do. Hmm, this feels like a hack to me. If we want to fix skewed joins I'm thinking we need a more principled solution. |
|
It's different from join selection: it just pulls out the nondeterministic expressions of the join condition into the left or right children, but it seems that it can reuse the code of |
|
Thanks for the pull request. I'm going through a list of pull requests to cut them down since the sheer number is breaking some of the tooling we have. Due to lack of activity on this pull request, I'm going to push a commit to close it. Feel free to reopen it or create a new one. |
Currently, nondeterministic expressions are only allowed in Project or Filter, and they can only be pulled out when they are used in a UnaryNode. However, in many cases we use nondeterministic expressions to process join keys in order to avoid data skew, for example:

This PR introduces a mechanism to pull out nondeterministic expressions from Join, so that nondeterministic expressions can be used in Join appropriately.
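
To make the idea of "pulling out" concrete, here is a minimal, self-contained sketch in plain Python. It is a toy model only and is not the Catalyst code in this PR: the names Expr, EqualTo and pull_out_nondeterministic, the alias format, and the skew-handling CASE expression are all hypothetical and chosen for illustration. Each nondeterministic side of an equi-join condition is replaced by a reference to a fresh alias, and the original expression is moved into a projection on the child it came from, so it is evaluated once per row before the join.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Expr:
    """Toy expression: SQL text plus a determinism flag (hypothetical model)."""
    sql: str
    deterministic: bool = True


@dataclass(frozen=True)
class EqualTo:
    """One equality of the join condition: left-child expr = right-child expr."""
    left: Expr
    right: Expr


Projection = List[Tuple[str, Expr]]  # (alias, expression) pairs added to a child


def pull_out_nondeterministic(
    condition: List[EqualTo],
) -> Tuple[Projection, Projection, List[EqualTo]]:
    """Move nondeterministic expressions out of the join condition.

    Returns the extra projections for the left and right children and the
    rewritten condition, which now only references deterministic aliases.
    """
    left_proj: Projection = []
    right_proj: Projection = []

    def extract(expr: Expr, proj: Projection, prefix: str) -> Expr:
        if expr.deterministic:
            return expr  # nothing to do, keep the expression in the condition
        alias = f"_{prefix}_nondeterministic_{len(proj)}"  # hypothetical naming
        proj.append((alias, expr))  # evaluated in the child, once per row
        return Expr(alias)  # a plain column reference is deterministic

    new_condition = [
        EqualTo(extract(eq.left, left_proj, "left"),
                extract(eq.right, right_proj, "right"))
        for eq in condition
    ]
    return left_proj, right_proj, new_condition


# Hypothetical skew-handling join key: spread empty/null keys with rand().
skewed_key = Expr(
    "CASE WHEN a.key IS NULL OR a.key = '' THEN concat('skew_', rand()) "
    "ELSE a.key END",
    deterministic=False,
)
left, right, cond = pull_out_nondeterministic([EqualTo(skewed_key, Expr("b.key"))])
print(left)   # projection added under the left child
print(right)  # empty: the right side was already deterministic
print(cond)   # condition now compares the alias with b.key
```

In this model the join itself only ever sees deterministic column references, which is the property the description above is after; how the real rule names the aliases or restores the original output columns is not shown here.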