[SPARK-21520][SQL] Improve a special case for non-deterministic projects in the optimizer #18892
ok to test
@heary-cao can you improve the PR description and explain which special cases you are improving?
It sounds like we should not do this in the optimizer. The same comment also applies to #18918.
A PR that makes this fix in PhysicalOperation instead has been submitted as #18969.
What changes were proposed in this pull request?
Currently, the optimizer has a lot of special handling for non-deterministic projects and filters, but it is not good enough. This patch adds a new special case for non-deterministic projects, so that only the fields the query actually needs are read.
For example, when the project list contains a non-deterministic function (such as rand), the executedPlan that the optimizer generates has HiveTableScan read all the fields from the table, although only 'd004' is needed. This hurts task performance.
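The intent can be illustrated with a toy model of column pruning. This is only a sketch in plain Python, not Spark code: the `Expr` class, the `required_columns` helper, and the column names other than `d004` are hypothetical and exist purely for illustration. It assumes that a non-deterministic expression such as rand reads no table columns itself, so the scan only needs the columns referenced by the project list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """Hypothetical stand-in for a projected expression."""
    name: str                   # output name of the expression
    references: frozenset       # table columns the expression reads
    deterministic: bool = True  # False for functions such as rand


def required_columns(project_list, table_columns):
    """Return the table columns the scan must read, in table order.

    The result is the union of the columns referenced by every projected
    expression, whether or not that expression is deterministic.
    """
    needed = set()
    for expr in project_list:
        needed |= expr.references
    return [col for col in table_columns if col in needed]


# Hypothetical table layout; only 'd004' comes from the description above.
table_columns = ["d001", "d002", "d003", "d004"]

project_list = [
    Expr("d004", frozenset({"d004"})),
    # A rand-like expression: non-deterministic, reads no table columns.
    Expr("r", frozenset(), deterministic=False),
]

pruned = required_columns(project_list, table_columns)
print(pruned)  # ['d004']
```

In this model the scan is narrowed to `d004` even though the project list contains a non-deterministic expression, which is the behavior the patch aims for. Whether such pruning belongs in the optimizer or in PhysicalOperation is the open question in the review comments.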
How was this patch tested?
This should be covered by existing test cases, plus the test cases added in this patch.