[SPARK-6910] [SQL] Support for pushing predicates down to metastore for partition pruning #7421
|
Test build #37372 has finished for PR 7421 at commit
|
|
Hmm, Jenkins failed for an unknown reason. |
|
ok to test |
|
Test build #37421 has finished for PR 7421 at commit
|
|
I'm going to trigger a bunch of test runs. Let's see what happens... |
|
Test build #1087 has finished for PR 7421 at commit
|
|
Test build #1083 has finished for PR 7421 at commit
|
|
Test build #1085 has finished for PR 7421 at commit
|
|
Test build #1086 has finished for PR 7421 at commit
|
|
Test build #1084 has finished for PR 7421 at commit
|
|
4/5 passed... Do you think this is because of multithreading in unit tests? Otherwise, I have no explanation. |
|
getPartitionsByFilter is really a great improvement. In a production Hive data warehouse, there are normally tables with a huge number of partitions. Looking forward to seeing this included in the next release :) |
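For readers new to the idea, here is a minimal sketch of what "pushing predicates down to the metastore" means: the predicates on partition columns are rendered into a single filter string that the metastore evaluates, so only matching partitions come back. Everything in it (`Predicate`, `to_metastore_filter`, the quoting rules) is hypothetical and for illustration only. It is not the code in this PR, and the filter syntax is not guaranteed to match what Hive accepts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    """Hypothetical comparison on a partition column."""
    column: str    # partition column name, e.g. "ds"
    op: str        # comparison operator, e.g. "=", "<", ">="
    value: object  # literal to compare against


def to_metastore_filter(predicates):
    """Render the predicates as one filter string joined by 'and'.

    String literals are double-quoted; other values are rendered with str().
    This is an illustrative format, not the exact Hive filter grammar.
    """
    parts = []
    for p in predicates:
        if isinstance(p.value, str):
            literal = '"' + p.value.replace('"', '\\"') + '"'
        else:
            literal = str(p.value)
        parts.append(f"{p.column} {p.op} {literal}")
    return " and ".join(parts)


filter_string = to_metastore_filter([
    Predicate("ds", "=", "2015-07-01"),
    Predicate("hr", ">=", 12),
])
print(filter_string)
```

The point of the design is that the table scan never has to list every partition on the client side. With a huge number of partitions, that listing is the expensive part.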
|
I am repeatedly running the sql/hive unit tests after synchronizing |
|
I am pretty confused about why this is failing, but only sometimes. I don't think it could be locking because Have you been able to reproduce the failure locally? |
|
Investigated the following 3 build failure samples:
Firstly, this issue couldn't be steadily reproduced, and only showed up on Jenkins occasionally. An obvious guess is that it's probably a concurrency bug that only occurs in highly concurrent jobs. (Notice that
Secondly, all 3 build failures behaved extremely consistently: 18
And I got another interesting finding after single-step debugging a failed test case. The following stack trace snippet appears in all 3 build failures:
The marked code path shown above is actually NEVER executed in normal cases. To be more specific, the
Haven't got any clue how this state gets corrupted yet. My guess is that there is a race condition during |
|
It seems that Hive prefers to access the underlying metastore database via direct SQL, and uses JDO ORM as a fallback. The existing bug doesn't show up because usually either direct SQL or JDO ORM is capable of doing the work. But in case of Before fixing the root cause, I guess we can work around this issue by setting |
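To make the "direct SQL first, JDO ORM as a fallback" behavior concrete, here is a toy sketch of the pattern. All names (`PathUnsupported`, `fetch_with_fallback`, the `direct_sql_ok` / `orm_ok` switches) are made up for illustration and do not correspond to Hive's actual classes. The sketch only shows why a problem stays hidden as long as at least one of the two paths can serve a request.

```python
class PathUnsupported(Exception):
    """Hypothetical error: an access path cannot serve the request."""


def fetch_with_fallback(request, primary, fallback):
    """Try the primary path first; use the fallback only if it refuses."""
    try:
        return ("primary", primary(request))
    except PathUnsupported:
        # If the fallback refuses too, its error propagates to the caller.
        return ("fallback", fallback(request))


def direct_sql(request):
    """Toy stand-in for the direct SQL path."""
    if request.get("direct_sql_ok", True):
        return ["p1", "p2"]
    raise PathUnsupported("direct SQL cannot handle this request")


def orm(request):
    """Toy stand-in for the ORM path."""
    if request.get("orm_ok", True):
        return ["p1", "p2"]
    raise PathUnsupported("ORM cannot handle this request")


# Usually at least one path works, so callers never see a failure.
print(fetch_with_fallback({}, direct_sql, orm))
print(fetch_with_fallback({"direct_sql_ok": False}, direct_sql, orm))
```

A failure only surfaces when both paths refuse the same request, which would be consistent with a bug that appears rarely and only under specific conditions.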
|
Experimenting with the workaround mentioned above in PR #7492. |
…failures caused by in apache#7421
…or partition pruning
This PR forks PR #7421 authored by piaozhexiu and adds [a workaround] [1] for fixing the occasional test failures occurred in PR #7421. Please refer to these [two] [2] [comments] [3] for details.
[1]: liancheng@536ac41
[2]: #7421 (comment)
[3]: #7421 (comment)
Author: Cheolsoo Park <[email protected]>
Author: Cheng Lian <[email protected]>
Author: Michael Armbrust <[email protected]>
Closes #7492 from liancheng/pr-7421-workaround and squashes the following commits:
5599cc4 [Cheolsoo Park] Predicate pushdown to hive metastore
536ac41 [Cheng Lian] Sets hive.metastore.integral.jdo.pushdown to true to workaround test failures caused by in #7421
|
Closing as it is merged as part of #7492. |
|
@liancheng @piaozhexiu Have you cherry-picked this PR to Spark branch-1.5? |
|
@litao-buptsse yes, this patch is committed in branch-1.5. You need to set |
|
@piaozhexiu OK, I got it, thank you very much! |
@marmbrus @liancheng per request, I am reopening PR that contains #7216 and #7386.
Can you help me to understand unit test failures?