[SPARK-8312] [SQL] Populate statistics info of hive tables if it's needed to be #6767
ok to test @yhuai
Test build #41905 has finished for PR 6767 at commit
Test build #42636 has finished for PR 6767 at commit
Test build #42652 has finished for PR 6767 at commit
plan foreach?
@navis I like the idea of moving the partition pruning stuff into the
Test build #46308 has finished for PR 6767 at commit
Test build #46396 has finished for PR 6767 at commit
Test build #46501 has finished for PR 6767 at commit
Test build #46503 has finished for PR 6767 at commit
@chenghao-intel Sorry for the long delay. Could you take another look when you have time?
Thanks for the pull request. I'm going through a list of pull requests to cut them down since the sheer number is breaking some of the tooling we have. Due to lack of activity on this pull request, I'm going to push a commit to close it. Feel free to reopen it or create a new one.
Currently, spark-sql uses the statistics stored in the metastore to estimate the size of a Hive table, which means the ANALYZE command must be run before the table is accessed to get better planning, especially for joins. Even with those statistics, however, the estimate does not reflect the real input size of a query that contains a partition-pruning predicate.
Worse, Hive cannot update metastore statistics for external tables; this was only recently fixed in HIVE-6727. According to the issue details, the bug affects all Hive versions from 0.13.0 to 1.2.0.
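The sizing problem described above can be illustrated with a small Python model. This is a hypothetical sketch, not Spark's planner code: Partition, partition_size, table_size, pruned_size and the compute_size callback are illustrative names, compute_size stands in for something like summing file sizes under a partition directory, and the 10 MB threshold only mimics a broadcast-join cutoff such as Spark's spark.sql.autoBroadcastJoinThreshold.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional

MB = 1024 * 1024


# Hypothetical model of a Hive partition as seen by the planner.
@dataclass
class Partition:
    spec: Dict[str, str]          # partition column -> value, e.g. {"ds": "2015-06-01"}
    size_in_bytes: Optional[int]  # size recorded in the metastore; None if never analyzed


SizeFn = Callable[[Partition], int]


def partition_size(p: Partition, compute_size: SizeFn) -> int:
    """Use metastore stats when present; otherwise populate them on demand and cache them."""
    if p.size_in_bytes is None:
        p.size_in_bytes = compute_size(p)  # assumption: e.g. sum of file sizes on the filesystem
    return p.size_in_bytes


def table_size(partitions: Iterable[Partition], compute_size: SizeFn) -> int:
    """Whole-table estimate: what the planner sees if the pruning predicate is ignored."""
    return sum(partition_size(p, compute_size) for p in partitions)


def pruned_size(partitions: Iterable[Partition],
                predicate: Callable[[Dict[str, str]], bool],
                compute_size: SizeFn) -> int:
    """Estimate only over partitions that survive pruning; pruned-away ones are never sized."""
    return sum(partition_size(p, compute_size) for p in partitions if predicate(p.spec))


# Usage: two analyzed partitions plus one with missing stats (e.g. an external table).
parts = [
    Partition({"ds": "2015-06-01"}, 400 * MB),
    Partition({"ds": "2015-06-02"}, 400 * MB),
    Partition({"ds": "2015-06-03"}, None),
]
fake_fs = lambda p: 5 * MB       # hypothetical stand-in for listing files on HDFS
broadcast_threshold = 10 * MB    # illustrative value only

whole = table_size(parts, fake_fs)
pruned = pruned_size(parts, lambda s: s["ds"] == "2015-06-03", fake_fs)
print(whole // MB, pruned // MB, pruned <= broadcast_threshold)  # 805 5 True
```

With the whole-table figure the table looks far too large to broadcast, while the pruned estimate fits under the threshold. Because the generator in pruned_size filters before calling partition_size, statistics are only computed for partitions the query actually reads.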