[SPARK-17063] [SQL] Improve performance of MSCK REPAIR TABLE with Hive metastore #14607
Conversation
CatalogTablePartition(spec, table.storage.copy(locationUri = Some(location.toUri.toString)))
// Hive metastore may not have enough memory to handle millions of partitions in a single RPC,
// so we should split them into smaller batches.
partitionSpecsAndLocs.iterator.grouped(1024).foreach { batch =>
It would be great to add some logging so there is a way to indicate progress.
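The batching idea above can be sketched in isolation. This is a hedged, self-contained sketch, not the actual Spark code: `addInBatches` and its `addBatch` callback are hypothetical stand-ins for the metastore `createPartitions()` RPC, and the progress log line reflects the reviewer's suggestion.

```scala
// Minimal sketch of batching a large partition list into smaller metastore
// RPCs, with progress logging. `addBatch` is a hypothetical stand-in for the
// real Hive metastore createPartitions() call.
object BatchedRepairSketch {
  val batchSize = 1024

  def addInBatches[T](items: Seq[T], addBatch: Seq[T] => Unit): Int = {
    var added = 0
    items.iterator.grouped(batchSize).foreach { batch =>
      addBatch(batch)        // one bounded RPC instead of one huge call
      added += batch.size
      println(s"Added $added/${items.size} partitions to the metastore")
    }
    added
  }
}
```

With 3000 items and a batch size of 1024, this issues three calls (1024, 1024, 952) rather than one call carrying all 3000 partitions.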
Test build #63633 has finished for PR 14607 at commit
Test build #63635 has finished for PR 14607 at commit
Test build #63636 has finished for PR 14607 at commit
Test build #63646 has finished for PR 14607 at commit
@yhuai @sameeragarwal @rxin I have updated MSCK REPAIR TABLE to list all the leaf files in parallel, avoiding the listing in the Hive metastore; hopefully this speeds it up a lot (not benchmarked yet).
Test build #63707 has finished for PR 14607 at commit
Test build #63708 has finished for PR 14607 at commit
Test build #63709 has finished for PR 14607 at commit
@davies Can you create a new JIRA ticket for this change? It is a non-trivial follow-up.
Test build #63793 has finished for PR 14607 at commit
Test build #63794 has finished for PR 14607 at commit
Test build #63800 has finished for PR 14607 at commit
// These are two fast stats in Hive Metastore
// see https://github.com/apache/hive/blob/master/
// common/src/java/org/apache/hadoop/hive/common/StatsSetupConst.java#L88
This is actually not a stable identifier (i.e., line 88 will likely change quickly) -- maybe just point to the file.
Test build #64017 has finished for PR 14607 at commit
Test build #64024 has finished for PR 14607 at commit
tableName: TableIdentifier,
cmd: String = "ALTER TABLE RECOVER PARTITIONS") extends RunnableCommand {

// These are two fast stats in Hive Metastore
Let's add a little more context here so that this comment is self-explanatory -- perhaps something along the lines of "These are the statistics that can be collected quickly without requiring a scan of the data".
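To make the "fast stats" idea concrete: the two values can be computed from file metadata alone, with no scan of the data. This is a hedged sketch; `FileMeta` is a hypothetical stand-in for Hadoop's `FileStatus`, while the `"numFiles"` and `"totalSize"` key names do match Hive's `StatsSetupConst`.

```scala
// Sketch: "fast stats" derived purely from listing metadata (no data scan).
// FileMeta is a hypothetical stand-in for org.apache.hadoop.fs.FileStatus.
case class FileMeta(path: String, length: Long)

object FastStats {
  // Key names "numFiles" and "totalSize" follow Hive's StatsSetupConst.
  def gather(files: Seq[FileMeta]): Map[String, String] = Map(
    "numFiles"  -> files.size.toString,
    "totalSize" -> files.map(_.length).sum.toString
  )
}
```

Because both values fall out of a single directory listing, they can be attached to each partition at creation time instead of forcing the metastore to list files itself.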
LGTM pending Jenkins.
Test build #64311 has finished for PR 14607 at commit
Test build #64310 has finished for PR 14607 at commit
Merging into master and the 2.0 branch.
…e metastore This PR splits the single `createPartitions()` call into smaller batches, which prevents the Hive metastore from running out of memory (caused by millions of partitions). It also gathers all the fast stats (number of files and total size of all files) in parallel, avoiding the bottleneck of the metastore listing files sequentially; this is controlled by spark.sql.gatherFastStats (enabled by default). Tested locally with 10000 partitions and 100 files with an embedded metastore: without gathering fast stats in parallel, adding partitions took 153 seconds; with it enabled, gathering the fast stats took about 34 seconds and adding the partitions took 25 seconds (most of the time spent in the object store), 59 seconds in total, 2.5X faster (with a larger cluster, gathering will be much faster). Author: Davies Liu <[email protected]> Closes #14607 from davies/repair_batch. (cherry picked from commit 48caec2) Signed-off-by: Davies Liu <[email protected]>
What changes were proposed in this pull request?
This PR splits the single
`createPartitions()` call into smaller batches, which prevents the Hive metastore from running out of memory (caused by millions of partitions). It also gathers all the fast stats (number of files and total size of all files) in parallel, avoiding the bottleneck of the metastore listing files sequentially; this is controlled by spark.sql.gatherFastStats (enabled by default).
How was this patch tested?
Tested locally with 10000 partitions and 100 files with an embedded metastore: without gathering fast stats in parallel, adding partitions took 153 seconds; with it enabled, gathering the fast stats took about 34 seconds and adding the partitions took 25 seconds (most of the time spent in the object store), 59 seconds in total, 2.5X faster (with a larger cluster, gathering will be much faster).
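The parallel gathering described above can be sketched with plain Scala Futures. This is an illustrative sketch only: the real implementation distributes the listing as a Spark job, and `listFileSizes` is a hypothetical callback standing in for a `FileSystem.listStatus`-style call per partition directory.

```scala
// Hedged sketch: compute (numFiles, totalSize) per partition concurrently,
// instead of letting the metastore list each directory sequentially.
// `listFileSizes` is a hypothetical stand-in for a filesystem listing call.
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelListing {
  def statsFor(partitions: Seq[String],
               listFileSizes: String => Seq[Long]): Map[String, (Int, Long)] = {
    val futures = partitions.map { p =>
      Future {
        val sizes = listFileSizes(p)          // one listing per partition, in parallel
        (p, (sizes.size, sizes.sum))          // (numFiles, totalSize)
      }
    }
    Await.result(Future.sequence(futures), 1.minute).toMap
  }
}
```

The resulting per-partition stats could then be attached to each `CatalogTablePartition` before the batched `createPartitions()` calls, so the metastore never has to touch the filesystem itself.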