-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-18173][SQL] data source tables should support truncating partition #15688
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mind changing rectangles2 to rectPartitioned or similar? Think the test would get simpler to read (later).
|
Test build #67788 has finished for PR 15688 at commit
|
|
This lgtm |
|
If we try to truncate a partition that does not exist, we just do nothing? |
|
retest this please |
|
Test build #67847 has finished for PR 15688 at commit
|
|
@gatorsmile that's a good point. I checked with hive, the behaviour is:
I have updated this PR according to this |
|
Test build #67871 has finished for PR 15688 at commit
|
|
retest this please |
|
Test build #67882 has finished for PR 15688 at commit
|
|
retest this please |
|
Test build #67887 has finished for PR 15688 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does partial mean here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should document it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the specified column names are not partitioning columns, we will get a strange error:
key not found: height
java.util.NoSuchElementException: key not found: height
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we will/should hit this in the ExternalCatalog level, as most of partition related methods assumes the column names in partition spec are partition columns, e.g. loadPartition, loadDynamicPartitions, dropPartitions, etc.
If we do hit this, we should fix it at the caller side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above message was got from the statement
sql("TRUNCATE TABLE partTable PARTITION (height =1)")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comparing the size is not enough. We also need to check whether all the columns are partitioning columns.
|
Could we add all the positive and negative cases in |
|
Test build #68188 has finished for PR 15688 at commit
|
|
retest this please |
|
Test build #68195 has finished for PR 15688 at commit
|
|
|
||
| // do nothing if no partition is matched for the given non-partial partition spec | ||
| // TODO: This behaviour is different from Hive, we should decide whether we need to follow | ||
| // Hive's behaviour or stick with our existing behaviour later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's our and hive's behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
our behaviour: do nothing if no partition is matched, no matter whether the given partition spec is partial or not.
Hive's behaviour: when no partition is matched, do nothing if the given partition spec is partial, throw exception if the given partition spec is non-partial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we document the behavior difference in the code comments, if we decide to stick with our existing behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think Hive's behavior makes sense here. If I'm giving you an exact match, you should warn me if there is an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alternative, of course, is to provide "truncate if exists", which doesn't throw exceptions if that's desired.
|
cc @gatorsmile |
|
Regarding the current code changes, LGTM Like what @cloud-fan said in the code comment, we need to decide whether we should follow Hive. |
|
Test build #68241 has finished for PR 15688 at commit
|
|
Thanks - merging in master/branch-2.1. |
|
@cloud-fan can you create a follow-up pr to switch over to Hive's behavior? |
…tion ## What changes were proposed in this pull request? Previously `TRUNCATE TABLE ... PARTITION` will always truncate the whole table for data source tables, this PR fixes it and improve `InMemoryCatalog` to make this command work with it. ## How was this patch tested? existing tests Author: Wenchen Fan <[email protected]> Closes #15688 from cloud-fan/truncate. (cherry picked from commit 46b2e49) Signed-off-by: Reynold Xin <[email protected]>
…hed for the given non-partial partition spec ## What changes were proposed in this pull request? a follow up of #15688 ## How was this patch tested? updated test in `DDLSuite` Author: Wenchen Fan <[email protected]> Closes #15805 from cloud-fan/truncate. (cherry picked from commit 73feaa3) Signed-off-by: Wenchen Fan <[email protected]>
…hed for the given non-partial partition spec ## What changes were proposed in this pull request? a follow up of apache#15688 ## How was this patch tested? updated test in `DDLSuite` Author: Wenchen Fan <[email protected]> Closes apache#15805 from cloud-fan/truncate.
…tion ## What changes were proposed in this pull request? Previously `TRUNCATE TABLE ... PARTITION` will always truncate the whole table for data source tables, this PR fixes it and improve `InMemoryCatalog` to make this command work with it. ## How was this patch tested? existing tests Author: Wenchen Fan <[email protected]> Closes apache#15688 from cloud-fan/truncate.
…hed for the given non-partial partition spec ## What changes were proposed in this pull request? a follow up of apache#15688 ## How was this patch tested? updated test in `DDLSuite` Author: Wenchen Fan <[email protected]> Closes apache#15805 from cloud-fan/truncate.
What changes were proposed in this pull request?
Previously
TRUNCATE TABLE ... PARTITIONwill always truncate the whole table for data source tables, this PR fixes it and improveInMemoryCatalogto make this command work with it.How was this patch tested?
existing tests