[SPARK-32480] Support insert overwrite to move data to trash #29319
Hm, do we need to move data into the trash even though users intend to OVERWRITE it with this command? You probably need to describe in more detail what the benefit is for users.
A similar JIRA, SPARK-32481, was raised for TRUNCATE TABLE. I think it is simply safer to move the data to the trash first, and it also gives users the flexibility to refer back to the deleted data if they need it. So that this behavior is not forced on anyone, I have also added a configuration to disable it.
Hi @dongjoon-hyun, could you please review this one as well?
Gentle ping @dongjoon-hyun
ok to test
Test build #127921 has finished for PR 29319 at commit
Test build #127918 has finished for PR 29319 at commit
Test build #127923 has finished for PR 29319 at commit
Test build #128050 has finished for PR 29319 at commit
I am working on this too. This PR can protect users who mistakenly write to a directory path that already holds data (such as a database's path, which has happened to us before).
      throw new RuntimeException(
        s"Cannot remove partition directory '$path'")
    }
  }
There are too many cases to handle here:
- Hive version > 2.0 & Hive version < 2.9
- insert overwrite table
- dynamic partition insert
- if (Option(existFile.getPath) != createdTempDir) fs.delete(existFile.getPath, true)
+ if (Option(existFile.getPath) != createdTempDir) {
+   Utils.moveToTrashOrDelete(fs, existFile.getPath, isTrashEnabled, hadoopConf)
+ }
I am working on this too. This PR (for example, the INSERT OVERWRITE DIRECTORY part) can protect users who mistakenly write to a directory path that already holds data (such as a database's path, which has happened to us before).
Moving the data to the trash makes it faster to recover production data when such a disaster happens.
FYI @maropu
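The diff in this thread only shows the call site of `Utils.moveToTrashOrDelete`, not its body. As a rough illustration of what such a helper might do, here is a minimal sketch built on Hadoop's `Trash.moveToAppropriateTrash` and the `fs.trash.interval` setting. The object name `TrashSketch` and the exact fallback rule are assumptions made for illustration, not the PR's actual code.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path, Trash}

// Hypothetical sketch: the real Utils.moveToTrashOrDelete is not shown in this thread.
object TrashSketch {

  /**
   * Move `path` to the Hadoop trash when that is both requested and possible;
   * otherwise delete it recursively. Returns true if the move or delete succeeded.
   */
  def moveToTrashOrDelete(
      fs: FileSystem,
      path: Path,
      isTrashEnabled: Boolean,
      hadoopConf: Configuration): Boolean = {
    // fs.trash.interval is expressed in minutes; 0 means the trash feature is off.
    val trashInterval = hadoopConf.getLong("fs.trash.interval", 0L)
    if (isTrashEnabled && trashInterval > 0) {
      // Moves the path into the trash location that matches its file system.
      Trash.moveToAppropriateTrash(fs, path, hadoopConf)
    } else {
      // Assumed fallback: the original behavior, a permanent recursive delete.
      fs.delete(path, true)
    }
  }
}
```

With this shape, a caller that passes `isTrashEnabled = false`, or that runs with the trash interval left at 0, keeps the old permanent-delete behavior.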
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
Instead of deleting the data outright, INSERT OVERWRITE moves it to the trash.
The data is then deleted permanently from the trash according to the configuration provided by the user.
Why are the changes needed?
Rather than deleting the data directly, this gives users the flexibility to have it moved to the trash first and deleted permanently later.
Does this PR introduce any user-facing change?
Yes. After an INSERT OVERWRITE, the old data is no longer deleted permanently right away.
It is first moved to the trash and then deleted permanently after the configured time.
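The description does not name the setting that controls "the given time". In Hadoop that retention period is normally `fs.trash.interval`, measured in minutes, so a client-side sketch could look like the following. The object and method names are hypothetical, and the assumption that this PR relies on `fs.trash.interval` is not confirmed by the thread.

```scala
import org.apache.hadoop.conf.Configuration

// Hypothetical sketch: builds a Hadoop Configuration with trash retention set.
object TrashConfigSketch {

  /** Returns a Configuration whose fs.trash.interval is `minutes` (0 disables trash). */
  def withTrashRetention(minutes: Long): Configuration = {
    val conf = new Configuration()
    conf.setLong("fs.trash.interval", minutes)
    conf
  }
}
```

For example, `TrashConfigSketch.withTrashRetention(1440L)` asks for trashed data to be kept for one day. Per the Hadoop documentation, when trash is enabled on the server side the server's value is used and the client's value is ignored, so the effective retention may differ from what the client sets.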
How was this patch tested?
Manually