-
Couldn't load subscription status.
- Fork 3.4k
HBASE-27579 CatalogJanitor can cause data loss due to errors during cleanMergeRegion #4986
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
🎊 +1 overall
This message was automatically generated. |
| LOG.error("Error trying to determine if region exists, assuming exists and has references", | ||
| ioe); | ||
| return new Pair<>(Boolean.TRUE, Boolean.TRUE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is the big difference b/ the two implementations right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. Previously we'd treat this error as "lets assume it doesn't exist and go forward with cleanup". This assumption was inaccurate in at least some failure modes, so unsafe. The new assumption is "we don't know if the region exists, so let's not risk cleaning up". This is safer, but the downside is we might wait longer to GC merged regions. I think that's not a problem at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is safer, but the downside is we might wait longer to GC merged regions. I think that's not a problem at all.
+1
|
🎊 +1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
Thanks all for the review. Having trouble with some flaky tests. I'm going to let it run once more to see if we can get a clean build. So far I've run the failing tests manually and they succeed locally. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
…leanMergeRegion (#4986) Signed-off-by: Andrew Purtell <[email protected]> Signed-off-by: Viraj Jasani <[email protected]> Signed-off-by: Duo Zhang <[email protected]>
…leanMergeRegion (#4986) Signed-off-by: Andrew Purtell <[email protected]> Signed-off-by: Viraj Jasani <[email protected]> Signed-off-by: Duo Zhang <[email protected]>
…leanMergeRegion (#4986) Signed-off-by: Andrew Purtell <[email protected]> Signed-off-by: Viraj Jasani <[email protected]> Signed-off-by: Duo Zhang <[email protected]>
…leanMergeRegion (apache#4986) Signed-off-by: Andrew Purtell <[email protected]> Signed-off-by: Viraj Jasani <[email protected]> Signed-off-by: Duo Zhang <[email protected]>
…leanMergeRegion (apache#4986) Signed-off-by: Andrew Purtell <[email protected]> Signed-off-by: Viraj Jasani <[email protected]> Signed-off-by: Duo Zhang <[email protected]> (cherry picked from commit a7f776f) Change-Id: I19696f331e463024e4205036c95a09a5aab75816
There already existed a robust reference check method for cleaning splits,
checkDaughterInFs. I cleaned that up a little bit and re-use it in cleanMergeRegion. This way both cleaning splits and merges use the same reference checking logic. This should be easier to maintain going forward.I added tests in TestCatalogJanitor -- there were none for merges, so i added a base case and a failure mode case.
I decided not to modify HRegionFileSystem.openRegionFromFileSystem -- even if I modified that to throw FileNotFoundException, from a defensive coding perspective I'd rather not assume that exception refers to the region dir specifically. This code is highly destructive, so we rely on an explicit
existscall and only delete if that successfully returns false within our method.