-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-22227][CORE] DiskBlockManager.getAllBlocks now tolerates temp files #19458
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has the effect of swallowing a number of possible exceptions. I think at least you'd want to unpack this and log the error. But is there a more explicit way of excluding temp files? it seems like getAllFiles shouldn't return those either?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has the effect of swallowing a number of possible exceptions.
It does. Right now the only possible exception coming fromBlockId.applyisIllegalStateException, but I agree that usingTrymakes the code fragile. My intention was to make an easy to read proof of concept and then adjust the patch according to the feedback.
What do you think about adding a "safe" parsing method to BlockId? The metho would return an Option instead of throwing and would only be used in DiskBlockManager for now? BlockId.apply can then be expressed parse(s).getOrElse { throw ... }.
Alernatively, we can expose Block.isValid which would combine the individual regexes into one.
But is there a more explicit way of excluding temp files? it seems like getAllFiles shouldn't return those either?
I think this is tricky to do without leaking some abstractions, e.g. the ".UUID" naming scheme in Utils.tempFileWith and Temp*BlockId naming scheme.
jiangxb1987
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, cc @cloud-fan
|
Instead of filtering out temp blocks, why not adding parsing rule for |
Because this fixes the issue for 2 out of 3 possible temp files. The unhandled case is produced by
I agree that it makes sense to keep those in sync, therefore I prefer to introduce
Possibly, but in any case |
|
Yes, I agree in any case it should not throw an exception. But in this PR you filtered out temp shuffle/local blocks, do you think this block is valid or not, are they blocks? So I'd like not filtering out those blocks, instead adding two parsing rules for those blocks. And for any other illegal files (cannot be parsed) catch and log the exception. |
|
I would say that temporary shuffle blocks are also blocks. As for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks not so necessary to define a new method guess for the use here only. I think here we can still use apply and catch/log the exception. In another word, we can simply changes apply() and use it here, defining new guess method is not so necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for using try-catch here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do. Should I log the exception even if the file has been produced by Utils.tempFileWith?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need to log here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean we don't need to log at all?
|
ping @superbobry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: extend SparkException
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My question here: is it really something we need to warn users? What should they do about it? Here we are getting block ids from file names, so it's legal to meet a temporary file at this place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not convinced we should log either. I've added this line because it was suggested by @jerryshao.
|
retest this please |
|
Test build #83014 has finished for PR 19458 at commit
|
…files Prior to this commit getAllBlocks implicitly assumed that the directories managed by the DiskBlockManager contain only the files corresponding to valid block IDs. In reality this assumption was violated during shuffle, which produces temporary files in the same directory as the resulting blocks. As a result, calls to getAllBlocks during shuffle were unreliable. The fix could be made more efficient, but this is probably good enough.
This commit also introduces a safe alternative to BlockId.apply which return option instead of throwing an exception.
cf9db5c to
ff9a6ae
Compare
|
retest this please |
|
There's a UT failure (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83014/testReport/junit/org.apache.spark.storage/BlockIdSuite/test_bad_deserialization/). @superbobry please fix this failure. |
|
Test build #83034 has finished for PR 19458 at commit
|
|
retest this please. |
|
Test build #83043 has finished for PR 19458 at commit
|
|
retest this please |
|
Test build #83048 has finished for PR 19458 at commit
|
…files Prior to this commit getAllBlocks implicitly assumed that the directories managed by the DiskBlockManager contain only the files corresponding to valid block IDs. In reality, this assumption was violated during shuffle, which produces temporary files in the same directory as the resulting blocks. As a result, calls to getAllBlocks during shuffle were unreliable. The fix could be made more efficient, but this is probably good enough. `DiskBlockManagerSuite` Author: Sergei Lebedev <[email protected]> Closes #19458 from superbobry/block-id-option. (cherry picked from commit b377ef1) Signed-off-by: Wenchen Fan <[email protected]>
|
thanks, merging to master/2.2 |
…files Prior to this commit getAllBlocks implicitly assumed that the directories managed by the DiskBlockManager contain only the files corresponding to valid block IDs. In reality, this assumption was violated during shuffle, which produces temporary files in the same directory as the resulting blocks. As a result, calls to getAllBlocks during shuffle were unreliable. The fix could be made more efficient, but this is probably good enough. `DiskBlockManagerSuite` Author: Sergei Lebedev <[email protected]> Closes apache#19458 from superbobry/block-id-option. (cherry picked from commit b377ef1) Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
Prior to this commit getAllBlocks implicitly assumed that the directories
managed by the DiskBlockManager contain only the files corresponding to
valid block IDs. In reality, this assumption was violated during shuffle,
which produces temporary files in the same directory as the resulting
blocks. As a result, calls to getAllBlocks during shuffle were unreliable.
The fix could be made more efficient, but this is probably good enough.
How was this patch tested?
DiskBlockManagerSuite