-
Notifications
You must be signed in to change notification settings - Fork 9.2k
HDFS-16422. Fix thread safety of EC decoding during concurrent preads #3881
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
💔 -1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
@sodonnel Do you have time to take a look on this pr? Thanks a lot. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, we should make RawErasureDecoder.decode(ByteBuffer[] inputs, int[] erasedIndexes, ByteBuffer[] outputs) synchronized instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your review. @jojochuang
Yes, that's better. I will update it in a few minutes.
|
💔 -1 overall
This message was automatically generated. |
65aedd2 to
4045218
Compare
|
💔 -1 overall
This message was automatically generated. |
4045218 to
a1b7145
Compare
|
💔 -1 overall
This message was automatically generated. |
|
@whbing Did you encounter this problem? |
jojochuang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix looks good but I don't like the test.
It looks like a very heavy test for something trivial. The test ran for 5 minutes and timed out on my laptop. From the look of the test it seems to have some randomness embedded in it.
IMO, I can accept the PR if the test is removed. The fix itself is good enough.
|
Thanks for your review. @jojochuang |
|
💔 -1 overall
This message was automatically generated. |
59885e8 to
7f72a4c
Compare
|
💔 -1 overall
This message was automatically generated. |
(cherry picked from commit 0e74f1e)
|
Thanks for the fix. Should we also make the encoder thread safe? Lines 40 to 41 in 7f72a4c
|
|
Not sure.... there's a reentrant lock inside AbstractNativeRawEncoder but it's the read lock that's held, not write lock, to improve concurrency (HADOOP-15499 I suppose data race problem is considered? |
|
hey, please remember to keep the JIRA ID on the commit message. thanks. |
(cherry picked from commit 0e74f1e)
…apache#3881) (cherry picked from commit 0e74f1e) Change-Id: I8ad77ac5a0e967645c3c15844cb3b5dc4c30a89f
(cherry picked from commit 0e74f1e)
…rrent preads (apache#3881) (cherry picked from commit 0e74f1e) (cherry picked from commit 9071c96) (cherry picked from commit 6219247) Change-Id: I6aae0c9c8f29ed5b1746bd1e37e7880e9f1d4929
Reading data on an erasure-coded file with missing replicas(internal block of block group) will cause online reconstruction: read
dataUnitspart of data and decode them into the target missing data. EachDFSStripedInputStreamobject has aRawErasureDecoderobject, and when we doing pread concurrently,RawErasureDecoder.decodewill be invoked concurrently too.RawErasureDecoder.decodeis not thread safe, as a result of that we get wrong data from pread occasionally.