[SPARK-20868][CORE] UnsafeShuffleWriter should verify the position after FileChannel.transferTo #18091
cc @JoshRosen

I'm reviewing this now. For reference, #2824 was the earlier PR this references.

Test build #77304 has finished for PR 18091 at commit

Hmm, looks like the test failures are legitimate.

    for (int partition = 0; partition < numPartitions; partition++) {
      for (int i = 0; i < spills.length; i++) {
        final long partitionLengthInSpill = spills[i].partitionLengths[partition];
        long bytesToTransfer = partitionLengthInSpill;

Minor nit: we don't need this bytesToTransfer anymore. We can remove this mutable variable and just use partitionLengthInSpill in its place.

    while (count < bytesToCopy) {
      count += input.transferTo(count, bytesToCopy - count, output)
    }

Can we add an assert(count == bytesToCopy) here, just to be safe?
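
To illustrate the suggestion above, here is a minimal Java sketch of a `transferTo` loop followed by a check that the loop copied exactly the requested number of bytes. `TransferLoopSketch`, `copyFully`, and `demo` are hypothetical names, not Spark's `Utils.copyFileStreamNIO`, and the no-progress guard is an assumption added so the sketch cannot spin forever on a short input file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration only; not the code under review.
final class TransferLoopSketch {

  /** Copies bytesToCopy bytes from offset 0 of input to output's current position. */
  static long copyFully(FileChannel input, FileChannel output, long bytesToCopy)
      throws IOException {
    long count = 0L;
    // transferTo may move fewer bytes than requested, so keep calling it.
    while (count < bytesToCopy) {
      long n = input.transferTo(count, bytesToCopy - count, output);
      if (n <= 0) {
        // Assumption: treat "no progress" as an error instead of looping forever.
        throw new IOException("transferTo made no progress at offset " + count);
      }
      count += n;
    }
    // The check requested in the review: the loop must end at exactly bytesToCopy.
    if (count != bytesToCopy) {
      throw new IllegalStateException(
          "Copied " + count + " bytes, expected " + bytesToCopy);
    }
    return count;
  }

  /** Copies a 4096-byte temp file and returns the size of the destination file. */
  static long demo() {
    try {
      Path src = Files.createTempFile("transfer-src", ".bin");
      Path dst = Files.createTempFile("transfer-dst", ".bin");
      try {
        Files.write(src, new byte[4096]);
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE)) {
          copyFully(in, out, 4096L);
        }
        return Files.size(dst);
      } finally {
        Files.deleteIfExists(src);
        Files.deleteIfExists(dst);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Because each call requests at most `bytesToCopy - count` bytes, the loop cannot overshoot, so the final check mostly documents the invariant and guards against later edits to the loop.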

          spillInputChannelPositions[i] += actualBytesTransferred;
          bytesToTransfer -= actualBytesTransferred;
        }
        Utils.copyFileStreamNIO(spillInputChannel, mergedFileOutputChannel, bytesToTransfer);

This is not equivalent to the old code. The copyFileStreamNIO method is assuming that you're starting to transfer from position 0 in the input channel, which is only true on the first iteration of the outer loop.
I think you need to add a fourth argument to copyFileStreamNIO to specify the starting position of the input channel.
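
The point about the starting position can be sketched as follows. This is a hedged illustration in Java, not the actual Spark change: `PositionedCopySketch`, `copyRange`, and `demo` are hypothetical names, and the two-partition spill file is made-up test data. It relies on the documented behavior that `FileChannel.transferTo` reads from the explicit position argument and does not move the input channel's own position, so the caller has to track the offset itself.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration only; not Spark's Utils.copyFileStreamNIO.
final class PositionedCopySketch {

  /** Copies bytesToCopy bytes from input, starting at startPosition, to output. */
  static void copyRange(FileChannel input, FileChannel output,
                        long startPosition, long bytesToCopy) throws IOException {
    long count = 0L;
    while (count < bytesToCopy) {
      // The read offset is startPosition + count, not count alone.
      long n = input.transferTo(startPosition + count, bytesToCopy - count, output);
      if (n <= 0) {
        // Assumption: fail fast rather than loop forever on a short input.
        throw new IOException("transferTo made no progress at offset "
            + (startPosition + count));
      }
      count += n;
    }
  }

  /** Copies two back-to-back partitions ("aaa", "bbbbb") and returns the merged bytes. */
  static String demo() {
    try {
      Path spill = Files.createTempFile("spill", ".bin");
      Path merged = Files.createTempFile("merged", ".bin");
      try {
        Files.write(spill, "aaabbbbb".getBytes(StandardCharsets.US_ASCII));
        long[] partitionLengths = {3L, 5L};
        try (FileChannel in = FileChannel.open(spill, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(merged, StandardOpenOption.WRITE)) {
          long inputPosition = 0L; // tracked by the caller, like spillInputChannelPositions[i]
          for (long length : partitionLengths) {
            copyRange(in, out, inputPosition, length);
            inputPosition += length;
          }
        }
        return new String(Files.readAllBytes(merged), StandardCharsets.US_ASCII);
      } finally {
        Files.deleteIfExists(spill);
        Files.deleteIfExists(merged);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

If `copyRange` always read from offset 0, the second call in this demo would copy `aaabb` instead of `bbbbb`, and the merged file would contain `aaaaaabb`.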

Test build #77323 has finished for PR 18091 at commit

retest this please

Test build #77338 has started for PR 18091 at commit

retest this please

Test build #77350 has finished for PR 18091 at commit

retest this please

Test build #77364 has finished for PR 18091 at commit

LGTM

…ter FileChannel.transferTo

## What changes were proposed in this pull request?

Long time ago we fixed a [bug](https://issues.apache.org/jira/browse/SPARK-3948) in shuffle writer about `FileChannel.transferTo`. We were not very confident about that fix, so we added a position check after the writing, try to discover the bug earlier.

However this checking is missing in the new `UnsafeShuffleWriter`, this PR adds it.

https://issues.apache.org/jira/browse/SPARK-18105 maybe related to that `FileChannel.transferTo` bug, hopefully we can find out the root cause after adding this position check.

## How was this patch tested?

N/A

Author: Wenchen Fan <[email protected]>

Closes #18091 from cloud-fan/shuffle.

(cherry picked from commit d9ad789)
Signed-off-by: Wenchen Fan <[email protected]>

thanks for the review, merging to master/2.2/2.1/2.0!

What changes were proposed in this pull request?

A long time ago we fixed a bug (https://issues.apache.org/jira/browse/SPARK-3948) in the shuffle writer involving FileChannel.transferTo. We were not very confident about that fix, so we added a position check after the write to try to catch the bug earlier.

However, this check is missing in the new UnsafeShuffleWriter; this PR adds it.

https://issues.apache.org/jira/browse/SPARK-18105 may be related to that FileChannel.transferTo bug; hopefully we can find the root cause after adding this position check.

How was this patch tested?

N/A
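
As a rough illustration of the position check described above, the Java sketch below records the output channel's position before the copy and compares it with the expected value afterwards. It is a sketch under assumptions, not the patch itself: `PositionCheckSketch`, `copyAndVerify`, `demo`, and the exception message are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration only; not the code merged in this PR.
final class PositionCheckSketch {

  /** Copies a byte range and verifies that output's position advanced by bytesToCopy. */
  static void copyAndVerify(FileChannel input, FileChannel output,
                            long startPosition, long bytesToCopy) throws IOException {
    long initialPos = output.position();
    long count = 0L;
    while (count < bytesToCopy) {
      long n = input.transferTo(startPosition + count, bytesToCopy - count, output);
      if (n <= 0) {
        // Assumption: fail fast rather than loop forever on a short input.
        throw new IOException("transferTo made no progress");
      }
      count += n;
    }
    // The position check: the output channel should now sit exactly
    // bytesToCopy bytes past where it started.
    long finalPos = output.position();
    long expectedPos = initialPos + bytesToCopy;
    if (finalPos != expectedPos) {
      throw new IOException("Output position is " + finalPos
          + " after transferTo, expected " + expectedPos);
    }
  }

  /** Writes a 10-byte header, appends 6 copied bytes, and returns the final position. */
  static long demo() {
    try {
      Path src = Files.createTempFile("check-src", ".bin");
      Path dst = Files.createTempFile("check-dst", ".bin");
      try {
        Files.write(src, new byte[6]);
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE)) {
          ByteBuffer header = ByteBuffer.wrap(new byte[10]);
          while (header.hasRemaining()) {
            out.write(header); // advances out's position to 10
          }
          copyAndVerify(in, out, 0L, 6L);
          return out.position();
        }
      } finally {
        Files.deleteIfExists(src);
        Files.deleteIfExists(dst);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Failing with an exception at the point of the copy surfaces a misbehaving `transferTo` immediately, instead of leaving a merged file whose partition offsets silently disagree with its contents.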