Skip to content

Conversation

@cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

Long time ago we fixed a bug in shuffle writer about FileChannel.transferTo. We were not very confident about that fix, so we added a position check after the writing, try to discover the bug earlier.

However this checking is missing in the new UnsafeShuffleWriter, this PR adds it.

https://issues.apache.org/jira/browse/SPARK-18105 maybe related to that FileChannel.transferTo bug, hopefully we can find out the root cause after adding this position check.

How was this patch tested?

N/A

@cloud-fan
Copy link
Contributor Author

cloud-fan commented May 24, 2017

cc @JoshRosen

@JoshRosen
Copy link
Contributor

I'm reviewing this now. For reference, #2824 was the earlier PR this references.

@SparkQA
Copy link

SparkQA commented May 24, 2017

Test build #77304 has finished for PR 18091 at commit fb3b31e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Copy link
Contributor

Hmm, looks like the test failures are legitimate.

for (int partition = 0; partition < numPartitions; partition++) {
for (int i = 0; i < spills.length; i++) {
final long partitionLengthInSpill = spills[i].partitionLengths[partition];
long bytesToTransfer = partitionLengthInSpill;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit: we don't need this bytesToTransfer anymore. We can remove this mutable variable and just use partitionLengthInSpill in its place.

while (count < bytesToCopy) {
count += input.transferTo(count, bytesToCopy - count, output)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add an assert(count == bytesToCopy) here, just to be safe?

spillInputChannelPositions[i] += actualBytesTransferred;
bytesToTransfer -= actualBytesTransferred;
}
Utils.copyFileStreamNIO(spillInputChannel, mergedFileOutputChannel, bytesToTransfer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not equivalent to the old code. The copyFileStreamNIO method is assuming that you're starting to transfer from position 0 in the input channel, which is only true on the first iteration of the outer loop.

I think you need to add a fourth argument to copyFileStreamNIO to specify the starting position of the input channel.

@SparkQA
Copy link

SparkQA commented May 25, 2017

Test build #77323 has finished for PR 18091 at commit c79de07.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented May 25, 2017

Test build #77338 has started for PR 18091 at commit c79de07.

@cloud-fan
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented May 25, 2017

Test build #77350 has finished for PR 18091 at commit c79de07.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented May 25, 2017

Test build #77364 has finished for PR 18091 at commit c79de07.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987
Copy link
Contributor

LGTM

asfgit pushed a commit that referenced this pull request May 26, 2017
…ter FileChannel.transferTo

## What changes were proposed in this pull request?

Long time ago we fixed a [bug](https://issues.apache.org/jira/browse/SPARK-3948) in shuffle writer about `FileChannel.transferTo`. We were not very confident about that fix, so we added a position check after the writing, try to discover the bug earlier.

 However this checking is missing in the new `UnsafeShuffleWriter`, this PR adds it.

https://issues.apache.org/jira/browse/SPARK-18105 maybe related to that `FileChannel.transferTo` bug, hopefully we can find out the root cause after adding this position check.

## How was this patch tested?

N/A

Author: Wenchen Fan <[email protected]>

Closes #18091 from cloud-fan/shuffle.

(cherry picked from commit d9ad789)
Signed-off-by: Wenchen Fan <[email protected]>
asfgit pushed a commit that referenced this pull request May 26, 2017
…ter FileChannel.transferTo

## What changes were proposed in this pull request?

Long time ago we fixed a [bug](https://issues.apache.org/jira/browse/SPARK-3948) in shuffle writer about `FileChannel.transferTo`. We were not very confident about that fix, so we added a position check after the writing, try to discover the bug earlier.

 However this checking is missing in the new `UnsafeShuffleWriter`, this PR adds it.

https://issues.apache.org/jira/browse/SPARK-18105 maybe related to that `FileChannel.transferTo` bug, hopefully we can find out the root cause after adding this position check.

## How was this patch tested?

N/A

Author: Wenchen Fan <[email protected]>

Closes #18091 from cloud-fan/shuffle.

(cherry picked from commit d9ad789)
Signed-off-by: Wenchen Fan <[email protected]>
asfgit pushed a commit that referenced this pull request May 26, 2017
…ter FileChannel.transferTo

## What changes were proposed in this pull request?

Long time ago we fixed a [bug](https://issues.apache.org/jira/browse/SPARK-3948) in shuffle writer about `FileChannel.transferTo`. We were not very confident about that fix, so we added a position check after the writing, try to discover the bug earlier.

 However this checking is missing in the new `UnsafeShuffleWriter`, this PR adds it.

https://issues.apache.org/jira/browse/SPARK-18105 maybe related to that `FileChannel.transferTo` bug, hopefully we can find out the root cause after adding this position check.

## How was this patch tested?

N/A

Author: Wenchen Fan <[email protected]>

Closes #18091 from cloud-fan/shuffle.

(cherry picked from commit d9ad789)
Signed-off-by: Wenchen Fan <[email protected]>
@cloud-fan
Copy link
Contributor Author

thanks for the review, merging to master/2.2/2.1/2.0!

@asfgit asfgit closed this in d9ad789 May 26, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants