[SPARK-28570][CORE][SHUFFLE] Make UnsafeShuffleWriter use the new API. #25304
Conversation
Test build #108423 has finished for PR 25304 at commit
Test build #108425 has finished for PR 25304 at commit

Test build #108427 has finished for PR 25304 at commit
```java
private void initStream() throws IOException {
  if (outputFileStream == null) {
    // This file needs to opened in append mode in order to work around a Linux kernel bug that
    // affects transferTo; see SPARK-3948 for more details.
```
The comment is actually more appropriate in `initChannel()` (transferTo is only used with the channel). Btw, why does `initChannel` set the member variable `outputFileStream`? Couldn't it just set `outputChannel = new FileOutputStream(outputTempFile, true).getChannel();`?
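A minimal sketch of the simplification being suggested, assuming `outputTempFile` and `outputFileChannel` are fields of the writer (a hypothetical shape; the real class carries more state):

```java
// Hypothetical initChannel(): open the channel directly instead of also
// storing the intermediate FileOutputStream in a member variable.
private void initChannel() throws IOException {
  if (outputFileChannel == null) {
    // Append mode works around a Linux kernel bug affecting transferTo (SPARK-3948).
    outputFileChannel = new FileOutputStream(outputTempFile, true).getChannel();
  }
}
```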
Test build #108487 has finished for PR 25304 at commit

Test build #108587 has finished for PR 25304 at commit

@squito addressed the comments here.
squito left a comment:
Minor things, but I think it mostly looks good.
```java
public interface SingleFileShuffleMapOutputWriter {

  /**
   * Transfer a file that contains the bytes of all the splits written by this map task.
```
splits -> partitions
```java
    shuffleId, mapId, taskContext.taskAttemptId());
if (maybeSingleFileWriter.isPresent()) {
  // Here, we don't need to perform any metrics updates because the bytes written to this
  // output file would have already been counted as shuffle bytes written.
```
This comment is only true for the local disk implementation. E.g., if some other implementation did take advantage of the single merged file somehow and wrote it all to a remote store, it would be doing another write. But I am not really worried about this, as I don't think any other store will actually use this...
Where could we move this?
@squito I think this SingleSpillShuffleMapOutputWriter can be pretty useful; it may avoid some byte-by-byte read/write, and instead the custom store can provide a more performant implementation.
I dunno if there is a great alternative. We could say it's the job of individual implementations to increment the metrics, and then move this comment into LocalDiskSingleSpillMapOutputWriter to explain why the metrics aren't incremented. But we're specifically trying to avoid exposing metrics to the API. You could also have transferMapSpillFile() return the number of bytes written, and then the existing implementation would return 0.
It all kinda feels like overkill to me. @gczsjdy I agree it's possible for another store to take advantage of this, but do you have a specific case in mind? I'd like to avoid adding too many things to the API (with odd cases just to support one possible implementation) and keep things simple.
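A sketch of the alternative floated here, assuming the interface were changed so implementations report their own byte counts (hypothetical; the merged PR kept the no-metrics design):

```java
import java.io.File;
import java.io.IOException;

// Hypothetical variant (not what was merged): the transfer reports how many
// bytes it actually wrote so the caller can update metrics. The local-disk
// rename would return 0, since those bytes were already counted as shuffle
// bytes written when the spill file was produced.
public interface SingleSpillShuffleMapOutputWriter {
  long transferMapSpillFile(File mapSpillFile, long[] partitionLengths) throws IOException;
}
```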
```java
//
// We allow the individual merge methods to report their own IO times since different merge
// strategies use different IO techniques. We count IO during merge towards the shuffle
// shuffle write time, which appears to be consistent with the "not bypassing merge-sort"
```
typo: shuffle shuffle (it was there before, might as well fix it now)
```java
import org.apache.spark.util.Utils;

public class LocalDiskSingleFileMapOutputWriter
    implements SingleFileShuffleMapOutputWriter {
```
Sorry, I know this was my naming suggestion earlier, but after more thought, how about File -> Spill? I.e. `LocalDiskSingleSpillMapOutputWriter` and `SingleSpillShuffleMapOutputWriter`.
```java
public void transferMapOutputFile(
    File mapOutputFile,
    long[] partitionLengths) throws IOException {
  File outputFile = blockResolver.getDataFile(shuffleId, mapId);
```
I think a brief comment here would help, e.g. "we've only got one spill file, which is already in the right format of the final data file, so there's no merging to do; just move it to the right location".
```java
@Override
public void transferMapOutputFile(
    File mapOutputFile,
```
and maybe rename this to mapSpillFile
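Putting both suggestions together (the explanatory comment plus the mapSpillFile rename), the method might read as below. This is a sketch only: the `Utils.tempFileWith` / `writeIndexFileAndCommit` steps are assumptions about how the local-disk implementation commits its output.

```java
@Override
public void transferMapSpillFile(
    File mapSpillFile,
    long[] partitionLengths) throws IOException {
  // We've only got one spill file, and it's already in the right format of the
  // final data file; there's no merging to do, so just move it into place.
  File outputFile = blockResolver.getDataFile(shuffleId, mapId);
  File tempFile = Utils.tempFileWith(outputFile);
  Files.move(mapSpillFile.toPath(), tempFile.toPath());
  blockResolver.writeIndexFileAndCommit(shuffleId, mapId, partitionLengths, tempFile);
}
```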
```java
import java.io.File;

import java.io.IOException;
import org.apache.spark.annotation.Private;
```
import grouping
```java
try {
  for (int i = 0; i < spills.length; i++) {
    long partitionLengthInSpill = 0L;
    partitionLengthInSpill += spills[i].partitionLengths[partition];
```
```java
final long partitionLengthInSpill = spills[i].partitionLengths[partition];
```
I think this comment got overlooked (probably because GitHub hides long diffs, I hate that...)
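Rendered as a suggested change against the quoted loop (a fragment, matching the diff excerpt above):

```java
try {
  for (int i = 0; i < spills.length; i++) {
    // Declare and initialize in one step instead of accumulating into 0L.
    final long partitionLengthInSpill = spills[i].partitionLengths[partition];
```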
```java
 * with the same (shuffleId, mapId) pair can be distinguished by the
 * different values of mapTaskAttemptId.
 */
default Optional<SingleFileShuffleMapOutputWriter> createSingleFileMapOutputWriter(
```
This feels like a lot of indirection to implement one method... what about returning a boolean if the transfer is supported, or throwing UnsupportedOperationException (although that's a bit slower)? (If the transfer is supported but fails, you'd still throw an IOException.)
I prefer Optional to both of these, specifically because throwing an exception by default forces a try...catch on the caller, and a boolean requires a separate method call outside of the plugin tree, which splices the logic between the plugin tree and the UnsafeShuffleWriter.
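For reference, a sketch of the Optional-based default being defended here, assuming it lives on the plugin's writer-factory interface (parameter names follow the quoted Javadoc; the enclosing interface is elided):

```java
// Plugins that cannot accept a pre-merged single spill file simply inherit
// this default and return Optional.empty(); the UnsafeShuffleWriter then
// falls back to the regular per-partition merge path, so callers need no
// try...catch and no separate capability-check method.
default Optional<SingleFileShuffleMapOutputWriter> createSingleFileMapOutputWriter(
    int shuffleId,
    int mapId,
    long mapTaskAttemptId) throws IOException {
  return Optional.empty();
}
```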
Addressed most things, except #25304 (comment) - not sure how important it is or where to move the comment if at all.
Test build #109243 has finished for PR 25304 at commit
```java
 * single partition file, as the entire result of a map task, to the backing store.
 * <p>
 * Most implementations should return the default {@link Optional#empty()} to indicate that
 * they do not support this optimization. This primarily is for backwards-compatibility in
```
Probably it's better to indicate what kinds of implementations may support this optimization? Otherwise it's confusing. I think storage that has an API like move supports this optimization; is this right?
Truth be told, even plugins that support a remote FS move would be unlikely to support this well - one would still have to transfer the whole file up to the remote storage layer, but that could just as easily be done by writing the data from the file through an output stream.
I think only implementations that stage the files locally could support this in any meaningful way at all.
I'm ok with leaving out the docs, if only because very, very few implementations should even care about this API.
Yeah, I don't think it's necessary to go into more detail. I would say the "casual" plugin developer wouldn't bother with this, and if they're really serious, they can look at what the existing implementation does. The comment is sufficient for that.
Test build #109255 has finished for PR 25304 at commit

retest this please

Test build #109763 has finished for PR 25304 at commit

Test build #109972 has finished for PR 25304 at commit
vanzin left a comment:
Just minor things.
```java
    mapId,
    taskContext.taskAttemptId(),
    partitioner.numPartitions());
mapWriter.commitAllPartitions();
```
Can you just return the value returned here instead of creating a new array?
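A sketch of what is being asked for, assuming `commitAllPartitions()` returns the partition-length array (a hypothetical reading of the surrounding method):

```java
// Hand the array produced by the writer straight back to the caller instead
// of copying it into a freshly allocated long[].
return mapWriter.commitAllPartitions();
```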
```java
writeMetrics.incWriteTime(System.nanoTime() - writeStartTime);
bytesWrittenToMergedFile += partitionLengthInSpill;
partitionLengths[partition] += partitionLengthInSpill;
boolean copyThrewExecption = true;
```
typo
```java
boolean copyThrewExecption = true;
ShufflePartitionWriter writer = mapWriter.getPartitionWriter(partition);
WritableByteChannelWrapper resolvedChannel = writer.openChannelWrapper()
    .orElseGet(() -> new StreamFallbackChannelWrapper(openStreamUnchecked(writer)));
```
I commented on another PR about this, but StreamFallbackChannelWrapper is probably going to be slower than calling mergeSpillsWithFileStream in this case, because of the extra buffering the Channels.newChannel wrapper adds.
Probably not a huge deal, but it seems easy to call mergeSpillsWithFileStream if the implementation does not return a channel wrapper.
The tricky part here is that you only know whether the channel wrapper is supported once we've started looping through the partitions - which would mean we'd have to abort iteration early and switch the merge strategy. Can we leave it as-is?
Argh, another reason why I prefer explicit capability checks over using optionals. Ok to leave like this for now, I guess.
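The pattern under discussion, sketched: the partition writer's OutputStream is adapted to the channel API only when the plugin offers no native channel. The class body below is an assumption about how the wrapper is implemented:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

// Fallback adapter from the stream API to the channel API. Channels.newChannel
// introduces the extra buffering layer flagged above as potentially slower
// than taking the stream-based merge path (mergeSpillsWithFileStream) outright.
final class StreamFallbackChannelWrapper implements WritableByteChannelWrapper {
  private final WritableByteChannel channel;

  StreamFallbackChannelWrapper(OutputStream fallbackStream) {
    this.channel = Channels.newChannel(fallbackStream);
  }

  @Override
  public WritableByteChannel channel() {
    return channel;
  }

  @Override
  public void close() throws IOException {
    channel.close();
  }
}
```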
```java
final FileChannel spillInputChannel = spillInputChannels[i];
final long writeStartTime = System.nanoTime();
Utils.copyFileStreamNIO(
      spillInputChannel,
```
This looks indented too far.
```java
// after calling transferTo in kernel version 2.6.32. This issue is described at
// https://bugs.openjdk.java.net/browse/JDK-7052359 and SPARK-3948.
if (outputFileChannel != null
  && outputFileChannel.position() != bytesWrittenToMergedFile) {
```
nit: && goes in previous line
```java
when(shuffleBlockResolver.getDataFile(anyInt(), anyInt())).thenReturn(mergedOutputFile);
doAnswer(invocationOnMock -> {
Answer renameTempAnswer = invocationOnMock -> {
```
This doesn't give you a "raw type" warning? (Answer has a type parameter.)
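A hedged sketch of the parameterized form that avoids the raw type (Mockito's `Answer<T>`; the body here is illustrative):

```java
// Answer is generic in the stubbed method's return type; the mocked call
// returns void, so Void is the natural type argument and null the only result.
Answer<Void> renameTempAnswer = invocationOnMock -> {
  // ...simulate renaming the temp file to the final merged output file...
  return null;
};
```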
Test build #110273 has finished for PR 25304 at commit

Test build #110298 has finished for PR 25304 at commit

retest this please

Test build #110335 has finished for PR 25304 at commit
```java
import java.io.IOException;

import java.util.Optional;
```
nit: import grouping
```java
final CompressionCodec compressionCodec = CompressionCodec$.MODULE$.createCodec(sparkConf);
final boolean fastMergeEnabled =
    (boolean) sparkConf.get(package$.MODULE$.SHUFFLE_UNDAFE_FAST_MERGE_ENABLE());
```
You didn't really need to touch these lines, but since you did, there's a typo in that constant's name.
```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import org.apache.spark.shuffle.IndexShuffleBlockResolver;
```
nit: add empty line before
retest this please

lgtm aside from marcelo's comments

Test build #110365 has finished for PR 25304 at commit

Test build #110385 has finished for PR 25304 at commit

Merging to master.
Closes apache#25304 from mccheah/shuffle-writer-refactor-unsafe-writer. Lead-authored-by: mcheah <[email protected]> Co-authored-by: mccheah <[email protected]> Signed-off-by: Marcelo Vanzin <[email protected]>
What changes were proposed in this pull request?
Uses the APIs introduced in SPARK-28209 in the UnsafeShuffleWriter.
How was this patch tested?
Since this is just a refactor, existing unit tests should cover the relevant code paths. Micro-benchmarks from the original fork where this code was built show no degradation in performance.
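At a high level, the refactor has the writer consult the plugin for a single-spill fast path before falling back to the per-partition merge. A sketch of that control flow under the names discussed above (`mergeSpillsUsingStandardWriter` and the surrounding fields are assumptions):

```java
// Fast path: a plugin that stages local files can take the single spill file
// as-is; otherwise merge spill-by-spill through the ShuffleMapOutputWriter.
Optional<SingleSpillShuffleMapOutputWriter> maybeSingleFileWriter =
    shuffleExecutorComponents.createSingleFileMapOutputWriter(
        shuffleId, mapId, taskContext.taskAttemptId());
if (maybeSingleFileWriter.isPresent()) {
  // No metrics updates needed here: the local-disk implementation already
  // counted these bytes as shuffle bytes written when the spill was produced.
  partitionLengths = spills[0].partitionLengths;
  maybeSingleFileWriter.get().transferMapSpillFile(spills[0].file, partitionLengths);
} else {
  partitionLengths = mergeSpillsUsingStandardWriter(spills);
}
```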