[SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles ignore minPartitions parameter #22356
New issue for SPARK-22357.
Conversation
imatiach-msft left a comment:
nice test!
imatiach-msft left a comment:
comments
    StandardCharsets.UTF_8)
}

assert(sc.binaryFiles(tempDirPath, minPartitions = 1).getNumPartitions === 1)
nitpick: maybe put these three asserts in a loop
OK, sure
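One possible shape for the suggested loop, shown here as a hypothetical sketch (the `(minPartitions, expected)` pairs are illustrative assumptions, not taken from the actual suite):

```scala
// Sketch of the reviewer's suggestion: drive the three asserts
// from a table of (minPartitions, expected partition count) pairs.
Seq(1 -> 1, 2 -> 2, 3 -> 3).foreach { case (minPartitions, expected) =>
  assert(sc.binaryFiles(tempDirPath, minPartitions).getNumPartitions === expected)
}
```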
test("SPARK-22357 test binaryFiles minPartitions") {
  sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local")
    .set("spark.files.openCostInBytes", "0")
Why is this setting needed: spark.files.openCostInBytes?
This removes its effect in the section of code we're really trying to test:
def setMinPartitions(sc: SparkContext, context: JobContext, minPartitions: Int) {
  val defaultMaxSplitBytes = sc.getConf.get(config.FILES_MAX_PARTITION_BYTES)
  val openCostInBytes = sc.getConf.get(config.FILES_OPEN_COST_IN_BYTES)
  val defaultParallelism = Math.max(sc.defaultParallelism, minPartitions)
  val files = listStatus(context).asScala
  val totalBytes = files.filterNot(_.isDirectory).map(_.getLen + openCostInBytes).sum
  val bytesPerCore = totalBytes / defaultParallelism
  val maxSplitSize = Math.min(defaultMaxSplitBytes, Math.max(openCostInBytes, bytesPerCore))
  super.setMaxSplitSize(maxSplitSize)
}
ah, I see, thanks for pointing that out!
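To make the effect concrete, here is a small standalone sketch of the split-size arithmetic from setMinPartitions above. The file sizes and the 4 MB open cost are illustrative assumptions (4 MB happens to be Spark's documented default for spark.files.openCostInBytes, and 128 MB for spark.files.maxPartitionBytes, but nothing here reads real configuration):

```scala
// Standalone sketch of the maxSplitSize arithmetic in setMinPartitions.
object SplitSizeSketch {
  def maxSplitSize(fileSizes: Seq[Long], openCostInBytes: Long,
                   defaultParallelism: Int, minPartitions: Int): Long = {
    val defaultMaxSplitBytes = 128L * 1024 * 1024 // assumed 128 MB cap
    val parallelism = math.max(defaultParallelism, minPartitions)
    val totalBytes = fileSizes.map(_ + openCostInBytes).sum
    val bytesPerCore = totalBytes / parallelism
    math.min(defaultMaxSplitBytes, math.max(openCostInBytes, bytesPerCore))
  }

  def main(args: Array[String]): Unit = {
    val sizes = Seq(10L, 10L, 10L) // three tiny files
    // With a 4 MB open cost, the open cost dominates totalBytes, so the
    // split size stays near 4 MB regardless of minPartitions:
    println(maxSplitSize(sizes, 4L * 1024 * 1024, 1, 3))
    // With openCostInBytes = 0, the split size tracks bytesPerCore,
    // so minPartitions actually influences the partitioning:
    println(maxSplitSize(sizes, 0L, 1, 3))
  }
}
```

This is why the test zeroes out spark.files.openCostInBytes: it removes the term that would otherwise mask the minPartitions behavior being tested.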
Test build #95769 has finished for PR 22356 at commit
Test build #95771 has finished for PR 22356 at commit
Thanks for taking my code. Looks good.
Merged to master/2.4
…itions parameter
## What changes were proposed in this pull request?
This adds a test following #21638
## How was this patch tested?
Existing tests and new test.
Closes #22356 from srowen/SPARK-22357.2.
Authored-by: Sean Owen <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
(cherry picked from commit 4e3365b)
Signed-off-by: Sean Owen <[email protected]>
What changes were proposed in this pull request?
This adds a test following #21638
How was this patch tested?
Existing tests and new test.