Skip to content

Conversation

@srowen
Copy link
Member

@srowen srowen commented Sep 6, 2018

What changes were proposed in this pull request?

This adds a test following #21638

How was this patch tested?

Existing tests and new test.

@srowen
Copy link
Member Author

srowen commented Sep 6, 2018

CC @bomeng @gatorsmile

Copy link
Contributor

@imatiach-msft imatiach-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice test!

Copy link
Contributor

@imatiach-msft imatiach-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments

StandardCharsets.UTF_8)
}

assert(sc.binaryFiles(tempDirPath, minPartitions = 1).getNumPartitions === 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: maybe put these three asserts in a loop

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, sure


test("SPARK-22357 test binaryFiles minPartitions") {
sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local")
.set("spark.files.openCostInBytes", "0")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this setting needed: spark.files.openCostInBytes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This removes its effect in the section of code we're really trying to test:

def setMinPartitions(sc: SparkContext, context: JobContext, minPartitions: Int) {
    val defaultMaxSplitBytes = sc.getConf.get(config.FILES_MAX_PARTITION_BYTES)
    val openCostInBytes = sc.getConf.get(config.FILES_OPEN_COST_IN_BYTES)
    val defaultParallelism = Math.max(sc.defaultParallelism, minPartitions)
    val files = listStatus(context).asScala
    val totalBytes = files.filterNot(_.isDirectory).map(_.getLen + openCostInBytes).sum
    val bytesPerCore = totalBytes / defaultParallelism
    val maxSplitSize = Math.min(defaultMaxSplitBytes, Math.max(openCostInBytes, bytesPerCore))
    super.setMaxSplitSize(maxSplitSize)
  }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, I see, thanks for pointing that out!

@SparkQA
Copy link

SparkQA commented Sep 6, 2018

Test build #95769 has finished for PR 22356 at commit 84dd4a7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 6, 2018

Test build #95771 has finished for PR 22356 at commit 6e1d8fd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@bomeng
Copy link
Contributor

bomeng commented Sep 6, 2018

Thanks for taking my codes. Looks good.

@srowen
Copy link
Member Author

srowen commented Sep 7, 2018

Merged to master/2.4

asfgit pushed a commit that referenced this pull request Sep 7, 2018
…itions parameter

## What changes were proposed in this pull request?

This adds a test following #21638

## How was this patch tested?

Existing tests and new test.

Closes #22356 from srowen/SPARK-22357.2.

Authored-by: Sean Owen <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
(cherry picked from commit 4e3365b)
Signed-off-by: Sean Owen <[email protected]>
@asfgit asfgit closed this in 4e3365b Sep 7, 2018
@srowen srowen deleted the SPARK-22357.2 branch September 20, 2018 10:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants