Skip to content

Conversation

@gengliangwang
Copy link
Member

@gengliangwang gengliangwang commented Apr 15, 2019

What changes were proposed in this pull request?

This patch modifies SparkBuild so that the largest / slowest test suites (or collections of suites) can run in their own forked JVMs, allowing them to be run in parallel with each other. This opt-in / whitelisting approach allows us to increase parallelism without having to fix a long-tail of flakiness / brittleness issues in tests which aren't performance bottlenecks.

See comments in SparkBuild.scala for information on the details, including a summary of why we sometimes opt to run entire groups of tests in a single forked JVM .

The time of full new pull request test in Jenkins is reduced by around 53%:
before changes: 4hr 40min
after changes: 2hr 13min

How was this patch tested?

Unit test

@SparkQA
Copy link

SparkQA commented Apr 15, 2019

Test build #104584 has finished for PR 24373 at commit 2600785.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Apr 15, 2019

It's worth trying, but it won't help the Maven build. What about rewriting the test suites to just run the test cases in parallel? http://doc.scalatest.org/3.0.1-2.12/org/scalatest/ParallelTestExecution.html does this.

// addition to JVM startup time and JIT warmup, it appears that initialization of Derby
// metastores can be very slow so creating a fresh warehouse per suite is inefficient.
//
// 2. When parallelizing within a project we need to give each forked JVM a different tmpdir
Copy link
Member Author

@gengliangwang gengliangwang Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srowen It is hard to successfully run test cases in parallel. See the comment here. E.g, it possible that multiple test cases use the same table location spark-warehouse/t for a temporary table.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get it, but this won't help the Maven build and is kind of brittle. Is it really hard to just set temp dirs differently for different suites?

Can a suite run suites in scalatest? and parallelize suites that way?

Copy link
Member Author

@gengliangwang gengliangwang Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get it, but this won't help the Maven build and is kind of brittle.

From the Jenkins log, I can see that we are using SBT for test PRs. And I can see that the test time of this PR is about 106 minutes from the email notification, which is much better now (before changes it takes around 3.5 hours from https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/104580/testReport/history/)

Is it really hard to just set temp dirs differently for different suites?

I think fixing that problem would be a huge amount of work for limited payoff in most cases because most test suites are short-running.

Can a suite run suites in scalatest? and parallelize suites that way?

Do you mean run all test suite in parallel? We can enable parallelExecution in Test (http://www.scalatest.org/user_guide/using_scalatest_with_sbt). But we still face the problem of colliding warehouse paths.

@gengliangwang
Copy link
Member Author

@gatorsmile @cloud-fan

@SparkQA
Copy link

SparkQA commented Apr 15, 2019

Test build #104591 has finished for PR 24373 at commit 2e2c8f1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 15, 2019

Test build #104594 has finished for PR 24373 at commit fded60e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

@srowen SBT tests are being used for PR tests. Let us improve the SBT tests first and at least it can speed up the development speeds.

@srowen
Copy link
Member

srowen commented Apr 15, 2019

Yeah I think that's a good argument. it's a bit of a hack but not bad.

@gatorsmile
Copy link
Member

gatorsmile commented Apr 15, 2019

I just triggered 6 tests. Let us see whether the tests become more flaky, or it could introduce new flaky tests.

@SparkQA
Copy link

SparkQA commented Apr 15, 2019

Test build #4712 has finished for PR 24373 at commit fded60e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 15, 2019

Test build #4710 has finished for PR 24373 at commit fded60e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 15, 2019

Test build #4708 has finished for PR 24373 at commit fded60e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 15, 2019

Test build #4709 has finished for PR 24373 at commit fded60e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 15, 2019

Test build #4711 has finished for PR 24373 at commit fded60e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 15, 2019

Test build #4707 has finished for PR 24373 at commit fded60e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 16, 2019

Test build #104609 has finished for PR 24373 at commit 475c8ef.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 16, 2019

Test build #4713 has finished for PR 24373 at commit 475c8ef.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 16, 2019

Test build #4714 has finished for PR 24373 at commit 475c8ef.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 16, 2019

Test build #4715 has finished for PR 24373 at commit 475c8ef.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Copy link
Member Author

retest this please.

2 similar comments
@gengliangwang
Copy link
Member Author

retest this please.

@gengliangwang
Copy link
Member Author

retest this please.

@SparkQA
Copy link

SparkQA commented Apr 16, 2019

Test build #104615 has finished for PR 24373 at commit d2ac3af.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Apr 16, 2019

Test build #104616 has finished for PR 24373 at commit d2ac3af.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 17, 2019

Test build #4731 has finished for PR 24373 at commit 5047e5a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 17, 2019

Test build #4735 has finished for PR 24373 at commit 57e18c2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 17, 2019

Test build #104670 has finished for PR 24373 at commit 57e18c2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 17, 2019

Test build #4734 has finished for PR 24373 at commit 57e18c2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 17, 2019

Test build #4736 has finished for PR 24373 at commit 57e18c2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 17, 2019

Test build #4737 has finished for PR 24373 at commit 57e18c2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

+1 Looks worth trying.

@SparkQA
Copy link

SparkQA commented Apr 18, 2019

Test build #4740 has finished for PR 24373 at commit 70874b7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 18, 2019

Test build #104690 has finished for PR 24373 at commit 70874b7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 18, 2019

Test build #4739 has finished for PR 24373 at commit 70874b7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 18, 2019

Test build #4742 has finished for PR 24373 at commit 70874b7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 18, 2019

Test build #4741 has finished for PR 24373 at commit 70874b7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master! Let's see how the PR build works.

@cloud-fan cloud-fan closed this in 9c238b8 Apr 18, 2019
@cloud-fan
Copy link
Contributor

Oops, I triggered the merge script at 70874b7 , and then got stuck by network until now. @gengliangwang can you send a new PR for the commits after 70874b7?

@SparkQA
Copy link

SparkQA commented Apr 18, 2019

Test build #104699 has finished for PR 24373 at commit 4b61d0a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Copy link
Member Author

Oops, I triggered the merge script at 70874b7 , and then got stuck by network until now. @gengliangwang can you send a new PR for the commits after 70874b7?

Sure, I have created #24404 .

@SparkQA
Copy link

SparkQA commented Apr 18, 2019

Test build #4744 has finished for PR 24373 at commit 4b61d0a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 18, 2019

Test build #4743 has finished for PR 24373 at commit 4b61d0a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 18, 2019

Test build #4746 has finished for PR 24373 at commit 4b61d0a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 18, 2019

Test build #4747 has finished for PR 24373 at commit 4b61d0a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 18, 2019

Test build #4745 has finished for PR 24373 at commit 4b61d0a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Hi, All.
We are hitting the timeout in branch-2.4 SBT build, too.
Instead of increasing the timeout, it would be great if we can have this in branch-2.4.
I'll make a backporting PR since branch-2.4 is our LTS branch.

dongjoon-hyun pushed a commit that referenced this pull request Sep 20, 2019
…rked JVMs for higher parallelism

## What changes were proposed in this pull request?

This is a backport of #24373 , #24404 and #24434

This patch modifies SparkBuild so that the largest / slowest test suites (or collections of suites) can run in their own forked JVMs, allowing them to be run in parallel with each other. This opt-in / whitelisting approach allows us to increase parallelism without having to fix a long-tail of flakiness / brittleness issues in tests which aren't performance bottlenecks.

See comments in SparkBuild.scala for information on the details, including a summary of why we sometimes opt to run entire groups of tests in a single forked JVM .

The time of full new pull request test in Jenkins is reduced by around 53%:
before changes: 4hr 40min
after changes: 2hr 13min

## How was this patch tested?

Unit test

Closes #25861 from dongjoon-hyun/SPARK-27460.

Lead-authored-by: Gengliang Wang <[email protected]>
Co-authored-by: gatorsmile <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants