[SPARK-27460][TESTS] Running slowest test suites in their own forked JVMs for higher parallelism #24373
Test build #104584 has finished for PR 24373 at commit
It's worth trying, but it won't help the Maven build. What about rewriting the test suites to run their test cases in parallel? ScalaTest's ParallelTestExecution (http://doc.scalatest.org/3.0.1-2.12/org/scalatest/ParallelTestExecution.html) does this.
// addition to JVM startup time and JIT warmup, it appears that initialization of Derby
// metastores can be very slow so creating a fresh warehouse per suite is inefficient.
//
// 2. When parallelizing within a project we need to give each forked JVM a different tmpdir
@srowen It is hard to run test cases in parallel reliably. See the comment here. E.g., multiple test cases may use the same table location spark-warehouse/t for a temporary table.
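A minimal sketch of the collision described above, written as plain Scala rather than as a real Spark test. Two hypothetical "suites" each create a table directory t under a warehouse root. With one shared root (as when suites run in the same JVM and see the same java.io.tmpdir) the second creation fails; with a separate root per group, both succeed. The object and method names are illustrative assumptions, the two suites run one after the other for determinism instead of concurrently, and the temporary directories are not cleaned up.

```scala
import java.nio.file.{FileAlreadyExistsException, Files, Path}

// Hypothetical sketch, not Spark code: models two suites that both create
// a table directory named "t" under a warehouse root.
object WarehouseCollisionSketch {

  // Create <warehouseRoot>/<table>, as a suite creating a table might.
  // Returns true on success, false if the directory already exists.
  def createTableDir(warehouseRoot: Path, table: String): Boolean =
    try {
      Files.createDirectories(warehouseRoot)          // no-op if the root already exists
      Files.createDirectory(warehouseRoot.resolve(table)) // throws if "t" already exists
      true
    } catch {
      case _: FileAlreadyExistsException => false
    }

  // Run two "suites" that both create table "t" and report whether each succeeded.
  // sharedRoot = true models one JVM / one tmpdir; false models one tmpdir per group.
  def runTwoSuites(sharedRoot: Boolean): (Boolean, Boolean) = {
    val base = Files.createTempDirectory("warehouse-sketch")
    val rootA =
      if (sharedRoot) base.resolve("spark-warehouse")
      else base.resolve("groupA").resolve("spark-warehouse")
    val rootB =
      if (sharedRoot) base.resolve("spark-warehouse")
      else base.resolve("groupB").resolve("spark-warehouse")
    val first = createTableDir(rootA, "t")
    val second = createTableDir(rootB, "t")
    (first, second)
  }

  def main(args: Array[String]): Unit = {
    println(s"shared root: ${runTwoSuites(sharedRoot = true)}")     // prints "shared root: (true,false)"
    println(s"per-group root: ${runTwoSuites(sharedRoot = false)}") // prints "per-group root: (true,true)"
  }
}
```

Giving each forked JVM its own tmpdir is one way to get the per-group behavior without editing every test that hard-codes a relative path like spark-warehouse/t.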
I get it, but this won't help the Maven build and is kind of brittle. Is it really hard to just set temp dirs differently for different suites?
Can a suite run suites in scalatest? and parallelize suites that way?
I get it, but this won't help the Maven build and is kind of brittle.
From the Jenkins log, I can see that we use SBT to test PRs. According to the email notification, the test time of this PR is about 106 minutes, which is much better than before (prior to this change it took around 3.5 hours; see https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/104580/testReport/history/).
Is it really hard to just set temp dirs differently for different suites?
I think fixing that problem would be a huge amount of work for limited payoff in most cases because most test suites are short-running.
Can a suite run suites in scalatest? and parallelize suites that way?
Do you mean running all test suites in parallel? We could enable parallelExecution in Test (http://www.scalatest.org/user_guide/using_scalatest_with_sbt), but we would still face the problem of colliding warehouse paths.
Test build #104591 has finished for PR 24373 at commit
Test build #104594 has finished for PR 24373 at commit
@srowen SBT is used for PR tests. Let us improve the SBT build first; at least that will speed up development.
Yeah, I think that's a good argument. It's a bit of a hack, but not bad.
I just triggered 6 test runs. Let us see whether the tests become more flaky or whether this introduces new flaky tests.
Test build #4712 has finished for PR 24373 at commit
Test build #4710 has finished for PR 24373 at commit
Test build #4708 has finished for PR 24373 at commit
Test build #4709 has finished for PR 24373 at commit
Test build #4711 has finished for PR 24373 at commit
Test build #4707 has finished for PR 24373 at commit
Test build #104609 has finished for PR 24373 at commit
Test build #4713 has finished for PR 24373 at commit
Test build #4714 has finished for PR 24373 at commit
Test build #4715 has finished for PR 24373 at commit
retest this please.
Test build #104615 has finished for PR 24373 at commit
retest this please
Test build #104616 has finished for PR 24373 at commit
Test build #4731 has finished for PR 24373 at commit
Test build #4735 has finished for PR 24373 at commit
Test build #104670 has finished for PR 24373 at commit
Test build #4734 has finished for PR 24373 at commit
Test build #4736 has finished for PR 24373 at commit
Test build #4737 has finished for PR 24373 at commit
+1 Looks worth trying.
Test build #4740 has finished for PR 24373 at commit
Test build #104690 has finished for PR 24373 at commit
Test build #4739 has finished for PR 24373 at commit
Test build #4742 has finished for PR 24373 at commit
Test build #4741 has finished for PR 24373 at commit
Thanks, merging to master! Let's see how the PR build works.
Oops, I triggered the merge script at 70874b7 and then lost my network connection until now. @gengliangwang, can you send a new PR for the commits after 70874b7?
Test build #104699 has finished for PR 24373 at commit
Sure, I have created #24404.
Test build #4744 has finished for PR 24373 at commit
Test build #4743 has finished for PR 24373 at commit
Test build #4746 has finished for PR 24373 at commit
Test build #4747 has finished for PR 24373 at commit
Test build #4745 has finished for PR 24373 at commit
Hi, All.
…rked JVMs for higher parallelism
This is a backport of #24373, #24404 and #24434.
Closes #25861 from dongjoon-hyun/SPARK-27460.
Lead-authored-by: Gengliang Wang
Co-authored-by: gatorsmile
Signed-off-by: Dongjoon Hyun
What changes were proposed in this pull request?
This patch modifies SparkBuild so that the largest / slowest test suites (or collections of suites) can run in their own forked JVMs, allowing them to be run in parallel with each other. This opt-in / whitelisting approach allows us to increase parallelism without having to fix a long-tail of flakiness / brittleness issues in tests which aren't performance bottlenecks.
See the comments in SparkBuild.scala for details, including a summary of why we sometimes opt to run entire groups of tests in a single forked JVM.
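The opt-in grouping idea can be sketched in plain Scala. This is an illustration under assumptions, not the code in SparkBuild.scala: TestGroup is a hypothetical stand-in for an sbt test group, the suite names are placeholders, and in this sketch every group (including the shared default one) is given its own java.io.tmpdir.

```scala
// Hypothetical sketch of opt-in grouping: whitelisted suites each get a
// dedicated group (one forked JVM each); all other suites share a default group.
object ForkedGroupingSketch {

  // Stand-in for a test group: its name, the suites it runs, and the extra
  // JVM options its forked JVM would be started with.
  final case class TestGroup(name: String, suites: Seq[String], jvmOptions: Seq[String])

  val DefaultGroup = "default_test_group"

  def groupSuites(suites: Seq[String], dedicated: Set[String], tmpRoot: String): Seq[TestGroup] =
    suites
      // A whitelisted suite is its own group; everything else goes to the default group.
      .groupBy(suite => if (dedicated.contains(suite)) suite else DefaultGroup)
      .toSeq
      .sortBy(_._1) // deterministic order for printing and testing
      .map { case (groupName, members) =>
        // A private tmpdir per forked JVM keeps relative paths such as
        // spark-warehouse/t from colliding across concurrently running JVMs.
        TestGroup(groupName, members.sorted, Seq(s"-Djava.io.tmpdir=$tmpRoot/$groupName"))
      }

  def main(args: Array[String]): Unit = {
    val suites = Seq("SlowSuiteA", "FastSuite1", "SlowSuiteB", "FastSuite2")
    groupSuites(suites, dedicated = Set("SlowSuiteA", "SlowSuiteB"), tmpRoot = "target/tmp")
      .foreach(println)
  }
}
```

In an actual sbt build, each such group would typically be expressed through the testGrouping setting as a Tests.Group with a Tests.SubProcess run policy, and the number of forked JVMs running at once is capped by a concurrentRestrictions limit on Tags.ForkedTestGroup. Keeping the whitelist short is the point of the opt-in approach: only the slow suites pay for an extra JVM startup.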
The time of a full pull request test run in Jenkins is reduced by around 53%:
before changes: 4hr 40min
after changes: 2hr 13min
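As a quick check of the reported figure, converting both durations to minutes:

```latex
\begin{aligned}
t_{\text{before}} &= 4\,\text{h}\,40\,\text{min} = 280\,\text{min} \\
t_{\text{after}}  &= 2\,\text{h}\,13\,\text{min} = 133\,\text{min} \\
\text{reduction}  &= \frac{t_{\text{before}} - t_{\text{after}}}{t_{\text{before}}}
                   = \frac{280 - 133}{280} = \frac{147}{280} = 0.525 \approx 53\%
\end{aligned}
```

Equivalently, the full test run is about 280 / 133 ≈ 2.1 times faster.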
How was this patch tested?
Unit test