[SPARK-27460][TESTS] Running slowest test suites in their own forked JVMs for higher parallelism #24373
Status: Closed
Changes from all commits (18 commits):
- 2600785 parallelize tests (gengliangwang)
- 2e2c8f1 try increasing the number of concurrent test groups (gengliangwang)
- fded60e use 8 concurrentRestrictions again; parallize more test suites (gengliangwang)
- 475c8ef try reducing ForkedTestGroup (gengliangwang)
- d2ac3af remove some test suites (gengliangwang)
- ef928e2 revise (gengliangwang)
- 4087ac1 do not register ShutdownDeleteDir hook in withTempDir (gengliangwang)
- 8775274 Revert "do not register ShutdownDeleteDir hook in withTempDir" (gengliangwang)
- 0969508 try not to delete temp dir (gengliangwang)
- d40e7da Revert "try not to delete temp dir" (gengliangwang)
- 22616e0 delete temp dir (gengliangwang)
- 03ec1a2 Revert "delete temp dir" (gengliangwang)
- 025f4b7 revise regex (gengliangwang)
- 5047e5a try adding HiveExternalCatalogSuite to whilte list (gengliangwang)
- 920433e revise WAREHOUSE_PATH (gengliangwang)
- 57e18c2 fix (gengliangwang)
- 70874b7 add org.apache.spark.sql.hive.StatisticsSuite (gengliangwang)
- 4b61d0a add HiveClientVersions (gengliangwang)
project/SparkBuild.scala:

```diff
@@ -430,6 +430,84 @@ object SparkBuild extends PomBuild {
     else x.settings(Seq[Setting[_]](): _*)
   } ++ Seq[Project](OldDeps.project)
 }

+  if (!sys.env.contains("SERIAL_SBT_TESTS")) {
+    allProjects.foreach(enable(SparkParallelTestGrouping.settings))
+  }
+}
+
+object SparkParallelTestGrouping {
+  // Settings for parallelizing tests. The basic strategy here is to run the slowest suites (or
+  // collections of suites) in their own forked JVMs, allowing us to gain parallelism within a
+  // SBT project. Here, we take a whitelisting approach where the default behavior is to run all
+  // tests sequentially in a single JVM, requiring us to manually opt-in to the extra parallelism.
+  //
+  // There are a few reasons why such a whitelist approach is good:
+  //
+  //  1. Launching one JVM per suite adds significant overhead for short-running suites. In
+  //     addition to JVM startup time and JIT warmup, it appears that initialization of Derby
+  //     metastores can be very slow so creating a fresh warehouse per suite is inefficient.
+  //
+  //  2. When parallelizing within a project we need to give each forked JVM a different tmpdir
+  //     so that the metastore warehouses do not collide. Unfortunately, it seems that there are
+  //     some tests which have an overly tight dependency on the default tmpdir, so those fragile
+  //     tests need to continue re-running in the default configuration (or need to be rewritten).
+  //     Fixing that problem would be a huge amount of work for limited payoff in most cases
+  //     because most test suites are short-running.
+  //
+  private val testsWhichShouldRunInTheirOwnDedicatedJvm = Set(
+    "org.apache.spark.DistributedSuite",
+    "org.apache.spark.sql.catalyst.expressions.DateExpressionsSuite",
+    "org.apache.spark.sql.catalyst.expressions.HashExpressionsSuite",
+    "org.apache.spark.sql.catalyst.expressions.CastSuite",
+    "org.apache.spark.sql.catalyst.expressions.MathExpressionsSuite",
+    "org.apache.spark.sql.hive.HiveExternalCatalogSuite",
+    "org.apache.spark.sql.hive.StatisticsSuite",
+    "org.apache.spark.sql.hive.execution.HiveCompatibilitySuite",
+    "org.apache.spark.sql.hive.client.VersionsSuite",
+    "org.apache.spark.sql.hive.client.HiveClientVersions",
+    "org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite",
+    "org.apache.spark.ml.classification.LogisticRegressionSuite",
+    "org.apache.spark.ml.classification.LinearSVCSuite",
+    "org.apache.spark.sql.SQLQueryTestSuite"
```
Comment (Contributor): How do we come up with this list?

Reply (Author): Test suites that:

This is a preliminary list.
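To make the grouping rule under discussion concrete, here is a small standalone Scala sketch (not the sbt build code itself; the whitelist is condensed to two of the suite names from this PR): a whitelisted suite name maps to a group named after itself, and everything else shares one default group.

```scala
// Standalone sketch of the whitelist-based grouping rule. A suite in the
// whitelist gets its own group (and hence its own forked JVM); all other
// suites fall into a single shared default group.
object TestGroupingSketch {
  private val dedicatedJvmSuites = Set(
    "org.apache.spark.DistributedSuite",
    "org.apache.spark.sql.SQLQueryTestSuite")

  val DefaultTestGroup = "default_test_group"

  def testNameToTestGroup(name: String): String =
    if (dedicatedJvmSuites.contains(name)) name else DefaultTestGroup

  def main(args: Array[String]): Unit = {
    // A whitelisted suite gets a group named after the suite itself.
    println(testNameToTestGroup("org.apache.spark.DistributedSuite"))
    // A fast suite shares the default group (a single JVM).
    println(testNameToTestGroup("org.apache.spark.SomeFastSuite"))
  }
}
```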
```diff
+  )
+
+  private val DEFAULT_TEST_GROUP = "default_test_group"
+
+  private def testNameToTestGroup(name: String): String = name match {
+    case _ if testsWhichShouldRunInTheirOwnDedicatedJvm.contains(name) => name
+    case _ => DEFAULT_TEST_GROUP
+  }
+
+  lazy val settings = Seq(
+    testGrouping in Test := {
+      val tests: Seq[TestDefinition] = (definedTests in Test).value
+      val defaultForkOptions = ForkOptions(
+        bootJars = Nil,
+        javaHome = javaHome.value,
+        connectInput = connectInput.value,
+        outputStrategy = outputStrategy.value,
+        runJVMOptions = (javaOptions in Test).value,
+        workingDirectory = Some(baseDirectory.value),
+        envVars = (envVars in Test).value
+      )
+      tests.groupBy(test => testNameToTestGroup(test.name)).map { case (groupName, groupTests) =>
+        val forkOptions = {
+          if (groupName == DEFAULT_TEST_GROUP) {
+            defaultForkOptions
+          } else {
+            defaultForkOptions.copy(runJVMOptions = defaultForkOptions.runJVMOptions ++
+              Seq(s"-Djava.io.tmpdir=${baseDirectory.value}/target/tmp/$groupName"))
+          }
+        }
+        new Tests.Group(
+          name = groupName,
+          tests = groupTests,
+          runPolicy = Tests.SubProcess(forkOptions))
+      }
+    }.toSeq
+  )
+}
+
 object Core {
```
```diff
@@ -910,8 +988,14 @@ object TestSettings {
     testOptions in Test += Tests.Argument(TestFrameworks.JUnit, "-v", "-a"),
     // Enable Junit testing.
     libraryDependencies += "com.novocode" % "junit-interface" % "0.11" % "test",
-    // Only allow one test at a time, even across projects, since they run in the same JVM
-    parallelExecution in Test := false,
+    // `parallelExecutionInTest` controls whether test suites belonging to the same SBT project
+    // can run in parallel with one another. It does NOT control whether tests execute in parallel
+    // within the same JVM (which is controlled by `testForkedParallel`) or whether test cases
+    // within the same suite can run in parallel (which is a ScalaTest runner option which is passed
+    // to the underlying runner but is not a SBT-level configuration). This needs to be `true` in
+    // order for the extra parallelism enabled by `SparkParallelTestGrouping` to take effect.
+    // The `SERIAL_SBT_TESTS` check is here so the extra parallelism can be feature-flagged.
+    parallelExecution in Test := { if (sys.env.contains("SERIAL_SBT_TESTS")) false else true },
     // Make sure the test temp directory exists.
     resourceGenerators in Test += Def.macroValueI(resourceManaged in Test map { outDir: File =>
       var dir = new File(testTempDir)
```
```diff
@@ -932,7 +1016,12 @@ object TestSettings {
       }
       Seq.empty[File]
     }).value,
-    concurrentRestrictions in Global += Tags.limit(Tags.Test, 1)
+    concurrentRestrictions in Global := {
+      // The number of concurrent test groups is empirically chosen based on experience
+      // with Jenkins flakiness.
+      if (sys.env.contains("SERIAL_SBT_TESTS")) (concurrentRestrictions in Global).value
+      else Seq(Tags.limit(Tags.ForkedTestGroup, 4))
+    }
   )
 }
```
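The two settings above are gated by the same environment flag. As a standalone Scala sketch (the helper and case-class names are invented for illustration; the values mirror the diff): the mere presence of `SERIAL_SBT_TESTS` disables both intra-project suite parallelism and the forked-test-group concurrency limit of 4.

```scala
// Sketch of the SERIAL_SBT_TESTS feature flag used in both settings above.
// Any value for the variable (even empty) restores fully serial behavior.
object ParallelismFlagsSketch {
  final case class SbtTestConfig(parallelSuites: Boolean, maxForkedGroups: Int)

  def configFor(env: Map[String, String]): SbtTestConfig =
    if (env.contains("SERIAL_SBT_TESTS"))
      SbtTestConfig(parallelSuites = false, maxForkedGroups = 1) // old behavior
    else
      SbtTestConfig(parallelSuites = true, maxForkedGroups = 4)  // this PR

  def main(args: Array[String]): Unit = {
    println(configFor(Map.empty))
    println(configFor(Map("SERIAL_SBT_TESTS" -> "1")))
  }
}
```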
Comment (Author): @srowen It is hard to run test cases in parallel successfully. See the comment here. E.g., it is possible that multiple test cases use the same table location `spark-warehouse/t` for a temporary table.
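As a hypothetical illustration of the collision (paths and group names invented for the sketch): two forked JVMs that derive a warehouse location from the same tmpdir write to the same path, while per-group tmpdirs, as this PR sets via `-Djava.io.tmpdir`, keep them apart.

```scala
import java.nio.file.Paths

// Hypothetical sketch of the warehouse-path collision. Each forked JVM
// derives its temporary-table location from java.io.tmpdir; with a shared
// tmpdir the paths are identical, with per-group tmpdirs they differ.
object WarehousePathSketch {
  def warehouseFor(tmpDir: String): String =
    Paths.get(tmpDir, "spark-warehouse", "t").toString

  def main(args: Array[String]): Unit = {
    // Shared tmpdir => identical temporary-table location: a collision.
    println(warehouseFor("/tmp") == warehouseFor("/tmp")) // true

    // Distinct per-group tmpdirs => distinct locations, no collision.
    val a = warehouseFor("/spark/target/tmp/org.apache.spark.DistributedSuite")
    val b = warehouseFor("/spark/target/tmp/default_test_group")
    println(a == b) // false
  }
}
```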
Comment: I get it, but this won't help the Maven build and is kind of brittle. Is it really hard to just set temp dirs differently for different suites? Could a suite run other suites in ScalaTest, and parallelize suites that way?
Comment (Author): From the Jenkins log, I can see that we are using SBT for PR tests. The test time of this PR is about 106 minutes according to the email notification, which is much better than before (previously it took around 3.5 hours, per https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/104580/testReport/history/).

I think fixing that problem would be a huge amount of work for limited payoff in most cases, because most test suites are short-running.

Do you mean running all test suites in parallel? We can enable `parallelExecution in Test` (http://www.scalatest.org/user_guide/using_scalatest_with_sbt), but we would still face the problem of colliding warehouse paths.
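For reference, a hedged sbt sketch (a hypothetical fragment, not part of this PR, in the same sbt 0.13-style syntax as the build file) of what the suite-level parallelism mentioned above would look like; it would still require giving each suite a non-colliding warehouse path:

```scala
// Hypothetical sbt fragment, shown only to illustrate the option discussed
// above; not taken from this PR.
// Run test tasks of suites within one project in parallel:
parallelExecution in Test := true,
// Additionally run suites in parallel inside a single forked JVM:
testForkedParallel in Test := true
```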