[SPARK-25630][TEST] Reduce test time of HadoopFsRelationTest #22643
Test build #97000 has finished for PR 22643 at commit
```
} else {
  ""
}
test(s"test all data types - $dataType$extraMessage") {
```
This PR seems to accidentally disable the `parquet.enable.dictionary = true` cases, even in `ParquetHadoopFsRelationSuite`. Could you fix that? After fixing it, we need to measure the time reduction again.
```
[info] ParquetHadoopFsRelationSuite:
[info] - test all data types - StringType (830 milliseconds)
...
```

```
// more cores, the issue can be reproduced steadily. Fortunately our Jenkins builder meets this
// requirement. We probably want to move this test case to spark-integration-tests or spark-perf
// later.
test("SPARK-8406: Avoids name collision while writing files") {
```
Just move this to ParquetHadoopFsRelationSuite.scala
+1
Test build #97052 has finished for PR 22643 at commit
retest this please.
Test build #97068 has finished for PR 22643 at commit
@dongjoon-hyun please take another look, thanks!
LGTM. Thanks! Merged to master.
## What changes were proposed in this pull request?

There are 5 suites that extend `HadoopFsRelationTest`, testing the "orc"/"parquet"/"text"/"json" data sources. This PR refactors the base trait `HadoopFsRelationTest`:

1. Remove the unnecessary loop for setting the Parquet conf.
2. The test case `SPARK-8406: Avoids name collision while writing files` takes about 14 to 20 seconds. Since all file-format data sources now share common code for creating result files, we can test only one data source (Parquet) to reduce test time.

To run the 5 related suites:

```
./build/sbt "hive/testOnly *HadoopFsRelationSuite"
```

The total test run time is reduced from 5 minutes 40 seconds to 3 minutes 50 seconds.

## How was this patch tested?

Unit test

Closes apache#22643 from gengliangwang/refactorHadoopFsRelationTest.

Authored-by: Gengliang Wang
Signed-off-by: gatorsmile