[SPARK-25630][TEST] Reduce test time of HadoopFsRelationTest #22643

gengliangwang · 2018-10-05T16:00:17Z

What changes were proposed in this pull request?

There were 5 suites extending HadoopFsRelationTest, for testing the "orc"/"parquet"/"text"/"json" data sources.
This PR refactors the base trait HadoopFsRelationTest:

1. Remove the unnecessary loop for setting the Parquet conf.
2. The test case SPARK-8406: Avoids name collision while writing files takes about 14 to 20 seconds. Since all file-format data sources now use common code to create result files, we can test only one data source (Parquet) to reduce test time.

To run the 5 related suites:

./build/sbt "hive/testOnly *HadoopFsRelationSuite"

The total test run time is reduced from 5 minutes 40 seconds to 3 minutes 50 seconds.
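
As a quick check of the figures above, the saving can be worked out in seconds. This is only arithmetic on the two numbers quoted in the description; the object and value names are illustrative.

```scala
// Arithmetic on the run times quoted in the PR description.
object TimeSaving {
  val beforeSeconds: Int = 5 * 60 + 40 // 5 minutes 40 seconds
  val afterSeconds: Int = 3 * 60 + 50  // 3 minutes 50 seconds
  val savedSeconds: Int = beforeSeconds - afterSeconds
  // Fraction of the original run time that was saved, as a percentage.
  val savedPercent: Double = savedSeconds * 100.0 / beforeSeconds

  def main(args: Array[String]): Unit =
    println(s"saved $savedSeconds s, about ${math.round(savedPercent)}% of the original run")
}
```

That is 110 seconds saved, roughly a third of the original run time.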

How was this patch tested?

Unit test

SparkQA · 2018-10-05T18:32:39Z

Test build #97000 has finished for PR 22643 at commit 9a74db0.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2018-10-05T19:21:22Z

sql/hive/src/test/scala/org/apache/spark/sql/sources/HadoopFsRelationTest.scala

+      } else {
+        ""
+      }
+      test(s"test all data types - $dataType$extraMessage") {


This PR seems to accidentally disable the parquet.enable.dictionary = true cases even in ParquetHadoopFsRelationSuite. Could you fix that? After fixing that, we need to measure the time reduction again.

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97000/consoleFull

[info] ParquetHadoopFsRelationSuite: [info] - test all data types - StringType (830 milliseconds) ...
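
One way the dictionary-encoding cases could be kept for the Parquet suite only is sketched below. This is a plain-Scala model of test-name generation, not the actual Spark code: `FsRelationTestSketch`, `dataSourceName`, `dictionaryEncodingConfs`, and the two example objects are hypothetical names, and only the `$dataType$extraMessage` naming pattern is taken from the diff above.

```scala
// Hypothetical, simplified model of a shared test trait; not the real HadoopFsRelationTest.
trait FsRelationTestSketch {
  // Name of the data source under test, e.g. "parquet" or "json" (assumed convention).
  def dataSourceName: String

  private def isParquet: Boolean = dataSourceName == "parquet"

  // Only the Parquet suite exercises both dictionary-encoding settings.
  private def dictionaryEncodingConfs: Seq[Boolean] =
    if (isParquet) Seq(true, false) else Seq(false)

  // A small illustrative subset of data types.
  def supportedDataTypes: Seq[String] = Seq("StringType", "IntegerType")

  // Names of the test cases that would be registered for this suite.
  def testNames: Seq[String] =
    for {
      dataType <- supportedDataTypes
      enabled <- dictionaryEncodingConfs
    } yield {
      val extraMessage =
        if (isParquet) s" with parquet.enable.dictionary = $enabled" else ""
      s"test all data types - $dataType$extraMessage"
    }
}

object ParquetSketch extends FsRelationTestSketch { val dataSourceName = "parquet" }
object JsonSketch extends FsRelationTestSketch { val dataSourceName = "json" }
```

Under this model the Parquet suite gets two cases per data type, while every other suite gets one case with no suffix in its name.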

gatorsmile · 2018-10-05T21:46:12Z

sql/hive/src/test/scala/org/apache/spark/sql/sources/HadoopFsRelationTest.scala

  // more cores, the issue can be reproduced steadily.  Fortunately our Jenkins builder meets this
  // requirement.  We probably want to move this test case to spark-integration-tests or spark-perf
  // later.
  test("SPARK-8406: Avoids name collision while writing files") {


Just move this to ParquetHadoopFsRelationSuite.scala
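
A minimal sketch of what that move means structurally, using a tiny stand-in for a test framework rather than ScalaTest or Spark. `MiniSuite`, `SharedFsRelationTests`, and the two suite classes are hypothetical; only the SPARK-8406 test name comes from the code above.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a test framework's registration API.
trait MiniSuite {
  private val registered = ArrayBuffer.empty[String]

  // Records the test name; the body is not executed in this sketch.
  protected def test(name: String)(body: => Unit): Unit = {
    registered += name
    ()
  }

  def testNames: Seq[String] = registered.toSeq
}

// Tests in the shared trait run once per concrete suite.
trait SharedFsRelationTests extends MiniSuite {
  test("shared test (placeholder)") {}
}

class JsonSuiteSketch extends SharedFsRelationTests

// The slow test lives only in the Parquet suite, so it runs once overall.
class ParquetSuiteSketch extends SharedFsRelationTests {
  test("SPARK-8406: Avoids name collision while writing files") {}
}
```

With the test defined in the concrete Parquet suite instead of the base trait, the other suites no longer inherit it.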

SparkQA · 2018-10-06T16:47:07Z

Test build #97052 has finished for PR 22643 at commit 59ca9e0.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

gengliangwang · 2018-10-06T19:10:30Z

retest this please.

SparkQA · 2018-10-06T22:01:04Z

Test build #97068 has finished for PR 22643 at commit 59ca9e0.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gengliangwang · 2018-10-08T03:43:37Z

@dongjoon-hyun please take another look, thanks!

gatorsmile · 2018-10-08T20:04:33Z

LGTM

Thanks! Merged to master.

## What changes were proposed in this pull request?

There were 5 suites extending `HadoopFsRelationTest`, for testing the "orc"/"parquet"/"text"/"json" data sources. This PR refactors the base trait `HadoopFsRelationTest`:

1. Remove the unnecessary loop for setting the Parquet conf.
2. The test case `SPARK-8406: Avoids name collision while writing files` takes about 14 to 20 seconds. Since all file-format data sources now use common code to create result files, we can test only one data source (Parquet) to reduce test time.

To run the 5 related suites:

```
./build/sbt "hive/testOnly *HadoopFsRelationSuite"
```

The total test run time is reduced from 5 minutes 40 seconds to 3 minutes 50 seconds.

## How was this patch tested?

Unit test

Closes apache#22643 from gengliangwang/refactorHadoopFsRelationTest.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: gatorsmile <[email protected]>

refactor HadoopFsRelationTest

9a74db0

dongjoon-hyun reviewed Oct 5, 2018

gatorsmile reviewed Oct 5, 2018

address comments

59ca9e0

asfgit closed this in 6a60fb0 Oct 8, 2018
