Skip to content

Conversation

@gengliangwang
Copy link
Member

@gengliangwang gengliangwang commented Oct 5, 2018

What changes were proposed in this pull request?

There was 5 suites extends HadoopFsRelationTest, for testing "orc"/"parquet"/"text"/"json" data sources.
This PR refactor the base trait HadoopFsRelationTest:

  1. Rename unnecessary loop for setting parquet conf
  2. The test case SPARK-8406: Avoids name collision while writing files takes about 14 to 20 seconds. As now all the file format data source are using common code, for creating result files, we can test one data source(Parquet) only to reduce test time.

To run related 5 suites:

./build/sbt "hive/testOnly *HadoopFsRelationSuite"

The total test run time is reduced from 5 minutes 40 seconds to 3 minutes 50 seconds.

How was this patch tested?

Unit test

@SparkQA
Copy link

SparkQA commented Oct 5, 2018

Test build #97000 has finished for PR 22643 at commit 9a74db0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

} else {
""
}
test(s"test all data types - $dataType$extraMessage") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR accidentally seems to disable parquet.enable.dictionary = true cases even in ParquetHadoopFsRelationSuite. Could you fix that? After fixing that, we need to measure the time redunction again.

[info] ParquetHadoopFsRelationSuite:
[info] - test all data types - StringType (830 milliseconds)
...

// more cores, the issue can be reproduced steadily. Fortunately our Jenkins builder meets this
// requirement. We probably want to move this test case to spark-integration-tests or spark-perf
// later.
test("SPARK-8406: Avoids name collision while writing files") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just move this to ParquetHadoopFsRelationSuite.scala

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@SparkQA
Copy link

SparkQA commented Oct 6, 2018

Test build #97052 has finished for PR 22643 at commit 59ca9e0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Copy link
Member Author

retest this please.

@SparkQA
Copy link

SparkQA commented Oct 6, 2018

Test build #97068 has finished for PR 22643 at commit 59ca9e0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Copy link
Member Author

@dongjoon-hyun please take another look, thanks!

@gatorsmile
Copy link
Member

LGTM

Thanks! Merged to master.

@asfgit asfgit closed this in 6a60fb0 Oct 8, 2018
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
## What changes were proposed in this pull request?
There was 5 suites extends `HadoopFsRelationTest`,  for testing "orc"/"parquet"/"text"/"json" data sources.
This PR refactor the base trait `HadoopFsRelationTest`:
1. Rename unnecessary loop for setting parquet conf
2. The test case `SPARK-8406: Avoids name collision while writing files` takes about 14 to 20 seconds. As now all the file format data source are using common code, for creating result files, we can test one data source(Parquet) only to reduce test time.

To run related 5 suites:
```
./build/sbt "hive/testOnly *HadoopFsRelationSuite"
```
The total test run time is reduced from 5 minutes 40 seconds to 3 minutes 50 seconds.

## How was this patch tested?

Unit test

Closes apache#22643 from gengliangwang/refactorHadoopFsRelationTest.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: gatorsmile <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants