[SPARK-36682][CORE][TEST] Add Hadoop sequence file test for different Hadoop codecs #33924

viirya · 2021-09-07T07:45:19Z

What changes were proposed in this pull request?

This patch proposes to add e2e tests for using Hadoop codecs to write sequence files.

Why are the changes needed?

To improve test coverage.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added tests.
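
For readers who want to see the shape of such a check, here is a minimal sketch of a sequence file round trip with a Hadoop codec. It is not the code added to FileSuite: the object name `SequenceFileCodecRoundTrip`, the `roundTrip` helper, the sample data, and the temp directory layout are all assumptions made for illustration. It only lists codecs that should not need the Hadoop native library.

```scala
import java.nio.file.Files

import org.apache.hadoop.io.compress.{BZip2Codec, CompressionCodec, DefaultCodec}
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical standalone driver, not the test code from this PR.
object SequenceFileCodecRoundTrip {

  // Writes `data` as a sequence file compressed with `codecClass`, then reads it back.
  def roundTrip(
      sc: SparkContext,
      codecClass: Class[_ <: CompressionCodec],
      codecName: String,
      data: Seq[(Int, String)]): Seq[(Int, String)] = {
    // The output directory must not exist yet, so resolve a fresh child path.
    val outputDir = Files.createTempDirectory(s"seq-$codecName-").resolve("out").toString
    sc.parallelize(data, 2).saveAsSequenceFile(outputDir, Some(codecClass))
    sc.sequenceFile[Int, String](outputDir).collect().sorted.toSeq
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("seq-codec-round-trip"))
    try {
      val data = (1 to 100).map(i => (i, s"value-$i"))
      // Assumption: these two codecs work without the Hadoop native library.
      val codecs = Seq[(Class[_ <: CompressionCodec], String)](
        (classOf[DefaultCodec], "default"),
        (classOf[BZip2Codec], "bzip2"))
      codecs.foreach { case (codecClass, codecName) =>
        val readBack = roundTrip(sc, codecClass, codecName, data)
        assert(readBack == data, s"round trip failed for codec $codecName")
      }
    } finally {
      sc.stop()
    }
  }
}
```

Passing the codec class through `Some(...)` mirrors the optional `codec` parameter of `saveAsSequenceFile`; reading back needs no codec argument because the sequence file header records it.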

viirya · 2021-09-07T07:47:11Z

core/src/test/scala/org/apache/spark/FileSuite.scala

+  Seq((new DefaultCodec(), "default"), (new BZip2Codec(), "bzip2"), (new GzipCodec(), "gzip"),
+    (new Lz4Codec(), "lz4"), (new SnappyCodec, "snappy")).foreach { case (codec, codecName) =>


The lz4 codec currently fails due to SPARK-36669.
The snappy codec currently fails due to SPARK-36681.

Thank you for filing JIRAs.

SparkQA · 2021-09-07T08:31:42Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47543/

SparkQA · 2021-09-07T09:29:03Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47543/

SparkQA · 2021-09-07T09:41:49Z

Test build #143040 has finished for PR 33924 at commit a0b115f.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2021-09-07T17:25:20Z

We have dedicated JIRAs for the snappy and lz4 issues. The CI run above already confirmed both failures, so I am going to remove these two codecs for now to make the tests pass.

SparkQA · 2021-09-07T18:13:12Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47557/

viirya · 2021-09-07T18:16:01Z

cc @sunchao @cloud-fan @dbtsai too

dbtsai

LGTM. Thanks!

SparkQA · 2021-09-07T18:22:10Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47557/

sunchao

LGTM

SparkQA · 2021-09-07T19:56:54Z

Test build #143054 has finished for PR 33924 at commit a9139b3.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dbtsai · 2021-09-07T20:03:34Z

core/src/test/scala/org/apache/spark/FileSuite.scala

+    }
+  }
+
+  // Hadoop "gzip" codec doesn't support sequence file yet.


Hadoop "gzip" codec requires the native library to be installed for gzip-compressed sequence files.

dbtsai · 2021-09-07T20:05:12Z

core/src/test/scala/org/apache/spark/FileSuite.scala

+  }
+
+  // Hadoop "gzip" codec doesn't support sequence file yet.
+  // Hadoop "zstd" codec needs native library installed.


zstd is in the same situation as gzip until your gzip codec is released in Hadoop. Maybe just say:

Hadoop "gzip" and "zstd" codecs require native library installed for sequence files

Thanks. Revised as suggested.
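
As a rough illustration of the native library requirement discussed in this thread, the sketch below asks Hadoop whether its native code, native zlib, and native zstd are loaded in the current JVM. The object `NativeCodecSupport` and its method names are hypothetical and not part of this PR. Whether these exact checks match what the sequence file writer verifies internally is an assumption, so treat the result as a hint only.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.compress.ZStandardCodec
import org.apache.hadoop.io.compress.zlib.ZlibFactory
import org.apache.hadoop.util.NativeCodeLoader

// Hypothetical helper, not part of this PR.
object NativeCodecSupport {

  // True when libhadoop was found and loaded by this JVM.
  def hadoopNativeLoaded: Boolean = NativeCodeLoader.isNativeCodeLoaded()

  // True when the native zlib bindings used by the gzip codec are usable.
  def gzipNativeAvailable(conf: Configuration): Boolean =
    hadoopNativeLoaded && ZlibFactory.isNativeZlibLoaded(conf)

  // True when the native zstd bindings are usable.
  def zstdNativeAvailable: Boolean =
    hadoopNativeLoaded && ZStandardCodec.isNativeCodeLoaded()

  // One line per check, suitable for logging before running codec tests.
  def report(conf: Configuration): Seq[String] = Seq(
    s"hadoop native loaded: $hadoopNativeLoaded",
    s"gzip native available: ${gzipNativeAvailable(conf)}",
    s"zstd native available: $zstdNativeAvailable")

  def main(args: Array[String]): Unit =
    report(new Configuration()).foreach(println)
}
```

A test could use such a check to skip the "gzip" and "zstd" cases on machines without the native library instead of excluding them unconditionally.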

viirya · 2021-09-07T20:19:01Z

The last commit only revises a code comment. Thanks for reviewing. Merging to master/3.2.

… Hadoop codecs

### What changes were proposed in this pull request?

This patch proposes to add e2e tests for using Hadoop codecs to write sequence files.

### Why are the changes needed?

To improve test coverage.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added tests.

Closes #33924 from viirya/hadoop-seq-test.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>
(cherry picked from commit 6745d77)
Signed-off-by: Liang-Chi Hsieh <[email protected]>

dongjoon-hyun · 2021-09-07T20:24:22Z

+1, late LGTM.

HyukjinKwon · 2021-09-08T01:19:14Z

LGTM2

cloud-fan · 2021-09-08T02:49:54Z

late LGTM

Add Hadoop sequence test for different codecs.

a0b115f

github-actions bot added the CORE label Sep 7, 2021

viirya mentioned this pull request Sep 7, 2021

[SPARK-36670][SQL][TEST] Add FileSourceCodecSuite #33912

Closed

Remove "snappy" and "lz4" which cannot work for now.

a9139b3

dbtsai requested review from dbtsai and dongjoon-hyun September 7, 2021 18:19

dbtsai approved these changes Sep 7, 2021

sunchao approved these changes Sep 7, 2021

dbtsai reviewed Sep 7, 2021

Revise comment.

4aee74c

viirya closed this in 6745d77 Sep 7, 2021

viirya deleted the hadoop-seq-test branch September 7, 2021 20:20


7 participants