Skip to content

Conversation

@viirya
Copy link
Member

@viirya viirya commented Sep 7, 2021

What changes were proposed in this pull request?

This patch proposes to add e2e tests for using Hadoop codecs to write sequence files.

Why are the changes needed?

To improve test coverage.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added tests.

Comment on lines 140 to 141
Seq((new DefaultCodec(), "default"), (new BZip2Codec(), "bzip2"), (new GzipCodec(), "gzip"),
(new Lz4Codec(), "lz4"), (new SnappyCodec, "snappy")).foreach { case (codec, codecName) =>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lz4 codecs currently fails due to SPARK-36669.
snappy codec currently fails due to SPARK-36681.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for filing JIRAs.

@SparkQA
Copy link

SparkQA commented Sep 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47543/

@SparkQA
Copy link

SparkQA commented Sep 7, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47543/

@SparkQA
Copy link

SparkQA commented Sep 7, 2021

Test build #143040 has finished for PR 33924 at commit a0b115f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Sep 7, 2021

We have dedicated JIRAs for snappy and lz4 issues. We already verified the issues in above CI test, going to remove them first to make test passed.

@SparkQA
Copy link

SparkQA commented Sep 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47557/

@viirya
Copy link
Member Author

viirya commented Sep 7, 2021

cc @sunchao @cloud-fan @dbtsai too

Copy link
Member

@dbtsai dbtsai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@SparkQA
Copy link

SparkQA commented Sep 7, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47557/

Copy link
Member

@sunchao sunchao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@SparkQA
Copy link

SparkQA commented Sep 7, 2021

Test build #143054 has finished for PR 33924 at commit a9139b3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}
}

// Hadoop "gzip" codec doesn't support sequence file yet.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hadoop "gzip" codec requires native library installed for gzip compressed seq file.

}

// Hadoop "gzip" codec doesn't support sequence file yet.
// Hadoop "zstd" codec needs native library installed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zstd is the same as gzip before your gzip codec is released in hadoop. Maybe just say

Hadoop "gzip" and "zstd" codecs requires native library installed for sequence files

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Revised as suggested.

@viirya
Copy link
Member Author

viirya commented Sep 7, 2021

The last commit is comment doc only. Thanks for reviewing. Merging to master/3.2.

@viirya viirya closed this in 6745d77 Sep 7, 2021
viirya added a commit that referenced this pull request Sep 7, 2021
… Hadoop codecs

### What changes were proposed in this pull request?

This patch proposes to add e2e tests for using Hadoop codecs to write sequence files.

### Why are the changes needed?

To improve test coverage.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added tests.

Closes #33924 from viirya/hadoop-seq-test.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>
(cherry picked from commit 6745d77)
Signed-off-by: Liang-Chi Hsieh <[email protected]>
@viirya viirya deleted the hadoop-seq-test branch September 7, 2021 20:20
@dongjoon-hyun
Copy link
Member

+1, late LGTM.

@HyukjinKwon
Copy link
Member

LGTM2

@cloud-fan
Copy link
Contributor

late LGTM

flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021
… Hadoop codecs

### What changes were proposed in this pull request?

This patch proposes to add e2e tests for using Hadoop codecs to write sequence files.

### Why are the changes needed?

To improve test coverage.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added tests.

Closes apache#33924 from viirya/hadoop-seq-test.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>
(cherry picked from commit 6745d77)
Signed-off-by: Liang-Chi Hsieh <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants