[SPARK-36682][CORE][TEST] Add Hadoop sequence file test for different Hadoop codecs #33924
```scala
Seq((new DefaultCodec(), "default"), (new BZip2Codec(), "bzip2"), (new GzipCodec(), "gzip"),
  (new Lz4Codec(), "lz4"), (new SnappyCodec, "snappy")).foreach { case (codec, codecName) =>
```
The lz4 codec currently fails due to SPARK-36669.
The snappy codec currently fails due to SPARK-36681.
Thank you for filing JIRAs.
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #143040 has finished for PR 33924 at commit
We have dedicated JIRAs for the snappy and lz4 issues. The CI run above has already confirmed those failures, so I'm removing the two codecs from the test for now to make it pass.
Kubernetes integration test starting
cc @sunchao @cloud-fan @dbtsai too
LGTM. Thanks!
Kubernetes integration test status failure
LGTM
Test build #143054 has finished for PR 33924 at commit
```scala
}
}

// Hadoop "gzip" codec doesn't support sequence file yet.
```
The Hadoop "gzip" codec requires the native library to be installed to write gzip-compressed sequence files.
```scala
}

// Hadoop "gzip" codec doesn't support sequence file yet.
// Hadoop "zstd" codec needs native library installed.
```
zstd is in the same situation as gzip until your gzip codec is released in Hadoop. Maybe just say:
Hadoop "gzip" and "zstd" codecs require the native library to be installed for sequence files.
Thanks. Revised as suggested.
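Instead of only documenting the gzip/zstd limitation in a comment, a test could detect it at runtime and include those codecs only where the native library is present. The sketch below is one way to do that, assuming Hadoop 3.x `hadoop-common` on the classpath; `NativeCodecGuard`, `gzipSeqFileSupported`, `zstdSupported`, and `codecNamesToTest` are hypothetical names, not part of Spark or of this PR. It relies only on Hadoop's `NativeCodeLoader.isNativeCodeLoaded`, `ZlibFactory.isNativeZlibLoaded`, and `ZStandardCodec.isNativeCodeLoaded`.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.compress.ZStandardCodec
import org.apache.hadoop.io.compress.zlib.ZlibFactory
import org.apache.hadoop.util.NativeCodeLoader

// Hypothetical helper: decides which codec names a sequence-file test should cover.
object NativeCodecGuard {
  // gzip-compressed sequence files need native zlib from libhadoop
  // (at least on Hadoop releases without a pure-Java gzip compressor).
  def gzipSeqFileSupported(conf: Configuration): Boolean =
    NativeCodeLoader.isNativeCodeLoaded() && ZlibFactory.isNativeZlibLoaded(conf)

  // ZStandardCodec has no pure-Java fallback, so libhadoop must be built with zstd.
  def zstdSupported: Boolean = ZStandardCodec.isNativeCodeLoaded()

  // Codecs that never need native code, plus the native-only ones usable in this JVM.
  // lz4 and snappy stay out because of SPARK-36669 and SPARK-36681.
  def codecNamesToTest(conf: Configuration): Seq[String] = {
    val alwaysAvailable = Seq("default", "bzip2")
    val gzip = if (gzipSeqFileSupported(conf)) Seq("gzip") else Seq.empty[String]
    val zstd = if (zstdSupported) Seq("zstd") else Seq.empty[String]
    alwaysAvailable ++ gzip ++ zstd
  }
}
```

Gating on native support keeps the test green on machines without `libhadoop` while still exercising gzip and zstd where the native library is installed.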
The last commit only changes a code comment. Thanks for reviewing! Merging to master/3.2.
… Hadoop codecs

### What changes were proposed in this pull request?
This patch proposes to add e2e tests for using Hadoop codecs to write sequence files.

### Why are the changes needed?
To improve test coverage.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added tests.

Closes #33924 from viirya/hadoop-seq-test.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>
(cherry picked from commit 6745d77)
Signed-off-by: Liang-Chi Hsieh <[email protected]>
+1, late LGTM.
LGTM2
late LGTM
What changes were proposed in this pull request?
This patch proposes to add e2e tests for using Hadoop codecs to write sequence files.
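As a rough illustration of what such an end-to-end check involves, the sketch below writes an RDD as a compressed sequence file with `saveAsSequenceFile` and reads it back with `sc.sequenceFile`. It assumes Spark 3.x with Hadoop on the classpath and sticks to `DefaultCodec` and `BZip2Codec`, which work without native libraries. `SeqFileCodecRoundTrip` and `roundTrip` are hypothetical names; this is not the test added by this PR.

```scala
import java.io.File
import java.nio.file.Files

import org.apache.hadoop.io.compress.{BZip2Codec, CompressionCodec, DefaultCodec}
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sketch: round-trips data through a codec-compressed sequence file.
object SeqFileCodecRoundTrip {
  // Writes `data` as a sequence file compressed with `codecClass`, then reads it back.
  def roundTrip(
      sc: SparkContext,
      data: Seq[(Int, String)],
      codecClass: Class[_ <: CompressionCodec]): Seq[(Int, String)] = {
    // saveAsSequenceFile fails if the output directory already exists,
    // so write into a not-yet-existing child of a fresh temp directory.
    val path = new File(Files.createTempDirectory("seq-codec-sketch").toFile, "output")
      .getAbsolutePath
    // Int keys and String values are converted to IntWritable and Text implicitly.
    sc.parallelize(data, 2).saveAsSequenceFile(path, Some(codecClass))
    // Reading back converts the Writables to Int and String again.
    sc.sequenceFile[Int, String](path).collect().toSeq.sortBy(_._1)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("seq-file-codec-sketch"))
    try {
      val data = (1 to 100).map(i => (i, s"value-$i"))
      // Both codecs fall back to pure-Java implementations when libhadoop is absent.
      val codecs = Seq[(Class[_ <: CompressionCodec], String)](
        (classOf[DefaultCodec], "default"),
        (classOf[BZip2Codec], "bzip2"))
      codecs.foreach { case (codecClass, codecName) =>
        val readBack = roundTrip(sc, data, codecClass)
        assert(readBack == data, s"round trip failed for $codecName")
        println(s"$codecName: ${readBack.size} records round-tripped")
      }
    } finally {
      sc.stop()
    }
  }
}
```

Passing the codec as a `Class` (rather than an instance) matches the `Option[Class[_ <: CompressionCodec]]` parameter that `saveAsSequenceFile` accepts.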
Why are the changes needed?
To improve test coverage.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added tests.