HADOOP-17292. Using lz4-java in Lz4Codec #2350
Conversation
-      // Compress data
-      n = useLz4HC ? compressBytesDirectHC() : compressBytesDirect();
+      n = compressBytesDirect();
compressDirectBuf?
fixed.
-      if (Lz4Codec.isNativeCodeLoaded()) {
-        conf.setBoolean(
+      conf.setBoolean(
           CommonConfigurationKeys.IO_COMPRESSION_CODEC_LZ4_USELZ4HC_KEY,
fix the indentation as you remove the { }
fixed.
-    if (compressor.getClass().isAssignableFrom(Lz4Compressor.class)
-        && (NativeCodeLoader.isNativeCodeLoaded()))
+    if (compressor.getClass().isAssignableFrom(Lz4Compressor.class))
Can we add a compatibility test like snappy?
Sure. Will add it later.
Added compatibility test.
Force-pushed from 10b027d to 4aefff4
cc @steveloughran this is almost identical to #2297. Thanks!

All tests pass now. https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2350/7/testReport/

@viirya can you rebase master?

Rebased. Thanks.

Gentle ping @steveloughran. This is almost identical to the SnappyCodec one you merged. Could you help review it? Thanks.

💔 -1 overall
This message was automatically generated.
seems more familiar now
- will need extra docs (where?) to cover the change
- this may be the time for that hadoop-compression library which, even if the codec stays in hadoop-common, can declare the new JAR (and snappy-java) as an explicit dependency. That way, things including the mapreduce client can just ask for hadoop-compression and get the latest set of JARs as well as codecs and, critically, get JAR versions consistent with what Hadoop was built with.
put into the same import block as org.slf4j
They are already in the same block, no?
import net.jpountz.lz4.LZ4Factory;
import net.jpountz.lz4.LZ4Compressor;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.Compressor;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
final?
thanks. added final.
again, not in the same import block as org.apache
Already in the same block?
import net.jpountz.lz4.LZ4Factory;
import net.jpountz.lz4.LZ4SafeDecompressor;
import org.apache.hadoop.io.compress.Decompressor;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@viirya I think @steveloughran means the order of imports and the blank line between import blocks.
https://github.com/steveloughran/formality/blob/master/styleguide/styleguide.md#imports
While the original file violates the coding standard, and we usually avoid a large diff just to fix formatting issues, it would be ok to fix a few lines here.
Yeah, if things are mixed up, best to leave alone, at least for those files which get lots of changes. For something which rarely sees maintenance, you can make a stronger case for cleanup. I do it sometimes, but as I also get to field cherry-pick merge pain, I don't go wild on it.
Shame the constructor can't throw exceptions directly.
Catch only the exceptions known to be raised, and don't convert RTEs or, especially, Errors.
Hmm, I tried to explicitly catch ClassNotFoundException, but the java compiler complains...
exception java.lang.ClassNotFoundException is never thrown in body of corresponding try statement
LZ4Factory seems to throw java.lang.AssertionError after catching an unrecoverable exception.
ok, so what's best here?
- Catch AssertionError and wrap
- Catch Throwable, but with something ahead of it which will catch and rethrow Error without wrapping, because we shouldn't really be wrapping those high-priority problems that aren't generally things to ignore
Let me just catch AssertionError and wrap it.
add + t to the end of the string, so the specific error text isn't lost
thanks. added it.
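The catch-and-wrap approach settled on above can be sketched with plain JDK types. This is a minimal illustration, not the actual Hadoop code: loadOrFail and the Supplier are hypothetical stand-ins for the real LZ4Factory call, which is known to throw AssertionError when its unsafe/native path fails.

```java
import java.util.function.Supplier;

public class GuardedLoad {
    // Hypothetical stand-in for something like LZ4Factory.fastestInstance().
    // Catch only the error the factory is known to raise (AssertionError)
    // rather than all of Throwable, and wrap it.
    static <T> T loadOrFail(Supplier<T> factory) {
        try {
            return factory.get();
        } catch (AssertionError t) {
            // Append the caught error's text ("+ t") so the specific
            // failure message is not lost when wrapping.
            throw new RuntimeException("lz4-java library is not available: " + t, t);
        }
    }

    public static void main(String[] args) {
        try {
            loadOrFail(() -> { throw new AssertionError("native load failed"); });
        } catch (RuntimeException e) {
            // prints: lz4-java library is not available: java.lang.AssertionError: native load failed
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the wrapper is that the original AssertionError text survives both in the message and as the exception's cause.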
The library also allows configuring the compression level; perhaps we can add a Hadoop option for that later. This just uses the default compression level.
Yeah, sounds good. We can add it later if it is necessary.
nit: seems we no longer need the field useLz4HC with this.
Oh, yes.
seems some of the methods in this class look exactly the same as in SnappyCompressor; perhaps we can do some refactoring later.
Ok, currently this PR focuses on switching to lz4-java. We can refactor the compressor/decompressor classes later.
I think this isn't necessary since it's called right before the call site?
We can remove the clear and limit calls at the call site:

// Re-initialize the lz4's output direct-buffer
compressedDirectBuf.clear();
compressedDirectBuf.limit(0);

But we cannot remove clear in the compressDirectBuf method.
First, lz4-java cannot accept compressedDirectBuf if it is passed with limit(0):
[ERROR] Failures:
[ERROR] TestCompressorDecompressor.testCompressorDecompressor:69 Expected to find 'testCompressorDecompressor error !!!' but got unexpected exception: net.jpountz.lz4.LZ4Exception: maxDestLen is too small
at net.jpountz.lz4.LZ4JNICompressor.compress(LZ4JNICompressor.java:69)
at net.jpountz.lz4.LZ4Compressor.compress(LZ4Compressor.java:158)
at org.apache.hadoop.io.compress.lz4.Lz4Compressor.compressDirectBuf(Lz4Compressor.java:310)
at org.apache.hadoop.io.compress.lz4.Lz4Compressor.compress(Lz4Compressor.java:237)
at org.apache.hadoop.io.compress.CompressDecompressTester$CompressionTestStrategy$2.assertCompression(CompressDecompressTester.java:286)
at org.apache.hadoop.io.compress.CompressDecompressTester.test(CompressDecompressTester.java:114)
at org.apache.hadoop.io.compress.TestCompressorDecompressor.testCompressorDecompressor(TestCompressorDecompressor.java:66)
Second, even if we remove limit at the call site, one test still fails:
[ERROR] testCompressorDecompressor(org.apache.hadoop.io.compress.TestCompressorDecompressor) Time elapsed: 0.539 s <<< FAILURE!
java.lang.AssertionError: org.apache.hadoop.io.compress.lz4.Lz4Compressor_org.apache.hadoop.io.compress.lz4.Lz4Decompressor- empty stream compressed output size != 4 expected:<4> but was:<65796>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:645)
at org.apache.hadoop.io.compress.CompressDecompressTester$CompressionTestStrategy$3.assertCompression(CompressDecompressTester.java:334)
at org.apache.hadoop.io.compress.CompressDecompressTester.test(CompressDecompressTester.java:114)
at org.apache.hadoop.io.compress.TestCompressorDecompressor.testCompressorDecompressor(TestCompressorDecompressor.java:66)
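The first failure is really just ByteBuffer semantics, which a stdlib-only sketch (no lz4-java needed) makes visible: after limit(0) the buffer reports zero remaining space, which is the destination length lz4-java rejects as "maxDestLen is too small", while clear() restores the full capacity. The buffer size here is illustrative.

```java
import java.nio.ByteBuffer;

public class BufferLimitDemo {
    public static void main(String[] args) {
        ByteBuffer compressedDirectBuf = ByteBuffer.allocateDirect(64 * 1024);

        // State left by the old call-site code: clear() then limit(0).
        compressedDirectBuf.clear();
        compressedDirectBuf.limit(0);
        // remaining() is 0, so a compressor writing into this buffer
        // sees no destination space at all.
        System.out.println(compressedDirectBuf.remaining());  // 0

        // clear() inside compressDirectBuf resets limit back to capacity,
        // giving the compressor the whole buffer to write into.
        compressedDirectBuf.clear();
        System.out.println(compressedDirectBuf.remaining());  // 65536
    }
}
```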
Okay makes sense.
I don't think this will ever happen but it's not a big deal.
nit: this comment doesn't add much value; it just states exactly what is being done in the code.
ok, removed.
you can just call clear?
sure.
why is this needed?
One test in hadoop-mapreduce-client-nativetask uses Lz4Codec. Because lz4-java is provided scope in hadoop-common now, if we don't add it as a test dependency we will get a class-not-found exception.
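A sketch of what such a test-scope declaration could look like in the downstream module's pom.xml. The coordinates shown are assumptions for illustration; the version is normally managed by the Hadoop parent pom.

```xml
<!-- lz4-java is provided-scope in hadoop-common, so a module whose
     tests exercise Lz4Codec must redeclare it on its test classpath. -->
<dependency>
  <groupId>org.lz4</groupId>
  <artifactId>lz4-java</artifactId>
  <scope>test</scope>
</dependency>
```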
nit: perhaps add some comments on this; not quite sure what it is testing.
ok, added some comment.
@steveloughran Thanks for reviewing this change.
Yea, we will make
Not sure where is best; we will look for a proper place to put the extra docs.

@sunchao Thanks for the review. I addressed your comments; please let me know if you have more.
Thanks @viirya. This PR looks good to me. @steveloughran, would appreciate it if you can take another pass. Thanks.
@viirya can you resolve the conflict? Thanks.

@dbtsai Resolved. Thanks.

kindly ping @steveloughran

💔 -1 overall
This message was automatically generated.

💔 -1 overall
This message was automatically generated.

Another kind ping @steveloughran. I think @viirya has resolved all your comments, so please take a look. Otherwise I'll merge this next week. Thanks.
I hit a portability issue with the bundled snappy .so of snappy-java in xerial/snappy-java#256. It could be a downside that we cannot directly control the portability. I could not find how the bundled lz4 is built yet; is it portable enough given it has no dependency on other libraries?

Thanks for asking. The library underlying lz4-java is lz4 (https://lz4.github.io/lz4/), which I think is the same as Hadoop's current lz4 library. So it should be as portable as before? BTW, for snappy-java and lz4-java, if no native library for your platform is found, they fall back to the pure-Java implementation.
Yeah. In the snappy-java case, the native library was provided but the version of its dependency (libstdc++) did not match, so falling back to the pure-Java impl did not kick in. I think lz4-java has no such dependency.

@dbtsai Let me post a patch on HADOOP-17369.

Thanks @iwasakims
I'm +1 overall. Will wait some time for comments from @steveloughran.
Gentle ping @steveloughran and @sunchao. @viirya can you update the pom again due to the merge conflict? Thanks!

LGTM. The only issue is whether the catch should be for all of Throwable or a subset; I think I'll be happy to go with it as is. Two checkstyles to deal with; the deprecation warning is nothing to worry about.

Thanks @steveloughran!
Thanks @steveloughran for the LGTM. The last run looks good and I just merged this into trunk. Thanks @viirya for the contribution and everyone for reviewing!

Thanks all!

Ok. Let's leave this to simmer in trunk over the weekend and we can worry about backporting to branch-3.3.
Contributed by Liang-Chi Hsieh.
See https://issues.apache.org/jira/browse/HADOOP-17292 for details.
The changes in mvn dependencies:
For Apache Hadoop Common:
For Apache Hadoop Kafka Library support:
For Apache Hadoop Tools Dist: