HDFS-3246: pRead equivalent for direct read path #597
Conversation
Rebased patch, fixed remaining cc issues. Not sure why all those tests failed, they all pass locally. Let's see if Hadoop QA hits them again.
Here we catch the ClassCastException to check whether the input stream implements the ByteBufferPositionedReadable interface? I don't think that's a good approach, because throwing the exception requires filling in a full stack trace, which is not efficient. How about using the following instead:

if (in instanceof ByteBufferPositionedReadable) {
  // ...
} else {
  throw new UnsupportedOperationException(...);
}
Yeah, it's probably not the most efficient way to do things, but that's how all other methods handle the same thing: PositionedReadable, Seekable, etc. Plus, the exception-handling code wouldn't be on the hot path. So in this case I would prefer to keep the code consistent with the rest of the class.
hrm. I think we should probably do a follow-up JIRA to fix this, not for performance reasons, but because the try{...} block encompasses a lot of code. Let's say we accidentally screw up something in our encryption config and we get a ClassCastException somewhere inside decrypt. We'll swallow the real exception and claim that positioned read isn't supported, which isn't quite right.
So, I agree an instanceof check up front is probably the clearest from a code perspective and also avoids the above issue.
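For illustration, a minimal sketch of the instanceof-first approach (the wrapper class, its name, and its fields are hypothetical, not the actual CryptoInputStream code):

import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

import org.apache.hadoop.fs.ByteBufferPositionedReadable;

// Hypothetical wrapper showing the up-front check.
class PositionedReadSketch {
  private final InputStream in;

  PositionedReadSketch(InputStream in) {
    this.in = in;
  }

  public int read(long position, ByteBuffer buf) throws IOException {
    if (!(in instanceof ByteBufferPositionedReadable)) {
      throw new UnsupportedOperationException(
          "Positioned byte-buffer reads require the underlying stream "
              + "to implement ByteBufferPositionedReadable");
    }
    // The cast is now guaranteed to succeed, so a genuine
    // ClassCastException from deeper in the call chain can't be
    // mistaken for "not supported".
    return ((ByteBufferPositionedReadable) in).read(position, buf);
  }
}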
add an instanceof check; all the FSDataInputStream classes do that
OK, makes sense, will do. Any objections if I just fix this here rather than filing a separate JIRA? The changes are pretty small / it would probably be less effort than creating, reviewing, and merging a separate patch just for this.
Fixed
Shouldn't the limit be start + len + Math.min(n - len, localInBuffer.remaining())?
Well, the invariant you want to preserve is that buf.put(inputBuf) should only be called with the original value of buf.limit(), so that you don't exceed the given limit. Using start + len + Math.min(n - len, localInBuffer.remaining()) as the limit could violate this if n + start > buf.limit().
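For illustration, a minimal sketch of that invariant, with hypothetical names (copyChunks, chunk); the caller's original limit is saved once and never exceeded:

import java.nio.ByteBuffer;

// Sketch: per-chunk copies must never push data past the caller's
// original limit.
class LimitInvariantSketch {
  static void copyChunks(ByteBuffer buf, ByteBuffer chunk) {
    final int origLimit = buf.limit(); // never exceeded below
    while (buf.hasRemaining() && chunk.hasRemaining()) {
      int n = Math.min(buf.remaining(), chunk.remaining());
      // Temporarily cap the source's limit so put() transfers exactly
      // n bytes and cannot overflow buf past origLimit.
      int savedLimit = chunk.limit();
      chunk.limit(chunk.position() + n);
      buf.put(chunk);
      chunk.limit(savedLimit);
    }
    assert buf.limit() == origLimit;
  }
}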
If an exception is encountered, should we reset to the original pos & limit? Changing the pos or limit is not friendly to the upper layer...
I agree, but I copied this from ByteBufferReadable, so I think we should leave it for now, and if we want to lift this limitation, then we can do so for both ByteBufferReadable and ByteBufferPositionedReadable in another JIRA.
Worth noting that the way it's implemented, it seems like on exception, the contents of the buffer are also undefined, right? I.e. we could have partially overwritten the buffer and then thrown?
+1 for saying "buffer state undefined"
Done
Should we provide a StreamCapabilities method for upstream callers to decide whether to use the ByteBuffer pread, as we discussed in the JIRA?
Yes, the code is there, updated the javadoc to reflect this.
If n is 0, will we get stuck in an infinite loop? Try using if (n > 0).
IIUC the HDFS read APIs make the same guarantees as InputStream#read(byte[]), which returns -1 if there is no more data to read.
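Under that assumption, a minimal sketch of why such a loop terminates (readFully is a hypothetical helper, not the patch's code):

import java.io.IOException;
import java.io.InputStream;

// Sketch: the loop exits on -1 (EOF) rather than spinning forever.
class ReadFullySketch {
  static int readFully(InputStream in, byte[] b) throws IOException {
    int total = 0;
    while (total < b.length) {
      int n = in.read(b, total, b.length - total);
      if (n < 0) {
        break; // EOF: stop instead of looping
      }
      total += n;
    }
    return total;
  }
}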
!! ?
It was copied from hdfsFileUsesDirectRead, but I agree it's hard to understand, so I cleaned it up.
nit: I think it would be good to rename this pos to 'bufPos' so it's clearer that it's referring to the position in the buffer and not the current position in the file
Done
it seems like preadByteBuffer requires the underlying stream to have this capability, so this should probably delegate to ((StreamCapabilities) in).hasCapability(PREADBYTEBUFFER), right?
(Interestingly, the same goes for a few of the other capabilities, like dropbehind, I think. Curious what @steveloughran has to say.)
if you forward to another stream, yes, wrap it. There's ongoing work in #575 to do more of this, with a matching PathCapabilities call for filesystems so you can see whether a file you open under a path will have the feature before you open it (#568). These are both background bits of work, so they have languished, neglected, but I'd love to see them in. StreamCapabilities is in Hadoop 2.9+ BTW, so while the set of features you can look for evolves, at least the client code can compile without problems (and if you copy the new capabilities, link)
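A minimal sketch of the wrap-and-forward pattern described here (DelegatingCapabilities is a hypothetical name, and the wrapped stream type is simplified):

import java.io.InputStream;

import org.apache.hadoop.fs.StreamCapabilities;

// Sketch: only advertise what the underlying stream can actually serve;
// report false rather than failing later at read time.
class DelegatingCapabilities implements StreamCapabilities {
  private final InputStream in;

  DelegatingCapabilities(InputStream in) {
    this.in = in;
  }

  @Override
  public boolean hasCapability(String capability) {
    return in instanceof StreamCapabilities
        && ((StreamCapabilities) in).hasCapability(capability);
  }
}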
Fixed
probably should specifically say "with byte buffers" or something
Done
This doc is confusing me a bit (haven't looked at the impl yet). It seems this both reads and writes to the same buf, but the read is happening from the 'start' whereas the write is happening at 'buf.position()'? That seems somewhat unexpected and opens up some questions about whether the output range and input range can overlap
Separate nit: "should be unchanged" -> "will be unchanged" or "are not changed". "Should be" sounds awfully wishy-washy for a postcondition.
I think I copied this from the other #decrypt methods. Regardless, I rewrote this part so hopefully it is easier to understand.
Can you add a TODO here that we can likely avoid one of these copies, at least when the byte buffer passed by the user is a direct buffer? It looks like the patch is currently doing:

pread -> user buffer
for each chunk:
    copy from user buffer to tmp input
    decrypt tmp input to tmp output
    copy from tmp output to user buffer

but we could likely decrypt back to the user buffer directly, or decrypt from the user buffer to a tmp, and then write back. (This all assumes that the crypto codecs don't support in-place decryption, which they might.)
Looking briefly through the openssl code, it seems like it actually supports in == out encryption, so maybe we could avoid both buffer copies for full blocks, and maybe avoid one buffer copy for the non-ByteBuffer case as well by making inBuffer == outBuffer.
Again, probably just something to file for later.
Added a TODO
instead of saving pos/limit here and restoring them later, would it be easier to duplicate() the bytebuffer? Then you could easily just set the limit to match 'n' and not worry about it. The loop bounds might become a bit easier, too (while buf.remaining() > 0, etc.) since you no longer need to consider the passed-in length.
I played around with this a bit. Calling #duplicate avoids having to save and then reset the original limit and position, but I think you still have to set the limit when processing each chunk; otherwise the call to localInBuffer.put(buf) will throw a BufferOverflowException. The Javadocs of #put say:

If there are more bytes remaining in the source buffer than in this buffer, that is, if src.remaining() > remaining(), then no bytes are transferred and a BufferOverflowException is thrown.
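A minimal sketch of the duplicate() variant with the per-chunk cap still in place (fillChunks and its names are hypothetical, and it assumes n <= buf.remaining()):

import java.nio.ByteBuffer;

// Sketch: the duplicate shares content with 'buf' but has independent
// position/limit, so the caller's buffer is untouched. A per-chunk cap
// is still required so that put() cannot throw BufferOverflowException.
class DuplicateSketch {
  static void fillChunks(ByteBuffer buf, ByteBuffer localInBuffer, int n) {
    ByteBuffer src = buf.duplicate();  // independent pos/limit
    src.limit(src.position() + n);     // cap at the requested length
    while (src.hasRemaining()) {
      ByteBuffer slice = src.duplicate();
      // Cap the chunk: put() throws if slice.remaining() exceeds
      // localInBuffer.remaining().
      slice.limit(slice.position()
          + Math.min(slice.remaining(), localInBuffer.remaining()));
      localInBuffer.put(slice);
      src.position(slice.position());
      // ... decrypt localInBuffer here, then reuse it ...
      localInBuffer.clear();
    }
  }
}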
DirectByteBuffer avoids an extra copy into the Java heap vs. the C heap. It's still copying data out of the kernel to user space either way.
Fixed
Per the above, the kernel->user transition is the same here; it's just avoiding some JVM heap copies.
Fixed
Adding one extra interface shouldn't force reformatting this entire set of lines; that only makes backporting worse at a common conflict point. Can you just add the new interface at the end and leave the rest alone? Thanks.
Done
Addressed comments, rebased patch.
I'd rather create a new JIRA for this (now) than leave a TODO in the code. You know that TODO is going to get forgotten about otherwise....
Removed the TODO and filed HDFS-14417
Rebased and fixed a few checkstyle issues.
nit: do you mind negating this check so that you can un-indent the rest of the method? ("early return" style)
Done
same here (and a couple similar below)
Done
"should be" -> "will be"
Fixed
should be 1<<1, right?
Yeah, thanks for catching that. I probably should not have just blindly copied the line above. I added a regression test for this.
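For illustration, a sketch of the copy-paste hazard in Java (the constant names are illustrative, not the actual libhdfs flags, which live in C):

// Sketch: distinct bit flags need distinct shifts; a blind copy of the
// line above would silently alias the two flags.
final class FlagSketch {
  static final int SUPPORTS_DIRECT_READ  = 1 << 0;
  static final int SUPPORTS_DIRECT_PREAD = 1 << 1; // was 1 << 0 when copied
}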
seems a little weird to return 0 but also set errno here -- I see that you're doing it because you want to fall back to false on error, but then maybe errno should not be set
Yeah. I think the reason for setting errno is that all the other methods set it to indicate that an exception occurred while the method was running. And if an error occurs and errno needs to be set, the method still has to return some value, right? Returning false seemed more reasonable than returning true.
can you add a brief doc, even though this is just an internal method?
Done
seems odd to printf to stderr from a library. Is this our general pattern?
Yeah, literally everything prints to stderr
it seems the javadoc says that the method can return 0 for a transient zero-length result, and -1 for EOF. But here we're converting -1 to 0, so the caller can't distinguish between a must-retry 'zero-length read' and a 'shouldn't retry' EOF, right?
Yeah, this looks like a latent bug. I double-checked how hdfsRead works without the ByteBuffer optimization and it seems to be doing the right thing. I fixed this method and the pread equivalent to mimic what is done when the ByteBuffer optimization is disabled.
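A sketch of the caller-side consequence, under the javadoc's contract that 0 means a transient empty read and -1 means EOF (readWithRetry is hypothetical):

import java.io.IOException;
import java.io.InputStream;

// Sketch: if a wrapper coerced -1 (EOF) into 0 (transient empty read),
// this retry loop would never terminate at end of file; preserving -1
// lets it exit.
class EofRetrySketch {
  static int readWithRetry(InputStream in, byte[] b) throws IOException {
    int n;
    do {
      n = in.read(b, 0, b.length); // 0 = transient, -1 = EOF
    } while (n == 0 && b.length > 0);
    return n;
  }
}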
Force-pushed from 9c5acf7 to f5c692d.
@toddlipcon addressed your comments.
* Removed the "try { ... } catch (ClassCastException e)" pattern from
CryptoInputStream
* Fixed a bug in CryptoInputStream#hasCapability where the method was
not correctly delegating to the underlying stream
* Further refactoring of CryptoInputStream#decrypt methods to make the
code easier to understand
* Revised Javadoc for CryptoInputStream#decrypt
* Revised Javadoc for ByteBufferPositionedReadable and
ByteBufferReadable
* Revised Javadoc for readDirect, preadDirect, hdfsRead, and hdfsPread
Hadoop QA is still hitting these memory issues, so I ran the full suite of tests locally and looped the failing ones. I took a quick look at the heap dump as well and didn't see anything related to this patch. Interestingly, most of the memory is consumed by Mockito.
Thanks. +1, merged to trunk.
HDFS-3246: First several iterations of this patch are posted on the JIRA. This PR is a continuation of that work; it was created to make the code more reviewable. The version of the patch posted here fixes a few minor issues reported by Yetus and adds some more Javadocs to the changes in CryptoInputStream to make the code easier to understand.