-
Notifications
You must be signed in to change notification settings - Fork 9.2k
HADOOP-14566. Add seek support for SFTP FileSystem. #1999
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
The PR is ready for review. |
|
🎊 +1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
...op-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPInputStream.java
Show resolved
Hide resolved
steveloughran
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You done some great work here!
A few minor details and it will be ready to go in.
...op-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPInputStream.java
Outdated
Show resolved
Hide resolved
...op-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPInputStream.java
Outdated
Show resolved
Hide resolved
| sshd.start(); | ||
| int port = sshd.getPort(); | ||
|
|
||
| conf.setClass("fs.sftp.impl", SFTPFileSystem.class, FileSystem.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be set in core-default.xml already?
if not, sftp:// urls would break. (yes, i know every stack overflow spark example does this, but that is just superstition)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not set in core-default.xml, and if not specified here sftp urls won't be resolved by sftp schema. Could you please clarify a bit what exactly you mean here? Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double checked, it's quite strange. Is it a candidate for an improvement issue?
...mon-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/sftp/SFTPContract.java
Outdated
Show resolved
Hide resolved
| } | ||
|
|
||
| @Override | ||
| public Path getTestPath() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shame we didn't declare this as raising an ioe; probably too late now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not too late I suppose, would you like me to try to declare it as throwing IOE? I can issue a new Jira and fix it there.
hadoop-common-project/hadoop-common/src/test/resources/contract/sftp.xml
Outdated
Show resolved
Hide resolved
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java
Outdated
Show resolved
Hide resolved
mpryahin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks a lot for your review
...op-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPInputStream.java
Outdated
Show resolved
Hide resolved
...op-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPInputStream.java
Outdated
Show resolved
Hide resolved
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java
Outdated
Show resolved
Hide resolved
...mon-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/sftp/SFTPContract.java
Outdated
Show resolved
Hide resolved
hadoop-common-project/hadoop-common/src/test/resources/contract/sftp.xml
Outdated
Show resolved
Hide resolved
| sshd.start(); | ||
| int port = sshd.getPort(); | ||
|
|
||
| conf.setClass("fs.sftp.impl", SFTPFileSystem.class, FileSystem.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not set in core-default.xml, and if not specified here sftp urls won't be resolved by sftp schema. Could you please clarify a bit what exactly you mean here? Thank you!
| } | ||
|
|
||
| @Override | ||
| public Path getTestPath() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not too late I suppose, would you like me to try to declare it as throwing IOE? I can issue a new Jira and fix it there.
|
🎊 +1 overall
This message was automatically generated. |
|
@steveloughran , thanks a lot for review. All the issues you pointed out have been fixed. Is there anything else to be done from my side? |
...on-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractFSContract.java
Outdated
Show resolved
Hide resolved
|
🎊 +1 overall
This message was automatically generated. |
Contributed by Mikhail Pryakhin
|
+1, merged to trunk and branch-3.3 thanks! |
This PR implements seek method for SFTP File System which supports both forward and backward lazy seeks.
Normally subsequent reads proceed reading from the position where the previous read finished, meaning we can avoid making seek operations in this case. We only need to seek when the current
FsDataInputstream#getPos()doesn't equal a requested position to read from.Whenever
seekis called it's deferred until the next read operation. If a consecutivereadhappens at the same position where the previousreadfinished then it merely proceeds reading. If a consecutivereadhappens at the position greater than the position where the previousreadfinished weseekforward to the requested position. And finally, if a consecutivereadhappens at the position smaller than the currentinputstreamposition then we close the currentinputstream, reopen a new one and seek to the requested position.