-
Couldn't load subscription status.
- Fork 9.1k
HADOOP-11452 make rename/3 public #2735
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-11452 make rename/3 public #2735
Conversation
full commit logs of squashed commitsPR with the previous patches. This is a WiP but ready for some iniital review; lacks tests & spec. Also, because the base FileSystem.rename/3 does its own src/dest checks, it's less efficient against object stores. They need their own high-perf subclass. Change-Id: I1586ef2290d7a3d2d33b1a32e2f0999b07c26143 HADOOP-11452 rename/3 has tests for local fs, raw local and hdfs.
tests are based on those of rename; there's still some stuff about renaming into an empty directory which makes me suspect there's ambiguity there Change-Id: Ic1ca6cbe3e9ca80ab8e1459167a7012678e856fc HADOOP-11452 rename(path, path, options) to become public
Issues: is the rename point executeInnerRename() a good name? S3AFS should really implement a direct rename/3 so there's no duplication of checks for parent etc, maybe even passing Change-Id: I0ac3695434d85072ab860854e5e88bc6d36e754a HADOOP-11452 trying to move rename/3 logic into its own class Change-Id: If2ab67152e08e4c2a225f6e89c24a5d1ff79ee59 HADOOP-15183: S3AFileSystem does rename/3 Factored the rename check logic out into a RenameHelper which is then used in S3A FileSystem as the PoC of how it can use the RenameHelper then directly invoke the inner operations. Added some more @Retry attributes as appropriate. Conclusion: it works, but for efficient IOPS then For more work on this
of the 50+ places which call rename, they seem split 3-ways int
2 & 3 are the ones to change. Change-Id: Id77ed7352b9d5ddb124f9191c5c5f1b8a76da7bb HADOOP-11452. Rename Review of RenameHelper based on current coding styles and Change-Id: I3d39ee3ed04a10e7db2c2b2c79833b945b4d691b HADOOP-11452 Rename/3 S3A high performance edition. This avoids all surplus S3 calls and has meaningful exception TODO:
Change-Id: I408b2cfe93f266cf0c9084fa8f05bb84b65c2bad HADOOP-11452 Rename/3
Proposed changes
Change-Id: I1fab598553b8e9de4d659b80248bac440dbac018 |
85015d4 to
744b0b8
Compare
|
Quick review of this, especially the factoring out Good:
Looking at s3a rename I now want
so: all the callbacks would be completable futures; default impl would be sequential, but for the stores we can go async.
No timetable for this. |
744b0b8 to
3e36589
Compare
|
@steveloughran, is this still active. I'm interested in using this functionality and would be willing to help review. |
|
I have stopped working on this. Feel free to take it up I originally thought "hey, we could just make this public and there'd be a good rename", but as usual the challenge becomes one of strictly implementing the preconditions. FileContext does that, though non-atomically; Factoring out all that policy at least makes things consistent. But, having dealt with other rename related trouble recently, I'm thinking really I'd want a new builder-based rename Why this?
Why async?
|
3e36589 to
73ae1f6
Compare
|
💔 -1 overall
This message was automatically generated. |
73ae1f6 to
da3d476
Compare
|
💔 -1 overall
This message was automatically generated. |
PR with the previous patches. Also, because the base FileSystem.rename/3 does its own src/dest checks, it's less efficient against object stores. They need their own high-perf subclass. Change-Id: I1586ef2290d7a3d2d33b1a32e2f0999b07c26143 HADOOP-11452 rename/3 has tests for local fs, raw local and hdfs. + updates docs + contains fix for HADOOP-16255 : checksum FS doesn't do rename/3 properly tests are based on those of rename; there's still some stuff about renaming into an empty directory which makes me suspect there's ambiguity there Change-Id: Ic1ca6cbe3e9ca80ab8e1459167a7012678e856fc HADOOP-11452 rename(path, path, options) to become public * exceptions match expectations of FSMainOperationsBaseTest * clean up FSMainOperationsBaseTest by moving to intercept, try with resources. * S3A to implement subclass of FSMainOperationsBaseTest, ITestS3AFSMainOperations * S3A to implement ITestS3AContractRenameEx * Add protected override point, executeInnerRename, for implementing rename/3; base class calls rename() and throws if that returns false, i.e. current behaviour * S3A overrides executeInnerRename to call its own innerRename() and so raise all failures as IOEs. Issues: is the rename point executeInnerRename() a good name? S3AFS should really implement a direct rename/3 so there's no duplication of checks for parent etc, maybe even passing the values down. Or we make sure innerRename() is consistent with the spec, which primarily means logic about dest dir existing. Change-Id: I0ac3695434d85072ab860854e5e88bc6d36e754a HADOOP-11452 trying to move rename/3 logic into its own class Change-Id: If2ab67152e08e4c2a225f6e89c24a5d1ff79ee59 HADOOP-15183: S3AFileSystem does rename/3 Factored the rename check logic out into a RenameHelper which is then used in S3A FileSystem as the PoC of how it can use the RenameHelper then directly invoke the inner operations. Added some more @Retry attributes as appropriate. Conclusion: it works, but for efficient IOPS then `innerRename()` needs to take optional source and dest status values and so so omit repeating the checks. For more work on this * the tests; file context has some so review and add to AbstractContractRenameTest. Also, based on some feedback from Sean Mackrory: verify the renamed files actually have the data. * move internal use of rename (distcp, maprv2 FileOutputcommitter), etc to use this. of the 50+ places which call rename, they seem split 3-ways int 1. subclasses and forwarding 2. invocations whch check the return value and throw an IOE 3. invocations which are written on the assumption that renames raise exceptions on failure 2 & 3 are the ones to change. Change-Id: Id77ed7352b9d5ddb124f9191c5c5f1b8a76da7bb HADOOP-11452. Rename Review of RenameHelper based on current coding styles and plans (IOStats, etc) Change-Id: I3d39ee3ed04a10e7db2c2b2c79833b945b4d691b HADOOP-11452 Rename/3 S3A high performance edition. This avoids all surplus S3 calls and has meaningful exception raising. TODO: * pull the S3A code out into is own operation + extra callbacks (innerGetFileStatus is all that's needed) * see if the FileContext default logic can be pulled out too, using a custom set of callbacks. If it can't the logic is broken. * do some testing Change-Id: I408b2cfe93f266cf0c9084fa8f05bb84b65c2bad HADOOP-11452 Rename/3 * Add RawLocalFileSystem rename with useful errors * pull out all rename docs into their own filesystem.md doc * Add a callback impl which => FileContext too, at least for the nonatomic rename. FC doesn't do path name checking. Should we? Proposed changes * move the new interfaces up to o.a.h.fs, so that .impl is never imported in FileSystem APIS. * remove the createRename callbacks method, just have stores with implement rename/3 other than the base FS to override all of rename 3. Change-Id: I1fab598553b8e9de4d659b80248bac440dbac018
da3d476 to
33c7641
Compare
|
💔 -1 overall
This message was automatically generated. |
|
closing issue, giving up on this |
make rename/3 public. Rollup of #743 commits rebased to trunk