-
Couldn't load subscription status.
- Fork 9.1k
HADOOP-16790: Add Write Convenience Methods #1792
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems s good idea.
- Looking at the class -which I haven't before- it's a bit of a mess of local File and FileSystem IO. I'd be reluctant to add more FileSystem stuff there -but can't point to a good alternative place right now.
- Passing in overwrite options on create is critical, or make overwrite the default (and tell people!)
- And we will need FileContext equivalent.
- trunk now has builders for create and open. It would be good to explore using these in our own code, so we can tune the APIs. But for something so minimal I think it's overkill, and it would make backporting harder
|
|
||
| URI uri = tmp.toURI(); | ||
| Configuration conf = new Configuration(); | ||
| FileSystem fs = FileSystem.newInstance(uri, conf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just use FileSystem.get()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was copy & paste from other tests in this same class. I can look at that though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as well as performance issues, you will leak filesystem instances
| List<String> read = | ||
| FileUtils.readLines(new File(testPath.toUri()), StandardCharsets.UTF_8); | ||
|
|
||
| assertEquals(write, read); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd consider some round trip tests with ContractTestUtils too, to verify interop.
| * @param path the path to the file | ||
| * @param charseq the char sequence to write to the file | ||
| * | ||
| * @return the file system |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might as well. Allows for method chaining. For example it's common to write a file into the tmp directory then move it into its final destination to avoid writing garbage into the target directory if the write fails.
File.write(fs, tmpPath, byte[]).rename(tmpPath, path);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"common" as in not-object-store-optimised
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Gives one the flexibility to take advantage of it or ignore it if it is of no value.
| * @throws NullPointerException if any of the arguments are {@code null} | ||
| * @throws IOException if an I/O error occurs creating or writing to the file | ||
| */ | ||
| public static FileSystem write(final FileSystem fs, final Path path, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add in overwrite options. We've been dealing with 404 caching in S3A, which relies on createFile(overwrite = false). Unless you make the default, it must be something callers can use.
|
@steveloughran Thanks for the review!
Neither could I.
The default is to overwrite because this is not an append function and is the most straightforward behavior. The behavior is already documented in the JavaDoc:
I'm honestly not sure that that is, but can that be added as a backlog item? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
All files must be created with overwrite equals true. That is, unless you promise that all support calls related to this code not working on S3. And I mean it. https://issues.apache.org/jira/browse/HADOOP-16490
-
I'd like you to use the new createFile() Builder API. Why so? The more we use it internally, the more we can be confident is suitable. Similarly, if you try to read files, use the openFile() operation.
-
Yes, you get to implement FileContext support now. Not let's put it off and forget about, now.
This is not as hard as you think because all the code which actually write to the opened stream will be identical. The only difference is the object on which you call createFile() on to get back an FSDataOutputStreamBuilder. The rest will be identical.
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java
Outdated
Show resolved
Hide resolved
| Objects.requireNonNull(cs); | ||
|
|
||
| CharsetEncoder encoder = cs.newEncoder(); | ||
| try (FSDataOutputStream out = fs.create(path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
builder with overwrite=true
|
|
||
| URI uri = tmp.toURI(); | ||
| Configuration conf = new Configuration(); | ||
| FileSystem fs = FileSystem.newInstance(uri, conf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as well as performance issues, you will leak filesystem instances
|
|
||
| URI uri = tmp.toURI(); | ||
| Configuration conf = new Configuration(); | ||
| FileSystem fs = FileSystem.newInstance(uri, conf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FileSystem.get()
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestFileUtil.java
Show resolved
Hide resolved
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java
Show resolved
Hide resolved
|
@steveloughran Let me know if I covered all of your requests. I don't mean to be obtuse here, I just haven't used builder API or FileContext before. |
|
💔 -1 overall
This message was automatically generated. |
|
LGTM +1 |
No description provided.