
Conversation

@belugabehr (Contributor)

No description provided.

@apache apache deleted a comment from hadoop-yetus Jan 6, 2020
@steveloughran (Contributor) left a comment

Seems a good idea.

  1. Looking at the class (which I haven't before), it's a bit of a mess of local File and FileSystem IO. I'd be reluctant to add more FileSystem stuff there, but can't point to a good alternative place right now.
  2. Passing in overwrite options on create is critical, or make overwrite the default (and tell people!)
  3. And we will need FileContext equivalent.
  4. trunk now has builders for create and open. It would be good to explore using these in our own code, so we can tune the APIs. But for something so minimal I think it's overkill, and it would make backporting harder


```java
URI uri = tmp.toURI();
Configuration conf = new Configuration();
FileSystem fs = FileSystem.newInstance(uri, conf);
```
Contributor

just use FileSystem.get()

Contributor Author

This was copy & paste from other tests in this same class. I can look at that though.

Contributor

as well as performance issues, you will leak filesystem instances
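The leak being flagged comes from the FileSystem cache contract: `FileSystem.get()` hands back a shared, JVM-cached instance, while `FileSystem.newInstance()` always constructs a fresh one that the caller owns and must close. A minimal sketch of the difference (hedged: the class name `FsCacheDemo` is illustrative, not code from this patch):

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class FsCacheDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    URI uri = URI.create("file:///");

    // get() serves instances from a JVM-wide cache; repeated calls
    // with the same URI and conf return the same object, so tests
    // that never close it do not leak anything.
    FileSystem cached = FileSystem.get(uri, conf);

    // newInstance() bypasses the cache and constructs a fresh
    // FileSystem on every call; unless each one is closed, every
    // test case leaks an instance (plus its threads and connections
    // on stores such as S3A).
    try (FileSystem fresh = FileSystem.newInstance(uri, conf)) {
      System.out.println(cached == fresh);  // distinct objects
    }
  }
}
```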

```java
List<String> read =
    FileUtils.readLines(new File(testPath.toUri()), StandardCharsets.UTF_8);

assertEquals(write, read);
```
Contributor

I'd consider some round trip tests with ContractTestUtils too, to verify interop.
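A hedged sketch of the kind of round trip this suggests: write through one API, then read and verify through the `org.apache.hadoop.fs.contract.ContractTestUtils` helpers, so the new utility is exercised against the same expectations the filesystem contract tests use. The class name `RoundTripSketch` and the temp path are illustrative only:

```java
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.contract.ContractTestUtils;

public class RoundTripSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.getLocal(new Configuration());
    Path path = new Path(System.getProperty("java.io.tmpdir"), "round-trip.txt");
    byte[] data = "hello".getBytes(StandardCharsets.UTF_8);

    // Write the file, then verify its contents with the contract
    // helpers; against an object store this checks interop, not
    // just local-filesystem behaviour.
    ContractTestUtils.createFile(fs, path, true, data);
    String back = ContractTestUtils.readUTF8(fs, path, data.length);
    ContractTestUtils.verifyFileContents(fs, path, data);
    System.out.println(back);
  }
}
```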

```java
 * @param path the path to the file
 * @param charseq the char sequence to write to the file
 *
 * @return the file system
```
Contributor

why?

Contributor Author

Might as well. Allows for method chaining. For example it's common to write a file into the tmp directory then move it into its final destination to avoid writing garbage into the target directory if the write fails.

```java
FileUtil.write(fs, tmpPath, bytes).rename(tmpPath, path);
```
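Fleshed out, that write-to-temp-then-publish pattern looks like this (a hedged sketch: the write is inlined with plain `fs.create()` rather than assuming the utility's final signature, and class name and paths are illustrative):

```java
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TmpThenRename {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.getLocal(new Configuration());
    String tmpDir = System.getProperty("java.io.tmpdir");
    Path tmpPath = new Path(tmpDir, "report.txt.tmp");
    Path path = new Path(tmpDir, "report.txt");
    byte[] bytes = "contents".getBytes(StandardCharsets.UTF_8);

    // Write to a scratch path first; the final path only appears
    // via the rename, so a failed write never leaves partial
    // garbage at the destination.
    try (FSDataOutputStream out = fs.create(tmpPath)) {
      out.write(bytes);
    }
    fs.delete(path, false);  // rename fails on some filesystems if the destination exists
    fs.rename(tmpPath, path);
  }
}
```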

Contributor

"common" as in not-object-store-optimised

Contributor Author

Sure. Gives one the flexibility to take advantage of it or ignore it if it is of no value.

```java
 * @throws NullPointerException if any of the arguments are {@code null}
 * @throws IOException if an I/O error occurs creating or writing to the file
 */
public static FileSystem write(final FileSystem fs, final Path path,
```
Contributor

add in overwrite options. We've been dealing with 404 caching in S3A, which relies on createFile(overwrite = false). Unless you make overwrite the default, it must be something callers can use.

@belugabehr (Contributor Author)

@steveloughran Thanks for the review!

> can't point to a good alternative place right now.

Neither could I.

> Passing in overwrite options on create is critical, or make overwrite the default (and tell people!)

The default is to overwrite, because this is not an append function and that is the most straightforward behavior. The behavior is already documented in the JavaDoc:

> This utility method opens the file for writing, creating the file if it does not exist, or overwriting an existing file.

> And we will need FileContext equivalent.

I'm honestly not sure what that is, but can that be added as a backlog item?

@steveloughran (Contributor) left a comment

  1. All files must be created with overwrite = true. That is, unless you promise to field all support calls related to this code not working on S3. And I mean it. https://issues.apache.org/jira/browse/HADOOP-16490

  2. I'd like you to use the new createFile() builder API. Why? The more we use it internally, the more confident we can be that it is suitable. Similarly, if you try to read files, use the openFile() operation.

  3. Yes, you get to implement FileContext support now, not "let's put it off and forget about it".

This is not as hard as you think, because all the code which actually writes to the opened stream will be identical. The only difference is the object on which you call createFile() to get back an FSDataOutputStreamBuilder.
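The parallel pair described above can be sketched like this (hedged: `WriteSketch` is an illustrative class, not the merged patch, and the FileContext flavour uses the long-standing EnumSet-based create() rather than assuming a builder exists there; only the stream-opening line differs between the two):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.EnumSet;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class WriteSketch {

  // FileSystem flavour: the createFile() builder carries the
  // overwrite flag explicitly; everything after build() is shared.
  public static FileSystem write(FileSystem fs, Path path, CharSequence cs)
      throws IOException {
    try (FSDataOutputStream out =
        fs.createFile(path).overwrite(true).build()) {
      out.write(cs.toString().getBytes(StandardCharsets.UTF_8));
    }
    return fs;
  }

  // FileContext flavour: only how the stream is opened differs;
  // CreateFlag.OVERWRITE plays the role of overwrite(true).
  public static FileContext write(FileContext fc, Path path, CharSequence cs)
      throws IOException {
    try (FSDataOutputStream out = fc.create(path,
        EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE))) {
      out.write(cs.toString().getBytes(StandardCharsets.UTF_8));
    }
    return fc;
  }

  private WriteSketch() {}
}
```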

```java
Objects.requireNonNull(cs);

CharsetEncoder encoder = cs.newEncoder();
try (FSDataOutputStream out = fs.create(path);
```
Contributor

builder with overwrite=true


```java
URI uri = tmp.toURI();
Configuration conf = new Configuration();
FileSystem fs = FileSystem.newInstance(uri, conf);
```
Contributor

as well as performance issues, you will leak filesystem instances


```java
URI uri = tmp.toURI();
Configuration conf = new Configuration();
FileSystem fs = FileSystem.newInstance(uri, conf);
```
Contributor

FileSystem.get()

@belugabehr (Contributor Author)

@steveloughran Let me know if I covered all of your requests. I don't mean to be obtuse here; I just haven't used the builder API or FileContext before.

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|:----------|:--------|:--------|
| +0 🆗 | reexec | 1m 42s | Docker mode activated. |
| | _ Prechecks _ | | |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| | _ trunk Compile Tests _ | | |
| +1 💚 | mvninstall | 21m 36s | trunk passed |
| +1 💚 | compile | 19m 52s | trunk passed |
| +1 💚 | checkstyle | 0m 44s | trunk passed |
| +1 💚 | mvnsite | 1m 21s | trunk passed |
| +1 💚 | shadedclient | 16m 13s | branch has no errors when building and testing our client artifacts. |
| +1 💚 | javadoc | 1m 23s | trunk passed |
| +0 🆗 | spotbugs | 2m 11s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 💚 | findbugs | 2m 9s | trunk passed |
| | _ Patch Compile Tests _ | | |
| +1 💚 | mvninstall | 0m 50s | the patch passed |
| +1 💚 | compile | 19m 12s | the patch passed |
| +1 💚 | javac | 19m 12s | the patch passed |
| -0 ⚠️ | checkstyle | 0m 45s | hadoop-common-project/hadoop-common: The patch generated 1 new + 117 unchanged - 0 fixed = 118 total (was 117) |
| +1 💚 | mvnsite | 1m 20s | the patch passed |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | shadedclient | 13m 50s | patch has no errors when building and testing our client artifacts. |
| -1 ❌ | javadoc | 1m 25s | hadoop-common-project_hadoop-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 💚 | findbugs | 2m 18s | the patch passed |
| | _ Other Tests _ | | |
| +1 💚 | unit | 9m 21s | hadoop-common in the patch passed. |
| +1 💚 | asflicense | 0m 44s | The patch does not generate ASF License warnings. |
| | | 115m 57s | |

| Subsystem | Report/Notes |
|:----------|:-------------|
| Docker | Client=19.03.5 Server=19.03.5 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1792/4/artifact/out/Dockerfile |
| GITHUB PR | #1792 |
| JIRA Issue | HADOOP-16790 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux dbb463bed6d9 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 2576c31 |
| Default Java | 1.8.0_232 |
| checkstyle | https://builds.apache.org/job/hadoop-multibranch/job/PR-1792/4/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt |
| javadoc | https://builds.apache.org/job/hadoop-multibranch/job/PR-1792/4/artifact/out/diff-javadoc-javadoc-hadoop-common-project_hadoop-common.txt |
| Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1792/4/testReport/ |
| Max. process+thread count | 1348 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1792/4/console |
| versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
| Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |

This message was automatically generated.

@apache apache deleted a comment from hadoop-yetus Jan 15, 2020
@apache apache deleted a comment from hadoop-yetus Jan 15, 2020
@steveloughran (Contributor)

LGTM +1
merged into trunk and branch-3.2
