-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-21475][Core] Use NIO's Files API to replace FileInputStream/FileOutputStream in some critical paths #18684
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Change-Id: I0f11b9e0cbe62ca3d0bac7bfe0e2df838da80b48
srowen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose using NIO, yeah. Are there more places that could be updated that could make a difference?
|
|
||
| final FileOutputStream out = new FileOutputStream(outputFile, true); | ||
| final FileChannel out = FileChannel.open(outputFile.toPath(), | ||
| ImmutableSet.of(WRITE, APPEND, CREATE)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indent 4 spaces?
| boolean copyThrewException = true; | ||
| try { | ||
| lengths[i] = Utils.copyStream(in, out, false, transferToEnabled); | ||
| final long size = in.size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why final? we generally don't write that unless it's required. I know the surrounding code does though anyway
|
Thanks @srowen for your review. I generally focused on shuffle related parts to avoid using And for other modules like SQL I don't address the issues there, not sure if it is also needed. |
Change-Id: I883b36420471a20615dbb1e7e10421884fa0b690
|
Test build #79769 has finished for PR 18684 at commit
|
|
Test build #79771 has finished for PR 18684 at commit
|
| } else { | ||
| FileChannel fileChannel = new FileInputStream(file).getChannel(); | ||
| FileChannel fileChannel = FileChannel.open(file.toPath(), | ||
| ImmutableSet.of(StandardOpenOption.READ)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need a new set for this? should we call:
FileChannel fileChannel = FileChannel.open(file.toPath(), StandardOpenOption.READ)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They're actually the same, the one you mentioned is just a simple wrapper of the former one. I will change to yours for the simplicity.
|
|
||
| final FileOutputStream out = new FileOutputStream(outputFile, true); | ||
| final FileChannel out = FileChannel.open(outputFile.toPath(), | ||
| ImmutableSet.of(WRITE, APPEND, CREATE)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final FileChannel out = FileChannel.open(outputFile.toPath(), WRITE, APPEND, CREATE)
?
| lengths[i] = Utils.copyStream(in, out, false, transferToEnabled); | ||
| long size = in.size(); | ||
| Utils.copyFileStreamNIO(in, out, 0, size); | ||
| lengths[i] = size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we modify Utils.copyStream() to support this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There're lot of other places using Utils#copyStream, it would be better not to change this API.
Change-Id: Ibb0036d0ac88c01310cba817da0bb40535c12351
|
Test build #79801 has finished for PR 18684 at commit
|
jiangxb1987
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
@srowen any further comment? |
srowen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth updating, yes, if you can see a performance improvement
|
@cloud-fan @JoshRosen can you please help to review, thanks! |
|
Merged to master |
…ternal shuffle service ## What changes were proposed in this pull request? This PR is the second attempt of #18684 , NIO's Files API doesn't override `skip` method for `InputStream`, so it will bring in performance issue (mentioned in #20119). But using `FileInputStream`/`FileOutputStream` will also bring in memory issue (https://dzone.com/articles/fileinputstream-fileoutputstream-considered-harmful), which is severe for long running external shuffle service. So here in this proposal, only fixing the external shuffle service related code. ## How was this patch tested? Existing tests. Author: jerryshao <[email protected]> Closes #20144 from jerryshao/SPARK-21475-v2. (cherry picked from commit 93f92c0) Signed-off-by: Shixiong Zhu <[email protected]>
…ternal shuffle service ## What changes were proposed in this pull request? This PR is the second attempt of #18684 , NIO's Files API doesn't override `skip` method for `InputStream`, so it will bring in performance issue (mentioned in #20119). But using `FileInputStream`/`FileOutputStream` will also bring in memory issue (https://dzone.com/articles/fileinputstream-fileoutputstream-considered-harmful), which is severe for long running external shuffle service. So here in this proposal, only fixing the external shuffle service related code. ## How was this patch tested? Existing tests. Author: jerryshao <[email protected]> Closes #20144 from jerryshao/SPARK-21475-v2.
What changes were proposed in this pull request?
Java's
FileInputStreamandFileOutputStreamoverrides finalize(), even this file input/output stream is closed correctly and promptly, it will still leave some memory footprints which will only get cleaned in Full GC. This will introduce two side effects:https://bugs.openjdk.java.net/browse/JDK-8080225
https://www.cloudbees.com/blog/fileinputstream-fileoutputstream-considered-harmful
So to fix this potential issue, here propose to use NIO's Files#newInput/OutputStream instead in some critical paths like shuffle.
Left unchanged FileInputStream in core which I think is not so critical:
Left unchanged FileOutputStream in core:
Here in
DiskBlockObjectWriter, it usesFileDescriptorso it is not easy to change to NIO Files API.For the
FileInputStreamandFileOutputStreamin common/shuffle* I changed them all.How was this patch tested?
Existing tests and manual verification.