

@dongjoon-hyun dongjoon-hyun commented Aug 3, 2025

What changes were proposed in this pull request?

This PR aims to use Java's OutputStream.write instead of Apache Commons IO's IOUtils.write.
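A minimal sketch of the replacement pattern, assuming a plain String is written to a file: the string is encoded to bytes once and handed to a single OutputStream.write call instead of IOUtils.write(s, out). The object and method names are hypothetical (not Spark's actual code), and UTF-8 is an assumed charset; the REPL example below uses the platform default through getBytes().

```scala
import java.io.{File, FileOutputStream}
import java.nio.charset.StandardCharsets

object StringFileWriter {
  // Hypothetical helper: encode the whole String once, then pass the byte
  // array to OutputStream.write in one call, as in the REPL example.
  // UTF-8 is an assumption made for this sketch.
  def writeString(s: String, file: File): Unit = {
    val out = new FileOutputStream(file)
    try out.write(s.getBytes(StandardCharsets.UTF_8))
    finally out.close() // always release the file handle
  }
}
```

Unlike IOUtils.write, which leaves closing the stream to the caller, this sketch owns the stream and closes it.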

Why are the changes needed?

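Java's built-in OutputStream.write is faster for our use cases. In the following measurement, writing a 400,000,000-character string took 270 ms with OutputStream.write versus 1070 ms with IOUtils.write (about 4x faster).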

scala> val s = "a".repeat(400_000_000)

scala> spark.time(new java.io.FileOutputStream("/tmp/a").write(s.getBytes()))
Time taken: 270 ms

scala> spark.time(org.apache.commons.io.IOUtils.write(s, new java.io.FileOutputStream("/tmp/a")))
Time taken: 1070 ms
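The numbers above come from single spark.time calls, which include JIT warm-up and do not close the FileOutputStream. A minimal sketch of a somewhat more careful measurement, assuming JDK-only code: the hypothetical bestOf helper repeats a body and keeps the fastest run, and the hypothetical directWrite closes the stream after each write. Neither helper is part of Spark.

```scala
import java.io.{File, FileOutputStream}
import java.nio.charset.StandardCharsets

object RepeatedTiming {
  // Hypothetical harness: runs `body` `runs` times and returns the fastest
  // wall-clock time in milliseconds, damping the warm-up noise that a
  // single timing includes.
  def bestOf(runs: Int)(body: => Unit): Long = {
    require(runs > 0, "runs must be positive")
    (1 to runs).map { _ =>
      val start = System.nanoTime()
      body
      (System.nanoTime() - start) / 1000000L
    }.min
  }

  // The direct OutputStream.write path from the REPL example,
  // with the stream closed after every run.
  def directWrite(s: String, file: File): Unit = {
    val out = new FileOutputStream(file)
    try out.write(s.getBytes(StandardCharsets.UTF_8))
    finally out.close()
  }
}
```

For example, in spark-shell: RepeatedTiming.bestOf(5)(RepeatedTiming.directWrite(s, new java.io.File("/tmp/a"))). The IOUtils.write variant can be timed by passing it as the body in the same way; it is left out so the sketch needs only the JDK.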

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs.

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun (Member, Author)

Thank you, @HyukjinKwon.

@dongjoon-hyun (Member, Author)

I manually verified this PR by running the Scala linters and the affected test suites.

$ dev/lint-scala
Using SPARK_LOCAL_IP=localhost
Scalastyle checks passed.
Scalafmt checks passed.

$ build/sbt "core/testOnly org.apache.spark.util.UtilsSuite"
...
[info] Tests: succeeded 61, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 22 s, completed Aug 3, 2025, 5:08:10 PM

$ build/sbt "streaming/testOnly org.apache.spark.streaming.InputStreamsSuite"
...
[info] Tests: succeeded 11, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 22 s, completed Aug 3, 2025, 5:09:06 PM

Merged to master.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-53090 branch August 4, 2025 00:10