@dongjoon-hyun
Member

@dongjoon-hyun dongjoon-hyun commented Aug 2, 2025

What changes were proposed in this pull request?

This PR aims to use Java's java.nio.file.Files.readAllLines/write instead of Apache Commons IO's FileUtils.(read|write)Lines.

In addition,

  • The commons-io test dependency is removed from the commons-utils module.
  • Two Scalastyle rules are added to prevent future regressions.
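
For reviewers unfamiliar with the JDK calls, the replacement pattern can be sketched roughly as follows. This is a minimal illustration, not code taken from the patch: the object name FilesRoundTrip and its helper methods are hypothetical, and the explicit UTF-8 charset is an assumption made here for clarity (the JDK methods also default to UTF-8 when no charset is passed).

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import java.util.{Arrays, List => JList}

// Hypothetical wrapper used only for illustration; not part of this PR.
object FilesRoundTrip {
  // JDK counterpart of FileUtils.writeLines: each element is written
  // followed by the platform line separator.
  def writeLines(path: Path, lines: JList[String]): Path =
    Files.write(path, lines, StandardCharsets.UTF_8)

  // JDK counterpart of FileUtils.readLines: the whole file is loaded
  // into memory as a list of lines.
  def readLines(path: Path): JList[String] =
    Files.readAllLines(path, StandardCharsets.UTF_8)

  def main(args: Array[String]): Unit = {
    val tmp = Files.createTempFile("lines", ".txt")
    try {
      writeLines(tmp, Arrays.asList("a", "b", "c"))
      println(readLines(tmp))
    } finally {
      Files.deleteIfExists(tmp)
    }
  }
}
```

Note that the JDK methods take a java.nio.file.Path rather than a java.io.File, so call sites that hold a File would need file.toPath.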

Why are the changes needed?

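Java implementations are faster. In the measurements below with 100 million lines, writing is about 4.2x faster (5013 ms vs. 1191 ms), while reading is roughly on par (2377 ms vs. 2279 ms).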

SAMPLE DATA

scala> val array = new java.util.ArrayList[String]()
val array: java.util.ArrayList[String] = []

scala> (1 to 100_000_000).foreach { _ => array.add("a") }

BEFORE (WRITE)

scala> spark.time(org.apache.commons.io.FileUtils.writeLines(new java.io.File("/tmp/text"), array))
Time taken: 5013 ms

AFTER (WRITE)

scala> spark.time(java.nio.file.Files.write(java.nio.file.Paths.get("/tmp/text"), array))
Time taken: 1191 ms

BEFORE (READ)

scala> spark.time(org.apache.commons.io.FileUtils.readLines(new java.io.File("/tmp/text")))
Time taken: 2377 ms

AFTER (READ)

scala> spark.time(java.nio.file.Files.readAllLines(java.nio.file.Paths.get("/tmp/text")))
Time taken: 2279 ms
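
The timings above rely on spark.time inside spark-shell. For anyone who wants to repeat the JDK side of the measurement without a Spark session, a rough sketch could look like the following. The object LineIoTiming and its timed and sample helpers are hypothetical names introduced here, the list size is reduced to one million entries as an assumption to keep the run short, and a single un-warmed run like this is only indicative, not a rigorous benchmark.

```scala
import java.nio.file.Files
import java.util.{ArrayList => JArrayList, List => JList}

// Hypothetical helper object for illustration; not part of this PR.
object LineIoTiming {
  // Run `body` once and return its result with the elapsed milliseconds.
  def timed[T](body: => T): (T, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }

  // Build a list holding `n` copies of "a", mirroring the sample data above.
  def sample(n: Int): JList[String] = {
    val list = new JArrayList[String](n)
    var i = 0
    while (i < n) { list.add("a"); i += 1 }
    list
  }

  def main(args: Array[String]): Unit = {
    val path = Files.createTempFile("text", ".tmp")
    try {
      val data = sample(1000000) // assumed size, smaller than the PR's 100 million
      val (_, writeMs) = timed(Files.write(path, data))
      val (read, readMs) = timed(Files.readAllLines(path))
      println(s"write: $writeMs ms, read: $readMs ms, lines: ${read.size}")
    } finally {
      Files.deleteIfExists(path)
    }
  }
}
```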

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the BUILD label Aug 2, 2025
@dongjoon-hyun dongjoon-hyun changed the title from "[SPARK-53075][CORE] Use Java Files.readAllLines/write instead of FileUtils.(read|write)Lines" to "[SPARK-53075][CORE][TESTS] Use Java Files.readAllLines/write instead of FileUtils.(read|write)Lines" Aug 2, 2025
@dongjoon-hyun
Member Author

dongjoon-hyun commented Aug 2, 2025

Could you review this PR, which only improves the test suites, @LuciferYang?

@LuciferYang
Contributor

Merged into master for Apache Spark 4.1.0. Thanks @dongjoon-hyun and @peter-toth

@dongjoon-hyun
Member Author

Thank you, @peter-toth and @LuciferYang !
