Commit 322b3d0

dongjoon-hyun authored and LuciferYang committed
[SPARK-53075][CORE][TESTS] Use Java Files.readAllLines/write instead of FileUtils.(read|write)Lines
### What changes were proposed in this pull request?

This PR aims to use Java `Files.readAllLines/write` instead of `FileUtils.(read|write)Lines`. In addition:

- The `commons-io` test dependency is removed from the `common/utils` module.
- Two Scalastyle rules are added to prevent a future regression.

### Why are the changes needed?

The Java implementations are faster.

**SAMPLE DATA**

```scala
scala> val array = new java.util.ArrayList[String]()
val array: java.util.ArrayList[String] = []

scala> (1 to 100_000_000).foreach { _ => array.add("a") }
```

**BEFORE (WRITE)**

```scala
scala> spark.time(org.apache.commons.io.FileUtils.writeLines(new java.io.File("/tmp/text"), array))
Time taken: 5013 ms
```

**AFTER (WRITE)**

```scala
scala> spark.time(java.nio.file.Files.write(java.nio.file.Paths.get("/tmp/text"), array))
Time taken: 1191 ms
```

**BEFORE (READ)**

```scala
scala> spark.time(org.apache.commons.io.FileUtils.readLines(new java.io.File("/tmp/text")))
Time taken: 2377 ms
```

**AFTER (READ)**

```scala
scala> spark.time(java.nio.file.Files.readAllLines(java.nio.file.Paths.get("/tmp/text")))
Time taken: 2279 ms
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#51787 from dongjoon-hyun/SPARK-53075.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
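The migration pattern the benchmark exercises can be sketched as a minimal standalone round trip (file name and sample lines are hypothetical, not from the PR); note that the charset-less `Files.write`/`Files.readAllLines` overloads default to UTF-8, matching the explicit `StandardCharsets.UTF_8` arguments the old `FileUtils` calls used:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class FilesRoundTrip {
    public static void main(String[] args) throws Exception {
        // Write lines with java.nio.file.Files (UTF-8 by default),
        // replacing FileUtils.writeLines(file, lines).
        Path path = Files.createTempFile("spark-53075-demo", ".txt");
        List<String> lines = List.of("a", "b", "c");
        Files.write(path, lines);

        // Read them back, replacing FileUtils.readLines(file, UTF_8).
        List<String> readBack = Files.readAllLines(path);
        System.out.println(readBack.equals(lines)); // true

        Files.delete(path);
    }
}
```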
1 parent 2513f41 commit 322b3d0

File tree

3 files changed (+14, −12 lines)


common/utils/pom.xml (0 additions, 5 deletions)

```diff
@@ -51,11 +51,6 @@
       <groupId>com.fasterxml.jackson.module</groupId>
       <artifactId>jackson-module-scala_${scala.binary.version}</artifactId>
     </dependency>
-    <dependency>
-      <groupId>commons-io</groupId>
-      <artifactId>commons-io</artifactId>
-      <scope>test</scope>
-    </dependency>
     <dependency>
       <groupId>org.apache.ivy</groupId>
       <artifactId>ivy</artifactId>
```

common/utils/src/test/scala/org/apache/spark/util/LogKeySuite.scala (4 additions, 7 deletions)

```diff
@@ -17,14 +17,12 @@

 package org.apache.spark.util

-import java.nio.charset.StandardCharsets
 import java.nio.file.{Files, Path}
 import java.util.{ArrayList => JList}

 import scala.jdk.CollectionConverters._
 import scala.reflect.runtime.universe._

-import org.apache.commons.io.FileUtils
 import org.scalatest.funsuite.AnyFunSuite // scalastyle:ignore funsuite

 import org.apache.spark.internal.{Logging, LogKeys}
@@ -61,9 +59,8 @@ class LogKeySuite
   private def regenerateLogKeyFile(
       originalKeys: Seq[String], sortedKeys: Seq[String]): Unit = {
     if (originalKeys != sortedKeys) {
-      val logKeyFile = logKeyFilePath.toFile
-      logInfo(s"Regenerating the file $logKeyFile")
-      val originalContents = FileUtils.readLines(logKeyFile, StandardCharsets.UTF_8)
+      logInfo(s"Regenerating the file $logKeyFilePath")
+      val originalContents = Files.readAllLines(logKeyFilePath)
       val sortedContents = new JList[String]()
       var firstMatch = false
       originalContents.asScala.foreach { line =>
@@ -78,8 +75,8 @@ class LogKeySuite
         sortedContents.add(line)
       }
     }
-    Files.delete(logKeyFile.toPath)
-    FileUtils.writeLines(logKeyFile, StandardCharsets.UTF_8.name(), sortedContents)
+    Files.delete(logKeyFilePath)
+    Files.write(logKeyFilePath, sortedContents)
   }
 }
```
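The `regenerateLogKeyFile` change above boils down to a read, re-order, delete, rewrite sequence expressed entirely with `java.nio.file.Files`. A minimal standalone sketch of that pattern (the file name and key names are hypothetical, and plain natural-order sorting stands in for the suite's match-based reordering):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class RegenerateSorted {
    public static void main(String[] args) throws Exception {
        // A throwaway file with out-of-order entries.
        Path path = Files.createTempFile("log-keys", ".txt");
        Files.write(path, List.of("B_KEY", "A_KEY", "C_KEY"));

        // Read, sort, then delete and rewrite -- mirroring the
        // Files.readAllLines / Files.delete / Files.write calls in the diff.
        List<String> sorted = new ArrayList<>(Files.readAllLines(path));
        sorted.sort(null); // null comparator => natural ordering
        Files.delete(path);
        Files.write(path, sorted);

        System.out.println(Files.readAllLines(path)); // [A_KEY, B_KEY, C_KEY]
        Files.delete(path);
    }
}
```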

scalastyle-config.xml (10 additions, 0 deletions)

```diff
@@ -282,6 +282,16 @@ This file is divided into 3 sections:
     scala.jdk.CollectionConverters._ and use .asScala / .asJava methods</customMessage>
   </check>

+  <check customId="readLines" level="error" class="org.scalastyle.file.RegexChecker" enabled="true">
+    <parameters><parameter name="regex">FileUtils\.readLines</parameter></parameters>
+    <customMessage>Use Files.readAllLines instead.</customMessage>
+  </check>
+
+  <check customId="writeLines" level="error" class="org.scalastyle.file.RegexChecker" enabled="true">
+    <parameters><parameter name="regex">FileUtils\.writeLines</parameter></parameters>
+    <customMessage>Use Files.write instead.</customMessage>
+  </check>
+
   <check customId="deleteRecursively" level="error" class="org.scalastyle.file.RegexChecker" enabled="true">
     <parameters><parameter name="regex">FileUtils\.deleteDirectory</parameter></parameters>
     <customMessage>Use deleteRecursively of SparkFileUtils or Utils</customMessage>
```
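The two new Scalastyle checks are plain regex scans over source text. A quick sketch of what they would flag, reusing the same patterns against a few hypothetical source lines:

```java
import java.util.List;
import java.util.regex.Pattern;

public class StyleRegexDemo {
    public static void main(String[] args) {
        // Same regexes as the new RegexChecker rules in scalastyle-config.xml.
        Pattern read = Pattern.compile("FileUtils\\.readLines");
        Pattern write = Pattern.compile("FileUtils\\.writeLines");

        // Hypothetical source lines: the first two would now fail the build,
        // the third uses the preferred java.nio.file.Files API and passes.
        List<String> lines = List.of(
            "val contents = FileUtils.readLines(file)",
            "FileUtils.writeLines(file, contents)",
            "val contents = Files.readAllLines(path)");
        for (String line : lines) {
            boolean flagged = read.matcher(line).find() || write.matcher(line).find();
            System.out.println(flagged + ": " + line);
        }
    }
}
```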
