Skip to content

Conversation

@dongjoon-hyun
Copy link
Member

@dongjoon-hyun dongjoon-hyun commented Aug 6, 2025

What changes were proposed in this pull request?

This PR aims to use Java Files.readString instead of com.google.common.io.Files.asCharSource.

Why are the changes needed?

To use a simpler built-in Java API.

  • Both core and yarn modules use this in testing
  • kubernetes module uses to read small config files or YAML template files in addition to testing.
-import java.nio.charset.StandardCharsets
+import java.nio.file.Files
...
-import com.google.common.io.Files
...
+ val expected = Files.asCharSource(expectedFile, StandardCharsets.UTF_8).read()
- val expected = Files.readString(expectedFile.toPath)

Does this PR introduce any user-facing change?

No behavior changes.

How was this patch tested?

Pass the CIs.

Was this patch authored or co-authored using generative AI tooling?

No.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

java.nio.file.Files.readString will be replaced to Files.readString when we finish the migration in this file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are they the same (in the effect)? What migration it is?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • For read side, the performance is the same.
  • For write side, Java is faster than Google library. My very next PR will provide a simple perf comparison.

This is a subtask of the following migration towards modern and faster Java APIs.

  • SPARK-53047 Mordernize Spark to use the latest Java features

Copy link
Member Author

@dongjoon-hyun dongjoon-hyun Aug 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @dongjoon-hyun

@dongjoon-hyun dongjoon-hyun marked this pull request as draft August 6, 2025 19:58
@dongjoon-hyun dongjoon-hyun marked this pull request as ready for review August 6, 2025 20:07
@dongjoon-hyun
Copy link
Member Author

Could you review this PR when you have some time, @viirya ?

@dongjoon-hyun
Copy link
Member Author

Thank you so much, @viirya !


val entry = is.getNextEntry
assert(entry != null)
val actual = new String(ByteStreams.toByteArray(is), StandardCharsets.UTF_8)
Copy link
Member Author

@dongjoon-hyun dongjoon-hyun Aug 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, these test suites are testing the compressed event logs (LZ4, Snappy, ZSD) which cannot fit to UTF_8. This is a kind of bug fix.

val fileName = entry.getName.stripPrefix(logPath.getName + "/")
assert(allFileNames.contains(fileName))

val actual = new String(ByteStreams.toByteArray(is), StandardCharsets.UTF_8)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

@dongjoon-hyun
Copy link
Member Author

dongjoon-hyun commented Aug 6, 2025

The first commit passed all relevant tests except the Spark History Event logs due to the existing test code bug. It's fixed at the second commit and I verified like the following.

$ build/sbt "core/testOnly org.apache.spark.deploy.history.*"
...
[info] Tests: succeeded 255, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 65 s (01:05), completed Aug 6, 2025, 2:37:02 PM
$ dev/lint-scala
Using SPARK_LOCAL_IP=localhost
Scalastyle checks passed.
Scalafmt checks passed.

@dongjoon-hyun
Copy link
Member Author

Merged to master for Apache Spark 4.1.0.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-53152 branch August 6, 2025 21:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants