@dongjoon-hyun dongjoon-hyun commented Aug 7, 2025

What changes were proposed in this pull request?

This PR aims to use Java's built-in java.nio.file.Files.readAllBytes (available since Java 7) instead of Guava's com.google.common.io.Files.toByteArray.

In addition, a new Scalastyle rule is added to ban Files.toByteArray for consistency.
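The shape of the replacement can be sketched as follows. This is a minimal, JDK-only illustration rather than code from the patch: the object name `ReadAllBytesSketch` and the helper `readFully` are hypothetical, and the Guava call appears only in a comment so the sketch runs without Guava on the classpath. The one practical difference at call sites is that the JDK method takes a `java.nio.file.Path`, so a `java.io.File` needs `.toPath`.

```scala
import java.io.File
import java.nio.charset.StandardCharsets
import java.nio.file.Files

object ReadAllBytesSketch {
  // Hypothetical helper that reads a whole file using only the JDK.
  // Before: com.google.common.io.Files.toByteArray(file)
  // After:  java.nio.file.Files.readAllBytes(file.toPath)
  def readFully(file: File): Array[Byte] = Files.readAllBytes(file.toPath)

  def main(args: Array[String]): Unit = {
    val tmp = File.createTempFile("read-all-bytes-sketch", ".bin")
    try {
      val expected = "hello spark".getBytes(StandardCharsets.UTF_8)
      Files.write(tmp.toPath, expected)
      // The bytes read back should match the bytes written.
      assert(java.util.Arrays.equals(readFully(tmp), expected))
    } finally {
      tmp.delete()
    }
  }
}
```

Both methods load the entire file into memory, so neither is meant for very large files.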

Why are the changes needed?

The built-in Java method is as good as 3rd party library.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs.

Was this patch authored or co-authored using generative AI tooling?

No.

  assert(diskStore.getSize(blockId) === testData.length)

- val diskData = Files.toByteArray(diskBlockManager.getFile(blockId.name))
+ val diskData = Files.readAllBytes(diskBlockManager.getFile(blockId.name).toPath)
Contributor

Which one is better, Files.readAllBytes or com.google.common.io.ByteStreams#toByteArray(java.io.InputStream)?

Member Author

@dongjoon-hyun dongjoon-hyun Aug 7, 2025

I checked the performance. It's the same for the reading part, @LuciferYang. The main performance difference usually shows up on the writer API side.
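For readers who want to try such a comparison themselves, here is a rough sketch. It is not the benchmark used in this thread: the object `ReadTimingSketch` and its methods are hypothetical, `InputStream.readAllBytes` (Java 9+) stands in as a JDK-only analogue of the stream-based `ByteStreams.toByteArray(InputStream)` so no Guava dependency is needed, and the wall-clock loop is only a crude indicator. A harness such as JMH would be needed for trustworthy numbers.

```scala
import java.nio.file.{Files, Path}

object ReadTimingSketch {
  // Path-based read, as adopted by this PR.
  def readViaFiles(path: Path): Array[Byte] = Files.readAllBytes(path)

  // Stream-based read; a JDK stand-in (assumption) for the Guava
  // ByteStreams.toByteArray(InputStream) call asked about above.
  def readViaStream(path: Path): Array[Byte] = {
    val in = Files.newInputStream(path)
    try in.readAllBytes() finally in.close()
  }

  // Crude wall-clock timing: average nanoseconds per call of `body`.
  def averageNanos(iterations: Int)(body: => Array[Byte]): Long = {
    var sink = 0L // accumulate lengths so each read's result is used
    var i = 0
    val start = System.nanoTime()
    while (i < iterations) { sink += body.length; i += 1 }
    val elapsed = System.nanoTime() - start
    require(sink >= 0L)
    elapsed / iterations
  }

  def main(args: Array[String]): Unit = {
    val path = Files.createTempFile("read-timing-sketch", ".bin")
    try {
      Files.write(path, Array.fill[Byte](1 << 20)(7.toByte)) // 1 MiB payload
      // Both approaches must return identical bytes.
      assert(readViaFiles(path).sameElements(readViaStream(path)))
      println(s"Files.readAllBytes:       ${averageNanos(50)(readViaFiles(path))} ns/op")
      println(s"InputStream.readAllBytes: ${averageNanos(50)(readViaStream(path))} ns/op")
    } finally {
      Files.delete(path)
    }
  }
}
```

The printed numbers vary with the machine, file size, and page-cache state, so they should be read as a sanity check rather than a result.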

Member Author

So, I mentioned "The built-in Java method is as good as 3rd party library." in this PR description.

Member Author

@dongjoon-hyun dongjoon-hyun Aug 7, 2025

BTW, FYI, I'm digging into this area as part of Java 25 preparation and CI runtime improvement. If we stick to the old libraries, we cannot benefit from the improvements in new Java versions. To me, the stale Scala 2.13 and 3rd party libraries are very suspicious.

Contributor

Thanks for explaining!

@dongjoon-hyun
Member Author

BTW, all tests passed except one flaky test in SparkConnectServiceSuite.

@dongjoon-hyun
Member Author

Thank you, @LuciferYang . Merged to master for Apache Spark 4.1.0.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-53164 branch August 7, 2025 14:11