Conversation

@dongjoon-hyun (Member) commented Aug 1, 2025

What changes were proposed in this pull request?

This PR aims to support copyDirectory in SparkFileUtils and JavaUtils.

Why are the changes needed?

To provide a faster implementation than Apache Commons IO's FileUtils.copyDirectory, as the benchmark below shows.

BEFORE

scala> spark.time(org.apache.commons.io.FileUtils.copyDirectory(new java.io.File("/tmp/spark"), new java.io.File("/tmp/spark2")))
Time taken: 5128 ms

AFTER

scala> spark.time(org.apache.spark.network.util.JavaUtils.copyDirectory(new java.io.File("/tmp/spark"), new java.io.File("/tmp/spark2")))
Time taken: 2979 ms
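For context, a NIO-based directory copy along these lines can be sketched with Files.walkFileTree. This is a minimal illustration of the approach, not the exact code added by this PR; the class and method names are chosen here for the example.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.BasicFileAttributes;

public class CopyDir {
  /** Recursively copy src into dst using NIO; file attribute times are not copied. */
  public static void copyDirectory(Path src, Path dst) throws IOException {
    Files.walkFileTree(src, new SimpleFileVisitor<Path>() {
      @Override
      public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
          throws IOException {
        // Recreate each source directory under the destination root.
        Files.createDirectories(dst.resolve(src.relativize(dir)));
        return FileVisitResult.CONTINUE;
      }
      @Override
      public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
          throws IOException {
        // Copy each regular file to the mirrored location.
        Files.copy(file, dst.resolve(src.relativize(file)),
            StandardCopyOption.REPLACE_EXISTING);
        return FileVisitResult.CONTINUE;
      }
    });
  }

  public static void main(String[] args) throws IOException {
    Path src = Files.createTempDirectory("src");
    Files.writeString(src.resolve("a.txt"), "hello");
    Path dst = Files.createTempDirectory("parent").resolve("copy");
    copyDirectory(src, dst);
    System.out.println(Files.readString(dst.resolve("a.txt"))); // prints "hello"
  }
}
```

Avoiding the reflection- and attribute-copying overhead of the Commons IO implementation is one plausible source of the speedup measured above.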

Does this PR introduce any user-facing change?

No behavior change.

How was this patch tested?

Pass the CIs.

Was this patch authored or co-authored using generative AI tooling?

No.

}

/** Copy src to the target directory simply. File attribute times are not copied. */
public static void copyDirectory(File src, File dst) throws IOException {
Contributor (@LuciferYang):

If src is a regular file rather than a directory, it seems a file-to-file copy will occur regardless of whether the target exists, and no error will be reported.

Member Author (@dongjoon-hyun):

Thank you, @LuciferYang . I addressed your comment.

throw new IllegalArgumentException("Invalid input file " + src + " or directory " + dst);
}
Path from = src.toPath();
Path to = dst.toPath();
Contributor (@LuciferYang):

I suggest adding a check:

if (dstPath.toAbsolutePath().startsWith(srcPath.toRealPath())) {
    throw new IllegalArgumentException("Cannot copy directory to itself or its subdirectory");
}

Contributor (@LuciferYang):

The rest of them look ok

Member Author (@dongjoon-hyun):

Thank you. I got your point. But, for that case, if (dstPath.toRealPath().startsWith(srcPath.toRealPath())) { would be right.

Member Author (@dongjoon-hyun):

It's because toAbsolutePath() and toRealPath() can return different results on macOS:

scala> new java.io.File("/tmp/spark").toPath.toRealPath()
val res0: java.nio.file.Path = /private/tmp/spark

scala> new java.io.File("/tmp/spark").toPath.toAbsolutePath()
val res1: java.nio.file.Path = /tmp/spark
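The macOS behavior above comes from /tmp being a symlink to /private/tmp. The same difference can be reproduced portably with an explicit symlink; the class name and file names here are illustrative only.

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class RealVsAbsolute {
  public static void main(String[] args) throws Exception {
    // A symlink "link-<n>" pointing at a real directory "target".
    Path target = Files.createTempDirectory("target");
    Path link = target.getParent().resolve("link-" + System.nanoTime());
    Files.createSymbolicLink(link, target);

    // toAbsolutePath() only makes the path absolute; the symlink component stays.
    // toRealPath() resolves symlinks, yielding the actual on-disk location.
    System.out.println(link.toAbsolutePath()); // still the link path
    System.out.println(link.toRealPath());     // resolved to target's real path
  }
}
```

This is why comparing a real path against an absolute path, as in the originally suggested check, can miss or wrongly report containment.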

Contributor (@LuciferYang):

Thank you for your corrections

Member Author (@dongjoon-hyun):

I'm working on this.

Path to = dst.toPath().toAbsolutePath().normalize();
if (to.startsWith(from)) {
throw new IllegalArgumentException("Cannot copy directory to itself or its subdirectory");
}
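As a standalone illustration, the guard settled on above might look like the following. This is a sketch under the assumption that from is derived with toRealPath(), as discussed earlier in the thread (the excerpt does not show that line), and checkNotSelfCopy is a hypothetical helper name.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Path;

public class CopyGuard {
  /** Rejects copying a directory into itself or one of its subdirectories. */
  public static void checkNotSelfCopy(File src, File dst) throws IOException {
    // toRealPath() resolves symlinks on the source (e.g. /tmp -> /private/tmp on macOS).
    // dst may not exist yet, so it cannot be resolved; normalize its absolute path instead.
    Path from = src.toPath().toRealPath();
    Path to = dst.toPath().toAbsolutePath().normalize();
    if (to.startsWith(from)) {
      throw new IllegalArgumentException(
          "Cannot copy directory to itself or its subdirectory");
    }
  }
}
```

Path.startsWith compares whole path components, so /tmp/spark2 is correctly not treated as being inside /tmp/spark.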
Member Author (@dongjoon-hyun):

Here, I addressed your comment, @LuciferYang.

@LuciferYang (Contributor) left a comment:

+1, LGTM
Thank you @dongjoon-hyun

@dongjoon-hyun (Member Author):

Thank you, @LuciferYang .

@dongjoon-hyun (Member Author):

Merged to master. Thank you, @peter-toth and @LuciferYang .

@dongjoon-hyun dongjoon-hyun deleted the SPARK-53073 branch August 2, 2025 12:41