Skip to content

Conversation

@pan3793
Copy link
Member

@pan3793 pan3793 commented Jun 28, 2020

What changes were proposed in this pull request?

Correct file seprate use in ExecutorDiskUtils.createNormalizedInternedPathname on Windows

Why are the changes needed?

ExternalShuffleBlockResolverSuite failed on Windows, see detail at:
https://issues.apache.org/jira/browse/SPARK-32121

Does this PR introduce any user-facing change?

No

How was this patch tested?

The existed test suite.

@pan3793 pan3793 changed the title Fix ExternalShuffleBlockResolverSuite failed on Windows [SPARK-32121][SHUFFLE][TEST] Fix ExternalShuffleBlockResolverSuite failed on Windows Jun 28, 2020
@pan3793 pan3793 requested a review from HyukjinKwon June 29, 2020 02:19
@HyukjinKwon
Copy link
Member

ok to test

@HyukjinKwon
Copy link
Member

Looks fine.

@pan3793 pan3793 requested a review from HyukjinKwon June 29, 2020 02:33
@SparkQA
Copy link

SparkQA commented Jun 29, 2020

Test build #124618 has finished for PR 28940 at commit 187e813.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 29, 2020

Test build #124627 has finished for PR 28940 at commit 8cd8bd1.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 29, 2020

Test build #124643 has finished for PR 28940 at commit 06024f4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Build started: [CORE] org.apache.spark.network.shuffle.ExternalShuffleBlockResolverSuite PR-28940

Copy link
Contributor

@attilapiros attilapiros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kinda unrelated to your change but i am keep wondering how often could the createNormalizedInternedPathname give back a different String than the FileSystem#normalize if that is the case then the String#intern does not save us from storing multiple copies of the result string within the File class as a path.

@attilapiros
Copy link
Contributor

attilapiros commented Jun 30, 2020

What about the following?

As createNormalizedInternedPathname is only used here in production:

return new File(createNormalizedInternedPathname(
localDir, String.format("%02x", subDirId), filename));

What about creating the file with the constructor:
https://github.com/openjdk/jdk/blob/9a9add8825a040565051a09010b29b099c2e7d49/jdk/src/share/classes/java/io/File.java#L275-L281:

    public File(String pathname) {
        if (pathname == null) {
            throw new NullPointerException();
        }
        this.path = fs.normalize(pathname);
        this.prefixLength = fs.prefixLength(this.path);
    }

Then read the path via File#getPath back and build the File again with the the path where the String#intern called. As we already doing path transformations twice this new solution is not worse than the current implementation but at least it will be a perfect match regarding the normalized path. The constructed first File will be garbage collected (this might be a disadvantage and the prefixLength calculation I am checking how complex is that). Moreover we can get rid of the unnecessary tests.

@Ngone51 @HyukjinKwon what's your opinion?

@HyukjinKwon
Copy link
Member

I was assuming we can't reuse io.File per:

  • the internal code in java.io.File would normalize it later, creating a new "foo/bar"
  • String copy. Unfortunately, we cannot just reuse the normalization code that java.io.File
  • uses, since it is in the package-private class java.io.FileSystem.

If it's clear that it's going to be more performant and simple, let's do it. Otherwise, let's just let get this in for now, and investigate that separately.

@attilapiros
Copy link
Contributor

attilapiros commented Jun 30, 2020

We have to measure it for sure.

Regarding simplicity of the solution I described it is basically sth like:
attilapiros@926ebbd

And I checked the prefixLength calculation they are O(1):

Of course we can do it in separate PR too.

assertPathsMatch("/", "", "", File.separator);
assertPathsMatch("/", "/", "/", File.separator);
String sep = File.separator;
String expectedPathname1 = sep + "foo" + sep + "bar" + sep + "baz";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant this value only. The others are used only once.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to write in single style if there is no strict rule that "value used only once must be inlined"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracting a variable which only used once just increases the indirection.

Especially here where the role of the value is trivial. If it would be something complex then I might understand to use a describing name but here not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, reverted.

@SparkQA
Copy link

SparkQA commented Jun 30, 2020

Test build #124656 has finished for PR 28940 at commit aac01ca.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Jul 1, 2020

Test build #124722 has finished for PR 28940 at commit aac01ca.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 1, 2020

Test build #124744 has finished for PR 28940 at commit a80bd6c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

retest this please

@attilapiros
Copy link
Contributor

Regarding the changes below I will create a PR and open a new Jira ticket (so it will be taken care of).

We have to measure it for sure.

Regarding simplicity of the solution I described it is basically sth like:
attilapiros@926ebbd

And I checked the prefixLength calculation they are O(1):

Of course we can do it in separate PR too.

@HyukjinKwon
Copy link
Member

Sure, @attilapiros. Thanks you for following up and investigating deep. I will merge this one as soon as the tests pass.

@SparkQA
Copy link

SparkQA commented Jul 1, 2020

Test build #124765 has finished for PR 28940 at commit a80bd6c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Jul 1, 2020

Test build #124783 has finished for PR 28940 at commit a80bd6c.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Jul 1, 2020

Test build #124790 has finished for PR 28940 at commit a80bd6c.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Retest this please.

@SparkQA
Copy link

SparkQA commented Jul 1, 2020

Test build #124827 has finished for PR 28940 at commit a80bd6c.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Jul 2, 2020

Test build #124838 has finished for PR 28940 at commit a80bd6c.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Jul 2, 2020

Test build #124840 has finished for PR 28940 at commit a80bd6c.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Jul 2, 2020

Test build #124850 has finished for PR 28940 at commit a80bd6c.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Jul 2, 2020

Test build #124859 has finished for PR 28940 at commit a80bd6c.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Jul 2, 2020

Test build #124873 has finished for PR 28940 at commit a80bd6c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Merged to master and branch-3.0.

HyukjinKwon pushed a commit that referenced this pull request Jul 2, 2020
### What changes were proposed in this pull request?
Correct file seprate use in `ExecutorDiskUtils.createNormalizedInternedPathname` on Windows

### Why are the changes needed?
`ExternalShuffleBlockResolverSuite` failed on Windows, see detail at:
https://issues.apache.org/jira/browse/SPARK-32121

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
The existed test suite.

Closes #28940 from pan3793/SPARK-32121.

Lead-authored-by: pancheng <[email protected]>
Co-authored-by: chengpan <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
(cherry picked from commit 7fda184)
Signed-off-by: HyukjinKwon <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants