Skip to content

Conversation

@taoli91
Copy link
Contributor

@taoli91 taoli91 commented Apr 26, 2016

What changes were proposed in this pull request?

The drive label and the back slash on windows' absolute path would confuse the URI scheme. This pull request mostly fix the path issue for windows.

How was this patch tested?

Unit tests

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@srowen
Copy link
Member

srowen commented Apr 26, 2016

@taoli91 please combine all your PRs into one and close the others.

@srowen
Copy link
Member

srowen commented Apr 26, 2016

Most of this PR is not the way to do it and adds lots of complexity. You're manually reimplementing a lot of logic in the JDK.

@taoli91
Copy link
Contributor Author

taoli91 commented Apr 26, 2016

@srowen Sure, I'll combine them into one. Could you elaborate more about what logic am I re-implementing?

case null | "local" => new File(path).getCanonicalFile.toURI.toString
case _ => path
}
val uri =Utils.resolveURI(path)
Copy link
Member

@HyukjinKwon HyukjinKwon Apr 26, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe val uri = Utils.resolveURI(path)

@srowen
Copy link
Member

srowen commented Apr 26, 2016

Broadly: you should be able to use the File API with forward-slash paths to describe and manipulate file locations, in a platform-independent way. Wherever you're changing Strings to Files is probably a good thing. Where you're manually handling paths with regexes seems suboptimal.

@taoli91
Copy link
Contributor Author

taoli91 commented Apr 26, 2016

please combine all your PRs into one and close the others.

@srowen Double think about this problem, it seems the PR would be too large if combining into one. I personally believe that it's better to categorize the changes and separate them.

@srowen
Copy link
Member

srowen commented Apr 26, 2016

The problem is that discussion gets split across many threads, when the changes have a logical relationship. I see that they're mostly distinct threads of change in this case. Some of these may end up not being merged or in reduced form, which makes them more combine-able. For now, leave them to see where they go

@HyukjinKwon
Copy link
Member

(@taoli91 FYI, it would be great if you run sbt scalastyle or ./dev/run-tests (this triggers style checking first) for style corrections before adding some commits. This will show up the corrections I said in the comments.)

@taoli91
Copy link
Contributor Author

taoli91 commented Apr 28, 2016

@HyukjinKwon Thanks for pointing out, I haven't noticed that there is an auto style check. Im going to have a vacation for a few days and I will fix them as soon as possible next week

@taoli91
Copy link
Contributor Author

taoli91 commented Apr 28, 2016

Where you're manually handling paths with regexes seems suboptimal.

@srowen The regex parts are actually dealing with the URI. The default URI can't correctly handle the absolute Windows local path. It will recognize the drive label to be scheme. For example new URI("D:\test") will return a scheme of D. I can't find a JDK method to handle this problem, thus I have to use regex to find out the label and manually append a / in front of the path. E.g., convert D:\test to /D:/test before passing to the constructor of URI

@srowen
Copy link
Member

srowen commented Apr 28, 2016

Isn't the URI just file:///D:/test? EDIT: more specifically, would a user not need to write paths like /D:/test as input in general to things that are going to interpret them as URI-like strings? that makes this unnecessary. That is, I'd imagine lots of things don't accept D:/test

@taoli91
Copy link
Contributor Author

taoli91 commented Apr 28, 2016

@srowen new FIle("D:\test").toURI.toString can return file://D/test but new URI("D:\test") won't. But we should make sure that it is a local file before converting it into a file. Actually lots of method accepts a path string, and it will be converted to URI internally. Most test cases pass in path string instead of URI like string.

@srowen
Copy link
Member

srowen commented Apr 28, 2016

Yes, I'm assuming all of the inputs need to be parseable as a URI in order to keep this sensible. At least, that seems less problematic than trying to accept inputs like D:\foo\bar. /foo/bar is ambiguous since it could be a local or HDFS file, in which case it needs a scheme. This does probably mean some pieces of the code which deal with local files via string paths need some extra logic to parse as a URI and get the parsed path out of it. I think that's substantially what you're doing here, just wondering if we can make it simpler by making stronger input assumptions.

@vanzin
Copy link
Contributor

vanzin commented Aug 4, 2016

Is this needed after #13868? pinging @avulanov

@avulanov
Copy link
Contributor

avulanov commented Aug 5, 2016

@vanzin This PR partly addresses a broader problem that path construction is not consistent throughout Spark sources. Indeed, this is a potential source of bugs. Here is an example #13868 (comment). However, author addresses the problem with regex, which does not seem to be optimal as @srowen mentioned. I believe we should have a separate JIRA for making all path references and construction error-prone and consistent, and check everything including shell scripts. For example, https://github.com/apache/spark/blob/master/build/mvn#L161 contains a reference to ${MVN_BIN} -someParameters which will fail if maven path contains a space.

@srowen
Copy link
Member

srowen commented Aug 15, 2016

Let's close this PR

@steveloughran
Copy link
Contributor

If you are working with windows paths; Hadoop's Path class contains the code to do this, stabilised and addressing the corner cases

@steveloughran
Copy link
Contributor

As #13868 does adopt org.apache.hadoop.io.Path, I don't see this patch being needed —though it may highlight some places where the new code may need applying

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants