[SPARK-14914] Normalize Paths/URIs for windows. #12695
Helper script for Windows that downloads dependencies and starts zinc, to support incremental builds on Windows.
When an absolute local path with a drive letter is passed to SparkContext's addFile on Windows, its scheme is not handled correctly, and the call fails.
Can one of the admins verify this patch?
@taoli91 please combine all your PRs into one and close the others.
Most of this PR is not the way to do it and adds a lot of complexity. You're manually reimplementing a lot of logic that already exists in the JDK.
@srowen Sure, I'll combine them into one. Could you elaborate on what logic I'm reimplementing?
  case null | "local" => new File(path).getCanonicalFile.toURI.toString
  case _ => path
}
val uri =Utils.resolveURI(path)
Maybe val uri = Utils.resolveURI(path) (note the space after =).
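To make the reported addFile failure concrete, here is a minimal standalone sketch of the scheme-correction match from the diff. It assumes the match is applied to new URI(path).getScheme, as in the unpatched addFile; the wrapper object AddFileSchemeSketch is hypothetical, and the real SparkContext.addFile does considerably more. A path such as C:/Users/foo/data.txt is parsed with scheme "C", so it falls through to case _ and is returned unchanged, which is consistent with the failure described in the PR.

```scala
import java.io.File
import java.net.URI

object AddFileSchemeSketch {
  // Standalone version of the match arms from the diff (hypothetical wrapper).
  // Inputs with no scheme, or the "local" scheme, go through
  // java.io.File.getCanonicalFile.toURI; all other inputs are returned unchanged.
  def schemeCorrectedPath(path: String): String = {
    val uri = new URI(path)
    uri.getScheme match {
      case null | "local" => new File(path).getCanonicalFile.toURI.toString
      case _ => path
    }
  }

  def main(args: Array[String]): Unit = {
    // A POSIX-style absolute path has no scheme, so it becomes a file: URI.
    println(schemeCorrectedPath("/tmp/data.txt"))
    // A Windows path with forward slashes is parsed with scheme "C",
    // so it is passed through untouched instead of becoming a file: URI.
    println(schemeCorrectedPath("C:/Users/foo/data.txt"))
  }
}
```

Note that a backslash-separated path never reaches the match at all: new URI throws for it, as the next comment in the thread discusses.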
Broadly: you should be able to use the
@srowen Thinking about this again, the PR would be too large if I combined everything into one. I believe it's better to categorize the changes and keep them separate.
The problem is that discussion gets split across many threads when the changes are logically related. That said, I see that these are mostly distinct threads of change. Some may end up not being merged, or being merged in reduced form, which would make them easier to combine. For now, leave them as they are and let's see where they go.
(@taoli91 FYI, it would be great if you run
@HyukjinKwon Thanks for pointing that out; I hadn't noticed that there is an automatic style check. I'm going on vacation for a few days and will fix these as soon as possible next week.
@srowen The regex parts actually deal with the URI. The default URI parsing can't correctly handle an absolute local Windows path: it interprets the drive letter as the scheme. For example
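A small sketch of the parsing behaviour described here, using only java.net.URI (the object DriveLetterUriDemo and its helper schemeOf are hypothetical names). With forward slashes, the drive letter is accepted as a one-letter scheme. With backslashes, which are not legal URI characters, the constructor throws URISyntaxException. A proper file: URI keeps the drive letter inside the path instead.

```scala
import java.net.{URI, URISyntaxException}

object DriveLetterUriDemo {
  // Returns the scheme java.net.URI assigns to s (None if there is no scheme),
  // or Left with the parser's reason if s is not a syntactically valid URI.
  def schemeOf(s: String): Either[String, Option[String]] =
    try Right(Option(new URI(s).getScheme))
    catch { case e: URISyntaxException => Left(e.getReason) }

  def main(args: Array[String]): Unit = {
    // Forward slashes: parses, but the drive letter "C" becomes the scheme.
    println(schemeOf("C:/Users/foo/data.txt"))
    // Backslashes are illegal URI characters, so parsing fails.
    println(schemeOf("C:\\Users\\foo\\data.txt"))
    // A file: URI keeps the drive letter in the path component.
    println(schemeOf("file:///C:/Users/foo/data.txt"))
  }
}
```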
Isn't the URI just
@srowen
Yes, I'm assuming all of the inputs need to be parseable as a URI in order to keep this sensible. At least, that seems less problematic than trying to accept inputs like
@vanzin This PR partly addresses a broader problem: path construction is not consistent throughout the Spark sources, which is a potential source of bugs. Here is an example: #13868 (comment). However, the author addresses the problem with regexes, which doesn't seem optimal, as @srowen mentioned. I believe we should have a separate JIRA for making all path references and path construction consistent and less error-prone, and for checking everything, including shell scripts. For example, https://github.com/apache/spark/blob/master/build/mvn#L161 contains a reference to
Let's close this PR
If you are working with Windows paths, Hadoop's Path class already contains the code to do this; it is stable and handles the corner cases.
As #13868 does adopt
Closes apache#10995 Closes apache#13658 Closes apache#14505 Closes apache#14536 Closes apache#12753 Closes apache#14449 Closes apache#12694 Closes apache#12695 Closes apache#14810
What changes were proposed in this pull request?
The drive letter and the backslashes in an absolute Windows path confuse URI scheme parsing. This pull request fixes most of these path issues on Windows.
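One way to avoid this confusion can be sketched as follows, under stated assumptions. A plain character check detects a drive-letter prefix (hypothetical helper isWindowsAbsolutePath, not the PR's regex and not Spark's actual fix), and the multi-argument java.net.URI constructor then builds a file: URI, percent-encoding characters such as spaces. On a real Windows JVM, new File(path).toURI, or Hadoop's Path as suggested in the discussion, may well be preferable. This sketch avoids them only so that it behaves the same on any OS.

```scala
import java.net.URI

object WindowsPathToUri {
  private def isAsciiLetter(c: Char): Boolean =
    (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')

  // Hypothetical helper: true for strings such as "C:\dir" or "d:/dir",
  // i.e. an ASCII drive letter, a colon, then a backslash or forward slash.
  def isWindowsAbsolutePath(s: String): Boolean =
    s.length >= 3 && isAsciiLetter(s.charAt(0)) && s.charAt(1) == ':' &&
      (s.charAt(2) == '\\' || s.charAt(2) == '/')

  // Windows absolute paths become file: URIs ("C:\a\b" -> "file:/C:/a/b");
  // everything else must already be a valid URI string, or new URI throws.
  def toUri(s: String): URI =
    if (isWindowsAbsolutePath(s))
      // The (scheme, host, path, fragment) constructor quotes illegal
      // characters such as spaces; the path must start with "/".
      new URI("file", null, "/" + s.replace('\\', '/'), null)
    else
      new URI(s)

  def main(args: Array[String]): Unit = {
    println(toUri("C:\\Users\\foo\\data.txt"))
    println(toUri("D:/My Data/report.csv"))
    println(toUri("hdfs://namenode:8020/data/report.csv"))
  }
}
```

The design choice here is to require every non-Windows input to already be a valid URI, in line with the reviewer's preference for keeping inputs parseable as URIs, rather than guessing at other path shapes.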
How was this patch tested?
Unit tests