-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-31692][SQL][2.4] Pass hadoop confs specifed via Spark confs to URLStreamHandlerfactory #28529
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…treamHandlerfactory
Pass hadoop confs specifed via Spark confs to URLStreamHandlerfactory
**BEFORE**
```
➜ spark git:(SPARK-31692) ✗ ./bin/spark-shell --conf spark.hadoop.fs.file.impl=org.apache.hadoop.fs.RawLocalFileSystem
scala> spark.sharedState
res0: org.apache.spark.sql.internal.SharedState = org.apache.spark.sql.internal.SharedState5793cd84
scala> new java.net.URL("file:///tmp/1.txt").openConnection.getInputStream
res1: java.io.InputStream = org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream22846025
scala> import org.apache.hadoop.fs._
import org.apache.hadoop.fs._
scala> FileSystem.get(new Path("file:///tmp/1.txt").toUri, spark.sparkContext.hadoopConfiguration)
res2: org.apache.hadoop.fs.FileSystem = org.apache.hadoop.fs.LocalFileSystem5a930c03
```
**AFTER**
```
➜ spark git:(SPARK-31692) ✗ ./bin/spark-shell --conf spark.hadoop.fs.file.impl=org.apache.hadoop.fs.RawLocalFileSystem
scala> spark.sharedState
res0: org.apache.spark.sql.internal.SharedState = org.apache.spark.sql.internal.SharedState5c24a636
scala> new java.net.URL("file:///tmp/1.txt").openConnection.getInputStream
res1: java.io.InputStream = org.apache.hadoop.fs.FSDataInputStream2ba8f528
scala> import org.apache.hadoop.fs._
import org.apache.hadoop.fs._
scala> FileSystem.get(new Path("file:///tmp/1.txt").toUri, spark.sparkContext.hadoopConfiguration)
res2: org.apache.hadoop.fs.FileSystem = LocalFS
scala> FileSystem.get(new Path("file:///tmp/1.txt").toUri, spark.sparkContext.hadoopConfiguration).getClass
res3: Class[_ <: org.apache.hadoop.fs.FileSystem] = class org.apache.hadoop.fs.RawLocalFileSystem
```
The type of FileSystem object created(you can check the last statement in the above snippets) in the above two cases are different, which should not have been the case
No
Tested locally.
Added Unit test
Closes apache#28516 from karuppayya/SPARK-31692.
Authored-by: Karuppayya Rajendran <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 7260146)
|
Backport commit for SPARK-31692 @dongjoon-hyun |
|
ok to test |
|
Thank you, @karuppayya ! I updated the PR description like the original PR. |
| object SharedState extends Logging { | ||
| try { | ||
| URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory()) | ||
| SparkSession.getActiveSession match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ur, is it the same with branch-3.0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this because we don't have a configuration patch in branch-2.4?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we dont have the configuration patch
|
Test build #122625 has finished for PR 28529 at commit
|
|
Test build #122627 has started for PR 28529 at commit |
|
Hmm. In this case, I believe we had better have the conf patch because we can disable it when we have a regression on this PR. Let me take a look at that. |
|
Retest this please |
|
Test build #122748 has finished for PR 28529 at commit
|
|
Thank you for your patience, @karuppayya . |
|
@karuppayya . I cherry-picked your original commit from |
What changes were proposed in this pull request?
Pass hadoop confs specifed via Spark confs to URLStreamHandlerfactory
Why are the changes needed?
BEFORE
AFTER
The type of FileSystem object created(you can check the last statement in the above snippets) in the above two cases are different, which should not have been the case
Does this PR introduce any user-facing change?
No
How was this patch tested?
Pass the jenkins with newly added test cases.