[SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle a relative path #22927
| } | ||
| } else { | ||
| path | ||
| new Path(pathUri) |
path doesn't contain workingDir information.
Could you review this, @cloud-fan , @gatorsmile , @HyukjinKwon , @vanzin and @squito ?
Here, the Path is converted to a URI and then back to a Path. What is the goal of doing that rather than building the path directly?
if (path.isAbsolute()) path else new Path(workingDir, path)
Good point. Instead of this, new Path(workingDir, path) will work. In that case, we may refactor this as a variable for line 379 and reuse it.
new Path(workingDir, path)
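To illustrate the suggestion above, here is a minimal sketch of resolving a possibly relative local path against a working directory with Hadoop's `Path`. It is not the code from this PR: the object name `ResolveLocalPath` and the helper `resolve` are hypothetical, and the sketch assumes `hadoop-common` is on the classpath (as it is in `spark-shell`).

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object ResolveLocalPath {
  // Hypothetical helper, not taken from the PR: keep absolute paths as they
  // are and resolve relative ones against the given working directory.
  def resolve(workingDir: Path, path: Path): Path =
    if (path.isAbsolute) path else new Path(workingDir, path)

  def main(args: Array[String]): Unit = {
    // Assumption: the working directory comes from the local file system,
    // independent of whatever fs.defaultFS is configured to be.
    val localFs = FileSystem.getLocal(new Configuration())
    val workingDir = localFs.getWorkingDirectory

    // A relative path gains the working directory as its parent.
    println(resolve(workingDir, new Path("kv1.txt")))
    // An absolute path is returned unchanged.
    println(resolve(workingDir, new Path("/tmp/kv1.txt")))
  }
}
```

The explicit `isAbsolute` check mirrors the one-liner suggested in the review; it keeps the intent readable even though the two-argument `Path` constructor already resolves the child against the parent.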
good catch, I should have checked this case too. lgtm
Thank you for the review, @squito !
| } | ||
| } | ||
| test("SPARK-25918: LOAD DATA LOCAL INPATH should handle a relative path") { |
This does not belong to this test suite. HiveCommandSuite.scala is the best place, although this is not for hive module.
Yeah, I know. I put this here because the previous test cases are here.
I'll move this.
Test build #98368 has finished for PR 22927 at commit
Thank you, @squito and @gatorsmile . I addressed the review comments.
For the SparkR failure, https://issues.apache.org/jira/browse/SPARK-25923 has been filed.
Test build #98377 has finished for PR 22927 at commit
Thank you for the review, @squito , @gatorsmile , @HyukjinKwon . The failure mark is only due to SPARK-25923. I'll merge this.
Merged to master/branch-2.4.
## What changes were proposed in this pull request?
Unfortunately, it seems that we missed this in 2.4.0. In Spark 2.4, if the default file system is not the local file system, `LOAD DATA LOCAL INPATH` only works in case of absolute paths. This PR aims to fix it to support relative paths. This is a regression in 2.4.0.
```scala
$ ls kv1.txt
kv1.txt
scala> spark.sql("LOAD DATA LOCAL INPATH 'kv1.txt' INTO TABLE t")
org.apache.spark.sql.AnalysisException: LOAD DATA input path does not exist: kv1.txt;
```
## How was this patch tested?
Pass the Jenkins
Closes #22927 from dongjoon-hyun/SPARK-LOAD.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit e91b607)
Signed-off-by: Dongjoon Hyun <[email protected]>
I'll list it as a known issue in 2.4.0, thanks for fixing it!
Thank you, @cloud-fan !