[SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle a relative path #22927

dongjoon-hyun · 2018-11-01T20:33:34Z

What changes were proposed in this pull request?

Unfortunately, it seems that we missed this in 2.4.0. In Spark 2.4, if the default file system is not the local file system, LOAD DATA LOCAL INPATH works only with absolute paths. This PR fixes it to support relative paths as well. This is a regression in 2.4.0.

$ ls kv1.txt
kv1.txt

scala> spark.sql("LOAD DATA LOCAL INPATH 'kv1.txt' INTO TABLE t")
org.apache.spark.sql.AnalysisException: LOAD DATA input path does not exist: kv1.txt;

How was this patch tested?

Pass the Jenkins

dongjoon-hyun · 2018-11-01T20:36:16Z

sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala

      }
    } else {
-      path
+      new Path(pathUri)


path doesn't contain workingDir information.

Could you review this, @cloud-fan , @gatorsmile , @HyukjinKwon , @vanzin and @squito ?

Here, the Path is converted to a URI and then converted back to a Path. What is the goal of doing that rather than building the path directly?

if (path.isAbsolute()) path else new Path(workingDir, path)

Good point. new Path(workingDir, path) will work instead. In that case, we can refactor it into a variable for line 379 and reuse it.

new Path(workingDir, path)
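As a rough illustration of the two forms discussed above, the sketch below uses `java.nio.file.Path` from the JDK as a stand-in for Hadoop's `Path`, so that it runs without Hadoop on the classpath. This is an analogy, not the code from this PR: the object name `RelativePathSketch`, the helpers `resolveExplicit` and `resolveShort`, and the directory `/home/user` are all hypothetical. It shows why the explicit `isAbsolute` check can collapse into a single resolve call when resolution leaves an absolute child unchanged.

```scala
import java.nio.file.{Path, Paths}

object RelativePathSketch {
  // Hypothetical helper mirroring the explicit form from the review comment:
  // keep an absolute path as-is, otherwise anchor it at the working directory.
  def resolveExplicit(workingDir: Path, userPath: String): Path = {
    val path = Paths.get(userPath)
    if (path.isAbsolute) path else workingDir.resolve(path)
  }

  // Hypothetical helper mirroring the shorter form: java.nio's resolve
  // returns its argument unchanged when that argument is already absolute.
  def resolveShort(workingDir: Path, userPath: String): Path =
    workingDir.resolve(Paths.get(userPath))

  def main(args: Array[String]): Unit = {
    // Assumed working directory, for illustration only.
    val workingDir = Paths.get("/home/user")
    println(resolveExplicit(workingDir, "kv1.txt"))
    println(resolveShort(workingDir, "kv1.txt"))
    println(resolveShort(workingDir, "/tmp/kv1.txt"))
  }
}
```

Both helpers return the same result for relative and absolute inputs, which is the property the shorter form relies on. Whether Hadoop's two-argument `Path` constructor behaves identically in every edge case (for example, paths that carry a scheme) is not shown here.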

squito · 2018-11-01T21:28:02Z

good catch, I should have checked this case too.

lgtm

dongjoon-hyun · 2018-11-01T21:39:11Z

Thank you for the review, @squito !

gatorsmile · 2018-11-01T22:07:51Z

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala

    }
  }

+  test("SPARK-25918: LOAD DATA LOCAL INPATH should handle a relative path") {


This test does not belong in this test suite. HiveCommandSuite.scala is the best place, although the change is not in the hive module.

Yes, I know. I put it here because the previous test cases are here.

I'll move this.

SparkQA · 2018-11-02T00:17:55Z

Test build #98368 has finished for PR 22927 at commit efb99da.

This patch fails SparkR unit tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2018-11-02T01:10:41Z

Thank you, @squito and @gatorsmile . I addressed the review comments.
The SparkR failure looks unrelated to this PR. I also observed it in another unrelated PR (#22924).

dongjoon-hyun · 2018-11-02T01:40:29Z

For SparkR failure, https://issues.apache.org/jira/browse/SPARK-25923 is filed.

SparkQA · 2018-11-02T04:37:28Z

Test build #98377 has finished for PR 22927 at commit 85a5864.

This patch fails SparkR unit tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2018-11-02T06:17:24Z

Thank you for the review, @squito , @gatorsmile , @HyukjinKwon .
According to the log, all Java/Scala/Python/R tests passed.

The failure mark is only due to SPARK-25923. I'll merge this.

* checking CRAN incoming feasibility ...Error in .check_package_CRAN_incoming(pkgdir) : 
  dims [product 26] do not match the length of object [0]
Execution halted

dongjoon-hyun · 2018-11-02T06:17:41Z

Merged to master/branch-2.4.

## What changes were proposed in this pull request?

Unfortunately, it seems that we missed this in 2.4.0. In Spark 2.4, if the default file system is not the local file system, `LOAD DATA LOCAL INPATH` only works in case of absolute paths. This PR aims to fix it to support relative paths. This is a regression in 2.4.0.

```scala
$ ls kv1.txt
kv1.txt

scala> spark.sql("LOAD DATA LOCAL INPATH 'kv1.txt' INTO TABLE t")
org.apache.spark.sql.AnalysisException: LOAD DATA input path does not exist: kv1.txt;
```

## How was this patch tested?

Pass the Jenkins

Closes #22927 from dongjoon-hyun/SPARK-LOAD.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit e91b607)
Signed-off-by: Dongjoon Hyun <[email protected]>

cloud-fan · 2018-11-02T07:14:44Z

I'll list it as a known issue in 2.4.0, thanks for fixing it!

dongjoon-hyun · 2018-11-02T16:36:16Z

Thank you, @cloud-fan !


[SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle a relative path

efb99da

dongjoon-hyun commented Nov 1, 2018


gatorsmile reviewed Nov 1, 2018


Address comments

85a5864

HyukjinKwon approved these changes Nov 2, 2018


asfgit closed this in e91b607 Nov 2, 2018

dongjoon-hyun deleted the SPARK-LOAD branch November 2, 2018 06:39
