Skip to content

Conversation

@dongjoon-hyun
Copy link
Member

@dongjoon-hyun dongjoon-hyun commented Nov 1, 2018

What changes were proposed in this pull request?

Unfortunately, it seems that we missed this in 2.4.0. In Spark 2.4, if the default file system is not the local file system, LOAD DATA LOCAL INPATH only works in case of absolute paths. This PR aims to fix it to support relative paths. This is a regression in 2.4.0.

$ ls kv1.txt
kv1.txt

scala> spark.sql("LOAD DATA LOCAL INPATH 'kv1.txt' INTO TABLE t")
org.apache.spark.sql.AnalysisException: LOAD DATA input path does not exist: kv1.txt;

How was this patch tested?

Pass the Jenkins

}
} else {
path
new Path(pathUri)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

path doesn't contain workingDir information.

Could you review this, @cloud-fan , @gatorsmile , @HyukjinKwon , @vanzin and @squito ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, it is converted PATH to URI and then converted back to Path. What is your goal rather than directly building a path?

if (path.isAbsolute()) path else new Path(workingDir, path)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Instead of this, new Path(workingDir, path) will work. In that case, we may refactor this as a variable for line 379 and reuse it.

new Path(workingDir, path)

@squito
Copy link
Contributor

squito commented Nov 1, 2018

good catch, I should have checked this case too.

lgtm

@dongjoon-hyun
Copy link
Member Author

Thank you for review, @squito !

}
}

test("SPARK-25918: LOAD DATA LOCAL INPATH should handle a relative path") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not belong to this test suite. HiveCommandSuite.scala is the best place, although this is not for hive module.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya. I know. I put this here before the previous test case are here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll move this.

@SparkQA
Copy link

SparkQA commented Nov 2, 2018

Test build #98368 has finished for PR 22927 at commit efb99da.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member Author

dongjoon-hyun commented Nov 2, 2018

Thank you, @squito and @gatorsmile . I addressed the review comments.
The SparkR failure looks irrelevant to this. I also observed that in another unrelated PR (#22924), too

@dongjoon-hyun
Copy link
Member Author

For SparkR failure, https://issues.apache.org/jira/browse/SPARK-25923 is filed.

@SparkQA
Copy link

SparkQA commented Nov 2, 2018

Test build #98377 has finished for PR 22927 at commit 85a5864.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member Author

Thank you for review, @squito , @gatorsmile , @HyukjinKwon .
According to the log, all Java/Scala/Python/R tests passed.

The failure mark is only due to SPARK-25923. I'll merge this.

* checking CRAN incoming feasibility ...Error in .check_package_CRAN_incoming(pkgdir) : 
  dims [product 26] do not match the length of object [0]
Execution halted

@dongjoon-hyun
Copy link
Member Author

Merged to master/branch-2.4.

asfgit pushed a commit that referenced this pull request Nov 2, 2018
## What changes were proposed in this pull request?

Unfortunately, it seems that we missed this in 2.4.0. In Spark 2.4, if the default file system is not the local file system, `LOAD DATA LOCAL INPATH` only works in case of absolute paths. This PR aims to fix it to support relative paths. This is a regression in 2.4.0.

```scala
$ ls kv1.txt
kv1.txt

scala> spark.sql("LOAD DATA LOCAL INPATH 'kv1.txt' INTO TABLE t")
org.apache.spark.sql.AnalysisException: LOAD DATA input path does not exist: kv1.txt;
```

## How was this patch tested?

Pass the Jenkins

Closes #22927 from dongjoon-hyun/SPARK-LOAD.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit e91b607)
Signed-off-by: Dongjoon Hyun <[email protected]>
@asfgit asfgit closed this in e91b607 Nov 2, 2018
@dongjoon-hyun dongjoon-hyun deleted the SPARK-LOAD branch November 2, 2018 06:39
@cloud-fan
Copy link
Contributor

cloud-fan commented Nov 2, 2018

I'll list it as a known issue in 2.4.0, thanks for fixing it!

@dongjoon-hyun
Copy link
Member Author

Thank you, @cloud-fan !

jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
## What changes were proposed in this pull request?

Unfortunately, it seems that we missed this in 2.4.0. In Spark 2.4, if the default file system is not the local file system, `LOAD DATA LOCAL INPATH` only works in case of absolute paths. This PR aims to fix it to support relative paths. This is a regression in 2.4.0.

```scala
$ ls kv1.txt
kv1.txt

scala> spark.sql("LOAD DATA LOCAL INPATH 'kv1.txt' INTO TABLE t")
org.apache.spark.sql.AnalysisException: LOAD DATA input path does not exist: kv1.txt;
```

## How was this patch tested?

Pass the Jenkins

Closes apache#22927 from dongjoon-hyun/SPARK-LOAD.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 23, 2019
## What changes were proposed in this pull request?

Unfortunately, it seems that we missed this in 2.4.0. In Spark 2.4, if the default file system is not the local file system, `LOAD DATA LOCAL INPATH` only works in case of absolute paths. This PR aims to fix it to support relative paths. This is a regression in 2.4.0.

```scala
$ ls kv1.txt
kv1.txt

scala> spark.sql("LOAD DATA LOCAL INPATH 'kv1.txt' INTO TABLE t")
org.apache.spark.sql.AnalysisException: LOAD DATA input path does not exist: kv1.txt;
```

## How was this patch tested?

Pass the Jenkins

Closes apache#22927 from dongjoon-hyun/SPARK-LOAD.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit e91b607)
Signed-off-by: Dongjoon Hyun <[email protected]>
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Aug 1, 2019
## What changes were proposed in this pull request?

Unfortunately, it seems that we missed this in 2.4.0. In Spark 2.4, if the default file system is not the local file system, `LOAD DATA LOCAL INPATH` only works in case of absolute paths. This PR aims to fix it to support relative paths. This is a regression in 2.4.0.

```scala
$ ls kv1.txt
kv1.txt

scala> spark.sql("LOAD DATA LOCAL INPATH 'kv1.txt' INTO TABLE t")
org.apache.spark.sql.AnalysisException: LOAD DATA input path does not exist: kv1.txt;
```

## How was this patch tested?

Pass the Jenkins

Closes apache#22927 from dongjoon-hyun/SPARK-LOAD.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit e91b607)
Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants