[SPARK-16044][SQL] input_file_name() returns empty strings in data sources based on NewHadoopRDD #13759

HyukjinKwon · 2016-06-18T07:07:22Z

What changes were proposed in this pull request?

This PR makes the input_file_name() function return file paths instead of empty strings for external data sources based on NewHadoopRDD, such as spark-redshift and spark-xml.

With such an external data source, the code below:

df.select(input_file_name).show()

will produce

Before

+-----------------+
|input_file_name()|
+-----------------+
|                 |
+-----------------+

After

+--------------------+
|   input_file_name()|
+--------------------+
|file:/private/var...|
+--------------------+
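One way to reproduce the behavior without an external package is sketched below. It assumes Spark 2.0 or later (for `SparkSession`) and builds a DataFrame on top of `NewHadoopRDD` through `SparkContext.newAPIHadoopFile`. The object name, method name, temporary directory, and sample rows are illustrative and not part of this PR.

```scala
import java.nio.file.Files

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.input_file_name

// Hypothetical helper object, named for illustration only.
object InputFileNameSketch {

  // Returns the input_file_name() value for each row of a DataFrame
  // whose underlying RDD is a NewHadoopRDD.
  def inputFileNames(spark: SparkSession): Array[String] = {
    import spark.implicits._

    // Write a few sample rows as text files into a fresh temporary directory.
    val dir = Files.createTempDirectory("input-file-name").resolve("data").toString
    Seq("a", "b", "c").toDF("value").write.text(dir)

    // Reading through the new Hadoop API yields a NewHadoopRDD.
    val rdd = spark.sparkContext.newAPIHadoopFile(
      dir, classOf[TextInputFormat], classOf[LongWritable], classOf[Text])

    // Convert each Text value to a String right away (Text is not serializable).
    val df = rdd.map { case (_, line) => line.toString }.toDF("value")

    df.select(input_file_name()).as[String].collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("input-file-name-sketch")
      .getOrCreate()
    try inputFileNames(spark).foreach(println)
    finally spark.stop()
  }
}
```

On a build that includes this change, each printed value should be the path of the part file the row was read from. Without the change, the values are expected to be empty strings.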

How was this patch tested?

Unit tests in ColumnExpressionSuite.


HyukjinKwon · 2016-06-18T07:22:48Z

cc @cloud-fan Could you please take a look? I remember that you reviewed the renaming of SqlNewHadoopRDDState to InputFileNameHolder.

SparkQA · 2016-06-18T09:14:30Z

Test build #60769 has finished for PR 13759 at commit 10dedc2.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2016-06-21T04:54:27Z

LGTM - merging in master/2.0.

…urces based on NewHadoopRDD

## What changes were proposed in this pull request?

This PR makes the `input_file_name()` function return file paths instead of empty strings for external data sources based on `NewHadoopRDD`, such as [spark-redshift](https://github.com/databricks/spark-redshift/blob/cba5eee1ab79ae8f0fa9e668373a54d2b5babf6b/src/main/scala/com/databricks/spark/redshift/RedshiftRelation.scala#L149) and [spark-xml](https://github.com/databricks/spark-xml/blob/master/src/main/scala/com/databricks/spark/xml/util/XmlFile.scala#L39-L47).

With such an external data source, the code below:

```scala
df.select(input_file_name).show()
```

will produce

- **Before**

  ```
  +-----------------+
  |input_file_name()|
  +-----------------+
  |                 |
  +-----------------+
  ```

- **After**

  ```
  +--------------------+
  |   input_file_name()|
  +--------------------+
  |file:/private/var...|
  +--------------------+
  ```

## How was this patch tested?

Unit tests in `ColumnExpressionSuite`.

Author: hyukjinkwon <[email protected]>

Closes #13759 from HyukjinKwon/SPARK-16044.

(cherry picked from commit 4f7f1c4)
Signed-off-by: Reynold Xin <[email protected]>

HyukjinKwon · 2016-06-21T04:57:22Z

Thank you @rxin! Would it make sense to backport this to branch 1.6?

rxin · 2016-06-21T04:58:53Z

Can you submit a pr for 1.6? Thanks.

HyukjinKwon · 2016-06-21T04:59:41Z

@rxin Sure!

…n NewHadoopRDD to branch 1.6

## What changes were proposed in this pull request?

This PR backports #13759.

(`SqlNewHadoopRDDState` was renamed to `InputFileNameHolder` and `spark` API does not exist in branch 1.6)

## How was this patch tested?

Unit tests in `ColumnExpressionSuite`.

Author: hyukjinkwon <[email protected]>

Closes #13806 from HyukjinKwon/backport-SPARK-16044.

…n NewHadoopRDD to branch 1.6

## What changes were proposed in this pull request?

This PR backports apache#13759.

(`SqlNewHadoopRDDState` was renamed to `InputFileNameHolder` and `spark` API does not exist in branch 1.6)

## How was this patch tested?

Unit tests in `ColumnExpressionSuite`.

Author: hyukjinkwon <[email protected]>

Closes apache#13806 from HyukjinKwon/backport-SPARK-16044.

(cherry picked from commit 1ac830a)

input_file_name() returns empty strings in data sources based on NewHadoopRDD

10dedc2

asfgit closed this in 4f7f1c4 Jun 21, 2016

HyukjinKwon mentioned this pull request Jun 21, 2016

[SPARK-16044][SQL] Backport input_file_name() for data source based on NewHadoopRDD to branch 1.6 #13806

Closed

HyukjinKwon deleted the SPARK-16044 branch January 2, 2018 03:42
