Conversation

@HyukjinKwon
Member

@HyukjinKwon HyukjinKwon commented Jun 18, 2016

What changes were proposed in this pull request?

This PR makes the input_file_name() function return file paths instead of empty strings for external data sources based on NewHadoopRDD, such as spark-redshift and spark-xml.

With these external data sources, the following code:

```scala
df.select(input_file_name).show()
```

will produce

  • Before

    +-----------------+
    |input_file_name()|
    +-----------------+
    |                 |
    +-----------------+
    
  • After

    +--------------------+
    |   input_file_name()|
    +--------------------+
    |file:/private/var...|
    +--------------------+
    

How was this patch tested?

Unit tests in ColumnExpressionSuite.
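The fix hinges on NewHadoopRDD publishing the current split's path through InputFileNameHolder (the renamed SqlNewHadoopRDDState): a per-thread slot that the task's record reader sets before emitting rows, and that the input_file_name() expression reads. A minimal Python sketch of that thread-local holder pattern (method names mirror Spark's, but this is an illustrative model, not Spark's actual implementation):

```python
import threading


class InputFileNameHolder:
    """Illustrative model of Spark's per-thread holder for the
    current input file name."""
    _local = threading.local()

    @classmethod
    def set_input_file_name(cls, path):
        cls._local.path = path

    @classmethod
    def get_input_file_name(cls):
        # Before this PR, NewHadoopRDD never called the setter, so
        # input_file_name() only ever saw this default empty string.
        return getattr(cls._local, "path", "")

    @classmethod
    def unset_input_file_name(cls):
        cls._local.path = ""


def read_partition(path, records):
    """Mimics a NewHadoopRDD partition after the fix: set the holder
    for this split, yield records, then clear it when done."""
    InputFileNameHolder.set_input_file_name(path)
    try:
        for record in records:
            # input_file_name() evaluated here would observe `path`.
            yield record, InputFileNameHolder.get_input_file_name()
    finally:
        InputFileNameHolder.unset_input_file_name()
```

Because the slot is thread-local, concurrent tasks reading different splits on the same executor do not clobber each other's file names.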

@HyukjinKwon
Member Author

cc @cloud-fan Could you please take a look? I remember the renaming of SqlNewHadoopRDDState to InputFileNameHolder was reviewed by you.

@SparkQA

SparkQA commented Jun 18, 2016

Test build #60769 has finished for PR 13759 at commit 10dedc2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Jun 21, 2016

LGTM - merging in master/2.0.

@asfgit asfgit closed this in 4f7f1c4 Jun 21, 2016
asfgit pushed a commit that referenced this pull request Jun 21, 2016
…urces based on NewHadoopRDD

## What changes were proposed in this pull request?

This PR makes the `input_file_name()` function return file paths instead of empty strings for external data sources based on `NewHadoopRDD`, such as [spark-redshift](https://github.com/databricks/spark-redshift/blob/cba5eee1ab79ae8f0fa9e668373a54d2b5babf6b/src/main/scala/com/databricks/spark/redshift/RedshiftRelation.scala#L149) and [spark-xml](https://github.com/databricks/spark-xml/blob/master/src/main/scala/com/databricks/spark/xml/util/XmlFile.scala#L39-L47).

With these external data sources, the following code:

```scala
df.select(input_file_name).show()
```

will produce

- **Before**

  ```
  +-----------------+
  |input_file_name()|
  +-----------------+
  |                 |
  +-----------------+
  ```

- **After**

  ```
  +--------------------+
  |   input_file_name()|
  +--------------------+
  |file:/private/var...|
  +--------------------+
  ```

## How was this patch tested?

Unit tests in `ColumnExpressionSuite`.

Author: hyukjinkwon <[email protected]>

Closes #13759 from HyukjinKwon/SPARK-16044.

(cherry picked from commit 4f7f1c4)
Signed-off-by: Reynold Xin <[email protected]>
@HyukjinKwon
Member Author

HyukjinKwon commented Jun 21, 2016

Thank you @rxin! Would it make sense to backport this to branch-1.6?

@rxin
Contributor

rxin commented Jun 21, 2016

Can you submit a pr for 1.6? Thanks.

@HyukjinKwon
Member Author

@rxin Sure!

asfgit pushed a commit that referenced this pull request Jun 29, 2016
…n NewHadoopRDD to branch 1.6

## What changes were proposed in this pull request?

This PR backports #13759.

(In branch-1.6, `InputFileNameHolder` still goes by its old name `SqlNewHadoopRDDState`, and the `spark` session entry point does not exist.)

## How was this patch tested?

Unit tests in `ColumnExpressionSuite`.

Author: hyukjinkwon <[email protected]>

Closes #13806 from HyukjinKwon/backport-SPARK-16044.
zzcclp pushed a commit to zzcclp/spark that referenced this pull request Jun 30, 2016
…n NewHadoopRDD to branch 1.6


(cherry picked from commit 1ac830a)
@HyukjinKwon HyukjinKwon deleted the SPARK-16044 branch January 2, 2018 03:42
