[SPARK-21791][SQL] ORC should support column names with dot #19004

dongjoon-hyun · 2017-08-20T04:04:04Z

What changes were proposed in this pull request?

Currently, the Apache Spark ORC data source does not properly support field names that contain a dot. For feature parity with Parquet, we should support this. Since Apache ORC 1.4 supports this via #18953, this PR adds a test case to prevent a future regression.

PARQUET

scala> Seq(Some(1), None).toDF("col.dots").write.parquet("/tmp/parquet_dot")
scala> spark.read.parquet("/tmp/parquet_dot").show
+--------+
|col.dots|
+--------+
|       1|
|    null|
+--------+

ORC

scala> Seq(Some(1), None).toDF("col.dots").write.orc("/tmp/orc_dot")
scala> spark.read.orc("/tmp/orc_dot").show
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input '.' expecting ':'(line 1, pos 10)

== SQL ==
struct<col.dots:int>
----------^^^
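
The error above shows the schema string `struct<col.dots:int>` reaching Spark's data type parser, which stops at the unquoted dot. As a rough illustration only, the sketch below builds a struct type string with backquoted field names, so the dot reads as part of the name. The object and method names are hypothetical, and this is not necessarily how #18953 resolves the issue.

```scala
// Hypothetical helper, not Spark code: builds a struct type string with
// backquoted field names so that a dot is read as part of the name.
object SchemaStrings {
  // Quote an identifier with backticks, doubling any embedded backtick.
  def quote(name: String): String = "`" + name.replace("`", "``") + "`"

  // fields: (fieldName, typeName) pairs, e.g. ("col.dots", "int")
  def structTypeString(fields: Seq[(String, String)]): String =
    fields.map { case (name, tpe) => s"${quote(name)}:$tpe" }
      .mkString("struct<", ",", ">")

  def main(args: Array[String]): Unit = {
    // Prints: struct<`col.dots`:int>
    println(structTypeString(Seq("col.dots" -> "int")))
  }
}
```

With the field name quoted, the first `:` after the name is unambiguous, which is what the parser message ("expecting ':'") was looking for.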

How was this patch tested?

After merging #18953, I will enable this test case.

dongjoon-hyun · 2017-08-20T04:04:37Z

cc @cloud-fan , @gatorsmile , @hvanhovell , @sameeragarwal , @rxin .

SparkQA · 2017-08-20T06:05:34Z

Test build #80883 has finished for PR 19004 at commit 5995bd5.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
class OrcSourceSuite extends OrcSuite with SQLTestUtils

dongjoon-hyun · 2017-09-10T00:41:36Z

Since this is a test-case-only PR, I'll fold it into #18953 to save the committers' review effort.

dongjoon-hyun · 2017-09-10T01:00:34Z

This PR is also included in #17980.

[SPARK-21791][SQL] ORC should support column names with dot

5995bd5

dongjoon-hyun mentioned this pull request Aug 22, 2017

[SPARK-20682][SQL] Update ORC data source based on Apache ORC library #18953

Closed

dongjoon-hyun closed this Sep 10, 2017

dongjoon-hyun deleted the SPARK-21791 branch September 10, 2017 00:42
