
@dongjoon-hyun
Member

What changes were proposed in this pull request?

Currently, the Apache Spark ORC data source does not properly support field names that contain dots. For feature parity with Parquet, it should. Since Apache ORC 1.4 supports this (via #18953), this PR adds a test case to prevent a future regression.

PARQUET

scala> Seq(Some(1), None).toDF("col.dots").write.parquet("/tmp/parquet_dot")
scala> spark.read.parquet("/tmp/parquet_dot").show
+--------+
|col.dots|
+--------+
|       1|
|    null|
+--------+

ORC

scala> Seq(Some(1), None).toDF("col.dots").write.orc("/tmp/orc_dot")
scala> spark.read.orc("/tmp/orc_dot").show
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input '.' expecting ':'(line 1, pos 10)

== SQL ==
struct<col.dots:int>
----------^^^
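
The `== SQL ==` section of the trace suggests that the failure comes from parsing the ORC schema string as a Catalyst data type, not from the write itself. The sketch below reproduces that parse in isolation. It assumes `spark-catalyst` is on the classpath and uses `CatalystSqlParser`, an internal Spark API that the PR text does not mention, so treat it as an illustration of the symptom, not as the code path confirmed by this PR. The object name `DotSchemaParse` is hypothetical.

```scala
import scala.util.Try

import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

// Assumption: the ORC reader hands its type string to the Catalyst data type parser.
object DotSchemaParse {
  // Unquoted field name containing a dot, exactly as shown in the stack trace.
  val unquoted = "struct<col.dots:int>"
  // The same schema with the field name quoted in backticks.
  val quoted = "struct<`col.dots`:int>"

  // Returns true when the schema string parses as a Catalyst data type.
  def parses(schema: String): Boolean =
    Try(CatalystSqlParser.parseDataType(schema)).isSuccess

  def main(args: Array[String]): Unit = {
    println(s"$unquoted parses: ${parses(unquoted)}")
    println(s"$quoted parses: ${parses(quoted)}")
  }
}
```

The unquoted string is expected to fail with the `ParseException` shown above, while the backtick-quoted form should parse.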

How was this patch tested?

After merging #18953, I will enable this test case.
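
A minimal sketch of such a regression check, written as a standalone program instead of the suite's actual test, might look like the following. It assumes a Spark build whose ORC data source is available and already includes the fix from #18953. The object name `OrcDotColumnCheck`, the app name, and the temporary directory layout are hypothetical.

```scala
import java.io.File
import java.nio.file.Files

import org.apache.spark.sql.SparkSession

object OrcDotColumnCheck {
  // Writes a one-column DataFrame whose column name contains a dot,
  // reads it back through the ORC data source, and returns the row count.
  def roundTripCount(spark: SparkSession, path: String): Int = {
    import spark.implicits._
    Seq(Some(1), None).toDF("col.dots").write.orc(path)
    spark.read.orc(path).collect().length
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("orc-dot-column-check") // hypothetical app name
      .getOrCreate()
    try {
      // Write into a fresh subdirectory so the target path does not exist yet.
      val dir = Files.createTempDirectory("orc_dot").toFile
      val path = new File(dir, "data").getCanonicalPath
      val n = roundTripCount(spark, path)
      assert(n == 2, s"expected 2 rows, got $n")
    } finally {
      spark.stop()
    }
  }
}
```

On a build without the fix, the read step is where the `ParseException` from the ORC example above would be expected to surface.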

@SparkQA

SparkQA commented Aug 20, 2017

Test build #80883 has finished for PR 19004 at commit 5995bd5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • class OrcSourceSuite extends OrcSuite with SQLTestUtils

@dongjoon-hyun
Member Author

Since this PR only adds a test case to confirm that behavior, I'll fold it into #18953 to save the committers' review effort.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-21791 branch September 10, 2017 00:42
@dongjoon-hyun
Member Author

This PR is also included in #17980.
