Conversation


@jzc928 jzc928 commented Sep 15, 2020

What changes were proposed in this pull request?

make orc table column name support special characters like $

Why are the changes needed?

Special characters like $ are allowed in ORC table column names by Hive.
But executing the command "CREATE TABLE tbl($ INT, b INT) using orc" in Spark raises an error, so Spark is not compatible with Hive here.

org.apache.spark.sql.AnalysisException: Column name "$" contains invalid character(s). Please use alias to rename it.;
  at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.checkFieldName(OrcFileFormat.scala:51)
  at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.$anonfun$checkFieldNames$1(OrcFileFormat.scala:59)
  at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.$anonfun$checkFieldNames$1$adapted(OrcFileFormat.scala:59)
  at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
  at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
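For context, the rejection comes from a field-name validation of roughly this shape (a minimal sketch inferred from the error message above; the regex and object name are assumptions for illustration, not Spark's actual source):

```scala
// Sketch of a column-name check like OrcFileFormat.checkFieldName.
// The allowed-name regex here is an assumption, not Spark's real rule.
object ColumnNameCheck {
  private val ValidName = "[_a-zA-Z][_a-zA-Z0-9]*"

  def checkFieldName(name: String): Unit = {
    if (!name.matches(ValidName)) {
      // Mirrors the AnalysisException message seen in the stack trace.
      throw new IllegalArgumentException(
        "Column name \"" + name + "\" contains invalid character(s). " +
          "Please use alias to rename it.")
    }
  }
}
```

With a check like this, a column named `$` is rejected at table creation time; the PR relaxes the check for the ORC data source so such names pass through, matching Hive's behavior.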

Does this PR introduce any user-facing change?

No

How was this patch tested?

Add unit test


wzhfy commented Sep 15, 2020

test this please


wzhfy commented Sep 15, 2020

also cc @dongjoon-hyun @cloud-fan


SparkQA commented Sep 15, 2020

Test build #128719 has finished for PR 29761 at commit 0a2cca7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


dongjoon-hyun left a comment


Since we support special column names in data source already, I believe this PR is okay. I left a few comments, @jzc928 .

scala> Seq(1, 2).toDF("$").write.orc("/tmp/orc")

scala> spark.read.orc("/tmp/orc").printSchema
root
 |-- $: integer (nullable = true)
scala> sc.version
res3: String = 3.0.1

@dongjoon-hyun

Retest this please.


dongjoon-hyun commented Sep 17, 2020

@jzc928 . I left a few comments. Please update the PR accordingly. Although this is different from Parquet, it is the same as the JSON data source. So, I think we can accept this approach after the PR is revised and passes the Jenkins CI tests.


jzc928 commented Sep 17, 2020

@dongjoon-hyun Comments addressed.


SparkQA commented Sep 17, 2020

Test build #128797 has finished for PR 29761 at commit c3c7f4c.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun

Retest this please.


SparkQA commented Sep 17, 2020

Test build #128828 has finished for PR 29761 at commit 7d63fd9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


dongjoon-hyun left a comment


+1, LGTM. Thank you, @jzc928 and @wzhfy .
Merged to master for Apache Spark 3.1.0.

@dongjoon-hyun

Thank you for your first contribution, @jzc928 .
You are added to the Apache Spark contributor group and SPARK-32889 is assigned to you.
Welcome to the Apache Spark community!
