[SPARK-32889][SQL] orc table column name supports special characters. #29761
|
test this please |
|
also cc @dongjoon-hyun @cloud-fan |
|
Test build #128719 has finished for PR 29761 at commit
|
dongjoon-hyun
left a comment
Since we already support special column names in data sources, I believe this PR is okay. I left a few comments, @jzc928.
scala> Seq(1, 2).toDF("$").write.orc("/tmp/orc")
scala> spark.read.orc("/tmp/orc").printSchema
root
|-- $: integer (nullable = true)
scala> sc.version
res3: String = 3.0.1
b378671 to
c3c7f4c
Compare
|
Retest this please. |
|
@jzc928, I left a few comments. Please update the PR accordingly. Although this is different from Parquet, it is the same as the JSON data source. So I think we can accept this approach after revising the PR and passing the Jenkins CI tests. |
|
@dongjoon-hyun comments fixed. |
|
Test build #128797 has finished for PR 29761 at commit
|
|
Retest this please. |
|
Test build #128828 has finished for PR 29761 at commit
|
dongjoon-hyun
left a comment
|
Thank you for your first contribution, @jzc928. |
What changes were proposed in this pull request?
Make ORC table column names support special characters like `$`.

Why are the changes needed?
Special characters like `$` are allowed in ORC table column names by Hive, but executing the command "CREATE TABLE tbl(`$` INT, b INT) using orc" in Spark fails with the error below, so Spark is not compatible with Hive here.

org.apache.spark.sql.AnalysisException: Column name "$" contains invalid character(s). Please use alias to rename it.;
  at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.checkFieldName(OrcFileFormat.scala:51)
  at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.$anonfun$checkFieldNames$1(OrcFileFormat.scala:59)
  at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$.$anonfun$checkFieldNames$1$adapted(OrcFileFormat.scala:59)
  at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
  at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)

Does this PR introduce any user-facing change?
No

How was this patch tested?
Added a unit test.
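
A check for this behavior could look roughly like the sketch below. It is not the test added in this PR: the object name `OrcSpecialColumnNameExample` is hypothetical, and the sketch assumes a Spark build that includes this change, `spark-sql` on the classpath, and a local session with the default in-memory catalog. On a build without the change, the `CREATE TABLE` statement is expected to throw the `AnalysisException` quoted above.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical standalone check; not part of the Spark test suites.
object OrcSpecialColumnNameExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("orc-special-column-name")
      .getOrCreate()
    try {
      spark.sql("DROP TABLE IF EXISTS tbl")
      // The statement from the PR description; backticks quote the `$` column.
      spark.sql("CREATE TABLE tbl(`$` INT, b INT) USING orc")
      spark.sql("INSERT INTO tbl VALUES (1, 2)")

      // Read the row back and confirm both the schema and the values.
      val row = spark.sql("SELECT `$`, b FROM tbl").head()
      assert(spark.table("tbl").schema.fieldNames.sameElements(Array("$", "b")))
      assert(row.getInt(0) == 1 && row.getInt(1) == 2)
    } finally {
      spark.sql("DROP TABLE IF EXISTS tbl")
      spark.stop()
    }
  }
}
```

The table is dropped in the `finally` block so that repeated runs do not collide with a leftover managed table in the local warehouse directory.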