[SPARK-37965][SQL] Remove check field name when reading/writing existing data in Orc #35253
I checked the history. It seems we added this check mainly because of Parquet's column-name restrictions, which are being removed in #35229. So this change seems fine to me, but it would be great to double-check with @dongjoon-hyun.
@AngersZhuuuu BTW, I think it would be great to explain in the PR description why we can remove this check, pointing to the relevant commits in the history.
Yeah, will do this later.
dongjoon-hyun left a comment:
Yep, @HyukjinKwon's comment is correct.
Let's review this after #35229 lands in master.
Thank you for keeping Apache Spark data sources consistent.
This check was added in #19124 and was changed to wrap the field name in backquotes in #29761, which also added a test.
…g existing data in Orc" This reverts commit c4fbc9c.
Thanks, merging to master!
+1, LGTM.
I think we should still check for the empty-string column name case, or add the ... Here are some tests.

native write:
  set spark.sql.orc.impl=native;
  create table t_1 stored as orc as select '';
  Result: success.

native read:
  set spark.sql.orc.impl=native;
  select t_1;

hive read:
  set spark.sql.orc.impl=hive;
  select t_1;

hive write:
  set spark.sql.orc.impl=hive;
  create table t_1 stored as orc as select '';

using HiveFileFormat:
  set spark.sql.hive.convertMetastoreOrc=false;
  create table t_1 stored as orc as select '';
  This path is checked by org.apache.spark.sql.hive.execution.HiveFileFormat#supportFieldName (spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveFileFormat.scala, lines 115 to 120 in 305388d).
@AngersZhuuuu, is there a way to check field names only on the write side?
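One way to read the write-side-only suggestion is sketched below in Scala. This is a hypothetical, simplified illustration, not Spark code. `WriteSideFieldNameCheck`, `checkBeforeWrite`, and `readFieldNames` are invented names. The only rule it enforces is rejecting an empty column name, such as the one produced by `select ''` in the tests above, and that rule is an assumption chosen to mirror the case raised in the discussion.

```scala
// Hypothetical sketch, not Spark code: check column names only on the write path.
object WriteSideFieldNameCheck {

  // Assumed rule for illustration: an empty column name is rejected.
  def isValidFieldName(name: String): Boolean = name.nonEmpty

  // Collects every name that the assumed rule rejects.
  def invalidFieldNames(fieldNames: Seq[String]): Seq[String] =
    fieldNames.filterNot(isValidFieldName)

  // Write path: fail fast before any file is written.
  def checkBeforeWrite(fieldNames: Seq[String]): Unit = {
    val bad = invalidFieldNames(fieldNames)
    if (bad.nonEmpty) {
      val quoted = bad.map(n => s"`$n`").mkString(", ")
      throw new IllegalArgumentException(s"Unsupported column name(s) for write: $quoted")
    }
  }

  // Read path: no name check, so existing files remain readable as-is.
  def readFieldNames(fileSchemaNames: Seq[String]): Seq[String] = fileSchemaNames
}
```

Because the read path skips validation, ORC files that were already written with an empty column name stay readable, while new writes of such names fail before any data is produced.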
What changes were proposed in this pull request?
Remove the supportFieldName check in the DataSource ORC format.
Why are the changes needed?
Remove an unnecessary check.
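The mechanism being removed can be pictured as a per-format hook that accepts every name by default. The Scala below is a hypothetical, simplified model, not Spark's actual `FileFormat` trait or its call sites. `SimpleFileFormat`, `OrcWithCheck`, `OrcWithoutCheck`, and `FieldNameValidation.checkFieldNames` are illustrative names. The empty-name rule in `OrcWithCheck` only stands in for the real ORC check, which is not reproduced here.

```scala
// Hypothetical, simplified model; not Spark's actual FileFormat trait.
trait SimpleFileFormat {
  def name: String
  // Default: accept every field name unless a format overrides the hook.
  def supportFieldName(fieldName: String): Boolean = true
}

// Before the change (illustrative): the ORC format overrides the hook.
object OrcWithCheck extends SimpleFileFormat {
  val name = "orc"
  // Stand-in rule for illustration; the real ORC check is not reproduced here.
  override def supportFieldName(fieldName: String): Boolean = fieldName.nonEmpty
}

// After the change: no override, so the default accepts every name.
object OrcWithoutCheck extends SimpleFileFormat {
  val name = "orc"
}

object FieldNameValidation {
  // Throws on the first field name the format reports as unsupported.
  def checkFieldNames(format: SimpleFileFormat, fieldNames: Seq[String]): Unit =
    fieldNames.find(n => !format.supportFieldName(n)).foreach { bad =>
      throw new IllegalArgumentException(
        s"Column name '$bad' is not supported by format ${format.name}")
    }
}
```

In this model, removing the override is enough to make the shared validation step a no-op for ORC, because the default hook accepts every name.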
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added UT