[SPARK-19611][SQL] Preserve metastore field order when merging inferred schema #17249
…ed schema

The `HiveMetastoreCatalog.mergeWithMetastoreSchema()` method added in apache#16944 may not preserve the same field order as the metastore schema in some cases, which can cause queries to fail. This change ensures that the metastore field order is preserved.

A test for ensuring that metastore order is preserved was added to `HiveSchemaInferenceSuite`. The particular failure use case from apache#16944 was tested manually as well.

Pinging @cloud-fan and @dongjoon-hyun. Apologies for not catching this the first time around.

@dongjoon-hyun can you verify if it fixes your problem?

I've verified that it does using the same procedure but I'll let @dongjoon-hyun confirm as well.

Sure, I'll test locally, too.

- StructType(metastoreFields.map { case(name, field) =>
-   field.copy(name = inferredFields(name).name)
- }.toSeq)
+ StructType(metastoreSchema.map(f => f.copy(name = inferredFields(f.name).name)))
I see. Now metastoreSchema is used instead of metastoreFields.
This should ensure the proper ordering is used. Iterating over the metastoreFields map isn't guaranteed to maintain the original ordering.
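
The ordering point above can be illustrated with a minimal, Spark-free sketch. Everything here is an assumption made for illustration: `Field` is a hypothetical stand-in for Spark's `StructField`, the column names are taken from the verification output in this thread, and the mixed-case inferred names are invented. It is not the code from the patch, only the same idea in plain Scala.

```scala
// Hypothetical stand-in for Spark's StructField (illustration only).
final case class Field(name: String, dataType: String)

object FieldOrderSketch {
  // Metastore schema: lower-case names in their declared order.
  val metastoreSchema: Seq[Field] = Seq(
    Field("a", "int"), Field("b", "int"), Field("dummy", "string"),
    Field("day", "int"), Field("hour", "string"))

  // Inferred (case-sensitive) fields keyed by lower-case name.
  // The mixed-case names are assumed for the example.
  val inferredFields: Map[String, Field] = Seq(
    Field("A", "int"), Field("B", "int"), Field("Dummy", "string"),
    Field("Day", "int"), Field("Hour", "string")
  ).map(f => f.name.toLowerCase -> f).toMap

  // Old approach: iterate over a Map of metastore fields.
  // A Map makes no promise about iteration order.
  def mergeViaMap: Seq[Field] = {
    val metastoreFields: Map[String, Field] =
      metastoreSchema.map(f => f.name -> f).toMap
    metastoreFields.map { case (name, field) =>
      field.copy(name = inferredFields(name).name)
    }.toSeq
  }

  // New approach: iterate over the ordered schema and use the Map
  // only for lookups, so the metastore order is kept.
  def mergeViaSeq: Seq[Field] =
    metastoreSchema.map(f => f.copy(name = inferredFields(f.name).name))

  def main(args: Array[String]): Unit = {
    println(mergeViaMap.map(_.name)) // order depends on the Map implementation
    println(mergeViaSeq.map(_.name)) // same order as metastoreSchema
  }
}
```

Both variants return the same set of fields; only the sequence-based one is guaranteed to return them in metastore order.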
Yep. I'm still building this PR locally, but it looks good logically. :) I'll post my result soon.
Great. I also verified the patch.

scala> sql("SELECT a, b FROM t1").show
+---+---+
|  a|  b|
+---+---+
|100|200|
+---+---+

scala> sql("SELECT * FROM t1").show
+---+---+-----+---+----+
|  a|  b|dummy|day|hour|
+---+---+-----+---+----+
|100|200| null|  1|  01|
+---+---+-----+---+----+

Thank you so much, @budde and @cloud-fan.
Test build #74337 has finished for PR 17249 at commit
Thanks, merging to master!
What changes were proposed in this pull request?
The `HiveMetastoreCatalog.mergeWithMetastoreSchema()` method added in #16944 may not preserve the same field order as the metastore schema in some cases, which can cause queries to fail. This change ensures that the metastore field order is preserved.
How was this patch tested?
A test for ensuring that metastore order is preserved was added to `HiveSchemaInferenceSuite`. The particular failure use case from #16944 was tested manually as well.