Skip to content

Conversation

@budde
Copy link

@budde budde commented Mar 10, 2017

What changes were proposed in this pull request?

The HiveMetastoreCatalog.mergeWithMetastoreSchema() method added in #16944 may
not preserve the same field order as the metastore schema in some cases, which can cause
queries to fail. This change ensures that the metastore field order is preserved.

How was this patch tested?

A test for ensuring that metastore order is preserved was added to HiveSchemaInferenceSuite.
The particular failure usecase from #16944 was tested manually as well.

…ed schema

The ```HiveMetastoreCatalog.mergeWithMetastoreSchema()``` method added in apache#16944 may
not preserve the same field order as the metastore schema in some cases, which can cause
queries to fail. This change ensures that the metastore field order is preserved.

A test for ensuring that metastore order is preserved was added to ```HiveSchemaInferenceSuite.```
The particular failure usecase from apache#16944 was tested manually as well.
@budde
Copy link
Author

budde commented Mar 10, 2017

Pinging @cloud-fan and @dongjoon-hyun. Apologies for not catching this the first time around.

@cloud-fan
Copy link
Contributor

@dongjoon-hyun can you verify if it fixes your problem?

@budde
Copy link
Author

budde commented Mar 10, 2017

I've verified that it does using the same procedure but I'll let @dongjoon-hyun confirm as well.

@dongjoon-hyun
Copy link
Member

Sure, I'll test locally, too.

StructType(metastoreFields.map { case(name, field) =>
field.copy(name = inferredFields(name).name)
}.toSeq)
StructType(metastoreSchema.map(f => f.copy(name = inferredFields(f.name).name)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Now metastoreSchema is used instead of metastoreFields.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should ensure the proper ordering is used. Iterating over the metastoreFields map isn't guaranteed to maintain the original ordering.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I'm still building this PR locally, but it looks good logically. :) I'll post my result soon.

@dongjoon-hyun
Copy link
Member

Great. I also verified the patch.

scala> sql("SELECT a, b FROM t1").show
+---+---+
|  a|  b|
+---+---+
|100|200|
+---+---+


scala> sql("SELECT * FROM t1").show
+---+---+-----+---+----+
|  a|  b|dummy|day|hour|
+---+---+-----+---+----+
|100|200| null|  1|  01|
+---+---+-----+---+----+

Thank you so much, @budde and @cloud-fan .

@SparkQA
Copy link

SparkQA commented Mar 10, 2017

Test build #74337 has finished for PR 17249 at commit 99eb2dc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@asfgit asfgit closed this in bc30351 Mar 10, 2017
@budde budde deleted the PreserveMetastoreFieldOrder branch March 24, 2017 21:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants