
Conversation

@cloud-fan (Contributor)

What changes were proposed in this pull request?

In Spark 2.1, we made Hive serde tables case-preserving by putting the table metadata in table properties, as we already did for data source tables. However, we should not put the table provider there, as doing so breaks forward compatibility. For example, if we create a Hive serde table with Spark 2.1 using `sql("create table test stored as parquet as select 1")`, we will fail to read it with Spark 2.0, because Spark 2.0 mistakenly treats it as a data source table due to the `provider` entry in the table properties.

Logically, a Hive serde table's provider is always `hive`, so we don't need to store it in the table properties. This PR removes it.
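For illustration, here is a minimal, self-contained sketch of the detection logic that goes wrong (a hypothetical simplification; the real check lives in HiveExternalCatalog):

```scala
object ProviderCompat {
  // The table-properties key Spark uses to record a data source provider.
  val DATASOURCE_PROVIDER = "spark.sql.sources.provider"

  // Spark 2.0 effectively treats any table carrying this key as a data
  // source table, so a Hive serde table written by 2.1 with the key set
  // is read back through the wrong code path.
  def looksLikeDataSourceTable(props: Map[String, String]): Boolean =
    props.contains(DATASOURCE_PROVIDER)
}
```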

How was this patch tested?

Manually tested the forward compatibility issue.

@cloud-fan (Contributor, Author)

cc @rxin @yhuai @mallman

@SparkQA

SparkQA commented Nov 30, 2016

Test build #69406 has finished for PR 16080 at commit 89f1625.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val tableProperties = tableMetaToTableProps(table)

// put table provider and partition provider in table properties.
tableProperties.put(DATASOURCE_PROVIDER, provider)
Contributor:

Why are we putting the provider name in the table properties here?

Contributor Author:

Previously we stored the provider in the code path shared by both data source and Hive serde tables. Now I've moved it to the code path that handles only data source tables.
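Roughly, the split looks like this (a sketch with stand-in types, not the actual HiveExternalCatalog code):

```scala
// Simplified stand-in for Spark's CatalogTable, just to show the split.
case class TableMeta(provider: Option[String], properties: Map[String, String])

val HIVE_PROVIDER = "hive"
val DATASOURCE_PROVIDER = "spark.sql.sources.provider"

def tablePropsFor(table: TableMeta): Map[String, String] = table.provider match {
  case Some(HIVE_PROVIDER) =>
    // Hive serde table: the provider is implicitly "hive", so no provider
    // entry is written into the table properties.
    table.properties
  case Some(p) =>
    // Data source table: keep recording the provider so Spark can restore
    // the table through the data source read path.
    table.properties + (DATASOURCE_PROVIDER -> p)
  case None =>
    table.properties
}
```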

@mallman (Contributor)

mallman commented Nov 30, 2016

I built and tested this branch, and it resolves the issue I was having with reading Spark 2.1 tables in earlier versions of Spark. Thanks!

@rxin (Contributor)

rxin commented Nov 30, 2016

We need a test case here.

@gatorsmile (Member)

We need to provide forward compatibility? That is pretty hard.

@rxin (Contributor)

rxin commented Nov 30, 2016

In a lot of environments people run multiple Spark versions side by side. That's always been a big strength of Spark.

@cloud-fan (Contributor, Author)

@rxin I'm afraid it's hard to write forward compatibility tests as unit tests; we may need external test infrastructure (Python scripts) to do this.

@rxin (Contributor)

rxin commented Dec 1, 2016

We can have a test to check that the table properties don't contain the entry, can't we?
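Something along these lines could serve as that regression test (a sketch; it assumes a HiveExternalCatalog-backed suite where the raw metastore entry is reachable through externalCatalog.client, and the table name is illustrative):

```scala
sql("CREATE TABLE hive_tbl STORED AS parquet AS SELECT 1")
// Fetch the raw metastore entry, bypassing Spark's restore logic, and
// check that the data source provider key was never written.
val rawTable = externalCatalog.client.getTable("default", "hive_tbl")
assert(!rawTable.properties.contains("spark.sql.sources.provider"))
```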

@gatorsmile (Member)

I see. I will be careful in the future not to break forward compatibility.

@SparkQA

SparkQA commented Dec 1, 2016

Test build #69459 has finished for PR 16080 at commit 198d273.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 1, 2016

Test build #69465 has finished for PR 16080 at commit 5ee6489.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

provider = Some("hive"))
catalog.createTable(hiveTable, ignoreIfExists = false)

val rawTable = externalCatalog.client.getTable("db1", "hive_tbl")
Member:

Could we also add one more check for another API, `externalCatalog.getTable("db1", "hive_tbl")`? The provider should contain `DDLUtils.HIVE_PROVIDER`.
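A sketch of that extra assertion, reusing the fixture from the snippet under review:

```scala
// The catalog-level view should still restore the provider as "hive",
// even though the raw metastore properties no longer carry it.
val restoredTable = externalCatalog.getTable("db1", "hive_tbl")
assert(restoredTable.provider == Some(DDLUtils.HIVE_PROVIDER))
```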

@gatorsmile (Member)

The alterTable API in HiveExternalCatalog is still based on the provider field. We need a change there too.

@cloud-fan (Contributor, Author)

@gatorsmile, alterTable doesn't need to change: the provider is no longer in the table properties, but it is still in the CatalogTable.provider field.
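In other words (an illustrative fragment, simplified from the real code), callers such as alterTable can keep branching on the in-memory field:

```scala
externalCatalog.getTable("db1", "hive_tbl").provider match {
  case Some(DDLUtils.HIVE_PROVIDER) => // Hive serde code path
  case Some(_)                      => // data source code path
  case None                         => // e.g. a view
}
```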

@SparkQA

SparkQA commented Dec 2, 2016

Test build #69525 has finished for PR 16080 at commit 3b81c33.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member)

@cloud-fan True. That is not the oldTableDef fetched from the metastore.

LGTM pending test

@SparkQA

SparkQA commented Dec 2, 2016

Test build #69535 has finished for PR 16080 at commit 00bdeff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Dec 2, 2016
…erde table


Author: Wenchen Fan <[email protected]>

Closes #16080 from cloud-fan/hive.

(cherry picked from commit a5f02b0)
Signed-off-by: Wenchen Fan <[email protected]>
@asfgit closed this in a5f02b0 on Dec 2, 2016
@cloud-fan (Contributor, Author)

thanks for the review, merging to master/2.1!

robert3005 pushed a commit to palantir/spark that referenced this pull request Dec 2, 2016
…erde table

uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…erde table

