
Conversation

@cloud-fan (Contributor)

What changes were proposed in this pull request?

In Spark 2.1, we made Hive serde tables case-preserving by putting the table metadata in table properties, as we already did for data source tables. However, we should not put the table provider there, as doing so breaks forward compatibility. For example, if we create a Hive serde table with Spark 2.1 using `sql("create table test stored as parquet as select 1")`, we will fail to read it with Spark 2.0, because Spark 2.0 mistakenly treats it as a data source table due to the `provider` entry in the table properties.

Logically, a Hive serde table's provider is always `hive`, so we don't need to store it in the table properties. This PR removes it.
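For illustration, here is a minimal, self-contained sketch of the detection logic that goes wrong (a hypothetical simplification; the real check lives in HiveExternalCatalog):

```scala
object ProviderCompat {
  // The table-properties key Spark uses to record a data source provider.
  val DATASOURCE_PROVIDER = "spark.sql.sources.provider"

  // Spark 2.0 effectively treats any table carrying this key as a data
  // source table, so a Hive serde table written by 2.1 with the key set
  // is read back through the wrong code path.
  def looksLikeDataSourceTable(props: Map[String, String]): Boolean =
    props.contains(DATASOURCE_PROVIDER)
}
```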

How was this patch tested?

Manually tested the forward compatibility issue.

@cloud-fan (Contributor, Author)

cc @rxin @yhuai @mallman

@SparkQA

SparkQA commented Nov 30, 2016

Test build #69406 has finished for PR 16080 at commit 89f1625.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val tableProperties = tableMetaToTableProps(table)

// put table provider and partition provider in table properties.
tableProperties.put(DATASOURCE_PROVIDER, provider)
Contributor:

Why are we putting the provider name in the table properties here?

Contributor Author:

Previously we stored the provider in the code path shared by both data source and Hive serde tables. Now I've moved it to the code path that handles only data source tables.
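Roughly, the split looks like this (a sketch with stand-in types, not the actual HiveExternalCatalog code):

```scala
// Simplified stand-in for Spark's CatalogTable, just to show the split.
case class TableMeta(provider: Option[String], properties: Map[String, String])

val HIVE_PROVIDER = "hive"
val DATASOURCE_PROVIDER = "spark.sql.sources.provider"

def tablePropsFor(table: TableMeta): Map[String, String] = table.provider match {
  case Some(HIVE_PROVIDER) =>
    // Hive serde table: the provider is implicitly "hive", so no provider
    // entry is written into the table properties.
    table.properties
  case Some(p) =>
    // Data source table: keep recording the provider so Spark can restore
    // the table through the data source read path.
    table.properties + (DATASOURCE_PROVIDER -> p)
  case None =>
    table.properties
}
```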

@mallman (Contributor)

mallman commented Nov 30, 2016

I built and tested this branch, and it resolves the issue I was having with reading Spark 2.1 tables in earlier versions of Spark. Thanks!

@rxin (Contributor)

rxin commented Nov 30, 2016

We need a test case here.

@gatorsmile (Member)

We need to provide forward compatibility? That is pretty hard.

@rxin (Contributor)

rxin commented Nov 30, 2016

In a lot of environments people run multiple Spark versions side by side. That's always been a big strength of Spark.

@cloud-fan (Contributor, Author)

@rxin I'm afraid it's hard to write forward compatibility tests as unit tests; we may need external test infrastructure (Python scripts) to do this.

@rxin (Contributor)

rxin commented Dec 1, 2016

We can have a test to check that the table properties don't contain the entry, can't we?
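Something along these lines could serve as that regression test (a sketch; it assumes a HiveExternalCatalog-backed suite where the raw metastore entry is reachable through externalCatalog.client, and the table name is illustrative):

```scala
sql("CREATE TABLE hive_tbl STORED AS parquet AS SELECT 1")
// Fetch the raw metastore entry, bypassing Spark's restore logic, and
// check that the data source provider key was never written.
val rawTable = externalCatalog.client.getTable("default", "hive_tbl")
assert(!rawTable.properties.contains("spark.sql.sources.provider"))
```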

@gatorsmile (Member)

I see. I will be careful in the future not to break forward compatibility.

@SparkQA

SparkQA commented Dec 1, 2016

Test build #69459 has finished for PR 16080 at commit 198d273.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 1, 2016

Test build #69465 has finished for PR 16080 at commit 5ee6489.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

provider = Some("hive"))
catalog.createTable(hiveTable, ignoreIfExists = false)

val rawTable = externalCatalog.client.getTable("db1", "hive_tbl")
Member:

Could we also add one more check for another API, `externalCatalog.getTable("db1", "hive_tbl")`? The provider should contain `DDLUtils.HIVE_PROVIDER`.
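A sketch of that extra assertion, reusing the fixture from the snippet under review:

```scala
// The catalog-level view should still restore the provider as "hive",
// even though the raw metastore properties no longer carry it.
val restoredTable = externalCatalog.getTable("db1", "hive_tbl")
assert(restoredTable.provider == Some(DDLUtils.HIVE_PROVIDER))
```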

@gatorsmile (Member)

The alterTable API in HiveExternalCatalog is still based on the provider field. We need a change there too.

@cloud-fan (Contributor, Author)

@gatorsmile, alterTable doesn't need to change: the provider is no longer in the table properties, but it is still in the CatalogTable.provider field.
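In other words (an illustrative fragment, simplified from the real code), callers such as alterTable can keep branching on the in-memory field:

```scala
externalCatalog.getTable("db1", "hive_tbl").provider match {
  case Some(DDLUtils.HIVE_PROVIDER) => // Hive serde code path
  case Some(_)                      => // data source code path
  case None                         => // e.g. a view
}
```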

@SparkQA

SparkQA commented Dec 2, 2016

Test build #69525 has finished for PR 16080 at commit 3b81c33.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member)

@cloud-fan True. That is not the oldTableDef fetched from the metastore.

LGTM pending test

@SparkQA

SparkQA commented Dec 2, 2016

Test build #69535 has finished for PR 16080 at commit 00bdeff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Dec 2, 2016
…erde table


Author: Wenchen Fan <[email protected]>

Closes #16080 from cloud-fan/hive.

(cherry picked from commit a5f02b0)
Signed-off-by: Wenchen Fan <[email protected]>
@asfgit closed this in a5f02b0 on Dec 2, 2016
@cloud-fan (Contributor, Author)

thanks for the review, merging to master/2.1!

robert3005 pushed a commit to palantir/spark that referenced this pull request Dec 2, 2016
…erde table

uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…erde table

