[SPARK-18647][SQL] do not put provider in table properties for Hive serde table #16080
Test build #69406 has finished for PR 16080 at commit
|
    val tableProperties = tableMetaToTableProps(table)

    // put table provider and partition provider in table properties.
    tableProperties.put(DATASOURCE_PROVIDER, provider)
Why are we putting the provider name in the table properties here?
Previously we stored the provider in the code path shared by data source and Hive serde tables. I have now moved it to the code path that only handles data source tables.
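To illustrate the change described in this reply, here is a minimal sketch, not the actual `HiveExternalCatalog` code. `TableMeta`, `ProviderProps`, and the `example.table.name` property are hypothetical names made up for illustration, and the key string assigned to `DatasourceProvider` is an assumption about what `DATASOURCE_PROVIDER` resolves to.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for Spark's table metadata; not the real CatalogTable.
final case class TableMeta(name: String, provider: Option[String])

object ProviderProps {
  // Assumed value of the DATASOURCE_PROVIDER constant seen in the diff.
  val DatasourceProvider = "spark.sql.sources.provider"

  // Properties written for both kinds of table (schema and so on would go here).
  private def commonProps(table: TableMeta): mutable.HashMap[String, String] =
    mutable.HashMap("example.table.name" -> table.name)

  // Data source tables record their provider so it can be read back later.
  def dataSourceTableProps(table: TableMeta): Map[String, String] = {
    val props = commonProps(table)
    table.provider.foreach(p => props.put(DatasourceProvider, p))
    props.toMap
  }

  // Hive serde tables deliberately skip the provider entry.
  def hiveSerdeTableProps(table: TableMeta): Map[String, String] =
    commonProps(table).toMap
}
```

With this split, only the data source path ever writes the provider key, so a Hive serde table's stored properties stay free of it.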
|
I built and tested this branch, and it resolves the issue I was having with reading Spark 2.1 tables in earlier versions of Spark. Thanks! |
|
We need a test case here. |
|
We need to provide forward compatibility? That is pretty hard. |
|
In a lot of environments people run multiple Spark versions side by side. That's always been a big strength of Spark. |
|
@rxin I'm afraid it's hard to write forward compatibility tests as unit tests; we may need external test infrastructure (Python scripts) to do this. |
|
We can have a test to check the table properties don't contain the entry, can't we? |
|
I see. Will be careful in the future to not break the forward compatibility. |
|
Test build #69459 has finished for PR 16080 at commit
|
|
Test build #69465 has finished for PR 16080 at commit
|
    provider = Some("hive"))
    catalog.createTable(hiveTable, ignoreIfExists = false)

    val rawTable = externalCatalog.client.getTable("db1", "hive_tbl")
Could we also add one more check for another API, `externalCatalog.getTable("db1", "hive_tbl")`? The provider should contain `DDLUtils.HIVE_PROVIDER`.
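The two checks discussed here (the raw table has no provider entry, while the catalog-level table still reports the Hive provider) can be sketched as follows. This is an illustrative model only: `RawTable`, `RestoredTable`, and `ProviderRestore.restore` are hypothetical, and the constant values are assumptions standing in for `DATASOURCE_PROVIDER` and `DDLUtils.HIVE_PROVIDER`.

```scala
// Hypothetical model of the two views of a table; not Spark's real classes.
final case class RawTable(properties: Map[String, String])
final case class RestoredTable(provider: Option[String])

object ProviderRestore {
  val DatasourceProvider = "spark.sql.sources.provider" // assumed property key
  val HiveProvider = "hive"                             // assumed value of DDLUtils.HIVE_PROVIDER

  // When no provider entry was stored, treat the table as a Hive serde table.
  def restore(raw: RawTable): RestoredTable =
    RestoredTable(Some(raw.properties.getOrElse(DatasourceProvider, HiveProvider)))
}

object ProviderRestoreExample {
  def main(args: Array[String]): Unit = {
    val rawHiveTable = RawTable(Map.empty)
    // Check 1: the stored properties carry no provider entry.
    assert(!rawHiveTable.properties.contains(ProviderRestore.DatasourceProvider))
    // Check 2: the restored view still reports the Hive provider.
    assert(ProviderRestore.restore(rawHiveTable).provider.contains(ProviderRestore.HiveProvider))
  }
}
```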
|
The |
|
@gatorsmile , |
|
Test build #69525 has finished for PR 16080 at commit
|
|
@cloud-fan True. That is not the LGTM pending test |
|
Test build #69535 has finished for PR 16080 at commit
|
…erde table
## What changes were proposed in this pull request?
In Spark 2.1, we make Hive serde tables case-preserving by putting the table metadata in table properties, as we did for data source tables. However, we should not store the table provider there, as it breaks forward compatibility. For example, if we create a Hive serde table with Spark 2.1 using `sql("create table test stored as parquet as select 1")`, reading it with Spark 2.0 fails, because Spark 2.0 mistakenly treats it as a data source table when it finds a `provider` entry in the table properties.
Logically, a Hive serde table's provider is always hive, so we don't need to store it in table properties. This PR removes it.
## How was this patch tested?
Manually tested the forward compatibility issue.
Author: Wenchen Fan <[email protected]>
Closes #16080 from cloud-fan/hive.
(cherry picked from commit a5f02b0)
Signed-off-by: Wenchen Fan <[email protected]>
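The forward compatibility failure described in the commit message can be sketched roughly as below. This is an assumed simplification of how an older reader might tell the two table kinds apart, not Spark 2.0's actual code; `OldReaderCheck`, `classify`, and the key string are illustrative assumptions.

```scala
object OldReaderCheck {
  // Assumed property key behind the `provider` entry mentioned in the commit message.
  val DatasourceProvider = "spark.sql.sources.provider"

  sealed trait TableKind
  case object DataSourceTable extends TableKind
  case object HiveSerdeTable extends TableKind

  // Simplified older-reader rule: any provider entry means "data source table".
  def classify(properties: Map[String, String]): TableKind =
    if (properties.contains(DatasourceProvider)) DataSourceTable else HiveSerdeTable
}
```

Under this rule, a Hive serde table written with a `provider` entry is classified as a data source table, while the same table written without the entry is classified correctly.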
|
thanks for the review, merging to master/2.1! |