@gatorsmile
Member

What changes were proposed in this pull request?

CTAS loses the user-specified table properties when the table is converted to a data source table. For example,

CREATE TABLE t TBLPROPERTIES('prop1' = 'c', 'prop2' = 'd') AS SELECT 1 as a, 1 as b

The output of DESC FORMATTED t does not have the related properties.

|Table Parameters:           |                                                                                                              |       |
|  rawDataSize               |-1                                                                                                            |       |
|  numFiles                  |1                                                                                                             |       |
|  transient_lastDdlTime     |1471670983                                                                                                    |       |
|  totalSize                 |496                                                                                                           |       |
|  spark.sql.sources.provider|parquet                                                                                                       |       |
|  EXTERNAL                  |FALSE                                                                                                         |       |
|  COLUMN_STATS_ACCURATE     |false                                                                                                         |       |
|  numRows                   |-1                                                                                                            |       |
|                            |                                                                                                              |       |
|# Storage Information       |                                                                                                              |       |
|SerDe Library:              |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe                                                   |       |
|InputFormat:                |org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat                                                 |       |
|OutputFormat:               |org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat                                                |       |
|Compressed:                 |No                                                                                                            |       |
|Storage Desc Parameters:    |                                                                                                              |       |
|  serialization.format      |1                                                                                                             |       |
|  path                      |file:/private/var/folders/4b/sgmfldk15js406vk7lw5llzw0000gn/T/warehouse-f3aa2927-6464-4a35-a715-1300dde6c614/t|       |

After the fix, the user-specified properties are stored as serde properties, because the table properties are used for storing the table schema and system-generated properties.

|Table Parameters:           |                                                                                                              |       |
|  rawDataSize               |-1                                                                                                            |       |
|  numFiles                  |1                                                                                                             |       |
|  transient_lastDdlTime     |1471672182                                                                                                    |       |
|  totalSize                 |496                                                                                                           |       |
|  spark.sql.sources.provider|parquet                                                                                                       |       |
|  EXTERNAL                  |FALSE                                                                                                         |       |
|  COLUMN_STATS_ACCURATE     |false                                                                                                         |       |
|  numRows                   |-1                                                                                                            |       |
|                            |                                                                                                              |       |
|# Storage Information       |                                                                                                              |       |
|SerDe Library:              |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe                                                   |       |
|InputFormat:                |org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat                                                 |       |
|OutputFormat:               |org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat                                                |       |
|Compressed:                 |No                                                                                                            |       |
|Storage Desc Parameters:    |                                                                                                              |       |
|  prop2                     |d                                                                                                             |       |
|  prop1                     |c                                                                                                             |       |
|  serialization.format      |1                                                                                                             |       |
|  path                      |file:/private/var/folders/4b/sgmfldk15js406vk7lw5llzw0000gn/T/warehouse-78c38cea-02c9-40aa-9b20-9803686069ae/t|       |
+----------------------------+--------------------------------------------------------------------------------------------------------------+-------+
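The routing described above can be sketched without a Spark dependency. This is only an illustrative model, not the code in this patch: `StorageSketch`, `TableSketch`, and `moveUserPropsToStorage` are hypothetical names that stand in for the catalog table description and keep nothing but its two property maps.

```scala
// Hypothetical stand-ins for the catalog classes; only the two property maps are modelled.
final case class StorageSketch(properties: Map[String, String] = Map.empty)

final case class TableSketch(
    name: String,
    properties: Map[String, String] = Map.empty, // table-level properties
    storage: StorageSketch = StorageSketch())    // storage (serde) properties

object CtasPropsSketch {
  // Moves the user-specified TBLPROPERTIES into the storage properties and
  // clears the table-level map, mirroring the behavior described in this PR.
  def moveUserPropsToStorage(table: TableSketch): TableSketch =
    table.copy(
      properties = Map.empty,
      storage = table.storage.copy(
        properties = table.storage.properties ++ table.properties))

  def main(args: Array[String]): Unit = {
    // Models: CREATE TABLE t TBLPROPERTIES('prop1' = 'c', 'prop2' = 'd') AS SELECT ...
    val parsed = TableSketch("t", properties = Map("prop1" -> "c", "prop2" -> "d"))
    val converted = moveUserPropsToStorage(parsed)
    println(s"table properties:   ${converted.properties}")
    println(s"storage properties: ${converted.storage.properties}")
  }
}
```

In this model the user keys end up only in the storage map, which matches where `prop1` and `prop2` appear in the `DESC FORMATTED` output above.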

How was this patch tested?

Added a test case.

@gatorsmile changed the title from "[SPARK-17166] [SQL] Store Table Properties Specified in CTAS after Conversion to Data Source Tables" to "[SPARK-17166] [SQL] Store Table Properties in CTAS that is Converted to Data Source Tables" on Aug 20, 2016
@gatorsmile
Member Author

cc @cloud-fan @yhuai This is what we discussed in another PR. Could you please review whether this is the right fix? Thanks!

@SparkQA

SparkQA commented Aug 20, 2016

Test build #64125 has finished for PR 14727 at commit bffc412.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

assert(tableDesc.properties.get("prop1").isEmpty)
assert(tableDesc.properties.get("prop2").isEmpty)
assert(tableDesc.storage.properties.get("prop1") == Option("c"))
assert(tableDesc.storage.properties.get("prop2") == Option("d"))
Contributor

@cloud-fan cloud-fan Aug 20, 2016

Is this what we want? Why should the table properties of a Hive serde table go into the storage properties of a data source table?

Ideally, a data source table should have both data source options (storage properties) and table properties. Currently we don't support specifying table properties for data source tables, but that doesn't mean we never will. I think we can do it when we unify the CREATE TABLE syntax.

Member Author

uh, agree! Let me close this PR. Thanks!
