budde commented Mar 9, 2017

Add a new configuration option that allows Spark SQL to infer a case-sensitive schema from a Hive Metastore table's data files when a case-sensitive schema can't be read from the table properties.

  • Add spark.sql.hive.caseSensitiveInferenceMode param to SQLConf
  • Add schemaPreservesCase field to CatalogTable (set to false when schema can't
    successfully be read from Hive table props)
  • Perform schema inference in HiveMetastoreCatalog if schemaPreservesCase is
    false, depending on spark.sql.hive.caseSensitiveInferenceMode
  • Add alterTableSchema() method to the ExternalCatalog interface
  • Add HiveSchemaInferenceSuite tests
  • Refactor ParquetFileFormat.mergeMetastoreParquetSchema() and move it to
    HiveMetastoreCatalog.mergeWithMetastoreSchema
  • Move schema merging tests from ParquetSchemaSuite to HiveSchemaInferenceSuite
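
As a rough illustration of the control flow these bullets describe, here is a minimal plain-Scala sketch with no Spark dependency. The mode names in the comments (INFER_AND_SAVE, INFER_ONLY, NEVER_INFER) are the values the option accepts in released Spark versions; `TableInfo`, `resolveSchema`, and the `infer`/`save` callbacks are hypothetical stand-ins invented for this example, not the actual HiveMetastoreCatalog code.

```scala
// Simplified, hypothetical model of the decision described above.
// This is NOT Spark's real implementation.
object InferenceSketch {
  sealed trait InferenceMode
  case object InferAndSave extends InferenceMode // INFER_AND_SAVE
  case object InferOnly extends InferenceMode    // INFER_ONLY
  case object NeverInfer extends InferenceMode   // NEVER_INFER

  // Stand-in for CatalogTable: column names plus the schemaPreservesCase flag.
  final case class TableInfo(columns: Seq[String], schemaPreservesCase: Boolean)

  // Returns the schema to use for the table.
  // `infer` stands in for reading the schema from the data files;
  // `save` stands in for persisting it back to the catalog.
  def resolveSchema(
      table: TableInfo,
      mode: InferenceMode,
      infer: () => Seq[String],
      save: Seq[String] => Unit): Seq[String] = {
    if (table.schemaPreservesCase || mode == NeverInfer) {
      // The catalog schema is already case-preserving, or inference is disabled.
      table.columns
    } else {
      val inferred = infer()
      // Only INFER_AND_SAVE writes the inferred schema back.
      if (mode == InferAndSave) save(inferred)
      inferred
    }
  }
}
```

In this sketch the `save` callback plays the role that alterTableSchema() would presumably play in the real catalog: it is only invoked in the infer-and-save mode, so later reads could skip inference.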

JIRA for this change: https://issues.apache.org/jira/browse/SPARK-19611

The tests in HiveSchemaInferenceSuite should verify that schema inference is working as expected. ExternalCatalogSuite has also been extended to cover the new alterTableSchema() API.
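
For readers unfamiliar with the schema merging step these tests exercise, the idea can be sketched in plain Scala as below. This is a simplified illustration under stated assumptions, not the code in this patch: `Field` is a hypothetical stand-in for Spark's StructField, and the rule for columns missing from the data files (kept only when nullable) is an assumption made for the example.

```scala
// Simplified, hypothetical sketch of merging a lower-cased metastore schema
// with a case-preserving schema inferred from data files.
object MergeSketch {
  // Stand-in for a StructField: name, type name, nullability.
  final case class Field(name: String, dataType: String, nullable: Boolean = true)

  def mergeWithMetastoreSchema(metastore: Seq[Field], inferred: Seq[Field]): Seq[Field] = {
    // Index the inferred fields by lower-cased name for case-insensitive lookup.
    val inferredByLowerName = inferred.map(f => f.name.toLowerCase -> f).toMap
    // Keep the metastore's column order and types throughout.
    metastore.map { m =>
      inferredByLowerName.get(m.name.toLowerCase) match {
        // Take the case-preserving name from the data files, keep the metastore type.
        case Some(f) => m.copy(name = f.name)
        // Assumption: a nullable metastore column may be absent from the files.
        case None if m.nullable => m
        case None =>
          throw new IllegalArgumentException(
            s"Non-nullable column ${m.name} not found in inferred schema")
      }
    }
  }
}
```

For example, merging metastore columns `userid` and `eventtime` with inferred columns `eventTime` and `userId` keeps the metastore order and types but yields the names `userId` and `eventTime`.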

budde (Author) commented Mar 9, 2017

Backport of #16944 to branch-2.1. Pinging @cloud-fan

budde force-pushed the SPARK-19611-2.1 branch from 27e9f68 to 55b4ec9 on March 9, 2017 23:08

budde (Author) commented Mar 9, 2017

Made a quick fix to remove some dead code after merging w/the 2.1 version of HiveMetastoreCatalog

SparkQA commented Mar 10, 2017

Test build #74285 has finished for PR 17229 at commit 27e9f68.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Mar 10, 2017

Test build #74286 has finished for PR 17229 at commit 55b4ec9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

dongjoon-hyun (Member) commented Mar 10, 2017

Hi, @budde and @cloud-fan.
If possible, could you hold off on this backport until the new issue is resolved?

budde force-pushed the SPARK-19611-2.1 branch from 55b4ec9 to 1d6ed46 on March 10, 2017 21:19

budde (Author) commented Mar 10, 2017

Amended this commit with the changes made in #17249. The issue identified by @dongjoon-hyun should now be resolved.

SparkQA commented Mar 10, 2017

Test build #74339 has finished for PR 17229 at commit 1d6ed46.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Mar 11, 2017
Author: Budde

Closes #17229 from budde/SPARK-19611-2.1.

cloud-fan (Contributor) commented

thanks, merging to 2.1!

cloud-fan (Contributor) commented

@budde can you close this PR manually? thanks
