[SPARK-23355][SQL][DOC][FOLLOWUP] Add migration doc for TBLPROPERTIES #21269

dongjoon-hyun · 2018-05-08T16:01:33Z

What changes were proposed in this pull request?

In Apache Spark 2.4, SPARK-23355 fixes a bug in which table properties were ignored during `convertMetastore` conversion for tables created with `STORED AS ORC/PARQUET`.

For Parquet tables that have table properties such as `TBLPROPERTIES (parquet.compression 'NONE')`, those properties were ignored by default before Apache Spark 2.4. After a cluster is upgraded, Spark will write uncompressed files for such tables, which differs from the behavior of Apache Spark 2.3 and earlier.

This PR adds a migration note for that.
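
To make the behavior change concrete, here is a toy Python model (not Spark code) of which Parquet codec ends up being used for a converted Hive table. The function name `effective_parquet_codec` and its `respects_tblproperties` flag are hypothetical and exist only for illustration. The `snappy` default is an assumption taken from the migration note's example, where Spark 2.3 writes Snappy files for this table.

```python
from typing import Mapping

# Assumed session-level default, matching the "Snappy parquet files" in the note's example.
SESSION_DEFAULT_CODEC = "snappy"


def effective_parquet_codec(
    tbl_properties: Mapping[str, str],
    respects_tblproperties: bool,
    session_default: str = SESSION_DEFAULT_CODEC,
) -> str:
    """Toy model of codec selection for a converted Parquet Hive table.

    respects_tblproperties=False models Spark 2.3 and earlier (property ignored);
    respects_tblproperties=True models Spark 2.4 (property respected).
    """
    if not respects_tblproperties:
        # Pre-2.4 behavior: the table property is ignored during conversion.
        return session_default.lower()
    codec = tbl_properties.get("parquet.compression", session_default).lower()
    # 'NONE' in the table property means the files are written uncompressed.
    return "uncompressed" if codec == "none" else codec


# The table from the migration note:
# CREATE TABLE t(id int) STORED AS PARQUET TBLPROPERTIES (parquet.compression 'NONE')
props = {"parquet.compression": "NONE"}
codec_23 = effective_parquet_codec(props, respects_tblproperties=False)
codec_24 = effective_parquet_codec(props, respects_tblproperties=True)
print(codec_23, codec_24)
```

In this sketch, a table without the property resolves to the same codec in both modes, so only tables that carry an explicit `parquet.compression` (or, analogously, `orc.compress`) property would see different output files after the upgrade.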

How was this patch tested?

N/A

cloud-fan · 2018-05-08T16:06:49Z

docs/sql-programming-guide.md

  - Since Spark 2.4, creating a managed table with nonempty location is not allowed. An exception is thrown when attempting to create a managed table with nonempty location. To set `true` to `spark.sql.allowCreatingManagedTableUsingNonemptyLocation` restores the previous behavior. This option will be removed in Spark 3.0.
  - Since Spark 2.4, the type coercion rules can automatically promote the argument types of the variadic SQL functions (e.g., IN/COALESCE) to the widest common type, no matter how the input arguments order. In prior Spark versions, the promotion could fail in some specific orders (e.g., TimestampType, IntegerType and StringType) and throw an exception.
  - In version 2.3 and earlier, `to_utc_timestamp` and `from_utc_timestamp` respect the timezone in the input timestamp string, which breaks the assumption that the input timestamp is in a specific timezone. Therefore, these 2 functions can return unexpected results. In version 2.4 and later, this problem has been fixed. `to_utc_timestamp` and `from_utc_timestamp` will return null if the input timestamp string contains timezone. As an example, `from_utc_timestamp('2000-10-10 00:00:00', 'GMT+1')` will return `2000-10-10 01:00:00` in both Spark 2.3 and 2.4. However, `from_utc_timestamp('2000-10-10 00:00:00+00:00', 'GMT+1')`, assuming a local timezone of GMT+8, will return `2000-10-10 09:00:00` in Spark 2.3 but `null` in 2.4. For people who don't care about this problem and want to retain the previous behaivor to keep their query unchanged, you can set `spark.sql.function.rejectTimezoneInString` to false. This option will be removed in Spark 3.0 and should only be used as a temporary workaround.
+  - In version 2.3 and earlier, Spark converts Parquet Hive tables by default but ignores table properties like `TBLPROPERTIES (parquet.compression 'NONE')`. This happens for ORC Hive table properties like `TBLPROPERTIES (orc.compress 'NONE')` in case of `spark.sql.hive.convertMetastoreOrc=true`, too. Since Spark 2.4, Spark supports Parquet/ORC specific table properties while converting Parquet/ORC Hive tables. As an example, `CREATE TABLE t(id int) STORED AS PARQUET TBLPROPERTIES (parquet.compression 'NONE')` would generate Snappy parquet files during insertion in Spark 2.3, and in Spark 2.4, the result would be uncompressed parquet files.


Spark supports Parquet/ORC specific table properties while converting ..

supports -> respects

Thanks. It's updated.

cloud-fan · 2018-05-08T16:06:59Z

LGTM

SparkQA · 2018-05-08T16:15:59Z

Test build #90372 has finished for PR 21269 at commit 1647961.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-05-08T16:27:17Z

Test build #90373 has finished for PR 21269 at commit 75a6e17.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2018-05-09T00:39:21Z

Merged to master.

dongjoon-hyun · 2018-05-09T02:41:29Z

Thank you, @HyukjinKwon and @cloud-fan.

[SPARK-23355][SQL][DOC][FOLLOWUP] Add migration doc for `convertMetas…

1647961

…tore` conf

dongjoon-hyun mentioned this pull request May 8, 2018

[SPARK-24112][SQL] Add convertMetastoreTableProperty conf #21259

Closed

cloud-fan reviewed May 8, 2018

Address comments

75a6e17

asfgit closed this in 9498e52 May 9, 2018

dongjoon-hyun deleted the SPARK-23355-DOC branch May 9, 2018 02:41
