
Conversation

@MaxGekk (Member) commented May 31, 2021

What changes were proposed in this pull request?

In this PR, I propose to support the special datetime values introduced by #25708 and #25716 only in typed literals, and to stop recognizing them when parsing strings to dates/timestamps. The following string values are supported only in typed timestamp literals:

  • epoch [zoneId] - 1970-01-01 00:00:00+00 (Unix system time zero)
  • today [zoneId] - midnight today
  • yesterday [zoneId] - midnight yesterday
  • tomorrow [zoneId] - midnight tomorrow
  • now - the current query start time

For example:

spark-sql> SELECT timestamp 'tomorrow';
2019-09-07 00:00:00

Similarly, the following special date values are supported only in typed date literals:

  • epoch [zoneId] - 1970-01-01
  • today [zoneId] - the current date in the time zone specified by spark.sql.session.timeZone
  • yesterday [zoneId] - the current date minus 1 day
  • tomorrow [zoneId] - the current date plus 1 day
  • now - the date of running the current query; it is the same as today

For example:

spark-sql> SELECT date 'tomorrow' - date 'yesterday';
2
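
The optional [zoneId] suffix is accepted in typed literals as well. A minimal sketch (the date result is deterministic since 'epoch' is a fixed point; the timestamp output assumes spark.sql.session.timeZone is set to UTC):

spark-sql> SELECT date 'epoch';
1970-01-01
spark-sql> SELECT timestamp 'epoch UTC';
1970-01-01 00:00:00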

Why are the changes needed?

In the current implementation, Spark recognizes the special date/timestamp values in any input string cast to a date/timestamp, which leads to the following problems:

  • If executors have different system times, the results are inconsistent and effectively random: column values depend on where the conversions were performed.
  • The special values act as distributed non-deterministic functions, even though users might think of them as constants.

Does this PR introduce any user-facing change?

Yes, but the probability of impacting users should be small.
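
For illustration, a minimal sketch of the change (assuming spark.sql.ansi.enabled=false, under which an unparseable cast yields NULL; the timestamp shown is illustrative):

spark-sql> SELECT CAST('now' AS TIMESTAMP);  -- 'now' is no longer a special value in casts
NULL
spark-sql> SELECT timestamp 'now';           -- the typed literal still works
2021-05-31 12:34:56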

How was this patch tested?

By running existing test suites:

$ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z interval.sql"
$ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z date.sql"
$ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z timestamp.sql"
$ build/sbt "test:testOnly *DateTimeUtilsSuite"

@github-actions bot added the SQL label May 31, 2021
@SparkQA commented May 31, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43635/

@SparkQA commented May 31, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43635/

@SparkQA commented May 31, 2021

Test build #139115 has finished for PR 32714 at commit 193eeef.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented May 31, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43640/

@SparkQA commented May 31, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43640/

@MaxGekk changed the title [WIP][SQL] Support special datetime values in typed literals only → [WIP][SPARK-35581][SQL] Support special datetime values in typed literals only May 31, 2021
@MaxGekk changed the title [WIP][SPARK-35581][SQL] Support special datetime values in typed literals only → [SPARK-35581][SQL] Support special datetime values in typed literals only May 31, 2021
@github-actions bot added the DOCS label May 31, 2021
@MaxGekk requested review from HyukjinKwon and cloud-fan May 31, 2021 20:38
@MaxGekk (Member, Author) commented May 31, 2021

@cloud-fan @HyukjinKwon FYI, I didn't add new tests because the special values in typed literals are already tested in date.sql/timestamp.sql.

@SparkQA commented May 31, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43644/

@SparkQA commented May 31, 2021

Test build #139120 has finished for PR 32714 at commit aa78690.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented May 31, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43644/

@SparkQA commented Jun 1, 2021

Test build #139124 has finished for PR 32714 at commit 33b5ce3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


- In Spark 3.2, `CREATE TABLE AS SELECT` with non-empty `LOCATION` will throw `AnalysisException`. To restore the behavior before Spark 3.2, you can set `spark.sql.legacy.allowNonEmptyLocationInCTAS` to `true`.

- In Spark 3.2, the special datetime values such as `epoch`, `today`, `yesterday`, `tomorrow` and `now` are supported in typed literals only, for instance `select timestamp'now'`. In Spark 3.1 and earlier, such special values are supported in any casts of strings to dates/timestamps. To restore the behavior before Spark 3.2, you should preprocess string columns and convert the strings to desired timestamps explicitly using UDF for instance.
@yaooqinn (Member) commented:

In Spark 3.2, the special datetime values..... in typed literals only, for instance (add',') select timestamp'now'. In Spark 3.1 and earlier (3.0?)

@MaxGekk (Member, Author) commented:

@yaooqinn What do you mean by "for instance (add',') select timestamp'now'"? I didn't get the problem. BTW, you could use the suggestion feature; then I could just commit your suggestions.

@yaooqinn (Member) commented:

Hi @MaxGekk, thanks for your suggestion.

I think if users need to preprocess the data, we should not call it "To restore the behavior before Spark 3.2".

@MaxGekk (Member, Author) commented:

What do you propose? How about "To have the behavior before Spark 3.2 ..."?

@yaooqinn (Member) commented Jun 1, 2021:

How about "to keep these special values as datetimes in Spark 3.1 and 3.0, you need to match them manually, e.g. if(c in ('now', 'today'), current_date(), c)".

I think it's better to suggest that users use built-in functions rather than UDFs.

@MaxGekk (Member, Author) commented Jun 1, 2021:

I agree that built-in functions are a better suggestion. Let me update this.

@MaxGekk (Member, Author) commented Jun 1, 2021

@cloud-fan @HyukjinKwon Are you ok with the changes in general?

@SparkQA commented Jun 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43667/

@SparkQA commented Jun 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43670/

@SparkQA commented Jun 1, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43667/

@SparkQA commented Jun 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43672/

@SparkQA commented Jun 1, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43670/

@SparkQA commented Jun 1, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43672/


- In Spark 3.2, `CREATE TABLE AS SELECT` with non-empty `LOCATION` will throw `AnalysisException`. To restore the behavior before Spark 3.2, you can set `spark.sql.legacy.allowNonEmptyLocationInCTAS` to `true`.

- In Spark 3.2, special datetime values such as `epoch`, `today`, `yesterday`, `tomorrow`, and `now` are supported in typed literals only, for instance, `select timestamp'now'`. In Spark 3.1 and 3.0, such special values are supported in any casts of strings to dates/timestamps. To keep these special values as dates/timestamps in Spark 3.1 and 3.0, you should replace them manually, e.g. `if (c in ('now', 'today'), current_date(), cast(c as date))`.
@cloud-fan (Contributor) commented:

if (c in ('now', 'today'), current_date(), cast(c as date))

What does it mean?

@MaxGekk (Member, Author) commented:

@cloud-fan See the example:

scala> val df = Seq("now", "2021-01-19", "today").toDF("c")
df: org.apache.spark.sql.DataFrame = [c: string]

scala> df.selectExpr("if (c in ('now', 'today'), current_date(), cast(c as date))").show(false)
+----------------------------------------------------------+
|(IF((c IN (now, today)), current_date(), CAST(c AS DATE)))|
+----------------------------------------------------------+
|2021-06-01                                                |
|2021-01-19                                                |
|2021-06-01                                                |
+----------------------------------------------------------+

@SparkQA commented Jun 1, 2021

Test build #139147 has finished for PR 32714 at commit 8c2e228.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk (Member, Author) commented Jun 1, 2021

GA passed. Merging to master.
Thank you @yaooqinn and @cloud-fan for your reviews.

@MaxGekk closed this in a59063d Jun 1, 2021
@SparkQA commented Jun 1, 2021

Test build #139150 has finished for PR 32714 at commit 3d302dc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jun 1, 2021

Test build #139152 has finished for PR 32714 at commit c8423fa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang (Member) commented:

@MaxGekk QQ: why do the special values support zone id? I tried PostgreSQL and this is not supported.
Also, the zone id in the special string is not respected and won't affect the result. However, for a normal timestamp string the zone id is respected:

scala> spark.sql("select cast(timestamp'1970-01-01 00:00:00 UTC' as long)").show()
+-----------------------------------------------+
|CAST(TIMESTAMP '1970-01-01 08:00:00' AS BIGINT)|
+-----------------------------------------------+
|                                              0|
+-----------------------------------------------+


scala> spark.sql("select cast(timestamp'1970-01-01 00:00:00 CET' as long)").show()
+-----------------------------------------------+
|CAST(TIMESTAMP '1970-01-01 07:00:00' AS BIGINT)|
+-----------------------------------------------+
|                                          -3600|
+-----------------------------------------------+


@gengliangwang (Member) commented:

I suggest that we remove the support for zone ids in the special strings to make things simple.

@MaxGekk (Member, Author) commented Jul 5, 2021

QQ: why do the special values support zone id? I tried PostgreSQL and this is not supported.

@gengliangwang We follow PostgreSQL behavior strictly: time zones are accepted but ignored. See the tests from PostgreSQL:
https://github.com/postgres/postgres/blob/master/src/test/regress/sql/timestamp.sql#L19-L20
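
For illustration, a minimal sketch of the "accepted but ignored" behavior: 'epoch' is Unix time zero, and the zone id suffix parses but does not shift the instant, so both casts below return 0.

spark-sql> SELECT CAST(timestamp'epoch' AS LONG), CAST(timestamp'epoch CET' AS LONG);
0	0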

@gengliangwang (Member) commented:

@MaxGekk Oh, I made a mistake in the test with PostgreSQL. Sorry for that.

cloud-fan pushed a commit that referenced this pull request Aug 25, 2021
### What changes were proposed in this pull request?
In the PR, I propose to add a new correctness rule `SpecialDatetimeValues` to the final analysis phase. It replaces casts of strings to date/timestamp_ltz/timestamp_ntz with literals of those types if the strings contain special datetime values like `today`, `yesterday`, and `tomorrow`, and the input strings are foldable.

### Why are the changes needed?
1. To avoid a breaking change.
2. To improve the user experience with Spark SQL. After PR #32714, users have to use typed literals instead of implicit casts. For instance, in Spark 3.1:
```sql
select ts_col > 'now';
```
but now the query fails, and users have to use a typed timestamp literal:
```sql
select ts_col > timestamp'now';
```

### Does this PR introduce _any_ user-facing change?
No. The previous release (3.1) already supported this feature until it was removed by #32714.

### How was this patch tested?
1. Manually tested via the sql command line:
```sql
spark-sql> select cast('today' as date);
2021-08-24
spark-sql> select timestamp('today');
2021-08-24 00:00:00
spark-sql> select timestamp'tomorrow' > 'today';
true
```
2. By running new test suite:
```
$ build/sbt "sql/testOnly org.apache.spark.sql.catalyst.optimizer.SpecialDatetimeValuesSuite"
```

Closes #33816 from MaxGekk/foldable-datetime-special-values.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan pushed a commit that referenced this pull request Aug 25, 2021

Closes #33816 from MaxGekk/foldable-datetime-special-values.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit df0ec56)
Signed-off-by: Wenchen Fan <[email protected]>