
Conversation

@MaxGekk (Member) commented May 31, 2021

What changes were proposed in this pull request?

In this PR, I propose to support the special datetime values introduced by #25708 and #25716 only in typed literals, and to stop recognizing them when parsing strings to dates/timestamps. The following string values are supported only in typed timestamp literals:

  • epoch [zoneId] - 1970-01-01 00:00:00+00 (Unix system time zero)
  • today [zoneId] - midnight today
  • yesterday [zoneId] - midnight yesterday
  • tomorrow [zoneId] - midnight tomorrow
  • now - the current query start time

For example:

spark-sql> SELECT timestamp 'tomorrow';
2019-09-07 00:00:00

Similarly, the following special date values are supported only in typed date literals:

  • epoch [zoneId] - 1970-01-01
  • today [zoneId] - the current date in the time zone specified by spark.sql.session.timeZone
  • yesterday [zoneId] - the current date minus 1 day
  • tomorrow [zoneId] - the current date plus 1 day
  • now - the date of running the current query; it is the same as today

For example:

spark-sql> SELECT date 'tomorrow' - date 'yesterday';
2
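
The optional [zoneId] suffix is accepted in typed literals as well. A minimal sketch (the date result is deterministic since 'epoch' is a fixed point; the timestamp output assumes spark.sql.session.timeZone is set to UTC):

spark-sql> SELECT date 'epoch';
1970-01-01
spark-sql> SELECT timestamp 'epoch UTC';
1970-01-01 00:00:00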

Why are the changes needed?

In the current implementation, Spark recognizes the special date/timestamp values in any input string cast to a date/timestamp, which leads to the following problems:

  • If executors have different system times, the results are inconsistent and effectively random: column values depend on where the conversions were performed.
  • The special values act as distributed non-deterministic functions, even though users might think of them as constants.

Does this PR introduce any user-facing change?

Yes, but the probability of impacting users should be small.
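
For illustration, a minimal sketch of the change (assuming spark.sql.ansi.enabled=false, under which an unparseable cast yields NULL; the timestamp shown is illustrative):

spark-sql> SELECT CAST('now' AS TIMESTAMP);  -- 'now' is no longer a special value in casts
NULL
spark-sql> SELECT timestamp 'now';           -- the typed literal still works
2021-05-31 12:34:56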

How was this patch tested?

By running existing test suites:

$ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z interval.sql"
$ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z date.sql"
$ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z timestamp.sql"
$ build/sbt "test:testOnly *DateTimeUtilsSuite"

@github-actions bot added the SQL label May 31, 2021
@SparkQA commented May 31, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43635/

@SparkQA commented May 31, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43635/

@SparkQA commented May 31, 2021

Test build #139115 has finished for PR 32714 at commit 193eeef.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented May 31, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43640/

@SparkQA commented May 31, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43640/

@MaxGekk changed the title [WIP][SQL] Support special datetime values in typed literals only → [WIP][SPARK-35581][SQL] Support special datetime values in typed literals only May 31, 2021
@MaxGekk changed the title [WIP][SPARK-35581][SQL] Support special datetime values in typed literals only → [SPARK-35581][SQL] Support special datetime values in typed literals only May 31, 2021
@github-actions bot added the DOCS label May 31, 2021
@MaxGekk requested review from HyukjinKwon and cloud-fan May 31, 2021 20:38
@MaxGekk (Member, Author) commented May 31, 2021

@cloud-fan @HyukjinKwon FYI, I didn't add new tests because the special values in typed literals are already tested in date.sql/timestamp.sql.

@SparkQA commented May 31, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43644/

@SparkQA commented May 31, 2021

Test build #139120 has finished for PR 32714 at commit aa78690.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented May 31, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43644/

@SparkQA commented Jun 1, 2021

Test build #139124 has finished for PR 32714 at commit 33b5ce3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


- In Spark 3.2, `CREATE TABLE AS SELECT` with non-empty `LOCATION` will throw `AnalysisException`. To restore the behavior before Spark 3.2, you can set `spark.sql.legacy.allowNonEmptyLocationInCTAS` to `true`.

- In Spark 3.2, the special datetime values such as `epoch`, `today`, `yesterday`, `tomorrow` and `now` are supported in typed literals only, for instance `select timestamp'now'`. In Spark 3.1 and earlier, such special values are supported in any casts of strings to dates/timestamps. To restore the behavior before Spark 3.2, you should preprocess string columns and convert the strings to desired timestamps explicitly using UDF for instance.
@yaooqinn (Member) commented:

In Spark 3.2, the special datetime values..... in typed literals only, for instance (add',') select timestamp'now'. In Spark 3.1 and earlier (3.0?)

@MaxGekk (Member, Author) commented:

@yaooqinn What do you mean by "for instance (add',') select timestamp'now'"? I didn't get the problem. BTW, you could use the suggestion feature; then I could just commit your suggestions.

@yaooqinn (Member) commented:

Hi @MaxGekk, thanks for your suggestion.

I think if users need to preprocess the data, we should not call it "To restore the behavior before Spark 3.2".

@MaxGekk (Member, Author) commented:

What do you propose? How about "To have the behavior before Spark 3.2 ..."?

@yaooqinn (Member) commented Jun 1, 2021:

How about "to keep these special values as datetimes in Spark 3.1 and 3.0, you need to match them manually, e.g. if(c in ('now', 'today'), current_date(), c)".

I think it's better to suggest that users use built-in functions rather than UDFs.

@MaxGekk (Member, Author) commented Jun 1, 2021:

I agree that built-in functions are a better suggestion. Let me update this.

@MaxGekk (Member, Author) commented Jun 1, 2021

@cloud-fan @HyukjinKwon Are you ok with the changes in general?

@SparkQA commented Jun 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43667/

@SparkQA commented Jun 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43670/

@SparkQA commented Jun 1, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43667/

@SparkQA commented Jun 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43672/

@SparkQA commented Jun 1, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43670/

@SparkQA commented Jun 1, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43672/


- In Spark 3.2, `CREATE TABLE AS SELECT` with non-empty `LOCATION` will throw `AnalysisException`. To restore the behavior before Spark 3.2, you can set `spark.sql.legacy.allowNonEmptyLocationInCTAS` to `true`.

- In Spark 3.2, special datetime values such as `epoch`, `today`, `yesterday`, `tomorrow`, and `now` are supported in typed literals only, for instance, `select timestamp'now'`. In Spark 3.1 and 3.0, such special values are supported in any casts of strings to dates/timestamps. To keep these special values as dates/timestamps in Spark 3.1 and 3.0, you should replace them manually, e.g. `if (c in ('now', 'today'), current_date(), cast(c as date))`.
@cloud-fan (Contributor) commented:

if (c in ('now', 'today'), current_date(), cast(c as date))

What does it mean?

@MaxGekk (Member, Author) commented:

@cloud-fan See the example:

scala> val df = Seq("now", "2021-01-19", "today").toDF("c")
df: org.apache.spark.sql.DataFrame = [c: string]

scala> df.selectExpr("if (c in ('now', 'today'), current_date(), cast(c as date))").show(false)
+----------------------------------------------------------+
|(IF((c IN (now, today)), current_date(), CAST(c AS DATE)))|
+----------------------------------------------------------+
|2021-06-01                                                |
|2021-01-19                                                |
|2021-06-01                                                |
+----------------------------------------------------------+

@SparkQA commented Jun 1, 2021

Test build #139147 has finished for PR 32714 at commit 8c2e228.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk (Member, Author) commented Jun 1, 2021

GA passed. Merging to master.
Thank you @yaooqinn and @cloud-fan for your reviews.

@MaxGekk closed this in a59063d Jun 1, 2021
@SparkQA commented Jun 1, 2021

Test build #139150 has finished for PR 32714 at commit 3d302dc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jun 1, 2021

Test build #139152 has finished for PR 32714 at commit c8423fa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang (Member) commented:

@MaxGekk QQ: why do the special values support zone id? I tried PostgreSQL and this is not supported.
Also, the zone id in the special string is not respected and won't affect the result. However, for a normal timestamp string the zone id is respected:

scala> spark.sql("select cast(timestamp'1970-01-01 00:00:00 UTC' as long)").show()
+-----------------------------------------------+
|CAST(TIMESTAMP '1970-01-01 08:00:00' AS BIGINT)|
+-----------------------------------------------+
|                                              0|
+-----------------------------------------------+


scala> spark.sql("select cast(timestamp'1970-01-01 00:00:00 CET' as long)").show()
+-----------------------------------------------+
|CAST(TIMESTAMP '1970-01-01 07:00:00' AS BIGINT)|
+-----------------------------------------------+
|                                          -3600|
+-----------------------------------------------+


@gengliangwang (Member) commented:

I suggest that we remove the support for zone ids in the special strings to make things simple.

@MaxGekk (Member, Author) commented Jul 5, 2021

QQ: why do the special values support zone id? I tried PostgreSQL and this is not supported.

@gengliangwang We follow PostgreSQL behavior strictly: time zones are accepted but ignored. See the tests from PostgreSQL:
https://github.com/postgres/postgres/blob/master/src/test/regress/sql/timestamp.sql#L19-L20
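
For illustration, a minimal sketch of the "accepted but ignored" behavior: 'epoch' is Unix time zero, and the zone id suffix parses but does not shift the instant, so both casts below return 0.

spark-sql> SELECT CAST(timestamp'epoch' AS LONG), CAST(timestamp'epoch CET' AS LONG);
0	0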

@gengliangwang (Member) commented:

@MaxGekk Oh, I made a mistake in the test with PostgreSQL. Sorry for that.

cloud-fan pushed a commit that referenced this pull request Aug 25, 2021
### What changes were proposed in this pull request?
In the PR, I propose to add a new correctness rule `SpecialDatetimeValues` to the final analysis phase. It replaces casts of strings to date/timestamp_ltz/timestamp_ntz with literals of those types if the strings contain special datetime values like `today`, `yesterday`, and `tomorrow`, and the input strings are foldable.

### Why are the changes needed?
1. To avoid a breaking change.
2. To improve the user experience with Spark SQL. After PR #32714, users have to use typed literals instead of implicit casts. For instance, in Spark 3.1:
```sql
select ts_col > 'now';
```
but now the query fails, and users have to use a typed timestamp literal:
```sql
select ts_col > timestamp'now';
```

### Does this PR introduce _any_ user-facing change?
No. The previous release (3.1) already supported this feature until it was removed by #32714.

### How was this patch tested?
1. Manually tested via the sql command line:
```sql
spark-sql> select cast('today' as date);
2021-08-24
spark-sql> select timestamp('today');
2021-08-24 00:00:00
spark-sql> select timestamp'tomorrow' > 'today';
true
```
2. By running new test suite:
```
$ build/sbt "sql/testOnly org.apache.spark.sql.catalyst.optimizer.SpecialDatetimeValuesSuite"
```

Closes #33816 from MaxGekk/foldable-datetime-special-values.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan pushed a commit that referenced this pull request Aug 25, 2021

Closes #33816 from MaxGekk/foldable-datetime-special-values.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit df0ec56)
Signed-off-by: Wenchen Fan <[email protected]>