[SPARK-35581][SQL] Support special datetime values in typed literals only #32714
Conversation
Kubernetes integration test starting
Kubernetes integration test status success
Test build #139115 has finished for PR 32714 at commit
Kubernetes integration test starting
Kubernetes integration test status success
@cloud-fan @HyukjinKwon FYI, I didn't add new tests because special values in typed literals are tested in
Kubernetes integration test starting
Test build #139120 has finished for PR 32714 at commit
Kubernetes integration test status success
Test build #139124 has finished for PR 32714 at commit
docs/sql-migration-guide.md (Outdated)

- In Spark 3.2, `CREATE TABLE AS SELECT` with non-empty `LOCATION` will throw `AnalysisException`. To restore the behavior before Spark 3.2, you can set `spark.sql.legacy.allowNonEmptyLocationInCTAS` to `true`.
- In Spark 3.2, the special datetime values such as `epoch`, `today`, `yesterday`, `tomorrow` and `now` are supported in typed literals only, for instance, `select timestamp'now'`. In Spark 3.1 and earlier, such special values are supported in any casts of strings to dates/timestamps. To restore the behavior before Spark 3.2, you should preprocess string columns and convert the strings to the desired timestamps explicitly, for instance using a UDF.
In Spark 3.2, the special datetime values..... in typed literals only, for instance (add',') select timestamp'now'. In Spark 3.1 and earlier (3.0?)
@yaooqinn What do you mean by: for instance (add ',') `select timestamp'now'`? I didn't get the problem. BTW, you could use the suggestion feature; then I would just commit your suggestions.
Hi @MaxGekk, thanks for your suggestion.
I think if users need to preprocess the data, we may not call it "To restore the behavior before Spark 3.2".
What do you propose? How about "To have the behavior before Spark 3.2 ..."?
How about "to keep these special values as datetimes in Spark 3.1 and 3.0, you need to match them manually, e.g. if(c in ('now', 'today'), current_date(), c)".
I think it's better to suggest user use builtin functions than UDFs
I agree that built-in functions are a better suggestion. Let me update this.
@cloud-fan @HyukjinKwon Are you ok with the changes in general?
Co-authored-by: Kent Yao <[email protected]>
Kubernetes integration test starting
Kubernetes integration test starting
Kubernetes integration test status success
Kubernetes integration test starting
Kubernetes integration test status failure
Kubernetes integration test status success
- In Spark 3.2, `CREATE TABLE AS SELECT` with non-empty `LOCATION` will throw `AnalysisException`. To restore the behavior before Spark 3.2, you can set `spark.sql.legacy.allowNonEmptyLocationInCTAS` to `true`.
- In Spark 3.2, special datetime values such as `epoch`, `today`, `yesterday`, `tomorrow`, and `now` are supported in typed literals only, for instance, `select timestamp'now'`. In Spark 3.1 and 3.0, such special values are supported in any casts of strings to dates/timestamps. To keep these special values as dates/timestamps in Spark 3.1 and 3.0, you should replace them manually, e.g. `if (c in ('now', 'today'), current_date(), cast(c as date))`.
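A short sketch of the behaviors described in the note above; the table `tbl` and its string column `c` are hypothetical:

```sql
-- Spark 3.2: special values are recognized in typed literals only
SELECT TIMESTAMP'now';
SELECT DATE'today';

-- Spark 3.1 and 3.0: a plain cast also resolved special values
SELECT CAST('now' AS TIMESTAMP);

-- Version-independent replacement suggested in the note
SELECT IF(c IN ('now', 'today'), current_date(), CAST(c AS DATE)) FROM tbl;
```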
`if (c in ('now', 'today'), current_date(), cast(c as date))`
What does it mean?
@cloud-fan See the example:

```scala
scala> val df = Seq("now", "2021-01-19", "today").toDF("c")
df: org.apache.spark.sql.DataFrame = [c: string]

scala> df.selectExpr("if (c in ('now', 'today'), current_date(), cast(c as date))").show(false)
+----------------------------------------------------------+
|(IF((c IN (now, today)), current_date(), CAST(c AS DATE)))|
+----------------------------------------------------------+
|2021-06-01                                                |
|2021-01-19                                                |
|2021-06-01                                                |
+----------------------------------------------------------+
```
Test build #139147 has finished for PR 32714 at commit
GA passed. Merging to master.
Test build #139150 has finished for PR 32714 at commit
Test build #139152 has finished for PR 32714 at commit
@MaxGekk QQ: why do the special values support zone id? I tried PostgreSQL and this is not supported.
I suggest that we remove the support of zone id in the special strings to make things simple.
@gengliangwang We follow PostgreSQL behavior strictly. Time zones are accepted but ignored, see the tests from PostgreSQL:
@MaxGekk Oh, I made a mistake in the test with PostgreSQL. Sorry for that.
### What changes were proposed in this pull request?

In the PR, I propose to add a new correctness rule `SpecialDatetimeValues` to the final analysis phase. It replaces casts of strings to date/timestamp_ltz/timestamp_ntz by literals of such types if the strings contain special datetime values like `today`, `yesterday` and `tomorrow`, and the input strings are foldable.

### Why are the changes needed?

1. To avoid a breaking change.
2. To improve user experience with Spark SQL. After the PR #32714, users have to use typed literals instead of implicit casts. For instance, at Spark 3.1:
```sql
select ts_col > 'now';
```
but the query fails at the moment, and users have to use a typed timestamp literal:
```sql
select ts_col > timestamp'now';
```

### Does this PR introduce _any_ user-facing change?

No. The previous release, 3.1, already supported the feature until it was removed by #32714.

### How was this patch tested?

1. Manually tested via the SQL command line:
```sql
spark-sql> select cast('today' as date);
2021-08-24
spark-sql> select timestamp('today');
2021-08-24 00:00:00
spark-sql> select timestamp'tomorrow' > 'today';
true
```
2. By running the new test suite:
```
$ build/sbt "sql/testOnly org.apache.spark.sql.catalyst.optimizer.SpecialDatetimeValuesSuite"
```

Closes #33816 from MaxGekk/foldable-datetime-special-values.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit df0ec56)
Signed-off-by: Wenchen Fan <[email protected]>
### What changes were proposed in this pull request?
In the PR, I propose to support the special datetime values introduced by #25708 and by #25716 only in typed literals, and not to recognize them when parsing strings to dates/timestamps. The following string values are supported only in typed timestamp literals:
- `epoch [zoneId]` - 1970-01-01 00:00:00+00 (Unix system time zero)
- `today [zoneId]` - midnight today
- `yesterday [zoneId]` - midnight yesterday
- `tomorrow [zoneId]` - midnight tomorrow
- `now` - current query start time

For example:
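The timestamps shown below are placeholders; the actual output depends on the query start time and the optional zone id:

```sql
SELECT TIMESTAMP'now';                    -- e.g. 2021-06-01 13:58:10.123
SELECT TIMESTAMP'today Europe/Amsterdam'; -- e.g. 2021-06-01 00:00:00
```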
Similarly, the following special date values are supported only in typed date literals:
- `epoch [zoneId]` - 1970-01-01
- `today [zoneId]` - the current date in the time zone specified by `spark.sql.session.timeZone`
- `yesterday [zoneId]` - the current date - 1
- `tomorrow [zoneId]` - the current date + 1
- `now` - the date of running the current query. It has the same notion as `today`

For example:
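Again, the dates shown are placeholders that depend on the current date and the session time zone:

```sql
SELECT DATE'today';                        -- e.g. 2021-06-01
SELECT DATE'tomorrow America/Los_Angeles'; -- e.g. 2021-06-02
```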
### Why are the changes needed?
In the current implementation, Spark supports the special date/timestamp values in any input string cast to dates/timestamps, which leads to problems.
### Does this PR introduce any user-facing change?

Yes, but the probability should be small.
### How was this patch tested?

By running existing test suites:
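A typical sbt invocation for such suites; the wildcard pattern below is an assumption for illustration, not the exact suite list from the PR:

```
$ build/sbt "sql/testOnly *SQLQueryTestSuite"
```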