-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-31527][SQL] date add/subtract interval only allow those day precision in ansi mode #28310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ecision in ansi mode
|
cc: @cloud-fan @dongjoon-hyun @HyukjinKwon many thanks. |
|
Test build #121666 has started for PR 28310 at commit |
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Outdated
Show resolved
Hide resolved
|
Test build #121681 has finished for PR 28310 at commit
|
| } | ||
| case s @ Subtract(l, r) if s.childrenResolved => (l.dataType, r.dataType) match { | ||
| case (CalendarIntervalType, CalendarIntervalType) => s | ||
| case (DateType, CalendarIntervalType) => DateAddInterval(l, UnaryMinus(r)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good idea, maybe we can remove TimeSub later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, we can clean it up later
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
Outdated
Show resolved
Hide resolved
|
Test build #121708 has finished for PR 28310 at commit
|
|
Test build #121720 has finished for PR 28310 at commit
|
|
retest this please |
|
Test build #121738 has finished for PR 28310 at commit
|
|
seems legitimate test failures |
|
My mistake. Forgot to re-generate SQL stript test results |
|
Test build #121765 has finished for PR 28310 at commit
|
|
Test build #121799 has finished for PR 28310 at commit
|
|
retest this please |
|
The test failure is unrelated the contained |
|
Test build #121804 has finished for PR 28310 at commit
|
|
retest this please |
|
Test build #121805 has finished for PR 28310 at commit
|
|
thanks, merging to master/3.0! |
…ecision in ansi mode To follow ANSI,the expressions - `date + interval`, `interval + date` and `date - interval` should only accept intervals which the `microseconds` part is 0. Better ANSI compliance No, this PR should target 3.0.0 in which this feature is newly added. add more unit tests Closes #28310 from yaooqinn/SPARK-31527. Authored-by: Kent Yao <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit ebc8fa5) Signed-off-by: Wenchen Fan <[email protected]>
### What changes were proposed in this pull request? The implementation of TimeSub for the operation of timestamp subtracting interval is almost repetitive with TimeAdd. We can replace it with TimeAdd(l, -r) since there are equivalent. Suggestion from #28310 (comment) Besides, the Coercion rules for TimeAdd/TimeSub(date, interval) are useless anymore, so remove them in this PR since they are touched in this PR. ### Why are the changes needed? remove redundant and useless code for easy maintenance ### Does this PR introduce any user-facing change? Yes, the SQL string of `datetime - interval` become `datetime + (- interval)` ### How was this patch tested? modified existing unit tests. Closes #28381 from yaooqinn/SPARK-31586. Authored-by: Kent Yao <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
… add/subtract interval operations ### What changes were proposed in this pull request? With #28310, the operation of date +/- interval(m, d, 0) has been improved a lot. According to the benchmark results, about 75% time cost is reduced because of no casting date to timestamp back and forth. In this PR, we add a benchmark for these operations, and timestamp +/- interval operations as accessories. ### Why are the changes needed? Performance test coverage, since these operations are missing in the DateTimeBenchmark. ### Does this PR introduce any user-facing change? No, just test ### How was this patch tested? regenerated benchmark results Closes #28369 from yaooqinn/SPARK-31527-F. Authored-by: Kent Yao <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
… add/subtract interval operations ### What changes were proposed in this pull request? With #28310, the operation of date +/- interval(m, d, 0) has been improved a lot. According to the benchmark results, about 75% time cost is reduced because of no casting date to timestamp back and forth. In this PR, we add a benchmark for these operations, and timestamp +/- interval operations as accessories. ### Why are the changes needed? Performance test coverage, since these operations are missing in the DateTimeBenchmark. ### Does this PR introduce any user-facing change? No, just test ### How was this patch tested? regenerated benchmark results Closes #28369 from yaooqinn/SPARK-31527-F. Authored-by: Kent Yao <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit 54996be) Signed-off-by: Wenchen Fan <[email protected]>
| interval: CalendarInterval): SQLDate = { | ||
| require(interval.microseconds == 0, | ||
| "Cannot add hours, minutes or seconds, milliseconds, microseconds to a date") | ||
| val ld = LocalDate.ofEpochDay(start).plusMonths(interval.months).plusDays(interval.days) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the result depends on the order of plusMonths() and plusDays(). @yaooqinn Did you make the choice intentionally? I am asking you because adding days and months can be much cheaper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes,here we are follow the previous behavior of using timestamp + interval
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks. It would be nice to document such behavior of this function and timestampAddInterval somewhere. It is not obvious that we add month then days and then micros. The order could be opposite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, in snowflake internal '1 month 1 day' is different from internal '1 day 1 month'. We should at least document our own behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will make time for that PR.
What changes were proposed in this pull request?
To follow ANSI,the expressions -
date + interval,interval + dateanddate - intervalshould only accept intervals which themicrosecondspart is 0.Why are the changes needed?
Better ANSI compliance
Does this PR introduce any user-facing change?
No, this PR should target 3.0.0 in which this feature is newly added.
How was this patch tested?
add more unit tests