@yaooqinn
Member

What changes were proposed in this pull request?

To follow ANSI, the expressions `date + interval`, `interval + date`, and `date - interval` should only accept intervals whose microseconds part is 0.
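The rule above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this patch: `Interval` is a hypothetical stand-in for Spark's `CalendarInterval`, dates are modeled as days since the epoch, and the object name is made up for the example.

```scala
import java.time.LocalDate

object DateIntervalSketch {
  // Hypothetical stand-in for CalendarInterval: months, days, microseconds.
  final case class Interval(months: Int, days: Int, microseconds: Long)

  // Adds an interval to a date expressed as days since 1970-01-01.
  // Intervals with a non-zero microseconds part are rejected.
  def dateAddInterval(startEpochDay: Int, interval: Interval): Int = {
    require(interval.microseconds == 0,
      "Cannot add hours, minutes or seconds, milliseconds, microseconds to a date")
    val ld = LocalDate.ofEpochDay(startEpochDay.toLong)
      .plusMonths(interval.months.toLong)
      .plusDays(interval.days.toLong)
    Math.toIntExact(ld.toEpochDay)
  }
}
```

In this sketch, an interval of 1 month and 1 day is accepted, while an interval carrying even a single microsecond fails the `require` check with an `IllegalArgumentException`.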

Why are the changes needed?

Better ANSI compliance

Does this PR introduce any user-facing change?

No, this PR should target 3.0.0 in which this feature is newly added.

How was this patch tested?

add more unit tests

@yaooqinn
Member Author

cc: @cloud-fan @dongjoon-hyun @HyukjinKwon many thanks.

@SparkQA

SparkQA commented Apr 23, 2020

Test build #121666 has started for PR 28310 at commit 77e75a7.

@SparkQA

SparkQA commented Apr 23, 2020

Test build #121681 has finished for PR 28310 at commit 9d7ea7e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class DateAddInterval(

}
case s @ Subtract(l, r) if s.childrenResolved => (l.dataType, r.dataType) match {
case (CalendarIntervalType, CalendarIntervalType) => s
case (DateType, CalendarIntervalType) => DateAddInterval(l, UnaryMinus(r))
Contributor

This is a good idea, maybe we can remove TimeSub later.

Member Author

yea, we can clean it up later

@SparkQA

SparkQA commented Apr 24, 2020

Test build #121708 has finished for PR 28310 at commit 0c96bf5.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 24, 2020

Test build #121720 has finished for PR 28310 at commit 65a0e4d.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Member Author

retest this please

@SparkQA

SparkQA commented Apr 24, 2020

Test build #121738 has finished for PR 28310 at commit 65a0e4d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

seems legitimate test failures

@yaooqinn
Member Author

My mistake. I forgot to regenerate the SQL script test results.

@SparkQA

SparkQA commented Apr 24, 2020

Test build #121765 has finished for PR 28310 at commit c1964a8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 25, 2020

Test build #121799 has finished for PR 28310 at commit c64d27e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Member Author

retest this please

@yaooqinn
Member Author

yaooqinn commented Apr 25, 2020

The test failure is unrelated

The test org.apache.spark.sql.execution.ui.AllExecutionsPageSuite "SPARK-27019: correctly display SQL page when event reordering happens" is flaky because it only checks that the HTML does not contain 1970. I will file a ticket to check and fix that.
In this specific failure, it failed because

...
<td sorttable_customkey="1587806019707">
...

contained 1970
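
To make the false positive concrete, here is a small sketch of the failure mode. It assumes the check is a plain substring test on the rendered HTML, which is what the description above suggests; the object and method names are hypothetical and are not the suite's actual code.

```scala
object FlakyCheckSketch {
  // Assumed shape of the check: the page is considered wrong if "1970" appears anywhere.
  def containsEpochYear(html: String): Boolean = html.contains("1970")

  // The cell from the failing build: an epoch-millisecond sort key, not a date from 1970.
  val failingCell: String = """<td sorttable_customkey="1587806019707">"""
}
```

The sort key `1587806019707` happens to contain the digits `1970`, so a substring check flags the page even though no timestamp was rendered as a 1970 date.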

@SparkQA

SparkQA commented Apr 25, 2020

Test build #121804 has finished for PR 28310 at commit c64d27e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Member Author

retest this please

@SparkQA

SparkQA commented Apr 25, 2020

Test build #121805 has finished for PR 28310 at commit c64d27e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn yaooqinn requested a review from cloud-fan April 27, 2020 05:24
@cloud-fan
Contributor

thanks, merging to master/3.0!

@cloud-fan cloud-fan closed this in ebc8fa5 Apr 27, 2020
cloud-fan pushed a commit that referenced this pull request Apr 27, 2020
…ecision in ansi mode

To follow ANSI, the expressions `date + interval`, `interval + date`, and `date - interval` should only accept intervals whose `microseconds` part is 0.

Better ANSI compliance

No, this PR should target 3.0.0 in which this feature is newly added.

add more unit tests

Closes #28310 from yaooqinn/SPARK-31527.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit ebc8fa5)
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan pushed a commit that referenced this pull request Apr 28, 2020
### What changes were proposed in this pull request?

The implementation of TimeSub, which subtracts an interval from a timestamp, largely duplicates TimeAdd. We can replace it with TimeAdd(l, -r) since they are equivalent.
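
The equivalence can be illustrated outside Spark with `java.time`. This is only a sketch under stated assumptions: `Interval` is a hypothetical stand-in for `CalendarInterval`, `LocalDateTime` stands in for a timestamp, and the fields are applied in months, days, microseconds order.

```scala
import java.time.LocalDateTime

object NegatedIntervalSketch {
  // Hypothetical stand-in for CalendarInterval.
  final case class Interval(months: Int, days: Int, microseconds: Long)

  // Negates every field (overflow on extreme values is ignored in this sketch).
  def negate(i: Interval): Interval = Interval(-i.months, -i.days, -i.microseconds)

  // timestamp + interval: months, then days, then microseconds.
  def add(ts: LocalDateTime, i: Interval): LocalDateTime =
    ts.plusMonths(i.months.toLong).plusDays(i.days.toLong).plusNanos(i.microseconds * 1000L)

  // timestamp - interval, written out directly with the same field order.
  def subtract(ts: LocalDateTime, i: Interval): LocalDateTime =
    ts.minusMonths(i.months.toLong).minusDays(i.days.toLong).minusNanos(i.microseconds * 1000L)
}
```

For any timestamp and interval within normal ranges, `subtract(ts, i)` and `add(ts, negate(i))` return the same value, which is why a separate subtraction path adds nothing.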

Suggestion from #28310 (comment)

Besides, the coercion rules for TimeAdd/TimeSub(date, interval) are no longer needed, so this PR removes them since it already touches that code.

### Why are the changes needed?

remove redundant and useless code for easy maintenance

### Does this PR introduce any user-facing change?

Yes, the SQL string of `datetime - interval` becomes `datetime + (- interval)`

### How was this patch tested?

modified existing unit tests.

Closes #28381 from yaooqinn/SPARK-31586.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan pushed a commit that referenced this pull request Apr 28, 2020
… add/subtract interval operations

### What changes were proposed in this pull request?
With #28310, the operation of date +/- interval(m, d, 0) has been improved a lot.

According to the benchmark results, the time cost is reduced by about 75% because the date is no longer cast to timestamp and back.

This PR adds a benchmark for these operations, along with the timestamp +/- interval operations for comparison.

### Why are the changes needed?

Performance test coverage, since these operations are missing in the DateTimeBenchmark.

### Does this PR introduce any user-facing change?

No, just test

### How was this patch tested?

regenerated benchmark results

Closes #28369 from yaooqinn/SPARK-31527-F.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan pushed a commit that referenced this pull request Apr 28, 2020
… add/subtract interval operations

Closes #28369 from yaooqinn/SPARK-31527-F.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 54996be)
Signed-off-by: Wenchen Fan <[email protected]>
interval: CalendarInterval): SQLDate = {
require(interval.microseconds == 0,
"Cannot add hours, minutes or seconds, milliseconds, microseconds to a date")
val ld = LocalDate.ofEpochDay(start).plusMonths(interval.months).plusDays(interval.days)
Member

Actually, the result depends on the order of plusMonths() and plusDays(). @yaooqinn Did you make the choice intentionally? I am asking you because adding days and months can be much cheaper.

Member Author

Yes, here we follow the previous behavior of timestamp + interval.

Member

I see, thanks. It would be nice to document such behavior of this function and timestampAddInterval somewhere. It is not obvious that we add month then days and then micros. The order could be opposite.

Contributor

FYI, in Snowflake, interval '1 month 1 day' is different from interval '1 day 1 month'. We should at least document our own behavior.

Member Author

@yaooqinn yaooqinn May 6, 2020

I will make time for that PR.
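
For reference, the order dependence discussed in this thread can be reproduced with plain `java.time.LocalDate`. This is a small sketch with hypothetical helper names, independent of Spark's own utilities.

```scala
import java.time.LocalDate

object IntervalOrderSketch {
  // The order used by the patch: add months first, then days.
  def monthsThenDays(d: LocalDate, months: Int, days: Int): LocalDate =
    d.plusMonths(months.toLong).plusDays(days.toLong)

  // The opposite order: add days first, then months.
  def daysThenMonths(d: LocalDate, months: Int, days: Int): LocalDate =
    d.plusDays(days.toLong).plusMonths(months.toLong)
}
```

Starting from 2020-01-30 with an interval of 1 month 1 day, months-first gives 2020-03-01 (the month step clamps to 2020-02-29, then one day is added), while days-first gives 2020-02-29 (2020-01-31 plus one month clamps to the end of February).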
