Conversation

@adrian-wang
Contributor

This subsumes #6782

@adrian-wang
Contributor Author

cc @chenghao-intel

Contributor

I think we want to support:

timestamp + interval returns timestamp
date + interval returns date
date + int returns date ?

Contributor Author

OK, will update today. BTW, how do we tell the difference when we are doing
string + interval/int?
Should we just treat strings as dates?

Contributor

string can get implicitly cast to date or timestamp, can't it?

Contributor Author

Yes, but then we have to decide whether it is a date or a timestamp, and that needs to be done at run time.

Contributor

I think we can just put timestamp first, so that a string is implicitly cast to timestamp.

Contributor Author

I think date_add('2015-07-22', 1) returning '2015-07-23 00:00:00' is not so natural...

Contributor Author

I just checked with Hive 1.2:
hive> select date_add("2015-01-01 00:11:22", 2);
2015-01-03
hive> select date_add(cast("2015-01-01 00:11:22" as timestamp), 2);
2015-01-03

Shall we use the same pattern?
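
The Hive output above suggests that date_add keeps only the date part of its first argument, whether that argument is a string or a timestamp, and returns a date. A minimal Python sketch of that behaviour follows. The helper name `date_add` and the assumption that a string always starts with a `YYYY-MM-DD` part are illustrative only; this is not the Hive or Spark implementation.

```python
from datetime import date, datetime, timedelta

def date_add(start, days):
    """Add `days` to `start` and always return a date (hypothetical helper)."""
    if isinstance(start, str):
        # Assumption: the first 10 characters are the 'YYYY-MM-DD' part.
        start = date.fromisoformat(start[:10])
    elif isinstance(start, datetime):
        # Drop the time of day, as the Hive output above does.
        start = start.date()
    return start + timedelta(days=days)

print(date_add("2015-01-01 00:11:22", 2))            # 2015-01-03
print(date_add(datetime(2015, 1, 1, 0, 11, 22), 2))  # 2015-01-03
```

Truncating to a date before adding is what makes the string case unambiguous: the result type no longer depends on what the string happens to contain.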

Contributor

date_add is used for adding days. Where does Hive support timestamp + interval: in + or in another special operator?

Contributor Author

Hive and Oracle use the + sign:
select birthday + interval 3 days from xxx;

Contributor

OK that makes sense. Let's use date for string then.

@SparkQA

SparkQA commented Jul 22, 2015

Test build #38051 has finished for PR 7589 at commit c506661.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class DateAdd(startDate: Expression, days: Expression)
    • case class DateSub(startDate: Expression, days: Expression)

@SparkQA

SparkQA commented Jul 22, 2015

Test build #38078 has finished for PR 7589 at commit 1a68e03.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class DateAdd(startDate: Expression, days: Expression)
    • case class DateSub(startDate: Expression, days: Expression)
    • case class TimeAdd(start: Expression, interval: Expression)
    • case class TimeSub(start: Expression, interval: Expression)

@adrian-wang
Contributor Author

@rxin On second thought, I think we should keep date_add and date_sub as simple as they are. For datetime and IntervalType computation, we need a separate node. Here are my considerations:

  1. Hive 1.2 supports the interval type, but still keeps date_add and date_sub simple.
  2. Using another node keeps the registered functions the same as in most database systems.
  3. For strings, it is hard to decide whether the value is a date or a timestamp, which would be troublesome. In most databases, we write date '2015-01-01' + interval XXXX for this calculation, so the SQL engine knows how to treat the string.

And I have several other TODOs for the simple POC shown here:

  1. We need to be aware of the precision of the interval type. For this calculation, we should return DateType when adding a Date and a Year-Month interval; otherwise we should return TimestampType. For the POC shown here, I just use TimestampType as the general return type. Maybe we need two interval types here.
  2. I need to add support in the parser to translate shiptime + interval XXXX into the corresponding logical node:
     2-1. The left operand of timestamp '2015-01-01 11:22:33' + interval XXXX should be translated into a Cast that casts the string to the corresponding type.
     2-2. Then we translate the calculation into a TimeAdd or a TimeSub.

I think the whole thing deserves another PR.
cc @chenghao-intel
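
The return-type rule in the first TODO can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `add_months` and `add_interval` are hypothetical, the interval is modelled as a (months, microseconds) pair, an interval with no sub-month part stands in for a Year-Month interval, and clamping the day to the end of the month is a choice made here rather than something the thread specifies.

```python
import calendar
from datetime import date, datetime, time, timedelta

def add_months(d, months):
    """Shift a date or datetime by whole months, clamping the day of month."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)          # month0 is zero-based
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return d.replace(year=year, month=month0 + 1, day=min(d.day, last_day))

def add_interval(start, months=0, micros=0):
    """Date + year-month interval stays a date; anything else is a datetime."""
    shifted = add_months(start, months)
    if micros == 0 and not isinstance(start, datetime):
        return shifted                        # the DateType case
    if not isinstance(shifted, datetime):
        # Promote the date to midnight before adding the day-time part.
        shifted = datetime.combine(shifted, time.min)
    return shifted + timedelta(microseconds=micros)

print(add_interval(date(2015, 1, 31), months=1))              # 2015-02-28
print(add_interval(date(2015, 1, 1), micros=3 * 86400 * 10**6))  # 2015-01-04 00:00:00
```

With two distinct interval types, as suggested above, the choice between the two branches could be made from the operand types alone instead of by inspecting the interval's value.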

@SparkQA

SparkQA commented Jul 22, 2015

Test build #1163 has finished for PR 7589 at commit 1a68e03.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class DateAdd(startDate: Expression, days: Expression)
    • case class DateSub(startDate: Expression, days: Expression)
    • case class TimeAdd(start: Expression, interval: Expression)
    • case class TimeSub(start: Expression, interval: Expression)

@rxin
Contributor

rxin commented Jul 22, 2015

Let's cast string to "date", not "timestamp", then.

We should probably also add an expression for date/timestamp + interval. In that case, I don't think we want to do implicit type cast.

Contributor Author

@rxin I did implement the expression for adding an interval to a date/timestamp.

@adrian-wang changed the title from "[SPARK-8186] [SPARK-8187] [SQL] datetime function: date_add, date_sub" to "[SPARK-8186][SPARK-8187][SPARK-8194][SPARK-8198][SQL] functions: date_add, date_sub, add_months, months_between, time-interval calculation" on Jul 24, 2015

@SparkQA

SparkQA commented Jul 24, 2015

Test build #38374 has finished for PR 7589 at commit 87c4b77.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class DateAdd(startDate: Expression, days: Expression)
    • case class DateSub(startDate: Expression, days: Expression)
    • case class TimeAdd(left: Expression, right: Expression)
    • case class TimeSub(left: Expression, right: Expression)
    • case class AddMonths(left: Expression, right: Expression)
    • case class MonthsBetween(left: Expression, right: Expression)

@SparkQA

SparkQA commented Jul 24, 2015

Test build #38376 has finished for PR 7589 at commit 7fcd107.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class DateAdd(startDate: Expression, days: Expression)
    • case class DateSub(startDate: Expression, days: Expression)
    • case class TimeAdd(left: Expression, right: Expression)
    • case class TimeSub(left: Expression, right: Expression)
    • case class AddMonths(left: Expression, right: Expression)
    • case class MonthsBetween(left: Expression, right: Expression)

@SparkQA

SparkQA commented Jul 24, 2015

Test build #38380 has finished for PR 7589 at commit 42df486.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class DateAdd(startDate: Expression, days: Expression)
    • case class DateSub(startDate: Expression, days: Expression)
    • case class TimeAdd(left: Expression, right: Expression)
    • case class TimeSub(left: Expression, right: Expression)
    • case class AddMonths(left: Expression, right: Expression)
    • case class MonthsBetween(left: Expression, right: Expression)

@adrian-wang
Contributor Author

retest this please.

@SparkQA

SparkQA commented Jul 24, 2015

Test build #38391 has finished for PR 7589 at commit e8a639a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class DateAdd(startDate: Expression, days: Expression)
    • case class DateSub(startDate: Expression, days: Expression)
    • case class TimeAdd(left: Expression, right: Expression)
    • case class TimeSub(left: Expression, right: Expression)
    • case class AddMonths(left: Expression, right: Expression)
    • case class MonthsBetween(left: Expression, right: Expression)

@SparkQA

SparkQA commented Jul 24, 2015

Test build #92 has finished for PR 7589 at commit e8a639a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class DateAdd(startDate: Expression, days: Expression)
    • case class DateSub(startDate: Expression, days: Expression)
    • case class TimeAdd(left: Expression, right: Expression)
    • case class TimeSub(left: Expression, right: Expression)
    • case class AddMonths(left: Expression, right: Expression)
    • case class MonthsBetween(left: Expression, right: Expression)

Member

Shall we make the argument names consistent with the comment?

@yjshen
Member

yjshen commented Jul 25, 2015

For months_between:

  1. We should create a new method that gets (year, month, day) from a date in one call, to avoid repeatedly calling getYearAndDayInYear in getMonth/getYear/getDayOfMonth.
  2. When the two dates are neither both the last day of their months nor the same day of the month, we should just use (time1 - time2) / (31.0 * MILLIS_PER_DAY * 1000L) to get the months, with no consideration of time zones at all.
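
For the first point, one way to avoid recomputing the year for each field is to derive all three fields in a single pass. The Python sketch below uses a widely known days-to-civil-date conversion for the proleptic Gregorian calendar. The function name `year_month_day` is hypothetical, and this is not the code from the PR.

```python
from datetime import date, timedelta

def year_month_day(days_since_epoch):
    """Split days since 1970-01-01 into (year, month, day) in one pass."""
    z = days_since_epoch + 719468           # shift the epoch to 0000-03-01
    era, day_of_era = divmod(z, 146097)     # 146097 days per 400-year cycle
    year_of_era = (day_of_era - day_of_era // 1460 + day_of_era // 36524
                   - day_of_era // 146096) // 365
    # Day of the year, counted from 1 March so the leap day comes last.
    day_of_year = day_of_era - (365 * year_of_era + year_of_era // 4
                                - year_of_era // 100)
    mp = (5 * day_of_year + 2) // 153       # month index, 0 = March
    day = day_of_year - (153 * mp + 2) // 5 + 1
    month = mp + 3 if mp < 10 else mp - 9
    year = year_of_era + era * 400 + (1 if month <= 2 else 0)
    return year, month, day

print(year_month_day(0))  # (1970, 1, 1)

# Cross-check one value against the standard library.
d = date(1970, 1, 1) + timedelta(days=16638)
print(year_month_day(16638) == (d.year, d.month, d.day))  # True
```

Callers that need only one field can still take it from the tuple, so getMonth, getYear, and getDayOfMonth style accessors could all share the same computation.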

@adrian-wang
Contributor Author

@yjshen I have checked with Hive 1.2:
hive> select months_between('2014-10-01 00:00:00', '2014-09-16 12:00:00');
0.5

But if we just use (time1 - time2) / (31.0 * MILLIS_PER_DAY * 1000L),
we will get 0.4677419354838...
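
The difference between the two results can be reproduced with a short Python sketch. It assumes the Oracle-style rule that the whole-month difference is taken from the calendar fields and only the remaining day-and-time offset is divided by 31; that assumption is consistent with the Hive output quoted above, but the function below is illustrative and not taken from Hive or from this PR.

```python
import calendar
from datetime import datetime

def months_between(t1, t2):
    """Calendar months between t1 and t2, with the remainder over 31 days."""
    months = (t1.year - t2.year) * 12 + (t1.month - t2.month)
    last1 = calendar.monthrange(t1.year, t1.month)[1]
    last2 = calendar.monthrange(t2.year, t2.month)[1]
    # Same day of month, or both month ends: a whole number of months.
    if t1.day == t2.day or (t1.day == last1 and t2.day == last2):
        return float(months)

    def day_position(t):
        # Day of month plus the fraction of the day that has elapsed.
        seconds = t.hour * 3600 + t.minute * 60 + t.second
        return t.day + seconds / 86400.0

    return months + (day_position(t1) - day_position(t2)) / 31.0

t1 = datetime(2014, 10, 1, 0, 0, 0)
t2 = datetime(2014, 9, 16, 12, 0, 0)
print(months_between(t1, t2))                  # 0.5

# Plain elapsed time divided by a 31-day month, as proposed above.
naive = (t1 - t2).total_seconds() / (31.0 * 86400)
print(round(naive, 4))                         # 0.4677
```

The two values differ because September has 30 days: the calendar-based version counts the step from September to October as one full month, while the elapsed-time version sees only 14.5 days.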

@SparkQA

SparkQA commented Jul 27, 2015

Test build #38513 has finished for PR 7589 at commit 522e91a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class DateAdd(startDate: Expression, days: Expression)
    • case class DateSub(startDate: Expression, days: Expression)
    • case class TimeAdd(left: Expression, right: Expression)
    • case class TimeSub(left: Expression, right: Expression)
    • case class AddMonths(left: Expression, right: Expression)
    • case class MonthsBetween(left: Expression, right: Expression)

Contributor

Update the doc

@davies
Contributor

davies commented Jul 28, 2015

@adrian-wang Thanks for working on this. Since we are getting close to the 1.5 cut and you have several PRs open, would you mind letting me take over this one? (BTW, you will still get the credit.)

@adrian-wang
Contributor Author

@davies You can go ahead. I left several comments here, please take a look before you take over this one, thanks.
