Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Sep 30, 2019

What changes were proposed in this pull request?

The date_part() function can accept the source parameter of the INTERVAL type (CalendarIntervalType). The following values of the field parameter are supported:

  • "MILLENNIUM" ("MILLENNIA", "MIL", "MILS") - number of millenniums in the given interval. It is YEAR / 1000.
  • "CENTURY" ("CENTURIES", "C", "CENT") - number of centuries in the interval calculated as YEAR / 100.
  • "DECADE" ("DECADES", "DEC", "DECS") - decades in the YEAR part of the interval calculated as YEAR / 10.
  • "YEAR" ("Y", "YEARS", "YR", "YRS") - years in a values of CalendarIntervalType. It is MONTHS / 12.
  • "QUARTER" ("QTR") - a quarter of year calculated as MONTHS / 3 + 1
  • "MONTH" ("MON", "MONS", "MONTHS") - the months part of the interval calculated as CalendarInterval.months % 12
  • "DAY" ("D", "DAYS") - total number of days in CalendarInterval.microseconds
  • "HOUR" ("H", "HOURS", "HR", "HRS") - the hour part of the interval.
  • "MINUTE" ("M", "MIN", "MINS", "MINUTES") - the minute part of the interval.
  • "SECOND" ("S", "SEC", "SECONDS", "SECS") - the seconds part with fractional microsecond part.
  • "MILLISECONDS" ("MSEC", "MSECS", "MILLISECON", "MSECONDS", "MS") - the millisecond part of the interval with fractional microsecond part.
  • "MICROSECONDS" ("USEC", "USECS", "USECONDS", "MICROSECON", "US") - the total number of microseconds in the second, millisecond and microsecond parts of the given interval.
  • "EPOCH" - the total number of seconds in the interval including the fractional part with microsecond precision. Here we assume 365.25 days per year (leap year every four years).

For example:

> SELECT date_part('days', interval 1 year 10 months 5 days);
 5
> SELECT date_part('seconds', interval 30 seconds 1 milliseconds 1 microseconds);
 30.001001

Why are the changes needed?

To maintain feature parity with PostgreSQL (https://www.postgresql.org/docs/11/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT)

Does this PR introduce any user-facing change?

No

How was this patch tested?

  • Added new test suite IntervalExpressionsSuite
  • Add new test cases to date_part.sql

@MaxGekk
Copy link
Member Author

MaxGekk commented Oct 12, 2019

jenkins, retest this, please

@SparkQA
Copy link

SparkQA commented Oct 12, 2019

Test build #111946 has finished for PR 25981 at commit e8a61c8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 12, 2019

Test build #111967 has finished for PR 25981 at commit a496d73.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Oct 13, 2019

@cloud-fan @dongjoon-hyun Please, review this PR one more time.

@MaxGekk
Copy link
Member Author

MaxGekk commented Oct 15, 2019

What else can I do here?

@cloud-fan
Copy link
Contributor

There is one question not addressed: https://github.com/apache/spark/pull/25981/files#r332578044

What's the motivation of this PR? If it's to add a new feature, internal consistency is very important. If it's for pgsql compatibility, let's follow pgsql completely and enable it only when dialect=pgsql.

@MaxGekk
Copy link
Member Author

MaxGekk commented Oct 17, 2019

There is one question not addressed: https://github.com/apache/spark/pull/25981/files#r332578044

Github didn't allow me to put my answer under @srowen question, and I had to continue discussion on the main page: #25981 (comment)

What's the motivation of this PR?

The motivation is still the same as I wrote in the PR description: To maintain feature parity with PostgreSQL (https://www.postgresql.org/docs/11/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT)

If it's for pgsql compatibility, let's follow pgsql completely and enable it only when dialect=pgsql.

The DatePart expressions for dates/timestamps was originally added for pgsql compatibility without any flags: #25410 . This PR extends DatePart to support the INTERVAL type for its parameter. I don't think this should be guarded by the dialect flag. If you believe we should hide entire DatePart under flag than it is out of scope of this PR.

@MaxGekk
Copy link
Member Author

MaxGekk commented Oct 17, 2019

Using precise correct constant 365.2425 days per year.

This seems better.

@cloud-fan Should I revert the last 2 commits?

@cloud-fan
Copy link
Contributor

Before I look into the code, let me ask a few high-level questions.

I'm ok with exposing the date_part function in Spark if it makes sense. After reading all the discussions, seems like the behavior of date_part: when extracting unit bigger than second, return an integer of the unit's value from the normalized date/timestamp/interval. when extracting unit smaller than second, return a fraction of the second unit's value from the normalized date/timestamp/interval. If that's how it is implemented, I'm fine.

@MaxGekk
Copy link
Member Author

MaxGekk commented Oct 17, 2019

when extracting unit smaller than second, return a fraction of the second unit's value from the normalized date/timestamp/interval.

Here are examples for units smaller than second in PostgreSQL (my implementation behaves the same):

maxim=# SELECT date_part('milliseconds', interval '10 minutes 30 seconds 1 milliseconds 1 microseconds');
 date_part
-----------
 30001.001
(1 row)

maxim=# SELECT date_part('microseconds', interval '10 minutes 30 seconds 1 milliseconds 1 microseconds');
 date_part
-----------
  30001001
(1 row)

Similar for timestamps:

maxim=# SELECT date_part('milliseconds', timestamp'2019-10-17 11:12:30.001001');
 date_part
-----------
 30001.001
(1 row)

maxim=# SELECT date_part('microseconds', timestamp'2019-10-17 11:12:30.001001');
 date_part
-----------
  30001001
(1 row)

case class ExtractIntervalMilliseconds(child: Expression)
extends ExtractIntervalPart(child, DecimalType(8, 3), getMilliseconds, "getMilliseconds")

case class ExtractIntervalMicroseconds(child: Expression)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not related to this PR, but we can apply the same naming policy to the related date/timestamp functions.

@SparkQA
Copy link

SparkQA commented Oct 18, 2019

Test build #112260 has finished for PR 25981 at commit d4375b5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • abstract class ExtractIntervalPart(
  • case class ExtractIntervalMillenniums(child: Expression)
  • case class ExtractIntervalCenturies(child: Expression)
  • case class ExtractIntervalDecades(child: Expression)
  • case class ExtractIntervalYears(child: Expression)
  • case class ExtractIntervalQuarters(child: Expression)
  • case class ExtractIntervalMonths(child: Expression)
  • case class ExtractIntervalDays(child: Expression)
  • case class ExtractIntervalHours(child: Expression)
  • case class ExtractIntervalMinutes(child: Expression)
  • case class ExtractIntervalSeconds(child: Expression)
  • case class ExtractIntervalMilliseconds(child: Expression)
  • case class ExtractIntervalMicroseconds(child: Expression)
  • case class ExtractIntervalEpoch(child: Expression)

@SparkQA
Copy link

SparkQA commented Oct 18, 2019

Test build #112262 has finished for PR 25981 at commit 5620472.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan cloud-fan closed this in 77fe8a8 Oct 18, 2019
@cloud-fan
Copy link
Contributor

thanks, merging to master!

@MaxGekk
Copy link
Member Author

MaxGekk commented Oct 18, 2019

@cloud-fan Thank you.

@MaxGekk MaxGekk deleted the extract-from-intervals branch June 5, 2020 19:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants