[SPARK-28420][SQL] Support the INTERVAL type in date_part()
#25981
|
jenkins, retest this, please |
|
Test build #111946 has finished for PR 25981 at commit
|
|
Test build #111967 has finished for PR 25981 at commit
|
|
@cloud-fan @dongjoon-hyun Please, review this PR one more time. |
|
What else can I do here? |
|
There is one question not addressed: https://github.com/apache/spark/pull/25981/files#r332578044 What's the motivation of this PR? If it's to add a new feature, internal consistency is very important. If it's for pgsql compatibility, let's follow pgsql completely and enable it only when dialect=pgsql. |
GitHub didn't allow me to put my answer under @srowen's question, so I had to continue the discussion on the main page: #25981 (comment)
The motivation is still the same as I wrote in the PR description: To maintain feature parity with PostgreSQL (https://www.postgresql.org/docs/11/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT)
The |
@cloud-fan Should I revert the last 2 commits? |
|
Before I look into the code, let me ask a few high-level questions. I'm ok with exposing the |
Here are examples for units smaller than a second in PostgreSQL (my implementation behaves the same):
maxim=# SELECT date_part('milliseconds', interval '10 minutes 30 seconds 1 milliseconds 1 microseconds');
 date_part
-----------
 30001.001
(1 row)
maxim=# SELECT date_part('microseconds', interval '10 minutes 30 seconds 1 milliseconds 1 microseconds');
 date_part
-----------
  30001001
(1 row)
Similar for timestamps:
maxim=# SELECT date_part('milliseconds', timestamp'2019-10-17 11:12:30.001001');
 date_part
-----------
 30001.001
(1 row)
maxim=# SELECT date_part('microseconds', timestamp'2019-10-17 11:12:30.001001');
 date_part
-----------
  30001001
(1 row)
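The PostgreSQL results above can be reproduced with a small arithmetic sketch. It assumes that both fields are derived from the interval's total microseconds modulo one minute; the object and method names (SubMinuteParts, milliseconds, microseconds) are hypothetical and are not taken from the Spark code base.

```scala
// Hypothetical helper names; this is an illustration, not Spark's or PostgreSQL's code.
object SubMinuteParts {
  val MicrosPerMilli: Long  = 1000L
  val MicrosPerSecond: Long = 1000L * MicrosPerMilli
  val MicrosPerMinute: Long = 60L * MicrosPerSecond

  // Microseconds left over after removing whole minutes
  // (seconds + milliseconds + microseconds, expressed in microseconds).
  def microseconds(totalMicros: Long): Long = totalMicros % MicrosPerMinute

  // The same remainder expressed in milliseconds with 3 fractional digits.
  def milliseconds(totalMicros: Long): BigDecimal =
    BigDecimal(BigInt(microseconds(totalMicros)), 3)
}

object SubMinutePartsDemo {
  import SubMinuteParts._

  def main(args: Array[String]): Unit = {
    // interval '10 minutes 30 seconds 1 milliseconds 1 microseconds'
    val micros = 10 * MicrosPerMinute + 30 * MicrosPerSecond + 1 * MicrosPerMilli + 1
    println(milliseconds(micros)) // 30001.001
    println(microseconds(micros)) // 30001001
  }
}
```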
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/intervalExpressions.scala
case class ExtractIntervalMilliseconds(child: Expression)
  extends ExtractIntervalPart(child, DecimalType(8, 3), getMilliseconds, "getMilliseconds")

case class ExtractIntervalMicroseconds(child: Expression)
not related to this PR, but we can apply the same naming policy to the related date/timestamp functions.
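As a side note on the DecimalType(8, 3) in the snippet above: a quick check, assuming the milliseconds field is the sub-minute remainder (as the PostgreSQL examples earlier in this thread suggest), shows that the largest possible value needs exactly 8 digits of precision with a scale of 3. The object name MillisPrecisionCheck is hypothetical.

```scala
// Hypothetical check; assumes the milliseconds field never exceeds the
// sub-minute remainder of the interval.
object MillisPrecisionCheck {
  // Largest sub-minute remainder in microseconds: 59 s 999 ms 999 µs.
  val maxRemainderMicros: Long = 60L * 1000L * 1000L - 1L

  // The remainder viewed as milliseconds with 3 fractional digits.
  val maxMillis: BigDecimal = BigDecimal(BigInt(maxRemainderMicros), 3)

  def main(args: Array[String]): Unit = {
    println(maxMillis)           // 59999.999
    println(maxMillis.precision) // 8
    println(maxMillis.scale)     // 3
  }
}
```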
|
Test build #112260 has finished for PR 25981 at commit
|
|
Test build #112262 has finished for PR 25981 at commit
|
|
thanks, merging to master! |
|
@cloud-fan Thank you. |
What changes were proposed in this pull request?
The date_part() function can accept the source parameter of the INTERVAL type (CalendarIntervalType). The following values of the field parameter are supported:
- "MILLENNIUM" ("MILLENNIA", "MIL", "MILS") - number of millennia in the given interval. It is YEAR / 1000.
- "CENTURY" ("CENTURIES", "C", "CENT") - number of centuries in the interval, calculated as YEAR / 100.
- "DECADE" ("DECADES", "DEC", "DECS") - decades in the YEAR part of the interval, calculated as YEAR / 10.
- "YEAR" ("Y", "YEARS", "YR", "YRS") - years in a value of CalendarIntervalType. It is MONTHS / 12.
- "QUARTER" ("QTR") - a quarter of the year, calculated as MONTHS / 3 + 1.
- "MONTH" ("MON", "MONS", "MONTHS") - the months part of the interval, calculated as CalendarInterval.months % 12.
- "DAY" ("D", "DAYS") - total number of days in CalendarInterval.microseconds.
- "HOUR" ("H", "HOURS", "HR", "HRS") - the hour part of the interval.
- "MINUTE" ("M", "MIN", "MINS", "MINUTES") - the minute part of the interval.
- "SECOND" ("S", "SEC", "SECONDS", "SECS") - the seconds part with the fractional microsecond part.
- "MILLISECONDS" ("MSEC", "MSECS", "MILLISECON", "MSECONDS", "MS") - the millisecond part of the interval with the fractional microsecond part.
- "MICROSECONDS" ("USEC", "USECS", "USECONDS", "MICROSECON", "US") - the total number of microseconds in the second, millisecond and microsecond parts of the given interval.
- "EPOCH" - the total number of seconds in the interval, including the fractional part with microsecond precision. Here we assume 365.25 days per year (leap year every four years).
For example:
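A minimal sketch of the field arithmetic listed above, written against a hypothetical SimpleInterval (a month count plus a microsecond count) rather than Spark's CalendarInterval. All names here are illustrative. Two points are assumptions that the description does not state: MONTHS in the QUARTER formula is taken to mean the month part (0 to 11), and EPOCH counts each leftover month as 30 days, which is what PostgreSQL does for intervals.

```scala
// Hypothetical stand-in for an interval: total months plus total microseconds.
final case class SimpleInterval(months: Int, microseconds: Long)

object IntervalFields {
  val MicrosPerSecond: Long = 1000000L
  val MicrosPerMinute: Long = 60L * MicrosPerSecond
  val MicrosPerHour: Long   = 60L * MicrosPerMinute
  val MicrosPerDay: Long    = 24L * MicrosPerHour
  val MicrosPerYear: Long   = MicrosPerDay * 1461L / 4L // 365.25 days per year
  val AssumedDaysPerMonth: Long = 30L                   // assumption, see lead-in

  def years(i: SimpleInterval): Int     = i.months / 12
  def millennia(i: SimpleInterval): Int = years(i) / 1000
  def centuries(i: SimpleInterval): Int = years(i) / 100
  def decades(i: SimpleInterval): Int   = years(i) / 10
  def months(i: SimpleInterval): Int    = i.months % 12
  def quarter(i: SimpleInterval): Int   = months(i) / 3 + 1 // assumes month part

  def days(i: SimpleInterval): Long    = i.microseconds / MicrosPerDay
  def hours(i: SimpleInterval): Long   = (i.microseconds % MicrosPerDay) / MicrosPerHour
  def minutes(i: SimpleInterval): Long = (i.microseconds % MicrosPerHour) / MicrosPerMinute

  // Seconds part with 6 fractional digits (microsecond precision).
  def seconds(i: SimpleInterval): BigDecimal =
    BigDecimal(BigInt(i.microseconds % MicrosPerMinute), 6)

  // Total seconds in the interval, with microsecond precision.
  def epoch(i: SimpleInterval): BigDecimal = {
    val micros = i.microseconds +
      MicrosPerYear * years(i) +
      MicrosPerDay * AssumedDaysPerMonth * months(i)
    BigDecimal(BigInt(micros), 6)
  }
}

object IntervalFieldsDemo {
  import IntervalFields._

  def main(args: Array[String]): Unit = {
    // 2011 years 3 months, plus 1 day 2 hours 3 minutes 4.5 seconds.
    val i = SimpleInterval(2011 * 12 + 3, 93784500000L)
    println(millennia(i)) // 2
    println(centuries(i)) // 20
    println(decades(i))   // 201
    println(quarter(i))   // 2
    println(days(i))      // 1
    println(hours(i))     // 2
    println(minutes(i))   // 3
    println(seconds(i))   // 4.500000
    println(epoch(SimpleInterval(12, 0L))) // 31557600.000000
  }
}
```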
Why are the changes needed?
To maintain feature parity with PostgreSQL (https://www.postgresql.org/docs/11/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT)
Does this PR introduce any user-facing change?
No
How was this patch tested?
IntervalExpressionsSuite
date_part.sql