-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function #28284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…icrosecond and epoch fields support from extract fucntion
|
Test build #121581 has finished for PR 28284 at commit
|
cloud-fan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. We can add them back later if we find them useful. It's very difficult to remove them if they are released with 3.0
|
Test build #121586 has finished for PR 28284 at commit
|
|
cc @maropu |
| case "DAY" | "D" | "DAYS" => DayOfMonth(source) | ||
| case "DAYOFWEEK" | "DOW" => DayOfWeek(source) | ||
| case "ISODOW" => Add(WeekDay(source), Literal(1)) | ||
| case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unlike the PR title, this PR is adding DAYOFWEEK_ISO newly. Which platform are you referring for DAYOFWEEK_ISO? IIRC, @gatorsmile commented that we should not consider IBM DB i.
isodow is PostgreSQL specific but iso as a suffix is more commonly used across platforms
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @dongjoon-hyun, for historical reasons, we have [dayofweek, dow] implemented for representing a non-ISO day-of-week and a newly added isodow from PostgreSQL for ISO day-of-week. Many other systems only have one week-numbering system support and use either full names or abbreviations. Things in spark become a little bit complicated.
- because of the existence of
isodow, so we need to add iso-prefix todayofweekto make a pair for it too. [dayofweek,isodayofweek,dowandisodow] - because there are rare
iso-prefixed systems and more systems chooseiso-suffixed way, so we may result in [dayofweek,dayofweekiso,dow,dowiso] dayofweekisolooks nice and has use cases in the platforms listed above, e.g. snowflake, butdowisolooks weird and no use cases found.- after a discussion with @cloud-fan, we have both agreed with an underscore before
isomay look much better becauseisodowis new and there is no standard forisokind of things, so this may be good for us to make it simple and clear for end-users if they are well documented too.
Thus, we finally result in [dayofweek, dow] for Non-ISO week-numbering system and [dayofweek_iso, dow_iso] for ISO system
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yaooqinn . Ya. It's clear for that part. Could you enumerate the name of systems which supports DAYOFWEEK_ISO? (except IBM DB i). I'm wondering that specifically.
It's because you wrote like Many other systems in the PR description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically, snowflake use dayofweek_iso except db2. I couldn't find more because most platforms do not have two day-of-week system implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thanks. I understand that Snowflake is considered important. Could you add some of the above comment(#28284 (comment)) to the PR description then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @dongjoon-hyun. PR description updated
|
thanks, merging to master/3.0! |
…lds with meaningful names for extract function ### What changes were proposed in this pull request? Extracting millennium, century, decade, millisecond, microsecond and epoch from datetime is neither ANSI standard nor quite common in modern SQL platforms. Most of the systems listing below does not support these except PostgreSQL and redshift. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions050.htm https://prestodb.io/docs/current/functions/datetime.html https://docs.cloudera.com/documentation/enterprise/5-8-x/topics/impala_datetime_functions.html https://docs.snowflake.com/en/sql-reference/functions-date-time.html#label-supported-date-time-parts https://www.postgresql.org/docs/9.1/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT This PR removes these extract fields support from extract function for date and timestamp values `isoyear` is PostgreSQL specific but `yearofweek` is more commonly used across platforms `isodow` is PostgreSQL specific but `iso` as a suffix is more commonly used across platforms so, `dow_iso` and `dayofweek_iso` is used to replace it. For historical reasons, we have [`dayofweek`, `dow`] implemented for representing a non-ISO day-of-week and a newly added `isodow` from PostgreSQL for ISO day-of-week. Many other systems only have one week-numbering system support and use either full names or abbreviations. Things in spark become a little bit complicated. 1. because of the existence of `isodow`, so we need to add iso-prefix to `dayofweek` to make a pair for it too. [`dayofweek`, `isodayofweek`, `dow` and `isodow`] 2. because there are rare `iso`-prefixed systems and more systems choose `iso`-suffixed way, so we may result in [`dayofweek`, `dayofweekiso`, `dow`, `dowiso`] 3. `dayofweekiso` looks nice and has use cases in the platforms listed above, e.g. snowflake, but `dowiso` looks weird and no use cases found. 4. with a discussion the community,we have agreed with an underscore before `iso` may look much better because `isodow` is new and there is no standard for `iso` kind of things, so this may be good for us to make it simple and clear for end-users if they are well documented too. Thus, we finally result in [`dayofweek`, `dow`] for Non-ISO day-of-week system and [`dayofweek_iso`, `dow_iso`] for ISO system ### Why are the changes needed? Remove some nonstandard and uncommon features as we can add them back if necessary ### Does this PR introduce any user-facing change? NO, we should target this to 3.0.0 and these are added during 3.0.0 ### How was this patch tested? Remove unused tests Closes #28284 from yaooqinn/SPARK-31507. Authored-by: Kent Yao <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit 37d2e03) Signed-off-by: Wenchen Fan <[email protected]>
|
Thank you all! |
What changes were proposed in this pull request?
Extracting millennium, century, decade, millisecond, microsecond and epoch from datetime is neither ANSI standard nor quite common in modern SQL platforms. Most of the systems listing below does not support these except PostgreSQL and redshift.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions050.htm
https://prestodb.io/docs/current/functions/datetime.html
https://docs.cloudera.com/documentation/enterprise/5-8-x/topics/impala_datetime_functions.html
https://docs.snowflake.com/en/sql-reference/functions-date-time.html#label-supported-date-time-parts
https://www.postgresql.org/docs/9.1/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT
This PR removes these extract fields support from extract function for date and timestamp values
isoyearis PostgreSQL specific butyearofweekis more commonly used across platformsisodowis PostgreSQL specific butisoas a suffix is more commonly used across platforms so,dow_isoanddayofweek_isois used to replace it.For historical reasons, we have [
dayofweek,dow] implemented for representing a non-ISO day-of-week and a newly addedisodowfrom PostgreSQL for ISO day-of-week. Many other systems only have one week-numbering system support and use either full names or abbreviations. Things in spark become a little bit complicated.isodow, so we need to add iso-prefix todayofweekto make a pair for it too. [dayofweek,isodayofweek,dowandisodow]iso-prefixed systems and more systems chooseiso-suffixed way, so we may result in [dayofweek,dayofweekiso,dow,dowiso]dayofweekisolooks nice and has use cases in the platforms listed above, e.g. snowflake, butdowisolooks weird and no use cases found.isomay look much better becauseisodowis new and there is no standard forisokind of things, so this may be good for us to make it simple and clear for end-users if they are well documented too.Thus, we finally result in [
dayofweek,dow] for Non-ISO day-of-week system and [dayofweek_iso,dow_iso] for ISO systemWhy are the changes needed?
Remove some nonstandard and uncommon features as we can add them back if necessary
Does this PR introduce any user-facing change?
NO, we should target this to 3.0.0 and these are added during 3.0.0
How was this patch tested?
Remove unused tests