Skip to content

Conversation

@yaooqinn
Copy link
Member

@yaooqinn yaooqinn commented Apr 21, 2020

What changes were proposed in this pull request?

Extracting millennium, century, decade, millisecond, microsecond and epoch from datetime is neither ANSI standard nor quite common in modern SQL platforms. Most of the systems listing below does not support these except PostgreSQL and redshift.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions050.htm

https://prestodb.io/docs/current/functions/datetime.html

https://docs.cloudera.com/documentation/enterprise/5-8-x/topics/impala_datetime_functions.html

https://docs.snowflake.com/en/sql-reference/functions-date-time.html#label-supported-date-time-parts

https://www.postgresql.org/docs/9.1/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT

This PR removes these extract fields support from extract function for date and timestamp values

isoyear is PostgreSQL specific but yearofweek is more commonly used across platforms
isodow is PostgreSQL specific but iso as a suffix is more commonly used across platforms so, dow_iso and dayofweek_iso is used to replace it.

For historical reasons, we have [dayofweek, dow] implemented for representing a non-ISO day-of-week and a newly added isodow from PostgreSQL for ISO day-of-week. Many other systems only have one week-numbering system support and use either full names or abbreviations. Things in spark become a little bit complicated.

  1. because of the existence of isodow, so we need to add iso-prefix to dayofweek to make a pair for it too. [dayofweek, isodayofweek, dow and isodow]
  2. because there are rare iso-prefixed systems and more systems choose iso-suffixed way, so we may result in [dayofweek, dayofweekiso, dow, dowiso]
  3. dayofweekiso looks nice and has use cases in the platforms listed above, e.g. snowflake, but dowiso looks weird and no use cases found.
  4. with a discussion the community,we have agreed with an underscore before iso may look much better because isodow is new and there is no standard for iso kind of things, so this may be good for us to make it simple and clear for end-users if they are well documented too.

Thus, we finally result in [dayofweek, dow] for Non-ISO day-of-week system and [dayofweek_iso, dow_iso] for ISO system

Why are the changes needed?

Remove some nonstandard and uncommon features as we can add them back if necessary

Does this PR introduce any user-facing change?

NO, we should target this to 3.0.0 and these are added during 3.0.0

How was this patch tested?

Remove unused tests

…icrosecond and epoch fields support from extract fucntion
@yaooqinn yaooqinn marked this pull request as draft April 21, 2020 10:36
@SparkQA
Copy link

SparkQA commented Apr 21, 2020

Test build #121581 has finished for PR 28284 at commit 3146b65.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn yaooqinn marked this pull request as ready for review April 21, 2020 13:28
@yaooqinn yaooqinn changed the title [SPARK-31507][SQL] Remove millennium, century, decade, millisecond, m… [SPARK-31507][SQL] Remove millennium, century, decade, millisecond, microsecond and epoch fields support from extract function Apr 21, 2020
@yaooqinn yaooqinn changed the title [SPARK-31507][SQL] Remove millennium, century, decade, millisecond, microsecond and epoch fields support from extract function [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function Apr 21, 2020
Copy link
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. We can add them back later if we find them useful. It's very difficult to remove them if they are released with 3.0

@cloud-fan
Copy link
Contributor

@SparkQA
Copy link

SparkQA commented Apr 21, 2020

Test build #121586 has finished for PR 28284 at commit 521bc2c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class YearOfWeek(child: Expression) extends UnaryExpression with ImplicitCastInputTypes

@dongjoon-hyun
Copy link
Member

cc @maropu

case "DAY" | "D" | "DAYS" => DayOfMonth(source)
case "DAYOFWEEK" | "DOW" => DayOfWeek(source)
case "ISODOW" => Add(WeekDay(source), Literal(1))
case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1))
Copy link
Member

@dongjoon-hyun dongjoon-hyun Apr 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike the PR title, this PR is adding DAYOFWEEK_ISO newly. Which platform are you referring for DAYOFWEEK_ISO? IIRC, @gatorsmile commented that we should not consider IBM DB i.

isodow is PostgreSQL specific but iso as a suffix is more commonly used across platforms

Copy link
Member Author

@yaooqinn yaooqinn Apr 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @dongjoon-hyun, for historical reasons, we have [dayofweek, dow] implemented for representing a non-ISO day-of-week and a newly added isodow from PostgreSQL for ISO day-of-week. Many other systems only have one week-numbering system support and use either full names or abbreviations. Things in spark become a little bit complicated.

  1. because of the existence of isodow, so we need to add iso-prefix to dayofweek to make a pair for it too. [dayofweek, isodayofweek, dow and isodow]
  2. because there are rare iso-prefixed systems and more systems choose iso-suffixed way, so we may result in [dayofweek, dayofweekiso, dow, dowiso]
  3. dayofweekiso looks nice and has use cases in the platforms listed above, e.g. snowflake, but dowiso looks weird and no use cases found.
  4. after a discussion with @cloud-fan, we have both agreed with an underscore before iso may look much better because isodow is new and there is no standard for iso kind of things, so this may be good for us to make it simple and clear for end-users if they are well documented too.

Thus, we finally result in [dayofweek, dow] for Non-ISO week-numbering system and [dayofweek_iso, dow_iso] for ISO system

Copy link
Member

@dongjoon-hyun dongjoon-hyun Apr 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yaooqinn . Ya. It's clear for that part. Could you enumerate the name of systems which supports DAYOFWEEK_ISO? (except IBM DB i). I'm wondering that specifically.

It's because you wrote like Many other systems in the PR description.

Copy link
Member Author

@yaooqinn yaooqinn Apr 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically, snowflake use dayofweek_iso except db2. I couldn't find more because most platforms do not have two day-of-week system implemented.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thanks. I understand that Snowflake is considered important. Could you add some of the above comment(#28284 (comment)) to the PR description then?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @dongjoon-hyun. PR description updated

@cloud-fan
Copy link
Contributor

thanks, merging to master/3.0!

@cloud-fan cloud-fan closed this in 37d2e03 Apr 22, 2020
cloud-fan pushed a commit that referenced this pull request Apr 22, 2020
…lds with meaningful names for extract function

### What changes were proposed in this pull request?

Extracting millennium, century, decade, millisecond, microsecond and epoch from datetime is neither ANSI standard nor quite common in modern SQL platforms. Most of the systems listing below does not support these except PostgreSQL and redshift.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions050.htm

https://prestodb.io/docs/current/functions/datetime.html

https://docs.cloudera.com/documentation/enterprise/5-8-x/topics/impala_datetime_functions.html

https://docs.snowflake.com/en/sql-reference/functions-date-time.html#label-supported-date-time-parts

https://www.postgresql.org/docs/9.1/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT

This PR removes these extract fields support from extract function for date and timestamp values

`isoyear` is PostgreSQL specific but `yearofweek` is more commonly used across platforms
`isodow` is PostgreSQL specific but `iso` as a suffix is more commonly used across platforms so, `dow_iso` and `dayofweek_iso` is used to replace it.

For historical reasons, we have [`dayofweek`, `dow`] implemented for representing a non-ISO day-of-week and a newly added `isodow` from PostgreSQL for ISO day-of-week. Many other systems only have one week-numbering system support and use either full names or abbreviations. Things in spark become a little bit complicated.
1. because of the existence of `isodow`, so we need to add iso-prefix to `dayofweek` to make a pair for it too. [`dayofweek`, `isodayofweek`, `dow` and `isodow`]
2. because there are rare `iso`-prefixed systems and more systems choose `iso`-suffixed way, so we may result in [`dayofweek`, `dayofweekiso`, `dow`, `dowiso`]
3. `dayofweekiso` looks nice and has use cases in the platforms listed above, e.g. snowflake, but `dowiso` looks weird and no use cases found.
4. with a discussion the community,we have agreed with an underscore before `iso` may look much better because `isodow` is new and there is no standard for `iso` kind of things, so this may be good for us to make it simple and clear for end-users if they are well documented too.

Thus, we finally result in [`dayofweek`, `dow`] for Non-ISO day-of-week system and [`dayofweek_iso`, `dow_iso`] for ISO system

### Why are the changes needed?

Remove some nonstandard and uncommon features as we can add them back if necessary

### Does this PR introduce any user-facing change?

NO, we should target this to 3.0.0 and these are added during 3.0.0

### How was this patch tested?

Remove unused tests

Closes #28284 from yaooqinn/SPARK-31507.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 37d2e03)
Signed-off-by: Wenchen Fan <[email protected]>
@dongjoon-hyun
Copy link
Member

Thank you all!

@yaooqinn yaooqinn deleted the SPARK-31507 branch April 22, 2020 16:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants