@@ -1408,6 +1408,12 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
       DayOfMonth(expression(ctx.source))
     case "DAYOFWEEK" =>
       DayOfWeek(expression(ctx.source))
+    case "DOW" =>
+      Subtract(DayOfWeek(expression(ctx.source)), Literal(1))
Contributor commented:

It's very confusing that DOW returns different results from DAYOFWEEK. DOW should just be an abbreviation of DAYOFWEEK. See also https://docs.snowflake.com/en/sql-reference/functions-date-time.html#label-supported-date-time-parts

Do we have a clear document about what these names mean? cc @HyukjinKwon @maropu @wangyum
Member Author commented:

Here we just follow the PostgreSQL definition of DOW, see:

    dow
    The day of the week as Sunday (0) to Saturday (6)

    SELECT EXTRACT(DOW FROM TIMESTAMP '2001-02-16 20:38:40');
    Result: 5

Do you propose to break compatibility with PostgreSQL?

Contributor commented:

We never promise pgsql compatibility, and internal consistency is much more important for Spark.

For this case, I think the behavior of DOW from pgsql is correct and commonly used by other databases. We should fix our DAYOFWEEK instead.

Member commented:

> Do we have a clear document about what these names mean? cc @HyukjinKwon @maropu @wangyum

Currently, we just list these names in the ExpressionDescription of DatePart: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala#L2124
If we add an ExpressionDescription for EXTRACT, the list will appear for EXTRACT too, in the Built-in Functions document for Spark 3.0. If we need to describe these parameters in more detail, I think we need to update the argument section there.

Member (@dongjoon-hyun) commented on Apr 17, 2020:

BTW, @cloud-fan, DAYOFWEEK was added two years ago (#21479).

> We should fix our DAYOFWEEK instead.

cc @marmbrus and @gatorsmile since the request might mean a behavior change.
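As an editorial aside, the three numbering conventions at issue in this thread can be sketched with plain java.time. The object and method names below are hypothetical illustrations, not Spark's implementation (Spark's lives in datetimeExpressions.scala):

```scala
import java.time.LocalDate
import java.time.temporal.ChronoField

object DayOfWeekConventions {
  // Spark's DAYOFWEEK: Sunday = 1 .. Saturday = 7
  def dayOfWeek(d: LocalDate): Int = d.get(ChronoField.DAY_OF_WEEK) % 7 + 1

  // PostgreSQL's DOW: Sunday = 0 .. Saturday = 6, i.e. dayOfWeek - 1
  def dow(d: LocalDate): Int = dayOfWeek(d) - 1

  // ISODOW: Monday = 1 .. Sunday = 7 (ISO-8601, the numbering
  // ChronoField.DAY_OF_WEEK already uses)
  def isodow(d: LocalDate): Int = d.get(ChronoField.DAY_OF_WEEK)

  def main(args: Array[String]): Unit = {
    val d = LocalDate.of(2001, 2, 16) // a Friday, the PostgreSQL doc example
    println(s"DAYOFWEEK = ${dayOfWeek(d)}") // 6
    println(s"DOW       = ${dow(d)}")       // 5
    println(s"ISODOW    = ${isodow(d)}")    // 5
  }
}
```

For the PostgreSQL example date above, DOW is 5 while DAYOFWEEK is 6, which is exactly the off-by-one difference the thread is debating.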

+    case "ISODOW" =>
+      Add(WeekDay(expression(ctx.source)), Literal(1))
+    case "DOY" =>
+      DayOfYear(expression(ctx.source))
     case "HOUR" =>
       Hour(expression(ctx.source))
     case "MINUTE" =>
sql/core/src/test/resources/sql-tests/inputs/extract.sql (6 additions, 0 deletions)
@@ -12,6 +12,12 @@ select extract(day from c) from t;

 select extract(dayofweek from c) from t;

+select extract(dow from c) from t;
+
+select extract(isodow from c) from t;
+
+select extract(doy from c) from t;
+
 select extract(hour from c) from t;

 select extract(minute from c) from t;
sql/core/src/test/resources/sql-tests/results/extract.sql.out (36 additions, 12 deletions)
@@ -1,5 +1,5 @@
 -- Automatically generated by SQLQueryTestSuite
--- Number of queries: 11
+-- Number of queries: 14


 -- !query 0
@@ -59,34 +59,58 @@ struct<dayofweek(CAST(c AS DATE)):int>


 -- !query 7
-select extract(hour from c) from t
+select extract(dow from c) from t
 -- !query 7 schema
-struct<hour(CAST(c AS TIMESTAMP)):int>
+struct<(dayofweek(CAST(c AS DATE)) - 1):int>
 -- !query 7 output
-7
+5


 -- !query 8
-select extract(minute from c) from t
+select extract(isodow from c) from t
 -- !query 8 schema
-struct<minute(CAST(c AS TIMESTAMP)):int>
+struct<(weekday(CAST(c AS DATE)) + 1):int>
 -- !query 8 output
-8
+5


 -- !query 9
-select extract(second from c) from t
+select extract(doy from c) from t
 -- !query 9 schema
-struct<second(CAST(c AS TIMESTAMP)):int>
+struct<dayofyear(CAST(c AS DATE)):int>
 -- !query 9 output
-9
+126


 -- !query 10
-select extract(not_supported from c) from t
+select extract(hour from c) from t
 -- !query 10 schema
-struct<>
+struct<hour(CAST(c AS TIMESTAMP)):int>
 -- !query 10 output
+7
+
+
+-- !query 11
+select extract(minute from c) from t
+-- !query 11 schema
+struct<minute(CAST(c AS TIMESTAMP)):int>
+-- !query 11 output
+8
+
+
+-- !query 12
+select extract(second from c) from t
+-- !query 12 schema
+struct<second(CAST(c AS TIMESTAMP)):int>
+-- !query 12 output
+9
+
+
+-- !query 13
+select extract(not_supported from c) from t
+-- !query 13 schema
+struct<>
+-- !query 13 output
 org.apache.spark.sql.catalyst.parser.ParseException

 Literals of type 'NOT_SUPPORTED' are currently not supported.(line 1, pos 7)