Skip to content

Commit 67b4329

Browse files
MaxGekkmaropu
andcommitted
[SPARK-28690][SQL] Add date_part function for timestamps/dates
## What changes were proposed in this pull request? In the PR, I propose new function `date_part()`. The function is modeled on the traditional Ingres equivalent to the SQL-standard function `extract`: ``` date_part('field', source) ``` and added for feature parity with PostgreSQL (https://www.postgresql.org/docs/11/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT). The `source` can have `DATE` or `TIMESTAMP` type. Supported string values of `'field'` are: - `millennium` - the current millennium for given date (or a timestamp implicitly casted to a date). For example, years in the 1900s are in the second millennium. The third millennium started _January 1, 2001_. - `century` - the current millennium for given date (or timestamp). The first century starts at 0001-01-01 AD. - `decade` - the current decade for given date (or timestamp). Actually, this is the year field divided by 10. - isoyear` - the ISO 8601 week-numbering year that the date falls in. Each ISO 8601 week-numbering year begins with the Monday of the week containing the 4th of January. - `year`, `month`, `day`, `hour`, `minute`, `second` - `week` - the number of the ISO 8601 week-numbering week of the year. By definition, ISO weeks start on Mondays and the first week of a year contains January 4 of that year. - `quarter` - the quarter of the year (1 - 4) - `dayofweek` - the day of the week for date/timestamp (1 = Sunday, 2 = Monday, ..., 7 = Saturday) - `dow` - the day of the week as Sunday (0) to Saturday (6) - `isodow` - the day of the week as Monday (1) to Sunday (7) - `doy` - the day of the year (1 - 365/366) - `milliseconds` - the seconds field including fractional parts multiplied by 1,000. - `microseconds` - the seconds field including fractional parts multiplied by 1,000,000. - `epoch` - the number of seconds since 1970-01-01 00:00:00 local time in microsecond precision. Here are examples: ```sql spark-sql> select date_part('year', timestamp'2019-08-12 01:00:00.123456'); 2019 spark-sql> select date_part('week', timestamp'2019-08-12 01:00:00.123456'); 33 spark-sql> select date_part('doy', timestamp'2019-08-12 01:00:00.123456'); 224 ``` I changed implementation of `extract` to re-use `date_part()` internally. ## How was this patch tested? Added `date_part.sql` and regenerated results of `extract.sql`. Closes #25410 from MaxGekk/date_part. Lead-authored-by: Maxim Gekk <[email protected]> Co-authored-by: Takeshi Yamamuro <[email protected]> Signed-off-by: Takeshi Yamamuro <[email protected]>
1 parent 905b7f7 commit 67b4329

File tree

9 files changed

+727
-153
lines changed

9 files changed

+727
-153
lines changed

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -417,6 +417,7 @@ object FunctionRegistry {
417417
expression[TimeWindow]("window"),
418418
expression[MakeDate]("make_date"),
419419
expression[MakeTimestamp]("make_timestamp"),
420+
expression[DatePart]("date_part"),
420421

421422
// collection functions
422423
expression[CreateArray]("array"),

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala

Lines changed: 87 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1963,3 +1963,90 @@ case class Epoch(child: Expression, timeZoneId: Option[String] = None)
19631963
defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)")
19641964
}
19651965
}
1966+
1967+
object DatePart {
1968+
1969+
def parseExtractField(
1970+
extractField: String,
1971+
source: Expression,
1972+
errorHandleFunc: => Nothing): Expression = extractField.toUpperCase(Locale.ROOT) match {
1973+
case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source)
1974+
case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source)
1975+
case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source)
1976+
case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source)
1977+
case "ISOYEAR" => IsoYear(source)
1978+
case "QUARTER" | "QTR" => Quarter(source)
1979+
case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source)
1980+
case "WEEK" | "W" | "WEEKS" => WeekOfYear(source)
1981+
case "DAY" | "D" | "DAYS" => DayOfMonth(source)
1982+
case "DAYOFWEEK" => DayOfWeek(source)
1983+
case "DOW" => Subtract(DayOfWeek(source), Literal(1))
1984+
case "ISODOW" => Add(WeekDay(source), Literal(1))
1985+
case "DOY" => DayOfYear(source)
1986+
case "HOUR" | "H" | "HOURS" | "HR" | "HRS" => Hour(source)
1987+
case "MINUTE" | "M" | "MIN" | "MINS" | "MINUTES" => Minute(source)
1988+
case "SECOND" | "S" | "SEC" | "SECONDS" | "SECS" => Second(source)
1989+
case "MILLISECONDS" | "MSEC" | "MSECS" | "MILLISECON" | "MSECONDS" | "MS" =>
1990+
Milliseconds(source)
1991+
case "MICROSECONDS" | "USEC" | "USECS" | "USECONDS" | "MICROSECON" | "US" =>
1992+
Microseconds(source)
1993+
case "EPOCH" => Epoch(source)
1994+
case _ => errorHandleFunc
1995+
}
1996+
}
1997+
1998+
@ExpressionDescription(
1999+
usage = "_FUNC_(field, source) - Extracts a part of the date/timestamp.",
2000+
arguments = """
2001+
Arguments:
2002+
* field - selects which part of the source should be extracted. Supported string values are:
2003+
["MILLENNIUM", ("MILLENNIA", "MIL", "MILS"),
2004+
"CENTURY", ("CENTURIES", "C", "CENT"),
2005+
"DECADE", ("DECADES", "DEC", "DECS"),
2006+
"YEAR", ("Y", "YEARS", "YR", "YRS"),
2007+
"ISOYEAR",
2008+
"QUARTER", ("QTR"),
2009+
"MONTH", ("MON", "MONS", "MONTHS"),
2010+
"WEEK", ("W", "WEEKS"),
2011+
"DAY", ("D", "DAYS"),
2012+
"DAYOFWEEK",
2013+
"DOW",
2014+
"ISODOW",
2015+
"DOY",
2016+
"HOUR", ("H", "HOURS", "HR", "HRS"),
2017+
"MINUTE", ("M", "MIN", "MINS", "MINUTES"),
2018+
"SECOND", ("S", "SEC", "SECONDS", "SECS"),
2019+
"MILLISECONDS", ("MSEC", "MSECS", "MILLISECON", "MSECONDS", "MS"),
2020+
"MICROSECONDS", ("USEC", "USECS", "USECONDS", "MICROSECON", "US"),
2021+
"EPOCH"]
2022+
* source - a date (or timestamp) column from where `field` should be extracted
2023+
""",
2024+
examples = """
2025+
Examples:
2026+
> SELECT _FUNC_('YEAR', TIMESTAMP '2019-08-12 01:00:00.123456');
2027+
2019
2028+
> SELECT _FUNC_('week', timestamp'2019-08-12 01:00:00.123456');
2029+
33
2030+
> SELECT _FUNC_('doy', DATE'2019-08-12');
2031+
224
2032+
""",
2033+
since = "3.0.0")
2034+
case class DatePart(field: Expression, source: Expression, child: Expression)
2035+
extends RuntimeReplaceable {
2036+
2037+
def this(field: Expression, source: Expression) {
2038+
this(field, source, {
2039+
if (!field.foldable) {
2040+
throw new AnalysisException("The field parameter needs to be a foldable string value.")
2041+
}
2042+
val fieldStr = field.eval().asInstanceOf[UTF8String].toString
2043+
DatePart.parseExtractField(fieldStr, source, {
2044+
throw new AnalysisException(s"Literals of type '$fieldStr' are currently not supported.")
2045+
})
2046+
})
2047+
}
2048+
2049+
override def flatArguments: Iterator[Any] = Iterator(field, source)
2050+
override def sql: String = s"$prettyName(${field.sql}, ${source.sql})"
2051+
override def prettyName: String = "date_part"
2052+
}

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

Lines changed: 6 additions & 42 deletions
Original file line numberDiff line numberDiff line change
@@ -1409,48 +1409,12 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
14091409
* Create a Extract expression.
14101410
*/
14111411
override def visitExtract(ctx: ExtractContext): Expression = withOrigin(ctx) {
1412-
ctx.field.getText.toUpperCase(Locale.ROOT) match {
1413-
case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" =>
1414-
Millennium(expression(ctx.source))
1415-
case "CENTURY" | "CENTURIES" | "C" | "CENT" =>
1416-
Century(expression(ctx.source))
1417-
case "DECADE" | "DECADES" | "DEC" | "DECS" =>
1418-
Decade(expression(ctx.source))
1419-
case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" =>
1420-
Year(expression(ctx.source))
1421-
case "ISOYEAR" =>
1422-
IsoYear(expression(ctx.source))
1423-
case "QUARTER" | "QTR" =>
1424-
Quarter(expression(ctx.source))
1425-
case "MONTH" | "MON" | "MONS" | "MONTHS" =>
1426-
Month(expression(ctx.source))
1427-
case "WEEK" | "W" | "WEEKS" =>
1428-
WeekOfYear(expression(ctx.source))
1429-
case "DAY" | "D" | "DAYS" =>
1430-
DayOfMonth(expression(ctx.source))
1431-
case "DAYOFWEEK" =>
1432-
DayOfWeek(expression(ctx.source))
1433-
case "DOW" =>
1434-
Subtract(DayOfWeek(expression(ctx.source)), Literal(1))
1435-
case "ISODOW" =>
1436-
Add(WeekDay(expression(ctx.source)), Literal(1))
1437-
case "DOY" =>
1438-
DayOfYear(expression(ctx.source))
1439-
case "HOUR" | "H" | "HOURS" | "HR" | "HRS" =>
1440-
Hour(expression(ctx.source))
1441-
case "MINUTE" | "M" | "MIN" | "MINS" | "MINUTES" =>
1442-
Minute(expression(ctx.source))
1443-
case "SECOND" | "S" | "SEC" | "SECONDS" | "SECS" =>
1444-
Second(expression(ctx.source))
1445-
case "MILLISECONDS" | "MSEC" | "MSECS" | "MILLISECON" | "MSECONDS" | "MS" =>
1446-
Milliseconds(expression(ctx.source))
1447-
case "MICROSECONDS" | "USEC" | "USECS" | "USECONDS" | "MICROSECON" | "US" =>
1448-
Microseconds(expression(ctx.source))
1449-
case "EPOCH" =>
1450-
Epoch(expression(ctx.source))
1451-
case other =>
1452-
throw new ParseException(s"Literals of type '$other' are currently not supported.", ctx)
1453-
}
1412+
val fieldStr = ctx.field.getText
1413+
val source = expression(ctx.source)
1414+
val extractField = DatePart.parseExtractField(fieldStr, source, {
1415+
throw new ParseException(s"Literals of type '$fieldStr' are currently not supported.", ctx)
1416+
})
1417+
new DatePart(Literal(fieldStr), expression(ctx.source), extractField)
14541418
}
14551419

14561420
/**
Lines changed: 68 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,68 @@
1+
CREATE TEMPORARY VIEW t AS select '2011-05-06 07:08:09.1234567' as c;
2+
3+
select date_part('millennium', c) from t;
4+
select date_part('millennia', c) from t;
5+
select date_part('mil', c) from t;
6+
select date_part('mils', c) from t;
7+
8+
select date_part('century', c) from t;
9+
select date_part('centuries', c) from t;
10+
select date_part('c', c) from t;
11+
select date_part('cent', c) from t;
12+
13+
select date_part('decade', c) from t;
14+
select date_part('decades', c) from t;
15+
select date_part('dec', c) from t;
16+
select date_part('decs', c) from t;
17+
18+
select date_part('year', c) from t;
19+
select date_part('y', c) from t;
20+
select date_part('years', c) from t;
21+
select date_part('yr', c) from t;
22+
select date_part('yrs', c) from t;
23+
24+
select date_part('quarter', c) from t;
25+
select date_part('qtr', c) from t;
26+
27+
select date_part('month', c) from t;
28+
select date_part('mon', c) from t;
29+
select date_part('mons', c) from t;
30+
select date_part('months', c) from t;
31+
32+
select date_part('week', c) from t;
33+
select date_part('w', c) from t;
34+
select date_part('weeks', c) from t;
35+
36+
select date_part('day', c) from t;
37+
select date_part('d', c) from t;
38+
select date_part('days', c) from t;
39+
40+
select date_part('dayofweek', c) from t;
41+
42+
select date_part('dow', c) from t;
43+
44+
select date_part('isodow', c) from t;
45+
46+
select date_part('doy', c) from t;
47+
48+
select date_part('hour', c) from t;
49+
select date_part('h', c) from t;
50+
select date_part('hours', c) from t;
51+
select date_part('hr', c) from t;
52+
select date_part('hrs', c) from t;
53+
54+
select date_part('minute', c) from t;
55+
select date_part('m', c) from t;
56+
select date_part('min', c) from t;
57+
select date_part('mins', c) from t;
58+
select date_part('minutes', c) from t;
59+
60+
select date_part('second', c) from t;
61+
select date_part('s', c) from t;
62+
select date_part('sec', c) from t;
63+
select date_part('seconds', c) from t;
64+
select date_part('secs', c) from t;
65+
66+
select date_part('not_supported', c) from t;
67+
68+
select date_part(c, c) from t;

sql/core/src/test/resources/sql-tests/inputs/pgSQL/timestamp.sql

Lines changed: 15 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -187,22 +187,21 @@ SELECT '' AS date_trunc_week, date_trunc( 'week', timestamp '2004-02-29 15:44:17
187187
-- WHERE d1 BETWEEN timestamp '1902-01-01'
188188
-- AND timestamp '2038-01-01';
189189

190-
-- [SPARK-28420] Date/Time Functions: date_part
191-
-- SELECT '' AS "54", d1 as "timestamp",
192-
-- date_part( 'year', d1) AS year, date_part( 'month', d1) AS month,
193-
-- date_part( 'day', d1) AS day, date_part( 'hour', d1) AS hour,
194-
-- date_part( 'minute', d1) AS minute, date_part( 'second', d1) AS second
195-
-- FROM TIMESTAMP_TBL WHERE d1 BETWEEN '1902-01-01' AND '2038-01-01';
196-
197-
-- SELECT '' AS "54", d1 as "timestamp",
198-
-- date_part( 'quarter', d1) AS quarter, date_part( 'msec', d1) AS msec,
199-
-- date_part( 'usec', d1) AS usec
200-
-- FROM TIMESTAMP_TBL WHERE d1 BETWEEN '1902-01-01' AND '2038-01-01';
201-
202-
-- SELECT '' AS "54", d1 as "timestamp",
203-
-- date_part( 'isoyear', d1) AS isoyear, date_part( 'week', d1) AS week,
204-
-- date_part( 'dow', d1) AS dow
205-
-- FROM TIMESTAMP_TBL WHERE d1 BETWEEN '1902-01-01' AND '2038-01-01';
190+
SELECT '' AS `54`, d1 as `timestamp`,
191+
date_part( 'year', d1) AS `year`, date_part( 'month', d1) AS `month`,
192+
date_part( 'day', d1) AS `day`, date_part( 'hour', d1) AS `hour`,
193+
date_part( 'minute', d1) AS `minute`, date_part( 'second', d1) AS `second`
194+
FROM TIMESTAMP_TBL WHERE d1 BETWEEN '1902-01-01' AND '2038-01-01';
195+
196+
SELECT '' AS `54`, d1 as `timestamp`,
197+
date_part( 'quarter', d1) AS quarter, date_part( 'msec', d1) AS msec,
198+
date_part( 'usec', d1) AS usec
199+
FROM TIMESTAMP_TBL WHERE d1 BETWEEN '1902-01-01' AND '2038-01-01';
200+
201+
SELECT '' AS `54`, d1 as `timestamp`,
202+
date_part( 'isoyear', d1) AS isoyear, date_part( 'week', d1) AS week,
203+
date_part( 'dow', d1) AS dow
204+
FROM TIMESTAMP_TBL WHERE d1 BETWEEN '1902-01-01' AND '2038-01-01';
206205

207206
-- [SPARK-28137] Data Type Formatting Functions
208207
-- TO_CHAR()

0 commit comments

Comments
 (0)