2 changes: 1 addition & 1 deletion docs/sql-migration-guide.md
@@ -113,7 +113,7 @@ license: |

### UDFs and Built-in Functions

- Since Spark 3.0, the `date_add` and `date_sub` functions only accepts int, smallint, tinyint as the 2nd argument, fractional and string types are not valid anymore, e.g. `date_add(cast('1964-05-23' as date), '12.34')` will cause `AnalysisException`. In Spark version 2.4 and earlier, if the 2nd argument is fractional or string value, it will be coerced to int value, and the result will be a date value of `1964-06-04`.
- Since Spark 3.0, the `date_add` and `date_sub` functions only accept int, smallint, tinyint as the 2nd argument; fractional and non-literal string values are not valid anymore, e.g. `date_add(cast('1964-05-23' as date), 12.34)` causes `AnalysisException`. Note that string literals are still allowed, but Spark will throw `AnalysisException` if the string content is not a valid integer. In Spark version 2.4 and earlier, if the 2nd argument is fractional or a string value, it is coerced to an int value, and the result is a date value of `1964-06-04`.
Member:
The new approach looks good.
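
To make the documented behavior concrete, a minimal sketch (assuming a Spark 3.0 spark-shell; the literals are taken from the paragraph above):

scala> sql("select date_add(cast('1964-05-23' as date), 12)").show      // OK: int literal
scala> sql("select date_add(cast('1964-05-23' as date), '12')").show    // OK: string literal with valid integer content
scala> sql("select date_add(cast('1964-05-23' as date), 12.34)").show   // AnalysisException: fractional value
scala> sql("select date_add(cast('1964-05-23' as date), '12.34')").show // AnalysisException: non-integer string content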


- Since Spark 3.0, the function `percentile_approx` and its alias `approx_percentile` only accept integral value with range in `[1, 2147483647]` as its 3rd argument `accuracy`, fractional and string types are disallowed, e.g. `percentile_approx(10.0, 0.2, 1.8D)` will cause `AnalysisException`. In Spark version 2.4 and earlier, if `accuracy` is fractional or string value, it will be coerced to an int value, `percentile_approx(10.0, 0.2, 1.8D)` is operated as `percentile_approx(10.0, 0.2, 1)` which results in `10.0`.
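
The same pattern, sketched for `percentile_approx` under the same assumptions:

scala> sql("select percentile_approx(10.0, 0.2, 100)").show  // OK: integral accuracy in [1, 2147483647]
scala> sql("select percentile_approx(10.0, 0.2, 1.8D)").show // AnalysisException in 3.0; 2.4 coerced 1.8D to 1 and returned 10.0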

6 changes: 3 additions & 3 deletions sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -295,8 +295,8 @@ class Analyzer(
case (CalendarIntervalType, CalendarIntervalType) => a
case (_, CalendarIntervalType) => Cast(TimeAdd(l, r), l.dataType)
case (CalendarIntervalType, _) => Cast(TimeAdd(r, l), r.dataType)
-case (DateType, _) => DateAdd(l, r)
-case (_, DateType) => DateAdd(r, l)
+case (DateType, dt) if dt != StringType => DateAdd(l, r)
+case (dt, DateType) if dt != StringType => DateAdd(r, l)
case _ => a
}
case s @ Subtract(l, r) if s.childrenResolved => (l.dataType, r.dataType) match {
@@ -305,7 +305,7 @@
case (TimestampType, _) => SubtractTimestamps(l, r)
case (_, TimestampType) => SubtractTimestamps(l, r)
case (_, DateType) => SubtractDates(l, r)
-case (DateType, _) => DateSub(l, r)
+case (DateType, dt) if dt != StringType => DateSub(l, r)
case _ => s
}
case m @ Multiply(l, r) if m.childrenResolved => (l.dataType, r.dataType) match {
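The effect of the new `dt != StringType` guards, sketched at the SQL level (assuming a Spark 3.0 spark-shell; the outcomes match the test results further down):

scala> sql("select date'2011-11-11' + 1").show   // still resolves to DateAdd -> 2011-11-12
scala> sql("select date'2001-10-01' - 7").show   // still resolves to DateSub -> 2001-09-24
scala> sql("select date'2011-11-11' + '1'").show
// AnalysisException: cannot resolve 'date_add(DATE '2011-11-11', CAST('1' AS DOUBLE))' ...
// The string operand is promoted to double by the usual string coercion instead of being
// routed to DateAdd as a string, so bare date +/- string keeps failing analysis.
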
30 changes: 30 additions & 0 deletions sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
@@ -23,6 +23,7 @@ import scala.annotation.tailrec
import scala.collection.mutable

import org.apache.spark.internal.Logging
+import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.expressions.aggregate._
import org.apache.spark.sql.catalyst.plans.logical._
@@ -63,6 +64,7 @@ object TypeCoercion {
ImplicitTypeCasts ::
DateTimeOperations ::
WindowFrameCoercion ::
+StringLiteralCoercion ::
Nil

// See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types.
@@ -1043,6 +1045,34 @@
}
}
}

/**
* A special rule to support string literals as the second argument of the date_add/date_sub
* functions, to keep backward compatibility as a temporary workaround.
* TODO(SPARK-28589): implement ANSI type coercion and handle string literals.
*/
object StringLiteralCoercion extends TypeCoercionRule {
@dongjoon-hyun (Member) · Mar 20, 2020:
This causes a behavior difference in arithmetic operations, too. Could you describe the following change in the PR description? The new one looks reasonable to me.

2.4.5 and 3.0.0-preview2:

scala> sql("select (cast('2020-03-28' AS DATE) + '1')").show
org.apache.spark.sql.AnalysisException: cannot resolve '(CAST('2020-03-28' AS DATE) + CAST('1' AS DOUBLE))' due to data type mismatch: differing types in '(CAST('2020-03-28' AS DATE) + CAST('1' AS DOUBLE))' (date and double).; line 1 pos 8;

This PR:

scala> sql("select (cast('2020-03-28' AS DATE) + '1')").show
+-------------------------------------+
|date_add(CAST(2020-03-28 AS DATE), 1)|
+-------------------------------------+
|                           2020-03-29|
+-------------------------------------+

Member:

Did you forget to write a function name in the second query above?

scala> sql("select (cast('2020-03-28' AS DATE) + '1')").show
                 ^^^^

@dongjoon-hyun (Member) · Mar 22, 2020:

No~ Both queries are the same. What I meant is that this is the behavior of this PR; this PR extends the arithmetic expressions, too.

Member:

Oh, I see.

  override protected def coerceTypes(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
    // Skip nodes whose children have not been resolved yet.
    case e if !e.childrenResolved => e
    case DateAdd(l, r) if r.dataType == StringType && r.foldable =>
      val days = try {
        AnsiCast(r, IntegerType).eval().asInstanceOf[Int]
      } catch {
        case e: NumberFormatException => throw new AnalysisException(
          "The second argument of 'date_add' function needs to be an integer.", cause = Some(e))
      }
      DateAdd(l, Literal(days))
    case DateSub(l, r) if r.dataType == StringType && r.foldable =>
      val days = try {
        AnsiCast(r, IntegerType).eval().asInstanceOf[Int]
      } catch {
        case e: NumberFormatException => throw new AnalysisException(
          "The second argument of 'date_sub' function needs to be an integer.", cause = Some(e))
      }
      DateSub(l, Literal(days))
  }
}
}

trait TypeCoercionRule extends Rule[LogicalPlan] with Logging {
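End-to-end, the new rule gives this behavior (a sketch, assuming a Spark 3.0 spark-shell; `v` is the test view `create temp view v as select '1' str` added below, and the error texts are the rule's own messages):

scala> sql("select date_add(date'2011-11-11', '1')").show   // string literal folded via AnsiCast -> 2011-11-12
scala> sql("select date_sub(date'2011-11-11', '1')").show   // -> 2011-11-10
scala> sql("select date_add(date'2011-11-11', '1.2')").show
// AnalysisException: The second argument of 'date_add' function needs to be an integer.
scala> sql("select date_add('2011-11-11', str) from v").show
// AnalysisException: a non-foldable string column is not rescued (r.foldable is false),
// so it still fails the DateAdd type check ("'v.`str`' is of string type").
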
10 changes: 10 additions & 0 deletions sql/core/src/test/resources/sql-tests/inputs/datetime.sql
@@ -58,20 +58,30 @@ select date_add('2011-11-11', 1L);
select date_add('2011-11-11', 1.0);
select date_add('2011-11-11', 1E1);
select date_add('2011-11-11', '1');
select date_add('2011-11-11', '1.2');
select date_add(date'2011-11-11', 1);
select date_add(timestamp'2011-11-11', 1);
select date_sub(date'2011-11-11', 1);
select date_sub(date'2011-11-11', '1');
select date_sub(date'2011-11-11', '1.2');
select date_sub(timestamp'2011-11-11', 1);
select date_sub(null, 1);
select date_sub(date'2011-11-11', null);
select date'2011-11-11' + 1E1;
select date'2011-11-11' + '1';
select null + date '2001-09-28';
select date '2001-09-28' + 7Y;
select 7S + date '2001-09-28';
select date '2001-10-01' - 7;
select date '2001-10-01' - '7';
select date '2001-09-28' + null;
select date '2001-09-28' - null;

-- date add/sub with non-literal string column
create temp view v as select '1' str;
select date_add('2011-11-11', str) from v;
select date_sub('2011-11-11', str) from v;

-- subtract dates
select null - date '2019-10-06';
select date '2001-10-01' - date '2001-09-28';
73 changes: 71 additions & 2 deletions sql/core/src/test/resources/sql-tests/results/datetime.sql.out
@@ -1,5 +1,5 @@
-- Automatically generated by SQLQueryTestSuite
--- Number of queries: 77
+-- Number of queries: 85


@@ -266,10 +266,18 @@ cannot resolve 'date_add(CAST('2011-11-11' AS DATE), 10.0D)' due to data type mi
-- !query
select date_add('2011-11-11', '1')
-- !query schema
struct<date_add(CAST(2011-11-11 AS DATE), 1):date>
-- !query output
2011-11-12


-- !query
select date_add('2011-11-11', '1.2')
-- !query schema
struct<>
-- !query output
org.apache.spark.sql.AnalysisException
-cannot resolve 'date_add(CAST('2011-11-11' AS DATE), '1')' due to data type mismatch: argument 2 requires (int or smallint or tinyint) type, however, ''1'' is of string type.; line 1 pos 7
+The second argument of 'date_add' function needs to be an integer.;


@@ -296,6 +304,23 @@ struct<date_sub(DATE '2011-11-11', 1):date>
2011-11-10


-- !query
select date_sub(date'2011-11-11', '1')
-- !query schema
struct<date_sub(DATE '2011-11-11', 1):date>
-- !query output
2011-11-10


-- !query
select date_sub(date'2011-11-11', '1.2')
-- !query schema
struct<>
-- !query output
org.apache.spark.sql.AnalysisException
The second argument of 'date_sub' function needs to be an integer.;


-- !query
select date_sub(timestamp'2011-11-11', 1)
-- !query schema
@@ -329,6 +354,15 @@ org.apache.spark.sql.AnalysisException
cannot resolve 'date_add(DATE '2011-11-11', 10.0D)' due to data type mismatch: argument 2 requires (int or smallint or tinyint) type, however, '10.0D' is of double type.; line 1 pos 7


-- !query
select date'2011-11-11' + '1'
-- !query schema
struct<>
-- !query output
org.apache.spark.sql.AnalysisException
cannot resolve 'date_add(DATE '2011-11-11', CAST('1' AS DOUBLE))' due to data type mismatch: argument 2 requires (int or smallint or tinyint) type, however, 'CAST('1' AS DOUBLE)' is of double type.; line 1 pos 7


-- !query
select null + date '2001-09-28'
-- !query schema
@@ -361,6 +395,15 @@ struct<date_sub(DATE '2001-10-01', 7):date>
2001-09-24


-- !query
select date '2001-10-01' - '7'
-- !query schema
struct<>
-- !query output
org.apache.spark.sql.AnalysisException
cannot resolve 'date_sub(DATE '2001-10-01', CAST('7' AS DOUBLE))' due to data type mismatch: argument 2 requires (int or smallint or tinyint) type, however, 'CAST('7' AS DOUBLE)' is of double type.; line 1 pos 7


-- !query
select date '2001-09-28' + null
-- !query schema
@@ -377,6 +420,32 @@ struct<date_sub(DATE '2001-09-28', CAST(NULL AS INT)):date>
NULL


-- !query
create temp view v as select '1' str
-- !query schema
struct<>
-- !query output



-- !query
select date_add('2011-11-11', str) from v
-- !query schema
struct<>
-- !query output
org.apache.spark.sql.AnalysisException
cannot resolve 'date_add(CAST('2011-11-11' AS DATE), v.`str`)' due to data type mismatch: argument 2 requires (int or smallint or tinyint) type, however, 'v.`str`' is of string type.; line 1 pos 7


-- !query
select date_sub('2011-11-11', str) from v
-- !query schema
struct<>
-- !query output
org.apache.spark.sql.AnalysisException
cannot resolve 'date_sub(CAST('2011-11-11' AS DATE), v.`str`)' due to data type mismatch: argument 2 requires (int or smallint or tinyint) type, however, 'v.`str`' is of string type.; line 1 pos 7


-- !query
select null - date '2019-10-06'
-- !query schema
6 changes: 3 additions & 3 deletions sql/core/src/test/resources/sql-tests/results/typeCoercion/native/promoteStrings.sql.out
@@ -107,7 +107,7 @@ SELECT '1' + cast('2017-12-11 09:30:00' as date) FROM t
struct<>
-- !query output
org.apache.spark.sql.AnalysisException
-cannot resolve 'date_add(CAST('2017-12-11 09:30:00' AS DATE), '1')' due to data type mismatch: argument 2 requires (int or smallint or tinyint) type, however, ''1'' is of string type.; line 1 pos 7
+cannot resolve 'date_add(CAST('2017-12-11 09:30:00' AS DATE), CAST('1' AS DOUBLE))' due to data type mismatch: argument 2 requires (int or smallint or tinyint) type, however, 'CAST('1' AS DOUBLE)' is of double type.; line 1 pos 7


-- !query
@@ -698,7 +698,7 @@ SELECT cast('2017-12-11 09:30:00' as date) + '1' FROM t
struct<>
-- !query output
org.apache.spark.sql.AnalysisException
-cannot resolve 'date_add(CAST('2017-12-11 09:30:00' AS DATE), '1')' due to data type mismatch: argument 2 requires (int or smallint or tinyint) type, however, ''1'' is of string type.; line 1 pos 7
+cannot resolve 'date_add(CAST('2017-12-11 09:30:00' AS DATE), CAST('1' AS DOUBLE))' due to data type mismatch: argument 2 requires (int or smallint or tinyint) type, however, 'CAST('1' AS DOUBLE)' is of double type.; line 1 pos 7


@@ -790,7 +790,7 @@ SELECT cast('2017-12-11 09:30:00' as date) - '1' FROM t
struct<>
-- !query output
org.apache.spark.sql.AnalysisException
-cannot resolve 'date_sub(CAST('2017-12-11 09:30:00' AS DATE), '1')' due to data type mismatch: argument 2 requires (int or smallint or tinyint) type, however, ''1'' is of string type.; line 1 pos 7
+cannot resolve 'date_sub(CAST('2017-12-11 09:30:00' AS DATE), CAST('1' AS DOUBLE))' due to data type mismatch: argument 2 requires (int or smallint or tinyint) type, however, 'CAST('1' AS DOUBLE)' is of double type.; line 1 pos 7

