
Commit 1d0f549

[SPARK-31205][SQL] support string literal as the second argument of date_add/date_sub functions
### What changes were proposed in this pull request?

#26412 introduced a behavior change: the `date_add`/`date_sub` functions no longer accept string and double values as the second parameter. This is reasonable, since casting string/double to int at runtime is error-prone. However, passing string literals as function arguments is very common in SQL databases. To avoid breaking valid use cases where the string literal is indeed an integer, this PR adds an ansi_cast for string literals in the date_add/date_sub functions. If the string value is not a valid integer, the query fails at compile time because of constant folding.

### Why are the changes needed?

To avoid breaking changes.

### Does this PR introduce any user-facing change?

Yes, 3.0 can now run `date_add('2011-11-11', '1')` like 2.4 did.

### How was this patch tested?

New tests.

Closes #27965 from cloud-fan/string.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
1 parent aa3a742 commit 1d0f549
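
For illustration, a hedged sketch of the restored behavior described in the commit message. It is not part of the commit; it assumes a Spark 3.0 build containing this patch and an active `SparkSession` named `spark`, with the expected results taken from the new golden-file tests below.

```scala
import org.apache.spark.sql.AnalysisException

// A string literal that parses as an integer is accepted again, as in Spark 2.4.
// The resolved column name, date_add(CAST(2011-11-11 AS DATE), 1), shows that the
// literal '1' was folded to the int literal 1 during analysis; the result is 2011-11-12.
spark.sql("SELECT date_add('2011-11-11', '1')").show()

// A string literal that is not a valid integer now fails while the query is compiled,
// because the ansi_cast to int is constant-folded at analysis time.
try {
  spark.sql("SELECT date_add('2011-11-11', '1.2')").collect()
} catch {
  case e: AnalysisException =>
    // "The second argument of 'date_add' function needs to be an integer."
    println(e.getMessage)
}
```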

File tree

6 files changed: +118 lines, -9 lines


docs/sql-migration-guide.md

Lines changed: 1 addition & 1 deletion
@@ -113,7 +113,7 @@ license: |
 
 ### UDFs and Built-in Functions
 
-- Since Spark 3.0, the `date_add` and `date_sub` functions only accepts int, smallint, tinyint as the 2nd argument, fractional and string types are not valid anymore, e.g. `date_add(cast('1964-05-23' as date), '12.34')` will cause `AnalysisException`. In Spark version 2.4 and earlier, if the 2nd argument is fractional or string value, it will be coerced to int value, and the result will be a date value of `1964-06-04`.
+- Since Spark 3.0, the `date_add` and `date_sub` functions only accept int, smallint, tinyint as the 2nd argument, fractional and non-literal string are not valid anymore, e.g. `date_add(cast('1964-05-23' as date), 12.34)` will cause `AnalysisException`. Note that, string literals are still allowed, but Spark will throw Analysis Exception if the string content is not a valid integer. In Spark version 2.4 and earlier, if the 2nd argument is fractional or string value, it will be coerced to int value, and the result will be a date value of `1964-06-04`.
 
 - Since Spark 3.0, the function `percentile_approx` and its alias `approx_percentile` only accept integral value with range in `[1, 2147483647]` as its 3rd argument `accuracy`, fractional and string types are disallowed, e.g. `percentile_approx(10.0, 0.2, 1.8D)` will cause `AnalysisException`. In Spark version 2.4 and earlier, if `accuracy` is fractional or string value, it will be coerced to an int value, `percentile_approx(10.0, 0.2, 1.8D)` is operated as `percentile_approx(10.0, 0.2, 1)` which results in `10.0`.
 
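
To make the literal vs. non-literal distinction in the updated migration note concrete, here is a hedged Scala sketch based on the new tests added in this commit (again assuming an active `SparkSession` named `spark`; it is not part of the commit itself).

```scala
// A string *literal* is folded to an int, while a string *column* is still
// rejected at analysis time, matching the new golden-file tests below.
spark.sql("CREATE TEMP VIEW v AS SELECT '1' str")

// Accepted: the literal '1' is constant-folded to the integer 1.
spark.sql("SELECT date_sub(date'2011-11-11', '1')").show() // 2011-11-10

// Rejected: v.str is a non-literal string, so the analyzer reports that
// argument 2 requires (int or smallint or tinyint) type.
// spark.sql("SELECT date_add('2011-11-11', str) FROM v").show()
```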

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

Lines changed: 3 additions & 3 deletions
@@ -295,8 +295,8 @@ class Analyzer(
        case (CalendarIntervalType, CalendarIntervalType) => a
        case (_, CalendarIntervalType) => Cast(TimeAdd(l, r), l.dataType)
        case (CalendarIntervalType, _) => Cast(TimeAdd(r, l), r.dataType)
-       case (DateType, _) => DateAdd(l, r)
-       case (_, DateType) => DateAdd(r, l)
+       case (DateType, dt) if dt != StringType => DateAdd(l, r)
+       case (dt, DateType) if dt != StringType => DateAdd(r, l)
        case _ => a
      }
      case s @ Subtract(l, r) if s.childrenResolved => (l.dataType, r.dataType) match {
@@ -305,7 +305,7 @@
        case (TimestampType, _) => SubtractTimestamps(l, r)
        case (_, TimestampType) => SubtractTimestamps(l, r)
        case (_, DateType) => SubtractDates(l, r)
-       case (DateType, _) => DateSub(l, r)
+       case (DateType, dt) if dt != StringType => DateSub(l, r)
        case _ => s
      }
      case m @ Multiply(l, r) if m.childrenResolved => (l.dataType, r.dataType) match {

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala

Lines changed: 30 additions & 0 deletions
@@ -23,6 +23,7 @@ import scala.annotation.tailrec
 import scala.collection.mutable
 
 import org.apache.spark.internal.Logging
+import org.apache.spark.sql.AnalysisException
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.expressions.aggregate._
 import org.apache.spark.sql.catalyst.plans.logical._
@@ -63,6 +64,7 @@
       ImplicitTypeCasts ::
       DateTimeOperations ::
       WindowFrameCoercion ::
+      StringLiteralCoercion ::
       Nil
 
  // See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types.
@@ -1043,6 +1045,34 @@
       }
     }
   }
+
+  /**
+   * A special rule to support string literal as the second argument of date_add/date_sub functions,
+   * to keep backward compatibility as a temporary workaround.
+   * TODO(SPARK-28589): implement ANSI type type coercion and handle string literals.
+   */
+  object StringLiteralCoercion extends TypeCoercionRule {
+    override protected def coerceTypes(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
+      // Skip nodes who's children have not been resolved yet.
+      case e if !e.childrenResolved => e
+      case DateAdd(l, r) if r.dataType == StringType && r.foldable =>
+        val days = try {
+          AnsiCast(r, IntegerType).eval().asInstanceOf[Int]
+        } catch {
+          case e: NumberFormatException => throw new AnalysisException(
+            "The second argument of 'date_add' function needs to be an integer.", cause = Some(e))
+        }
+        DateAdd(l, Literal(days))
+      case DateSub(l, r) if r.dataType == StringType && r.foldable =>
+        val days = try {
+          AnsiCast(r, IntegerType).eval().asInstanceOf[Int]
+        } catch {
+          case e: NumberFormatException => throw new AnalysisException(
+            "The second argument of 'date_sub' function needs to be an integer.", cause = Some(e))
+        }
+        DateSub(l, Literal(days))
+    }
+  }
 }
 
 trait TypeCoercionRule extends Rule[LogicalPlan] with Logging {
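
One way to observe the effect of the `StringLiteralCoercion` rule above from the outside is to inspect the analyzed plan. A hedged sketch, not part of this commit, assuming an active `SparkSession` named `spark`:

```scala
// The rule runs during analysis, so the analyzed plan should already carry the
// folded integer literal instead of the original string literal.
val df = spark.sql("SELECT date_add('2011-11-11', '1')")

// Expected to print something like: Project [date_add(cast(2011-11-11 as date), 1) ...]
println(df.queryExecution.analyzed)

// The golden-file schema below confirms the folded literal in the column name:
// date_add(CAST(2011-11-11 AS DATE), 1): date
df.printSchema()
```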

sql/core/src/test/resources/sql-tests/inputs/datetime.sql

Lines changed: 10 additions & 0 deletions
@@ -58,20 +58,30 @@ select date_add('2011-11-11', 1L);
 select date_add('2011-11-11', 1.0);
 select date_add('2011-11-11', 1E1);
 select date_add('2011-11-11', '1');
+select date_add('2011-11-11', '1.2');
 select date_add(date'2011-11-11', 1);
 select date_add(timestamp'2011-11-11', 1);
 select date_sub(date'2011-11-11', 1);
+select date_sub(date'2011-11-11', '1');
+select date_sub(date'2011-11-11', '1.2');
 select date_sub(timestamp'2011-11-11', 1);
 select date_sub(null, 1);
 select date_sub(date'2011-11-11', null);
 select date'2011-11-11' + 1E1;
+select date'2011-11-11' + '1';
 select null + date '2001-09-28';
 select date '2001-09-28' + 7Y;
 select 7S + date '2001-09-28';
 select date '2001-10-01' - 7;
+select date '2001-10-01' - '7';
 select date '2001-09-28' + null;
 select date '2001-09-28' - null;
 
+-- date add/sub with non-literal string column
+create temp view v as select '1' str;
+select date_add('2011-11-11', str) from v;
+select date_sub('2011-11-11', str) from v;
+
 -- subtract dates
 select null - date '2019-10-06';
 select date '2001-10-01' - date '2001-09-28';

sql/core/src/test/resources/sql-tests/results/datetime.sql.out

Lines changed: 71 additions & 2 deletions
@@ -1,5 +1,5 @@
 -- Automatically generated by SQLQueryTestSuite
--- Number of queries: 77
+-- Number of queries: 85
 
 
 -- !query
@@ -266,10 +266,18 @@ cannot resolve 'date_add(CAST('2011-11-11' AS DATE), 10.0D)' due to data type mi
 -- !query
 select date_add('2011-11-11', '1')
 -- !query schema
+struct<date_add(CAST(2011-11-11 AS DATE), 1):date>
+-- !query output
+2011-11-12
+
+
+-- !query
+select date_add('2011-11-11', '1.2')
+-- !query schema
 struct<>
 -- !query output
 org.apache.spark.sql.AnalysisException
-cannot resolve 'date_add(CAST('2011-11-11' AS DATE), '1')' due to data type mismatch: argument 2 requires (int or smallint or tinyint) type, however, ''1'' is of string type.; line 1 pos 7
+The second argument of 'date_add' function needs to be an integer.;
 
 
 -- !query
@@ -296,6 +304,23 @@ struct<date_sub(DATE '2011-11-11', 1):date>
 2011-11-10
 
 
+-- !query
+select date_sub(date'2011-11-11', '1')
+-- !query schema
+struct<date_sub(DATE '2011-11-11', 1):date>
+-- !query output
+2011-11-10
+
+
+-- !query
+select date_sub(date'2011-11-11', '1.2')
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisException
+The second argument of 'date_sub' function needs to be an integer.;
+
+
 -- !query
 select date_sub(timestamp'2011-11-11', 1)
 -- !query schema
@@ -329,6 +354,15 @@ org.apache.spark.sql.AnalysisException
 cannot resolve 'date_add(DATE '2011-11-11', 10.0D)' due to data type mismatch: argument 2 requires (int or smallint or tinyint) type, however, '10.0D' is of double type.; line 1 pos 7
 
 
+-- !query
+select date'2011-11-11' + '1'
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisException
+cannot resolve 'date_add(DATE '2011-11-11', CAST('1' AS DOUBLE))' due to data type mismatch: argument 2 requires (int or smallint or tinyint) type, however, 'CAST('1' AS DOUBLE)' is of double type.; line 1 pos 7
+
+
 -- !query
 select null + date '2001-09-28'
 -- !query schema
@@ -361,6 +395,15 @@ struct<date_sub(DATE '2001-10-01', 7):date>
 2001-09-24
 
 
+-- !query
+select date '2001-10-01' - '7'
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisException
+cannot resolve 'date_sub(DATE '2001-10-01', CAST('7' AS DOUBLE))' due to data type mismatch: argument 2 requires (int or smallint or tinyint) type, however, 'CAST('7' AS DOUBLE)' is of double type.; line 1 pos 7
+
+
 -- !query
 select date '2001-09-28' + null
 -- !query schema
@@ -377,6 +420,32 @@ struct<date_sub(DATE '2001-09-28', CAST(NULL AS INT)):date>
 NULL
 
 
+-- !query
+create temp view v as select '1' str
+-- !query schema
+struct<>
+-- !query output
+
+
+
+-- !query
+select date_add('2011-11-11', str) from v
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisException
+cannot resolve 'date_add(CAST('2011-11-11' AS DATE), v.`str`)' due to data type mismatch: argument 2 requires (int or smallint or tinyint) type, however, 'v.`str`' is of string type.; line 1 pos 7
+
+
+-- !query
+select date_sub('2011-11-11', str) from v
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisException
+cannot resolve 'date_sub(CAST('2011-11-11' AS DATE), v.`str`)' due to data type mismatch: argument 2 requires (int or smallint or tinyint) type, however, 'v.`str`' is of string type.; line 1 pos 7
+
+
 -- !query
 select null - date '2019-10-06'
 -- !query schema

sql/core/src/test/resources/sql-tests/results/typeCoercion/native/promoteStrings.sql.out

Lines changed: 3 additions & 3 deletions
@@ -107,7 +107,7 @@ SELECT '1' + cast('2017-12-11 09:30:00' as date) FROM t
 struct<>
 -- !query output
 org.apache.spark.sql.AnalysisException
-cannot resolve 'date_add(CAST('2017-12-11 09:30:00' AS DATE), '1')' due to data type mismatch: argument 2 requires (int or smallint or tinyint) type, however, ''1'' is of string type.; line 1 pos 7
+cannot resolve 'date_add(CAST('2017-12-11 09:30:00' AS DATE), CAST('1' AS DOUBLE))' due to data type mismatch: argument 2 requires (int or smallint or tinyint) type, however, 'CAST('1' AS DOUBLE)' is of double type.; line 1 pos 7
 
 
 -- !query
@@ -698,7 +698,7 @@ SELECT cast('2017-12-11 09:30:00' as date) + '1' FROM t
 struct<>
 -- !query output
 org.apache.spark.sql.AnalysisException
-cannot resolve 'date_add(CAST('2017-12-11 09:30:00' AS DATE), '1')' due to data type mismatch: argument 2 requires (int or smallint or tinyint) type, however, ''1'' is of string type.; line 1 pos 7
+cannot resolve 'date_add(CAST('2017-12-11 09:30:00' AS DATE), CAST('1' AS DOUBLE))' due to data type mismatch: argument 2 requires (int or smallint or tinyint) type, however, 'CAST('1' AS DOUBLE)' is of double type.; line 1 pos 7
 
 
 -- !query
@@ -790,7 +790,7 @@ SELECT cast('2017-12-11 09:30:00' as date) - '1' FROM t
 struct<>
 -- !query output
 org.apache.spark.sql.AnalysisException
-cannot resolve 'date_sub(CAST('2017-12-11 09:30:00' AS DATE), '1')' due to data type mismatch: argument 2 requires (int or smallint or tinyint) type, however, ''1'' is of string type.; line 1 pos 7
+cannot resolve 'date_sub(CAST('2017-12-11 09:30:00' AS DATE), CAST('1' AS DOUBLE))' due to data type mismatch: argument 2 requires (int or smallint or tinyint) type, however, 'CAST('1' AS DOUBLE)' is of double type.; line 1 pos 7
 
 
 -- !query

0 commit comments
