-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-31830][SQL] Consistent error handling for datetime formatting and parsing functions #28650
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
retest this please |
|
Test build #123160 has finished for PR 28650 at commit
|
|
Test build #123165 has finished for PR 28650 at commit
|
|
retest this please |
|
cc @cloud-fan thanks. |
|
retest this please |
|
Test build #123182 has finished for PR 28650 at commit
|
|
Test build #123213 has finished for PR 28650 at commit
|
| } | ||
| // Test escaping of format | ||
| GenerateUnsafeProjection.generate(FromUnixTime(Literal(0L), Literal("\"quote"), UTC_OPT) :: Nil) | ||
| GenerateUnsafeProjection.generate(FromUnixTime(Literal(0L), Literal("\""), UTC_OPT) :: Nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
quote contains invalid format patterns, removing it does not affect the purpose of this test case.
| checkEvaluation( | ||
| FromUnixTime(Literal(1000L), Literal.create(null, StringType), timeZoneId), | ||
| null) | ||
| checkEvaluation( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests for invalid formats are gathered into this case https://github.com/apache/spark/pull/28650/files#diff-3cae35e69eb5cf9538d1d0947c13d8ebR1165.
| UnixTimestamp(Literal(date1), Literal.create(null, StringType), timeZoneId), | ||
| MICROSECONDS.toSeconds( | ||
| DateTimeUtils.daysToMicros(DateTimeUtils.fromJavaDate(date1), tz.toZoneId))) | ||
| checkEvaluation( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests for invalid formats are gathered into this case https://github.com/apache/spark/pull/28650/files#diff-3cae35e69eb5cf9538d1d0947c13d8ebR1165.
| Literal(date1), Literal.create(null, StringType), timeZoneId), | ||
| MICROSECONDS.toSeconds( | ||
| DateTimeUtils.daysToMicros(DateTimeUtils.fromJavaDate(date1), zid))) | ||
| checkEvaluation( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests for invalid formats are gathered into this case https://github.com/apache/spark/pull/28650/files#diff-3cae35e69eb5cf9538d1d0947c13d8ebR1165.
| select from_csv('26/October/2015', 'time Timestamp', map('timestampFormat', 'dd/MMMMM/yyyy')); | ||
| select from_csv('26/October/2015', 'date Date', map('dateFormat', 'dd/MMMMM/yyyy')); | ||
|
|
||
| select date_format(null, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests were added because datetime function end-2-end tests were missing so far
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can add from_utc_timestamp and to_utc_timestamp here later too
|
Test build #123217 has finished for PR 28650 at commit
|
|
retest this please |
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
Show resolved
Hide resolved
| zoneId = zoneId, | ||
| legacyFormat = SIMPLE_DATE_FORMAT, | ||
| needVarLengthSecondFraction = isParsing) | ||
| formatter.validatePatternString() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this? It's already done in TimestampFormatter.apply AFAIK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TimestampFormatter.apply works for the new Parser. Adding this to fail the legacy formatter here will give us a narrower scope of influence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we call validatePatternString in TimestampFormatter.apply for legacy parser as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me push a commit to verify with jenkins
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks ok through Jenkins
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
Outdated
Show resolved
Hide resolved
|
Test build #123230 has finished for PR 28650 at commit
|
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
Show resolved
Hide resolved
Why we don't throw |
that logic belongs to |
|
So we didn't fail when we construct the formatter with |
|
No the pattern it will do the same as |
|
But maybe we should apply |
|
Test build #123239 has finished for PR 28650 at commit
|
Yea, can you open a PR for it? We should merge that to 3.0. |
|
OK |
|
Since this PR is for master only, let's fix the |
|
Test build #123674 has finished for PR 28650 at commit
|
| @transient lazy val zoneId: ZoneId = DateTimeUtils.getZoneId(timeZoneId.get) | ||
| } | ||
|
|
||
| trait TimestampFormatterHelper extends TimeZoneAwareExpression { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have expressions that create DateFormatter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no datetime functions, only csv/json ones
| ToUnixTimestamp(Literal("1"), Literal(c.toString)), "3.0") | ||
| checkExceptionInExpression[SparkUpgradeException]( | ||
| UnixTimestamp(Literal("1"), Literal(c.toString)), "3.0") | ||
| def checkException[T <: Exception : ClassTag](c: String, onlyParsing: Boolean = false): Unit = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need the onlyParsing flag? we can get it by Seq('E', 'F', 'q', 'Q').contains(c)
| Row(null), Row(null), Row(null), Row(null))) | ||
| val invalid = df1.selectExpr(s"to_unix_timestamp(x, 'yyyy-MM-dd bb:HH:ss')") | ||
| val e = intercept[IllegalArgumentException](invalid.collect()) | ||
| assert(e.getMessage.contains('b')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we check more in the error message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just pick the intersection of Illegal pattern character 'b' and Unknown pattern letter: b
| select to_timestamp("2019 10:10:10", "yyyy hh:mm:ss"); | ||
|
|
||
| -- Unsupported narrow text style | ||
| select date_format(date '2020-05-23', 'GGGGG'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we remove these tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
datetime-formatting-invalid.sql should be enough to cover these cases
|
Test build #123680 has finished for PR 28650 at commit
|
| case class FromUnixTime(sec: Expression, format: Expression, timeZoneId: Option[String] = None) | ||
| extends BinaryExpression with TimeZoneAwareExpression with ImplicitCastInputTypes { | ||
| extends BinaryExpression with TimestampFormatterHelper with ImplicitCastInputTypes | ||
| with NullIntolerant { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we add the NullIntolerant in the base class TimestampFormatterHelper? and it seems better to do it in a new PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy that.
|
Test build #123687 has finished for PR 28650 at commit
|
|
Test build #123689 has finished for PR 28650 at commit
|
|
thanks, merging to master! |
What changes were proposed in this pull request?
Currently,
date_formatandfrom_unixtime,unix_timestamp,to_unix_timestamp,to_timestamp,to_datehave different exception handling behavior for formatting datetime values.In this PR, we apply the exception handling behavior of
date_formattofrom_unixtime,unix_timestamp,to_unix_timestamp,to_timestampandto_date.In the phase of creating the datetime formatted or formating, exceptions will be raised.
e.g.
In the phase of parsing,
DateTimeParseException | DateTimeException | ParseExceptionwill be suppressed, butSparkUpgradeExceptionwill still be raisede.g.
Why are the changes needed?
Consistency
Does this PR introduce any user-facing change?
Yes, invalid datetime patterns will fail
from_unixtime,unix_timestamp,to_unix_timestamp,to_timestampandto_dateinstead of resultingNULLHow was this patch tested?
add more tests