[SPARK-49768][SQL] Provide error conditions for make_date/make_timestamp errors _LEGACY_ERROR_TEMP_2000
#48242
srielau
left a comment
Please compose an appropriate parameterized message taking: unit, upper and lower bound, bad value
Thanks @srielau for the review! Just adjusted the comments :)
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala
srielau
left a comment
Error message needs more information
    },
    "DATE_TIME_FIELD_OUT_OF_BOUNDS" : {
      "message" : [
        "The value '<badValue>' you entered for <unit> is not valid. Please provide a value between <range>. To disable strict validation, you can turn off ANSI mode by setting <ansiConfig> to false."
cc @srielau updated the error message including more parameters such as unit, range and badValue
...catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
sql/core/src/test/resources/sql-tests/results/ansi/date.sql.out
sql/core/src/test/resources/sql-tests/results/ansi/timestamp.sql.out
sql/core/src/test/resources/sql-tests/results/postgreSQL/date.sql.out
srielau
left a comment
Please do not inherit SecondOfMinute etc. Produce your own ranges so we can use '...', which is more readable than the '-', which reads like arithmetic.
Something odd about that Feb 30 error. Clearly that is not an internal error.
@srielau applied the comments. Thanks for the review!
sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
        (unit, formattedRange, badValue)
      case datePattern(badDate) =>
        ("DAY", "1 ... 28/31", badDate)
    }
@itholic Do you expect the message to always match one of the two cases above? What if it does not? I would add a default case.
For example, toInstant can raise:
    if (seconds < MIN_SECOND || seconds > MAX_SECOND) {
        throw new DateTimeException("Instant exceeds minimum or maximum instant");
    }
How does your code handle this?
Good point.
I think DATETIME_FIELD_OUT_OF_BOUNDS is not a very suitable error class for the example case, so let me keep the existing _LEGACY_ERROR_TEMP_2000 for the default case.
We may need to introduce several new error classes to cover all the potential exception cases, along with more test cases, in a separate ticket.
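The default case discussed in this thread can be sketched as follows. This is a minimal illustration, not the PR's actual code: it reuses the two regexes shown in the diff, while the names `DateTimeErrorParsing`, `FieldError`, and `parseFieldError` are hypothetical. Returning `None` stands in for falling back to `_LEGACY_ERROR_TEMP_2000`.

```scala
import scala.util.matching.Regex

object DateTimeErrorParsing {
  // The two regexes as they appear in the diff under review.
  private val valuePattern: Regex =
    "Invalid value for ([A-Za-z]+) \\(valid values (.+)\\): (.+)".r
  private val datePattern: Regex = "Invalid date '[A-Z]+ ([0-9]+)'".r

  // Hypothetical holder for the three message parameters.
  final case class FieldError(unit: String, range: String, badValue: String)

  // Some(...) for the two recognised message shapes, None for anything else.
  def parseFieldError(message: String): Option[FieldError] = message match {
    case valuePattern(unit, range, badValue) =>
      Some(FieldError(unit, range.replace(" - ", " ... "), badValue))
    case datePattern(badDay) =>
      Some(FieldError("DAY", "1 ... 28/31", badDay))
    case _ =>
      None // assumption: the caller falls back to the legacy error here
  }
}
```

Without the final `case _`, a message such as "Instant exceeds minimum or maximum instant" would raise a `scala.MatchError` instead of a Spark error; with it, the call returns `None`.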
MaxGekk
left a comment
Please, add a default case in parsing error messages.
mihailom-db
left a comment
@itholic thanks for taking up this ticket. I left some suggestions, as I feel we need to think carefully about this change so that we do not leave some cases unhandled. It would be hard to fix or improve later without specific requests from users, who might not be fully aware of how our error message system works.
        val formattedRange = range.replace(" - ", " ... ")
        (unit, formattedRange, badValue)
      case datePattern(badDate) =>
        ("DAY", "1 ... 28/31", badDate)
This is a really weird message now. The error we get here now completely disregards leap years. The Java library returns special-case errors for leap years; look at LocalDate.create.
To clarify, I find it weird that we are removing extra information which Java is giving us for free.
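To see what information is at stake, the following sketch collects the messages `java.time` produces for invalid dates. The object and method names are hypothetical, and the exact message text is a JDK implementation detail that may differ between versions, so no expected output is shown for the failing cases. On the JDKs I am aware of, February 29 in a non-leap year gets a message that names the year, while February 30 gets a generic "Invalid date" message.

```scala
import java.time.{DateTimeException, LocalDate}
import scala.util.Try

object LeapYearMessages {
  // Hypothetical helper: the java.time message for an invalid date, if any.
  def failureMessage(year: Int, month: Int, day: Int): Option[String] =
    Try(LocalDate.of(year, month, day)).failed.toOption.collect {
      case e: DateTimeException => e.getMessage
    }

  def main(args: Array[String]): Unit = {
    // 2021 is not a leap year, 2020 is.
    println(failureMessage(2021, 2, 29)) // invalid: Feb 29 in a non-leap year
    println(failureMessage(2021, 2, 30)) // invalid: Feb 30 never exists
    println(failureMessage(2020, 2, 29)) // valid date, so this prints None
  }
}
```

Note that a regex requiring an upper-case month name, like `datePattern` in the diff, would only recognise the leap-year case if the JDK message happened to use that casing and nothing followed the day number.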
      case datePattern(badDate) =>
        ("DAY", "1 ... 28/31", badDate)
      case _ =>
        throw new SparkDateTimeException(
@srielau @gengliangwang let's link issues to the proper tickets and not open duplicate tickets. The reason there is no error condition is that _LEGACY_ERROR_TEMP_#### errors do not show an error condition. There is an existing ticket, https://issues.apache.org/jira/browse/SPARK-49768, which was created to correct this. Also, could you please use the ANSI by default epic in JIRA if the change is related to ANSI, so that we do not create multiple PRs/tickets for the same issue. Also, @itholic, leaving the default branch like this does not solve the issue. We need to assign a proper error condition for all cases, which we are not doing here (for example, if you try February 29th in a non-leap year). The ultimate goal is to remove _LEGACY_ERROR_TEMP_2000. @MaxGekk what do you think? I find it weird to have a method in QueryExecutionErrors throw a completely new error just because of the error message parsing.
    },
    "DATETIME_FIELD_OUT_OF_BOUNDS" : {
      "message" : [
        "The value <badValue> you entered for <unit> is not valid. Please provide a value between <range>. To disable strict validation, you can turn off ANSI mode by setting <ansiConfig> to \"false\"."
Could you please follow the same wording for the ANSI turn-off suggestion as other errors, as it makes it easier to track which error messages are related to ANSI: If necessary set <ansiConfig> to "false" to bypass this error. Removal of this message is related to https://issues.apache.org/jira/browse/SPARK-49642
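For readers unfamiliar with the `<name>` placeholders in these message templates, here is a small sketch of how such parameters could be substituted. It is not Spark's implementation; `MessageTemplate` and `render` are hypothetical names, and the only Spark-specific value used is the `spark.sql.ansi.enabled` config key.

```scala
import scala.util.matching.Regex

object MessageTemplate {
  // Matches placeholders such as <unit> or <ansiConfig>.
  private val placeholder: Regex = "<([A-Za-z]+)>".r

  // Replaces each <name> with params(name); unknown names are left untouched.
  def render(template: String, params: Map[String, String]): String =
    placeholder.replaceAllIn(
      template,
      // quoteReplacement keeps '$' and '\' in parameter values literal.
      m => Regex.quoteReplacement(params.getOrElse(m.group(1), m.matched))
    )
}
```

For example, rendering the suggested sentence `If necessary set <ansiConfig> to "false" to bypass this error.` with `Map("ansiConfig" -> "\"spark.sql.ansi.enabled\"")` fills in the quoted config key and leaves the rest of the sentence unchanged.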
    val valuePattern = "Invalid value for ([A-Za-z]+) \\(valid values (.+)\\): (.+)".r
    val datePattern = "Invalid date '[A-Z]+ ([0-9]+)'".r

    errorMessage match {
@srielau If I understood @MaxGekk correctly, we want to move away from constructing messages at the specific call sites of the error and instead keep all the information in error-conditions.json. This will make it really hard to stay in sync with other APIs, especially as we keep expanding to cover the endless number of exception messages that we can get from the Java library. Also, since this is an external dependency, who is to guarantee that this message format will stay the same when Java is updated?
The guarantee is the tests we write against them. I'm not opposed to having a catch-all for the "unforeseen".
Simply put: We should be able to rip out Java and replace it with C and messages should stay the same.
Agree, but are we really choosing this over giving the user as much information as we can? For example, there is a specific exception that returns a leap-year error, but we will just say the value should be in range. If I got a message like this, to be honest I would be confused as to why it is not working when my value actually is in the 1...28/31 range (February 29th). We could make this change for errors whose behavior we know for sure, but for all others I lean towards keeping the Java message, as it provides sufficient information on why the call is failing, and the end goal should be giving the user enough information to fix the error, in my opinion.
@itholic please update description of the PR to follow the changes added.
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
This PR proposes to provide proper error conditions for _LEGACY_ERROR_TEMP_2000 and improve the error message.
Why are the changes needed?
To provide better user-facing error message
Does this PR introduce any user-facing change?
No API changes, but the user-facing error message will be improved:
Before
After
How was this patch tested?
The existing SQL tests should pass
Was this patch authored or co-authored using generative AI tooling?
No