
@itholic (Contributor) commented Sep 25, 2024

What changes were proposed in this pull request?

This PR proposes to provide a proper error condition for _LEGACY_ERROR_TEMP_2000 and to improve the error message.

Why are the changes needed?

To provide a better user-facing error message.

Does this PR introduce any user-facing change?

No API changes, but the user-facing error message will be improved:

Before

>>> spark.sql("SELECT make_date(2024, 13, 1);").show()
Invalid value for MonthOfYear (valid values 1 - 12): 13. If necessary set "spark.sql.ansi.enabled" to false to bypass this error.

After

>>> spark.sql("SELECT make_date(2024, 13, 1);").show()
[DATE_TIME_FIELD_OUT_OF_BOUNDS] Invalid value for datetime field. If necessary set "spark.sql.ansi.enabled" to false to bypass this error. SQLSTATE: 22008

How was this patch tested?

The existing SQL tests should pass

Was this patch authored or co-authored using generative AI tooling?

No

github-actions bot added the SQL label Sep 25, 2024

@itholic (Contributor Author) commented Sep 25, 2024

cc @srielau @cloud-fan

@srielau (Contributor) left a comment

Please compose an appropriate parameterized message that takes the unit, the upper and lower bounds, and the bad value.

@itholic (Contributor Author) commented Sep 26, 2024

Thanks @srielau for the review! Just adjusted the comments :)

@srielau (Contributor) left a comment

The error message needs more information.

},
"DATE_TIME_FIELD_OUT_OF_BOUNDS" : {
"message" : [
"The value '<badValue>' you entered for <unit> is not valid. Please provide a value between <range>. To disable strict validation, you can turn off ANSI mode by setting <ansiConfig> to false."
Contributor Author:

cc @srielau: updated the error message to include more parameters, such as unit, range, and badValue.
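
For illustration only, here is a minimal Java sketch of how the placeholders in the template above could be filled. The `render` helper, the class name, and the sample values are hypothetical and are not Spark's error framework; only the template text comes from the diff.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MessageTemplateSketch {
    // Template text copied from the diff above.
    static final String TEMPLATE =
        "The value '<badValue>' you entered for <unit> is not valid. "
        + "Please provide a value between <range>. To disable strict validation, "
        + "you can turn off ANSI mode by setting <ansiConfig> to false.";

    // Hypothetical helper: replaces each <name> placeholder with its value.
    static String render(String template, Map<String, String> params) {
        String out = template;
        for (Map.Entry<String, String> e : params.entrySet()) {
            out = out.replace("<" + e.getKey() + ">", e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample values are assumptions chosen to mirror the make_date example.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("badValue", "13");
        params.put("unit", "MONTH");
        params.put("range", "1 ... 12");
        params.put("ansiConfig", "\"spark.sql.ansi.enabled\"");
        System.out.println(render(TEMPLATE, params));
    }
}
```

Keeping the template in one place and passing only parameters is what lets the wording change without touching the call sites.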

@srielau (Contributor) left a comment

Please do not inherit SecondOfMinute etc. Produce your own ranges so we can use '...', which is more readable than the '-', which reads like arithmetic.
Something is odd about that Feb 30 error. Clearly that is not an internal error.

@itholic (Contributor Author) commented Oct 3, 2024

> Please do not inherit SecondOfMinute etc. Produce your own ranges so we can use '...', which is more readable than the '-', which reads like arithmetic.
> Something is odd about that Feb 30 error. Clearly that is not an internal error.

@srielau applied the comments. Thanks for the review!

(unit, formattedRange, badValue)
case datePattern(badDate) =>
("DAY", "1 ... 28/31", badDate)
}
Member:

@itholic Do you expect the message to always match one of the two cases above? What if it does not? I would add a default case.

Member:

For example, toInstant can raise:

        if (seconds < MIN_SECOND || seconds > MAX_SECOND) {
            throw new DateTimeException("Instant exceeds minimum or maximum instant");
        }

How does your code handle this?

Contributor Author:

Good point.

I think DATETIME_FIELD_OUT_OF_BOUNDS is not a suitable error class for the example case, so let me keep the existing _LEGACY_ERROR_TEMP_2000 for the default case.

We may need to introduce several new error classes to cover all the potential exception cases, along with more test cases, in a separate ticket.
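
To make the default-case discussion concrete, here is a minimal Java sketch (not the PR's Scala code). It applies the two regular expressions from the PR diff as whole-string matches and returns an empty result for anything unrecognized, which is where a caller would fall back to the legacy error. The class and method names are hypothetical, and the unit is passed through exactly as it appears in the Java message.

```java
import java.time.DateTimeException;
import java.time.Instant;
import java.time.LocalDate;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DateTimeMessageParserSketch {
    // Patterns as they appear in the PR diff.
    static final Pattern VALUE_PATTERN =
        Pattern.compile("Invalid value for ([A-Za-z]+) \\(valid values (.+)\\): (.+)");
    static final Pattern DATE_PATTERN =
        Pattern.compile("Invalid date '[A-Z]+ ([0-9]+)'");

    // Returns {unit, range, badValue}, or empty when the message is not recognized.
    static Optional<String[]> parse(String message) {
        Matcher value = VALUE_PATTERN.matcher(message);
        if (value.matches()) {
            String range = value.group(2).replace(" - ", " ... ");
            return Optional.of(new String[] {value.group(1), range, value.group(3)});
        }
        Matcher date = DATE_PATTERN.matcher(message);
        if (date.matches()) {
            return Optional.of(new String[] {"DAY", "1 ... 28/31", date.group(1)});
        }
        return Optional.empty(); // default case: caller keeps the legacy error
    }

    // Runs the action and returns the DateTimeException message, or "" if none is thrown.
    static String messageOf(Runnable action) {
        try {
            action.run();
            return "";
        } catch (DateTimeException e) {
            return String.valueOf(e.getMessage());
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            messageOf(() -> LocalDate.of(2024, 13, 1)),
            messageOf(() -> LocalDate.of(2023, 2, 30)),
            messageOf(() -> Instant.ofEpochSecond(Long.MAX_VALUE))
        };
        for (String s : samples) {
            String result = parse(s)
                .map(p -> String.join(", ", p))
                .orElse("unrecognized, keep legacy error");
            System.out.println(s + " => " + result);
        }
    }
}
```

The third sample is the "Instant exceeds minimum or maximum instant" case quoted above; it has no unit, range, or bad value to extract, so it lands in the default branch.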

@MaxGekk (Member) left a comment

Please add a default case when parsing error messages.

@mihailom-db (Contributor) left a comment

@itholic thanks for taking up this ticket. I left some suggestions, as I feel we need to think carefully about this change so that no cases are left unhandled. It would be hard to fix or improve this later without specific requests from users, who might not be fully aware of how our error message system works.

val formattedRange = range.replace(" - ", " ... ")
(unit, formattedRange, badValue)
case datePattern(badDate) =>
("DAY", "1 ... 28/31", badDate)
Contributor:

This is a really weird message now. The error we get here now takes no account of leap years. The Java library returns special-case errors for leap years; look at LocalDate.create.

Contributor:

To clarify, I find it weird that we are removing extra information that Java is giving us for free.
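
The difference can be checked directly against java.time. The sketch below (Java, hypothetical class and method names) captures the messages for February 30 and for February 29 in a non-leap year, then tests each against the date pattern from the PR diff as a whole-string match. The exact wording of the messages is JDK behavior and should be verified on the target JDK.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.regex.Pattern;

final class LeapYearMessageSketch {
    // Date pattern as it appears in the PR diff.
    static final Pattern DATE_PATTERN =
        Pattern.compile("Invalid date '[A-Z]+ ([0-9]+)'");

    // Returns the DateTimeException message, or null when the date is valid.
    static String invalidDateMessage(int year, int month, int day) {
        try {
            LocalDate.of(year, month, day);
            return null;
        } catch (DateTimeException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String feb30 = invalidDateMessage(2023, 2, 30);
        String feb29 = invalidDateMessage(2023, 2, 29); // 2023 is not a leap year
        System.out.println(feb30 + " -> matches: " + DATE_PATTERN.matcher(feb30).matches());
        System.out.println(feb29 + " -> matches: " + DATE_PATTERN.matcher(feb29).matches());
    }
}
```

If the February 29 message does not match, it falls through to whatever default branch exists, so the leap-year detail is only preserved when that branch keeps the original Java text.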

case datePattern(badDate) =>
("DAY", "1 ... 28/31", badDate)
case _ =>
throw new SparkDateTimeException(
Contributor:

@srielau @gengliangwang let's link issues to the proper tickets and not open duplicate tickets. The reason for not having an error condition is that _LEGACY_ERROR_TEMP_#### errors do not show an error condition. There is an existing ticket, https://issues.apache.org/jira/browse/SPARK-49768, which was created to correct this. Also, could you please use the epic ANSI by default in JIRA if the change is related to ANSI, as this will keep us from creating multiple PRs/tickets for the same issues.

Also, @itholic, leaving the default branch like this does not solve the issue. We need to make sure we assign a proper error condition for all cases, which in this case we are not doing (an example is February 29th in a non-leap year). The ultimate goal is to remove _LEGACY_ERROR_TEMP_2000. @MaxGekk what do you think? I find it weird to have a method in QueryExecutionErrors throw a completely new error just because of the error message parsing.

},
"DATETIME_FIELD_OUT_OF_BOUNDS" : {
"message" : [
"The value <badValue> you entered for <unit> is not valid. Please provide a value between <range>. To disable strict validation, you can turn off ANSI mode by setting <ansiConfig> to \"false\"."
Contributor:

Could you please keep the existing wording for the ANSI turn-off suggestion, as it makes it easier to track which error messages are related to ANSI: If necessary set <ansiConfig> to "false" to bypass this error. Removal of this message is related to https://issues.apache.org/jira/browse/SPARK-49642

val valuePattern = "Invalid value for ([A-Za-z]+) \\(valid values (.+)\\): (.+)".r
val datePattern = "Invalid date '[A-Z]+ ([0-9]+)'".r

errorMessage match {
Contributor:

@srielau If I understood @MaxGekk correctly, we want to move away from building the message at the specific call sites of the error and instead keep all the information in error-conditions.json. This will make it really hard to stay in sync with other APIs, especially as we keep expanding over the endless number of exception messages we can get from the Java library. Also, since this is an external dependency, who is to guarantee that this message format will stay the same when Java updates?

Contributor:

The guarantee is the tests we write against them. I'm not opposed to having a catch-all for the "unforeseen".
Simply put: we should be able to rip out Java and replace it with C, and the messages should stay the same.
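
A test of the kind described here could pin the Java message format so that drift is caught early. This is a minimal Java sketch with hypothetical names, not a Spark test: it triggers a few out-of-range values and reports any whose message no longer matches the value pattern from the PR diff.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;
import java.util.regex.Pattern;

final class MessageFormatPinSketch {
    // Value pattern as it appears in the PR diff.
    static final Pattern VALUE_PATTERN =
        Pattern.compile("Invalid value for ([A-Za-z]+) \\(valid values (.+)\\): (.+)");

    // Sample calls that are expected to throw for an out-of-range field.
    static Map<String, Supplier<?>> cases() {
        Map<String, Supplier<?>> cases = new LinkedHashMap<>();
        cases.put("month", () -> LocalDate.of(2024, 13, 1));
        cases.put("hour", () -> LocalTime.of(24, 0));
        cases.put("second", () -> LocalTime.of(0, 0, 60));
        return cases;
    }

    // Returns the names of cases whose message does not match the pattern.
    static List<String> drifted(Map<String, Supplier<?>> cases) {
        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, Supplier<?>> e : cases.entrySet()) {
            String message;
            try {
                e.getValue().get();
                message = ""; // no exception at all also counts as drift
            } catch (DateTimeException ex) {
                message = String.valueOf(ex.getMessage());
            }
            if (!VALUE_PATTERN.matcher(message).matches()) {
                failures.add(e.getKey());
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("Cases with unexpected format: " + drifted(cases()));
    }
}
```

A check like this only covers the cases someone thought to list, which is why the catch-all for unforeseen messages still matters.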

Contributor:

Agree, but are we really choosing this over giving the user as much information as we can? For example, there is a specific exception that returns a leap-year error, but we would just say the value should be in range. If I got a message like this, tbh I would be confused about why it is not working when my value actually is in the 1 ... 28/31 range (February 29th). We could make this change for errors whose behavior we know for sure, but for all others I lean toward keeping the Java message, as it provides sufficient information on why the call fails. The end goal should be to give the user enough information to fix the error, imo.

@mihailom-db (Contributor) commented

@itholic please update the PR description to reflect the changes added.

@github-actions commented

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
