[SPARK-39419][SQL] Fix ArraySort to throw an exception when the comparator returns null #36812
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
HyukjinKwon
left a comment
LGTM otherwise. two questions
      argument: Expression,
    - function: Expression)
    + function: Expression,
    + handleComparisonResultNullAsZero: Boolean)
How about failOnNullComparisonResult?
        .createWithDefault(false)

      val LEGACY_ARRAY_SORT_FAILS_ON_NULL_COMPARISON_RESULT =
        buildConf("spark.sql.legacy.arraySortFailsOnNullComparisonResult")
From the new conf name, the "fail" behavior reads as the legacy one. The previous name was good. Alternatively, we can follow the style of the other confs: `spark.sql.legacy.allowNullComparisonResultInArraySort`.
Sorry about the previous trivial comment: #36812 (comment)
No worries, thanks for the suggestions!
spark.sql.legacy.allowNullComparisonResultInArraySort sounds better to me.
      },
      "NULL_COMPARISON_RESULT" : {
        "message" : [
          "The comparison result is null. If you want to handle null as 0 (equal), you can set \"<config>\" to \"<value>\"."
actually, why not hardcode the config name and value here?
I just thought it could be reusable. Let me just change to hardcode it for now.
@ueshin please fix the conflicts, thanks!
        .doc("When set to false, `array_sort` function throws an error " +
          "if the comparator function returns null. " +
          "If set to true, it restores the legacy behavior that handles null as zero (equal).")
        .version("3.2.2")
cc @MaxGekk FYI
This is not a 3.3 regression so technically not a release blocker.
dongjoon-hyun
left a comment
+1, LGTM.
Thanks, merging to master! @ueshin can you open backport PRs for 3.2 and 3.3?
…comparator returns null

### What changes were proposed in this pull request?

Backport of #36812.

Fixes `ArraySort` to throw an exception when the comparator returns `null`.
Also updates the doc to follow the corrected behavior.

### Why are the changes needed?

When the comparator of `ArraySort` returns `null`, currently it handles it as `0` (equal).

According to the doc,

```
It returns -1, 0, or 1 as the first element is less than, equal to, or greater than the second element. If the comparator function returns other values (including null), the function will fail and raise an error.
```

It's fine to return non -1, 0, 1 integers to follow the Java convention (still need to update the doc, though), but it should throw an exception for `null` result.

### Does this PR introduce _any_ user-facing change?

Yes, if a user uses a comparator that returns `null`, it will throw an error after this PR.

The legacy flag `spark.sql.legacy.allowNullComparisonResultInArraySort` can be used to restore the legacy behavior that handles `null` as `0` (equal).

### How was this patch tested?

Added some tests.

Closes #36834 from ueshin/issues/SPARK-39419/3.3/array_sort.

Authored-by: Takuya UESHIN
Signed-off-by: Dongjoon Hyun
…comparator returns null

### What changes were proposed in this pull request?

Backport of #36812.

Fixes `ArraySort` to throw an exception when the comparator returns `null`.
Also updates the doc to follow the corrected behavior.

### Why are the changes needed?

When the comparator of `ArraySort` returns `null`, currently it handles it as `0` (equal).

According to the doc,

```
It returns -1, 0, or 1 as the first element is less than, equal to, or greater than the second element. If the comparator function returns other values (including null), the function will fail and raise an error.
```

It's fine to return non -1, 0, 1 integers to follow the Java convention (still need to update the doc, though), but it should throw an exception for `null` result.

### Does this PR introduce _any_ user-facing change?

Yes, if a user uses a comparator that returns `null`, it will throw an error after this PR.

The legacy flag `spark.sql.legacy.allowNullComparisonResultInArraySort` can be used to restore the legacy behavior that handles `null` as `0` (equal).

### How was this patch tested?

Added some tests.

Closes #36835 from ueshin/issues/SPARK-39419/3.2/array_sort.

Authored-by: Takuya UESHIN
Signed-off-by: Dongjoon Hyun
### What changes were proposed in this pull request?

Fixes `ArraySort` to throw an exception when the comparator returns `null`.
Also updates the doc to follow the corrected behavior.

### Why are the changes needed?

When the comparator of `ArraySort` returns `null`, currently it handles it as `0` (equal).

According to the doc,

> It returns -1, 0, or 1 as the first element is less than, equal to, or greater than the second element. If the comparator function returns other values (including null), the function will fail and raise an error.

It's fine to return non -1, 0, 1 integers to follow the Java convention (still need to update the doc, though), but it should throw an exception for a `null` result.

### Does this PR introduce _any_ user-facing change?

Yes, if a user uses a comparator that returns `null`, it will throw an error after this PR.

The legacy flag `spark.sql.legacy.allowNullComparisonResultInArraySort` can be used to restore the legacy behavior that handles `null` as `0` (equal).

### How was this patch tested?

Added some tests.
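
For readers who want to see the two behaviors side by side, here is a minimal plain-Scala sketch of the null handling described above. It is not the Catalyst implementation: `ArraySortNullSketch`, `NullableComparator`, `sortWith`, and the `allowNullResult` parameter are hypothetical names for illustration, `allowNullResult` only plays the role of `spark.sql.legacy.allowNullComparisonResultInArraySort`, and `IllegalStateException` is a stand-in for whatever error Spark actually raises.

```scala
object ArraySortNullSketch {
  // Hypothetical stand-in for an array_sort comparator: it may return null.
  type NullableComparator[A] = (A, A) => java.lang.Integer

  // Illustration only. `allowNullResult` mimics the legacy flag:
  //   true  -> a null comparison result is treated as 0 (equal), the old behavior
  //   false -> a null comparison result raises an error, the behavior after this PR
  def sortWith[A](xs: Seq[A], cmp: NullableComparator[A], allowNullResult: Boolean): Seq[A] = {
    val ordering = new Ordering[A] {
      override def compare(l: A, r: A): Int = {
        val result = cmp(l, r)
        if (result != null) result.intValue() // any non-null integer is accepted, as in Java
        else if (allowNullResult) 0
        else throw new IllegalStateException("The comparison result is null.")
      }
    }
    xs.sorted(ordering)
  }

  // Example comparator that returns null whenever either side is missing.
  val cmp: NullableComparator[Option[Int]] = (l, r) => (l, r) match {
    case (Some(a), Some(b)) => Integer.valueOf(Integer.compare(a, b))
    case _                  => null
  }
}
```

With `allowNullResult = false`, sorting `Seq(Some(3), None, Some(1))` with `cmp` fails on the first comparison that involves `None`. With `allowNullResult = true` the sort completes, but the resulting order is not meaningful, because treating `null` as "equal" makes the comparator inconsistent; that is presumably why failing is the better default.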