Skip to content

Conversation

@ueshin
Copy link
Member

@ueshin ueshin commented Jun 8, 2022

What changes were proposed in this pull request?

Fixes ArraySort to throw an exception when the comparator returns null.

Also updates the doc to follow the corrected behavior.

Why are the changes needed?

When the comparator of ArraySort returns null, currently it handles it as 0 (equal).

According to the doc,

It returns -1, 0, or 1 as the first element is less than, equal to, or greater than
the second element. If the comparator function returns other
values (including null), the function will fail and raise an error.

It's fine to return non -1, 0, 1 integers to follow the Java convention (still need to update the doc, though), but it should throw an exception for null result.

Does this PR introduce any user-facing change?

Yes, if a user uses a comparator that returns null, it will throw an error after this PR.

The legacy flag spark.sql.legacy.allowNullComparisonResultInArraySort can be used to restore the legacy behavior that handles null as 0 (equal).

How was this patch tested?

Added some tests.

@github-actions github-actions bot added the SQL label Jun 8, 2022
Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise. two questions

argument: Expression,
function: Expression)
function: Expression,
handleComparisonResultNullAsZero: Boolean)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about failOnNullComparisonResult?

@github-actions github-actions bot added the CORE label Jun 9, 2022
.createWithDefault(false)

val LEGACY_ARRAY_SORT_FAILS_ON_NULL_COMPARISON_RESULT =
buildConf("spark.sql.legacy.arraySortFailsOnNullComparisonResult")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel the "fail" behavior is the legacy one from the new conf name. The previous name was good. Or we can follow the other conf style spark.sql.legacy.allowNullComparisonResultInArraySort.
Sorry about the previous trivial comment: #36812 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, thanks for the suggestions!
spark.sql.legacy.allowNullComparisonResultInArraySort sounds better to me.

},
"NULL_COMPARISON_RESULT" : {
"message" : [
"The comparison result is null. If you want to handle null as 0 (equal), you can set \"<config>\" to \"<value>\"."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, why not hardcode the config name and value here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just thought it could be reusable. Let me just change to hardcode it for now.

@cloud-fan
Copy link
Contributor

@ueshin please fix the conflicts, thanks!

.doc("When set to false, `array_sort` function throws an error " +
"if the comparator function returns null. " +
"If set to true, it restores the legacy behavior that handles null as zero (equal).")
.version("3.2.2")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @MaxGekk FYI

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a 3.3 regression so technically not a release blocker.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

@cloud-fan cloud-fan closed this in 6526172 Jun 10, 2022
@cloud-fan
Copy link
Contributor

thanks, merging to master! @ueshin can you open backport PRs for 3.2 and 3.3?

dongjoon-hyun pushed a commit that referenced this pull request Jun 10, 2022
…comparator returns null

### What changes were proposed in this pull request?

Backport of #36812.

Fixes `ArraySort` to throw an exception when the comparator returns `null`.

Also updates the doc to follow the corrected behavior.

### Why are the changes needed?

When the comparator of `ArraySort` returns `null`, currently it handles it as `0` (equal).

According to the doc,

```
It returns -1, 0, or 1 as the first element is less than, equal to, or greater than
the second element. If the comparator function returns other
values (including null), the function will fail and raise an error.
```

It's fine to return non -1, 0, 1 integers to follow the Java convention (still need to update the doc, though), but it should throw an exception for `null` result.

### Does this PR introduce _any_ user-facing change?

Yes, if a user uses a comparator that returns `null`, it will throw an error after this PR.

The legacy flag `spark.sql.legacy.allowNullComparisonResultInArraySort` can be used to restore the legacy behavior that handles `null` as `0` (equal).

### How was this patch tested?

Added some tests.

Closes #36834 from ueshin/issues/SPARK-39419/3.3/array_sort.

Authored-by: Takuya UESHIN <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun pushed a commit that referenced this pull request Jun 10, 2022
…comparator returns null

### What changes were proposed in this pull request?

Backport of #36812.

Fixes `ArraySort` to throw an exception when the comparator returns `null`.

Also updates the doc to follow the corrected behavior.

### Why are the changes needed?

When the comparator of `ArraySort` returns `null`, currently it handles it as `0` (equal).

According to the doc,

```
It returns -1, 0, or 1 as the first element is less than, equal to, or greater than
the second element. If the comparator function returns other
values (including null), the function will fail and raise an error.
```

It's fine to return non -1, 0, 1 integers to follow the Java convention (still need to update the doc, though), but it should throw an exception for `null` result.

### Does this PR introduce _any_ user-facing change?

Yes, if a user uses a comparator that returns `null`, it will throw an error after this PR.

The legacy flag `spark.sql.legacy.allowNullComparisonResultInArraySort` can be used to restore the legacy behavior that handles `null` as `0` (equal).

### How was this patch tested?

Added some tests.

Closes #36835 from ueshin/issues/SPARK-39419/3.2/array_sort.

Authored-by: Takuya UESHIN <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
sunchao pushed a commit to sunchao/spark that referenced this pull request Jun 2, 2023
…comparator returns null

Backport of apache#36812.

Fixes `ArraySort` to throw an exception when the comparator returns `null`.

Also updates the doc to follow the corrected behavior.

When the comparator of `ArraySort` returns `null`, currently it handles it as `0` (equal).

According to the doc,

```
It returns -1, 0, or 1 as the first element is less than, equal to, or greater than
the second element. If the comparator function returns other
values (including null), the function will fail and raise an error.
```

It's fine to return non -1, 0, 1 integers to follow the Java convention (still need to update the doc, though), but it should throw an exception for `null` result.

Yes, if a user uses a comparator that returns `null`, it will throw an error after this PR.

The legacy flag `spark.sql.legacy.allowNullComparisonResultInArraySort` can be used to restore the legacy behavior that handles `null` as `0` (equal).

Added some tests.

Closes apache#36835 from ueshin/issues/SPARK-39419/3.2/array_sort.

Authored-by: Takuya UESHIN <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants