[SPARK-46485][SQL][3.5] V1Write should not add Sort when not needed #52692

pan3793 · 2025-10-22T07:54:20Z

Backport #44458 to branch-3.5.

Justification: it fixes a hidden bug (until exposed by #44429) that has existed since 3.4.

What changes were proposed in this pull request?

In V1Writes, we try to avoid adding Sort if the output ordering always satisfies. However, the code is completely broken with two issues:

we put SortOrder as the child of another SortOrder and compare, which always returns false.
once we add a project to do empty2null, we change the query output attribute id and the sort order never matches.

It's not a big issue as we still have QO rules to eliminate useless sorts, but #44429 exposes this problem because the way we optimize sort is a bit different. For V1Writes, we should always avoid adding sort even if the number of ordering key is less, to not change the user query.

Why are the changes needed?

fix code mistakes.

Does this PR introduce any user-facing change?

no

How was this patch tested?

updated test

Was this patch authored or co-authored using generative AI tooling?

no

### What changes were proposed in this pull request? In `V1Writes`, we try to avoid adding Sort if the output ordering always satisfies. However, the code is completely broken with two issues: - we put `SortOrder` as the child of another `SortOrder` and compare, which always returns false. - once we add a project to do `empty2null`, we change the query output attribute id and the sort order never matches. It's not a big issue as we still have QO rules to eliminate useless sorts, but apache#44429 exposes this problem because the way we optimize sort is a bit different. For `V1Writes`, we should always avoid adding sort even if the number of ordering key is less, to not change the user query. ### Why are the changes needed? fix code mistakes. ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? updated test ### Was this patch authored or co-authored using generative AI tooling? no Closes apache#44458 from cloud-fan/sort. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>

pan3793 · 2025-10-22T07:54:53Z

cc @cloud-fan @peter-toth

cloud-fan · 2025-10-22T08:18:53Z

do we need to cherry-pick the test change as well?

peter-toth · 2025-10-22T08:22:09Z

do we need to cherry-pick the test change as well?

No because, https://github.com/apache/spark/pull/44429/files#diff-0779e15156e4b0580442d540cc5f60fcee06a454b7f5fcbc4be67f33cec6069bR226-R243 never landed on 3.5. The test change just reverts that.

Backport #44458 to branch-3.5. Justification: it fixes a hidden bug (until exposed by #44429) that has existed since 3.4. ### What changes were proposed in this pull request? In `V1Writes`, we try to avoid adding Sort if the output ordering always satisfies. However, the code is completely broken with two issues: - we put `SortOrder` as the child of another `SortOrder` and compare, which always returns false. - once we add a project to do `empty2null`, we change the query output attribute id and the sort order never matches. It's not a big issue as we still have QO rules to eliminate useless sorts, but #44429 exposes this problem because the way we optimize sort is a bit different. For `V1Writes`, we should always avoid adding sort even if the number of ordering key is less, to not change the user query. ### Why are the changes needed? fix code mistakes. ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? updated test ### Was this patch authored or co-authored using generative AI tooling? no Closes #52692 from pan3793/SPARK-46485-3.5. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Peter Toth <[email protected]>

peter-toth · 2025-10-22T13:00:56Z

Thanks, merged to branch-3.5 (3.5.8).

@pan3793, can you please open a SPARK-53738 / #52584 backport PR as well?

pan3793 · 2025-10-22T13:06:22Z

@peter-toth thanks, I have opened #52697 for SPARK-53738

github-actions bot added the SQL label Oct 22, 2025

pan3793 changed the title ~~[SPARK-46485][SQL] V1Write should not add Sort when not needed~~ [SPARK-46485][SQL][3.5] V1Write should not add Sort when not needed Oct 22, 2025

peter-toth approved these changes Oct 22, 2025

View reviewed changes

cloud-fan approved these changes Oct 22, 2025

View reviewed changes

peter-toth closed this Oct 22, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[SPARK-46485][SQL][3.5] V1Write should not add Sort when not needed #52692

[SPARK-46485][SQL][3.5] V1Write should not add Sort when not needed #52692

pan3793 commented Oct 22, 2025

Uh oh!

pan3793 commented Oct 22, 2025

Uh oh!

cloud-fan commented Oct 22, 2025

Uh oh!

peter-toth commented Oct 22, 2025

Uh oh!

peter-toth commented Oct 22, 2025 •

edited

Loading

Uh oh!

pan3793 commented Oct 22, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[SPARK-46485][SQL][3.5] V1Write should not add Sort when not needed #52692

[SPARK-46485][SQL][3.5] V1Write should not add Sort when not needed #52692

Conversation

pan3793 commented Oct 22, 2025

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

Uh oh!

pan3793 commented Oct 22, 2025

Uh oh!

cloud-fan commented Oct 22, 2025

Uh oh!

peter-toth commented Oct 22, 2025

Uh oh!

peter-toth commented Oct 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pan3793 commented Oct 22, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

peter-toth commented Oct 22, 2025 •

edited

Loading