Skip to content

Conversation

@leesf
Copy link
Contributor

@leesf leesf commented Feb 24, 2022

What changes were proposed in this pull request?

Remove deprecated logic in SortMergeJoinExec in full outer sort merge, while the PR handles compare result separately, so the original code to handle < logic would be removed.

Why are the changes needed?

Make code cleaner

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests.

@github-actions github-actions bot added the SQL label Feb 24, 2022
@leesf
Copy link
Contributor Author

leesf commented Feb 24, 2022

@c21 @cloud-fan would you please help to review the PR when you have time? Thanks in advance.

@leesf leesf force-pushed the SPARK-38313 branch 2 times, most recently from 808ae71 to 51d3465 Compare February 24, 2022 11:16
Copy link
Contributor

@c21 c21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's safe to remove leftMatched/rightMatched. These two bit sets are used to record whether the rows in leftMatches/rightMatches has matched row or not. For unmatched rows, FULL OUTER join still needs to output one row with null values set on the other side. We still needs the bit sets here because rows in buffers (leftMatches/rightMatches) are having same join keys values, but may still not match because the join condition needs to be passed.

@leesf
Copy link
Contributor Author

leesf commented Feb 25, 2022

join still needs to output one row

@c21 Thanks for the comment. However in which case will it exists unmatched rows? rows in leftMatches/rightMatches are the same, so it should be all matched. And also would you please explain more about the but may still not match because the join condition needs to be passed.

@cloud-fan
Copy link
Contributor

There are test failures, looks like this change breaks some queries.

@c21
Copy link
Contributor

c21 commented Feb 25, 2022

However in which case will it exists unmatched rows?

@leesf - Beside the equi join condition, we also have non-equi condition. e.g.

SELECT *
FROM t1
JOIN t2
ON 
  t1.k = t2.k
  AND t1.v != t2.v     <- this is the non-equi condition

So rows in left, right buffers have same values for join keys (i.e. k column here), but not necessarily pass the non-equi join condition (i.e. v column here). SortMergeJoinExec.condition is the non-equi condition (i.e. t1.v != t2.v here).

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@leesf
Copy link
Contributor Author

leesf commented Feb 28, 2022

However in which case will it exists unmatched rows?

@leesf - Beside the equi join condition, we also have non-equi condition. e.g.

SELECT *
FROM t1
JOIN t2
ON 
  t1.k = t2.k
  AND t1.v != t2.v     <- this is the non-equi condition

So rows in left, right buffers have same values for join keys (i.e. k column here), but not necessarily pass the non-equi join condition (i.e. v column here). SortMergeJoinExec.condition is the non-equi condition (i.e. t1.v != t2.v here).

Thanks for you explanation and example, you are right and i am going to close the PR.

@leesf leesf closed this Feb 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants