[SPARK-42851][SQL] Replace EquivalentExpressions with mutable map in PhysicalAggregation #40488
@rednaxelafx, @cloud-fan let me know if this PR is a viable alternative to #40473, or whether I should do a little cleanup like peter-toth@90421cb in this PR or in a follow-up PR...
Before the recent rounds of changes to EquivalentExpressions, the old … Your proposed PR here further orphans that function from any actual use, which is okay for keeping binary compatibility as much as possible. BTW, I updated my PR's test case because it makes more sense to check the return value from …
```scala
expr.collect {
  // addExpr() always returns false for non-deterministic expressions and do not add them.
  case a
    if AggregateExpression.isAggregate(a) && !equivalentAggregateExpressions.addExpr(a) =>
```
What's wrong with addExpr here? It does simplify the code IMO.
The line of thought would be: adding the supportedExpression guard to addExpr() would cause a performance regression, so let's just close our eyes and make the only remaining use of addExpr break away and do its own deduplication with the old logic, without taking things like NamedLambdaVariable into account, which is the way it has been for quite a few releases. This PR essentially inlines the addExpr path of the old EquivalentExpressions into PhysicalAggregation to recover what it used to do.
Besides the above, although .addExpr() fits well here and does the job, isn't it a bit weird that an add-like method of a collection-like object doesn't return true when a new item is added, but instead flips the meaning of the return value? If it were used in multiple places I would keep it, but we use it only here. But maybe I'm just nitpicking...
Anyway, I'm OK with #40473 too.
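The inverted return value discussed above can be illustrated with a small, self-contained Scala sketch. It does not use Spark: `AddContractSketch`, `ToyEquivalence`, `keptByGuard`, and the string keys standing in for canonicalized expressions are hypothetical names for illustration only. The toy model follows the addExpr() contract as described in this PR (true means an equivalent deterministic expression was already present) and contrasts it with the usual `mutable.Set#add` convention.

```scala
import scala.collection.mutable

object AddContractSketch {
  // Hypothetical toy model of the addExpr() contract as described in this PR
  // (not Spark's code): a string key replaces the canonicalized expression.
  // Returns true only if an equivalent deterministic key was already present.
  // A new deterministic key is added and yields false; a non-deterministic key
  // is never added (the && short-circuits) and also yields false.
  final class ToyEquivalence {
    private val seen = mutable.Set.empty[String]
    def addExpr(key: String, deterministic: Boolean): Boolean =
      deterministic && !seen.add(key)
  }

  // Standard convention for comparison: mutable.Set#add returns true when the
  // element was not yet present, i.e. when it was actually added.
  def setAddReturnsTrueOnInsert(): Boolean = {
    val s = mutable.Set.empty[String]
    s.add("x") && !s.add("x")
  }

  // The `!addExpr(a)` guard from the reviewed snippet, applied to the toy model:
  // keeps the first occurrence of each deterministic key and every
  // non-deterministic occurrence.
  def keptByGuard(items: Seq[(String, Boolean)]): Seq[(String, Boolean)] = {
    val tracker = new ToyEquivalence
    items.filter { case (key, det) => !tracker.addExpr(key, det) }
  }
}
```

Because a plain `add` would return true for a new item, the guard has to negate the result, which is the readability point raised in the comment above.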
What changes were proposed in this pull request?
This PR proposes to replace EquivalentExpressions with a simple mutable map in PhysicalAggregation, the only place where EquivalentExpressions.addExpr() is used. EquivalentExpressions is useful for common subexpression elimination, but in PhysicalAggregation it is used only to deduplicate whole expressions, which can easily be done with a simple map.
Why are the changes needed?
EquivalentExpressions.addExpr() is not guarded by supportedExpression(), so it can cause inconsistent results when used together with EquivalentExpressions.getExprState(). This PR proposes replacing .addExpr() with an alternative, as its boolean result is a bit counter-intuitive compared to other collections' .add() methods: it returns false if the expression was not already present, and it adds the expression only if the expression is deterministic.
After this PR we no longer use EquivalentExpressions.addExpr(), so it can be deprecated or even removed.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added new UTs from @rednaxelafx's PR: #40473. Please note that those UTs actually pass after #40475, but they are added here to make sure there will be no regression in the future.
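A minimal sketch of the kind of map-based deduplication this PR proposes for PhysicalAggregation, under simplifying assumptions: `Agg`, `MapDedupSketch`, `dedup`, and `resolve` are hypothetical names, a string `canonical` field stands in for Expression.canonicalized, and this is not the code in the PR. Deterministic aggregates are deduplicated by canonical form, while every non-deterministic occurrence is kept, matching what the old `!addExpr(a)` guard did.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for an aggregate expression (not Spark's API):
// `canonical` plays the role of Expression.canonicalized and `deterministic`
// mirrors Expression.deterministic.
final case class Agg(name: String, canonical: String, deterministic: Boolean = true)

object MapDedupSketch {
  // Returns the aggregates that need to be computed (the first occurrence of each
  // deterministic aggregate plus every non-deterministic occurrence) together with
  // the canonical-form map used later to rewrite references.
  def dedup(aggs: Seq[Agg]): (Seq[Agg], mutable.Map[String, Agg]) = {
    val equivalent = mutable.LinkedHashMap.empty[String, Agg]
    val toCompute = aggs.filter { a =>
      if (!a.deterministic) true                       // never merge non-deterministic ones
      else if (equivalent.contains(a.canonical)) false // duplicate of an earlier aggregate
      else { equivalent.put(a.canonical, a); true }    // first occurrence: remember it
    }
    (toCompute, equivalent)
  }

  // Rewrites an occurrence to the single copy that is actually computed; a
  // non-deterministic aggregate is not in the map and is returned unchanged.
  def resolve(equivalent: mutable.Map[String, Agg], a: Agg): Agg =
    if (a.deterministic) equivalent.getOrElse(a.canonical, a) else a
}
```

A plain map keyed by the canonical form is enough here because only whole aggregate expressions are compared; none of the subexpression bookkeeping that EquivalentExpressions does for common subexpression elimination is needed.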