[SPARK-26735][SQL] Verify plan integrity for special expressions #23658
nit: use find?
I like this check! It can save us a lot of time when debugging!
LGTM except a code style comment.
Test build #101699 has finished for PR 23658 at commit
…wExpression and Generator should only be hosted in corresponding operators
We have another check now. We should update the doc here.
Thanks! Updated
cf3f288 to c7ccc42
Thanks for your reviews, @cloud-fan and @viirya! I've updated the PR addressing your comments. This PR has actually caught a genuine bug in the analyzer in one of the test cases:
SELECT *
FROM t1
WHERE c1 = (SELECT max(t2.c1)
            FROM t2
            GROUP BY t2.c1
            HAVING count(*) >= 1
            ORDER BY max(t2.c1))
The analyzer resolves it into: ... where
It's somewhat tedious to fix because we need to tweak the order a bit. Working on it.
Test build #101712 has finished for PR 23658 at commit
  val newAttr = UnresolvedAttribute("unresolvedAttr")
  Project(projectList ++ Seq(newAttr), child)
case agg @ Aggregate(Nil, aggregateExpressions, child) =>
  Project(aggregateExpressions, child)
Add a comment here to explain Project is unable to handle Aggregate expressions.
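To illustrate why the rewrite above breaks plan integrity, here is a minimal, self-contained sketch. It does not use Spark's classes: `Expr`, `Plan`, `BrokenRuleSketch`, `breakIntegrity`, and `isValid` are hypothetical stand-ins invented for this example, and the check only looks at top-level expressions.

```scala
// Toy stand-ins for Catalyst expressions and operators (NOT Spark's real API).
sealed trait Expr
case class Attr(name: String) extends Expr
case class AggExpr(func: String, arg: Expr) extends Expr

sealed trait Plan
case class Relation(name: String) extends Plan
case class Project(projectList: Seq[Expr], child: Plan) extends Plan
case class Aggregate(grouping: Seq[Expr], aggregateExpressions: Seq[Expr], child: Plan)
  extends Plan

object BrokenRuleSketch {
  // Deliberately broken rule: turns a global Aggregate (no grouping keys)
  // into a Project, leaving aggregate expressions in an operator that
  // cannot evaluate them.
  def breakIntegrity(plan: Plan): Plan = plan match {
    case Aggregate(Nil, aggs, child) => Project(aggs, breakIntegrity(child))
    case Aggregate(g, aggs, child)   => Aggregate(g, aggs, breakIntegrity(child))
    case Project(list, child)        => Project(list, breakIntegrity(child))
    case r: Relation                 => r
  }

  // Simplified integrity check: a Project must not host an AggExpr.
  // Only top-level expressions are inspected in this sketch.
  def isValid(plan: Plan): Boolean = plan match {
    case Project(list, child)   => !list.exists(_.isInstanceOf[AggExpr]) && isValid(child)
    case Aggregate(_, _, child) => isValid(child)
    case _: Relation            => true
  }
}
```

Under these assumptions, a plan such as `Aggregate(Nil, Seq(AggExpr("max", Attr("c1"))), Relation("t"))` passes `isValid` before the rule runs and fails it afterwards.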
Please open a JIRA and make the bug fix a TO-DO item. Thanks!
…fter fixing SPARK-26741
Test build #101718 has finished for PR 23658 at commit
Test build #101721 has finished for PR 23658 at commit
LGTM. Thanks! Merged to master.
## What changes were proposed in this pull request?

Add verification of plan integrity with regard to special expressions being hosted only in supported operators. Specifically:

- `AggregateExpression`: should only be hosted in `Aggregate`, or indirectly in `Window`
- `WindowExpression`: should only be hosted in `Window`
- `Generator`: should only be hosted in `Generate`

This will help us catch errors in future optimizer rules that incorrectly hoist special expressions out of their supported operator.

TODO: This PR actually caught a bug in the analyzer in the test case `SPARK-23957 Remove redundant sort from subquery plan(scalar subquery)` in `SubquerySuite`, where a `max()` aggregate function is hosted in a `Sort` operator in the analyzed plan, which is invalid. That test case is disabled in this PR. SPARK-26741 has been opened to track the fix in the analyzer.

## How was this patch tested?

Added new test case in `OptimizerStructuralIntegrityCheckerSuite`

Closes apache#23658 from rednaxelafx/plan-integrity.

Authored-by: Kris Mok
Signed-off-by: gatorsmile
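The three hosting rules described above can be sketched on a toy plan model. This is only an illustration under stated assumptions, not the code from the PR: the `Expr` and `Plan` hierarchies, `Explode` as the sole generator, and the `PlanIntegrity` object with its `containsWhere`, `violatesHere`, and `isValid` helpers are all hypothetical names made up for the example.

```scala
// Toy expression tree (hypothetical; NOT Spark's Catalyst classes).
sealed trait Expr { def children: Seq[Expr] }
case class Attr(name: String) extends Expr { def children: Seq[Expr] = Nil }
case class AggExpr(func: String, arg: Expr) extends Expr { def children: Seq[Expr] = Seq(arg) }
case class WindowExpr(function: Expr) extends Expr { def children: Seq[Expr] = Seq(function) }
case class Explode(arg: Expr) extends Expr { def children: Seq[Expr] = Seq(arg) } // a generator

// Toy logical operators, each exposing the expressions it hosts.
sealed trait Plan { def exprs: Seq[Expr]; def children: Seq[Plan] }
case class Relation(name: String) extends Plan {
  def exprs: Seq[Expr] = Nil
  def children: Seq[Plan] = Nil
}
case class Project(exprs: Seq[Expr], child: Plan) extends Plan { def children: Seq[Plan] = Seq(child) }
case class Sort(exprs: Seq[Expr], child: Plan) extends Plan { def children: Seq[Plan] = Seq(child) }
case class Aggregate(exprs: Seq[Expr], child: Plan) extends Plan { def children: Seq[Plan] = Seq(child) }
case class Window(exprs: Seq[Expr], child: Plan) extends Plan { def children: Seq[Plan] = Seq(child) }
case class Generate(generator: Expr, child: Plan) extends Plan {
  def exprs: Seq[Expr] = Seq(generator)
  def children: Seq[Plan] = Seq(child)
}

object PlanIntegrity {
  // True if `e` or any expression nested inside it satisfies `p`.
  private def containsWhere(e: Expr)(p: Expr => Boolean): Boolean =
    p(e) || e.children.exists(c => containsWhere(c)(p))

  // True if this operator hosts a special expression it does not support.
  def violatesHere(plan: Plan): Boolean =
    plan.exprs.exists { root =>
      containsWhere(root) {
        case _: WindowExpr => !plan.isInstanceOf[Window]
        case _: AggExpr    => !(plan.isInstanceOf[Aggregate] || plan.isInstanceOf[Window])
        case _: Explode    => !plan.isInstanceOf[Generate]
        case _             => false
      }
    }

  // A plan is valid when no operator anywhere in the tree violates the rules.
  def isValid(plan: Plan): Boolean =
    !violatesHere(plan) && plan.children.forall(isValid)
}
```

In this sketch, a `Sort` whose ordering expression is `AggExpr("max", Attr("c1"))` is rejected even when it sits directly on top of a valid `Aggregate`, which mirrors the shape of the analyzer bug tracked in SPARK-26741.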