[SPARK-28583][SQL] Subqueries should not call onUpdatePlan in Adaptive Query Execution
#25316
Conversation
Test build #108479 has finished for PR 25316 at commit
```diff
  // Apply the same instance of this rule to sub-queries so that sub-queries all share the
  // same `stageCache` for Exchange reuse.
- val adaptivePlan = this.apply(queryExec.sparkPlan)
+ val adaptivePlan = this.applyInternal(queryExec.sparkPlan, queryExec)
```
When we reach here, it means we are creating `AdaptiveSparkPlanExec` for a subquery. Shall we simply set a boolean flag here (e.g. `adaptivePlan.copy(isSubquery = true)`) instead of passing around the `QueryExecution`?
nvm, this is more flexible, in case some places create a `QueryExecution` without an execution id and execute it.
```scala
  session.sparkContext.getLocalProperty(SQLExecution.EXECUTION_ID_KEY)).flatMap { idStr =>
    val id = idStr.toLong
    val qe = SQLExecution.getQueryExecution(id)
    if (qe.eq(queryExecution)) Some(id) else None
```
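The intent of this guard can be sketched outside Spark with a toy registry in place of `SQLExecution`. The names `ExecutionRegistry`, `QueryExec`, and `executionIdFor` below are hypothetical stand-ins, not Spark APIs:

```scala
import scala.collection.mutable

// `QueryExec` stands in for Spark's QueryExecution; `ExecutionRegistry`
// plays the role of SQLExecution's id -> QueryExecution map.
class QueryExec

object ExecutionRegistry {
  private val executions = mutable.Map[Long, QueryExec]()
  def register(id: Long, qe: QueryExec): Unit = executions(id) = qe
  def lookup(id: Long): Option[QueryExec] = executions.get(id)
}

// Yield an execution id only when the registered QueryExec is the *same
// instance* (reference equality via `eq`) as the one this plan was created
// from. A subquery runs under the main query's execution id but has its own
// QueryExec, so it fails the check and skips the UI update.
def executionIdFor(idStr: Option[String], myQe: QueryExec): Option[Long] =
  idStr.flatMap { s =>
    val id = s.toLong
    if (ExecutionRegistry.lookup(id).exists(_ eq myQe)) Some(id) else None
  }

val mainQe = new QueryExec
val subQe = new QueryExec // a subquery's separate QueryExecution
ExecutionRegistry.register(1L, mainQe)

assert(executionIdFor(Some("1"), mainQe).contains(1L)) // main query: update UI
assert(executionIdFor(Some("1"), subQe).isEmpty)       // subquery: no update
```

This mirrors why the fix stops `onUpdatePlan` for subqueries: they can never retrieve an execution id that maps back to their own `QueryExecution` instance.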
Can you add some doc on why this is needed? It is kinda annoying when you have to check the git blame to figure out why code is there.
Test build #108685 has finished for PR 25316 at commit
LGTM if https://github.com/apache/spark/pull/25316/files#r309864918 is addressed
hvanhovell left a comment:
LGTM
Merging to master
Test build #108778 has finished for PR 25316 at commit
Follow-up commit (from PR #27260):

### What changes were proposed in this pull request?
After [PR#25316](#25316) fixed the deadlock issue in [PR#25308](#25308), the subquery metrics could not be shown in the UI (screenshot omitted). This PR fixes the subquery UI issue by adding a `SparkListenerSQLAdaptiveSQLMetricUpdates` event to update the subquery SQL metrics. With this PR, the subquery UI shows correctly (screenshot omitted).

### Why are the changes needed?
To show the subquery metrics in the UI when AQE is enabled.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing UT

Closes #27260 from JkSelf/fixSubqueryUI. Authored-by: jiake <[email protected]> Signed-off-by: Xiao Li <[email protected]>
…ive Query Execution

## What changes were proposed in this pull request?
Subqueries do not have their own execution id, thus when calling `AdaptiveSparkPlanExec.onUpdatePlan`, it will actually get the `QueryExecution` instance of the main query, which is wasteful and problematic. It could cause issues like stack overflow or deadlocks in some circumstances.

This PR fixes this issue by making `AdaptiveSparkPlanExec` compare the `QueryExecution` object retrieved by the current execution ID against the `QueryExecution` object from which this plan is created, and only update the UI when the two instances are the same.

## How was this patch tested?
Manual tests on TPC-DS queries.

Closes apache#25316 from maryannxue/aqe-updateplan-fix. Authored-by: maryannxue <[email protected]> Signed-off-by: herman <[email protected]>
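The comparison in the fix hinges on Scala's reference equality: `eq` checks object identity, whereas `==` delegates to `equals`, so two structurally equal objects are still told apart. A minimal, self-contained illustration (the `PlanInfo` class is just for demonstration):

```scala
// `==` compares values (via equals); `eq` compares object identity.
case class PlanInfo(name: String)

val a = PlanInfo("q1")
val b = PlanInfo("q1") // structurally equal, but a distinct instance
val c = a              // the very same instance

assert(a == b)         // same contents
assert(!(a eq b))      // different objects: the guard would reject this
assert(a eq c)         // same object: the UI update proceeds
```

Using `eq` is what lets the main query's own plan pass the check while a subquery's plan, built from a different `QueryExecution` instance, does not.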