Skip to content

Conversation

@maryannxue
Copy link
Contributor

What changes were proposed in this pull request?

Subqueries do not have their own execution id, thus when calling AdaptiveSparkPlanExec.onUpdatePlan, it will actually get the QueryExecution instance of the main query, which is wasteful and problematic. It could cause issues like stack overflow or dead locks in some circumstances.

This PR fixes this issue by making AdaptiveSparkPlanExec compare the QueryExecution object retrieved by current execution ID against the QueryExecution object from which this plan is created, and only update the UI when the two instances are the same.

How was this patch tested?

Manual tests on TPC-DS queries.

@SparkQA
Copy link

SparkQA commented Jul 31, 2019

Test build #108479 has finished for PR 25316 at commit 97bce6f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class InsertAdaptiveSparkPlan(

// Apply the same instance of this rule to sub-queries so that sub-queries all share the
// same `stageCache` for Exchange reuse.
val adaptivePlan = this.apply(queryExec.sparkPlan)
val adaptivePlan = this.applyInternal(queryExec.sparkPlan, queryExec)
Copy link
Contributor

@cloud-fan cloud-fan Aug 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we reach here, it means we are creating AdaptiveSparkPlanExec for a subquery. Shall we simply set a boolean flag here (e.g. adaptivePlan.copy(isSubquery = true)) instead of passing around the QueryExecution?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm, this is more flexible, in case some places create QueryExecution without execution id and execute.

session.sparkContext.getLocalProperty(SQLExecution.EXECUTION_ID_KEY)).flatMap { idStr =>
val id = idStr.toLong
val qe = SQLExecution.getQueryExecution(id)
if (qe.eq(queryExecution)) Some(id) else None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some doc on why this is needed. It is kinda annoying when you have to check the git blame to figure out why code is there.

@SparkQA
Copy link

SparkQA commented Aug 6, 2019

Test build #108685 has finished for PR 25316 at commit 1705242.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

Copy link
Contributor

@hvanhovell hvanhovell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hvanhovell
Copy link
Contributor

Merging to master

@hvanhovell hvanhovell closed this in 325bc8e Aug 7, 2019
@SparkQA
Copy link

SparkQA commented Aug 7, 2019

Test build #108778 has finished for PR 25316 at commit f570281.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

gatorsmile pushed a commit that referenced this pull request Jan 22, 2020
### What changes were proposed in this pull request?
After [PR#25316](#25316) fixed the dead lock issue in [PR#25308](#25308), the subquery metrics can not be shown in UI as following screenshot.
![image](https://user-images.githubusercontent.com/11972570/72891385-160ec980-3d4f-11ea-91fc-ccaad890f7dc.png)

 This PR fix the subquery UI shown issue by adding `SparkListenerSQLAdaptiveSQLMetricUpdates` event to update the suquery  sql metric. After with this PR, the suquery UI can show correctly as following screenshot:
![image](https://user-images.githubusercontent.com/11972570/72893610-66d4f100-3d54-11ea-93c9-f444b2f31952.png)

### Why are the changes needed?
Showing the subquery metric in UI when enable AQE

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing UT

Closes #27260 from JkSelf/fixSubqueryUI.

Authored-by: jiake <[email protected]>
Signed-off-by: Xiao Li <[email protected]>
j-baker pushed a commit to palantir/spark that referenced this pull request Jan 25, 2020
…ive Query Execution

## What changes were proposed in this pull request?

Subqueries do not have their own execution id, thus when calling `AdaptiveSparkPlanExec.onUpdatePlan`, it will actually get the `QueryExecution` instance of the main query, which is wasteful and problematic. It could cause issues like stack overflow or dead locks in some circumstances.

This PR fixes this issue by making `AdaptiveSparkPlanExec` compare the `QueryExecution` object retrieved by current execution ID against the `QueryExecution` object from which this plan is created, and only update the UI when the two instances are the same.

## How was this patch tested?

Manual tests on TPC-DS queries.

Closes apache#25316 from maryannxue/aqe-updateplan-fix.

Authored-by: maryannxue <[email protected]>
Signed-off-by: herman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants