Skip to content

Conversation

@erenavsarogullari
Copy link
Member

@erenavsarogullari erenavsarogullari commented Nov 21, 2022

What changes were proposed in this pull request?

JIRA: https://issues.apache.org/jira/browse/SPARK-41214
spark.sql.optimizer.canChangeCachedPlanOutputPartitioning enables AQE optimizations under InMemoryRelation(IMR) nodes by creating sub-SparkPlans for AQE optimizations. However, when spark.sql.optimizer.canChangeCachedPlanOutputPartitioning = true, Spark UI does not show correct final DAG due to lack of final sub-plans (under IMR) submissions (into UI).

Why are the changes needed?

spark.sql.optimizer.canChangeCachedPlanOutputPartitioning enables AQE optimizations under InMemoryRelation(IMR) nodes. Sample query in Jira has IMR node on both BroadcastHashJoin legs. However,
when spark.sql.optimizer.canChangeCachedPlanOutputPartitioning = true, following datas are missed due to lack of final sub-plans (under IMR) submissions (into UI).

  • Physical operators' SQLMetrics (before AdaptiveSparkPlan) are missed such as Exchange and HashAggregate on left BHJ leg and HashAggregate on right BHJ leg,
  • WSCG blocks are missed on left BHJ leg,
  • AQEShuffleRead node is missed on left BHJ leg.

DAG before fix:
DAG when AQE=ON and AQECachedDFSupport=ON without fix

DAG after fix:
DAG when AQE=ON and AQECachedDFSupport=ON with fix

Does this PR introduce any user-facing change?

Currently, Spark UI does not show final DAG when spark.sql.optimizer.canChangeCachedPlanOutputPartitioning = true so this causes some physical operators metrics as missed. This PR aims to fix this problem.

How was this patch tested?

Specific UT Coverage will be added (In Progress).

@github-actions github-actions bot added the SQL label Nov 21, 2022
@erenavsarogullari erenavsarogullari changed the title [WIP][SPARK-41214][SQL] - SubPlan metrics under InMemoryRelation are missed when … [WIP][SPARK-41214][SQL] - SubPlan metrics are missed when AQE is enabled under InMemoryRelation Nov 21, 2022
@erenavsarogullari
Copy link
Member Author

cc @cloud-fan @Ngone51

@erenavsarogullari erenavsarogullari changed the title [WIP][SPARK-41214][SQL] - SubPlan metrics are missed when AQE is enabled under InMemoryRelation [SPARK-41214][SQL] - SubPlan metrics are missed when AQE is enabled under InMemoryRelation Nov 22, 2022
@erenavsarogullari erenavsarogullari changed the title [SPARK-41214][SQL] - SubPlan metrics are missed when AQE is enabled under InMemoryRelation [SPARK-41214][SQL] - SQL Metrics are missing from Spark UI when AQE for Cached DataFrame is enabled Nov 22, 2022
@cloud-fan
Copy link
Contributor

is this still a problem after #38558 ?

@ulysses-you
Copy link
Contributor

The reason is, the AdaptiveSparkPlanExec inside InMemoryTableScanExec does not register a new execution id in SQLExecution. Then when it materializes, AdaptiveSparkPlanExec.getExecutionId can not find the corresponding QueryExecution.

private def getExecutionId: Option[Long] = {
// If the `QueryExecution` does not match the current execution ID, it means the execution ID
// belongs to another (parent) query, and we should not call update UI in this query.
Option(context.session.sparkContext.getLocalProperty(SQLExecution.EXECUTION_ID_KEY))
.map(_.toLong).filter(SQLExecution.getQueryExecution(_) eq context.qe)
}

A simple example like:

          ...
           |
  AdaptiveSparkPlanExec (queryexecution 0, no execution id)
           |
  InMemoryTableScanExec
           |
          ...
           |
  AdaptiveSparkPlanExec (queryexecution 1, execution id 0)

@ulysses-you
Copy link
Contributor

i s this still a problem after #38558 ?

@cloud-fan oh just see it. it not related

@cloud-fan
Copy link
Contributor

after #38558 , I think only the first cache access will miss the metrics?

@ulysses-you
Copy link
Contributor

ulysses-you commented Nov 25, 2022

There is another issue.. it seems the existed framework does not support to get metrics across multi sql execution. This issue is not about AQE, that means even we disable AQE for cache plan, we still can not get valid metrics after first access.

@github-actions
Copy link

github-actions bot commented Mar 6, 2023

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Mar 6, 2023
@github-actions github-actions bot closed this Mar 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants