[SPARK-41048][SQL] Improve output partitioning and ordering with AQE cache #38558

ulysses-you · 2022-11-08T10:54:28Z

What changes were proposed in this pull request?

Make InMemoryTableScanExec report a stable output partitioning and ordering, on a best-effort basis, when the currently executed plan of the cached query is the final plan.
Make AdaptiveSparkPlanExec expose the isFinal flag.

Why are the changes needed?

The cached plan in InMemoryRelation can be an AdaptiveSparkPlanExec; however, AdaptiveSparkPlanExec does not specify its output partitioning and ordering. This causes unnecessary shuffles and local sorts for downstream actions.

          ...
           |
  AdaptiveSparkPlanExec
           |
  InMemoryTableScanExec
           |
          ...

After this PR, InMemoryTableScanExec can preserve the output partitioning and ordering.
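To make the cost of an unspecified partitioning concrete, here is a minimal, Spark-free sketch. Every name in it (`ShuffleSketch`, the toy `Partitioning` hierarchy, `needsShuffle`) is a hypothetical stand-in rather than Spark's real planner code, and the satisfaction rule is deliberately simplified: a hash partitioning satisfies a clustering requirement when its keys are a non-empty subset of the required keys.

```scala
// Simplified model only; these are NOT Spark's real classes.
object ShuffleSketch {
  sealed trait Partitioning
  case object UnknownPartitioning extends Partitioning
  final case class HashPartitioning(keys: Seq[String], numPartitions: Int) extends Partitioning

  // What a downstream operator (e.g. an aggregate on `keys`) asks of its child.
  final case class ClusteredDistribution(keys: Seq[String])

  // Assumed rule: hash keys must be a non-empty subset of the required keys.
  def satisfies(child: Partitioning, required: ClusteredDistribution): Boolean =
    child match {
      case HashPartitioning(keys, _) =>
        keys.nonEmpty && keys.forall(k => required.keys.contains(k))
      case UnknownPartitioning => false
    }

  // When the child cannot prove its partitioning, a shuffle has to be added.
  def needsShuffle(child: Partitioning, required: ClusteredDistribution): Boolean =
    !satisfies(child, required)
}
```

In this model, a cache scan that reports an unknown partitioning always forces a shuffle, even if the cached data is already hash-partitioned on the required key.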

Does this PR introduce any user-facing change?

No, it only improves performance.

How was this patch tested?

Added a test.

ulysses-you · 2022-11-08T11:20:48Z

cc @cloud-fan @maryannxue @viirya thank you

cloud-fan · 2022-11-08T13:48:52Z

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala


  override def output: Seq[Attribute] = inputPlan.output

+  // Try our best to give a stable output partitioning and ordering.


I'm trying to understand this "best effort". AFAIK, table cache is lazy. For a query that accesses a cached query the first time, the cached query is not executed yet so we don't know the output partitioning/ordering and can't optimize out shuffles. But when the cached query is accessed the next time, it's already executed and we know the output partitioning/ordering.

Yes. In general the first action for a cached plan is a count, e.g. CacheTableAsSelectExec, so I think it is not a big issue that we cannot optimize the shuffle/sort for the first action.

The typical usage of the cache is that the user wants to reference it multiple times, so this optimization will help a lot.

This would be of very limited use... and would cause inconsistency.

I'd only return output partitioning if there is a user repartition op at the end. In other words, only if the AQE plan is required to preserve user-specified partitioning.

Unfortunately we hint this.. In my experience, users always cache an arbitrary df and use the cached df to build another arbitrary df. So why can't we preserve the partitioning/ordering of the cached plan? If you really feel there is an inconsistency in AdaptiveSparkPlanExec, we can probably move this to InMemoryRelationExec.

My original idea was to do both, but that feels like a little overkill (requiredOrdering should be inferred separately, like #35924):

  requiredDistribution.map(_.createPartitioning(conf.shufflePartitions)).getOrElse {
    if (isFinalPlan) {
      executedPlan.outputPartitioning
    } else {
      super.outputPartitioning
    }
  }

A useful distribution before caching is rare in production, since repartition(col) tends to introduce skew.
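The fallback order in the expression quoted in this thread can be sketched without Spark as follows. This is only a toy model of that "original idea", not the merged change: `FallbackSketch`, `RequiredDistribution` and the plain parameters are hypothetical stand-ins, and `super.outputPartitioning` is modelled as `UnknownPartitioning` on the assumption that the parent class reports an unknown partitioning by default.

```scala
// Toy model of the fallback order; names are illustrative, not Spark's API.
object FallbackSketch {
  sealed trait Partitioning
  case object UnknownPartitioning extends Partitioning
  final case class HashPartitioning(keys: Seq[String], numPartitions: Int) extends Partitioning

  // Stand-in for a user-specified distribution such as repartition(col).
  final case class RequiredDistribution(keys: Seq[String]) {
    def createPartitioning(numPartitions: Int): Partitioning =
      HashPartitioning(keys, numPartitions)
  }

  def outputPartitioning(
      requiredDistribution: Option[RequiredDistribution],
      isFinalPlan: Boolean,
      executedPlanPartitioning: Partitioning,
      shufflePartitions: Int): Partitioning =
    requiredDistribution
      // 1. A user-required distribution wins, whether or not the plan is final.
      .map(_.createPartitioning(shufflePartitions))
      .getOrElse {
        // 2. Otherwise trust the executed plan only once it is final.
        if (isFinalPlan) executedPlanPartitioning
        // 3. Before that, nothing can be promised.
        else UnknownPartitioning
      }
}
```

The point of the ordering is that the first branch is stable across executions, while the second depends on whether the cached query has already run.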

cloud-fan · 2022-11-10T02:52:45Z

sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala

+
+  test("SPARK-41048: Improve output partitioning and ordering with AQE cache") {
+    withSQLConf(
+        SQLConf.CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING.key -> "true",


after this PR, we can probably turn this on by default, to improve AQE coverage.

Agreed. We can also remove the internal tag.

viirya · 2022-11-10T07:09:59Z

sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala

+    case adaptive: AdaptiveSparkPlanExec if adaptive.isFinalized => adaptive.executedPlan
+    case other => other


So this only affects output partitioning and ordering after the cached relation is materialized? And if a query plan that refers to this cached plan has already finished planning but has not been executed yet, it will still use the old partitioning and ordering, right?

On the good side, I think this can improve AQE coverage for some limited cases. However, I'm also worried about inconsistent behavior that end users could see regarding shuffling/sorting. They might ask why a shuffle/sort is sometimes added and sometimes not.

I guess that might require users to learn more tricks for optimizing cached AQE relations in practice.

Another idea is to materialize the AQE plan eagerly so that even the first cache access can be optimized. However, this requires triggering query execution during query planning, which is a bit risky.

A good practice is to ask users to do query caching eagerly, e.g. do a df.count right after the df is cached. Then they won't observe inconsistencies. Anyway, I think this PR is a net win as it optimizes all the following cache accesses after the first access. This is important for query caching as cache is meant to be accessed repeatedly.

Yes, this requires some good practice when caching a query and using it. Otherwise, this looks good.
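The first-access versus later-access behavior discussed in this thread can be illustrated with a small Spark-free model. All classes here (`AdaptivePlan`, `CacheScan`, `LeafPlan`) are hypothetical stand-ins for AdaptiveSparkPlanExec, InMemoryTableScanExec and an ordinary physical plan; `execute()` merely flips the final-plan flag, which is a simplification of what running the cached query does.

```scala
// Toy model; not Spark's real classes.
object CacheScanSketch {
  sealed trait Partitioning
  case object UnknownPartitioning extends Partitioning
  final case class HashPartitioning(keys: Seq[String], numPartitions: Int) extends Partitioning

  trait Plan { def outputPartitioning: Partitioning }
  final case class LeafPlan(outputPartitioning: Partitioning) extends Plan

  // Stand-in for the adaptive plan: seen from outside, its partitioning is unknown.
  final class AdaptivePlan(finalPlan: Plan) extends Plan {
    private var _isFinalPlan = false
    def isFinalPlan: Boolean = _isFinalPlan
    def executedPlan: Plan = finalPlan
    // Simplification: "running" the query just marks the plan as final.
    def execute(): Unit = { _isFinalPlan = true }
    def outputPartitioning: Partitioning = UnknownPartitioning
  }

  // Stand-in for the cache scan: look through the adaptive wrapper only when final.
  final class CacheScan(cachedPlan: Plan) {
    def outputPartitioning: Partitioning = {
      val resolved = cachedPlan match {
        case adaptive: AdaptivePlan if adaptive.isFinalPlan => adaptive.executedPlan
        case other => other
      }
      resolved.outputPartitioning
    }
  }
}
```

In this model a scan planned before `execute()` sees an unknown partitioning, while a scan planned afterwards sees the hash partitioning of the final plan, which is why materializing the cache eagerly (for example with a count right after caching) avoids the inconsistency.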

maryannxue · 2022-11-10T15:34:08Z

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala


  def executedPlan: SparkPlan = currentPhysicalPlan

+  def isFinalized: Boolean = isFinalPlan


Can we do private var _isFinalPlan and def isFinalPlan instead?

sure, addressed

cloud-fan · 2022-11-17T03:34:31Z

thanks, merging to master!

Closes apache#38558 from ulysses-you/aqe-cache.

Authored-by: ulysses-you
Signed-off-by: Wenchen Fan

Improve output partitioning and ordering with AQE cache

97063be

github-actions bot added the SQL label Nov 8, 2022

cloud-fan reviewed Nov 8, 2022

address

dce69b4

cloud-fan reviewed Nov 10, 2022

cloud-fan approved these changes Nov 10, 2022

viirya reviewed Nov 10, 2022

maryannxue reviewed Nov 10, 2022

address comment

8f1bcba

cloud-fan closed this in d218013 Nov 17, 2022

ulysses-you deleted the aqe-cache branch November 17, 2022 05:34

cloud-fan mentioned this pull request Nov 25, 2022

[SPARK-41214][SQL] - SQL Metrics are missing from Spark UI when AQE for Cached DataFrame is enabled #38736

Closed


