
Conversation

@ulysses-you
Contributor

@ulysses-you ulysses-you commented Nov 26, 2023

What changes were proposed in this pull request?

This PR introduces the case class `AdaptiveRuleContext(isSubquery: Boolean, isFinalStage: Boolean)`, which is made available inside adaptive SQL extension rules through a thread local, so that developers can modify the configs of the next plan fragment using `AdaptiveRuleContext.get()`.

The plan fragment configs can be propagated across multiple phases: for example, a config set in `queryPostPlannerStrategyRules` can then be read in `queryStagePrepRules`, `queryStageOptimizerRules`, and `columnarRules`. The configs are cleaned up before execution, so they start out empty in the next round.
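A minimal sketch of what such a rule could look like, assuming the `org.apache.spark.sql.execution.adaptive` package location and a hypothetical `setConfig` setter; only `AdaptiveRuleContext.get()`, `isSubquery`, and `isFinalStage` are taken from this PR:

```scala
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.adaptive.AdaptiveRuleContext

// Hypothetical adaptive extension rule that tunes the next plan fragment.
case class TuneNextFragment() extends Rule[SparkPlan] {
  override def apply(plan: SparkPlan): SparkPlan = {
    AdaptiveRuleContext.get().foreach { ctx =>
      if (!ctx.isSubquery && !ctx.isFinalStage) {
        // setConfig is a placeholder for whatever setter the context exposes.
        ctx.setConfig("spark.sql.shuffle.partitions", "64")
      }
    }
    plan // the rule only reads the context here; it does not rewrite the plan
  }
}
```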

Why are the changes needed?

To support modifying plan-fragment-level SQL configs through AQE rules.

Does this PR introduce any user-facing change?

No, it only affects developers.

How was this patch tested?

Added new tests.

Was this patch authored or co-authored using generative AI tooling?

no

@github-actions github-actions bot added the SQL label Nov 26, 2023
@ulysses-you
Contributor Author

cc @cloud-fan @maryannxue @dongjoon-hyun if you have time to take a look at this idea, thank you.

Member

@dongjoon-hyun dongjoon-hyun left a comment


Thank you for pinging me, @ulysses-you .

Although this PR is not big, could you split it into two?

  1. A PR for adding Adaptive Query Post Planner Strategy Rules
  2. A PR for fragment-level SQL configs in AQE?

We can proceed with (1) first, independently.

ulysses-you added a commit that referenced this pull request Nov 30, 2023
…y rules in SparkSessionExtensions

### What changes were proposed in this pull request?

This PR adds a new extension entry point, `queryPostPlannerStrategyRules`, in `SparkSessionExtensions`. It is applied between `plannerStrategy` and `queryStagePrepRules` in AQE, so it can see the whole plan before exchanges are injected.

### Why are the changes needed?

A part of #44013.

### Does this PR introduce _any_ user-facing change?

No, only for developers.

### How was this patch tested?

Added a test.

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #44074 from ulysses-you/post-planner.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: youxiduo <[email protected]>
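A hedged sketch of how a rule might be registered at this new entry point; the `injectQueryPostPlannerStrategyRule` method name is inferred from the extension name above and is an assumption, as is reusing the `TuneNextFragment` rule sketched earlier:

```scala
import org.apache.spark.sql.SparkSessionExtensions

// Assumed registration hook for the new extension point; enabled through
// spark.sql.extensions=<fully.qualified.name.of.MyExtensions>.
class MyExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(ext: SparkSessionExtensions): Unit = {
    ext.injectQueryPostPlannerStrategyRule { session =>
      TuneNextFragment()
    }
  }
}
```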
@ulysses-you ulysses-you force-pushed the rule-context branch 2 times, most recently from b358f71 to 380bc21 on November 30, 2023 at 10:22
Member

@dongjoon-hyun dongjoon-hyun left a comment


Thank you for rebasing this PR, @ulysses-you .

Member


Is this used somewhere else? Otherwise, let's not define a new one~

@dongjoon-hyun
Member

To @ulysses-you , could you address the second-round review comments, please?

@ulysses-you
Contributor Author

Thank you @dongjoon-hyun, I will rebase and address the comments after #44142.

dongjoon-hyun pushed a commit that referenced this pull request Dec 4, 2023
…per`

### What changes were proposed in this pull request?

This PR moves the method `withSQLConf` from `SQLHelper` in the catalyst test module to the `SQLConfHelper` trait in the catalyst module. To make usage such as `val x = withSQLConf {}` easy, this PR also changes its return type.

### Why are the changes needed?

A part of #44013

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

Pass CI

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #44142 from ulysses-you/withSQLConf.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
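A small hedged sketch of the usage pattern the changed return type enables, assuming the moved method keeps the `withSQLConf(pairs: (String, String)*)(f: => T): T` shape implied by `val x = withSQLConf {}`:

```scala
import org.apache.spark.sql.catalyst.SQLConfHelper

class PartitionHints extends SQLConfHelper {
  // With a non-Unit return type, the block's result can be captured directly.
  def partitionsUnderAqe: Int =
    withSQLConf("spark.sql.adaptive.enabled" -> "true") {
      conf.numShufflePartitions
    }
}
```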
@dongjoon-hyun
Member

I merged it. Could you rebase this PR, @ulysses-you ? 😄

@ulysses-you
Contributor Author

Thank you @dongjoon-hyun, it's done.

asl3 pushed a commit to asl3/spark that referenced this pull request Dec 5, 2023
Member

@dongjoon-hyun dongjoon-hyun left a comment


+1, LGTM from my side. Thank you, @ulysses-you .

@dongjoon-hyun
Member

cc @cloud-fan , @wangyum

dbatomic pushed a commit to dbatomic/spark that referenced this pull request Dec 11, 2023
@beliefer
Contributor

@ulysses-you Could you explain what scenario would require adjusting the SQL configs segment by segment? I'm just curious.

@ulysses-you
Contributor Author

@beliefer For example, changing the initial shuffle partition number per plan fragment so that it is neither too large nor too small, or changing the advisory partition size according to the characteristics of the plan fragment (small for generate, big for filter).
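A hedged sketch of that scenario, reusing the hypothetical `setConfig` setter from the PR description; the size values are made up, and the `GenerateExec` check merely stands in for whatever fragment inspection a real rule would do:

```scala
import org.apache.spark.sql.execution.{GenerateExec, SparkPlan}
import org.apache.spark.sql.execution.adaptive.AdaptiveRuleContext

// Pick an advisory partition size based on the shape of the plan fragment.
def tuneAdvisorySize(fragment: SparkPlan): Unit =
  AdaptiveRuleContext.get().foreach { ctx =>
    val hasGenerate = fragment.exists(_.isInstanceOf[GenerateExec])
    val advisorySize = if (hasGenerate) "16MB" else "256MB" // illustrative values
    ctx.setConfig("spark.sql.adaptive.advisoryPartitionSizeInBytes", advisorySize)
  }
```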

@beliefer
Contributor

Thank you for the explanation.

ulysses-you added a commit to ulysses-you/spark that referenced this pull request Feb 6, 2024
@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@ulysses-you
Contributor Author

@cloud-fan any concerns about landing this in 4.0?

def get(): Option[AdaptiveRuleContext] = Option(ruleContextThreadLocal.get())

private[sql] def withRuleContext[T](ruleContext: AdaptiveRuleContext)(block: => T): T = {
  assert(ruleContext != null)
Member

@yaooqinn yaooqinn May 24, 2024


make it null-tolerant?

Contributor Author


It's used to make sure the rule context is valid; null is meaningless even if we tolerated it.

@yaooqinn
Member

LGTM

@ulysses-you
Contributor Author

Thank you all, merged to master (4.0.0).

@ulysses-you ulysses-you deleted the rule-context branch May 24, 2024 06:33
private def withRuleContext[T](f: => T): T =
  AdaptiveRuleContext.withRuleContext(ruleContext) { f }

private def applyPhysicalRulesWithRuleContext(
Contributor


Shall we update applyPhysicalRules directly? Do we expect people to call applyPhysicalRules without rule context?

Contributor Author


applyPhysicalRules is effectively a private method; I'm not sure how people could use it.

Contributor


I mean people who develop Spark. When should applyPhysicalRules be called instead of applyPhysicalRulesWithRuleContext? If never, why do we still keep it?

Contributor Author


applyPhysicalRules is used by InsertAdaptiveSparkPlan for Spark internal rules, and at that point we do not yet have an AdaptiveSparkPlanExec. It does not affect user-specified rules, so I left it.

Contributor


ah I see

@cloud-fan
Contributor

cloud-fan commented Oct 18, 2024

Hi @ulysses-you , we tried to use this RuleContext framework and found some design issues:

  • The AQE rules are not always stage-local; they can transform the whole plan, so it doesn't make sense to put stage information in the RuleContext. E.g., how can the OptimizeSkewedJoin rule leverage it? What does isFinalStage even mean in this context?
  • The protocol for setting these plan-fragment-level configs is very hacky. The test uses one custom rule to update the RuleContext and expects another rule to read it. This assumes a global RuleContext instance shared between all rules, which gets quite messy if a rule transforms the whole plan and needs to deal with multiple stages. We also need to define how to propagate the rule context between different AQE phases.

I think a better design is to put the context (plan fragment level confs or something more general) in the query stage itself. Then all the rules can either update or consume the query stage context. The only problem is that we don't have a query stage node for the final result stage, but we should add one (@liuzqt is working on it).

@cloud-fan
Contributor

I've reverted it since #49715 is out and it should be the right design for this.

cloud-fan added a commit that referenced this pull request Feb 12, 2025
### What changes were proposed in this pull request?

Added ResultQueryStageExec for AQE

How the query plan looks in the explain string:
```
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   ResultQueryStage 2 ------> newly added
   +- *(5) Project [id#26L]
      +- *(5) SortMergeJoin [id#26L], [id#27L], Inner
         :- *(3) Sort [id#26L ASC NULLS FIRST], false, 0
         :  +- AQEShuffleRead coalesced
         :     +- ShuffleQueryStage 0
         :        +- Exchange hashpartitioning(id#26L, 200), ENSURE_REQUIREMENTS, [plan_id=247]
         :           +- *(1) Range (0, 25600, step=1, splits=10)
         +- *(4) Sort [id#27L ASC NULLS FIRST], false, 0
            +- AQEShuffleRead coalesced
               +- ShuffleQueryStage 1
                  +- Exchange hashpartitioning(id#27L, 200), ENSURE_REQUIREMENTS, [plan_id=257]
                     +- *(2) Ran...

```
How the query plan looks in the Spark UI:

![Screenshot 2025-02-03 at 4 11 43 PM](https://github.com/user-attachments/assets/86946e19-ffdd-42dd-974a-62a8300ddac8)

### Why are the changes needed?

Currently the AQE framework is not fully self-contained, since not all plan segments can be put into a query stage: the final "stage" is basically executed as a non-AQE plan. This PR adds a result query stage for AQE to unify the framework. With this change we can build more query-stage-level features; one use case is described in #44013 (comment).

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
new unit tests.

Also, existing tests impacted by this change are updated to keep their original test semantics.

### Was this patch authored or co-authored using generative AI tooling?
NO

Closes #49715 from liuzqt/SPARK-51008.

Lead-authored-by: liuzqt <[email protected]>
Co-authored-by: Ziqi Liu <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
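As a hedged illustration of what the new result stage enables, a small check that the final adaptive plan now exposes the result fragment as a query stage; the class and package names are taken from the commit above, while the helper itself is assumed:

```scala
import org.apache.spark.sql.execution.adaptive.{AdaptiveSparkPlanExec, ResultQueryStageExec}

// After execution, the final plan should contain a ResultQueryStageExec node,
// so query-stage-level rules can treat the result fragment like any other stage.
def hasResultStage(plan: AdaptiveSparkPlanExec): Boolean =
  plan.executedPlan.collectFirst { case s: ResultQueryStageExec => s }.isDefined
```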
cloud-fan added a commit that referenced this pull request Feb 12, 2025
(cherry picked from commit 207390b)
Signed-off-by: Wenchen Fan <[email protected]>