[SPARK-38138][SQL] Materialize QueryPlan subqueries #35438

pan3793 · 2022-02-08T10:20:56Z

What changes were proposed in this pull request?

This PR proposes materializing QueryPlan#subqueries (turning it into a lazy val) and pruning the search by PLAN_EXPRESSION, to improve SQL compilation performance.
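To illustrate the first half of the proposal, here is a minimal sketch of the difference between a `def` and a `lazy val` accessor. `ToyPlan`, `computeCount`, `subqueriesDef` and `subqueriesLazy` are hypothetical names used only for this example; this is not Spark's `QueryPlan`, and it does not show the PLAN_EXPRESSION pruning.

```scala
// Toy stand-in for a plan node; NOT Spark's QueryPlan.
final class ToyPlan(val name: String, val nested: Seq[ToyPlan]) {
  // Counts how many times the collection below is actually computed.
  var computeCount: Int = 0

  // Stand-in for the expression traversal that collects subqueries.
  private def collectNested(): Seq[ToyPlan] = {
    computeCount += 1
    nested
  }

  // A `def` re-runs the traversal on every access.
  def subqueriesDef: Seq[ToyPlan] = collectNested()

  // A `lazy val` runs the traversal once, on first access, and caches the result.
  lazy val subqueriesLazy: Seq[ToyPlan] = collectNested()
}

object LazyValDemo {
  def main(args: Array[String]): Unit = {
    val leaf = new ToyPlan("leaf", Nil)

    val viaDef = new ToyPlan("a", Seq(leaf))
    (1 to 3).foreach(_ => viaDef.subqueriesDef)
    println(s"def traversals: ${viaDef.computeCount}")

    val viaLazy = new ToyPlan("b", Seq(leaf))
    (1 to 3).foreach(_ => viaLazy.subqueriesLazy)
    println(s"lazy val traversals: ${viaLazy.computeCount}")
  }
}
```

Caching this way is only safe when the node is immutable after construction, which is the assumption the sketch makes about `nested`.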

Why are the changes needed?

We found a query in production that spends a long time in the optimize phase (including the AQE optimize phase) when DPP is enabled. The SQL pattern looks like:

select <cols...>
from a
left join b on a.<col> = b.<col>
left join c on b.<col> = c.<col>
left join d on c.<col> = d.<col>
left join e on d.<col> = e.<col>
left join f on e.<col> = f.<col>
left join g on f.<col> = g.<col>
left join h on g.<col> = h.<col>
...

SPARK-36444 significantly reduces the optimization time (excluding the AQE phase); see details at #35431. However, InsertAdaptiveSparkPlan still takes a lot of time in the AQE optimize phase.

Before this change, the query took 658s; after this change, it takes only 65s.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing UTs.

pan3793 · 2022-02-08T10:21:46Z

cc @wangyum @cloud-fan @yaooqinn

HyukjinKwon · 2022-02-08T10:25:48Z

cc @maryannxue @allisonwang-db @sigmod FYI

ulysses-you · 2022-02-08T10:55:22Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala

   */
-  def subqueries: Seq[PlanType] = {
-    expressions.flatMap(_.collect {
+  lazy val subqueries: Seq[PlanType] = {


please add @transient

Thanks for the tip, updated.

Just for educational purposes: why is @transient useful here?

SparkPlan is a subclass of QueryPlan and needs to be sent to executors. Marking the lazy val as @transient keeps the cached value out of the serialized plan, which reduces memory usage on the executors.

abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Serializable
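The effect of `@transient` on a cached field can be sketched with plain Java serialization. The classes `WithCache` and `WithTransientCache` and the helper `serializedSize` below are hypothetical and unrelated to Spark's own classes; the sketch only assumes the standard behavior that `@transient` fields are skipped by `ObjectOutputStream`.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical class whose cached value is serialized along with the object.
class WithCache(val n: Int) extends Serializable {
  lazy val cached: Array[Byte] = new Array[Byte](n)
}

// Same shape, but the cached value is excluded from serialization.
class WithTransientCache(val n: Int) extends Serializable {
  @transient lazy val cached: Array[Byte] = new Array[Byte](n)
}

object TransientDemo {
  // Serializes an object with Java serialization and returns the byte count.
  def serializedSize(obj: AnyRef): Int = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(obj) finally out.close()
    buffer.size()
  }

  def main(args: Array[String]): Unit = {
    val n = 1 << 20 // 1 MiB payload

    val plain = new WithCache(n)
    plain.cached // force the lazy val so the cache is populated
    println(s"non-transient: ${serializedSize(plain)} bytes")

    val trans = new WithTransientCache(n)
    trans.cached // force the lazy val here too
    println(s"transient: ${serializedSize(trans)} bytes")
  }
}
```

On the receiving side a `@transient lazy val` is simply recomputed on first access, so the trade-off is a repeated computation in exchange for a smaller serialized object.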

amaliujia · 2022-02-09T03:45:00Z

cc @amaliujia

AmplabJenkins · 2022-02-09T05:59:32Z

Can one of the admins verify this patch?

pan3793 · 2022-02-16T02:56:03Z

@cloud-fan would you please take a look? thanks

cloud-fan · 2022-02-18T06:18:10Z

thanks, merging to master!

Materialize QueryPlan subqueries

e4fdfd5

github-actions bot added the SQL label Feb 8, 2022

ulysses-you reviewed Feb 8, 2022

View reviewed changes

improve

d7080ee

sigmod approved these changes Feb 17, 2022

View reviewed changes

cloud-fan approved these changes Feb 18, 2022

View reviewed changes

cloud-fan closed this in 0fcb560 Feb 18, 2022

pan3793 deleted the subquery branch April 4, 2022 09:20


7 participants
