[SPARK-38138][SQL] Materialize QueryPlan subqueries #35438
cc @maryannxue @allisonwang-db @sigmod FYI
 */
-def subqueries: Seq[PlanType] = {
-  expressions.flatMap(_.collect {
+lazy val subqueries: Seq[PlanType] = {
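The diff above turns subqueries from a def, which rescans the plan's expressions on every call, into a lazy val that is computed once per plan node and then reused. Below is a minimal sketch of that difference. It assumes a hypothetical ToyPlan class (not Spark's QueryPlan) that counts how often its expression scan actually runs.

```scala
// Hypothetical toy expression: optionally embeds a subquery (as SQL text).
final case class ToyExpr(name: String, subPlan: Option[String])

// Hypothetical toy plan node; Spark's QueryPlan is far richer.
final class ToyPlan(val expressions: Seq[ToyExpr]) {
  // How many times the expression scan has actually run.
  var scans: Int = 0

  private def scanForSubqueries(): Seq[String] = {
    scans += 1
    expressions.flatMap(_.subPlan)
  }

  // Old shape: the scan reruns on every access.
  def subqueriesDef: Seq[String] = scanForSubqueries()

  // New shape: the scan runs on first access, then the result is cached.
  lazy val subqueriesLazy: Seq[String] = scanForSubqueries()
}

object LazyDemo {
  val plan = new ToyPlan(Seq(ToyExpr("a", None), ToyExpr("b", Some("SELECT max(x) FROM t"))))

  // Three accesses through the def trigger three scans.
  val scansWithDef: Int = { (1 to 3).foreach(_ => plan.subqueriesDef); plan.scans }

  // Three accesses through the lazy val trigger a single extra scan.
  val scansWithLazy: Int = { (1 to 3).foreach(_ => plan.subqueriesLazy); plan.scans - scansWithDef }

  def main(args: Array[String]): Unit =
    println(s"def: $scansWithDef scans, lazy val: $scansWithLazy scan")
}
```

Caching per node is reasonable because plan nodes are treated as immutable trees: a transformed plan is a new node with its own, not-yet-computed cache.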
Please add @transient.
Thanks for the tip, updated.
Just for educational purposes: why is @transient useful here?
SparkPlan is a subclass of QueryPlan and needs to be sent to executors, so @transient keeps the materialized subqueries out of the serialized plan and reduces executor memory usage:
abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Serializable
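To see why @transient matters for a serializable plan, the sketch below serializes two hypothetical classes, PlanWithCache and PlanWithTransientCache (stand-ins for SparkPlan, not Spark code), with standard Java serialization before and after forcing their lazy val. Assuming the compiler marks the backing field of a @transient lazy val as transient, only the non-transient version should grow once its cache is populated. It is meant to be compiled as an ordinary source file, because classes defined inside a REPL or script wrapper can capture a non-serializable enclosing instance.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical stand-in for a serializable plan node; `cached` plays the
// role of the materialized subqueries.
class PlanWithCache(val ids: Seq[Int]) extends Serializable {
  // Once forced, this value becomes part of the serialized form.
  lazy val cached: Seq[String] = ids.map(id => s"subquery-$id")
}

class PlanWithTransientCache(val ids: Seq[Int]) extends Serializable {
  // Skipped by Java serialization; a deserialized copy recomputes it on first access.
  @transient lazy val cached: Seq[String] = ids.map(id => s"subquery-$id")
}

object TransientDemo {
  // Number of bytes Java serialization produces for `obj`.
  def serializedSize(obj: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    bytes.size()
  }

  // Serialized-size increase caused by forcing the lazy val.
  def growthAfterForcing(plan: AnyRef, force: () => Any): Int = {
    val before = serializedSize(plan)
    force()
    serializedSize(plan) - before
  }

  val ids: Seq[Int] = (1 to 1000).toVector

  val plainGrowth: Int = {
    val plan = new PlanWithCache(ids)
    growthAfterForcing(plan, () => plan.cached)
  }

  val transientGrowth: Int = {
    val plan = new PlanWithTransientCache(ids)
    growthAfterForcing(plan, () => plan.cached)
  }

  def main(args: Array[String]): Unit =
    println(s"non-transient grew by $plainGrowth bytes, @transient grew by $transientGrowth bytes")
}
```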
cc @amaliujia
Can one of the admins verify this patch?
@cloud-fan, would you please take a look? Thanks.
Thanks, merging to master!
What changes were proposed in this pull request?
This PR proposes to materialize QueryPlan#subqueries and to prune the search by PLAN_EXPRESSION, improving SQL compilation performance.
Why are the changes needed?
We found a production query that spent a lot of time in the optimization phase (including the AQE optimization phase) when DPP was enabled. The SQL pattern looks like:
SPARK-36444 significantly reduces the optimization time (excluding the AQE phase); see details in #35431. However, a lot of time is still spent in InsertAdaptiveSparkPlan during the AQE optimization phase.
Before this change, the query took 658s; after this change, it takes only 65s.
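The pruning half of the change can be illustrated with a toy expression tree. Every node caches whether its subtree contains a plan expression, and the pruned search skips any subtree whose flag is false. Expr, SubqueryExpr, VisitCounter and PruningDemo are hypothetical names; Spark's actual mechanism is its tree-pattern bits (the TreePattern enum, which includes PLAN_EXPRESSION, queried via containsPattern), which this sketch does not reproduce.

```scala
// Hypothetical toy expression tree, not Spark's Expression hierarchy.
sealed trait Expr {
  def children: Seq[Expr]
  // True only for nodes that embed a query plan (the "plan expression" role).
  def isPlanExpression: Boolean = false
  // Computed once per node and then reused, like a cached pattern bit.
  lazy val containsPlanExpression: Boolean =
    isPlanExpression || children.exists(_.containsPlanExpression)
}
final case class Lit(value: Int) extends Expr {
  def children: Seq[Expr] = Nil
}
final case class Add(left: Expr, right: Expr) extends Expr {
  def children: Seq[Expr] = Seq(left, right)
}
final case class SubqueryExpr(sql: String) extends Expr {
  def children: Seq[Expr] = Nil
  override def isPlanExpression: Boolean = true
}

// Counts visited nodes so the two searches can be compared.
final class VisitCounter { var visits: Int = 0 }

object PruningDemo {
  private def own(e: Expr): Seq[String] = e match {
    case SubqueryExpr(sql) => Seq(sql)
    case _                 => Nil
  }

  // Visits every node, whether or not it can contain a subquery.
  def collectAll(e: Expr, c: VisitCounter): Seq[String] = {
    c.visits += 1
    own(e) ++ e.children.flatMap(collectAll(_, c))
  }

  // Skips any subtree whose cached flag says it has no subquery.
  def collectPruned(e: Expr, c: VisitCounter): Seq[String] =
    if (!e.containsPlanExpression) Nil
    else {
      c.visits += 1
      own(e) ++ e.children.flatMap(collectPruned(_, c))
    }

  val tree: Expr = Add(Add(Lit(1), Lit(2)), Add(Lit(3), SubqueryExpr("SELECT max(x) FROM t")))

  val fullCounter = new VisitCounter
  val prunedCounter = new VisitCounter
  val fullResult: Seq[String] = collectAll(tree, fullCounter)
  val prunedResult: Seq[String] = collectPruned(tree, prunedCounter)

  def main(args: Array[String]): Unit =
    println(s"full walk: ${fullCounter.visits} nodes, pruned walk: ${prunedCounter.visits} nodes")
}
```

On the sample tree the full walk visits 7 nodes and the pruned walk visits 3, and both find the same subquery. The flag itself costs one full walk the first time it is read; the savings come when many later searches reuse the cached flags.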
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing UTs.