[SPARK-43190][SQL] ListQuery.childOutput should be consistent with child output #40851

cloud-fan · 2023-04-19T12:52:29Z

What changes were proposed in this pull request?

Update ListQuery to only store the number of columns of the original plan, instead of directly storing the original plan output attributes.

Why are the changes needed?

Storing the plan output attributes is troublesome because we have to maintain them and keep them in sync with the plan. For example, DeduplicateRelations may change the plan output, and today we do not update ListQuery.childOutputs to keep it in sync.

ListQuery.childOutputs was added by #18968. It is only used to track the original plan output attributes, because subquery de-correlation may add more columns. We can do the same thing by storing the number of columns of the original plan.
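To make the difference concrete, here is a minimal, self-contained sketch of the two approaches. It does not use Catalyst: `Attr`, `Plan`, `ListQueryWithAttrs`, `ListQueryWithNumCols`, `renumber` and `addColumn` are hypothetical stand-ins invented for illustration, and the real `ListQuery` has more fields than shown here. The assumption is that the original output columns are always the leading columns of the plan, so they can be recovered by taking the first `numCols` attributes.

```scala
// Simplified stand-ins for Catalyst classes; these are NOT Spark's real API.
final case class Attr(name: String, exprId: Long)
final case class Plan(output: Seq[Attr])

// Old approach: keep a copy of the original output attributes.
final case class ListQueryWithAttrs(plan: Plan, childOutputs: Seq[Attr])

// New approach: keep only the column count and derive the attributes on demand.
final case class ListQueryWithNumCols(plan: Plan, numCols: Int) {
  def childOutputs: Seq[Attr] = plan.output.take(numCols)
}

object ListQuerySketch {
  // Mimics a rule (such as relation deduplication) that assigns new expression IDs.
  def renumber(plan: Plan, offset: Long): Plan =
    plan.copy(output = plan.output.map(a => a.copy(exprId = a.exprId + offset)))

  // Mimics de-correlation appending an extra column to the subquery plan.
  def addColumn(plan: Plan, attr: Attr): Plan =
    plan.copy(output = plan.output :+ attr)

  def main(args: Array[String]): Unit = {
    val original = Plan(Seq(Attr("b", 1L)))
    val rewritten = addColumn(renumber(original, 100L), Attr("extra", 200L))

    // Both expressions get the rewritten plan; only the stored metadata differs.
    val stale = ListQueryWithAttrs(original, original.output).copy(plan = rewritten)
    val fresh = ListQueryWithNumCols(original, original.output.length).copy(plan = rewritten)

    // The stored attributes still carry the old exprId, so they no longer match the plan.
    println(stale.childOutputs == stale.plan.output.take(1)) // false
    // The derived attributes always match the leading columns of the current plan.
    println(fresh.childOutputs == fresh.plan.output.take(1)) // true
  }
}
```

In this sketch nothing has to remember to update the stored attributes after a plan rewrite, which is the maintenance burden the description refers to.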

Does this PR introduce any user-facing change?

No, there is no user-facing bug exposed.

How was this patch tested?

A new plan test in AnalysisSuite.

cloud-fan · 2023-04-19T12:54:57Z

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala

+    val listQuery2 = ListQuery(testRelation2.select($"b"))
+    val plan = testRelation3.where($"f".in(listQuery1) && $"f".in(listQuery2)).analyze
+    val resolvedCondition = plan.expressions.head
+    val finalPlan = testRelation2.join(testRelation3).where(resolvedCondition).analyze


The test uses the resolved ListQuery to build a new plan and resolve it, to trigger the bug. Otherwise the bug is hidden because DeduplicateRelations runs before ResolveSubqueries, and the plan output of ListQuery won't be changed again.

cloud-fan · 2023-04-19T12:55:09Z

cc @viirya

viirya · 2023-04-20T07:02:49Z

Looks good to me.

cloud-fan · 2023-04-20T07:12:57Z

thanks for the review, merging to master!

…ild output

### What changes were proposed in this pull request?

Update `ListQuery` to only store the number of columns of the original plan, instead of directly storing the original plan output attributes.

### Why are the changes needed?

Storing the plan output attributes is troublesome as we have to maintain them and keep them in sync with the plan. For example, `DeduplicateRelations` may change the plan output, and today we do not update `ListQuery.childOutputs` to keep sync.

`ListQuery.childOutputs` was added by apache#18968 . It's only used to track the original plan output attributes as subquery de-correlation may add more columns. We can do the same thing by storing the number of columns of the plan.

### Does this PR introduce _any_ user-facing change?

No, there is no user-facing bug exposed.

### How was this patch tested?

a new plan test

Closes apache#40851 from cloud-fan/list_query.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

PengleiShi · 2024-11-06T13:28:04Z

@cloud-fan it seems that this patch resolved the issue in https://issues.apache.org/jira/browse/SPARK-41191? In SPARK-41191, a nested cached table does not work because the attribute exprIds differ. @mcdull-zhang https://github.com/apache/spark/pull/38703/files#r1027906540

SPARK-43190

c52052f

github-actions bot added the SQL label Apr 19, 2023

viirya approved these changes Apr 20, 2023

cloud-fan closed this in 9e17731 Apr 20, 2023
