Skip to content

Conversation

@allisonwang-db
Copy link
Contributor

@allisonwang-db allisonwang-db commented Apr 27, 2022

What changes were proposed in this pull request?

Backport #36216 to branch-3.2

Why are the changes needed?

To fix a bug in SchemaPruning.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit test

@github-actions github-actions bot added the SQL label Apr 27, 2022
@viirya
Copy link
Member

viirya commented May 1, 2022

@allisonwang-db Can you fix the test failure?

@allisonwang-db
Copy link
Contributor Author

@viirya I tried a few times but can't reproduce this TPCDSV1_4_PlanStabilitySuite test failure locally...

@viirya
Copy link
Member

viirya commented May 3, 2022

@allisonwang-db Maybe you can just re-trigger the CI?

@viirya
Copy link
Member

viirya commented May 7, 2022

Hmm, from the logs, seems there are unmatched plans:

2022-05-04T01:30:18.1915772Z ESC[0m[ESC[0mESC[0minfoESC[0m] ESC[0mESC[0mESC[31m- check simplified (tpcds-v1.4/q5) *** FAILED *** (361 milliseconds)ESC[0mESC[0m
2022-05-04T01:30:18.1916346Z ESC[0m[ESC[0mESC[0minfoESC[0m] ESC[0mESC[0mESC[31m  Plans did not match:ESC[0mESC[0m
2022-05-04T01:30:18.1917069Z ESC[0m[ESC[0mESC[0minfoESC[0m] ESC[0mESC[0mESC[31m  last approved simplified plan: /home/runner/work/spark/spark/sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q5/simplified.txtESC[0mESC[0m
2022-05-04T01:30:18.1918103Z ESC[0m[ESC[0mESC[0minfoESC[0m] ESC[0mESC[0mESC[31m  last approved explain plan: /home/runner/work/spark/spark/sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q5/explain.txtESC[0mESC[0m

@viirya
Copy link
Member

viirya commented May 7, 2022

@allisonwang-db I can reproduce it locally by running build/sbt "sql/testOnly *PlanStabilitySuite -- -z (tpcds-v1.4/q5)".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems cosmetic change only in the explain string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and there is no plan change

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm if CI can pass.

@viirya
Copy link
Member

viirya commented May 9, 2022

CI still doesn't pass:

check simplified (tpcds-v1.4/q4) *** FAILED *** 

@allisonwang-db
Copy link
Contributor Author

Hmm let me try again

@allisonwang-db
Copy link
Contributor Author

This test failure is strange. When I ran it individually (for q4 and q5), it failed but when I ran the entire suite together build/sbt "sql/testOnly *PlanStabilitySuite" the individual test case can pass.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @allisonwang-db . Could you rebase this PR once more, please?

@allisonwang-db allisonwang-db force-pushed the spark-38918-branch-3.2 branch from 15c9ccc to 9652212 Compare May 23, 2022 18:04
@allisonwang-db
Copy link
Contributor Author

@dongjoon-hyun Looks like it still has the same issue:

# Test passed
build/sbt "sql/testOnly *PlanStabilitySuite"

# Test failed for q4 and q5
build/sbt "sql/testOnly *PlanStabilitySuite -- -z (tpcds-v1.4/q5)"

@dongjoon-hyun
Copy link
Member

Thank you for rechecking.

@olaky
Copy link
Contributor

olaky commented Jun 7, 2022

I am facing the same issues here: #36753

@olaky
Copy link
Contributor

olaky commented Jun 7, 2022

So the only change in the plan I can see that makes the test fail is that the last plan node has a source filename in it now, for example

Scan parquet default.web_site [web_site_sk,web_site_id] (PlanStabilitySuite.scala:156)

@viirya
Copy link
Member

viirya commented Jun 10, 2022

Can #36828 help on stablizing PlanStabilitySuite?

@dongjoon-hyun
Copy link
Member

+1 for @viirya 's comment. @cloud-fan found the root cause and is fixing now on that PR.

@dongjoon-hyun
Copy link
Member

We can revisit this PR after merging that PR.

@viirya
Copy link
Member

viirya commented Jun 10, 2022

@allisonwang-db #36828 is merged, can you rebase to trigger CI? Thanks.

… that do not belong to the current relation

This PR updates `ProjectionOverSchema`  to use the outputs of the data source relation to filter the attributes in the nested schema pruning. This is needed because the attributes in the schema do not necessarily belong to the current data source relation. For example, if a filter contains a correlated subquery, then the subquery's children can contain attributes from both the inner query and the outer query. Since the `RewriteSubquery` batch happens after early scan pushdown rules, nested schema pruning can wrongly use the inner query's attributes to prune the outer query data schema, thus causing wrong results and unexpected exceptions.

To fix a bug in `SchemaPruning`.

No

Unit test

Closes apache#36216 from allisonwang-db/spark-38918-nested-column-pruning.

Authored-by: allisonwang-db <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>
(cherry picked from commit 150434b)
Signed-off-by: Liang-Chi Hsieh <[email protected]>
(cherry picked from commit 793ba60)
Signed-off-by: allisonwang-db <[email protected]>
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pending CI

dongjoon-hyun pushed a commit that referenced this pull request Jun 10, 2022
…butes that do not belong to the current relation

### What changes were proposed in this pull request?

Backport #36216 to branch-3.2

### Why are the changes needed?

To fix a bug in `SchemaPruning`.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit test

Closes #36386 from allisonwang-db/spark-38918-branch-3.2.

Authored-by: allisonwang-db <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Jun 10, 2022

Merged to branch-3.2. Thank you so much, @allisonwang-db , @viirya .

sunchao pushed a commit to sunchao/spark that referenced this pull request Jun 2, 2023
…butes that do not belong to the current relation

### What changes were proposed in this pull request?

Backport apache#36216 to branch-3.2

### Why are the changes needed?

To fix a bug in `SchemaPruning`.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit test

Closes apache#36386 from allisonwang-db/spark-38918-branch-3.2.

Authored-by: allisonwang-db <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants