[SPARK-28863][SQL][FOLLOWUP] Make sure optimized plan will not be re-analyzed #30777

cloud-fan · 2020-12-15T07:51:56Z

What changes were proposed in this pull request?

It's a known issue that re-analyzing an optimized plan can lead to various issues. We have made several attempts to prevent this, but the current solution, `AlreadyOptimized`, is still not 100% safe, because people can inject Catalyst rules that call the analyzer directly.

This PR proposes a simpler and safer idea: we set the `analyzed` flag to true after optimization, and the analyzer skips plans whose `analyzed` flag is true.
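
To make the idea concrete, here is a minimal toy sketch in Python. It is not Spark's code: `Plan`, `Analyzer`, `Optimizer`, and the `runs` counter are hypothetical names made up for illustration, and the real change lives in Spark's Scala code base. It only models the behaviour described above, assuming the flag is set on the optimizer's output and checked at the analyzer's entry point.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """Toy stand-in for a logical plan; not Spark's LogicalPlan."""
    description: str
    analyzed: bool = False  # hypothetical mirror of the `analyzed` flag


class Analyzer:
    """Toy analyzer that counts how often it actually does work."""

    def __init__(self) -> None:
        self.runs = 0

    def execute(self, plan: Plan) -> Plan:
        if plan.analyzed:
            # Flag already set: return the plan untouched.
            return plan
        self.runs += 1
        return Plan(f"resolved({plan.description})", analyzed=True)


class Optimizer:
    """Toy optimizer that marks its output as analyzed."""

    def execute(self, plan: Plan) -> Plan:
        # Setting the flag here keeps later analyzer calls from redoing work.
        return Plan(f"optimized({plan.description})", analyzed=True)


analyzer = Analyzer()
analyzed = analyzer.execute(Plan("scan t"))
optimized = Optimizer().execute(analyzed)

# An injected rule calling the analyzer directly gets the same plan back.
assert analyzer.execute(optimized) is optimized
assert analyzer.runs == 1
```

In this sketch the guard sits inside the analyzer itself, so it holds no matter who calls it. That is the property a wrapper-based approach cannot give once a caller bypasses the wrapper.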

Why are the changes needed?

To make the code simpler and safer.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests.

SparkQA · 2020-12-15T08:50:34Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37413/

SparkQA · 2020-12-15T09:19:30Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37413/

SparkQA · 2020-12-15T12:58:21Z

Test build #132812 has finished for PR 30777 at commit bcca0d9.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2020-12-15T13:48:26Z

cc @brkyvz

maropu · 2020-12-20T11:53:26Z

The change looks reasonable to me.

…analyzed

### What changes were proposed in this pull request?

It's a known issue that re-analyzing an optimized plan can lead to various issues. We made several attempts to avoid it from happening, but the current solution `AlreadyOptimized` is still not 100% safe, as people can inject catalyst rules to call analyzer directly.

This PR proposes a simpler and safer idea: we set the `analyzed` flag to true after optimization, and analyzer will skip processing plans whose `analyzed` flag is true.

### Why are the changes needed?

make the code simpler and safer

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing tests.

Closes #30777 from cloud-fan/ds.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
(cherry picked from commit b4bea1a)
Signed-off-by: HyukjinKwon <[email protected]>

HyukjinKwon · 2020-12-21T12:00:44Z

Merged to master and branch-3.1.

@cloud-fan, it has a conflict with branch-3.0. Do you mind opening a backport PR?

brkyvz · 2020-12-21T15:32:31Z

Late LGTM. I believe this is a much more robust solution.


…e re-analyzed

backport #30777 to 3.0

----------

### What changes were proposed in this pull request?

It's a known issue that re-analyzing an optimized plan can lead to various issues. We made several attempts to avoid it from happening, but the current solution `AlreadyOptimized` is still not 100% safe, as people can inject catalyst rules to call analyzer directly.

This PR proposes a simpler and safer idea: we set the `analyzed` flag to true after optimization, and analyzer will skip processing plans whose `analyzed` flag is true.

### Why are the changes needed?

make the code simpler and safer

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing tests.

Closes #30872 from cloud-fan/ds.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>

optimized plan should not be re-analyzed

bcca0d9

github-actions bot added the SQL label Dec 15, 2020

HyukjinKwon approved these changes Dec 21, 2020


HyukjinKwon closed this in b4bea1a Dec 21, 2020

cloud-fan mentioned this pull request Dec 21, 2020

[SPARK-28863][SQL][FOLLOWUP][3.0] Make sure optimized plan will not be re-analyzed #30872

Closed
