sunchao (Member) commented Jan 9, 2021

What changes were proposed in this pull request?

This is a backport of #31081 to branch-3.1.

This changes `ReplaceTableExec`/`AtomicReplaceTableExec` so that the target table is uncached before it is dropped. In addition, it refactors the code by moving the `uncacheTable` method to `DataSourceV2Strategy`, so that we don't need to pass a Spark session to the v2 exec.
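For readers unfamiliar with the pattern, here is a minimal toy sketch of "uncache before drop" with the invalidation passed in as a callback. It is not Spark code: `ToyCatalog`, `ToyCache`, and `replace_table` are hypothetical names invented for illustration, and the sketch only assumes that the exec node can be handed a function instead of a session.

```python
from typing import Callable, Dict, List, Tuple

Rows = List[Tuple]


class ToyCatalog:
    """Hypothetical stand-in for a v2 table catalog: table name -> rows."""

    def __init__(self) -> None:
        self.tables: Dict[str, Rows] = {}

    def create_table(self, name: str, rows: Rows) -> None:
        self.tables[name] = list(rows)

    def drop_table(self, name: str) -> None:
        self.tables.pop(name, None)


class ToyCache:
    """Hypothetical stand-in for a table cache keyed by table name."""

    def __init__(self, catalog: ToyCatalog) -> None:
        self.catalog = catalog
        self.cached: Dict[str, Rows] = {}

    def cache_table(self, name: str) -> None:
        # Snapshot the current rows of the table.
        self.cached[name] = list(self.catalog.tables[name])

    def uncache_table(self, name: str) -> None:
        self.cached.pop(name, None)

    def read(self, name: str) -> Rows:
        # Serve from the cache when an entry exists, else from the catalog.
        if name in self.cached:
            return self.cached[name]
        return self.catalog.tables[name]


def replace_table(
    catalog: ToyCatalog,
    name: str,
    rows: Rows,
    invalidate_cache: Callable[[str], None],
) -> None:
    """Replace a table; the cache is reached only through the callback."""
    if name in catalog.tables:
        invalidate_cache(name)    # uncache first ...
        catalog.drop_table(name)  # ... then drop the old table
    catalog.create_table(name, rows)


catalog = ToyCatalog()
cache = ToyCache(catalog)
catalog.create_table("t", [(1,)])
cache.cache_table("t")

# Without invalidation, readers keep seeing the old cached rows.
replace_table(catalog, "t", [(2,)], lambda _name: None)
print(cache.read("t"))  # [(1,)]  (stale)

# With invalidation, the replaced table is what readers see.
replace_table(catalog, "t", [(3,)], cache.uncache_table)
print(cache.read("t"))  # [(3,)]
```

Passing a callback keeps `replace_table` free of any reference to the cache object, which is the same motivation given above for not passing a Spark session to the v2 exec.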

Why are the changes needed?

Similar to SPARK-33492 (#30429). When a table is replaced, the associated cache should be invalidated to avoid potentially incorrect results.

Does this PR introduce any user-facing change?

Yes. Now, when a data source v2 table is cached (either directly or indirectly), all the relevant caches will be refreshed or invalidated if the table is replaced.

How was this patch tested?

Added a new unit test.

Closes apache#31081 from sunchao/SPARK-34039.

Authored-by: Chao Sun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@github-actions github-actions bot added the SQL label Jan 9, 2021
@sunchao sunchao changed the title [SPARK-34039][SQL] ReplaceTable should invalidate cache [SPARK-34039][SQL][3.1] ReplaceTable should invalidate cache Jan 9, 2021

SparkQA commented Jan 9, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38454/

SparkQA commented Jan 9, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38454/

SparkQA commented Jan 9, 2021

Test build #133865 has finished for PR 31100 at commit 5f601bc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

HyukjinKwon (Member) commented

Merged to branch-3.1.

HyukjinKwon pushed a commit that referenced this pull request Jan 10, 2021
Closes #31100 from sunchao/SPARK-34039-branch-3.1.

Authored-by: Chao Sun <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>