[SPARK-35628][SS] RocksDBFileManager - load checkpoint from DFS #32767
Conversation
Per #32582 (comment)
Kubernetes integration test unable to build dist. exiting with code: 1

Test build #139284 has finished for PR 32767 at commit

Rebased this PR based on #32582. It's ready for review now. cc @viirya and @HeartSaVioR

Thank you @xuanyuanking. I'll find some time to review this.

Test build #139555 has finished for PR 32767 at commit

Kubernetes integration test starting

Kubernetes integration test status failure

@viirya Great thanks for your help!

To make the RocksDB state store implementation quicker and easier to review, I've created all the remaining PRs to give us a global perspective. We can review them one by one, and I'll keep each of them updated. cc @viirya and @HeartSaVioR, thanks for your review.

Thank you @xuanyuanking!
viirya left a comment:
Overall looks good. Just a few questions and suggestions.
Looks like this is not the Java doc style we follow.
We have been using one-liner java doc; e.g. https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala#L50
Oh, I thought it was two lines.
But we'd like to see this as multiple lines, per the review comments below :)
yea :)
Yea, we'll have multiple lines here in the next commit :)
@xuanyuanking Seems you overwrote the previous change? This looks like the previous version.
@xuanyuanking friendly reminder
Sorry, I missed this... Updating.
We don't process directories. Could you also mention that in the method doc?
It'd be ideal to make this clear in the method name, e.g. unzipFilesFromFile. (Ideally I'd like to see this also extract directories, but let's postpone that until necessary.)
In general we expect unzipping to extract directories as well. That said, we need to make the behavior very clear to the caller side. I agree this should be mentioned in the Javadoc, but the method name should also make the actual behavior intuitive.
Makes sense; method name changed and comment added.
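The behavior being discussed — extracting only regular file entries and skipping directory entries — could be sketched roughly as below. This is a hypothetical Java sketch using java.util.zip, not Spark's actual Scala implementation; the name unzipFilesFromFile just follows the suggestion above.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.zip.*;

public class UnzipSketch {
    // Extracts only regular file entries from zipFile into outDir.
    // Directory entries are intentionally skipped; callers must not rely on
    // empty directories being recreated. Any IOException propagates to the
    // caller, which is expected to treat the output directory as unhealthy.
    public static void unzipFilesFromFile(Path zipFile, Path outDir) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(Files.newInputStream(zipFile))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories are not processed
                }
                Path target = outDir.resolve(entry.getName());
                // Parent directories are created only as needed for the files themselves.
                if (target.getParent() != null) {
                    Files.createDirectories(target.getParent());
                }
                Files.copy(zin, target, StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }
}
```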
Hmm, are we sure we don't need to handle any errors during unzipping?
This looks safe; if there's an exception we may see some files partially extracted and one of the output files may be broken, but callers will catch the exception and treat the output directory as unhealthy. If necessary, let's document this in the Javadoc as well.
Yes, we rely on the caller side to handle any exceptions. Javadoc added as well.
When is it possible to have an existing file in the local dir with the same file name but different content from the file in DFS?
Or is it just a safety guard?
It's a safety guard: we check both the file name and the file size.
We can do this logInfo after the file size check.
More specifically, we can do the file size check just after copyToLocalFile, and accumulations can be placed later.
Thanks. The size check is now done right after copyToLocalFile, and the logInfo is placed at the end.
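The copy/reuse flow these comments converge on might look like the following sketch. The names DfsFile and DfsCopier are invented for the example; Spark's real code works against Hadoop's FileSystem.copyToLocalFile and its own metadata classes.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.List;

public class CopyFilesSketch {
    // Minimal stand-ins for the real checkpoint metadata and DFS client.
    public record DfsFile(String name, long sizeBytes) {}

    public interface DfsCopier {
        void copyToLocal(DfsFile src, Path dst) throws IOException;
    }

    // Copies the given files into localDir, reusing any local file whose name
    // AND size already match (the safety guard discussed above). The size check
    // runs right after the copy; the summary log (including filesReused) is last.
    public static long copyFiles(List<DfsFile> files, Path localDir, DfsCopier copier)
            throws IOException {
        long filesCopied = 0, bytesCopied = 0, filesReused = 0;
        for (DfsFile f : files) {
            Path local = localDir.resolve(f.name());
            if (Files.exists(local) && Files.size(local) == f.sizeBytes()) {
                filesReused++; // same name and size: safe to reuse, skip the copy
                continue;
            }
            Files.deleteIfExists(local); // same name but different size: replace it
            copier.copyToLocal(f, local);
            long localSize = Files.size(local);
            // Verify the size immediately after the copy, before accumulating.
            if (localSize != f.sizeBytes()) {
                throw new IOException("Copied size " + localSize + " != expected "
                        + f.sizeBytes() + " for " + f.name());
            }
            filesCopied++;
            bytesCopied += localSize;
        }
        // Summary log goes at the end, after all checks have passed.
        System.out.println("Copied " + filesCopied + " files (" + bytesCopied
                + " bytes), reused " + filesReused + " files");
        return filesReused;
    }
}
```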
Add filesReused into this log message?
Thanks! Done in the next commit.
Is this only used by tests?
Yes.
HeartSaVioR left a comment:
Minor comments.
nit: deleted -> delete
Thanks, done in the next commit
What if we just remove all the files in localDir? I'd just like to know why we don't clear the whole directory and instead remove only specific files. Would we need to leverage some remaining files?
Yes. The consideration here is mainly for immutable files like sst/log files: we can avoid IO for immutable files shared among different versions.
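The trade-off described here — deleting only unreferenced files so that immutable sst/log files can be reused across versions — could be sketched as below. This is a hypothetical helper, not Spark's actual code.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.Set;

public class CleanupSketch {
    // Deletes only the files in localDir that the target version does not
    // reference. Immutable files shared across versions (e.g. RocksDB .sst
    // files) stay in place, so they need no re-download from DFS.
    public static int deleteUnreferenced(Path localDir, Set<String> referenced)
            throws IOException {
        int deleted = 0;
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(localDir)) {
            for (Path p : ds) {
                if (!referenced.contains(p.getFileName().toString())) {
                    Files.delete(p);
                    deleted++;
                }
            }
        }
        return deleted;
    }
}
```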
Great thanks for your detailed review, @viirya @HeartSaVioR. All comments addressed.

Kubernetes integration test starting
HeartSaVioR left a comment:
+1
Let's wait for @viirya to have another round of review and do explicit approval.
Kubernetes integration test status failure

Test build #140201 has finished for PR 32767 at commit

@xuanyuanking Oh, we need to fix new code conflicts as well.
"files with same"? Do you mean "with the same filename"?
Ah thanks! Fix done.
@xuanyuanking Could you resolve the conflict and the minor comment? Then we can move this forward. Thanks!
Force-pushed 7c79a29 to fdd0d61
Thanks for the review!

Kubernetes integration test unable to build dist. exiting with code: 1

Test build #140270 has finished for PR 32767 at commit

Kubernetes integration test starting

Kubernetes integration test status success
Force-pushed be15054 to 7279d43
Test build #140300 has finished for PR 32767 at commit

Kubernetes integration test unable to build dist. exiting with code: 1

Test build #140305 has finished for PR 32767 at commit

Jenkins passed. Thanks! Merging to master.

Thanks @xuanyuanking for the contribution! I merged this into master.

Please also rebase the next PRs onto the master branch; we can continue reviewing the next PR.

Copy that. I'm rebasing now. Thanks for the help!

UPDATE: we found a consistent break in the Scala 2.13 build caused by this change. @xuanyuanking is working on the fix, so please allow us some time to fix it in a follow-up PR instead of reverting this.

Great thanks @HeartSaVioR and @Ngone51. Submitted #33084.
…uild

What changes were proposed in this pull request?
Fix the consistent break on the Scala 2.13 build caused by PR #32767.

Why are the changes needed?
Fix the consistent break.

Does this PR introduce any user-facing change?
No.

How was this patch tested?
Existing tests.

Closes #33084 from xuanyuanking/SPARK-35628-follow.
Authored-by: Yuanjian Li <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
The implementation for the load operation of RocksDBFileManager.
Why are the changes needed?
Provide the functionality of loading all necessary files for specific checkpoint versions from DFS to the given local directory.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
New UT added.
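Putting the pieces from the review together, the load path could be sketched end to end as below. This is a simplified illustration: a Map stands in for DFS, and the names loadCheckpoint and versionMeta are invented for the sketch; the real RocksDBFileManager works against Hadoop's FileSystem API and its own metadata format.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

public class LoadSketch {
    // Loads all files needed for `version` into localDir:
    // 1) look up which files the version references,
    // 2) delete local files the version does not reference,
    // 3) copy missing files from "DFS", reusing files already present with the
    //    expected size (immutable sst files shared between versions).
    public static void loadCheckpoint(Map<String, byte[]> dfs,
                                      Map<Long, List<String>> versionMeta,
                                      long version, Path localDir) throws IOException {
        List<String> needed = versionMeta.get(version);
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(localDir)) {
            for (Path p : ds) {
                if (!needed.contains(p.getFileName().toString())) {
                    Files.delete(p); // not part of this version
                }
            }
        }
        for (String name : needed) {
            Path local = localDir.resolve(name);
            byte[] remote = dfs.get(name);
            if (Files.exists(local) && Files.size(local) == remote.length) {
                continue; // reuse the cached immutable file
            }
            Files.write(local, remote);
        }
    }
}
```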