@xuanyuanking
Member

What changes were proposed in this pull request?

The implementation for the load operation of RocksDBFileManager.

Why are the changes needed?

Provide the functionality of loading all necessary files for specific checkpoint versions from DFS to the given local directory.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New UT added.

SparkQA commented Jun 3, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43808/

SparkQA commented Jun 3, 2021

Test build #139284 has finished for PR 32767 at commit ac47618.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@xuanyuanking xuanyuanking changed the title from "[WIP][SPARK-35628][SS] RocksDBFileManager - load checkpoint from DFS" to "[SPARK-35628][SS] RocksDBFileManager - load checkpoint from DFS" on Jun 9, 2021
@xuanyuanking
Member Author

Rebased this PR based on #32582. It's ready for review now. cc @viirya and @HeartSaVioR

@viirya
Member

viirya commented Jun 9, 2021

Thank you @xuanyuanking. I'll find some time to review this.

SparkQA commented Jun 9, 2021

Test build #139555 has finished for PR 32767 at commit 10d11b3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 9, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44081/

SparkQA commented Jun 9, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44081/

@xuanyuanking
Member Author

@viirya Many thanks for your help!

@xuanyuanking
Member Author

So that the RocksDB state store implementation can be reviewed quickly and easily, I have created all the remaining PRs to give us a global perspective. We can review them one by one, and I'll keep updating each of them:
#32928 - [SPARK-35784][SS] Implementation for RocksDB instance
#32933 - [SPARK-35785][SS] Cleanup support for RocksDB instance
#32934 - [SPARK-35788][SS] Metrics support for RocksDB instance

cc @viirya and @HeartSaVioR Thanks for your review.

@viirya
Member

viirya commented Jun 16, 2021

Thank you @xuanyuanking!

Member

@viirya viirya left a comment

Overall looks good. Just a few questions and suggestions.

Member

Looks like this is not the Java doc style we follow.

Contributor

Member

Oh, I thought it is two lines.

Contributor

But we'd like to see this span multiple lines, as in the review comments below :)

Member

yea :)

Member Author

Yea, we'll have multiple lines here in the next commit :)

Member

@xuanyuanking It seems you overwrote the previous change? This looks like the previous version.

Contributor

@xuanyuanking friendly reminder

Member Author

Sorry, I missed this. Updating.

Member

We don't process directories. Could you also mention that in the method doc?

Contributor

It'd be ideal if we make this clear in the method name, like unzipFilesFromFile. (Ideally I'd like to see this also extract directories, but let's postpone that until necessary.)

In general we expect unzipping to extract directories as well. That said, we need to make the behavior very clear to the caller. I agree this should be mentioned in the Javadoc, but the method name should also be intuitive enough to convey the actual behavior.

Member Author

Makes sense; method name changed and comment added.
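
To illustrate the behavior discussed in this thread (extract only the files of a zip archive and skip directory entries), here is a minimal Java sketch. It is not the code from this PR: the class name `UnzipSketch`, the use of `java.nio.file` instead of a Hadoop `FileSystem`, and the choice to flatten nested entries to their bare file name are all assumptions made for illustration. Any `IOException` is left to propagate, so the caller is expected to treat the target directory as unhealthy on failure.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class UnzipSketch {
    private UnzipSketch() {}

    /**
     * Extracts the regular-file entries of zipFile into targetDir.
     * Directory entries are skipped, not recreated.
     * Exceptions propagate to the caller; targetDir may then hold partial output.
     */
    static List<Path> unzipFilesFromFile(Path zipFile, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        List<Path> extracted = new ArrayList<>();
        try (InputStream in = Files.newInputStream(zipFile);
             ZipInputStream zin = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories are intentionally not processed
                }
                // Assumption of this sketch: nested entries are flattened to their file name.
                Path fileName = Paths.get(entry.getName()).getFileName();
                if (fileName == null) {
                    continue; // entry without a usable file name
                }
                Path out = targetDir.resolve(fileName.toString());
                // Reads until the end of the current entry; does not close zin.
                Files.copy(zin, out, StandardCopyOption.REPLACE_EXISTING);
                extracted.add(out);
            }
        }
        return extracted;
    }
}
```

A caller would wrap `UnzipSketch.unzipFilesFromFile(zip, dir)` in its own error handling and discard `dir` if an exception is thrown.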

Comment on lines +3110 to +3154
Member

Hmm, are we sure we don't need to process any error during unzipping?

Contributor

This looks safe; if there's an exception, some files may already have been extracted and one of the output files may be broken, but callers will catch the exception and treat the output directory as unhealthy. If necessary, let's document this in the Javadoc as well.

Member Author

Yes, we rely on the caller side to address any exceptions. Javadoc added as well.

Comment on lines +260 to +265
Member

When is it possible to have an existing file in the local dir that has the same file name but is not the same file as in DFS?

Member

Or just a safer guard?

Member Author

A safer guard for checking both file names and file size.

Member

We can do this logInfo after the file size check.

Contributor

More specifically, we can do the file size check just after copyToLocalFile, and accumulations can be placed later.

Member Author

Thanks. Now doing the size check right after copyToLocalFile and placing the logInfo at the end.
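
The ordering agreed on here (copy, verify the size immediately, only then accumulate counters, and log once at the end) can be sketched as follows. This is a hedged illustration, not the PR's code: it uses `java.nio.file.Files.copy` as a stand-in for Hadoop's `copyToLocalFile`, and the names `CopyAndVerifySketch`, `LoadStats`, and `copyAndVerify` are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class CopyAndVerifySketch {
    private CopyAndVerifySketch() {}

    /** Hypothetical counters, meant to be reported in one log line after all files are handled. */
    static final class LoadStats {
        int filesCopied;
        long bytesCopied;
        int filesReused;
    }

    /**
     * Copies remote to local and checks the resulting size before touching any counters.
     * A size mismatch raises IOException, so a broken copy is never counted as loaded.
     */
    static void copyAndVerify(Path remote, Path local, long expectedSize, LoadStats stats)
            throws IOException {
        Files.copy(remote, local, StandardCopyOption.REPLACE_EXISTING);

        // Size check right after the copy.
        long actualSize = Files.size(local);
        if (actualSize != expectedSize) {
            throw new IOException("Copied " + local + " has size " + actualSize
                    + ", expected " + expectedSize);
        }

        // Accumulation happens only after the check has passed.
        stats.filesCopied += 1;
        stats.bytesCopied += actualSize;
    }
}
```

With this ordering, the final log message can include `filesCopied`, `bytesCopied`, and `filesReused` without ever counting a file that failed verification.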

Member

Add filesReused into this log message?

Member Author

Thanks! Done in the next commit.

Member

Is this only used by tests?

Member Author

Yes.

Contributor

@HeartSaVioR HeartSaVioR left a comment

Minor comments.

Contributor

nit: deleted -> delete

Member Author

Thanks, done in the next commit

Contributor

What if we just remove all the files in localDir? I'd just like to know why we don't clear the directory but only remove specific files. Do we need to leverage some of the remaining files?

Member Author

Yes. The consideration here is mainly for immutable files like sst/log files. We can avoid IO for the immutable files shared among different versions.
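
The reuse idea described in this reply can be sketched roughly as below: keep a local file only when both its name and its size match what the target version requires, delete everything else, and download only what is still missing. This is an illustrative sketch under assumptions, not the PR's implementation; `ReuseLocalFilesSketch`, `pruneAndFindMissing`, and the `Map<String, Long>` of required file names to sizes are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class ReuseLocalFilesSketch {
    private ReuseLocalFilesSketch() {}

    /**
     * Deletes local files that are not required or whose size differs from the expected one,
     * then returns the sorted names of required files that still have to be fetched.
     * Files that match by name and size are left in place and reused.
     */
    static List<String> pruneAndFindMissing(Path localDir, Map<String, Long> required)
            throws IOException {
        // Snapshot the directory first so deletion does not interfere with iteration.
        List<Path> existing = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(localDir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    existing.add(p);
                }
            }
        }

        for (Path p : existing) {
            Long expectedSize = required.get(p.getFileName().toString());
            boolean reusable = expectedSize != null && expectedSize.longValue() == Files.size(p);
            if (!reusable) {
                Files.delete(p); // stale file, or same name but different content size
            }
        }

        List<String> missing = new ArrayList<>();
        for (String name : required.keySet()) {
            if (!Files.exists(localDir.resolve(name))) {
                missing.add(name);
            }
        }
        Collections.sort(missing);
        return missing;
    }
}
```

Compared with clearing the whole directory, this avoids re-downloading immutable files that several versions share, at the cost of one size lookup per local file.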

@xuanyuanking
Member Author

Many thanks for your detailed review, @viirya @HeartSaVioR. All comments addressed.

SparkQA commented Jun 23, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44729/

Contributor

@HeartSaVioR HeartSaVioR left a comment

+1
Let's wait for @viirya to have another round of review and do explicit approval.

SparkQA commented Jun 23, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44729/

SparkQA commented Jun 23, 2021

Test build #140201 has finished for PR 32767 at commit 7c79a29.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Contributor

@xuanyuanking Oh we need to fix new code conflicts as well.

Member

files with same? with same filename?

Member Author

Ah thanks! Fix done.

@viirya
Member

viirya commented Jun 24, 2021

@xuanyuanking Could you resolve the conflict and the minor comment? Then we can move this forward. Thanks!

@xuanyuanking
Member Author

Thanks for the review!

SparkQA commented Jun 24, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44801/

SparkQA commented Jun 24, 2021

Test build #140270 has finished for PR 32767 at commit fdd0d61.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 25, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44831/

SparkQA commented Jun 25, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44831/

SparkQA commented Jun 25, 2021

Test build #140300 has finished for PR 32767 at commit be15054.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 25, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44836/

SparkQA commented Jun 25, 2021

Test build #140305 has finished for PR 32767 at commit 7279d43.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Contributor

Jenkins passed. Thanks! Merging to master.

@HeartSaVioR
Contributor

Thanks @xuanyuanking for the contribution! I merged into master.

@HeartSaVioR
Contributor

Please also rebase the next PRs onto the master branch; we can continue reviewing the next PR.

@xuanyuanking
Member Author

Copy that. I'm rebasing now. Thanks for the help!

@HeartSaVioR
Contributor

UPDATE: we found a consistent break on the Scala 2.13 build caused by this. @xuanyuanking is working on the fix, so please allow us some time to fix it in a follow-up PR instead of reverting this.

@xuanyuanking
Member Author

Great thanks @HeartSaVioR and @Ngone51. Submitted #33084.

dongjoon-hyun pushed a commit that referenced this pull request Jun 25, 2021
…uild

### What changes were proposed in this pull request?
Fix the consistent break on Scala 2.13 build caused by the PR #32767

### Why are the changes needed?
Fix the consistent break.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

Closes #33084 from xuanyuanking/SPARK-35628-follow.

Authored-by: Yuanjian Li <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@xuanyuanking xuanyuanking deleted the SPARK-35628 branch June 25, 2021 14:09