@petermaxlee
Contributor

What changes were proposed in this pull request?

This patch adds a purge interface to MetadataLog, and an implementation in HDFSMetadataLog. The purge function is currently unused, but I will use it to purge old execution and file source logs in follow-up patches. These changes are required in a production structured streaming job that runs for a long period of time.
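
To illustrate the idea, here is a minimal in-memory sketch of a log with a purge operation. It is not the code in this patch: `SimpleMetadataLog` and `InMemoryMetadataLog` are hypothetical names, the entries live in a map rather than in files, and the exclusive threshold (entries with a batch id strictly below the threshold are dropped) is an assumption made for the example.

```scala
import scala.collection.immutable.TreeMap

// Hypothetical, simplified stand-in for a metadata log keyed by batch id.
// This is NOT Spark's MetadataLog trait; names and semantics are assumptions.
trait SimpleMetadataLog[T] {
  /** Stores metadata for a batch; returns false if the batch id already exists. */
  def add(batchId: Long, metadata: T): Boolean

  /** Returns the metadata for a batch, if present. */
  def get(batchId: Long): Option[T]

  /** Returns the entry with the highest batch id, if any. */
  def getLatest(): Option[(Long, T)]

  /** Assumed semantics: removes every entry with batchId < thresholdBatchId. */
  def purge(thresholdBatchId: Long): Unit
}

// In-memory implementation; a file-backed log would delete files instead.
final class InMemoryMetadataLog[T] extends SimpleMetadataLog[T] {
  private var entries = TreeMap.empty[Long, T]

  override def add(batchId: Long, metadata: T): Boolean = {
    if (entries.contains(batchId)) {
      false
    } else {
      entries = entries.updated(batchId, metadata)
      true
    }
  }

  override def get(batchId: Long): Option[T] = entries.get(batchId)

  override def getLatest(): Option[(Long, T)] = entries.lastOption

  override def purge(thresholdBatchId: Long): Unit = {
    // Keep only entries at or above the threshold.
    entries = entries.filter { case (id, _) => id >= thresholdBatchId }
  }
}
```

With this sketch, a long-running job could call `purge` periodically with the oldest batch id it still needs, so the log does not grow without bound.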

How was this patch tested?

Added a unit test case in HDFSMetadataLogSuite.

@petermaxlee
Contributor Author

@tdas and @zsxwing can you take a look at this? It's a pretty simple change.


- def testManager(basePath: Path, fm: FileManager): Unit = {
+ /** Basic test case for [[FileManager]] implementation. */
+ private def testFileManager(basePath: Path, fm: FileManager): Unit = {
Contributor Author

I renamed this because I initially read it as a noun phrase meaning "manager for testing" rather than a verb phrase meaning "to test the file manager".

@jerryshao
Contributor

This looks somewhat similar to #13513.

@SparkQA

SparkQA commented Aug 25, 2016

Test build #64403 has finished for PR 14802 at commit 0d9d1e6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Member

zsxwing commented Aug 26, 2016

It would be great if we could reuse the code in FileStreamSinkLog for both FileStreamSource and FileStreamSink.

@petermaxlee
Contributor Author

@zsxwing yup I plan to consolidate them.

@zsxwing
Member

zsxwing commented Aug 26, 2016

@petermaxlee would you mind submitting the consolidation PR instead once you finish?

@petermaxlee
Contributor Author

I can, but I'm doing a lot of work in this area, and that would be much more difficult because the changes have dependencies on one another. It would be better to merge logically atomic pull requests.

@frreiss
Contributor

frreiss commented Aug 26, 2016

LGTM. I have written nearly the exact same thing as part of [https://github.com//pull/14553], but can use this version of the method instead.

@rxin
Contributor

rxin commented Aug 26, 2016

Alright, I'm going to merge this into master/2.0.

@petermaxlee and @frreiss can you guys work together?

@asfgit asfgit closed this in f64a1dd Aug 26, 2016
asfgit pushed a commit that referenced this pull request Aug 26, 2016
## What changes were proposed in this pull request?
This patch adds a purge interface to MetadataLog, and an implementation in HDFSMetadataLog. The purge function is currently unused, but I will use it to purge old execution and file source logs in follow-up patches. These changes are required in a production structured streaming job that runs for a long period of time.

## How was this patch tested?
Added a unit test case in HDFSMetadataLogSuite.

Author: petermaxlee <[email protected]>

Closes #14802 from petermaxlee/SPARK-17235.

(cherry picked from commit f64a1dd)
Signed-off-by: Reynold Xin <[email protected]>