
@jerryshao
Contributor

What changes were proposed in this pull request?

Currently, with SPARK-15698, the FileStreamSource metadata log is compacted periodically (every 10 batches by default), so each compacted batch file contains every file entry processed so far. Over time, the compacted batch file grows very large.

With SPARK-17165, FileStreamSource no longer tracks aged file entries in memory, but the log still keeps the full set of entries. Keeping them is unnecessary and makes recovery quite time-consuming. This PR therefore proposes adding the ability to purge aged file entries from the log as well.
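To illustrate the idea, here is a minimal Python sketch. It is not the patch's code. All names (`FileEntry`, `is_compaction_batch`, `compact`, `run`, `max_age_ms`) are hypothetical and are not FileStreamSource's actual API. The sketch assumes an entry is "aged" when its timestamp is more than `max_age_ms` older than the newest entry being compacted; the real age policy may differ. Without purging, every compaction carries all previous entries forward. With purging, the compacted log stays bounded.

```python
from dataclasses import dataclass
from typing import List, Optional

COMPACT_INTERVAL = 10  # the PR description cites 10 batches as the default


@dataclass(frozen=True)
class FileEntry:
    path: str
    timestamp: int  # hypothetical: file modification time in ms
    batch_id: int


def is_compaction_batch(batch_id: int, interval: int = COMPACT_INTERVAL) -> bool:
    # With 0-based batch ids, batches 9, 19, 29, ... are compaction batches.
    return (batch_id + 1) % interval == 0


def compact(previous: List[FileEntry], pending: List[FileEntry],
            max_age_ms: Optional[int] = None) -> List[FileEntry]:
    """Merge the previous compacted entries with the entries logged since then.

    Assumption: if max_age_ms is given, entries older than
    (newest timestamp - max_age_ms) are purged instead of carried forward.
    """
    merged = previous + pending
    if max_age_ms is None or not merged:
        return merged  # no purging: the compacted file only ever grows
    threshold = max(e.timestamp for e in merged) - max_age_ms
    return [e for e in merged if e.timestamp >= threshold]


def run(num_batches: int, max_age_ms: Optional[int] = None) -> List[FileEntry]:
    """Simulate one new file per batch and return the latest compacted log."""
    compacted: List[FileEntry] = []
    pending: List[FileEntry] = []
    for batch_id in range(num_batches):
        pending.append(FileEntry(f"file-{batch_id}", batch_id * 1000, batch_id))
        if is_compaction_batch(batch_id):
            compacted = compact(compacted, pending, max_age_ms)
            pending = []
    return compacted


if __name__ == "__main__":
    print(len(run(30)))                   # all 30 entries are kept
    print(len(run(30, max_age_ms=5000)))  # only the 6 newest entries survive
```

In this simulation the unpurged log grows linearly with the number of batches, while the purged log holds only entries within the age window. That bounded size is what would shorten recovery.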

How was this patch tested?

Unit test added.

@SparkQA

SparkQA commented Sep 23, 2016

Test build #65811 has finished for PR 15210 at commit 20a6c4b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Member

zsxwing commented Oct 24, 2016

@jerryshao could you close this PR, since most of the code will need to change after #14553 is merged?

@jerryshao
Contributor Author

Sure, thanks.

@jerryshao closed this on Oct 25, 2016