@frreiss
Contributor

frreiss commented Sep 13, 2016

What changes were proposed in this pull request?

This PR modifies StreamExecution such that it discards metadata for batches that have already been fully processed. I used the purge method that was added as part of SPARK-17235.
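
As a rough illustration of the idea (not the code in this PR), the sketch below keeps per-batch metadata in an in-memory map and drops entries older than a threshold once a newer batch has been logged. `InMemoryMetadataLog`, its `add`/`purge`/`batchIds` methods, and the exclusive-threshold semantics are assumptions made for this example; the actual purge method from SPARK-17235 may differ in signature and behavior.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a metadata log; not Spark's actual class.
class InMemoryMetadataLog[T] {
  // Sorted by batch id so that batchIds comes back in ascending order.
  private val entries = mutable.TreeMap.empty[Long, T]

  // Record metadata for a batch; returns false if the batch was already logged.
  def add(batchId: Long, metadata: T): Boolean =
    if (entries.contains(batchId)) false
    else { entries.put(batchId, metadata); true }

  // Drop every entry whose batch id is strictly below the threshold.
  // (The exclusive threshold is an assumption of this sketch.)
  def purge(thresholdBatchId: Long): Unit = {
    val stale = entries.keys.filter(_ < thresholdBatchId).toList
    stale.foreach(entries.remove)
  }

  def batchIds: Seq[Long] = entries.keys.toSeq
}

object PurgeSketch {
  // Log three batches, then discard everything before the latest one.
  def run(): Seq[Long] = {
    val log = new InMemoryMetadataLog[String]
    (0L to 2L).foreach(id => log.add(id, s"offsets-for-batch-$id"))
    log.purge(2L)
    log.batchIds
  }
}
```

In this sketch, calling `PurgeSketch.run()` leaves only batch 2 in the log, which mirrors the single remaining log file that the test in this PR looks for.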

How was this patch tested?

I added a test case to verify that old metadata log files are correctly purged.
I also ran the entire regression suite.

@rxin
Contributor

rxin commented Sep 17, 2016

Jenkins, test this please.

@rxin
Contributor

rxin commented Sep 17, 2016

Looks pretty good to me.

petermaxlee added a commit to petermaxlee/spark that referenced this pull request Sep 17, 2016
[SPARK-17513] [STREAMING] [SQL] Make StreamExecution garbage-collect its metadata
val metadataLogDir = new java.io.File(q.offsetLog.metadataPath.toString)
val logFileNames = metadataLogDir.listFiles().toSeq.map(_.getName())
val toTest = logFileNames.filter(! _.endsWith(".crc")) // Workaround for SPARK-17475
toTest.size == 1 && toTest.head == "2"
Contributor

Actually, this test case will always pass, even without the change here. I will submit a patch based on yours that fixes the issues here.

Contributor Author

Oops, I didn't notice I had left that "true" at the end of that block of code. @petermaxlee, let me know if you need help preparing the final version for merge.
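
The pitfall mentioned here can be reproduced in isolation. In Scala, a block evaluates to its last expression, so a stray `true` after the real comparison makes the condition succeed regardless of what the comparison found. The sketch below is a minimal illustration under assumed names (`TrailingTrueSketch`, `alwaysPasses`, `checksFiles`); it is not the test code from this PR.

```scala
object TrailingTrueSketch {
  // Same filtering idea as the snippet under review: ignore checksum files.
  private def withoutCrc(names: Seq[String]): Seq[String] =
    names.filterNot(_.endsWith(".crc"))

  // Buggy shape: the comparison is evaluated and then discarded,
  // because the block's value is its final expression, `true`.
  def alwaysPasses(names: Seq[String]): Boolean = {
    val toTest = withoutCrc(names)
    toTest.size == 1 && toTest.head == "2"
    true
  }

  // Fixed shape: the comparison is the final expression of the block.
  def checksFiles(names: Seq[String]): Boolean = {
    val toTest = withoutCrc(names)
    toTest.size == 1 && toTest.head == "2"
  }
}
```

With unpurged file names such as `Seq("0", "1", "2")`, `alwaysPasses` still returns true while `checksFiles` returns false, which is why the original test could not detect a missing purge.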

@SparkQA

SparkQA commented Sep 17, 2016

Test build #65525 has finished for PR 15067 at commit 82f5b68.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 17, 2016

Test build #3276 has finished for PR 15067 at commit 82f5b68.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Sep 20, 2016

@frreiss can you close this now?

asfgit pushed a commit that referenced this pull request Sep 20, 2016
## What changes were proposed in this pull request?
This PR modifies StreamExecution such that it discards metadata for batches that have already been fully processed. I used the purge method that was added as part of SPARK-17235.

This is based on work by frreiss in #15067, with the test case and some typos fixed.

## How was this patch tested?
A new test case in StreamingQuerySuite. The test case would fail without the changes in this pull request.

Author: petermaxlee
Author: frreiss

Closes #15126 from petermaxlee/SPARK-17513.
asfgit pushed a commit that referenced this pull request Sep 20, 2016
(cherry picked from commit be9d57f)
Signed-off-by: Reynold Xin
frreiss closed this Sep 20, 2016
petermaxlee added a commit to petermaxlee/spark that referenced this pull request Sep 20, 2016
ghost pushed a commit to dbtsai/spark that referenced this pull request Sep 21, 2016
## What changes were proposed in this pull request?
This PR modifies StreamExecution such that it discards metadata for batches that have already been fully processed. I used the purge method that was added as part of SPARK-17235.

This is a resubmission of #15126, which was based on work by frreiss in apache#15067, with the test case and some typos fixed.

## How was this patch tested?
A new test case in StreamingQuerySuite. The test case would fail without the changes in this pull request.

Author: petermaxlee

Closes apache#15166 from petermaxlee/SPARK-17513-2.
asfgit pushed a commit that referenced this pull request Sep 21, 2016
(cherry picked from commit 976f3b1)
Signed-off-by: Reynold Xin