@petermaxlee
Contributor

What changes were proposed in this pull request?

This PR modifies StreamExecution such that it discards metadata for batches that have already been fully processed. I used the purge method that was added as part of SPARK-17235.

This is a resubmission of #15126, which was based on work by frreiss in #15067; this version fixes the test case along with some typos.
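
For readers unfamiliar with the idea, here is a minimal sketch of what purging metadata for completed batches might look like. `InMemoryMetadataLog` and `PurgeSketch` are hypothetical stand-ins written for illustration, not the classes touched by this patch, and the assumption that entries strictly below the threshold are dropped is mine.

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for a metadata log; not Spark's actual class.
final class InMemoryMetadataLog[T] {
  private val entries = mutable.HashMap.empty[Long, T]

  // Records metadata for a batch; returns false if the batch is already logged.
  def add(batchId: Long, metadata: T): Boolean =
    if (entries.contains(batchId)) false
    else { entries.put(batchId, metadata); true }

  // Assumption: drops every entry whose batch id is strictly below the threshold.
  def purge(thresholdBatchId: Long): Unit = {
    val stale = entries.keys.filter(_ < thresholdBatchId).toList
    stale.foreach(id => entries.remove(id))
  }

  // Batch ids still present, in ascending order.
  def batchIds: Seq[Long] = entries.keys.toSeq.sorted
}

object PurgeSketch {
  // Logs batches 0, 1 and 2, then purges everything older than batch 2.
  def run(): Seq[Long] = {
    val log = new InMemoryMetadataLog[String]
    (0L to 2L).foreach(id => log.add(id, s"offsets-$id"))
    log.purge(2L)
    log.batchIds
  }
}
```

In this sketch only batch 2 survives the purge, which mirrors the shape of the check in the new test (a single remaining log file named "2").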

How was this patch tested?

A new test case in StreamingQuerySuite. The test case would fail without the changes in this pull request.

## What changes were proposed in this pull request?
This PR modifies StreamExecution such that it discards metadata for batches that have already been fully processed. I used the purge method that was added as part of SPARK-17235.

This is based on work by frreiss in apache#15067, but fixed the test case along with some typos.

## How was this patch tested?
A new test case in StreamingQuerySuite. The test case would fail without the changes in this pull request.

Author: petermaxlee <[email protected]>
Author: frreiss <[email protected]>

Closes apache#15126 from petermaxlee/SPARK-17513.
@petermaxlee petermaxlee changed the title [SPARK-17513] [STREAMING] [SQL] Make StreamExecution garbage-collect its metadata [SPARK-17513][SQL] Make StreamExecution garbage-collect its metadata Sep 20, 2016
@SparkQA

SparkQA commented Sep 20, 2016

Test build #65674 has finished for PR 15166 at commit 5e6113c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

AssertOnQuery("metadata log should contain only one file") { q =>
val metadataLogDir = new java.io.File(q.offsetLog.metadataPath.toString)
val logFileNames = metadataLogDir.listFiles().toSeq.map(_.getName())
val toTest = logFileNames.filter(! _.endsWith(".crc")) // Workaround for SPARK-17475
Contributor

Is the space between ! and _ intentional? Other similar code I've seen doesn't have a space.

Contributor Author

I think @frreiss added this to be more obvious. I don't really have a preference here.

Contributor

Either way is fine with me.
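
As an aside on the spacing question, one way to sidestep it is `filterNot`, which drops the negation entirely. This is only a sketch: `CrcFilterSketch` and `withoutChecksums` are hypothetical names, and the file names in the usage note are made up for illustration.

```scala
object CrcFilterSketch {
  // Drops checksum files (names ending in ".crc") from a list of file names.
  def withoutChecksums(names: Seq[String]): Seq[String] =
    names.filterNot(_.endsWith(".crc"))
}
```

For example, `CrcFilterSketch.withoutChecksums(Seq("2", ".2.crc"))` keeps only `"2"`.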

val logFileNames = metadataLogDir.listFiles().toSeq.map(_.getName())
val toTest = logFileNames.filter(! _.endsWith(".crc")) // Workaround for SPARK-17475
assert(toTest.size == 1 && toTest.head == "2")
true
Contributor

This line ("true") shouldn't be here. It makes the Assert always pass, even when the condition on the previous line isn't satisfied.

Contributor Author

It still fails. There was an assert there.

Contributor

Ah, yes. The previous line (146) should be just toTest.size == 1 && toTest.head == "2", with no "assert".
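
To make the two styles discussed in this thread concrete, here is a small sketch. It assumes a helper that takes a `Q => Boolean` condition and fails when the condition returns false; `AssertOnQuerySketch.check` is a hypothetical stand-in, not the test harness's actual `AssertOnQuery`.

```scala
object AssertOnQuerySketch {
  // Hypothetical helper: throws only when the condition evaluates to false.
  def check[Q](query: Q, message: String)(condition: Q => Boolean): Unit =
    if (!condition(query)) throw new AssertionError(message)

  // Style 1: the inner assert does the checking, and the trailing `true`
  // only satisfies the Boolean return type. A failed assert still throws.
  def withInnerAssert(names: Seq[String]): Unit =
    check(names, "metadata log should contain only one file") { toTest =>
      assert(toTest.size == 1 && toTest.head == "2")
      true
    }

  // Style 2: return the condition itself and let the helper report failure.
  def withBooleanResult(names: Seq[String]): Unit =
    check(names, "metadata log should contain only one file") { toTest =>
      toTest.size == 1 && toTest.head == "2"
    }
}
```

Both variants throw an `AssertionError` for an input such as `Seq("1", "2")`. The difference is which message surfaces: the first reports the bare assertion failure, while the second reports the helper's descriptive message.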

@rxin
Contributor

rxin commented Sep 21, 2016

merging in master/2.0. Thanks.

@asfgit asfgit closed this in 976f3b1 Sep 21, 2016
asfgit pushed a commit that referenced this pull request Sep 21, 2016
## What changes were proposed in this pull request?
This PR modifies StreamExecution such that it discards metadata for batches that have already been fully processed. I used the purge method that was added as part of SPARK-17235.

This is a resubmission of #15126, which was based on work by frreiss in #15067; this version fixes the test case along with some typos.

## How was this patch tested?
A new test case in StreamingQuerySuite. The test case would fail without the changes in this pull request.

Author: petermaxlee <[email protected]>

Closes #15166 from petermaxlee/SPARK-17513-2.

(cherry picked from commit 976f3b1)
Signed-off-by: Reynold Xin <[email protected]>