[SPARK-17513] [STREAMING] [SQL] Make StreamExecution garbage-collect its metadata #15067
Jenkins, test this please.
Looks pretty good to me.
val metadataLogDir = new java.io.File(q.offsetLog.metadataPath.toString)
val logFileNames = metadataLogDir.listFiles().toSeq.map(_.getName())
val toTest = logFileNames.filter(! _.endsWith(".crc")) // Workaround for SPARK-17475
toTest.size == 1 && toTest.head == "2"
Actually, this test case will always pass, even without the change here. I will submit a patch based on yours that fixes the issues here.
Oops, I didn't notice I had left that "true" at the end of that block of code. @petermaxlee, let me know if you need help preparing the final version for merge.
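To illustrate the issue discussed above: in Scala the last expression of a block is its value, so a stray `true` after the comparison makes the check pass unconditionally. The following is a minimal sketch of the file check with the comparison as the final expression. The object name `MetadataLogCheck` and the helper `onlyLogFileIs` are hypothetical names introduced for illustration, and the sketch works on a plain local directory rather than on a running streaming query.

```scala
import java.io.File

// Hypothetical helper, not part of Spark: checks which log files remain in a directory.
object MetadataLogCheck {
  /** Returns true only when `dir` holds exactly one non-".crc" file named `expected`. */
  def onlyLogFileIs(dir: File, expected: String): Boolean = {
    // listFiles() returns null if `dir` is not a readable directory.
    val names = Option(dir.listFiles()).map(_.toSeq).getOrElse(Seq.empty[File]).map(_.getName)
    // Ignore checksum files, mirroring the SPARK-17475 workaround in the snippet above.
    val toTest = names.filterNot(_.endsWith(".crc"))
    // The comparison is the last expression, so it is the value of the block.
    toTest.size == 1 && toTest.head == expected
  }
}
```

Because nothing follows the comparison, the helper returns false whenever older log files are still present.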
Test build #65525 has finished for PR 15067 at commit
Test build #3276 has finished for PR 15067 at commit
@frreiss, can you close this now?
## What changes were proposed in this pull request?
This PR modifies StreamExecution such that it discards metadata for batches that have already been fully processed. I used the purge method that was added as part of SPARK-17235.

This is based on work by frreiss in #15067, but fixed the test case along with some typos.

## How was this patch tested?
A new test case in StreamingQuerySuite. The test case would fail without the changes in this pull request.

Author: petermaxlee <[email protected]>
Author: frreiss <[email protected]>

Closes #15126 from petermaxlee/SPARK-17513.
## What changes were proposed in this pull request?
This PR modifies StreamExecution such that it discards metadata for batches that have already been fully processed. I used the purge method that was added as part of SPARK-17235.

This is based on work by frreiss in #15067, but fixed the test case along with some typos.

## How was this patch tested?
A new test case in StreamingQuerySuite. The test case would fail without the changes in this pull request.

Author: petermaxlee <[email protected]>
Author: frreiss <[email protected]>

Closes #15126 from petermaxlee/SPARK-17513.

(cherry picked from commit be9d57f)
Signed-off-by: Reynold Xin <[email protected]>
## What changes were proposed in this pull request?
This PR modifies StreamExecution such that it discards metadata for batches that have already been fully processed. I used the purge method that was added as part of SPARK-17235.

This is a resubmission of 15126, which was based on work by frreiss in apache#15067, but fixed the test case along with some typos.

## How was this patch tested?
A new test case in StreamingQuerySuite. The test case would fail without the changes in this pull request.

Author: petermaxlee <[email protected]>

Closes apache#15166 from petermaxlee/SPARK-17513-2.
## What changes were proposed in this pull request?
This PR modifies StreamExecution such that it discards metadata for batches that have already been fully processed. I used the purge method that was added as part of SPARK-17235.

This is a resubmission of 15126, which was based on work by frreiss in #15067, but fixed the test case along with some typos.

## How was this patch tested?
A new test case in StreamingQuerySuite. The test case would fail without the changes in this pull request.

Author: petermaxlee <[email protected]>

Closes #15166 from petermaxlee/SPARK-17513-2.

(cherry picked from commit 976f3b1)
Signed-off-by: Reynold Xin <[email protected]>
## What changes were proposed in this pull request?
This PR modifies StreamExecution such that it discards metadata for batches that have already been fully processed. I used the `purge` method that was added as part of SPARK-17235.

## How was this patch tested?
I added a test case to verify that old metadata log files are correctly purged. I also ran the entire regression suite.
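The garbage-collection idea described above can be sketched with an in-memory stand-in for the metadata log. This is an illustration under stated assumptions, not Spark's implementation: `InMemoryMetadataLog` is a hypothetical class, and the choice to purge everything strictly below the threshold batch id is an assumption made so that the outcome mirrors the test snippet in this PR, where only the log file for batch 2 remains.

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for a metadata log; not Spark's own class.
final class InMemoryMetadataLog[T] {
  private val entries = mutable.HashMap.empty[Long, T]

  /** Records metadata for a batch; returns false if that batch was already logged. */
  def add(batchId: Long, metadata: T): Boolean =
    if (entries.contains(batchId)) false
    else { entries.put(batchId, metadata); true }

  /** Drops every entry whose batch id is strictly below `thresholdBatchId` (assumed semantics). */
  def purge(thresholdBatchId: Long): Unit = {
    val stale = entries.keys.filter(_ < thresholdBatchId).toList
    stale.foreach(id => entries.remove(id))
  }

  /** Batch ids still present, in ascending order. */
  def batchIds: Seq[Long] = entries.keys.toSeq.sorted
}
```

In this sketch, a caller that has logged batches 0, 1, and 2 and then calls `purge(2L)` is left with batch 2 only. Which threshold a real query should pass depends on which batches it may still need for recovery.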