-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-17513][SQL] Make StreamExecution garbage-collect its metadata #15166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
## What changes were proposed in this pull request? This PR modifies StreamExecution such that it discards metadata for batches that have already been fully processed. I used the purge method that was added as part of SPARK-17235. This is based on work by frreiss in apache#15067, but fixed the test case along with some typos. ## How was this patch tested? A new test case in StreamingQuerySuite. The test case would fail without the changes in this pull request. Author: petermaxlee <[email protected]> Author: frreiss <[email protected]> Closes apache#15126 from petermaxlee/SPARK-17513.
|
Test build #65674 has finished for PR 15166 at commit
|
| AssertOnQuery("metadata log should contain only one file") { q => | ||
| val metadataLogDir = new java.io.File(q.offsetLog.metadataPath.toString) | ||
| val logFileNames = metadataLogDir.listFiles().toSeq.map(_.getName()) | ||
| val toTest = logFileNames.filter(! _.endsWith(".crc")) // Workaround for SPARK-17475 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the space between ! and _ intentionally added? I saw other similar code not having a space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @frreiss added this to be more obvious. I don't really have a preference here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either way is fine with me.
| val logFileNames = metadataLogDir.listFiles().toSeq.map(_.getName()) | ||
| val toTest = logFileNames.filter(! _.endsWith(".crc")) // Workaround for SPARK-17475 | ||
| assert(toTest.size == 1 && toTest.head == "2") | ||
| true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line ("true") shouldn't be here. It makes the Assert always pass, even when the condition on the previous line isn't satisfied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It still fails. There was an assert there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes. The previous like (146) should be just toTest.size == 1 && toTest.head == "2", with no "assert".
|
merging in master/2.0. Thanks. |
## What changes were proposed in this pull request? This PR modifies StreamExecution such that it discards metadata for batches that have already been fully processed. I used the purge method that was added as part of SPARK-17235. This is a resubmission of 15126, which was based on work by frreiss in #15067, but fixed the test case along with some typos. ## How was this patch tested? A new test case in StreamingQuerySuite. The test case would fail without the changes in this pull request. Author: petermaxlee <[email protected]> Closes #15166 from petermaxlee/SPARK-17513-2. (cherry picked from commit 976f3b1) Signed-off-by: Reynold Xin <[email protected]>
What changes were proposed in this pull request?
This PR modifies StreamExecution such that it discards metadata for batches that have already been fully processed. I used the purge method that was added as part of SPARK-17235.
This is a resubmission of 15126, which was based on work by frreiss in #15067, but fixed the test case along with some typos.
How was this patch tested?
A new test case in StreamingQuerySuite. The test case would fail without the changes in this pull request.