-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-18721][SS]Fix ForeachSink with watermark + append #16160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| class ForeachSink[T : Encoder](writer: ForeachWriter[T]) extends Sink with Serializable { | ||
|
|
||
| override def addBatch(batchId: Long, data: DataFrame): Unit = { | ||
| // TODO: Refine this method when SPARK-16264 is resolved; see comments below. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SPARK-16264 was resolved as Won't Fix. So I removed it from the comment.
tdas
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a bit more work with the tests
| } | ||
| } | ||
|
|
||
| test("foreach with watermark: append") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this test that is not covered in the previous test "watermark + complete"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tdas As no eviction with complete mode, it will always output all data. So basically, the test "watermark + complete" is not super helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually complete mode should NOT work when watermark is enabled!! Why does this query still work? Thats material for different PR. So I approve this change in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NVM. watermark is a noop in complete mode. false alarm.
| } finally { | ||
| query.stop() | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont see a test that verifies whether the metrics are correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a simple test for metrics
|
LGTM. |
|
Test build #69698 has finished for PR 16160 at commit
|
|
Test build #69697 has finished for PR 16160 at commit
|
|
Forgot that |
|
Test build #69706 has finished for PR 16160 at commit
|
## What changes were proposed in this pull request?
Right now ForeachSink creates a new physical plan, so StreamExecution cannot retrieval metrics and watermark.
This PR changes ForeachSink to manually convert InternalRows to objects without creating a new plan.
## How was this patch tested?
`test("foreach with watermark: append")`.
Author: Shixiong Zhu <[email protected]>
Closes #16160 from zsxwing/SPARK-18721.
(cherry picked from commit 7863c62)
Signed-off-by: Tathagata Das <[email protected]>
## What changes were proposed in this pull request?
Right now ForeachSink creates a new physical plan, so StreamExecution cannot retrieval metrics and watermark.
This PR changes ForeachSink to manually convert InternalRows to objects without creating a new plan.
## How was this patch tested?
`test("foreach with watermark: append")`.
Author: Shixiong Zhu <[email protected]>
Closes apache#16160 from zsxwing/SPARK-18721.
## What changes were proposed in this pull request?
Right now ForeachSink creates a new physical plan, so StreamExecution cannot retrieval metrics and watermark.
This PR changes ForeachSink to manually convert InternalRows to objects without creating a new plan.
## How was this patch tested?
`test("foreach with watermark: append")`.
Author: Shixiong Zhu <[email protected]>
Closes apache#16160 from zsxwing/SPARK-18721.
What changes were proposed in this pull request?
Right now ForeachSink creates a new physical plan, so StreamExecution cannot retrieval metrics and watermark.
This PR changes ForeachSink to manually convert InternalRows to objects without creating a new plan.
How was this patch tested?
test("foreach with watermark: append").