Member

zsxwing commented Dec 5, 2016

What changes were proposed in this pull request?

Right now ForeachSink creates a new physical plan, so StreamExecution cannot retrieve metrics and the watermark.

This PR changes ForeachSink to manually convert InternalRows to objects without creating a new plan.
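To illustrate why running a new plan hides the metrics, here is a minimal, Spark-free sketch. Everything in it (`Plan`, `Metrics`, `Event`, `decode`, and the two sink functions) is a hypothetical stand-in invented for this example, not Spark's API and not the code in this patch. It only models the idea that metrics live on a plan instance, so rows must flow through the instance the engine is tracking.

```scala
// Toy model only: Plan, Metrics, Event and both sink functions are
// hypothetical stand-ins, not Spark classes.
object ForeachSinkSketch {
  final class Metrics { var numOutputRows: Long = 0L }

  // A "plan" owns its metrics and counts every row it produces.
  final class Plan(rows: Seq[Array[Any]]) {
    val metrics = new Metrics
    def execute(): Iterator[Array[Any]] =
      rows.iterator.map { r => metrics.numOutputRows += 1; r }
    // Re-planning yields a fresh plan with fresh, unrelated metrics.
    def replanned(): Plan = new Plan(rows)
  }

  final case class Event(id: Int, word: String)

  // Manual row-to-object conversion, standing in for an encoder.
  def decode(row: Array[Any]): Event =
    Event(row(0).asInstanceOf[Int], row(1).asInstanceOf[String])

  // Modelled old behaviour: the sink runs a new plan, so the plan that the
  // engine is watching never records any rows.
  def sinkViaNewPlan(tracked: Plan)(process: Event => Unit): Unit =
    tracked.replanned().execute().map(decode).foreach(process)

  // Modelled new behaviour: the sink consumes the tracked plan directly.
  def sinkViaSamePlan(tracked: Plan)(process: Event => Unit): Unit =
    tracked.execute().map(decode).foreach(process)

  def main(args: Array[String]): Unit = {
    val rows = Seq(Array[Any](1, "a"), Array[Any](2, "b"))
    val viaNew = new Plan(rows)
    sinkViaNewPlan(viaNew)(_ => ())
    val viaSame = new Plan(rows)
    sinkViaSamePlan(viaSame)(_ => ())
    // prints "new plan: 0, same plan: 2"
    println(s"new plan: ${viaNew.metrics.numOutputRows}, " +
      s"same plan: ${viaSame.metrics.numOutputRows}")
  }
}
```

In the toy model both sinks deliver the same objects to `process`; the only difference is which plan instance ends up holding the row count.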

How was this patch tested?

test("foreach with watermark: append").

class ForeachSink[T : Encoder](writer: ForeachWriter[T]) extends Sink with Serializable {

override def addBatch(batchId: Long, data: DataFrame): Unit = {
// TODO: Refine this method when SPARK-16264 is resolved; see comments below.
Member Author

SPARK-16264 was resolved as Won't Fix. So I removed it from the comment.

Contributor

tdas left a comment

Needs a bit more work with the tests

}
}

test("foreach with watermark: append") {
Contributor

What does this test cover that the previous test "watermark + complete" does not?

Member Author

@tdas Because there is no eviction in complete mode, it always outputs all data. So the test "watermark + complete" is not very helpful for this case.

Contributor

Actually, complete mode should NOT work when a watermark is enabled! Why does this query still work? That's material for a different PR, so I approve this change in this PR.

Contributor

Never mind. The watermark is a no-op in complete mode. False alarm.
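The difference between the two output modes discussed in this thread can be sketched with a small Spark-free model. `WindowCount`, `emitComplete`, and `emitAppend` are hypothetical names made up for this illustration, and the exact boundary condition (a window is finalized once the watermark reaches its end) is an assumption of the sketch rather than a statement about Spark's implementation.

```scala
// Toy model only: WindowCount and both emit functions are hypothetical.
object OutputModeSketch {
  // Number of events in the window [start, end), with times in seconds.
  final case class WindowCount(start: Long, end: Long, count: Long)

  // Modelled complete mode: every aggregate is emitted on every trigger and
  // nothing is evicted, so the watermark does not affect the output.
  def emitComplete(state: Seq[WindowCount], watermark: Long): Seq[WindowCount] =
    state

  // Modelled append mode: a window is emitted only once the watermark has
  // reached its end; emitted windows are evicted, the rest are retained.
  // Returns (emitted, retained).
  def emitAppend(
      state: Seq[WindowCount],
      watermark: Long): (Seq[WindowCount], Seq[WindowCount]) =
    state.partition(_.end <= watermark)

  def main(args: Array[String]): Unit = {
    val state = Seq(WindowCount(10, 15, 5), WindowCount(15, 20, 1))
    val watermark = 15L
    println(s"complete: ${emitComplete(state, watermark)}")
    val (emitted, retained) = emitAppend(state, watermark)
    println(s"append emitted: $emitted, retained: $retained")
  }
}
```

Under this model only the append-mode path reads the watermark at all, which is why an append-mode test exercises watermark propagation in a way the complete-mode test cannot.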

} finally {
query.stop()
}
}
Contributor

I don't see a test that verifies whether the metrics are correct.

Member Author

Added a simple test for metrics

Contributor

tdas commented Dec 6, 2016

LGTM.


SparkQA commented Dec 6, 2016

Test build #69698 has finished for PR 16160 at commit ac9009f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Dec 6, 2016

Test build #69697 has finished for PR 16160 at commit 50cf3e6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member Author

zsxwing commented Dec 6, 2016

I forgot that lastProgress may belong to a batch without data. Updated the test to drop progress updates that contain no data.
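A minimal sketch of the filtering idea follows. The updated test itself is not shown in this conversation, so the `Progress` case class and `lastProgressWithData` helper are hypothetical stand-ins; in Spark the analogous inputs would be `StreamingQuery.recentProgress` and the `numInputRows` field of `StreamingQueryProgress`.

```scala
// Toy model only: Progress is a hypothetical stand-in for a per-trigger
// progress report; it is not Spark's StreamingQueryProgress.
object ProgressSketch {
  final case class Progress(batchId: Long, numInputRows: Long)

  // Most recent report that actually processed input rows, if any.
  def lastProgressWithData(recent: Seq[Progress]): Option[Progress] =
    recent.filter(_.numInputRows > 0).lastOption

  def main(args: Array[String]): Unit = {
    // The final trigger saw no new data, so the plain "last" report is empty.
    val recent = Seq(Progress(0, 3), Progress(1, 2), Progress(2, 0))
    println(s"last report:           ${recent.lastOption}")
    println(s"last report with data: ${lastProgressWithData(recent)}")
  }
}
```

Asserting on the filtered report keeps a metrics check from depending on whether an idle trigger happened to fire just before the assertion ran.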


SparkQA commented Dec 6, 2016

Test build #69706 has finished for PR 16160 at commit 3a7afe7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in 7863c62 Dec 6, 2016
asfgit pushed a commit that referenced this pull request Dec 6, 2016
## What changes were proposed in this pull request?

Right now ForeachSink creates a new physical plan, so StreamExecution cannot retrieve metrics and the watermark.

This PR changes ForeachSink to manually convert InternalRows to objects without creating a new plan.

## How was this patch tested?

`test("foreach with watermark: append")`.

Author: Shixiong Zhu <[email protected]>

Closes #16160 from zsxwing/SPARK-18721.

(cherry picked from commit 7863c62)
Signed-off-by: Tathagata Das <[email protected]>
@zsxwing zsxwing deleted the SPARK-18721 branch December 6, 2016 04:45
robert3005 pushed a commit to palantir/spark that referenced this pull request Dec 15, 2016
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017