@brkyvz brkyvz commented Jan 27, 2017

What changes were proposed in this pull request?

In Structured Streaming, if a trigger was skipped because no new data arrived, we suddenly report nothing for the `stateOperator` metrics. However, we could easily report the metrics from `lastExecution` to ensure continuity of metrics.
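
The idea can be sketched in a few lines of Scala. This is a minimal illustration, not Spark's implementation: `StateOperatorMetrics`, `TriggerProgress`, and `buildProgress` are hypothetical names, and carrying the previous state metrics forward unchanged is an assumption made for the sketch.

```scala
// Hypothetical, simplified stand-ins for Spark's progress types (not the real API).
final case class StateOperatorMetrics(numRowsTotal: Long)
final case class TriggerProgress(numInputRows: Long, stateOperators: Seq[StateOperatorMetrics])

object ProgressSketch {
  /**
   * Builds the progress report for one trigger.
   *
   * @param executed     input row count and state metrics if the trigger ran a batch,
   *                     or None if it was skipped because no new data arrived
   * @param lastReported state metrics from the most recent batch that actually ran
   */
  def buildProgress(
      executed: Option[(Long, Seq[StateOperatorMetrics])],
      lastReported: Seq[StateOperatorMetrics]): TriggerProgress =
    executed match {
      case Some((rows, ops)) => TriggerProgress(rows, ops)
      // Skipped trigger: no input rows, but the state still holds the same rows,
      // so reuse the previous metrics instead of reporting nothing.
      case None => TriggerProgress(0L, lastReported)
    }
}
```

With this shape, a skipped trigger that follows a batch leaving two rows in state would still report a state row total of 2 alongside zero input rows.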

How was this patch tested?

Regression test in StreamingQueryStatusAndProgressSuite

// Should emit new progresses every 10 ms, but we could be facing a slow Jenkins
eventually(timeout(1 minute)) {
val nextProgress = query.lastProgress
assert(nextProgress.timestamp !== progress.timestamp)

Can you explicitly verify that this progress has no data?


/**
* Extract statistics about stateful operators from the executed query plan.
* SPARK-19378: Still report stateOperator metrics even though no data was processed while
@tdas tdas Jan 27, 2017

It does not make sense to have JIRA numbers in a method's Scaladoc. Just state what it does.

@tdas tdas left a comment

The approach looks good, but it needs some cleanup and a better test.


/**
* Extract statistics about event time from the executed query plan.
* SPARK-19378: Still report eventTime metrics even though no data was processed while

Same here.

val nextProgress = query.lastProgress
assert(nextProgress.timestamp !== progress.timestamp)
assert(progress.eventTime.size() > 1)
assert(progress.stateOperators.length > 0)

You are not verifying that the metric values are as expected.

* SPARK-19378: Still report eventTime metrics even though no data was processed while
* reporting progress.
*/
private def extractEventTimeStats(watermarkTs: Map[String, String]): Map[String, String] = {

It does not make sense for this method to take `watermarkTs` as a param. It's not extracting event time stats from the watermark timestamp, it's just appending it. So why not just return an empty map and do the appending outside? Or do the extraction of the watermark inside the function as well.
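
One way to read the first option in this comment: the extractor returns only the statistics it derives itself, and the caller appends the watermark. The sketch below assumes hypothetical names (`EventTimeSketch`, `eventTimeFor`) and a simplified input, a tuple of preformatted min, max, and avg strings, in place of the executed plan. It is not the code from this patch.

```scala
object EventTimeSketch {
  /**
   * Hypothetical extractor: returns only stats derived from the trigger's own data.
   *
   * @param observed (min, max, avg) event times for this trigger,
   *                 or None if no data was processed
   */
  def extractEventTimeStats(observed: Option[(String, String, String)]): Map[String, String] =
    observed match {
      case Some((min, max, avg)) => Map("min" -> min, "max" -> max, "avg" -> avg)
      case None => Map.empty[String, String]
    }

  /** The caller, not the extractor, appends the watermark. */
  def eventTimeFor(
      observed: Option[(String, String, String)],
      watermark: String): Map[String, String] =
    extractEventTimeStats(observed) + ("watermark" -> watermark)
}
```

Keeping the watermark outside the extractor means a trigger with no data yields an empty map from `extractEventTimeStats`, while the caller can still report the watermark.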

brkyvz commented Jan 27, 2017

@tdas Addressed

SparkQA commented Jan 27, 2017

Test build #72061 has finished for PR 16716 at commit 55e3d36.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jan 27, 2017

Test build #72059 has finished for PR 16716 at commit 884a789.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val nextProgress = query.lastProgress
assert(nextProgress.timestamp !== progress.timestamp)
assert(nextProgress.numInputRows === 0)
assert(nextProgress.eventTime.get("min") === "2017-01-26 01:00:00")

This does not make sense. If there is no data in the last trigger, the min, max, and avg timestamps cannot be different.
And what about the watermark?

Contributor Author

Oh shoot. I should definitely leave those out because they are trigger-specific, right?
I should only keep the stateOperator part.

SparkQA commented Jan 27, 2017

Test build #72064 has finished for PR 16716 at commit e23de4a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

assert(nextProgress.timestamp !== progress.timestamp)
assert(nextProgress.numInputRows === 0)
assert(nextProgress.stateOperators.head.numRowsTotal === 2)
assert(nextProgress.stateOperators.head.numRowsTotal === 2)

Why is this line twice?

brkyvz commented Jan 30, 2017

@tdas addressed

SparkQA commented Jan 30, 2017

Test build #72168 has finished for PR 16716 at commit 0b9fcfc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in 081b7ad Feb 1, 2017
asfgit pushed a commit that referenced this pull request Feb 1, 2017
…trics even if there is no new data in trigger

In StructuredStreaming, if a new trigger was skipped because no new data arrived, we suddenly report nothing for the metrics `stateOperator`. We could however easily report the metrics from `lastExecution` to ensure continuity of metrics.

Regression test in `StreamingQueryStatusAndProgressSuite`

Author: Burak Yavuz <[email protected]>

Closes #16716 from brkyvz/state-agg.

(cherry picked from commit 081b7ad)
Signed-off-by: Tathagata Das <[email protected]>
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017
@brkyvz brkyvz deleted the state-agg branch February 3, 2019 20:57