[SPARK-20209][SS] Execute next trigger immediately if previous batch took longer than trigger interval #17525

tdas · 2017-04-04T06:53:25Z

What changes were proposed in this pull request?

With a large trigger interval (e.g. 10 minutes), if a batch takes 11 minutes, the executor waits another 9 minutes before starting the next batch. This does not make sense. The processing-time-based trigger policy should be to process batches as fast as possible, but no faster than one per trigger interval. If batches are already taking longer than the trigger interval, there is no point in waiting for the rest of the next interval.

In this PR, I modified the ProcessingTimeExecutor to do so. Another minor change was to extract our StreamManualClock into a separate class so that it can be used outside subclasses of StreamTest. For example, ProcessingTimeExecutorSuite does not need to create any context for testing; it just needs the StreamManualClock.
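The 10-minute example above can be checked with a small standalone sketch. It is not the code in this PR: `TriggerWaitSketch`, `waitFromBatchEnd`, and `waitFromBatchStart` are hypothetical names, and the "before" variant is only an assumption inferred from the 9-minute wait described above. Only the `nextBatchTime` arithmetic is taken from the diff in this PR.

```scala
object TriggerWaitSketch {
  // Same arithmetic as nextBatchTime in the diff: the next interval boundary after `now`.
  def nextBatchTime(now: Long, intervalMs: Long): Long =
    if (intervalMs == 0) now else now / intervalMs * intervalMs + intervalMs

  // Assumed pre-change behaviour: the wait target is derived from the batch END time.
  def waitFromBatchEnd(endMs: Long, intervalMs: Long): Long =
    nextBatchTime(endMs, intervalMs) - endMs

  // Assumed post-change behaviour: the wait target is derived from the batch START time,
  // so a batch that overruns the interval leaves nothing to wait for.
  def waitFromBatchStart(startMs: Long, endMs: Long, intervalMs: Long): Long =
    math.max(0L, nextBatchTime(startMs, intervalMs) - endMs)

  def main(args: Array[String]): Unit = {
    val minute = 60L * 1000L
    val interval = 10 * minute
    println(waitFromBatchEnd(11 * minute, interval) / minute)       // 9
    println(waitFromBatchStart(0L, 11 * minute, interval) / minute) // 0
    println(waitFromBatchStart(0L, 4 * minute, interval) / minute)  // 6
  }
}
```

A batch that finishes early (4 minutes) still waits for the interval boundary, so the "no faster than one per trigger interval" half of the policy is preserved.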

How was this patch tested?

Added new unit tests to comprehensively test this behavior.

SparkQA · 2017-04-04T06:57:32Z

Test build #75504 has started for PR 17525 at commit 50f0195.

brkyvz · 2017-04-04T17:23:16Z

sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TriggerExecutor.scala

   */
  def nextBatchTime(now: Long): Long = {
-    now / intervalMs * intervalMs + intervalMs
+    if (intervalMs == 0) now else now / intervalMs * intervalMs + intervalMs


the doc seems wrong btw, mind fixing it? nextBatchTime(nextBatchTime(0)) = 100 or am I understanding it wrong?

Discussed offline; this isn't wrong.
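For readers following the question above, the arithmetic in the diff can be evaluated directly. This is a minimal sketch that copies the expression into a standalone function; the object name `NextBatchTimeSketch` and the explicit `intervalMs` parameter are illustrative assumptions (in the diff, `intervalMs` is not a parameter of `nextBatchTime`).

```scala
object NextBatchTimeSketch {
  // Expression copied from the "+" line of the diff, with intervalMs passed in explicitly.
  def nextBatchTime(now: Long, intervalMs: Long): Long =
    if (intervalMs == 0) now else now / intervalMs * intervalMs + intervalMs

  def main(args: Array[String]): Unit = {
    println(nextBatchTime(0L, 100L))                      // 100
    println(nextBatchTime(nextBatchTime(0L, 100L), 100L)) // 200
    println(nextBatchTime(150L, 100L))                    // 200
    println(nextBatchTime(150L, 0L))                      // 150
  }
}
```

With a 100 ms interval, a time that sits exactly on a boundary maps to the following boundary, so applying the function twice to 0 gives 200. The `intervalMs == 0` guard matters because the "-" line would divide a `Long` by zero, which throws `ArithmeticException` on the JVM.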

brkyvz · 2017-04-04T17:24:19Z

sql/core/src/test/scala/org/apache/spark/sql/streaming/util/StreamManualClock.scala

+    }
+  }
+
+  def isStreamWaitingAt(time: Long): Boolean = synchronized {


mind adding docs on when these should be used?
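As an illustration of when such a method is useful, here is a self-contained sketch of a manual clock that records when a thread started waiting. It is not the StreamManualClock from this PR: `SketchManualClock` is a hypothetical class, and the meaning given to `isStreamWaitingAt` (true while some thread is blocked in `waitTillTime` and had entered it when the clock read `time`) is an assumption.

```scala
// Hypothetical stand-in for a manual test clock; not Spark's implementation.
class SketchManualClock(private var now: Long = 0L) {
  // Clock reading at the moment a thread entered waitTillTime, if one is currently waiting.
  private var waitStartTime: Option[Long] = None

  def getTimeMillis: Long = synchronized { now }

  // Move the clock forward and wake any waiting thread so it can re-check its target.
  def advance(deltaMs: Long): Unit = synchronized {
    now += deltaMs
    notifyAll()
  }

  // Block until the manual clock reaches targetTime; returns the clock reading on exit.
  def waitTillTime(targetTime: Long): Long = synchronized {
    waitStartTime = Some(now)
    try {
      while (now < targetTime) wait(10)
      now
    } finally {
      waitStartTime = None
    }
  }

  // Assumed semantics: true only while a thread is blocked and began waiting at `time`.
  def isStreamWaitingAt(time: Long): Boolean = synchronized {
    waitStartTime.contains(time)
  }
}
```

A test would typically poll `isStreamWaitingAt(t)` before advancing the clock, so that the advance is only issued once the stream thread has finished its batch and is actually blocked on the clock.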

SparkQA · 2017-04-04T18:35:04Z

Test build #75514 has finished for PR 17525 at commit 6ec51cf.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-04-04T19:15:30Z

Test build #75520 has finished for PR 17525 at commit 32b389e.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-04-04T20:47:22Z

Test build #75521 has finished for PR 17525 at commit 7614de5.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

tdas · 2017-04-04T23:55:32Z

sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala

    testStream(mapped, OutputMode.Complete)(
-      StartStream(ProcessingTime(100), triggerClock = clock),
-      AssertStreamExecThreadToWaitForClock(),
+      StartStream(ProcessingTime(1000), triggerClock = clock),


This test needed fixing because the manual clock was configured such that the first batch takes more than 100 ms even though the trigger interval was 100 ms. This caused an additional batch to be executed automatically without waiting for the manual clock to be advanced, breaking certain assumptions in the test.

SparkQA · 2017-04-05T02:07:30Z

Test build #75524 has finished for PR 17525 at commit 2182a33.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class AssertStreamExecThreadIsWaitingForTime(targetTime: Long)
case class AssertClockTime(time: Long)

brkyvz · 2017-04-05T02:47:10Z

...re/src/test/scala/org/apache/spark/sql/execution/streaming/ProcessingTimeExecutorSuite.scala

+    clockIncrementInTrigger = 1500
+    manualClock.setTime(2000)
+    eventually {
+      assert(lastTriggerTime === 3500)


It was hard to tell that the test is actually checking that this value is 3500 rather than 2000. Could you add a quick comment?
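One way to see where 3500 comes from is to simulate the trigger timeline. The sketch below is hypothetical (`TriggerTimelineSketch` and `triggerTimes` are not part of the PR) and assumes a trigger interval of 1000 ms, which is not shown in the excerpt above. It also assumes the clock jumps straight to the wait target when a batch finishes early.

```scala
object TriggerTimelineSketch {
  // Same arithmetic as nextBatchTime in this PR's diff.
  def nextBatchTime(now: Long, intervalMs: Long): Long =
    if (intervalMs == 0) now else now / intervalMs * intervalMs + intervalMs

  // Returns the clock reading at which each batch starts. batchDurationsMs(i) is how far
  // batch i advances the clock (the role clockIncrementInTrigger plays in the test).
  def triggerTimes(startMs: Long, intervalMs: Long, batchDurationsMs: List[Long]): List[Long] = {
    var now = startMs
    batchDurationsMs.map { duration =>
      val started = now
      val target = nextBatchTime(started, intervalMs)
      // Wait only if the batch finished before the target; otherwise continue immediately.
      now = math.max(started + duration, target)
      started
    }
  }

  def main(args: Array[String]): Unit = {
    println(triggerTimes(2000L, 1000L, List(1500L, 0L))) // List(2000, 3500)
    println(triggerTimes(2000L, 1000L, List(500L, 0L)))  // List(2000, 3000)
  }
}
```

Under these assumptions, the batch that starts at 2000 and advances the clock by 1500 ends at 3500, past its 3000 target, so the next trigger fires right away at 3500.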

brkyvz · 2017-04-05T02:50:53Z

left one minor comment, otherwise LGTM

brkyvz · 2017-04-05T03:09:47Z

Thanks for the change. It is easier to understand when things are being triggered now. LGTM.

SparkQA · 2017-04-05T05:24:51Z

Test build #75527 has finished for PR 17525 at commit 4e1d898.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

…took longer than trigger interval

## What changes were proposed in this pull request?

With a large trigger interval (e.g. 10 minutes), if a batch takes 11 minutes, the executor waits another 9 minutes before starting the next batch. This does not make sense. The processing-time-based trigger policy should be to process batches as fast as possible, but no faster than one per trigger interval. If batches are already taking longer than the trigger interval, there is no point in waiting for the rest of the next interval.

In this PR, I modified the ProcessingTimeExecutor to do so. Another minor change was to extract our StreamManualClock into a separate class so that it can be used outside subclasses of StreamTest. For example, ProcessingTimeExecutorSuite does not need to create any context for testing; it just needs the StreamManualClock.

## How was this patch tested?

Added new unit tests to comprehensively test this behavior.

Author: Tathagata Das <[email protected]>

Closes apache#17525 from tdas/SPARK-20209.

(cherry picked from commit dad499f)

Removed delay from trigger executor

50f0195

brkyvz reviewed Apr 4, 2017

Updated docs

6ec51cf

Fix compilation

32b389e

Fix style

7614de5

tdas commented Apr 4, 2017

Fixed failing test

2182a33

brkyvz reviewed Apr 5, 2017

Improved test

4e1d898

asfgit closed this in dad499f Apr 5, 2017
