Skip to content

Conversation

@tdas
Copy link
Contributor

@tdas tdas commented Apr 4, 2017

What changes were proposed in this pull request?

For large trigger intervals (e.g. 10 minutes), if a batch takes 11 minutes, then it will wait for 9 mins before starting the next batch. This does not make sense. The processing time based trigger policy should be to do process batches as fast as possible, but no faster than 1 in every trigger interval. If batches are taking longer than trigger interval anyways, then no point waiting extra trigger interval.

In this PR, I modified the ProcessingTimeExecutor to do so. Another minor change I did was to extract our StreamManualClock into a separate class so that it can be used outside subclasses of StreamTest. For example, ProcessingTimeExecutorSuite does not need to create any context for testing, just needs the StreamManualClock.

How was this patch tested?

Added new unit tests to comprehensively test this behavior.

@SparkQA
Copy link

SparkQA commented Apr 4, 2017

Test build #75504 has started for PR 17525 at commit 50f0195.

*/
def nextBatchTime(now: Long): Long = {
now / intervalMs * intervalMs + intervalMs
if (intervalMs == 0) now else now / intervalMs * intervalMs + intervalMs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the doc seems wrong btw, mind fixing it? nextBatchTime(nextBatchTime(0)) = 100 or am I understanding it wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spoken offline, this isnt wrong.

}
}

def isStreamWaitingAt(time: Long): Boolean = synchronized {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mind adding docs on when these should be used?

@SparkQA
Copy link

SparkQA commented Apr 4, 2017

Test build #75514 has finished for PR 17525 at commit 6ec51cf.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 4, 2017

Test build #75520 has finished for PR 17525 at commit 32b389e.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 4, 2017

Test build #75521 has finished for PR 17525 at commit 7614de5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

testStream(mapped, OutputMode.Complete)(
StartStream(ProcessingTime(100), triggerClock = clock),
AssertStreamExecThreadToWaitForClock(),
StartStream(ProcessingTime(1000), triggerClock = clock),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test needed fixing because this manual clock test was configured such that first batch takes > 100 ms even though the trigger interval was 100 ms. This caused additional batch to be automatically executed without waiting for the manual clock to be advance, thus breaking certain assumptions in the test.

@SparkQA
Copy link

SparkQA commented Apr 5, 2017

Test build #75524 has finished for PR 17525 at commit 2182a33.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class AssertStreamExecThreadIsWaitingForTime(targetTime: Long)
  • case class AssertClockTime(time: Long)

clockIncrementInTrigger = 1500
manualClock.setTime(2000)
eventually {
assert(lastTriggerTime === 3500)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was hard to understand that the test is actually testing that this value is 3500 instead of 2000. Could you add a quick comment?

@brkyvz
Copy link
Contributor

brkyvz commented Apr 5, 2017

left one minor comment, otherwise LGTM

@brkyvz
Copy link
Contributor

brkyvz commented Apr 5, 2017

Thanks for the change. It is easier to understand when things are being triggered now. LGTM.

@SparkQA
Copy link

SparkQA commented Apr 5, 2017

Test build #75527 has finished for PR 17525 at commit 4e1d898.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in dad499f Apr 5, 2017
cenyuhai pushed a commit to cenyuhai/spark that referenced this pull request Oct 8, 2017
…took longer than trigger interval

## What changes were proposed in this pull request?

For large trigger intervals (e.g. 10 minutes), if a batch takes 11 minutes, then it will wait for 9 mins before starting the next batch. This does not make sense. The processing time based trigger policy should be to do process batches as fast as possible, but no faster than 1 in every trigger interval. If batches are taking longer than trigger interval anyways, then no point waiting extra trigger interval.

In this PR, I modified the ProcessingTimeExecutor to do so. Another minor change I did was to extract our StreamManualClock into a separate class so that it can be used outside subclasses of StreamTest. For example, ProcessingTimeExecutorSuite does not need to create any context for testing, just needs the StreamManualClock.

## How was this patch tested?
Added new unit tests to comprehensively test this behavior.

Author: Tathagata Das <[email protected]>

Closes apache#17525 from tdas/SPARK-20209.

(cherry picked from commit dad499f)
cenyuhai pushed a commit to cenyuhai/spark that referenced this pull request Oct 8, 2017
…took longer than trigger interval

## What changes were proposed in this pull request?

For large trigger intervals (e.g. 10 minutes), if a batch takes 11 minutes, then it will wait for 9 mins before starting the next batch. This does not make sense. The processing time based trigger policy should be to do process batches as fast as possible, but no faster than 1 in every trigger interval. If batches are taking longer than trigger interval anyways, then no point waiting extra trigger interval.

In this PR, I modified the ProcessingTimeExecutor to do so. Another minor change I did was to extract our StreamManualClock into a separate class so that it can be used outside subclasses of StreamTest. For example, ProcessingTimeExecutorSuite does not need to create any context for testing, just needs the StreamManualClock.

## How was this patch tested?
Added new unit tests to comprehensively test this behavior.

Author: Tathagata Das <[email protected]>

Closes apache#17525 from tdas/SPARK-20209.

(cherry picked from commit dad499f)
cenyuhai pushed a commit to cenyuhai/spark that referenced this pull request Oct 8, 2017
…took longer than trigger interval

## What changes were proposed in this pull request?

For large trigger intervals (e.g. 10 minutes), if a batch takes 11 minutes, then it will wait for 9 mins before starting the next batch. This does not make sense. The processing time based trigger policy should be to do process batches as fast as possible, but no faster than 1 in every trigger interval. If batches are taking longer than trigger interval anyways, then no point waiting extra trigger interval.

In this PR, I modified the ProcessingTimeExecutor to do so. Another minor change I did was to extract our StreamManualClock into a separate class so that it can be used outside subclasses of StreamTest. For example, ProcessingTimeExecutorSuite does not need to create any context for testing, just needs the StreamManualClock.

## How was this patch tested?
Added new unit tests to comprehensively test this behavior.

Author: Tathagata Das <[email protected]>

Closes apache#17525 from tdas/SPARK-20209.

(cherry picked from commit dad499f)
cenyuhai pushed a commit to cenyuhai/spark that referenced this pull request Oct 8, 2017
…took longer than trigger interval

## What changes were proposed in this pull request?

For large trigger intervals (e.g. 10 minutes), if a batch takes 11 minutes, then it will wait for 9 mins before starting the next batch. This does not make sense. The processing time based trigger policy should be to do process batches as fast as possible, but no faster than 1 in every trigger interval. If batches are taking longer than trigger interval anyways, then no point waiting extra trigger interval.

In this PR, I modified the ProcessingTimeExecutor to do so. Another minor change I did was to extract our StreamManualClock into a separate class so that it can be used outside subclasses of StreamTest. For example, ProcessingTimeExecutorSuite does not need to create any context for testing, just needs the StreamManualClock.

## How was this patch tested?
Added new unit tests to comprehensively test this behavior.

Author: Tathagata Das <[email protected]>

Closes apache#17525 from tdas/SPARK-20209.

(cherry picked from commit dad499f)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants