
Conversation

@jkbradley
Member

Recently, PySpark ML streaming tests have been flaky, most likely because batches are not being processed in time. Proposal: replace the use of _ssc_wait (which waits for a fixed amount of time) with a method that waits up to a fixed amount of time but can terminate early based on a termination condition method. With this, we can extend the waiting period (making tests less flaky) while also stopping early when possible (making tests faster on average, which I verified locally).

CC: @mengxr @tdas @freeman-lab
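
For illustration, here is a minimal sketch of such a helper, assuming a simple polling loop (wait_for and poll_interval are placeholder names, not necessarily what this patch uses):

import time

def wait_for(condition, timeout, poll_interval=0.01):
    # Poll until condition() returns True, giving up after `timeout` seconds.
    start = time.time()
    while time.time() - start < timeout:
        if condition():
            return True
        time.sleep(poll_interval)
    return condition()  # one final check at the deadline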

Contributor

There is still a slight possibility that between the last time term_check() is called inside _ssc_wait_checked and the next time it is called in this method, another batch may have been processed, which would fail the test unnecessarily. A better approach would be for _ssc_wait_checked to return True if term_check() has succeeded within the timeout and False otherwise. Then there is no need to check term_check() again.

Member Author

These tests should pass whenever all batches have been processed, so the current setup should be safe. I'm actually thinking of copying the checks so that assertions print more useful error messages. (I don't see a great way to avoid copying the checks if I want both early stopping and useful error messages.)

@SparkQA

SparkQA commented Aug 11, 2015

Test build #40354 has finished for PR 8087 at commit 3fb7c0c.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 11, 2015

Test build #40357 has finished for PR 8087 at commit ef49b2b.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Member Author

Jenkins test this please

@SparkQA

SparkQA commented Aug 11, 2015

Test build #40495 has finished for PR 8087 at commit ff1ee1b.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 11, 2015

Test build #40502 has finished for PR 8087 at commit afbe8b1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Member Author

Yay it passed! If this looks reasonable, I'll make similar changes for the other streaming ML pyspark tests.

@freeman-lab
Contributor

Nice! I think this is a solid strategy. Maybe in the next round of changes, make that 20.0 (which will presumably be used throughout) a variable shared by all the tests?

@tdas
Contributor

tdas commented Aug 11, 2015

I think you can make a generic equivalent of ScalaTest's eventually in Python. That takes care of failing on timeout and providing a meaningful last error message.

def eventually(timeout, condition, errorMessage):
    # condition: a function that must return a boolean
    # errorMessage: a string, or a function that returns a string; it is
    # invoked if there is a timeout

Then that solves the problem I alluded to earlier about a possible race condition.

@jkbradley
Member Author

@tdas Sure, I can do that. I don't think the race condition matters for ML tests (or if it does, then the test was written incorrectly), but that does clarify semantics. I guess I'll have to duplicate the check code no matter what to get nice error messages.

@jkbradley
Member Author

Actually, I'm going to switch the design to instead:

  • accept a single check method which will use assertions
  • catch AssertionErrors when deciding whether we can terminate
  • throw the last caught AssertionError upon timeout

That will allow us to (a) avoid copying the set of checks and (b) take advantage of the many assertion variants, including approximate equality.

AFAIK, the overhead in catching errors should be negligible compared to the time for the tests. (Correct me if I'm wrong here.)
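
A minimal sketch of this assertion-catching design, assuming a single check function (names here are placeholders, not necessarily the code that ends up in the patch):

import time

def eventually(check, timeout=20.0, poll_interval=0.01):
    # Retry `check` (which raises AssertionError on failure) until it passes
    # or the timeout expires; on timeout, re-raise the last assertion failure.
    start = time.time()
    last_error = None
    while time.time() - start < timeout:
        try:
            check()
            return  # all assertions passed; stop early
        except AssertionError as e:
            last_error = e
        time.sleep(poll_interval)
    if last_error is not None:
        raise last_error
    raise AssertionError("Timed out after %g sec before the check could run" % timeout)

This way the checks are written once, and the assertion message from the last failed attempt surfaces on timeout.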

@jkbradley jkbradley changed the title [WIP] [SPARK-9805] [MLLIB] [PYTHON] [STREAMING] Added _ssc_wait_checked for ml streaming pyspark tests [SPARK-9805] [MLLIB] [PYTHON] [STREAMING] Added _ssc_wait_checked for ml streaming pyspark tests Aug 12, 2015
@SparkQA

SparkQA commented Aug 12, 2015

Test build #40578 has finished for PR 8087 at commit 48f43c8.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Member Author

Jenkins test this please

@mengxr
Contributor

mengxr commented Aug 12, 2015

What if the condition requires at least one batch to work correctly? This is not the case for the streaming ML algorithms, but I'm not sure about other streaming unit tests.

@jkbradley
Member Author

Yeah, I should document that. I made sure condition() works for those cases (e.g., by checking the result array's length instead of its values, which might not exist yet).
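
For example, a defensively written condition might guard on the amount of output before inspecting values (results, expected_num_batches, and expected_final_value are hypothetical names for this sketch):

results = []  # appended to by the streaming job as each batch completes
expected_num_batches = 10
expected_final_value = 1.0

def condition():
    # Check the length first; later values may not exist until more
    # batches have been processed.
    if len(results) < expected_num_batches:
        return False
    return abs(results[-1] - expected_final_value) < 0.1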

@SparkQA

SparkQA commented Aug 12, 2015

Test build #40598 has finished for PR 8087 at commit 5e49327.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 12, 2015

Test build #1474 has finished for PR 8087 at commit 3717fc4.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Member Author

Working on improvements...

@jkbradley
Member Author

OK everyone, I think that should fix things...but we'll wait and see. I changed the logic of eventually to support the two types of tests: those with a simple condition to check that cannot stop early, and those that can stop early once all batches have been processed.
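
Roughly, the two usages look like this, written as they might appear inside a unittest.TestCase method (a sketch only; the flag name catch_assertions and the defaults are illustrative, and models, input_batches, and expected_weights are hypothetical test state):

# Type 1: a plain boolean condition, polled until it returns True or we time out.
def condition():
    return len(models) == len(input_batches)

_eventually(condition, timeout=30.0)

# Type 2: an assertion-based check; AssertionErrors are caught and retried,
# and the last one is re-raised if the timeout is reached.
def condition():
    self.assertAlmostEqual(model.weights[0], expected_weights[0], 1)
    return True

_eventually(condition, timeout=30.0, catch_assertions=True)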

@SparkQA

SparkQA commented Aug 12, 2015

Test build #40678 has finished for PR 8087 at commit 002e838.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 12, 2015

Test build #40688 has finished for PR 8087 at commit 2897833.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Contributor

mengxr commented Aug 13, 2015

LGTM. @tdas Do you want to make a final pass?

@jkbradley jkbradley changed the title [SPARK-9805] [MLLIB] [PYTHON] [STREAMING] Added _ssc_wait_checked for ml streaming pyspark tests [SPARK-9805] [MLLIB] [PYTHON] [STREAMING] Added _eventually for ml streaming pyspark tests Aug 13, 2015
@jkbradley
Member Author

Increasing the timeouts in the spirit of robustness...and testing again for fun.

@jkbradley
Member Author

But yeah @tdas I'll wait for your final OK

@SparkQA

SparkQA commented Aug 13, 2015

Test build #40816 has finished for PR 8087 at commit a4c3f1e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tdas
Contributor

tdas commented Aug 14, 2015

LGTM!

@jkbradley
Member Author

OK, I'll merge this with master and branch-1.5 then. Thanks for reviewing, everyone!

asfgit pushed a commit that referenced this pull request Aug 16, 2015
[SPARK-9805] [MLLIB] [PYTHON] [STREAMING] Added _eventually for ml streaming pyspark tests

Recently, PySpark ML streaming tests have been flaky, most likely because of the batches not being processed in time.  Proposal: Replace the use of _ssc_wait (which waits for a fixed amount of time) with a method which waits for a fixed amount of time but can terminate early based on a termination condition method.  With this, we can extend the waiting period (to make tests less flaky) but also stop early when possible (making tests faster on average, which I verified locally).

CC: mengxr tdas freeman-lab

Author: Joseph K. Bradley <[email protected]>

Closes #8087 from jkbradley/streaming-ml-tests.

(cherry picked from commit 1db7179)
Signed-off-by: Joseph K. Bradley <[email protected]>
@asfgit asfgit closed this in 1db7179 Aug 16, 2015
@jkbradley jkbradley deleted the streaming-ml-tests branch August 16, 2015 01:53
CodingCat pushed a commit to CodingCat/spark that referenced this pull request Aug 17, 2015
[SPARK-9805] [MLLIB] [PYTHON] [STREAMING] Added _eventually for ml streaming pyspark tests

Closes apache#8087 from jkbradley/streaming-ml-tests.