Commit 8503aa3

viirya authored and HyukjinKwon committed
[SPARK-26646][TEST][PYSPARK] Fix flaky test: pyspark.mllib.tests.test_streaming_algorithms StreamingLogisticRegressionWithSGDTests.test_training_and_prediction
## What changes were proposed in this pull request?

The test pyspark.mllib.tests.test_streaming_algorithms StreamingLogisticRegressionWithSGDTests.test_training_and_prediction is sometimes flaky.

```
======================================================================
FAIL: test_training_and_prediction (pyspark.mllib.tests.test_streaming_algorithms.StreamingLogisticRegressionWithSGDTests)
Test that the model improves on toy data with no. of batches
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 367, in test_training_and_prediction
    self._eventually(condition, timeout=60.0)
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 69, in _eventually
    lastValue = condition()
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 362, in condition
    self.assertGreater(errors[1] - errors[-1], 0.3)
AssertionError: -0.070000000000000062 not greater than 0.3

----------------------------------------------------------------------
Ran 13 tests in 198.327s

FAILED (failures=1, skipped=1)

Had test failures in pyspark.mllib.tests.test_streaming_algorithms with python3.4; see logs
```

The predict stream can be consumed to the end before the input stream. When that happens, the model improves less than expected and the test fails.

This patch increases the number of batches in the streams. This won't increase test time because the test already has a timeout.

## How was this patch tested?

Manually tested.

Closes #23586 from viirya/SPARK-26646.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
1 parent 34db5f5 commit 8503aa3

1 file changed, +1 -1 lines changed

python/pyspark/mllib/tests/test_streaming_algorithms.py

Lines changed: 1 addition & 1 deletion
@@ -334,7 +334,7 @@ def test_training_and_prediction(self):
         """Test that the model improves on toy data with no. of batches"""
         input_batches = [
             self.sc.parallelize(self.generateLogisticInput(0, 1.5, 100, 42 + i))
-            for i in range(20)]
+            for i in range(40)]
         predict_batches = [
             b.map(lambda lp: (lp.label, lp.features)) for b in input_batches]
