
Conversation

@WeichenXu123
Contributor

What changes were proposed in this pull request?

Update Breeze to 0.13.1 for an emergency bugfix in the strong Wolfe line search:
scalanlp/breeze#651

How was this patch tested?

N/A

@WeichenXu123 WeichenXu123 changed the title [SPARK-21523] update breeze to 0.13.1 for an emergency bugfix in strong wolfe line search [SPARK-21523][ML] update breeze to 0.13.1 for an emergency bugfix in strong wolfe line search Aug 1, 2017
@srowen
Member

srowen commented Aug 1, 2017

@sethah mentioned the potential impact of this, so yes, I agree we should get this in, and into 2.2.

@sethah
Contributor

sethah commented Aug 1, 2017

Can you change the title? Upgrade to 0.13.2.

@WeichenXu123 WeichenXu123 changed the title [SPARK-21523][ML] update breeze to 0.13.1 for an emergency bugfix in strong wolfe line search [SPARK-21523][ML] update breeze to 0.13.2 for an emergency bugfix in strong wolfe line search Aug 1, 2017
@SparkQA

SparkQA commented Aug 1, 2017

Test build #80125 has finished for PR 18797 at commit 00a1287.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Aug 2, 2017

Ah, looks like some legitimate failures related to the change we pulled in. It probably just needs some test adjustments.

@WeichenXu123
Contributor Author

Strangely, the code failed this require at
https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/StrongWolfe.scala#L73
in three cases:
org.apache.spark.ml.regression.AFTSurvivalRegressionSuite.should support all NumericType labels, and not support other types
org.apache.spark.ml.regression.AFTSurvivalRegressionSuite.should support all NumericType censors, and not support other types
org.apache.spark.mllib.optimization.LBFGSSuite.The convergence criteria should work as we expect

@srowen @sethah Do you know the reason? This require should be almost impossible to fail. Is there some bug in these three test cases?

@srowen
Member

srowen commented Aug 2, 2017

The actual failure in the first two cases looks like it must be related:

sbt.ForkMain$ForkError: java.lang.IllegalArgumentException: requirement failed: init value should <= bound
	at scala.Predef$.require(Predef.scala:224)
	at breeze.optimize.StrongWolfeLineSearch.minimizeWithBound(StrongWolfe.scala:73)
	at breeze.optimize.StrongWolfeLineSearch.minimize(StrongWolfe.scala:62)
	at breeze.optimize.LBFGS.determineStepSize(LBFGS.scala:76)
	at breeze.optimize.LBFGS.determineStepSize(LBFGS.scala:39)
	at breeze.optimize.FirstOrderMinimizer$$anonfun$infiniteIterations$1.apply(FirstOrderMinimizer.scala:64)
	at breeze.optimize.FirstOrderMinimizer$$anonfun$infiniteIterations$1.apply(FirstOrderMinimizer.scala:62)
	at scala.collection.Iterator$$anon$7.next(Iterator.scala:129)
	at breeze.util.IteratorImplicits$RichIterator$$anon$2.next(Implicits.scala:71)
	at org.apache.spark.ml.regression.AFTSurvivalRegression.fit(AFTSurvivalRegression.scala:263)
	at org.apache.spark.ml.regression.AFTSurvivalRegression.fit(AFTSurvivalRegression.scala:128)

The last one looks like it's just expecting a different convergence sequence. I don't know much about it, but it didn't seem too odd at first glance. If I have time later this week I'll try to rerun it locally and see if the test fixes look easy.

@WeichenXu123
Contributor Author

WeichenXu123 commented Aug 2, 2017

@srowen Yeah, the third case is a different problem (I think we can simply change the assert statement to assert(lossLBFGS3.length == 6) in the test case).
I am curious about the first two cases: why does the require fail? By default the bound is always Double.PositiveInfinity, so the require should always pass.
I am busy at the moment and haven't looked into it more deeply, so if you can suggest a possible cause it would save some time.

@srowen
Member

srowen commented Aug 3, 2017

The only value that is not <= Double.PositiveInfinity is Double.NaN, because NaN has no ordering at all with respect to anything. So init must be NaN somehow.

It's called from LBFGS in Breeze, where the value is if (state.iter == 0.0) 1.0 / norm(dir) else 1.0. That can only be NaN if norm(dir) is NaN, which can only happen if the dir vector has a NaN element, and so on up the chain; but I'm not seeing how the arguments from the Spark code cause this. The initial params are all 0.

It might still be a Breeze issue that's only now uncovered, but I haven't proven that yet.
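
A quick standalone illustration of that ordering behavior (a sketch, not code from Breeze or Spark):

object NaNOrderingSketch extends App {
  // NaN fails every ordered comparison, so a NaN init trips
  // require(init <= bound) even when bound is +Infinity.
  val bound = Double.PositiveInfinity
  println(1e300 <= bound)        // true: any ordinary double is <= +Infinity
  println(Double.NaN <= bound)   // false: NaN compares false against everything
  // require(Double.NaN <= bound, "init value should <= bound")  // would throw IllegalArgumentException
}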

@srowen
Member

srowen commented Aug 3, 2017

I've figured out the problem, and I'm pretty sure it's a problem in the AFT test that was hidden until now. It runs AFTSurvivalRegression on this input:

+--------+-----+------+------+
|features|label|censor|weight|
+--------+-----+------+------+
|   [0.0]|  0.0|   0.0|   1.0|
|   [1.0]|  1.0|   0.0|   1.0|
|   [2.0]|  2.0|   0.0|   1.0|
|   [3.0]|  3.0|   0.0|   1.0|
|   [4.0]|  4.0|   0.0|   0.0|
+--------+-----+------+------+

The problem is that one label is 0, but the label is interpreted as a time to failure (I believe?). Somewhere the code takes the log of this value, gets -Infinity, quickly produces NaN in an expression, and eventually causes the error above.
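
A standalone sketch of that arithmetic (not the actual AFT code path, just the IEEE-754 behavior behind it):

object LogZeroSketch extends App {
  val logLabel = math.log(0.0)   // -Infinity once a zero label is logged
  println(logLabel)              // -Infinity
  println(logLabel - logLabel)   // NaN: Infinity - Infinity is undefined
  println(logLabel * 0.0)        // NaN: Infinity * 0 is undefined, so NaN spreads into later expressions
}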

I think we can just modify the test, but I wanted to check whether that makes sense to @yanboliang @zhengruifeng @BenFradet, who have touched the AFT code.

@WeichenXu123
Contributor Author

Thanks! Waiting for the AFT test code authors to figure out how to modify the test case.

@BenFradet
Contributor

@srowen there shouldn't be any issue with removing the first row of the test data afaict.

@srowen
Member

srowen commented Aug 3, 2017

Yeah, the only issue is that the test set is generated and used in several tests. Maybe we can just see if changing it works for all callers.

@srowen
Member

srowen commented Aug 4, 2017

aft.txt

@WeichenXu123 it does appear that the tests pass with the following tiny changes -- I can make a proper PR on your branch if needed, but I'm lazy and just attached the diff since it's 3 lines.

@WeichenXu123
Contributor Author

@srowen Great, thanks!

@srowen
Member

srowen commented Aug 4, 2017

@WeichenXu123 there is one more change you'll need, in AFTSurvivalRegressionSuite.scala, to also remove a datum with label 0.

Member

@srowen srowen left a comment

Pending tests, looks OK. This change passed MLlib tests for me.

@SparkQA

SparkQA commented Aug 4, 2017

Test build #80257 has finished for PR 18797 at commit af82dfc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 4, 2017

Test build #80258 has finished for PR 18797 at commit be57175.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 4, 2017

Test build #80259 has finished for PR 18797 at commit 75c6dd2.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Aug 5, 2017

@WeichenXu123 looks like we need one more tiny tweak for the Python equivalents. There, I think it's easiest to convert the 0 to a tiny value:

diff --git a/python/pyspark/ml/regression.py b/python/pyspark/ml/regression.py
index f0ff7a5f59..37fd48c289 100644
--- a/python/pyspark/ml/regression.py
+++ b/python/pyspark/ml/regression.py
@@ -1118,7 +1118,7 @@ class AFTSurvivalRegression(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredi
     >>> from pyspark.ml.linalg import Vectors
     >>> df = spark.createDataFrame([
     ...     (1.0, Vectors.dense(1.0), 1.0),
-    ...     (0.0, Vectors.sparse(1, [], []), 0.0)], ["label", "features", "censor"])
+    ...     (1e-40, Vectors.sparse(1, [], []), 0.0)], ["label", "features", "censor"])
     >>> aftsr = AFTSurvivalRegression()
     >>> model = aftsr.fit(df)
     >>> model.predict(Vectors.dense(6.3))
@@ -1126,12 +1126,12 @@ class AFTSurvivalRegression(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredi
     >>> model.predictQuantiles(Vectors.dense(6.3))
     DenseVector([0.0101, 0.0513, 0.1054, 0.2877, 0.6931, 1.3863, 2.3026, 2.9957, 4.6052])
     >>> model.transform(df).show()
-    +-----+---------+------+----------+
-    |label| features|censor|prediction|
-    +-----+---------+------+----------+
-    |  1.0|    [1.0]|   1.0|       1.0|
-    |  0.0|(1,[],[])|   0.0|       1.0|
-    +-----+---------+------+----------+
+    +-------+---------+------+----------+
+    |  label| features|censor|prediction|
+    +-------+---------+------+----------+
+    |    1.0|    [1.0]|   1.0|       1.0|
+    |1.0E-40|(1,[],[])|   0.0|       1.0|
+    +-------+---------+------+----------+
     ...
     >>> aftsr_path = temp_path + "/aftsr"
     >>> aftsr.save(aftsr_path)

@yanboliang
Contributor

@srowen @WeichenXu123
It makes sense to remove the datum with label 0, as we compute log(label), which may lead to -Infinity and eventually cause the error. Thanks for catching this.
BTW, what do you think about adding a check when fitting the AFT survival regression model?

def add(data: AFTPoint): this.type = {
  val xi = data.features
  val ti = data.label
  val delta = data.censor

  require(ti > 0.0, "The lifetime or label should be greater than 0.")
  // ...
}

Contributor

If it's not theoretically guaranteed, why do we need to keep this test? I remember we have changed this multiple times when we did Breeze upgrades. What do you think about just removing this line? We never check the number of iterations in other test suites. @srowen @WeichenXu123

Member

OK by me. You could also make it a range. Or something really basic like "> 0".
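
For illustration, the kind of looser check being suggested (lossLBFGS3 stands in for the loss history computed in LBFGSSuite; the bounds below are made up, not taken from the real test):

// Illustrative only: exact iteration counts are not guaranteed by theory,
// so a loose property is more robust across Breeze upgrades.
val lossLBFGS3: Array[Double] = Array.fill(6)(0.0)          // placeholder for the real loss history
assert(lossLBFGS3.length > 0)                               // "something really basic"
assert(lossLBFGS3.length >= 4 && lossLBFGS3.length <= 10)   // or a range (bounds illustrative)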

@srowen
Member

srowen commented Aug 5, 2017

Yes, it seems like that should be checked somewhere. It might be reasonable to include it here as a double check that the newly uncovered issue, which needed fixing to upgrade Breeze, is gone. But it could happen in another change too.

@SparkQA

SparkQA commented Aug 8, 2017

Test build #80369 has finished for PR 18797 at commit fbf1677.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 8, 2017

Test build #80381 has finished for PR 18797 at commit 5063758.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 8, 2017

Test build #3884 has finished for PR 18797 at commit 5063758.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Aug 9, 2017
…strong wolfe line search

## What changes were proposed in this pull request?

Update breeze to 0.13.1 for an emergency bugfix in strong wolfe line search
scalanlp/breeze#651

## How was this patch tested?

N/A

Author: WeichenXu <[email protected]>

Closes #18797 from WeichenXu123/update-breeze.

(cherry picked from commit b35660d)
Signed-off-by: Yanbo Liang <[email protected]>
@yanboliang
Contributor

Merged into master and branch-2.2. Thanks, all.

@asfgit asfgit closed this in b35660d Aug 9, 2017
@WeichenXu123 WeichenXu123 deleted the update-breeze branch January 31, 2018 18:37
MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
vatsalmevada pushed a commit to TIBCOSoftware/snappy-spark that referenced this pull request Apr 15, 2019