
Conversation

@WeichenXu123
Contributor

What changes were proposed in this pull request?

Update Breeze to 0.13.1 for an emergency bugfix in the strong Wolfe line search:
scalanlp/breeze#651

How was this patch tested?

N/A

@WeichenXu123 WeichenXu123 changed the title [SPARK-21523] update breeze to 0.13.1 for an emergency bugfix in strong wolfe line search [SPARK-21523][ML] update breeze to 0.13.1 for an emergency bugfix in strong wolfe line search Aug 1, 2017
@srowen
Member

srowen commented Aug 1, 2017

@sethah mentioned the potential impact of this, so yes, I agree we should get this in, and into 2.2.

@sethah
Contributor

sethah commented Aug 1, 2017

Can you change the title? Upgrade to 0.13.2.

@WeichenXu123 WeichenXu123 changed the title [SPARK-21523][ML] update breeze to 0.13.1 for an emergency bugfix in strong wolfe line search [SPARK-21523][ML] update breeze to 0.13.2 for an emergency bugfix in strong wolfe line search Aug 1, 2017
@SparkQA

SparkQA commented Aug 1, 2017

Test build #80125 has finished for PR 18797 at commit 00a1287.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Aug 2, 2017

Ah, looks like some legitimate failures related to the change we pulled in. It probably just needs some test adjustments.

@WeichenXu123
Contributor Author

Strangely, the code failed this require at
https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/StrongWolfe.scala#L73
in three cases:
org.apache.spark.ml.regression.AFTSurvivalRegressionSuite.should support all NumericType labels, and not support other types
org.apache.spark.ml.regression.AFTSurvivalRegressionSuite.should support all NumericType censors, and not support other types
org.apache.spark.mllib.optimization.LBFGSSuite.The convergence criteria should work as we expect

@srowen @sethah Do you know the reason? This require should be almost impossible to fail. Is there some bug in these three test cases?

@srowen
Member

srowen commented Aug 2, 2017

The actual failure in the first two cases looks like it must be related:

sbt.ForkMain$ForkError: java.lang.IllegalArgumentException: requirement failed: init value should <= bound
	at scala.Predef$.require(Predef.scala:224)
	at breeze.optimize.StrongWolfeLineSearch.minimizeWithBound(StrongWolfe.scala:73)
	at breeze.optimize.StrongWolfeLineSearch.minimize(StrongWolfe.scala:62)
	at breeze.optimize.LBFGS.determineStepSize(LBFGS.scala:76)
	at breeze.optimize.LBFGS.determineStepSize(LBFGS.scala:39)
	at breeze.optimize.FirstOrderMinimizer$$anonfun$infiniteIterations$1.apply(FirstOrderMinimizer.scala:64)
	at breeze.optimize.FirstOrderMinimizer$$anonfun$infiniteIterations$1.apply(FirstOrderMinimizer.scala:62)
	at scala.collection.Iterator$$anon$7.next(Iterator.scala:129)
	at breeze.util.IteratorImplicits$RichIterator$$anon$2.next(Implicits.scala:71)
	at org.apache.spark.ml.regression.AFTSurvivalRegression.fit(AFTSurvivalRegression.scala:263)
	at org.apache.spark.ml.regression.AFTSurvivalRegression.fit(AFTSurvivalRegression.scala:128)

The last one looks like it's just expecting a different convergence sequence. I don't know much about it, but it didn't seem too odd at first glance. If I have time later this week I'll try to rerun it locally and see if the test fixes look easy.

@WeichenXu123
Contributor Author

WeichenXu123 commented Aug 2, 2017

@srowen Yeah, the third case is a different problem (I think we can simply change the assert statement to assert(lossLBFGS3.length == 6) in the test case).
I am curious about the first two cases: why does the require fail? By default the bound is always Double.PositiveInfinity, so the require should always pass.
I am busy at the moment and haven't looked into it more deeply, so if you can suggest a possible cause it would save some time.

@srowen
Member

srowen commented Aug 3, 2017

The only value that is not <= Double.PositiveInfinity is Double.NaN, because NaN has no ordering at all with respect to anything. So init must be NaN somehow.

It's called from LBFGS in Breeze, where the value is if (state.iter == 0.0) 1.0 / norm(dir) else 1.0. That can only be NaN if norm(dir) is NaN, which can only happen if the dir vector has a NaN element, and so on up the chain; but I'm not seeing how the arguments from the Spark code cause this. The initial params are all 0.

It might still be a Breeze issue that's only now uncovered, but I haven't proven that yet.
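
A quick standalone illustration of that ordering behavior (a sketch, not code from Breeze or Spark):

object NaNOrderingSketch extends App {
  // NaN fails every ordered comparison, so a NaN init trips
  // require(init <= bound) even when bound is +Infinity.
  val bound = Double.PositiveInfinity
  println(1e300 <= bound)        // true: any ordinary double is <= +Infinity
  println(Double.NaN <= bound)   // false: NaN compares false against everything
  // require(Double.NaN <= bound, "init value should <= bound")  // would throw IllegalArgumentException
}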

@srowen
Member

srowen commented Aug 3, 2017

I've figured out the problem, and I'm pretty sure it's a problem in the AFT test that was hidden until now. It runs AFTSurvivalRegression on this input:

+--------+-----+------+------+
|features|label|censor|weight|
+--------+-----+------+------+
|   [0.0]|  0.0|   0.0|   1.0|
|   [1.0]|  1.0|   0.0|   1.0|
|   [2.0]|  2.0|   0.0|   1.0|
|   [3.0]|  3.0|   0.0|   1.0|
|   [4.0]|  4.0|   0.0|   0.0|
+--------+-----+------+------+

The problem is that one label is 0, but the label is interpreted as a time to failure (I believe?). Somewhere the code takes the log of this value, gets -Infinity, quickly produces NaN in an expression, and eventually causes the error above.
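
A standalone sketch of that arithmetic (not the actual AFT code path, just the IEEE-754 behavior behind it):

object LogZeroSketch extends App {
  val logLabel = math.log(0.0)   // -Infinity once a zero label is logged
  println(logLabel)              // -Infinity
  println(logLabel - logLabel)   // NaN: Infinity - Infinity is undefined
  println(logLabel * 0.0)        // NaN: Infinity * 0 is undefined, so NaN spreads into later expressions
}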

I think we can just modify the test, but I wanted to check whether that makes sense to @yanboliang @zhengruifeng @BenFradet, who have touched the AFT code.

@WeichenXu123
Contributor Author

Thanks! Waiting for the AFT test code authors to figure out how to modify the test case.

@BenFradet
Contributor

@srowen there shouldn't be any issue with removing the first row of the test data afaict.

@srowen
Member

srowen commented Aug 3, 2017

Yeah, the only issue is that the test set is generated and used in several tests. Maybe we can just see if changing it works for all callers.

@srowen
Member

srowen commented Aug 4, 2017

aft.txt

@WeichenXu123 it does appear that the tests pass with the following tiny changes -- I can make a proper PR on your branch if needed, but I'm lazy and just attached the diff since it's 3 lines.

@WeichenXu123
Contributor Author

@srowen Great, thanks!

@srowen
Member

srowen commented Aug 4, 2017

@WeichenXu123 there is one more change you'll need, in AFTSurvivalRegressionSuite.scala, to also remove a datum with label 0.

Member

@srowen srowen left a comment

Pending tests, looks OK. This change passed MLlib tests for me.

@SparkQA

SparkQA commented Aug 4, 2017

Test build #80257 has finished for PR 18797 at commit af82dfc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 4, 2017

Test build #80258 has finished for PR 18797 at commit be57175.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 4, 2017

Test build #80259 has finished for PR 18797 at commit 75c6dd2.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Aug 5, 2017

@WeichenXu123 looks like we need one more tiny tweak for the Python equivalents. There, I think it's easiest to convert the 0 to a tiny value:

diff --git a/python/pyspark/ml/regression.py b/python/pyspark/ml/regression.py
index f0ff7a5f59..37fd48c289 100644
--- a/python/pyspark/ml/regression.py
+++ b/python/pyspark/ml/regression.py
@@ -1118,7 +1118,7 @@ class AFTSurvivalRegression(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredi
     >>> from pyspark.ml.linalg import Vectors
     >>> df = spark.createDataFrame([
     ...     (1.0, Vectors.dense(1.0), 1.0),
-    ...     (0.0, Vectors.sparse(1, [], []), 0.0)], ["label", "features", "censor"])
+    ...     (1e-40, Vectors.sparse(1, [], []), 0.0)], ["label", "features", "censor"])
     >>> aftsr = AFTSurvivalRegression()
     >>> model = aftsr.fit(df)
     >>> model.predict(Vectors.dense(6.3))
@@ -1126,12 +1126,12 @@ class AFTSurvivalRegression(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredi
     >>> model.predictQuantiles(Vectors.dense(6.3))
     DenseVector([0.0101, 0.0513, 0.1054, 0.2877, 0.6931, 1.3863, 2.3026, 2.9957, 4.6052])
     >>> model.transform(df).show()
-    +-----+---------+------+----------+
-    |label| features|censor|prediction|
-    +-----+---------+------+----------+
-    |  1.0|    [1.0]|   1.0|       1.0|
-    |  0.0|(1,[],[])|   0.0|       1.0|
-    +-----+---------+------+----------+
+    +-------+---------+------+----------+
+    |  label| features|censor|prediction|
+    +-------+---------+------+----------+
+    |    1.0|    [1.0]|   1.0|       1.0|
+    |1.0E-40|(1,[],[])|   0.0|       1.0|
+    +-------+---------+------+----------+
     ...
     >>> aftsr_path = temp_path + "/aftsr"
     >>> aftsr.save(aftsr_path)

@yanboliang
Contributor

@srowen @WeichenXu123
It makes sense to remove the datum with label 0, as we compute log(label), which may lead to -Infinity and eventually cause the error. Thanks for catching this.
BTW, what do you think about adding a check when fitting the AFT survival regression model?

def add(data: AFTPoint): this.type = {
  val xi = data.features
  val ti = data.label
  val delta = data.censor

  require(ti > 0.0, "The lifetime or label should be greater than 0.")
  // ...
}

Contributor

If it's not theoretically guaranteed, why do we need to keep this test? I remember we have changed this multiple times when we did Breeze upgrades. What do you think about just removing this line? We never check the number of iterations in other test suites. @srowen @WeichenXu123

Member

OK by me. You could also make it a range. Or something really basic like "> 0".
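
For illustration, the kind of looser check being suggested (lossLBFGS3 stands in for the loss history computed in LBFGSSuite; the bounds below are made up, not taken from the real test):

// Illustrative only: exact iteration counts are not guaranteed by theory,
// so a loose property is more robust across Breeze upgrades.
val lossLBFGS3: Array[Double] = Array.fill(6)(0.0)          // placeholder for the real loss history
assert(lossLBFGS3.length > 0)                               // "something really basic"
assert(lossLBFGS3.length >= 4 && lossLBFGS3.length <= 10)   // or a range (bounds illustrative)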

@srowen
Member

srowen commented Aug 5, 2017

Yes, it seems like that should be checked somewhere. It might be reasonable to include it here as a double check that the newly uncovered issue, which needed fixing to upgrade Breeze, is gone. But it could happen in another change too.

@SparkQA

SparkQA commented Aug 8, 2017

Test build #80369 has finished for PR 18797 at commit fbf1677.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 8, 2017

Test build #80381 has finished for PR 18797 at commit 5063758.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 8, 2017

Test build #3884 has finished for PR 18797 at commit 5063758.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Aug 9, 2017
…strong wolfe line search

## What changes were proposed in this pull request?

Update breeze to 0.13.1 for an emergency bugfix in strong wolfe line search
scalanlp/breeze#651

## How was this patch tested?

N/A

Author: WeichenXu <[email protected]>

Closes #18797 from WeichenXu123/update-breeze.

(cherry picked from commit b35660d)
Signed-off-by: Yanbo Liang <[email protected]>
@yanboliang
Contributor

Merged into master and branch-2.2. Thanks, all.

@asfgit asfgit closed this in b35660d Aug 9, 2017
@WeichenXu123 WeichenXu123 deleted the update-breeze branch January 31, 2018 18:37
MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
vatsalmevada pushed a commit to TIBCOSoftware/snappy-spark that referenced this pull request Apr 15, 2019