Commit 59a49db

DB Tsai authored and mengxr committed
[SPARK-4887][MLlib] Fix a bad unittest in LogisticRegressionSuite
The original test is not meaningful: stepping through it shows that lossSum is already NaN and the coefficients are diverging, because the step size is too large for SGD to converge. The correct behavior is that the coefficients should be smaller than those obtained without regularization. Comparing the values with a relative error of 20000.0 does not make sense either.

Author: DB Tsai <[email protected]>

Closes #3735 from dbtsai/mlortestfix and squashes the following commits:

b1a3c42 [DB Tsai] first commit
1 parent 3720057 commit 59a49db
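
The two claims in the commit message (a step size that is too large makes the coefficients blow up, and regularization should shrink them) can be illustrated with a minimal, self-contained sketch. This is not the MLlib optimizer: `StepSizeSketch`, `fit`, and the toy `data` are hypothetical names and values chosen for illustration, the model has one feature and no intercept, the gradient is computed on the full batch, and the `stepSize / sqrt(iteration)` decay is an assumption of the sketch.

```scala
object StepSizeSketch {
  // Logistic sigmoid.
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Toy (feature, label) pairs; not the data used by LogisticRegressionSuite.
  val data: Seq[(Double, Double)] = Seq(
    (2.0, 1.0), (1.0, 1.0), (-1.0, 0.0), (-2.0, 0.0), (0.5, 0.0), (-0.5, 1.0))

  // Full-batch gradient descent on a one-feature logistic model with an L2 penalty.
  // Assumption: the step size decays as stepSize / sqrt(iteration).
  def fit(stepSize: Double, regParam: Double, numIterations: Int): Double = {
    var w = 0.0
    for (i <- 1 to numIterations) {
      // Mean gradient of the logistic loss with respect to w.
      val grad = data.map { case (x, y) => (sigmoid(w * x) - y) * x }.sum / data.size
      val eta = stepSize / math.sqrt(i.toDouble)
      // The L2 term pulls w toward zero on every step.
      w -= eta * (grad + regParam * w)
    }
    w
  }

  def main(args: Array[String]): Unit = {
    val noReg = fit(stepSize = 0.5, regParam = 0.0, numIterations = 10)
    val withReg = fit(stepSize = 0.5, regParam = 1.0, numIterations = 10)
    val tooLarge = fit(stepSize = 10.0, regParam = 1.0, numIterations = 10)
    println(s"no regularization:   $noReg")
    println(s"with regularization: $withReg")
    println(s"step size 10.0:      $tooLarge")
  }
}
```

With the small step size the regularized weight ends up closer to zero than the unregularized one. With a step size of 10.0 each update overshoots, so the weight flips sign and grows in magnitude over the 10 iterations instead of settling.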

1 file changed: +4 -3 lines changed

mllib/src/test/scala/org/apache/spark/mllib/classification/LogisticRegressionSuite.scala

Lines changed: 4 additions & 3 deletions
@@ -178,15 +178,16 @@ class LogisticRegressionSuite extends FunSuite with MLlibTestSparkContext with M
     // Use half as many iterations as the previous test.
     val lr = new LogisticRegressionWithSGD().setIntercept(true)
     lr.optimizer.
-      setStepSize(10.0).
+      setStepSize(1.0).
       setNumIterations(10).
       setRegParam(1.0)

     val model = lr.run(testRDD, initialWeights)

     // Test the weights
-    assert(model.weights(0) ~== -430000.0 relTol 20000.0)
-    assert(model.intercept ~== 370000.0 relTol 20000.0)
+    // With regularization, the resulting weights will be smaller.
+    assert(model.weights(0) ~== -0.14 relTol 0.02)
+    assert(model.intercept ~== 0.25 relTol 0.02)

     val validationData = LogisticRegressionSuite.generateLogisticInput(A, B, nPoints, 17)
     val validationRDD = sc.parallelize(validationData, 2)
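
To see why a relative tolerance of 20000.0 checks almost nothing while 0.02 is a real constraint, here is a small sketch. The helper `relClose` is hypothetical and only assumes that "relative tolerance" means the absolute difference is bounded by the tolerance times the smaller magnitude; it is not the `~==` / `relTol` implementation used by the test suite.

```scala
object RelTolSketch {
  // Hypothetical helper: true when x and y differ by at most
  // relTol times the smaller of their magnitudes.
  def relClose(x: Double, y: Double, relTol: Double): Boolean =
    math.abs(x - y) <= relTol * math.min(math.abs(x), math.abs(y))

  def main(args: Array[String]): Unit = {
    // With a tolerance of 20000.0, a value of the opposite sign
    // or thousands of times smaller still counts as "close".
    println(relClose(430000.0, -430000.0, 20000.0))
    println(relClose(-100.0, -430000.0, 20000.0))
    // With a tolerance of 0.02, only values within about 2% pass.
    println(relClose(-0.141, -0.14, 0.02))
    println(relClose(-0.145, -0.14, 0.02))
  }
}
```

Under this assumed definition, the old assertions would accept nearly any diverged value of the same order of magnitude, which is why the commit replaces them with a 2% tolerance around the converged values.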
