[SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hinge loss for LinearSVC #17862
Conversation
ping @jkbradley Sorry, I know this is the last minute for 2.2, but the change may be important for the user experience. If we're not comfortable making an API change right now, how about we just change the default optimizer to LBFGS before 2.2 is released? Comments from other reviewers are also appreciated.

Test build #76463 has finished for PR 17862 at commit
* @group setParam
*/
@Since("2.2.0")
def setOptimizer(value: String): this.type = set(optimizer, value.toLowerCase(Locale.ROOT))
Maybe just call it solver, since LinearRegression also has a similar param.
👍 - can use HasSolver trait
For other algorithms such as LinearRegression and LogisticRegression, the solver is l-bfgs whether training uses LBFGS (L2 reg) or OWLQN (L1/elastic-net reg), so I'm concerned it may confuse users if we distinguish them in LinearSVC. Or we should clarify it in the documentation. Thanks.
def regParamL1Fun = (index: Int) => 0D
val optimizer = new BreezeOWLQN[Int, BDV[Double]]($(maxIter), 10, regParamL1Fun, $(tol))
val optimizerAlgo = $(optimizer) match {
Could you refresh me on why OWLQN was used originally? AFAIK OWLQN is for L1 regularized loss functions, while L-BFGS only supports L2?
@MLnick Hinge loss is a non-smooth and non-differentiable loss, so the convex optimization problem is a non-smooth one; that is why we use OWL-QN rather than L-BFGS as the underlying optimizer.
@hhbyyh AFAIK, in cases where L-BFGS does not encounter any non-smooth point, it often converges to the optimum faster, but there is a chance of failure at a non-smooth point.
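As a minimal illustration of the smoothness difference being discussed (plain Scala, not the PR's aggregator code; the object and method names are made up for this sketch), here are the two losses and their (sub)gradients with respect to the margin m = y * (w dot x):

```scala
object LossSketch {
  // Standard hinge loss: max(0, 1 - m). It has a kink at m = 1, so it is not
  // differentiable there, which is why plain L-BFGS can fail on it.
  def hinge(margin: Double): Double = math.max(0.0, 1.0 - margin)

  // A subgradient of the hinge loss w.r.t. the margin: any value in [-1, 0]
  // is valid at margin == 1; this sketch picks 0.
  def hingeSubgradient(margin: Double): Double =
    if (margin < 1.0) -1.0 else 0.0

  // Squared hinge loss: max(0, 1 - m)^2, differentiable everywhere.
  def squaredHinge(margin: Double): Double = {
    val h = math.max(0.0, 1.0 - margin)
    h * h
  }

  // Its gradient is continuous (both one-sided limits are 0 at m = 1),
  // so L-BFGS applies cleanly.
  def squaredHingeGradient(margin: Double): Double =
    -2.0 * math.max(0.0, 1.0 - margin)
}
```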
Thanks for providing the info @yanboliang. I'm thinking we should provide both LBFGS and OWLQN as options for L1-SVM (standard hinge). On the several datasets I tested, LBFGS converges as well as (or even better than) OWLQN in a fraction of the time. Yet due to the potential failure, we may not want to use LBFGS as the default optimizer (solver).
A better solution, as I see it, is to use L2-SVM (squared hinge loss) with LBFGS (combining this with #17645), but the change may be too large to be included in 2.2 now.
I saw that other libraries like sklearn.svm.LinearSVC use squared hinge loss as the default loss function, and it's differentiable. What about merging #17645 first, and then we can use LBFGS as the default solver as well? Otherwise, we would be making a behavior change if we made this switch after the 2.2 release. cc @jkbradley
Sounds good to me. But we'll also need to update the Python and R APIs. I don't know the release date of 2.2.
Let's wait for comments from other reviewers; if we reach a consensus, I'd like to help review and merge this to catch 2.2. cc @jkbradley @MLnick
Personally I would be in favor of making squared hinge the default with L-BFGS. But we would need to make it clear to users that if they select the pure hinge loss, there is a risk involved in using L-BFGS rather than OWL-QN.
Update: switched to the HasSolver trait and use OWLQN as the default optimizer.
Test build #76508 has finished for PR 17862 at commit
@hhbyyh can we smooth the hinge loss using soft-max (a variant of ReLU) and then use LBFGS?
@debasish83 There are several approaches to smoothing the hinge loss: https://en.wikipedia.org/wiki/Hinge_loss. For the one you're proposing, do you know if it's used in other SVM implementations? Please provide a reference or evaluation data if possible. Thanks.
Test build #76637 has started for PR 17862 at commit
val trainer1 = new LinearSVC()
test("linearSVC OWLQN comparison with R e1071 and scikit-learn") {
  val trainer1 = new LinearSVC().setSolver(LinearSVC.OWLQN)
    .setRegParam(0.00002) // set regParam = 2.0 / datasize / c
Can I ask why we set regParam = 2.0 / datasize / c to match the solution of sklearn.svm.LinearSVC? AFAIK, sklearn calls liblinear to train the linear SVM classification model. I can understand that liblinear uses C as the penalty parameter on the error term, which is different from regParam, and that it doesn't multiply the error term by 1/n, but where does the 2.0 come from? Thanks.
http://www.robots.ox.ac.uk/~az/lectures/ml/lect2.pdf
Please refer to page 36.
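For reference, a sketch of the correspondence being assumed here (it presumes the slides' convention of writing the SVM objective without a 1/2 factor on the regularizer, and Spark's objective of the form $\frac{\lambda}{2}\|w\|^2 + \frac{1}{n}\sum_i \max(0,\, 1 - y_i w^\top x_i)$ with $\lambda$ = regParam):

$$\min_w\; \|w\|^2 + C\sum_{i=1}^{n} \max(0,\, 1 - y_i w^\top x_i) \;\equiv\; \min_w\; \frac{1}{Cn}\|w\|^2 + \frac{1}{n}\sum_{i=1}^{n} \max(0,\, 1 - y_i w^\top x_i)$$

after dividing the objective by $Cn$; matching $\frac{\lambda}{2} = \frac{1}{Cn}$ then gives $\lambda = \frac{2}{Cn}$, i.e. regParam = 2.0 / datasize / c.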
These slides also explain it... please see slide 32. The max can be replaced by a soft-max whose softness parameter lambda can be tuned; log-sum-exp is a standard soft-max, which is similar to the ReLU function, and we can re-use it from MLP:
ftp://ftp.cs.wisc.edu/math-prog/talks/informs99ssv.ps
ftp://ftp.cs.wisc.edu/pub/dmi/tech-reports/99-03.pdf
I can add the formulation if there is interest... it needs some tuning of the soft-max parameter, but convergence will be good with LBFGS (OWLQN is not needed).
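To make the suggestion concrete, here is a rough sketch of a log-sum-exp smoothed hinge loss (an assumed formulation based on the soft-max idea above, not code from this PR; the names and the default lambda are hypothetical). It is differentiable everywhere and approaches the true hinge loss as lambda grows:

```scala
object SmoothedHingeSketch {
  // Replace max(0, z) with the soft-max (1/lambda) * log(1 + exp(lambda * z)),
  // written in a numerically stable form, where z = 1 - margin.
  def smoothedHinge(margin: Double, lambda: Double = 10.0): Double = {
    val z = 1.0 - margin
    if (lambda * z > 0.0) z + math.log1p(math.exp(-lambda * z)) / lambda
    else math.log1p(math.exp(lambda * z)) / lambda
  }

  // Gradient w.r.t. the margin: a smooth sigmoid transition instead of the
  // hinge loss's step at margin = 1, so L-BFGS can be used directly.
  def smoothedHingeGradient(margin: Double, lambda: Double = 10.0): Double =
    -1.0 / (1.0 + math.exp(-lambda * (1.0 - margin)))
}
```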
@debasish83 Not sure if it's a better solution than squared hinge loss, but I would be interested to see the performance (accuracy and speed). Can you please help evaluate it?
The hinge loss is not differentiable... how are you smoothing it before using a quasi-Newton solver? Since the papers smooth the max, a Newton/quasi-Newton solver should work well; if you keep the non-differentiable loss, it would be better to use a sub-gradient solver as suggested by the talk... I will evaluate the formulation...
@hhbyyh I saw some posts saying the hinge loss is not differentiable but the squared hinge loss is for practical purposes... can you please point to a reference on the squared hinge loss?
| test("linearSVC L-BFGS comparison with R e1071 and scikit-learn") { | ||
| val trainer1 = new LinearSVC().setSolver(LinearSVC.LBFGS) | ||
| .setRegParam(0.00003) |
Why do we switch to a different regParam compared with the test above? Does it mean the converged solution depends on the optimizer?
Indeed. I could use your help here, since I cannot find a theoretical proof for this case.
LR may have the same behavior: if I set the optimizer of LR to OWLQN, all the L2 regularization cases fail.
Test build #78002 has finished for PR 17862 at commit

Test build #78019 has finished for PR 17862 at commit
Merged the change from #17645 into this PR as a single change.
@hhbyyh Thanks for doing the extra work to use the new aggregator here. I do think it's better to separate those changes from this one, though. There is actually more that needs to be done for the conversion (need to use ...)
Test build #78024 has finished for PR 17862 at commit
Sure. That's reasonable. I'll move the hingeAggregator to a new PR.

Test build #78119 has finished for PR 17862 at commit
Catching up here... To make sure I caught the decisions made in the discussion above, is it correct that this PR will:
If those are the decisions, could you please update the PR description and JIRA accordingly when you update the PR, @hhbyyh? Thank you!

+1 @jkbradley for testing on large-scale datasets. @hhbyyh Do you have time to test it? If not, I can help. Thanks.
+1 for adding tests on large-scale datasets.

Sure, I can find some larger datasets to test with. But I expect, as shown in the PR description, that LBFGS will generally outperform OWLQN, though not in all cases. I assume a single large-scale dataset is not sufficient for the testing, so if you plan to test on some dataset, please leave a comment here. Thank you.
Test build #81367 has finished for PR 17862 at commit
Tested with several larger datasets using the hinge loss function, to compare the l-bfgs and owlqn solvers.
l-bfgs does not always take fewer iterations, but it converges to a smaller final loss.
@hhbyyh Test results look good!
Test build #81836 has finished for PR 17862 at commit

Test build #82376 has finished for PR 17862 at commit

Test build #82377 has finished for PR 17862 at commit

Test build #82766 has finished for PR 17862 at commit
Please let me know if there are any unresolved comments. Thanks.
private[classification] val HINGE: String = "hinge".toLowerCase(Locale.ROOT)

/** String name for Squared Hinge Loss. */
private[classification] val SQUARED_HINGE: String = "squared_hinge".toLowerCase(Locale.ROOT)
Why do we need .toLowerCase(Locale.ROOT) here?
To ensure consistency with param validation across all Locales.
Ditto. IMO these characters are all ASCII, so I don't think they would run into a locale issue. (But have you encountered such an issue in some environment?)
Personally I never have, but I cannot guarantee it for all locales.
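For context, a small self-contained sketch of the kind of locale surprise that toLowerCase(Locale.ROOT) guards against, the well-known Turkish-I case (illustrative only, not part of this PR):

```scala
import java.util.Locale

object LocaleSketch extends App {
  // Under a Turkish locale, uppercase 'I' lowercases to the dotless 'ı',
  // so a default-locale toLowerCase on user input would fail to match "hinge".
  val turkish = new Locale("tr", "TR")
  println("HINGE".toLowerCase(turkish))      // prints "hınge" (dotless ı)
  println("HINGE".toLowerCase(Locale.ROOT))  // prints "hinge"
}
```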
| @Since("2.3.0") | ||
| final val loss: Param[String] = new Param(this, "loss", "Specifies the loss " + | ||
| "function. hinge is the standard SVM loss while squared_hinge is the square of the hinge loss.", | ||
| (s: String) => LinearSVC.supportedLoss.contains(s.toLowerCase(Locale.ROOT))) |
For the isValid function you can use ParamValidators.inArray[String](supportedLosses).
Correct me if I'm wrong, but IMO we need toLowerCase here.
Yeah, I thought about this, but the solver param in LinearRegression also ignores this. I tend to keep them consistent; what do you think?
I tend to support case-insensitive params in LinearRegression, or to change the default behavior of ParamValidators.inArray. And we should improve consistency in supporting case-insensitive String params anyway.
Created a jira to address that issue: https://issues.apache.org/jira/browse/SPARK-22331
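For reference, a sketch contrasting the two validation styles discussed above (the allowed values come from the diff, but the object and val names are made up; this is illustrative rather than the PR's final code):

```scala
import java.util.Locale
import org.apache.spark.ml.param.ParamValidators

object LossParamValidationSketch {
  // ParamValidators.inArray does an exact, case-sensitive membership check.
  val caseSensitiveCheck: String => Boolean =
    ParamValidators.inArray[String](Array("hinge", "squared_hinge"))

  // The explicit closure keeps the param case-insensitive across locales.
  val caseInsensitiveCheck: String => Boolean =
    (s: String) => Array("hinge", "squared_hinge").contains(s.toLowerCase(Locale.ROOT))
}
```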
private[classification] val SQUARED_HINGE: String = "squared_hinge".toLowerCase(Locale.ROOT)

/* Set of loss function that LinearSVC supports */
private[classification] val supportedLoss = Array(HINGE, SQUARED_HINGE)
supportedLoss ==> supportedLosses
Sure. I can update it.
Thanks @WeichenXu123 for the comments. I'll update supportedLoss ==> supportedLosses along with other possible changes.

Test build #105501 has finished for PR 17862 at commit

Test build #112418 has finished for PR 17862 at commit
@hhbyyh This PR is stale. If nobody is interested in this and there are no further updates, would you mind closing it? Thanks!
What changes were proposed in this pull request?
jira: https://issues.apache.org/jira/browse/SPARK-20602
Currently LinearSVC in Spark only supports OWLQN as the optimizer (see https://issues.apache.org/jira/browse/SPARK-14709). I compared LBFGS and OWLQN on several public datasets and found that LBFGS converges much faster for LinearSVC in most cases.
The following table presents the number of training iterations and the F1 score of both optimizers at convergence.
data source: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html
training code: new LinearSVC().setMaxIter(10000).setTol(1e-6) with master=local[1]
LBFGS requires far fewer iterations in most cases and is probably a better default optimizer.
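As a usage illustration of the proposed API (a sketch based on the setSolver calls and constants that appear in the tests quoted above; setLoss is assumed to mirror them for the new loss param, and the exact names and visibility are still subject to review):

```scala
import org.apache.spark.ml.classification.LinearSVC

// Hypothetical usage; `training` is assumed to be a DataFrame of (label, features).
val svc = new LinearSVC()
  .setMaxIter(10000)
  .setTol(1e-6)
  .setSolver(LinearSVC.LBFGS)   // or LinearSVC.OWLQN, as in the tests above
  .setLoss("squared_hinge")     // assumed setter for the "loss" param; or "hinge"
// val model = svc.fit(training)
```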
How was this patch tested?
Strengthened existing unit tests.