
Conversation

@dbtsai (Member) commented Jul 21, 2014

(Note: This is not ready to be merged. It still needs documentation, and we need to make sure it's backward compatible with the Spark 1.0 APIs.)

The current implementation of regularization in the linear models uses Updater, and this design has a couple of issues:

  1. It penalizes all the weights, including the intercept. Typically, people don't penalize the intercept when training a model.
  2. The Updater also contains the adaptive step-size logic for gradient descent. We would like to clean this up by separating the regularization logic out of the Updater and into a Regularizer, so that the L-BFGS optimizer no longer needs the trick for recovering the loss and gradient of the objective function.
     In this work, a weighted regularizer will be implemented, and users can exclude the intercept or any other weight from regularization by giving that term a zero penalty weight. Since the regularizer will return a tuple of loss and gradient, the adaptive step-size logic and the L1 soft-thresholding in the Updater will be moved into the SGD optimizer. A rough sketch of the intended interface follows below.
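
A minimal sketch of what such a weighted regularizer could look like (the Regularizer base class name matches the QA output below; the compute signature and the WeightedL2Regularizer class are assumptions, not this PR's final API):

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Sketch only: the regularizer returns both the penalty added to the loss
// and its contribution to the gradient, so the optimizer no longer needs
// the Updater trick to recover them.
abstract class Regularizer extends Serializable {
  /** Returns (regularization loss, regularization gradient) for the given weights. */
  def compute(weights: Vector, regParam: Double): (Double, Vector)
}

// Weighted L2 penalty: each coefficient gets its own penalty weight, so the
// intercept (or any feature) can be excluded by giving it a zero weight.
class WeightedL2Regularizer(penaltyWeights: Array[Double]) extends Regularizer {
  override def compute(weights: Vector, regParam: Double): (Double, Vector) = {
    val w = weights.toArray
    val grad = new Array[Double](w.length)
    var loss = 0.0
    var i = 0
    while (i < w.length) {
      loss += 0.5 * regParam * penaltyWeights(i) * w(i) * w(i)
      grad(i) = regParam * penaltyWeights(i) * w(i)
      i += 1
    }
    (loss, Vectors.dense(grad))
  }
}
```

Passing a zero penalty weight at the intercept's index reproduces the usual convention of leaving the intercept unregularized.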

@SparkQA commented Jul 21, 2014

QA tests have started for PR 1518. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16928/consoleFull

@SparkQA commented Jul 21, 2014

QA results for PR 1518:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
abstract class Regularizer extends Serializable {
class SimpleRegularizer extends Regularizer {
class CompositeRegularizer extends Regularizer {

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16928/consoleFull

@mengxr (Contributor) commented Jul 29, 2014

@dbtsai I thought of another way to do this and want to know your opinion. We can add an optional argument to appendBias: appendBias(bias: Double = 1.0). If this is used when adding the intercept, we can pass a large bias so the corresponding weight gets less regularized.
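
A hedged sketch of the idea (a standalone helper, not the actual MLUtils.appendBias signature): with a large bias value b, the fitted weight for the appended column is roughly intercept / b, so a uniform penalty shrinks it far less.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Hypothetical variant of appendBias with a configurable bias value.
// The default of 1.0 matches the usual intercept column; a large value
// makes the corresponding weight numerically small and hence less
// affected by a uniform L2 penalty.
def appendBias(vector: Vector, bias: Double = 1.0): Vector = {
  Vectors.dense(vector.toArray :+ bias)
}

// appendBias(Vectors.dense(1.0, 2.0), bias = 100.0) == Vectors.dense(1.0, 2.0, 100.0)
```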

@dbtsai (Member, Author) commented Jul 30, 2014

I tried making the bias really big so that the corresponding intercept weight becomes small and is effectively less regularized. The result is still quite different from R's, and it is very sensitive to the magnitude of the bias.

Users may rescale the features to improve the convergence of the optimization process, and in order to get the same coefficients as in the unscaled problem, each component has to be penalized differently. Also, users may know that a feature is less important and want to penalize it more.

As a result, I still want to implement the full weighted regularizer and decouple the adaptive learning rate from the updater. Let's talk in detail when we meet tomorrow. Thanks.
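
A rough illustration of the scaling point (the notation here is illustrative, not from the PR): if feature j is standardized as x̃_j = x_j / σ_j, the equivalent weight on the scaled problem is w̃_j = σ_j · w_j, so a uniform L2 penalty λ Σ_j w̃_j² equals λ Σ_j σ_j² w_j² in the original coordinates. Recovering the unscaled coefficients therefore requires scaling the penalty on component j by 1/σ_j², which is exactly what a per-component penalty weight expresses.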

@mengxr (Contributor) commented Jul 30, 2014

I think this is the approach LIBLINEAR uses. Yes, let's discuss tomorrow.

@MLnick (Contributor) commented Aug 5, 2014

This looks promising. FWIW, I support decoupling regularization from the raw gradient update and believe it is a good way to go - it will allow various update/learning-rate schemes (AdaGrad, normalized adaptive gradient, etc.) to be applied independently of the regularization.
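
A small sketch of what that decoupling could enable (all names here are illustrative assumptions, not part of this PR): once the regularizer reports its own gradient, a single SGD step can combine the data gradient, the penalty gradient, and any per-coordinate step-size scheme.

```scala
// Illustrative update step on plain arrays; stepSizes could come from a
// constant/decaying schedule or a per-coordinate scheme such as AdaGrad,
// without touching the regularization logic.
def sgdStep(
    weights: Array[Double],
    dataGradient: Array[Double],
    regGradient: Array[Double],
    stepSizes: Array[Double]): Array[Double] = {
  weights.indices.map { i =>
    weights(i) - stepSizes(i) * (dataGradient(i) + regGradient(i))
  }.toArray
}
```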

@dbtsai (Member, Author) commented Aug 5, 2014

It's too late to get this into 1.1, but I'll try to make it happen in 1.2. We'll use this in our company's implementation first.

A contributor commented inline on the diff:

Won't the case statement affect performance?

dbtsai (Member, Author) replied:

This PR is not finished yet. I will replace this with the newly implemented foreachActive API.
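
A rough sketch of the intent (assuming the (index, value) callback of Vector.foreachActive, which may have been Spark-internal at the time): iterate over the stored entries directly instead of pattern matching on DenseVector versus SparseVector in the inner loop.

```scala
import org.apache.spark.mllib.linalg.Vector

// Accumulate an L2 gradient contribution without a match on the vector type.
def addL2Gradient(weights: Vector, regParam: Double, cumGradient: Array[Double]): Unit = {
  weights.foreachActive { (index, value) =>
    cumGradient(index) += regParam * value
  }
}
```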

@srowen (Member) commented Mar 5, 2015

I'm looking at really old PRs -- this is obsolete now, right?

@dbtsai (Member, Author) commented Mar 5, 2015

@srowen I'm still working on this PR, but unfortunately I haven't had enough time to finish it, so it keeps getting delayed. This PR is important since it will provide a general framework for solving L1/L2-regularized problems. The current way we use Updater is very awkward in my opinion.
