[SPARK-2505][MLlib] Weighted Regularizer for Generalized Linear Model #1518
Conversation
QA tests have started for PR 1518. This patch merges cleanly.
QA results for PR 1518:
@dbtsai I thought of another way to do this and want to know your opinion. We can add an optional argument to
I tried making the bias really big so the intercept becomes small enough to effectively avoid being regularized. The result is still quite different from R, and very sensitive to the strength of the bias. Users may re-scale the features to improve the convergence of the optimization process, and to get the same coefficients as without scaling, each component has to be penalized differently. Also, users may know which features are less important and want to penalize them more. As a result, I still want to implement the full weighted regularizer and decouple the adaptive learning rate from the updater. Let's talk in detail when we meet tomorrow. Thanks.
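An illustrative sketch of the weighted penalty being discussed (the names here, e.g. `WeightedL2Example` and `penaltyWeights`, are hypothetical and not part of the actual patch or Spark's API): each coefficient gets its own penalty weight, so a zero weight removes the intercept from regularization, and features that should be penalized more can simply be given a larger weight.

```scala
// Hypothetical sketch of a per-component (weighted) L2 penalty.
// A penalty weight of 0.0 excludes that component (e.g. the intercept);
// a larger weight penalizes that feature more heavily.
object WeightedL2Example {
  /** Returns (regularization loss, regularization gradient). */
  def weightedL2(coefficients: Array[Double],
                 penaltyWeights: Array[Double],
                 regParam: Double): (Double, Array[Double]) = {
    require(coefficients.length == penaltyWeights.length)
    var loss = 0.0
    val grad = new Array[Double](coefficients.length)
    var i = 0
    while (i < coefficients.length) {
      loss += 0.5 * regParam * penaltyWeights(i) * coefficients(i) * coefficients(i)
      grad(i) = regParam * penaltyWeights(i) * coefficients(i)
      i += 1
    }
    (loss, grad)
  }

  def main(args: Array[String]): Unit = {
    // The last component plays the role of the intercept and is not penalized.
    val coefficients   = Array(1.5, -2.0, 0.7)
    val penaltyWeights = Array(1.0, 1.0, 0.0)
    val (loss, grad) = weightedL2(coefficients, penaltyWeights, regParam = 0.1)
    println(s"reg loss = $loss, reg gradient = ${grad.mkString(", ")}")
  }
}
```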
I think this is the approach LIBLINEAR uses. Yes, let's discuss tomorrow.
This looks promising. FWIW, I support decoupling regularization from the raw gradient update and believe it is a good way to go; it will allow various update/learning-rate schemes (AdaGrad, normalized adaptive gradient, etc.) to be applied independently of the regularization.
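A rough sketch of that point about decoupling (names such as `AdaGradStepSketch` are hypothetical, not Spark API): once regularization only contributes a loss and a gradient, a learning-rate scheme such as an AdaGrad-style per-coordinate rate can be applied to the combined gradient without knowing anything about the regularizer.

```scala
// Hypothetical sketch: the step-size scheme (AdaGrad-style here) operates on the
// combined gradient (data gradient + regularizer gradient) and is completely
// independent of which regularizer produced it.
object AdaGradStepSketch {
  def step(weights: Array[Double],
           combinedGradient: Array[Double],   // data gradient + regularizer gradient
           accumulatedSquares: Array[Double], // running sum of squared gradients, updated in place
           baseStepSize: Double,
           eps: Double = 1e-8): Array[Double] = {
    weights.indices.map { i =>
      accumulatedSquares(i) += combinedGradient(i) * combinedGradient(i)
      weights(i) - baseStepSize / math.sqrt(accumulatedSquares(i) + eps) * combinedGradient(i)
    }.toArray
  }
}
```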
It's too late to get this into 1.1, but I'll try to make it happen in 1.2. We'll use this in our company's implementation first.
Won't the case statement affect performance?
This PR is not finished yet. I will replace this with the newly implemented foreachActive API.
I'm looking at really old PRs -- this is obsolete now, right?
@srowen I'm still working on this PR, but unfortunately I haven't had enough time to finish it, so I keep delaying it. This PR is important since it will provide a general framework for solving L1/L2 problems. The current way we use Updater is very awkward in my opinion.
(Note: This is not ready to be merged. It needs documentation, and we need to make sure it's backward compatible with the Spark 1.0 APIs.)
The current implementation of regularization in the linear models uses Updater, and this design has a couple of issues. Updater also contains the adaptive step-size logic for gradient descent, and we would like to clean this up by moving the regularization logic out of the updater and into a regularizer, so that in the LBFGS optimizer we no longer need the trick for getting the loss and gradient of the objective function. In this work, a weighted regularizer will be implemented, and users can exclude the intercept or any weight from regularization by giving that term a zero penalty weight. Since the regularizer will return a tuple of loss and gradient, the adaptive step-size logic and the soft thresholding for L1 currently in Updater will be moved into the SGD optimizer.
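A minimal sketch of the proposed split, assuming hypothetical names (this is not the actual implementation): the regularizer exposes only a (loss, gradient) pair that an optimizer can consume directly, while the step size and the L1 soft-thresholding live in the SGD optimizer, again with per-component penalty weights so that a zero weight leaves the intercept untouched.

```scala
// Hypothetical sketch of the proposed design, not the actual patch.
object ProposedDesignSketch {
  /** A regularizer only reports its contribution to the objective. */
  trait Regularizer {
    /** Returns the tuple (regularization loss, regularization gradient). */
    def compute(weights: Array[Double]): (Double, Array[Double])
  }

  /**
   * One SGD step for weighted L1: take a plain gradient step on the data loss,
   * then apply soft thresholding. The thresholding (and the step size) belong
   * to the optimizer, not the regularizer; a zero penalty weight leaves that
   * component (e.g. the intercept) unregularized.
   */
  def l1SgdStep(weights: Array[Double],
                dataGradient: Array[Double],
                penaltyWeights: Array[Double],
                regParam: Double,
                stepSize: Double): Array[Double] = {
    weights.indices.map { i =>
      val updated = weights(i) - stepSize * dataGradient(i)
      val threshold = stepSize * regParam * penaltyWeights(i)
      math.signum(updated) * math.max(0.0, math.abs(updated) - threshold)
    }.toArray
  }
}
```

With a split along these lines, an LBFGS-based optimizer would simply add the regularizer's loss and gradient to those of the data term, with no special-casing inside the updater.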