[SPARK-18471][MLLIB] In LBFGS, avoid sending huge vectors of 0 #16037
Changes from all commits: 0ce8c64, 9b30c5c, 3b59ab2, d87fc46, 18fcbba
```diff
@@ -241,16 +241,24 @@ object LBFGS extends Logging {
     val bcW = data.context.broadcast(w)
     val localGradient = gradient

-    val (gradientSum, lossSum) = data.treeAggregate((Vectors.zeros(n), 0.0))(
-        seqOp = (c, v) => (c, v) match { case ((grad, loss), (label, features)) =>
-          val l = localGradient.compute(
-            features, label, bcW.value, grad)
-          (grad, loss + l)
-        },
-        combOp = (c1, c2) => (c1, c2) match { case ((grad1, loss1), (grad2, loss2)) =>
-          axpy(1.0, grad2, grad1)
-          (grad1, loss1 + loss2)
-        })
+    val seqOp = (c: (Vector, Double), v: (Double, Vector)) =>
+      (c, v) match {
+        case ((grad, loss), (label, features)) =>
+          val denseGrad = grad.toDense
+          val l = localGradient.compute(features, label, bcW.value, denseGrad)
+          (denseGrad, loss + l)
+      }
+
+    val combOp = (c1: (Vector, Double), c2: (Vector, Double)) =>
+      (c1, c2) match { case ((grad1, loss1), (grad2, loss2)) =>
+        val denseGrad1 = grad1.toDense
```
Contributor:
I am still pretty strongly in favor of adding a test case explicitly for this. Just make an RDD with at least one empty partition, and be sure that LBFGS will run on it.

Author:
I'm sorry, I have no clue how to generate a non-empty RDD with an empty partition. Can you give me some pointers so that I can contribute the unit test you request?

Member:
What about just parallelizing n elements into more than n partitions? At least one partition must be empty.

Member:
@AnthonyTruchet does that let you add a quick test case? Then I think this can be merged.

Author:
I'll try to work on this in the next two days. I will not be able to rerun the actual benchmark we did weeks ago; our internal codebase has changed too much.

Contributor:
I attempted to submit a PR to your branch, but between sketchy wifi and git, I'm not sure it worked.

Author:
It did, thanks, and I merged it. I was just coming back to this contribution when I saw it.
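The reviewer's suggestion above can be sketched without a Spark cluster. The `simTreeAggregate` helper below is an illustrative stand-in for Spark's `RDD.treeAggregate`, not its actual implementation: each partition folds `seqOp` from its own copy of the zero value, and the per-partition results are merged with `combOp`. Parallelizing fewer elements than partitions (mirroring `sc.parallelize(1 to 2, 4)`) leaves some partitions empty, so their contribution to the merge is the untouched zero value.

```scala
// Hypothetical pure-Scala sketch (no Spark dependency) of aggregation over
// partitions, at least one of which is empty. Names are illustrative.
object EmptyPartitionSketch {
  // Each partition folds seqOp from the zero value; results merge via combOp.
  def simTreeAggregate[T, U](partitions: Seq[Seq[T]], zero: U)(
      seqOp: (U, T) => U, combOp: (U, U) => U): U =
    partitions.map(_.foldLeft(zero)(seqOp)).reduce(combOp)

  def main(args: Array[String]): Unit = {
    // 2 elements spread over 4 partitions: two partitions are empty,
    // so seqOp never runs there and combOp sees the raw zero value.
    val parts = Seq(Seq(1), Seq(2), Seq.empty[Int], Seq.empty[Int])
    val sum = simTreeAggregate(parts, 0)((acc, x) => acc + x, _ + _)
    println(sum)
  }
}
```

This is the scenario a unit test for this patch would exercise: an empty partition means `combOp` can receive the zero value itself, whatever representation it has.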
```diff
+        val denseGrad2 = grad2.toDense
+        axpy(1.0, denseGrad2, denseGrad1)
+        (denseGrad1, loss1 + loss2)
+      }
+
+    val zeroSparseVector = Vectors.sparse(n, Seq())
+    val (gradientSum, lossSum) = data.treeAggregate((zeroSparseVector, 0.0))(seqOp, combOp)

     /**
      * regVal is sum of weight squares if it's L2 updater;
```
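For context on why the patch replaces `Vectors.zeros(n)` with `Vectors.sparse(n, Seq())` as the zero value: the zero value of `treeAggregate` is serialized and sent with every task, so a dense zero of dimension n costs O(n) bytes per task, while an empty sparse vector is constant-size. A hedged sketch of that size difference, using plain Java serialization and simple stand-ins (an `Array[Double]` for a dense vector, a `(size, indices, values)` tuple for an empty sparse one; these are not MLlib's classes):

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Illustrative only: compares the serialized footprint of a dense zero
// vector versus an empty sparse representation of the same dimension.
object ZeroVectorSizeSketch {
  def serializedSize(obj: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    bytes.size()
  }

  def main(args: Array[String]): Unit = {
    val n = 1000000
    val denseZero = new Array[Double](n)                         // ~ Vectors.zeros(n)
    val sparseZero = (n, Array.empty[Int], Array.empty[Double])  // ~ Vectors.sparse(n, Seq())
    println(s"dense:  ${serializedSize(denseZero)} bytes")
    println(s"sparse: ${serializedSize(sparseZero)} bytes")
  }
}
```

The dense zero is on the order of 8n bytes, while the empty sparse stand-in is a few hundred bytes regardless of n, which is the asymmetry the PR title refers to.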
This is necessary because we may have empty partitions, right? Might be nice to add a test explicitly for this (it seems it failed in PySpark without it, but that was just a coincidence?) so someone doesn't remove these lines in the future.

Meaning, when would the args ever not be dense? I agree, they shouldn't be sparse at this stage, but doing this defensively seems fine since it's a no-op for dense vectors.

Yes; actually, missing the handling of dense vectors was the cause of the PySpark unit-test failure we observed.
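A minimal sketch of the failure mode discussed above, using hypothetical `DenseVec`/`SparseVec` stand-ins rather than MLlib's vector classes: when a partition is empty, `combOp` receives the original sparse zero value, and an in-place `axpy` accumulation needs a dense destination. Densifying both sides first, as the patched `combOp` does, handles that case, and (as noted in the thread) densifying an already-dense vector costs essentially nothing.

```scala
// Stand-ins for MLlib vectors; names and structure are illustrative.
sealed trait Vec { def toDense: DenseVec }
final case class DenseVec(values: Array[Double]) extends Vec {
  def toDense: DenseVec = this                               // already dense: no copy
}
final case class SparseVec(size: Int) extends Vec {          // empty sparse zero
  def toDense: DenseVec = DenseVec(new Array[Double](size))  // materialize zeros
}

object CombOpSketch {
  // In-place y += a * x, valid only on a dense destination (like BLAS axpy).
  def axpy(a: Double, x: DenseVec, y: DenseVec): Unit =
    for (i <- y.values.indices) y.values(i) += a * x.values(i)

  def main(args: Array[String]): Unit = {
    val fromEmptyPartition: Vec = SparseVec(3)               // untouched zero value
    val fromRealPartition: Vec = DenseVec(Array(1.0, 2.0, 3.0))
    // Mirroring the patched combOp: densify both sides before accumulating.
    val d1 = fromEmptyPartition.toDense
    val d2 = fromRealPartition.toDense
    axpy(1.0, d2, d1)
    println(d1.values.mkString(","))  // prints 1.0,2.0,3.0
  }
}
```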