[SPARK-18471][MLLIB] In LBFGS, avoid sending huge vectors of 0 #15963
Conversation
Can one of the admins verify this patch?

@srowen Here, at last, is the real PR for SPARK-18471. Sorry for the noise due to GitHub fiddling...
srowen left a comment
I think it's still worth looking into making a similar change elsewhere, where seqOp and combOp are used this way and might also accidentally pull something into the closure. But (modulo minor changes) I think this is an OK change.
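For context, the pattern in question looks roughly like this (a simplified sketch of CostFun.calculate, not the exact source; here n stands for the weight dimension, bcW for the broadcast weights, and localGradient for the per-record gradient object). The zero value (Vectors.zeros(n), 0.0) is built on the driver and shipped along with the treeAggregate call:

    // Sketch of the old pattern: the dense zero vector is created on the
    // driver and travels with the treeAggregate call.
    val (gradientSum, lossSum) = data.treeAggregate((Vectors.zeros(n), 0.0))(
      seqOp = (c, v) => (c, v) match { case ((grad, loss), (label, features)) =>
        val l = localGradient.compute(features, label, bcW.value, grad)
        (grad, loss + l)
      },
      combOp = (c1, c2) => (c1, c2) match { case ((grad1, loss1), (grad2, loss2)) =>
        axpy(1.0, grad2, grad1)  // in-place grad1 += grad2
        (grad1, loss1 + loss2)
      })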
    seqOp = (c, v) => (c, v) match { case ((grad, loss), (label, features)) =>
      val l = localGradient.compute(
        features, label, bcW.value, grad)
    /** Given (current accumulated gradient, current loss) and (label, features)
Nit: just use // for a two-line comment. Really, /** (as opposed to /*) is for javadoc.
      })
    }

    val (gradientSum, lossSum) = data.mapPartitions { it => {
Nit: the second brace and its matching closing brace are redundant.
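In other words, something like the following (a shape-only sketch, assuming data is the training RDD):

    // As written in the patch: the inner brace after `it =>` and its matching
    // closing brace add nothing, since the outer braces already delimit the body.
    val before = data.mapPartitions { it => {
      Iterator.single(it.size)
    } }

    // Equivalent, without the redundant braces:
    val after = data.mapPartitions { it =>
      Iterator.single(it.size)
    }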
Ping @AnthonyTruchet
Hello @srowen, sorry not to have updated this lately; I've been taken up by other emergencies at work. I'll update this on Monday. Actually, I'll submit a variant of treeAggregate in core that we will be able to use for other similar use cases in ML(lib). I appreciate your care about these patches and will try to be reasonably responsive too :-)
Once more (last time, hopefully) I mistakenly fiddled with the PR. Closing this one and replacing it with #16037.
What changes were proposed in this pull request?
CostFun used to send a dense vector of zeroes as a closure in a treeAggregate call. To avoid that, we replace treeAggregate with mapPartitions + treeReduce, creating the zero vector in place inside the mapPartitions block.
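A minimal sketch of that shape (using the same placeholder names n, bcW, and localGradient as the excerpt above; not necessarily the exact code in the patch): the zero vector is allocated per partition on the executor, and only the partial sums are combined by treeReduce.

    val (gradientSum, lossSum) = data.mapPartitions { it =>
      // The zero vector is created here, inside the task, so it is never
      // serialized from the driver as part of a closure.
      val grad = Vectors.zeros(n)
      var loss = 0.0
      it.foreach { case (label, features) =>
        loss += localGradient.compute(features, label, bcW.value, grad)
      }
      Iterator.single((grad, loss))
    }.treeReduce { case ((grad1, loss1), (grad2, loss2)) =>
      axpy(1.0, grad2, grad1)  // in-place grad1 += grad2
      (grad1, loss1 + loss2)
    }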
How was this patch tested?
Unit tests for the mllib module were run locally for correctness.
As for performance, we ran a heavy optimization on our production data (50 iterations on 128 MB weight vectors) and saw a significant decrease both in runtime and in containers being killed for lack of off-heap memory.