[SPARK-18471][MLLIB] In LBFGS, avoid sending huge vectors of 0 #15905
Conversation
CostFun used to send a dense vector of zeroes as part of a closure in a treeAggregate call. To avoid that, we replace treeAggregate with mapPartitions + treeReduce, creating the zero vector inside the mapPartitions block instead.
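The change can be sketched as follows. This is a minimal illustration, not the actual MLlib code: `addInPlace`, `gradientAndLoss`, and the way the per-point gradient and loss are computed are placeholders standing in for `CostFun`'s real `Gradient.compute` logic.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

// Placeholder for the real gradient update: accumulate a vector into an array.
def addInPlace(acc: Array[Double], v: Vector): Unit =
  v.foreachActive((i, x) => acc(i) += x)

// Before (problematic pattern): the zero value handed to treeAggregate is
// captured in the task closure, so a dense n-dimensional vector of zeroes
// is serialized and shipped with every task:
//
//   data.treeAggregate((Vectors.zeros(n), 0.0))(seqOp, combOp)
//
// After: the zero vector is allocated inside mapPartitions, on the
// executor, so the shipped closure contains no large objects.
def gradientAndLoss(data: RDD[(Double, Vector)], n: Int): (Vector, Double) =
  data.mapPartitions { it =>
    val grad = Array.ofDim[Double](n)  // allocated per task, never shipped
    var loss = 0.0
    it.foreach { case (label, features) =>
      addInPlace(grad, features)       // placeholder gradient update
      loss += label                    // placeholder loss update
    }
    Iterator.single((Vectors.dense(grad), loss))
  }.treeReduce { case ((g1, l1), (g2, l2)) =>
    // g1 is a partial result local to this reduce step, so reusing its
    // backing array for the merge is safe here.
    val merged = g1.toArray
    addInPlace(merged, g2)
    (Vectors.dense(merged), l1 + l2)
  }
```

Only the closure's captured variables travel to executors; since the new closures capture just `n` (an `Int`), the per-task serialization cost no longer scales with the feature dimension.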
Can one of the admins verify this patch?
srowen
left a comment
Oh, I get it: it's just that you found that the zero values accidentally get into those function closures. Yes, sounds good if it can be rewritten to avoid that. I think there are more instances of this pattern, though, in NaiveBayes, ALS, etc. It would be cool to break out these operations even just for code clarity, but especially if it avoids some silent overhead.
 * tuples, updates the current gradient and current loss
 */
val seqOp = (c: (Vector, Double), v: (Double, Vector)) => {
  (c, v) match { case ((grad, loss), (label, features)) =>
Nit: unindent 2 spaces and you can remove the outer braces? Same in the next function.
OK, will do it.
  }
}

val (gradientSum, lossSum) = data.mapPartitions(it => {
Nit: .mapPartitions { it =>
By the way, do you think this should be addressed in core, or just in each ML-specific use?
I personally think it's good to be consistent. I think it's more readable to break out these function definitions, and it seems like there's evidence it might avoid some unintended objects in a closure. Have a look for other instances of "seqOp = ..." etc. and see which ones look like the same pattern that could be refactored.
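The "break out the function definitions" style the reviewer suggests might look like the following toy sketch. The types and update logic here are purely illustrative (an `Array[Double]` accumulator and a trivial per-element update), not the real MLlib operators:

```scala
// Named operators instead of inline closures: what each function captures
// (nothing, here) is explicit, and treeAggregate call sites stay short.

// Fold one data point (label, index) into the running (gradient, loss).
// The update rule is a placeholder for a real gradient computation.
def seqOp(c: (Array[Double], Double), v: (Double, Int)): (Array[Double], Double) = {
  val (grad, loss) = c
  val (label, idx) = v
  grad(idx) += label
  (grad, loss + label)
}

// Merge two partial (gradient, loss) results.
def combOp(c1: (Array[Double], Double), c2: (Array[Double], Double)): (Array[Double], Double) = {
  val merged = c1._1.clone()
  var i = 0
  while (i < merged.length) { merged(i) += c2._1(i); i += 1 }
  (merged, c1._2 + c2._2)
}
```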
I missed part of my company guidelines. Closing this PR and creating a new one shortly from my company account. Sorry for the noise.
What changes were proposed in this pull request?
CostFun used to send a dense vector of zeroes as part of a closure in a
treeAggregate call. To avoid that, we replace treeAggregate with
mapPartitions + treeReduce, creating the zero vector inside the
mapPartitions block instead.
How was this patch tested?
Tests run by hand locally.
(Setting up local infrastructure to run the official Spark tests is in progress.)