
Conversation

@AnthonyTruchet

What changes were proposed in this pull request?

CostFun used to send a dense vector of zeroes as a closure in a
treeAggregate call. To avoid that, we replace treeAggregate by
mapPartition + treeReduce, creating a zero vector inside the mapPartition
block in-place.

How was this patch tested?

Unit test for module mllib run locally for correctness.

As for performance, we ran a heavy optimization on our production data (50 iterations on 128 MB weight vectors) and saw a significant decrease both in runtime and in containers being killed for lack of off-heap memory.
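To make the change concrete, here is a minimal sketch of the pattern described above, not the exact patch: instead of passing a large dense zero vector as `treeAggregate`'s `zeroValue` (which gets serialized into the task closure and shipped with every task), each partition allocates its own zero vector locally and only the per-partition results are combined with `treeReduce`. The names `n`, `data`, `localGradient`, and `bcW` are assumed from the surrounding LBFGS code.

```scala
import breeze.linalg.{DenseVector => BDV}

// Sketch only: `data` is an RDD[(Double, Vector)], `bcW` a broadcast
// weight vector, `localGradient` a Gradient, and `n` the weight dimension.
val (gradientSum, lossSum) = data.mapPartitions { iter =>
  // The zero vector is created on the executor, inside the partition,
  // so it is never serialized into the closure on the driver.
  val grad = BDV.zeros[Double](n)
  var loss = 0.0
  iter.foreach { case (label, features) =>
    loss += localGradient.compute(features, label, bcW.value,
      Vectors.fromBreeze(grad))
  }
  Iterator.single((grad, loss))
}.treeReduce { case ((g1, l1), (g2, l2)) =>
  // Combine per-partition partial sums; += mutates g1 in place.
  (g1 += g2, l1 + l2)
}
```

Only the partitions' partial sums travel over the network, and the driver no longer ships a dense vector of zeroes with each task.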

Eugene Kharitonov and others added 3 commits November 18, 2016 11:07
[SPARK-18471][MLLIB] In LBFGS, avoid sending huge vectors of 0
@AmplabJenkins

Can one of the admins verify this patch?

@AnthonyTruchet
Author

AnthonyTruchet commented Nov 21, 2016

@srowen Here is at last the real PR for SPARK-18471.

Sorry for the noise due to GitHub fiddling...

@srowen srowen left a comment

I think it's still worth looking into making a similar change elsewhere where seqOp and combOp are used this way and might also accidentally pull something into the closure. But (modulo minor changes) I think this is an OK change.

seqOp = (c, v) => (c, v) match { case ((grad, loss), (label, features)) =>
  val l = localGradient.compute(features, label, bcW.value, grad)
/** Given (current accumulated gradient, current loss) and (label, features)
Nit: just use // for two lines of comment. Really, /** (vs. /*) is for javadoc.

})
}

val (gradientSum, lossSum) = data.mapPartitions { it => {
Nit: the second brace and its match are redundant

@srowen

srowen commented Nov 26, 2016

Ping @AnthonyTruchet

@AnthonyTruchet
Author

Hello @srowen, sorry not to have updated this lately; I've been taken up by other emergencies at work. I'll update this on Monday. Actually, I'll submit a variant of treeAggregate in core that we will be able to use for other similar use cases in ML(lib).

I appreciate your care with these patches and will try to be reasonably responsive too :-)

@AnthonyTruchet
Author

Once more (for the last time, hopefully) I mistakenly fiddled with the PR. Closing this one and replacing it with #16037.
The code style review above has been taken into account in the new PR.
