
Conversation

@AnthonyTruchet

What changes were proposed in this pull request?

CostFun used to send a dense vector of zeros as part of the closure of a
treeAggregate call. To avoid that, we replace treeAggregate with
mapPartitions + treeReduce, creating the zero vector inside the
mapPartitions block instead.
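
For context, here is a minimal sketch of the before/after pattern, assuming an RDD data of (label, features) pairs, a feature dimension n, and the seqOp/combOp defined in CostFun (names illustrative, not the exact PR diff):

// Before: the zero value is built on the driver and serialized into the
// task closure of the treeAggregate call.
val before = data.treeAggregate((Vectors.zeros(n), 0.0))(seqOp, combOp)

// After: each partition builds its own zero locally, so no dense zero
// vector travels with the closure; partials are then merged by treeReduce.
val after = data.mapPartitions { it =>
  Iterator(it.foldLeft((Vectors.zeros(n), 0.0))(seqOp))
}.treeReduce(combOp)

(Note that treeReduce, unlike treeAggregate, throws on an empty RDD, so genuinely empty input would need separate handling.)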

How was this patch tested?

Tests run by hand locally.
(Setting up local infrastructure to run the official Spark test suite is in progress.)

@AmplabJenkins

Can one of the admins verify this patch?

Member

@srowen srowen left a comment


Oh, I get it: it's just that you find the zero value accidentally gets into those function closures. Yes, sounds good if it can be rewritten to avoid that. I think there are more instances of this pattern, though, in NaiveBayes, ALS, etc. It would be cool to break out these operations even just for code clarity, but especially if it avoids some silent overhead.

 * tuples, updates the current gradient and current loss
 */
val seqOp = (c: (Vector, Double), v: (Double, Vector)) => {
  (c, v) match { case ((grad, loss), (label, features)) =>
Member


Nit: unindent 2 spaces and you can remove the outer braces? Same in the next function.
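
That is, the suggested shape would be roughly (the body here is a placeholder, not the real gradient update from CostFun):

val seqOp = (c: (Vector, Double), v: (Double, Vector)) =>
  (c, v) match { case ((grad, loss), (label, features)) =>
    // ... per-example gradient/loss update goes here ...
    (grad, loss)
  }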

Author


OK, will do it.

  }
}

val (gradientSum, lossSum) = data.mapPartitions(it => {
Member


Nit: .mapPartitions { it =>

@AnthonyTruchet
Author

By the way, do you think this should be addressed in core, or just in each ML-specific use?

@srowen
Member

srowen commented Nov 17, 2016

I personally think it's good to be consistent. I think it's more readable to break out these function definitions, and it seems there's evidence it might avoid some unintended objects in a closure. Have a look for other instances of "seqOp = ..." etc. and see which ones look like the same pattern that could be refactored.
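
A sketch of that kind of break-out (method names hypothetical, bodies reduced to placeholders):

// Named methods keep the treeAggregate/treeReduce call sites short and
// make the aggregation logic easier to scan than large inline lambdas.
private def addExample(c: (Vector, Double), v: (Double, Vector)): (Vector, Double) = {
  val ((grad, loss), (label, features)) = (c, v)
  // ... compute this example's gradient contribution and loss here ...
  (grad, loss)
}

private def combinePartials(c1: (Vector, Double), c2: (Vector, Double)): (Vector, Double) = {
  // ... merge two partial (gradient, loss) results here ...
  (c1._1, c1._2 + c2._2)
}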

@AnthonyTruchet
Author

I missed part of my company guidelines. Closing this PR and creating a new one shortly from my company account. Sorry for the noise.
