[SPARK-18471][MLLIB][BACKPORT-2.0] In LBFGS, avoid sending huge vectors of 0 #16279

AnthonyTruchet · 2016-12-14T11:22:25Z

Backport #16037 to 2.0 branch

What changes were proposed in this pull request?

CostFun used to send a dense vector of zeroes as a closure in a
treeAggregate call. To avoid that, we change the aggregation operations
to convert sparse vectors into dense vectors on the fly if needed and we
pass a sparse 0 vector which is lightweight.

How was this patch tested?

Unit test for module mllib run locally for correctness.

As for performance we run an heavy optimization on our production data (50 iterations on 128 MB weight vectors) and have seen significant decrease in terms both of runtime and container being killed by lack of off-heap memory.

Author: Anthony Truchet [email protected]
Author: sethah [email protected]

…rs of 0 CostFun used to send a dense vector of zeroes as a closure in a treeAggregate call. To avoid that, we replace treeAggregate by mapPartition + treeReduce, creating a zero vector inside the mapPartition block in-place. Unit test for module mllib run locally for correctness. As for performance we run an heavy optimization on our production data (50 iterations on 128 MB weight vectors) and have seen significant decrease in terms both of runtime and container being killed by lack of off-heap memory. Author: Anthony Truchet <[email protected]> Author: sethah <[email protected]>

srowen · 2016-12-14T13:11:15Z

I don't think we would backport this to even 2.1. You can close this.

AnthonyTruchet · 2016-12-14T17:00:41Z

May I ask why ? There was no conflicts so no additional qualification work is required and this looks like a performance bug fix to me, not a new feature. In order to adjust our contribution policy (fix to our internal version vs pushing upstream) we would need to understand better the backporting policy: would you have any more detailed pointer at it ?

srowen · 2016-12-14T18:28:15Z

The guidance such as it is is here: http://spark.apache.org/versioning-policy.html
It is in truth mostly a judgment call. Improvements are rarely back-ported to a maintenance branch, but it depends on impact vs risk. This is low-risk, but also seems minor in impact too. I'd back-port if someone else would second that.

AnthonyTruchet · 2016-12-15T09:38:35Z

Ok, thanks for the pointer. I do agree this is a non critical judgement call .

sethah · 2017-01-10T02:23:32Z

Can we close it?

AnthonyTruchet · 2017-01-10T09:38:34Z

We have backported it to our internal version of Spark anyhow. As told above feel free to close it if you consider this is not worth officially backporting :-)

SparkQA · 2017-01-14T11:29:11Z

Test build #3534 has finished for PR 16279 at commit 27f5796.

This patch passes all tests.
This patch does not merge cleanly.
This patch adds no public classes.

srowen mentioned this pull request Feb 2, 2017

[BUILD] Close stale PRs #16778

Closed

asfgit closed this in 20b4ca1 Feb 3, 2017

Willymontaz deleted the SPARK-18471-branch-2.0 branch April 2, 2019 15:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[SPARK-18471][MLLIB][BACKPORT-2.0] In LBFGS, avoid sending huge vectors of 0 #16279

[SPARK-18471][MLLIB][BACKPORT-2.0] In LBFGS, avoid sending huge vectors of 0 #16279

Uh oh!

AnthonyTruchet commented Dec 14, 2016 •

edited

Loading

Uh oh!

srowen commented Dec 14, 2016

Uh oh!

AnthonyTruchet commented Dec 14, 2016

Uh oh!

srowen commented Dec 14, 2016

Uh oh!

AnthonyTruchet commented Dec 15, 2016

Uh oh!

sethah commented Jan 10, 2017

Uh oh!

AnthonyTruchet commented Jan 10, 2017

Uh oh!

SparkQA commented Jan 14, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

[SPARK-18471][MLLIB][BACKPORT-2.0] In LBFGS, avoid sending huge vectors of 0 #16279

[SPARK-18471][MLLIB][BACKPORT-2.0] In LBFGS, avoid sending huge vectors of 0 #16279

Uh oh!

Conversation

AnthonyTruchet commented Dec 14, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changes were proposed in this pull request?

How was this patch tested?

Uh oh!

srowen commented Dec 14, 2016

Uh oh!

AnthonyTruchet commented Dec 14, 2016

Uh oh!

srowen commented Dec 14, 2016

Uh oh!

AnthonyTruchet commented Dec 15, 2016

Uh oh!

sethah commented Jan 10, 2017

Uh oh!

AnthonyTruchet commented Jan 10, 2017

Uh oh!

SparkQA commented Jan 14, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

AnthonyTruchet commented Dec 14, 2016 •

edited

Loading