Skip to content

Conversation

@AnthonyTruchet
Copy link

@AnthonyTruchet AnthonyTruchet commented Dec 14, 2016

Backport #16037 to 2.0 branch

What changes were proposed in this pull request?

CostFun used to send a dense vector of zeroes as a closure in a
treeAggregate call. To avoid that, we change the aggregation operations
to convert sparse vectors into dense vectors on the fly if needed and we
pass a sparse 0 vector which is lightweight.

How was this patch tested?

Unit test for module mllib run locally for correctness.

As for performance we run an heavy optimization on our production data (50 iterations on 128 MB weight vectors) and have seen significant decrease in terms both of runtime and container being killed by lack of off-heap memory.

Author: Anthony Truchet [email protected]
Author: sethah [email protected]

…rs of 0

CostFun used to send a dense vector of zeroes as a closure in a
treeAggregate call. To avoid that, we replace treeAggregate by
mapPartition + treeReduce, creating a zero vector inside the mapPartition
block in-place.

Unit test for module mllib run locally for correctness.

As for performance we run an heavy optimization on our production data (50 iterations on 128 MB weight vectors) and have seen significant decrease in terms both of runtime and container being killed by lack of off-heap memory.

Author: Anthony Truchet <[email protected]>
Author: sethah <[email protected]>
@srowen
Copy link
Member

srowen commented Dec 14, 2016

I don't think we would backport this to even 2.1. You can close this.

@AnthonyTruchet
Copy link
Author

May I ask why ? There was no conflicts so no additional qualification work is required and this looks like a performance bug fix to me, not a new feature. In order to adjust our contribution policy (fix to our internal version vs pushing upstream) we would need to understand better the backporting policy: would you have any more detailed pointer at it ?

@srowen
Copy link
Member

srowen commented Dec 14, 2016

The guidance such as it is is here: http://spark.apache.org/versioning-policy.html
It is in truth mostly a judgment call. Improvements are rarely back-ported to a maintenance branch, but it depends on impact vs risk. This is low-risk, but also seems minor in impact too. I'd back-port if someone else would second that.

@AnthonyTruchet
Copy link
Author

Ok, thanks for the pointer. I do agree this is a non critical judgement call .

@sethah
Copy link
Contributor

sethah commented Jan 10, 2017

Can we close it?

@AnthonyTruchet
Copy link
Author

We have backported it to our internal version of Spark anyhow. As told above feel free to close it if you consider this is not worth officially backporting :-)

@SparkQA
Copy link

SparkQA commented Jan 14, 2017

Test build #3534 has finished for PR 16279 at commit 27f5796.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@srowen srowen mentioned this pull request Feb 2, 2017
@asfgit asfgit closed this in 20b4ca1 Feb 3, 2017
@Willymontaz Willymontaz deleted the SPARK-18471-branch-2.0 branch April 2, 2019 15:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants