[SPARK-18471][MLLIB][BACKPORT-2.0] In LBFGS, avoid sending huge vectors of 0 #16279
Backport #16037 to 2.0 branch
What changes were proposed in this pull request?
CostFun used to send a dense vector of zeroes as part of the closure in a
treeAggregate call. To avoid that, we change the aggregation operations
to convert sparse vectors into dense vectors on the fly if needed, and we
pass a sparse zero vector, which is lightweight.
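The idea can be illustrated with a small, self-contained sketch. This is not the Spark code from this patch: `SparseVec`, `DenseVec`, `to_dense`, `seq_op` and `comb_op` are hypothetical stand-ins, the per-example "gradient" is a toy update, and the two `reduce` calls only mimic the shape of a treeAggregate (a fold per partition, then a merge of the partial results). It assumes the zero value is serialized along with the closure, which is why its size matters.

```python
from dataclasses import dataclass
from functools import reduce
import pickle


@dataclass
class SparseVec:
    """Hypothetical sparse vector: only non-zero entries are stored."""
    size: int
    indices: list
    values: list


@dataclass
class DenseVec:
    """Hypothetical dense vector: every entry is stored."""
    values: list


def to_dense(v):
    """Return v unchanged if already dense, else build a new dense copy."""
    if isinstance(v, DenseVec):
        return v
    out = [0.0] * v.size
    for i, x in zip(v.indices, v.values):
        out[i] = x
    return DenseVec(out)


def seq_op(acc, point):
    """Fold one example into the accumulator, densifying on first use."""
    grad, loss = acc
    dense = to_dense(grad)  # the sparse zero becomes dense only here
    features, label = point
    for i, x in enumerate(features):
        dense.values[i] += label * x  # toy update, not a real gradient
    return dense, loss + abs(label)


def comb_op(a, b):
    """Merge two partial results; either side may still be the sparse zero."""
    (g1, l1), (g2, l2) = a, b
    d1, d2 = to_dense(g1), to_dense(g2)
    for i, x in enumerate(d2.values):
        d1.values[i] += x
    return d1, l1 + l2


n = 4
zero = (SparseVec(n, [], []), 0.0)  # lightweight zero value
partitions = [
    [([1.0, 0.0, 0.0, 0.0], 2.0), ([0.0, 1.0, 0.0, 0.0], 1.0)],
    [],  # an empty partition returns the sparse zero untouched
    [([0.0, 0.0, 0.0, 1.0], 3.0)],
]
partials = [reduce(seq_op, part, zero) for part in partitions]
grad, loss = reduce(comb_op, partials)

# Serialized size of a large zero vector in each representation.
big = 100_000
sparse_bytes = len(pickle.dumps(SparseVec(big, [], [])))
dense_bytes = len(pickle.dumps(DenseVec([0.0] * big)))
```

Because `to_dense` allocates a fresh list whenever it receives a sparse vector, the shared zero value is never mutated, and the combine step has to accept sparse inputs since a partition with no data hands back the zero as-is.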
How was this patch tested?
Unit tests for the mllib module were run locally for correctness.
For performance, we ran a heavy optimization on our production data (50 iterations on 128 MB weight vectors) and saw a significant decrease in both runtime and the number of containers killed for lack of off-heap memory.
Author: Anthony Truchet
Author: sethah