Conversation

@VinceShieh

What changes were proposed in this pull request?

Multinomial logistic regression uses LogisticAggregator class for gradient updates.
This PR refactors MLOR to use level 2 BLAS operations for the updates.

How was this patch tested?

Existing tests suffice.

Signed-off-by: VinceShieh <[email protected]>

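For context, here is a minimal sketch of what phrasing the per-instance MLOR update as level 2 BLAS calls looks like with netlib-java; the buffer names and the column-major numClasses x numFeatures coefficient layout are assumptions for illustration, not the PR's actual code:

```scala
import com.github.fommil.netlib.BLAS.{getInstance => blas}

// Illustrative per-instance update; all names and shapes are assumed.
// coefficients and gradient are numClasses x numFeatures, column-major.
def mlorUpdateSketch(
    numClasses: Int,
    numFeatures: Int,
    coefficients: Array[Double],
    stdFeatures: Array[Double],   // feature values already scaled by 1 / std
    multipliers: Array[Double],   // per-class multipliers derived from margins
    margins: Array[Double],       // output buffer of length numClasses
    gradient: Array[Double]): Unit = {
  // margins = coefficients * stdFeatures  (level 2 GEMV)
  blas.dgemv("N", numClasses, numFeatures, 1.0, coefficients, numClasses,
    stdFeatures, 1, 0.0, margins, 1)
  // ... a softmax step computes multipliers from margins and the label ...
  // gradient += multipliers * stdFeatures^T  (level 2 rank-1 GER update)
  blas.dger(numClasses, numFeatures, 1.0, multipliers, 1, stdFeatures, 1,
    gradient, numClasses)
}
```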
@dbtsai (Member) left a comment:
Do you have any benchmarks? I wonder how much speedup this PR gives. Thank you for working on this.

@SparkQA commented May 8, 2017

Test build #76558 has finished for PR 17894 at commit b4fd733.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


import breeze.linalg.{DenseVector => BDV}
import breeze.optimize.{CachedDiffFunction, DiffFunction, LBFGS => BreezeLBFGS, LBFGSB => BreezeLBFGSB, OWLQN => BreezeOWLQN}
import com.github.fommil.netlib.BLAS.{getInstance => blas}
Member:

Is it better to use the MLlib BLAS interface?


Author:

MLlib BLAS doesn't have `ger` support. We could, of course, add such an API to MLlib BLAS for this.
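For reference, a `ger` helper in the style of the existing ML BLAS wrappers could be sketched roughly as below; the object name and placement are hypothetical, only the underlying netlib dger call is standard:

```scala
import com.github.fommil.netlib.BLAS.{getInstance => nativeBLAS}
import org.apache.spark.ml.linalg.{DenseMatrix, DenseVector}

// Hypothetical helper: A := alpha * x * y^T + A (rank-1 update), i.e. BLAS ger.
object GerSketch {
  def ger(alpha: Double, x: DenseVector, y: DenseVector, a: DenseMatrix): Unit = {
    require(!a.isTransposed, "ger sketch expects a column-major matrix")
    require(a.numRows == x.size && a.numCols == y.size, "dimension mismatch")
    nativeBLAS.dger(a.numRows, a.numCols, alpha, x.values, 1, y.values, 1,
      a.values, a.numRows)
  }
}
```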

Member:

Can you add it in spark.ml? Thanks.

var maxMargin = Double.NegativeInfinity

val margins = new Array[Double](numClasses)
val featureStdArray = new Array[Double](features.size)
@dbtsai (Member) commented May 8, 2017:

This will densify the sparse features. We should handle them differently: for sparse features, level 2 BLAS will not help, so we don't need it there.

Author:

Agreed. Still, we will benchmark on a sparse dataset; if this change hurts performance for sparse data, we will bypass it in that case.
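For illustration only, such a bypass could dispatch on the vector type, roughly as in the sketch below; the method name and surrounding aggregator are assumptions, not the actual Spark code:

```scala
import org.apache.spark.ml.linalg.{DenseVector, SparseVector, Vector}

// Hypothetical dispatch: only dense vectors take the level 2 BLAS path,
// sparse vectors keep the existing per-active-entry update.
def addFeaturesSketch(features: Vector): Unit = features match {
  case dv: DenseVector =>
    // dense path: scale dv.values by 1 / std, then GEMV / GER updates
    ()
  case sv: SparseVector =>
    // sparse path: iterate over sv.indices and sv.values as in the current code
    ()
}
```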

Member:

In my company, we have a use case of handling very sparse input with around 20 non-zero features out of a total feature space in the millions. This implementation will break in that scenario.

Contributor:

I suggest making featureStdArray a member of the aggregator class, so that each update avoids allocating a new temporary array.
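A minimal sketch of that suggestion, with illustrative names rather than the real LogisticAggregator fields:

```scala
// Illustrative aggregator skeleton: the scratch buffer is allocated once per
// aggregator instance instead of once per add() call.
class AggregatorSketch(numFeatures: Int, localFeaturesStd: Array[Double])
  extends Serializable {

  @transient private lazy val featureStdArray = new Array[Double](numFeatures)

  def add(featureValues: Array[Double]): this.type = {
    var i = 0
    while (i < numFeatures) {
      featureStdArray(i) =
        if (localFeaturesStd(i) != 0.0) featureValues(i) / localFeaturesStd(i) else 0.0
      i += 1
    }
    // ... BLAS-based margin and gradient updates reuse featureStdArray ...
    this
  }
}
```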



@hhbyyh (Contributor) commented May 8, 2017:

I'm not sure how much acceleration we can get from level 2 BLAS. For a benchmark, we would also need to evaluate the performance on sparse data.

@VinceShieh (Author):

@hhbyyh performance testing is ongoing, thanks!

@sethah (Contributor) commented May 11, 2017:

Would you mind adding [WIP] to the title? Without even a benchmark for dense features, this is definitely a work-in-progress.

@VinceShieh changed the title from "[SPARK-17134][ML] Use level 2 BLAS operations in LogisticAggregator" to "[WIP][SPARK-17134][ML] Use level 2 BLAS operations in LogisticAggregator" on May 17, 2017
@VinceShieh (Author):

@sethah Sorry for the late response. Setting as WIP. We have performance data for dense features; data for sparse features will be ready soon. Thanks.

@VinceShieh (Author):

Sorry for the late update!
We tested this PR against the current implementation with both dense and sparse (0.95 sparsity) data:

[three benchmark result images]

The single-machine test was run on 100 samples at each feature-set scale; we see a performance gain (less training time) on both the dense and the sparse dataset. In the distributed case we can also achieve good performance with fine tuning (num_cores, data partitions, etc.), but this change inevitably puts more pressure on memory and will cause GC problems if not enough memory is available on the worker nodes. For sparse datasets on a distributed cluster we are still unable to get a good result, so maybe we should bypass this change for the sparse case. Before making that change, though, I'd like to hear your thoughts on the test results we have; maybe we can make this a better PR with your input :)

Thanks.

@VinceShieh (Author):

Forgot to mention: we observed a nearly 2x performance gain with native BLAS (MKL), even without fine tuning. So if we can also make the F2J version run faster in a distributed cluster than the current design, it would truly be a good PR for the community. :)

[benchmark result image]

@sethah (Contributor) commented Jun 1, 2017:

@VinceShieh Thanks for posting your results. You tested these on datasets with only 100 samples, correct? That's probably not representative of a normal workload... Also, how many classes (i.e. numClasses) did you use?

I've actually been looking at using level 3 BLAS operations in the logistic aggregator, and initial results showed close to 10x speedups in some cases. I am holding off submitting any code because it would require a fairly significant refactoring of the code, which will be made much easier after #17094 is merged. Using level 2 BLAS is a less invasive change, but the test results you show provide rather small speedups.

My preference is to wait a bit and submit a change that incorporates level 3 BLAS in logistic regression. We should get @dbtsai's opinion too.
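For context, the level 3 BLAS idea amounts to buffering a block of instances and computing all of their margins with one GEMM; the sketch below shows only that step, with assumed shapes and names rather than the actual proposed refactoring:

```scala
import com.github.fommil.netlib.BLAS.{getInstance => blas}

// Compute margins for a whole block of instances at once.
// coefficients: numClasses x numFeatures, featureBlock: numFeatures x blockSize,
// marginBlock: numClasses x blockSize; all column-major arrays.
def blockMarginsSketch(
    numClasses: Int,
    numFeatures: Int,
    blockSize: Int,
    coefficients: Array[Double],
    featureBlock: Array[Double],
    marginBlock: Array[Double]): Unit = {
  // marginBlock = coefficients * featureBlock  (level 3 GEMM)
  blas.dgemm("N", "N", numClasses, blockSize, numFeatures, 1.0,
    coefficients, numClasses, featureBlock, numFeatures, 0.0,
    marginBlock, numClasses)
}
```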

@VinceShieh (Author):

@sethah Yes, we only took 100 samples and trained for 3 iterations; numClasses is 20 in our test dataset for the single-node testing.
Yeah, I also believe we'd get a better result if it's possible to use level 3 BLAS; please let me know what I can help with! But some constraints will still emerge, such as memory shortage bringing up GC issues.


margins(j) += localCoefficients(index * numClasses + j) * stdValue
j += 1
}
featureStdArray(index) = value / localFeaturesStd(index)
@WeichenXu123 (Contributor) commented Aug 9, 2017:

Why doesn't this handle the case localFeaturesStd(index) == 0.0? I remember other places handle this case, e.g.:

featureStdArray(index) = if (localFeaturesStd(index) != 0.0) value / localFeaturesStd(index) else 0.0

Contributor:

It seems to be a bug; I sent a PR to fix this: #18896

@WeichenXu123 (Contributor):

I am also interested in a level 3 BLAS implementation. Can you post a design doc first?

@HyukjinKwon (Member):

gentle ping @VinceShieh for @WeichenXu123's comment.
