Conversation

@VinceShieh

What changes were proposed in this pull request?

Multinomial logistic regression uses LogisticAggregator class for gradient updates.
This PR refactors MLOR to use level 2 BLAS operations for the updates.

How was this patch tested?

Existing tests suffice.

Signed-off-by: VinceShieh <[email protected]>

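For context, here is a minimal sketch of what phrasing the per-instance MLOR update as level 2 BLAS calls looks like with netlib-java; the buffer names and the column-major numClasses x numFeatures coefficient layout are assumptions for illustration, not the PR's actual code:

```scala
import com.github.fommil.netlib.BLAS.{getInstance => blas}

// Illustrative per-instance update; all names and shapes are assumed.
// coefficients and gradient are numClasses x numFeatures, column-major.
def mlorUpdateSketch(
    numClasses: Int,
    numFeatures: Int,
    coefficients: Array[Double],
    stdFeatures: Array[Double],   // feature values already scaled by 1 / std
    multipliers: Array[Double],   // per-class multipliers derived from margins
    margins: Array[Double],       // output buffer of length numClasses
    gradient: Array[Double]): Unit = {
  // margins = coefficients * stdFeatures  (level 2 GEMV)
  blas.dgemv("N", numClasses, numFeatures, 1.0, coefficients, numClasses,
    stdFeatures, 1, 0.0, margins, 1)
  // ... a softmax step computes multipliers from margins and the label ...
  // gradient += multipliers * stdFeatures^T  (level 2 rank-1 GER update)
  blas.dger(numClasses, numFeatures, 1.0, multipliers, 1, stdFeatures, 1,
    gradient, numClasses)
}
```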
@dbtsai (Member) left a comment:
Do you have any benchmarks? I wonder how much speedup this PR gives. Thank you for working on this.

@SparkQA commented May 8, 2017

Test build #76558 has finished for PR 17894 at commit b4fd733.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


import breeze.linalg.{DenseVector => BDV}
import breeze.optimize.{CachedDiffFunction, DiffFunction, LBFGS => BreezeLBFGS, LBFGSB => BreezeLBFGSB, OWLQN => BreezeOWLQN}
import com.github.fommil.netlib.BLAS.{getInstance => blas}
Member:

Is it better to use the MLlib BLAS interface?


Author:

MLlib BLAS doesn't have `ger` support. We could, of course, add such an API to MLlib BLAS for this.
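For reference, a `ger` helper in the style of the existing ML BLAS wrappers could be sketched roughly as below; the object name and placement are hypothetical, only the underlying netlib dger call is standard:

```scala
import com.github.fommil.netlib.BLAS.{getInstance => nativeBLAS}
import org.apache.spark.ml.linalg.{DenseMatrix, DenseVector}

// Hypothetical helper: A := alpha * x * y^T + A (rank-1 update), i.e. BLAS ger.
object GerSketch {
  def ger(alpha: Double, x: DenseVector, y: DenseVector, a: DenseMatrix): Unit = {
    require(!a.isTransposed, "ger sketch expects a column-major matrix")
    require(a.numRows == x.size && a.numCols == y.size, "dimension mismatch")
    nativeBLAS.dger(a.numRows, a.numCols, alpha, x.values, 1, y.values, 1,
      a.values, a.numRows)
  }
}
```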

Member:

Can you add it in spark.ml? Thanks.

var maxMargin = Double.NegativeInfinity

val margins = new Array[Double](numClasses)
val featureStdArray = new Array[Double](features.size)
@dbtsai (Member) commented May 8, 2017:

This will densify the sparse features. We should handle them differently: for sparse features, level 2 BLAS will not help, so we don't need it there.

Author:

Agreed. Still, we will benchmark on a sparse dataset; if this change hurts performance for sparse data, we will bypass it in that case.
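For illustration only, such a bypass could dispatch on the vector type, roughly as in the sketch below; the method name and surrounding aggregator are assumptions, not the actual Spark code:

```scala
import org.apache.spark.ml.linalg.{DenseVector, SparseVector, Vector}

// Hypothetical dispatch: only dense vectors take the level 2 BLAS path,
// sparse vectors keep the existing per-active-entry update.
def addFeaturesSketch(features: Vector): Unit = features match {
  case dv: DenseVector =>
    // dense path: scale dv.values by 1 / std, then GEMV / GER updates
    ()
  case sv: SparseVector =>
    // sparse path: iterate over sv.indices and sv.values as in the current code
    ()
}
```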

Member:

In my company, we have a use case of handling very sparse input with around 20 non-zero features out of a total feature space in the millions. This implementation will break in that scenario.

Contributor:

I suggest making featureStdArray a member of the aggregator class, so that each update avoids allocating a new temporary array.
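A minimal sketch of that suggestion, with illustrative names rather than the real LogisticAggregator fields:

```scala
// Illustrative aggregator skeleton: the scratch buffer is allocated once per
// aggregator instance instead of once per add() call.
class AggregatorSketch(numFeatures: Int, localFeaturesStd: Array[Double])
  extends Serializable {

  @transient private lazy val featureStdArray = new Array[Double](numFeatures)

  def add(featureValues: Array[Double]): this.type = {
    var i = 0
    while (i < numFeatures) {
      featureStdArray(i) =
        if (localFeaturesStd(i) != 0.0) featureValues(i) / localFeaturesStd(i) else 0.0
      i += 1
    }
    // ... BLAS-based margin and gradient updates reuse featureStdArray ...
    this
  }
}
```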



@hhbyyh (Contributor) commented May 8, 2017:

I'm not sure how much acceleration we can get from level 2 BLAS. For a benchmark, we would also need to evaluate the performance on sparse data.

@VinceShieh (Author):

@hhbyyh performance testing is ongoing, thanks!

@sethah (Contributor) commented May 11, 2017:

Would you mind adding [WIP] to the title? Without even a benchmark for dense features, this is definitely a work-in-progress.

@VinceShieh changed the title from "[SPARK-17134][ML] Use level 2 BLAS operations in LogisticAggregator" to "[WIP][SPARK-17134][ML] Use level 2 BLAS operations in LogisticAggregator" on May 17, 2017
@VinceShieh (Author):

@sethah Sorry for the late response. Setting as WIP. We have performance data for dense features; data for sparse features will be ready soon. Thanks.

@VinceShieh (Author):

Sorry for the late update!
We tested this PR against the current implementation with both dense and sparse (0.95 sparsity) data:

[three benchmark result images]

The single-machine test was run on 100 samples at each feature-set scale; we see a performance gain (less training time) on both the dense and the sparse dataset. In the distributed case we can also achieve good performance with fine tuning (num_cores, data partitions, etc.), but this change inevitably puts more pressure on memory and will cause GC problems if not enough memory is available on the worker nodes. For sparse datasets on a distributed cluster we are still unable to get a good result, so maybe we should bypass this change for the sparse case. Before making that change, though, I'd like to hear your thoughts on the test results we have; maybe we can make this a better PR with your input :)

Thanks.

@VinceShieh (Author):

Forgot to mention: we observed a nearly 2x performance gain with native BLAS (MKL), even without fine tuning. So if we can also make the F2J version run faster in a distributed cluster than the current design, it would truly be a good PR for the community. :)

[benchmark result image]

@sethah (Contributor) commented Jun 1, 2017:

@VinceShieh Thanks for posting your results. You tested these on datasets with only 100 samples, correct? That's probably not representative of a normal workload... Also, how many classes (i.e. numClasses) did you use?

I've actually been looking at using level 3 BLAS operations in the logistic aggregator, and initial results showed close to 10x speedups in some cases. I am holding off submitting any code because it would require a fairly significant refactoring of the code, which will be made much easier after #17094 is merged. Using level 2 BLAS is a less invasive change, but the test results you show provide rather small speedups.

My preference is to wait a bit and submit a change that incorporates level 3 BLAS in logistic regression. We should get @dbtsai's opinion too.
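For context, the level 3 BLAS idea amounts to buffering a block of instances and computing all of their margins with one GEMM; the sketch below shows only that step, with assumed shapes and names rather than the actual proposed refactoring:

```scala
import com.github.fommil.netlib.BLAS.{getInstance => blas}

// Compute margins for a whole block of instances at once.
// coefficients: numClasses x numFeatures, featureBlock: numFeatures x blockSize,
// marginBlock: numClasses x blockSize; all column-major arrays.
def blockMarginsSketch(
    numClasses: Int,
    numFeatures: Int,
    blockSize: Int,
    coefficients: Array[Double],
    featureBlock: Array[Double],
    marginBlock: Array[Double]): Unit = {
  // marginBlock = coefficients * featureBlock  (level 3 GEMM)
  blas.dgemm("N", "N", numClasses, blockSize, numFeatures, 1.0,
    coefficients, numClasses, featureBlock, numFeatures, 0.0,
    marginBlock, numClasses)
}
```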

@VinceShieh (Author):

@sethah Yes, we only took 100 samples and trained for 3 iterations; numClasses is 20 in our test dataset for the single-node testing.
Yeah, I also believe we'd get a better result if it's possible to use level 3 BLAS; please let me know what I can help with! But some constraints will still emerge, such as memory shortage bringing up GC issues.


margins(j) += localCoefficients(index * numClasses + j) * stdValue
j += 1
}
featureStdArray(index) = value / localFeaturesStd(index)
@WeichenXu123 (Contributor) commented Aug 9, 2017:

Why doesn't this handle the case localFeaturesStd(index) == 0.0? I remember other places handle this case, e.g.:

featureStdArray(index) = if (localFeaturesStd(index) != 0.0) value / localFeaturesStd(index) else 0.0

Contributor:

It seems to be a bug; I sent a PR to fix this: #18896

@WeichenXu123 (Contributor):

I am also interested in a level 3 BLAS implementation. Can you post a design doc first?

@HyukjinKwon (Member):

gentle ping @VinceShieh for @WeichenXu123's comment.
