[WIP][SPARK-17134][ML] Use level 2 BLAS operations in LogisticAggregator #17894
dbtsai left a comment:
Do you have any benchmarks? I wonder how much speedup this PR gives. Thank you for working on this.
Test build #76558 has finished for PR 17894 at commit
import breeze.linalg.{DenseVector => BDV}
import breeze.optimize.{CachedDiffFunction, DiffFunction, LBFGS => BreezeLBFGS, LBFGSB => BreezeLBFGSB, OWLQN => BreezeOWLQN}
import com.github.fommil.netlib.BLAS.{getInstance => blas}
Would it be better to use the MLlib BLAS interface?
MLlib BLAS doesn't support `ger`. We could, of course, add a `ger` API to MLlib BLAS for this.
Can you add it to Spark ML? Thanks.
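For reference, `ger` is the BLAS level-2 rank-1 update A := alpha * x * yᵀ + A. Below is a minimal pure-Scala sketch of what such a helper could look like, assuming column-major storage as in netlib. The object name `Level2Sketch` and its signature are hypothetical, not Spark's actual BLAS API; a production version would more likely delegate to netlib-java's `dger`.

```scala
// Hypothetical pure-Scala rank-1 update (BLAS level-2 "dger"), sketched as a
// stand-in for a `ger` method that could be added to a Spark ML BLAS helper.
object Level2Sketch {
  /**
   * A := alpha * x * y^T + A, where A is an m x n matrix stored column-major
   * in `a` (entry (i, j) lives at a(i + j * m)).
   */
  def ger(alpha: Double, x: Array[Double], y: Array[Double], a: Array[Double]): Unit = {
    val m = x.length
    val n = y.length
    require(a.length == m * n, s"expected ${m * n} entries, got ${a.length}")
    var j = 0
    while (j < n) {
      val ay = alpha * y(j)
      // Like reference BLAS, skip columns whose y entry is zero.
      if (ay != 0.0) {
        val offset = j * m
        var i = 0
        while (i < m) {
          a(offset + i) += ay * x(i)
          i += 1
        }
      }
      j += 1
    }
  }
}
```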
var maxMargin = Double.NegativeInfinity

val margins = new Array[Double](numClasses)
val featureStdArray = new Array[Double](features.size)
This will densify sparse features, so we should handle them differently. For sparse input we don't need level-2 BLAS, since it won't help.
Agreed. We will still benchmark on a sparse dataset; if this change hurts performance for sparse data, we will bypass it for sparse input.
At my company, we have a use case with very sparse input: around 20 non-zero features in a feature space of millions. This implementation will break in that scenario.
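One way to address the sparse concern, sketched under assumptions: branch on the vector type and, for sparse input, loop only over the active entries, so the cost is proportional to nnz × numClasses rather than numFeatures × numClasses. The `Features`, `DenseFeatures`, `SparseFeatures`, and `GradientUpdate` names are hypothetical stand-ins for Spark's vector types; the gradient layout `index * numClasses + k` follows the diff in this PR.

```scala
// Hypothetical sketch: keep the sparse path O(nnz * numClasses) instead of
// densifying every sample before a level-2 BLAS call.
sealed trait Features
final case class DenseFeatures(values: Array[Double]) extends Features
final case class SparseFeatures(size: Int, indices: Array[Int], values: Array[Double]) extends Features

object GradientUpdate {
  /**
   * gradient(f * numClasses + k) += multipliers(k) * x(f) / featuresStd(f).
   * Features with zero standard deviation (or zero value) are skipped.
   */
  def addToGradient(
      features: Features,
      featuresStd: Array[Double],
      multipliers: Array[Double],
      gradient: Array[Double]): Unit = {
    val numClasses = multipliers.length

    // Shared inner loop over classes for a single feature.
    def update(index: Int, value: Double): Unit = {
      val std = featuresStd(index)
      if (std != 0.0 && value != 0.0) {
        val scaled = value / std
        val offset = index * numClasses
        var k = 0
        while (k < numClasses) {
          gradient(offset + k) += multipliers(k) * scaled
          k += 1
        }
      }
    }

    features match {
      case SparseFeatures(_, indices, values) =>
        // Touch only the active entries.
        var i = 0
        while (i < indices.length) { update(indices(i), values(i)); i += 1 }
      case DenseFeatures(values) =>
        // This loop is a rank-1 update (gradient += multipliers * scaledX^T),
        // which is where a BLAS `ger` call could be substituted.
        var f = 0
        while (f < values.length) { update(f, values(f)); f += 1 }
    }
  }
}
```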
I suggest making featureStdArray a member of the aggregator class, so that each update doesn't allocate a new temporary array.
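A minimal sketch of that suggestion, using a toy aggregator: `ScaledFeaturesAggregator` is hypothetical and only accumulates scaled features. The scratch array is allocated once per instance and overwritten on each `add`.

```scala
// Hypothetical sketch: allocate the scratch array once per aggregator
// instead of once per add() call.
class ScaledFeaturesAggregator(numFeatures: Int, featuresStd: Array[Double]) extends Serializable {
  // Scratch buffer reused across add() calls; not serialized, re-created lazily.
  @transient private lazy val featureStdArray = new Array[Double](numFeatures)
  // Toy accumulated state standing in for the real gradient/loss sums.
  private val accumulated = new Array[Double](numFeatures)

  def add(features: Array[Double]): this.type = {
    require(features.length == numFeatures)
    var i = 0
    while (i < numFeatures) {
      val std = featuresStd(i)
      featureStdArray(i) = if (std != 0.0) features(i) / std else 0.0
      accumulated(i) += featureStdArray(i)
      i += 1
    }
    this
  }

  def total: Array[Double] = accumulated.clone()
}
```

Marking the buffer `@transient lazy val` keeps it out of the serialized aggregator and re-creates it after deserialization. The trade-off is that one instance should not be shared between threads, which is usually acceptable for per-partition aggregation.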
I'm not sure how much acceleration we can get from level-2 BLAS. For benchmarking, we would also need to evaluate performance on sparse data.
@hhbyyh Performance testing is ongoing, thanks!
Would you mind adding …
@sethah Sorry for the late response. Setting this as WIP. We have performance data for dense features; data for sparse features will be ready soon. Thanks.
@VinceShieh Thanks for posting your results. You tested these on datasets with only 100 samples, correct? That's probably not representative of a typical workload. Also, how many classes (i.e. …)?

I've actually been looking at using level-3 BLAS operations in the logistic aggregator, and initial results showed close to 10x speedups in some cases. I am holding off on submitting any code because it would require a fairly significant refactoring, which will be much easier after #17094 is merged. Using level-2 BLAS is a less invasive change, but the test results you posted show rather small speedups. My preference is to wait a bit and submit a change that incorporates level-3 BLAS in logistic regression. We should get @dbtsai's opinion too.
@sethah Yes, we used only 100 samples and trained for 3 iterations. Our test dataset has 20 classes (numClasses = 20), and we tested on a single node.
    margins(j) += localCoefficients(index * numClasses + j) * stdValue
    j += 1
  }
  featureStdArray(index) = value / localFeaturesStd(index)
Why doesn't this handle the case localFeaturesStd(index) == 0.0 here?
I remember other places handle that case, for example:
featureStdArray(index) = if (localFeaturesStd(index) != 0.0) value / localFeaturesStd(index) else 0.0
It seems to be a bug; I've sent a PR to fix it: #18896.
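A small sketch of the zero-std guard combined with the margin loop from the diff above. It assumes coefficients are stored at `index * numClasses + j`, i.e. a numClasses × numFeatures column-major matrix, which makes the margin computation a gemv. `MarginsSketch` and its methods are hypothetical names.

```scala
// Hypothetical sketch: standardize a dense sample with a zero-std guard, then
// compute per-class margins as a level-2 (gemv-style) product.
object MarginsSketch {
  /** x(f) / std(f), or 0.0 when std(f) == 0.0 (constant feature). */
  def standardize(x: Array[Double], featuresStd: Array[Double]): Array[Double] =
    Array.tabulate(x.length) { f =>
      if (featuresStd(f) != 0.0) x(f) / featuresStd(f) else 0.0
    }

  /**
   * margins(k) = sum_f coefficients(f * numClasses + k) * xStd(f),
   * i.e. margins = C * xStd with C a numClasses x numFeatures column-major matrix.
   */
  def margins(coefficients: Array[Double], xStd: Array[Double], numClasses: Int): Array[Double] = {
    val out = new Array[Double](numClasses)
    var f = 0
    while (f < xStd.length) {
      val v = xStd(f)
      if (v != 0.0) {
        val offset = f * numClasses
        var k = 0
        while (k < numClasses) { out(k) += coefficients(offset + k) * v; k += 1 }
      }
      f += 1
    }
    out
  }
}
```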
I am also interested in an implementation using level-3 BLAS. Can you post a design doc first?
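To illustrate the level-3 idea (this is not the design referred to in this thread, which was not posted here): stacking a block of standardized samples as the columns of a numFeatures × batchSize matrix X lets all margins be computed with one matrix-matrix product, Margins = C · X, instead of one gemv per sample. Below is a naive pure-Scala column-major `gemm` sketch; `Level3Sketch` is a hypothetical name, and a real implementation would call an optimized BLAS `dgemm`.

```scala
// Hypothetical sketch of the level-3 idea: one matrix-matrix product for a
// whole block of samples. All matrices are column-major arrays.
object Level3Sketch {
  /** C := A * B + C, where A is m x k, B is k x n, and C is m x n. */
  def gemm(m: Int, n: Int, k: Int, a: Array[Double], b: Array[Double], c: Array[Double]): Unit = {
    require(a.length == m * k && b.length == k * n && c.length == m * n)
    var j = 0
    while (j < n) {            // column j of C (sample j in the block)
      var p = 0
      while (p < k) {          // feature p
        val bpj = b(p + j * k)
        if (bpj != 0.0) {
          var i = 0
          while (i < m) {      // class i
            c(i + j * m) += a(i + p * m) * bpj
            i += 1
          }
        }
        p += 1
      }
      j += 1
    }
  }
}
```

With A = C (numClasses × numFeatures) and B = X (numFeatures × batchSize), the result holds one column of margins per sample. The gradient update would analogously become G += M · Xᵀ, a gemm with a transposed operand.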
Gentle ping @VinceShieh regarding @WeichenXu123's comment.




What changes were proposed in this pull request?
Multinomial logistic regression (MLOR) uses the LogisticAggregator class for gradient updates.
This PR refactors MLOR to use level-2 BLAS operations for these updates.
How was this patch tested?
Existing tests are sufficient.
Signed-off-by: VinceShieh [email protected]