Skip to content
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ import scala.collection.mutable

import breeze.linalg.{DenseVector => BDV}
import breeze.optimize.{CachedDiffFunction, DiffFunction, LBFGS => BreezeLBFGS, LBFGSB => BreezeLBFGSB, OWLQN => BreezeOWLQN}
import com.github.fommil.netlib.BLAS.{getInstance => blas}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better to use MLlib BLAS interface?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MLLib BLAS doesnt have ger support, we might, of course, add an API support in MLLib Blas for this issue

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add it in spark ml? Thanks.

import org.apache.hadoop.fs.Path

import org.apache.spark.SparkException
Expand Down Expand Up @@ -1722,25 +1723,22 @@ private class LogisticAggregator(
var maxMargin = Double.NegativeInfinity

val margins = new Array[Double](numClasses)
val featureStdArray = new Array[Double](features.size)
Copy link
Member

@dbtsai dbtsai May 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will densify the sparse features. We should handle them differently. For sparse, we don't need to do level 2 BLAS which will not help.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Still, we will try benchmark on the sparse dataset, if such change hurt the performance for sparse data, we will bypass this change for it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my company, we have use-case of handing very sparse input with around 20 non-zero features with millions of total feature space. This implementation will break in this scenario.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest change the featureStdArray as Aggregator class member, so that avoid each update allocate a new temporary array.

features.foreachActive { (index, value) =>
val stdValue = value / localFeaturesStd(index)
var j = 0
while (j < numClasses) {
margins(j) += localCoefficients(index * numClasses + j) * stdValue
j += 1
}
featureStdArray(index) = value / localFeaturesStd(index)
Copy link
Contributor

@WeichenXu123 WeichenXu123 Aug 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here why don't deal with the case localFeaturesStd(index) == 0.0 ?
I remember other place it deal with such case, such as:

featureStdArray(index) = if (localFeaturesStd(index) != 0.0) value / localFeaturesStd(index) else 0.0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems to be a bug, I send a PR to fix this #18896

}
var i = 0
while (i < numClasses) {
if (fitIntercept) {

blas.dgemv("N", numCoefficientSets, numFeatures, 1.0, coefficientsArray,
numCoefficientSets, featureStdArray, 1, 1.0, margins, 1)
if (fitIntercept) {
var i = 0
while (i < numClasses) {
margins(i) += localCoefficients(numClasses * numFeatures + i)
i += 1
}
if (i == label.toInt) marginOfLabel = margins(i)
if (margins(i) > maxMargin) {
maxMargin = margins(i)
}
i += 1
}
marginOfLabel = margins(label.toInt)
maxMargin = margins.max

/**
* When maxMargin is greater than 0, the original formula could cause overflow.
Expand All @@ -1764,17 +1762,10 @@ private class LogisticAggregator(
margins.indices.foreach { i =>
multipliers(i) = multipliers(i) / sum - (if (label == i) 1.0 else 0.0)
}
features.foreachActive { (index, value) =>
if (localFeaturesStd(index) != 0.0 && value != 0.0) {
val stdValue = value / localFeaturesStd(index)
var j = 0
while (j < numClasses) {
localGradientArray(index * numClasses + j) +=
weight * multipliers(j) * stdValue
j += 1
}
}
}

blas.dger(numCoefficientSets, numFeatures, weight, multipliers,
1, featureStdArray, 1, localGradientArray, numCoefficientSets)

if (fitIntercept) {
var i = 0
while (i < numClasses) {
Expand Down