Commit e9e418e

Author: DB Tsai
Commit message: Update
1 parent 7381521 commit e9e418e

File tree

1 file changed, +31 -26 lines changed


docs/mllib-optimization.md

Lines changed: 31 additions & 26 deletions
@@ -128,24 +128,19 @@ is sampled, i.e. `$|S|=$ miniBatchFraction $\cdot n = 1$`, then the algorithm is
 standard SGD. In that case, the step direction depends on the uniformly random sampling of the
 point.
 
-### Limited-memory BFGS
-[Limited-memory BFGS (L-BFGS)](http://en.wikipedia.org/wiki/Limited-memory_BFGS) is an optimization
+### L-BFGS
+[L-BFGS](http://en.wikipedia.org/wiki/Limited-memory_BFGS) is an optimization
 algorithm in the family of quasi-Newton methods to solve the optimization problems of the form
-`$\min_{\wv \in\R^d} \; f(\wv)$`. The L-BFGS approximates the objective function locally as a quadratic
-without evaluating the second partial derivatives of the objective function to construct the
+`$\min_{\wv \in\R^d} \; f(\wv)$`. The L-BFGS method approximates the objective function locally as a
+quadratic without evaluating the second partial derivatives of the objective function to construct the
 Hessian matrix. The Hessian matrix is approximated by previous gradient evaluations, so there is no
 vertical scalability issue (the number of training features) when computing the Hessian matrix
-explicitly in Newton method. As a result, L-BFGS often achieves rapider convergence compared with
+explicitly in Newton's method. As a result, L-BFGS often achieves more rapid convergence than
 other first-order optimization methods.
 
-Since the Hessian is constructed approximately from previous gradient evaluations, the objective
-function can not be changed during the optimization process. As a result, Stochastic L-BFGS will
-not work naively by just using miniBatch; therefore, we don't provide this until we have better
-understanding.
-
 ## Implementation in MLlib
 
-### Gradient descent and Stochastic gradient descent
+### Gradient descent and stochastic gradient descent
 Gradient descent methods, including stochastic subgradient descent (SGD), are
 included as a low-level primitive in `MLlib`, upon which various ML algorithms
 are developed; see the
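
Background note (not part of this patch): the "previous gradient evaluations" mentioned in the hunk above are the recent curvature pairs that L-BFGS stores in place of an explicit Hessian. In the document's notation, the stored pairs are

`\[
s_k := \wv_{k+1} - \wv_k, \qquad y_k := \nabla f(\wv_{k+1}) - \nabla f(\wv_k),
\]`

and the inverse-Hessian approximation is rebuilt from only the most recent `$(s_k, y_k)$` pairs at each iteration, which is why no second partial derivatives are ever evaluated.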
@@ -182,11 +177,11 @@ Available algorithms for gradient descent:
 L-BFGS is currently only a low-level optimization primitive in `MLlib`. If you want to use L-BFGS in various
 ML algorithms such as Linear Regression and Logistic Regression, you have to pass the gradient of the objective
 function and an updater into the optimizer yourself instead of using the training APIs like
-[LogisticRegression.LogisticRegressionWithSGD](api/mllib/index.html#org.apache.spark.mllib.classification.LogisticRegression).
+[LogisticRegression.LogisticRegressionWithSGD](api/mllib/index.html#org.apache.spark.mllib.classification.LogisticRegressionWithSGD).
 See the example below. This will be addressed in the next release.
 
 L1 regularization using
-[Updater.L1Updater](api/mllib/index.html#org.apache.spark.mllib.optimization.Updater) will not work since the
+[L1Updater](api/mllib/index.html#org.apache.spark.mllib.optimization.L1Updater) will not work since the
 soft-thresholding logic in L1Updater is designed for gradient descent.
 
 The L-BFGS method
@@ -198,17 +193,17 @@ being optimized, i.e., with respect to a single training example, at the
 current parameter value. MLlib includes gradient classes for common loss
 functions, e.g., hinge, logistic, least-squares. The gradient class takes as
 input a training example, its label, and the current parameter value.
-* `updater` is a class originally designed for gradient decent which computes
-the actual gradient descent step. However, we're able to take the gradient and
-loss of objective function of regularization for L-BFGS by ignoring the part of logic
-only for gradient decent such as adaptive step size stuff. We will refactorize
-this into regularizer to replace updater to separate the logic between
-regularization and step update later.
+* `updater` is a class that computes the gradient and loss of the regularization
+part of the objective function for L-BFGS. MLlib includes updaters for the case without
+regularization, as well as an L2 regularizer. Note that the L1 regularizer does not work
+for L-BFGS; see the developer's note below.
 * `numCorrections` is the number of corrections used in the L-BFGS update. 10 is
 recommended.
 * `maxNumIterations` is the maximal number of iterations that L-BFGS can be run.
 * `regParam` is the regularization parameter when using regularization.
-* `return` A tuple containing two elements. The first element is a column matrix
+
+
+The return value is a tuple containing two elements. The first element is a column matrix
 containing weights for every feature, and the second element is an array containing
 the loss computed for every iteration.

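The parameter list in the hunk above maps onto the low-level `LBFGS.runLBFGS` entry point roughly as sketched below. This is not part of the patch: the `convergenceTol` argument and the exact argument order are assumptions to check against the API docs, and `training` / `numFeatures` are assumed to come from the example data-loading code.

{% highlight scala %}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.optimization.{LBFGS, LogisticGradient, SquaredL2Updater}

// `training` is assumed to be an RDD[(Double, Vector)] of (label, features) pairs,
// and `numFeatures` the feature dimension (plus 1 below for the appended intercept).
val numCorrections = 10
val convergenceTol = 1e-4
val maxNumIterations = 20
val regParam = 0.1
val initialWeightsWithIntercept = Vectors.dense(new Array[Double](numFeatures + 1))

// Returns the optimized weights and the loss recorded at every iteration.
val (weightsWithIntercept, loss) = LBFGS.runLBFGS(
  training,
  new LogisticGradient(),
  new SquaredL2Updater(),
  numCorrections,
  convergenceTol,
  maxNumIterations,
  regParam,
  initialWeightsWithIntercept)
{% endhighlight %}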
@@ -220,7 +215,6 @@ import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
 import org.apache.spark.mllib.linalg.Vectors
 import org.apache.spark.mllib.util.MLUtils
 import org.apache.spark.mllib.classification.LogisticRegressionModel
-import breeze.linalg.{DenseVector => BDV}
 
 val data = MLUtils.loadLibSVMFile(sc, "mllib/data/sample_libsvm_data.txt")
 val numFeatures = data.take(1)(0).features.size
@@ -229,10 +223,7 @@ val numFeatures = data.take(1)(0).features.size
 val splits = data.randomSplit(Array(0.6, 0.4), seed = 11L)
 
 // Prepend 1 into the training data as intercept.
-val training = splits(0).map(x =>
-  (x.label, Vectors.fromBreeze(
-    BDV.vertcat(BDV.ones[Double](1), x.features.toBreeze.toDenseVector)))
-  ).cache()
+val training = splits(0).map(x => (x.label, MLUtils.appendBias(x.features))).cache()
 
 val test = splits(1)
 
@@ -273,4 +264,18 @@ val auROC = metrics.areaUnderROC()
 println("Loss of each step in training process")
 loss.foreach(println)
 println("Area under ROC = " + auROC)
-{% endhighlight %}
+{% endhighlight %}
+
+#### Developer's note
+
+Since the Hessian is constructed approximately from previous gradient evaluations,
+the objective function cannot be changed during the optimization process.
+As a result, stochastic L-BFGS will not work naively by just using miniBatch;
+therefore, we don't provide this until we have a better understanding.
+
+* `updater` is a class originally designed for gradient descent which computes
+the actual gradient descent step. However, we are able to obtain the gradient and
+loss of the regularization part of the objective function for L-BFGS by ignoring the
+logic that applies only to gradient descent, such as the adaptive step size. We will
+refactor this into a regularizer that replaces the updater, to separate the
+regularization logic from the step update.
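
One possible reading of the developer's note, sketched below as an illustration rather than code from this patch: because the step-size logic in an `Updater` only acts on the data gradient, passing a zero gradient isolates the regularization part. The arithmetic in the comments is an assumption that holds for the no-regularization and L2 updaters and should be verified against the MLlib sources.

{% highlight scala %}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.optimization.{SquaredL2Updater, Updater}

// Illustration: recover only the regularization gradient and loss from an Updater
// by feeding it a zero data gradient, so the gradient-descent step logic has
// nothing to act on.
def regGradientAndLoss(updater: Updater, weights: Vector, regParam: Double): (Vector, Double) = {
  val zeroGradient = Vectors.dense(new Array[Double](weights.size))

  // stepSize = 0: the weights come back unchanged and the second tuple element
  // is the regularization value at the current weights.
  val regLoss = updater.compute(weights, zeroGradient, 0, 1, regParam)._2

  // stepSize = 1 with a zero gradient: the returned weights are
  // weights - (gradient of the regularization term), so subtracting them
  // from the original weights recovers that gradient.
  val shifted = updater.compute(weights, zeroGradient, 1, 1, regParam)._1
  val regGradient = Vectors.dense(
    weights.toArray.zip(shifted.toArray).map { case (w, s) => w - s })

  (regGradient, regLoss)
}

// With the L2 updater, the recovered gradient is regParam * weights and the
// loss is 0.5 * regParam * ||weights||^2.
val (regGrad, regLoss) = regGradientAndLoss(new SquaredL2Updater(), Vectors.dense(1.0, -2.0), 0.1)
{% endhighlight %}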
