@@ -128,24 +128,19 @@ is sampled, i.e. `$|S|=$ miniBatchFraction $\cdot n = 1$`, then the algorithm is
 standard SGD. In that case, the step direction depends on the uniformly random sampling of the
 point.

-### Limited-memory BFGS
-[Limited-memory BFGS (L-BFGS)](http://en.wikipedia.org/wiki/Limited-memory_BFGS) is an optimization
+### L-BFGS
+[L-BFGS](http://en.wikipedia.org/wiki/Limited-memory_BFGS) is an optimization
 algorithm in the family of quasi-Newton methods for solving optimization problems of the form
-`$\min_{\wv \in\R^d} \; f(\wv)$`. The L-BFGS approximates the objective function locally as a quadratic
-without evaluating the second partial derivatives of the objective function to construct the
+`$\min_{\wv \in\R^d} \; f(\wv)$`. The L-BFGS method approximates the objective function locally as a
+quadratic without evaluating the second partial derivatives of the objective function to construct the
 Hessian matrix. The Hessian matrix is approximated by previous gradient evaluations, so there is no
 vertical scalability issue (in the number of training features) when computing the Hessian matrix
-explicitly in Newton method. As a result, L-BFGS often achieves rapider convergence compared with
+explicitly in Newton's method. As a result, L-BFGS often achieves more rapid convergence than
 other first-order optimization methods.

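To picture the quasi-Newton idea, the update can be sketched as follows. This is a generic schematic of the method, not notation taken from the MLlib implementation: at iteration `$k$` the weights move along a direction built from an approximation `$B_k$` of the inverse Hessian,

`\[
\wv_{k+1} := \wv_k - \alpha_k B_k \nabla f(\wv_k),
\]`

where `$\alpha_k$` is chosen by a line search and `$B_k$` is maintained implicitly from only a small window of recent parameter and gradient differences, so no `$d \times d$` matrix is ever formed.
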
-Since the Hessian is constructed approximately from previous gradient evaluations, the objective
-function can not be changed during the optimization process. As a result, Stochastic L-BFGS will
-not work naively by just using miniBatch; therefore, we don't provide this until we have better
-understanding.
-
 ## Implementation in MLlib

-### Gradient descent and Stochastic gradient descent
+### Gradient descent and stochastic gradient descent
 Gradient descent methods, including stochastic subgradient descent (SGD), are
 included as a low-level primitive in `MLlib`, upon which various ML algorithms
 are developed; see the
@@ -182,11 +177,11 @@ Available algorithms for gradient descent:
 L-BFGS is currently only a low-level optimization primitive in `MLlib`. If you want to use L-BFGS in various
 ML algorithms such as Linear Regression and Logistic Regression, you have to pass the gradient of the objective
 function and an updater to the optimizer yourself instead of using the training APIs like
-[LogisticRegression.LogisticRegressionWithSGD](api/mllib/index.html#org.apache.spark.mllib.classification.LogisticRegression).
+[LogisticRegressionWithSGD](api/mllib/index.html#org.apache.spark.mllib.classification.LogisticRegressionWithSGD).
 See the example below. It will be addressed in the next release.

 L1 regularization via
-[Updater.L1Updater](api/mllib/index.html#org.apache.spark.mllib.optimization.Updater) will not work since the
+[L1Updater](api/mllib/index.html#org.apache.spark.mllib.optimization.L1Updater) will not work since the
 soft-thresholding logic in L1Updater is designed for gradient descent.
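
In practice this means that, of the updaters currently in `MLlib`, only the unregularized and L2-regularized ones are suitable for L-BFGS. The following is a minimal sketch of that choice, assuming the `SimpleUpdater`, `SquaredL2Updater`, and `L1Updater` classes in `org.apache.spark.mllib.optimization`; it is an illustration, not an officially supported recipe.

{% highlight scala %}
import org.apache.spark.mllib.optimization.{SimpleUpdater, SquaredL2Updater}

// Updaters that can be combined with L-BFGS today:
val noRegUpdater = new SimpleUpdater()    // no regularization
val l2Updater = new SquaredL2Updater()    // L2 regularization

// Not suitable for L-BFGS: L1Updater, whose soft-thresholding step assumes
// gradient-descent style updates.
// val l1Updater = new L1Updater()
{% endhighlight %}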

 The L-BFGS method
@@ -198,17 +193,17 @@ being optimized, i.e., with respect to a single training example, at the
 current parameter value. MLlib includes gradient classes for common loss
 functions, e.g., hinge, logistic, least-squares. The gradient class takes as
 input a training example, its label, and the current parameter value.
-* `updater` is a class originally designed for gradient decent which computes
-the actual gradient descent step. However, we're able to take the gradient and
-loss of objective function of regularization for L-BFGS by ignoring the part of logic
-only for gradient decent such as adaptive step size stuff. We will refactorize
-this into regularizer to replace updater to separate the logic between
-regularization and step update later.
+* `updater` is a class that computes the gradient and loss of the regularization
+part of the objective function for L-BFGS. MLlib includes updaters for the case without
+regularization, as well as for the L2 regularizer. Note that the L1 regularizer does not
+work with L-BFGS; see the developer's note below and the sketch after this list.
 * `numCorrections` is the number of corrections used in the L-BFGS update. 10 is
 recommended.
 * `maxNumIterations` is the maximal number of iterations that L-BFGS can be run.
 * `regParam` is the regularization parameter when using regularization.
-* `return` A tuple containing two elements. The first element is a column matrix
+
+
+The return value is a tuple containing two elements. The first element is a column matrix
 containing weights for every feature, and the second element is an array containing
 the loss computed for every iteration.

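To show how the parameters above fit together, here is a compact sketch of a call to `LBFGS.runLBFGS`; the complete, runnable program is the example below. The argument order and the extra convergence-tolerance argument are assumptions based on the current `LBFGS` API, so check the API docs for the authoritative signature.

{% highlight scala %}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.optimization.{LBFGS, LogisticGradient, SquaredL2Updater}
import org.apache.spark.rdd.RDD

// Sketch only: run L-BFGS on (label, features) pairs and get back the trained
// weights together with the loss recorded at every iteration.
def trainWithLBFGS(data: RDD[(Double, Vector)], numFeatures: Int): (Vector, Array[Double]) =
  LBFGS.runLBFGS(
    data,
    new LogisticGradient(),                           // gradient of the loss (logistic here)
    new SquaredL2Updater(),                           // regularization part (L2 here)
    10,                                               // numCorrections
    1e-4,                                             // convergence tolerance (assumed argument)
    20,                                               // maxNumIterations
    0.1,                                              // regParam
    Vectors.dense(new Array[Double](numFeatures)))    // initial weights
{% endhighlight %}
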
@@ -220,7 +215,6 @@ import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
 import org.apache.spark.mllib.linalg.Vectors
 import org.apache.spark.mllib.util.MLUtils
 import org.apache.spark.mllib.classification.LogisticRegressionModel
-import breeze.linalg.{DenseVector => BDV}

 val data = MLUtils.loadLibSVMFile(sc, "mllib/data/sample_libsvm_data.txt")
 val numFeatures = data.take(1)(0).features.size
@@ -229,10 +223,7 @@ val numFeatures = data.take(1)(0).features.size

 val splits = data.randomSplit(Array(0.6, 0.4), seed = 11L)
 // Prepend 1 into the training data as intercept.
-val training = splits(0).map(x =>
-  (x.label, Vectors.fromBreeze(
-    BDV.vertcat(BDV.ones[Double](1), x.features.toBreeze.toDenseVector)))
-).cache()
+val training = splits(0).map(x => (x.label, MLUtils.appendBias(x.features))).cache()

 val test = splits(1)

@@ -273,4 +264,18 @@ val auROC = metrics.areaUnderROC()
 println("Loss of each step in training process")
 loss.foreach(println)
 println("Area under ROC = " + auROC)
-{% endhighlight %}
+{% endhighlight %}
+
+#### Developer's note
+
+Since the Hessian is constructed approximately from previous gradient evaluations,
+the objective function cannot be changed during the optimization process.
+As a result, Stochastic L-BFGS will not work naively by just using miniBatch;
+therefore, we do not provide this until we have a better understanding.
+
+* `updater` is a class originally designed for gradient descent which computes
+the actual gradient-descent step. However, we are able to obtain the gradient and
+loss of the regularization part of the objective function for L-BFGS by ignoring the
+logic that applies only to gradient descent, such as the adaptive step size. We will
+refactor this into a regularizer that replaces the updater, so that the regularization
+logic is separated from the step update.
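
To make that trick concrete, here is an illustrative sketch based only on the public `Updater.compute` contract, not on the actual optimizer internals: passing a zero data gradient isolates the regularization contribution.

{% highlight scala %}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.optimization.SquaredL2Updater

val updater = new SquaredL2Updater()
val weights = Vectors.dense(0.5, -0.25)
val zeroGradient = Vectors.dense(0.0, 0.0)
val regParam = 0.1

// With a zero gradient and step size 0 the weights stay unchanged, so the second
// element of the returned tuple is just the regularization value,
// 0.5 * regParam * ||weights||^2.
val regLoss = updater.compute(weights, zeroGradient, 0.0, 1, regParam)._2

// With a zero gradient and step size 1 the difference between the old and the
// returned weights equals the gradient of the L2 term, regParam * weights.
val shiftedWeights = updater.compute(weights, zeroGradient, 1.0, 1, regParam)._1
{% endhighlight %}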