is sampled, i.e. `$|S|=$ miniBatchFraction $\cdot n = 1$`, then the algorithm is
standard SGD. In that case, the step direction depends on the uniformly random sampling of the
point.

### Limited-memory BFGS
[Limited-memory BFGS (L-BFGS)](http://en.wikipedia.org/wiki/Limited-memory_BFGS) is an optimization
algorithm in the family of quasi-Newton methods for solving optimization problems of the form
`$\min_{\wv \in\R^d} \; f(\wv)$`. L-BFGS approximates the objective function locally as a quadratic
without evaluating the second partial derivatives of the objective function to construct the
Hessian matrix. Instead, the Hessian matrix is approximated from previous gradient evaluations, so
it avoids the vertical scalability issue (in the number of training features) that arises when
computing the Hessian matrix explicitly, as in Newton's method. As a result, L-BFGS often achieves
faster convergence than other first-order optimization methods.
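
As a rough sketch of the idea (the exact formulation used by the underlying implementation may
differ), each L-BFGS iteration updates the weights along an approximate Newton direction
`\[
\wv_{k+1} := \wv_k - \gamma_k\, \mathrm{H}_k\, \nabla f(\wv_k),
\]`
where `$\mathrm{H}_k$` is an approximation of the inverse Hessian built only from the most recent
iterate differences `$\wv_{i+1} - \wv_i$` and gradient differences `$\nabla f(\wv_{i+1}) - \nabla f(\wv_i)$`,
and `$\gamma_k$` is a step size chosen by a line search.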

Since the Hessian is constructed approximately from previous gradient evaluations, the objective
function cannot be changed during the optimization process. As a result, stochastic L-BFGS will
not work naively by just using mini-batches; therefore, we do not provide this until we have a
better understanding of it.

## Implementation in MLlib

### Gradient descent and stochastic gradient descent
Gradient descent methods, including stochastic subgradient descent (SGD), are
included as a low-level primitive in `MLlib`, upon which various ML algorithms
are developed, see the
each iteration, to compute the gradient direction.

Available algorithms for gradient descent:

* [GradientDescent.runMiniBatchSGD](api/mllib/index.html#org.apache.spark.mllib.optimization.GradientDescent)
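
As a minimal sketch of calling the `GradientDescent.runMiniBatchSGD` primitive listed above
directly (the step size, iteration count, and mini-batch fraction below are illustrative values,
and the parameter order assumes the
`runMiniBatchSGD(data, gradient, updater, stepSize, numIterations, regParam, miniBatchFraction, initialWeights)`
signature; consult the linked API docs):

{% highlight scala %}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.optimization.{GradientDescent, LogisticGradient, SquaredL2Updater}
import org.apache.spark.mllib.util.MLUtils

// runMiniBatchSGD operates on (label, features) pairs.
val data = MLUtils.loadLibSVMFile(sc, "mllib/data/sample_libsvm_data.txt")
  .map(lp => (lp.label, lp.features))
  .cache()

val numFeatures = data.take(1)(0)._2.size

// Minimize the L2-regularized logistic loss with mini-batch SGD.
val (weights, lossHistory) = GradientDescent.runMiniBatchSGD(
  data,
  new LogisticGradient(),
  new SquaredL2Updater(),
  1.0,   // stepSize
  100,   // numIterations
  0.1,   // regParam
  1.0,   // miniBatchFraction (1.0 uses the full data set each iteration)
  Vectors.dense(new Array[Double](numFeatures)))  // initial weights, all zeros

println("Final loss: " + lossHistory.last)
{% endhighlight %}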

### Limited-memory BFGS
L-BFGS is currently only a low-level optimization primitive in `MLlib`. If you want to use L-BFGS in various
ML algorithms such as linear regression and logistic regression, you have to pass the gradient of the objective
function and the updater into the optimizer yourself, instead of using the training APIs like
[LogisticRegression.LogisticRegressionWithSGD](api/mllib/index.html#org.apache.spark.mllib.classification.LogisticRegression).
See the example below. It will be addressed in the next release.

L1 regularization using
[Updater.L1Updater](api/mllib/index.html#org.apache.spark.mllib.optimization.Updater) will not work, since the
soft-thresholding logic in L1Updater is designed for gradient descent.

The L-BFGS method
[LBFGS.runLBFGS](api/scala/index.html#org.apache.spark.mllib.optimization.LBFGS)
has the following parameters:

* `gradient` is a class that computes the gradient of the objective function
being optimized, i.e., with respect to a single training example, at the
current parameter value. MLlib includes gradient classes for common loss
functions, e.g., hinge, logistic, least-squares. The gradient class takes as
input a training example, its label, and the current parameter value.
* `updater` is a class originally designed for gradient descent which computes
the actual gradient descent step. However, we are able to obtain the gradient and
loss of the regularization part of the objective function for L-BFGS by ignoring the logic
that applies only to gradient descent, such as the adaptive step size. We will later refactor
this into a regularizer that replaces the updater, to separate the logic for
regularization from the logic for the step update.
* `numCorrections` is the number of corrections used in the L-BFGS update. 10 is
recommended.
* `convergenceTol` controls how much relative change is still allowed when L-BFGS
is considered to have converged; lower values are less tolerant and therefore
generally cause more iterations to be run.
* `maxNumIterations` is the maximal number of iterations that L-BFGS can be run.
* `regParam` is the regularization parameter when using regularization.
* `return` is a tuple containing two elements. The first element is a column matrix
containing weights for every feature, and the second element is an array containing
the loss computed for every iteration.

Here is an example that trains binary logistic regression with L2 regularization using
the L-BFGS optimizer.
{% highlight scala %}
import org.apache.spark.SparkContext
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.mllib.classification.LogisticRegressionModel
import org.apache.spark.mllib.optimization.{LBFGS, LogisticGradient, SquaredL2Updater}

val data = MLUtils.loadLibSVMFile(sc, "mllib/data/sample_libsvm_data.txt")
val numFeatures = data.take(1)(0).features.size

// Split data into training (60%) and test (40%).
val splits = data.randomSplit(Array(0.6, 0.4), seed = 11L)

// Prepend 1 into the training data as intercept.
val training = splits(0).map(x =>
  (x.label, Vectors.dense(1.0 +: x.features.toArray))).cache()

val test = splits(1)

// Run the training algorithm to build the model.
val numCorrections = 10
val convergenceTol = 1e-4
val maxNumIterations = 20
val regParam = 0.1
val initialWeightsWithIntercept = Vectors.dense(new Array[Double](numFeatures + 1))

val (weightsWithIntercept, loss) = LBFGS.runLBFGS(
  training,
  new LogisticGradient(),
  new SquaredL2Updater(),
  numCorrections,
  convergenceTol,
  maxNumIterations,
  regParam,
  initialWeightsWithIntercept)

// The first weight is the intercept; the rest are the feature weights.
val model = new LogisticRegressionModel(
  Vectors.dense(weightsWithIntercept.toArray.slice(1, weightsWithIntercept.size)),
  weightsWithIntercept(0))

// Clear the default threshold.
model.clearThreshold()

// Compute raw scores on the test set.
val scoreAndLabels = test.map { point =>
  val score = model.predict(point.features)
  (score, point.label)
}

// Get evaluation metrics.
val metrics = new BinaryClassificationMetrics(scoreAndLabels)
val auROC = metrics.areaUnderROC()

println("Loss of each step in training process")
loss.foreach(println)
println("Area under ROC = " + auROC)
{% endhighlight %}