
Commit 7381521

Author: DB Tsai (committed)
L-BFGS Documentation
1 parent bd67551 commit 7381521

File tree

1 file changed, +111 −0 lines


docs/mllib-optimization.md

Lines changed: 111 additions & 0 deletions
@@ -128,10 +128,24 @@ is sampled, i.e. `$|S|=$ miniBatchFraction $\cdot n = 1$`, then the algorithm is
standard SGD. In that case, the step direction depends on the uniformly random sampling of the
point.

### Limited-memory BFGS
[Limited-memory BFGS (L-BFGS)](http://en.wikipedia.org/wiki/Limited-memory_BFGS) is an optimization
algorithm in the family of quasi-Newton methods that solves optimization problems of the form
`$\min_{\wv \in\R^d} \; f(\wv)$`. L-BFGS approximates the objective function locally as a quadratic
without evaluating the second partial derivatives of the objective function to construct the
Hessian matrix. Instead, the Hessian matrix is approximated from previous gradient evaluations, so
L-BFGS avoids the vertical scalability issue (in the number of training features) that arises when
the Hessian matrix is computed explicitly, as in Newton's method. As a result, L-BFGS often
achieves faster convergence than other first-order optimization methods.
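
To make the idea of building curvature information purely from gradients concrete, here is a
minimal sketch of the standard L-BFGS two-loop recursion using Breeze vectors. This is an
illustration only, not MLlib's implementation (MLlib delegates the actual optimization to Breeze's
L-BFGS); the helper name `twoLoopDirection` and the history buffers `s` and `y` are hypothetical.

{% highlight scala %}
import breeze.linalg.{DenseVector => BDV}

// Sketch of the L-BFGS two-loop recursion: given the current gradient and the
// last m curvature pairs (s_i = x_{i+1} - x_i, y_i = grad_{i+1} - grad_i,
// most recent first), it returns an approximation of inverse-Hessian * gradient
// without ever forming the d x d Hessian.
def twoLoopDirection(
    grad: BDV[Double],
    s: Seq[BDV[Double]],
    y: Seq[BDV[Double]]): BDV[Double] = {
  val rho = s.zip(y).map { case (si, yi) => 1.0 / (yi dot si) }
  val alpha = new Array[Double](s.length)
  val q = grad.copy
  // First loop: newest to oldest pair.
  for (i <- s.indices) {
    alpha(i) = rho(i) * (s(i) dot q)
    q -= y(i) * alpha(i)
  }
  // Initial Hessian scaling taken from the most recent pair.
  val gamma = if (s.nonEmpty) (s.head dot y.head) / (y.head dot y.head) else 1.0
  val r = q * gamma
  // Second loop: oldest to newest pair.
  for (i <- s.indices.reverse) {
    val beta = rho(i) * (y(i) dot r)
    r += s(i) * (alpha(i) - beta)
  }
  r  // the quasi-Newton step direction is -r
}
{% endhighlight %}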

Since the Hessian is constructed approximately from previous gradient evaluations, the objective
function cannot be changed during the optimization process. As a result, stochastic L-BFGS will
not work naively by simply using mini-batches; therefore, we do not provide it until we have a
better understanding of its behavior.

## Implementation in MLlib

### Gradient descent and Stochastic gradient descent
Gradient descent methods, including stochastic subgradient descent (SGD), are
included as a low-level primitive in `MLlib`, upon which various ML algorithms
are developed; see the
@@ -163,3 +177,100 @@ each iteration, to compute the gradient direction.
Available algorithms for gradient descent:

* [GradientDescent.runMiniBatchSGD](api/mllib/index.html#org.apache.spark.mllib.optimization.GradientDescent)
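
For illustration, the following is a rough sketch of calling `GradientDescent.runMiniBatchSGD`
directly on `(label, features)` pairs. It assumes the parameter order
`(data, gradient, updater, stepSize, numIterations, regParam, miniBatchFraction, initialWeights)`;
treat the exact signature as an assumption and check the API docs for the release you use.

{% highlight scala %}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.optimization.{GradientDescent, LogisticGradient, SimpleUpdater}
import org.apache.spark.mllib.util.MLUtils

// Load data and convert it to the (label, features) pairs the optimizer expects.
val data = MLUtils.loadLibSVMFile(sc, "mllib/data/sample_libsvm_data.txt")
  .map(p => (p.label, p.features))
  .cache()

val numFeatures = data.take(1)(0)._2.size

// Plain logistic-loss SGD with no regularization (assumed parameter order).
val (weights, losses) = GradientDescent.runMiniBatchSGD(
  data,
  new LogisticGradient(),
  new SimpleUpdater(),
  1.0,    // stepSize
  100,    // numIterations
  0.0,    // regParam
  1.0,    // miniBatchFraction: 1.0 means the full data set is used in each step
  Vectors.dense(new Array[Double](numFeatures)))  // initialWeights

println("Final loss = " + losses.last)
{% endhighlight %}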

### Limited-memory BFGS
L-BFGS is currently only a low-level optimization primitive in `MLlib`. If you want to use L-BFGS in
various ML algorithms such as Linear Regression and Logistic Regression, you have to pass the gradient
of the objective function and an updater into the optimizer yourself, instead of using training APIs like
[LogisticRegression.LogisticRegressionWithSGD](api/mllib/index.html#org.apache.spark.mllib.classification.LogisticRegression).
See the example below. This will be addressed in the next release.

L1 regularization via
[Updater.L1Updater](api/mllib/index.html#org.apache.spark.mllib.optimization.Updater) will not work, since the
soft-thresholding logic in L1Updater is designed for gradient descent.

The L-BFGS method
[LBFGS.runLBFGS](api/scala/index.html#org.apache.spark.mllib.optimization.LBFGS)
has the following parameters:

* `gradient` is a class that computes the gradient of the objective function
being optimized, i.e., with respect to a single training example, at the
current parameter value. MLlib includes gradient classes for common loss
functions, e.g., hinge, logistic, least-squares. The gradient class takes as
input a training example, its label, and the current parameter value.
* `updater` is a class originally designed for gradient descent which computes
the actual gradient descent step. However, we are able to obtain the gradient and
loss of the regularization part of the objective function for L-BFGS by ignoring the logic
that applies only to gradient descent, such as the adaptive step size. We will later
refactor this into a regularizer that replaces the updater, to separate the
regularization logic from the step-update logic.
* `numCorrections` is the number of corrections used in the L-BFGS update. 10 is
recommended.
* `convergenceTol` is the convergence tolerance used to decide when the iterations stop.
* `maxNumIterations` is the maximum number of iterations that L-BFGS can run.
* `regParam` is the regularization parameter when using regularization.
* `return` A tuple containing two elements. The first element is a column matrix
containing weights for every feature, and the second element is an array containing
the loss computed for every iteration.

Here is an example that trains binary logistic regression with L2 regularization using the
L-BFGS optimizer.
{% highlight scala %}
import org.apache.spark.SparkContext
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.mllib.classification.LogisticRegressionModel
import org.apache.spark.mllib.optimization.{LBFGS, LogisticGradient, SquaredL2Updater}
import breeze.linalg.{DenseVector => BDV}

val data = MLUtils.loadLibSVMFile(sc, "mllib/data/sample_libsvm_data.txt")
val numFeatures = data.take(1)(0).features.size

// Split data into training (60%) and test (40%).
val splits = data.randomSplit(Array(0.6, 0.4), seed = 11L)

// Prepend 1 to each training vector so the first weight acts as the intercept.
val training = splits(0).map(x =>
  (x.label, Vectors.fromBreeze(
    BDV.vertcat(BDV.ones[Double](1), x.features.toBreeze.toDenseVector)))).cache()

val test = splits(1)

// Run the training algorithm to build the model.
val numCorrections = 10
val convergenceTol = 1e-4
val maxNumIterations = 20
val regParam = 0.1
val initialWeightsWithIntercept = Vectors.dense(new Array[Double](numFeatures + 1))

val (weightsWithIntercept, loss) = LBFGS.runLBFGS(
  training,
  new LogisticGradient(),
  new SquaredL2Updater(),
  numCorrections,
  convergenceTol,
  maxNumIterations,
  regParam,
  initialWeightsWithIntercept)

// Separate the intercept (first element) from the feature weights.
val model = new LogisticRegressionModel(
  Vectors.dense(weightsWithIntercept.toArray.slice(1, weightsWithIntercept.size)),
  weightsWithIntercept(0))

// Clear the default threshold so predict returns raw scores.
model.clearThreshold()

// Compute raw scores on the test set.
val scoreAndLabels = test.map { point =>
  val score = model.predict(point.features)
  (score, point.label)
}

// Get evaluation metrics.
val metrics = new BinaryClassificationMetrics(scoreAndLabels)
val auROC = metrics.areaUnderROC()

println("Loss of each step in training process")
loss.foreach(println)
println("Area under ROC = " + auROC)
{% endhighlight %}
