@@ -25,9 +25,9 @@ title: MLlib - Optimization
- # Mathematical Description
+ ## Mathematical Description

- ## (Sub) Gradient Descent
+ ### Gradient descent
The simplest method to solve optimization problems of the form `$\min_{\wv \in\R^d} \; f(\wv)$`
is [gradient descent](http://en.wikipedia.org/wiki/Gradient_descent).
Such first-order optimization methods (including gradient descent and stochastic variants
@@ -38,14 +38,14 @@ the direction of steepest descent, which is the negative of the derivative (call
[gradient](http://en.wikipedia.org/wiki/Gradient)) of the function at the current point, i.e., at
the current parameter value.
If the objective function `$f$` is not differentiable at all arguments, but still convex, then a
- *subgradient*
+ *sub-gradient*
is the natural generalization of the gradient, and assumes the role of the step direction.
- In any case, computing a gradient or subgradient of `$f$` is expensive --- it requires a full
+ In any case, computing a gradient or sub-gradient of `$f$` is expensive --- it requires a full
pass through the complete dataset, in order to compute the contributions from all loss terms.
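For intuition, the following is a minimal, purely local sketch of batch gradient descent for a least-squares objective; the data layout, fixed step size, and loss choice are illustrative assumptions and not MLlib code. It makes the cost visible: every iteration needs a full pass over the data.

```scala
// Minimal local sketch of batch gradient descent for least squares,
// f(w) = (1/2n) * sum_i ((x_i . w) - y_i)^2. The data layout and fixed step
// size are illustrative assumptions; this is not MLlib code.
object BatchGradientDescentSketch {
  private def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (ai, bi) => ai * bi }.sum

  def run(data: Array[(Array[Double], Double)],   // (features, label) pairs
          stepSize: Double,
          numIterations: Int): Array[Double] = {
    val d = data.head._1.length
    var w = new Array[Double](d)                  // start at w = 0
    for (_ <- 1 to numIterations) {
      // Full pass over the dataset: accumulate the gradient of every loss term.
      val grad = new Array[Double](d)
      for ((x, y) <- data) {
        val err = dot(x, w) - y
        for (j <- 0 until d) grad(j) += err * x(j) / data.length
      }
      // Move in the direction of steepest descent, i.e. the negative gradient.
      w = w.zip(grad).map { case (wj, gj) => wj - stepSize * gj }
    }
    w
  }
}
```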
- ## Stochastic (Sub)Gradient Descent (SGD)
+ ### Stochastic gradient descent (SGD)
Optimization problems whose objective function `$f$` is written as a sum are particularly
- suitable to be solved using *stochastic subgradient descent (SGD)*.
+ suitable to be solved using *stochastic gradient descent (SGD)*.
In our case, for the optimization formulations commonly used in <a
href="mllib-classification-regression.html">supervised machine learning</a>,
`\begin{equation}
@@ -98,7 +98,7 @@ For the L1-regularizer, the proximal operator is given by soft thresholding, as
[L1Updater](api/mllib/index.html#org.apache.spark.mllib.optimization.L1Updater).
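As a rough illustration of that proximal step (the parameter names `stepSize` and `regParam` are borrowed from the SGD interface; the code below mirrors the idea behind `L1Updater`, not its actual implementation):

```scala
// Illustrative soft-thresholding (proximal) step for the L1 regularizer:
// shrink each weight toward zero by stepSize * regParam and clip it at zero.
// This mirrors the idea behind L1Updater, not its actual implementation.
def softThreshold(weights: Array[Double], stepSize: Double, regParam: Double): Array[Double] = {
  val shrinkage = stepSize * regParam
  weights.map(w => math.signum(w) * math.max(0.0, math.abs(w) - shrinkage))
}
```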
- ## Update Schemes for Distributed SGD
+ ### Update schemes for distributed SGD
The SGD implementation in
[GradientDescent](api/mllib/index.html#org.apache.spark.mllib.optimization.GradientDescent) uses
a simple (distributed) sampling of the data examples.
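Conceptually, a single iteration looks roughly like the sketch below: sample a `miniBatchFraction` of the examples, sum their gradients on the executors, and apply one update on the driver. Only `RDD.sample` is the actual Spark API here; the function, its parameters, and the least-squares gradient standing in for the configured loss are illustrative assumptions.

```scala
// Conceptual sketch of one distributed mini-batch SGD iteration (not MLlib's
// actual code): sample a fraction of the examples, sum their gradients on the
// executors, and take one step on the driver. A least-squares gradient stands
// in for whatever loss the caller configured.
import org.apache.spark.rdd.RDD

def miniBatchStep(
    data: RDD[(Double, Array[Double])],   // (label, features)
    weights: Array[Double],
    stepSize: Double,
    miniBatchFraction: Double,
    seed: Long): Array[Double] = {
  val batch = data.sample(withReplacement = false, miniBatchFraction, seed)
  val (gradientSum, count) = batch
    .map { case (label, features) =>
      val prediction = features.zip(weights).map { case (x, w) => x * w }.sum
      val err = prediction - label
      (features.map(_ * err), 1L)          // per-example gradient, example count
    }
    .reduce { case ((g1, n1), (g2, n2)) =>
      (g1.zip(g2).map { case (a, b) => a + b }, n1 + n2)
    }
  // Average the sampled gradients and take one steepest-descent step.
  weights.zip(gradientSum).map { case (w, g) => w - stepSize * g / count }
}
```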
@@ -129,12 +129,12 @@ point.
- # Implementation in MLlib
+ ## Implementation in MLlib

Gradient descent methods, including stochastic subgradient descent (SGD), are
included as a low-level primitive in `MLlib`, upon which various ML algorithms
are developed; see the
- <a href="mllib-classification-regression.html">classification and regression</a>
+ <a href="mllib-linear-methods.html">linear methods</a>
section for example.

The SGD method
@@ -162,63 +162,3 @@ each iteration, to compute the gradient direction.
Available algorithms for gradient descent:

* [GradientDescent.runMiniBatchSGD](api/mllib/index.html#org.apache.spark.mllib.optimization.GradientDescent)
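As a rough sketch of driving this primitive directly (the argument list shown follows one MLlib version, and the `Gradient`/`Updater` choices and numeric values are arbitrary, so check the linked API docs for the exact signature in your release):

```scala
// Hedged sketch of calling the low-level SGD primitive directly; argument order,
// types, and availability vary across MLlib versions, so consult the API docs.
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.optimization.{GradientDescent, LogisticGradient, SquaredL2Updater}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

def trainLogisticWithSGD(training: RDD[LabeledPoint], numFeatures: Int) = {
  // The primitive works on plain (label, features) pairs rather than LabeledPoint.
  val data = training.map(lp => (lp.label, lp.features))
  val (weights, lossHistory) = GradientDescent.runMiniBatchSGD(
    data,
    new LogisticGradient(),     // gradient of the logistic loss
    new SquaredL2Updater(),     // L2-regularized update step
    1.0,                        // stepSize (illustrative value)
    100,                        // numIterations
    0.1,                        // regParam
    1.0,                        // miniBatchFraction (1.0 = full batch)
    Vectors.dense(new Array[Double](numFeatures)))
  (weights, lossHistory)
}
```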
-
- ---
-
-
- ### Optimization Methods Working on the Primal Formulation
-
- **Stochastic subGradient Descent (SGD).**
- For optimization objectives `$f$` written as a sum, *stochastic subgradient descent (SGD)* can be
- an efficient choice of optimization method, as we describe in the <a
- href="mllib-optimization.html">optimization section</a> in more detail.
- Because all methods considered here fit into the optimization formulation
- `$\eqref{eq:regPrimal}$`, this is especially natural, because the loss is written as an average
- of the individual losses coming from each datapoint.
-
- Picking one datapoint `$i\in[1..n]$` uniformly at random, we obtain a stochastic subgradient of
- `$\eqref{eq:regPrimal}$`, with respect to `$\wv$` as follows:
- `\[
- f'_{\wv,i} := L'_{\wv,i} + \lambda\, R'_\wv \ ,
- \]`
- where `$L'_{\wv,i} \in \R^d$` is a subgradient of the part of the loss function determined by the
- `$i$`-th datapoint, that is `$L'_{\wv,i} \in \frac{\partial}{\partial \wv} L(\wv;\x,y)$`.
- Furthermore, `$R'_\wv$` is a subgradient of the regularizer `$R(\wv)$`, i.e. `$R'_\wv \in
- \frac{\partial}{\partial \wv} R(\wv)$`. The term `$R'_\wv$` does not depend on which random
- datapoint is picked.
-
-
-
-
- ## Implementation in MLlib
-
- #### Linear Methods
-
- For both classification and regression algorithms with convex loss functions, `MLlib` implements a simple distributed version of
- stochastic subgradient descent (SGD), building on the underlying gradient descent primitive (as
- described in the
- <a href="mllib-optimization.html">optimization section</a>).
- All provided algorithms take as input a regularization parameter (`regParam`) along with various
- parameters associated with stochastic gradient
- descent (`stepSize`, `numIterations`, `miniBatchFraction`).
- For each of them, we support all 3 possible regularizations (none, L1 or L2).
-
- Available algorithms for binary classification:
-
- * [SVMWithSGD](api/mllib/index.html#org.apache.spark.mllib.classification.SVMWithSGD)
- * [LogisticRegressionWithSGD](api/mllib/index.html#org.apache.spark.mllib.classification.LogisticRegressionWithSGD)
-
- Available algorithms for linear regression:
-
- * [LinearRegressionWithSGD](api/mllib/index.html#org.apache.spark.mllib.regression.LinearRegressionWithSGD)
- * [RidgeRegressionWithSGD](api/mllib/index.html#org.apache.spark.mllib.regression.RidgeRegressionWithSGD)
- * [LassoWithSGD](api/mllib/index.html#org.apache.spark.mllib.regression.LassoWithSGD)
-
- Behind the scenes, all above methods use the SGD implementation from the
- gradient descent primitive in MLlib, see the
- <a href="mllib-optimization.html">optimization</a> part:
-
- * [GradientDescent](api/mllib/index.html#org.apache.spark.mllib.optimization.GradientDescent)