
Commit c82ffb4

clean optimization
1 parent 31660eb commit c82ffb4

File tree

1 file changed (+9, -69 lines)


docs/mllib-optimization.md

Lines changed: 9 additions & 69 deletions
@@ -25,9 +25,9 @@ title: MLlib - Optimization
-# Mathematical Description
+## Mathematical Description
 
-## (Sub)Gradient Descent
+### Gradient descent
 The simplest method to solve optimization problems of the form `$\min_{\wv \in\R^d} \; f(\wv)$`
 is [gradient descent](http://en.wikipedia.org/wiki/Gradient_descent).
 Such first-order optimization methods (including gradient descent and stochastic variants
@@ -38,14 +38,14 @@ the direction of steepest descent, which is the negative of the derivative (called the
 [gradient](http://en.wikipedia.org/wiki/Gradient)) of the function at the current point, i.e., at
 the current parameter value.
 If the objective function `$f$` is not differentiable at all arguments, but still convex, then a
-*subgradient*
+*sub-gradient*
 is the natural generalization of the gradient, and assumes the role of the step direction.
-In any case, computing a gradient or subgradient of `$f$` is expensive --- it requires a full
+In any case, computing a gradient or sub-gradient of `$f$` is expensive --- it requires a full
 pass through the complete dataset, in order to compute the contributions from all loss terms.
 
-## Stochastic (Sub)Gradient Descent (SGD)
+### Stochastic gradient descent (SGD)
 Optimization problems whose objective function `$f$` is written as a sum are particularly
-suitable to be solved using *stochastic subgradient descent (SGD)*.
+suitable to be solved using *stochastic gradient descent (SGD)*.
 In our case, for the optimization formulations commonly used in <a
 href="mllib-classification-regression.html">supervised machine learning</a>,
 `\begin{equation}
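As a compact restatement of what the renamed SGD section says (not a quote from the file): at iteration `$t$`, pick an index `$i_t$` uniformly at random from `$[1..n]$` and step against the corresponding stochastic (sub)gradient. Using the `$f'_{\wv,i}$` notation from the block deleted further down, and writing `$\gamma$` for the step size (our choice of symbol), one iteration is

`\[
\wv^{(t+1)} := \wv^{(t)} - \gamma \; f'_{\wv^{(t)},\, i_t}, \qquad i_t \sim \text{Uniform}([1..n]).
\]`

In expectation over `$i_t$` this direction agrees with the full (sub)gradient, which is why sampling a single term of the sum, or a small mini-batch, suffices.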
@@ -98,7 +98,7 @@ For the L1-regularizer, the proximal operator is given by soft thresholding, as implemented in
 [L1Updater](api/mllib/index.html#org.apache.spark.mllib.optimization.L1Updater).
 
-## Update Schemes for Distributed SGD
+### Update schemes for distributed SGD
 The SGD implementation in
 [GradientDescent](api/mllib/index.html#org.apache.spark.mllib.optimization.GradientDescent) uses
 a simple (distributed) sampling of the data examples.
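The context lines of this hunk refer to soft thresholding as the proximal operator that `L1Updater` applies for the L1 regularizer. As an illustration only (the helper name `softThreshold` and the plain `Array[Double]` representation are ours, not MLlib's API), the elementwise operation looks like this:

```scala
import scala.math.{abs, max, signum}

// Elementwise soft thresholding: the proximal operator of shrinkage * ||w||_1,
// where shrinkage plays the role of stepSize * regParam in the surrounding docs.
def softThreshold(weights: Array[Double], shrinkage: Double): Array[Double] =
  weights.map(w => signum(w) * max(abs(w) - shrinkage, 0.0))

// With shrinkage 0.5, small coordinates are zeroed and larger ones shrink toward zero:
// softThreshold(Array(0.3, -2.0, 1.2), 0.5) == Array(0.0, -1.5, 0.7)
```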
@@ -129,12 +129,12 @@ point.
 
-# Implementation in MLlib
+## Implementation in MLlib
 
 Gradient descent methods including stochastic subgradient descent (SGD) as
 included as a low-level primitive in `MLlib`, upon which various ML algorithms
 are developed, see the
-<a href="mllib-classification-regression.html">classification and regression</a>
+<a href="mllib-linear-methods.html">linear methods</a>
 section for example.
 
 The SGD method
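This hunk concerns `GradientDescent`, the low-level mini-batch SGD primitive the higher-level algorithms build on. The following is a deliberately simplified, single-machine sketch of that mini-batch scheme, not MLlib's implementation; the squared-loss gradient is chosen only to make it concrete, and all names are ours:

```scala
import scala.util.Random

// Single-machine sketch of mini-batch (sub)gradient descent for least squares.
// Each iteration samples a fraction of the examples, averages their gradients,
// and steps against that average. MLlib's GradientDescent distributes the
// sampling and the gradient sums over an RDD, but the control flow is analogous.
def miniBatchSGD(
    data: Seq[(Double, Array[Double])],   // (label, features) pairs
    stepSize: Double,
    numIterations: Int,
    miniBatchFraction: Double,
    seed: Long = 42L): Array[Double] = {
  val rng = new Random(seed)
  val d = data.head._2.length
  val w = Array.fill(d)(0.0)
  for (t <- 1 to numIterations) {
    // Bernoulli sampling of roughly miniBatchFraction of the examples.
    val batch = data.filter(_ => rng.nextDouble() < miniBatchFraction)
    if (batch.nonEmpty) {
      val grad = Array.fill(d)(0.0)
      for ((y, x) <- batch) {
        // Squared-loss gradient for one example: (w . x - y) * x
        val err = w.zip(x).map { case (wi, xi) => wi * xi }.sum - y
        for (j <- 0 until d) grad(j) += err * x(j)
      }
      // Decaying step size, as is common for SGD.
      val gamma = stepSize / math.sqrt(t)
      for (j <- 0 until d) w(j) -= gamma * grad(j) / batch.size
    }
  }
  w
}
```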
@@ -162,63 +162,3 @@ each iteration, to compute the gradient direction.
 Available algorithms for gradient descent:
 
 * [GradientDescent.runMiniBatchSGD](api/mllib/index.html#org.apache.spark.mllib.optimization.GradientDescent)
-
----
-
-### Optimization Methods Working on the Primal Formulation
-
-**Stochastic subGradient Descent (SGD).**
-For optimization objectives `$f$` written as a sum, *stochastic subgradient descent (SGD)* can be
-an efficient choice of optimization method, as we describe in the <a
-href="mllib-optimization.html">optimization section</a> in more detail.
-Because all methods considered here fit into the optimization formulation
-`$\eqref{eq:regPrimal}$`, this is especially natural, because the loss is written as an average
-of the individual losses coming from each datapoint.
-
-Picking one datapoint `$i\in[1..n]$` uniformly at random, we obtain a stochastic subgradient of
-`$\eqref{eq:regPrimal}$`, with respect to `$\wv$` as follows:
-`\[
-f'_{\wv,i} := L'_{\wv,i} + \lambda\, R'_\wv \ ,
-\]`
-where `$L'_{\wv,i} \in \R^d$` is a subgradient of the part of the loss function determined by the
-`$i$`-th datapoint, that is `$L'_{\wv,i} \in \frac{\partial}{\partial \wv} L(\wv;\x,y)$`.
-Furthermore, `$R'_\wv$` is a subgradient of the regularizer `$R(\wv)$`, i.e. `$R'_\wv \in
-\frac{\partial}{\partial \wv} R(\wv)$`. The term `$R'_\wv$` does not depend on which random
-datapoint is picked.
-
-## Implementation in MLlib
-
-#### Linear Methods
-
-For both classification and regression algorithms with convex loss functions, `MLlib` implements a simple distributed version of
-stochastic subgradient descent (SGD), building on the underlying gradient descent primitive (as
-described in the
-<a href="mllib-optimization.html">optimization section</a>).
-All provided algorithms take as input a regularization parameter (`regParam`) along with various
-parameters associated with stochastic gradient
-descent (`stepSize`, `numIterations`, `miniBatchFraction`).
-For each of them, we support all 3 possible regularizations (none, L1 or L2).
-
-Available algorithms for binary classification:
-
-* [SVMWithSGD](api/mllib/index.html#org.apache.spark.mllib.classification.SVMWithSGD)
-* [LogisticRegressionWithSGD](api/mllib/index.html#org.apache.spark.mllib.classification.LogisticRegressionWithSGD)
-
-Available algorithms for linear regression:
-
-* [LinearRegressionWithSGD](api/mllib/index.html#org.apache.spark.mllib.regression.LinearRegressionWithSGD)
-* [RidgeRegressionWithSGD](api/mllib/index.html#org.apache.spark.mllib.regression.RidgeRegressionWithSGD)
-* [LassoWithSGD](api/mllib/index.html#org.apache.spark.mllib.regression.LassoWithSGD)
-
-Behind the scenes, all above methods use the SGD implementation from the
-gradient descent primitive in MLlib, see the
-<a href="mllib-optimization.html">optimization</a> part:
-
-* [GradientDescent](api/mllib/index.html#org.apache.spark.mllib.optimization.GradientDescent)
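For readers following the links that this hunk removes (the material now lives on the linear-methods page): the listed classes were commonly driven through their companion-object `train` helpers. Below is a minimal sketch of that usage from the MLlib API of this period, not text from the docs; the data path and parameter values are placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.regression.LinearRegressionWithSGD
import org.apache.spark.mllib.util.MLUtils

object SGDLinearMethodsExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SGDLinearMethodsExample"))

    // LabeledPoint data in LIBSVM format; the path is a placeholder.
    val data = MLUtils.loadLibSVMFile(sc, "data/sample_libsvm_data.txt").cache()

    // Binary classification (hinge loss) and linear regression (squared loss),
    // both optimized by the same mini-batch SGD primitive described above.
    val numIterations = 100
    val svmModel = SVMWithSGD.train(data, numIterations)
    val regressionModel = LinearRegressionWithSGD.train(data, numIterations)

    // Both models expose predict(features).
    val first = data.first()
    println(s"SVM prediction for the first example: ${svmModel.predict(first.features)}")
    println(s"Regression prediction for the first example: ${regressionModel.predict(first.features)}")

    sc.stop()
  }
}
```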
