Commit 788ed13

add computational cost explanation
1 parent 6429c59 commit 788ed13

1 file changed: +18 -0 lines changed

docs/mllib-dimensionality-reduction.md

Lines changed: 18 additions & 0 deletions
@@ -39,6 +39,24 @@ If we keep the top $k$ singular values, then the dimensions of the resulting low
* `$\Sigma$`: `$k \times k$`,
* `$V$`: `$n \times k$`.

### Performance
We assume $n$ is smaller than $m$. The singular values and the right singular vectors are derived
from the eigenvalues and the eigenvectors of the Gramian matrix $A^T A$. The matrix
storing the left singular vectors, $U$, is computed via matrix multiplication as
$U = A (V \Sigma^{-1})$, if requested by the user via the `computeU` parameter.
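
The formula for $U$ follows from standard SVD identities rather than anything specific to this implementation. A short derivation, assuming the retained singular values are strictly positive so that $\Sigma$ is invertible:

$$
\begin{aligned}
A &= U \Sigma V^T, \qquad U^T U = I, \quad V^T V = I \\
A^T A &= V \Sigma U^T U \Sigma V^T = V \Sigma^2 V^T \\
A V &= U \Sigma V^T V = U \Sigma \\
U &= A V \Sigma^{-1}
\end{aligned}
$$

The second line is why the eigenvalues of $A^T A$ are the squared singular values and its eigenvectors are the columns of $V$. The last two lines continue to hold when $U$, $\Sigma$, and $V$ are restricted to the top $k$ singular values.
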
The method actually used is chosen automatically based on the computational cost:

* If $n$ is small ($n < 100$) or $k$ is large compared with $n$ ($k > n / 2$), we compute the Gramian matrix
first and then compute its top eigenvalues and eigenvectors locally on the driver.
This requires a single pass with $O(n^2)$ storage on each executor and on the driver, and
$O(n^2 k)$ time on the driver.
* Otherwise, we compute $(A^T A) v$ in a distributed way and send it to
<a href="http://www.caam.rice.edu/software/ARPACK/">ARPACK</a> to
compute the top eigenvalues and eigenvectors of $A^T A$ on the driver node. This requires $O(k)$
passes, $O(n)$ storage on each executor, and $O(n k)$ storage on the driver.
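
The two strategies above can be illustrated with a minimal single-machine sketch in plain Python. It is not the MLlib implementation: the function names and the mode labels `"local-eigs"` and `"dist-eigs"` are hypothetical, and the loops over `rows` only stand in for what would be a distributed aggregation over the rows of $A$. The sketch shows why the first strategy needs $O(n^2)$ storage (the full Gramian is accumulated) while the second needs only $O(n)$ per product (the Gramian is never formed).

```python
from typing import List, Sequence

Row = Sequence[float]


def choose_mode(n: int, k: int) -> str:
    """Apply the selection rule stated in the text (labels are hypothetical)."""
    if n < 100 or k > n / 2:
        return "local-eigs"  # build A^T A once, eigendecompose on the driver
    return "dist-eigs"       # repeated (A^T A) v products fed to an iterative solver


def gramian(rows: List[Row]) -> List[List[float]]:
    """Accumulate A^T A in one pass as a sum of row outer products: O(n^2) storage."""
    n = len(rows[0])
    g = [[0.0] * n for _ in range(n)]
    for a in rows:
        for i in range(n):
            for j in range(n):
                g[i][j] += a[i] * a[j]
    return g


def gramian_times(rows: List[Row], v: Sequence[float]) -> List[float]:
    """Compute (A^T A) v as the sum of a * (a . v) over rows: O(n) storage."""
    out = [0.0] * len(v)
    for a in rows:
        dot = sum(x * y for x, y in zip(a, v))
        for i in range(len(v)):
            out[i] += dot * a[i]
    return out


if __name__ == "__main__":
    rows = [[1.0, 2.0], [3.0, 4.0]]
    print(choose_mode(n=2, k=1))
    print(gramian(rows))
    print(gramian_times(rows, [1.0, 1.0]))
```

An iterative eigensolver such as ARPACK only ever asks for products of the form $(A^T A) v$, which is why the second strategy trades a single pass for $O(k)$ passes in exchange for much smaller per-executor storage.
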

## SVD Example

MLlib provides SVD functionality for row-oriented matrices through the
<a href="mllib-basics.html#rowmatrix">RowMatrix</a> class.