Commit 6ed7e2c

etrain authored and rxin committed
Use numpy directly for matrix multiply.
Using matrix multiply to compute XtX and XtY yields a 5-20x speedup depending on problem size.

For example, the following takes 19s locally after this change vs. 5m21s before the change (16x speedup):

    bin/pyspark examples/src/main/python/als.py local[8] 1000 1000 50 10 10

Author: Evan Sparks <[email protected]>

Closes apache#687 from etrain/patch-1 and squashes the following commits:

e094dbc [Evan Sparks] Touching only diaganols on update.
d1ab9b6 [Evan Sparks] Use numpy directly for matrix multiply.
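The equivalence behind the speedup can be checked in isolation. The following is a minimal sketch, not code from the commit: it uses plain NumPy arrays and the `@` operator rather than the `np.matrix` objects in `als.py`, and the sizes, the random data, and the names `ratings_row`, `XtX_loop`, and `XtX_mm` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
uu, ff = 200, 10                        # assumed sizes: uu rows, ff latent features
mat = rng.standard_normal((uu, ff))     # stands in for the factor matrix `mat`
ratings_row = rng.standard_normal(uu)   # stands in for ratings[i, :]

# Row-by-row accumulation, in the style of the code before the change.
XtX_loop = np.zeros((ff, ff))
Xty_loop = np.zeros(ff)
for j in range(uu):
    v = mat[j, :]
    XtX_loop += np.outer(v, v)          # rank-one update v^T v
    Xty_loop += v * ratings_row[j]

# One matrix product each, in the style of the code after the change.
XtX_mm = mat.T @ mat
Xty_mm = mat.T @ ratings_row

same_XtX = np.allclose(XtX_loop, XtX_mm)
same_Xty = np.allclose(Xty_loop, Xty_mm)
print(same_XtX, same_Xty)
```

Both forms compute the same sums; the matrix-product form hands the whole accumulation to NumPy's compiled routines instead of running `uu` iterations in the Python interpreter, which is a plausible source of the reported speedup.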
1 parent 108c4c1 commit 6ed7e2c

1 file changed, 7 additions and 8 deletions

examples/src/main/python/als.py

@@ -36,14 +36,13 @@ def rmse(R, ms, us):
 def update(i, vec, mat, ratings):
     uu = mat.shape[0]
     ff = mat.shape[1]
-    XtX = matrix(np.zeros((ff, ff)))
-    Xty = np.zeros((ff, 1))
-
-    for j in range(uu):
-        v = mat[j, :]
-        XtX += v.T * v
-        Xty += v.T * ratings[i, j]
-    XtX += np.eye(ff, ff) * LAMBDA * uu
+
+    XtX = mat.T * mat
+    XtY = mat.T * ratings[i, :].T
+
+    for j in range(ff):
+        XtX[j,j] += LAMBDA * uu
+
     return np.linalg.solve(XtX, Xty)
 
 if __name__ == "__main__":
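
For readers who want to run the updated step on its own, here is a hedged, standalone sketch of `update`. It is not the file as committed: it assumes plain NumPy arrays (so `@` replaces the `*` used on `np.matrix` objects), picks an arbitrary `LAMBDA` value, and adds a hypothetical `update_reference` helper for comparison. As displayed, the added lines assign `XtY` while the unchanged `return` line still reads `Xty`; the sketch uses a single name throughout.

```python
import numpy as np

LAMBDA = 0.01  # regularization strength; value assumed for this sketch


def update(i, vec, mat, ratings):
    """Solve the regularized least-squares problem for row i of ratings.

    `vec` is kept only to mirror the signature in the diff; it is unused here.
    """
    uu, ff = mat.shape
    XtX = mat.T @ mat
    Xty = mat.T @ ratings[i, :]
    # Touch only the diagonal rather than building an ff x ff identity matrix.
    XtX[np.diag_indices(ff)] += LAMBDA * uu
    return np.linalg.solve(XtX, Xty)


def update_reference(i, mat, ratings):
    """Hypothetical helper: same system, built with an explicit identity."""
    uu, ff = mat.shape
    A = mat.T @ mat + np.eye(ff) * LAMBDA * uu
    b = mat.T @ ratings[i, :]
    return np.linalg.solve(A, b)


rng = np.random.default_rng(1)
mat = rng.standard_normal((50, 5))       # assumed: 50 rows, 5 features
ratings = rng.standard_normal((8, 50))   # assumed: 8 rows of ratings
x = update(0, None, mat, ratings)
x_ref = update_reference(0, mat, ratings)
```

Adding `LAMBDA * uu` to the diagonal entries gives the same matrix as adding `np.eye(ff) * LAMBDA * uu`, so the two functions should agree up to floating-point error.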
