In particular, we implement the alternating least squares
algorithm to learn these latent factors. The implementation in MLlib has the
following parameters:

* **numBlocks** is the number of blocks used to parallelize computation (set to -1 to auto-configure).
* **rank** is the number of latent factors in our model.
* **iterations** is the number of iterations to run.
* **lambda** specifies the regularization parameter in ALS.
* **implicitPrefs** specifies whether to use the *explicit feedback* ALS variant or one adapted for
  *implicit feedback* data.
* **alpha** is a parameter applicable to the implicit feedback variant of ALS that governs the
  *baseline* confidence in preference observations.
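To make the latent-factor idea concrete, here is a minimal pure-Python sketch (independent of MLlib; the factor values and the `predict` helper are made up for illustration): a predicted rating is simply the dot product of a user's and a product's `rank`-dimensional factor vectors.

```python
# Matrix-factorization sketch (not the MLlib implementation): each user
# and each product is described by `rank` latent factors, and a predicted
# rating is the dot product of the two factor vectors.
rank = 3
user_factors = {1: [0.8, 0.1, 0.5]}       # hypothetical learned factors
product_factors = {42: [0.9, 0.2, 0.4]}

def predict(user, product):
    u = user_factors[user]
    p = product_factors[product]
    return sum(uf * pf for uf, pf in zip(u, p))

print(round(predict(1, 42), 2))  # 0.8*0.9 + 0.1*0.2 + 0.5*0.4 = 0.94
```

ALS learns such factor vectors by alternately holding the user factors fixed while solving a least-squares problem for the product factors, and vice versa.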
### Explicit vs. implicit feedback

The standard approach to matrix factorization based collaborative filtering treats
the entries in the user-item matrix as *explicit* preferences given by the user to the item.

It is common in many real-world use cases to only have access to *implicit feedback* (e.g. views,
clicks, purchases, likes, shares etc.). The approach used in MLlib to deal with such data is taken
from
[Collaborative Filtering for Implicit Feedback Datasets](http://dx.doi.org/10.1109/ICDM.2008.22).
Essentially, instead of trying to model the matrix of ratings directly, this approach treats the data
as a combination of binary preferences and *confidence values*. The ratings are then related to the
level of confidence in observed user preferences, rather than explicit ratings given to items. The
model then tries to find latent factors that can be used to predict the expected preference of a
user for an item.
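The binary-preference-plus-confidence idea can be sketched in plain Python (this illustrates the formulation in the cited paper, not MLlib's internal code; `alpha = 40.0` is a value the paper reports working well in its experiments, and the helper name is made up):

```python
# Illustrative translation of a raw implicit-feedback count r (views,
# clicks, purchases, ...) into a binary preference p and a confidence c,
# using the linear rule c = 1 + alpha * r from the cited paper.
alpha = 40.0  # baseline-confidence parameter (the MLlib `alpha` above)

def to_preference_confidence(r):
    preference = 1 if r > 0 else 0   # did the user interact at all?
    confidence = 1.0 + alpha * r     # more interactions -> more confidence
    return preference, confidence

print(to_preference_confidence(0))  # (0, 1.0)   -- unobserved pair
print(to_preference_confidence(5))  # (1, 201.0) -- e.g. five clicks
```

The model is then fit against the binary preferences, with each squared error weighted by the corresponding confidence.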
## Examples

<div class="codetabs">

<div data-lang="scala" markdown="1">

In the following example we load rating data. Each row consists of a user, a product and a rating.
We use the default `ALS.train()` method which assumes ratings are explicit. We evaluate the
recommendation model by measuring the Mean Squared Error of rating prediction.

{% highlight scala %}
import org.apache.spark.mllib.recommendation.ALS
// ...
{% endhighlight %}
other signals), you can use the `trainImplicit` method to get better results.

{% highlight scala %}
val model = ALS.trainImplicit(ratings, 1, 20, 0.01)
{% endhighlight %}

</div>
<div data-lang="java" markdown="1">

All of MLlib's methods use Java-friendly types, so you can import and call them there the same
way you do in Scala. The only caveat is that the methods take Scala RDD objects, while the
Spark Java API uses a separate `JavaRDD` class. You can convert a Java RDD to a Scala one by
calling `.rdd()` on your `JavaRDD` object.

</div>
<div data-lang="python" markdown="1">

In the following example we load rating data. Each row consists of a user, a product and a rating.
We use the default `ALS.train()` method which assumes ratings are explicit. We evaluate the
recommendation model by measuring the Mean Squared Error of rating prediction.
{% highlight python %}
# ...
model = ALS.trainImplicit(ratings, 1, 20)
{% endhighlight %}

</div>

</div>

## Tutorial

[AMP Camp](http://ampcamp.berkeley.edu/) provides a hands-on tutorial for
[personalized movie recommendation with MLlib](http://ampcamp.berkeley.edu/big-data-mini-course/movie-recommendation-with-mllib.html).