
Commit 8aeaba1

wrap long lines
1 parent 6ce6a6f commit 8aeaba1

9 files changed: +352 −181 lines changed

docs/mllib-clustering.md

Lines changed: 8 additions & 7 deletions
@@ -39,8 +39,9 @@ a given dataset, the algorithm returns the best clustering result).
 <div data-lang="scala" markdown="1">
 Following code snippets can be executed in `spark-shell`.
 
-In the following example after loading and parsing data, we use the [`KMeans`](api/mllib/index.html#org.apache.spark.mllib.clustering.KMeans)
-object to cluster the data into two clusters. The number of desired clusters is passed to the algorithm. We then compute Within
+In the following example after loading and parsing data, we use the
+[`KMeans`](api/mllib/index.html#org.apache.spark.mllib.clustering.KMeans) object to cluster the data
+into two clusters. The number of desired clusters is passed to the algorithm. We then compute Within
 Set Sum of Squared Error (WSSSE). You can reduce this error measure by increasing *k*. In fact the
 optimal *k* is usually one where there is an "elbow" in the WSSSE graph.
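The WSSSE quantity named in this hunk is easy to state outside Spark. A minimal pure-Python sketch (the points and centers below are made-up toy values, not MLlib output, and this is not the MLlib API):

```python
# Within Set Sum of Squared Errors (WSSSE): for each point, take the
# squared Euclidean distance to its nearest center, then sum over points.
points = [(0.0, 0.0), (0.1, 0.1), (9.0, 9.0), (9.1, 9.2)]
centers = [(0.05, 0.05), (9.05, 9.1)]

def squared_distance(p, c):
    return sum((pi - ci) ** 2 for pi, ci in zip(p, c))

def wssse(points, centers):
    return sum(min(squared_distance(p, c) for c in centers) for p in points)

print("Within Set Sum of Squared Error = " + str(wssse(points, centers)))
```

Adding more centers can only lower this sum, which is why the text recommends looking for an "elbow" rather than minimizing WSSSE outright.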

@@ -73,10 +74,10 @@ calling `.rdd()` on your `JavaRDD` object.
 <div data-lang="python" markdown="1">
 Following examples can be tested in the PySpark shell.
 
-In the following example after loading and parsing data, we use the KMeans object to cluster the data
-into two clusters. The number of desired clusters is passed to the algorithm. We then compute Within
-Set Sum of Squared Error (WSSSE). You can reduce this error measure by increasing *k*. In fact the
-optimal *k* is usually one where there is an "elbow" in the WSSSE graph.
+In the following example after loading and parsing data, we use the KMeans object to cluster the
+data into two clusters. The number of desired clusters is passed to the algorithm. We then compute
+Within Set Sum of Squared Error (WSSSE). You can reduce this error measure by increasing *k*. In
+fact the optimal *k* is usually one where there is an "elbow" in the WSSSE graph.
 
 {% highlight python %}
 from pyspark.mllib.clustering import KMeans
@@ -101,4 +102,4 @@ print("Within Set Sum of Squared Error = " + str(WSSSE))
 {% endhighlight %}
 </div>
 
-</div>
+</div>
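The "elbow" heuristic from the wrapped paragraph can also be sketched without Spark. Below is a toy Lloyd's-style k-means iteration, not the MLlib implementation; the data points, naive initialization, and iteration count are all made up for illustration:

```python
# Toy Lloyd's-style k-means plus the WSSSE-vs-k "elbow" check: WSSSE
# always shrinks as k grows, so look for where the decrease flattens.
def kmeans(points, k, iterations=10):
    centers = points[:k]  # naive init: first k points (fine for a toy)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(p, centers[i])))
            clusters[nearest].append(p)
        # move each center to its cluster mean; keep empty clusters in place
        centers = [tuple(sum(xs) / len(xs) for xs in zip(*c)) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

def wssse(points, centers):
    return sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
               for p in points)

points = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0),
          (9.0, 9.0), (9.1, 9.2), (8.9, 9.1)]
for k in (1, 2, 3):
    print(k, wssse(points, kmeans(points, k)))
```

With two well-separated blobs, WSSSE drops sharply from k=1 to k=2 and only marginally afterwards, which is the elbow the docs describe.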

docs/mllib-collaborative-filtering.md

Lines changed: 17 additions & 23 deletions
@@ -18,39 +18,38 @@ In particular, we implement the [alternating least squares
 algorithm to learn these latent factors. The implementation in MLlib has the
 following parameters:
 
-* *numBlocks* is the number of blocks used to parallelize computation (set to -1 to auto-configure).
+* *numBlocks* is the number of blocks used to parallelize computation (set to -1 to auto-configure).
 * *rank* is the number of latent factors in our model.
 * *iterations* is the number of iterations to run.
 * *lambda* specifies the regularization parameter in ALS.
-* *implicitPrefs* specifies whether to use the *explicit feedback* ALS variant or one adapted for *implicit feedback* data
-* *alpha* is a parameter applicable to the implicit feedback variant of ALS that governs the *baseline* confidence in preference observations
+* *implicitPrefs* specifies whether to use the *explicit feedback* ALS variant or one adapted for
+  *implicit feedback* data
+* *alpha* is a parameter applicable to the implicit feedback variant of ALS that governs the
+  *baseline* confidence in preference observations
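The *implicitPrefs* and *alpha* parameters in the list above come from the confidence weighting of the implicit-feedback approach discussed further down (Hu, Koren & Volinsky). A minimal sketch of that transform — the `alpha` value and the click count are made up for illustration:

```python
# Implicit feedback: turn a raw count r (views, clicks, ...) into a
# binary preference p and a confidence c = 1 + alpha * r.
alpha = 0.01  # made-up value; alpha is a tunable hyperparameter

def to_preference_and_confidence(r):
    p = 1.0 if r > 0 else 0.0   # binary preference
    c = 1.0 + alpha * r          # baseline confidence grows with activity
    return p, c

print(to_preference_and_confidence(40))  # e.g. a user with 40 clicks
```

Higher raw counts do not mean higher "ratings" here; they mean higher confidence that the binary preference is real.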

 ### Explicit vs. implicit feedback
 
 The standard approach to matrix factorization based collaborative filtering treats
 the entries in the user-item matrix as *explicit* preferences given by the user to the item.
 
-It is common in many real-world use cases to only have access to *implicit feedback*
-(e.g. views, clicks, purchases, likes, shares etc.). The approach used in MLlib to deal with
-such data is taken from
+It is common in many real-world use cases to only have access to *implicit feedback* (e.g. views,
+clicks, purchases, likes, shares etc.). The approach used in MLlib to deal with such data is taken
+from
 [Collaborative Filtering for Implicit Feedback Datasets](http://dx.doi.org/10.1109/ICDM.2008.22).
-Essentially instead of trying to model the matrix of ratings directly, this approach treats the data as
-a combination of binary preferences and *confidence values*. The ratings are then related
-to the level of confidence in observed user preferences, rather than explicit ratings given to items.
-The model then tries to find latent factors that can be used to predict the expected preference of a user
-for an item.
+Essentially instead of trying to model the matrix of ratings directly, this approach treats the data
+as a combination of binary preferences and *confidence values*. The ratings are then related to the
+level of confidence in observed user preferences, rather than explicit ratings given to items. The
+model then tries to find latent factors that can be used to predict the expected preference of a
+user for an item.
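The alternating least squares idea behind these latent factors can be sketched without Spark. A toy rank-1 factorization of a fully observed matrix — the data and iteration count are made up, and real ALS additionally handles missing entries and adds the *lambda* regularization term:

```python
# Toy rank-1 ALS on a fully observed ratings matrix R ≈ u · vᵀ:
# fix v and solve for each u[i] in closed form, then fix u and solve for v.
R = [[4.0, 2.0], [2.0, 1.0]]  # exactly rank 1: row two is half of row one
u = [1.0, 1.0]
v = [1.0, 1.0]

for _ in range(10):
    # least-squares update of each user factor u[i] with v fixed
    u = [sum(R[i][j] * v[j] for j in range(2)) / sum(x * x for x in v)
         for i in range(2)]
    # least-squares update of each item factor v[j] with u fixed
    v = [sum(R[i][j] * u[i] for i in range(2)) / sum(x * x for x in u)
         for j in range(2)]

approx = [[u[i] * v[j] for j in range(2)] for i in range(2)]
print(approx)
```

Because each half-step is an ordinary least-squares solve, the alternation is easy to parallelize by blocks of users and items, which is what the *numBlocks* parameter controls.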
 
 ## Examples
 
 <div class="codetabs">
 
 <div data-lang="scala" markdown="1">
-
-Following code snippets can be executed in `spark-shell`.
-
 In the following example we load rating data. Each row consists of a user, a product and a rating.
-We use the default ALS.train() method which assumes ratings are explicit. We evaluate the recommendation
-model by measuring the Mean Squared Error of rating prediction.
+We use the default ALS.train() method which assumes ratings are explicit. We evaluate the
+recommendation model by measuring the Mean Squared Error of rating prediction.
 
 {% highlight scala %}
 import org.apache.spark.mllib.recommendation.ALS
@@ -86,22 +85,16 @@ other signals), you can use the trainImplicit method to get better results.
 {% highlight scala %}
 val model = ALS.trainImplicit(ratings, 1, 20, 0.01)
 {% endhighlight %}
-
 </div>
 
 <div data-lang="java" markdown="1">
-
 All of MLlib's methods use Java-friendly types, so you can import and call them there the same
 way you do in Scala. The only caveat is that the methods take Scala RDD objects, while the
 Spark Java API uses a separate `JavaRDD` class. You can convert a Java RDD to a Scala one by
 calling `.rdd()` on your `JavaRDD` object.
-
 </div>
 
 <div data-lang="python" markdown="1">
-
-Following examples can be tested in the PySpark shell.
-
 In the following example we load rating data. Each row consists of a user, a product and a rating.
 We use the default ALS.train() method which assumes ratings are explicit. We evaluate the
 recommendation by measuring the Mean Squared Error of rating prediction.
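The Mean Squared Error evaluation named in these hunks is simple to state outside Spark. A minimal sketch, where the (user, product) keys, ratings, and predictions are made-up toy values rather than ALS output:

```python
# Toy MSE between observed ratings and model predictions,
# keyed by (user, product) pairs as in the ALS examples.
ratings = {(1, 10): 5.0, (1, 11): 1.0, (2, 10): 4.0}
predictions = {(1, 10): 4.5, (1, 11): 1.5, (2, 10): 4.0}

def mse(ratings, predictions):
    errors = [(ratings[k] - predictions[k]) ** 2 for k in ratings]
    return sum(errors) / len(errors)

print("Mean Squared Error = " + str(mse(ratings, predictions)))
```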
@@ -138,4 +131,5 @@ model = ALS.trainImplicit(ratings, 1, 20)
 
 ## Tutorial
 
-[AMP Camp](http://ampcamp.berkeley.edu/) provides a hands-on tutorial for [personalized movie recommendation with MLlib](http://ampcamp.berkeley.edu/big-data-mini-course/movie-recommendation-with-mllib.html).
+[AMP Camp](http://ampcamp.berkeley.edu/) provides a hands-on tutorial for
+[personalized movie recommendation with MLlib](http://ampcamp.berkeley.edu/big-data-mini-course/movie-recommendation-with-mllib.html).
