Closed
67 changes: 64 additions & 3 deletions docs/ml-classification-regression.md
@@ -75,6 +75,13 @@ More details on parameters can be found in the [Python API documentation](api/py
{% include_example python/ml/logistic_regression_with_elastic_net.py %}
</div>

<div data-lang="r" markdown="1">

More details on parameters can be found in the [R API documentation](api/R/spark.logit.html).
Contributor

Change to "Refer to the [R API docs]... for more details" for consistency?

Contributor Author

Actually this is consistent with L59 and L66.


{% include_example binomial r/ml/logit.R %}
</div>

</div>

The `spark.ml` implementation of logistic regression also supports
@@ -165,6 +172,13 @@ model with elastic net regularization.
{% include_example python/ml/multiclass_logistic_regression_with_elastic_net.py %}
</div>

<div data-lang="r" markdown="1">

More details on parameters can be found in the [R API documentation](api/R/spark.logit.html).

{% include_example multinomial r/ml/logit.R %}
</div>

</div>


@@ -236,6 +250,14 @@ Refer to the [Python API docs](api/python/pyspark.ml.html#pyspark.ml.classificat

{% include_example python/ml/random_forest_classifier_example.py %}
</div>

<div data-lang="r" markdown="1">

Refer to the [R API docs](api/R/spark.randomForest.html) for more details.

{% include_example classification r/ml/randomForest.R %}
</div>

</div>

## Gradient-boosted tree classifier
@@ -269,6 +291,14 @@ Refer to the [Python API docs](api/python/pyspark.ml.html#pyspark.ml.classificat

{% include_example python/ml/gradient_boosted_tree_classifier_example.py %}
</div>

<div data-lang="r" markdown="1">

Refer to the [R API docs](api/R/spark.gbt.html) for more details.

{% include_example classification r/ml/gbt.R %}
</div>

</div>

## Multilayer perceptron classifier
@@ -318,6 +348,13 @@ Refer to the [Python API docs](api/python/pyspark.ml.html#pyspark.ml.classificat
{% include_example python/ml/multilayer_perceptron_classification.py %}
</div>

<div data-lang="r" markdown="1">

Refer to the [R API docs](api/R/spark.mlp.html) for more details.

{% include_example r/ml/mlp.R %}
</div>

</div>


@@ -394,7 +431,7 @@ Refer to the [Python API docs](api/python/pyspark.ml.html#pyspark.ml.classificat

Refer to the [R API docs](api/R/spark.naiveBayes.html) for more details.

{% include_example naiveBayes r/ml.R %}
{% include_example r/ml/naiveBayes.R %}
</div>

</div>
@@ -578,7 +615,7 @@ Refer to the [Python API docs](api/python/pyspark.ml.html#pyspark.ml.regression.

Refer to the [R API docs](api/R/spark.glm.html) for more details.

{% include_example glm r/ml.R %}
{% include_example r/ml/glm.R %}
</div>

</div>
@@ -650,6 +687,14 @@ Refer to the [Python API docs](api/python/pyspark.ml.html#pyspark.ml.regression.

{% include_example python/ml/random_forest_regressor_example.py %}
</div>

<div data-lang="r" markdown="1">

Refer to the [R API docs](api/R/spark.randomForest.html) for more details.

{% include_example regression r/ml/randomForest.R %}
</div>

</div>

## Gradient-boosted tree regression
@@ -683,6 +728,14 @@ Refer to the [Python API docs](api/python/pyspark.ml.html#pyspark.ml.regression.

{% include_example python/ml/gradient_boosted_tree_regressor_example.py %}
</div>

<div data-lang="r" markdown="1">

Refer to the [R API docs](api/R/spark.gbt.html) for more details.

{% include_example regression r/ml/gbt.R %}
</div>

</div>


@@ -774,7 +827,7 @@ Refer to the [Python API docs](api/python/pyspark.ml.html#pyspark.ml.regression.

Refer to the [R API docs](api/R/spark.survreg.html) for more details.

{% include_example survreg r/ml.R %}
{% include_example r/ml/survreg.R %}
</div>

</div>
@@ -847,6 +900,14 @@ Refer to the [`IsotonicRegression` Python docs](api/python/pyspark.ml.html#pyspa

{% include_example python/ml/isotonic_regression_example.py %}
</div>

<div data-lang="r" markdown="1">

Refer to the [`IsotonicRegression` R API docs](api/R/spark.isoreg.html) for more details on the API.
Contributor

Delete "on the API" for consistency?

Contributor Author

This is consistent with L834, L840 and L846.


{% include_example r/ml/isoreg.R %}
</div>

</div>

# Linear methods
18 changes: 17 additions & 1 deletion docs/ml-clustering.md
@@ -91,7 +91,7 @@ Refer to the [Python API docs](api/python/pyspark.ml.html#pyspark.ml.clustering.

Refer to the [R API docs](api/R/spark.kmeans.html) for more details.

{% include_example kmeans r/ml.R %}
{% include_example r/ml/kmeans.R %}
</div>

</div>
@@ -126,6 +126,14 @@ Refer to the [Python API docs](api/python/pyspark.ml.html#pyspark.ml.clustering.

{% include_example python/ml/lda_example.py %}
</div>

<div data-lang="r" markdown="1">

Refer to the [R API docs](api/R/spark.lda.html) for more details.

{% include_example r/ml/lda.R %}
</div>

</div>

## Bisecting k-means
@@ -241,4 +249,12 @@ Refer to the [Python API docs](api/python/pyspark.ml.html#pyspark.ml.clustering.

{% include_example python/ml/gaussian_mixture_example.py %}
</div>

<div data-lang="r" markdown="1">

Refer to the [R API docs](api/R/spark.gaussianMixture.html) for more details.

{% include_example r/ml/gaussianMixture.R %}
</div>

</div>
8 changes: 8 additions & 0 deletions docs/ml-collaborative-filtering.md
@@ -149,4 +149,12 @@ als = ALS(maxIter=5, regParam=0.01, implicitPrefs=True,
{% endhighlight %}

</div>

<div data-lang="r" markdown="1">

Refer to the [R API docs](api/R/spark.als.html) for more details.

{% include_example r/ml/als.R %}
</div>

</div>
46 changes: 20 additions & 26 deletions docs/sparkr.md
@@ -512,39 +512,33 @@ head(teenagers)

# Machine Learning

SparkR supports the following machine learning algorithms currently: `Generalized Linear Model`, `Accelerated Failure Time (AFT) Survival Regression Model`, `Naive Bayes Model` and `KMeans Model`.
Under the hood, SparkR uses MLlib to train the model.
Users can call `summary` to print a summary of the fitted model, [predict](api/R/predict.html) to make predictions on new data, and [write.ml](api/R/write.ml.html)/[read.ml](api/R/read.ml.html) to save/load fitted models.
SparkR supports a subset of the available R formula operators for model fitting, including ‘~’, ‘.’, ‘:’, ‘+’, and ‘-‘.

## Algorithms

### Generalized Linear Model

[spark.glm()](api/R/spark.glm.html) or [glm()](api/R/glm.html) fits generalized linear model against a Spark DataFrame.
Currently "gaussian", "binomial", "poisson" and "gamma" families are supported.
{% include_example glm r/ml.R %}

### Accelerated Failure Time (AFT) Survival Regression Model

[spark.survreg()](api/R/spark.survreg.html) fits an accelerated failure time (AFT) survival regression model on a SparkDataFrame.
Note that the formula of [spark.survreg()](api/R/spark.survreg.html) does not support operator '.' currently.
{% include_example survreg r/ml.R %}

### Naive Bayes Model

[spark.naiveBayes()](api/R/spark.naiveBayes.html) fits a Bernoulli naive Bayes model against a SparkDataFrame. Only categorical data is supported.
{% include_example naiveBayes r/ml.R %}

### KMeans Model
SparkR supports the following machine learning algorithms currently:

* [`spark.glm`](api/R/spark.glm.html) or [`glm`](api/R/glm.html): [`Generalized Linear Model`](ml-classification-regression.html#generalized-linear-regression)
* [`spark.survreg`](api/R/spark.survreg.html): [`Accelerated Failure Time (AFT) Survival Regression Model`](ml-classification-regression.html#survival-regression)
* [`spark.naiveBayes`](api/R/spark.naiveBayes.html): [`Naive Bayes Model`](ml-classification-regression.html#naive-bayes)
* [`spark.kmeans`](api/R/spark.kmeans.html): [`K-Means Model`](ml-clustering.html#k-means)
* [`spark.logit`](api/R/spark.logit.html): [`Logistic Regression Model`](ml-classification-regression.html#logistic-regression)
* [`spark.isoreg`](api/R/spark.isoreg.html): [`Isotonic Regression Model`](ml-classification-regression.html#isotonic-regression)
* [`spark.gaussianMixture`](api/R/spark.gaussianMixture.html): [`Gaussian Mixture Model`](ml-clustering.html#gaussian-mixture-model-gmm)
Member

Looks like we would lose some R-specific information with this deletion?

Contributor Author

These descriptions can be found in the SparkR API doc. I'd prefer to link the algorithms listed here to the corresponding R API docs and MLlib user guide sections rather than duplicate the descriptions here.

Member @felixcheung, Dec 7, 2016

OK, generally I'd agree. I think we should have more information on this, though, since the SparkR API doc is still kind of thin. Perhaps this should be part of the R content for the ML programming guide instead?

* [`spark.lda`](api/R/spark.lda.html): [`Latent Dirichlet Allocation (LDA) Model`](ml-clustering.html#latent-dirichlet-allocation-lda)
* [`spark.mlp`](api/R/spark.mlp.html): [`Multilayer Perceptron Classification Model`](ml-classification-regression.html#multilayer-perceptron-classifier)
* [`spark.gbt`](api/R/spark.gbt.html): `Gradient Boosted Tree Model for` [`Regression`](ml-classification-regression.html#gradient-boosted-tree-regression) `and` [`Classification`](ml-classification-regression.html#gradient-boosted-tree-classifier)
* [`spark.randomForest`](api/R/spark.randomForest.html): `Random Forest Model for` [`Regression`](ml-classification-regression.html#random-forest-regression) `and` [`Classification`](ml-classification-regression.html#random-forest-classifier)
* [`spark.als`](api/R/spark.als.html): [`Alternating Least Squares (ALS) matrix factorization Model`](ml-collaborative-filtering.html#collaborative-filtering)
* [`spark.kstest`](api/R/spark.kstest.html): `Kolmogorov-Smirnov Test`
Member

More R-specific information that would be deleted?

Contributor Author

Ditto; this can be found in the R API doc.


Under the hood, SparkR uses MLlib to train the model. Please refer to the corresponding section of the MLlib user guide for example code.
Users can call `summary` to print a summary of the fitted model, [predict](api/R/predict.html) to make predictions on new data, and [write.ml](api/R/write.ml.html)/[read.ml](api/R/read.ml.html) to save/load fitted models.
SparkR supports a subset of the available R formula operators for model fitting, including ‘~’, ‘.’, ‘:’, ‘+’, and ‘-’.
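
The workflow described above (fit a model with an R formula, then call `summary`, `predict`, and `write.ml`/`read.ml`) might look roughly like the sketch below. It is a minimal illustration, not one of the example files referenced in this diff. It assumes a working local Spark installation with the SparkR package available; the `iris` dataset, the formula, and the variable names are chosen only for illustration.

```r
library(SparkR)

# Start (or connect to) a Spark session; assumes Spark is installed locally.
sparkR.session()

# createDataFrame() replaces "." in column names with "_" and warns about it,
# so Sepal.Length becomes Sepal_Length in the SparkDataFrame.
training <- suppressWarnings(createDataFrame(iris))

# Fit a Gaussian GLM using the '~' and '+' formula operators.
model <- spark.glm(training, Sepal_Length ~ Sepal_Width + Species,
                   family = "gaussian")

# Print a summary of the fitted model.
summary(model)

# Make predictions; the result is a SparkDataFrame with a "prediction" column.
predictions <- predict(model, training)
head(predictions)

# Save the fitted model to a fresh temporary path, then load it back.
modelPath <- tempfile(pattern = "glm-model", fileext = ".tmp")
write.ml(model, modelPath)
restored <- read.ml(modelPath)
summary(restored)

# Clean up the saved model and stop the session.
unlink(modelPath, recursive = TRUE)
sparkR.session.stop()
```

The same `summary`/`predict`/`write.ml`/`read.ml` calls should apply to the other fitted models in the list above, though the contents of each summary differ by algorithm.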

[spark.kmeans()](api/R/spark.kmeans.html) fits a k-means clustering model against a Spark DataFrame, similarly to R's kmeans().
{% include_example kmeans r/ml.R %}

## Model persistence

The following example shows how to save/load a MLlib model by SparkR.
{% include_example read_write r/ml.R %}
{% include_example read_write r/ml/ml.R %}

# R Function Name Conflicts
