
Commit b8a42f2

Address comments.
1 parent: 11a280e


9 files changed: 52 additions & 25 deletions

docs/ml-classification-regression.md

Lines changed: 8 additions & 1 deletion
@@ -79,7 +79,7 @@ More details on parameters can be found in the [Python API documentation](api/py
 
 More details on parameters can be found in the [R API documentation](api/R/spark.logit.html).
 
-{% include_example r/ml/logit.R %}
+{% include_example binomial r/ml/logit.R %}
 </div>
 
 </div>
@@ -172,6 +172,13 @@ model with elastic net regularization.
 {% include_example python/ml/multiclass_logistic_regression_with_elastic_net.py %}
 </div>
 
+<div data-lang="r" markdown="1">
+
+More details on parameters can be found in the [R API documentation](api/R/spark.logit.html).
+
+{% include_example multinomial r/ml/logit.R %}
+</div>
+
 </div>
 
 

docs/sparkr.md

Lines changed: 13 additions & 13 deletions
@@ -516,19 +516,19 @@ head(teenagers)
 
 SparkR supports the following machine learning algorithms currently:
 
-* `spark.glm` or `glm`: `Generalized Linear Model`
-* `spark.survreg`: `Accelerated Failure Time (AFT) Survival Regressio Model`
-* `spark.naiveBayes`: `Naive Bayes Model`
-* `spark.kmeans`: `KMeans Model`
-* `spark.logit`: `Logistic Regression Model`
-* `spark.isoreg`: `Isotonic Regression Model`
-* `spark.gaussianMixture`: `Gaussian Mixture Model`
-* `spark.lda`: `Latent Dirichlet Allocation (LDA) Model`
-* `spark.mlp`: `Multilayer Perceptron Classification Model`
-* `spark.gbt`: `Gradient Boosted Tree Model for Regression and Classification`
-* `spark.randomForest`: `Random Forest Model for Regression and Classification`
-* `spark.als`: `Alternating Least Squares (ALS) matrix factorization Model`
-* `spark.kstest`: `Kolmogorov-Smirnov Test`
+* [`spark.glm`](api/R/spark.glm.html) or [`glm`](api/R/glm.html): [`Generalized Linear Model`](ml-classification-regression.html#generalized-linear-regression)
+* [`spark.survreg`](api/R/spark.survreg.html): [`Accelerated Failure Time (AFT) Survival Regression Model`](ml-classification-regression.html#survival-regression)
+* [`spark.naiveBayes`](api/R/spark.naiveBayes.html): [`Naive Bayes Model`](ml-classification-regression.html#naive-bayes)
+* [`spark.kmeans`](api/R/spark.kmeans.html): [`KMeans Model`](ml-clustering.html#k-means)
+* [`spark.logit`](api/R/spark.logit.html): [`Logistic Regression Model`](ml-classification-regression.html#logistic-regression)
+* [`spark.isoreg`](api/R/spark.isoreg.html): [`Isotonic Regression Model`](ml-classification-regression.html#isotonic-regression)
+* [`spark.gaussianMixture`](api/R/spark.gaussianMixture.html): [`Gaussian Mixture Model`](ml-clustering.html#gaussian-mixture-model-gmm)
+* [`spark.lda`](api/R/spark.lda.html): [`Latent Dirichlet Allocation (LDA) Model`](ml-clustering.html#latent-dirichlet-allocation-lda)
+* [`spark.mlp`](api/R/spark.mlp.html): [`Multilayer Perceptron Classification Model`](ml-classification-regression.html#multilayer-perceptron-classifier)
+* [`spark.gbt`](api/R/spark.gbt.html): `Gradient Boosted Tree Model for` [`Regression`](ml-classification-regression.html#gradient-boosted-tree-regression) `and` [`Classification`](ml-classification-regression.html#gradient-boosted-tree-classifier)
+* [`spark.randomForest`](api/R/spark.randomForest.html): `Random Forest Model for` [`Regression`](ml-classification-regression.html#random-forest-regression) `and` [`Classification`](ml-classification-regression.html#random-forest-classifier)
+* [`spark.als`](api/R/spark.als.html): [`Alternating Least Squares (ALS) matrix factorization Model`](ml-collaborative-filtering.html#collaborative-filtering)
+* [`spark.kstest`](api/R/spark.kstest.html): `Kolmogorov-Smirnov Test`
 
 Under the hood, SparkR uses MLlib to train the model. Please refer to the corresponding section of MLlib user guide for example code.
 Users can call `summary` to print a summary of the fitted model, [predict](api/R/predict.html) to make predictions on new data, and [write.ml](api/R/write.ml.html)/[read.ml](api/R/read.ml.html) to save/load fitted models.

examples/src/main/r/ml/gaussianMixture.R

Lines changed: 1 addition & 1 deletion
@@ -39,4 +39,4 @@ summary(model)
 # Prediction
 predictions <- predict(model, test)
 showDF(predictions)
-# $example off$
+# $example off$

examples/src/main/r/ml/isoreg.R

Lines changed: 1 addition & 1 deletion
@@ -39,4 +39,4 @@ summary(model)
 # Prediction
 predictions <- predict(model, test)
 showDF(predictions)
-# $example off$
+# $example off$

examples/src/main/r/ml/kmeans.R

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ irisDF <- suppressWarnings(createDataFrame(iris))
 kmeansDF <- irisDF
 kmeansTestDF <- irisDF
 kmeansModel <- spark.kmeans(kmeansDF, ~ Sepal_Length + Sepal_Width + Petal_Length + Petal_Width,
-k = 3)
+k = 3)
 
 # Model summary
 summary(kmeansModel)

examples/src/main/r/ml/lda.R

Lines changed: 2 additions & 2 deletions
@@ -31,7 +31,7 @@ training <- df
 test <- df
 
 # Fit a latent dirichlet allocation model with spark.lda
-model <- spark.lda(training, k=10, maxIter=10)
+model <- spark.lda(training, k = 10, maxIter = 10)
 
 # Model summary
 summary(model)
@@ -43,4 +43,4 @@ showDF(posterior)
 # The log perplexity of the LDA model
 logPerplexity <- spark.perplexity(model, test)
 print(paste0("The upper bound bound on perplexity: ", logPerplexity))
-# $example off$
+# $example off$

examples/src/main/r/ml/logit.R

Lines changed: 24 additions & 3 deletions
@@ -24,13 +24,34 @@ library(SparkR)
 # Initialize SparkSession
 sparkR.session(appName = "SparkR-ML-logit-example")
 
-# $example on$
+# Binomial logistic regression
+
+# $example on:binomial$
 # Load training data
 df <- read.df("data/mllib/sample_libsvm_data.txt", source = "libsvm")
 training <- df
 test <- df
 
-# Fit an logistic regression model with spark.logit
+# Fit an binomial logistic regression model with spark.logit
+model <- spark.logit(training, label ~ features, maxIter = 10, regParam = 0.3, elasticNetParam = 0.8)
+
+# Model summary
+summary(model)
+
+# Prediction
+predictions <- predict(model, test)
+showDF(predictions)
+# $example off:binomial$
+
+# Multinomial logistic regression
+
+# $example on:multinomial$
+# Load training data
+df <- read.df("data/mllib/sample_multiclass_classification_data.txt", source = "libsvm")
+training <- df
+test <- df
+
+# Fit a multinomial logistic regression model with spark.logit
 model <- spark.logit(training, label ~ features, maxIter = 10, regParam = 0.3, elasticNetParam = 0.8)
 
 # Model summary
@@ -39,4 +60,4 @@ summary(model)
 # Prediction
 predictions <- predict(model, test)
 showDF(predictions)
-# $example off$
+# $example off:multinomial$

examples/src/main/r/ml/ml.R

Lines changed: 0 additions & 1 deletion
@@ -59,6 +59,5 @@ model.summaries <- spark.lapply(families, train)
 # Print the summary of each model
 print(model.summaries)
 
-
 # Stop the SparkSession now
 sparkR.session.stop()

examples/src/main/r/ml/randomForest.R

Lines changed: 2 additions & 2 deletions
@@ -33,7 +33,7 @@ training <- df
 test <- df
 
 # Fit a random forest classification model with spark.randomForest
-model <- spark.randomForest(training, label ~ features, "classification", numTrees=10)
+model <- spark.randomForest(training, label ~ features, "classification", numTrees = 10)
 
 # Model summary
 summary(model)
@@ -52,7 +52,7 @@ training <- df
 test <- df
 
 # Fit a random forest regression model with spark.randomForest
-model <- spark.randomForest(training, label ~ features, "regression", numTrees=10)
+model <- spark.randomForest(training, label ~ features, "regression", numTrees = 10)
 
 # Model summary
 summary(model)
