## What changes were proposed in this pull request?
Update the SparkR user guide to document all changes made to the Machine Learning APIs in Spark 2.0.
Author: GayathriMurali <[email protected]>
Closes #13285 from GayathriMurali/SPARK-15129.
(cherry picked from commit af2a4b0)
Signed-off-by: Xiangrui Meng <[email protected]>
docs/sparkr.md
Lines changed: 19 additions & 58 deletions
@@ -285,71 +285,32 @@ head(teenagers)
 
 # Machine Learning
 
-SparkR allows the fitting of generalized linear models over DataFrames using the [glm()](api/R/glm.html) function. Under the hood, SparkR uses MLlib to train a model of the specified family. Currently the gaussian and binomial families are supported. We support a subset of the available R formula operators for model fitting, including '~', '.', ':', '+', and '-'.
+SparkR supports the following Machine Learning algorithms.
 
-The [summary()](api/R/summary.html) function gives the summary of a model produced by [glm()](api/R/glm.html).
+* Generalized Linear Regression Model [spark.glm()](api/R/spark.glm.html)
-* For gaussian GLM model, it returns a list with 'devianceResiduals' and 'coefficients' components. The 'devianceResiduals' gives the min/max deviance residuals of the estimation; the 'coefficients' gives the estimated coefficients and their estimated standard errors, t values and p-values. (It only available when model fitted by normal solver.)
-* For binomial GLM model, it returns a list with 'coefficients' component which gives the estimated coefficients.
+[Generalized Linear Regression](api/R/spark.glm.html) can be used to train a model from a specified family. Currently the Gaussian, Binomial, Poisson and Gamma families are supported. We support a subset of the available R formula operators for model fitting, including '~', '.', ':', '+', and '-'.
 
-The examples below show the use of building gaussian GLM model and binomial GLM model using SparkR.
+The [summary()](api/R/summary.html) function gives the summary of a model produced by different algorithms listed above.
+It produces the similar result compared with R summary function.
 
-## Gaussian GLM model
+## Model persistence
 
-<div data-lang="r" markdown="1">
-{% highlight r %}
-# Create the DataFrame
-df <- createDataFrame(sqlContext, iris)
-
-# Fit a gaussian GLM model over the dataset.
-model <- glm(Sepal_Length ~ Sepal_Width + Species, data = df, family = "gaussian")
-
-# Model summary are returned in a similar format to R's native glm().
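For reference, here is a minimal SparkR sketch of the 2.0-style workflow the updated guide describes: fitting a model with `spark.glm()`, inspecting it with `summary()`, and saving and reloading it with `write.ml()` and `read.ml()` (model persistence). It is not part of this commit. It ports the removed Gaussian `glm()` example (`Sepal_Length ~ Sepal_Width + Species` on `iris`) to the new API, and it assumes a local Spark 2.0+ installation with the SparkR package on the library path. The scratch model path and the app name are hypothetical.

```r
library(SparkR)

# Assumes Spark 2.0+ is installed locally and discoverable (e.g. via SPARK_HOME).
sparkR.session(master = "local[*]", appName = "sparkr-glm-sketch")

# createDataFrame() replaces the dots in iris column names with underscores,
# so Sepal.Length becomes Sepal_Length.
df <- createDataFrame(iris)

# Fit a Gaussian GLM with spark.glm(), which replaces the removed glm() example.
gaussianModel <- spark.glm(df, Sepal_Length ~ Sepal_Width + Species, family = "gaussian")

# summary() returns a local R list; its coefficients matrix has one row per term.
gaussianSummary <- summary(gaussianModel)
print(gaussianSummary$coefficients)

# predict() returns a SparkDataFrame with an added "prediction" column.
predictions <- predict(gaussianModel, df)
print(head(select(predictions, "Sepal_Length", "prediction")))

# Model persistence: save the fitted model, then load it back.
# modelPath is a hypothetical scratch location on the local filesystem.
modelPath <- tempfile(pattern = "spark-glm-")
write.ml(gaussianModel, modelPath)
restoredModel <- read.ml(modelPath)
restoredSummary <- summary(restoredModel)
print(restoredSummary$coefficients)

sparkR.session.stop()
```

The reloaded model's coefficients should match the original ones, which is a quick way to check that persistence round-trips correctly.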