
Commit 04e868b

yanboliang authored and jkbradley committed
[SPARK-12364][ML][SPARKR] Add ML example for SparkR
We have a DataFrame example for SparkR; we also need to add an ML example under `examples/src/main/r`.

cc mengxr jkbradley shivaram

Author: Yanbo Liang <[email protected]>

Closes #10324 from yanboliang/spark-12364.

(cherry picked from commit 1a8b2a1)
Signed-off-by: Joseph K. Bradley <[email protected]>
1 parent dffa610 commit 04e868b


1 file changed (+54, -0 lines changed)

examples/src/main/r/ml.R

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# To run this example use
# ./bin/sparkR examples/src/main/r/ml.R

# Load SparkR library into your R session
library(SparkR)

# Initialize SparkContext and SQLContext
sc <- sparkR.init(appName="SparkR-ML-example")
sqlContext <- sparkRSQL.init(sc)

# Train GLM of family 'gaussian'
training1 <- suppressWarnings(createDataFrame(sqlContext, iris))
test1 <- training1
model1 <- glm(Sepal_Length ~ Sepal_Width + Species, training1, family = "gaussian")

# Model summary
summary(model1)

# Prediction
predictions1 <- predict(model1, test1)
head(select(predictions1, "Sepal_Length", "prediction"))

# Train GLM of family 'binomial'
training2 <- filter(training1, training1$Species != "setosa")
test2 <- training2
model2 <- glm(Species ~ Sepal_Length + Sepal_Width, data = training2, family = "binomial")

# Model summary
summary(model2)

# Prediction (Currently the output of prediction for binomial GLM is the indexed label,
# we need to transform back to the original string label later)
predictions2 <- predict(model2, test2)
head(select(predictions2, "Species", "prediction"))

# Stop the SparkContext now
sparkR.stop()
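The Gaussian model in the script (`model1`) can be compared against a local, Spark-free fit. Below is a minimal sketch using base R's `stats::glm` on the same `iris` data. It assumes base R's dotted column names (`Sepal.Length`, `Sepal.Width`) in place of the underscore names the SparkR DataFrame exposes. The local variable names (`local1`, `pred1`) are illustrative only, and the coefficients are expected, though not guaranteed, to be close to the SparkR fit.

```r
# Local, Spark-free analogue of model1 (sketch for comparison only).
# Base R's iris uses dotted column names, e.g. Sepal.Length.
data(iris)

# Gaussian GLM with identity link: Sepal.Length on Sepal.Width and the Species factor
local1 <- glm(Sepal.Length ~ Sepal.Width + Species, data = iris, family = gaussian())
summary(local1)

# Predictions on the training data, analogous to the "prediction" column above
pred1 <- predict(local1, newdata = iris)
head(data.frame(Sepal.Length = iris$Sepal.Length, prediction = pred1))
```

With the identity link, a Gaussian GLM is ordinary least squares, so `lm()` with the same formula should give the same coefficients.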

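The script notes that, for the binomial model (`model2`), the `prediction` column holds an indexed label rather than the original `Species` string. Below is a minimal base-R sketch of the back-mapping. `decode_labels` is a hypothetical helper, not a SparkR function. The index-to-label order in `label_levels` is also an assumption: the example does not say which species is encoded as 0. In practice that order should come from the model or the indexing step rather than being hard-coded.

```r
# Hypothetical helper: map 0-based numeric label indices back to strings.
# Assumes label_levels[i + 1] is the label that was encoded as index i.
decode_labels <- function(index, label_levels) {
  idx <- as.integer(round(index))
  if (any(idx < 0 | idx >= length(label_levels), na.rm = TRUE)) {
    stop("label index out of range")
  }
  label_levels[idx + 1L]
}

# Assumed ordering, for illustration only (not taken from the example)
label_levels <- c("versicolor", "virginica")

# Stand-in for a few values of the "prediction" column, already in local R
predicted_index <- c(0, 1, 1, 0)
decode_labels(predicted_index, label_levels)
```

In a real session, the predicted column would first have to be brought into R, for example with `collect()` on the selected columns, before a helper like this could be applied.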