
Commit 43adbd5

hhbyyh authored and mengxr committed
[SPARK-8043] [MLLIB] [DOC] update NaiveBayes and SVM examples in doc
jira: https://issues.apache.org/jira/browse/SPARK-8043

I found some issues during testing the save/load examples in markdown documents, as a part of the 1.4 QA plan.

Author: Yuhao Yang <[email protected]>

Closes apache#6584 from hhbyyh/naiveDocExample and squashes the following commits:

a01a206 [Yuhao Yang] fix for Gaussian mixture
2fb8b96 [Yuhao Yang] update NaiveBayes and SVM examples in doc
1 parent ccaa823 commit 43adbd5

File tree: 3 files changed (+14, -18 lines)

docs/mllib-clustering.md

Lines changed: 3 additions & 3 deletions
```diff
@@ -249,11 +249,11 @@ public class GaussianMixtureExample {
     GaussianMixtureModel gmm = new GaussianMixture().setK(2).run(parsedData.rdd());
 
     // Save and load GaussianMixtureModel
-    gmm.save(sc, "myGMMModel")
-    GaussianMixtureModel sameModel = GaussianMixtureModel.load(sc, "myGMMModel")
+    gmm.save(sc.sc(), "myGMMModel");
+    GaussianMixtureModel sameModel = GaussianMixtureModel.load(sc.sc(), "myGMMModel");
     // Output the parameters of the mixture model
     for(int j=0; j<gmm.k(); j++) {
-      System.out.println("weight=%f\nmu=%s\nsigma=\n%s\n",
+      System.out.printf("weight=%f\nmu=%s\nsigma=\n%s\n",
         gmm.weights()[j], gmm.gaussians()[j].mu(), gmm.gaussians()[j].sigma());
     }
   }
```
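
For reference, here is a minimal Scala sketch of the same Gaussian-mixture save/load round trip. It assumes an existing SparkContext `sc` and a whitespace-separated numeric input file (the path is a placeholder); the Java fix above passes `sc.sc()` only because that example holds a `JavaSparkContext`.

```scala
import org.apache.spark.mllib.clustering.{GaussianMixture, GaussianMixtureModel}
import org.apache.spark.mllib.linalg.Vectors

// Parse whitespace-separated numeric lines into dense vectors (placeholder path).
val data = sc.textFile("data/mllib/gmm_data.txt")
val parsedData = data.map(s => Vectors.dense(s.trim.split(' ').map(_.toDouble))).cache()

// Fit a two-component Gaussian mixture model.
val gmm = new GaussianMixture().setK(2).run(parsedData)

// Save and load the model; a Scala SparkContext is passed directly.
gmm.save(sc, "myGMMModel")
val sameModel = GaussianMixtureModel.load(sc, "myGMMModel")

// Output the parameters of the fitted mixture.
for (i <- 0 until gmm.k) {
  println("weight=%f\nmu=%s\nsigma=\n%s\n".format(
    gmm.weights(i), gmm.gaussians(i).mu, gmm.gaussians(i).sigma))
}
```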

docs/mllib-linear-methods.md

Lines changed: 10 additions & 14 deletions
```diff
@@ -163,11 +163,8 @@ object, and make predictions with the resulting model to compute the training
 error.
 
 {% highlight scala %}
-import org.apache.spark.SparkContext
 import org.apache.spark.mllib.classification.{SVMModel, SVMWithSGD}
 import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
-import org.apache.spark.mllib.regression.LabeledPoint
-import org.apache.spark.mllib.linalg.Vectors
 import org.apache.spark.mllib.util.MLUtils
 
 // Load training data in LIBSVM format.
@@ -231,15 +228,13 @@ calling `.rdd()` on your `JavaRDD` object. A self-contained application example
 that is equivalent to the provided example in Scala is given bellow:
 
 {% highlight java %}
-import java.util.Random;
-
 import scala.Tuple2;
 
 import org.apache.spark.api.java.*;
 import org.apache.spark.api.java.function.Function;
 import org.apache.spark.mllib.classification.*;
 import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics;
-import org.apache.spark.mllib.linalg.Vector;
+
 import org.apache.spark.mllib.regression.LabeledPoint;
 import org.apache.spark.mllib.util.MLUtils;
 import org.apache.spark.SparkConf;
@@ -282,8 +277,8 @@ public class SVMClassifier {
     System.out.println("Area under ROC = " + auROC);
 
     // Save and load model
-    model.save(sc.sc(), "myModelPath");
-    SVMModel sameModel = SVMModel.load(sc.sc(), "myModelPath");
+    model.save(sc, "myModelPath");
+    SVMModel sameModel = SVMModel.load(sc, "myModelPath");
   }
 }
 {% endhighlight %}
@@ -315,15 +310,12 @@ a dependency.
 </div>
 
 <div data-lang="python" markdown="1">
-The following example shows how to load a sample dataset, build Logistic Regression model,
+The following example shows how to load a sample dataset, build SVM model,
 and make predictions with the resulting model to compute the training error.
 
-Note that the Python API does not yet support model save/load but will in the future.
-
 {% highlight python %}
-from pyspark.mllib.classification import LogisticRegressionWithSGD
+from pyspark.mllib.classification import SVMWithSGD, SVMModel
 from pyspark.mllib.regression import LabeledPoint
-from numpy import array
 
 # Load and parse the data
 def parsePoint(line):
@@ -334,12 +326,16 @@ data = sc.textFile("data/mllib/sample_svm_data.txt")
 parsedData = data.map(parsePoint)
 
 # Build the model
-model = LogisticRegressionWithSGD.train(parsedData)
+model = SVMWithSGD.train(parsedData, iterations=100)
 
 # Evaluating the model on training data
 labelsAndPreds = parsedData.map(lambda p: (p.label, model.predict(p.features)))
 trainErr = labelsAndPreds.filter(lambda (v, p): v != p).count() / float(parsedData.count())
 print("Training Error = " + str(trainErr))
+
+# Save and load model
+model.save(sc, "myModelPath")
+sameModel = SVMModel.load(sc, "myModelPath")
 {% endhighlight %}
 </div>
 </div>
```
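
As a cross-check of the updated Python snippet, a self-contained Scala sketch of the same SVM train/evaluate/save workflow follows. It assumes an existing SparkContext `sc`, uses the LIBSVM sample file that ships with Spark, and treats "myModelPath" as a placeholder output directory.

```scala
import org.apache.spark.mllib.classification.{SVMModel, SVMWithSGD}
import org.apache.spark.mllib.util.MLUtils

// Load training data in LIBSVM format.
val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")

// Train an SVM with 100 iterations of SGD, mirroring `iterations=100` in the Python example.
val numIterations = 100
val model = SVMWithSGD.train(data, numIterations)

// Compute the training error.
val labelsAndPreds = data.map(p => (p.label, model.predict(p.features)))
val trainErr = labelsAndPreds.filter { case (label, pred) => label != pred }.count().toDouble / data.count()
println("Training Error = " + trainErr)

// Save and load the model.
model.save(sc, "myModelPath")
val sameModel = SVMModel.load(sc, "myModelPath")
```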

docs/mllib-naive-bayes.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -53,7 +53,7 @@ val splits = parsedData.randomSplit(Array(0.6, 0.4), seed = 11L)
 val training = splits(0)
 val test = splits(1)
 
-val model = NaiveBayes.train(training, lambda = 1.0, model = "multinomial")
+val model = NaiveBayes.train(training, lambda = 1.0, modelType = "multinomial")
 
 val predictionAndLabel = test.map(p => (model.predict(p.features), p.label))
 val accuracy = 1.0 * predictionAndLabel.filter(x => x._1 == x._2).count() / test.count()
```
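
To show the corrected keyword argument in context, here is a minimal Scala sketch of the Naive Bayes example; the input path and the comma/space layout of the sample data are assumptions, and `sc` is an existing SparkContext.

```scala
import org.apache.spark.mllib.classification.NaiveBayes
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Parse "label,f1 f2 f3 ..." lines into labeled points (placeholder path and format).
val data = sc.textFile("data/mllib/sample_naive_bayes_data.txt")
val parsedData = data.map { line =>
  val parts = line.split(',')
  LabeledPoint(parts(0).toDouble, Vectors.dense(parts(1).split(' ').map(_.toDouble)))
}

// Hold out 40% of the data for evaluation.
val splits = parsedData.randomSplit(Array(0.6, 0.4), seed = 11L)
val training = splits(0)
val test = splits(1)

// The named argument is `modelType` (not `model`), as corrected above.
val model = NaiveBayes.train(training, lambda = 1.0, modelType = "multinomial")

// Accuracy on the held-out split.
val predictionAndLabel = test.map(p => (model.predict(p.features), p.label))
val accuracy = 1.0 * predictionAndLabel.filter(x => x._1 == x._2).count() / test.count()
```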
