[SPARK-13012] [Documentation] Replace example code in ml-guide.md using include_example #11053

devaraj-kavali · 2016-02-03T18:32:12Z

Replaced example code in ml-guide.md using include_example

yinxusen · 2016-02-03T18:37:18Z

Thanks @devaraj-kavali for taking this. I'll check it later today.

yinxusen · 2016-02-03T18:37:22Z

ok to test

yinxusen · 2016-02-03T23:19:34Z

test it please

yinxusen · 2016-02-04T19:12:19Z

@mengxr Can you help me call Jenkins to test it?

mengxr · 2016-02-04T19:53:24Z

ok to test

yinxusen · 2016-02-05T00:40:53Z

retest this please

SparkQA · 2016-02-05T00:57:11Z

Test build #50788 has finished for PR 11053 at commit d485a7d.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- public class JavaEstimatorTransformerParamExample
- class Document1 implements Serializable
- class LabeledDocument1 extends Document1 implements Serializable
- public class JavaModelSelectionViaCrossValidationExample
- public class JavaModelSelectionViaTrainValidationSplitExample
- class Document implements Serializable
- class LabeledDocument extends Document implements Serializable
- public class JavaPipelineExample

yinxusen · 2016-02-05T01:18:52Z

@devaraj-kavali, please fix the style error first.

devaraj-kavali · 2016-02-05T13:51:01Z

I have fixed the errors. @yinxusen, please have a look at this.

SparkQA · 2016-02-05T14:06:19Z

Test build #50819 has finished for PR 11053 at commit 1373996.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-02-05T16:19:47Z

Test build #50824 has finished for PR 11053 at commit 6cc98c9.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-02-05T18:13:55Z

Test build #50825 has finished for PR 11053 at commit 48cafb1.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

yinxusen · 2016-02-07T04:47:59Z

...ples/src/main/java/org/apache/spark/examples/mllib/JavaEstimatorTransformerParamExample.java

+ * limitations under the License.
+ */
+
+import java.util.Arrays;


Wrap the imports in the example on/off markers, too. E.g.

// $example on$
import org.apache.spark.ml.classification.LogisticRegressionModel;
import org.apache.spark.ml.param.ParamMap;
...
// $example off$
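
To illustrate why the imports need their own markers, here is a minimal sketch of marker-based extraction. It is not the include_example plugin used by Spark's docs build; the function name `extract_example` and the sample source are hypothetical, and the only assumption taken from this thread is that lines between `$example on$` and `$example off$` are what end up in the guide.

```python
def extract_example(source: str) -> str:
    """Keep only the lines that sit between '$example on$' and '$example off$'.

    Hypothetical helper for illustration; the marker lines themselves are dropped.
    """
    kept = []
    inside = False
    for line in source.splitlines():
        if "$example on$" in line:
            inside = True
            continue
        if "$example off$" in line:
            inside = False
            continue
        if inside:
            kept.append(line)
    return "\n".join(kept)


# Made-up Java source: two marked regions, one unmarked import.
java_source = """\
package org.apache.spark.examples.ml;

// $example on$
import org.apache.spark.ml.classification.LogisticRegressionModel;
import org.apache.spark.ml.param.ParamMap;
// $example off$
import org.apache.spark.SparkConf;

public class JavaEstimatorTransformerParamExample {
  // $example on$
  // ... example body ...
  // $example off$
}
"""

snippet = extract_example(java_source)
print(snippet)
```

Under this sketch, an import left outside the markers (SparkConf here) never reaches the rendered guide, so any import a reader needs to follow the example has to be inside a marked region.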

yinxusen · 2016-02-07T05:10:00Z

@devaraj-kavali Please use 2-space indentation for Java code.

yinxusen · 2016-02-07T05:14:14Z

examples/src/main/java/org/apache/spark/examples/mllib/JavaPipelineExample.java

+
+// Labeled and unlabeled instance types.
+// Spark SQL can infer schema from Java Beans.
+class Document implements Serializable {


Let's define Document in a single place. Try to merge it with class Document1.

SparkQA · 2016-02-08T07:38:47Z

Test build #50914 has finished for PR 11053 at commit 3a871c1.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- public class Document implements Serializable
- public class LabeledDocument extends Document implements Serializable

devaraj-kavali · 2016-02-08T08:49:27Z

Thanks @yinxusen for the review. I have addressed the comments; please have a look at this.

yinxusen · 2016-02-09T00:51:24Z

@devaraj-kavali All Java code should use 2-space indentation rather than 4-space.

SparkQA · 2016-02-09T07:24:54Z

Test build #50958 has finished for PR 11053 at commit 50f5fd9.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

devaraj-kavali · 2016-02-09T08:01:27Z

I am sorry, @yinxusen, for making you give the same comment again. I have fixed the indentation issue and moved the files from the mllib package to the ml package.

yinxusen · 2016-02-09T08:05:16Z

Never mind, I'll check it later.

Cheers

Xusen Yin (尹绪森)
LinkedIn: https://cn.linkedin.com/in/xusenyin

yinxusen · 2016-02-18T04:38:06Z

...c/main/scala/org/apache/spark/examples/ml/ModelSelectionViaTrainValidationSplitExample.scala

+    val data = sqlContext.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
+    val Array(training, test) = data.randomSplit(Array(0.9, 0.1), seed = 12345)
+
+    val lr = new org.apache.spark.ml.regression.LinearRegression()


Add import org.apache.spark.ml.regression.LinearRegression at the beginning and change this line to val lr = new LinearRegression()

yinxusen · 2016-02-18T04:46:08Z

@devaraj-kavali One more thing: could you make the two Python examples identical to the previous ones in the markdown file? Some comments are indented earlier than in the originals. Note that in Spark the maximum line length for Python is also 100 characters, not the 79 or 80 used in other settings.
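
A quick way to spot lines over the limit before pushing is sketched below. This is a hypothetical helper, not Spark's style-check tooling; the only value taken from this thread is the 100-character limit, and the name `find_long_lines` is made up for illustration.

```python
MAX_LINE_LENGTH = 100  # limit stated in the review comment above


def find_long_lines(source, limit=MAX_LINE_LENGTH):
    """Return (line_number, length) for every line longer than `limit` characters."""
    return [
        (number, len(line))
        for number, line in enumerate(source.splitlines(), start=1)
        if len(line) > limit
    ]


# Three lines: a short one, one of 101 characters, one of exactly 100.
sample = "short line\n" + "x" * 101 + "\n" + "y" * 100
violations = find_long_lines(sample)
print(violations)
```

Only the 101-character line is reported; a line of exactly 100 characters is within the limit.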

yinxusen · 2016-02-18T05:14:55Z

...main/java/org/apache/spark/examples/ml/JavaModelSelectionViaTrainValidationSplitExample.java

+    SQLContext jsql = new SQLContext(sc);
+
+    // $example on$
+    DataFrame data = jsql.read().format("libsvm").load("data/mllib/sample_libsvm_data.txt");


According to the previous example, here the path should be data/mllib/sample_linear_regression_data.txt

SparkQA · 2016-02-18T08:09:26Z

Test build #51472 has finished for PR 11053 at commit ea6e77c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

devaraj-kavali · 2016-02-18T09:07:33Z

Thanks @yinxusen for your detailed review and comments. I have addressed them.

yinxusen · 2016-02-18T21:32:59Z

...main/java/org/apache/spark/examples/ml/JavaModelSelectionViaTrainValidationSplitExample.java

+    SQLContext jsql = new SQLContext(sc);
+
+    // $example on$
+    DataFrame data = jsql.read().format("libsvm").load("data/mllib/sample_linear_regression_data.txt");


This line exceeds the 100-character limit.

yinxusen · 2016-02-18T21:57:06Z

@devaraj-kavali All looks good to me except the minor issues I pointed out above.

@mengxr The ModelSelectionViaTrainValidationSplitExample and JavaModelSelectionViaTrainValidationSplitExample, which call LinearRegression and then CholeskyDecomposition.solve(), fail with return value 1. I am not sure why those errors occur; can you help check it? I can run WeightedLeastSquareSuite successfully.

SparkQA · 2016-02-19T06:20:49Z

Test build #51523 has finished for PR 11053 at commit e20c920.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

devaraj-kavali · 2016-02-19T06:26:51Z

Thanks again @yinxusen for the review, I have addressed the comments.

yinxusen · 2016-02-20T00:52:09Z

examples/src/main/scala/org/apache/spark/examples/ml/PipelineExample.scala

+    val model = pipeline.fit(training)
+
+    // Now we can optionally save the fitted pipeline to disk
+    model.save("/tmp/spark-logistic-regression-model")


One more thing: it's better to make the saved model overwritable with model.write.overwrite().save("/tmp/spark-logistic-regression-model"); otherwise the example can only run once.
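
The "can only run once" behaviour can be shown with a plain-file analogy. This is a sketch, not Spark's writer: `save_model` and its `overwrite` flag are hypothetical stand-ins, and the only assumption is the one implied by the comment above, namely that a plain save refuses to replace an existing path.

```python
import os
import tempfile


def save_model(path, payload, overwrite=False):
    """Hypothetical stand-in for a model writer.

    Refuses to replace an existing path unless overwrite=True is passed.
    """
    if os.path.exists(path) and not overwrite:
        raise FileExistsError(f"{path} already exists; pass overwrite=True to replace it")
    with open(path, "w") as f:
        f.write(payload)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "spark-logistic-regression-model")

    save_model(path, "run 1")  # first run: the path does not exist yet

    try:
        save_model(path, "run 2")  # second run without overwrite
        second_plain_save_failed = False
    except FileExistsError:
        second_plain_save_failed = True

    save_model(path, "run 3", overwrite=True)  # rerun-safe variant
    with open(path) as f:
        final_contents = f.read()
```

The second plain save raises, while the overwrite variant succeeds on every run, which is the property the example needs in order to be rerun without manual cleanup.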

yinxusen · 2016-02-20T00:59:14Z

@mengxr LGTM except for the minor issue above.

ModelSelectionViaTrainValidationSplitExample and JavaModelSelectionViaTrainValidationSplitExample still have a Vector serialization problem, but I think we can add a follow-up JIRA to locate the bug and fix it.

SparkQA · 2016-02-20T18:05:20Z

Test build #51599 has finished for PR 11053 at commit 2fe0667.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

devaraj-kavali · 2016-02-21T16:37:50Z

Thanks @yinxusen for the good suggestion. I have addressed it.

> ModelSelectionViaTrainValidationSplitExample and JavaModelSelectionViaTrainValidationSplitExample still have a problem of Vector serialization. But I think we can add follow-up JIRA to locate the bug and fix it.

Yes, we can create another follow-up JIRA to fix the problem. Thank you.

mengxr · 2016-02-23T01:23:10Z

Merged into master. Thanks!

@devaraj-kavali @yinxusen I guess there is some duplicate example code under examples/ml. Could you create a JIRA and think about possible merges? Please also create a JIRA for the serialization issue and paste the JIRA numbers here.

yinxusen · 2016-02-23T01:45:36Z

@mengxr I'll do it soon.

yinxusen · 2016-02-23T22:44:36Z

Try to solve the serialization issue in this JIRA: https://issues.apache.org/jira/browse/SPARK-13462

yinxusen · 2016-02-23T22:45:51Z

@mengxr For code cleanup and merge, let's do it after all other example code JIRAs are merged. JIRA here: https://issues.apache.org/jira/browse/SPARK-13461

devaraj-kavali · 2016-02-24T04:37:02Z

@yinxusen I will look into SPARK-13462. Thanks for creating it.

Commit d485a7d: [SPARK-13012] [Documentation] Replace example code in ml-guide.md using include_example

devaraj-kavali changed the title from "[SPARK-13012] [Documentation] Replace example code in ml-guide.md using" to "[SPARK-13012] [Documentation] Replace example code in ml-guide.md using include_example" · Feb 3, 2016

Commit 1373996: Fixed the style errors and corrected .py files

Commit 6cc98c9: Fixed python style check warnings

Commit 48cafb1: Fixed scala warning in this example

yinxusen reviewed · Feb 7, 2016

Commit 3a871c1: Fixed the review comments

Commit 50f5fd9: Changed the package from mllib to ml and indentation to 2 spaces for all the Java code

yinxusen reviewed · Feb 18, 2016

Commit ea6e77c: Review comments fix about code style

yinxusen reviewed · Feb 18, 2016

Commit e20c920: Review comments fix

yinxusen reviewed · Feb 20, 2016

Commit 2fe0667: Updated to overwrite the files to support running the example multiple times without deleting the files

asfgit closed this in 02b1fef · Feb 23, 2016

dongjoon-hyun mentioned this pull request Feb 23, 2016

[SPARK-11381][DOCS] Replace example code in mllib-linear-methods.md using include_example #11320

Closed
