@devaraj-kavali

Replaced example code in ml-guide.md using include_example

@devaraj-kavali changed the title from "[SPARK-13012] [Documentation] Replace example code in ml-guide.md using" to "[SPARK-13012] [Documentation] Replace example code in ml-guide.md using include_example" on Feb 3, 2016
@yinxusen
Contributor

yinxusen commented Feb 3, 2016

Thanks @devaraj-kavali for taking this. I'll check it later today.

@yinxusen
Contributor

yinxusen commented Feb 3, 2016

ok to test

@yinxusen
Contributor

yinxusen commented Feb 3, 2016

test it please

@yinxusen
Contributor

yinxusen commented Feb 4, 2016

@mengxr Can you help me calling Jenkins to test it?

@mengxr
Contributor

mengxr commented Feb 4, 2016

ok to test

@yinxusen
Contributor

yinxusen commented Feb 5, 2016

retest this please

@SparkQA

SparkQA commented Feb 5, 2016

Test build #50788 has finished for PR 11053 at commit d485a7d.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class JavaEstimatorTransformerParamExample
    • class Document1 implements Serializable
    • class LabeledDocument1 extends Document1 implements Serializable
    • public class JavaModelSelectionViaCrossValidationExample
    • public class JavaModelSelectionViaTrainValidationSplitExample
    • class Document implements Serializable
    • class LabeledDocument extends Document implements Serializable
    • public class JavaPipelineExample

@yinxusen
Contributor

yinxusen commented Feb 5, 2016

@devaraj-kavali, please fix the style errors first.

@devaraj-kavali
Author

I have fixed the errors. @yinxusen, please have a look.

@SparkQA
Copy link

SparkQA commented Feb 5, 2016

Test build #50819 has finished for PR 11053 at commit 1373996.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 5, 2016

Test build #50824 has finished for PR 11053 at commit 6cc98c9.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 5, 2016

Test build #50825 has finished for PR 11053 at commit 48cafb1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* limitations under the License.
*/

import java.util.Arrays;
Contributor

Use the $example on$ and $example off$ markers around the imports, too, e.g.:

// $example on$
import org.apache.spark.ml.classification.LogisticRegressionModel;
import org.apache.spark.ml.param.ParamMap;
...
// $example off$
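Why the imports need markers can be seen from a simplified sketch of marker-based extraction. It assumes the include_example plugin copies only the lines between "$example on$" and "$example off$" into the rendered guide; ExampleMarkerFilter and extract are hypothetical names, and the sketch ignores details such as how the real plugin handles indentation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating marker-based extraction; not Spark's plugin.
class ExampleMarkerFilter {

  // Keeps only the lines between "$example on$" and "$example off$" markers.
  // The marker lines themselves are dropped.
  static List<String> extract(List<String> source) {
    List<String> kept = new ArrayList<>();
    boolean inExample = false;
    for (String line : source) {
      String trimmed = line.trim();
      if (trimmed.endsWith("$example on$")) {
        inExample = true;
      } else if (trimmed.endsWith("$example off$")) {
        inExample = false;
      } else if (inExample) {
        kept.add(line);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> source = Arrays.asList(
      "import java.util.Arrays;",  // outside the markers: not shown in the guide
      "// $example on$",
      "import org.apache.spark.ml.param.ParamMap;",
      "// $example off$",
      "public class JavaExample {",
      "  // $example on$",
      "  ParamMap paramMap = new ParamMap();",
      "  // $example off$",
      "}");
    // Prints only the ParamMap import and the ParamMap declaration line.
    extract(source).forEach(System.out::println);
  }
}
```

Under this assumption, an import left outside the markers silently disappears from the guide, which is why the review asks for markers around the imports as well.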

@yinxusen
Contributor

yinxusen commented Feb 7, 2016

@devaraj-kavali Please use 2-space indentation for Java code.


// Labeled and unlabeled instance types.
// Spark SQL can infer schema from Java Beans.
class Document implements Serializable {
Contributor

Let's define Document in a single place. Try to merge it with class Document1.

@SparkQA

SparkQA commented Feb 8, 2016

Test build #50914 has finished for PR 11053 at commit 3a871c1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class Document implements Serializable
    • public class LabeledDocument extends Document implements Serializable

@devaraj-kavali
Author

Thanks @yinxusen for the review. I have addressed the comments; please have a look.

@yinxusen
Contributor

yinxusen commented Feb 9, 2016

@devaraj-kavali All Java code should use 2-space indentation, not 4-space.

@SparkQA

SparkQA commented Feb 9, 2016

Test build #50958 has finished for PR 11053 at commit 50f5fd9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@devaraj-kavali
Author

Sorry, @yinxusen, for making you repeat the same comment. I have fixed the indentation issue and moved the files from the mllib package to the ml package.

@yinxusen
Contributor

yinxusen commented Feb 9, 2016

Never mind, I'll check it later.


Cheers

Xusen Yin (尹绪森)
LinkedIn: https://cn.linkedin.com/in/xusenyin

val data = sqlContext.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
val Array(training, test) = data.randomSplit(Array(0.9, 0.1), seed = 12345)

val lr = new org.apache.spark.ml.regression.LinearRegression()
Contributor

Add import org.apache.spark.ml.regression.LinearRegression at the beginning and change this line to val lr = new LinearRegression()

@yinxusen
Contributor

@devaraj-kavali One more thing: could you make the two Python examples identical to the previous ones in the markdown file? Some comments are wrapped too early. Note that in Spark the maximum line length for Python is also 100 characters, not the 79 or 80 used in other settings.

SQLContext jsql = new SQLContext(sc);

// $example on$
DataFrame data = jsql.read().format("libsvm").load("data/mllib/sample_libsvm_data.txt");
Contributor

To match the previous example, the path here should be data/mllib/sample_linear_regression_data.txt

@SparkQA

SparkQA commented Feb 18, 2016

Test build #51472 has finished for PR 11053 at commit ea6e77c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@devaraj-kavali
Author

Thanks @yinxusen for your detailed review and comments. I have addressed them.

SQLContext jsql = new SQLContext(sc);

// $example on$
DataFrame data = jsql.read().format("libsvm").load("data/mllib/sample_linear_regression_data.txt");
Contributor

This line exceeds the 100-character limit.
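The limit is a plain per-line character count, so it can be checked mechanically. The sketch below is a hypothetical checker (LineLengthCheck and overlongLines are illustrative names, not Spark's lint tooling). It assumes the flagged statement sits at a 4-space indent inside main (2-space class indent plus 2-space method indent), which makes it 103 characters, and shows one possible way to wrap it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical checker for the 100-character line limit; not Spark's lint tooling.
class LineLengthCheck {
  static final int MAX_LINE_LENGTH = 100;

  // Returns the 1-based numbers of lines longer than MAX_LINE_LENGTH.
  static List<Integer> overlongLines(String text) {
    List<Integer> result = new ArrayList<>();
    String[] lines = text.split("\n", -1);
    for (int i = 0; i < lines.length; i++) {
      if (lines[i].length() > MAX_LINE_LENGTH) {
        result.add(i + 1);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // The flagged statement, assuming a 4-space indent inside main (103 chars).
    String tooLong = "    DataFrame data = jsql.read().format(\"libsvm\")"
      + ".load(\"data/mllib/sample_linear_regression_data.txt\");";
    // One possible wrap: break before .load(...) and indent the continuation.
    String wrapped = "    DataFrame data = jsql.read().format(\"libsvm\")\n"
      + "      .load(\"data/mllib/sample_linear_regression_data.txt\");";
    System.out.println(overlongLines(tooLong));  // prints [1]
    System.out.println(overlongLines(wrapped));  // prints []
  }
}
```

Breaking before the chained .load(...) call keeps both lines well under the limit (49 and 60 characters) without changing the statement.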

@yinxusen
Contributor

@devaraj-kavali All looks good to me except the minor issues I pointed out above.

@mengxr The ModelSelectionViaTrainValidationSplitExample and JavaModelSelectionViaTrainValidationSplitExample, which call LinearRegression and then CholeskyDecomposition.solve(), fail with return value 1. I am not sure why these errors occur; could you help check it? I can run WeightedLeastSquareSuite successfully.

@SparkQA
Copy link

SparkQA commented Feb 19, 2016

Test build #51523 has finished for PR 11053 at commit e20c920.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@devaraj-kavali
Author

Thanks again @yinxusen for the review, I have addressed the comments.

val model = pipeline.fit(training)

// Now we can optionally save the fitted pipeline to disk
model.save("/tmp/spark-logistic-regression-model")
Contributor

One more thing: it's better to let the save overwrite existing output, model.write.overwrite().save("/tmp/spark-logistic-regression-model"); otherwise the example can only run once.

@yinxusen
Contributor

@mengxr LGTM except for the minor issue above.

ModelSelectionViaTrainValidationSplitExample and JavaModelSelectionViaTrainValidationSplitExample still have a Vector serialization problem, but I think we can open a follow-up JIRA to locate and fix the bug.

@SparkQA
Copy link

SparkQA commented Feb 20, 2016

Test build #51599 has finished for PR 11053 at commit 2fe0667.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@devaraj-kavali
Author

Thanks @yinxusen for the good suggestion, I have addressed it.

Yes, we can create another follow-up JIRA to fix the problem. Thank you.

@mengxr
Contributor

mengxr commented Feb 23, 2016

Merged into master. Thanks!

@devaraj-kavali @yinxusen I guess there is some duplicate example code under examples/ml. Could you create a JIRA and think about possible merges? Please also create a JIRA for the serialization issue and paste the JIRA numbers here.

@asfgit asfgit closed this in 02b1fef Feb 23, 2016
@yinxusen
Contributor

@mengxr I'll do it soon.


@yinxusen
Contributor

Try to solve the serialization issue in this JIRA: https://issues.apache.org/jira/browse/SPARK-13462

@yinxusen
Contributor

@mengxr For code cleanup and merge, let's do it after all other example code JIRAs are merged. JIRA here: https://issues.apache.org/jira/browse/SPARK-13461

@devaraj-kavali
Author

@yinxusen I will look into the issue SPARK-13462, Thanks for creating it.
