Skip to content

Conversation

@dongjoon-hyun
Copy link
Member

What changes were proposed in this pull request?

This PR replaces example codes in mllib-linear-methods.md using include_example
by doing the followings:

  • Extracts the example codes(Scala,Java,Python) as files in example module.
  • Merges some dialog-style examples into a single file.
  • Hide redundant codes in HTML for the consistency with other docs.

How was the this patch tested?

manual test.
This PR can be tested by document generations, SKIP_API=1 jekyll build.

@noprom
Copy link

noprom commented Feb 23, 2016

So many changes in this commit.

@dongjoon-hyun
Copy link
Member Author

Thank you for reviewing, @noprom . This PR is similar to #11053 (merged yesterday.)

Hi, @yinxusen, @mengxr .
Could you review this PR?

@mengxr
Copy link
Contributor

mengxr commented Feb 23, 2016

ok to test

@mengxr
Copy link
Contributor

mengxr commented Feb 23, 2016

cc @yinxusen

@dongjoon-hyun
Copy link
Member Author

Oh, thanks. I'll check the Jenkins in detail.

@dongjoon-hyun
Copy link
Member Author

Actually, the error was Github timeout. I pushed force again. Could you ask Jenkins to retest please?

+refs/pull/11320/*:refs/remotes/origin/pr/11320/* # timeout=15
ERROR: Timeout after 15 minutes
ERROR: Error fetching remote repo 'origin'
hudson.plugins.git.GitException: Failed to fetch from https://github.com/apache/spark.git

@dongjoon-hyun
Copy link
Member Author

To be sure, I rebased this PR to the master, too.

@SparkQA
Copy link

SparkQA commented Feb 23, 2016

Test build #51805 has finished for PR 11320 at commit 3e56717.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class JavaL1UpdaterExample
    • public class JavaLinearRegressionWithSGDExample
    • public class JavaLogisticRegressionWithLBFGSExample
    • public class JavaSVMWithSGDExample

@dongjoon-hyun
Copy link
Member Author

Let me fix the Scala style. I thought I passed since build/mvn -DskipTests checkstyle:check ends with 'SUCCESS'. Now, I see that there were printed '[ERROR]' lines. Sorry for missing that.

@yinxusen
Copy link
Contributor

@dongjoon-hyun You can use dev/scalastyle, dev/lint-python, etc. to simply the style check.

@dongjoon-hyun
Copy link
Member Author

Thank you so much, @yinxusen ! I'll fix soon and let you know here.

@dongjoon-hyun
Copy link
Member Author

I've done!

@dongjoon-hyun
Copy link
Member Author

Test build 51808 is running now. Let's see the result. :)

@dongjoon-hyun
Copy link
Member Author

Hmm, it fails again due to Github.

ERROR: Timeout after 15 minutes
ERROR: Error fetching remote repo 'origin'

@dongjoon-hyun
Copy link
Member Author

According to Jenkins, other PRs also suffer from this. I think retriggering is not helpful at this time.

@yinxusen
Copy link
Contributor

We can wait for a while. No worry.

…sing include_example

This PR replaces example codes in mllib-linear-methods.md using `include_example`
by doing the followings:
  * Extracts the example codes(Scala,Java,Python) as files in `example` module.
  * Merges some dialog-style examples into a single file.
  * Hide redundant codes in HTML for the consistency with other docs.
  * Move the output directory into 'target/tmp'.
@SparkQA
Copy link

SparkQA commented Feb 24, 2016

Test build #51825 has finished for PR 11320 at commit 12e40a0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class JavaL1UpdaterExample
    • public class JavaLinearRegressionWithSGDExample
    • public class JavaLogisticRegressionWithLBFGSExample
    • public class JavaSVMWithSGDExample

@dongjoon-hyun
Copy link
Member Author

Finally! Now, it's ready to be reviewed again. :)
Thank you, all.

@dongjoon-hyun
Copy link
Member Author

Hi, @yinxusen and @mengxr .
Could you review this if you squeeze some time out of your schedule?

@yinxusen
Copy link
Contributor

I'll do it tomorrow. Thx

On Wed, Feb 24, 2016 at 3:12 PM, Dongjoon Hyun [email protected]
wrote:

Hi, @yinxusen https://github.com/yinxusen and @mengxr
https://github.com/mengxr .
Could you review this if you squeeze some time out of your schedule?


Reply to this email directly or view it on GitHub
#11320 (comment).

Cheers

Xusen Yin (尹绪森)
LinkedIn: https://cn.linkedin.com/in/xusenyin

@dongjoon-hyun
Copy link
Member Author

Thank you, @yinxusen !

// $example on$
SVMWithSGD svmAlg = new SVMWithSGD();
svmAlg.optimizer()
.setNumIterations(200)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dongjoon-hyun All Java files should follow 2-indent style.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yinxusen . My bad.
I updated the PR to fix it.

@yinxusen
Copy link
Contributor

@dongjoon-hyun

Let's remove those examples:

L1UpdaterExample.scala
JavaL1UpdaterExample.java

For these small pieces of codes, we can leave it untouched for now. Otherwise, there will be too many example codes.

@dongjoon-hyun
Copy link
Member Author

Okay. No problem! I will update soon.

@yinxusen
Copy link
Contributor

@dongjoon-hyun No need to rush, you can wait until I finish the round of reviewing.

@dongjoon-hyun
Copy link
Member Author

Oh, I see. Then, let me know when you finish. :)


package org.apache.spark.examples.mllib;

// $example on$
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update the import block in this way:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// $example on$
import scala.Tuple2;

import org.apache.spark.api.java.JavaDoubleRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.regression.LinearRegressionModel;
import org.apache.spark.mllib.regression.LinearRegressionWithSGD;
// $example off$

from pyspark.mllib.classification import SVMWithSGD, SVMModel
from pyspark.mllib.regression import LabeledPoint
# $example off$
from pyspark import SparkContext
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be above of # $example on$

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But other python codes keep from pyspark import SparkContext outside, doesn't it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah.. I see.

@SparkQA
Copy link

SparkQA commented Feb 26, 2016

Test build #52040 has finished for PR 11320 at commit 239e09e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// $example on$
import org.apache.spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithLBFGS}
import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.mllib.linalg.Vectors
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove it

@yinxusen
Copy link
Contributor

@dongjoon-hyun For the section of streaming linear regression: http://spark.apache.org/docs/latest/mllib-linear-methods.html#streaming-linear-regression, we should change the following sentence:

Anytime a text file is placed in /training/data/dir the model will update. Anytime a text file is placed in /testing/data/ dir you will see predictions. As you feed more data to the training directory, the predictions will get better!

Substituting the bold words with args(0) and args(1) in Scala; sys.argv[1] and sys.argv[2] for Python.

"""
from __future__ import print_function

import sys
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change the line to:

# $example on$
import sys
# $example off$

Since the truncated code uses the sys.argv[1].

@yinxusen
Copy link
Contributor

@dongjoon-hyun One more thing, try to make the output dirs different for different examples. Like there are 3 examples (Scala/Java/Python versions) use target/tmp/myLinearRegressionWithSGDModel. We cannot run those examples once a time if we don't delete the dir generated before.

@yinxusen
Copy link
Contributor

@dongjoon-hyun I finished the review. Thanks for working on this!

@SparkQA
Copy link

SparkQA commented Feb 26, 2016

Test build #52037 has finished for PR 11320 at commit 92a3aff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member Author

Thank you, @yinxusen !

@dongjoon-hyun
Copy link
Member Author

I pushed the commit.
I've learn many things from you. Thank you very much, @yinxusen .

@SparkQA
Copy link

SparkQA commented Feb 26, 2016

Test build #52043 has finished for PR 11320 at commit 9000405.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yinxusen
Copy link
Contributor

@mengxr LGTM

Note that we consolidate the original streaming linear regression example from small pieces into a complete example and modify some sentences.

@mengxr
Copy link
Contributor

mengxr commented Feb 26, 2016

Merged into master. Thanks!

ghost pushed a commit to dbtsai/spark that referenced this pull request Feb 26, 2016
…sing include_example

## What changes were proposed in this pull request?

This PR replaces example codes in `mllib-linear-methods.md` using `include_example`
by doing the followings:
  * Extracts the example codes(Scala,Java,Python) as files in `example` module.
  * Merges some dialog-style examples into a single file.
  * Hide redundant codes in HTML for the consistency with other docs.

## How was the this patch tested?

manual test.
This PR can be tested by document generations, `SKIP_API=1 jekyll build`.

Author: Dongjoon Hyun <[email protected]>

Closes apache#11320 from dongjoon-hyun/SPARK-11381.
@dongjoon-hyun
Copy link
Member Author

Thank you so much, @mengxr and @yinxusen !

@dongjoon-hyun
Copy link
Member Author

Hi, @mengxr .
Could you close this PR?
It is merged to the master successfully, but is not closed until now.

@dongjoon-hyun
Copy link
Member Author

Hi, @mengxr .
Could you close this PR? :)

@srowen
Copy link
Member

srowen commented Mar 1, 2016

@dongjoon-hyun normally the Apache ASF bot closes the PRs automatically, but I don't know why this one was not closed. We actually can't close PRs. You would have to close it manually.

@asfgit asfgit closed this in c37bbb3 Mar 1, 2016
@dongjoon-hyun
Copy link
Member Author

Thank you for closing this, @srowen and @mengxr .

@dongjoon-hyun dongjoon-hyun deleted the SPARK-11381 branch March 3, 2016 20:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants