[SPARK-13012] [Documentation] Replace example code in ml-guide.md using include_example #11053
Replaced example code in ml-guide.md using include_example
|
Thanks @devaraj-kavali for taking this. I'll check it later today. |
|
ok to test |
|
test it please |
|
@mengxr Can you help me call Jenkins to test it? |
|
ok to test |
|
retest this please |
|
Test build #50788 has finished for PR 11053 at commit
|
|
@devaraj-kavali, please fix the style error first. |
|
I have fixed the errors. @yinxusen, please have a look at this. |
|
Test build #50819 has finished for PR 11053 at commit
|
|
Test build #50824 has finished for PR 11053 at commit
|
|
Test build #50825 has finished for PR 11053 at commit
|
| * limitations under the License. | ||
| */ | ||
|
|
||
| import java.util.Arrays; |
Wrap the imports in $example on$ and $example off$ markers, too. E.g.
// $example on$
import org.apache.spark.ml.classification.LogisticRegressionModel;
import org.apache.spark.ml.param.ParamMap;
...
// $example off$|
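To make the marker convention concrete, here is a minimal sketch of how text between `$example on$` and `$example off$` could be selected from a source file. It is written in plain Java for illustration only: `ExampleSnippetExtractor` and `extract` are hypothetical names, and this is not the actual plugin that the Spark docs build uses for include_example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; NOT the plugin used by the Spark docs build.
class ExampleSnippetExtractor {
  static final String ON = "$example on$";
  static final String OFF = "$example off$";

  // Returns only the lines that sit between an "on" marker and the next "off" marker.
  // The marker lines themselves are dropped, and several marked regions are concatenated.
  static List<String> extract(List<String> sourceLines) {
    List<String> snippet = new ArrayList<>();
    boolean inExample = false;
    for (String line : sourceLines) {
      if (line.contains(ON)) {
        inExample = true;
      } else if (line.contains(OFF)) {
        inExample = false;
      } else if (inExample) {
        snippet.add(line);
      }
    }
    return snippet;
  }

  public static void main(String[] args) {
    // A made-up source file: one marked region for imports, one for the body.
    List<String> source = Arrays.asList(
      "/* license header */",
      "// $example on$",
      "import org.apache.spark.ml.classification.LogisticRegressionModel;",
      "import org.apache.spark.ml.param.ParamMap;",
      "// $example off$",
      "",
      "public class Demo {",
      "  // $example on$",
      "  // ... example body ...",
      "  // $example off$",
      "}");
    for (String line : extract(source)) {
      System.out.println(line);
    }
  }
}
```

Under this sketch, imports left outside the markers would not appear in the rendered guide, which is why the review asks for the imports to be wrapped as well.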
@devaraj-kavali Please keep 2-indent for Java code. |
|
|
||
| // Labeled and unlabeled instance types. | ||
| // Spark SQL can infer schema from Java Beans. | ||
| class Document implements Serializable { |
Let's define Document in a single place. Try to merge it with class Document1.
|
Test build #50914 has finished for PR 11053 at commit
|
|
Thanks @yinxusen for review, I have fixed the comments, Please have a look into this. |
|
@devaraj-kavali All Java code should be 2-indent, rather than 4-indent. |
|
Test build #50958 has finished for PR 11053 at commit
|
|
I am sorry, @yinxusen, for making you give the same comment again. I have fixed the indent issue and moved the files from the mllib to the ml package. |
|
Never mind, I'll check it later. On Tuesday, February 9, 2016, Devaraj Kavali [email protected] wrote:
Cheers, Xusen Yin (尹绪森) |
| val data = sqlContext.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt") | ||
| val Array(training, test) = data.randomSplit(Array(0.9, 0.1), seed = 12345) | ||
|
|
||
| val lr = new org.apache.spark.ml.regression.LinearRegression() |
Add import org.apache.spark.ml.regression.LinearRegression at the beginning and change this line to val lr = new LinearRegression()
|
@devaraj-kavali One more thing, could you make the two Python examples identical to the previous ones in the markdown file? There are some comments with early indentation. Note that in Spark, the maximum line length for Python is also 100 characters, not 79 or 80 as in other settings. |
| SQLContext jsql = new SQLContext(sc); | ||
|
|
||
| // $example on$ | ||
| DataFrame data = jsql.read().format("libsvm").load("data/mllib/sample_libsvm_data.txt"); |
According to the previous example, here the path should be data/mllib/sample_linear_regression_data.txt
|
Test build #51472 has finished for PR 11053 at commit
|
|
Thanks @yinxusen for your detailed review and comments. I have addressed them. |
| SQLContext jsql = new SQLContext(sc); | ||
|
|
||
| // $example on$ | ||
| DataFrame data = jsql.read().format("libsvm").load("data/mllib/sample_linear_regression_data.txt"); |
This line exceeds the 100-character limit.
|
@devaraj-kavali All looks good to me except the minor issues I pointed out above. @mengxr The |
|
Test build #51523 has finished for PR 11053 at commit
|
|
Thanks again @yinxusen for the review, I have addressed the comments. |
| val model = pipeline.fit(training) | ||
|
|
||
| // Now we can optionally save the fitted pipeline to disk | ||
| model.save("/tmp/spark-logistic-regression-model") |
One more thing, it's better to make it overwritable: model.write.overwrite().save("/tmp/spark-logistic-regression-model"), otherwise the example can only run once.
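The "can only run once" concern can be illustrated with a plain-Java analogy. This is a sketch under stated assumptions: `SketchWriter` is a hypothetical stand-in that only mimics the fluent `overwrite().save(...)` pattern and is not Spark's writer API. It refuses to save into an existing path unless overwrite was requested.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for a model writer; it is NOT Spark's API.
class SketchWriter {
  private boolean shouldOverwrite = false;

  // Opt in to replacing an existing output directory.
  SketchWriter overwrite() {
    shouldOverwrite = true;
    return this;
  }

  // Fails if the target exists and overwrite() was not called.
  void save(Path dir) {
    try {
      if (Files.exists(dir)) {
        if (!shouldOverwrite) {
          throw new IOException("Path " + dir + " already exists.");
        }
        deleteRecursively(dir);
      }
      Files.createDirectories(dir);
      Files.write(dir.resolve("metadata"), "model".getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  private static void deleteRecursively(Path dir) throws IOException {
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(dir)) {
      // Reverse order so children are deleted before their parent directories.
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : paths) {
      Files.delete(p);
    }
  }

  public static void main(String[] args) {
    // Made-up temporary location for the demonstration.
    Path target = Paths.get(System.getProperty("java.io.tmpdir"),
      "sketch-writer-" + System.nanoTime(), "model");
    new SketchWriter().save(target);               // first run succeeds
    try {
      new SketchWriter().save(target);             // second run hits the existing path
    } catch (UncheckedIOException e) {
      System.out.println("Second save failed: " + e.getCause().getMessage());
    }
    new SketchWriter().overwrite().save(target);   // rerunnable once overwrite is requested
  }
}
```

With the overwrite step in the chain, the example can be rerun without first deleting the output directory by hand.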
|
@mengxr LGTM except for the minor issue above.
|
times without deleting the files
|
Test build #51599 has finished for PR 11053 at commit
|
|
Thanks @yinxusen for the good suggestion, I have addressed it.
Yes, we can create another follow-up JIRA to fix the problem. Thank you. |
|
Merged into master. Thanks! @devaraj-kavali @yinxusen I guess there are some duplicate example code under |
|
@mengxr I'll do it soon. On Monday, February 22, 2016, asfgit [email protected] wrote:
Cheers, Xusen Yin (尹绪森) |
|
Try to solve the serialization issue in this JIRA: https://issues.apache.org/jira/browse/SPARK-13462 |
|
@mengxr For code cleanup and merge, let's do it after all other example code JIRAs are merged. JIRA here: https://issues.apache.org/jira/browse/SPARK-13461 |
|
@yinxusen I will look into the issue SPARK-13462, Thanks for creating it. |