
Conversation

@yanboliang
Contributor

@yanboliang yanboliang commented Dec 6, 2016

What changes were proposed in this pull request?

Reviewing the SparkR ML wrapper API for the 2.1 release surfaced two main issues:

  • Remove probabilityCol from the argument lists of spark.logit and spark.randomForest. It is used when making predictions, so it should be an argument of predict instead; we will work on this at SPARK-18618 in the next release cycle.
  • Fix the spark.als params to make them consistent with MLlib.
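As a rough illustration, here is how the SparkR call pattern changes with this PR. The predict-side argument is the planned SPARK-18618 follow-up, not something this PR implements, so that signature is hypothetical:

```r
# SparkR sketch, assuming a running SparkSession and a SparkDataFrame
# `df` with a binary label column. Illustrative only.
library(SparkR)

# Before this PR: probabilityCol was accepted at training time.
# model <- spark.logit(df, label ~ ., regParam = 0.01, probabilityCol = "prob")

# After this PR: fit without it; probabilityCol is no longer a
# training argument of spark.logit / spark.randomForest.
model <- spark.logit(df, label ~ ., regParam = 0.01)

# Planned for SPARK-18618 (hypothetical signature): supply the column
# name at prediction time instead, since that is where it is used.
# predictions <- predict(model, newDF, probabilityCol = "prob")
```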

How was this patch tested?

Existing tests.

@SparkQA

SparkQA commented Dec 6, 2016

Test build #69725 has finished for PR 16169 at commit a5c3b8b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang
Contributor Author

cc @felixcheung @jkbradley

@felixcheung
Member

felixcheung commented Dec 7, 2016

How would we set probabilityCol on the classifier later, given that .setProbabilityCol(probabilityCol) lives on the classifier (presumably via the pipeline model)?

Is it OK to call this after calling fit on the pipeline?
What happens when it is set multiple times to different values?

@felixcheung
Member

Generally looks good, have a question above.

@yanboliang
Contributor Author

@felixcheung This is a good question. It's reasonable to call .setProbabilityCol(probabilityCol) on a model or a pipeline model, and this should be handled at the model transformation step (after fitting the pipeline or a single estimator). Users can then set it to different values on the fitted model as many times as they like, rather than retraining the model each time. Thanks.
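A minimal Scala sketch of the approach described here: set the output column on the *fitted* model rather than on the estimator, so changing it never triggers a refit. `pipeline`, `df`, and the column names are placeholders, and a live SparkSession is assumed:

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.classification.LogisticRegressionModel

// Fit once; the coefficients are fixed from here on.
val fitted: PipelineModel = pipeline.fit(df)

// Locate the classification-model stage inside the pipeline model.
val lrModel = fitted.stages.collectFirst {
  case m: LogisticRegressionModel => m
}.get

// Set the probability output column on the fitted model: no refit.
lrModel.setProbabilityCol("myProbability")
val out1 = fitted.transform(df)

// Setting it again to a different value only changes the output
// column name on subsequent transforms; the model is unchanged.
lrModel.setProbabilityCol("otherProbability")
val out2 = fitted.transform(df)
```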

@SparkQA

SparkQA commented Dec 7, 2016

Test build #69800 has finished for PR 16169 at commit a355dde.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

R/pkg/R/mllib.R (outdated)

```diff
   }
-  if (!is.numeric(reg) || reg < 0) {
+  if (!is.numeric(regParam) || regParam < 0) {
     stop("reg should be a nonnegative number.")
```

Member:

reg -> regParam (in the error message as well)

@jkbradley
Member

I don't really see the harm in letting users specify probabilityCol beforehand, except that they may not have a good way to map the indices to String labels. I'm OK with removing it for now though.

@SparkQA

SparkQA commented Dec 8, 2016

Test build #69840 has finished for PR 16169 at commit 6785284.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang
Contributor Author

Merged into master and branch-2.1. Thanks for all the reviews.

asfgit pushed a commit that referenced this pull request Dec 8, 2016
## What changes were proposed in this pull request?
Reviewing SparkR ML wrappers API for 2.1 release, mainly two issues:
* Remove ```probabilityCol``` from the argument list of ```spark.logit``` and ```spark.randomForest```. Since it was used when making prediction and should be an argument of ```predict```, and we will work on this at [SPARK-18618](https://issues.apache.org/jira/browse/SPARK-18618) in the next release cycle.
* Fix ```spark.als``` params to make it consistent with MLlib.

## How was this patch tested?
Existing tests.

Author: Yanbo Liang <[email protected]>

Closes #16169 from yanboliang/spark-18326.

(cherry picked from commit 9725549)
Signed-off-by: Yanbo Liang <[email protected]>
@asfgit asfgit closed this in 9725549 Dec 8, 2016
@yanboliang yanboliang deleted the spark-18326 branch December 8, 2016 04:27
robert3005 pushed a commit to palantir/spark that referenced this pull request Dec 15, 2016
(same commit message as above; closes apache#16169 from yanboliang/spark-18326)
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
(same commit message as above; closes apache#16169 from yanboliang/spark-18326)
4 participants