@yanboliang

What changes were proposed in this pull request?

  • SparkR glm now supports families and link functions that match R's family signature.
  • SparkR glm API refactor. The new API is modeled on R's glm, so I only expose the arguments that R's glm supports: formula, family, data, epsilon and maxit.
  • This PR focuses on glm() and predict(); summary statistics will be added in a separate PR after this one is merged.
  • This PR depends on [SPARK-14479] [ML] GLM supports output link prediction #12287, which makes GLMs support link prediction on the Scala side. After that is merged, I will add more tests for predict() to this PR.
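
For reference, a minimal sketch of how the refactored API described above might be called. It assumes a Spark 2.x-style session (`sparkR.session()`; older builds create a SQLContext instead) and uses R's built-in `iris` data purely for illustration; the model formula and parameter values are not taken from this PR.

```r
library(SparkR)

# Assumption: Spark 2.x-style entry point; adjust for older SparkR builds.
sparkR.session()

# SparkR replaces "." in column names with "_" (Sepal.Length -> Sepal_Length).
df <- createDataFrame(iris)

# family takes an R family object with an optional link, as in stats::glm.
# epsilon is the convergence tolerance and maxit the iteration cap.
model <- glm(Sepal_Length ~ Sepal_Width + Species,
             data = df,
             family = gaussian(link = "identity"),
             epsilon = 1e-6,
             maxit = 25)

# predict() returns a distributed data frame with a "prediction" column.
pred <- predict(model, df)
head(select(pred, "Sepal_Length", "prediction"))

sparkR.session.stop()
```

Keeping the argument list to formula, family, data, epsilon and maxit means an existing R `glm` call should need little more than a change of data frame to run on SparkR.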

How was this patch tested?

Unit tests.

cc @mengxr @jkbradley @hhbyyh

@SparkQA

SparkQA commented Apr 11, 2016

Test build #55513 has finished for PR 12294 at commit d0fb62c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

We can use rFormulaModel to avoid fitting again.

@mengxr

mengxr commented Apr 11, 2016

LGTM except one place where we can avoid re-fitting. @yanboliang Could you create follow-up JIRAs so we can add the missing feature back? Thanks!

@yanboliang

@mengxr We already have SPARK-13925 to add summary statistics for SparkR glm. I will send a PR after this one is merged.

@SparkQA

SparkQA commented Apr 12, 2016

Test build #55557 has finished for PR 12294 at commit 9060397.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 12, 2016

Test build #55605 has finished for PR 12294 at commit 292c1b2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr

mengxr commented Apr 12, 2016

LGTM. Merged into master. Thanks!

@asfgit asfgit closed this in 75e05a5 Apr 12, 2016
asfgit pushed a commit that referenced this pull request Apr 12, 2016
…lumn

## What changes were proposed in this pull request?
SparkR does not support the vector type, which is the default type of the feature column in ML. R's predict also does not output an intermediate feature column, so SparkR ```predict``` should not output the feature column either. In this PR, I only fix this issue for ```naiveBayes``` and ```survreg```. ```kmeans``` already has the right code path, and ```glm``` will be fixed in the SparkRWrapper refactor (#12294).

## How was this patch tested?
No new tests.

cc mengxr shivaram

Author: Yanbo Liang

Closes #11958 from yanboliang/spark-14147.
@yanboliang yanboliang deleted the spark-12566 branch April 13, 2016 02:32
3 participants