Skip to content

Conversation

@felixcheung
Copy link
Member

What changes were proposed in this pull request?

R family is a longer list than what Spark supports.

How was this patch tested?

manual

#' the result of a call to a family function. Refer R family at
#' \url{https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html}.
#' Currently these families are supported: \code{binomial}, \code{gaussian},
#' \code{Gamma}, and \code{poisson}.
Copy link
Member Author

@felixcheung felixcheung Jan 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in R, Gamma family is capital G

@felixcheung
Copy link
Member Author

@yanboliang

@SparkQA
Copy link

SparkQA commented Jan 9, 2017

Test build #71060 has finished for PR 16511 at commit 3403567.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Copy link
Member Author

turns out Gamma is broken. will be working on adding tests

@felixcheung felixcheung changed the title [SPARKR][DOCS] clarify glm family supported [SPARK-19133][SPARKR][DOCS] fix glm for Gamma, clarify glm family supported Jan 9, 2017
@felixcheung felixcheung changed the title [SPARK-19133][SPARKR][DOCS] fix glm for Gamma, clarify glm family supported [SPARK-19133][SPARKR] fix glm for Gamma, clarify glm family supported Jan 9, 2017
@felixcheung felixcheung changed the title [SPARK-19133][SPARKR] fix glm for Gamma, clarify glm family supported [SPARK-19133][SPARKR][ML] fix glm for Gamma, clarify glm family supported Jan 9, 2017
@SparkQA
Copy link

SparkQA commented Jan 9, 2017

Test build #71066 has finished for PR 16511 at commit 0688aa4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 9, 2017

Test build #71070 has finished for PR 16511 at commit 78d5390.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

# For known families, Gamma is upper-cased
jobj <- callJStatic("org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper",
"fit", formula, data@sdf, family$family, family$link,
"fit", formula, data@sdf, tolower(family$family), family$link,
Copy link
Contributor

@yanboliang yanboliang Jan 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should put case conversion into MLlib side, then we can get consistent behavior from APIs across different languages (Scala/Python/R). I wish you would not mind that I have sent #16516 to fix it in MLlib. If it's reasonable, we can revert the change of this line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a good idea - given how this is likely broken since Spark 1.6 I'd like to get the fix in ASAP and to master, branch-2.0, branch-2.1, or even perhaps branch-1.6 just in case.

I'd probably need to open different PRs for these since our file organization has changed, and would need to keep the R side change - unless you think ML side change could go to these branches too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need take some time to get MLlib side changed (and it involves changes for other estimators). Sounds good to keep the R side change.

x <- runif(100, -1, 1)
y <- rgamma(100, rate = 10 / exp(0.5 + 1.2 * x), shape = 10)
df <- as.DataFrame(as.data.frame(list(x = x, y = y)))
model <- glm(y ~ x, family = Gamma, df)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to use spark.glm here, since the title of this test is: spark.glm and predict(see L52). We have separate tests for R-compliant method glm, but it's not necessary to duplicate all tests.

Copy link
Member Author

@felixcheung felixcheung Jan 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it matters much - as you can see in the code glm is a single-line wrapper for spark.glm - I actually thought it was better to add some tests for glm instead of just testing spark.glm

@yanboliang
Copy link
Contributor

LGTM

@asfgit asfgit closed this in 9bc3507 Jan 10, 2017
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…rted

## What changes were proposed in this pull request?

R family is a longer list than what Spark supports.

## How was this patch tested?

manual

Author: Felix Cheung <[email protected]>

Closes apache#16511 from felixcheung/rdocglmfamily.
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017
…rted

## What changes were proposed in this pull request?

R family is a longer list than what Spark supports.

## How was this patch tested?

manual

Author: Felix Cheung <[email protected]>

Closes apache#16511 from felixcheung/rdocglmfamily.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants