Skip to content

Conversation

@ericl
Copy link
Contributor

@ericl ericl commented Jul 14, 2015

This implements minimal R formula support as a feature transformer. Both numeric and string labels are supported, but features must be numeric for now.

cc @mengxr

@SparkQA
Copy link

SparkQA commented Jul 14, 2015

Test build #37166 has finished for PR 7381 at commit fb0826b.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 14, 2015

Test build #37170 has finished for PR 7381 at commit 5765ec6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 14, 2015

Test build #37167 has finished for PR 7381 at commit 1f361b0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also mention the operators we support and a link to the official R formula document.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@SparkQA
Copy link

SparkQA commented Jul 15, 2015

Test build #37282 has finished for PR 7381 at commit dc3c943.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class RModelFormula(override val uid: String)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use http://stat.ethz.ch/R-manual/R-patched/library/stats/html/formula.html instead, which is in the raw R manual format.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove unused imports

@mengxr
Copy link
Contributor

mengxr commented Jul 15, 2015

@ericl I make another pass. The major issue is actually that RModelFormula should be an Estimator instead of a Transformer in order to handle String columns. It requires some changes to the current implementation. So I would suggest removing the support for string labels in this PR and address it in a follow-up PR, since we already reviewed most of the code. It is okay to just comment out the test. Does it sound good to you?

@ericl
Copy link
Contributor Author

ericl commented Jul 15, 2015

@mengxr That makes sense, I'll do that in a followup PR. I also addressed the comments.

@SparkQA
Copy link

SparkQA commented Jul 16, 2015

Test build #37425 has finished for PR 7381 at commit 2db68aa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class RFormula(override val uid: String)

@SparkQA
Copy link

SparkQA commented Jul 16, 2015

Test build #37426 has finished for PR 7381 at commit d1959d2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class RFormula(override val uid: String)

@asfgit asfgit closed this in 6960a79 Jul 16, 2015
@mengxr
Copy link
Contributor

mengxr commented Jul 16, 2015

LGTM except some minor comments, which we can fix in the next PR. Merged into master. Thanks! As the next step, we can create a wrapper for RFormula + LinearRegression on the Scala side and then call it in R. Independently, we can add features to RModelParser. I'd recommend the former first in order to have some working MLlib + SparkR features in 1.5.

@ericl
Copy link
Contributor Author

ericl commented Jul 16, 2015

Sounds good, I'll look at the R integration next.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants