[SPARK-18476][SPARKR][ML]: SparkR Logistic Regression should support output original label. #15910
Conversation
Removed an unused import in this PR, because a one-line change like this is not encouraged as a separate PR.
Test build #68737 has finished for PR 15910 at commit
cc @yanboliang
Test build #68739 has finished for PR 15910 at commit
retest this please.
Test build #68741 has finished for PR 15910 at commit
the failure occurs in kafka-streaming. retest this please.
retest this please
Test build #68751 has started for PR 15910 at commit
Jenkins, test this please
Test build #68781 has finished for PR 15910 at commit
nit: would be great to align the tolerance parameter with indentation
how reliable is this test? the order of rows is not guaranteed unless it is enforced by a sort or something, right?
Theoretically, the order is not guaranteed. However, we have done the same thing since the first test case in mllib.R and have never had a problem so far. I'd like to enforce the ordering in the tests here and in other places, but it may be better to do that as separate work, since it involves lots of other tests.
sounds good, separate JIRA then. If the tests haven't been failing, perhaps it is not a huge problem.
I will try to create a follow-up JIRA for this.
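For illustration, a minimal base R sketch of the ordering concern discussed in this thread: sort both the collected rows and the expected rows by a key before comparing, so the check does not depend on row order. The data frame, the `id` key, and the `sort_by_id` helper are hypothetical and are not taken from the tests in this PR. With a SparkDataFrame, the analogous step would presumably be an `arrange` on a key column before `collect`.

```r
# Hypothetical expected predictions, keyed by an id column.
expected <- data.frame(
  id = c(1, 2, 3),
  prediction = c("a", "b", "a"),
  stringsAsFactors = FALSE
)

# Simulate a result whose rows came back in a different order.
collected <- expected[c(3, 1, 2), ]

# Hypothetical helper: order the rows by the id key.
sort_by_id <- function(df) df[order(df$id), ]

# Compare the prediction column only after sorting both sides.
same <- identical(
  sort_by_id(collected)$prediction,
  sort_by_id(expected)$prediction
)
print(same)
```

Comparing without the sort would depend on whatever order the rows happened to arrive in, which is the reliability concern raised above.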
Off-topic, but I think this is a bug. We should not allow users to pass fitIntercept to control whether an intercept is fitted; this should be handled by the formula. For example, if users specify the formula y ~ a + b + c - 1, then the model should be fitted without an intercept. Could you please fix this bug as well? Thanks.
OK, I will fix it in this PR.
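As a small illustration of the formula convention mentioned in this thread, base R already records whether a formula keeps its intercept. The sketch below uses only base R (`terms` and its `"intercept"` attribute); the `has_intercept` helper is a hypothetical name, and this is not how the SparkR wrapper is implemented.

```r
# Hypothetical helper: TRUE when the formula keeps an intercept term.
has_intercept <- function(formula) {
  attr(terms(formula), "intercept") == 1
}

# Default formula: the intercept is kept.
with_intercept <- has_intercept(y ~ a + b + c)

# "- 1" removes the intercept, as in the example from the review comment.
without_intercept <- has_intercept(y ~ a + b + c - 1)

print(with_intercept)
print(without_intercept)
```

Under this convention a separate fitIntercept argument is redundant, which is the point the reviewer makes.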
R/pkg/R/mllib.R
Outdated
Usually we name it as features.
I am traveling now. I will address the comments ASAP. Thanks!
Test build #69255 has finished for PR 15910 at commit
Test build #69268 has finished for PR 15910 at commit
@yanboliang @felixcheung I am back from vacation and made changes according to your comments. Thanks!
yanboliang
left a comment
LGTM
R/pkg/R/mllib.R
Outdated
#' features2 <- c(2.941319, 2.614812, 2.162451, 3.339474, 2.970987)
#' features3 <- c(1.322733, 1.348044, 3.861237, 9.686976, 3.447130)
#' features4 <- c(1.3246388, 0.5510444, 0.9225810, 1.2147881, 1.6020842)
#' data <- as.data.frame(cbind(label, features1, features2, features3, features4))
Nit: Actually, you should not change it; usually the whole feature column is called features.
LGTM. we need to get this in branch-2.1 because of the signature change
Test build #69413 has finished for PR 15910 at commit
Merged into master and branch-2.1. Thanks.
…pport output original label.

## What changes were proposed in this pull request?

Similar to SPARK-18401, as a classification algorithm, logistic regression should support output original label instead of supporting index label. In this PR, original label output is supported and test cases are modified and added. Document is also modified.

## How was this patch tested?

Unit tests.

Author: [email protected] <[email protected]>

Closes #15910 from wangmiao1981/audit.

(cherry picked from commit 2eb6764)
Signed-off-by: Yanbo Liang <[email protected]>
I found the
The summary returns the same as scala side summary, including roc, areaUnderROC, pr, fMeasureByThreshold etc. I think we can add
What changes were proposed in this pull request?
Similar to SPARK-18401, logistic regression, as a classification algorithm, should output the original label rather than the index label.
In this PR, original label output is supported, and test cases are modified and added. The documentation is also updated.
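As a rough illustration of the difference between an index label and the original label, here is a base R sketch. It is not the SparkR or Scala code from this PR; the label values, the variable names, and the first-appearance ordering of the index are assumptions made for the example.

```r
# Hypothetical string labels as a user would supply them.
labels <- c("versicolor", "setosa", "versicolor", "virginica")

# Assumed indexing scheme: distinct labels in order of first appearance.
label_levels <- unique(labels)

# Index labels (0-based), standing in for what an indexed-label model predicts.
index_labels <- match(labels, label_levels) - 1

# Map the indices back so the output shows the original labels.
original_labels <- label_levels[index_labels + 1]

print(index_labels)
print(original_labels)
```

Returning `original_labels` rather than `index_labels` spares users from keeping track of the index-to-label mapping themselves.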
How was this patch tested?
Unit tests.