-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-21306][ML] OneVsRest should support setWeightCol #18554
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I guess we also need to update the python part: https://github.com/apache/spark/blob/v2.2.0-rc6/python/pyspark/ml/classification.py#L1563 |
|
@lins05 thanks, reasonable suggestion, I will fix it later. |
|
I'm not familiar with R, and use |
|
ok to test |
|
Test build #79325 has finished for PR 18554 at commit
|
|
Test build #79330 has finished for PR 18554 at commit
|
|
Test build #79331 has finished for PR 18554 at commit
|
|
@srowen @yanboliang Could you help review the PR? Thanks. |
| case c: HasWeightCol if c.isDefined(c.weightCol) && c.getWeightCol.nonEmpty => | ||
| dataset.select($(labelCol), $(featuresCol), c.getWeightCol) | ||
| case _ => dataset.select($(labelCol), $(featuresCol)) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OneVsRest is a classification estimator, I think we should make weightCol a member param of it like featuresCol. For example:
val dataset = dataset // This dataset has column: a, b, c.
val ova = new OneVsRest().setFeaturesCol("a").setClassifier(new LogisticRegression().setFeaturesCol("b"))
The features column used by OneVsRest is a. The features column set for OneVsRest will override corresponding set in OneVsRest.classifier. So we should follow this way for weightCol as well. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @yanboliang . As @MLnick said, not all classifiers inherits HasWeightCol, so it might cause confusion.
In my opinion, setWeightCol is an attribute owned by one specific classifier itself, like setProbabilityCol and setRawPredictionCol for Logistic Regreesion. So I'd suggest that user should configure the classifier itself, rather than OneVsRest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@facaiy It doesn't matter. If the classifier doesn't inherit from HasWeightCol, we don't run setWeightCol for that classifier but to print out warning log to say weightCol doesn't take effect. You can refer these lines of code to learn how featuresCol be handled. We can do it in similar way. Thanks.
|
I agree on this
…On Tue, 11 Jul 2017 at 16:53, Yanbo Liang ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala
<#18554 (comment)>:
> @@ -317,7 +318,12 @@ final class OneVsRest @SInCE("1.4.0") (
val numClasses = MetadataUtils.getNumClasses(labelSchema).fold(computeNumClasses())(identity)
instr.logNumClasses(numClasses)
- val multiclassLabeled = dataset.select($(labelCol), $(featuresCol))
+ val multiclassLabeled = getClassifier match {
+ // SPARK-21306: cache weightCol if necessary
+ case c: HasWeightCol if c.isDefined(c.weightCol) && c.getWeightCol.nonEmpty =>
+ dataset.select($(labelCol), $(featuresCol), c.getWeightCol)
+ case _ => dataset.select($(labelCol), $(featuresCol))
+ }
OneVsRest is a classification estimator, I think we should make weightCol
a member param of it like featuresCol. For example:
val dataset = dataset // This dataset has column: a, b, c.
val ova = new OneVsRest().setFeaturesCol("a").setClassifier(new LogisticRegression().setFeaturesCol("b"))
The features column used by OneVsRest is a. The features column set for
OneVsRest will override corresponding set in OneVsRest.classifier. So we
should follow this way for weightCol as well. Thanks.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#18554 (review)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AA_SBzWk_xBWbVhuw3Q60BJdQnvPlKnAks5sM4xsgaJpZM4OPag0>
.
|
|
Oh, of course some classifiers may not have weight col. in which case it
will be ignored - though we should probably log a warning in this case.
On Tue, 11 Jul 2017 at 16:58, Nick Pentreath <[email protected]>
wrote:
… I agree on this
On Tue, 11 Jul 2017 at 16:53, Yanbo Liang ***@***.***>
wrote:
> ***@***.**** commented on this pull request.
> ------------------------------
>
> In
> mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala
> <#18554 (comment)>:
>
> > @@ -317,7 +318,12 @@ final class OneVsRest @SInCE("1.4.0") (
> val numClasses = MetadataUtils.getNumClasses(labelSchema).fold(computeNumClasses())(identity)
> instr.logNumClasses(numClasses)
>
> - val multiclassLabeled = dataset.select($(labelCol), $(featuresCol))
> + val multiclassLabeled = getClassifier match {
> + // SPARK-21306: cache weightCol if necessary
> + case c: HasWeightCol if c.isDefined(c.weightCol) && c.getWeightCol.nonEmpty =>
> + dataset.select($(labelCol), $(featuresCol), c.getWeightCol)
> + case _ => dataset.select($(labelCol), $(featuresCol))
> + }
>
> OneVsRest is a classification estimator, I think we should make weightCol
> a member param of it like featuresCol. For example:
>
> val dataset = dataset // This dataset has column: a, b, c.
> val ova = new OneVsRest().setFeaturesCol("a").setClassifier(new LogisticRegression().setFeaturesCol("b"))
>
> The features column used by OneVsRest is a. The features column set for
> OneVsRest will override corresponding set in OneVsRest.classifier. So we
> should follow this way for weightCol as well. Thanks.
>
> —
> You are receiving this because you commented.
> Reply to this email directly, view it on GitHub
> <#18554 (review)>,
> or mute the thread
> <https://github.com/notifications/unsubscribe-auth/AA_SBzWk_xBWbVhuw3Q60BJdQnvPlKnAks5sM4xsgaJpZM4OPag0>
> .
>
|
|
Test build #79579 has finished for PR 18554 at commit
|
|
Test build #79581 has finished for PR 18554 at commit
|
| output = model.transform(df) | ||
| self.assertEqual(output.columns, ["label", "features", "prediction"]) | ||
|
|
||
| def test_support_for_weightCol(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to also test with a classifier that doesn't have a weight col?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Use DecisionTreeClassifier to test.
|
Test build #79740 has finished for PR 18554 at commit
|
|
ping @holdenk @yanboliang |
yanboliang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some minor comments, otherwise, LGTM.
cc @holdenk for double check. Thanks.
python/pyspark/ml/classification.py
Outdated
| @keyword_only | ||
| def __init__(self, featuresCol="features", labelCol="label", predictionCol="prediction", | ||
| classifier=None): | ||
| weightCol=None, classifier=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually we append new argument at the end, as users may use non-keyword argument, we should not break users' existing code. FYI: https://docs.python.org/3/tutorial/controlflow.html#keyword-arguments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved.
python/pyspark/ml/tests.py
Outdated
| lr = LogisticRegression(maxIter=5, regParam=0.01) | ||
| ovr = OneVsRest(classifier=lr, weightCol="weight") | ||
| self.assertIsNotNone(ovr.fit(df)) | ||
| ovr2 = OneVsRest(classifier=lr).setWeightCol("weight") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: We can remove test of ovr2 and ovr4, setting param in different way will run the same code at backend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cleaned.
|
Test build #79964 has finished for PR 18554 at commit
|
|
Since this is a critical bug, I'll merge into master/branch-2.2/branch-2.1/branch-2.0, thanks for all. |
## What changes were proposed in this pull request? add `setWeightCol` method for OneVsRest. `weightCol` is ignored if classifier doesn't inherit HasWeightCol trait. ## How was this patch tested? + [x] add an unit test. Author: Yan Facai (颜发才) <[email protected]> Closes #18554 from facaiy/BUG/oneVsRest_missing_weightCol. (cherry picked from commit a5a3189) Signed-off-by: Yanbo Liang <[email protected]>
## What changes were proposed in this pull request? add `setWeightCol` method for OneVsRest. `weightCol` is ignored if classifier doesn't inherit HasWeightCol trait. ## How was this patch tested? + [x] add an unit test. Author: Yan Facai (颜发才) <[email protected]> Closes #18554 from facaiy/BUG/oneVsRest_missing_weightCol. (cherry picked from commit a5a3189) Signed-off-by: Yanbo Liang <[email protected]>
## What changes were proposed in this pull request? add `setWeightCol` method for OneVsRest. `weightCol` is ignored if classifier doesn't inherit HasWeightCol trait. ## How was this patch tested? + [x] add an unit test. Author: Yan Facai (颜发才) <[email protected]> Closes #18554 from facaiy/BUG/oneVsRest_missing_weightCol. (cherry picked from commit a5a3189) Signed-off-by: Yanbo Liang <[email protected]>
## What changes were proposed in this pull request? add `setWeightCol` method for OneVsRest. `weightCol` is ignored if classifier doesn't inherit HasWeightCol trait. ## How was this patch tested? + [x] add an unit test. Author: Yan Facai (颜发才) <[email protected]> Closes apache#18554 from facaiy/BUG/oneVsRest_missing_weightCol. (cherry picked from commit a5a3189) Signed-off-by: Yanbo Liang <[email protected]>
## What changes were proposed in this pull request? add `setWeightCol` method for OneVsRest. `weightCol` is ignored if classifier doesn't inherit HasWeightCol trait. ## How was this patch tested? + [x] add an unit test. Author: Yan Facai (颜发才) <[email protected]> Closes apache#18554 from facaiy/BUG/oneVsRest_missing_weightCol. (cherry picked from commit a5a3189) Signed-off-by: Yanbo Liang <[email protected]>
The PR is related to #18554, and is modified for branch 2.1. ## What changes were proposed in this pull request? add `setWeightCol` method for OneVsRest. `weightCol` is ignored if classifier doesn't inherit HasWeightCol trait. ## How was this patch tested? + [x] add an unit test. Author: Yan Facai (颜发才) <[email protected]> Closes #18763 from facaiy/BUG/branch-2.1_OneVsRest_support_setWeightCol.
The PR is related to #18554, and is modified for branch 2.0. ## What changes were proposed in this pull request? add `setWeightCol` method for OneVsRest. `weightCol` is ignored if classifier doesn't inherit HasWeightCol trait. ## How was this patch tested? + [x] add an unit test. Author: Yan Facai (颜发才) <[email protected]> Closes #18764 from facaiy/BUG/branch-2.0_OneVsRest_support_setWeightCol.
## What changes were proposed in this pull request? add `setWeightCol` method for OneVsRest. `weightCol` is ignored if classifier doesn't inherit HasWeightCol trait. ## How was this patch tested? + [x] add an unit test. Author: Yan Facai (颜发才) <[email protected]> Closes apache#18554 from facaiy/BUG/oneVsRest_missing_weightCol. (cherry picked from commit a5a3189) Signed-off-by: Yanbo Liang <[email protected]>
The PR is related to apache#18554, and is modified for branch 2.1. add `setWeightCol` method for OneVsRest. `weightCol` is ignored if classifier doesn't inherit HasWeightCol trait. + [x] add an unit test. Author: Yan Facai (颜发才) <[email protected]> Closes apache#18763 from facaiy/BUG/branch-2.1_OneVsRest_support_setWeightCol.
What changes were proposed in this pull request?
add
setWeightColmethod for OneVsRest.weightColis ignored if classifier doesn't inherit HasWeightCol trait.How was this patch tested?